Cell Painting: Establishing the platform




I am working in the laboratory of Thomas B. Poulsen at the Department of Chemistry, Aarhus University, Denmark.

We are currently establishing a Cell Painting platform and have some questions about the image analysis and about whether we can reuse previously published data. I will split our questions into three topics here on the forum to make them more manageable.

We would like to use the Cell Painting assay to profile compounds we synthesize ourselves and compare them against profiles of compounds with a known mode of action. Since data are already available from previous work (e.g. Bray et al. 2017, GigaScience), we hope that we do not have to image the known compounds ourselves and can simply compare against the available profiles.

Do you foresee any issues with using the available profiles from e.g. the 2017 Bray et al. paper in GigaScience? We are not sure whether differences in microscope optics and camera will affect the data to the extent that the profiles are no longer comparable. We have acquired a CellDiscoverer 7 automated microscope from Zeiss for the image acquisition.

We are planning to profile some of the compounds already present in the previous data as controls. We would expect each of these compounds to cluster closely with its own published profile, which would serve as assay validation. Could there be other or better ways to validate our approach?
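As a rough illustration of this validation idea, here is a minimal Python sketch. It assumes per-compound profiles are already available as plain feature vectors with identical feature order in both datasets; the compound names, the values, and the choice of cosine similarity are hypothetical and only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def self_match_rate(own, reference):
    """Fraction of shared compounds whose most similar reference profile
    belongs to the same compound."""
    shared = [c for c in own if c in reference]
    hits = 0
    for compound in shared:
        # Nearest neighbour among ALL reference profiles, not just shared ones.
        best = max(reference, key=lambda ref: cosine(own[compound], reference[ref]))
        hits += best == compound
    return hits / len(shared) if shared else 0.0

# Made-up three-feature profiles (assumption: same features, same order).
own = {"cmpd_A": [1.0, 0.1, -0.5], "cmpd_B": [-0.8, 0.9, 0.2]}
reference = {
    "cmpd_A": [0.9, 0.0, -0.4],
    "cmpd_B": [-1.0, 1.1, 0.1],
    "cmpd_C": [0.1, -0.9, 0.9],
}
rate = self_match_rate(own, reference)
```

A rate close to 1 would suggest that the control compounds land next to their published counterparts; a low rate would hint at platform or batch differences.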

With Kind Regards
Esben B. Svenningsen, Ph.D. Student
Aarhus University, Denmark
Department of Chemistry, Thomas B. Poulsen

Bray, M. A. et al., GigaScience, 2017, 6, 1–5.
Bray, M. A. et al., Nat. Protoc., 2016, 11, 1757–1774.

Hi Esben,

(Moving to a single thread for the sake of my sanity)

We definitely observe batch-to-batch variation with the same compounds, so I think directly comparing your profiles with ours may not be very effective when they come from different compound sources, different cell lines, different scopes, etc. It might work, I’m just not super optimistic. If you’re going to try this, I would certainly use CP 2.2 or 2.3, as there were major algorithm changes between 2 and 3 that will make direct comparisons much harder. This applies especially to the texture features, whose calculation is still really slow in CP 3.X, something we’re actively working to fix by our next release. (This should also answer your second question, Cell Painting: Establishing the platform (2).)

I think your idea of profiling the compounds and seeing if the relative networks are the same is a good one; you may find they’re similar enough, or you may find a “transfer function” you can use to compare one to the other.
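One way to check whether the relative networks agree could look like the Python sketch below. It is only an illustration under stated assumptions: profiles are plain feature vectors keyed by compound, Pearson correlation is used as the similarity measure, and the compound names and values are made up.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0

def pairwise_similarities(profiles, compounds):
    """Similarity of every compound pair, in a fixed pair order."""
    return [pearson(profiles[a], profiles[b]) for a, b in combinations(compounds, 2)]

def structure_agreement(own, reference):
    """Correlate the compound-to-compound similarities of the two datasets."""
    shared = sorted(set(own) & set(reference))
    return pearson(pairwise_similarities(own, shared),
                   pairwise_similarities(reference, shared))

# Hypothetical data: the reference is a rescaled, shifted copy of our profiles,
# so the relative structure is identical even though raw values differ.
own = {"cmpd_A": [1, 2, 3, 4], "cmpd_B": [4, 3, 2, 1], "cmpd_C": [1, 3, 2, 4]}
reference = {name: [2 * x + 1 for x in values] for name, values in own.items()}
agreement = structure_agreement(own, reference)
```

The comparison works on similarities between compounds rather than on raw feature values, so it tolerates per-platform offsets and scaling. It needs at least three shared compounds to be meaningful.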

Re your third question, Cell Painting: Establishing the platform (3):

The previously published scripts will definitely work, but at this point new developments are always pushed to cytominer, so it’s likely worth your time to learn to use it; see below for an example repo.


Hi Beth

Thank you for your answers. No problem moving to a single thread.

We will attempt to extract profiles for a subset of compounds from your dataset and compare to profiles produced on our platform using the relevant CP version, and from this evaluate whether the approach is feasible.
We can see that you have changed microscopes during your studies, from the ImageXpress Micro XLS to the Opera Phenix. Have you attempted to correlate profiles of identical compounds from the two different microscope systems?

Also, a new question has come up:
It is not clear to us at which point the compound/RNAi treatment information is merged with the per-well data. Are you using custom scripts to output a CSV file with the per-well profiles and compound data?
Basically: how do you merge the per-well profiles with the information on which compound is in each well? Obviously CellProfiler does not need this information, and it is not needed until the whole analysis is completed.

Along the same lines: I have had trouble identifying how you treat replicates. Are they averaged and the SD calculated? As far as I can tell, all per-cell DMSO controls must be averaged and used for per-plate normalization of the per-cell profiles. Are the normalized per-cell profiles then used to generate per-well profiles, which are then averaged across the four replicates?

I hope my questions are not too confusing, please ask for clarification if needed.

With kind regards
Esben Svenningsen

Aggregation of per-cell data to per-well data

Hey Esben,

I'll try to answer your last questions regarding the aggregation to well- or treatment-level profiles. We use our cytominer package to summarize the single-cell features created by CellProfiler. To make cytominer available on the command line, we use custom scripts, as you assumed; see https://github.com/broadinstitute/cytominer_scripts. If you want to have a look at our standardized workflow, you can review https://cytomining.github.io/profiling-handbook/create-profiles.html (this is optimized for our environment, so I am not sure how easy it would be to adapt it to your needs, but it is probably a good starting point).

Regarding your question about how we merge compound IDs with well-level data: for each experiment we have a platemap that connects the well position with the compound used plus metadata. The CellProfiler CSVs contain the position of the well; this information is used to merge the compound ID with the morphological profiles using the script annotate.R.
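To illustrate the idea of the merge (not the actual annotate.R implementation), here is a minimal Python sketch. The column names `Metadata_Well`, `well_position`, `compound` and `concentration_uM`, as well as the values, are assumptions made up for this example; your own CSVs and platemap may use different names.

```python
def annotate(profiles, platemap):
    """Attach platemap metadata to each per-well profile by well position."""
    by_well = {row["well_position"]: row for row in platemap}
    annotated = []
    for profile in profiles:
        well = profile["Metadata_Well"]
        if well not in by_well:
            # Fail loudly rather than silently dropping an unannotated well.
            raise KeyError(f"No platemap entry for well {well}")
        meta = by_well[well]
        annotated.append({
            **profile,
            "Metadata_compound": meta["compound"],
            "Metadata_concentration_uM": meta["concentration_uM"],
        })
    return annotated

# Hypothetical per-well profiles (one feature shown) and platemap rows.
profiles = [
    {"Metadata_Well": "A01", "Cells_AreaShape_Area": 1520.4},
    {"Metadata_Well": "A02", "Cells_AreaShape_Area": 1310.9},
]
platemap = [
    {"well_position": "A01", "compound": "DMSO", "concentration_uM": 0.0},
    {"well_position": "A02", "compound": "cmpd_X", "concentration_uM": 10.0},
]
annotated = annotate(profiles, platemap)
```

In practice the two tables would be read from CSV files, but the join key (well position, and plate if several plates are combined) stays the same.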

Regarding our workflow:

  1. We first aggregate the single-cell data to per-well level (we calculate the mean of each feature).
  2. Next, we normalize the profiles: to normalize by plate, we calculate the mean and SD of the DMSO control wells and z-transform all well-level profiles (subtract the mean and divide by the SD).
  3. Finally, we aggregate to perturbation level, i.e. we create the final profiles by calculating the mean of all available replicates.

Regarding step 3: if you use cytominer you can calculate the final profiles as mean, median, or mean+sd, see aggregate.R.
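The three steps can be sketched in plain Python as follows. This is a simplified illustration and not the cytominer code: the function names, well names and values are hypothetical, a single plate is assumed, and the sample standard deviation is used for the DMSO wells, which is an assumption of this example.

```python
from collections import defaultdict
from statistics import mean, stdev

def mean_profile(profiles):
    """Feature-wise mean of a list of equal-length feature vectors."""
    return [mean(column) for column in zip(*profiles)]

def aggregate_cells_to_wells(cells):
    """Step 1: cells is a list of (well, features); returns {well: mean profile}."""
    by_well = defaultdict(list)
    for well, features in cells:
        by_well[well].append(features)
    return {well: mean_profile(rows) for well, rows in by_well.items()}

def normalize_to_dmso(wells, dmso_wells):
    """Step 2: z-transform every well against the DMSO wells of the plate."""
    controls = [wells[w] for w in dmso_wells]
    mu = mean_profile(controls)
    sd = [stdev(column) for column in zip(*controls)]  # needs >= 2 DMSO wells
    return {
        well: [(x - m) / s if s else 0.0 for x, m, s in zip(features, mu, sd)]
        for well, features in wells.items()
    }

def aggregate_replicates(wells, well_to_compound):
    """Step 3: mean over all replicate wells of each compound."""
    by_compound = defaultdict(list)
    for well, features in wells.items():
        by_compound[well_to_compound[well]].append(features)
    return {c: mean_profile(rows) for c, rows in by_compound.items()}

# Made-up single-cell data with two features per cell.
cells = [
    ("A01", [1.0, 10.0]), ("A01", [3.0, 14.0]),
    ("A02", [2.0, 8.0]), ("A02", [4.0, 12.0]),
    ("B01", [6.0, 20.0]), ("B02", [8.0, 22.0]),
]
well_to_compound = {"A01": "DMSO", "A02": "DMSO", "B01": "cmpd_X", "B02": "cmpd_X"}

wells = aggregate_cells_to_wells(cells)
normalized = normalize_to_dmso(wells, dmso_wells=["A01", "A02"])
final = aggregate_replicates(normalized, well_to_compound)
```

After normalization the DMSO profile averages to zero by construction, so the final profile of a compound is its distance from the controls in units of control SD.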

Hope these answers help you understand our workflow!



Hi Tim

This sounds like a great starting point for us, thank you very much for clarification!

We will work from here (our first plate is being treated as I write!), and I will reply if we face any more challenges we are unable to resolve!

If you have any other tips or resources we would definitely like to hear more from you.

With kind regards
Esben Svenningsen


Hey Esben,

(Tim again, switched accounts). I forgot to mention the review article "Data-analysis strategies for image-based cell profiling" that Juan Caicedo wrote. You can find it here: https://www.nature.com/articles/nmeth.4397

It is a very good introduction to cell profiling!

Hope this helps,


Hi Beth

We are progressing slowly but surely through the protocol.

I have a question:
Can you confirm, or help me reach someone who can confirm, that the pipelines associated with the GigaScience/GigaDB paper (http://gigadb.org/dataset/100351) are the correct ones?
They seem to use the “Load Images” module for the Illumination Correction and Analysis pipelines, which differs from what I expected, since the pipelines provided in the “Cell Painting” article (Nat. Protoc.) do not use it.
I have succeeded in running the Quality Control pipeline on our own images but am still looking into how to incorporate the results from CellProfiler Analyst into the Analysis pipeline (and perhaps also remove aberrant images before calculating the illumination correction functions, even though there seems to be some disagreement about whether this should be done…).
I was not sure whether to use the Flag Images module or exactly how to implement it: either by creating and saving ‘gate tables’ in CPA and using these, or by noting the min/max values for “PercentMaximal” and “PowerLogLogSlope”. To me, entering the values directly in the Flag Images module seems like the better solution, since we don’t have an enormous amount of data at this point.
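The min/max approach amounts to something like the Python sketch below, written only to illustrate the logic outside CellProfiler. The threshold values and file names are made up, and the measurement names assume a channel called "DNA"; the real limits would have to come from inspecting our own QC measurements.

```python
def flag_images(images, limits):
    """Return the names of images with any QC measurement outside its (min, max) range."""
    flagged = []
    for image in images:
        for feature, (low, high) in limits.items():
            if not low <= image[feature] <= high:
                flagged.append(image["FileName"])
                break  # one failed measurement is enough to flag the image
    return flagged

# Hypothetical QC measurements for three images.
images = [
    {"FileName": "well_A01.tiff",
     "ImageQuality_PercentMaximal_DNA": 0.02,
     "ImageQuality_PowerLogLogSlope_DNA": -2.1},
    {"FileName": "well_A02.tiff",  # many saturated pixels
     "ImageQuality_PercentMaximal_DNA": 4.5,
     "ImageQuality_PowerLogLogSlope_DNA": -2.0},
    {"FileName": "well_A03.tiff",  # steep slope, likely out of focus
     "ImageQuality_PercentMaximal_DNA": 0.01,
     "ImageQuality_PowerLogLogSlope_DNA": -3.4},
]

# Made-up (min, max) limits, for illustration only.
limits = {
    "ImageQuality_PercentMaximal_DNA": (0.0, 0.25),
    "ImageQuality_PowerLogLogSlope_DNA": (-2.5, -1.5),
}
flagged = flag_images(images, limits)
```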

But I am still not sure whether the pipelines supplied with the GigaScience paper are correct. For now I will use the ones supplied with the Nat. Protoc. paper.


Hi Esben,

The pipelines in the GigaScience paper are older than the ones from the Nature Protocols paper IIRC; I’d stick with the Nature Protocols ones.

With regards to QC, the QualityControl tutorial here is a good resource showing exactly how we do this ourselves in the lab!