Aggregation of per-cell data to per-well data

I am currently attempting to set up a CellProfiler platform; I have previously posted here.
We have now prepared the first plate with treatments, imaged it, and run it through quality control, illumination correction and per-cell feature extraction with export to a MySQL database.

However, I am facing major issues with the aggregation of single-cell data to per-well data.
I have attempted to follow the workflows described in various places:
Cytominer Workshop
Cytominer R Vignette
Supporting info from Bray et al. 2016, Nat. Protoc.

For the last approach I am running into the same issues that Axel Pahl described here, which were never really resolved (@shntnu). I am not sure whether annotated treatments/controls are required for this approach to work.

For the Cytominer approach I have attempted to merge the available descriptions. I am using a MySQL database, as this is what was described in the Cell Painting article. It seems to me that SQLite is now preferred for use with Cytominer, so would it be better to make the switch now?
When following the workflow from the Cytominer Workshop (but loading the MySQL database with RMariaDB in R instead), I receive an error at

> profiles %<>% collect()
Error in result_fetch(res@ptr, n = n) : [0]
In addition: There were 50 or more warnings (use warnings() to see the first 50)

It seems that the joining of the tables works; I can call columns from object with e.g. select(object, Cells_AreaShape_Extent)

The next step is the aggregation with Cytominer, which does not yield any errors, but the output seems a little weird and I cannot call e.g. select(profiles, Cells_AreaShape_Extent), which returns:
Error in result_create(conn@ptr, statement, is_statement) :
Too many columns [1117]
In addition: There were 50 or more warnings (use warnings() to see the first 50)

@Tim_1, @cells2numbers
The workflow you linked me to is difficult for us to decipher, as we are not very proficient in (or working in) Linux, and have our data stored locally at the moment. I had hoped we would be able to perform the aggregation fairly easily and then annotate wells to compounds and normalize, all in R, using or taking inspiration from the available R scripts.

To sum up:
We are currently at a point where we only need to aggregate the per-cell data to per-well data. We can then normalize to the DMSO control and start working on the other statistics to evaluate the generated profiles. But we have not found any functioning method to perform this aggregation. Is there anyone from the Carpenter lab, or from other labs working with Cell Painting, who can in any way help us solve these challenges?

With kind regards
Esben Svenningsen
Aarhus University

Followup question:
Is there any reason that “Calculate the per-well mean values of object measurements” in Cell Profiler (CP2.1.1) is not used to generate the per-well profiles?

I do recommend using ExportToSpreadsheet and saving your data as CSVs, then using cytominer-database to ingest into a SQLite backend. The structure of your CSVs should look as documented here and here. An example of what the data would look like is in this tests directory of cytominer-database.
So indeed, going forward, please switch over to ExportToSpreadsheet.

But to address the problem with your current setup – I think the issue you are facing is your final table (profiles) has too many columns for MySQL. One solution is to first collect each table (Cells, Cytoplasm, Nuclei; assuming you extracted those 3) and then join. But before that, can you share the code leading up to profiles %<>% collect()?
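
A minimal sketch of the collect-then-join idea, assuming dplyr is installed. The toy tibbles stand in for the remote per-object tables (which would normally come from tbl(src = ..., "...")), and the table contents, column names, and the join keys ImageNumber/ObjectNumber are hypothetical simplifications; real CellProfiler output may need parent-object columns to match compartments.

```r
library(dplyr)

# Toy stand-ins for the three per-object tables (hypothetical names and values).
cells <- tibble(ImageNumber = c(1, 1, 2), ObjectNumber = c(1, 2, 1),
                Cells_AreaShape_Extent = c(0.61, 0.58, 0.70))
cytoplasm <- tibble(ImageNumber = c(1, 1, 2), ObjectNumber = c(1, 2, 1),
                    Cytoplasm_AreaShape_Extent = c(0.41, 0.39, 0.45))
nuclei <- tibble(ImageNumber = c(1, 1, 2), ObjectNumber = c(1, 2, 1),
                 Nuclei_AreaShape_Extent = c(0.81, 0.77, 0.85))

# Pull each table into memory first. For a remote table this fetches the rows;
# for the local tibbles used here it simply returns them unchanged.
cells <- collect(cells)
cytoplasm <- collect(cytoplasm)
nuclei <- collect(nuclei)

# Join in R, so the wide result never has to exist as a database table.
keys <- c("ImageNumber", "ObjectNumber")
object <- cells %>%
  inner_join(cytoplasm, by = keys) %>%
  inner_join(nuclei, by = keys)
```

The trade-off is memory: every per-object table is held in R at once, which may be slow for a full plate.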

Hi Shantanu
Thank you for the rapid response!

I compressed my .R script to a .rar to be able to upload it:
Database_test.rar (909 Bytes). But I guess you are right about the MySQL.

I will try ExportToSpreadsheet instead, and then follow your infrastructure to see if that solves my issues. Otherwise I will keep you updated on the progress here!


Cytominer-database has to be run from the “command-line”. Does this mean e.g. an Anaconda prompt, or can Windows CMD work? Or must an Ubuntu terminal for Windows be installed?

You can pip install cytominer-database from Windows Command Line. I’d imagine you can do the same from Anaconda prompt, but never tried it. You don’t need Ubuntu for this.

Does this work?
If it does, then there might be a path forward

Hi @shntnu

The script you uploaded worked!
Is the solution to calculate them one at a time (for nuclei, cells and cytoplasm) and then merge in the end?

Hi Shantanu

You write:

Is it possible to get Cell Profiler to output the data in a similar structure when running it locally? As far as I can tell there are no options for this in the ExportToSpreadsheet module?

Yes, because R data frames don’t complain about too many columns, but MySQL and other relational DBs do (including SQLite).
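
As a small illustration of that point, here is a sketch in base R that builds an in-memory data frame with a few thousand columns. The column count and the Feature_ names are arbitrary, hypothetical choices, not taken from the actual data.

```r
# Build a data frame with 3000 numeric columns and 2 rows, entirely in memory.
n_features <- 3000
wide <- as.data.frame(matrix(0, nrow = 2, ncol = n_features))
names(wide) <- paste0("Feature_", seq_len(n_features))  # hypothetical names

# Selecting a single column works as usual on such a wide table.
one_feature <- wide[["Feature_2500"]]
dim(wide)
```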

Now that that worked, I’d recommend adapting this code for your needs:

Here’s a pipeline that does that. See the settings of the ExportToSpreadsheet module.

analysis.cppipe (26.8 KB)

Hi @shntnu

First of all thank you for your fast responses!!

I unfortunately get the same error as before when running it on the nuclei object:

> profiles.nuclei %<>% collect()
Error in result_fetch(res@ptr, n = n) : [0]
In addition: There were 50 or more warnings (use warnings() to see the first 50)

It works fine for cells and cytoplasm. Do you have a suggestion to where the error could be?

Please share the full code – that might provide some clues

Here it is:

I have changed database and plate from previous (the other one was a subset).

Changing line 33 from:
tbl(src = db.analysis, "plate00001_per_nuclei")
to:
tbl(src = db.analysis, "plate00001_per_cytoplasm")
makes the aggregation work (same for cells).

Okay, I had the same settings, except that “Add image metadata columns to your object data file?” was set to yes. The calculation is currently running with the updated ExportToSpreadsheet.

Pipeline is here for reference 211analysis.cppipe (40.7 KB)

I could not find all the settings when I opened the file in a text editor (Notepad), but I guess some of them only show up if you change e.g. “Create a GenePattern GCT file?” to yes.

Is the CP version critical? I run in 2.1.1

This might be slow, but what happens if you do object %<>% collect() at line 34? (when using plate00001_per_nuclei in the previous line)

That works fine; I get a table with 15859 obs. of 504 variables.

OK, next, do object %<>% collect() after the line object %<>% inner_join(image, by = c("ImageNumber")).
What happens?

Also works, gives 15859 obs. of 1129 variables

Ok. I’m puzzled why that works, and yet you get an error if you collect after aggregating, which you reported here:

In any case, if you’re ok with the memory overhead of doing object %<>% collect() after doing the inner join, then you’re all set. Pretty sure this will now work with nucleus, too, but go ahead and try it out. That is, in your code, add object %<>% collect() after the inner join, then run the whole code (for nucleus as the object) and LMK what happens
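
A minimal sketch of that order of operations (inner join, then collect, then aggregate in memory), assuming dplyr 1.0 or later. The toy tables and the column names Metadata_Plate, Metadata_Well, and Nuclei_AreaShape_Extent are hypothetical, and the plain group_by/summarise mean is a stand-in for the Cytominer aggregation step, not a reproduction of it.

```r
library(dplyr)

# Toy stand-ins for the image table and the per-nuclei object table.
image <- tibble(ImageNumber = 1:4,
                Metadata_Plate = "plate00001",
                Metadata_Well = c("A01", "A01", "A02", "A02"))
object <- tibble(ImageNumber = c(1L, 1L, 2L, 3L, 4L, 4L),
                 ObjectNumber = c(1L, 2L, 1L, 1L, 1L, 2L),
                 Nuclei_AreaShape_Extent = c(0.6, 0.8, 0.7, 0.5, 0.4, 0.6))

# Join the well metadata onto each object, then pull the result into memory.
object <- object %>%
  inner_join(image, by = c("ImageNumber")) %>%
  collect()

# Aggregate per-cell measurements to one row per well (here: the mean).
profiles <- object %>%
  group_by(Metadata_Plate, Metadata_Well) %>%
  summarise(across(starts_with("Nuclei_"), mean), .groups = "drop")
```

Because the aggregation runs on an in-memory data frame, the width of the joined table no longer depends on the database's column limits.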