CellProfiler: Combined Image and Objects Table

Hello! @bcimini @agoodman @Minh

I was wondering whether you could add one more option to ExportToSpreadsheet?

Namely:

Add image measurement columns to your object data files

Currently this is only possible for image metadata, but we very often need it for all image measurements. Of course, one can do this with post-processing scripts, but for convenience it would be very nice if this could be avoided (and since the checkbox already exists for image metadata, it looks like you kind of have the code already?! ;-).

Do you think that would be feasible?

Best wishes, Christian (@Alex_H, @VolkerH)

Tischi, the required post-processing is very little effort: just one command that joins the two tables on ImageNumber (a left join of the object table onto the image table).
In Python you would use pandas’ pd.merge (there is a good tutorial here: https://www.shanelynn.ie/merge-join-dataframes-python-pandas-index-1/), and I’m sure it isn’t any more difficult in R.
Keeping them separate leads to less data duplication; in a merged table, every object row would repeat the same image-level values across many columns.
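A minimal pandas sketch of that post-processing step, assuming both spreadsheets share an ImageNumber column (as ExportToSpreadsheet output typically does). Small in-memory tables stand in for the exported CSV files, and all column values are made up. A left join keeps every object row and attaches its image’s measurements; `validate="many_to_one"` makes pandas raise an error if an ImageNumber unexpectedly appears more than once in the image table.

```python
import pandas as pd

# Hypothetical stand-ins for the exported files; in practice you would load
# them with pd.read_csv(...) on the image and object spreadsheets.
image_table = pd.DataFrame({
    "ImageNumber": [1, 2],
    "Intensity_MeanIntensity_DNA": [0.10, 0.20],
})
object_table = pd.DataFrame({
    "ImageNumber": [1, 1, 2],
    "ObjectNumber": [1, 2, 1],
    "AreaShape_Area": [150, 210, 180],
})

# Attach each object's image-level measurements via its ImageNumber.
merged = pd.merge(
    object_table,
    image_table,
    on="ImageNumber",
    how="left",               # keep every object row
    validate="many_to_one",   # each object matches at most one image row
)
print(merged)
```

The image values are copied onto every object of that image, which is exactly the duplication described above.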

I know; in R, for example, it is a one-liner:

merged.table <- merge( image.table, object.table )

It is more about avoiding running extra code outside CellProfiler, which can be a nice thing for people who don’t program. Also, it means you need R or Python installed.

I agree with that too, which is why I am asking for an opt-in checkbox.

OK, never mind then. I wrongly assumed you wanted to use such a table as input to your HTM Explorer, where you already have R.

Can you give me an idea of the use cases? Given that we give you in-pipeline options for comparing or normalizing object measurements against image measurements (CalculateMath), I feel like that’s better for your hypothetical non-programming person (who also might not like their object tables doubling in size!), but I fully admit I may not have appreciated them all.
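For comparison, here is the kind of object-to-image normalization that CalculateMath can do inside the pipeline, done instead after export. This is a hedged pandas sketch, not CellProfiler code: the column names are hypothetical, and it simply divides each object’s mean intensity by its image’s background value. Only the one image column that is needed gets merged, so the object table grows by a single column rather than doubling.

```python
import pandas as pd

# Hypothetical exported tables; column names are illustrative only.
image_table = pd.DataFrame({
    "ImageNumber": [1, 2],
    "Intensity_MedianIntensity_Background": [2.0, 4.0],
})
object_table = pd.DataFrame({
    "ImageNumber": [1, 1, 2],
    "ObjectNumber": [1, 2, 1],
    "Intensity_MeanIntensity_GFP": [10.0, 6.0, 12.0],
})

# Attach each object's per-image background value.
normalized = object_table.merge(image_table, on="ImageNumber", how="left")

# Divide the object measurement by its image's background measurement.
normalized["GFP_over_background"] = (
    normalized["Intensity_MeanIntensity_GFP"]
    / normalized["Intensity_MedianIntensity_Background"]
)
print(normalized["GFP_over_background"].tolist())  # prints [5.0, 3.0, 3.0]
```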

One concrete use case is a browser tool that lets you click on an object in a table and then opens the corresponding image. To do so, it needs to know the paths to the saved images, which are only available in the images table.
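A minimal sketch of the lookup such a tool would need, assuming the images table carries per-image PathName_ and FileName_ columns and that objects refer to their image by ImageNumber. The "DNA" channel name, the paths, and the `image_path_for_object` helper are all hypothetical.

```python
import os

# Hypothetical image-table rows, e.g. as read with csv.DictReader from the
# exported images spreadsheet. The "DNA" channel name is illustrative.
image_rows = [
    {"ImageNumber": "1", "PathName_DNA": "/data/plate1", "FileName_DNA": "A01_s1.tif"},
    {"ImageNumber": "2", "PathName_DNA": "/data/plate1", "FileName_DNA": "A01_s2.tif"},
]

# Map ImageNumber -> full image path by joining folder and file name.
path_by_image = {
    row["ImageNumber"]: os.path.join(row["PathName_DNA"], row["FileName_DNA"])
    for row in image_rows
}


def image_path_for_object(object_row):
    """Return the image path for a clicked object row (hypothetical helper)."""
    return path_by_image[object_row["ImageNumber"]]


clicked = {"ImageNumber": "2", "ObjectNumber": "7"}
print(image_path_for_object(clicked))
```

If Path and FileName were exported with the object data, this lookup table would not be needed, and the tool could read the path straight from the clicked row.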

I just naively thought that, since there is already a checkbox

Add image metadata columns to your object data files

it would be consistent, and not a big deal, to add another one

Add image measurement columns to your object data files

If that raises too many conceptual concerns, that’s of course totally fine, as it is trivial to do the merging if needed at a later step.

It’s not a conceptual concern so much as a practical one: there are often hundreds or thousands of image measurements generated, some of which might be useful to have on an object sheet (for example, an intensity metric, if you imagine normalizing object measurements to a per-image measurement of background) and some of which are probably never useful (for example, how many seconds the CorrectIlluminationApply module took to run).
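If image measurements are merged onto objects anyway, that distinction suggests copying only a chosen subset. A minimal pandas sketch, assuming hypothetical column names in CellProfiler’s Category_Feature style (the ExecutionTime_ column name is illustrative): keep the join key plus columns with chosen prefixes, and leave the rest behind before merging.

```python
import pandas as pd

# Hypothetical image table mixing useful and rarely useful measurements.
image_table = pd.DataFrame({
    "ImageNumber": [1, 2],
    "Intensity_MeanIntensity_Background": [2.0, 4.0],
    "ExecutionTime_05CorrectIlluminationApply": [0.8, 0.7],
    "Count_Nuclei": [120, 95],
})

# Keep the join key plus only the prefixes worth copying onto every object.
wanted_prefixes = ("Intensity_", "Count_")
keep = ["ImageNumber"] + [
    col for col in image_table.columns if col.startswith(wanted_prefixes)
]
image_subset = image_table[keep]
print(list(image_subset.columns))
```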

For a “computationally naive” user, who might ALREADY be shocked by how many columns are present in an object spreadsheet, I think potentially doubling the number of columns might cause more issues than it solves. I’m definitely willing to discuss this more, though.

All that being said, I think adding Path and FileName to the “image metadata” exported to the object spreadsheet is a 100% great idea that we should do (do you mind making a GH issue for this?). Did you have other use cases in mind for this, or is that the major one?

Done here: https://github.com/CellProfiler/CellProfiler/issues/3739
Thanks for the discussion!

@agoodman Out of curiosity: why are you storing the folder and file name in separate columns rather than in one column with the full path?

I would guess it’s because we allow users to use the path information to define metadata and such.

And I suspect CellProfiler Analyst may need path info? Just guesses though!
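To illustrate the first guess: with the folder and file name in separate columns, metadata can be parsed from each one independently. A minimal Python sketch with a hypothetical folder layout and patterns; it only mimics the named-group regular expressions (`(?P<Name>...)`) that CellProfiler’s Metadata module accepts and is not CellProfiler code.

```python
import re

# Hypothetical layout: the plate is encoded in the folder name,
# the well and site in the file name.
path_name = "/data/experiment1/Plate_0042"
file_name = "A01_s3.tif"

# Named groups become metadata keys (patterns are illustrative).
folder_pattern = re.compile(r"Plate_(?P<Plate>\d+)$")
file_pattern = re.compile(r"^(?P<Well>[A-P]\d{2})_s(?P<Site>\d+)")

metadata = {}
metadata.update(folder_pattern.search(path_name).groupdict())
metadata.update(file_pattern.match(file_name).groupdict())
print(metadata)
```

With only a combined full-path column, each pattern would have to account for both parts of the string at once.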