Dealing with multidimensional images

Dear all,

I have just begun using CellProfiler for a project (I usually use custom-built algorithms in Java or Matlab), and while I appreciate how nice a tool it is, I must admit I am having trouble understanding how CP deals with multidimensional data. As I understand it, CP loops over 2D images, and those 2D images can be grouped in various ways if need be (time frames of a movie, or z-stack planes, say). But I could not find where or how CP keeps track of that, for example to concatenate objects from all images in a group for further analysis in a module, or to analyze all images of a group at once.

I had a look at the tracking example provided on the website, and as I understand it the memory is kept within the tracking module itself, not in the workspace, i.e. the tracking module is the one keeping track (hum) of what it saw in the previous 2D images, while the workspace only knows about the current 2D image. Am I getting it right? For what I have in mind I need access to all objects in a group (i.e. objects from all time points); would the way to do that be inside my own module?

More generally, how does CP deal with data up to 5D, with x-y-z-time-channels (which I usually have in other projects)? Unless I missed something, that would be tricky to do with the one level of grouping currently available. I would think that nested loops and/or handling more than 2D would become necessary at some point, but that might go against the simplicity and usability principle of CP?

Many thanks for the answer (and sorry if I just missed it in the doc)…

Hi ac,
We’re planning more extensive support for N-D, possibly starting work sometime in the next year. Right now, you’re right, the multidimensional support in CellProfiler is somewhat limited. The workspace only contains the image planes for a single 2-D timepoint / z-stack location. For N-D, we often use grouping to create one group of all images for a site, typically a 3-D time series or z-stack, but possibly N-D. You could maintain all planes in a group within your own module by storing them within the module dictionary (CPModule.get_dictionary()). If the planes don’t fit in memory and you’re working with the development branch, you could use the HDF5 file that we use to store measurements to store the planes (workspace.measurements.hdf_dict.hdf_file), although this is somewhat unsupported and could break in a later version.
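To make the grouping idea concrete, here is a minimal plain-Python sketch of collecting 2-D planes into one group per site. It does not use CellProfiler itself; the record layout and the `site` and `t` keys are assumptions made up for illustration, not CellProfiler metadata names.

```python
from collections import defaultdict


def group_planes(records, key="site", order="t"):
    """Group 2-D plane records by `key`, sorting each group by `order`.

    `records` is a list of dicts describing one plane each. The key
    names are hypothetical and only serve this illustration.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    # Sort every group so planes come out in time (or z) order.
    return {k: sorted(v, key=lambda r: r[order]) for k, v in groups.items()}


records = [
    {"site": "A", "t": 1, "file": "A_t1.tif"},
    {"site": "B", "t": 0, "file": "B_t0.tif"},
    {"site": "A", "t": 0, "file": "A_t0.tif"},
]
groups = group_planes(records)
```

Each group then plays the role of one time series or z-stack that is processed plane by plane.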

We’re trying to stay away from looping constructs in CP, aside from grouping. I think your intuition is correct that looping would break many assumptions in the code and would make things complex for both the users and the code maintainers. We are considering moving grouping into the modules that do aggregation over groups: for instance, TrackObjects would define a group based on time series, but MakeProjection would define a group based on Z-stacks. We could then perform a dependency analysis of the pipeline and run all image sets in a group for all modules up to and including TrackObjects (or MakeProjection), then call TrackObjects.post_group to allow TrackObjects to make the tracking associations and then run all of the image sets in the group for all modules after TrackObjects. This would let downstream modules use the output of aggregation modules and it would let a user perform aggregation operations along any number of dimension axes within the same pipeline. The downside of this is that all intermediate images produced by modules prior to the aggregation step that are used in modules after the aggregation step either need to be saved or need to be recalculated for each image set in the group - possibly very costly in terms of storage space or execution time.
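The execution order described above could be sketched roughly as follows. This is only an illustration of the idea under consideration, not existing CellProfiler code; the `schedule` function and the way modules are represented by name are hypothetical.

```python
def schedule(modules, aggregator, image_sets):
    """Return the call order for one group under the proposed scheme.

    `modules` is the pipeline as a list of module names, `aggregator`
    is the name of the module that aggregates over the group, and
    `image_sets` lists the image sets belonging to that group.
    """
    split = modules.index(aggregator) + 1
    order = []
    # 1. Run every image set through all modules up to the aggregator.
    for image_set in image_sets:
        for module in modules[:split]:
            order.append((module, image_set))
    # 2. Let the aggregator combine what it saw across the group.
    order.append((aggregator + ".post_group", None))
    # 3. Run every image set through the remaining modules.
    for image_set in image_sets:
        for module in modules[split:]:
            order.append((module, image_set))
    return order


calls = schedule(["LoadImages", "TrackObjects", "MeasureObjects"],
                 "TrackObjects", [1, 2])
```

Step 3 is where the cost mentioned above shows up: any intermediate image from step 1 that a later module needs has to be either kept or recomputed.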

If I were writing a module to do N-D analysis with the current version of CellProfiler, I’d collect the image planes that I needed in the module’s “run” method, using the dictionary from CPModule.get_dictionary() as storage and then do the N-D processing in the “post_group” method. For things like an N-D segmentation, it may be necessary to run one pipeline to produce the segmentation and a second to operate on the segmentation results. I have done some experiments with 3-D segmentation of stacks of tens of megapixel images and the results can be calculated fairly quickly in core in 8GB of memory, so it’s certainly possible to operate on an N-D stack, even with the current version of CellProfiler.
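As a rough sketch of that pattern: the class below is a stand-in that does not import CellProfiler, and its method signatures are simplified (the real methods receive a workspace rather than a bare plane). The dictionary plays the role of the one returned by CPModule.get_dictionary(), and the maximum-intensity projection is just an example of an N-D step.

```python
class StackCollector:
    """Hypothetical module: buffer planes in run(), process in post_group()."""

    def __init__(self):
        # Stand-in for the storage returned by CPModule.get_dictionary().
        self._dictionary = {}

    def get_dictionary(self):
        return self._dictionary

    def prepare_group(self):
        # Start every group with an empty buffer (simplified signature).
        self.get_dictionary()["planes"] = []

    def run(self, plane):
        # Called once per image set; `plane` is a 2-D list of pixel rows.
        self.get_dictionary()["planes"].append(plane)

    def post_group(self):
        # All planes of the group are available here, so the N-D step
        # goes in this method. Example: maximum projection over the stack.
        planes = self.get_dictionary()["planes"]
        return [[max(values) for values in zip(*rows)]
                for rows in zip(*planes)]


collector = StackCollector()
collector.prepare_group()
collector.run([[1, 5], [3, 0]])
collector.run([[4, 2], [1, 7]])
projection = collector.post_group()
```

Keeping the buffer in the module dictionary rather than in the workspace matches the fact that the workspace only holds the current 2-D plane.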

–Lee

Thank you for the extensive answer; I can see the challenge in making something both powerful and user friendly… I’ll try using grouping as you advise; it should work for what I have to do.

ac

Adding Lee’s answer to the FAQ.