CellProfiler quirks

Hello,
We’re in the process of migrating our image segmentation tasks from our current home-built MATLAB code to CellProfiler. The primary reason for this is the flexibility that CP2 offers in identifying objects. I’ve been very impressed with CP2, so thank you. The interface is clean and intuitive, and the help manual is thorough. I have encountered a few issues, which I describe below. Some of these may be my mistakes.

I initially made the mistake of putting a hyphen in the “name this loaded image” field in the loadimages module. CellProfiler didn’t warn me (and I forgot) that hyphens aren’t allowed in the field name of a MATLAB struct. The pipeline even appears to run and a .mat file is produced, so I didn’t discover the error until I loaded the .mat file.
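For anyone who wants to check names before running a pipeline, here is a minimal sketch of the rule as I understand it: a MATLAB field name starts with a letter and then contains only letters, digits, or underscores. The 63-character limit and the `sanitize_field_name` helper are my own assumptions for illustration, not anything CellProfiler provides.

```python
import re

# Assumed rule for MATLAB identifiers: a letter first, then letters,
# digits, or underscores. 63 is MATLAB's usual namelengthmax; treat it
# as an assumption and check your own MATLAB version.
MATLAB_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
MAX_LEN = 63


def is_valid_field_name(name):
    """Return True if name looks usable as a MATLAB struct field name."""
    return len(name) <= MAX_LEN and MATLAB_NAME.fullmatch(name) is not None


def sanitize_field_name(name):
    """Hypothetical helper: replace disallowed characters with underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # A field name cannot start with a digit or underscore, so prefix one letter.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "x" + cleaned
    return cleaned[:MAX_LEN]
```

With this, `is_valid_field_name("DNA-stain")` is false, and `sanitize_field_name("DNA-stain")` gives a name with the hyphen replaced by an underscore.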

The command line capabilities are great, but there seems to be no built-in way to run the same pipeline on multiple folders in batch. Instead, I wrote a little python code to do this.
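Roughly, the idea is sketched below (this is not my exact script). It builds one headless command per subfolder and gives each its own output folder. The flags `-c`, `-r`, `-p`, `-i` and `-o` are the headless options as I understand them, so please confirm with `cellprofiler --help`; the executable name and the `build_commands` function are assumptions for illustration.

```python
import os


def build_commands(pipeline, input_root, output_root, exe="cellprofiler"):
    """Return one headless command (as an argument list) per subfolder.

    Only immediate subfolders of input_root are used; plain files are skipped.
    Each subfolder gets a matching output folder under output_root.
    """
    commands = []
    for name in sorted(os.listdir(input_root)):
        in_dir = os.path.join(input_root, name)
        if not os.path.isdir(in_dir):
            continue
        out_dir = os.path.join(output_root, name)
        # Assumed flags: -c headless, -r run, -p pipeline, -i input, -o output.
        commands.append([exe, "-c", "-r", "-p", pipeline,
                         "-i", in_dir, "-o", out_dir])
    return commands
```

Each returned list can be passed to `subprocess.call` after creating its output folder with `os.makedirs`.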

Suppose you have a single image that you want to use for correctilluminationapply against a long series of images. This process is inefficient in CP2 because rather than performing loadsingleimage once at the beginning of the analysis, CP2 performs loadsingleimage at every iteration.

I’m performing morphological opening on my nuclei, because identifyprimaryobjects sometimes gives jagged edges that aren’t in the actual image, despite my efforts playing around with the parameters. After I have my rough nuclei by identifyprimaryobjects, I use convertobjectstoimage, using type binary. Then I use morph (open with scale 3), then I use identifyprimaryobjects with a manual threshold between 0 and 1. Surprisingly (for me at least), the size of the (opened) nuclei now depends on the exact value of the manual threshold that I use, e.g., the objects are smaller with a manual threshold of 0.9 than with 0.5. A manual threshold of 0.5 seems to produce the correct size, which I suppose makes sense, but I did not expect such behavior.

When using saveimages and “select the type of image to save” is “objects”, there seems to be no option to specify saving the image as uint16 (suppose I may have more than 255 objects). Instead, I have to use convertobjectstoimages with uint16, followed by saveimages with tif (or tiff) and image bit depth of 16. Unfortunately, saving as 16-bit png is not possible, so each of my object images is 2 MB instead of 20 KB. I’m currently using imagemagick to batch convert all the tifs to pngs after each analysis finishes.

The trackobjects module leaves a bit to be desired, in my opinion. I know you’ve got the LAP algorithm in there, but its various parameters don’t seem very intuitive. At this point, we’re using CP2 to segment our images, measure intensities, etc., and spit out a mat file, then we use a separate particle tracking algorithm (physics.georgetown.edu/matlab/) to track our cells. I’ll naively suggest that you make that algorithm available in CP2.

Cellprofiler cannot handle the situation in which a “saveimages” module is followed by a “loadimages” module that will load the previously saved images, because the images don’t exist when the analysis is started. If the images exist from a previous analysis (even though they will be overwritten), it works. I realize this is not something you should usually have to do.

This is a good point. We will include a warning in the next release.

There are a couple of ways to handle this, depending on what you want to do:

  • If you want to process all sub-folders in a root directory, you can set the default input folder as the root folder, and set “Analyze all subfolders within the selected folder” in LoadImages to “All” or “Some”
  • For more fine-grained control, you can use LoadData to process a .csv of image paths and filenames. We use this routinely for our cluster runs in which thousands of images are processed.
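As a rough illustration of the second option, the sketch below walks a root folder and writes a .csv that LoadData could read. The column names follow the `Image_FileName_<name>` / `Image_PathName_<name>` pattern; the image name `DNA`, the file extensions, and both function names are assumptions made for the example, so adjust them to your pipeline.

```python
import csv
import io
import os


def loaddata_rows(root, image_name="DNA", extensions=(".tif", ".tiff")):
    """Collect one row per image file found under root (recursively)."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a stable order
        for filename in sorted(filenames):
            if filename.lower().endswith(extensions):
                rows.append({
                    "Image_FileName_" + image_name: filename,
                    "Image_PathName_" + image_name: dirpath,
                })
    return rows


def write_loaddata_csv(rows, fileobj, image_name="DNA"):
    """Write the rows with a header line to an open text file object."""
    fields = ["Image_FileName_" + image_name, "Image_PathName_" + image_name]
    writer = csv.DictWriter(fileobj, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
```

Writing one .csv per subfolder and running the pipeline once per .csv is one way to keep each run's output separate.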

That’s understandable. I’ll put this in as a feature request; however, we have found that this step is a small fraction of the time used performing the remaining analysis modules, so it won’t be a high-priority issue.

I believe this is because there is a small amount of blurring done under the hood, aside from that done as specified by the settings; I imagine that this would distort a binary image. If so, this is a bug that we’ll correct.
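To illustrate why blurring would produce this effect, here is a toy one-dimensional sketch. The 3-tap kernel is an arbitrary assumption chosen for the example, not the smoothing CellProfiler actually applies. Once the sharp edge of a binary object is softened, the measured width depends on where the threshold cuts the ramp.

```python
def smooth(profile, kernel=(0.25, 0.5, 0.25)):
    """Apply a 3-tap smoothing kernel, repeating the end values at the borders."""
    n = len(profile)
    out = []
    for i in range(n):
        left = profile[max(i - 1, 0)]
        right = profile[min(i + 1, n - 1)]
        out.append(kernel[0] * left + kernel[1] * profile[i] + kernel[2] * right)
    return out


def width_above(profile, threshold):
    """Count the pixels strictly above the threshold."""
    return sum(1 for value in profile if value > threshold)


# A binary "object" 10 pixels wide on a dark background.
binary = [0.0] * 5 + [1.0] * 10 + [0.0] * 5
blurred = smooth(binary)

print(width_above(binary, 0.5), width_above(binary, 0.9))    # 10 10
print(width_above(blurred, 0.5), width_above(blurred, 0.9))  # 10 8
```

In this toy case the edge pixels drop to 0.75 after smoothing, so a threshold of 0.5 recovers the original width while 0.9 trims one pixel from each side.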

The help in SaveImages for the type of object to save mentions that the module will use an 8-bit .tif file if there are fewer than 256 objects and will use a 16-bit .tif otherwise, so ConvertObjectsToImage shouldn’t be necessary; have you found this not to be the case?
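Written out as a small sketch, the rule described in the help is below. The function names are hypothetical and this is not the SaveImages source; the second helper is only a suggestion for picking a single depth for a whole image series.

```python
def bit_depth_for_objects(n_objects):
    """Bit depth per the stated rule: 8-bit below 256 objects, else 16-bit."""
    if n_objects < 0:
        raise ValueError("object count cannot be negative")
    return 8 if n_objects < 256 else 16


def uniform_bit_depth(object_counts):
    """Hypothetical helper: one depth for a series, set by the busiest image."""
    return max((bit_depth_for_objects(n) for n in object_counts), default=8)
```

For example, a series with 12, 255 and 300 objects per image would need 16 bits throughout if every file is to share one depth.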

A possible workaround is to use RunImageJ based on the comments here: imagej.1557.n6.nabble.com/16-bit … 95138.html

We very much agree. Part of the problem is that the settings don’t translate terribly well into physical parameters. Clarifying these items is already on our to-do list (among other things).

Correct; CP builds the file list pre-run and assumes it is static. Far too many nasty things could occur if we assumed otherwise. We typically break such operations up into two pipelines.

As an aside, since you’ve been thinking about this, you might want to take a look at these forum threads and give us your thoughts:

Cheers,
-Mark

[quote=“jakejh06”]The command line capabilities are great, but there seems to be no built-in way to run the same pipeline on multiple folders in batch. Instead, I wrote a little python code to do this.

[/quote]

[quote=“mbray”]There are a couple of ways to handle this, depending on what you want to do:

  • If you want to process all sub-folders in a root directory, you can set the default input folder as the root folder, and set “Analyze all subfolders within the selected folder” in LoadImages to “All” or “Some”
  • For more fine-grained control, you can use LoadData to process a .csv of image paths and filenames. We use this routinely for our cluster runs in which thousands of images are processed.

[/quote]

I have tried the “analyze all subfolders” setting. The problem was that although CP can analyze the individual subfolders ok, I couldn’t get it to save the output in individual subfolders, so it just continually overwrites the output for each subfolder. I haven’t tried the LoadData module.

[quote=“jakejh06”]Suppose you have a single image that you want to use for correctilluminationapply against a long series of images. This process is inefficient in CP2 because rather than performing loadsingleimage once at the beginning of the analysis, CP2 performs loadsingleimage at every iteration.

[/quote]

I agree it wouldn’t save much time for correctilluminationapply. For a module like trackobjects, however, it might make more of a difference. Conceptually, trackobjects should be run on a complete series of object images. Suppose, for example, that objects are allowed to disappear for a couple of frames.

[quote=“jakejh06”]When using saveimages and “select the type of image to save” is “objects”, there seems to be no option to specify saving the image as uint16 (suppose I may have more than 255 objects). Instead, I have to use convertobjectstoimages with uint16, followed by saveimages with tif (or tiff) and image bit depth of 16. Unfortunately, saving as 16-bit png is not possible, so each of my object images is 2 MB instead of 20 KB. I’m currently using imagemagick to batch convert all the tifs to pngs after each analysis finishes.

[/quote]

I hadn’t noticed that. That’s good, although I would prefer that all my saved images have the same bit depth. Regardless, it would be very helpful if CP could save 16-bit pngs.

This issue has been resolved in CellProfiler 2.1 and later, which can now be downloaded from http://cellprofiler.org.