Non-linear scaling of analysis time with image count

#1

After upgrading to CP3.1.8 and Win10 (from CP2.2.0 and Win7 on a different machine) I’ve noticed that analysis time doesn’t scale linearly with the number of images sent to the pipeline.

On a very simple pipeline (IdentifyPrimaryObjects in 2 separate channels, output the counts to CSV), a single 384-well plate (2 channels, 4 fields per well, i.e. 3,072 images) takes about 40 minutes on my desktop, and 2 plates take 80 minutes. But 6 plates take over 7 hours, and adding more plates can push the rate to 2 hrs/plate according to the timer in the lower-right corner of the CP window.

These times are from loading the images with the drag/drop interface and the Images/Metadata/NamesAndTypes modules.

If I use the legacy LoadData module instead, the times improve slightly: 32 min per plate for up to 2 plates, but the per-plate time increases to over an hour when I list 6 plates’ worth of image sets in the CSV input file.

When I was using CP2 on my older machine, I regularly analyzed 36 plates at a time with no drop in per-plate throughput.

My workstation has a 6-core i7 CPU (12 threads) and 16 GB RAM, and CellProfiler is set to use 8 workers with 1024 MB RAM for Java, which I think are the defaults from when I installed CP on this machine.

Is this degradation expected? Or are my CP3 settings wrong? Or is there a new way I’m supposed to be running multiple plates?

#2

You sound very experienced, so I suspect this is not the case, but my first thought is to check whether the later plates in the stack have many more cells/objects than the first couple of plates. That is a common cause of major differences in processing speed.

If not that, then IIRC you can see, in an exported Experiment file (or maybe the Images file), the amount of time each module takes to process… can you see which module is behaving non-linearly? Or is the time increase happening outside the typical modules (and thus more likely in other parts of the processing)? Certainly the exported files take longer to open/write at the end of each image cycle the larger the batch of images is, but I don’t know if that alone explains the behavior you are seeing.

I doubt I can answer the Q myself, but this info may help someone else to!

#3

Thanks for the hints. I ran several tests on two computers (one with CP3.1.8, an older one with CP2.2.0).
The first plot below shows the results of analyzing 4 plates individually (#10, 12, 14, and 20 in my plate stack), the same 4 plates analyzed together, and 12 total plates (#10-21 in my sequence) analyzed together.
I also simplified the pipeline so that the number of objects wouldn’t be a factor. Now it’s just:

  1. LoadData (2 images [channels] per set; see the picklist sketch after this list)
  2. Resize both images
  3. MeasureImageIntensity for a single image
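
For reference, the picklist for the LoadData step follows the usual paired FileName_/PathName_ column convention plus Metadata_ columns. A minimal Python sketch of how such a file could be built (the channel names, paths, and file names here are placeholders, not my real ones):

```python
# Minimal sketch of the picklist layout: one row per image set, paired
# FileName_/PathName_ columns for each channel, plus Metadata_ columns.
# All names and paths below are placeholders.
import csv

header = ["FileName_FITC", "PathName_FITC",
          "FileName_dsRed", "PathName_dsRed",
          "Metadata_Plate", "Metadata_Well", "Metadata_Site"]

rows = [
    ["plate10_A01_f1_FITC.tif",  r"D:\screen\plate10",
     "plate10_A01_f1_dsRed.tif", r"D:\screen\plate10",
     "plate10", "A01", "1"],
    # ... one row per well/field, repeated for every plate in the run
]

with open("loaddata_picklist.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(header)
    writer.writerows(rows)
```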

In the results, the solid columns are CP3 data and the dotted columns are CP2 data. CP2 is a bit slower at resizing the images, but that’s most likely due to the older CPU (and only 8 GB RAM vs 16 GB in the CP3 machine). However, the measurements are fairly consistent between run sizes (single plate, 4 plates, 12 plates). The error bars are larger for the LoadData step because the first several images are significantly slower to load, but load times normalize for the rest of the plate (I can plot the individual execution times for one plate if requested).

This 2nd plot shows the time per plate for an entire run. I took the total analysis time and divided it by the number of plates in the run. The results are very non-linear.
[plot: average time per plate vs. number of plates in the run]
A single plate by itself gets analyzed in ~25 minutes. When I analyze 12 plates together, each individual plate now takes 4-5 times longer to complete.

#4

So I expanded my tests a bit more and used the Groups module. If I group by well, the per-plate times drop dramatically when analyzing multiple plates:

For CP3, a single plate that took 23 minutes in my baseline test drops to 15 minutes when grouping by well (there are 4 fields/well, if that matters). For 4 plates, the time per plate drops from 39 minutes to 16 minutes. And the change is even more dramatic for 12 plates, where the per-plate time drops from 87 minutes to 23 minutes (~4X improvement in throughput).

Is this expected? My understanding of grouping was that it was useful for things that were logically related and needed each other for analysis, but in this case each image is completely independent of the others.

#5

Whoa, that is 100% not expected! Grouping is indeed just a convenient way to track results relative to each other (e.g. tracking movies over time, or running illumination correction on each batch separately), and I cannot think of any reason why it would change processing time! The only thing I can think of would be if the pipeline were set to save output files at a different interval (i.e. by groups), but it sounds like your trimmed-down pipeline isn’t even exporting. And besides, it’s not as if you have tons of measurements that would make writing such a file take a while.

You’ve already done so much, but would it be possible to make an example set of images and pipelines (slow vs fast) for debugging? If you want to share the entire thing (privately) that is fine, or if you can make a smaller example that is fine too.

Disclaimer: We don’t have a lot of engineer time right now so I can’t promise we would be able to diagnose and solve this. But having this info makes it more likely!

#6

My original trimmed-down pipeline was exporting to a spreadsheet as the very last step, but disabling that module has no effect on the timing (both with and without grouping).

It’ll be tough to share the actual images for testing since the set is so large (36,864 images). Alternatively, I can send the CSV picklist used by the LoadData module. It would be trivial to parse that with something like PipelinePilot, Matlab, or R and generate a generic set of images to run through the minimal pipeline. I don’t think the actual images are the cause, just the size of the dataset (which is actually on the smaller side of our screening runs). If I run the pipeline on small JPEG thumbnails (6KB vs 8GB), the problem is the same (4 plates with no grouping = 2.5 hrs; same 4 plates with well grouping = 35 minutes).
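
In case it helps, here’s a rough, untested Python sketch of the kind of script I mean: it reads the picklist and writes a tiny blank TIFF for every file it references (the CSV file name and image size are just placeholders):

```python
# Rough sketch: read the LoadData picklist and write a small blank TIFF for
# every FileName_/PathName_ pair it references, so the pipeline has real
# files to load. Column names follow the usual LoadData convention; adjust
# the picklist path to wherever the CSV actually lives.
import csv
import os

import numpy as np
from PIL import Image

PICKLIST = "20190418_list12plt.csv"   # assumed name of the unzipped picklist

placeholder = Image.fromarray(np.zeros((256, 256), dtype=np.uint8))

with open(PICKLIST, newline="") as fh:
    for row in csv.DictReader(fh):
        for column, filename in row.items():
            if not column.startswith("FileName_"):
                continue
            channel = column[len("FileName_"):]
            folder = row.get("PathName_" + channel, ".")
            os.makedirs(folder, exist_ok=True)
            target = os.path.join(folder, filename)
            if not os.path.exists(target):
                placeholder.save(target)  # dummy 256x256 image per listed file
```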

Would attaching the picklist and pipeline be sufficient?

#7

That sounds reasonable. We have plenty of images here, so most likely we could reproduce it with just your pipeline, now that you’ve carefully described the behavior.

#8

Attached are two pipelines and a zipped CSV file. The two pipelines are identical in what they do (load images, resize, calculate image intensity), but one loads the images using the “Images”, “Metadata”, and “NamesAndTypes” modules, and the other uses the LoadData module with the CSV for the file locations and metadata.

My data are IN Cell 6000 files, with this file name style:

A - 08(fld 4 wv Blue - FITC).tif
A - 08(fld 4 wv Green - dsRed).tif

In my set there are 12 plates, 384 wells, 4 fields/sites per well, and 2 channels (18,432 image sets). The slowdown occurs when the image sets are not grouped by any metadata; throughput improves massively when I group by Well (I haven’t tried other metadata).
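
For reference, the metadata extraction from those file names boils down to a pattern along these lines (a Python sketch; the group names are illustrative, not necessarily the exact tags I use in the Metadata module):

```python
# Sketch of pulling well row/column, field, and channel out of the
# IN Cell 6000-style file names shown above. Group names are illustrative.
import re

PATTERN = re.compile(
    r"^(?P<WellRow>[A-P]) - (?P<WellColumn>\d{2})"
    r"\(fld (?P<Site>\d+) wv (?P<Channel>[^)]+)\)\.tif$"
)

for name in ["A - 08(fld 4 wv Blue - FITC).tif",
             "A - 08(fld 4 wv Green - dsRed).tif"]:
    m = PATTERN.match(name)
    if m:
        print(m.groupdict())
    # e.g. {'WellRow': 'A', 'WellColumn': '08', 'Site': '4', 'Channel': 'Blue - FITC'}
```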

My workstation is a Dell OptiPlex 7060 with a single CPU (6 cores) and 16 GB RAM. CP preferences are 8 workers and 1024 MB RAM for Java, in case this doesn’t replicate on a cluster or multi-CPU system.

large set scaling direct load.cpproj (760.8 KB)
large set scaling LoadData.cpproj (71.3 KB)
20190418_list12plt.zip (166.6 KB)

#9

Thanks so much. I’ve filed an issue on Github, which you can follow here: https://github.com/CellProfiler/CellProfiler/issues/3743

#10

Have you checked in Task Manager, when running your pipeline without groups, whether all CPU cores are being used?

I had one Windows machine a few years ago that wouldn’t actually make use of multiple workers unless I grouped my data.
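
If Task Manager is awkward to watch over a long run, a small psutil script can sample the per-worker load. This is just a hypothetical helper, and it assumes the worker processes have “cellprofiler” in their name, which may differ on your install:

```python
# Sample the CPU load of each CellProfiler worker process over a short window.
import time
import psutil

def worker_cpu_usage(name_fragment="cellprofiler", sample_seconds=5):
    # Find processes whose executable name contains the fragment (assumption:
    # the analysis workers are named something like "cellprofiler.exe").
    procs = [p for p in psutil.process_iter(["name"])
             if name_fragment in (p.info["name"] or "").lower()]
    for p in procs:
        try:
            p.cpu_percent(None)          # prime the per-process counters
        except psutil.NoSuchProcess:
            pass
    time.sleep(sample_seconds)           # measure over a short window
    for p in procs:
        try:
            print(p.pid, p.info["name"], f"{p.cpu_percent(None):5.1f}% CPU")
        except psutil.NoSuchProcess:
            pass                         # worker exited while sampling

if __name__ == "__main__":
    worker_cpu_usage()
```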

#11

@Swarchal, thanks for the suggestion.
Resource Monitor screenshots when running 4 plates through CP3:


My machine has 1 CPU with 6 cores and 12 threads. But even if the problem were that only a single thread is being used, that wouldn’t explain why a single ungrouped plate takes ~25 minutes while 12 ungrouped plates take ~80 minutes per plate.

#12

Have you tried seeing whether the same thing happens if you use ExportToDatabase (in SQLite mode) rather than ExportToSpreadsheet? ExportToDatabase writes data as it goes, whereas ExportToSpreadsheet has to dump it all into a temporary file and then write everything out at the end. I’m wondering whether the space and/or throughput of that temporary file is hurting you as the amount of data in it gets large.
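
For illustration only (a toy Python sketch, not CellProfiler’s actual I/O code), the difference between the two write patterns is essentially this:

```python
# Toy illustration of the two write patterns: streaming each cycle's
# measurements into SQLite as they arrive versus buffering everything
# and dumping one file at the end of the run.
import csv
import sqlite3

def stream_to_sqlite(cycles, db_path="measurements.sqlite"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS per_image (cycle INTEGER, intensity REAL)")
    for cycle, intensity in cycles:
        con.execute("INSERT INTO per_image VALUES (?, ?)", (cycle, intensity))
        con.commit()                      # data reaches disk every cycle
    con.close()

def buffer_then_dump(cycles, csv_path="measurements.csv"):
    buffered = list(cycles)               # everything held until the very end
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["cycle", "intensity"])
        writer.writerows(buffered)        # one big write at the end

fake_cycles = [(i, 0.5) for i in range(18432)]   # one entry per image set
stream_to_sqlite(fake_cycles)
buffer_then_dump(fake_cycles)
```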

#13

The slowdown occurs even if there is no ExportTo* module present. The issue also happens if I just have a Resize module activated (no analysis, no measurements, no export).

#14

Super weird; we’ll look into it. Thank you for such super-thorough diagnostics!