CellProfiler 4 freezes near end of list of image sets

Hi,

It seems CellProfiler 4.04 freezes when processing a pipeline that CP 3 completed successfully.

Screen shots and the debug log are here: link.

How to recreate the error:

  1. In CP4 create a pipeline with measurement steps followed by an “ExportToSpreadsheet” step (see screenshot of pipeline in link above)
  2. Add about 30 image sets, each with 2 large images.
  3. Run the pipeline; wait for about 20 minutes to get near the end of the image sets.
  4. CP 4 will freeze (see screenshot at above link). Here “freeze” means the CP output log stops generating new messages. The last two messages it shows before freezing will look like this, or some variation on this with different numbers:
    Progress Counter({'Done': 56, 'InProcess': 4, 'FinishedWaitingMeasurements': 1})
    Progress Counter({'Done': 57, 'InProcess': 4})
  5. Wait an hour; nothing happens; CP stays frozen.
  6. Click “Stop Analysis”. The CP output log starts generating messages again (see log snapshot after clicking cancel at above link) while it cancels things, but it doesn’t export the spreadsheet.

One of the symptoms is that in resource manager, the amount of RAM being used keeps growing by about 1-2 GB with each successive image set processed, even though CP is completing groups. Why can’t it save to the spreadsheet at the end of processing each group, to free up RAM?

Another symptom is that the last two lines before it freezes are always some variation of this:
Progress Counter({'Done': 56, 'InProcess': 4, 'FinishedWaitingMeasurements': 1})
Progress Counter({'Done': 57, 'InProcess': 4})
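
For anyone comparing logs, the two counter lines above can be read mechanically. This is only a minimal sketch and not part of CellProfiler: `parse_progress` and `unfinished` are hypothetical helper names, and the line format is assumed to match the two examples quoted here. Both quoted lines total 61 jobs, and the last one still has 4 in `InProcess`.

```python
import ast
import re

# Matches the dict literal inside a "Progress Counter({...})" log line.
_COUNTER_RE = re.compile(r"Progress Counter\((\{.*\})\)")


def parse_progress(line):
    """Return the status -> count mapping from one log line, or None."""
    # Normalize curly quotes in case the line was copied from a rendered page.
    normalized = line.replace("\u2018", "'").replace("\u2019", "'")
    match = _COUNTER_RE.search(normalized)
    if match is None:
        return None
    return ast.literal_eval(match.group(1))


def unfinished(counts):
    """Number of jobs in any state other than 'Done'."""
    return sum(n for status, n in counts.items() if status != "Done")


last = parse_progress("Progress Counter({'Done': 57, 'InProcess': 4})")
print(last, unfinished(last))
```

If the number returned by `unfinished` stops changing between log snapshots taken far apart, that matches the “freeze” described above.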

Another symptom is that it only freezes when it’s processing the last few image sets in the list (e.g. it will freeze at image set 28 out of 31; in this case each image set has two images).

Another symptom is that it freezes while more than 10 GB of the 48 GB is still free. RAM use only increases by about 1-2 GB per image set, so it’s strange that it freezes with so much RAM still free.

Another symptom is that after it freezes, if you click “Stop Analysis”, then it is able to unfreeze, but it doesn’t save the spreadsheet. The files here contain snapshots of the CP output log before and after clicking “Stop Analysis”.

Maybe workers are running out of their own memory buffers, but why do they have these limits, if there’s lots of RAM left? In CP3 one could assign the amount of RAM available to Java. Does CP4 have this feature, and could it help here?

Is there a way to make CP save to the spreadsheet after each group completes, rather than waiting for all of the groups to complete, so it does not need to keep all of that data in memory until the end of the entire batch?
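
To make the request concrete, here is a rough sketch of what I mean by saving after each group. It is plain Python with the standard `csv` module, not CellProfiler code; `append_group_rows`, the column names, and the per-group rows are all made up for illustration.

```python
import csv
import os
import tempfile


def append_group_rows(path, fieldnames, rows):
    """Append one group's measurement rows to a CSV, writing the header once."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


# Hypothetical per-group results; in a real run these would come from the pipeline.
fields = ["ImageNumber", "ObjectCount"]
out_path = os.path.join(tempfile.mkdtemp(), "measurements.csv")
for group in (
    [{"ImageNumber": 1, "ObjectCount": 12}],
    [{"ImageNumber": 2, "ObjectCount": 7}],
):
    # Once a group is on disk its rows no longer need to stay in memory.
    append_group_rows(out_path, fields, group)
```

With this pattern the memory held for measurements is bounded by one group rather than by the whole batch.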

Things I’ve tried:

  • using fewer workers (4 instead of 16, it still freezes when it gets to the last few image sets)
  • different image set; it still froze at the last few image sets
  • cropping the images to about 1/50th of their original size: this worked; it made it all the way through the image sets; however, CP3 was able to run the pipeline with the full-sized images

What am I doing wrong?

System:
Windows 10
48 GB RAM

Thank you.

Best regards,
John

Hmmm, that’s very unusual, and we’re sorry to hear it! We’ll try to reproduce with some image sets on our side; if we can’t, we may need to touch base with you to get your image sets.

Hi @johnramunas,

It looks like the files you uploaded aren’t public, so I can’t view them. However, you’re right that ExportToSpreadsheet tries to write data after the run, rather than after each group or image set.

A possible solution might be to use the ExportToDatabase module instead. This module writes as the run progresses and so shouldn’t hit memory issues.
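
As a generic illustration of “writing as the run progresses” (this is not how ExportToDatabase is implemented; the table name, column names, and helper functions below are assumptions made up for the sketch), each image set’s results are committed as soon as they exist:

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a SQLite database and make sure the (hypothetical) table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS per_image ("
        "image_number INTEGER PRIMARY KEY, object_count INTEGER)"
    )
    return conn


def record_image_set(conn, image_number, object_count):
    """Insert one image set's result and commit immediately."""
    conn.execute(
        "INSERT INTO per_image VALUES (?, ?)", (image_number, object_count)
    )
    conn.commit()


conn = open_store()
for number, count in [(1, 12), (2, 7), (3, 30)]:
    record_image_set(conn, number, count)

# Everything processed so far is already queryable, even mid-run.
rows_so_far = conn.execute("SELECT COUNT(*) FROM per_image").fetchone()[0]
print(rows_so_far)
```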

However, the program shouldn’t be freezing instead of completing. Pressing “Stop analysis” should indeed cancel the writing process entirely so that’s expected. We’ll take a look into this.

Hi @bcimini,

I’ve recently started running into a similar issue as well with CP 4.07, with pretty much the exact symptoms that John described in his initial post. The pipeline processes all images until the very last image, and then just hangs. The Progress Counter just shows the second to last image InProcess. What’s curious is that this seems to happen ~90% of the time, but every once in a while it’ll be able to finish up an image set. The same pipeline causes this issue across 3 different computers. Switching to ExportToDatabase as David suggested above led to the pipeline eventually slowing down about halfway through, and then just extending in length indefinitely.

Re-creating the pipeline in CP 3 worked fine, even with 5000 images writing to spreadsheets at the end.

Did you guys ever get a chance to follow up on John’s initial issue? Happy to provide my pipeline/images if that helps.

Thank you.

Do you have a sense of the number of images (and what size, and ballpark how many objects per image: 10, 100, 1000, etc.) that are necessary to trigger this issue in your hands: more than 1000 images, 2000, etc.?

I ask because we haven’t been able to verify it in our hands (and typically folks can’t/don’t want to share thousands of images to help us verify - if you have a minimal image set that causes this issue on your machine, please do share it!), which makes it hard to debug and fix in the absence of an error message pointing to a specific issue. Typically we would switch to running on a cluster after more than a couple of hundred images. I can attempt to let this run on a single local machine though if you have a sense of how much data is needed to cause the issue.

It would also be great to have a look at your pipeline, to see if there’s anything unusual there compared to what we would typically run in our hands that might be why we haven’t seen it ourselves.

I am also having this problem on some of my pipelines, stuck on the export to spreadsheet step forever without showing any error. Only solution I found was running on an earlier CP version…

I am also having the exact same problem, even with 24 images. The progress counter stops at 22. Upon removal of 4 images (set is now 20), the counter stops at 18. Running an older version of CellProfiler is not an option for us, since we made a custom module for version 4…

I have the exact same issue. One image works fine, but when running multiple images it doesn’t finish. I have also tried multiple computers and had no issues of this kind in CP3. It seems to affect only complex and long pipelines (70+ modules); some of our smaller ones worked fine. When monitoring the run, each worker seems to use more and more memory until it runs out (for me that usually happens at around 10 GB of memory per worker when running 7 workers). Could there be a memory leak somewhere?

@ckuijl @August_Lundquist @amycaitriona are any of you using Windows? If so, try selecting the console window when it freezes and pressing ctrl+c. If an error message then appears that’d be super helpful.
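
The reason a traceback helps is that it names the call a process is stuck in. The sketch below shows the same idea with Python’s standard `faulthandler` module rather than Ctrl+C, on a toy thread rather than on CellProfiler itself; `blocked_worker` is a made-up stand-in for a worker waiting on something that never arrives.

```python
import faulthandler
import tempfile
import threading

ready = threading.Event()
release = threading.Event()


def blocked_worker():
    # Signal that we are running, then wait for something that never comes
    # (until the main thread releases us at the end of the demo).
    ready.set()
    release.wait()


thread = threading.Thread(target=blocked_worker, daemon=True)
thread.start()
ready.wait()

# Dump the current stack of every thread to a real file (faulthandler needs
# a file descriptor), then read the report back.
with tempfile.TemporaryFile(mode="w+") as dump:
    faulthandler.dump_traceback(file=dump, all_threads=True)
    dump.seek(0)
    stack_report = dump.read()

release.set()
thread.join()
print(stack_report)
```

The report lists each thread’s frames, including `blocked_worker`, which is the kind of pointer that would tell us where a frozen run is waiting.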

Otherwise if someone could upload a problem pipeline and image set we might be able to replicate this on our end.

Thanks

I resized all my images before putting them into the pipeline so they were all a bit smaller and the same size. This seemed to stop the problem, so I guess it is a memory issue?

When I try using this version for 3D pipelines, though, I get the same problem, but it’s stuck at 0 out of all images. It runs through in test mode, but only when the eyes are closed; in analysis mode it is stuck. Not sure if it is the same issue…

I am experiencing the exact same issue on CellProfiler 4.0.6 on Windows 10.
Each image set consists of 3 images. One with DAPI the other two with membrane markers. I have first tried it with 118 image sets and went down to 64 afterwards with the same issue.

The images are .tiff with a resolution of 1080x1080 pixels and between 5 and 200 objects per image.
I have built CellProfiler from source and I am also using custom modules. However, the error only occurs during a run of built-in modules. For the set of 64 image sets it occurs during ExportToDatabase. For the set of 118 image sets, it occurs during IdentifySecondaryObjects. Other runs without IdentifySecondaryObjects and the same custom modules work just fine. Memory usage does not increase towards the end of the run.
My guess is that it has something to do with how measurements are stored.
No error messages are returned, either when pressing ctrl+c in the command line terminal or when pressing “Stop Analysis”.

Is there any news on how to fix this?

Thanks, this is very helpful information.

In order to fix this we need to be able to replicate the problem on our end. If someone could upload a pipeline and image set that triggers the error it’d really help us to figure out what’s going wrong here.

Sure, should I just attach it here together with a small image test set?

I have performed further troubleshooting in the meantime. The error also persists when going down to 13 image sets. After removing one of my custom modules (which identified the primary objects with a neural network implemented in TensorFlow) from the pipeline, it is still there.
Switching from ExportToDatabase to ExportToSpreadsheet was also not helpful.

This is especially puzzling to me because I have analyzed these same images with a CellProfiler 3.1.9 pipeline without any custom modules and it worked just fine.

Here, or somewhere else and linked here would be great. We’ve never been able to replicate on our own end, nor get anyone to pass along a set that definitely should trigger the issue; having a set we can test on is going to be critical to the problem getting fixed!

Thanks @bcimini @DStirling for the prompt responses!

This is very odd, and I apologize if I have unnecessarily taken your time, but somehow some changes have resolved the issue on my side. I do not really understand why though.

I am posting my steps below and attach some files that might be helpful so others might be faster in solving the issue on their side:
As explained above I have reduced my input data to 13 images and replaced my custom module with IdentifyPrimaryObjects. Notably, I only deactivated it (same for the switch from ExportToDatabase to ExportToSpreadsheet) and did not remove it when I still had the issue. Afterwards, I have done the following:

  1. Moved the input images from an external SSD mounted on E: to a folder in my user directory on C:.
  2. Ran the pipeline again on the images on C: → the issue persisted.
  3. Reduced the number of images to be measured by the MeasureObjectIntensity module by a third by only measuring illumination corrected images and not raw or gamma corrected images.
  4. Ran the pipeline again → the issue persisted.
  5. Removed the custom GammaCorrection module.
  6. Saved the project under a different filename and removed all deactivated modules (ExportToDatabase which replaced ExportToSpreadsheet and my custom nuclei detection module).
  7. Ran the pipeline (the newly saved project) → the issue was gone.
  8. Re-introduced the custom GammaCorrection module (saved the pipeline under a different name) → no issues.
  9. Re-introduced my custom nuclei detection module (saved the pipeline under the same name) → no issues.

I do not know what exactly caused the issue. Currently, I assume it has something to do with how deactivated modules are handled internally, or there is some hashing of data going on that I am not aware of.
I have attached the .cpproj file of the faulty pipeline for which the issue persists. I would prefer not to share the code for my custom module since it contains some hard coded paths, is a prototype of an ongoing project and is not fully up to my standards for code documentation and structure.

upload_pipeline.cpproj (817.5 KB)

Pardon the late reply! Like Ben, I also managed to solve this issue through some rather odd change (and subsequently forgot about it…). I think that the issue was that I had an export to database module at the end of the pipeline where a previous measurement and/or image that I had removed or renamed was missing (without giving an error message). When I reintroduced some other measurement modules, reshuffled the images and created a new export to database module things started working again.

I am not sure we had the same issue from the start (I am not using any custom modules) and realise this might not be very useful feedback, but I thought it might be helpful to share a potential solution anyway. In general, I have found that minor changes in upstream modules (e.g. changing the names of output objects or images) don’t always register as errors in the ExportToDatabase module (or in FilterObjects modules using rules generated by CellProfiler Analyst), even though the program gets stuck on that module and produces no output. I always re-select objects and images in these final modules after making upstream changes.
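
The kind of check I do by hand can be sketched as follows. This is only an illustration on a simplified, made-up description of a pipeline (a list of dicts with `inputs` and `outputs`); it does not read .cpproj files or use any CellProfiler API, and `find_stale_references` is a hypothetical helper name.

```python
def find_stale_references(modules):
    """Return (module name, missing input) pairs, checked in pipeline order.

    `modules` is a hypothetical simplified pipeline: each entry lists the
    names it consumes ("inputs") and the names it produces ("outputs").
    """
    available = set()
    stale = []
    for module in modules:
        for name in module["inputs"]:
            if name not in available:
                stale.append((module["name"], name))
        available.update(module["outputs"])
    return stale


pipeline = [
    {"name": "NamesAndTypes", "inputs": [], "outputs": ["DNA"]},
    {"name": "IdentifyPrimaryObjects", "inputs": ["DNA"], "outputs": ["Nuclei_v2"]},
    # The export step still refers to the old object name "Nuclei".
    {"name": "ExportToDatabase", "inputs": ["Nuclei"], "outputs": []},
]
print(find_stale_references(pipeline))
```

Here the export step is flagged because it asks for an object name that no upstream module produces any more, which is the situation I suspect caused my hang.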