CellProfiler - loop while running? (long delay)

Hi,
I am running the attached project to analyze sections of a midbrain organoid. When I try to run the pipeline, it takes a long time and eventually reaches a data limit (~50 GB). I’ve tried removing modules and have isolated the problem to the MeasureObjectNeighbors and MeasureObjectIntensityDistribution modules (MeasureColocalization is also quite heavy). Could the parameters be causing some sort of loop? If so, any ideas what that could be due to?

I’m using CellProfiler 4.0.7 on macOS Big Sur.

Thanks!

new images.cpproj (1.9 MB)

Hi @Samir_Gouin,

Could I ask how large the images in question are? You may simply be measuring so many objects that CellProfiler can’t store data for them all.

It looks like you’re only running one image set, but if there are more it may be worth switching from ExportToSpreadsheet to ExportToDatabase in SQLite mode, since this avoids having to store measurements in the internal buffer when running multiple image sets.
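
As an aside, once the measurements are in SQLite you can inspect them with plain Python; here’s a minimal sketch, assuming the default database filename (use whatever you configure in the module):

```python
# List the tables that ExportToDatabase created; the filename below is an
# assumption - substitute the one set in your ExportToDatabase module.
import sqlite3

con = sqlite3.connect("DefaultDB.db")
for (name,) in con.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(name)  # typically per-image and per-object measurement tables
con.close()
```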

Otherwise, if you’re using an extremely large image, another approach might be to break it down into smaller tiles using something like the Crop module.
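
If you’d rather pre-tile the files outside CellProfiler, something like this could work as a starting point (the filename and tile size are placeholders, not from your project):

```python
# Split a large image into fixed-size tiles before running the pipeline.
from skimage import io

img = io.imread("organoid_section.tif")  # placeholder filename
tile = 2048  # assumed tile edge length in pixels

for i in range(0, img.shape[0], tile):
    for j in range(0, img.shape[1], tile):
        # Edge tiles are simply smaller; NumPy slicing handles that.
        io.imsave(f"tile_{i:05d}_{j:05d}.tif", img[i:i + tile, j:j + tile])
```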


Hi @DStirling,

I’ve attached a screenshot with the image sizes. Although there is quite a bit of variation, do you think images under 20 MB should run quickly? I am running it on one set for now but hope to expand to many after adjusting the pipeline.

Otherwise I will look into tiling the images, thanks for the suggestion!

How many objects are being detected in your segmentation?

Three types of objects: nuclei (~6,000), astrocytes (~6,000) and organoid area (1).

12,000 objects in a single image could probably be the cause of the problems here, yes. When you mention a ‘data limit’, do you see an error in the CellProfiler console? If so, it may help to know the full log. It’s possible that there may be a memory issue within one of the calculations, such that doing it 12,000 times to analyse each object then becomes a problem.

I did not see an error; it was a warning that the data had reached 50 GB, and I stopped the program afterwards. The IdentifyPrimaryObjects modules work relatively quickly, so it seems the issue is when combining information from many objects in the MeasureColocalization, MeasureObjectNeighbors and MeasureObjectIntensity modules.

I’ve run similar pipelines on roughly the same number of objects without any issue.

I see, what exactly does this warning say then? I’d like to figure out whether this is coming from CellProfiler itself or from Big Sur.

I did not screenshot the warning and do not want to run the program for that long again (data constraints), but here is a photo of the time before stopping. I checked with other pipelines and had no issue with Big Sur.

Thanks, are you able to run the image set in test mode without issues?

No, it gets stuck at MeasureObjectNeighbors (if I remove this it gets stuck at MeasureObjectIntensity)

Ah, would this happen to be one of the brand-new Macs that have just been released?

It may also help if you could upload one of the image sets so that I can test this locally.

Thanks

No (unfortunately not). May I email you? I’d prefer not to share the photos publicly.
Thanks for all your help!

Thanks for clarifying that. It’s understandable if you’d rather not post images publicly; you could send me a private message via my profile if that’s easier?

Thanks for sending those. Looking at the pipeline, it’s pretty clear that it’s running out of memory in a way that we might expect. For the first image set you have two sets of objects, containing 64,000 and 46,000 objects. In MeasureObjectNeighbors you then ask it to measure the distance between the two sets. To do that, the program wants to generate a distance matrix which would be 64,000 × 46,000 in size, which instantly maxes out your system memory. On Windows (and, I expect, older macOS versions) Python would then display a warning about being unable to allocate enough memory for an array of that size; I’m not sure why that’s not appearing here.
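
To put rough numbers on that, assuming the matrix is stored as float64 values (NumPy’s default):

```python
# Back-of-the-envelope size of the pairwise distance matrix that
# MeasureObjectNeighbors would need for these two object sets.
n_a, n_b = 64_000, 46_000     # object counts from the first image set
bytes_needed = n_a * n_b * 8  # 8 bytes per float64 value

print(f"{bytes_needed / 1024**3:.1f} GiB")  # ~21.9 GiB for a single array
```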

It’s possible that with some development we might be able to come up with a more memory-efficient way to calculate neighbours, but for the time being this is somewhat expected. These object sets are extremely large, so I expect the processing time in other modules reflects this too.
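
(For illustration only, and not what CellProfiler currently does: a KD-tree can answer nearest-neighbour queries between two large point sets without ever materialising the full distance matrix.)

```python
# Sketch: KD-tree nearest-neighbour lookup, using stand-in centroid data.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
a = rng.random((64_000, 2))  # stand-in centroids for object set A
b = rng.random((46_000, 2))  # stand-in centroids for object set B

dist, idx = cKDTree(b).query(a)  # nearest B object for every A object
print(dist.shape)  # (64000,) - peak memory stays far below the full matrix
```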

I think the best approach here might be to try breaking the images down into tiles before processing.

As another bit of advice, it looks like you use an initial IdentifyPrimaryObjects module just to find the whole area of tissue. This is an extremely intensive module to use for that task, so you may be better served by the Threshold module followed by RemoveHoles, then perhaps ConvertImageToObjects to generate your ‘tissue’ objects. Alternatively, you could enable IdentifyPrimaryObjects’ advanced settings and disable the ‘separate touching objects’ options. This should save a considerable amount of time.
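
Outside of CellProfiler, that chain is roughly equivalent to the following sketch (placeholder filename), which also shows why it’s cheap: just a global threshold, hole filling and a single labelling pass:

```python
# Rough scikit-image/SciPy analogue of Threshold -> RemoveHoles ->
# ConvertImageToObjects, using a placeholder input file.
from scipy import ndimage
from skimage import filters, io

img = io.imread("organoid_section.tif")
mask = img > filters.threshold_otsu(img)  # Threshold
mask = ndimage.binary_fill_holes(mask)    # RemoveHoles
labels, n_objects = ndimage.label(mask)   # ConvertImageToObjects
print(f"{n_objects} tissue object(s) found")
```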
