Issue with Illumination Correction Pipeline

Hi everyone,

I am trying to recreate the workflow published in Bray et al. GigaScience 2017. When I run the illumination correction pipeline, I run into the following issues:

  1. The pipeline takes around 70 h for just 2304 datasets using 24 workers.
  2. When I run the pipeline with 6 datasets, the analysis finishes in 12 min (with 6 workers), but the output images are not saved to my output folder. The test run completed successfully and no errors were shown.
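
For scale, the two timings above can be compared with a quick back-of-the-envelope calculation. This is only a rough sketch: it assumes every dataset costs the same and that the work divides evenly across workers, which may not hold for an illumination pipeline that aggregates over all images. The function name is made up for illustration.

```python
def worker_minutes_per_set(wall_minutes, n_sets, n_workers):
    """Average worker-minutes spent per dataset, assuming an even split."""
    return wall_minutes * n_workers / n_sets

# Small test run: 6 datasets, 6 workers, 12 minutes of wall-clock time.
small_run = worker_minutes_per_set(12, 6, 6)

# Full run: 2304 datasets, 24 workers, about 70 hours of wall-clock time.
full_run = worker_minutes_per_set(70 * 60, 2304, 24)

# Wall-clock hours the full run would take if it scaled like the small run.
projected_hours = small_run * 2304 / 24 / 60

print(small_run, full_run, projected_hours)
```

Under these assumptions the full run spends several times more worker time per dataset than the small test did, so the slowdown is not explained by the dataset count alone.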

I already have corrected images, so I was wondering whether I could simply use the analysis pipeline from the paper on its own. That will not work as is, though, because the analysis pipeline refers to the data generated by the illumination correction pipeline. Maybe someone can help me out…

I have attached a screenshot and the pipeline file.

Final2.0.cpproj (113 KB)

Hi @Cecile_Meier-Scherli,

Looking at your pipeline, it appears to be missing the SaveImages modules needed to export the resulting illumination correction functions. The original paper you mentioned looks like it supplies CellProfiler 2 pipelines, so it’s possible that some modules weren’t converted properly when upgrading to CellProfiler 3 for your project. If you’re happy with the current settings in your pipeline it should be straightforward to re-add the SaveImages modules. CellProfiler 2 would probably have these modules set up to save the images in the “.mat” format, but in CellProfiler 3 this was replaced with the “.npy” format which should work just fine.
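
To illustrate what a saved “.npy” illumination function is, here is a minimal sketch outside CellProfiler. It assumes NumPy is installed, that the function is a 2D array with the same shape as the image, and that correction is done by division. The file name and the `apply_illumination` helper are hypothetical, and the arrays are synthetic stand-ins rather than real data.

```python
import os
import tempfile

import numpy as np

def apply_illumination(image, illum):
    """Divide an image by its illumination function, element by element."""
    if image.shape != illum.shape:
        raise ValueError("image and illumination function must have the same shape")
    return image / illum

# Stand-in illumination function; in practice SaveImages would write this file.
illum = np.linspace(1.0, 2.0, 16).reshape(4, 4)
path = os.path.join(tempfile.mkdtemp(), "IllumDNA.npy")  # hypothetical file name
np.save(path, illum)

# Load the function back and apply it to a flat synthetic image.
loaded = np.load(path)
raw = np.full((4, 4), 0.5)
corrected = apply_illumination(raw, loaded)
```

Inside CellProfiler the equivalent step is done by the pipeline itself, so this is only meant for inspecting the saved files.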

Regarding time to process, it is possible to run CellProfiler in the cloud across multiple machines. Otherwise analysis speed will be limited by the machine you’re running it on. One thing to be careful of is the maximum number of workers you allow CellProfiler to use, which is set in the Preferences panel. This number should not exceed the number of physical CPU cores on your machine, otherwise workers will start to compete with each other for resources and may slow the analysis down. On Windows you can find your core count on the “Performance” tab of Task Manager. Note that the ‘Core’ count is what you want, not the ‘Logical Processors’ count.
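
As a rough illustration of the worker limit, the sketch below caps a requested worker count at the number of physical cores. Note that Python's `os.cpu_count()` reports logical processors, not cores. The halving step assumes two hardware threads per core, which is an assumption and not true of every machine, so the Task Manager value is the one to trust. The `capped_workers` helper is hypothetical.

```python
import os

def capped_workers(requested, physical_cores):
    """Limit the worker count to the number of physical cores (at least 1)."""
    return max(1, min(requested, physical_cores))

# os.cpu_count() returns LOGICAL processors, which may exceed physical cores.
logical = os.cpu_count() or 1

# Assumption: 2 hardware threads per core. Check Task Manager to be sure.
estimated_physical = max(1, logical // 2)

print(capped_workers(24, estimated_physical))
```

For example, a machine that reports 24 logical processors with two threads per core has 12 physical cores, so `capped_workers(24, 12)` gives 12.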

Hope that helps!

Hi @DStirling

Thank you very much for your feedback. To get around the issue I had previously, I combined all pipelines into one (attached). Now, when I click Analyze Images, the pipeline starts running and a message appears saying that everything is being saved to a batch file. After a few seconds the analysis is done. When I open the batch file, it contains strange symbols and numbers (attached). I am wondering whether something goes wrong when I download the pipelines from the paper, so that parts of them get lost…

Project_with_all_pipelines.cpproj (195.0 KB)

Hi @Cecile_Meier-Scherli,

The batch file is designed to allow you to run your analysis on a computing cluster. It’s essentially a list of instructions for which images to analyse with your pipeline. If you want to run the actual analysis normally on your local machine, you need to remove the CreateBatchFiles module from the pipeline.
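
For context, a cluster run typically splits the image sets into chunks and starts one headless CellProfiler job per chunk. The sketch below only builds the command lines and does not run anything. It assumes the batch file is called `Batch_data.h5` (as far as I recall the default name, and a binary file, which would explain the odd characters in a text editor) and uses the `-c`, `-r`, `-p`, `-f` and `-l` command-line options. The `batch_commands` helper and the chunk size are made up for illustration.

```python
def batch_commands(n_image_sets, chunk_size, batch_file="Batch_data.h5"):
    """Build one headless CellProfiler command per chunk of image sets."""
    commands = []
    for first in range(1, n_image_sets + 1, chunk_size):
        last = min(first + chunk_size - 1, n_image_sets)
        commands.append([
            "cellprofiler",
            "-c",              # run without the GUI
            "-r",              # run the pipeline on startup
            "-p", batch_file,  # batch file written by CreateBatchFiles
            "-f", str(first),  # first image set of this chunk
            "-l", str(last),   # last image set of this chunk
        ])
    return commands

for command in batch_commands(2304, 1000):
    print(" ".join(command))
```

Each printed line would then be submitted as a separate job by whatever scheduler the cluster uses.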

Hi @DStirling
Thank you for that advice. I have removed the CreateBatchFiles module. However, the analysis is still taking a very long time. I ran the analysis both over a Remote Desktop connection to a machine with 24 CPU processors and directly on my laptop (also 24 CPU processors). On the remote machine I still have 246 GB of memory available and am only using 3-4% of the CPU.

The error message I get when running the pipeline on my laptop says that there is an issue with NamesAndTypes (screenshot attached).
Project_with_all_pipelines.cpproj (195.0 KB)

The error message I get when running the program on the Remote Desktop (attached) appears as soon as the ExportToDatabase module runs. Before this error occurs, it shows me outputs up to SaveImages #31, image cycle #1. However, the outputs that pop up while the program is running are different from the ones saved to my output folder. I have attached the files that get saved there.

Project_with_all_pipelines.cpproj (1.1 MB)

I am not sure how to speed the analysis up and am also not sure what the errors mean… In the end I would run the pipelines on the Remote Desktop, but for debugging I have also used my laptop.

Hi @Cecile_Meier-Scherli,

On your laptop, it appears that CellProfiler isn’t able to access some of the files in the file list you have loaded. If the files are being loaded from a network location, it could be related to folder permissions, particularly if you’re working remotely.
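
One way to narrow this down outside CellProfiler is to check which paths the current user can actually read. This is a minimal sketch using only the Python standard library. The `unreadable_files` helper and the file names are hypothetical, and the temporary folder stands in for your real image location.

```python
import os
import tempfile

def unreadable_files(paths):
    """Return the paths that are missing or not readable by the current user."""
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.R_OK))]

# Stand-in folder with one file present and one missing.
folder = tempfile.mkdtemp()
present = os.path.join(folder, "well_A01.tif")  # hypothetical file name
with open(present, "wb") as handle:
    handle.write(b"\x00")
missing = os.path.join(folder, "well_A02.tif")  # never created

problems = unreadable_files([present, missing])
print(problems)
```

Running the same check on the laptop and on the remote machine against the real paths should show whether the two see the same files.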

Regarding ExportToDatabase, this is a known issue in the current version when working with a MySQL database. Turning off the display window by clicking the ‘eye’ icon on the pipeline list should let things run normally. Alternatively, running the module in SQLite mode should be fully functional.
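
If you go the SQLite route, the resulting database file can be inspected with Python's built-in `sqlite3` module. The sketch below lists the tables in a database file. It builds a small stand-in database first, and the table names `Per_Image` and `Per_Object` are assumptions for illustration, since the actual names depend on the module settings. The `list_tables` helper is hypothetical.

```python
import contextlib
import os
import sqlite3
import tempfile

def list_tables(db_path):
    """Return the names of all tables in an SQLite file, sorted by name."""
    query = "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    with contextlib.closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(query).fetchall()
    return [name for (name,) in rows]

# Stand-in database; a real one would be written by ExportToDatabase.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")
with contextlib.closing(sqlite3.connect(db_path)) as conn:
    conn.execute("CREATE TABLE Per_Image (ImageNumber INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE Per_Object (ImageNumber INTEGER, ObjectNumber INTEGER)")
    conn.commit()

tables = list_tables(db_path)
print(tables)
```

Pointing `list_tables` at the real output file is a quick way to confirm that the export finished and wrote the tables you expect.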