I am currently working with the pipeline published in Bray et al., Nature Protocols (2016), and I am trying to apply the illumination correction pipeline to almost 3500 images from a single plate. This step, with a filter size of 200, takes almost 1 h 30 min per image, which is a lot of computing time. As far as I understand, the CorrectIlluminationCalculate module has to be applied to all the images of a plate in a single batch, so using it with the CreateBatchFiles module and splitting the task across the 250 compute nodes available on our cluster is not an option.
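To put these numbers in perspective, here is a quick back-of-the-envelope estimate. It assumes the ~1.5 h per image is a fixed per-image cost that simply adds up, which is an assumption: depending on how the module aggregates images, most of the time may instead be spent in a single smoothing step.

```python
import math

# Figures quoted in the post above.
N_IMAGES = 3500          # approximate number of images on the plate
HOURS_PER_IMAGE = 1.5    # observed time per image with filter size 200
N_NODES = 250            # compute nodes available on the cluster

def total_hours(n_images: int, hours_per_image: float) -> float:
    """Serial wall-clock time if every image is processed one after another."""
    return n_images * hours_per_image

def wall_hours_split(n_images: int, hours_per_image: float, n_nodes: int) -> float:
    """Wall-clock time if the images were divided as evenly as possible across nodes."""
    images_per_node = math.ceil(n_images / n_nodes)
    return images_per_node * hours_per_image

serial = total_hours(N_IMAGES, HOURS_PER_IMAGE)
split = wall_hours_split(N_IMAGES, HOURS_PER_IMAGE, N_NODES)
print(f"serial: {serial:.0f} h (~{serial / 24:.0f} days)")
print(f"split over {N_NODES} nodes: {split:.0f} h")
```

Under that assumption the serial run would take about 5250 h, versus roughly 21 h if the work could be spread over the 250 nodes.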
I have tried decreasing the filter size, which speeds up the computation but lowers the quality of the illumination correction, and the resulting images are less bright.
I have also tried running CellProfiler both headless from the command line and through the GUI. Headless, CellProfiler seems to run on a single thread, and when I run this pipeline from the GUI only a single worker is launched as well, even though the maximum number of workers in my preferences is set to 12.
My questions are:
Is it incorrect to divide the CorrectIlluminationCalculate task for a single plate into several batches to reduce the computing time, as suggested in this post? If it is acceptable, should I store the illumination functions computed for each batch, or only the last one?
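One way to reason about the batching question: if each batch saved its plain, unsmoothed average image together with the number of images it contains, those averages could be merged into the plate-wide average exactly, because averaging is linear. The sketch below shows only that merge step; whether CorrectIlluminationCalculate exposes such per-batch averages, and how a merged image would be fed back into the pipeline, are assumptions not confirmed here. `combine_batch_averages` is a hypothetical helper.

```python
import numpy as np

def combine_batch_averages(batch_means, batch_counts):
    """Merge per-batch mean images into the mean over all images.

    Weighting each batch mean by its image count reproduces the overall
    mean exactly, since averaging is a linear operation.
    """
    counts = np.asarray(batch_counts, dtype=float)
    stacked = np.stack([np.asarray(m, dtype=float) for m in batch_means])
    # Contract the batch axis: sum_b counts[b] * stacked[b], then normalize.
    return np.tensordot(counts, stacked, axes=1) / counts.sum()

# Toy check with random "images": two unequal batches of 3 and 7 images.
rng = np.random.default_rng(0)
images = rng.random((10, 4, 4))
batches = [images[:3], images[3:]]
means = [b.mean(axis=0) for b in batches]
counts = [len(b) for b in batches]
combined = combine_batch_averages(means, counts)
print(np.allclose(combined, images.mean(axis=0)))  # True
```

A nonlinear smoothing step such as a median filter does not commute with averaging in general, so under this approach it would be applied once to the merged average rather than to each batch before merging. Keeping only the last batch's function would discard the other batches entirely.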
Is there a way to multithread this task?
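Since a single headless run appears to use one thread, a common workaround is to launch several independent headless processes, each on a contiguous range of image sets. This is multiprocessing rather than multithreading. The sketch below only builds the command lines. It assumes CellProfiler's `-c` (no GUI), `-r` (run on startup), `-p` (pipeline), `-i`/`-o` (input/output folders) and `-f`/`-l` (first/last image set) options; check `cellprofiler --help` for your version. The folder names `images` and `output` are placeholders. For an "All images" illumination function, each chunk would produce its own partial result, which brings back the batching question above.

```python
import math
import shlex

def chunk_ranges(n_image_sets, n_chunks):
    """Split image sets 1..n (numbered from 1) into contiguous, near-equal ranges."""
    size = math.ceil(n_image_sets / n_chunks)
    return [(start, min(start + size - 1, n_image_sets))
            for start in range(1, n_image_sets + 1, size)]

def headless_commands(pipeline, image_dir, out_root, n_image_sets, n_chunks):
    """Build one headless CellProfiler command per chunk, each with its own output folder."""
    cmds = []
    for i, (first, last) in enumerate(chunk_ranges(n_image_sets, n_chunks)):
        cmds.append([
            "cellprofiler", "-c", "-r",
            "-p", pipeline,
            "-i", image_dir,
            "-o", f"{out_root}/chunk_{i:03d}",
            "-f", str(first), "-l", str(last),
        ])
    return cmds

# Print the first two of four commands (placeholder paths).
cmds = headless_commands("illum_CP4.cppipe", "images", "output", 3500, 4)
for cmd in cmds[:2]:
    print(shlex.join(cmd))
```

Each command could then be submitted as a separate cluster job, or run concurrently on one machine, for example with `subprocess.Popen`.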
Am I missing something or doing something wrong? This is the first time I have used this software, and the computing time I am getting for CorrectIlluminationCalculate feels too high.
I’m using CellProfiler 4, and here is the pipeline I am currently using:
illum_CP4.cppipe
Please let me know if you need any other information to help me.