Multithreading - CellProfiler 2.2 & 3.0

cellprofiler
cores-cpu

#1

Hi All,

I started using CellProfiler again after a short break. Some time ago, CellProfiler was able to use all the cores of a processor and run multiple workers in parallel. I saw that this feature is still maintained in versions 2.2 and 3.0 of the program.

However, I have now tested versions 2.2 and 3.0 (both in the GUI), and the program “says” that it runs multiple workers, but the CPU usage and the Java window show that only one set of images is being analysed at a time. This is quite disappointing, since it slows down the analysis very significantly. I tested this on three different PCs in the lab and the results are exactly the same. I am sure I am not the only one with this problem, unless all the PCs here are configured the same way.

My images are not very big, only 9 MB each, and one set consists of six such images. Thus I am confident that the PC has more than enough resources to handle more than one worker. I am using a workstation with 16 cores and 32 GB of RAM. I split all the images into 10 groups and ran 10 separate CellProfiler instances, which the PC handled perfectly fine; however, I feel that this is not a good solution.

I tried changing the number of workers in the preferences, from 1 to 16, but the result is always the same: only one worker runs.

Therefore I am wondering whether there is anything I can do to make CellProfiler use multiple cores, since this seems possible. Do I need to adjust any settings in Windows (I use Windows 10) or in CellProfiler?

Thank you in advance for the help!

Mindaugas


#2

I’m pretty sure that it works in 2.2.
Perhaps you have something specific in your pipeline which blocks multiprocessing?
Try a simple pipeline with just one IdentifyPrimaryObjects module and check if it works.


#3

Hi Mindaugas,

I’m using 3.0 on my PC and don’t have that problem.
Just one idea - When you start the analysis, CellProfiler always runs through the whole pipeline once with a single worker before running with the maximum. Have you allowed it to go through the whole pipeline a couple of times?
Hope this helps,
Francesca


#4

Thank you for your responses and suggestions.

I tried running a simplified pipeline and nothing changed; it still runs one worker. I attached an image demonstrating that. So fafafft and nichollsfj, do you observe that all the workers (however many there are) run at the same time? In my case the image sets are analysed one after another, which is quite slow and doesn’t use the full processor resources.

Let me know if someone has any suggestions!

Thank you for your help!


#5

Hi Mindaugas,
Screen shot attached for comparison (I only run 3 workers as that’s what seems to work best on my desktop PC), so I’m afraid I’m not sure why yours isn’t doing the same. If you want to drop me an email (I think you have it - Francesca in the Lovestone lab!), I’m happy to compare settings etc to see if we can figure it out.


#6

Hi,

Both 2.2 and 3.0 should definitely multithread. The one thing I noticed in your screenshot was that you had 21(!) CellProfiler processes open, in addition to Matlab and other things. Is it possible you had only one CPU available? Can you reproduce the same behavior on a machine after a fresh reboot with nothing else running?


#7

Hi Francesca!

Yes, I see that CellProfiler on your PC is using all cores! However, it also looks like you have Windows 7? This might be the major difference, or are you using Windows 10? I would definitely be interested in discussing CellProfiler’s settings sometime!

Bcimini,

Today I closed everything and ran only one instance of CP 3.0 with 16 workers. However, as you can see in the attachment, it still runs one set of images at a time. CP 3.0 was given affinity to all cores. I also included a picture of the settings, which shows that 16 workers are selected in CP. This time it says it was doing tasks with 33 CP workers; I have to google what exactly this means, and whether the number corresponds to the programs which are running or to something else.

Bcimini, are you using Windows 10, and does CP on your system utilise all cores?

Thank you for your help guys!!!


#8

Yes, I’ve used CP on Windows 10 in the past and it definitely multithreads (after the first image set, which always single threads). I’ve seen this on multiple Windows 10 machines (even some virtual machines). It should work right out of the box.

Have you noticed this issue before with other programs on this machine? Are all the machines you’ve tried it on configured by your university/company’s IT department? Could a security protocol or something else they’ve set up be causing it? Without being physically there to test your machine, I’m struggling to come up with other things to suggest…


#9

I’m now having this same issue on our Linux cluster. I made 20 CPUs available for CP to use, but it seems to be running just one worker… When it generates the batch file, would it take any notice of the maximum number of workers I have selected in Preferences? Or is this likely to be something to do with the cluster configuration?
Thanks,
Francesca


#10

I create the batch file on my Windows PC (where it happily multithreads 3 workers) before submitting to the Linux cluster - could this cause the issue?


#11

Headless CellProfiler does not behave the same way as CellProfiler in the GUI: it will not automatically multithread, so you need to dispatch jobs to each node individually. It doesn’t take note of the number of workers set in Preferences. Sorry!
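
As a rough illustration of dispatching jobs individually, here is a minimal Python sketch that splits the image sets into contiguous ranges and prints one headless command per range. It assumes the usual CellProfiler command-line flags (`-c`, `-r`, `-p`, `-f` for the first image set, `-l` for the last) and a batch file named `Batch_data.h5`; please check both against `cellprofiler --help` for your version. The function name `batch_commands` and the numbers in the example are made up for illustration.

```python
import shlex


def batch_commands(n_image_sets, n_jobs, batch_file="Batch_data.h5"):
    """Return one CellProfiler command line per contiguous range of image sets.

    The flags used here are assumed from the CellProfiler CLI; verify them
    with `cellprofiler --help` before submitting anything to a cluster.
    """
    if n_image_sets < 1 or n_jobs < 1:
        return []
    # Never create more jobs than there are image sets.
    n_jobs = min(n_jobs, n_image_sets)
    base, extra = divmod(n_image_sets, n_jobs)
    commands = []
    first = 1  # image sets are numbered from 1
    for job in range(n_jobs):
        # The first `extra` jobs take one additional image set each.
        size = base + (1 if job < extra else 0)
        last = first + size - 1
        args = ["cellprofiler", "-c", "-r", "-p", batch_file,
                "-f", str(first), "-l", str(last)]
        commands.append(" ".join(shlex.quote(a) for a in args))
        first = last + 1
    return commands


if __name__ == "__main__":
    # Hypothetical example: 96 image sets spread over 20 cluster jobs.
    for line in batch_commands(n_image_sets=96, n_jobs=20):
        print(line)
```

Each printed line would then be submitted as its own job through whatever scheduler the cluster uses; that part depends on the cluster and is not shown here.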


#12

Ah ok, that makes sense! Thanks.


#13

I finally found the reason why CellProfiler is not using multiple cores. I ran the same pipeline on my laptop and also tested it on a PC with Ubuntu 18.04.1, and on none of those systems was CellProfiler able to multithread. So I started removing modules from my pipeline one by one, in case one of them was blocking CP from using multiple cores. Surprisingly, I found that if the pipeline contains the modules “Correct Illumination Module”/“Applying Illumination correction”, CellProfiler does not use multiple cores and runs only one worker (throughout the whole image set). However, the same pipeline without these two modules is able to run multiple workers. Thus I am wondering whether this is known, and whether there is any way to make CellProfiler analyse multiple sets of images in parallel when the modules “Correct Illumination Module”/“Applying Illumination correction” are in the pipeline.

I noticed that the new CellProfiler 3.1.5 does not require Java and there is no “Java window”, so I am wondering whether it is possible to see which steps are being carried out at a given moment in CellProfiler 3.1.5?

One last question: I saw somewhere on the forum that if images are grouped, each group is processed separately, in parallel, by multiple workers. However, I am struggling to assign them to separate groups; as I understand it, this should be based on metadata extraction with a regular expression. What should the regular expression look like if I divide all the images into separate folders and would like each folder to be a separate group that is analysed separately?
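
For the folder question, one possible starting point is a regular expression with a Python-style named group that captures the last folder in the path. The sketch below only demonstrates the pattern with Python's `re` module; the metadata name `Folder` and the example paths are made up, and whether the same pattern can be pasted unchanged into CellProfiler's metadata extraction settings is an assumption that should be checked there.

```python
import re

# Capture the last path component (the folder holding the images).
# Accepts both Windows (\) and Unix (/) separators.
# "Folder" is a hypothetical metadata name chosen for this example.
FOLDER_PATTERN = re.compile(r".*[\\/](?P<Folder>[^\\/]+)$")


def folder_group(folder_path):
    """Return the name of the last folder in the path, or None if no match."""
    match = FOLDER_PATTERN.match(folder_path)
    return match.group("Folder") if match else None


def group_by_folder(folder_paths):
    """Map each captured folder name to the list of paths that produced it."""
    groups = {}
    for path in folder_paths:
        groups.setdefault(folder_group(path), []).append(path)
    return groups


if __name__ == "__main__":
    # Hypothetical folder paths, one per group of images.
    for path in [r"C:\data\plate01", "/home/user/images/plate02"]:
        print(path, "->", folder_group(path))
```

The captured `Folder` value would then be the metadata item chosen for grouping, so that each folder forms its own group. Note that the pattern expects a path without a trailing separator.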

Thank you for the information and advice!