Multithreaded CellProfiler on a cluster

As the next version of CP will support multiple threads, I’m wondering if this will cause problems for people running CP in command-line mode on a cluster. On our cluster each “slot” is a single CPU on a node with 8-16 cores; if CP runs multithreaded by default, our nodes will become overloaded. For example, if 12 CP jobs are launched on a node and each spawns 12 threads, there will be 144 threads competing for at most 16 cores.

Would it be possible for CP, when launched from the command line, to default to using only a single thread?

PS: multithreaded CellProfiler running on the desktop is fantastic and I’m looking forward to using it. :smile:

CellProfiler in headless mode is optimized to utilize a single core. The headless mode will start a few threads - these are needed to operate Bio-Formats across the Java bridge, but the overwhelming bulk of the processing is done by the single thread that runs the pipeline.

If you use ClassifyPixels (which uses the Ilastik pixel classifier), it will by default use all cores for each CellProfiler instance. Looking at their code, on Windows, defining the NUMBER_OF_PROCESSORS environment variable will limit the number of cores it uses, and on Linux and the Mac it reads the SC_NPROCESSORS_ONLN sysconf setting. However, when Ilastik spawns its workers, it checks the NUMBER_OF_PROCESSORS environment variable regardless of operating system, so it should be possible to limit it to one core by defining NUMBER_OF_PROCESSORS=1.
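To make the lookup order described above concrete, here is a minimal Python sketch. It is not Ilastik's actual code: the function names `detect_cpu_count` and `single_core_env` are hypothetical, and the assumption is simply that a NUMBER_OF_PROCESSORS override is consulted before the operating system's own count. The second helper shows one way a job script could pass NUMBER_OF_PROCESSORS=1 to a child process.

```python
import os
import subprocess
import sys


def detect_cpu_count(environ=None):
    """Hypothetical helper: env override first, then the OS-reported count."""
    environ = os.environ if environ is None else environ
    override = environ.get("NUMBER_OF_PROCESSORS")
    if override is not None:
        try:
            count = int(override)
            if count > 0:
                return count
        except ValueError:
            pass  # ignore a malformed override and fall through
    # os.sysconf only exists on Unix-like systems (Linux, Mac).
    if hasattr(os, "sysconf"):
        try:
            count = os.sysconf("SC_NPROCESSORS_ONLN")
            if count > 0:
                return count
        except (ValueError, OSError):
            pass  # name not supported on this platform
    return os.cpu_count() or 1


def single_core_env(base=None):
    """Hypothetical helper: copy an environment and pin the override to 1."""
    env = dict(os.environ if base is None else base)
    env["NUMBER_OF_PROCESSORS"] = "1"
    return env


if __name__ == "__main__":
    # Launch a child process with the override and show what it sees.
    child_code = "import os; print(os.environ['NUMBER_OF_PROCESSORS'])"
    out = subprocess.check_output(
        [sys.executable, "-c", child_code], env=single_core_env()
    )
    print(out.decode().strip())
```

On a cluster the same effect is usually achieved by exporting the variable in the job script before starting CellProfiler; whether a given Ilastik version honors it should be checked against that version's code.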

The upcoming multiprocessing branch will spawn multiple worker processes from the GUI, defaulting to the number of cores, but with a preference setting that will allow you to choose the number of worker processes to spawn. The headless mode will work the same for that release - optimized for a single core.

That’s great Lee, thank you for letting me know.