Plate grouping inactivates workers

Hi,

I’ve discovered a bug in CellProfiler 2.1.0 (rel 0c7fb94). If I analyse just one plate of data (1536 images), everything works fine: CellProfiler spawns several workers that each do their share of the work. However, if I import several plates at the same time (using the exact same analysis profile) and use grouping to group each plate based on the metadata, CellProfiler spawns the correct number of workers but only utilises one worker per plate. For example, if I try to analyse two plates at the same time, I’ll have twelve workers running, but only two of them doing anything. I can see in the log output that the other workers are there and listening for input, but they never seem to receive any work. The IDs of the active workers seem random, i.e. sometimes it’s worker0 and worker5 that are active, sometimes others.

Bug seen under Windows 7 x64, Windows Server 2008 R2 x64, and Windows 8.1 x64.

cheers,
Karl

Hi Karl,

I’m having trouble replicating the problem you’re reporting. I’ve given it a try on the one example pipeline we have which enables groups, and it has the same # of workers as # of groups. Would you care to post your pipeline plus a couple of sample images so we can give it a try? I’d probably only need 1 or 2 images per group, enough to repeat the issue…

Regards,
-Mark

Hi Mark,

Thanks for your reply. Here’s a basic pipeline that shows the problem, at least on my Windows computer. I’ve included images from two of the image sets. If I turn off the grouping feature in the pipeline, all the workers are utilised, but if grouping is on, only two workers are used.

cheers,
Karl
test.zip (3.04 MB)

Hi Mark,

Reading your reply again, I realise I might have misunderstood how the grouping feature works. Is it actually supposed to allocate only one worker per group? I thought each group was processed independently, but that the work within each group would still be shared among several workers.

cheers,
Karl

Each group is indeed processed independently, but one worker is allocated per group.

This is because grouping is intended for cases where information from one image cycle is carried over to the next for some reason, e.g., aggregate illumination correction, object tracking, etc. Since workers don’t communicate with each other during the image analysis, each group needs a single dedicated worker that does all the processing for that group on its own.
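To make that concrete, here is a rough Python sketch of the idea. It is not CellProfiler code: `process_group` and the "divide by the running mean" correction are made up purely for illustration. It shows that when each cycle depends on state accumulated from the earlier cycles, splitting one group between two workers that can't share that state gives different results.

```python
def process_group(intensities):
    """Process the image cycles of one group in order.

    Hypothetical stand-in for a module that carries state between cycles:
    each value is divided by the running mean of all values seen so far.
    """
    running_total = 0.0
    corrected = []
    for cycle, value in enumerate(intensities, start=1):
        running_total += value
        aggregate = running_total / cycle  # state carried over from earlier cycles
        corrected.append(value / aggregate)
    return corrected


plate = [10.0, 20.0, 30.0]

# One worker handles the whole group, so the running state is complete.
whole = process_group(plate)

# Two workers split the group; the second one starts with no history.
split = process_group(plate[:2]) + process_group(plate[2:])

print(whole[2], split[2])  # the last cycle differs between the two runs
```

The last cycle comes out differently in the split run because the second worker never saw the first two cycles.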

So it makes sense that for your 2-plate grouping, only 2 workers are active; otherwise it defaults to the number of processors you have (which you can change under preferences, BTW). Your initial post sounded to me as if more than two workers were available, but only two were active…?

If the documentation could have been more helpful here, could you mention where it needs clarification?
-Mark

Hi Mark,

Ok, so it’s not a bug - it’s a feature! :smiley:

Yes, the confusing thing was that CellProfiler still launched 12 workers but 10 of them were sitting inactive, just listening for input from the main thread.

Perhaps just a note in the module notes about it would be handy?

Thanks for clearing it up,
Karl

Just talked to our lead software engineer. You are exactly right: the other ones are just inactive. The logic for worker creation under the hood is not that smart just yet :smile:

As an aside: The default is to set the max # of workers to the # of processors your computer has. Do you have 12 processors? For non-grouping work, that’s great, but if you really need to reduce overhead for grouping work, you can decrease the max # of workers in the preferences.
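As a back-of-the-envelope illustration (a sketch only, not how CellProfiler is implemented; `build_jobs` and `count_workers` are hypothetical names), you can think of each group as one indivisible job, and of each image set as its own job when grouping is off. The number of busy workers is then at most the number of jobs:

```python
from collections import defaultdict


def build_jobs(image_sets, group_key=None):
    """Return a list of jobs, where each job is a list of image sets.

    Assumption for illustration: without grouping every image set is its
    own job; with grouping, all image sets sharing a key form one job.
    """
    if group_key is None:
        return [[image_set] for image_set in image_sets]
    groups = defaultdict(list)
    for image_set in image_sets:
        groups[image_set[group_key]].append(image_set)
    return list(groups.values())


def count_workers(n_jobs, max_workers):
    """Return (active, idle) worker counts if every worker is launched."""
    active = min(n_jobs, max_workers)
    return active, max_workers - active


# Two plates of 1536 image sets each, 12 workers launched.
image_sets = [{"plate": plate, "index": i}
              for plate in ("plate1", "plate2")
              for i in range(1536)]

grouped = count_workers(len(build_jobs(image_sets, "plate")), 12)
ungrouped = count_workers(len(build_jobs(image_sets)), 12)

print(grouped)    # (2, 10)
print(ungrouped)  # (12, 0)
```

So for grouped runs, setting the max # of workers to the # of groups avoids launching workers that would only sit idle.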
-Mark

Hi Mark,

Yes, it’s running on a computer with 12 processors, so the analysis is really fast 8) (now that I have disabled the grouping feature…). Memory overhead hasn’t really been a problem, as the inactive workers only use about 50 MB of RAM each.

I really appreciate how simple you guys have made multicore processing in the new version of CellProfiler. Great work, thanks for that!

cheers,
Karl