Different errors occurring on 2 out of 3 computers running the same pipeline

Hello!

First, thanks so much for the awesome software and all of the work you guys put into CP! I’ve been using CP for all of my image analysis needs for the past 5 years, and everyone I introduce to it absolutely loves it! I wasn’t sure if this should be posted in Bugs or Help, so I figured I’d start here.

Summary: My PhD work has involved developing a high throughput pipeline for studying stem cell responses to a variety of stimuli, where the output is grey scale fluorescence images of the cell populations living on top of small (~750 um diameter) pillars. I have set up 3 computers (2 desktops (here called “old” and “new”) and 1 laptop (“rog”)) installed with CP 2.1.1 that I am trying to use to process my image data using the attached pipeline. This same pipeline is being used on all 3.

While everything runs fine on “new”, I’ve been having trouble with “old” and “rog”. After troubleshooting manually, googling, searching here on the forum, and reading the FAQs, I still have not been able to resolve the problems. Here’s my best shot at an in-depth description:

“old” - Running the pipeline ends up generating a “(Worker) ValueError: total size of new array must be unchanged” dialog (shown in old-error.png). It doesn’t crash the program, but while I can skip and continue processing, it ends up having to skip most of the remaining images.

“rog” - Running the pipeline generates an “Assertion failed: ok (…\src\mailbox.cpp:84)” in the terminal and crashes the program.

This is all very perplexing because, as mentioned, it works perfectly fine on “new”. Any insight you might be able to provide would be greatly appreciated, as I’ve got nearly 2 TB of data to process in the next few weeks as I try to wrap up my dissertation. Thanks so much once again for your time!

Always,
Sean

Attachments:
Pipeline - MegaQuantProduction.cpproj (82.4 KB)
PC Specs - old, new, rog
Error Screenshots - old, rog
Sample Image 1 - c1, c2, c3
Sample Image 2 - c1, c2
Sample Image 3 - c3

Hi Sean,

It’s hard to tell what the various issues are. Are they repeatable? Do the errors/warnings occur at the same cycle/image? Maybe one image is a different size in pixels than the others, i.e. corrupted?

In any case, if you have that much data I would suggest:

Hope that helps!


Thanks so much for the message, David!

They are repeatable in that I can’t complete an entire run, but they do seem to differ a bit each time as to when things go afoul. I don’t think it’s a problem with the image data, as I’ve tried running separate sets that have worked fine on “new” and they run into the same sort of problems.

Downloading the beta now, and will also try the headless method. Thanks so much once again!


Installed the beta on both “old” and “rog”. “old” gave me the same error as before (array size must be unchanged), whereas “rog” gave something different, but I think still related:

Assertion failed: Connection reset by peer (bundled\zeromq\src\signaler.cpp:298)

When I had looked up the mailbox.cpp:84 error it was giving earlier, zeromq had been mentioned in the few posts (for other software giving a similar error) that I was able to find.

I’ll see if I can get anything to work better if I instead try to run it headless.

Hi again,

Zeromq is used for multi-core processing. The only unusual aspects of your pipeline are (a) that it has a lot of measurements – not many modules, but lots of combinations of measurements within each module, and (b) you have no Export module but only save the h5 output.

I think it is likely that you are just getting random issues from overtaxing your machine, whether in RAM, in the number of processes running concurrently, or both. You can limit the number of cores used simultaneously in the CellProfiler menu -> Preferences, and I would suggest cutting it down from the maximum number of cores in your machine. This may slow down processing, though. Or run headless, where this is not an issue. You could also try not exporting the h5 file and using ExportToDatabase instead, as there might be issues writing to the h5 file after a while (just guessing, though).


Are there any special flags I need to pass for headless mode to get it to use all cores? I tried the following:

CellProfiler.exe cpOutput\test.h5 -c -p MegaQuantProduction.cpproj

It’s working so far, but it only seems to be utilizing one core at a time, because as I watch in the Command Prompt, I only see one image number show up at a time (see attached screenshot … I get the exception every time a new image is started, headless or not). When I run it with the GUI and watch the Terminal output, I see multiple images being processed at a time (i.e. there’ll be modules for Image #6, Image #3, Image #5, etc. all reporting intermingled with each other).

So I can confirm that running headless is more stable. I was never able to get beyond about 200 images on “rog” before it crashed with a mailbox.cpp or zeromq assertion failure, and I am up to about 650 now. That said, it’s definitely only running one core (I can tell by looking at the performance in Task Manager). I’ve looked at the documentation and haven’t found any flags for specifying how many cores to run. Is there a way to try it headless with all 8 (or at least 4, or something)?

Let me know when you have the chance, and thanks so much!

Running on one core in headless mode is by design. You ought to be able to run multiple instances concurrently, one on each core. You can break up your batches using the -f and -l switches in order to send 1/8 of all your cycles to each core. E.g. if you have 64 sites imaged, then you could send 1-8 using ‘-f 1 -l 8’ to the first CP process, then ‘-f 9 -l 16’ to the second, etc.
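As a rough illustration of that splitting, here is a small Python sketch that divides the cycles into contiguous -f/-l ranges and builds one command line per process. It only reuses the flags already mentioned in this thread (-c, -p, -f, -l) and the output-file-first layout from the command posted above. The helper names and the per-process output file names (cpOutput\part1.h5 and so on) are made up for the example; giving each process its own output file is an assumption, made so that several processes are not writing to the same h5 file. The script only prints the commands and does not launch anything.

```python
def split_cycles(total_cycles, n_workers):
    """Split cycles 1..total_cycles into up to n_workers contiguous,
    1-based, inclusive (first, last) ranges of near-equal size."""
    base, extra = divmod(total_cycles, n_workers)
    ranges = []
    first = 1
    for i in range(n_workers):
        # The first `extra` workers take one additional cycle each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more workers than cycles: skip empty ranges
        last = first + size - 1
        ranges.append((first, last))
        first = last + 1
    return ranges


def build_commands(total_cycles, n_workers,
                   pipeline="MegaQuantProduction.cpproj",
                   exe="CellProfiler.exe"):
    """Build one headless command line (as an argument list) per range."""
    commands = []
    for idx, (first, last) in enumerate(split_cycles(total_cycles, n_workers), start=1):
        # Hypothetical output name: one h5 file per process (an assumption).
        output = "cpOutput\\part{}.h5".format(idx)
        commands.append([exe, output, "-c", "-p", pipeline,
                         "-f", str(first), "-l", str(last)])
    return commands


if __name__ == "__main__":
    # The example from the post: 64 cycles spread over 8 processes.
    for cmd in build_commands(64, 8):
        print(" ".join(cmd))
```

Each printed line can be pasted into its own Command Prompt window, or the argument lists could be handed to something like subprocess.Popen to start them all from one script.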

In GUI mode it handles all the multiprocessing, but there is overhead memory needed for all the GUI and multi-threaded handling.

Does that help?


Yes, this is perfect, and it actually seems to solve the issues I was having. Fingers crossed the run remains stable. Thanks so much!
