CellProfiler on Linux 16.04

Hi,

I have CellProfiler 3.1.8 running successfully on a cloud cluster. However, it seems that only one vCPU is being used. Does anyone know how to set it up to use more?

Cheers,
Peter Lu

Hi,
I followed the steps for setting up batch processing:
http://cellprofiler-manual.s3.amazonaws.com/CellProfiler-3.0.0/help/other_batch.html

However, I ran into a “(worker) Memory error”.

I have 960 image sets, each of which has 5 images for each channel; the image size is 2757*2218 pixels.
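
For a rough sense of scale (an estimate, not a measurement), the raw size of one image set can be worked out from these numbers. The sketch below assumes 5 images per set and 8 bytes per pixel (64-bit floats); both the bytes-per-pixel figure and the helper names are assumptions, since how the images are actually held in memory is not stated here, and any intermediate images a pipeline creates would add to this.

```shell
# Rough memory footprint of one image set held in RAM.
# ASSUMPTION: 8 bytes per pixel (64-bit float). The real value depends on
# how the software stores images and is not stated in this thread.

# bytes_per_image_set WIDTH HEIGHT IMAGES_PER_SET BYTES_PER_PIXEL
bytes_per_image_set() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# to_mib BYTES -> whole mebibytes (integer division, rounds down)
to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

raw=$(bytes_per_image_set 2757 2218 5 8)
echo "one image set: ${raw} bytes (about $(to_mib "$raw") MiB) before any intermediate images"
```

Changing the last argument to 4 or 2 shows how the estimate moves if the pixels are stored as 32-bit floats or 16-bit integers instead.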

The error pops up after about 20 image sets have finished.
However, the pipeline runs fine on my PC, though it takes 5 days to finish all 960 image sets.

Cheers,
P L

Hi Peter,

Referring to the CP docs, how are you doing this part? “A full image set would then need a script that calls CellProfiler with these options with sequential image set numbers, e.g, 1-50, 51-100, etc to submit each as an individual job.”

I don’t think you are submitting these as individual jobs. If you were, you would have one processor going per job.
-John

Hi John,

I think we have to submit the groups of jobs separately. This is what I was doing:
cellprofiler -p Batch_data.h5 -c -r -f 1 -l 60 -o batch1.out
cellprofiler -p Batch_data.h5 -c -r -f 61 -l 120 -o batch2.out
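
The same pattern can be generated for every batch rather than typed by hand. This is a minimal sketch, assuming 960 image sets split into batches of 60 and the same flags as the two commands above; `batch_ranges` is a hypothetical helper name, and the loop only prints the commands (a dry run) so they can be checked before anything is started.

```shell
# batch_ranges TOTAL SIZE
# Prints one "first last" pair per line, covering image sets 1..TOTAL.
# The body runs in a subshell ( ... ) so its variables stay local.
batch_ranges() (
    total=$1
    size=$2
    first=1
    while [ "$first" -le "$total" ]; do
        last=$(( first + size - 1 ))
        if [ "$last" -gt "$total" ]; then
            last=$total          # final batch may be shorter
        fi
        echo "$first $last"
        first=$(( first + size ))
    done
)

# Dry run: print the commands instead of running them.
n=0
batch_ranges 960 60 | while read -r first last; do
    n=$(( n + 1 ))
    echo "cellprofiler -p Batch_data.h5 -c -r -f $first -l $last -o batch$n.out"
done
```

With 960 image sets and a batch size of 60 this prints 16 commands; lowering the second argument gives smaller batches without any other change.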

Have a look at this:
https://portal.biohpc.swmed.edu/content/guides/cellprofiler-biohpc/

Cheers,
P L

That’s a helpful link. So you submitted two batches, and in the second top screenshot there are two processes going, correct? A smaller batch size and more submissions is probably the answer to running it faster and solving the memory problem. A batch size smaller than 20 sounds like the place to start… but maybe I’m not getting your problem.

I have a script that calls CP for one set of images at a time. That way CP starts and stops, and memory is cleared for the next job. The script works from a list, basically the same as a batch file. If I want many processors going at once, I break the list into pieces and start many instances of my script (via new terminal windows), each with its own list to work from. -John
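
The idea of one CellProfiler process per image set can be sketched roughly as follows. This is not the actual script described above, only an illustration under assumptions: the list file is assumed to hold one image set number per line, `run_list` and `CP_CMD` are hypothetical names, and the `cellprofiler` flags are the ones used elsewhere in this thread.

```shell
# run_list LISTFILE
# LISTFILE (assumed format): one image set number per line.
# Each image set gets its own CellProfiler process, so the memory is
# released when that process exits, before the next set starts.
# CP_CMD can be overridden for a dry run, e.g. CP_CMD="echo RUN".
run_list() {
    while read -r n || [ -n "$n" ]; do
        [ -z "$n" ] && continue        # skip blank lines
        # </dev/null keeps the command from swallowing the rest of the list
        ${CP_CMD:-cellprofiler} -p Batch_data.h5 -c -r -f "$n" -l "$n" -o "set_$n.out" < /dev/null
    done < "$1"
}
```

To use several processors, the list can be cut into pieces (for example with `seq 1 960 > all_sets.txt` followed by `split -l 120 all_sets.txt part_`) and `run_list` started once per piece, each in its own terminal.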

PS: that colocalization module takes forever. I stopped using it when my images (slides) got bigger than a single field from an average CMOS camera.

Ah, sorry. I just pasted two lines of my code. Actually, I submitted 16 batches, each with 60 image sets. So you suggest I decrease the batch size to under 20 image sets?
It looks like two processes are running in my second screenshot, but they are not. When CellProfiler hits an error, I think it keeps running in the background.

Right, if batches are failing around 20 images, then definitely don’t run ones bigger than that. It is much easier to deal with it that way than to figure out memory errors.

I haven’t used this function of CP. When you submit a batch via the command prompt, does the job start and the prompt go back to normal, ready to receive another command? Or does the job have to finish before you can enter another command? If it’s the latter, do you open another shell and submit the next batch? It seems like each batch should be another process. What happens if you submit one batch, then open another shell and submit the second one?

I was using the screen utility for each command:
https://help.ubuntu.com/community/Screen
It is very similar to opening another shell, but you’re able to detach the screen.
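
For illustration, starting each batch in its own detached screen session could look roughly like this. It is a sketch rather than the exact commands used here: `launch_batch` and `DRY_RUN` are hypothetical names, the session names are made up, and by default the function only prints the command it would run.

```shell
# launch_batch N FIRST LAST
# Builds: screen -dmS batchN cellprofiler ... -f FIRST -l LAST
# "screen -dmS NAME CMD" creates a session called NAME that starts
# detached and runs CMD inside it.
launch_batch() {
    set -- screen -dmS "batch$1" cellprofiler -p Batch_data.h5 -c -r -f "$2" -l "$3" -o "batch$1.out"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"        # dry run (default): only show the command
    else
        "$@"             # DRY_RUN=0: actually start the session
    fi
}

launch_batch 1 1 60
launch_batch 2 61 120
```

After running with `DRY_RUN=0`, `screen -ls` lists the sessions and `screen -r batch1` reattaches to the first one.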

Cheers,
PL

Yes, fancier than the ‘nohup’ command I’m using. I’ll get there eventually…

So you are submitting these batch commands as separate processes, but all the batches are being run sequentially in one process?

I think each batch command activates only one of the vCPUs. I am now running a smaller test to chase the error; as you can see, I only activated 8 vCPUs in parallel.

Ok, that makes sense. Am I reading this right: this system has 32GB of RAM? To solve your memory problem, you may also have to limit how many batches you start at once, in addition to limiting how many images are in each batch.
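
One way to picture that limit: the number of batches running at once is capped by memory first and by vCPUs second. The sketch below assumes 32GB of RAM and 8 vCPUs, plus a placeholder of 6GB peak per batch that would need to be measured; `max_parallel` and `run_capped` are hypothetical helper names, and `xargs -P` is the GNU/BSD option for running several commands in parallel.

```shell
# max_parallel TOTAL_GB PER_BATCH_GB VCPUS
# How many batches to run at once: memory limit first, then vCPU limit.
max_parallel() (
    n=$(( $1 / $2 ))
    if [ "$n" -lt 1 ]; then n=1; fi        # always allow at least one
    if [ "$n" -gt "$3" ]; then n=$3; fi    # never more than the vCPUs
    echo "$n"
)

# run_capped JOBS
# Reads "first last" pairs on stdin and keeps at most JOBS batches running.
run_capped() {
    xargs -P "$1" -L 1 sh -c \
        'cellprofiler -p Batch_data.h5 -c -r -f "$1" -l "$2" -o "batch_$1.out"' _
}

# ASSUMPTION: 6 GB peak per batch is a placeholder, not a measured value.
jobs=$(max_parallel 32 6 8)
echo "run at most $jobs batches at a time"
```

A call such as `printf '%s\n' "1 20" "21 40" "41 60" | run_capped "$jobs"` would then start the batches, with a new one beginning only when a slot frees up.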

It does not seem to work.
Even when I run just 8 vCPUs, the memory error pops up after 18 image sets have finished.

Ok, I’ll try a few small batches with fewer image sets.

Thanks!

Cheers,
PL

Each batch you started ran only about two image sets? Right, I think you may need to be less ambitious. Try one or two and see whether the memory maxes out.
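
Watching the memory while a small test runs can be done with a short logging loop. This is a minimal sketch for Linux, assuming `/proc/meminfo` reports a `MemAvailable` line in kB (as recent kernels do); `mem_available_mib` and `watch_memory` are hypothetical helper names.

```shell
# mem_available_mib [MEMINFO_FILE]
# Prints MemAvailable in whole MiB, read from /proc/meminfo by default.
mem_available_mib() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# watch_memory INTERVAL_SECONDS
# Prints a timestamped reading every INTERVAL_SECONDS until Ctrl-C.
watch_memory() {
    while true; do
        echo "$(date '+%H:%M:%S') available: $(mem_available_mib) MiB"
        sleep "$1"
    done
}
```

Running `watch_memory 5 | tee mem.log` in a second terminal while one or two batches run shows whether available memory keeps falling as image sets are processed.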

No, each batch still has 60 image sets. Because I ran it with the GUI, I could see that it stopped at the 18th.
Yes, I will just try 3 batches with 5 image sets and see how the memory goes.

Also, I don’t think my image dataset is that large. I really want to know how people run more than 10,000 images.

Cheers,
P L

You need a cluster that can adapt to the increased requirements of many images. The next step beyond what you’ve got going here is a cluster with a proper job queue and the ability to start instances with their own memory and processors as needed, so you are not limited to the 32GB of the system you are using. Yes, you might have many vCPUs, but you are memory limited.
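
As one illustration of what a job queue gives you, here is a sketch of an array job in which every task requests its own memory. It assumes the cluster runs the Slurm scheduler, which nothing in this thread confirms; the 8G memory request, the batch size of 20, and the `task_range` helper are placeholders, and the `cellprofiler` flags are the ones already used above.

```shell
#!/bin/sh
# ASSUMPTION: Slurm scheduler. 48 tasks x 20 image sets = 960 image sets.
# The --mem value is a placeholder per-task memory request.
#SBATCH --job-name=cp_batch
#SBATCH --array=0-47
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G

BATCH_SIZE=20

# task_range TASK_ID SIZE -> "first last" for that task (image sets are 1-based)
task_range() {
    echo "$(( $1 * $2 + 1 )) $(( ($1 + 1) * $2 ))"
}

# Only run when started by Slurm, which sets SLURM_ARRAY_TASK_ID per task.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    set -- $(task_range "$SLURM_ARRAY_TASK_ID" "$BATCH_SIZE")
    cellprofiler -p Batch_data.h5 -c -r -f "$1" -l "$2" -o "task_${SLURM_ARRAY_TASK_ID}.out"
fi
```

Submitted with `sbatch`, the scheduler would start each task wherever the requested memory is free, instead of all tasks competing for one machine's RAM.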

Have you seen this? https://github.com/CellProfiler/Distributed-CellProfiler/wiki

Also this: https://www.apeer.com/home

It’s a lot more complexity and cost, but it’s the way to do what I’m guessing you want.