CellProfiler: Error detected during run of module IdentifySe


We ran CellProfiler 2.1.1 in batch mode (using a Batch_data.h5 file generated in GUI mode) on a Linux (CentOS 6.5) machine with more than 220 GB of RAM. Below is the error log we see after 128m 54.783s.

CellProfiler: Version: 2014-11-04T15:34:33 2.1.1 / 20141104153433
Operating system: CentOS release 6.5 (Final); Linux bc112 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 21:36:05 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

===== Error Log ====
Running job cp on bc112
time /projects/mikem/cellprofiler/bin/python /projects/mikem/cellprofiler/src/CellProfiler/CellProfiler.py --jvm-heap-size=4g -p /projects/mikem/test1/output/Batch_data.h5 -c -r -f 1 -l 1
Version: 2014-11-04T15:34:33 2.1.1 / 20141104153433
Could not load cellprofiler.modules.ilastik_pixel_classification
Traceback (most recent call last):
File "/projects/mikem/cellprofiler/src/CellProfiler/cellprofiler/modules/__init__.py", line 301, in add_module
m = __import__(mod, globals(), locals(), ['__all__'], 0)
ImportError: No module named ilastik_pixel_classification
could not load these modules: cellprofiler.modules.ilastik_pixel_classification
Times reported are CPU times for each module, not wall-clock time
Thu May 14 15:36:48 2015: Image # 1, module LoadImages # 1: 86.34 sec
Thu May 14 15:37:40 2015: Image # 1, module IdentifyPrimaryObjects # 2: 6067.58 sec
Error detected during run of module IdentifySecondaryObjects
Traceback (most recent call last):
File "/projects/mikem/cellprofiler/src/CellProfiler/cellprofiler/pipeline.py", line 1799, in run_with_yield
File "/projects/mikem/cellprofiler/src/CellProfiler/cellprofiler/modules/identifysecondaryobjects.py", line 683, in run
File "/projects/mikem/cellprofiler/src/CellProfiler/cellprofiler/objects.py", line 413, in relate_children
histogram = self.histogram_from_labels(self.segmented, children.segmented)
File "/projects/mikem/cellprofiler/src/CellProfiler/cellprofiler/objects.py", line 480, in histogram_from_labels
shape=(parent_count + 1, child_count + 1)).toarray()
File "/projects/mikem/cellprofiler/lib/python2.7/site-packages/scipy/sparse/coo.py", line 239, in toarray
B = self._process_toarray_args(order, out)
File "/projects/mikem/cellprofiler/lib/python2.7/site-packages/scipy/sparse/base.py", line 699, in _process_toarray_args
return np.zeros(self.shape, dtype=self.dtype, order=order)
Thu May 14 17:18:39 2015: Image # 1, module IdentifySecondaryObjects # 3: 1607.79 sec
Exiting the JVM monitor thread

real 128m54.783s
user 95m51.086s
sys 33m38.668s

Note, we have reserved/allocated 220 GB for this run (see cp_run.sh job submission script below in the P.S. section).

Any feedback at your earliest convenience would be greatly appreciated.


=== cp_run.sh ===

#$ -cwd
#$ -S /bin/sh
#$ -l h_rt=400:00:00
#$ -j y
#$ -N cp
#$ -pe thread 1
#$ -l bigbox
#$ -l h_vmem=220G

echo "Running job $JOB_NAME on $HOSTNAME"

export LD_LIBRARY_PATH=$prefix/jdk/jre/lib/amd64/server
export LD_LIBRARY_PATH=$prefix/lib/wxPython-$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$base/usr/lib64:$LD_LIBRARY_PATH
export PATH=$prefix/bin:$prefix/jdk/bin:$PATH
APP="$prefix/bin/python $prefix/src/CellProfiler/CellProfiler.py"


echo "time $APP --jvm-heap-size=4g -p $DOF/Batch_data.h5 -c -r -f 1 -l 1"

time $APP --jvm-heap-size=4g -p $DOF/Batch_data.h5 -c -r -f 1 -l 1


Based on the long run-time for IdentifyPrimaryObjects (> 1.5 hrs), I’m guessing that an excessive number of primary objects were generated, which then crushed IdentifySecondary. A few questions:

  • How big are your images?
  • How many objects are you expecting?
  • Would you be willing to post the 1st image processed by this pipeline?
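
For a sense of scale: the traceback ends in a dense np.zeros allocation of shape (parent_count + 1, child_count + 1), so the memory needed grows with the product of the two object counts. Below is a rough sketch of that arithmetic. It assumes 8-byte elements and treats the 220G limit as decimal gigabytes; the actual dtype and the actual object counts are not shown in the log, so the counts used here are purely hypothetical.

```python
# Rough estimate of the dense parent/child histogram size.
# ASSUMPTIONS: 8 bytes per element and hypothetical object counts;
# neither value appears in the posted log.

GB = 10 ** 9


def dense_histogram_bytes(parent_count, child_count, itemsize=8):
    """Bytes needed for a dense (parent_count + 1) x (child_count + 1) array."""
    return (parent_count + 1) * (child_count + 1) * itemsize


def fits_in_memory(parent_count, child_count, limit_bytes, itemsize=8):
    """True if the dense array alone would fit under limit_bytes."""
    return dense_histogram_bytes(parent_count, child_count, itemsize) <= limit_bytes


if __name__ == "__main__":
    limit = 220 * GB  # the h_vmem reservation from cp_run.sh
    for n in (10000, 100000, 500000):  # hypothetical object counts
        need = dense_histogram_bytes(n, n)
        verdict = "fits" if fits_in_memory(n, n, limit) else "does NOT fit"
        print("%7d objects: %.1f GB, %s in 220 GB" % (n, need / GB, verdict))
```

Under these assumptions, a few hundred thousand primary and secondary objects would already be enough to exceed the reservation, which is why the expected object count matters.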


I believe this user addressed their questions offline to me (please reply here if this is a different issue).

It was solved by a few iterative fixes:

Then SaveImages gave them an error:

[quote]Error detected during run of module SaveImages
MemoryError: Failed to allocate byte array of size -1467334261[/quote]

to which our developer replied:

and then

[quote]I’ve confirmed your problem. Unfortunately, I’m pretty sure you’re running into a limitation of Java which is that it can’t allocate arrays larger than 2 gigabytes of memory, even if you have much more memory available. We allocate memory for the uncompressed image which might be several times larger than the compressed version on disk and, if in color, will require three times the amount of memory of an uncompressed black and white image. I think you may have to resort to one of several work-arounds:

  • Remove the SaveImages module (if you can get by without saving the image)
  • Use the Resize module to rescale the image to 1/2 the resolution (a resizing factor of .5).
  • Use the Crop module twice to crop the image into the upper and lower or right and left halves (choose Rectangle for shape, Coordinates for method and then select “from edge” for the positions and then enter a midpoint e.g. 30,000 for a 60,000 x 60,000 image as either a start or end position) and use two copies of SaveImages to save each half.
  • If images are in color - convert and save them in grayscale. You can use ColorToGray to save the red, green and blue components separately.

Processing images in tiles (in your case, breaking an image into an NxM grid of smaller images and running a pipeline on each of them in turn) is on CellProfiler’s roadmap, but a full solution is probably about a year away. The BioFormats library can write images in tiled pieces - it might be possible to devise a work-around before the full solution is available, but that would be more than a few hours of work and may have to wait a bit. We do have scripts that we run here to read large images such as yours and write out tiles that can be processed separately - if that would help, I can send you the script.[/quote]
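
The negative size in the SaveImages error is what one would expect if the requested byte count was stored in a signed 32-bit integer and wrapped around. Here is a small sketch of that arithmetic, assuming a single wrap-around and 8-bit samples (both are assumptions; the log does not state the image dimensions or bit depth). The 60,000 x 60,000 size is the example figure from the reply above, and the helper names are made up for illustration.

```python
# Sketch: why an image over 2 GB shows up as a negative array size.
# ASSUMPTIONS: 8-bit samples and a single 32-bit wrap-around.

INT32_MAX = 2 ** 31 - 1  # upper bound on the length of a Java array


def uncompressed_bytes(width, height, channels=1, bytes_per_sample=1):
    """Size of the raw pixel buffer, ignoring any file compression."""
    return width * height * channels * bytes_per_sample


def as_int32(n):
    """Reinterpret an integer as a signed 32-bit two's-complement value."""
    n &= 0xFFFFFFFF
    return n - 2 ** 32 if n >= 2 ** 31 else n


def fits_in_java_byte_array(n_bytes):
    """True if a byte buffer of this size can be a single Java array."""
    return n_bytes <= INT32_MAX


if __name__ == "__main__":
    # The size reported in the error, mapped back to an unsigned byte count.
    reported = -1467334261
    print("requested bytes (if wrapped once):", reported + 2 ** 32)

    full = uncompressed_bytes(60000, 60000)              # grayscale, full size
    half = uncompressed_bytes(30000, 30000)              # after Resize at 0.5
    half_rgb = uncompressed_bytes(30000, 30000, channels=3)
    for name, size in (("full", full), ("half", half), ("half RGB", half_rgb)):
        print(name, size, "fits" if fits_in_java_byte_array(size) else "too large")
```

With these numbers, halving the resolution is enough for a grayscale image but not for a three-channel one, which is consistent with the suggestion to combine the work-arounds (for example, ColorToGray plus Resize or Crop).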

(btw the tiling script lives here: broadinstitute.org/~leek/tileimg … encies.jar)
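
For anyone who wants to work out crop coordinates by hand in the meantime, here is a minimal sketch of splitting an image into an NxM grid of tiles. This is not the tiling script linked above, just an illustration of the idea; the function name and the (x0, y0, x1, y1) box convention are my own assumptions.

```python
# Sketch: pixel boxes for an NxM grid of tiles.
# NOTE: illustrative only; not the linked tiling script.


def tile_boxes(width, height, rows, cols):
    """Return (x0, y0, x1, y1) boxes covering the image without overlap.

    x1 and y1 are exclusive. Integer boundaries are computed so that
    the tiles cover every pixel even when the size is not divisible.
    """
    boxes = []
    for r in range(rows):
        y0 = r * height // rows
        y1 = (r + 1) * height // rows
        for c in range(cols):
            x0 = c * width // cols
            x1 = (c + 1) * width // cols
            boxes.append((x0, y0, x1, y1))
    return boxes


if __name__ == "__main__":
    # Upper and lower halves of a 60,000 x 60,000 image,
    # matching the Crop work-around (midpoint at 30,000).
    for box in tile_boxes(60000, 60000, rows=2, cols=1):
        print(box)
```

Each box can be entered as start and end positions in a Crop module, or used by an external script to write separate tile files that are then processed one at a time.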