Errors Starting Batch Processing

Hi All,

I’ve recently been trying to port our CP pipelines onto our cluster system but have run into a few problems breaking my images up into batches. I can run all of my images on one core, but this is obviously inefficient.

The first method I tried was to use the command returned by the --get-batch-commands switch, which is:
CellProfiler -c -r -b -p ./Batch_data.h5 -g ImageNumber=X

The error from running this is:

Uncaught exception in CellProfiler.py
Traceback (most recent call last):
  File "/usr/cellprofiler/src/CellProfiler/CellProfiler.py", line 228, in main
    run_pipeline_headless(options, args)
  File "/usr/cellprofiler/src/CellProfiler/CellProfiler.py", line 717, in run_pipeline_headless
    initial_measurements = initial_measurements)
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/pipeline.py", line 1627, in run
    initial_measurements = measurements):
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/pipeline.py", line 1737, in run_with_yield
    in group(workspace):
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/pipeline.py", line 1655, in group
    ", ".join(grouping.keys()), ", ".join(keys)))
ValueError: The grouping keys specified on the command line (ImageNumber) must be the same as those defined by the modules in the pipeline ()

The second method I tried (and the one I would prefer) was using the -f and -l switches. The -f switch appears to work, as seen in the example below.

[jcarlsonstevermer@opt-submit output]$ cellprofiler -c -r -p ./Batch_data.h5 -f 5
Version: 2014-01-24T15:02:55 2.1.0.Release / 20140124150255
Plugin directory doesn't point to valid folder: /mnt/ws/bionates/sahalab/home/jcarlsonstevermer/CellProfiler/output/plugins
Your pipeline was saved by a more recent version of CellProfiler (rev 6c2d896) but you are running CellProfiler rev 2.1.0.Release. Loading this pipeline may fail or have unpredictable results.
Batch file default output directory, "W:/CellProfiler/output", does not exist
Times reported are CPU times for each module, not wall-clock time
Mon Aug  4 09:45:55 2014: Image # 5, module Images # 1: 0.00 sec
Mon Aug  4 09:45:55 2014: Image # 5, module Metadata # 2: 0.00 sec
^C[jcarlsonstevermer@opt-submit output]$

However, when I add the -l switch, a new error is thrown.

[jcarlsonstevermer@opt-submit output]$ cellprofiler -c -r -p ./Batch_data.h5 -f 5 -l 10
Version: 2014-01-24T15:02:55 2.1.0.Release / 20140124150255
Plugin directory doesn't point to valid folder: /mnt/ws/bionates/sahalab/home/jcarlsonstevermer/CellProfiler/output/plugins
Failed during initial processing of /tmp/Cpmeasurements81t1yb.hdf5
Traceback (most recent call last):
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/utilities/hdf5_dict.py", line 293, in __init__
    maxshape = (None, ))
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/group.py", line 94, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 76, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5t.pyx", line 1379, in h5py.h5t.py_create (h5py/h5t.c:12683)
  File "h5t.pyx", line 1451, in h5py.h5t.py_create (h5py/h5t.c:12533)
TypeError: Object dtype dtype('object') has no native HDF5 equivalent
Error loading HDF5 ./Batch_data.h5
Traceback (most recent call last):
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/measurements.py", line 1703, in load_measurements
    image_numbers = image_numbers)
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/measurements.py", line 269, in __init__
    image_numbers=image_numbers)
  File "/usr/cellprofiler/src/CellProfiler/cellprofiler/utilities/hdf5_dict.py", line 293, in __init__
    maxshape = (None, ))
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/group.py", line 94, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "/usr/cellprofiler/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 76, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5t.pyx", line 1379, in h5py.h5t.py_create (h5py/h5t.c:12683)
  File "h5t.pyx", line 1451, in h5py.h5t.py_create (h5py/h5t.c:12533)
TypeError: Object dtype dtype('object') has no native HDF5 equivalent
So sorry. CellProfiler failed to remove the temporary file, /tmp/Cpmeasurements81t1yb.hdf5 and there it sits on your disk now.
Your pipeline was saved by a more recent version of CellProfiler (rev 6c2d896) but you are running CellProfiler rev 2.1.0.Release. Loading this pipeline may fail or have unpredictable results.

I’ve attached the pipeline I am using, although I think this is an issue with how the run is instantiated more than with the pipeline itself. Any advice and/or workarounds you can think of would be greatly appreciated!

Cheers,
Jared
RBTest.cppipe (14.5 KB)

Hi Jared,
I’m not 100% sure, but it appears that this issue has come up before and was addressed in the thread “Getting CP2 running a batch job on Amazon EC2”. Might bringing in Lee’s fix or upgrading to 2.1.1 solve it for you?
-Mark
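
For anyone scripting the -f/-l approach once it runs cleanly, here is a minimal sketch of how the image set could be split into per-job ranges. It only reuses the switches shown in the commands above (-c -r -p -f -l). The total image count (25), the batch size (10), and the helper names batch_ranges and build_command are assumptions made up for illustration, not part of CellProfiler, and the sketch does nothing about the HDF5 error itself.

```python
def batch_ranges(total_images, batch_size):
    """Return 1-based, inclusive (first, last) image-number pairs covering all images."""
    if total_images < 0 or batch_size < 1:
        raise ValueError("total_images must be >= 0 and batch_size must be >= 1")
    return [
        (first, min(first + batch_size - 1, total_images))
        for first in range(1, total_images + 1, batch_size)
    ]


def build_command(batch_file, first, last, executable="cellprofiler"):
    """Build one headless command line as an argument list, using the -f/-l switches."""
    return [executable, "-c", "-r", "-p", batch_file, "-f", str(first), "-l", str(last)]


# Hypothetical values: 25 images in total, 10 images per cluster job.
for first, last in batch_ranges(25, 10):
    print(" ".join(build_command("./Batch_data.h5", first, last)))
```

Each printed line could then be handed to the cluster scheduler as a separate job. The final batch is shortened so that -l never exceeds the assumed image count.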