Pattern of export to database errors

Hello,

I have observed an error while working with revision 9871 of CellProfiler 2.0. We are using CP 2.0 on a Linux cluster, typically with 96 image sets per run, and we run CellProfiler headless with the following example command for each image set:
python-2.6.sh CellProfiler.py -p Batch_data.mat -c -r -b -f 1 -l 1
Each CP command runs independently, and we cat together the SQL_Object.CSV files to create a single output file. The corresponding SQL_image.CSV file is created but is empty. The error occurs during a call to get_measurement (in measurements.py), after the ExportToDatabase module has opened the SQL_image.CSV file but before any output is written. It happens only for jobs numbered 3-9, 15-21, 27-33, 39-45, and so on: 7 out of every 12 commands, namely those with 3 <= (n mod 12) <= 9. These jobs produce the following error:
Traceback (most recent call last):
  File "/bio/tools/5.1/cellprofiler/9865/CellProfilerVirtualenv/CellProfiler/cellprofiler/pipeline.py", line 1127, in post_run
    module.post_run(workspace)
  File "/bio/tools/5.1/cellprofiler/9865/CellProfilerVirtualenv/CellProfiler/cellprofiler/modules/exporttodatabase.py", line 545, in post_run
    self.write_data(workspace)
  File "/bio/tools/5.1/cellprofiler/9865/CellProfilerVirtualenv/CellProfiler/cellprofiler/modules/exporttodatabase.py", line 959, in write_data
    value = measurements.get_measurement(cpmeas.IMAGE, feature, i)
  File "/bio/tools/5.1/cellprofiler/9865/CellProfilerVirtualenv/CellProfiler/cellprofiler/measurements.py", line 283, in get_measurement
    return self.get_all_measurements(object_name,feature_name)[image_set_index]
  File "/bio/tools/5.1/cellprofiler/9865/CellProfilerVirtualenv/CellProfiler/cellprofiler/measurements.py", line 298, in get_all_measurements
    assert self.__dictionary[object_name].has_key(feature_name),"No measurements for %s.%s"%(object_name,feature_name)
AssertionError: No measurements for Image.Metadata_Batch

The complete output is generated without errors when we run a full job (all images in a single batch) on the same systems. The same error also occurs with a batch size of 2. We can reproduce the pattern after changing the order of the images, so the errors are not image-specific but order-specific.

Chuck
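To make the reported pattern concrete, here is a small Python sketch that builds the per-image-set command from the example above for a 96-set run and lists which job numbers satisfy 3 <= (n mod 12) <= 9. It assumes jobs are numbered from 1 and that job n processes image set n via `-f n -l n`; it only prints command lines and does not run CellProfiler.

```python
# Sketch only: builds the per-image-set headless command quoted in the post and
# lists which job numbers fall in the reported failing pattern.
# Assumptions: jobs are numbered from 1 and job n runs image set n (-f n -l n).

N_IMAGE_SETS = 96  # typical run size mentioned in the post


def command_for(n):
    """Return the headless CellProfiler command line for image set n."""
    return ("python-2.6.sh CellProfiler.py -p Batch_data.mat "
            "-c -r -b -f {0} -l {0}".format(n))


def in_failing_pattern(n):
    """True if job n satisfies 3 <= (n mod 12) <= 9, the reported pattern."""
    return 3 <= n % 12 <= 9


failing_jobs = [n for n in range(1, N_IMAGE_SETS + 1) if in_failing_pattern(n)]

print(command_for(3))
print(failing_jobs[:14])
print("{0} of {1} jobs".format(len(failing_jobs), N_IMAGE_SETS))
```

For 96 image sets this flags 56 jobs: 7 in each block of 12 consecutive job numbers.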

Could you please post your pipeline and one or two image sets? (We can duplicate them to make a large enough test case.)

Thanks,
Ray Jones

Hi, Thouis

Thanks for your reply. I have attached the pipeline. The image sets we use are 32 MB each. I can compress one to 16 MB, but that is still too large to upload. If you have a preferred method for transferring the files, let me know; otherwise, I will come up with something and reply again by 5:00 pm today.

Chuck
production.cp (16.2 KB)

Does the bug happen if you scale down the images to something smaller? If that’s not easy to determine, feel free to post some scaled down images for us to test with. (If we can’t reproduce the bug with them, we might ask if it happens to you when they’re scaled down, as well.)

Hi, Ray

Here are links to download some image sets from the experiment that uses the pipeline I already posted:
ittc.ku.edu/~chenry/WellA01.tar.gz
ittc.ku.edu/~chenry/WellA02.tar.gz

Also, I’ve taken those sets and resized them by 50% and by 75% (along a linear dimension). All file names are the same as in the previous two sets; only the sizes of the .tif images have changed.
ittc.ku.edu/~chenry/Wells50.tar.gz
ittc.ku.edu/~chenry/Wells75.tar.gz

Thanks for your help,
Chuck

I just got a chance to experiment with this. I believe the problem is that the regular expression for extracting metadata does not match the failing files. I don’t have the full paths you’re using, so I can’t see exactly why it fails to match. You could post your Batch_data.mat if you would like further help debugging, but you can probably figure it out using the regexp test mode and one of the failing paths.

The CP team discussed better ways of catching this error. Our current plan is to make LoadImages throw an error if the regexp doesn’t match, so that CP fails closer to the source of the problem in the pipeline. In the future, we might add code to detect this when the Batch files are created, rather than at runtime.
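The fail-early idea above can be sketched in a few lines of Python. This is not CellProfiler's LoadImages code: `extract_metadata`, `MetadataMatchError`, the pattern and the file names are all hypothetical, chosen only to show a non-matching file name raising immediately instead of leaving a `Metadata_*` measurement missing until ExportToDatabase runs.

```python
import re


class MetadataMatchError(ValueError):
    """Hypothetical error raised when a file name does not match the metadata regexp."""


def extract_metadata(pattern, file_name):
    """Return the named groups of pattern found in file_name, or raise if there is no match.

    Hypothetical helper illustrating the fail-early idea; not CellProfiler code.
    """
    match = re.search(pattern, file_name)
    if match is None:
        # Stop here, at the point where metadata is extracted.
        raise MetadataMatchError(
            "metadata regexp %r did not match %r" % (pattern, file_name))
    return match.groupdict()


# Hypothetical pattern and file names, for illustration only.
PATTERN = r"Well(?P<WellID>[A-P][0-9]{2})"

print(extract_metadata(PATTERN, "WellA03_w1.tif"))  # {'WellID': 'A03'}
try:
    extract_metadata(PATTERN, "A03_w1.tif")
except MetadataMatchError as err:
    print("stopped early:", err)
```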

Hi, Ray

Thanks for your reply. I found it. All images are now read after I changed the regexp string in the pipeline's LoadImages settings from
"Well (?P[A-P][0-12]{2})"
to
"Well (?P[A-P][0-9,10-12]{2})"

I had not noticed the regexp test in the LoadImages module in the GUI (the magnifying glass icon), but it was just what I needed to debug this. Thanks very much!
Chuck
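The failing pattern follows from the character class. In a regular expression, `[0-12]` is not the range 0 to 12 but the set of characters `0`, `1` and `2`, so `[0-12]{2}` matches the well columns 01, 02, 10, 11 and 12 and rejects 03 through 09: seven of every twelve columns. The Python sketch below checks this with the standard `re` module. It assumes (the thread does not say so) that image set n is well column ((n - 1) mod 12) + 1; under that assumption the rejected columns are exactly the jobs with 3 <= (n mod 12) <= 9. It also shows that the posted replacement `[0-9,10-12]` is equivalent to `[0-9,]`, so it accepts every column but also strings such as `99`. The pattern `(?:0[1-9]|1[0-2])` is a hypothetical stricter alternative, not something proposed in the thread.

```python
import re

# Only the well-column part of the two posted patterns is reproduced; the named
# group in the original regexps was lost when the post was rendered.
OLD_COLUMN = r"[0-12]{2}"             # class is {'0', '1', '2'}: "0-1" range plus "2"
NEW_COLUMN = r"[0-9,10-12]{2}"        # class is 0-9 plus ','; "10-12" adds nothing new
STRICT_COLUMN = r"(?:0[1-9]|1[0-2])"  # hypothetical stricter alternative: 01-12 only

COLUMNS = ["%02d" % c for c in range(1, 13)]  # '01' .. '12'


def rejected(pattern):
    """Return the well columns 01-12 that the pattern does NOT fully match."""
    return [c for c in COLUMNS if re.fullmatch(pattern, c) is None]


print(rejected(OLD_COLUMN))     # ['03', '04', '05', '06', '07', '08', '09']
print(rejected(NEW_COLUMN))     # []
print(rejected(STRICT_COLUMN))  # []

# The posted fix also accepts strings that are not well columns; the strict one does not.
for text in ["00", "99", ",,"]:
    print(text, bool(re.fullmatch(NEW_COLUMN, text)), bool(re.fullmatch(STRICT_COLUMN, text)))

# Assumption: image set n (1-based) is well column ((n - 1) % 12) + 1.
failing_jobs = [n for n in range(1, 25)
                if re.fullmatch(OLD_COLUMN, "%02d" % ((n - 1) % 12 + 1)) is None]
print(failing_jobs)  # [3, 4, 5, 6, 7, 8, 9, 15, 16, 17, 18, 19, 20, 21]
```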