GC Overhead Limit Exceeded and Java Heap Space Memory Limit?

CP team: I have been getting these errors when loading large image sets. Is this common? See below:
Traceback (most recent call last):
File "cellprofiler\gui\cornerbuttonmixin.pyc", line 124, in on_corner_left_mouse_up
File "cellprofiler\gui\moduleview.pyc", line 3875, in on_corner_button_clicked
File "cellprofiler\modules\metadata.pyc", line 911, in update_table
File "cellprofiler\pipeline.pyc", line 2637, in get_image_plane_details
File "cellprofiler\pipeline.pyc", line 2603, in __prepare_run_module
File "cellprofiler\modules\metadata.pyc", line 732, in prepare_run
File "cellprofiler\utilities\jutil.pyc", line 785, in call
File "cellprofiler\utilities\jutil.pyc", line 762, in fn
JavaException: Java heap space

Traceback (most recent call last):
File "cellprofiler\pipeline.pyc", line 2094, in prepare_run
File "cellprofiler\modules\metadata.pyc", line 732, in prepare_run
File "cellprofiler\utilities\jutil.pyc", line 785, in call
File "cellprofiler\utilities\jutil.pyc", line 762, in fn
JavaException: GC overhead limit exceeded

This seems to be related to metadata extraction and image tracking in the Metadata and NamesAndTypes modules. I also use Groups to segregate data by plate. Currently the screens are about 12 384-well plates, with 9 images captured per well in 2 colors. Is this too much for CP to handle? If so, is there a workaround? Thanks for the help.

UPDATE1: After checking the improvements in CP 2.1.1, I see that you have added a preference for Java memory usage when loading and processing images. Increasing the Java memory allocation does seem to alleviate some creeping RAM usage when processing one plate (~6,800 images). I have commented under CP Bugs > Usability Improvements that batch processing of images delineated by folders might help resolve the image-processing efficiency and memory-saturation problems when running on a single machine.

UPDATE2: I have reviewed your FAQ on memory-management issues with large images/image stacks. I have implemented some of these measures, but I cannot get more than one plate to run to completion under Groups. I will review your comments on batch processing to see if I can get this to work and report back.

UPDATE3: It seems batch processing is not a point-and-shoot process on a single machine; I have not met with much success in getting it to work. It seems I can only process 1 plate at a time using CP 2.1.1, otherwise the process overwhelms the software/hardware. Will these limitations be resolved in a future version of CP? Do you have any ideas that could alleviate this problem in the near term?

Again, thanks for all of your help, Paul

Hi Paul,

Sorry for your troubles; it's been a little busy here, so apologies for the slow reply.

We highly suggest using LoadData as the input method for large screens or image sets; I'm not sure whether you are using it already. With LoadData, you control the input explicitly, including the file locations and their metadata. One way to construct the initial file is to use the Images module to drag all your image folders in, wait for them all to load, set up the input modules as you see fit (including Groups), and then choose File > Export > Image Set Listing. That CSV will then be your saved input for LoadData when you run your analysis.
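If it helps to see the shape of such a file: here is a minimal sketch of writing a LoadData-style CSV programmatically. The channel names, file names, and paths below are made-up placeholders; the `Image_FileName_*`/`Image_PathName_*`/`Metadata_*` column convention follows LoadData's documented format, but check your own exported Image Set Listing for the exact headers your pipeline expects.

```python
import csv

# One row per image set; two hypothetical channels (DNA, GFP).
# File names and paths are placeholders for illustration only.
rows = [
    {"Image_FileName_DNA": "A01_w1.tif", "Image_PathName_DNA": "/data/plate1",
     "Image_FileName_GFP": "A01_w2.tif", "Image_PathName_GFP": "/data/plate1",
     "Metadata_Plate": "plate1", "Metadata_Well": "A01"},
]

with open("load_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```

The same CSV can then be pointed to from the LoadData module's settings.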

I will send along your comments/issues to our software engineer to see if he has any suggestions – thanks for reporting!

Hi Paul,
Could you post the pipeline? David is right about LoadData: if there's any possibility of using LoadData on a screen like this, you should do it. It's also worth learning how to use CreateBatchFiles, because that limits the startup time if you run several jobs headless to process your analysis (even with LoadData, a lot of time is spent compiling the image set list before breaking it up into pieces).
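To illustrate the headless-job idea, here is a rough sketch of carving a CreateBatchFiles batch into fixed-size chunks of image sets, one command per job. The `-c` (headless), `-r` (run), `-p` (pipeline/batch file), `-f`/`-l` (first/last image set) flags follow CellProfiler's command-line usage; the batch-file name, counts, and executable name are placeholders for your setup.

```python
# Sketch: generate one headless CellProfiler command per chunk of image sets.
batch_size = 100      # image sets per job (placeholder)
n_image_sets = 400    # total image sets in the batch file (placeholder)

jobs = []
for first in range(1, n_image_sets + 1, batch_size):
    last = min(first + batch_size - 1, n_image_sets)
    jobs.append(
        "CellProfiler.exe -c -r -p Batch_data.h5 -f %d -l %d" % (first, last)
    )
```

Each command in `jobs` can then be submitted as a separate process or cluster job, so no single run has to hold the whole screen in memory.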

For a screen with 80K images, you’re at a scale where you want to run on a cluster. You can find helpful strategies here:

github.com/CellProfiler/CellPro … nvironment

At the Broad, we run similarly sized screens and generally use LoadData + CreateBatchFiles + ExportToDatabase with a MySQL database. An alternative strategy is to partition your jobs using many small .csv files for LoadData image input and the ExportToSpreadsheet module for measurement output. A third strategy is to use the input modules and specify the file list on the command line with the new (trunk-build-only) switch "--file-list"; in that case, you'd specify one file list per job.
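As a sketch of the second strategy, splitting one master LoadData CSV into one CSV per plate could look like the helper below. This function and its default column name are assumptions for illustration, not CellProfiler code; it just assumes your master CSV has a per-plate metadata column such as `Metadata_Plate`.

```python
import csv
from collections import defaultdict

def split_by_plate(master_csv, plate_column="Metadata_Plate"):
    """Split one LoadData CSV into one CSV per plate (hypothetical helper)."""
    groups = defaultdict(list)
    with open(master_csv, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row[plate_column]].append(row)
    outputs = []
    for plate, rows in groups.items():
        out = "load_data_%s.csv" % plate  # one small CSV per job
        with open(out, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        outputs.append(out)
    return outputs
```

Each resulting per-plate CSV then becomes the input for one headless job, keeping each job's image set list small.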