Java heap space error when saving images

I work with rather large images (15000 x 38200) that I process with the help of CP quite nicely. Lately I’ve been getting errors when trying to save the resulting images in RGB. The error I get is a Java heap space error. I tried alleviating it by increasing the heap size
…CellProfiler.exe" --jvm-heap-size=10g (or up to 40g) but no joy.

If I shrink the image or make it grayscale, it works. I thought it might be a file format problem, but it applies to JPGs and TIFs (perhaps others as well, I haven’t tried). Could you please point me in the right direction on this one?

Thanks!

Bolek

Hi Bolek,

You say this has been occurring “lately” in your post above. Prior to experiencing this problem, have you made any changes to your pipeline and/or changed to a different version of CP? If so, what version?

Regards,
-Mark

Hi Mark,

The only thing that’s new is the species of the tissue I’m looking at which is larger than the one I was using earlier - so it’s most likely the image size that’s the cause - if I run the pipeline on the same images but first reduce the size to half (even in CP) it’s fine. The previous tissue images I used were in this smaller size range as well (typically 15000x7500).

Perhaps worth mentioning - I’ve redone and tested the analysis with older versions, as I was thinking that perhaps the 2.1 trunk was the cause (btw: love the performance of 2.1 trunk!). Sadly, earlier 2.0 trunks and even the 2.0 release had the same problem.

To rule out my computer (48GB RAM Windows 7 workstation), I also tried it on another 64GB workstation, to no avail (also with a Java heap of ~50g). Running ConserveMemory to clear all the unnecessary files doesn’t work either.

I’m happy to try anything to debug this, or even upload the pipeline/sample image (quite large). Could this be a file format issue when saving the image, some sort of TIF or JPG file size limit? I know TIF has a maximum resolution, but that is probably not it, since it’s fine as monochrome but suddenly crumbles in RGB.

Thanks again.
Bolek

Hi Bolek,
I think your images are likely just barely over the size limit. Arrays in Java are indexed with a Java int, which has a maximum value of 2^31 - 1, or about 2.1 billion. Your images are about half a billion pixels (15000 x 38200 is roughly 573 million), and I think we need four bytes per pixel to store RGB. That comes to roughly 2.3 billion bytes, which pushes you just barely over the limit. It’s an inherent limitation in Java that can’t be changed by increasing the heap size.
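
To make the arithmetic concrete, here is a rough sketch in Python. It assumes four bytes per pixel for RGB (my guess above, not something I have checked in the code) and one byte per pixel for grayscale, which is also an assumption. The function names are made up for illustration. Real JVMs cap array length a few elements below this value, which doesn't change the conclusion.

```python
# Largest value of a Java int, which caps the length of a single Java array.
JAVA_INT_MAX = 2**31 - 1  # 2,147,483,647


def buffer_bytes(width, height, bytes_per_pixel):
    """Bytes needed to hold the whole image in one flat byte array."""
    return width * height * bytes_per_pixel


def fits_in_java_array(width, height, bytes_per_pixel):
    """True if the flat buffer stays within the Java array length limit."""
    return buffer_bytes(width, height, bytes_per_pixel) <= JAVA_INT_MAX


# (label, width, height, assumed bytes per pixel)
cases = [
    ("full size, RGB", 15000, 38200, 4),
    ("full size, grayscale", 15000, 38200, 1),
    ("half size, RGB", 7500, 19100, 4),
    ("earlier tissue, RGB", 15000, 7500, 4),
]

for label, width, height, bpp in cases:
    size = buffer_bytes(width, height, bpp)
    verdict = "fits" if fits_in_java_array(width, height, bpp) else "too big"
    print(f"{label}: {size:,} bytes -> {verdict}")
```

Only the full-size RGB case exceeds the limit, which matches what you are seeing: grayscale, half-size and the earlier, smaller tissue images all save fine.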

The upcoming release won’t alleviate this, but we are planning features for the release after that one. CellProfiler 2.2 will likely have a tiling feature that will work well with your images by analyzing them in tiles. We can use Bio-Formats to write the images in tiles as well - this will allow us to write color images with your format. The standard TIF file format can only support 2GB image sizes, but Bio-Formats can write the BigTIFF format and overcome this limit. Maximum JPEG image dimensions are 64K x 64K. The PNG specification supports up to 2G x 2G, theoretically.

It might be wisest to use a strategy that breaks your images into tiles and then processes each tile separately for now - I’m sure you will start running into more pitfalls with these large images and, even with enough memory to run without disk thrashing, I would bet that CP will run the tiled set faster than the single image.

–Lee

Hi Lee,

Thank you for your thoughtful response and for taking the time to explain. While tiling the images may resolve the issue for CP, the anatomical structures I’m looking for are often 100s of pixels in diameter, and abruptly chopping them in half would have a very detrimental impact on the accuracy of the data. Perhaps if I left an area of overlap and then somehow managed to merge the tiled data sets while removing the overlap it might work, but I’ll leave this as plan C.
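
For what it’s worth, here is roughly how I picture the overlap idea, as a plain Python sketch rather than anything CP does today. Each tile is read with a margin of extra pixels, but an object is only kept by the tile whose inner "core" region contains its centroid, so objects in the overlap are counted once. The tile size, overlap and function names are all hypothetical, and it assumes the overlap is at least as large as the biggest object so that nothing kept is cut off.

```python
def tile_ranges(length, tile, overlap):
    """Yield (read_start, read_stop, core_start, core_stop) along one axis.

    The core ranges partition [0, length) with no gaps or overlap; each
    read range extends its core by `overlap` pixels, clipped to the image.
    """
    for core_start in range(0, length, tile):
        core_stop = min(core_start + tile, length)
        read_start = max(core_start - overlap, 0)
        read_stop = min(core_stop + overlap, length)
        yield read_start, read_stop, core_start, core_stop


def sees(tx, ty, centroid):
    """True if the centroid lies inside the tile's read window."""
    x, y = centroid
    return tx[0] <= x < tx[1] and ty[0] <= y < ty[1]


def owns(tx, ty, centroid):
    """True if the centroid lies inside the tile's core region."""
    x, y = centroid
    return tx[2] <= x < tx[3] and ty[2] <= y < ty[3]


WIDTH, HEIGHT = 15000, 38200
TILE, OVERLAP = 5000, 500  # hypothetical values, not tuned

tiles = [
    (tx, ty)
    for ty in tile_ranges(HEIGHT, TILE, OVERLAP)
    for tx in tile_ranges(WIDTH, TILE, OVERLAP)
]

# An object centroid (full-image coordinates) near a tile corner.
centroid = (5100, 4990)
seen = sum(sees(tx, ty, centroid) for tx, ty in tiles)
owned = sum(owns(tx, ty, centroid) for tx, ty in tiles)
print(len(tiles), "tiles; seen by", seen, "and kept by", owned)
```

The object is detected in several overlapping tiles but kept by exactly one, which is the "merge while removing the overlap" step. Coordinates measured inside a tile would still need the tile's read_start added back to return to full-image positions.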

For now I think I’ll continue loading/processing the images at the high resolution, where I need the pixels for accurate ilastik pixel classification & CP-based object ID - then once the objects are defined, just shrink everything 50% to make the Java output happy. Sure, the precision of the XY positions or volumetric measurements will be slightly decreased, but I think this is a fair tradeoff against the potential complications due to tiling.

I’m actually curious whether this is a novel problem, as the increasing prevalence of slide scanners readily allows scanning of tissue sections that are quite large in dimensions and often 12-16 bits. Analyzing these with CellProfiler is extremely effective (especially in combination with ilastik) and allows quantifying features that were typically either only qualitative or measured “by hand”. Layering these stacks in software like Amira or just analyzing with Matlab gives a fantastic 3D “whole organ” view of whatever features you’re after. I realize it’s called “CellProfiler” not “organ profiler”, but I just thought you should know how great your software is at the latter as well!

Thanks again,
Bolek

Thanks for the nice feedback Bolek! Your approach sounds eminently reasonable and hopefully we can streamline it in the future.

David