CellProfiler JVM heap size problem with headless batch processing

Hi there -

I am encountering a strange JVM memory issue in CellProfiler 3.1.9:

I want to process a dataset headlessly on a single Linux machine (using a single Docker container, 128 threads, 1 TB RAM). This works flawlessly when I call the processes like this:

cellprofiler -c -r -p results_benchmark/Batch_data.h5 -f 1 -l 50
cellprofiler -c -r -p results_benchmark/Batch_data.h5 -f 51 -l 100
.
.
.

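For context, the jobs are launched with a simple loop along these lines (just a sketch - BATCH_SIZE and TOTAL are placeholders, not my exact values):

#!/bin/bash
# Sketch of the launcher: one CellProfiler process per batch, all started in parallel.
# BATCH_SIZE and TOTAL are placeholder values (batch size 50 over ~3000 image sets gives ~60 jobs).
BATCH_SIZE=50
TOTAL=3000
for first in $(seq 1 "$BATCH_SIZE" "$TOTAL"); do
    last=$((first + BATCH_SIZE - 1))
    cellprofiler -c -r -p results_benchmark/Batch_data.h5 -f "$first" -l "$last" &
done
wait
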
This gives me about 60 threads. If I now want to distribute further by making the batch size smaller, some jobs get killed with the following Java error:

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (gcTaskThread.cpp:48), pid=308, tid=0x00007ffafb345700
#
# JRE version:  (8.0_242-b08) (build )
# Java VM: OpenJDK 64-Bit Server VM (25.242-b08 mixed mode linux-amd64 compressed oops)
# Core dump written. Default location: /data/core or core.308
#

---------------  T H R E A D  ---------------

Current thread (0x00007ffb0800b000):  JavaThread "Unknown thread" [_thread_in_vm, id=20321, stack(0x00007ffafab46000,0x00007ffafb346000)]

Stack: [0x00007ffafab46000,0x00007ffafb346000],  sp=0x00007ffafb343cd0,  free space=8183k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xabba52]
V  [libjvm.so+0x4de2e7]
V  [libjvm.so+0x5c160f]
V  [libjvm.so+0x5c079d]
V  [libjvm.so+0x91d7f1]
V  [libjvm.so+0xa7de5a]
V  [libjvm.so+0xa7e155]
V  [libjvm.so+0x62b7bf]
V  [libjvm.so+0xa61b23]
V  [libjvm.so+0x6aeb11]  JNI_CreateJavaVM+0x61
C  [_javabridge.so+0x5c29a]  CreateJavaVM+0x2b
C  [_javabridge.so+0x18147]  __pyx_pf_11_javabridge_5JB_VM_4create+0x10c4
C  [_javabridge.so+0x17079]  __pyx_pw_11_javabridge_5JB_VM_5create+0x2b
C  [python+0xf9f45]  PyEval_EvalFrameEx+0x5645


---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )

Other Threads:

=>0x00007ffb0800b000 (exited) JavaThread "Unknown thread" [_thread_in_vm, id=20321, stack(0x00007ffafab46000,0x00007ffafb346000)]

VM state:not at safepoint (not fully initialized)

VM Mutex/Monitor currently owned by a thread: None

heap address: 0x00000000e0000000, size: 512 MB, Compressed Oops mode: Non-zero based:0x00000000dffff000

Shouldn’t the JVM memory demand become smaller with smaller batch sizes? Interestingly, if I do not submit all 60+ jobs at once, a small number of jobs (with small batch sizes) runs through without problems. So far I have been assuming that each command creates its own JVM.

The default heap size is set to 512 MB, and with this the cutoff for crashes is at about 64 threads, which would equal 32 GB of system memory - still far less than what I have available. But since the cutoff seems to sit at 32 GB, I am wondering if there is another global setting I am missing here.
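
If it matters: I could try lowering the per-process heap via the --jvm-heap-size switch (assuming it is also honoured in headless mode - I have not verified this), e.g.:

# Assumption: --jvm-heap-size also applies to headless runs;
# 256m would halve the 512 MB default heap of each process.
cellprofiler -c -r -p results_benchmark/Batch_data.h5 -f 1 -l 50 --jvm-heap-size=256m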

Thanks for any hints that help me understand how Java memory is handled on a single machine when multiple CellProfiler jobs are submitted.

Tobias

Are these jobs dying right at the beginning, partway through the pipeline, near the end of the pipeline, etc.?

If it’s not right at the beginning, and there’s, say, a memory-intensive step that takes 16 GB of memory, then at >60 threads (especially if they’re all synchronized) the jobs would start crashing into each other and using up the memory.

If 32 GB were indeed somehow a magic number, you could try to avoid it by subdividing the work into several smaller Docker containers rather than running everything in a single one.
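
Something along these lines, roughly (a sketch only - the image name and its cellprofiler entrypoint, the mount path, the memory cap and the ranges are all placeholders/assumptions, not a tested recipe):

# Idea: several smaller containers, each running a subset of the batches,
# instead of one big container running all 60+ jobs at once.
# Placeholders/assumptions: image name + entrypoint, /data mount, --memory cap, -f/-l ranges.
docker run -d --memory=32g -v /data:/data cellprofiler/cellprofiler:3.1.9 \
    -c -r -p /data/results_benchmark/Batch_data.h5 -f 1 -l 50
docker run -d --memory=32g -v /data:/data cellprofiler/cellprofiler:3.1.9 \
    -c -r -p /data/results_benchmark/Batch_data.h5 -f 51 -l 100
# ...and so on for the remaining ranges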