Accumulation of threads when classifying many images

Hello!
I am running a BeanShell script to classify lots of relatively small images (8-bit, ~600x600 px). After processing 5-6 of them, ImageJ-linux64 dies with the exception:
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: unable to create new native thread
There is free memory available when it happens, but if I monitor the number of threads I get this:

while true; do ps H | grep -c ImageJ-linux64; sleep 3; done
1 (no ImageJ yet)
24 (ImageJ started)
102 (classifier is loaded)
1798 (first image being classified)
3549 (second image being classified)
5200 (3rd)
7077 (4th)
8837 (5th)

and then ImageJ dies on the 6th image.
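For completeness, the same count can be taken from inside the JVM; a minimal sketch (not wired into my script) using the standard ThreadMXBean API:

import java.lang.management.ManagementFactory;

// Print the number of live JVM threads, e.g. once per processed image
System.out.println("Live JVM threads: " + ManagementFactory.getThreadMXBean().getThreadCount());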
So I wonder whether this accumulation of threads is supposed to happen, and how I should change my script to avoid the exception. The BeanShell script, and the shell command that keeps re-running it until it exits cleanly, are below.
I would be really happy if someone could clarify this for me. Maybe @iarganda could look into this, if you have time, since you are the creator and maintainer of this plugin. Thanks in advance!

until ~/.local/Fiji.app/ImageJ-linux64 --allow-multiple --headless --ij2 --console --run /scratch/data/tmp.bsh ; do echo -e "\nRETRYING ON EXCEPTION\n"; done

inDir="/scratch/data/in/";
outDir="/scratch/data/out/";
modelPath = "/scratch/data/hbw-teaching.model"; 
import trainableSegmentation.WekaSegmentation;
import trainableSegmentation.utils.Utils;
import ij.io.FileSaver;
import ij.IJ;
import ij.ImagePlus;
inputDir = new File(inDir);
outputDir = new File(outDir);
segmentator = new WekaSegmentation();
segmentator.loadClassifier(modelPath);
listOfFiles = inputDir.listFiles();
Arrays.sort(listOfFiles);
for ( i = 0; i < listOfFiles.length; i++ ) {
	outputFile = outputDir.getPath() + File.separator + listOfFiles[ i ].getName() + "-segmented.tif";
	if( (listOfFiles[ i ].isFile()) && (!listOfFiles[ i ].getName().endsWith("-segmented.tif")) && (!new File(outputFile).exists()) ) {
		image = IJ.openImage( listOfFiles[i].getCanonicalPath() );
		if( image != null ) {
			try {
				result = segmentator.applyClassifier( image, 0, false );
				result.setLut( Utils.getGoldenAngleLUT() );
				new FileSaver( result ).saveAsTiff( outputFile );
				result = null;
			} catch (e) {
				System.exit(1);
			}
		}
		image = null;
		System.gc(); 
	}
}
segmentator = null;
if (Boolean.valueOf(System.getProperty("java.awt.headless")))
	System.exit(0);

It seems surprising to me too. After each classification the garbage collector is called, so it should get rid of the unused threads (I believe). Although not very elegant, can you try duplicating the call to System.gc()?

Hi and thanks for help.
So now, instead of the single System.gc(); line, I tried:

System.gc();
Thread.sleep(3000);
System.gc();

and it still doesn’t clean up its threads and dies exactly as before.

Hi,
an update - I ran it on a new PC with 64 GB of RAM, and it throws the exception as before, so it’s definitely not a memory issue. The OS is Ubuntu 18.04.1. Output of ulimit -a:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 256729
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 256729
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I’d be very happy if someone could figure out what causes this exception.

@iarganda, since I have to fix this whatever it takes, I’m bumping this post again. The threads that did not terminate are named “pool-n-thread-m”, where n=1…222 and m=1…8 (due to the 8-core CPU, I guess); for each of them .isAlive() returns true and .getState() returns WAITING. I tried calling .interrupt() and .stop() on each of them, but of course that didn’t work.
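In case it helps, this is roughly how I inspected them (a BeanShell sketch based on the standard Thread.getAllStackTraces() API):

// List the leaked pool worker threads by name, liveness and state
for (t : Thread.getAllStackTraces().keySet()) {
	if (t.getName().startsWith("pool-"))
		System.out.println(t.getName() + " alive=" + t.isAlive() + " state=" + t.getState());
}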
This might be related not to Weka but to some Ubuntu-specific HotSpot JVM issue. I could simply terminate the script after processing 3-4 images, before it throws the exception, but maybe it’s due to some bug, so it would be nice to figure out what’s going on instead of using dirty workarounds.

I agree. In principle, TWS uses multithreading for creating the image features, for running the default random forest classifier, and for classifying new images once the features have been created. All of that is done using an ExecutorService, so there is not much room for debugging…
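For illustration only, the kind of leak that would explain your numbers looks like this (a minimal sketch, not the actual TWS code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A fixed pool keeps its non-daemon worker threads alive indefinitely;
// if shutdown() is never called, every invocation leaks another 8 threads.
void leakyStep() {
	ExecutorService exe = Executors.newFixedThreadPool(8);
	// ... submit feature-computation tasks and wait for the futures ...
	// missing: exe.shutdown();
}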

That being said, you can send me your images and classifier and I can give it a try on my machine (64GB of RAM and Ubuntu 18.04).

By the way, have you tried using the tiling script?

Then we might have very similar setups. So, here’s the folder with the sample images, model, BeanShell and shell scripts. Thanks for helping. But if you do try to reproduce this, be prepared for issues with other programs running in the background - once the thread limit is exhausted, some of them will fail to create new threads and terminate unexpectedly.

I do my own tiling because the original image is huge and only ~20% of its pixels are meaningful. I extract only those parts and then segment them.
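Roughly like this for each region (a sketch; bigImage and the tile coordinates are placeholders):

import ij.ImagePlus;

// Crop one meaningful region out of the large image before segmenting it
ip = bigImage.getProcessor();
ip.setRoi(x, y, width, height); // bounding box of the meaningful pixels
tile = new ImagePlus("tile", ip.crop());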

OK, I can reproduce the error. I believe it has to do with the number of features you used. Let me get back to you as soon as I find a solution.

@iarganda and I have identified the issue and will resolve it.

Along the way, we also found a possible bug in net.imglib2.algorithm.fft2.FFTConvolution: when it creates an ExecutorService on its own, it seems that it doesn’t call shutdown() on it.
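The fix is the usual pattern for self-created executors (a sketch of the idea, not the actual patch):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService exe = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
try {
	// ... run the convolution tasks on exe ...
} finally {
	exe.shutdown(); // lets the worker threads terminate once the tasks finish
}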


See the issue here on FFTConvolution not shutting down its threads:


Guys, many thanks for fixing this!

@oleksii-nikolaienko, until a solution is released, you can avoid the problem by not using the Gabor features. Those are the only ones that call net.imglib2.algorithm.fft2.FFTConvolution.
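If you end up retraining your classifier, something like this should leave Gabor out (a sketch: it assumes the feature is listed as "Gabor" in FeatureStack.availableFeatures and enables all the other features, so adjust the set to your needs):

import trainableSegmentation.FeatureStack;

// Enable every available feature except Gabor before training
boolean[] enabled = new boolean[FeatureStack.availableFeatures.length];
for (i = 0; i < enabled.length; i++)
	enabled[i] = !FeatureStack.availableFeatures[i].equals("Gabor");
segmentator.setEnabledFeatures(enabled);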

@iarganda this is fine, I can wait. For now I’m just killing and restarting the whole thing after analysing 4-5 images. Inefficient, but it works, and more importantly the problem is known and it’s just a matter of time until it’s fixed. Thanks a lot for the help.

The pull request for the imglib2-algorithms-gpl is here: https://github.com/imglib/imglib2-algorithm-gpl/pull/7

Now if only @StephanPreibisch or @tpietzsch would endorse it, it’d be merged in a heartbeat.

In the meantime, if you’d like to try the fix, do:

$ git clone https://github.com/imglib/imglib2-algorithm-gpl.git
$ cd imglib2-algorithm-gpl
$ git checkout -t origin/terminate-FFTConvolution-threads
$ mvn -Dimagej.app.directory=/path/to/Fiji.app/ clean install
$ cd ..
$ git clone https://github.com/fiji/Trainable_Segmentation.git
$ cd Trainable_Segmentation
$ git checkout -t origin/fix-fftconvolution-bug
$ mvn -Dimagej.app.directory=/path/to/Fiji.app/ clean install

Note that you’ll have to edit the path to the Fiji.app above, twice.
Then restart Fiji.

Hello again, @albertcardona and @oleksii-nikolaienko,

I have found a way to fix the bug in the Trainable Weka Segmentation plugin and have released a new version that should work now.

Thanks a lot for your help!

Hi,
works perfectly now, thank you!