Slow ilastik batch processing, best practice?

Hi there,

A question about batch processing speed and what the best practice is.

I’m working on 4D stacks ranging from 10 to 30GB. I’ve trained an ilastik pixel classifier on a 30GB .h5 file, and exporting the probabilities took many hours. I’ve been trying to batch process other datasets, using either the GUI or the Fiji plugin, but both take hours to run. ilastik also seems to be using only one CPU core, at 100%.

Is this normal? What’s the best/fastest way to batch process these datasets? Is using the headless mode any faster and are there ways to parallelize the process?

I’ve looked up the ilastik documentation on Controlling CPU and RAM resources, and I was wondering if changing the LAZYFLOW_THREADS and LAZYFLOW_TOTAL_RAM_MB environment variables would help?

Thank you very much!
Anjalie Schlaeppi

Hi @Anjalie,

30GB is some serious data. I would expect the Fiji plugin to be slower than ilastik in GUI mode. It writes temporary files for input and output (so with your data it will also temporarily consume quite a lot of disk space). Headless mode will be the most efficient, but it will not give you dramatic improvements. It’s just more convenient for automation.

By default, ilastik already tries to use all available RAM and CPU resources (usually up to 8 threads). Those parameters are typically lowered when you want to run multiple ilastik instances in parallel (which makes sense if you have a lot of small images).
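
For reference, here is a minimal Python sketch of a single headless run with those two settings passed as environment variables. The install path, project file, and input path are placeholders, and the thread and RAM values are only examples, not recommendations. The command-line flags are the ones described in the ilastik headless documentation, so please check them against your version.

```python
import os
import subprocess


def lazyflow_env(threads, ram_mb, base_env=None):
    """Return a copy of the environment with ilastik's resource limits set."""
    env = dict(os.environ if base_env is None else base_env)
    env["LAZYFLOW_THREADS"] = str(threads)
    env["LAZYFLOW_TOTAL_RAM_MB"] = str(ram_mb)
    return env


def headless_command(ilastik, project, raw_data):
    """Build the command line for one headless probability export."""
    return [
        ilastik,                          # e.g. ./run_ilastik.sh (placeholder)
        "--headless",
        f"--project={project}",           # trained .ilp project (placeholder)
        "--export_source=Probabilities",
        "--output_format=hdf5",
        raw_data,                         # e.g. stack.h5/data (placeholder)
    ]


def run_headless(ilastik, project, raw_data, threads, ram_mb):
    """Run ilastik once and raise if it exits with an error."""
    cmd = headless_command(ilastik, project, raw_data)
    subprocess.run(cmd, env=lazyflow_env(threads, ram_mb), check=True)
```

Calling `run_headless("./run_ilastik.sh", "pixel_class.ilp", "stack_01.h5/data", threads=8, ram_mb=64000)` would then process one dataset with 8 threads and roughly 64GB of RAM.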

I would definitely expect more than one core to be active. Just a question about your data: I assume 4D means 3D + time? How large are the individual 3D volumes in this case?
How did you measure CPU utilization (and which operating system are you on)?

Cheers
Dominik

Dear @k-dominik ,

Thank you for your help! I’m indeed working on 3D + time, and each 3D volume is 120MB. I’m running ilastik on a Linux server with 128GB of RAM available, which I can push to 256GB. I used the htop command to check CPU utilisation.

I checked again after a reset, and it seems ilastik is now indeed using ~8 threads. We have 10 times that many available. Does it make sense to push ilastik to use more?

One solution could be to crop a bit more. But if I’ve trained on 1024x512 images, can I load images with different dimensions for batch processing?

I was also wondering whether using TIFF or H5 files makes any difference in speed?

Thank you for your help!

Anjalie

Hi @Anjalie,

Cropping is not a problem (well, almost not: you have to make sure the filters you’ve chosen still fit inside the image, so if the largest sigma is 10, you need at least 35 pixels in every spatial dimension).
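
A quick way to sanity-check a planned crop is sketched below. Note that the factor of 3.5 is only inferred from the single example above (sigma 10, 35 pixels) and is an assumption, as are the function names; treat this as a rule of thumb rather than ilastik’s exact requirement.

```python
import math


def min_extent(largest_sigma, factor=3.5):
    """Smallest allowed size per spatial axis (factor is an assumption)."""
    return math.ceil(factor * largest_sigma)


def crop_is_large_enough(spatial_shape, largest_sigma, factor=3.5):
    """True if every spatial axis of the crop can hold the largest filter."""
    needed = min_extent(largest_sigma, factor)
    return all(extent >= needed for extent in spatial_shape)
```

Pass only the spatial axes (not time or channel), for example `crop_is_large_enough((30, 512, 1024), largest_sigma=10)` for a volume that is only 30 slices deep.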

The thread count can be pushed a bit, but that doesn’t necessarily result in faster processing. If you have multiple datasets, you can scale up by running multiple ilastik instances at once, each processing a different dataset. Depending on your scripting preference, I’d suggest either Python or a little bash script. It is also possible to run multiple ilastik instances on a single dataset, but that’s more involved and requires a final step to reassemble the data…
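
A rough Python sketch of the one-instance-per-dataset approach could look like the following. The helper names, paths, and the even split of threads and RAM between instances are my assumptions for illustration; the right number of parallel instances depends on your machine and is worth testing.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_budget(total_threads, total_ram_mb, n_workers):
    """Divide the machine's threads and RAM evenly between instances."""
    return max(1, total_threads // n_workers), max(1, total_ram_mb // n_workers)


def run_one(ilastik, project, raw_data, threads, ram_mb):
    """Run one headless ilastik instance on one dataset; return its exit code."""
    env = dict(os.environ)
    env["LAZYFLOW_THREADS"] = str(threads)
    env["LAZYFLOW_TOTAL_RAM_MB"] = str(ram_mb)
    cmd = [ilastik, "--headless", f"--project={project}",
           "--export_source=Probabilities", raw_data]
    return subprocess.run(cmd, env=env).returncode


def run_all(ilastik, project, datasets, total_threads, total_ram_mb, n_workers):
    """Process all datasets with at most n_workers instances at a time."""
    threads, ram_mb = split_budget(total_threads, total_ram_mb, n_workers)
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_one, ilastik, project, d, threads, ram_mb)
                   for d in datasets]
        return [f.result() for f in futures]
```

With 80 threads and 128GB of RAM, `split_budget(80, 128000, 10)` gives each of 10 instances 8 threads and 12800MB. A non-zero entry in the list returned by `run_all` marks a dataset whose run failed.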

You should definitely feel a speed-up with h5 files. The format allows us to read 3D blocks efficiently from the file, whereas this is far less efficient with TIFF files.

In case you run into problems setting up the parallelization of different jobs, I’m happy to help.