Ilastik not using all CPUs (machine dependent)

On ilastik 1.4.0b1, I’m benchmarking two computers using the same .ilp and dataset.

Server A (Win2012) has the .ilp and data files as local files on D: (same subfolder), which is a physical drive on Server A.
Workstation B (Win10) mounts Server A's D: as Y:. Because the Y: mount point is not the same absolute path as the original D:, I had to re-locate the training image files when opening the .ilp on Workstation B.

When I run ilastik on Workstation B, it uses all 12 cores (one CPU) at almost full capacity.
When I run the same .ilp on Server A (2 physical CPUs with 36 cores/CPU), it haphazardly uses cores from a single CPU, and never at full capacity.

Following the online manual, I created a .ilastikrc file in C:\Users\nieder\ on Server A with the following contents:


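A sketch of the documented format (the `total_ram_mb` value here is illustrative, not my actual setting):

```ini
[lazyflow]
threads=8
total_ram_mb=24000
```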
I’ve tested the ‘threads’ field with values ranging from 8 to 24, with no effect: my CPU usage stays around 10% and memory maxes out at 5.5 GB (out of 32). Is there some other setting I should change to maximize the number of cores used on Server A?

Hi @nieder,

What kind of data are you processing (2D, 3D, approximate size)? Which workflow are you running? How much RAM do the two machines have?

In general we have noticed that using a lot of cores doesn’t improve processing time linearly. The reason for this is yet to be found, but there are multiple hypotheses…
That said, we would always see full peak CPU usage. However, I’ve never done any benchmarks on Windows, nor with more than 20 cores. For reference, see the Linux benchmark thread.

Server A (Win2012) has 32GB RAM, 2 physical CPUs with 36 cores/CPU.
Workstation B (Win10) has 16GB RAM, single physical CPU with 12 cores.

My dataset is 2D images (several thousand 2MByte 1024x1024 TIFFs) and I’m using the pixel classification workflow.

When running the identical .ilp (changed only to re-point the training images because of the mount-point difference), Workstation B was faster than Server A in analyzing the same images.

Based on the results from that linux benchmark thread, I reduced the threads in .ilastikrc to 4. The CPU usage on Server A maxed out at 6% with only 4 cores showing any appreciable activity in Resource Monitor. Total memory usage peaked at 5.5GB as before (baseline memory usage is 4.8GB when server is idle). With threads in .ilastikrc set to 32, more cores were used according to Resource Monitor, but again, each active core was not fully utilized.

Time to process a single plate:
Workstation B (12 cores, all working at full power): 1:15 hrs
Server A (threads=4; 4 cores, not at full power [see fig above]): 2:46 hrs
Server A (threads=32; 32 cores, not at full power): 1:48 hrs

So something about my workstation causes all cores to be used, at full capacity, without any prompting.


Thank you very much for the detailed report!

So I see. The very important bit here is the image size. Since your images are that small, ilastik does not parallelize within each image (at all, would be my guess; at least not explicitly, though some operations are still performed in parallel). In batch processing it will only process one image at a time, which is why you don’t see any parallelization. The problem is that the parallelization was optimized for big images that have no chance of fitting into RAM.

As to why it is still faster on Workstation B… puzzling…

So, what could you do to overcome this? One way would be to start multiple ilastik instances, each processing a subset of the files.
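A sketch of that approach using ilastik’s headless mode (the install path, project path, and export source here are assumptions for illustration):

```python
# Sketch: split a list of TIFFs into N chunks and build one headless
# ilastik command line per chunk, so N instances can run in parallel.
import subprocess  # used only if you actually launch the runs below

ILASTIK = r"C:\Program Files\ilastik-1.4.0b1\ilastik.exe"   # assumed install path
PROJECT = r"D:\benchmarks\pixel_classification.ilp"         # assumed project path

def chunk(files, n):
    """Split `files` into n roughly equal interleaved chunks."""
    return [files[i::n] for i in range(n)]

def build_commands(files, n_instances):
    """Build one headless ilastik command per non-empty chunk of files."""
    cmds = []
    for part in chunk(files, n_instances):
        if not part:
            continue
        cmds.append([ILASTIK, "--headless",
                     f"--project={PROJECT}",
                     "--export_source=Simple Segmentation",
                     *part])
    return cmds

# To actually run the instances concurrently, something like:
#   procs = [subprocess.Popen(c) for c in build_commands(tiff_paths, 8)]
#   for p in procs:
#       p.wait()
```

With thousands of small images, launching as many instances as you have spare RAM for (ilastik reserves memory per instance) is what actually spreads work across the cores.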

Another, pretty hacky way would be to convert your images to a single time series. Then parallelization over time would kick in.
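A minimal sketch of that conversion: stack the 2D frames along a new leading axis and save as one HDF5 volume with a "tyx" axis order. Only the numpy stacking runs here; the tifffile/h5py lines and file names are illustrative assumptions:

```python
# Sketch: stack many small 2D frames into one (t, y, x) volume so that
# ilastik's parallelization over the time axis kicks in.
import numpy as np

def stack_frames(frames):
    """Stack equally sized 2D arrays into a single (t, y, x) volume."""
    vol = np.stack(frames, axis=0)
    assert vol.ndim == 3
    return vol

# In practice (assumed libraries, not executed here):
#   import tifffile, h5py
#   frames = [tifffile.imread(p) for p in sorted(tiff_paths)]
#   with h5py.File("plate_as_timeseries.h5", "w") as f:
#       f.create_dataset("data", data=stack_frames(frames),
#                        chunks=(1, 1024, 1024))  # one frame per chunk
# Then load plate_as_timeseries.h5 in ilastik and set the axes to "tyx".
```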


I have seen similar behavior: there is a significant difference in CPU utilization going from single-CPU to multi-CPU servers.

Also see this:

I am not sure we can explain this behavior by image size, number of dimensions, and memory availability alone. It would be great if @k-dominik and @ilastik_team could dive deeper into this issue, because it is really affecting performance on large servers.