Ilastik best file format for pixel classification

Hi folks,
I have large TIFF files from stitched SEM images that I want to evaluate consistently. After multiple segmentation attempts in Fiji, I came across ilastik here on the forum, and I think it can solve my segmentation problems more precisely. However, my individual TIFF files are about 25 MB each. I followed the ilastik manuals, created an .h5 file for each image, and opened them in a new ilastik project. While I’m training with my .h5 files (6-10 files, each about 45 MB), the program runs slowly or crashes at times. I am a beginner in image editing and wonder whether compressed TIFF files could distort the results in the end?

Hi @hofmanpa,

I assume your data is 2D? What is roughly the size in pixels along each dimension?

You also ask about compressed TIFFs and whether they distort the analysis. That depends on the compression: TIFFs can be compressed using different methods (something like gzip, LZW, JPEG…). If the compression is lossless (all data can be recovered), there is no problem at all. Lossy compression, on the other hand, should in general never be used when you’re planning to analyse the data.
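To make "lossless" concrete, here is a minimal Python sketch of a compress/decompress round trip. It uses the standard-library `zlib` module (DEFLATE, the algorithm behind gzip) on made-up bytes standing in for pixel values; it is only an illustration, not what Fiji or ilastik run internally.

```python
import zlib

# Stand-in for raw 8-bit pixel data (made-up values, not real SEM data).
pixels = bytes(range(256)) * 16

compressed = zlib.compress(pixels, level=9)  # DEFLATE, as used by gzip
restored = zlib.decompress(compressed)

# Lossless: every byte comes back exactly as it went in.
print(restored == pixels)
print(len(pixels), "->", len(compressed), "bytes")
```

A lossy method such as JPEG would not pass the equality check, which is why it should be avoided for data you plan to analyse.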

The ilastik import/export plugin for Fiji, which (I assume) you used to generate the HDF5 files, saves data completely uncompressed by default (compression level = 0). If you raise the compression level, the algorithm we use is gzip, so the compression is lossless.


@k-dominik yes, it’s a 2D data set. Every image is 6868 x 6868 pixels. As you mentioned, I used the export/import plugin in Fiji to generate the HDF5 files. I was just wondering about the huge size of 45 MB for each image.

@k-dominik would you recommend preprocessing images with different background noise in Fiji before training the pixel classifier in ilastik?
Some examples:

thanks for your help!

Hi @hofmanpa,

thank you for providing some data. I had a look.

I think in principle ilastik should learn away the noise. Having said that, if you have some physically justifiable method to get rid of it, it wouldn’t hurt…

The size of 47 MB is exactly what you expect for an image of that size if saved uncompressed. The compressed size depends on the data. This type of noisy data does not compress so well (one of the examples you sent me as TIFF is also around 45 MB in size). But if you crank up compression to, let’s say, 9 in the export, you’ll get a smaller file size again, very similar to the one in TIFF (again, no data loss here, just a different way to encode the data on disk).
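The file size can be checked with a quick calculation. The sketch below assumes 8-bit grayscale data (1 byte per pixel), which the thread does not state explicitly. It also compares how well pure random bytes and a smooth ramp compress, as a rough stand-in for noisy versus clean images; real SEM noise will sit somewhere in between.

```python
import random
import zlib

width = height = 6868   # image size quoted in this thread
bytes_per_pixel = 1     # assumption: 8-bit grayscale

raw_bytes = width * height * bytes_per_pixel
print(raw_bytes)                    # 47169424
print(round(raw_bytes / 1e6, 1))    # 47.2 (MB, decimal)
print(round(raw_bytes / 2**20, 1))  # 45.0 (MiB, binary)

# Compressibility: random bytes (extreme noise) vs. a smooth ramp.
rng = random.Random(0)
n = 100_000
noisy = bytes(rng.randrange(256) for _ in range(n))
smooth = bytes((i // 1000) % 256 for i in range(n))

noisy_ratio = len(zlib.compress(noisy, 9)) / n
smooth_ratio = len(zlib.compress(smooth, 9)) / n
print(noisy_ratio, smooth_ratio)
```

Depending on whether a tool counts in decimal megabytes or binary mebibytes, the same uncompressed image shows up as roughly 47 or 45, which is possibly why both numbers appear here.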

Now maybe about your performance issues: Are the objects you are after rather large, or small? Do you find yourself using a lot of big brush strokes?
ilastik works best with rather small (even size 1) brush strokes. Adding training data in places where the prediction is already doing well does not really help, but it does slow down the training significantly.

If the structures you’re after are rather large, then you could also consider downscaling your data.
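As a toy illustration of what downscaling does, the following Python sketch averages non-overlapping blocks of a small nested-list "image". The function name `downscale_by_mean` is made up for this example; in practice you would scale the images in Fiji or with an image library rather than in pure Python.

```python
def downscale_by_mean(image, factor):
    """Average non-overlapping factor x factor blocks.

    Edge pixels that do not fill a complete block are dropped.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


tiny = [
    [0, 2, 10, 10],
    [2, 4, 10, 10],
    [1, 1, 0, 8],
    [1, 1, 8, 0],
]
small = downscale_by_mean(tiny, 2)
print(small)  # [[2.0, 10.0], [1.0, 4.0]]
```

Downscaling by a factor of 2 along each axis leaves a quarter of the pixels, so a 6868 x 6868 image would become 3434 x 3434.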

Also, when training, try not to zoom out too much while in live update. ilastik will only request the part of the image that you are looking at at any given moment, so updates are faster when zoomed in.

I’ve also added some performance-related information here: Which upgrade for my laptop would be most beneficial for working with "Ilastik"? - #2 by k-dominik, maybe this is of help.



Thanks again for the quick help.

@k-dominik I have now read through all of the ilastik instructions again, but the last part of the Pixel Classification workflow, with ‘Prediction Export’ and ‘Batch Processing’, is still not clear to me.
The problem is the following: I train the pixel classifier with several images in ‘Training’ (compressed HDF5 files, as you recommended). Afterwards I set the export settings in ‘Prediction Export’ (Source: Simple Segmentation; default Export Image Settings).
In ‘Batch Processing’ I can select the images to be segmented and then process them. However, not all images are processed, only the last selected one. I don’t know if I should set something else in the ‘Prediction Export’ settings…
Might there be an online seminar on Ilastik in the future?


Hi @hofmanpa,

  • The Export Applet has two purposes: configure the export (what source, file type, all those settings), and export the data you have trained on.
  • The Batch Processing Applet is there to be able to easily apply the trained classifier to unseen data (data you have not used in training). The idea is that you train on some representative data and then can apply it to the rest of the supposedly bigger dataset. (Why not add all as input data in the beginning? First of all, ilastik unfortunately gets slow if you add more than 10, 20 images for training. Also it does not improve the classification if you add a lot of similar training data - this just makes the training slower.)

If you want to use the Batch Processing applet for unseen data, it is necessary to set the file name in the export settings correctly (e.g. like the default, with magic values in curly braces). This ensures that a new file is created for every input instead of each result overwriting the previous one.
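Why the placeholders matter can be sketched in a few lines of Python. The placeholder names `{dataset_dir}`, `{nickname}` and `{result_type}` are assumed here to mirror ilastik's magic values, and the input paths and the helper `output_path` are hypothetical; the sketch only mimics the substitution with `str.format`, it is not ilastik's own code.

```python
from pathlib import Path

# Hypothetical batch inputs.
inputs = [
    Path("scans/sample_01.h5"),
    Path("scans/sample_02.h5"),
    Path("scans/sample_03.h5"),
]


def output_path(pattern, input_file):
    """Fill the (assumed) placeholders for one input file."""
    return pattern.format(
        dataset_dir=input_file.parent.as_posix(),
        nickname=input_file.stem,
        result_type="Simple Segmentation",
    )


# A fixed name: every input maps to the same output file.
fixed = [output_path("results/segmentation.h5", f) for f in inputs]

# A name with placeholders: one distinct output file per input.
templated = [
    output_path("{dataset_dir}/{nickname}_{result_type}.h5", f) for f in inputs
]

print(len(set(fixed)), len(set(templated)))
```

With a fixed name, each processed image would overwrite the previous result, which would match the symptom of only the last selected image appearing to be processed.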



Automatic feature suggestion

After I have trained the classifier with some representative images in the ‘Training’ applet, ilastik offers the option to ‘Suggest Features’ for the selected trained image. Is the selected method also applied to all other images that I selected in ‘Input Data’ and trained on in the ‘Training’ applet?
Thanks for all the tips,

Hi @hofmanpa,

the automatic feature suggestion will select a subset of the features that you had originally selected (I suppose you chose all, which is usually a sensible choice) in order to speed up processing a bit. This should only be done once you have a lot of annotations already. Feature selection will use the annotations you have provided (on all images) to come up with a feature set that performs similarly on the training data you have provided. You’ll also get an estimate of the speedup of the feature computation with the reduced set.
The set of features you select there is then the selected features set for the project (you can verify this by going back to the feature selection applet - you will see that only a few features are then selected).



Hello @k-dominik,

thanks for your previous advice, it really helped me to understand the program. Just one more question about the feature set. I can choose between three different methods to select the features. As recommended, I used the ‘Filter Method’. After running the feature selection I get a result like this:

with an ‘obb_error=0,770’. How can I interpret the out of bag error score? Does it mean my method has an error probability of 0.7%?


Hey Hey,

the out of bag error gives you an estimate on how your classifier might perform on unseen data. During random forest training, two types of randomization are introduced (of which only the first matters for the out of bag error, but I mention both to be complete): 1) for each of the trees in the forest, a random sample from your training data is selected (so not all annotations are used in every tree). 2) at each split in the trees only a random subset of the features is taken into account.

Okay, so with the random sampling of 1) you end up, for every tree, with samples that have not been included in its training. Each such sample is then classified using only the trees that did not see it. The out of bag error tells you what fraction of those samples the forest got wrong.
Does that make sense?

An oob error of 0.770 is not very good. Does the segmentation itself look good to you?
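The out of bag procedure can be sketched in plain Python. This is a toy stand-in, not ilastik's random forest: it uses made-up 1-D data with about 10% label noise, simple threshold "stumps" instead of full trees, and skips the per-split feature subsampling from 2). All names in it are hypothetical.

```python
import random

rng = random.Random(0)

# Toy training set: label is 1 when x > 0.5, with ~10% of labels flipped.
samples = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    samples.append((x, y))


def fit_stump(data):
    """Return the threshold that misclassifies the fewest points in data."""
    best_t, best_err = 0.5, len(data) + 1
    for t in [i / 20 for i in range(1, 20)]:
        err = sum(int(x > t) != y for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


n = len(samples)
votes = [[] for _ in range(n)]  # out of bag votes collected per sample

for _ in range(25):  # 25 "trees"
    bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
    in_bag = set(bag)
    threshold = fit_stump([samples[i] for i in bag])
    for i in range(n):
        if i not in in_bag:  # this tree never saw sample i
            votes[i].append(int(samples[i][0] > threshold))

wrong = counted = 0
for i, v in enumerate(votes):
    if not v:
        continue  # sample was in every bag, so it has no oob vote
    prediction = int(sum(v) * 2 > len(v))  # majority vote, ties go to 0
    counted += 1
    wrong += prediction != samples[i][1]

oob_error = wrong / counted
print(oob_error)
```

The result is a fraction between 0 and 1. Assuming ilastik reports the value the same way, 0.770 would correspond to roughly 77% of the out of bag samples being misclassified, not 0.7%.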

That makes sense, thank you for your explanation. So far I have not found any suitable literature that explains the oob error well or deals with its interpretation (which values indicate a good segmentation of unseen images?). Do you have any recommendations?

Nevertheless, when I run the Simple Segmentation in ilastik, then reopen the HDF5 file in Fiji and place the segmentation as a mask over the original image, the segmentation seems to work well visually.