Is the dataset used in Ilastik the labels or the image?

Hi,

I have an extremely unbalanced image dataset (10:1 ratio between positives and negatives) that I want to analyze in the Pixel Classification workflow. I am concerned about overfitting due to the large class imbalance. Does ilastik train on the whole images (in which case the classifier would reflect the class bias), or only on the pixels I labelled? If I label equal numbers of pixels for each class, will that circumvent the class imbalance in my data?

Thanks in advance!
@ilastik_team @k-dominik

Hi Omar,

ilastik only uses the labeled pixels for training. So to get a balanced training set, you should label approximately the same number of pixels for each class. Right now there is no built-in way to see the number of labels per class in the GUI, but you should be fine if you have roughly the same amount of labels for each class.
In general we advise labeling only a little at the beginning and then correcting the classifier where it is wrong in live update mode.
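
If you want to check the balance yourself, one option is to count the labeled pixels outside of ilastik. The snippet below is only a rough sketch: it assumes you have your annotations available as a 2D grid of integers in which 0 means "unlabeled" and 1, 2, ... are the class ids. That encoding and the helper names `count_labels` and `imbalance_ratio` are assumptions made for illustration, not part of ilastik.

```python
from collections import Counter
from itertools import chain


def count_labels(label_rows, unlabeled=0):
    """Count annotated pixels per class in a 2D label image.

    label_rows: iterable of rows, each an iterable of integer class ids.
    unlabeled: value marking pixels without annotation (assumed to be 0).
    """
    counts = Counter(chain.from_iterable(label_rows))
    counts.pop(unlabeled, None)  # ignore pixels that were never labeled
    return dict(counts)


def imbalance_ratio(counts):
    """Ratio of the most to the least frequently labeled class."""
    if not counts:
        raise ValueError("no labeled pixels found")
    return max(counts.values()) / min(counts.values())


# Toy 4x5 label image: 0 = unlabeled, 1 and 2 = two classes.
labels = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 2],
    [0, 0, 0, 2, 2],
    [1, 0, 0, 0, 2],
]

counts = count_labels(labels)
print(counts)                   # pixels per class
print(imbalance_ratio(counts))  # 1.0 means perfectly balanced labels
```

A ratio close to 1 means the classes have a similar number of labeled pixels. The same functions should also work on a 2D NumPy array, since iterating over it yields rows.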
