Using pre-made label images for classifier creation


I have tissue I’ve stained for a number of markers and DAPI. I plan on making a segmentation map of the DAPI and then using that to do object classification in each of the other markers in Ilastik or potentially some other program. I’m interested in first getting really accurate classification of single images by manually labelling as many objects as I need to create a “ground truth” label mask for each image. Ilastik is handy for this kind of “assisted counting”, as the machine learning can help label all the objects without having to click on each one to create a label image. The UI makes it easy to see and label objects, and the results can be output for further processing.

My question is this: can I use the “ground truth” label masks from multiple images as input for a new classifier? After I’m done with all the manual counting I’d like to create a new, more general classifier that could be used on unlabelled images. I don’t see that two different Ilastik projects can be combined and allowed to settle on a new classifier based on the combination of label images. Am I missing something? Is this not going to work for some reason I’m not anticipating?

thanks in advance for comments/suggestions! -John

Hi @johnmc,

First, maybe the “easy” way to do it: add multiple images to your Object Classification project. Train on each of them and you automatically get a combined classifier.

Would this work for you?

Hi Dominik,

Thanks for taking my question. Hmm, I’m not explaining myself well enough. My first task is to manually count images, because if I use some kind of auto-counting then I have the issue of classifier accuracy. To assess classifier accuracy and prepare for slings and arrows I need a ‘ground truth’ so I’m trying to get that first.

I’m basically following the initial steps of this paper:

I’m trying to get clever though (silly me) and start with a segmented image of the nuclei and then clean it up in GIMP. I think this might be what Anne Carpenter is referring to in this post: with ‘Strategy 2’, but beyond this tantalizing post I haven’t heard more about it.

To continue the labor saving, I want to bring the segmented image into Ilastik along with one of the additional channels (markers) and use the UI to label the nuclei objects as positive or negative based on the additional channel. Ilastik is handy for marking whole objects, as opposed to cell counting in ImageJ, which just adds a dot with a number on the image. I was further hoping to use Ilastik to label all the easy ones so I don’t have to click on all of them. I can then ‘overfit’ and stamp out all the false negatives and positives. This will then create a ground truth label image (mask), which gives me the numbers I need to figure out if the experiment worked.

The problem with starting with many images in Ilastik is that it will try to create a classifier using all the labels across all the images, and this will necessarily introduce false positives and negatives. I see that Ilastik will not overrule a user-defined label, which is great, but if I go the multi-image route then it becomes laborious to label everything to get to a ground truth. If I stick to one image I can overfit my way to a valuable label mask pretty quickly. I’m thinking I can then use these masks from multiple images to train a classifier to analyze a much larger data set. If I have a ground truth set of labels, then I can assess classifier accuracy by applying the new classifier back to all the images I manually labelled.
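To make the idea concrete, here is a minimal sketch (outside Ilastik, and entirely illustrative: the feature choice, the synthetic data, and the helper name are made up for this example) of pooling per-object features and ground-truth labels from several manually corrected images, then training one combined classifier with scikit-learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def object_features(labels, marker):
    """One feature row per object: [mean marker intensity, area in pixels]."""
    ids = np.unique(labels)
    ids = ids[ids != 0]  # 0 = background
    return np.array([[marker[labels == i].mean(), (labels == i).sum()]
                     for i in ids])

rng = np.random.default_rng(0)
X_parts, y_parts = [], []
for _ in range(2):                      # stand-in for two manually labelled images
    labels = np.zeros((40, 40), dtype=int)
    labels[5:15, 5:15] = 1              # nucleus 1 (marker-positive)
    labels[25:35, 25:35] = 2            # nucleus 2 (marker-negative)
    marker = rng.normal(0.1, 0.02, labels.shape)
    marker[labels == 1] += 0.8          # bright in the marker channel
    X_parts.append(object_features(labels, marker))
    y_parts.append([1, 0])              # ground-truth class per object

# Pool objects across images and train a single classifier.
clf = RandomForestClassifier(random_state=0)
clf.fit(np.vstack(X_parts), np.concatenate(y_parts))
print(clf.predict([[0.9, 100], [0.1, 100]]))  # → [1 0]
```

The pooled classifier can then be run over the very images it was trained on to check agreement with the hand-corrected masks, which is the accuracy assessment described above.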

Make sense? Maybe there’s a flaw in my logic I’m not apprehending at the moment.


Hi @johnmc,

so maybe before I go into a lengthy reply, let me try to summarize the problem in slightly different terms, to make sure I have understood it. (Disclaimer: I am not a biologist).

I see two different classifiers at play:

  1. A pixel classifier: you want to do pixel-based segmentation of the DAPI images in order to determine, for every pixel, whether it is foreground (DAPI) or background.

Okay, and furthermore, you want to

  1a. correct those results using GIMP to get more accurate masks.

Then, using the result of 1a, you want to train

  2. An object classifier: given the masks from 1a, you want to train a classifier that can differentiate between different kinds of objects (nuclei) using additional images with different stainings.

So what I don’t understand so far is the whole combination aspect. What is your goal: very accurate segmentations, so that you can further analyze morphology? Or do you just care about counts of objects of different classes?

And where does the combining of classifiers come in? Are you actually just trying to generate ground truth in a smart way as input for a different algorithm (much like yapic uses ilastik)?


Are your nuclei weirdly shaped such that you need a classifier to detect them? Intensity-based segmentation is frequently pretty clean in many DAPI stains, though that varies from tissue to tissue, so it’s impossible to say in your case. Large tumor ring nuclei and lymph nodes come to mind.

DAPI segmentation, with blind expansion for classification.

In this case a pixel classifier could be used if needed, but I’m not sure it is needed. In general, the main benefit of such things is the context captured by a deep learning algorithm. If you aren’t using deep learning, you may not get much benefit from a pixel classifier over traditional segmentation.
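As a rough sketch of what intensity-based segmentation with blind expansion can look like outside ilastik (a hypothetical minimal example using NumPy/SciPy; the threshold and the expansion radius are arbitrary choices for illustration):

```python
import numpy as np
from scipy import ndimage as ndi

def segment_and_expand(dapi, thresh, expand_px=3):
    """Threshold DAPI, label connected nuclei, then expand each label
    outward by up to expand_px pixels (a 'blind' ring for measuring
    markers just around each nucleus)."""
    nuclei, _ = ndi.label(dapi > thresh)
    # For every background pixel, find the nearest nucleus pixel; copying
    # that pixel's label expands each nucleus outward (same idea as
    # skimage.segmentation.expand_labels).
    dist, inds = ndi.distance_transform_edt(nuclei == 0, return_indices=True)
    expanded = nuclei[inds[0], inds[1]]
    expanded[dist > expand_px] = 0  # cut expansion off at the radius
    return nuclei, expanded

dapi = np.zeros((30, 30))
dapi[10:15, 10:15] = 1.0  # one bright synthetic nucleus
nuclei, cells = segment_and_expand(dapi, thresh=0.5, expand_px=2)
```

Per-object marker intensities could then be measured in `cells` (or in the ring `cells` minus `nuclei`) without any trained pixel classifier at all.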

Hi Everyone, I was regretting my extended, possibly confusing post. Thanks for the replies, I will update later today. -John