Train Weka segmentation classifier on many images

Hello,

Following up on this post I created last month, I still have not found a solution to my issue.
I am using the Trainable Weka Segmentation plugin and would like to train my classifier on more than one image (e.g. because not every image contains all classes):

  1. Load image 1, define classes, create traces and train classifier;
  2. Load image 2, load classifier, create new traces from image 2 and train the classifier again to take into account traces from images 1 and 2;
    … and so on.

If I use the graphical interface of the plugin, adding traces at step 2 seems to erase the previous training of the classifier. Macro examples such as the one presented here load the image before starting the plugin. I modified it slightly to open the second image afterwards, but although the ROIs are selected, they are not added to the traces.

// Open Image1
open('leaf1.tif');
 
// start plugin
run("Trainable Weka Segmentation");
 
// wait for the plugin to load
wait(3000);
selectWindow("Trainable Weka Segmentation v3.2.0");

makeRectangle(67, 0, 26, 30);
call("trainableSegmentation.Weka_Segmentation.addTrace", "0", "1");
makeRectangle(86, 182, 23, 86);
call("trainableSegmentation.Weka_Segmentation.addTrace", "1", "1");
//makeRectangle(386, 182, 23, 86);
//call("trainableSegmentation.Weka_Segmentation.addTrace", "2", "1");

open('leaf2.tif');
img = getImageID();
selectImage(img);

makeRectangle(67, 0, 26, 30);
call("trainableSegmentation.Weka_Segmentation.addTrace", "0", "1");
makeRectangle(86, 182, 23, 86);
call("trainableSegmentation.Weka_Segmentation.addTrace", "1", "1");

How can I add traces from more than 1 image to my classifier ?

Hello @Marypop,

I have just replied to your previous post too, but let me clarify the issue here as well:

The problem lies in the fact that retraining the classifier creates a new version of the classifier from scratch, so all previous trace information is lost. The solution is to save the data instead of the classifier and load it back for each image you train on. That way, the classifier will be trained using both the previous and the new traces.
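
As a minimal macro sketch of that save-data/load-data workflow for a second image: it assumes the traces from the first image were already saved to an ARFF file, and the path, image name, rectangle, and window title (which depends on the installed plugin version) are placeholders rather than values from this thread.

```
// Sketch only: the path, image name and rectangle are placeholders.
dataFile = "/path/to/traces.arff"; // hypothetical ARFF file saved after image 1

// Open the next training image and start the plugin on it
open("leaf2.tif");
run("Trainable Weka Segmentation");
wait(3000); // give the plugin time to load
selectWindow("Trainable Weka Segmentation v3.2.0"); // title depends on the installed version

// Bring back the traces saved from the previous image(s)
call("trainableSegmentation.Weka_Segmentation.loadData", dataFile);

// Add new traces for this image (arguments: class index, slice number)
makeRectangle(67, 0, 26, 30);
call("trainableSegmentation.Weka_Segmentation.addTrace", "0", "1");

// Train from scratch on old + new samples, then save the combined data
call("trainableSegmentation.Weka_Segmentation.trainClassifier");
call("trainableSegmentation.Weka_Segmentation.saveData", dataFile);
```

Repeating the same steps for each further image, always loading the most recently saved ARFF file, should let the training data accumulate.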

Note that the default classifier of Trainable Weka Segmentation is FastRandomForest, which is not “updateable”. Only a few WEKA classifiers can be updated. If you are interested in any of them, I can implement an option to check whether the selected classifier belongs to that group and update it instead of building it from scratch.

Hi @iarganda

Thanks for the tips and help with the Weka classifier. There’s plenty I’ve been reading to apply it to my images.

Following on this (quite old) question from Marypop, I’m dealing with this same issue at the moment. I trained a classifier and realised later that for some images it would be good to re-train it. I would like to do this without having to perform all the steps again, but from what I understand, this is not possible with FastRandomForest:

  1. did you already implement warnings in the code, to let the user know when a new training will add to the existing classifier or build it from scratch depending on which type of algorithm used?

  2. is there a way to pull out the data used to build the classifier, from the *.model file, or once built that is lost?

Thanks

Hello @NML!

No, sorry about that. The vast majority of available methods are not retrainable, that’s why I didn’t put any message there.

No, Weka (the machine learning library) separates the model from the data. That’s why you have the two options in TWS: “Save classifier” and “Save data”. To work around the re-training problem, you would have to save your data and load it again.

Hi @iarganda, thanks for the reply.

It works well to save the data, load it back and retrain. Now I know :)

Hello!

Thanks for the clarification. Just another quick question. If I load multiple data files, do I need to retrain after every addition? And, once I have added and retrained and whatnot and have the classifier the way I want it, can I then save it as is? Or do I have to reload the data in a similar manner every time?

Thanks!

With the default classifier (and most of the available classifiers in the WEKA library) yes, you have to train again because they are not updateable. Here is the list of updateable classifiers from WEKA:

So, I’m not sure what you mean by the classifier is not updateable. Does that mean it can only consider one set of data at a time?
My process for training the classifier up until this point has been as follows:
I load an image into TWS, draw the segments/add to classes and train the classifier. I get the result and save the data. For the next image, I upload it into TWS, load the previous data set, and click “train classifier”. From there, I correct the areas that are wrong and click “train classifier” again. I then save that data and close the window. For the third image, I load the first set of data, train the classifier, then load the second set of data and train the classifier, then correct where it’s wrong, train again, and again save the data. And so on and so forth.
Is this wrong?
I looked at the updateable classifiers that you listed and I’m afraid I don’t understand how to use those either. I really appreciate your help, and I’m sorry I am struggling with this. You did a great job with the program, I just don’t have any kind of computer background, and would love to be able to use it to its fullest capacity.

Not at all, you are doing the right thing!

“Updateable” means that once a classifier is trained, we can train it again on new samples and it will adapt to the new data starting from its previously trained state. In general, most WEKA classifiers are trained from scratch every time, which is why we have to load the previous data samples, so that training takes both the new ones and the old ones into account. Does that make sense?

Don’t be, it is really useful feedback when you ask for all these details! I recommend keeping the default classifier (FastRandomForest) and continuing to train as you have been doing. If you notice that you have many more samples (pixels) from one class than from the others, it might also be useful to click on “Balance classes” in the Settings dialog ;)

Hi!
Yes, that makes much more sense. Along that same vein, when I save my completed classifier and reload it, it doesn’t need to be trained from scratch, correct?

I have just one final question if you don’t mind. For my research report, do you know of any citable references other than the ImageJ website that give detailed explanations of the different available settings? I’ve been scrounging around and can’t find anything quite detailed enough.

Again, thank you so much for all of your help!

No, unless you have new traces and would like them to be taken into account.

Sure, have a look at the supplementary material of the TWS paper in Bioinformatics.

Just to add here that the WEKA library is relatively easy to run from a script. See for example this basic introduction, an example of segmenting blobs and another segmenting an electron microscopy image.

In short, you could have e.g. multiple images open, with ROIs on each, and then read the pixels under the ROI of each image, and pack them into the training data for a WEKA classifier, to train it. Then apply the classifier to other images.

That’s a great resource! Thanks so much for your help!

Hi @iarganda,

Thank you for taking the time to help our community members!

@etadobson and I had a follow up question about workflow using TWS we were hoping you might be able to clarify.

If I am training a classifier using many images, should I be loading multiple data files (ARFF files) to train the classifier? Additionally, should I be saving a new classifier (model file) and loading this model file for each subsequent image I am training with?

For example, my current workflow is:

  1. Open first image in TWS
  2. Create classifier with class labels
  3. Train the classifier with my own tracings
  4. Save data (ARFF), classifier (model)
  5. Open next image in TWS
  6. Load classifier I created in step 4
  7. Load data I created in step 4
  8. Click “train classifier”
  9. Train further using my own manual tracings
  10. Save data (ARFF), classifier (model)

My questions are:

  • For the third image I will train on, do I load the classifier made in step 4 or step 10?
  • For the third image I will train on, do I load the data from step 4, step 10, or both?

Thank you in advance for your help! Like others on this thread, my background is not in comp sci but in medicine, so I really appreciate your patience!

Thank you for your feedback, @cschiefer !

Short answer: you don’t need to save and reload the classifier for each image, only the ARFF file, since the classifier gets trained from scratch every time.

I hope this helps!

Thanks @iarganda !!!

But does he only need to load the most-recent ARFF file then… ?? We want this to be an additive process - moving from one image to the next building up the classifier. So should he:

  1. open image #1, train classifier, save data file (#1) & save classifier (#1)
  2. open image #2, load data file (#1), train ‘new’ classifier, save data file (#2) & save classifier (#2)
  3. open image #3, load data file (#2), train ‘new’ classifier, save data file (#3) & classifier (#3)

Or does he - for example at step 3 - need to open both previous data files (#1 and then #2) for image #3 and train on both sequentially for that image? Or is all the previous data saved in data file (#2) so he can just start the training with that one for image #3?

Is that clear? I suppose that’s where we were getting ‘mixed up’…

I would like to jump in here with what I hope is a simple question/simple answer: is it possible to load traces/masks made before to weka for training?

I already have a 100 slice stack where all the structures I’m interested in have been masked (I saved them all as independent ROIs), which took me a couple of days to do. I would like to use them with Weka (plus adding a few “negative” areas). Is there a way to re-use them and train a new model?
Maybe this is also an alternative to having multiple training images (we just stack them all and go from there)?

Thanks once again for your support!

Just the last ARFF, yes! Sorry for the late answer!

Well, not directly from the plugin BUT, you can simply load TWS with your image, use the ROI Manager to load all your previously selected ROIs and then add them to the class you want.

That’s great, as we can re-train and re-use previous data!

I just tried to get a small macro to load all the ROI to my 100 slice stack (there are close to 2000 manually drawn masks…!):

n = roiManager('count');
for (i = 0; i < n; i++) {
    roiManager('select', i);
    slice = getSliceNumber();
    // pass the slice number itself (as a string), not the literal text "slice"
    call("trainableSegmentation.Weka_Segmentation.addTrace", "1", "" + slice);
}

but this seems to freeze (or slow down a looot) my computer (which is a capable machine, 64gb RAM, 1GB SSD, etc), which surprised me.

Can you think of a better way to do a macro…?