Extracting feature importance from random trees to create a feature importance figure

Hello, does anyone know how to extract feature importance data from a random trees classifier, in order to create a feature importance figure?

My goal is something like this:
https://towardsdatascience.com/explaining-feature-importance-by-example-of-a-random-forest-d9166011959e

I found some information that OpenCV can extract the variable importance via:
getVarImportance

I could access this function through RTrees.

import org.bytedeco.opencv.opencv_ml.RTrees
import qupath.lib.classifiers.PathClassifierTools
import java.io.File

// Load the saved classifier from disk
def classifierFile = new File("C:/classifier1.qpclassifier")
def classifierRT = PathClassifierTools.loadClassifier(classifierFile)

// This creates a brand new, untrained RTrees model - it has no connection
// to the loaded classifier, so getVarImportance() has nothing to report
RTrees trees = RTrees.create()
trees.getVarImportance()

How can I load the classifier into RTrees?
Thanks!


I can’t think of any straightforward way to do this in QuPath for a detection classifier currently. You would need to access the OpenCV StatModel and call its setCalculateVarImportance method prior to training… but since QuPath handles the training for you, I’m not sure how achievable that is.

I’m also not sure how meaningful the results are with OpenCV. The documentation for OpenCV states here that

to compute variable importance correctly, the surrogate splits must be enabled in the training parameters, even if there is no missing data.

But support for surrogate splits was removed from OpenCV some years ago, and attempting to turn them on throws an exception.

Still, OpenCV can in principle provide feature importance information that looks plausible (to me at least, without having looked in much depth), and I intend to make this accessible.

Using the pixel classifier in v0.2.0-m8 with Random Trees, you can already press the Edit button and select Calculate variable importance. If you do this, the feature importance values will be written to the log every time a classifier is trained.

I’m currently rewriting the object classifier entirely, to make it much more flexible for v0.2.0. This rewritten version will support logging feature importance in the same way the pixel classifier currently does.


Thank you very much Pete for your detailed answer. :slight_smile:
Is there any way to access the feature importance right now?
I have to finish a paper and really need it. :frowning:

You can export the training data as a text file (see Exporting training data as txt file) and explore it elsewhere, e.g. using OpenCV’s Python bindings or Weka. Note that if you use OpenCV, the RTrees classifier may well not be identical, since it may be created with different parameters.

Perhaps someone else can come up with a QuPath-specific solution, but as I wrote previously I can’t think of any straightforward one - I think it would take quite a bit of coding/hacking to try to piece something together, and then relate the importance values back to the feature names. I’m afraid I don’t have the time currently to investigate this myself.

OK, so I would export the „model“ (i.e. the training data) and use it to build a new random trees model in OpenCV or Weka, then compute the variable importance there.

What about the Weka extension for v0.1.2?
Maybe it could be updated for v0.2.0, which would provide an easy way to get the variable importance?

thx :blush:

Sure, you could try updating the Weka extension as well, or use v0.1.2 yourself, although that is likely to be more work than simply exporting the data and using Weka as it is.