I’m trying to interpret the variable importance table for pixel classification. I understand the concept of the random forest algorithm, but I can’t find documentation about this specific table, especially about the Class #0 and Class #1 coefficients.
Here is what I have found on the Internet. I am not 100% sure about it, but maybe it can help people with the same problem.
The first three scores are related to the OOB (Out-Of-Bag) error:
During training, each tree of the forest is fit on a bootstrap sample of the training data, so for each tree some samples are left out (they are “out of bag” for that tree). To score a feature, its values are permuted in those out-of-bag samples and the resulting increase in prediction error is measured. As far as I can tell, Class #0 corresponds to this error increase measured on samples of class 0, Class #1 to the increase on samples of class 1, and Overall to the increase over both classes together; you should probably only look at Overall.
The higher these scores, the more error the forest makes once the feature’s values are scrambled, which indicates greater importance.
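To make the permutation idea concrete, here is a minimal pure-Python sketch using a toy “forest” of decision stumps. This is only an illustration of the general technique, not ilastik’s (or VIGRA’s) actual implementation; all names and the toy data are made up.

```python
import random

random.seed(0)

# Toy data: feature 0 determines the label, feature 1 is pure noise.
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if x[0] > 0.5 else 0 for x in X]

def train_stump(X, y, feature):
    # Pick the threshold (coarse grid) that minimizes training error.
    best_t, best_err = 0.5, float("inf")
    for t in (i / 20 for i in range(1, 20)):
        err = sum((x[feature] > t) != bool(lbl) for x, lbl in zip(X, y))
        if err < best_err:
            best_t, best_err = t, err
    return (feature, best_t)

def predict(stump, x):
    feature, t = stump
    return 1 if x[feature] > t else 0

# Each "tree" (stump) is trained on a bootstrap sample; the samples it
# never saw are its out-of-bag (OOB) set.
forest = []
for _ in range(25):
    boot = [random.randrange(len(X)) for _ in range(len(X))]
    oob = [i for i in range(len(X)) if i not in set(boot)]
    feature = random.randrange(2)  # random feature per stump
    stump = train_stump([X[i] for i in boot], [y[i] for i in boot], feature)
    forest.append((stump, oob))

def oob_error(perm_feature=None):
    # Mean OOB error; optionally permute one feature's values among the
    # OOB samples before predicting.
    errors = total = 0
    for stump, oob in forest:
        shuffled = oob[:]
        if perm_feature is not None:
            random.shuffle(shuffled)
        for i, j in zip(oob, shuffled):
            x = list(X[i])
            if perm_feature is not None:
                x[perm_feature] = X[j][perm_feature]
            errors += predict(stump, x) != y[i]
            total += 1
    return errors / total

base = oob_error()
importances = [oob_error(perm_feature=f) - base for f in range(2)]
print("importances:", importances)
```

Scrambling the informative feature 0 raises the OOB error a lot, so its importance comes out much larger than that of the noise feature 1.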
For the Gini coefficient (actually the Mean Decrease in Gini), there is a very good explanation on this page: Help-Mean-Decrease-in-Gini-for-dummies. Again, the higher the value, the more important the variable.
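For reference, the quantity behind Mean Decrease in Gini is the drop in Gini impurity that a split achieves; the importance reported for a feature is that drop accumulated (and averaged) over all splits using the feature. A small sketch of the impurity computation itself, with made-up toy values:

```python
def gini(labels):
    # Gini impurity: 1 - sum over classes of p_k^2.
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lbl in labels:
        counts[lbl] = counts.get(lbl, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_decrease(values, labels, threshold):
    # Weighted impurity decrease from splitting on `value > threshold`.
    left = [lbl for v, lbl in zip(values, labels) if v <= threshold]
    right = [lbl for v, lbl in zip(values, labels) if v > threshold]
    n = len(labels)
    return gini(labels) - (len(left) / n) * gini(left) \
                        - (len(right) / n) * gini(right)

labels = [0, 0, 0, 1, 1, 1]
good_feature = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]  # separates classes perfectly
bad_feature = [0.1, 0.8, 0.3, 0.2, 0.9, 0.7]   # mixes the classes
print(gini_decrease(good_feature, labels, 0.5))  # 0.5, the maximum here
print(gini_decrease(bad_feature, labels, 0.5))   # much smaller
```

A feature whose splits consistently produce purer child nodes accumulates a larger total decrease, hence a larger importance.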
Feel free to add something if you have a better explanation; I’m new to the field and maybe I forgot something.
In short, we do not recommend using those values at all; that is why this table is also hidden away like that.
What is your goal when looking at those values? If your goal is a speedup by selecting only the features that lead to a good segmentation, then there is the suggest features functionality available in the training applet. It will let you find a subset of the features that leads to a “similar” segmentation.
I am looking at these values in order to understand my model and compare it with other methods that use some of these features.
I also used suggest features, and it’s a great functionality!
So you are saying I should only use that one?
yes, definitely! Hope this helps…