Metric Evaluation (non-expert)


For my bachelor's thesis I am investigating how accurate and accessible different nuclei segmentation strategies are. I am by no means an expert in coding, but I would like to do some kind of metric evaluation. Can anyone help me calculate metrics like precision, recall, F1 score, etc.? I attach an example of my ground truth annotations (blue) and a segmentation result from StarDist (yellow) in Fiji. I am aware of the formulas for these metrics, but I am looking for a way to automate the calculations. Thank you in advance!

Thanks for the new insights. For those like me, here is a nice explanation of the formulas (no affiliation).

Hi @jamesmulder,

The Python stardist library comes with a stardist.matching submodule that lets you calculate all these metrics for label masks (i.e. ground truth and segmentation are both integer label images). Here is an example of how to use it (note that the library additionally provides functions to create true positive/false positive plots):

This should output something like the following:

Matching(criterion='iou', thresh=0.5, fp=1, tp=5, fn=1, precision=0.8333333333333334, recall=0.8333333333333334, accuracy=0.7142857142857143, f1=0.8333333333333334, n_true=6, n_pred=6, mean_true_score=0.5916911649023856)
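To see how the scores in this output follow from the counts, here is a small plain-Python check. It uses only the `tp`, `fp` and `fn` values printed above; the formulas are the usual object-level definitions, and note that "accuracy" here means tp / (tp + fp + fn), which is what the printed value of 0.714... corresponds to.

```python
# Counts taken from the printed result above.
tp, fp, fn = 5, 1, 1

precision = tp / (tp + fp)            # fraction of predicted objects that are correct
recall = tp / (tp + fn)               # fraction of ground truth objects that were found
accuracy = tp / (tp + fp + fn)        # no true negatives exist for object matching
f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"accuracy={accuracy:.4f} f1={f1:.4f}")
```

These reproduce the precision, recall, accuracy and f1 fields shown in the output (5/6, 5/6, 5/7 and 10/12 respectively).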

Hope that helps a bit!