Creating ground truth images for classifier validation

Hi all,

I’m looking to create some ground truth images to help validate my pipelines, and I’m wondering what the best way to do it is. I have tried CellProfiler’s IdentifyObjectsManually module, but it’s a little cumbersome. I also see this method here:

Annotating images with CellProfiler and GIMP

I haven’t tried that yet, though. Does anyone have suggestions for ways they like to do this? I have a 2D co-culture system of two different cell types and a couple of channels per image. Any recommendations on the number of cells or images to annotate would also be helpful!



A different approach is to create synthetic images. CytoPacq offers a framework for that:

Of course, such simulations are limited by the available phantoms and models.
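To make the appeal of simulation concrete, here is a minimal toy sketch (not CytoPacq, just plain NumPy, with made-up parameters like `n_cells` and `radius`): because the generator places the blobs itself, the exact cell centres come out alongside the image, so the ground truth is free.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_field(shape=(128, 128), n_cells=10, radius=6, noise=0.05):
    """Toy phantom: Gaussian blobs at known centres plus Gaussian noise.

    The centres double as exact ground truth; real simulators such as
    CytoPacq use far richer cell phantoms and imaging models.
    """
    img = np.zeros(shape)
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    # Keep centres away from the border so every blob fits in the frame
    centres = rng.integers(radius, np.array(shape) - radius, size=(n_cells, 2))
    for r, c in centres:
        img += np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * radius ** 2))
    img += rng.normal(0, noise, shape)
    return img.clip(0, None), centres

img, centres = synthetic_field()
```

The obvious limitation, as noted above, is that blobs like these are only as realistic as the phantom model behind them.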


We used the GIMP method you linked to create ground truth for hundreds of nuclei for the 2018 Data Science Bowl, as well as the ground truth for our CellProfiler 3 paper, so we certainly think it’s useful, but any and all ground truth creation is going to be tedious.

If you have a CellProfiler pipeline that does a reasonably good job by eye of finding your objects of interest, you could also run it then correct errors with EditObjectsManually rather than using IdentifyObjectsManually to identify all objects from scratch.

Finally, if you truly only care about classification accuracy rather than segmentation accuracy, you needn’t worry about capturing cell borders precisely; even one or a few labelled pixels inside each cell should be sufficient, which should be MUCH faster. For that you could use GIMP, but also LabelBox, LabelMe, or any other online ground truth annotation tool (I don’t have particular favorites or affiliations in these arenas).
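On the “a few pixels per cell is enough” point: the annotation can be as lightweight as a list of click coordinates with a class per click. A small sketch (hypothetical coordinates and labels, plain NumPy) turning clicks into a sparse label mask a classifier can be scored against:

```python
import numpy as np

# Hypothetical click coordinates (row, col), one per annotated cell,
# with a class label per click (e.g. 1 = cell type A, 2 = cell type B).
clicks = [((10, 12), 1), ((40, 35), 2), ((60, 8), 1)]

def points_to_label_mask(clicks, shape, radius=2):
    """Paint a small square of each class label around every clicked point.

    A few labelled pixels per cell are enough to validate a classifier,
    even though this is nothing like a full segmentation.
    """
    mask = np.zeros(shape, dtype=np.uint8)
    for (r, c), label in clicks:
        r0, r1 = max(r - radius, 0), min(r + radius + 1, shape[0])
        c0, c1 = max(c - radius, 0), min(c + radius + 1, shape[1])
        mask[r0:r1, c0:c1] = label
    return mask

mask = points_to_label_mask(clicks, shape=(64, 64))
```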


Amazing, thanks Beth!


Have you looked into QuPath to help with detection and annotation? I’ve had a lot of good experiences using QuPath for my fluorescence images.

Here is my workflow (in ZEN Blue):

  • use normal thresholding or pixel classification, plus binary post-processing, to first get a “rough” binary mask of the objects you are interested in
  • manually edit the resulting binary masks with the cut, brush, or other editing tools
  • export all mask images as PNGs (or whatever format you prefer)
  • use the results as GT for training your networks etc.
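The first and third steps above also translate directly to a scripted equivalent outside ZEN Blue. A sketch with scikit-image (the function name and file names are made up, and Otsu stands in for whichever thresholding you actually use), leaving only the manual mask editing to do by hand:

```python
import numpy as np
from skimage import filters, io, morphology

def rough_mask(image, min_size=50):
    """Rough first-pass binary mask: threshold, then binary clean-up."""
    # Otsu threshold as the "rough" segmentation step
    binary = image > filters.threshold_otsu(image)
    # Binary post-processing: drop small specks, fill small holes
    binary = morphology.remove_small_objects(binary, min_size=min_size)
    binary = morphology.remove_small_holes(binary, area_threshold=min_size)
    return binary

# Usage (hypothetical file names), exporting the mask as a PNG:
# img = io.imread("channel1.tif")
# io.imsave("channel1_mask.png", rough_mask(img).astype(np.uint8) * 255)
```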

Thanks for the suggestions everyone!

At the moment I’ve decided to use automatic segmentation with CP then manually curate the output to create the GT. Seems to be working pretty quickly.