Add Split Annotation and Remove fragments and holes plugins to script

I am adding some annotation masks that I would like to split and clean up after they are added.

I found that the Split annotations and Remove fragments plugins work like a charm, and I would like to run them after adding the PathAnnotationObjects. I tried to add

selectAnnotations()
runPlugin('qupath.lib.plugins.objects.SplitAnnotationsPlugin', '{}');
runPlugin('qupath.lib.plugins.objects.RefineAnnotationsPlugin', '{"minFragmentSizeMicrons": 2.0,  "maxHoleSizeMicrons": 0.0}');

in my script, but I am not getting any splitting. I guess I need to call the plugins in some different way, but I am not quite sure how to do it.
I checked the source code and tried to port it into my script, but this is not giving me good results…

Have you selected all your annotations before running your script? e.g. selectAnnotations()

EDIT: I see that you have edited your post and tried with the above line. It works for me with a minimal example, though. What kind of annotations do you have? Are they locked?

Yep! Sorry, I just edited the question to add the selectAnnotations() line that I had missed.

It seems that the plugins need to run outside the loop I use for adding masks individually. This is somewhat problematic, since I have a lot of masks and running the plugins at the end (my first attempt) makes QuPath crash.

No worries!

How many masks do you have? Does QuPath give you any information about the problem or does it just freeze (when running the plugins at the end)?

Also if you have a bigger sample of the script that you use we can maybe help you identify more precisely where the bottleneck is/if some code is not quite right.


If this is an extension of the deep learning classifier, are you importing a binary mask image, generating annotations from it, and transferring them into the whole slide image? Or have you already generated the coordinates per tile, and are you transferring those coordinates into the whole slide image over thousands of tiles?

You may be better off using JTS or working with the ROIs rather than adding the annotations if that is the case. QuPath handles large numbers of annotations MUCH better than it used to, but I am guessing there are still limits, and I don’t think the intention was to encourage large numbers of annotations.
See here: QuPath 0.2.0 extremely slow with large annotation projects
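For orientation, every QuPath ROI exposes its JTS geometry directly, so you can do geometry operations without creating any new objects. A trivial (purely illustrative) example:

// Grab the JTS geometry of an annotation's ROI and query it
def annotation = getAnnotationObjects().first()
def geom = annotation.getROI().getGeometry()
println 'Area in pixels: ' + geom.getArea()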

Yes, this is part of the pipeline of the DL classifier. I used some code modified from this thread. Concretely, the code I am using is the following:

/**
 * Script to import binary masks & create annotations, adding them to the current object hierarchy.
 *
 * It is assumed that each mask is stored in a PNG file in a project subdirectory called 'masks'.
 * Each file name should be of the form:
 *   [Short original image name]_[Classification name]_([downsample],[x],[y],[width],[height])-mask.png
 *
 * Note: It's assumed that the classification is a simple name without underscores, i.e. not a 'derived' classification
 * (so 'Tumor' is ok, but 'Tumor: Positive' is not)
 *
 * The x, y, width & height values should be in terms of coordinates for the full-resolution image.
 *
 * By default, the image name stored in the mask filename has to match that of the current image - but this check can be turned off.
 *
 * @author Pete Bankhead
 */


import ij.measure.Calibration
import ij.plugin.filter.ThresholdToSelection
import ij.process.ByteProcessor
import ij.process.ImageProcessor
import qupath.imagej.tools.IJTools
import qupath.lib.objects.PathAnnotationObject
import qupath.lib.objects.classes.PathClassFactory
import static qupath.lib.gui.scripting.QPEx.*

import javax.imageio.ImageIO
import qupath.lib.regions.ImagePlane
import qupath.lib.roi.ROIs
import qupath.lib.objects.PathObjects

// Get the main QuPath data structures
def imageData = QPEx.getCurrentImageData()
def hierarchy = imageData.getHierarchy()
def server = getCurrentServer()


// Only parse files that contain the specified text; set to '' if all files should be included
// (This is used to avoid adding masks intended for a different image)
def includeText = server.getMetadata().getName().replace('.svs', '')

// Get a list of image files, stopping early if none can be found
def pathOutput = QPEx.buildFilePath(QPEx.PROJECT_BASE_DIR, 'masks')
def dirOutput = new File(pathOutput)
if (!dirOutput.isDirectory()) {
    print dirOutput + ' is not a valid directory!'
    return
}
def files = dirOutput.listFiles({f -> f.isFile() && f.getName().contains(includeText) && f.getName().endsWith('.png') } as FileFilter) as List
if (files.isEmpty()) {
    print 'No mask files found in ' + dirOutput
    return
}

// Create annotations for all the files
def annotations = []
files.each {
    try {
        annotations << parseAnnotation(it)
    } catch (Exception e) {
        print 'Unable to parse annotation from ' + it.getName() + ': ' + e.getLocalizedMessage()
    }
}

// Add annotations to image
hierarchy.addPathObjects(annotations)


/**
 * Create a new annotation from a binary image, parsing the classification & region from the file name.
 *
 * Note: this code doesn't bother with error checking or handling potential issues with formatting/blank images.
 * If something is not quite right, it is quite likely to throw an exception.
 *
 * @param file File containing the PNG image mask.  The image name must be formatted as above.
 * @return The PathAnnotationObject created based on the mask & file name contents.
 */
def parseAnnotation(File file) {
    // Read the image
    def img = ImageIO.read(file)

    // Split the file name into parts: [Image name, Classification, Region]
    def parts = file.getName().replace('_.png', '').split('_')

    // The classification is hardcoded here; in the original script it was parsed from the file name,
    // working from the end of the parts list since the image name itself may contain underscores
    def classificationString = 'Tumor'

    // Extract region, and trim off parentheses (admittedly in a lazy way...)
    def regionString = parts[-1].replace('(', '').replace(')', '')
    
    // Create a classification, if necessary
    def pathClass = null
    if (classificationString != 'None')
        pathClass = PathClassFactory.getPathClass(classificationString)

    // Parse the x, y coordinates of the region - width & height not really needed
    // (but could potentially be used to estimate the downsample value, if we didn't already have it)
    def regionParts = regionString.split('-')
    double downsample = 1 as double
    int x = regionParts[0] as int
    int y = regionParts[1] as int
    

    // To create the ROI, travel into ImageJ
    def bp = new ByteProcessor(img)
    bp.setThreshold(42, 42, ImageProcessor.NO_LUT_UPDATE)
    def roiIJ = new ThresholdToSelection().convert(bp)
    
    
    int z = 0
    int t = 0
    def plane = ImagePlane.getPlane(z, t)

    // Convert ImageJ ROI to a QuPath ROI
    // This assumes we have a single 2D image (no z-stack, time series)
    // Currently, we need to create an ImageJ Calibration object to store the origin
    // (this might be simplified in a later version)
    def cal = new Calibration()
    cal.xOrigin = 0
    cal.yOrigin = 0
    def roi = IJTools.convertToROI(roiIJ, cal, downsample,plane)
    roi = roi.translate(y, x) // New line: translate the ROI using the coordinates parsed from the file name

    // Create & return the object
    return new PathAnnotationObject(roi, pathClass)
}
selectAnnotations()
runPlugin('qupath.lib.plugins.objects.SplitAnnotationsPlugin', '{}')
runPlugin('qupath.lib.plugins.objects.RefineAnnotationsPlugin', '{"minFragmentSizeMicrons": 3.0,  "maxHoleSizeMicrons": 0.0}')
//selectAnnotations()
//mergeSelectedAnnotations()

As you can see, this code works in situations where the image is small. If the tissue is too big, then QuPath simply freezes or even terminates.

@Research_Associate I am not quite sure how to work with JTS or ROIs, could you please share some info about them? Thanks!



@smcardle’s script was for a project where there were many objects that needed to be split; whether they are annotations or detections is relatively unimportant once you get to the ROI or geometry. What the script did was select all of the ROIs of a type of object (you may want to start with getAnnotationObjects() rather than getDetectionObjects()), merge them into one giant complex ROI, and then split that ROI into its non-contiguous parts. The resulting ROIs then need to be added back into the QuPath hierarchy as either detections or annotations, depending on what you want to do afterwards.

In this case the objects were added as detections since there were so very many of them, and it made interacting with QuPath a bit nicer. You could use createAnnotationObject instead, but beware potential performance issues if you do! You can also add a step to remove small ROIs prior to creating these objects, if you want. In fact, I would recommend it, as creating tens of thousands of small annotations or even detections could be very slow.
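Roughly, the merge-and-split idea looks something like this (a sketch, not @smcardle’s exact script; the JTS calls are standard, but treat the object types and the optional area threshold as things to adapt to your data):

import qupath.lib.objects.PathObjects
import qupath.lib.regions.ImagePlane
import qupath.lib.roi.GeometryTools

// Merge all annotation ROIs into one JTS geometry, then split it into
// its non-contiguous parts and re-add those as detections
def annotations = getAnnotationObjects()
def merged = annotations.collect { it.getROI().getGeometry() }
                        .inject { a, b -> a.union(b) }

def plane = ImagePlane.getDefaultPlane()
def newObjects = []
for (int i = 0; i < merged.getNumGeometries(); i++) {
    def part = merged.getGeometryN(i)
    // Optionally drop tiny fragments here, e.g. if (part.getArea() < 250) continue
    newObjects << PathObjects.createDetectionObject(GeometryTools.geometryToROI(part, plane))
}
removeObjects(annotations, true)
addObjects(newObjects)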

*This is all assuming that the problem doesn’t come from the annotation import step. If the large images crash even while just creating the annotation objects initially, you might need another option, such as not creating the annotations as part of the import process, but only saving the ROIs and using those for splitting and removing small areas. That would probably be the most efficient runtime-wise, but, at the moment, you would lose intermediate steps that could tell you where your code is having problems.

Adding the script works nicely and is quite fast, thanks!

As I can see in the docs, detections cannot be modified. My script is intended to be an efficient way to communicate with pathologists using QuPath projects. The idea would be to 1) run DL prediction on new cases, 2) have pathologists validate the predictions, 3) generate new training tiles and masks from the refined annotations, and 4) retrain. With this in mind, I have two questions:

  1. I can see that there are ways to transform detections to annotations in batch. It seems that this script is not so fast (as you mentioned before), so I was wondering if there is a way to manually change specific detections to annotations, modify them, and change them back.

  2. Regarding the generation of new tiles: I found in the docs a way to generate pairs of images and masks using “labels”. I am not sure whether these “labels” can be annotations as well as detections.


While detections cannot be modified, you can:

  1. Save the detections.
  2. Take their ROIs and modify them.
  3. Delete the old detections.
  4. Add the new detections.

It isn’t quite modifying the detections, but the result is similar.
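As a sketch, that round trip could look something like this (modifyROI is a hypothetical placeholder for whatever ROI editing you want to do):

import qupath.lib.objects.PathObjects

// Replace selected detections with new ones built from modified ROIs
def oldDetections = getSelectedObjects().findAll { it.isDetection() }
def newDetections = oldDetections.collect { d ->
    def newROI = modifyROI(d.getROI())  // modifyROI is your own (hypothetical) ROI-editing function
    PathObjects.createDetectionObject(newROI, d.getPathClass())
}
removeObjects(oldDetections, true)
addObjects(newDetections)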

I would guess that it is the annotations that are slow, not the conversion process, though I could be wrong. I simply avoid annotations whenever possible if I have hundreds or thousands of objects.
If you can find a way to classify the subset of objects you specifically want to change, then you can run pretty much any of these scripts on those objects.

In general, instead of something like getDetectionObjects() or getAnnotationObjects(), you would use something like
getDetectionObjects().findAll{it.getPathClass() == getPathClass("SubsetClass")}

I believe Pete indicated in a previous post that there was an option for the labelServer that allowed it to work on detections.
Ah, found it: Exporting detections within selected annotation
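Based on the export docs, it would look roughly like this; the useDetections() option on the LabeledImageServer builder is the relevant part, while the 'Tumor' label, downsample and output path are just example values to adapt:

import qupath.lib.common.ColorTools
import qupath.lib.images.servers.LabeledImageServer
import qupath.lib.images.writers.TileExporter

def imageData = getCurrentImageData()

// Label server that rasterizes detections (rather than annotations) as the mask
def labelServer = new LabeledImageServer.Builder(imageData)
    .backgroundLabel(0, ColorTools.WHITE)
    .useDetections()
    .addLabel('Tumor', 1)
    .downsample(4)
    .build()

// Export image/mask tile pairs
new TileExporter(imageData)
    .downsample(4)
    .imageExtension('.png')
    .tileSize(512)
    .labeledServer(labelServer)
    .annotatedTilesOnly(false)
    .writeTiles(buildFilePath(PROJECT_BASE_DIR, 'tiles'))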

Uhmmm… This option requires a little bit of scripting, and this part of the work would be performed by the pathology staff. So I guess the best option would be to delete and/or generate detections manually.

It seems to be the case. If I change return new PathAnnotationObject(roi, pathClass) to return new PathDetectionObject(roi, pathClass) in the last line of the loop in the previous script, the whole process finishes much faster.
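For reference, the equivalent using the PathObjects factory method (which I understand recent QuPath versions recommend over calling the constructors directly) would be:

import qupath.lib.objects.PathObjects

// Same effect as `new PathDetectionObject(roi, pathClass)`, via the factory method
return PathObjects.createDetectionObject(roi, pathClass)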

And since I can export masks from detections, I guess I now have the whole pipeline working quite efficiently!! The only way to improve this kind of transformation further would probably be to run it on the GPU, but I guess that is still far from being implemented :stuck_out_tongue:


@Joan_Gibert_Fernande I recommend profiling with VisualVM to see where the bottleneck is if you want any further optimization :slight_smile: