Multithreaded GPU behavior with StarDist

Hello,

Hope you’re doing well! We’re running StarDist on GPUs, and it has been working smoothly with multiple threads. For example, when running StarDist detections only and writing them out to file, the main thread loads the TensorFlow bundle and multiple workers then run detections successfully:

20:18:00.373 [main] [INFO ] q.t.TensorFlowOp$TensorFlowBundle - Loaded TensorFlow bundle: /models/he_heavy_augment, (inputinput:0 [-1,-1,-1,3], output=concatenate_4/concat:0 [-1,-1,-1,33])

20:18:15.512 [ForkJoinPool.commonPool-worker-55] [WARN ] q.tensorflow.stardist.StarDist2D - Skipped 1 nucleus detection(s) due to error in resolving overlaps (0.1% of all skipped)

20:18:25.064 [ForkJoinPool.commonPool-worker-3] [WARN ] q.tensorflow.stardist.StarDist2D - Skipped 1 nucleus detection(s) due to error in resolving overlaps (0% of all skipped)

20:18:30.173 [ForkJoinPool.commonPool-worker-5] [WARN ] q.tensorflow.stardist.StarDist2D - Skipped 6 nucleus detection(s) due to error in resolving overlaps (0.1% of all skipped)

...

However, we noticed that when we add a pixel classifier to the script, this multithreaded behavior no longer works. Each worker now loads the bundle itself (rather than the main thread), and then hits a ConcurrentModificationException / "Unable to load bundle" error. We saw this post (StarDist error Unable to load bundle: null) and added the synchronized keyword to the Java class, but we are still getting the error (output below). We have also tried .nThreads(1), which resolves the issue, but it isn’t ideal for our HPC setup since it takes quite a while.
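
For context, here’s a minimal sketch of the kind of race we suspect – plain Java with made-up names (not QuPath code), just to illustrate why concurrent computeIfAbsent calls on a plain HashMap can end in a ConcurrentModificationException:

import java.util.HashMap;
import java.util.Map;

// Toy reproduction (hypothetical class/key names). HashMap.computeIfAbsent is not
// thread-safe: if another thread structurally modifies the map while the mapping
// function runs, it throws ConcurrentModificationException – the same call site
// (HashMap.computeIfAbsent) that appears in our stack trace below.
public class BundleCacheRaceDemo {

    // Swapping this for a java.util.concurrent.ConcurrentHashMap (or synchronizing
    // access) makes the exception go away, since its computeIfAbsent is atomic.
    private static final Map<String, String> cache = new HashMap<>();

    private static String loadBundle(String path) {
        // Stand-in for expensive model loading
        return cache.computeIfAbsent(path, p -> "bundle@" + p);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable worker = () -> {
            String name = Thread.currentThread().getName();
            for (int i = 0; i < 200_000; i++)
                loadBundle(name + "-" + i); // unique keys keep the map changing
        };
        Thread t1 = new Thread(worker, "worker-1");
        Thread t2 = new Thread(worker, "worker-2");
        t1.start(); t2.start();
        t1.join();  t2.join();
        // With the plain HashMap, one of the workers usually dies with a CME first.
        System.out.println("Done");
    }
}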

Do you know why this could be?

Thanks for your help as always!
Druv


17:08:23.366 [ForkJoinPool.commonPool-worker-19] [INFO ] q.t.TensorFlowOp$TensorFlowBundle - Loaded TensorFlow bundle: /models/he_heavy_augment, (inputinput:0 [-1,-1,-1,3], output=concatenate_4/concat:0 [-1,-1,-1,33])
17:08:23.366 [ForkJoinPool.commonPool-worker-53] [INFO ] q.t.TensorFlowOp$TensorFlowBundle - Loaded TensorFlow bundle: /models/he_heavy_augment, (inputinput:0 [-1,-1,-1,3], output=concatenate_4/concat:0 [-1,-1,-1,33])
17:08:23.370 [ForkJoinPool.commonPool-worker-61] [ERROR] qupath.tensorflow.TensorFlowOp - Unable to load bundle: null
java.util.ConcurrentModificationException: null
	at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1226)
	at qupath.tensorflow.TensorFlowOp.loadBundle(TensorFlowOp.java:151)
	at qupath.tensorflow.TensorFlowOp.getBundle(TensorFlowOp.java:99)
	at qupath.tensorflow.TensorFlowOp.getChannels(TensorFlowOp.java:134)
	at qupath.opencv.ops.ImageOps$Core$SequentialMultiOp.getChannels(ImageOps.java:1534)
	at qupath.opencv.ops.ImageOps$DefaultImageDataOp.getChannels(ImageOps.java:215)
	at qupath.opencv.ops.ImageOpServer.<init>(ImageOpServer.java:64)
	at qupath.opencv.ops.ImageOps.buildServer(ImageOps.java:158)
	at qupath.tensorflow.stardist.StarDist2D.detectObjects(StarDist2D.java:606)
	at qupath.tensorflow.stardist.StarDist2D.detectObjectsImpl(StarDist2D.java:583)
	at qupath.tensorflow.stardist.StarDist2D.lambda$detectObjects$1(StarDist2D.java:522)
	at qupath.tensorflow.stardist.StarDist2D.runInPool(StarDist2D.java:548)
	at qupath.tensorflow.stardist.StarDist2D.detectObjects(StarDist2D.java:522)
	at qupath.tensorflow.stardist.StarDist2D.lambda$detectObjectsImpl$4(StarDist2D.java:565)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
17:08:23.370 [ForkJoinPool.commonPool-worker-39] [ERROR] qupath.tensorflow.TensorFlowOp - Unable to load bundle: null
java.util.ConcurrentModificationException: null

Can you share the script / minimal (not) working example? I don’t quite follow what is happening and when the error does/doesn’t occur.


Sure thing! Thanks :slight_smile:

Here is the first script (StarDist detections only), which works:

import qupath.tensorflow.stardist.StarDist2D
import qupath.lib.io.GsonTools
import static qupath.lib.gui.scripting.QPEx.*

// Specify the model directory (you will need to change this!)
def pathModel = '/models/he_heavy_augment'

def stardist = StarDist2D.builder(pathModel)
      .threshold(0.5)              // Prediction threshold
      .normalizePercentiles(1, 99) // Percentile normalization
      .pixelSize(0.5)              // Resolution for detection
      .cellExpansion(3.0)          // Approximate cells based upon nucleus expansion
      .cellConstrainScale(1.5)     // Constrain cell expansion using nucleus size
      .measureShape()              // Add shape measurements
      .measureIntensity()          // Add cell measurements (in all compartments)
      .includeProbability(true)    // Add probability as a measurement (enables later filtering)
      .build()


// Run detection for the selected objects
def imageData = getCurrentImageData()

def server = getCurrentImageData().getServer()
// get dimensions of slide
minX = 0
minY = 0
maxX = server.getWidth()
maxY = server.getHeight()
// create rectangle roi (over entire area of image) for detections to be run over
def plane = ImagePlane.getPlane(0, 0)
def roi = ROIs.createRectangleROI(minX, minY, maxX-minX, maxY-minY, plane)
def annotationROI = PathObjects.createAnnotationObject(roi)
addObject(annotationROI)
selectAnnotations();

def pathObjects = getSelectedObjects()
if (pathObjects.isEmpty()) {
    Dialogs.showErrorMessage("StarDist", "Please select a parent object!")
    return
}
stardist.detectObjects(imageData, pathObjects)

def filename = GeneralTools.getNameWithoutExtension(imageData.getServer().getMetadata().getName())
boolean prettyPrint=true
def gson = GsonTools.getInstance(prettyPrint)
def output_detections_filepath = "/data/detections/" + filename + "_stardist_detections.geojson"
def celldetections = getDetectionObjects()

new File(output_detections_filepath).withWriter('UTF-8'){
    gson.toJson(celldetections, it)
}

println 'Done!'

Here’s the updated script that uses two pixel classifiers (the first classifies tissue vs. glass, the second classifies specific tissue types) in tandem with StarDist, so that detections are only run in the relevant tissue-containing areas.

import qupath.lib.scripting.QP
import qupath.lib.gui.scripting.QPEx

import qupath.imagej.tools.IJTools
import qupath.lib.gui.images.servers.RenderedImageServer
import qupath.lib.regions.RegionRequest
import qupath.lib.io.GsonTools   // needed for the GeoJSON export below

import javax.imageio.ImageIO
import java.awt.Color
import java.awt.image.BufferedImage

import qupath.tensorflow.stardist.StarDist2D

// Instantiate these so the extension & ImageOps classes are initialized (helps when running headless)
new qupath.process.gui.ProcessingExtension()
new qupath.opencv.ops.ImageOps()

setImageType('BRIGHTFIELD_H_E');

print QP.getCurrentImageData()
print '\n'

// Set color deconvolution stains
setColorDeconvolutionStains('{"Name" : "H&E default", "Stain 1" : "Hematoxylin", "Values 1" : "0.60968 0.65246 0.4501 ", "Stain 2" : "Eosin", "Values 2" : "0.21306 0.87722 0.43022 ", "Background" : " 243 243 243 "}');

// Run tissue detection
createAnnotationsFromPixelClassifier("TISSUE-GLASS_CLASSIFIER_UpdatedThresholder-HighRes.json", 50000.0, 0.0, "SPLIT", "INCLUDE_IGNORED");
saveAnnotationMeasurements('tissue_annotation_results.tsv')
selectAnnotations();

// Save tissue and glass masks as PNGs
writeTissueMask('Tissue');
writeTissueMask('Glass');

// Select tissue to run pixel classifier
selectObjectsByClassification("Tissue");

// Run pixel classification
createAnnotationsFromPixelClassifier("pixel_classifiers/simplified_classifier_26.json", 0.0, 0.0, "INCLUDE_IGNORED");
saveAnnotationMeasurements('region_annotation_results.tsv')
selectAnnotations();


// Save region annotations as GeoJSON
def region_annotations = getAnnotationObjects()
boolean region_pretty_print = true
def region_gson = GsonTools.getInstance(region_pretty_print)
new File('region_annotation_results.geojson').withWriter('UTF-8') {
    region_gson.toJson(region_annotations, it)
}

// Save tissue masks as PNGs
writeTissueMask('Adipocytes');
writeTissueMask('Necrosis');
writeTissueMask('Stroma');
writeTissueMask('Tumor');

// Save tissue image as PNG
writeTissueImage('tissue');

// Run object detection
def pathModel = "he_heavy_augment"

def stardist = StarDist2D.builder(pathModel)
      .threshold(0.5)              // Prediction threshold
      .normalizePercentiles(1, 99) // Percentile normalization
      .pixelSize(0.5)              // Resolution for detection
      .cellExpansion(3.0)          // Approximate cells based upon nucleus expansion
      .cellConstrainScale(1.5)     // Constrain cell expansion using nucleus size
      .measureShape()              // Add shape measurements
      .measureIntensity()          // Add cell measurements (in all compartments)
      .includeProbability(true)    // Add probability as a measurement (enables later filtering)
      .build()

def imageData = getCurrentImageData()
selectObjectsByClassification('Tumor','Stroma','Necrosis','Adipocytes')
def pathObjects = getSelectedObjects()
if (pathObjects.isEmpty()) {
    Dialogs.showErrorMessage("StarDist", "Please select a parent object!")
    return
}
stardist.detectObjects(imageData, pathObjects)
selectDetections();

// Run object classification
runObjectClassifier("ANN_StardistSeg3.0CellExp1.0CellConstraint_AllFeatures_LymphClassifier.json")

// // Calculate the distance of each object to tumor or stroma
// detectionToAnnotationDistances(false)

saveDetectionMeasurements('object_detection_results.tsv')

// Save detection objects
def detections = getDetectionObjects()
// Save objects as GeoJSON
boolean detection_pretty_print = true
def detection_gson = GsonTools.getInstance(detection_pretty_print)
new File('object_detection_results.geojson').withWriter('UTF-8') {
    detection_gson.toJson(detections, it)
}



// Define function to write tissue image — here set downsample to 20 //
public void writeTissueImage(def tissue) {
  // def viewer = getCurrentViewer()
  def imageData = QPEx.getCurrentImageData()
  double downsample = 20
  def server = new RenderedImageServer.Builder(imageData)
      .downsamples(downsample)
      // .layers(new HierarchyOverlay(viewer.getImageRegionStore(), viewer.getOverlayOptions(), imageData))
      .build()
  // Save image
  def fileOutput = tissue + "_image.png"
  writeImage(server, fileOutput)
}

// Define function to write tissue masks — here set downsample to 20 //
// Modified from QuPath P. Bankhead on QuPath user board //
// Haven't included any args to the function; modify the function directly if needed //
public void writeTissueMask(def tissue) {
  
  // Extract ROI
  def shape_ann = getObjects { p -> p.getPathClass() == getPathClass(tissue) }
  def shapes = shape_ann.collect({RoiTools.getShape(it.getROI())})
  print(shapes)
  double downsample = 20

  def server = getCurrentImageData().getServer()
  int w = (server.getWidth() / downsample) as int
  int h = (server.getHeight() / downsample) as int
  print(w)
  print(h)

  // Define the mask
  def imgMask = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY)

  def g2d = imgMask.createGraphics()
  g2d.scale(1.0/downsample, 1.0/downsample)
  g2d.setColor(Color.WHITE)
  for (shape in shapes)
    g2d.fill(shape)
  g2d.dispose()
  // Save mask
  File fileOutput = new File(tissue + "_mask.png")
  ImageIO.write(imgMask, 'PNG', fileOutput)

}

Curious… just to check, have you specified "pathModel" the same way in both (in the pasted scripts they are different)?

If so, did I understand correctly that the second script gives the error in your first post – but works if you set nThreads(1) in the StarDist2D.builder and make no other changes?

If the synchronized code is running, I really don’t see how a ConcurrentModificationException would be triggered at that location.

The first thing I’d try is updating the method to

private static synchronized TensorFlowBundle loadBundle(String path) {
    logger.info("Requesting model (synchronized): {}", path);
    return cachedBundles.computeIfAbsent(path, p -> new TensorFlowBundle(p));
}

to confirm it writes to the logger as expected (I haven’t tested this, may have made a typo or two).
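
Alternatively (also untested), the synchronization could be avoided by making the cache itself thread-safe – a sketch, assuming cachedBundles can simply change type:

// (requires: import java.util.concurrent.ConcurrentHashMap;)
// Sketch only – ConcurrentHashMap.computeIfAbsent is atomic, so no 'synchronized' is needed
private static final Map<String, TensorFlowBundle> cachedBundles = new ConcurrentHashMap<>();

private static TensorFlowBundle loadBundle(String path) {
    logger.info("Requesting model: {}", path);
    return cachedBundles.computeIfAbsent(path, p -> new TensorFlowBundle(p));
}

Since ConcurrentHashMap applies the mapping function at most once per key, two threads shouldn’t be able to construct the same bundle twice.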

Yes, the path for pathModel is the same in both. I was removing some device-specific paths and accidentally changed one of them. Thanks for double-checking this!

Yep, you’re correct. The second script gives the error from the first post, but the error goes away when we set nThreads(1) in the StarDist builder (although it then runs very slowly).

Ok, I will try the updated loadBundle method and get back to you!

Thanks – appreciate it a ton! :slight_smile:
Druv


Ah, as I was fixing it, I realized I had:

public static TensorFlowBundle synchronized loadBundle

rather than

private static synchronized TensorFlowBundle loadBundle

I’m guessing this was the culprit.

Druv


Well, part of the reason I mentioned a possible typo is that I can never remember exactly where the synchronized should go… although I was thinking before or after the static :slight_smile:

(Hopefully if it’s in the wrong place, it will just not compile)
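
For the record, a quick standalone check (plain Java, dummy types – nothing QuPath-specific) of where the modifier is allowed to go:

// Modifier order before the return type is flexible; a modifier after the return type is rejected by javac.
public class ModifierPlacementDemo {

    // Both of these compile – 'synchronized' may come before or after 'static':
    private static synchronized String loadA(String path) { return "A:" + path; }
    private synchronized static String loadB(String path) { return "B:" + path; }

    // This should NOT compile – a modifier can't follow the return type:
    // private static String synchronized loadC(String path) { return "C:" + path; }

    public static void main(String[] args) {
        System.out.println(loadA("/models/he_heavy_augment"));
        System.out.println(loadB("/models/he_heavy_augment"));
    }
}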

Haha, yeah! I was pretty surprised that it compiled before as well, but I guess it was able to.

Just rebuilt using the new loadBundle method, and I’m still getting the ConcurrentModificationException from loadBundle (below).

I’m also not seeing the “Requesting model (synchronized)” message that we just added in the logs.

18:06:40.292 [main] [INFO ] qupath.ScriptCommand - Setting tile cache size to 7672.00 MB (25.0% max memory)
18:06:40.429 [main] [WARN ] q.lib.images.servers.FileFormatInfo - Unable to obtain full image format info for file:/data/sample_data/HobI20-681330526186.svs (off < 0 || len < 0 || off + len > b.length!)
18:06:40.749 [main] [WARN ] q.l.i.s.b.BioFormatsImageServer - Temp memoization directory created at /tmp/qupath-memo-15060440580722515638
18:06:40.749 [main] [WARN ] q.l.i.s.b.BioFormatsImageServer - If you want to avoid this warning, either disable Bio-Formats memoization in the preferences or specify a directory to use
18:06:40.900 [main] [WARN ] q.l.i.s.ImageServerMetadata$ImageResolutionLevel - Calculated downsample values differ for x & y for level 2: x=64.1943359375 and y=64.26585695006747 - will use value 64.23009644378374
18:06:41.117 [main] [INFO ] q.l.i.s.o.OpenslideServerBuilder - OpenSlide version 3.4.1
18:06:41.147 [main] [WARN ] q.l.i.s.ImageServerMetadata$ImageResolutionLevel - Calculated downsample values differ for x & y for level 2: x=64.1943359375 and y=64.26585695006747 - will use value 64.23009644378374
18:06:41.508 [main] [INFO ] qupath.lib.scripting.QP - Initializing type adapters
18:06:57.299 [main] [WARN ] qupath.lib.roi.GeometryTools - Geometries must all be of the same type when converting to a ROI! Converted GeometryCollection to MultiPolygon.
18:06:58.487 [main] [WARN ] qupath.lib.roi.GeometryTools - Geometries must all be of the same type when converting to a ROI! Converted GeometryCollection to MultiPolygon.
18:06:58.515 [main] [WARN ] qupath.lib.roi.GeometryTools - Geometries must all be of the same type when converting to a ROI! Converted GeometryCollection to MultiPolygon.
18:06:58.557 [main] [WARN ] qupath.lib.roi.GeometryTools - Geometries must all be of the same type when converting to a ROI! Converted GeometryCollection to MultiPolygon.
18:06:58.775 [main] [WARN ] qupath.lib.roi.GeometryTools - Geometries must all be of the same type when converting to a ROI! Converted GeometryCollection to MultiPolygon.
18:06:58.775 [main] [WARN ] qupath.lib.roi.GeometryTools - Geometries must all be of the same type when converting to a ROI! Converted GeometryCollection to Polygon.
18:06:58.888 [main] [WARN ] qupath.lib.roi.GeometryTools - Geometries must all be of the same type when converting to a ROI! Converted GeometryCollection to MultiPolygon.
18:07:03.644 [main] [INFO ] q.t.TensorFlowOp$TensorFlowBundle - Loaded TensorFlow bundle: /models/he_heavy_augment, (inputinput:0 [-1,-1,-1,3], output=concatenate_4/concat:0 [-1,-1,-1,33])
18:07:03.644 [ForkJoinPool.commonPool-worker-17] [INFO ] q.t.TensorFlowOp$TensorFlowBundle - Loaded TensorFlow bundle: /models/he_heavy_augment, (inputinput:0 [-1,-1,-1,3], output=concatenate_4/concat:0 [-1,-1,-1,33])
18:07:03.644 [ForkJoinPool.commonPool-worker-37] [INFO ] q.t.TensorFlowOp$TensorFlowBundle - Loaded TensorFlow bundle: /models/he_heavy_augment, (inputinput:0 [-1,-1,-1,3], output=concatenate_4/concat:0 [-1,-1,-1,33])
18:07:03.644 [ForkJoinPool.commonPool-worker-53] [INFO ] q.t.TensorFlowOp$TensorFlowBundle - Loaded TensorFlow bundle: /models/he_heavy_augment, (inputinput:0 [-1,-1,-1,3], output=concatenate_4/concat:0 [-1,-1,-1,33])
18:07:03.644 [ForkJoinPool.commonPool-worker-45] [INFO ] q.t.TensorFlowOp$TensorFlowBundle - Loaded TensorFlow bundle: /models/he_heavy_augment, (inputinput:0 [-1,-1,-1,3], output=concatenate_4/concat:0 [-1,-1,-1,33])
18:07:03.648 [ForkJoinPool.commonPool-worker-17] [ERROR] qupath.tensorflow.TensorFlowOp - Unable to load bundle: null
java.util.ConcurrentModificationException: null
	at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1226)
	at qupath.tensorflow.TensorFlowOp.loadBundle(TensorFlowOp.java:151)
	at qupath.tensorflow.TensorFlowOp.getBundle(TensorFlowOp.java:99)
	at qupath.tensorflow.TensorFlowOp.getChannels(TensorFlowOp.java:134)
	at qupath.opencv.ops.ImageOps$Core$SequentialMultiOp.getChannels(ImageOps.java:1534)
	at qupath.opencv.ops.ImageOps$DefaultImageDataOp.getChannels(ImageOps.java:215)
	at qupath.opencv.ops.ImageOpServer.<init>(ImageOpServer.java:64)
	at qupath.opencv.ops.ImageOps.buildServer(ImageOps.java:158)
	at qupath.tensorflow.stardist.StarDist2D.detectObjects(StarDist2D.java:606)
	at qupath.tensorflow.stardist.StarDist2D.detectObjectsImpl(StarDist2D.java:583)
	at qupath.tensorflow.stardist.StarDist2D.lambda$detectObjects$1(StarDist2D.java:522)
	at qupath.tensorflow.stardist.StarDist2D.runInPool(StarDist2D.java:548)
	at qupath.tensorflow.stardist.StarDist2D.detectObjects(StarDist2D.java:522)
	at qupath.tensorflow.stardist.StarDist2D.lambda$detectObjectsImpl$4(StarDist2D.java:565)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

Hmmm, it looks like it isn’t using the recompiled version. If there’s no message in the logs – and the error is still at line 151 (rather than 152) – then the ‘original’ version is still what’s being called.


You’re totally right – thanks for the catch. I’m running this through Docker, and although I changed the file in the build, I had compiled before the file change, so it wasn’t using the updated method.

I reran it after recompiling, and it’s working now! :slight_smile: Thanks a ton for your help – we really appreciate all the time you’ve spent helping us work with QuPath.

Thanks, and take care!
Druv
