Transferring k-means clustering data back into QuPath

Hi Pete and Melvin!

I’ve done some k-means clustering analysis in Python with scikit-learn, and I’m now considering the best way to display that data back in QuPath (M9).

The two options that I think are viable would be:

  1. Draw alpha shape of clusters as annotations;
  2. Assign QuPath detections according to clusters.

For option 1, I’ve been trying to figure out the format of coordinates accepted by PolygonROI. So far, the raw format of the alpha shape vertex coordinates is {(x, y), (x, y), ...}. Obviously this can be transformed into whatever PolygonROI accepts, but I cannot find what that is!

I’ve tried:

  • {(x, y), (x, y), ...}
  • (x, y, x, y, x, y, ...)
  • (x, y), (x, y), ...
  • [(x, y), (x, y), ...]
  • [x,y,x,y,...]

The last one got me the furthest but it returns an error:
ERROR: ClassCastException at line 8: class java.lang.Integer cannot be cast to class qupath.lib.geom.Point2 (java.lang.Integer is in module java.base of loader 'bootstrap'; qupath.lib.geom.Point2 is in unnamed module of loader 'app')

The script looks like this:

import qupath.lib.roi.PolygonROI
import qupath.lib.objects.PathAnnotationObject
import qupath.lib.geom.Point2

def imageData = getCurrentImageData()

points = [4,9,178,173,240,242,248,234,280,281,48,56,262,272,204,178,207,204,22,21,270,260,66,64,59,61,173,174,103,97,199,202,0,2,221,217,18,16,187,212,107,103,232,253,40,22,210,213,211,187,10,17,161,177,258,247,167,147,202,205,236,225,272,285,247,240,147,143,284,286,225,218,213,219,32,36,3,0,220,224,21,24,72,90,286,288,206,197,2,1,162,166,165,162,214,194,33,18,68,67,36,44,224,223,24,27,228,232,217,211,287,275,253,270,149,161,242,236,65,66,245,262,17,26,168,167,194,186,201,210,275,278,42,46,227,228,205,207,279,284,64,78,54,40,234,245,283,268,268,258,61,63,1,8,204,215,95,79,46,52,177,206,13,3,230,227,223,221,105,95,9,5,85,134,143,105,96,149,16,19,281,283,90,96,98,75,278,280,174,189,67,72,19,13,219,214,5,4,282,279,197,201,260,248,134,118,75,68,78,85,97,98,212,204,189,199,56,54,52,41,186,160,285,282,8,10,27,32,215,220,41,34,131,107,44,48,26,42,160,131,218,230,118,152,166,168,159,165,152,159,34,33,79,65,63,59,288,287]
def roi = new PolygonROI(points)
def newPoly = new PathAnnotationObject(roi)
imageData.getHierarchy().addPathObject(newPoly,false)

For option 2, I’m not sure how useful going down this route will be outside of just for visual representation. If I’m not mistaken, manipulating detections can be a bit of a pain compared to doing the same with annotations. However, visualisation using detections (with different colours) might be better in the case when multiple clusters overlap spatially rather than having messy overlapping annotations. This is open for discussion of course.

I’ll be very grateful for suggestions and discussions about any of this!

Many thanks,
Yau

Have you considered giving each of the objects a label/count, and using lists of labels to reclassify objects within QuPath? You could read in a text file with one column being the number of the object and the second being its class.
Cycle through all objects in a Groovy script and check their number, then assign the appropriate class.

That is a great idea! It is similar to what I wanted to do with the detections, but better. I guess I meant to say “objects” instead of “detections”. I’ll play around with this and see how far I get…

Thanks a lot!

To create a PolygonROI, you can use the static createPolygonROI method in the ROIs class.

There is also some information about creating ROIs here: https://qupath.readthedocs.io/en/latest/docs/scripting/overview.html#creating-rois

@Research_Associate after giving your idea some thought, I wondered how QuPath would match an assigned number to a specific object. I can append individual numbers as an ID for each object, or use their index number, but how would QuPath know that object 0 is the top-left object for me to reclassify?

@petebankhead thanks for the reply, but I’m afraid I do not fully understand createPolygonROI(double[] x, double[] y, ImagePlane plane). Does it mean that I should use it like so:

createPolygonROI([x1, x2, x3, ...], [y1, y2, y3, ...], plane)

What does ImagePlane mean? EDIT: Never mind, I read your second link and I think it should only apply if I have a z-stack.
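
For reference, splitting a list of (x, y) vertices into the two parallel lists that createPolygonROI(x, y, plane) takes can be done on the Python side before anything is pasted into QuPath. This is only a minimal sketch: the vertex values and the helper name split_vertices are made up for illustration, and it assumes the vertices are already ordered along the polygon boundary (a Python set such as {(x, y), ...} has no order, so it would need ordering first).

```python
def split_vertices(vertices):
    """Split ordered (x, y) vertex pairs into parallel x and y lists."""
    xs = [float(x) for x, _ in vertices]
    ys = [float(y) for _, y in vertices]
    return xs, ys


# Hypothetical alpha shape vertices, ordered along the polygon boundary.
vertices = [(4, 9), (50, 9), (50, 158), (4, 158)]
xs, ys = split_vertices(vertices)

# Print both lists so they can be copied into a script.
print("x =", xs)
print("y =", ys)
```

The printed lists use square brackets, so they can be copied into a Groovy script, where they would probably still need converting with something like `as double[]`.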

ImagePlane: you will always need one of these, even for single-plane images; the values (z and t) will just be 0 and 0.

I am not directly accessing the forum right now but I am fairly certain that I posted a loop to create a measurement that increments through all cells or objects and adds a number a while back. Something like


i = 0
getCellObjects().each {
    it.getMeasurementList().putMeasurement("count", i)
    i++
}

Though that code likely won’t be formatted correctly, you might be able to use parts of it to search for the original post.

Also, the whole method assumes that you keep the “count” column consistent within the object data during the clustering, however that is being done. If you are making a mask as the output, I am not sure this would work. I was imagining tabular output from an R script or similar.

Yes, this can be done. I’m using Python and I can append the cluster IDs with pandas without changing the order of objects.

I would prefer to reclassify as you suggested but I am keeping options open in case I want to draw the polygon annotations.

Is it this one, just without subcells?

cells = getCellObjects()
count = cells.size()
for (i=0; i<count; i++){
    current = i+1
    cells[i].getMeasurementList().putMeasurement("ID", current)
    subcells = cells[i].getChildObjects()
    subcells.each{it.getMeasurementList().putMeasurement("ID", current)}
}

In case you want to outline areas of similarly classified objects, you may want to search for another post on Hotspots. That is more of an area/XY coordinate based clustering, but it might be useful to you if I am understanding correctly.

Ah. Discussion and Script: What is a hotspot?

@Research_Associate
Thanks for the discussion so far, but it still isn’t clear to me how to get QuPath to link each object to its ID.

Your script

cells = getCellObjects()
count = cells.size()
for (i=0; i<count; i++){
    current = i+1
    cells[i].getMeasurementList().putMeasurement("ID", current)
}

works well to add unique identifiers to each cell object but on re-import of the table after cluster analysis, how do I get QuPath to associate ID with the object?

I imagine it would be along the lines of:

for i in ID
    find object with measurement i in column ID      <-- what would be the script here?
    it.setPathClass(getPathClass('cluster_no'))

Once again, many thanks for your help!

I would sort the Python table by cell ID, then getCellObjects again, cycle through them (look up reading text files using Groovy), and use
measurement(it, "cellID")

to look up the label to pass to
it.setPathClass(getPathClass("labelFromPythonFile"))

Since the cellID gives you the position in the sorted list/map from Python, you only have to cycle through all cells once. At least that is my rough idea. I am not sure of the format of the Python file or exactly how to read it. If you can writeCSV and readCSV or something similar, you should be set… I think.
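
The "sort by cell ID, then write a CSV" step might look roughly like this in Python. It is only a sketch using the standard library: the column names "ID" and "cluster" and the example rows are assumptions, and the in-memory buffer stands in for a real output file.

```python
import csv
import io

# Hypothetical rows of (cell ID, cluster label), in arbitrary order.
rows = [(3, 1), (1, 0), (2, -1)]

# Sort by cell ID so the row order matches the IDs assigned in QuPath.
rows_sorted = sorted(rows, key=lambda r: r[0])

# Write to an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ID", "cluster"])
writer.writerows(rows_sorted)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the ID as an explicit column means the Groovy side can look cells up by ID even if the row order changes later.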

Hi @ym.lim,

Could you please give us a sample of the CSV produced by your Python code?
If you have the coordinates of each object with their cluster assignment, as you said you could create either:

  • a PolygonROI according to the alpha shape of your cluster. Then I would use something like:
import qupath.lib.roi.PolygonROI
import qupath.lib.objects.PathAnnotationObject
import qupath.lib.geom.Point2

def imageData = getCurrentImageData()

// These would be your lists of coordinates.
// Note that the order of the points matters: a different order might generate a different polygon.
listOfCoordinatesCluster1 = [[4, 9], [50, 9], [50, 158], [4, 158]]; 
listOfCoordinatesCluster2 = [[14, 19], [150, 19], [150, 1158], [14, 1158]];
listOfCoordinatesCluster3 = [[24, 29], [250, 29], [250, 2158], [24, 2158]];

clusters = [listOfCoordinatesCluster1, listOfCoordinatesCluster2, listOfCoordinatesCluster3];

for (int[][] cluster: clusters){
    List<Point2> pointList = new ArrayList<>();
    for (int[] coordinates: cluster) {
        pointList.add(new Point2(coordinates[0], coordinates[1]));
    }

    def roi = new PolygonROI(pointList);
    def newPoly = new PathAnnotationObject(roi);
    imageData.getHierarchy().addPathObject(newPoly, true);
}
  • A detection per object, each assigned a specific PathClass (cluster 1, …, cluster n). I could give sample code, but I think it’s better to know first what your data looks like :)

I think you discussed both options above? But it would help to know what objects/coordinates you have (e.g. if your objects are “points”, then for the second option you could create a new Detection Object for each with a simple circle by giving it the coordinates of the centroid, then assign it to the right cluster class …)

Ah, yeah, that was another way of writing it.

Though reading Melvin’s post makes me wonder about your workflow. My advice was given under the impression that you were clustering QuPath objects that you were exporting, but if you do not already have objects then my previous advice isn’t really as helpful.

Thanks for the response, @melvingelbard!

I’ve attached a sample of what my data would look like.
dataset_test_hdbscan.csv (5.3 KB)

It will basically be the export of cell detection measurements with the cluster IDs appended at the end as new columns.

  • “hdbscan_colour” for clustering based on colour; “hdbscan_spatial” for sub-clustering each colour cluster by its spatial location (x, y coordinates)
  • “-1” indicates Noise; “≥0” are for Cluster 0, Cluster 1, etc.
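
To make the label convention concrete, here is a small Python sketch that turns these integer labels into display names. The helper name cluster_class_name and the example labels are made up for illustration; it simply treats anything below zero as noise and everything from zero upwards as a numbered cluster.

```python
def cluster_class_name(label):
    """Map an HDBSCAN-style integer label to a display name."""
    if label < 0:
        return "Noise"
    return f"Cluster {label}"


# Hypothetical labels as they might appear in the hdbscan_colour column.
labels = [-1, 0, 1, 0]
names = [cluster_class_name(value) for value in labels]
print(names)
```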

As in my discussion with @Research_Associate, it is possible for me to include “Cell ID” if that will help in linking the imported data back into QuPath to specify which cell object to reclassify into the corresponding cluster IDs.

My objects would be cells for now, so I would probably be looking to assign them to a Cluster class. Is it possible to assign objects to multiple classes? For example: T cell, Colour Cluster 0, Cell Cluster 0.

My workflow would be:

Cell detection in QuPath > Export detection measurements as txt/csv in QuPath > Import txt/csv into Python and perform cluster analysis > Export csv with appended cluster IDs using Python > Import csv back into QuPath > Assign Cluster IDs to corresponding cell objects.
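
The Python half of that workflow (read the export, append a cluster column, write a CSV) could be sketched as follows using only the standard library. Everything concrete here is an assumption for illustration: the tab-delimited sample text, the "Centroid X" / "Centroid Y" column names, and the placeholder labels, which in practice would come from the clustering step.

```python
import csv
import io

# Stand-in for a tab-delimited QuPath detection measurement export.
exported = (
    "ID\tCentroid X\tCentroid Y\n"
    "1\t10.0\t20.0\n"
    "2\t11.5\t19.0\n"
    "3\t90.0\t75.0\n"
)

# Placeholder labels, one per exported row and in the same order.
colour_labels = [0, 0, -1]

reader = csv.DictReader(io.StringIO(exported), delimiter="\t")
rows = list(reader)
if len(rows) != len(colour_labels):
    raise ValueError("one label per exported row is required")

# Append the label as a new column without reordering the rows.
for row, label in zip(rows, colour_labels):
    row["hdbscan_colour"] = label

# Write a comma-separated file (here an in-memory buffer) with the new column.
out = io.StringIO()
fieldnames = reader.fieldnames + ["hdbscan_colour"]
writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
result = out.getvalue()
print(result)
```

Because the ID column is carried through unchanged, the import step in QuPath can match rows to cells by ID rather than relying on row order.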

If your clusters are numbers, you could create a new measurement for those just like the CellID. Then you can visualize that through the Measurement maps.

What @Research_Associate suggested is probably the easiest and cleanest option actually.

Though if you want something that iterates through your cells in QuPath and assigns its class according to the CSV, you can do something like this (assuming that you have exported the ID column and kept it in your CSV somewhere):

cells = getCellObjects()
nCells = cells.size()

// Get location of csv
def file = getQuPath().getDialogHelper().promptForFile(null)
// Create BufferedReader
def csvReader = new BufferedReader(new FileReader(file));

// first row (header)
row = csvReader.readLine() 

def map = [:]; // Empty map
while ((row = csvReader.readLine()) != null) {
    def rowContent = row.split(",");
    int id = rowContent[0] as int;         // !!ID Col!!
    int cluster = rowContent[9] as int;    // Cluster Col
    int subcluster = rowContent[10] as int;// SubCluster Col
    map[id] = [cluster, subcluster];
}

for (i = 0; i < nCells; i++){
    def measurementList = cells[i].getMeasurementList();
    int ID = measurementList.getMeasurementValue("ID") as int;
    int[] cluster = map[ID];
    cells[i].setPathClass(getPathClass("Cluster " + cluster[0] + " (" + cluster[1]+ ")"));
}

print "Done!"

It works for me, so I assume it should be fine for you as well. Don’t forget to add an ID column (in my example I put int id = rowContent[0], but you might want to change that)! :)

Thanks a bunch Melvin!

When specifying columns, is it restricted by column number (rowContent[9]) or can I also indicate the column header? This is because the number of columns can vary depending on the number of colour channels or other measurements done in QuPath.

An example would be similar to Python pandas where I can do df["hdbscan_colour"] to indicate that column regardless of the number/order of other columns in the dataframe.

No worries!

This should work then:

cells = getCellObjects()
nCells = cells.size()

// Get location of csv
def file = getQuPath().getDialogHelper().promptForFile(null)
// Create BufferedReader
def csvReader = new BufferedReader(new FileReader(file));

// Get the column indices
row = csvReader.readLine()
def columnNames = [:]; // Empty map
def cols = row.split(",");
for (i = 0; i < cols.size(); i++){
    columnNames[cols[i]] = i;
}

def map = [:]; // Empty map
while ((row = csvReader.readLine()) != null) {
    def rowContent = row.split(",");
    int id = rowContent[columnNames["id"]] as int;         // !!ID Col!!
    int cluster = rowContent[columnNames["hdbscan_colour"]] as int;    // Cluster Col
    int subcluster = rowContent[columnNames["hdbscan_spatial"]] as int;// SubCluster Col
    map[id] = [cluster, subcluster];
}

for (i = 0; i < nCells; i++){
    def measurementList = cells[i].getMeasurementList();
    int ID = measurementList.getMeasurementValue("ID") as int;
    int[] cluster = map[ID];
    cells[i].setPathClass(getPathClass("Cluster " + cluster[0] + " (" + cluster[1]+ ")"));
}

print "Done!"