Cluster analysis

I am using QuPath for the characterization of the immune infiltrate in breast cancer tumors. I have already performed a quantitative analysis, but I am also interested in cluster/hotspot analysis.

I already tried the Delaunay cluster features, but I can’t get my head around them. Can anyone help me with this? I want to detect hotspots of DAB staining.

Can you define more precisely (ideally with images) what you mean by ‘detect hotspots’?

The Delaunay command may not be very useful in this case. I have only used it myself to identify the immediate neighbors of cells with a particular classification.

Hi Pete,

I would like to detect (count + localize) areas with a high concentration of positively stained cells. I have marked some spots in … in the image.

Thanks, that helps. However, to be automated there would need to be a way to define precisely and unambiguously how the ellipses are determined.

For example, I can think of how to write a script that identifies the ‘hottest’ circular hotspot in the entire image (i.e. the one with the highest positive percentage) after defining the following parameters:

  • Diameter of the circle
  • Minimum number of cells inside the circle

I have written a similar script in the past, and occasionally think of putting it into the software but haven’t got around to it yet (or needed it myself). It works by effectively calculating the positive percentage and cell count for every* possible circle of the specified diameter in the image and then taking the circle that has the highest % and also enough cells.
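
A minimal sketch of that idea in Python, not the original script (which is not available in this thread). It assumes the cell centroids and their positive/negative status have already been exported, and every name here (`hottest_circle`, `cells`, `step`) is hypothetical. Candidate circle centers are sampled on a regular grid with spacing `step`, and ties in percentage are broken in favor of the circle containing more cells.

```python
def hottest_circle(cells, diameter, min_cells, step):
    """Find the circle of fixed diameter with the highest positive percentage.

    cells: list of (x, y, is_positive) centroids, all in the same units
           as diameter and step (assumed export format, e.g. micrometers).
    Returns (center_x, center_y, positive_percent, cell_count), or None if
    no candidate circle contains at least min_cells cells.
    """
    if not cells:
        return None
    radius_sq = (diameter / 2.0) ** 2
    min_x = min(c[0] for c in cells)
    max_x = max(c[0] for c in cells)
    min_y = min(c[1] for c in cells)
    max_y = max(c[1] for c in cells)
    nx = int((max_x - min_x) // step) + 1
    ny = int((max_y - min_y) // step) + 1

    best = None
    for i in range(nx):
        for j in range(ny):
            cx = min_x + i * step
            cy = min_y + j * step
            count = 0
            positive = 0
            # Brute force: test every cell against every candidate circle.
            for x, y, is_positive in cells:
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius_sq:
                    count += 1
                    if is_positive:
                        positive += 1
            if count < min_cells:
                continue
            percent = 100.0 * positive / count
            # Prefer higher percentage, then more cells.
            if best is None or (percent, count) > (best[2], best[3]):
                best = (cx, cy, percent, count)
    return best
```

The brute-force loop is slow for a whole slide, so in practice the grid spacing would probably be kept coarse, or the cells indexed spatially first.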

This could be extended to give e.g. the top 5 non-overlapping hotspots, although it’s not guaranteed that this would match what you’d choose, mostly because the circle size is fixed.

I find it more difficult to think how to unambiguously define a way to identify hotspots of different sizes (and perhaps not all circular), or how these should be interpreted afterwards.

Do you have any ideas on this? Or more information about how you would summarize and interpret the results from the calculated hotspots, however they are defined?

*-In practice it is limited to a resolution of perhaps 10-20 μm, so it might not be exactly the hottest possible hotspot… but it should be very close.

The idea of the 5 non-overlapping hotspots is indeed not what I had in mind at first, but I think it may also be very interesting to test.

I am just looking into the hotspot analysis and I am not sure what the possibilities are, but ideally I would like to know:

  • The location of groups/hotspots of positively stained immune cells in the tumor (invasive front or tumor center)
  • The area of these hotspots
  • The density of positive cells (number of positive cells/mm²)

I also came across some papers that use the Getis–Ord hotspot analysis (e.g. PMID: 25720324). It identifies regions with a high prevalence of a certain type of cell by using the classification and localization of the cells.
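
For orientation, here is a rough sketch of the Getis–Ord Gi* statistic in Python. It is not taken from the cited paper: it assumes the positive cells have first been counted per square tile of a regular grid, and it uses binary weights over a square neighborhood that includes the tile itself. The function name and the `radius` parameter are hypothetical. Each tile gets a z-score comparing its local sum with what the global mean would predict.

```python
import math


def getis_ord_gi_star(counts, radius=1):
    """Gi* z-score for every tile of a 2D grid of counts.

    counts: 2D list (rows x cols) of positive-cell counts per tile.
    radius: neighborhood half-width in tiles (assumed binary weights,
            the tile itself is included).
    """
    rows, cols = len(counts), len(counts[0])
    values = [v for row in counts for v in row]
    n = len(values)
    mean = sum(values) / n
    variance = max(0.0, sum(v * v for v in values) / n - mean * mean)
    s = math.sqrt(variance)

    z = [[0.0] * cols for _ in range(rows)]
    if n < 2 or s == 0:
        return z  # no variation, so no tile stands out

    for r in range(rows):
        for c in range(cols):
            local_sum = 0.0
            w = 0  # with binary weights, sum(w) == sum(w^2) == w
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    local_sum += counts[rr][cc]
                    w += 1
            denom = s * math.sqrt((n * w - w * w) / (n - 1))
            if denom > 0:
                z[r][c] = (local_sum - mean * w) / denom
    return z
```

Tiles with a large positive z-score (above roughly 1.96 for a two-sided 5% level, if the normal approximation is accepted) would be the hotspot candidates. The tile size and neighborhood still have to be chosen by the user.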

Have you considered using the pixel classifier or SLICs to find the areas as annotations (due to some smoothed intensity of DAB)? You could then use the centroids of the annotation, or run cell detection within each annotation, or whatever you wanted after the fact.

Quick pixel classifier on your image. Those areas could each be converted into annotations, and cell counts performed after.

Not going to do the cell detections on this image, and you can see that the green circle mucked things up, but it gives you an idea.
Doing it with SLICs would look something more like this:

But should get you a similar result.

And as far as cluster features, I imagine they could be used something like this, to borrow the LuCa image again (results are the same as the final pictures from here):
And if I add cluster measurements, I can start looking at clusters over a certain size. You will almost certainly have VERY large negative clusters if you do this with all of your cells! Clusters continue to grow as long as centroid to centroid distance is within the entered value.
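
To illustrate why clusters keep growing, here is a small Python sketch of distance-based linking. It is a simplification rather than QuPath's Delaunay implementation: it checks every pair of centroids instead of only Delaunay edges, and the function name and inputs are hypothetical. Two cells end up in the same cluster whenever a chain of neighbors, each within `max_dist` of the next, connects them.

```python
import math


def cluster_by_distance(points, max_dist):
    """Group points into clusters linked by centroid distance.

    points: list of (x, y) centroids.
    Returns a list of clusters (lists of point indices), largest first.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)
```

With points at (0, 0), (5, 0) and (10, 0) and a `max_dist` of 6, all three land in one cluster even though the two end points are 10 apart, which is the chaining behavior that produces very large clusters in dense tissue.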

I think the precise, unambiguous definition of what a hotspot is remains difficult. @Research_Associate has some good suggestions, but I don’t know what is suitable here.

Considering some of the screenshots above, it’s clear that if the hotspot has an arbitrary shape that ‘hugs’ the dense regions, you’ll get a much higher percentage or density of positive cells than if you choose a circle or ellipse that also contains a substantial proportion of tumor cells. In general, I’d expect that measuring larger areas results in lower percentages (or densities).

For the Getis–Ord hotspot analysis, it looks like H&E images were used and cells were classified into different types. It might be possible to train a classifier here, but it’s hard with IHC with just hematoxylin and DAB whenever the brown staining can influence the classification decision for neighboring cells more than it really should.

Currently, QuPath gives you the tools to implement hotspot analysis (albeit using scripting), but it doesn’t have one built in. I certainly plan to change that one day, but figuring out what to implement remains a substantial research topic…

Going back to the original idea, I think there are two main ‘simple’ metrics you can get:

  • density of positive cells (either expressed as a positive %, or as a number per mm²)
  • hotspot area

I’d say that you can’t meaningfully get both: either you need to keep the density fixed and measure area, or keep the area fixed and measure density.

If you allow both to change, then your density can pretty much always be made higher by decreasing the area, because then you can choose only the very hottest bit of any ‘hotspot’. Taken to an extreme, you’d get a lot of hotspots, each with 100% positive cells, if you were to treat each individual positive cell as a distinct (very small) hotspot.

Previously, I suggested it would be possible to get the hotspots with the highest density for a given area.

You could also go the other way: get the hotspots with the largest area having a minimum fixed density. Basically, you’d need to calculate the local density for every pixel in the image* and identify hotspots as being clusters of pixels where this density exceeds a threshold. Now the area of that hotspot might be meaningful, but the density is not so meaningful because it depends upon whatever density threshold you chose.
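
A rough Python sketch of this second approach, under several assumptions of mine: positive-cell centroids are given in μm, local density is the count inside a circular window around each pixel center, and hotspots are 4-connected groups of pixels above the threshold. All names and parameters are hypothetical.

```python
import math
from collections import deque


def density_hotspots(positive_xy, width_um, height_um, pixel_um,
                     radius_um, min_density_mm2):
    """Return hotspot areas in mm^2, largest first.

    positive_xy: list of (x, y) centroids of positive cells, in micrometers.
    A pixel belongs to a hotspot if the density of positive cells within
    radius_um of its center is at least min_density_mm2 (cells per mm^2).
    """
    cols = int(math.ceil(width_um / pixel_um))
    rows = int(math.ceil(height_um / pixel_um))
    window_mm2 = math.pi * radius_um ** 2 / 1e6
    radius_sq = radius_um ** 2

    # Step 1: threshold the local density at every pixel center.
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cx = (c + 0.5) * pixel_um
            cy = (r + 0.5) * pixel_um
            n = sum(1 for x, y in positive_xy
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius_sq)
            mask[r][c] = n / window_mm2 >= min_density_mm2

    # Step 2: group above-threshold pixels into connected hotspots.
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            while queue:
                pr, pc = queue.popleft()
                size += 1
                for nr, nc in ((pr + 1, pc), (pr - 1, pc),
                               (pr, pc + 1), (pr, pc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            areas.append(size * pixel_um ** 2 / 1e6)
    return sorted(areas, reverse=True)
```

The resulting areas depend directly on `radius_um` and `min_density_mm2`, so those two values would need to be fixed across all images being compared.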

I don’t know enough about the underlying hypothesis or what exactly you are working on to be confident this is relevant, and it doesn’t pay any attention to the type of any other cell (e.g. whether the potential hotspot is anywhere close to the tumor).

Anyway, it’s something to consider. I’m sticking with my view that the definition of a hotspot is troublesome and non-obvious. One day I’ll try to write the code to implement both approaches, but for now other work is calling…

*-At some manageable resolution… perhaps 10μm per pixel.

Hi, thank you very much.
I would really like to try this. I attempted to recreate what you did here, but I failed.
Could you help me out? Which steps should I follow?
Thanks in advance!

We are studying age-related changes of the immune system and their impact on tumor immunity. We have already looked at the % sTILs and are now characterizing the immune infiltrate in the tumor with the help of QuPath. I have quantified the positive and negative lymphocytes in three different regions using the “detection classifier” function. We have already calculated the density of positive cells, but we are still wondering if we got the most out of our data.

That is why we are now looking into the hotspot analysis. We are very curious to know if there is a difference between our younger and older patients when it comes to hotspots of immune cells (where they are located, how big they are and how dense they are).
But indeed it is hard to define the hotspot, and I agree that we should choose to keep either the density or the area fixed. I was wondering if the script that you mentioned earlier (the one that finds the circle of a specified diameter with the highest % positive cells) can be found somewhere, or if you are willing to share it?

Anyway, I would already like to thank you for the help. I will take all your suggestions and advice with me and will start by looking into the suggestions of @Research_Associate.

Which image/images specifically are you interested in? The pixel classifier can only be run in 0.2.0 (Classify menu I think, cannot be run as batch), while the SLICs could be run in any version, but do take several steps. You would need to first have an annotation to create the SLICs within, just like cells.
Create the SLICs, and here you may want to play with the settings a bit, I can’t tell what good ones will be without testing on an original image.
And then use Add Intensity Features in the Calculate Features menu shown above, which will create color measurements for each SLIC tile that can be used in a classifier, in the same way that DAB stain can be used in cells.

Also in version 0.2.0 there is a Specify Annotation command that can be used to create an object of a particular shape, within the Objects menu. You could duplicate that a bunch of times.
Pete has a script here to create a rectangle, which you could then possibly edit to create a new EllipseROI instead.

I would really like to try the pixel classifier and the SLICs.
For the pixel classifier, I think this one will be the easiest and fastest to work with? But I am just not able to recreate what you did on the same picture. I have no clue what to use as settings and what steps to follow.

For the SLICs, I am able to create them, but afterwards, when I try to add the intensity features, nothing shows up in my measurement map. I started with an image where the tumor border was defined with an annotation, ran the SLICs and got this:

Then I tried to add the intensity features:

But when I open my measurement maps, nothing shows up?

I think I may be skipping a step? I am sorry for the inconvenience!

I think you added the intensity measurements to the Annotation itself (which can be useful at times as an overall measure of the tissue staining), but not to the detections. Measurement maps will only show the detection measurements.

Also it may be quicker to allow larger holes in the tissue annotation, so that you don’t have to filter out whitespace later. Though this will make the tissue detection itself run slower.

The pixel classifier is probably best described here, see the YouTube video.

I managed to work with it, thanks for all your help!

I’m afraid not, but it’s on my todo list to re-write it if I don’t find it eventually. Might take me a while though…


I have a multiplex IHC image with 3 markers + DAPI. How could I identify clusters based on the expression of one fluorophore across the whole tissue with QuPath?

Currently the clusters can be set to within a class (assuming you already have classes set, more on that here), but you do get all of the classes. If you want to limit your data output, I would recommend something like deleting all cells that are not of an interesting class, and then adding the cluster features.

So that would be:

  • Detect tissue
  • Detect cells
  • Classify cells based on a channel
  • Delete all negative cell objects
  • Run Delaunay cluster analysis
  • Export results

If you want something other than a list of tightly connected positive cells, you need to elaborate on what a cluster is to you. As you can see above, there are varying definitions for clusters/hotspots/etc depending on context and study.