Hi all, I'm working on some cluster analysis scripts. One of them calculates local cell densities per class and between classes, but it is painfully slow right now, even for data sets as small as 6000 cells.
For the moment, Nearest Neighbors by class is working, though it can also be quite slow for large data sets (it uses the naive method, so it is O(n^2) in the number of cells). I had initially been using the Delaunay Cluster min distance for this sort of thing within a class, before I realized that it was not always accurate and frequently returned NaN. In some cases, other clusters will “interrupt” what would normally be the nearest measurement, leading to higher than expected values.
Even with a fairly large distance for the Delaunay clusters here, the top circled red cell has no neighbors, and therefore the min distance to nearest other red cell is NaN.
Thus, this script.
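For anyone who wants to see the idea outside of QuPath, here is a minimal Python sketch of the naive, all-pairs approach described above. It is not the script from this post: the `(x, y, class_name)` tuple layout and the function name `nearest_by_class` are assumptions made for the illustration. Every cell is compared against every other cell, so NaN only shows up when no other cell of that class exists at all.

```python
import math

def nearest_by_class(cells):
    """Brute-force nearest neighbor distances, one entry per class.

    cells: list of (x, y, class_name) tuples (hypothetical layout).
    Returns a list with one dict per cell, mapping each class name to the
    distance from that cell to the nearest OTHER cell of that class.
    """
    classes = sorted({name for _, _, name in cells})
    results = []
    for i, (xi, yi, _) in enumerate(cells):
        nearest = {name: math.inf for name in classes}
        # Compare against every other cell: this inner loop is the O(n^2) cost.
        for j, (xj, yj, name_j) in enumerate(cells):
            if i == j:
                continue  # a cell is not its own neighbor
            d = math.hypot(xi - xj, yi - yj)
            if d < nearest[name_j]:
                nearest[name_j] = d
        # NaN only when the class has no other member anywhere in the data.
        results.append({name: (d if d != math.inf else math.nan)
                        for name, d in nearest.items()})
    return results

cells = [(0, 0, "red"), (10, 0, "red"), (3, 4, "blue")]
results = nearest_by_class(cells)
print(results[0])  # distances from the first red cell to each class
```

Unlike the Delaunay cluster measurement, nothing can "interrupt" the search here, which is exactly why it is slow: the work grows with the square of the cell count.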
The script itself is relatively straightforward, though there are two important notes:
- Annotation-level measurements are turned off by default. Uncomment the bottom half of the script to activate them. My test case had 27 actual classes from 6 base classes, leading to over 700 measurements being added to the annotation summary list. In cases like that, it may be better to calculate these individually for the classes of interest!
- As indicated above, the script calculates distances between full classes, not base classes (after using the multiplex classifier). It could be adapted to find, for example, the nearest “PDL1” positive cell for the cells of every full class in the LuCa example, regardless of whether that nearest cell was positive for PDL1 plus something else or for PDL1 alone.
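The base-class adaptation mentioned in the second note could look roughly like the Python sketch below. This is an illustration only, not part of the script: it assumes full class names are base names joined by `": "` (for example `"PDL1: CD8"`), and the helper names `base_classes` and `nearest_with_base` are made up for the example.

```python
import math

def base_classes(full_class, sep=": "):
    """Split a full class name into its base classes (separator is an assumption)."""
    return set(full_class.split(sep))

def nearest_with_base(cells, base):
    """For each cell, distance to the nearest OTHER cell whose full class
    contains the given base class, whatever else that cell is positive for.

    cells: list of (x, y, full_class_name) tuples (hypothetical layout).
    Returns one distance per cell, NaN if no such cell exists.
    """
    target_idx = [i for i, (_, _, name) in enumerate(cells)
                  if base in base_classes(name)]
    distances = []
    for i, (x, y, _) in enumerate(cells):
        candidates = [math.hypot(x - cells[j][0], y - cells[j][1])
                      for j in target_idx if j != i]
        distances.append(min(candidates) if candidates else math.nan)
    return distances

cells = [(0, 0, "CD8"), (3, 4, "PDL1"), (6, 8, "PDL1: CD8"), (100, 100, "FoxP3")]
pdl1_distances = nearest_with_base(cells, "PDL1")
print(pdl1_distances)
```

Matching on the set of base names rather than on a substring avoids accidentally matching a different marker whose name merely contains the same letters.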
The script is here.
As usual, it is easiest to use the Raw button on the right to copy the script.