Calculating R-squared values for detection measurements

Another script, this time with a GUI!

Very briefly, it takes two measurements that exist in your population of detections (selected from drop-down menus), using either all classes or a single class as a subset of those detections, and calculates a best-fit line and R-squared value for the resulting pairs of data points. The script also plots those data points (the axis ranges are set from the min/max values and cannot be changed at this time) and gives you the option to save to CSV all of the calculations you have made since starting the script. The calculations are saved in the project folder, in a subfolder called “R-squared results.”

Do be careful with R-squared values: not all relationships in your data will be linear, and this value is based on a best-fit line. A plot is always generated, so do take a moment to look at it!
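To make that caveat concrete, here is a minimal sketch in plain Groovy (no QuPath classes needed). The helper name `linearFit` and the sample numbers are made up for illustration, and this is not necessarily how the script computes its value. It fits a least-squares line, reports R-squared as 1 - SSres/SStot, and compares a perfectly linear data set with a perfectly predictable but non-linear one.

```groovy
// Hypothetical helper for illustration: least-squares fit of y = slope * x + intercept
def linearFit(double[] xs, double[] ys) {
    int n = xs.length
    double meanX = 0, meanY = 0
    for (int i = 0; i < n; i++) {
        meanX += xs[i]
        meanY += ys[i]
    }
    meanX /= n
    meanY /= n

    // Centred sums used for the slope
    double sxx = 0, sxy = 0
    for (int i = 0; i < n; i++) {
        sxx += (xs[i] - meanX) * (xs[i] - meanX)
        sxy += (xs[i] - meanX) * (ys[i] - meanY)
    }
    double slope = sxy / sxx
    double intercept = meanY - slope * meanX

    // Residual and total sums of squares
    double ssRes = 0, ssTot = 0
    for (int i = 0; i < n; i++) {
        double predicted = slope * xs[i] + intercept
        ssRes += (ys[i] - predicted) * (ys[i] - predicted)
        ssTot += (ys[i] - meanY) * (ys[i] - meanY)
    }
    return [slope: slope, intercept: intercept, rSquared: 1 - ssRes / ssTot]
}

// Made-up data: y = 2x + 1 (linear) and y = x^2 on a symmetric range (not linear)
def linear = linearFit([1, 2, 3, 4, 5] as double[], [3, 5, 7, 9, 11] as double[])
def parabola = linearFit([-2, -1, 0, 1, 2] as double[], [4, 1, 0, 1, 4] as double[])

println "Linear data:   R^2 = ${linear.rSquared}"
println "Parabola data: R^2 = ${parabola.rSquared}"
```

The linear set gives an R-squared of 1, while the parabola gives 0 even though y is completely determined by x, which is exactly the kind of thing the plot will show you and the number will not.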

image

The interface is in the upper left-hand corner of the picture and consists of the two drop-down menus for selecting measurements, the class drop-down and the all classes checkbox (which overrides any selected class), and the Calculate button that makes it all go. Any time you click Calculate, a new plot is created and a line is added to an array that keeps track of every result you have calculated. If you decide you want a list of those results, you can click Export all calculations to CSV when you are done, and go and collect the results!

In this example, I just picked some fairly random values to see if anything was strongly correlated (I didn’t expect anything to be), and then threw in a comparison of the nuclear area and perimeter to prove that everything was actually working.

Please do let me know if you run into any issues with the code. So far it seems to work for both main versions of QuPath, though the file name for the export changes slightly. Suggestions are welcome for ways to improve the code, or take it and run with it :)

If you want the data points to plot in something more configurable, I recommend using something like:

saveDetectionMeasurements("C://Path//To//nowhere.txt", "Nucleus: Area", "Nucleus: Circularity")

Quick update to this: I merged the R-squared script with the colocalization script to create a new script that calculates the R-squared value of the pixels within each detection area/region for two channels of your choice.

image

Then look at the results in the Measurement Map.


I just noticed that when using the previous script with VERY small tiles (say 3x3 pixels), I have managed to get R^2 values greater than 1. I haven’t tracked down what is happening there, but wanted to make a note of it in case anyone else has noticed something similar or has tracked down what happens in those cases. It seems to work well 99% of the time, though.
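The cause is not known here, so the following is only a possible cross-check and not a diagnosis. It is a plain Groovy sketch with a made-up helper name (`safeRSquared`) and made-up pixel values. It computes R^2 as the squared Pearson correlation from mean-centred sums, a form that cannot exceed 1 apart from rounding error, and it returns NaN for degenerate tiles (fewer than three pixels, or one channel completely flat) so they can be flagged instead of reported.

```groovy
// Hypothetical cross-check, not taken from the original script
def safeRSquared(double[] xs, double[] ys) {
    int n = xs.length
    // Two points always lie exactly on a line, so the value says nothing useful
    if (n < 3) return Double.NaN

    double meanX = 0, meanY = 0
    for (int i = 0; i < n; i++) {
        meanX += xs[i]
        meanY += ys[i]
    }
    meanX /= n
    meanY /= n

    // Mean-centred sums of squares and cross-products
    double sxx = 0, syy = 0, sxy = 0
    for (int i = 0; i < n; i++) {
        double dx = xs[i] - meanX
        double dy = ys[i] - meanY
        sxx += dx * dx
        syy += dy * dy
        sxy += dx * dy
    }

    // A flat channel has zero variance, so R^2 is undefined
    if (sxx == 0 || syy == 0) return Double.NaN
    return (sxy * sxy) / (sxx * syy)
}

// Made-up 3x3 tiles (9 pixels per channel)
double[] rising  = [1, 2, 3, 4, 5, 6, 7, 8, 9] as double[]
double[] doubled = [2, 4, 6, 8, 10, 12, 14, 16, 18] as double[]
double[] flat    = [5, 5, 5, 5, 5, 5, 5, 5, 5] as double[]

println "rising vs doubled: ${safeRSquared(rising, doubled)}"
println "flat vs rising:    ${safeRSquared(flat, rising)}"
```

Comparing this value with the script's own result on the tiles that misbehave might help show whether the problem is in the data (for example a flat channel) or in the arithmetic.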

Very Fine Work!!
Thanks,
Bob

Nice! I only explored a little bit, but one small change I’d suggest is adding all your data points to the scatterplot in one go, e.g.

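import javafx.scene.chart.XYChart

XYChart.Series scatterPlot = new XYChart.Series()
// Build the complete list of data points first...
def data = []
for (p in points) {
    data << new XYChart.Data(p[0], p[1])
}
// ...then hand them to the series in a single call
scatterPlot.getData().addAll(data)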

Lots of things can trigger events to be fired in JavaFX, including each individual addition to the series' data, so you should find the change makes it much faster.

There are also ways using Java streams (as opposed to Groovy collect) to create summary statistics all in one go and avoid (effectively) multiple loops, e.g.

def xStats = Arrays.stream(pts).mapToDouble({p -> p[0]}).summaryStatistics()

and then request the min/max/sum from those stats.
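A slightly fuller sketch of the same idea, runnable on its own in Groovy. The `pts` values are made up for illustration; in the script they would be the [x, y] pairs being plotted, assumed here to be stored as a `double[][]`.

```groovy
// Made-up sample points, each row is [x, y]
double[][] pts = [[1.5, 10], [4.0, 20], [2.5, 60]] as double[][]

// One pass per column gives count, min, max, sum and average together
def xStats = Arrays.stream(pts).mapToDouble({ p -> p[0] }).summaryStatistics()
def yStats = Arrays.stream(pts).mapToDouble({ p -> p[1] }).summaryStatistics()

println "x: min=${xStats.min}, max=${xStats.max}, sum=${xStats.sum}"
println "y: min=${yStats.min}, max=${yStats.max}, sum=${yStats.sum}"
```

The min and max values are what you would need for setting the plot axis ranges, and the sums can feed into the regression calculation.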

However, you can also use Apache Commons Math (already accessible in QuPath) to get R-squared values and avoid the need for this calculation entirely, e.g.

// points is assumed to be a double[][] in which each row is an [x, y] pair
def regression = new org.apache.commons.math3.stat.regression.SimpleRegression()
regression.addData(points)
print('From commons-math3: ' + regression.getRSquare())

It would be good to check what happens whenever the values become very large to give some warning of when things may go bad, and to compare if the commons method is any more robust.

Thanks! I have updated the scripts and they do run much faster now, though the two-channel R-squared for pixels script is still slower because of the ImageJ calls. I don’t have any data sets on me to try to stress the max().

Mmm, yes, the increased speed has already been very useful for demonstrating to people where bleedthrough or autofluorescence are occurring.