I recently stumbled on a good analysis by @SVI_Huygens on intensity preservation, in which they compare Huygens deconvolution against a “mystery product” deconvolution. This got me thinking about how we would fairly test restoration in CLIJ, ImageJ-ops, CARE, and other open source libraries. Under what criteria can we claim open source restoration algorithms are “quantitative”? How do we do a fair apples-to-apples comparison?
If you look at the white paper, the intensity conservation in Huygens deconvolution is certainly close to linear, while the “mystery product” is non-linear at low intensities.
Was this report fair to the mystery company? Are the mystery company’s results typical of most vendors or an outlier? What algorithms are actually being tested here? (These questions are not really a criticism of @huygens at all; I can see why they tested this way, as there are currently no benchmarks publicly available for a fair comparison.)
Similar work was done by Claire Brown, who tested AutoQuant blind deconvolution (I was involved in the early planning of that project but had no part in the final paper). Is AutoQuant linear? The paper claims so, but if you look closely at Table 1, the response has non-linearities at low intensities.
| Intensity mean (original) | Intensity mean (iterative blind) | Ratio |
| --- | --- | --- |
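A linearity check like the one above is straightforward to sketch. The numbers below are made-up placeholders (not values from Table 1, whose data I don’t have at hand); the point is that fitting a line to mean intensities before and after deconvolution, and inspecting the per-point ratios, makes a low-intensity non-linearity visible even when the overall fit looks good.

```python
# Minimal sketch of an intensity-linearity check, assuming we already
# have per-condition mean intensities before and after deconvolution.
# These values are hypothetical, purely for illustration.
import numpy as np

original = np.array([50.0, 100.0, 200.0, 400.0, 800.0])
deconvolved = np.array([40.0, 95.0, 198.0, 402.0, 805.0])

# Fit deconvolved = a * original + b; a perfectly intensity-preserving
# algorithm gives a slope near 1, an intercept near 0, and a constant ratio.
a, b = np.polyfit(original, deconvolved, 1)
ratios = deconvolved / original

print(f"slope={a:.3f}, intercept={b:.2f}")
print("per-point ratios:", np.round(ratios, 3))
```

Note that in this toy example the slope is close to 1, yet the lowest-intensity ratio deviates noticeably — exactly the kind of low-intensity behavior discussed above.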
Why wasn’t the data for other products shown in this paper? Why only 5 data points? Many bead images were collected, so wouldn’t it be proper to show each bead as a separate point instead of averaging them? As the @SVI_Huygens white paper points out, edge handling and PSF centering/normalization are very important, so you would want to look at each image, and even each bead, separately to detect potential non-linearities between blind deconvolution runs (with potentially different blind PSFs) and spatial non-linearities (because of edge handling).
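Measuring each bead separately, rather than one averaged value per image, could be sketched as below. This assumes a simple threshold segmentation is adequate for bead images (real bead data may need more careful segmentation); the synthetic image is just for illustration.

```python
# Sketch: per-bead mean intensities via connected-component labeling,
# so each bead becomes its own point on the linearity plot.
import numpy as np
from scipy import ndimage

def per_bead_means(image, threshold):
    """Return the mean intensity of each connected 'bead' above threshold."""
    labels, n = ndimage.label(image > threshold)
    return ndimage.mean(image, labels=labels, index=np.arange(1, n + 1))

# Tiny synthetic example: two "beads" of different intensity.
img = np.zeros((10, 10))
img[1:3, 1:3] = 100.0
img[6:8, 6:8] = 50.0
means = per_bead_means(img, 10)
print(means)
```

Running the same per-bead measurement on the raw and restored images, then plotting one point per bead, would expose both run-to-run and spatial non-linearities that averaging hides.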
Are the possible non-linearities at low intensities in some algorithms important? Or is it enough that the curve is linear at higher intensities?
And linearity is only part of the story. As @Kota recently showed, morphology is very important, and the PSF of a microscope causes changes in morphology. To call something “quantitative”, is it enough to consider linearity alone, or does morphology also have to be considered?
Is it black and white? Can we just say such-and-such an algorithm is “quantitative” or not? What is the threshold for proclaiming your algorithm “quantitative”? Is it a certain R value for the linearity curve? An MSE on a volume measurement? Neither metric will ever be perfect, so how do we decide what is quantitative and what is not?
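To make the “no perfect metric” point concrete, here is a sketch of the two candidate metrics mentioned above, with hypothetical numbers: the R² of the linearity fit can be very high even when the low-intensity ratio is off by 20%, which is exactly why a single threshold is hard to defend.

```python
# Two candidate "quantitativeness" metrics, sketched on invented data.
import numpy as np

def r_squared(x, y):
    """R^2 of a linear fit y ~ a*x + b (the linearity-curve metric)."""
    a, b = np.polyfit(x, y, 1)
    residuals = y - (a * x + b)
    return 1 - np.sum(residuals**2) / np.sum((y - y.mean())**2)

def volume_mse(measured, true):
    """MSE between measured and ground-truth volumes (the morphology metric)."""
    return np.mean((np.asarray(measured) - np.asarray(true)) ** 2)

original = np.array([50.0, 100.0, 200.0, 400.0])
restored = np.array([40.0, 95.0, 198.0, 402.0])  # 20% low at the dim end
r2 = r_squared(original, restored)
print(f"R^2 = {r2:.5f}")  # very high despite the low-intensity deviation
```

So an algorithm can score R² > 0.999 and still be meaningfully non-linear where it matters for a given experiment; any pass/fail threshold is a judgment call.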
Does the test need to be publicly reproducible? If not, how do we test CLIJ and ImageJ-ops? Do we have to redo calibration images and make our own CLIJ test set? What about testing CARE and other machine learning approaches? It would be interesting to have a large set of bead images of different known sizes, large enough to divide into training and testing sets.
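The train/test split of such a hypothetical bead collection could be as simple as the following; the filenames are invented, and a fixed seed keeps the split reproducible so learned methods like CARE are always evaluated on images they never saw.

```python
# Sketch: reproducible split of a (hypothetical) bead-image collection
# into training and held-out test sets for benchmarking learned restoration.
import random

def split_dataset(items, test_fraction=0.2, seed=0):
    """Shuffle deterministically and return (train, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]

# Invented filenames standing in for a real calibration set.
beads = [f"bead_{i:03d}.tif" for i in range(100)]
train, test = split_dataset(beads)
print(len(train), len(test))
```

Publishing the exact split alongside the images would let anyone verify that a model was not evaluated on its own training data.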
In conclusion, I believe the solution is to have large, publicly available tests, and to have the developers of each algorithm run the tests themselves, to ensure that a) the algorithms are run properly and b) any potential problems the tests reveal can be addressed (tests should make products better!). This would be better for all the big companies, “mystery companies”, and open source efforts out there.