How do we reasonably show a restoration algorithm is "quantitative"?

Hi all

I recently stumbled on a pretty good analysis by @SVI_Huygens on Intensity Preservation, where they compare Huygens deconvolution against a “mystery product deconvolution”. This got me thinking about how we would fairly test restoration in CLIJ, ImageJ Ops, CARE and other open source libraries. Under what criteria can we claim open source restoration algorithms are “quantitative”? How do we do a fair apples-to-apples comparison?

If you look at the white paper, the intensity conservation of Huygens deconvolution is certainly close to linear, while the “Mystery Product” is non-linear at low intensities.

Was this report fair to the mystery company? Are the mystery company’s results typical of most vendors, or an outlier? What algorithms are actually being tested here? (These questions are not really a criticism of @huygens at all. I can see why they tested this way, as there are currently no benchmarks publicly available to do a fair comparison.)

Similar work was done by Claire Brown where she tested Autoquant Blind Deconvolution (I was involved in the early planning of that project but had no part in the final paper). Is Autoquant linear? The paper claims so, but if you look closely at Table 1 the response has non-linearities at low intensities.

| Intensity mean (original) | Intensity mean (iterative blind) | Ratio |
| --- | --- | --- |
| 56 | 160 | 2.86 |
| 126 | 650 | 5.16 |
| 310 | 2160 | 6.97 |
| 1280 | 8600 | 6.72 |
| 3280 | 22400 | 6.83 |
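For what it’s worth, the low-intensity non-linearity is easy to make explicit by recomputing the ratios from the table above; a small Python sketch (numbers taken straight from Table 1 as quoted):

```python
import numpy as np

# Mean intensities quoted from Table 1: original vs. iterative blind deconvolution
original = np.array([56, 126, 310, 1280, 3280], dtype=float)
deconvolved = np.array([160, 650, 2160, 8600, 22400], dtype=float)

ratios = deconvolved / original
print(np.round(ratios, 2))  # → [2.86 5.16 6.97 6.72 6.83]

# A response that is linear through the origin would give a constant ratio;
# here it varies by more than a factor of two, driven entirely by the
# two lowest-intensity points.
print(round(float(ratios.max() / ratios.min()), 2))  # → 2.44
```

So even from these five averaged points the low-end deviation is clear, which is exactly why per-bead scatter would be more informative than means.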

Why wasn’t the data for other products shown in this paper? Why only 5 data points? Many bead images were collected, so wouldn’t it be proper to show each bead as a separate point instead of averaging them? As the @SVI_Huygens white paper points out, edge handling and PSF centering/normalization are very important, so you would want to look at each image, and even each bead, separately to detect potential non-linearities between blind deconvolution runs (with potentially different blind PSFs) and spatial non-linearities (because of edge handling).

Are the possible non-linearities at low intensities in some algorithms important? Or is it enough that the curve is linear at higher intensities?

And linearity is only part of the story. As @Kota recently showed, morphology is very important, and the PSF of a microscope causes changes in morphology. To call something “quantitative”, is it enough to just consider linearity, or does morphology have to be considered too?

Is it black and white? Can we just say such-and-such an algorithm is “quantitative” or not? What is the threshold for proclaiming your algorithm “quantitative”? Is it a certain R value for the linearity curve? An MSE on a volume measurement? Neither metric will ever be perfect, so how do we decide what is quantitative and what is not?

Does the test need to be publicly reproducible? If not, how do we test CLIJ and ImageJ Ops? Do we have to redo calibration images and make our own CLIJ test set? What about testing CARE and other machine learning approaches? It would be interesting to have a large set of bead images of different known sizes, large enough to divide into training and testing sets.

In conclusion, I believe the solution is to have large, publicly available tests, and to have developers of algorithms run the tests themselves to ensure that a) the algorithms are run properly and b) any potential problems the tests reveal can be addressed (tests should make products better!). This would be better for all the big companies, “mystery companies” and open source efforts out there.



It would certainly be better for the users. The companies that did not come out “on top” might disagree. :slight_smile:


When I worked at a company I tried not to “associate” algorithms with the company and instead concentrated on implementing algorithms from the literature that worked the best, and tried to ensure they were implemented properly.

Most algorithms begin as research papers and the details are mostly public, so if a company has made a mistake in their implementation, that is something they should be happy to find out about and correct.

I’m not sure “coming out on top” should even be a thing. Shouldn’t everybody be implementing well-understood and tested algorithms? Instead of “coming out on top”, shouldn’t the reward go to the companies and open source efforts that can reproduce each other’s results?


@bnorthan totally with you on all that

In the end we have to be pragmatic… it depends on what will be done with the image next.

Indeed, shape correctness and intensity linearity are both important, but maybe sometimes one is more important than the other. There is no black and white here. The real answer is impossible to get, and different algorithms will get close to it but be wrong in different directions. We can’t explore every variable here, so we have to aim for the best case for specific applications.

My pet topic here is low-signal images, as these are what you get from tricky live-cell and bacterial imaging.
I like algorithms that do a good job with noisy images and get the structure as right as possible while also keeping intensity linearity. For me, that’s where the fun is.
That’s why I want to do a reference, from-the-book implementation of Agard’s accelerated Van Cittert/Gold algorithm: on DeltaVisions it has been doing a great job for many years with only 5 or 10 iterations, but it will soon no longer be available for new users, so it needs to be reimplemented as open source. After I get it working in a slow, clunky but correct manner, we can figure out how to CLIJ it.
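To sketch the base iteration before worrying about acceleration: the core Gold (ratio-method) update is just a multiplicative correction, f_{k+1} = f_k · g / (h ⊛ f_k). The NumPy sketch below is only that base iteration, assuming a PSF the same shape as the image with its peak at the center; Agard’s acceleration scheme would go on top and is not implemented here.

```python
import numpy as np

def gold_deconvolve(image, psf, n_iter=10):
    """Base Gold ratio-method iteration: f_{k+1} = f_k * g / (h conv f_k).

    Assumes `psf` has the same shape as `image` with its peak at the
    center. Agard's acceleration scheme is NOT implemented here.
    """
    psf = psf / psf.sum()                        # unit-sum PSF
    otf = np.fft.rfftn(np.fft.ifftshift(psf))    # move PSF peak to the origin
    estimate = image.astype(float).copy()        # f_0 = g
    for _ in range(n_iter):
        blurred = np.fft.irfftn(np.fft.rfftn(estimate) * otf, s=image.shape)
        # guard against divide-by-zero in dark regions, then update
        estimate *= image / np.maximum(blurred, 1e-12)
    return estimate
```

Like Richardson–Lucy, the multiplicative update keeps non-negative data non-negative, which is part of why this family behaves well at low signal.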

To wrap up, the algorithm itself is much less important than the quality of the PSF used. Crap in, crap out. Meaning: how well it matches the actual PSF in experimental images. A computed PSF rarely cuts it. We could figure out how to use the nice semi-empirical .OTF files from the DeltaVision Olympus lenses. They are mutated .mrc files with a real and complex part. A Bio-Formats importer is a good way to go here, but beyond my skills, as they need to be de-rotationally averaged. Maybe UCSF Priism/IVE has some clues.


Is it possible to implement the Agard/Gold in Python or Matlab first then we can take a look and see how to port it to CLIJ?

@bnorthan yes I think so.
I can maybe do it using ImageJ FD Math if I can figure out a Wiener filter or something that is close enough.
The JavaScript convolution/deconvolution demo script I wrote (it’s on my GitHub and linked from the ImageJ website deconvolution page) is a fair start… it already contains the text description and some of the required operations. I will take from that and make a new script, Python or JavaScript or whatever. It’s more complex than RL, as there are a few distinct modular parts to the algorithm for the first few iterations vs. the rest of the iterations.
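In case it helps as a starting point, a minimal Wiener filter is only a few lines in NumPy. This sketch assumes a centered PSF the same shape as the image and uses a single scalar SNR in place of the true noise/signal power spectra (the usual simplification):

```python
import numpy as np

def wiener_deconvolve(image, psf, snr=100.0):
    """Minimal Wiener inverse filter: F = G * conj(H) / (|H|^2 + 1/SNR).

    `snr` is a scalar stand-in for the signal-to-noise power ratio;
    in practice it needs tuning per image.
    """
    psf = psf / psf.sum()
    H = np.fft.rfftn(np.fft.ifftshift(psf))  # OTF, PSF peak moved to origin
    G = np.fft.rfftn(image)
    F = G * np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)
    return np.fft.irfftn(F, s=image.shape)
```

A filtered image like this is a common choice of first estimate for iterative schemes, which may be “close enough” for the first-iteration step.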


I had a couple of people ask on Twitter how they could run a similar linearity test to the SVI white paper and the Claire Brown paper. Below is an outline, but I am very interested in hearing others’ thoughts.

  1. You need a sample with sparse structure of known morphology (bars or beads) and several different known relative intensities. Ideally there would be a publicly available test image set for this, so everyone is testing on the same data.

  2. Before deconvolving, carefully consider pre-processing, especially background subtraction. Ideally the pre-processing of the deconvolution implementation can be separated from the deconvolution itself, so each algorithm tested can be pre-processed the same way. We could consider a common background subtraction script to try to make sure background level is not a confounding variable.

  3. Deconvolve with a specific implementation.

  4. Run a script (ideally a common ImageJ/Python one) that takes the input and output images and compares regions of known intensity levels in each, then plots an input vs. output intensity scattergram and fits a line. Careful consideration should be paid to low intensities, because there can be non-linearities at low intensities even when the line-fit R value is very good. Details of how to write such a script should be an open community discussion.
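As a strawman for the step 4 script, the region comparison could look something like the sketch below. The segmentation (`labels`) is assumed to come from somewhere else (thresholding the raw beads, for example), and all names here are placeholders, not an agreed API:

```python
import numpy as np

def linearity_check(raw, restored, labels):
    """Per-region mean intensity before vs. after restoration.

    `labels` is an integer mask (0 = background) marking regions of
    known relative intensity, e.g. segmented beads.
    """
    ids = np.unique(labels)
    ids = ids[ids > 0]
    x = np.array([raw[labels == i].mean() for i in ids])       # input means
    y = np.array([restored[labels == i].mean() for i in ids])  # output means
    slope, intercept = np.polyfit(x, y, 1)                     # line fit
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return slope, intercept, r2, x, y
```

Plotting x vs. y per bead (not averaged), and looking hard at the low-intensity end, would catch exactly the kind of non-linearity a single overall R value hides.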

Would love to hear others’ thoughts on the best way to do this.


Hi Brian et al.,

wow, lovely thread! We would be super excited to have a better idea of how to quantify the quantifiability of intensities (and predicted morphology?) after Noise2Void. Our informed gut feeling is that Noise2Void in general does a very good job of predicting intensities that are true to the number of fluorophores present. Hence, intensity quantification should be not only possible, but potentially much simpler, whenever the raw data has low signal-to-noise.

Anyways, so far we have never attempted to quantify the quantifiability. The reasons are a ubiquitous lack of time and the insight that it’s not easy to nail it down so well that people would be fully convinced and all corner cases are explored sufficiently well.

I think the existence of a public dataset of known relative intensities (as @bnorthan pointed out in the list above) would be AMAZING… and whoever does the heroic job of acquiring such a dataset should please, please, please do the imaging at many signal-to-noise levels. That would be great for Noise2Void and the quantification of the quantifiability of denoising results in general, and it would also be interesting for checking deconvolution methods, wouldn’t it?

One more thought: it would be great if not only easy structures (lines, circles, beads) but also structures with more complex morphology were contained in such a reference dataset. Of course I understand that sprinkling beads on a slide is much simpler than finding complex structures of known intensities (fluorophore concentrations)… but… this is really more of a wish list, I guess… :wink:

Last thing: 2D / 3D / 4D? I could live with some 2D images and 3D volumes being part of the wishlist-dataset, but others might have other opinions.

Hope some of you find my thoughts useful,


Dear Brian and other contributors,

During our more than 25 years of existence we’ve frequently been asked whether deconvolution produces ‘quantifiable’ results. A very valid but also broad question indeed!

The aim of the study presented in our white paper was to address at least part of this question, and to make researchers aware that imaging and image processing should be done with great care. Especially when it comes to automated solutions, and not only for quantification but also simply when images are presented and judged by eye. The mystery product is advertised as an automated solution and we tested it against our automated deconvolution solution, Deconvolution Express. So we believe the comparison is an honest one. Our automated Deconvolution Express results can be reproduced using manual settings in Huygens. You can find the white paper via this webpage:

The Argolight sample used for this white paper is commercially available, and we also published details on the morphology of GATTAquant STED beads before and after deconvolution. Huygens results can thus be reproduced, and have in fact been confirmed by non-SVI users (see the reference list on IntensityPreservation | Scientific Volume Imaging).

Algorithms, and also their implementations, are important, as they may differ in their capacity to be regularized and to deal with both high- and low-signal data. The PSF should include relevant microscope-specific parameters (for example, the immunity fraction and saturation factor with STED), and/or include system-specific aberrations as can be distilled from bead images.

We would like to highlight this great microscopy community initiative on “quality assessment and reproducibility for instruments & images in Light Microscopy”: and the recent paper:

Vincent Schoonderwoert


Hi Brian,

this is not exactly a large public test set, but I’ve been working on a simulation framework for quantifying errors of different deconvolution algorithms.

The framework generates synthetic images, deconvolves them with a specified deconvolution algorithm (supports external software), and quantifies deconvolution errors. This is done in a batch mode to test various deconvolution algorithms and their parameters simultaneously, as well as various modes of synthetic images.
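The batch loop is conceptually simple; as a toy sketch of the idea (all names hypothetical and greatly simplified relative to the actual framework), the outer loop could look like:

```python
import numpy as np
from itertools import product

def batch_evaluate(algorithms, ground_truth, degrade, snr_levels, seed=0):
    """Run every (algorithm, SNR) combination and score against ground truth.

    `algorithms`: dict mapping a name to a callable image -> image.
    `degrade`: callable simulating acquisition (blur + noise) at a given SNR.
    MSE stands in for whatever error measures the framework supports.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for (name, algo), snr in product(algorithms.items(), snr_levels):
        observed = degrade(ground_truth, snr, rng)   # synthetic acquisition
        restored = algo(observed)                    # restoration under test
        results[(name, snr)] = np.mean((restored - ground_truth) ** 2)
    return results
```

The value of the real framework is of course in the hard parts this glosses over: realistic synthetic objects and PSFs, driving external deconvolution software, and richer error measures than MSE.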

Unfortunately, the first version of the framework turned out to be clunky and not very user-friendly. It is also quite hard to integrate additional deconvolution software. Only simple input objects and PSFs are implemented so far, and there are only a few available deconvolution algorithms. The error quantification is also quite basic.

I am currently working on an improved version of this framework, which should be more flexible and easy to extend to further synthetic objects, PSFs, restoration algorithms, and quality measures, with a possibility to use real recorded images instead of synthetic ones. I am not working on this very vigorously though, since I don’t know whether there is enough interest.

Do you think something like this would be useful to evaluate - in a standardized way - how “quantitative” and accurate a restoration algorithm is?



Something like this would be very useful. Especially if it has flexible ways to implement synthetic objects. If you had a way to quickly generate a bunch of simulated images, you could not only test deconvolution algorithms on them, but you could potentially also use them as input to test a deep learning framework like CSBDeep.

The synthetic objects framework already looks pretty useful. I’m going to try and test it out in the next few days and maybe I’ll have some more feedback.