Resolution enhancement with machine learning

Hi all,

I’m not an expert on AI and deep learning, but I do understand the basic principles of it. I can ‘accept’ that machine learning can be used for things like deconvolution or denoising. However, more and more people seem to suggest that AI can transform a normal resolution image into a super-resolution image. This is much harder for me to understand. I don’t see how an algorithm would be able to ‘create new information’ that is not present in the original data. I can understand that an algorithm might be able to transform the original image into the most likely super-resolution image, but then what is the benefit in the context of scientific research? If you need to train the algorithm on many super-resolution images of a very similar sample, then what can you learn from the newly restored image? All the information the algorithm used was fed through real super-resolution images (usually), so the restored image does not really contain new information. So in a certain way you don’t learn anything new from the image. And if you want to apply the algorithm to a new sample to actually learn something new, it’s not possible, because the algorithm wasn’t trained for that sample. I hope my way of thinking is a bit clear, and maybe it’s also a bit of a philosophical question, but the idea of using AI to transform a low NA objective image into a super-resolution image really confuses me.

Thanks a lot for sharing your thoughts.


Hi @EliS, thanks for this very interesting question.
My feeling is that to enhance resolution, you need to train an ML tool. To do so, you need either lots of reference images or a model. In both cases, you already know what your images will look like, so you won’t discover anything. So, for me, in this case, the only interest I see in enhancing resolution is to get nicer images for display, but not for research.
The only “real” interest I see in ML resolution enhancement would be using it to validate a hypothesis. Let’s say you have a good ML tool and a low-resolution image. You wonder if your image contains Something you can’t see at this resolution. You apply your ML enhancement tool. If the resulting image is incoherent, your Something was missing. If the resulting image is plausible, your Something was probably present.

What do you think of that ?



Hi Eli,

I’m 100% with you!

Fact: you cannot create information that is not already contained in the original input data by any means, including deep learning.

So why do people do that? I’m not sure if I can answer for all or even most of them… people are often motivated by weird things.
The only way I can see some value in this endeavor is maybe something like this: if the low-res image does in principle contain some structural information, but is not so crisp and nice that it is also easy to see (or even better, analyze)… then maybe it helps to train a network that can pick up those barely visible indicators and translate them into something we are used to spotting/analyzing.
The nature and amount of necessary controls for any biological findings based on such ‘virtually’ super-resolved images is currently up for our community to assess/discuss/fight over. I am, and I guess people in my lab are, generally cautious about going down this road.

Not sure if this answer satisfies you, but I’m sure other people have other opinions, and hearing them will be nice!!! wink, wink

Shout out to Uri Manor @umanor … he might want to opine on this thread … I assume! :slightly_smiling_face:



I rarely see people discussing resolution using “two point” tests or showing examples using the Rayleigh criterion (see figure below, copied from an astronomy site). Oftentimes people use the full width at half maximum (FWHM) of a point source in a processed image, or the frequencies present in the OTF of a processed image, as a proxy for “resolution”. However, these measures can show arbitrarily high frequency for processed images. If you overprocess an image using Richardson-Lucy deconvolution, you can get an arbitrarily small FWHM for a point source, and arbitrarily high frequencies in the OTF. Same with a deep learning method, especially if it is trained on high-frequency structure… it will reproduce “high frequencies” in the output image, but that does not mean it has increased resolution.
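The point above about Richardson-Lucy pushing the FWHM of a point source arbitrarily low is easy to demonstrate numerically. Here is a minimal, unregularized 1-D sketch in plain NumPy (noiseless point source imaged through a Gaussian PSF; any real implementation would add regularization and a stopping criterion):

```python
import numpy as np

def fwhm(profile):
    """Crude FWHM estimate: number of samples at or above half the maximum."""
    return int(np.sum(profile >= profile.max() / 2.0))

def richardson_lucy_1d(observed, psf, iterations):
    """Plain, unregularized 1-D Richardson-Lucy deconvolution."""
    estimate = np.full_like(observed, observed.mean())
    psf_flipped = psf[::-1]
    for _ in range(iterations):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / np.maximum(blurred, 1e-12)  # guard against /0
        estimate = estimate * np.convolve(ratio, psf_flipped, mode="same")
    return estimate

x = np.arange(-30, 31)
psf = np.exp(-x ** 2 / (2 * 4.0 ** 2))
psf /= psf.sum()
observed = psf.copy()  # noiseless image of a perfect point source = the PSF

few = richardson_lucy_1d(observed, psf, 10)
many = richardson_lucy_1d(observed, psf, 500)
print(fwhm(observed), fwhm(few), fwhm(many))  # FWHM keeps shrinking with iterations
```

The restored spot just keeps getting narrower the longer you iterate, which is exactly why FWHM of a processed image is a poor proxy for true resolution.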

It is always interesting to see two point tests, or tests on cleverly designed phantoms. Though there can be practical difficulties with that.
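For anyone who wants to try this, a toy two-point test is only a few lines. This sketch uses a Gaussian PSF and a ~26.5% intensity dip as the "resolved" threshold; that dip fraction comes from the Rayleigh criterion for an Airy PSF, so for a Gaussian it is just a chosen convention:

```python
import numpy as np

def two_point_image(separation, sigma=4.0, n=201):
    """1-D image of two equal point sources blurred by a Gaussian PSF."""
    x = np.arange(n) - n // 2
    left = np.exp(-(x + separation / 2.0) ** 2 / (2 * sigma ** 2))
    right = np.exp(-(x - separation / 2.0) ** 2 / (2 * sigma ** 2))
    return left + right

def resolved(profile, dip_fraction=0.735):
    """Two-point test: is there a visible dip between the two sources?
    0.735 mimics the ~26.5% Rayleigh dip of an Airy PSF; for a Gaussian
    PSF it is simply a chosen threshold."""
    midpoint = profile[len(profile) // 2]
    return bool(midpoint < dip_fraction * profile.max())

print(resolved(two_point_image(12)), resolved(two_point_image(4)))  # True False
```

Running the same test on the *output* of a restoration method (instead of on the raw image) is one concrete way to check whether claimed resolution gains survive a two-point criterion.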

An important aspect of this is “how much” super-resolution, and how you measured the increase in resolution. If someone claims “4x” super-resolution without much detail on how they concluded this, that’s a lot different from someone claiming a 25% decrease in the distance between resolvable points as measured by the Rayleigh criterion.


Just linking in Brian’s post here, which mostly covered how I felt about the whole thing.

I have also seen a lot of activity on Twitter from Uri Manor, where they want to use less light (more noise) to keep cells alive for longer, but attempt to denoise the images afterwards. Even that I think is a bit suspect, because unlike the CARE paper (images in above link) there is no ground truth to compare the movies to, though you could probably compare different models to determine likely error points.


Hey everyone,

This is a really useful discussion on multiple levels, but particularly for me as I prepare our preprint for official submission.

tl;dr - I think everything needs to be used with extreme caution, and validated for each specific application until we come up with more generalized models for microscope systems as opposed to models for specific sample types. That said, I also believe that even if a prediction isn’t 100% accurate, that doesn’t mean it isn’t 100% useful. Because sometimes the cost of 100% accuracy far exceeds the benefits of what you can do with lower accuracy - in extreme cases it is an all or nothing situation!

I think the current state of the technology is such that we must have some way to validate the output of our models for each specific application we wish to use it for. This is in contrast to deconvolution or other denoising methods, which are based on fixed rules/assumptions that are well-established and known.

I do hope to someday generate a model for each microscope instead of each sample type, or some other way of creating a more generalized model that can work for whatever unknown sample we put onto the scope. But we are not there yet.

As far as why superresolution can work with such low information content, I think of our output as a prediction that is based on past knowledge (i.e. the “content-aware” in “CARE” is not just a cool sounding acronym - it has meaning!). So if the model has seen enough training data relevant to the test data, it can make a pretty damn good prediction about what is really going on in the image. Whether that prediction is sufficient for drawing meaningful, useful scientific conclusions is up to the researchers (and reviewers) and whether they can validate their findings.

As far as the utility of any prediction that is less than 100% accurate compared to ground truth (i.e. why not just get ground truth data?), I think we explained this fairly well in our paper. You can image faster and with lower doses than is physically possible with “ground truth” settings. That opens up new possibilities for imaging experiments.

How did we validate our model when we can’t take simultaneous high- vs low-res movies on our system? What we did was apply the same semi-synthetic “crappifier” method to some ground truth data, then test the accuracy of the prediction - FOR OUR FEATURES OF INTEREST. I’ve gotta emphasize that last part, because it is entirely possible that if we tested it on some other feature, it could very well fail and we’d need to recalibrate our model, training data, etc. accordingly. But for our live imaging, we were interested in two major feature types: mitochondrial dynamics (i.e. fission/fusion) in cultured cell lines, and mitochondrial motility in neurons.

For the mito dynamics, we validated our model by looking at the ability of our model to detect (generate?) breaks vs. continuities in mitochondria. Those two types of features are what we ultimately need to be able to see if we want to detect fission events, for example.

For the neuronal motility, we focused on a key problem which is that when mitochondria move past one another, they “blend” into a single structure which makes tracking difficult. So we checked whether our model could reliably “resolve” individual mitos as they moved past one another.

Finally, we used pixel-wise metrics (i.e. PSNR/SSIM/FRC) to measure the ability of our models to at least reasonably outperform standard interpolation-based upsampling (e.g. nearest-neighbor or bilinear), and found our model always outperformed.

One interesting thing we played with but did not include in our manuscript (or in my Twitter thread) is a control I’ve not seen elsewhere: we measured the PSNR/SSIM of individual ground truth acquisitions, which I was thinking could provide a “ceiling” for what we might expect our model to be able to do. The problem is that the ground truth data is extremely sensitive to movement, noise, etc. in the sample, and so our PSNR/SSIM values were lower than those of our model output. The other confounding issue is that our loss function is MSE (directly related to PSNR), and so our model was already trained to maximize PSNR - I think that’s kind of “cheating”. Long story short, I think PSNR/SSIM are useful but should be interpreted with caution.
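For anyone wanting to reproduce the pixel-wise comparison: PSNR is just log-scaled MSE, which is also why a model trained with an MSE loss is partly being graded on its own loss function. A minimal sketch (the test arrays here are made-up toy data, not from the paper):

```python
import numpy as np

def psnr(reference, prediction, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10*log10(data_range^2 / MSE).
    Higher is better; note it is a monotone function of the MSE loss."""
    diff = np.asarray(reference, float) - np.asarray(prediction, float)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.linspace(0.0, 1.0, 100).reshape(10, 10)
pred_good = ref + 0.01  # small uniform error: MSE = 1e-4
pred_bad = ref + 0.10   # 10x the error:       MSE = 1e-2
print(round(psnr(ref, pred_good), 1), round(psnr(ref, pred_bad), 1))  # 40.0 20.0
```

Every 10x reduction in error costs 20 dB, which is why PSNR differences of a few dB between methods can correspond to quite visible quality differences.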


I am really not that great with optics (and only so-so with analysis), but would it ever be possible to have something in the light path that splits the collected light not by wavelength, but by intensity - something like 90%/10% of the photons to two identical detectors on a point-scanning system? (You could split by wavelength if you really trusted that there was an even distribution, but I have a feeling that might be a flawed assumption.) Turn up the gain a bit on the 10% side (or not, depending), and while you would be burning your sample, you might be able to get something like a “real” and a “crappy” signal at the same time point.

Thought of this due to a new LSM900 that will have identical GaAsP detectors.

Again, no real experience with optics, so this might be an impossible or terrible idea.


That’s actually a great idea. An analogous idea with camera-based systems is to split the light between two different detectors.
We have 3 detectors in our system and we could do very fast switching between detectors at different gains or laser powers if we wanted to.

But that doesn’t solve the pixel size issue, which was the main focus of our method. Unfortunately, our current setup does not allow for simultaneous (or near-simultaneous) acquisitions of the same sample with different pixel sizes.

I did realize recently that we could in theory do something similar with our camera based system by acquiring sequential images with binned vs non-binned pixels. But that is a whole other project we haven’t even started yet (but are excited to do when we finish wrapping up this paper).


Would it be significantly different to downsample/bin the image collected at high resolution/low power (the 10% in the example above)? You would be generating the larger pixel size in software afterwards, which seems like a reverse version of the process you were testing. Which is, I guess, what you were saying in the last paragraph. I think. :slight_smile: I meant binning the image itself rather than the collection method, as I haven’t had access to cameras, just PMTs.

Not that it is important now, but as a theoretical exercise.

That is how we started - computationally binning pixels for both training and testing purposes. But for testing purposes it became important to test on real world data, and that was when we figured out that the noise, dynamic range, etc. required significant computational “crappification” for both training and validation purposes. At the end of the day, what you are suggesting leads us right back to where we ultimately settled on - using semi-synthetic “crappification” data for both training and validation.
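For readers following along, a generic semi-synthetic "crappification" step might look like the sketch below: bin pixels, rescale to a photon budget, then add Poisson shot noise and Gaussian read noise. This is only a guess at the general recipe; the parameter values and the actual crappifier used in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def crappify(img, bin_factor=2, photons=50, read_noise=2.0):
    """Semi-synthetic degradation of a ground-truth image.
    Parameter values are illustrative guesses, not the paper's settings."""
    # Crop so both dimensions divide evenly, then bin pixels.
    h = img.shape[0] - img.shape[0] % bin_factor
    w = img.shape[1] - img.shape[1] % bin_factor
    binned = img[:h, :w].reshape(h // bin_factor, bin_factor,
                                 w // bin_factor, bin_factor).mean(axis=(1, 3))
    # Rescale to a photon budget, then add shot noise and read noise.
    scaled = binned / max(binned.max(), 1e-12) * photons
    return rng.poisson(scaled) + rng.normal(0.0, read_noise, scaled.shape)

gt = np.zeros((64, 64))
gt[20:24, 30:50] = 1.0  # a toy bright structure standing in for ground truth
low = crappify(gt)
print(low.shape)  # (32, 32)
```

The same function can then degrade both the training inputs and the held-out validation images, which is the point made above about using one crappifier for training and validation alike.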


Ok, I probably should read the paper in more detail. I read a bit of the Tweetorial, so… definitely not an expert.

It “feels” (yeah, yeah) like the deep learning approach with the crappifier is, in large part, a reverse crappifier (and more, yes!). So the image is being processed one way, and then the model is learning the crappifier to reverse the process. With two images taken at exactly the same time but with different settings/exposures/gain/etc, plus some binning, the model would be learning the binning, but also the differences in the image acquisition as the noise would be “real.” And ideally the 10/5/whatever% image would be the exposure you were aiming for on the biological side to keep the sample alive/unbleached.

If I am understanding things, an ideal situation, I guess, would be a system that could collect both the low res image on a binned camera and the high res, high power image on a second camera at exactly the same time. But short of that, I think digital binning wouldn’t be too far off.

Back to reading though, hopefully I can understand this better soon :slight_smile:


They make 90:10 beam splitters. Just need a two camera setup, ideally with simultaneous acquisition. I don’t know if they sell them in easily mountable formats, but they’re designed to handle raw lasers so input power isn’t an issue.


Hi Eli,

I work on single-molecule localization microscopy and am not an expert in AI/DL/NN.

But your question reminded me of what people asked when “compressed sensing” started to spread across the imaging/microscopy fields:
“You used a worse system (e.g. less illumination, lousier camera, a single-pixel camera, worse optics) to first degrade your image just to show that you can reconstruct more or less the same high-resolution image with less information?”

The experts have already mentioned two of the primary benefits:

  1. Speed
  2. Less phototoxicity

I think there are also other possible benefits where I IMAGINE resolution-enhancing AI could be useful.

  1. Throughput
    For example, if we have 20 conditions with 1,000 cells or tissue samples in each condition, that would be a lot of cells to image.
    Resolution-enhancing AI will help us either by improving the resolution of each cellular image by say 1.4x for “free”, or it will help us cut down the time to image all the cells to the same resolution (because we are now imaging at higher speeds or lower illumination doses).

  2. Weak signal (no choice)
    Sometimes, we have no choice but to have a weak signal because of the circumstances. This is a recurring theme in compressed sensing. Certain applications just have specific limitations as to how much information they can obtain.
    For example, a weak signal could occur when a cell is deep within a tissue that scatters light and introduces aberrations. Or the label that can be used is at a very low concentration. Or the label is simply very dim in that physiological condition. Or we have limited excitation intensity because the illumination light is scattering, we don’t want to melt the cryo fluorescence sample, or because the laser intensity is already maxed out.
    Resolution-enhancing AI could improve the image (from within the specific limitations) to help us better analyze and understand the labeled biological structure.

  3. Two-color images
    Many colocalization studies in the past have been done with diffraction-limited imaging. With a lateral XY resolution of ~200 nm, sometimes everything appears to overlap very well. However, what if the overlap was simply because the diffraction limit was blurring the structures too much?
    Resolution-enhancing AI could improve the images from both color channels and help us better analyze the images for colocalization.
    Or we could image many cells quickly, enhance the resolution of the images, and select the most interesting cells with the most interesting features (which is colocalization in this example) to do time-consuming single-molecule localization microscopy on.

  4. Larger FOV
    Resolution-enhancing AI could in principle allow us to change a high-NA objective to a lower-NA objective to image more cells in a larger FOV (higher throughput, saves time).
    Or sometimes the biological sample (organoids) just requires us to image a larger FOV without any tiling/stitching. This again is a limitation that we cannot change and resolution-enhancing AI may be useful.

  5. Correcting artifacts in SMLM with SIM images.
    This is something that I think a few groups in the world are trying to do now. I’m not sure if this is done yet.
    Resolution-enhancing AI could be trained with SIM (~100 nm XY resolution) and SMLM images (~30 nm XY resolution) with simple samples such as beads or DNA origami.
    The resolution-enhanced SIM images could help us figure out whether the empty patches or clusters in the SMLM reconstruction are true-negatives vs false-negatives for the empty patches or true-positives vs false-positives for the clusters.

I’m just guessing what could be done with resolution-enhancing AI here. I might be wrong as to whether these sorts of applications could actually work IRL.

Twitter handle: @Maurice_Y_Lee


Hi Eli,

I mainly focus on image analysis although I have recently crossed into thinking about illumination and acquisition side problems.

An important concept that took me a while to understand is that all imaging requires the use of a model to interpret the pixels. For example, you may have heard of Nyquist sampling, which tells us how many pixels we need to acquire to reconstruct the full image given the spatial resolution of the system. Specifically, we need to sample at more than twice the rate of the highest spatial frequency resolvable by the acquisition system to collect all the available information and reconstruct a continuous signal. The model here assumes 1) a bandlimited acquisition system, 2) that we know what this bandlimit is, and 3) that we have a sufficient signal-to-noise level that we can distinguish the real signal from background.
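A quick numerical illustration of why those assumptions matter: sample below twice the signal's frequency and the data become indistinguishable from a lower-frequency alias. A minimal NumPy sketch (frequencies chosen purely for illustration):

```python
import numpy as np

fs = 60.0                                  # sampling rate, below 2 x 40 Hz
t = np.arange(0.0, 1.0, 1.0 / fs)
tone_40hz = np.sin(2 * np.pi * 40.0 * t)   # the "true" signal
tone_20hz = np.sin(2 * np.pi * 20.0 * t)   # its alias at fs - 40 = 20 Hz
# At 60 Hz the 40 Hz tone folds to 20 Hz (with a sign flip for a sine):
print(np.allclose(tone_40hz, -tone_20hz))  # True
```

No amount of post-processing can tell the two tones apart from these samples alone; only extra knowledge (a model) about which frequencies are plausible can.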

What happens if we knew more about either the system or the sample? Maybe we know about the underlying noise distribution or know how regular the structure we are trying to image is. In that case our signal model might be constrained enough to provide more detailed information.

The question at the end of the day is how much you trust the signal model used to interpret your samples.

For example, we have enormous trust in the Shannon-Nyquist model, although there are immediately some problems with how we interpret it. We tend to satisfy the Nyquist critical sampling criterion using equispaced samples, which assumes periodicity of the image domain. This of course is not true, resulting in resolution degradation at the boundaries of our images.

In summary, the way we acquire, sample, and interpret digital images is entirely model dependent. Rather than automatically rejecting more sophisticated models that are content-aware we should seek to understand their limitations and appreciate what additional information they might provide.


Ah, I think I see how. I was thinking that digital binning would result in the same final image size and “pixel size,” but I wasn’t taking into account how the scan speed differences might affect acquisition quality at 512x512 vs 1024x1024. It would take a really crazy setup to manage two scan speeds at the same time… possibly some variant of a 4Pi system.

That said, now I want to play around with taking some fixed slide scans at different settings and powers and see how the noise varies in the resulting images! I wonder just how far off a 1024x1024 digitally binned image is going to be from a 512x512 taken at speed when the sample is fixed in place, and how similar the results of gaussian blur and pixel shifts would be.


By the way, I think you’ll find this paper interesting: they use deep learning to accelerate deconvolution. Hari is one of my heroes, so while I haven’t yet read it in detail, I suspect this is solid work.


Adding this to the discussion:
For those without access, it involves using a single 2D image to generate a 3D structure. It does appear that they have limited themselves to fairly flat samples. But I was wondering what other people’s takes are on the possibilities/pitfalls here?

Whether or not their math was accurate is quite a bit beyond me at this point, but the entire idea seems… I don’t know. It would be amazing if this does work, even for only fairly flat samples (adherent cells?).

I believe this is the corresponding (or at least closely related) bioRxiv preprint, for those without access. Looking at it but haven’t grokked it yet.
