[NEUBIAS Academy@Home] Webinar "A beginners guide to content-aware image restoration — an introduction to the methods and tools within CSBDeep" + Questions & Answers


The following questions were asked during the NEUBIAS Webinar “A beginners guide to content-aware image restoration — an introduction to the methods and tools within CSBDeep”. A recording of this webinar is now available on YouTube.

Thanks and credit for answering these questions goes to (in alphabetical order):

  • Tim-Oliver Buchholz, CSBD/MPI-CBG, Germany
  • Florian Jug, CSBD/MPI-CBG, Germany; Foundation Human Technopole, Italy
  • Alexander Krull, CSBD/MPI-CBG, Germany
  • Mangal Prakash, CSBD/MPI-CBG, Germany
  • Deborah Schmidt, CSBD/MPI-CBG, Germany

We took the liberty to regroup the questions into a few broad categories instead of listing them in the order they were asked.

General Questions

Which are some good resources on installing TensorFlow (Linux, Windows)?

We would like to include a good source, but it seems that there is not ONE instruction that works for all users. Windows, Linux, Mac… different OS versions… the installation is always a bit different. Do you know a good page that in your experience is complete? If so, we’d be happy to look at it and link to it directly from csbdeep.bioimagecomputing.com

If you have a favorite page, why don’t you respond to this thread? Thanks!

Is it better to train a CARE or a Noise2Void network? The latter would be less work for the experimenter. How do CARE, Noise2Noise, and Noise2Void compare to each other on the same data?

CARE will typically lead to the best results. Hence, if you have paired training data available, CARE is usually to be preferred. However, it can be hard to acquire high-quality paired training data (e.g. if your sample moves, or dies when exposed to too much light). Also, the advantage of CARE is more pronounced when the images to be restored are less noisy. Additionally, some methodological improvements have been proposed that close the gap between unsupervised denoising and denoising with CARE even further (e.g. PN2V, PPN2V, or DivNoising). We will talk about them a bit later.

Is there a general guideline for how many sets of images we need to train an accurate CNN? 100 pairs? 500 pairs? And does the number of pairs of images depend on the signal and noise? What’s a good number of training data images for CARE to work properly? Is there an acceptable minimum?

Good questions, unfortunately there is not one single answer. How much data you need for training depends on how large your images are and how complicated the restoration task is. In all papers we write, we always state precisely how much data we trained on… this can give you some indication. In the original CARE paper, for example, the Supplementary Material actually offers quite detailed plots of quality for varying amounts of training data. Generally, a dozen megapixels or so start making sense, but… the more the better, and if you have an amenable sample, even less can work well.

What should we do in case of overfitting? What parameters would one have to play with to fix this?

Overfitting issues usually indicate that the training dataset is too small. What will help best is: more data! There are more subtle answers, but really, the one real course of action here is: MORE data… :wink: That being said, you could sometimes play around with the size of the network. If you have simple images/structures, a smaller network might do the job just as well without overfitting.

Does CARE or Noise2Void work with RGB/multi-channel images?

In Python, they do. Predicting an RGB / multi-channel image in Fiji based on a model from Python should also work. Training on RGB / multi-channel images in Fiji is not yet supported, but we would love to see this feature being added soon.

What is the influence of the patch size on the output? Meaning, does it make a difference if I use 32x32 px or 64x64 px regarding the reconstruction quality/accuracy?

Using larger patches will in general not cause any trouble. However, at some point they will no longer fit into your GPU memory. Small patches, on the other hand, can limit the achievable quality of a trained network. It is important that the training patches are larger than the largest receptive field used in the network. If all this tells you nothing, just stick with the default parameters; they should do just fine for most datasets.
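To build some intuition for why patches must cover the network's receptive field, here is a rough back-of-the-envelope estimate for a U-Net-style encoder. The helper function and the architectural assumptions (two 3x3 convolutions per resolution level, 2x downsampling) are purely illustrative, not CSBDeep's actual implementation:

```python
# Rough receptive-field estimate for a U-Net-style encoder
# (two 3x3 convolutions per level, 2x downsampling).
# Hypothetical helper for illustration only.

def receptive_field(depth, convs_per_level=2, kernel=3):
    rf = 1       # receptive field in pixels
    stride = 1   # effective stride grows with each downsampling
    for _ in range(depth):
        rf += convs_per_level * (kernel - 1) * stride
        stride *= 2  # pooling doubles the effective stride
    # bottleneck convolutions at the lowest resolution
    rf += convs_per_level * (kernel - 1) * stride
    return rf

print(receptive_field(2))  # → 29, so 32x32 patches barely cover it
```

This also shows why deeper networks need larger training patches: each extra level roughly doubles the receptive field.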

Would it be better and faster to train a new network or to retrain an existing one, which has been trained on similar data?

I would say it is generally safer in a scientific context to train a new network, avoiding the danger of introducing a bias from the other dataset. That being said, training could be faster when starting from a sensible pretrained network. Still, since training a CARE network of any sort only takes a few hours, I see little reason to retrain an existing network just to save a few minutes…

Would you visually detect an improvement in images with PSNR of 29 vs 32?

Yes, there are examples of this later in the presentation, and our publications are full of examples too. (Note: PSNR values live on a log scale, meaning that even much more subtle differences, such as 29.3 vs. 29.8, are substantial!!!)
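For intuition, PSNR can be sketched in a few lines of NumPy. Since PSNR is logarithmic in the mean squared error, a 3 dB gap such as 29 vs. 32 corresponds to roughly halving the MSE. This is a minimal sketch, not the exact evaluation code from the papers:

```python
import numpy as np

def psnr(gt, pred, data_range=None):
    """Peak signal-to-noise ratio in dB."""
    gt, pred = np.asarray(gt, float), np.asarray(pred, float)
    if data_range is None:
        data_range = gt.max() - gt.min()
    mse = np.mean((gt - pred) ** 2)
    return 20 * np.log10(data_range) - 10 * np.log10(mse)

# A 3 dB gap (e.g. 29 vs. 32) corresponds to halving the MSE:
mse_ratio = 10 ** ((32 - 29) / 10)
print(round(mse_ratio, 2))  # ~2.0
```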

You presented today multiple tools available with Jupyter notebooks. Will they also be available for Google Colab (e.g. ZeroCostDL4Mic)?

Yes, some of them are already part of ZeroCostDL4Mic and others are planned to follow soon. :slight_smile:

CARE (the original CARE work)

Can one do isotropic reconstruction in Z and deconvolution? Do I have to do it separately/sequentially?

It should be possible to do it jointly if you have the correct training pairs with high-res and deconvolved ground truth.

Will the GT images have to be normalized before being loaded into the data generation notebook?

The notebooks take care of normalizing the data; you do not need to worry about this.
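For the curious: CSBDeep normalizes with intensity percentiles rather than the raw min/max, which makes the scaling robust to outlier pixels. Here is a sketch of the idea; the percentile values are illustrative, check the library for the exact defaults:

```python
import numpy as np

def percentile_normalize(img, pmin=3, pmax=99.8):
    """Scale the image so the pmin/pmax percentiles map to 0 and 1.
    Values outside that range end up slightly below 0 or above 1,
    which is fine for training."""
    lo, hi = np.percentile(img, [pmin, pmax])
    return (img - lo) / (hi - lo + 1e-20)

rng = np.random.default_rng(0)
img = rng.poisson(100, size=(64, 64)).astype(np.float32)  # toy image
norm = percentile_normalize(img)
```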

CARE images often look even better (smoother) than the GT images. How does this happen?

Smoother is not necessarily better. Smooth looking images can mean that your network is very uncertain about the true solution and makes a compromise. Still, if you consider the findings of Noise2Noise training, it is absolutely possible that CARE predictions are of better quality than any of the images used during training. You can play around with one of the examples we offer to illustrate this if you want! :slight_smile:

Does CARE work well with artificial noise to create the low quality data from the ground truth?

If you artificially noise your data, CARE will learn to remove it. If your question was: can I then use the trained model on real noisy data, the answer is: please don’t do that (unless you know EXACTLY what you’re doing). Why? Because the artificially noised training data will likely not reproduce the noise distribution you will later encounter when trying to denoise real noisy images.
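As a sketch of what artificial noising typically looks like, and why it is risky: simulated shot and read noise capture only part of a real detector's behavior. This is a hypothetical NumPy example, not a recipe we endorse for training production models:

```python
import numpy as np

rng = np.random.default_rng(42)
gt = rng.uniform(10, 200, size=(128, 128))  # stand-in "ground truth"

# Simulated shot noise (Poisson) plus read noise (Gaussian).
# Real cameras add further components (fixed-pattern noise, hot
# pixels, pixel-to-pixel gain variation, ...), which is why a model
# trained on simulated noise may not transfer to real acquisitions.
noisy = rng.poisson(gt).astype(np.float32) + rng.normal(0, 2, gt.shape)
```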

In the original CARE paper, images acquired with 60-fold reduced light dosage could be restored. Question: Is there a rule of thumb on how noisy (low in intensity) we could go with the low SNR images for good restoration?

This depends on many factors. Most importantly: the amount of available training data and the complexity of structures in the data you want to recover.
Maybe the following observation of ours is interesting: in order to get the most impressive denoising results with any CARE method, we love to get the noisiest data our collaborators can give us. Microscopists typically optimize the images they acquire to look quite alright already when they come directly from the scope. We actually think that CARE methods would allow you all to spare quite some light by acquiring at lower exposures.

I think I read somewhere that CARE can be used like the label-free prediction (FNet, from the Allen Institute) to “artificially” label 2D bright field images. Is this correct?

Yes. Just feed the desired fluorescent channel as GT and the fitting DIC (or whatever) as input. Other methods that are intended for this learning task might perform better though… but in multiple courses we taught, students achieved great results when doing this with the CSBDeep tools!

For CARE, if I have many pairs of (ground truth, low quality) images of, say, small particles that undergo small random motion, does this matter? Or might I hope that the network will not pick up this motion, if it is random?

If your particles moved significantly between the two acquisitions you are in trouble with CARE. If the movement is random, this will most likely lead to a smoothed blurry prediction, because the network has to make a compromise between the possible conflicting solutions.
We have had very positive experiences denoising such datasets with Noise2Void and its successors. In general, for such data, unsupervised methods (which don’t need image pairs during training) will likely work better than supervised methods such as CARE.

Noise2Void and its successors

How many pixels should be masked?

You mask one pixel. Full stop. Why this is not completely true will be discussed later in this webinar… :wink: (See StructN2V.)
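The single-pixel masking idea can be sketched as follows: the value at the chosen pixel is replaced (here by a random neighbor) so the network cannot simply copy its input, while the original value serves as the training target. This is an illustrative re-implementation, not the library code:

```python
import numpy as np

def mask_pixel(patch, y, x, rng, radius=2):
    """Noise2Void-style masking: the value at (y, x) is replaced by a
    randomly chosen neighbor; the original value becomes the target."""
    inp = patch.copy()
    h, w = patch.shape
    while True:
        dy, dx = rng.integers(-radius, radius + 1, size=2)
        ny, nx = y + dy, x + dx
        if (dy, dx) != (0, 0) and 0 <= ny < h and 0 <= nx < w:
            break
    inp[y, x] = patch[ny, nx]   # blind the network at (y, x)
    target = patch[y, x]        # ...but train it to predict this value
    return inp, target

rng = np.random.default_rng(1)
patch = rng.normal(size=(16, 16))
inp, target = mask_pixel(patch, 8, 8, rng)
```

In practice the library masks a small percentage of pixels per patch in parallel for efficiency, but conceptually each masked pixel is treated exactly like this.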

In the first live-cell example, the restoration is really impressive, but I also noticed some artifacts in the less bright regions: it looks like the cells are ‘dancing’ and/or reshaping their nuclei all the time, which could lead to false conclusions. (I’ve seen this artifact more clearly on my own data; with 2 second intervals nuclei should not move.) This raises the question: How can you know when to trust the restored images, and when to be a bit skeptical?

You should always be skeptical about the results.
If you find apparent artifacts in your denoised data and you are unsure what is their origin, we appreciate it if you discuss this in the Image.sc forum with us!
Regarding the ‘dancing’ motion, the movie was not played at the original speed, but at increased playback speed.
One pro-tip: better results on movies can sometimes be obtained by treating the movie as a 3D stack and processing it with 3D CARE or 3D Noise2Void. But be warned… the temporal resolution should in such cases better be quite high, so that neighboring frames are not too different. Also, if you observe any ‘bleed through’ (where neighboring timepoints cast slight ‘shadows’ onto each other), you need to train longer… much longer!

Where can I find more information on Noise2Void in Fiji?

On https://imagej.net/N2V all parameters are explained. Please let us know if there is something missing.

Is the structured noise a function of the camera?

Yes, it depends on the camera (or detector).

How structured can noise be in StructN2V? Are there limits?

There is a trade-off. The bigger the masked area (blind spot), the more difficult we make it for the network to fill in the missing information. If the area is too large we will not get nice results anymore.
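A structured blind spot can be sketched by extending the mask from a single pixel to, e.g., a 1x5 horizontal line when the noise is correlated along rows. Again an illustrative sketch only; the library's replacement strategy for masked values differs in detail:

```python
import numpy as np

# StructN2V extends the blind spot to a small structured region,
# e.g. a 1x5 horizontal line when noise is correlated along rows.
struct_mask = np.array([[1, 1, 1, 1, 1]])  # 1x5 horizontal blind spot

def apply_struct_mask(patch, y, x, mask, rng):
    """Blank out the masked region around (y, x) with random values,
    so the network cannot exploit the correlated noise within it."""
    inp = patch.copy()
    mh, mw = mask.shape
    for i in range(mh):
        for j in range(mw):
            if mask[i, j]:
                py, px = y + i - mh // 2, x + j - mw // 2
                if 0 <= py < patch.shape[0] and 0 <= px < patch.shape[1]:
                    inp[py, px] = rng.normal()
    return inp

rng = np.random.default_rng(0)
patch = np.zeros((16, 16))
inp = apply_struct_mask(patch, 8, 8, struct_mask, rng)
```

The trade-off from the answer above is visible here: the larger `struct_mask` gets, the more surrounding context the network loses when filling in the center pixel.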

Is there any downside to using PPN2V instead of N2V?

You need a model for the noise and the available code is slightly less mature.

When one uses sCMOS cameras one has several artifacts such as charge transfer, hot pixels, etc. Does N2V solve for this?

This is a hot topic. We are working on it. sCMOS cameras can potentially be problematic for N2V. It depends on the particular camera. One thing that helps is to acquire a dark image that contains the static noise and subtract it from your data before processing it.
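The dark-image correction mentioned above can be sketched in NumPy: average many closed-shutter exposures to estimate the static per-pixel pattern, then subtract it before denoising. All numbers here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical fixed-pattern offset of an sCMOS chip (per-pixel bias).
fixed_pattern = rng.uniform(95, 105, size=(64, 64))

# Average many dark exposures (shutter closed) to estimate the pattern;
# averaging suppresses the random read noise in each dark frame.
dark_frames = fixed_pattern + rng.normal(0, 2, size=(200, 64, 64))
dark_master = dark_frames.mean(axis=0)

signal = rng.poisson(50, size=(64, 64)).astype(np.float64)
raw = signal + fixed_pattern
corrected = raw - dark_master  # static structure removed before N2V
```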

Model Zoos (bioimage.io)

How useful are the models one can find online (i.e. from the BioImage Model Zoo), since I thought that each model is supposed to be used only for one type of data from one single microscope with the same acquisition setup?

Restoration models in the model zoo are only interesting when they were made for you! For all existing methods, I would never suggest reusing trained restoration models. Sorry… :neutral_face:

I don’t understand why you go to all the trouble to make these models available through bioimage.io but then tell people not to use them on their own data. Am I missing something?

As said above, we advise against reusing trained image restoration models. But in general there are many scenarios where people legitimately share specific models with specific users.

  • The Model Zoo architecture can be used to set up individual hosting places for facilities.
  • Some models, e.g. segmentation approaches, are safer to reuse than restoration models because it is easier for users to verify their results. StarDist, for example, ships pretrained models that work really well on many different datasets!
  • For reproducibility, any publication utilizing a deep learning approach should make its trained model publicly available. Model Zoos allow all others to inspect the precise network used to generate given results. Additionally, a model in the Zoo is encouraged to also point at the precise data used for training, hence truly enabling reproducibility!

Are the models in Bioimage Model Zoo clearly labeled as being versatile or not?

I will raise this question in the bioimage.io channel to make sure we address it properly. Currently, it is the responsibility of the model author to provide this information in the model description. Thanks for your question!

Joint denoising and segmentation with DenoiSeg

What is the segmentation part in DenoiSeg based on? StarDist?

We use a simple 3-class (foreground, background, boundary) U-Net. In principle, though, one could in fact do a similar thing with StarDist as the segmentation ‘module’. This is future work though. In a slightly different context we have looked at suitable combinations of denoising and StarDist. Check out our ISBI 2020 paper on that…

Does DenoiSeg do better than N2V or does DenoiSeg replace N2V when I will use the denoise result for segmentation?

On some data sets, the performance of DenoiSeg is almost as good as N2V. On some others, it is slightly worse, but this is to be expected since the network is not dedicating its full capacity to the denoising task alone. We make a detailed comparison of DenoiSeg with N2V in our DenoiSeg preprint.

Looks like you are using different ideas for segmentation than in StarDist?! If that’s the case, how would DenoiSeg compare to StarDist?

For a very small number of annotated images, DenoiSeg will likely be better than StarDist. For many annotated images, StarDist will most surely outperform DenoiSeg. This boils down to choosing whether you want to annotate few images up front, get some decent results, and curate them (the DenoiSeg route), or do many more annotations in the first place and then less curation (with StarDist).

StarDist targets, and is particularly useful for, roundish shapes. Odd shapes, e.g. with bends, might be better segmented using DenoiSeg.

And last but not least: the output of StarDist and DenoiSeg are quite different. While DenoiSeg gives you foreground, background, and border probability maps, StarDist gives you a bunch of vectors per pixel (pointing at the predicted outline of the object this pixel is part of). Depending on your downstream processing needs you might also prefer to use one of these two methods more than the other…

Can I use DenoiSeg for 3D data?

Currently, it is available only for 2D, but it can be extended to 3D with some software development effort. We hope to find the time to do this soon. If you know somebody who could help, we would be happy to support them in doing this with us!

Does DenoiSeg work for large structures (e.g. very elongated shapes, with bends etc.)?

Yes, the shape of the structures does not matter much, since we use a 3-class U-Net as the backbone for the segmentation part of DenoiSeg. It does per-pixel classification into foreground, background, and border classes.
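Such a 3-class training target can be derived from an instance segmentation, for example like this. This is an illustrative sketch using 4-neighborhood boundaries; the actual DenoiSeg data preparation may differ in detail:

```python
import numpy as np

def three_class_map(instances):
    """Convert an instance label image (0 = background) into classes:
    0 = background, 1 = foreground, 2 = boundary."""
    fg = instances > 0
    # A pixel is a boundary pixel if any 4-neighbor carries a
    # different label (object edge or touching-object interface).
    boundary = np.zeros_like(fg)
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        neighbor = np.roll(instances, shift, axis=axis)
        boundary |= fg & (neighbor != instances)
    out = np.zeros(instances.shape, dtype=np.uint8)
    out[fg] = 1
    out[boundary] = 2
    return out

labels = np.zeros((8, 8), dtype=int)
labels[2:5, 2:5] = 1  # one 3x3 object
classes = three_class_map(labels)
```

The explicit border class is what lets the network separate touching objects of arbitrary shape, without any assumption of roundness.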


Is there a way to avoid overlapping of labels in Labkit (with some kind of command like in QuPath, maybe)? Also, with the method you described for labeling, if two of them are adjacent and some of the pixels are in contact, I think they will be labeled as one region instead of two. Any way to avoid this?

In this case, you can create one label per object in Labkit (you can add labels to the existing labelings foreground and background on the left). Opening the result in ImageJ will then already give you the labeled result image.



Thanks for this Q&A.

Can denoised images (with Noise2Void) be used for quantitative intensity measurements? Or is it best to avoid it?



Hello LPUoO,
this is a hot topic. My current assessment is the following:
In principle, you can do this, but we have until now never made a real quantitative evaluation of the type and magnitude of errors that occur.
We have observed that when training is too short, systematic errors occur, e.g. intensity peaks (local bright spots) are on average darker.
These problems slowly disappear when the training is prolonged.
I would recommend being very careful and checking, for example, profile plots of the N2V result and the raw data, to see if N2V is e.g. underestimating the intensity at its peaks.
As I said, this is currently uncharted territory.
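One way to perform the suggested check is to extract matching line profiles from the raw and denoised images and compare their peak heights. Here is a synthetic illustration in NumPy; the 10% underestimation is made up to show what such a systematic error would look like:

```python
import numpy as np

# Synthetic "raw" image with a bright peak, and a hypothetical
# denoised version that underestimates the peak by 10%.
raw = np.full((64, 64), 100.0)
raw[30:34, 30:34] = 500.0
denoised = raw.copy()
denoised[30:34, 30:34] *= 0.9

# Compare line profiles through the peak (same row in both images).
row = 31
profile_raw = raw[row]
profile_dn = denoised[row]
peak_ratio = profile_dn.max() / profile_raw.max()  # ≈ 0.9 here
```

On real data you would plot `profile_raw` and `profile_dn` on top of each other; a `peak_ratio` consistently below 1 would indicate the kind of systematic peak underestimation described above.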