Content-aware image restoration - questions

Hi all,

I tested the Content-aware image restoration (denoising) using the Jupyter notebooks. I have absolutely no experience with this sort of thing and virtually no related knowledge, so I have a few questions below. Apologies if they are dumb or if the answer is in the paper.

  1. Is there any reason not to use the Jupyter notebook with my own data? Basically I replaced the example data with my own in order to run it.

  2. When doing 3D denoising, is the information in the slices above and below a given slice used to reconstruct the 3D stack, or are the z slices restored individually?

  3. During data generation, should the patch size be adapted to the size of my objects of interest (the nuclei)?
    o This is my input and GT. I used 36 images like that for training. Images were taken on a Zeiss 880 Airyscan with 0.01% laser power for the input and 2% for the GT, at the same acquisition speed.


    And these are the patches I got with the default settings: patch_size = (64,64), n_patches_per_image = 1024.

    Should the patches be larger?

  4. In the training I see that one may redefine the configuration:
    [screenshot: configuration cell in the notebook]
    o I see how to change the number of steps per epoch, but how do I change the learning rate, for example? Is that not possible in the Jupyter notebook?
    o What are epochs anyway? Is more always better? Is there a limit?

  5. Is there an explanation of what all the information in Tensorboard means?
    o These are the final plots I had:
    [screenshots: final TensorBoard plots]
    o Is it good? Is it bad? How do I know?

  6. I have two Titan GPUs, but I see that only one is in use during training. Is this normal, or does it mean that there is something wrong with my setup? Or is this a limitation of the Jupyter notebook?

  7. I see in the CSBDeep in Fiji – Installation wiki that there is a section on multiple GPU support, but it is for Linux only. Is that because you don’t need to do anything special to get multiple GPUs working in Fiji on Windows, or because multiple GPUs in Fiji will only work on Linux?

  8. I installed both the plugin and the TensorFlow GPU native libraries, but apparently it can’t load TensorFlow. What am I missing?
    [screenshots: Fiji error messages]

  9. This is an example of the output I had. GT is 2% laser power; the input is 0.01 or 0.04% laser power. Training was done with 36 images like those shown in point 3 and with 400 steps per epoch (and all the other default settings).
    o As I don’t see a major difference between the two network outputs, is it correct that I can keep decreasing the laser power without compromising the image restoration?

Thank you for your help, and I hope these questions will also help other inexperienced users like me.
In any case I am very pleased with the network output and the whole Jupyter notebook experience.

@fjug


Hi,

Thanks for trying things out!

No, that is perfectly reasonable and the correct way to do it.

The former. The network will use the full 3D information, i.e. pixel values below and above a given slice.
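
(Concretely, a 3D model is obtained by passing 'ZYX' axes to the Config; a minimal sketch, where the model name and basedir are arbitrary placeholders:)

from csbdeep.models import Config, CARE

config = Config('ZYX', n_channel_in=1, n_channel_out=1)  # 3D network: convolutions run across z as well
model  = CARE(config, 'my_3D_model', basedir='models')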

Not really. It should be large enough to avoid boundary effects and small enough that a typical batch fits into GPU memory. Good defaults are (256,256) for 2D and (64,64,64) for 3D networks.

I would use larger patches but sample fewer patches per image. E.g. patch_size = (256,256) and n_patches_per_image = 64 would make sense in your case (sampling more would lead to many patches with redundant information).
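
For reference, a minimal sketch of generating such patches with create_patches (the folder names 'data/low' and 'data/GT' are just placeholders for your own layout):

from csbdeep.data import RawData, create_patches

raw_data = RawData.from_folder(
    basepath    = 'data',
    source_dirs = ['low'],     # noisy input images
    target_dir  = 'GT',        # matching high-SNR ground-truth images
    axes        = 'YX',
)

X, Y, XY_axes = create_patches(
    raw_data            = raw_data,
    patch_size          = (256,256),
    n_patches_per_image = 64,
    save_file           = 'data/my_training_data.npz',
)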

It is. You can set many options when constructing the Config object:

vars(Config())

{'n_dim': 2,
 'axes': 'YXC',
 'n_channel_in': 1,
 'n_channel_out': 1,
 'train_checkpoint': 'weights_best.h5',
 'train_checkpoint_last': 'weights_last.h5',
 'train_checkpoint_epoch': 'weights_now.h5',
 'probabilistic': False,
 'unet_residual': True,
 'unet_n_depth': 2,
 'unet_kern_size': 5,
 'unet_n_first': 32,
 'unet_last_activation': 'linear',
 'unet_input_shape': (None, None, 1),
 'train_loss': 'mae',
 'train_epochs': 100,
 'train_steps_per_epoch': 400,
 'train_learning_rate': 0.0004,
 'train_batch_size': 16,
 'train_tensorboard': True,
 'train_reduce_lr': {'factor': 0.5, 'patience': 10, 'min_delta': 0}}

So you can change e.g. the learning rate (or number of epochs, batch size, etc.) with
config = Config(..., train_learning_rate = 1e-4).
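
A slightly fuller sketch (axes and channel counts here assume the typical single-channel 2D case; adapt them to your data):

from csbdeep.models import Config

config = Config('YX', n_channel_in=1, n_channel_out=1,
                train_learning_rate = 1e-4,
                train_epochs        = 100,
                train_batch_size    = 16)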

An epoch is the number of model updates such that the whole training dataset is seen once (i.e. number of training samples / batch size). There is no general rule for how many of those you need for reasonable performance. Typically one monitors the training/validation loss in TensorBoard and uses a number of epochs such that at the end the validation loss is plateauing.
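
As a purely illustrative calculation with the numbers from your setup:

n_patches  = 36 * 1024           # 36 stacks x 1024 patches each = 36864 training samples
batch_size = 16                  # the default
print(n_patches // batch_size)   # 2304 updates for one full pass over the data
# with train_steps_per_epoch = 400, one logged "epoch" covers only part of that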

Looks reasonable :slight_smile:

Currently only one GPU is used at a time.

The link doesn’t really describe how to use several GPUs at the same time, but how to choose which GPU is used if several are available. So not quite what you want :slight_smile:
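
If picking one of your two Titans is all you need, the standard TensorFlow mechanism is the CUDA_VISIBLE_DEVICES environment variable (nothing CSBDeep-specific; it must be set before TensorFlow is imported):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'   # '0' = first GPU, '1' = second
# ...only now import tensorflow / csbdeep...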

Yes, that can sometimes be an issue, especially if your CUDA and TensorFlow versions are not playing along well (you have CUDA 9.0 installed?). In case this persists, could you open an issue at https://github.com/csbdeep/csbdeep_website/issues ?

Well, one would have to quantify the error of both model predictions against the GT. Indeed, both restorations look reasonable, but the two inputs also look very similar to each other.

Glad to hear :slight_smile:


Thank you very much for all the explanations, Martin @mweigert.

Now everything is much clearer.

I have a few extra questions:

  1. Is there an optimal bit depth for the input images? I noticed, for example, that in my case with 8-bit images the “low” images have pixel values between 1 and 20, and I wonder if that’s enough for good reconstructions.

  2. Is there an optimal Z overlap for the input stacks? When acquiring, I could have for example between 0% and 50% overlap in the Z slices.

  3. Is it okay if the input stacks have different Z sizes? (Same interval but a different total number of slices; all other acquisition parameters identical.)

  4. About the number of patches: in the notebook it says that one “should use more patches the more training stacks [one has].” I am not sure I understand why this is the case.
    [screenshot: the notebook’s note on the number of patches]

  5. About patch size, you say that it should be small enough that a typical batch fits into GPU memory. So this means that the amount of GPU memory is a limiting factor in training. Does more GPU memory lead to better models, or does only the speed of model creation change?

  6. Finally, I repeated the training with 3D stacks (with slightly lower laser power than in the example in my first post). This time I had the following TensorBoard output:
    [screenshot: TensorBoard output]
    You say that one should use a number of epochs such that at the end the validation loss is plateauing.

I used train_steps_per_epoch=400 as recommended, but the val_loss doesn’t plateau. Do you suggest I increase the train steps per epoch? By how much?

  7. In which cases is it useful to limit the GPU memory?

  8. On a side note, is it correct that any of these GPUs is compatible with CARE?

Thank you very much for all your answers,
I look forward to better understanding these last items.

Hi,

Yes, that’s pretty low. But there is no general rule as to whether that is enough, as this depends on what you want to do after processing (e.g. segmentation). So in general, one needs to try out whether the final (e.g. segmentation) result is still satisfactory.
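
(If you want a quick feel for how much usable signal is there, you could inspect the intensity range of one of the low-SNR images; plain numpy/tifffile, with a hypothetical file name:)

import numpy as np
from tifffile import imread

img = imread('low/img_01.tif')           # hypothetical path to one of your "low" images
print(img.dtype, img.min(), img.max())   # e.g. uint8, 1, 20
print(np.percentile(img, [1, 99.9]))     # robust estimate of the usable intensity range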

That should only be relevant if you stitch the images afterwards (so should not be too relevant for the restoration aspect).

That’s okay, as long as the z spacing between planes is roughly the same.

If you use create_patches this will be done automatically, as the same number of patches is then extracted per stack, but from more stacks.

Feeding larger patches will reduce boundary effects, so the model should get a little better, albeit not dramatically so. So it does not really have a large influence, imho.

The validation loss (left image, orange curve) seems to plateau from epoch 60 on. So all good.

Mostly if you would like to do other stuff on the same GPU while training (e.g. image processing) and don’t like that your GPU is effectively blocked for as long as your network is training.
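
(CSBDeep ships a small helper for this; a sketch assuming the current signature — check the docs if it changed:)

from csbdeep.utils.tf import limit_gpu_memory
limit_gpu_memory(fraction=0.5)   # let TensorFlow claim at most ~half of the GPU memory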

Yes. Of course, the more recent ones will be faster (having more cores, better memory bandwidth etc).


Hi @mweigert ,
I ran into a problem today while installing everything needed to run CARE on a new machine.

Previously I did the following in an Anaconda prompt.

conda install tensorflow-gpu
conda create -n tensorflow_gpuenv tensorflow-gpu
conda activate tensorflow_gpuenv 
conda install -c conda-forge jupyterlab
conda install pip
pip install csbdeep
jupyter notebook

I’m no expert, but the issue I have is that conda install tensorflow-gpu installs TensorFlow 2.0. Could you please tell me how to uninstall TensorFlow 2.0 and install TensorFlow 1.x in an Anaconda prompt?

Thank you very much

Hi @LPUoO,

With conda (and pip) one can install a specific version of any package, so the following should work:

conda install tensorflow-gpu=1.14

Hope that helps!


Hi @mweigert,
Thanks for the tip. I no longer get an error message in the training step of Denoising3D saying that I can’t use TensorFlow 2.0, but I get the following instead:

history = model.train(X,Y, validation_data=(X_val,Y_val))

WARNING:tensorflow:From C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From C:\ProgramData\Anaconda3\lib\site-packages\csbdeep\utils\tf.py:240: The name tf.summary.image is deprecated. Please use tf.compat.v1.summary.image instead.

WARNING:tensorflow:From C:\ProgramData\Anaconda3\lib\site-packages\csbdeep\utils\tf.py:268: The name tf.summary.merge is deprecated. Please use tf.compat.v1.summary.merge instead.

WARNING:tensorflow:From C:\ProgramData\Anaconda3\lib\site-packages\csbdeep\utils\tf.py:275: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Epoch 1/100

---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-7-dc0809c42a5c> in <module>
----> 1 history = model.train(X,Y, validation_data=(X_val,Y_val))

C:\ProgramData\Anaconda3\lib\site-packages\csbdeep\models\care_standard.py in train(self, X, Y, validation_data, epochs, steps_per_epoch)
    175         history = self.keras_model.fit_generator(generator=training_data, validation_data=validation_data,
    176                                                  epochs=epochs, steps_per_epoch=steps_per_epoch,
--> 177                                                  callbacks=self.callbacks, verbose=1)
    178         self._training_finished()
    179 

C:\ProgramData\Anaconda3\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1730             use_multiprocessing=use_multiprocessing,
   1731             shuffle=shuffle,
-> 1732             initial_epoch=initial_epoch)
   1733 
   1734     @interfaces.legacy_generator_methods_support

C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    218                                             sample_weight=sample_weight,
    219                                             class_weight=class_weight,
--> 220                                             reset_metrics=False)
    221 
    222                 outs = to_list(outs)

C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
   1512             ins = x + y + sample_weights
   1513         self._make_train_function()
-> 1514         outputs = self.train_function(ins)
   1515 
   1516         if reset_metrics:

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras\backend.py in __call__(self, inputs)
   3290 
   3291     fetched = self._callable_fn(*array_vals,
-> 3292                                 run_metadata=self.run_metadata)
   3293     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3294     output_structure = nest.pack_sequence_as(

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args, **kwargs)
   1456         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1457                                                self._handle, args,
-> 1458                                                run_metadata_ptr)
   1459         if run_metadata:
   1460           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node down_level_0_no_0/convolution}}]]
	 [[metrics/mae/Identity/_259]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node down_level_0_no_0/convolution}}]]
0 successful operations.
0 derived errors ignored.

I’m unsure how to fix it. What do you suggest?

Thank you !

Seems like a cudnn/CUDA mismatch (which conda should take care of automatically, but in my experience almost never does).

What you can do is

  1. Check your nvidia driver version (via nvidia-smi, something like 387.34)
  2. See what CUDA/cudnn version is maximally supported by your driver:
    https://docs.nvidia.com/deeplearning/sdk/cudnn-support-matrix/index.html#cudnn-cuda-hardware-versions
  3. See what tensorflow 1.x version is maximally supported:
    https://stackoverflow.com/questions/50622525/which-tensorflow-and-cuda-version-combinations-are-compatible
  4. Install all packages with the correct versions manually e.g. (for the driver above)
    conda install cudatoolkit=9.0 cudnn=7.6.4 tensorflow-gpu=1.12

Always a bit of a pain…
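
(Once everything is installed, a quick sanity check with the TF 1.x API tells you whether TensorFlow actually sees the GPU:)

import tensorflow as tf
print(tf.__version__)              # should print 1.x
print(tf.test.is_gpu_available())  # True only if driver/CUDA/cuDNN line up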


Hi @mweigert,
Thanks again for your help.

There is something I don’t get in the cuDNN Support Matrix: I don’t see my NVIDIA driver version (431.60, nvidia-smi output below) listed there.

[screenshot: nvidia-smi output]

I tried the following, but it still didn’t work, so I don’t know which python, cudatoolkit, cudnn and tensorflow-gpu versions I’m supposed to use:

conda install cudatoolkit=9.0 cudnn=7 tensorflow-gpu=1.9.0
conda install python=3.5.0

Am I supposed to run pip install csbdeep again after installing all of the above?

Thank you very much

Hi,

Your driver seems to be new enough for everything, so that shouldn’t be the problem.
Maybe you could try to install in a fresh conda environment, along the lines sketched below.
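
(Something like this; the environment name and exact versions are just examples:)

conda create -n care_tf1 python=3.7
conda activate care_tf1
conda install tensorflow-gpu=1.14
pip install csbdeep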

Other than that, I’m unfortunately out of ideas :frowning: