CSBDeep - N2V - Training recommendations?

I am trying to use N2V to de-noise a series of fluorescent images. So far, it is all working quite nicely, but I was unsure about some aspects relating to the selection of training images. I had a look at the published papers and other documents I could find, but couldn’t really find any answers to my questions. So, I was wondering whether anybody had any advice on the following points:

  1. Does it make any real difference whether the training is performed on a single or multiple images from the image set? Would it be better to extract training patches from more than one image?

  2. Is it better to use two independent images for the extraction of training and validation patches or train and validate using patches extracted from the same image? Would it make any substantial difference?

  3. If the labeled structures of interest on the image are relatively sparse (see sample below), is it better to train using cropped versions of the image that reduce the amount of ‘empty’ space (not really empty as it still contains background information, but just no structures of interest) or is it better to use the full-sized, original images?

Sorry if these are rather obvious questions, but I would be really grateful for some further insights beyond me just trying all the possible permutations. Are there general principles that I should be aware of, but that I have overlooked?
Thanks for your help,

Hi @Volko,

I would recommend using as many images as you have; this will allow the model to generalize better. Of course, if you have thousands of images, you would choose a subset. So if you have 1908px * 1904px images and you decide to extract 128px * 128px patches, you could get 14 * 14 = 196 non-overlapping patches per image. If you have ten such images, you could generate a training dataset of 1960 image patches, which would be a good start. By using data augmentation (rotation by 90, 180 and 270 degrees as well as flipping), the dataset grows to 8 * 1960 = 15680 non-overlapping training patches.
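The patch arithmetic above can be written out as a few lines (the image size and counts are just the example numbers from this post):

```python
# How many non-overlapping patches a set of images yields,
# before and after 8-fold augmentation (3 rotations + flips).
image_h, image_w = 1908, 1904   # example image size from above
patch = 128                     # patch edge length
n_images = 10

patches_per_image = (image_h // patch) * (image_w // patch)  # 14 * 14 = 196
raw_patches = patches_per_image * n_images                   # 1960
augmented = raw_patches * 8                                  # 15680

print(patches_per_image, raw_patches, augmented)
```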

In general you want to use as much training data as you can get :slight_smile:

The validation set should represent the whole diversity of the training dataset. A bad split of training and validation data for the example image would be if the training dataset contained only patches from the top half while the validation dataset contained only patches from the lower third.

During training the validation set is used to check the performance of the network on unseen data, which means that the validation set should contain at least “one example” of every possible kind of structure.

But it is important that the training and validation patches are strictly separated, i.e. they should also not overlap! So, following up on the example from above, you would take the 8 * 1960 patches, which are non-overlapping, shuffle them and then pick the last 10% as validation data.
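As a concrete sketch of the shuffle-and-split step (with a small random array standing in for the real patch stack — 8x8 toy patches instead of 128x128, so it runs instantly):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the augmented patch stack: 15680 small patches.
patches = rng.random((15680, 8, 8, 1)).astype(np.float32)

# Shuffle so both sets cover the same image regions, then hold out
# the last 10% as validation. Because the patches are non-overlapping,
# the two sets share no pixels.
patches = patches[rng.permutation(len(patches))]

n_val = len(patches) // 10
X, X_val = patches[:-n_val], patches[-n_val:]

print(X.shape, X_val.shape)  # (14112, 8, 8, 1) (1568, 8, 8, 1)
```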

It depends a bit on the ratio of empty space to filled space. If only a quarter of your images contains actual structure and you use the full images for training, roughly three quarters of the training patches would be empty, which is in a sense wasted.

Looking at the example I would not say that you have to crop the images.

In our experience with N2V, the most important bit is to train long enough. Even though results look pleasing after training for 30 minutes, one should really train for a couple of hours, e.g. overnight. The network will still improve, especially on the finer structures.


Thanks for the detailed response. So far I have used a single training image (which worked reasonably well, training for 300 epochs with 64x64 patches), as I am not absolutely clear how to expand the examples to multiple images (I have hundreds of these images, so plenty for a large training set). I used the examples posted on GitHub (https://github.com/juglab/n2v/tree/master/examples/2D) as a starting point, but I am not sure how to generate training/validation patches across multiple images. As far as I understand, the 2D SEM data example uses the first image to generate the training patches and the second image to create the validation patches. The training patches are generated by the following code:
X = datagen.generate_patches_from_list(imgs[:1], shape=patch_shape)
If I wanted to create training patches from the first 10 images, would I just change the imgs[:1] to imgs[:10]? Does this automatically generate training patches across multiple images?

Am I working along the right lines or is there something missing? Are there any other examples that show how to use multiple images for training? All the examples seem to just use one training image.


Yes, the function datagen.generate_patches_from_list(...) takes a list of images and extracts n patches from each image. The functionality is used in this example.
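What `generate_patches_from_list` does for multiple images can be illustrated with a minimal numpy stand-in — this is a sketch of the idea only, not the actual n2v implementation (which also handles channels and optional augmentation):

```python
import numpy as np

def patches_from_list(images, shape=(96, 96)):
    """Extract all non-overlapping patches of `shape` from each image
    in the list and stack them into one training array."""
    ph, pw = shape
    out = []
    for img in images:
        h, w = img.shape[:2]
        for y in range(0, h - ph + 1, ph):
            for x in range(0, w - pw + 1, pw):
                out.append(img[y:y + ph, x:x + pw])
    return np.stack(out)

# Ten fake 192x192 images -> 2 * 2 = 4 patches each, 40 in total,
# just as passing imgs[:10] pools patches from the first ten images.
imgs = [np.zeros((192, 192), dtype=np.float32) for _ in range(10)]
X = patches_from_list(imgs, shape=(96, 96))
print(X.shape)  # (40, 96, 96)
```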

Great, I think I am starting to understand. I hope you don’t mind me asking a few more questions related to patch sizes and train_steps_per_epoch values:

  1. Why is the size of the patches that are extracted for training and validation using datagen.generate_patches_from_list different to the patch size in the N2VConfig? In the example, the extracted training and validation patches are 96 x 96, while the patches in the N2VConfig are 64 x 64. Is there a general rule for the relationship between the two sizes? Are the smaller patches in the N2VConfig random subsets of the larger training patches generated by datagen.generate_patches_from_list?

  2. Is there a general recommendation for the train_steps_per_epoch value? In some examples, this value is set to a fixed number, while in other examples it is the number of training patches divided by the batch size (128). Should this value be within a certain range, similar to the number of training epochs being greater than a few hundred? What happens when this value x batch size doesn’t equal the number of training patches?


Sorry, I was out of the office :slight_smile:

Yes, the smaller patches (defined in the N2V-Config) are randomly cropped from the larger patches during each epoch.
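The random-cropping step can be sketched like this (an illustration of the idea, not the exact n2v internals):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_crop(patch, size=64):
    """Crop a random size x size window from a larger training patch,
    as happens anew in each epoch."""
    h, w = patch.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return patch[y:y + size, x:x + size]

# A 96x96 patch (as extracted by generate_patches_from_list) yields a
# different 64x64 crop (the N2VConfig patch size) every time.
big = np.arange(96 * 96, dtype=np.float32).reshape(96, 96)
small = random_crop(big, size=64)
print(small.shape)  # (64, 64)
```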

In each step of an epoch, one batch (defined by the batch size) of training patches is shown to the network and the loss is backpropagated. Usually one wants to show all training patches once per epoch and then compute the validation loss, which is why we often set train_steps_per_epoch = train_data.shape[0]/batch_size. But if you have thousands of training patches, it might make sense to evaluate the validation loss more frequently, i.e. before every training patch has been seen; then one would set train_steps_per_epoch to something smaller. If you set train_steps_per_epoch > train_data.shape[0]/batch_size, some training patches will be shown more than once during a given epoch.
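The usual setting is a one-line calculation (numbers taken from the 15680-patch example above; rounding up ensures the last, smaller batch is still shown):

```python
import math

n_patches = 15680   # example training-set size from above
batch_size = 128

# One full pass over the training data per epoch.
# Setting steps smaller evaluates validation loss more often;
# setting it larger repeats some patches within an epoch.
steps = math.ceil(n_patches / batch_size)
print(steps)  # 123  (15680 / 128 = 122.5, rounded up)
```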

I usually set it to train_data.shape[0]/batch_size and increase the number of training epochs.

Thanks for the detailed answers - very much appreciated. I certainly feel that I now have a better understanding of the parameters and it doesn’t feel like I am just using some magic or random values.
All the best,
