Training Voxel Size when training CSBDeep (OOM Errors, Prediction tile size, etc.)

I have successfully trained CSBDeep (CARE) on my data using Python on the GPU. I am using a GeForce RTX 2090 Ti (12 GB) and, of course, compatible versions of CUDA, cuDNN, TensorFlow, and Keras.

I find that our samples are much bigger than what CSBDeep was designed for: we use 25x and 40x objectives and image nuclei much larger than those in the Tribolium sample used originally. I understand that the model adapts to the dimensions of the input training patches, which are 16x64x64 in the original example. I would like to increase these dimensions so that at least two cells/nuclei/ROIs fit within one patch; however, just going up to 32x128x128 throws OOM (out-of-memory) errors. I believe this is because the neural network model is built on the GPU and is therefore limited by GPU memory.

My question is: is it possible to bypass the GPU and build the model in system RAM instead (I have 128 GB, which is much more)? I do not mind the training time; I can leave it to train over the weekend. A discussion of the optimal patch size would also be very helpful. Is a larger patch even necessary? My typical input images are ~150x1000x1000 (‘ZYX’).
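For reference, one common way to keep TensorFlow (and therefore CSBDeep, which builds on it) off the GPU, so that the model and its activations live in system RAM, is to hide the CUDA devices through the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow is imported. This is a minimal sketch, assuming a standard CUDA build of TensorFlow; expect CPU training to be much slower than on the GPU.

```python
import os

# Hide every CUDA device from this process. This should run before TensorFlow
# (or csbdeep, which imports TensorFlow) is imported anywhere in the process;
# otherwise the GPU may already be initialised and the setting may have no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# Any model built and trained after this point (e.g. a CARE model) would run on
# the CPU and use system RAM instead of GPU memory.
```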

Furthermore, the same issue arises at prediction time, where the input image is split into n_tiles. Do the internal functions split the image into tiles of the same size as the network’s training patches?

Great to hear!

I think you’re fine as long as your images fit into RAM.

It is often not that important for image restoration to have whole objects within a training patch, but increasing the training patch size is generally not a bad idea. Using 32 x 128 x 128 pixels should be fine when you decrease the train_batch_size accordingly (e.g. between 1 and 4).
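The advice to shrink `train_batch_size` follows from simple arithmetic: activation memory grows roughly linearly with both the number of voxels per patch and the batch size. Below is a back-of-the-envelope sketch; the layer layout (2x2x2 pooling, channels doubling per level, `n_depth=2`, `n_first=32`, two convolutions per level) is an assumption loosely modelled on a small 3D U-Net, not CSBDeep's exact architecture, and `unet_activation_bytes` is a hypothetical helper. Only the ratios are meaningful; real usage is several times higher because of gradients, optimizer state and framework overhead.

```python
from math import prod


def unet_activation_bytes(patch_shape, batch_size, n_depth=2, n_first=32,
                          n_conv_per_depth=2, bytes_per_value=4):
    """Very rough activation memory of a 3D U-Net forward pass (float32).

    Counts only convolution outputs, assuming 2x2x2 pooling per level and
    channel doubling per level (assumptions, not CSBDeep's exact network).
    """
    values = 0
    for level in range(n_depth + 1):
        # Spatial size shrinks by 2 along every axis at each level.
        voxels = prod(s // 2**level for s in patch_shape)
        channels = n_first * 2**level
        # Encoder and decoder convs at this level; the bottom level has one set.
        n_convs = n_conv_per_depth if level == n_depth else 2 * n_conv_per_depth
        values += n_convs * voxels * channels
    return values * batch_size * bytes_per_value


small = unet_activation_bytes((16, 64, 64), batch_size=16)
large = unet_activation_bytes((32, 128, 128), batch_size=16)
large_small_batch = unet_activation_bytes((32, 128, 128), batch_size=2)
print(large / small)               # 8.0
print(large_small_batch == small)  # True
```

Doubling every axis of the patch multiplies the voxel count by 8, so dropping the batch size from 16 to 2 keeps the estimated activation memory unchanged, which is consistent with the suggested range of 1 to 4.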

There’s no theoretically optimal patch size for training, and it probably won’t matter all that much. Furthermore, the tile size during prediction after training does not have to be the same as for training.
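Prediction tiles need not match the training patch size because the CARE network is fully convolutional, so each tile is simply a smaller image. The sketch below shows, in simplified form, how one axis might be split into tiles with overlapping context (the `halo`) whose outer borders are discarded after prediction. `tile_slices` and the 32-pixel halo are hypothetical; in CSBDeep itself tiling is requested through the `n_tiles` argument of `CARE.predict`, which handles overlap and any padding internally.

```python
def tile_slices(length, n_tiles, halo):
    """Split one axis of size `length` into `n_tiles` contiguous core chunks.

    Each tile is (read_start, read_stop, core_start, core_stop): the network
    is run on the read range (core plus up to `halo` pixels of context on each
    side), and only the core part of its output is written back.
    """
    bounds = [(i * length) // n_tiles for i in range(n_tiles + 1)]
    tiles = []
    for core_start, core_stop in zip(bounds[:-1], bounds[1:]):
        read_start = max(0, core_start - halo)
        read_stop = min(length, core_stop + halo)
        tiles.append((read_start, read_stop, core_start, core_stop))
    return tiles


# A ~150x1000x1000 (ZYX) stack with n_tiles=(1, 4, 4):
for axis, (length, n) in zip("ZYX", [(150, 1), (1000, 4), (1000, 4)]):
    print(axis, tile_slices(length, n, halo=32))
```

Each tile here is about 150x314x314 voxels including context, unrelated to the 16x64x64 training patches; more tiles only lower peak memory at the cost of some extra computation in the overlaps.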

Thanks very much for your quick reply, Uwe.

Does decreasing train_batch_size affect the trained weights and the final predictions? Our data spans many SNR levels, and I am now trying to train the network sequentially on each SNR level, with the highest SNR as the target.

Aah, this is interesting. I would have thought otherwise, because the network at least needs to distinguish the edges between two cells at low SNR, so we should choose a patch size large enough to capture that (otherwise it would treat everything as background or everything as cell). Perhaps I am wrong (experimenting would tell), and the thresholding during pre-processing, which selects the patches used for training, gets rid of that problem.
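The thresholding mentioned above concerns how training patches are sampled. In CSBDeep's `create_patches`, this role is played by the `patch_filter` argument (by default `no_background_patches`), which, roughly speaking, discards patches whose target intensities stay well below a high percentile of the image. The sketch below is a simplified, pure-Python version of that idea on flattened patches; `keep_foreground_patches` is a hypothetical name, the threshold and percentile values are illustrative, and CSBDeep's actual rule (which uses a maximum filter over patch-sized windows) differs in detail.

```python
import math


def keep_foreground_patches(patches, image_values, threshold=0.4, percentile=99.9):
    """Return the patches whose brightest pixel exceeds threshold * percentile(image).

    `patches` is a list of flattened patches (lists of intensities) cut from
    the target image; `image_values` holds all intensities of that image.
    """
    ordered = sorted(image_values)
    # Nearest-rank percentile of the whole target image.
    rank = math.ceil(percentile / 100 * len(ordered)) - 1
    rank = min(len(ordered) - 1, max(0, rank))
    cutoff = threshold * ordered[rank]
    # Patches that never reach the cutoff are treated as empty background.
    return [patch for patch in patches if max(patch) > cutoff]
```

A filter of this kind only removes near-empty patches; it does not ensure that a patch contains the boundary between two neighbouring nuclei, which is what a larger patch size would help with.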

Thanks again. I will try that today.


All the hyper-parameters, such as the batch size, have an effect on the final weights and thus on all later predictions. However, there’s no theoretically optimal way to choose all those parameters. Unfortunately, you have to try and see what works best. But again, I wouldn’t worry so much about that. The default parameters typically give good results.

After training, you can validate your model’s predictions on low/high-SNR image pairs that you haven’t used for training.
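One simple way to quantify such a validation is to compare each restored held-out image with its high-SNR counterpart using a standard metric such as PSNR (peak signal-to-noise ratio). Below is a minimal stdlib sketch on flattened pixel lists; `psnr` is a hypothetical helper, both images should be normalised in the same way before comparing, and SSIM or NRMSE are common alternatives.

```python
import math


def psnr(prediction, target, data_range):
    """Peak signal-to-noise ratio in dB between two equally sized flat sequences."""
    if len(prediction) != len(target):
        raise ValueError("prediction and target must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(data_range ** 2 / mse)
```

Higher is better; comparing the PSNR of the raw low-SNR input and of the restored image against the same high-SNR reference shows how much the model actually helps.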

What do you mean by “sequentially”? We find that it typically works well to train a single model for denoising input images with various SNRs.


Thanks Uwe for your help.

We found that our structures span many more pixels than those in the Tribolium demo dataset, so the default parameters gave us ‘patchy’ and ‘clumpy’ output images. This was partly fixed by increasing the training patch size and partly by adding more training data. We therefore have a proof of concept that CSBDeep can be included in our acquisition pipeline, but we are now aiming for a more complete theoretical understanding and fine-tuning of the network.

Let’s say I have perfectly aligned data at SNR levels from 1 to 20 (SNR1, SNR5, SNR10, SNR15, SNR20). By “sequentially” I mean that the model is first trained to predict SNR5 from SNR1, then SNR10 from SNR5, and so on until it predicts SNR20 from SNR15. I was wondering whether this would make the model more robust to varying acquisition parameters (which are difficult to control in the real world).

We are also looking at Noise2Void to compare different approaches, and I find its use of CSBDeep quite confusing. However, that is a separate discussion.

Hard to comment on that without having seen an image.

I don’t think that this will lead to optimal results. What we found in the CARE paper is that training a single model on different SNRs at the same time works well, i.e. training on the input/output SNR image pairs 1/20, 5/20, 10/20 and 15/20 together. Such a model will be robust in practice later on.
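The difference between the two schemes is which input/target pairs end up in the training set. A small sketch of both pairings (the function names are hypothetical):

```python
def sequential_pairs(snr_levels):
    """Scheme from the question: each SNR level is trained to predict the next one."""
    levels = sorted(snr_levels)
    return list(zip(levels[:-1], levels[1:]))


def joint_pairs(snr_levels):
    """Suggested scheme: every lower SNR level is paired with the highest SNR."""
    levels = sorted(snr_levels)
    target = levels[-1]
    return [(snr, target) for snr in levels[:-1]]


levels = [1, 5, 10, 15, 20]
print(sequential_pairs(levels))  # [(1, 5), (5, 10), (10, 15), (15, 20)]
print(joint_pairs(levels))       # [(1, 20), (5, 20), (10, 20), (15, 20)]
```

With the joint scheme, all pairs go into one training set for a single model. In CSBDeep this roughly corresponds to passing several source folders and one target folder to `RawData.from_folder`, e.g. `source_dirs=['SNR1', 'SNR5', 'SNR10', 'SNR15']` and `target_dir='SNR20'` (folder names hypothetical).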

Perfect! Thanks so much!

Will try this and keep you updated.