I’ve been working with CARE 3D denoising to accelerate the acquisition of a large tissue sample. First tests with ~20 z-stacks (~10 GB total) went very well, but some of the (rather rare) structures were not restored in the predicted images, so I decided to continue training for longer on augmented data and see if that improves the results.
I augmented the training data by rotating it (90°, 180°, and 270°) and ended up with a training patches file of about 40 GB. Since the machine has 120 GB of RAM, I wasn’t too concerned and started training (800 steps/epoch, 200 epochs).
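For reference, here is roughly what my augmentation and training setup looks like (a simplified sketch following the CSBDeep example notebooks; the patch file names and model name are placeholders, the rotation step is done offline before training, and I find the Y/X axes from the axes string returned by `load_training_data`):

```python
import numpy as np
from csbdeep.io import load_training_data, save_training_data
from csbdeep.models import Config, CARE

# 1) Offline augmentation: add 90°/180°/270° rotations in the YX plane
#    (patches are square in YX, so the array shape is preserved).
(X, Y), _, axes = load_training_data('patches.npz', verbose=True)
ax_y, ax_x = axes.index('Y'), axes.index('X')

def rotations(a):
    # original patches plus the three rotated copies, stacked along the sample axis
    return np.concatenate([np.rot90(a, k, axes=(ax_y, ax_x)) for k in range(4)], axis=0)

save_training_data('patches_augmented.npz', rotations(X), rotations(Y), axes)

# 2) Training on the augmented patches (same settings as in my run).
(X, Y), (X_val, Y_val), axes = load_training_data(
    'patches_augmented.npz', validation_split=0.1, verbose=True)
config = Config(axes, n_channel_in=1, n_channel_out=1,
                train_steps_per_epoch=800, train_epochs=200)
model = CARE(config, 'my_model', basedir='models')
history = model.train(X, Y, validation_data=(X_val, Y_val))
```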
The attached image shows the monitored memory usage of the machine.
@uschmidt83 @mweigert Is this increase in memory usage expected, or could there be a memory leak somewhere? Am I doing something wrong, or is there a way to avoid this issue? I guess I could double the RAM, but there may be a cleverer solution.
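In case it helps with diagnosing, I was thinking of logging the process memory once per epoch with a small Keras callback like the sketch below (psutil-based; how to attach it to the CARE model is my assumption from reading the CSBDeep source and may need adapting to the version in use):

```python
import psutil
from tensorflow.keras.callbacks import Callback  # or `from keras.callbacks import Callback`, depending on the install

class MemoryLogger(Callback):
    """Print the resident memory (RSS) of the training process after every epoch."""
    def on_epoch_end(self, epoch, logs=None):
        rss_gb = psutil.Process().memory_info().rss / 1024**3
        print('epoch {}: process RSS = {:.1f} GB'.format(epoch + 1, rss_gb))

# Attaching it (assumption, based on CARE exposing its callback list):
# model.prepare_for_training()
# model.callbacks.append(MemoryLogger())
# history = model.train(X, Y, validation_data=(X_val, Y_val))
```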
I checked the TensorBoard display to make sure that running this many epochs was worthwhile, and it looks like the model was still improving when the training crashed (see the attached TensorBoard screenshot: there is still stuff to learn).
The code I’m using is derived from the example Jupyter notebooks shared on the CSBDeep GitHub; I can attach it if that would be useful. I’m running on a CentOS 7 machine with a GTX 1080 GPU in a conda environment (I’m attaching the conda environment file: environment.yml (5.6 KB)).
Let me know if I can add anything!