Noise2Void for Fiji

@David thanks for reminding me, I ran into this issue earlier and mentioned it here. It’s related to the latest Fiji update; something changed in the way the persistence of command parameters is handled. I hopefully found a workaround for N2V by marking the input image parameters as non-persistent, so please update Fiji and give it a try!
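For anyone writing their own scripts with image inputs and running into the same persisted-value problem, the same idea can be expressed as a script parameter attribute. This is only an illustrative sketch (it assumes the persist attribute is honored for your parameter type), not the actual N2V fix:

#@ Dataset (label = "Input image", persist = false) input
#@output Dataset output

# Trivial pass-through; the only point here is 'persist = false',
# which tells SciJava not to store/restore this image value between runs.
output = input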

@imagejan @NicoKiaru any idea why RAI input parameters create this exception now? SCIFIO is trying to load an image based on the persisted value of these parameters. I don’t have the time right now to open an issue, so I’m just tagging you so it doesn’t get forgotten.


Hi @frauzufall,

I think it might be related to the fact that the RAI (or Dataset) input now tries to resolve a String as a file path that could be opened using SCIFIO. See https://github.com/imagej/imagej-legacy/issues/246 for a long discussion about this.

Best,

Nico


@frauzufall This solved the Java errors, and it’s now running. Thank you!

Unfortunately, it’s only using the CPU despite stating

[INFO] Using native TensorFlow version: TF 1.15.0 GPU (CUDA 10.0, CuDNN >= 7.4.1)

when initiating the training.

I have added the location of my tensorflow_jni.dll to “Path” under both the system and user environment variables in Windows.

I also downloaded a clean copy of Fiji and installed only CSBDeep, in case there was something weird going on with the Fiji installation I usually use, but it still only uses the CPU.

Is there some log or similar I could post for you to look at in order to figure out the culprit? Is there any way of knowing whether it’s the CUDA or the cuDNN installation that isn’t working?

EDIT: I’m using cuDNN 7.6.5 for CUDA 10.0. I figured that was fine since the TensorFlow settings say CuDNN >= 7.4.1.

I got it to work by reinstalling everything!

GPU (Titan RTX) utilization is only at 25-40%, though; is that normal?

I have been experimenting with the parameters without really knowing exactly what they do. Pretty impressive results sometimes!

One weird thing is that the validation loss is always lower than the training loss, regardless of the dataset or how many steps/epochs I use.

Hi,
I have the same problem as @esgomezm: the training worked well, and I saved the model, renamed it (keeping the right extension, .bioimage.io.zip), and moved it to a folder other than the default one (the default folder is “tmp” for me). When I tried to run the denoising prediction on an image, I got the error messages

[ERROR] Cannot create plugin: class='net.imagej.modelzoo.consumer.model.tensorflow.TensorFlowModel', name='tensorflow', priority=0.0, enabled=true, pluginType=ModelZooModel
[ERROR] Could not find a plugin matching the model framework tensorflow
[ERROR] Model does not exist or cannot be loaded. Exiting.

Any hints to fix this error?
Thank you in advance,
Yousr

Hi @David, sorry for the lack of response so far!

Happy that you figured out how to run it on the GPU. 25-40% GPU usage sounds low, but I don’t have enough experience across systems and graphics cards to judge that… Does it increase if you increase the batch size per step or the patch shape? Though this can lead to hitting the memory limit of the graphics card.

All N2V command parameters are explained here: https://imagej.net/N2V. Please let me know if something is not well explained or missing!

Regarding the question of why the validation loss is always lower than the training loss: I think I discussed this once with @tibuch, but I don’t remember the outcome; maybe it’s a bug. I opened an issue for now.

Hi @yousr_rekik,

This looks like you trained with an older version of N2V, but maybe I’m wrong. Can you help me figure out what’s happening? Please…

  • post your N2V version, by going to Fiji.app/jars and looking for n2v-0.6.1.jar (that’s the most recent one; your version number may differ)
  • upload the model somewhere so I can test it; feel free to send it to me via PM if you don’t want to post it publicly

In case your version number differs, or you trained before updating Fiji, make sure everything is up to date first. You don’t need the N2V update site any more; the CSBDeep update site is sufficient. After updating, you might have to convert your model into the most recent format, which is simple and explained here (last paragraph).

Hope to hear back from you,
Deborah


Thank you for your reply! I updated Fiji and now have the right version of the N2V jar (n2v-0.6.1.jar). Fortunately, the prediction is working!
Thanks a lot!
Yousr


Hi, no worries!

When I change those parameters around, the usage changes, but always stays in the range of 20-40%. I don’t really see a pattern, except that small patch shapes appear to give higher GPU usage than large ones. The card has 24 GB of memory, which appears to be about 98% full as soon as training starts with N2V, regardless of dataset size. I just asked in case it was possible to make it utilize the GPU at 100%, which would halve my training time :smiley:

It’s not that it doesn’t work; I get pretty nice results for some types of noise. I haven’t really figured out why some channels in a multi-channel image work very well while others seem very difficult to denoise even with a lot of training.

I have read the info page you linked, but it doesn’t really explain exactly how these settings affect training/prediction. I’m not very knowledgeable about machine learning or programming; I guess most of these settings are self-explanatory if you’re even just a beginner-level ML programmer/researcher.

Also, about the training/validation loss: I have now had instances where the training loss is actually lower than the validation loss, so it doesn’t happen 100% of the time.

I really appreciate this plugin; it’s very nice to be able to try this out without knowing how to program. Thanks!


Point taken; I should explain the ML terms better.

The N2V Python code can do multichannel denoising; in Java this is not yet implemented. You can also post examples of images where it does not work, and the N2V experts @tibuch and @akrull can probably give you more insight!

And yes, TensorFlow allocates all the GPU memory by default, but I recently saw that one can specify how much of it it should use. I should add a parameter for that.
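On the Python/TensorFlow 1.x side this is a session config option; here is a minimal sketch of the two usual variants (illustrative only, not something the Fiji plugin exposes yet):

import tensorflow as tf  # TensorFlow 1.x API

# Variant 1: cap the fraction of GPU memory TensorFlow may allocate.
config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5))

# Variant 2: grow the allocation on demand instead of grabbing everything up front.
# config = tf.ConfigProto()
# config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # training/prediction would run against this session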

Thanks for the feedback, this is really productive and helpful!

I got some nice results on a stack (30 images, 3D checkbox unchecked), saved the model, and then tried to predict with this model using the template script that you provided (pointing it to a directory with a lot of stacks of the same dimensions).

When doing this, I get the following error:

[ERROR] Input "input" dimension 2 should have size 1 but is 30
[ERROR] Model and input data do not match. Exiting.

I first thought this was because the model was trained on a stack that had all 30 frames (accidentally) designated as “t” instead of “z”, so I made a macro that changed the properties of all the stacks to have z=1 and t=30 instead of the opposite (which was the case initially).
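(For reference, that kind of property swap can be scripted per image; an illustrative Jython sketch, not the exact macro used here:)

from ij import IJ

imp = IJ.getImage()          # the currently open 30-slice stack
imp.setDimensions(1, 1, 30)  # channels=1, slices (z)=1, frames (t)=30
IJ.log("c/z/t = %d/%d/%d" % (imp.getNChannels(), imp.getNSlices(), imp.getNFrames()))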

This does, however, throw the exact same error. If I check the properties of each stack after running my macro, they do indeed have t=30 and z=1, so that worked.

Any ideas?

EDIT: If I open one of the stacks in the directory and run the predict command with this model manually using the GUI, it works fine.

I was trying to train Fiji N2V on a stack subset when I got the following message:

java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: bound must be positive
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at de.csbdresden.n2v.train.N2VTraining.train(N2VTraining.java:187)
	at de.csbdresden.n2v.command.N2VTrainCommand.mainThread(N2VTrainCommand.java:153)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: bound must be positive
	at java.util.Random.nextInt(Random.java:388)
	at de.csbdresden.n2v.train.N2VDataWrapper.subpatch_sampling(N2VDataWrapper.java:290)
	at de.csbdresden.n2v.train.N2VDataWrapper.getItem(N2VDataWrapper.java:132)
	at de.csbdresden.n2v.train.N2VTraining.makeValidationData(N2VTraining.java:385)
	at de.csbdresden.n2v.train.N2VTraining.mainThread(N2VTraining.java:258)
	... 5 more

It’s on a computer that has run N2V with good results… The file is a substack (10 slices at 2048×2048) of a much bigger stack. Any idea what could be wrong? The error message appears whether I select 3D or not.

I get exactly that error if I select 3D. If I don’t select it, it runs fine.

Not sure if running it in 3D actually improves anything, denoising-wise…

EDIT: I found what caused it for me in 3D: you need at least 16 slices, since the minimum patch shape is 16x16x16.


Hi @David and @frauzufall,

The validation loss can be lower than, about the same as, or higher than the training loss. All of these cases are valid. If the validation loss is lower, it could indicate that the validation set is not as complex as the training set.

Eventually it is important that the validation loss converges, i.e. reaches a plateau.


The validation loss seems to plateau pretty fast, after a few epochs with over 100 steps/epoch. I feel like the results get better if I continue to train after this, though. Am I just imagining that?

So… has anyone been able to do batch prediction with this plugin? I just can’t get it to work.

It could be that the loss only looks like it has reached a plateau because of the zoom level. But you are 100% correct to train for longer. We usually recommend training overnight (around 8-10 h) to obtain a nicely trained model.

But if the actual number is also stable, there can’t be any improvement, right? Often when I train overnight, it shows the same number for the last 8-10 hours.

How does this relate to the fact that the “lowest loss” model is not recommended for N2V? If the lowest loss doesn’t matter (always use the latest checkpoint), why care about the loss at all?

Another question: Some models produce this artifact when predicting:

If you look at the black holes, there are lines that shouldn’t be there. Is this because of training too much? When training this model, I actually used two different stacks (with the same type of noise; they were stacks from timepoints 1 and 2 in a timelapse) for training and validation. This was to see if it would make the validation loss higher than the training loss, as per what @tibuch said above:

Hi @David,

If the losses don’t change at all (i.e. the numbers stay the same), then the network is not improving. That is correct.

The loss still helps us identify whether the model is converging. If the loss went up and down like crazy, that would be a good indicator that something is going wrong.

Interesting images! Does the black correspond to no photons/electrons detected? So truly 0 measurements?

I think the lines are part of the input data as well. I can see short horizontal and vertical lines of the same pixel intensity. This would indicate that the noise is not truly pixel-wise independent, and that is what N2V is picking up, leading to these stripes in the prediction.
A solution would be to use StructN2V with a kernel covering these crosses, or you could average-downsample your data until the noise becomes pixel-wise independent.
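(If you want to try the downsampling route, here is a minimal NumPy sketch of 2x2 average-binning of a single 2D slice; just an illustration, repeated until the correlated crosses fall within one binned pixel:)

import numpy as np

def bin2x2_mean(img):
    # Average-downsample a 2D array by a factor of 2 in x and y.
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]  # crop to even dimensions
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))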

In the screenshot of the training/validation loss I can see that the losses are still fluctuating a little bit, which indicates that the network is still updating.


Yes, the training loss fluctuated, but the validation loss was stable for the last 70 epochs (at least). Is it the validation loss I should be looking at?

This is from a Leica HyD detector, which supposedly is very sensitive. Whether black is truly 0 photons, I don’t know for sure. Semi-technical promotional material here: https://www.leica-microsystems.com/science-lab/what-is-a-hybrid-detector-hyd/

N2V seems to be extremely good at denoising the very “digital-looking” HyD data into clean images with smooth histograms. I have only tried N2V on one dataset from a PMT detector so far, but interestingly it did almost nothing despite training for the same amount of time as for the HyD data.

This could potentially be very useful for us, since the data here is from in vivo 3D timelapse datasets where the experimental conditions are such that we would greatly benefit from being able to use very low laser intensity and fast scanning speeds (producing noisy images that can be denoised by N2V and then further processed with registration, etc.). The image in my last post is just a z-slice from one of these datasets.

My main problem now is applying the trained models to the hyperstacks, either through the Fiji GUI or with the macro/Python script that’s included in the N2V plugin (predict on folder). Unfortunately, I can’t get either of these to work; I either get a TensorFlow error:

[INFO] Processing tile 1..
org.tensorflow.TensorFlowException: 2 root error(s) found.
  (0) Not found: No algorithm worked!
	 [[{{node down_level_0_no_0/convolution}}]]
	 [[activation_11/Identity/_471]]
  (1) Not found: No algorithm worked!
	 [[{{node down_level_0_no_0/convolution}}]]

or a Java error:

[INFO] Processing tile 1..
java.lang.IllegalArgumentException: LoopBuilder, image dimensions do not match: [-63, 352, 416, 8], [352, 352, 8, 8].
	at net.imglib2.loops.LoopBuilder.checkDimensions(LoopBuilder.java:347)
	at net.imglib2.loops.LoopBuilder.<init>(LoopBuilder.java:335)
	at net.imglib2.loops.LoopBuilder.setImages(LoopBuilder.java:117)

EDIT: With the update released yesterday I can now batch-predict on a folder using the GUI! :tada: It requires separating the hyperstack into individual stacks, though.
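(In case it helps anyone, the hyperstack splitting can be scripted before running the folder prediction; a Jython sketch with a made-up output path, producing one z-stack per channel and timepoint:)

import os
from ij import IJ
from ij.plugin import Duplicator

imp = IJ.getImage()                      # the open hyperstack
out_dir = "C:/temp/n2v_input"            # hypothetical output folder
nC, nZ, nT = imp.getNChannels(), imp.getNSlices(), imp.getNFrames()

for t in range(1, nT + 1):
    for c in range(1, nC + 1):
        # duplicate one channel/timepoint as a plain z-stack
        sub = Duplicator().run(imp, c, c, 1, nZ, t, t)
        IJ.saveAsTiff(sub, os.path.join(out_dir, "c%02d_t%03d.tif" % (c, t)))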

I agree with you that there appear to be some lines in the raw data as well. I don’t know why, and I find it a bit weird. So far the lines have appeared when training on two datasets; the rest have been fine.