In this thread we are adding all the Questions & Answers we collected during our NEUBIAS Academy webinar “Introduction to Machine Learning and DeepImageJ” (available on YouTube). More than a hundred questions answered by me (Ignacio Arganda-Carreras) and the three moderators of the webinar: Estibaliz Gomez-de-Mariscal (@esgomezm), Daniel Sage (@daniel.sage) and Carlos Garcia-Lopez-de-Haro. Enjoy and post any missing question here!
Q&As Table of contents:
- General questions about the presentation and the materials
- Artificial Neural Networks
- Deep Learning
- Colab Super-resolution in EM notebook
Q1: Hi, could you post a link for the workbook etc. please? I didn’t receive an email link to these resources. Many thanks
Sure, here is the link to the slides: https://docs.google.com/presentation/d/1tmeXE2a8-1yau6DdYEl9hFuogBmajiss6py39MZ74wE/edit?usp=sharing
Q2: Can there be classifier evaluation in an unsupervised ML?
Yes. As explained in detail here, you can evaluate your clustering algorithm based on ground-truth labels if you have them (external evaluation) or, if you don’t have labels, based on the compactness and similarity of the samples clustered together by your algorithm (internal evaluation). Scikit-learn has a very extensive page dedicated to the clustering evaluation methods implemented in the library, with references to the methods.
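As a sketch of both evaluation modes, assuming scikit-learn is available (the toy blob data is made up for illustration; `adjusted_rand_score` and `silhouette_score` are scikit-learn’s metric names):

```python
# Sketch: external vs. internal clustering evaluation with scikit-learn.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Toy data with known ground-truth labels.
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
y_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# External evaluation: compare predicted clusters against ground truth.
ari = adjusted_rand_score(y_true, y_pred)

# Internal evaluation: no labels needed, only compactness/separation.
sil = silhouette_score(X, y_pred)

print(f"Adjusted Rand index: {ari:.2f}, silhouette: {sil:.2f}")
```

Both scores live in [-1, 1]; on well-separated toy blobs they should come out high.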
Q3: Precision is dependent on prevalence, how to generalize?
That is a classic problem in machine learning. When working with highly imbalanced classes, you might need to resample based on the proportion of samples of each class, or apply weights so the model does not favor the prevalent class. Here are a couple of links about this topic on Wikipedia and machinelearningmastery.com with possible solutions.
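A minimal sketch of the class-weighting option with scikit-learn (the toy data and its roughly 95/5 imbalance are made up for illustration):

```python
# Sketch: counteracting class imbalance with class weights in scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)
# Imbalanced toy problem: ~95% class 0, ~5% class 1.
y = (rng.random(1000) < 0.05).astype(int)
X = y[:, None] + rng.normal(scale=0.5, size=(1000, 1))

# Weight each class inversely to its frequency...
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))  # the rare class gets the larger weight

# ...or let the classifier apply the same weighting internally.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
```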
Q4: What are the heuristics to decide which proportion of the dataset to allocate to train and test set ? If we let too few data in the training set we will never be able to learn a very complicated model right?
The proportion of samples to be left for training and test depends on the number of samples we have. In general, we want as many different samples as possible in our training set so our model is exposed to all kinds of samples. At the same time, we want a test set that is independent from the training set but has a similar probability distribution. That way, if our model does well on both, we can be confident it is not overfitting.
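A minimal sketch with scikit-learn’s `train_test_split` (the 80/20 split and the toy labels are illustration values); stratification keeps the class proportions similar in both partitions:

```python
# Sketch: a typical 80/20 train/test split, stratified so both partitions
# keep the same class proportions (similar probability distributions).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)
y = np.array([0] * 80 + [1] * 20)  # imbalanced labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(len(X_tr), len(X_te))      # 80 20
print(y_tr.mean(), y_te.mean())  # both close to 0.2
```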
Q5: How could I relate images and labels in a way understandable for the algorithm?
It depends on the problem you are dealing with. For example, in classification each image would correspond to a label. In semantic segmentation, each pixel is classified as part of the segmented object or not. Something like a mask if the segmentation is binary.
Q6: Are machines 100% coachable? Is there ever variability like with humans or is it simply a matter of how the models are created/executed?
In learning by example, which is what we introduced in this course, the models try to replicate the best labels produced by human experts. So your results will be as good as your training labels.
Q7: Can a simple classification of data into two types A and B (B being very rare, ~5%) be used to identify B objects in a larger test set (in order to enrich them) before manually annotating subclasses B1, B2, B3 and re-train the network? So essentially, is re-training possible when removing a class, or replacing one class by several classes?
There are different options for the case you are contemplating. You can do that for instance in a hierarchical way with a first model to differentiate between A and B, and a second one to differentiate the B’s between B1, B2 and B3. If your model is a neural network, you can first train it for A-B classification, and then, once you are satisfied with the results, fine tune it for A-B1-B2-B3 classification.
Q8: Does a 3D image tile count as 1 image or as all the images it is constituted of? (i.e. a 2X2 tile with 100 images in Z counts as 1 image or as 400 when we decide how many images we need to train our classifier?
It depends on the input your model expects, 2D or 3D. You can use 3D images as input, although in the case of deep learning classification that will require a powerful computer for training. As for how many images are needed to train a classifier: this is a fundamental question. The trick is to have enough images to represent all the variability of your problem.
Q9-1: Can I ask what is the minimum in terms of numbers for training and testing, for example I am using 10 images for training and 5 for testing?
Q9-2: For selecting sample # to build a classifier? Is it based on # of images? Or # of pixels being selected/trained?
It really depends on the problem you are trying to solve. You would normally need more than 10 images if you are trying to classify whole images. But maybe 10 are enough if you are segmenting cells on those images and you have some hundreds of them on each image, so you can create patches to pass to the model as different inputs (remember the data augmentation tricks explained in the presentation).
You can also choose the number so that the classes to classify are balanced. Hence, if you want to classify pixels (in segmentation, for example) you should ensure that your training data contain enough positive and negative pixels.
Q10: What is this we getting after pre-processing?
I assume you are referring to the “classic classifier cycle design” slide. After pre-processing, your samples (whole image or pixels) get represented by feature vectors.
Q11: Does a human select which features are relevant, or does the machine learning algorithm/classifier select these?
An algorithm. Feature selection is a whole field in itself.
Q12: How does one select a specific classifier from the list given? Is it trial and error?
You usually select the ones that in the literature have been successful for your type of problem, and then you evaluate them and choose the one with the highest accuracy.
Q13: In train and test sets, how could I connect images and labels in a way understandable to classification algorithms?
Typically, as we saw later on in the notebook, we put the N images in one folder and the N labels in another folder. Each label file could have the same name as its corresponding image file.
Q14: Ethical question: in possible applications to decision making, for example operate surgery on oncological patients or prosecute a person identified with face recognition, how can we be sure training dataset included enough and diverse data?
If our data are biased, the results will be biased. Therefore the need for large unbiased datasets.
Q15: In K-fold cross-validation, is the model trained separately each time?
Yes, on every fold of the cross-validation you need to train your model again with the new data partitioning.
Q16: Why k: 5 or 10 any particular reason? What happens if we use other than 5 or 10?
Nothing, 5 and 10 are simply typical values for k in k-fold cross-validation, but you can use others.
Q17: If you use k-Fold cross-validation, do you get multiple models out of it and have to choose one or do you get only one, which is the combination of all?
In the latter case, the model already saw all images at least once. Can you still talk about generalization then?
You get k different versions of your model, but you have to keep in mind that we use k-fold cross-validation to derive a more accurate estimate of model prediction performance, so you can compare different types of models against each other. Once you have the best configuration/architecture, you can forget about cross-validation and use that architecture on your own train/test partition.
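A sketch of that workflow with scikit-learn (the two model types and the iris dataset are stand-ins for illustration):

```python
# Sketch: 5-fold cross-validation to compare two model types, then a
# single retraining of the chosen one on a plain train/test split.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("forest", RandomForestClassifier(random_state=0))]:
    scores = cross_val_score(model, X, y, cv=5)  # trains 5 separate models
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

# Once the best architecture is chosen, forget cross-validation:
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
final = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", final.score(X_te, y_te))
```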
Q18: What are hidden layers?
Hidden layers are those between the input and the output layers.
Q19-1: How do we define the weights?
Q19-2: How does the network set the first weights before iterating through the training set?
Q19-3: How are first weights selected?
They are normally initialized at random before training. There are several standard schemes for doing so.
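As an illustration, one common scheme, He initialization, can be sketched in plain NumPy; deep learning frameworks expose the same idea directly (e.g. `kernel_initializer="he_normal"` or `"glorot_uniform"` in Keras):

```python
# Sketch: He (Kaiming) initialization for a dense layer, in plain NumPy.
# Weights are drawn from N(0, sqrt(2 / fan_in)) so activations keep a
# stable variance through ReLU layers; biases usually start at zero.
import numpy as np

def he_init(fan_in: int, fan_out: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = he_init(256, 128)
print(W.std())  # close to sqrt(2/256) ~= 0.088
```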
Q20: What is meant by vanishing gradients?
It’s a historical problem in the training of deep artificial neural networks with gradient-based learning methods and backpropagation. In such methods, each of the neural network’s weights receives an update proportional to the partial derivative of the error function with respect to the current weight in each iteration of training. If the gradient is very small (vanishingly small), the update is minimal and it prevents learning. This typically happened when stacking three or more hidden layers, which is what is considered a deep network.
Q21: What is the difference between object detection and instance segmentation?
Detection here can be thought of as, for instance, a binary segmentation: telling which pixels belong to cell nuclei and which do not. Instance segmentation not only distinguishes nucleus pixels from background but also separates each individual nucleus. Hence, segmentation + detection: you detect each instance of the kind of object you are looking for.
Q22: What data augmentations are good to use for biomedical images?
It really depends on the features of your data. For instance, if you segment very circular nuclei, then good augmentations could be adding noise, rotations and translations, but not shearing, as that would change the shape of the object into something unrealistic.
Q23: Is there any standard Python library for data augmentation of tiff files?
All the deep learning frameworks (TensorFlow, PyTorch, …) include data augmentation utilities.
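A few basic augmentations can even be done library-free on the NumPy array a TIFF loads into (e.g. with `tifffile.imread`); this is a minimal sketch, not a replacement for the frameworks’ own pipelines:

```python
# Sketch: minimal, library-free augmentations on an image array.
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random flips, 90-degree rotations and additive Gaussian noise."""
    if rng.random() < 0.5:
        img = np.flipud(img)
    if rng.random() < 0.5:
        img = np.fliplr(img)
    img = np.rot90(img, k=rng.integers(0, 4))
    noise = rng.normal(0.0, 0.01, size=img.shape)
    return img + noise

rng = np.random.default_rng(0)
image = np.zeros((64, 64))
batch = [augment(image, rng) for _ in range(8)]  # 8 augmented copies
```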
Q24: Is there a minimum number of images per group to run a supervised deep learning?
It is an interesting question. The number of images should be large enough to represent all the variability of your problem.
Q25-1: Can the annotation be done automatically and then validated or changed?
Q25-2: If the manual annotation of data is too complicated work is it possible to sequentially train networks on partial annotation to eventually get good annotation?
As we said before, your results will be as good as your labels. If you train your algorithm on rather imperfect labels, you will get (at best) rather imperfect results. That being said, it might help to use those as starting labels, do some proofreading / correction and then retrain. If that is faster than fully manual annotation, way to go!
Q26-1: Suggestions for Manual Annotation?
Q26-2: What tool do you use to correct the segmentations to generate the ground truth for training?
In Fiji: https://imagej.net/Labkit
Q27: How can we assess whether there is overfitting or not? Thanks.
We take part of the training set as a validation set, which is not used for training, and evaluate the network on it at each epoch. If the error goes down on the training set but up on the validation set, the model is overfitting.
Q28-1: For any given biological problem, could your trained classifier contain biological information? Ex: the function that explains your problem? If yes, do people recover this information/function?
Q28-2: Do people recover, for instance the loss function, and use this function to explain the difference in the specific biological phenomena that you are studying.
To the first question, yes. For instance, you could define your loss function so it contains this info. Depending on your task, you could also move into Bayesian models so you incorporate a prior in your weights.
For the second one, it has to do with how you formulate your problem so that the result of the training can be interpreted. There is a whole field in artificial intelligence about interpretability.
Q29: Say a model has been trained on a certain selected set of features. During testing, is it possible to find out which features were given how much weight to arrive at a decision for classifying?
For other machine learning models yes, for neural networks no: networks learn their features automatically during training.
Q29-2: In the context of an ML model only, how would one go about doing that? (Re: question about finding weights of features)
It depends again on the type of model. Not all algorithms offer this possibility. For example random forests in sklearn have the option to show which of the features has a bigger influence in the final result.
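A minimal sketch of that scikit-learn option, using the iris dataset as a stand-in:

```python
# Sketch: inspecting feature influence with a random forest in scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# feature_importances_ sums to 1; a larger value means more influence.
for name, imp in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {imp:.3f}")
```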
Q30: I am wondering about the best training strategy for segmenting & classifying cells using U-NET, where one cell type is rare, and this rare cell type actually can be further sub-divided into different sub-types. It is relatively straightforward to train a network to distinguish the “main” cell types. So my question is: if I use the trained network to predict the rare cell type (let’s call it C) in images, then manually annotate the sub-types among these (say C1, C2, …, C5), can I somehow “remove” class C from the originally trained network and replace it by the more granular classification C1-C5?
Literally, no: a network trained only to detect class C won’t be able to distinguish between its subclasses. But what you can do is use those weights as the starting point for transfer learning (fine-tuning). In this way your model, which already knows what class C is, would in principle learn to distinguish the subclasses C1, C2, …
Q31: In the U-net code I see a concatenate operation. What does it actually represent?
If you remember the shape of the U-Net, you will see that there is a flow between the encoding and the decoding branches. The concatenate operation stacks the feature maps of both branches along the channel axis (the so-called skip connections).
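The shape logic can be sketched with NumPy arrays standing in for the tensors (in Keras the actual operation would be a `Concatenate` layer):

```python
# Sketch: what the U-Net concatenate does to tensor shapes. The encoder
# feature map is stacked with the upsampled decoder map along the
# channel axis (the "skip connection").
import numpy as np

encoder_features = np.zeros((1, 64, 64, 32))   # (batch, h, w, channels)
decoder_features = np.zeros((1, 64, 64, 32))   # after upsampling

merged = np.concatenate([encoder_features, decoder_features], axis=-1)
print(merged.shape)  # (1, 64, 64, 64): channels add up, h and w must match
```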
Q32: Are there any good alternatives out there to the U-Net structure that are worth exploring for bioimage analysis?
For biomedical applications, U-Net and its variants (residual connections, attention blocks, etc.) are now the most frequently used.
Q33: So, we’re using a U-Net for this application, which is transforming images. Would we use the same type of model to perform pixel classification?
Most probably. The U-Net is the most popular network nowadays for image processing applications, including for pixel classification.
Q34: Is U-Net compatible with the TPUs rather than GPUs?
Yes, it is. It actually depends on your layers: if you run TensorBoard, you can check which layers are compatible and which are not.
Q35: Do you know about a publication in the field of biological/medical image segmentation using 3D U-Net?
Sure, check Google Scholar: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=3d+%22U-net%22&btnG=
Or maybe check the NiftyNet platform: https://www.sciencedirect.com/science/article/pii/S0169260717311823
Q36-1: Is deep ImageJ available through update sites?
Q36-2: To install deepImageJ in Fiji… is there an update site to activate?
Yes, but you have to put the URL explicitly:
Q37: Do I need to download Tensorflow if I am using an already made bundle as a starting point for now?
No. You should install the dependencies that come with the deepImageJ models and that’s it.
Q38: Would deepImageJ analysis work with big data (stack with 20 GB and more) on a normal desktop computer? Like cell segmentation
It will be slow, but it will be possible.
Q39: How (well) does deepImageJ handle multiple model outputs, e.g. different image channels/ images in different resolution, mb roi data?
For the moment deepImageJ only works for 2D image to 2D image tasks (independent of the number of channels). However, the developer team is working to provide more functionality.
Q40: Can the plug-in read models saved in h5 files?
No, you would need to transform them first to Tensorflow format. We provide a notebook to do so: https://github.com/deepimagej/python4deepimagej
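A generic sketch of such a conversion, assuming TensorFlow 2 with Keras is installed (this is not the deepImageJ notebook itself; the tiny Conv2D model is a stand-in for your trained network):

```python
# Sketch: convert a Keras .h5 file into TensorFlow's SavedModel format.
import tensorflow as tf

# Tiny stand-in for your trained network, saved in HDF5 (.h5) format.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),
    tf.keras.layers.Conv2D(1, 3, padding="same"),
])
model.save("model.h5")

# Reload the .h5 file and re-export it as a SavedModel directory,
# the format deepImageJ can import.
reloaded = tf.keras.models.load_model("model.h5")
tf.saved_model.save(reloaded, "exported_model")
```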
Q41: Does it support multidimensional input?
Yes, but it supports only one input. It has to be 2d (with any number of channels).
Q42: It is possible to use DeepImageJ with confocal images?
Yes! But you should train a model for that.
Q43: Is deepImageJ compatible with digital scanned images (having in mind their format (not tiff/jpeg) and also their big size (eg. 1GB))? Thanks!
Yes! DeepImageJ processes large images by dividing them into more affordable tiles. However, you could also run deepImageJ on the GPU.
Q44-1: Is the tool ImageJ2 compatible? Or is it an ImageJ1.x tool?
Q44-2: Is it written as an ImageJ2 plugin? Or is the source code based on IJ1.x?
It is compatible but you also have the source code written for IJ2: https://github.com/deepimagej
Q45: Can I use a network trained in CARE?
Yes, but it is advisable to retrain it yourself.
Q46: What are the differences between DeepImageJ and CSBDeep in Fiji?
Both plugins allow using TensorFlow image processing models in ImageJ/Fiji. CSBDeep was used at the beginning for the CARE models and is now open to other TensorFlow models. It is written in IJ2. CSBDeep integrates training tools in Python and prediction tools in Python or in Java. DeepImageJ is written for IJ1 and is compatible with IJ2. It doesn’t have any training tool; instead, it provides a tool to import a limited set of pre-trained TensorFlow models: 2D image to image processing. Regarding pre- and post-processing routines, DeepImageJ integrates them as simple ImageJ macros, while CSBDeep has a fully specified mechanism to access exposed routines of any library.
Q47: Am I correct to understand that the developer has trained the model using a training data set? If the user (test data) does not have images similar to training data, what will be the result of the output?
You can try it and see that it is actually not what you would expect. A model is trained for a particular kind of data; if your data is different, the model does not know how to analyse it.
Q48: What’s the difference between using a model in DeepImageJ and using the plugin developed by the dev? I’m thinking about StarDist for example which is available as a Fiji plugin.
In the case of StarDist, it is much better to use their plugin due to the very specific post processing they have.
Q49: Can we see the training algorithms or modify them?
Yes, you will see in the Python code how to train a model using Google Colab.
Q50-1: Can I run any model generated in ZeroCostDL4Mic colab notebooks in DeepImageJ ?
DeepImageJ can run any TensorFlow model, for now only for image-to-image applications. So if you can save the ZeroCostDL4Mic model as TensorFlow, you will be able to use it in DeepImageJ.
Q50-2: Same for N2V then ?
Q51-1: Is it compatible with tensorflow 2.0 (TF2)?
Q51-2: I noticed that DeepImageJ is compatible with tensorflow1, what about tensorflow2?
Not yet. We are waiting for the tf2.0 java api. DeepImageJ uses tf1.15 which should be able to run early tf2 models.
Q52-1: Can you create your model with PyTorch in order to build a model in deepImageJ?
Q52-2: Will models that I created with PyTorch become applicable too?
Q52-3: So deepImageJ is working only with tensorflow? If I coded a model in PyTorch without Colab I would need to redo it in colab in tensorflow?
You will have to convert your PyTorch model into Tensorflow in order to import it into DeepImageJ. There are some tools for that.
We are going to provide a notebook to transform PyTorch into TF soon. In the meantime you either could do it yourself or train it in TF as you suggested.
Q53: How can I select which models from the page match my real problem? What features should I be looking for?
In general, whenever you want to reuse a trained model, you really need to verify that it was trained on data similar to yours. You should also check by visual inspection that the results you get are correct. In the case of Noise2Void you really need to train on your own data again, and for content-aware restoration tasks you need to make sure that you have exactly the same kind of images. Other models, such as the sEVs segmentation one, can generalize, so you can reuse them on your EM data.
Q54: Regarding model reuse: How do you have to transform your images, so that the downloaded model works optimally with them? And: how can you quantify that you did well/optimal with your transformation?
You should try to make your images as close as possible to the training ones. Apart from the format and content, you can always apply histogram matching as well.
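Histogram matching can be sketched in a few lines of NumPy (scikit-image provides the same operation ready-made as `skimage.exposure.match_histograms`); the remapping below follows the standard quantile-mapping algorithm, and the toy images are made up for illustration:

```python
# Sketch: remap the pixel values of `source` so that its intensity
# distribution follows that of `reference` (histogram matching).
import numpy as np

def match_histograms(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    s_values, s_idx, s_counts = np.unique(source.ravel(),
                                          return_inverse=True,
                                          return_counts=True)
    r_values, r_counts = np.unique(reference.ravel(), return_counts=True)
    # Empirical CDFs of both images.
    s_cdf = np.cumsum(s_counts) / source.size
    r_cdf = np.cumsum(r_counts) / reference.size
    # For each source quantile, pick the reference value at that quantile.
    matched = np.interp(s_cdf, r_cdf, r_values)
    return matched[s_idx].reshape(source.shape)

rng = np.random.default_rng(0)
src = rng.normal(50, 10, (64, 64))
ref = rng.normal(128, 30, (64, 64))
out = match_histograms(src, ref)
print(out.mean())  # close to the reference mean (~128)
```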
Q55: Please define “close to the input data of the original model”. Do you try to “match” the image histograms?
Yes, but only the histogram. The input data should be the same resolution, same quantity of noise, same microscopy modality, …
Q56: Do my images have to be acquired in the same microscope conditions (same microscope, objective lens, pixel sampling, etc.) of those datasets?
Not necessarily, but the closer your images are to the training ones, the more similar your result will be to the original one.
Q57: But when you say similar images, or same kind of images, You mean same size, same microscopy, same histogram, same intensity, same amount of examples in the classes or what exactly?
Same modality, same pixel size, same biological tissue, cell type and pixel values. The closer to the training data, the better.
Q58: How generalizable are these bundles as the images they are trained on may be acquired using very different parameters e.g. pixel size, intensity etc than my data?
This is not related to the bundled model but to how each of these models was trained. This information has to be given by the person who trained the model / manuscript.
Q59: How do I train my data using your models such that I can use them in my future studies using deepImageJ. For example, I have an image of bright particles in the background of dark.
You should have a lot of images and a lot of annotated images (labels). Then you choose a network (usually U-Net) and a loss function. Then you can train the model, and if your training converges, you will be able to save the model and use it later in deepImageJ.
Q60: Does DeepImageJ employ GPU computing?
Yes, there is a release compatible with GPU but you would need to install the proper drivers: https://github.com/deepimagej/deepimagej-plugin/releases
Q61: Do you need a GPU for running the model in imageJ?
No, it runs on CPU without any extra installation needed.
Q62: In the future, will DeepImageJ work on GPU?
It already works on GPU on its latest version.
Q63: The recent Noise2Void Fiji plugin works with TensorFLow 1.13 or 1.14 on GPU, and cuDNN. Would this be possible for DeepImageJ as well?
You can use Noise2Void directly in Fiji. In this case, you don’t need deepImageJ.
Q64: Is this example presented here based on the ImageJ1.x code base?
Q65: For this particular example of em superresolution. Is there a way to see what the network has learned? For instance to see if the PSF makes sense?
You can visualize the output of each convolution layer and see which features have been learnt. And yes, you can check the PSF of the output and compare it with real high resolution images.
Q66: Why is the 32-bit image divided by 255 gives you an image with value between 0-1?
Here the input image was an 8-bit image with values from 0 to 255; if we convert it to 32 bits and divide by 255, the values will be in the range 0-1.
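In NumPy terms, a trivial sketch of that normalization:

```python
# Sketch: an 8-bit image holds values in [0, 255]; casting to 32-bit
# float and dividing by 255 maps it into [0, 1].
import numpy as np

img8 = np.array([[0, 128, 255]], dtype=np.uint8)
img = img8.astype(np.float32) / 255.0
print(img.min(), img.max())  # 0.0 1.0
```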
Q67: In the colab example how did you create the groundtruth images?
The ground truth is the high resolution image and the input is a “crappified” version of it to simulate its corresponding low resolution version.
Q68: Is there any difference between fluorescence images and transmission images with regards to training?
The two kinds of images have nothing in common in terms of contrast, range of intensities, noise or signal-to-noise ratio, but you can probably use the same backbone models. You might need to rethink the pre- and post-processing techniques, though.
Q69: The ground truth here is the same area taken at high mag in the scope?
In this case the ground truth is the original image (the good-quality image). We add some noise and blur the image, so we create a fake input of lower quality.
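A minimal sketch of such a “crappifier”, assuming NumPy and SciPy are available; the blur sigma and noise level are arbitrary illustration values:

```python
# Sketch: degrade the good image with blur and noise to fabricate the
# low-quality training input, keeping the original as ground truth.
import numpy as np
from scipy.ndimage import gaussian_filter

def crappify(img, sigma=2.0, noise_std=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    blurred = gaussian_filter(img.astype(np.float32), sigma=sigma)
    return blurred + rng.normal(0.0, noise_std, size=img.shape)

ground_truth = np.zeros((64, 64), dtype=np.float32)
ground_truth[24:40, 24:40] = 1.0            # a bright square
training_input = crappify(ground_truth)     # blurred, noisy version
```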
Q70: These ground truth images were distorted with a Gaussian blur - why did the U-Net not simply learn to transform the images with the inverse function?
That’s exactly what it should do at the pixel level. The original idea was to downsample the images as well, but DeepImageJ does not support inputs and outputs of different sizes for now…
Q71: What’s the meaning of the activation=elu and activation=sigmoid in his model definition?
They are all activation functions applied to the output of the corresponding layer. The typical ones are:
- ReLU (Rectified Linear Unit): a nonlinear function inspired by both biological motivations and mathematical justifications.
- ELU (Exponential Linear Unit): a smooth variant of ReLU that allows small negative outputs.
Q72: In this case you use a sigmoid shape function. How do you choose it? There are other function shapes to fit in output right?
Yes, that’s a design decision. The sigmoid guarantees an output between 0 and 1. You can use other activation functions, of course. Have a look at this interesting discussion about it.
Q73: How to decide for instance max pooling or average pooling?
It’s similar to the activation functions: trial and error. However, most of the time max pooling is recommended for segmentation and classification, while for continuous outputs (e.g. super-resolution) average pooling makes more sense.
Q74: Is the validation set like an intermediate test set? Is it the K-fold validation?
K-fold refers to the number of times the network is retrained on a different fraction of the training set. The validation set is another fraction of the training set, used to check whether the model generalizes well.
Q75: When do you stop the training? When is it acceptable when comparing ground truth?
When the error in the validation set starts to increase or it does not improve for a user-specified number of epochs known as “patience”.
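The patience rule can be sketched framework-agnostically (in Keras the equivalent would be the `tf.keras.callbacks.EarlyStopping(patience=...)` callback); the loss values below are made up for illustration:

```python
# Sketch: stop training when the best validation loss is `patience`
# epochs old, i.e. no improvement has been seen for that many epochs.
def should_stop(val_losses: list, patience: int = 3) -> bool:
    if len(val_losses) <= patience:
        return False
    best_epoch = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience

history = [0.9, 0.7, 0.6, 0.62, 0.61, 0.63]  # no improvement for 3 epochs
print(should_stop(history, patience=3))  # True
```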
Q76: What can we do if the plateau is at a high value of loss? Do we have to check/change the training images?
You might try changing the learning rate.
Q77: When you use models to predict content, e.g. compute super res from widefield, what is the recommended way to check the data for artifacts or data accuracy?
Always compare with some expected output, real or synthetic. For example, for segmentation, compare the expected output with the predicted one based on the segmented objects.
Q78: Can I use the resolution model to test my data?
Yes, but take into account that it might not work well. It is better to train it yourself on your own data.
Q79: Can we train the model using our own data?
To train a model on your own data you need a notebook (or equivalent code) such as the one in our tutorial. Then you can replace the image dataset with yours.
Q80: Is there a similar training example for image segmentation which could be shared with the seminar attendees?
You could try modifying the notebook code with the data at the cell tracking challenge: http://celltrackingchallenge.net/2d-datasets/. That being said, we’ll try to make more notebooks available soon for other typical tasks.