ZeroCostDL4Mic - Cellpose - Use model outside of Colab

Dear @Guillaume_Jacquemet ,

  • I trained a model using the ZeroCostDL4Mic-Cellpose notebook (it was so smooth, congrats and thank you again for that).
  • I downloaded the model from Google Drive.
  • I wanted to use this model from the command line (local conda env), but I struggled a bit before realizing that in cell 5 the output is:
parsing model string to get cellpose options
** MXNET CUDA version installed and working. **
>>>> using GPU

SOLUTION: to be able to reuse this model from the command line, one has to specify --mxnet:

$ python -m cellpose --dir …/pathtoImages --pretrained_model …/pathToModel --mxnet --use_gpu
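For anyone scripting batch runs, the flag logic can be assembled programmatically. A minimal sketch (the paths below are hypothetical placeholders; the flags --dir, --pretrained_model, --mxnet and --use_gpu are the ones from the command above):

```python
import shlex

def cellpose_cmd(image_dir, model_path, use_mxnet=True, use_gpu=True):
    """Build the cellpose CLI invocation as an argument list."""
    cmd = ["python", "-m", "cellpose",
           "--dir", image_dir,
           "--pretrained_model", model_path]
    if use_mxnet:
        cmd.append("--mxnet")    # required for models trained with the MXNet backend
    if use_gpu:
        cmd.append("--use_gpu")
    return cmd

# Hypothetical paths, for illustration only
print(" ".join(shlex.quote(a) for a in cellpose_cmd("/data/images", "/models/my_model")))
```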
  • A- My “user feedback” would be to state it more explicitly in the model name, or in bold in the documentation…
  • B- Do you plan to also offer a PyTorch model? (I think I read that it’s faster than MXNet.)
    For example, a dropdown menu to allow training with one or the other?
  • C- About Flow_threshold and Cell_probability_threshold: the documentation in the notebook says the default values are 0.4 and 0.0, respectively. I was wondering if this is still true after transfer learning, or if we should use different values because of the added dataset?
  • D- I have a similar question/concern about the diameter parameter. Before training, should we rescale our images to get closer to the default diameter of 30 (for cyto) or 17 (for nuclei)? If we don’t, which value should we use?

Thank you again for making our life easy!

Best,

Romain


Hi @romainGuiet ,

Thanks for the feedback! Great that it is now working well for you!

Did you get better results?

For A) Yes, definitely, I will add some text to make this clear!
For B) I tried for a while to get the PyTorch version to train in Colab, but I never managed to get it to connect to the GPU, so the training was CPU only… Because of this, the notebook currently only trains with MXNet. If I figure out how to get the PyTorch model to train on GPU, I will definitely add it as an option.
For C) I tested a few datasets and these values (also suggested as defaults in the Cellpose doc) always seem to work quite well. It’s definitely useful to try a few parameters, though. I was actually thinking about adding a parameter-sweep function to the QC cell of the notebook, but I have not had the time yet.
For D) My understanding is that if you continue training from either of these models, your images are automatically rescaled to these values. No need to do it yourself.
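For what it’s worth, the grid part of such a sweep is easy to sketch. Here `evaluate` is a hypothetical stand-in for scoring one parameter combination on the QC images (e.g. mean IoU against the labels); the grid is centred on the 0.4 / 0.0 defaults discussed above:

```python
import itertools

# Grid around the defaults (flow_threshold=0.4, cellprob_threshold=0.0)
flow_thresholds = [0.3, 0.4, 0.5]
cellprob_thresholds = [-0.5, 0.0, 0.5]

def evaluate(flow_threshold, cellprob_threshold):
    """Hypothetical stand-in: in practice this would run model.eval() on the QC
    images with these thresholds and return a segmentation quality score."""
    return -abs(flow_threshold - 0.4) - abs(cellprob_threshold)  # dummy score

best = max(itertools.product(flow_thresholds, cellprob_thresholds),
           key=lambda params: evaluate(*params))
print("best (flow_threshold, cellprob_threshold):", best)
```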
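A back-of-envelope sketch of that rescaling (the 45 px measurement is a made-up example; 30 px is the cyto default that also appears in the training log further down):

```python
# Cellpose resizes images so the median object diameter matches the model
# default (30 px for 'cyto', 17 px for 'nuclei').
def rescale_factor(measured_diameter, target_diameter=30.0):
    """Factor by which an image is resized before training/inference."""
    return target_diameter / measured_diameter

measured = 45.0                  # hypothetical median cell diameter in your images
factor = rescale_factor(measured)
new_size = round(540 * factor)   # a 540-px image (as in the training log) shrinks to 360 px
print(factor, new_size)
```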

Cheers
Guillaume

Hi @Guillaume_Jacquemet ,

It’s a different dataset (waiting for better annotations for the previous project).

That would be very useful!

But then we should provide the actual dimensions of our dataset at some point, shouldn’t we? Or does it automatically use the scale-determination function during training?

I did a couple of tests: using the cell below before step 2 (inspired by this),

!wget -c https://repo.continuum.io/archive/Anaconda3-2020.02-Linux-x86_64.sh
!chmod +x Anaconda3-2020.02-Linux-x86_64.sh
!bash ./Anaconda3-2020.02-Linux-x86_64.sh -b -f -p /usr/local
!conda install pyqt cudnn==7.6.0 cudatoolkit=10.1 -q -y --prefix /usr/local
!conda install pytorch -q -y --prefix /usr/local -c pytorch

import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

the output of the cell “2. Install Cellpose and dependencies” is:

** TORCH CUDA version installed and working. **
Libraries installed
This notebook is up-to-date.

Then, in the cell “4.2. Start Training”, I removed the --mxnet parameter and I can train a network from scratch:

** TORCH CUDA version installed and working. **
>>>> using GPU
>>>> training from scratch
>>>> during training rescaling images to fixed diameter of 30.0 pixels
Running test snippet to check if MKL-DNN working
see https://pytorch.org/docs/stable/backends.html?highlight=mkl
** MKL version working - CPU version is sped up. **
flows precomputed
flows precomputed
>>>> training network with 2 channel input <<<<
>>>> saving every 100 epochs
>>>> median diameter = 30
>>>> LR: 0.00020, batch_size: 8, weight_decay: 0.00001
>>>> ntrain = 27
>>>> ntest = 2
(2, 540, 540)
Epoch 0, Time  0.9s, Loss 1.8620, Loss Test 1.7541, LR 0.0000
Epoch 1, Time  1.6s, Loss 1.8322, Loss Test 1.7395, LR 0.0000
saving network parameters
Epoch 2, Time  2.4s, Loss 1.7611, Loss Test 1.7335, LR 0.0000
Epoch 3, Time  3.1s, Loss 1.6371, Loss Test 1.7248, LR 0.0001
Epoch 4, Time  3.8s, Loss 1.7533, Loss Test 1.7135, LR 0.0001
Epoch 5, Time  4.6s, Loss 1.7476, Loss Test 1.7077, LR 0.0001
Epoch 6, Time  5.3s, Loss 1.7540, Loss Test 1.7049, LR 0.0001

but I get an error (see below) if I try to do transfer learning from the cyto or nuclei model:

** TORCH CUDA version installed and working. **
>>>> using GPU
>>>> pretrained model /root/.cellpose/models/cytotorch_0 is being used
>>>> during training rescaling images to fixed diameter of 30.0 pixels
Running test snippet to check if MKL-DNN working
see https://pytorch.org/docs/stable/backends.html?highlight=mkl
** MKL version working - CPU version is sped up. **
NOTE: computing flows for labels (could be done before to save time)
100% 27/27 [00:05<00:00,  4.94it/s]
NOTE: computing flows for labels (could be done before to save time)
100% 2/2 [00:00<00:00,  3.66it/s]
>>>> training network with 2 channel input <<<<
>>>> saving every 100 epochs
>>>> median diameter = 30
>>>> LR: 0.00020, batch_size: 8, weight_decay: 0.00001
>>>> ntrain = 27
>>>> ntest = 2
(2, 540, 540)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/site-packages/cellpose/__main__.py", line 258, in <module>
    main()
  File "/usr/local/lib/python3.7/site-packages/cellpose/__main__.py", line 237, in main
    batch_size=args.batch_size)
  File "/usr/local/lib/python3.7/site-packages/cellpose/models.py", line 642, in train
    learning_rate, n_epochs, momentum, weight_decay, batch_size, rescale)
  File "/usr/local/lib/python3.7/site-packages/cellpose/core.py", line 931, in _train_net
    train_loss = self._train_step(imgi, lbl)
  File "/usr/local/lib/python3.7/site-packages/cellpose/core.py", line 793, in _train_step
    y, style = self.net(X)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/cellpose/resnet_torch.py", line 191, in forward
    T0    = self.downsample(data)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/cellpose/resnet_torch.py", line 83, in forward
    xd.append(self.down[n](y))
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/cellpose/resnet_torch.py", line 46, in forward
    x = self.proj(x) + self.conv[1](self.conv[0](x))
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 136, in forward
    self.weight, self.bias, bn_training, exponential_average_factor, self.eps)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/functional.py", line 2058, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: Tensor for argument #2 'weight' is on CPU, but expected it to be on GPU (while checking arguments for cudnn_batch_norm)
Time elapsed: 0.0 hour(s) 0.0 min(s) 15 sec(s)


Hi Romain. I think this is an issue with Cellpose itself, and it looks like you are not alone: Tensor for argument #2 'weight' is on CPU, but expected it to be on GPU · Issue #211 · MouseLand/cellpose · GitHub
It seems that the model is not properly pushed to the GPU. You can try forking the Cellpose repo and applying the fix suggested in the link above… or wait until it gets officially fixed.
Cheers,
Guillaume
