Hi @Guillaume_Jacquemet,
It’s a different dataset (I’m waiting for better annotations for the previous project).
That would be very useful.
But then we should provide the actual dimensions in our dataset at some point, shouldn’t we? Or is it computed automatically during training by the function that determines the scale?
I ran a couple of tests using the cell below before step 2 (inspired by this):
!wget -c https://repo.continuum.io/archive/Anaconda3-2020.02-Linux-x86_64.sh
!chmod +x Anaconda3-2020.02-Linux-x86_64.sh
!bash ./Anaconda3-2020.02-Linux-x86_64.sh -b -f -p /usr/local
!conda install pyqt cudnn==7.6.0 cudatoolkit=10.1 -q -y --prefix /usr/local
!conda install pytorch -q -y --prefix /usr/local -c pytorch
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')
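A quick way to confirm that the appended path is actually visible to the notebook kernel is sketched below. It only uses the standard library; `is_importable` is a hypothetical helper name, and the path is simply the one from the cell above, so adjust it if your install prefix differs.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the running kernel can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Same path as in the install cell above (assumed, not verified here).
site_packages = '/usr/local/lib/python3.7/site-packages/'
if site_packages not in sys.path:
    sys.path.append(site_packages)

# Report whether the packages installed by conda can be found.
for name in ('torch', 'cellpose'):
    print(name, 'found' if is_importable(name) else 'NOT found')
```

If either package is reported as not found, the later cells will most likely fall back to whatever Colab ships by default.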
The output of the cell “2. Install Cellpose and dependencies” is:
** TORCH CUDA version installed and working. **
Libraries installed
This notebook is up-to-date.
Then, in the cell "4.2. Start Training", I removed the parameter --mxnet,
and I can train a network from scratch:
** TORCH CUDA version installed and working. **
>>>> using GPU
>>>> training from scratch
>>>> during training rescaling images to fixed diameter of 30.0 pixels
Running test snippet to check if MKL-DNN working
see https://pytorch.org/docs/stable/backends.html?highlight=mkl
** MKL version working - CPU version is sped up. **
flows precomputed
flows precomputed
>>>> training network with 2 channel input <<<<
>>>> saving every 100 epochs
>>>> median diameter = 30
>>>> LR: 0.00020, batch_size: 8, weight_decay: 0.00001
>>>> ntrain = 27
>>>> ntest = 2
(2, 540, 540)
Epoch 0, Time 0.9s, Loss 1.8620, Loss Test 1.7541, LR 0.0000
Epoch 1, Time 1.6s, Loss 1.8322, Loss Test 1.7395, LR 0.0000
saving network parameters
Epoch 2, Time 2.4s, Loss 1.7611, Loss Test 1.7335, LR 0.0000
Epoch 3, Time 3.1s, Loss 1.6371, Loss Test 1.7248, LR 0.0001
Epoch 4, Time 3.8s, Loss 1.7533, Loss Test 1.7135, LR 0.0001
Epoch 5, Time 4.6s, Loss 1.7476, Loss Test 1.7077, LR 0.0001
Epoch 6, Time 5.3s, Loss 1.7540, Loss Test 1.7049, LR 0.0001
But I get an error (see below) if I try to do transfer learning from the cyto or nuclei model:
** TORCH CUDA version installed and working. **
>>>> using GPU
>>>> pretrained model /root/.cellpose/models/cytotorch_0 is being used
>>>> during training rescaling images to fixed diameter of 30.0 pixels
Running test snippet to check if MKL-DNN working
see https://pytorch.org/docs/stable/backends.html?highlight=mkl
** MKL version working - CPU version is sped up. **
NOTE: computing flows for labels (could be done before to save time)
100% 27/27 [00:05<00:00, 4.94it/s]
NOTE: computing flows for labels (could be done before to save time)
100% 2/2 [00:00<00:00, 3.66it/s]
>>>> training network with 2 channel input <<<<
>>>> saving every 100 epochs
>>>> median diameter = 30
>>>> LR: 0.00020, batch_size: 8, weight_decay: 0.00001
>>>> ntrain = 27
>>>> ntest = 2
(2, 540, 540)
Traceback (most recent call last):
File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/site-packages/cellpose/__main__.py", line 258, in <module>
main()
File "/usr/local/lib/python3.7/site-packages/cellpose/__main__.py", line 237, in main
batch_size=args.batch_size)
File "/usr/local/lib/python3.7/site-packages/cellpose/models.py", line 642, in train
learning_rate, n_epochs, momentum, weight_decay, batch_size, rescale)
File "/usr/local/lib/python3.7/site-packages/cellpose/core.py", line 931, in _train_net
train_loss = self._train_step(imgi, lbl)
File "/usr/local/lib/python3.7/site-packages/cellpose/core.py", line 793, in _train_step
y, style = self.net(X)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/cellpose/resnet_torch.py", line 191, in forward
T0 = self.downsample(data)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/cellpose/resnet_torch.py", line 83, in forward
xd.append(self.down[n](y))
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/cellpose/resnet_torch.py", line 46, in forward
x = self.proj(x) + self.conv[1](self.conv[0](x))
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 136, in forward
self.weight, self.bias, bn_training, exponential_average_factor, self.eps)
File "/usr/local/lib/python3.7/site-packages/torch/nn/functional.py", line 2058, in batch_norm
training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: Tensor for argument #2 'weight' is on CPU, but expected it to be on GPU (while checking arguments for cudnn_batch_norm)
Time elapsed: 0.0 hour(s) 0.0 min(s) 15 sec(s)
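The error message says the batch-norm weights were still on the CPU while the input batch was already on the GPU. For reference, here is a minimal sketch of how one might check for that kind of mismatch in plain PyTorch. It assumes nothing about Cellpose internals: `param_devices` is a hypothetical helper, and the `BatchNorm2d` layer is only a stand-in for the loaded pretrained network.

```python
import torch
import torch.nn as nn


def param_devices(module):
    """Collect the device of every parameter and buffer in a module."""
    tensors = list(module.parameters()) + list(module.buffers())
    return {str(t.device) for t in tensors}


# Stand-in layer; in the real case this would be the loaded network.
net = nn.BatchNorm2d(4)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print('before:', param_devices(net))
net = net.to(device)  # moves weights and running statistics together
print('after:', param_devices(net))

# Input created on the same device, so the forward pass cannot mismatch.
x = torch.randn(2, 4, 8, 8, device=device)
y = net(x)
```

If the set printed for the real network contains both a CPU and a CUDA entry, or only `cpu` while training runs on the GPU, that would be consistent with the traceback above.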