Stardist prediction kernel dies

Hi,

I would like to use #stardist (3D) for the segmentation of a large (290, 1024, 1024) (z, y, x) confocal microscope image, but during the prediction the #jupyter kernel of the notebook crashes without any error message.

What I did so far:

I started by using an edge detection method (LoG) to create an initial data set. Due to under-segmentation, the results were not usable for the type of analysis we have in mind, so we manually curated the data set with #napari.

After one week of curating and annotating, we ended up with a fully annotated volume with dimensions (100, 476, 714), which contains approx. 1000 rod-shaped bacteria.

I split this volume into 7 sub-volumes of (100, 476, 102) px each (see the slicing sketch after the list):

  • 6 for training
  • 1 for validation
  • 1 for later testing
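
A rough sketch of how that split along x could look (the array names are just placeholders for the curated intensity volume and its labels):

import numpy as np

# placeholders for the curated intensity volume and its label volume,
# both of shape (100, 476, 714) in (z, y, x) order
annotated_img = np.zeros((100, 476, 714), dtype=np.uint16)
annotated_lbl = np.zeros((100, 476, 714), dtype=np.uint16)

# cut both volumes into 7 slabs of 102 px along the x axis (7 * 102 = 714)
img_slabs = [annotated_img[:, :, i * 102:(i + 1) * 102] for i in range(7)]
lbl_slabs = [annotated_lbl[:, :, i * 102:(i + 1) * 102] for i in range(7)]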

Thanks to data augmentation, I succeeded in training a StarDist3D network with the following configuration (a sketch of the corresponding Config3D call follows the printout):

anisotropy=(1.6521739130434783, 1.0, 1.1875),
axes='ZYXC'
backbone='resnet'
grid=(1, 2, 2)
n_channel_in=1
n_channel_out=97
n_dim=3
n_rays=96
net_conv_after_resnet=128
net_input_shape=(None, None, None, 1)
net_mask_shape=(None, None, None, 1)
rays_json={
    'name': 'Rays_GoldenSpiral',
    'kwargs': {
        'n': 96,
        'anisotropy': (1.6521739130434783, 1.0, 1.1875)}}
resnet_activation='relu'
resnet_batch_norm=False
resnet_kernel_init='he_normal'
resnet_kernel_size=(3, 3, 3)
resnet_n_blocks=4
resnet_n_conv_per_block=3
resnet_n_filter_base=32
train_background_reg=0.0001
train_batch_size=1
train_checkpoint='weights_best.h5'
train_checkpoint_epoch='weights_now.h5'
train_checkpoint_last='weights_last.h5'
train_dist_loss='mae'
train_epochs=400
train_learning_rate=0.0003
train_loss_weights=(1, 0.2)
train_n_val_patches=None
train_patch_size=(100, 100, 100)
train_reduce_lr={'factor': 0.5, 'patience': 40, 'min_delta': 0}
train_steps_per_epoch=100
train_tensorboard=True
use_gpu=True
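
For reference, this configuration would be built roughly like this (just a sketch using the values from the printout above; every option not listed stays at its default):

from stardist import Rays_GoldenSpiral
from stardist.models import Config3D, StarDist3D

anisotropy = (1.6521739130434783, 1.0, 1.1875)
rays = Rays_GoldenSpiral(96, anisotropy=anisotropy)

conf = Config3D(
    rays=rays,
    grid=(1, 2, 2),
    anisotropy=anisotropy,
    backbone='resnet',
    n_channel_in=1,
    use_gpu=True,
    train_patch_size=(100, 100, 100),
    train_batch_size=1,
    train_epochs=400,
    train_steps_per_epoch=100,
)

model = StarDist3D(conf, name='stardist', basedir='models')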

For the training I used a single GTX 980 with 4 GB of VRAM. This is the reason why I limited the patch size to (100, 100, 100); it was simply the first patch size that worked.

Technically I have access to GPUs with larger VRAM (and longer waiting times …), but I always prefer quick iterations over perfect results during testing.

In tensorboard I got the following loss curves:

The prediction on the test image (here cropped) works quite nicely. The only drawback is over-segmented cells (i.e. the marked ones).

Compared with our previous efforts to tackle our problems with a classical segmentation pipeline in MATLAB, 1.5 weeks for annotation, Python coding, setup, and training is ridiculously fast.

Finally I gave the larger volume a try:

from __future__ import print_function, unicode_literals, absolute_import, division
import sys, os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

from glob import glob
from tifffile import imread
from csbdeep.utils import Path, normalize

from stardist import random_label_cmap
from stardist.models import StarDist3D

np.random.seed(6)
lbl_cmap = random_label_cmap()

model = StarDist3D(None, name='stardist', basedir='models')

# contains only a single tif stack:
X = glob('largedatasets/*.tif') 
X = list(map(imread, X))

X = [normalize(x, 1, 99.8) for x in X]

print(X[0].shape)
# returns (290, 1024, 1024)

labels = model.predict(X[0], n_tiles=(3, 11, 11))

(Heavily inspired by the corresponding stardist example notebook)

Shortly after the final progress bar reaches 100%, the #jupyter kernel dies and all labels are lost … :cry:

My questions so far:

  • Does anyone have an idea what could cause the kernel to crash?
  • Has anyone seen over-segmentation like the one shown above and knows how to deal with it? (It would be awesome if we could eliminate this without post-processing or, even worse, by annotating more training data.)
  • (Are there other improvements possible?)

Eric

I would ping @mweigert and @uschmidt83 to get opinions from the experts. Though perhaps changing the NMS threshold yourself might help with the over-segmentation. It’s a value in the exported .json file if I’m not mistaken.

Regarding the dying kernel, my suggestion is to try it in plain old Python and see if you get some more explicit errors there.

Also try the prediction without the GPU to see if that works. You don’t need mental GPU power for prediction, and it’ll give you access to all the RAM in your machine.
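
A rough sketch of what I mean by running it without the GPU (assuming TensorFlow respects CUDA_VISIBLE_DEVICES on your setup):

# run_prediction.py -- the notebook code as a plain script, forced onto the CPU
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'   # hide the GPU *before* anything imports TensorFlow

from stardist.models import StarDist3D

model = StarDist3D(None, name='stardist', basedir='models')
# ... then the same loading, normalization, and prediction code as in the notebook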


Hi oburri,

thank you for the quick answer!

I ran the program above line-by-line in the terminal to check my assumptions. Besides issues with the input data, I noticed that I really should use the line

labels, _ = model.predict_instances(X[0])

for prediction.
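
If I understand the API correctly, predict only returns the raw probability and distance maps, while predict_instances additionally runs the non-maximum suppression and returns the actual label image:

# raw network outputs only -- no label image
prob, dist = model.predict(X[0])

# network outputs + non-maximum suppression -> label image plus a details dict
labels, details = model.predict_instances(X[0])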

When I run the predict_instances line, I receive the following warning (?) messages:

2020-03-08 14:43:37.425140: W tensorflow/core/framework/allocator.cc:107] Allocation of 38788923392 exceeds 10% of system memory.
2020-03-08 14:44:17.288381: W tensorflow/core/framework/allocator.cc:107] Allocation of 38788923392 exceeds 10% of system memory.
2020-03-08 14:44:40.044527: W tensorflow/core/framework/allocator.cc:107] Allocation of 38788923392 exceeds 10% of system memory.
2020-03-08 14:44:40.044588: W tensorflow/core/framework/allocator.cc:107] Allocation of 38788923392 exceeds 10% of system memory.
2020-03-08 14:45:01.654185: W tensorflow/core/framework/allocator.cc:107] Allocation of 38788923392 exceeds 10% of system memory.

Qhull output at end
Qhull precision warning: repartition point p53 from f328 as a outside point above a hidden facet f228 dist 0.0004 nearest vertices 0.064

Qhull output at end
Qhull precision warning: repartition point p34 from f276 as a outside point above a hidden facet f166 dist 2.8e-07 nearest vertices 0.029

These messages, however, do not seem to noticeably affect the labels.

This evening I will probably test on larger GPUs since it really looks like a memory issue.

Eric

I tested again with jupyter notebook:

Even on a 32 GB GPU (Tesla V100) I need tiling to predict the (290, 1024, 1024) px volume. For me

labels, _ = model.predict_instances(X[0], n_tiles=(2, 2, 3))

worked fine. Furthermore, I needed approx. 135 GB of RAM for the instance calculations. The instance calculations on the CPU took longer (approx. 5:43 min) than the distance and probability predictions on the GPU (approx. 3:23 min).

Given enough RAM, I can probably use the smaller GPUs.
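
As a side note, writing the label image to disk right after prediction should at least protect it from a later crash (sketch only; the output name is a placeholder, and uint16 should be enough for the roughly 1000 objects here):

import numpy as np
from tifffile import imwrite

# save the label image immediately so a later kernel crash cannot wipe it out
imwrite('labels_large_volume.tif', labels.astype(np.uint16))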

The problem with the over-segmentation remains.

Concerning the RAM issues:

Hi Eric, I’m glad that you’re somewhat happy with the results.

In some sense, StarDist wasn’t really designed for rod-shaped objects, but it seems to do a reasonable job. I’m not really surprised about the over-segmentation. Ideas to make this problem less severe:

  • I can see that the partial segments have “pointy” ends, which is likely due to there not being enough rays to produce a smooth shape. Try increasing the number of rays, e.g. double to 192, and see if that helps at all. (I know this is problematic memory-wise. Try using a big GPU for that test.)
  • Try setting backbone='unet' and unet_n_depth=3 in the Config, because the ResNet might not have a large enough receptive field to see the entire object shape.
  • As @oburri suggested, experiment with smaller (~0.1) and larger (~0.7) values of nms_thresh for predict_instances (see the snippet after this list). This will likely not solve the problem, but may give you options as to which kinds of mistakes you can deal with better later on.
  • What kind of data augmentation did you use? This might have a big effect on the results.
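
A quick sketch for that nms_thresh experiment, reusing the tiling you already found to work:

# compare the effect of different NMS thresholds on the same volume
for nms_thresh in (0.1, 0.3, 0.5, 0.7):
    labels, details = model.predict_instances(X[0], n_tiles=(2, 2, 3), nms_thresh=nms_thresh)
    print(nms_thresh, labels.max())  # labels are consecutive, so the maximum is the object count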

The warning messages you posted are quite normal.

The non-maximum suppression on the CPU can require quite a lot of memory and compute time if there are a lot of object “candidates”. It is typically the limiting factor for densely-packed 3D images.
To make this problem less severe, we are currently developing a way to do that computation in overlapping blocks (see here for more details).
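
For orientation, such a block-wise call could look roughly like this (a sketch only; the method name predict_instances_big and its arguments are assumptions, since this feature is still under development):

# sketch: predict overlapping blocks separately and stitch the label images back together
labels, details = model.predict_instances_big(
    X[0], axes='ZYX',
    block_size=(64, 256, 256),   # size of each processed block in pixels
    min_overlap=(16, 64, 64),    # objects larger than this overlap may get cut
    context=(8, 32, 32),         # extra context read around each block
    n_tiles=(1, 2, 2),           # tiling within each block to fit into GPU memory
)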

Sorry for the late reply, I was traveling last week.

Best,
Uwe
