Training stardist on a mixture of cell types - class imbalance

Dear forum,

I have a 20 TB dataset containing fluorescence images of cells circulating in the blood of cancer patients. I would like to use StarDist to segment whole cells from immunofluorescence images of DAPI (nucleus), CD45-APC, and epithelial cytokeratin-PE. The segmentation will be used as input to a DL network for classification.

StarDist works amazingly well when I train and test it on cultured-cell images (thank you to the StarDist team for making it so accessible!). However, in patient samples StarDist tends to segment only the nucleus; see the image below for some CK+ cells. The vast majority of cells in patient samples are just a nucleus (possibly with a thin shell of CD45-APC), and <1% of cells are a nucleus with cytokeratin-PE. So I think the data doesn't contain enough examples of PE+ cells, and training focuses on segmenting the CD45+ DAPI+ or DAPI+-only white blood cells.

[image: CK+ cells where only the nucleus is segmented]

  • Is it possible to overweight the PE+ events during training to compensate for this imbalance? For example, would it make sense to add a mask indicating the weight of each event, so the loss can be weighted per class? (See the sketch after this list for a cruder alternative.)
  • I could train on smaller cutouts of the images containing PE+ cells (and the nearby white blood cells). The larger the cutout, the more PE- cells I include, so how large does the cutout need to be? The train_patch_size is (256, 256); does this mean 256x256 is the minimum?
  • I could add images of epithelial culture cells to the dataset. They are PE+ and DAPI+, so I would get more training data in the direction I need to improve, but culture cells tend to be much brighter in PE and DAPI and larger than patient tumour cells. Would that help?
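
For what it's worth, a crude alternative to a per-class loss weight (which, as far as I know, StarDist does not expose out of the box) is to oversample: duplicate the images or cutouts containing PE+ cells in the training lists, so patches from them are drawn proportionally more often. A minimal sketch, where `contains_pe_positive` is a hypothetical predicate you would implement from your annotations:

```python
# Crude oversampling: repeat images containing PE+ cells in the
# training lists so patches from them are sampled more often.
# `contains_pe_positive` is a hypothetical predicate implemented
# from your own annotations.
def oversample_minority(X, Y, contains_pe_positive, factor=10):
    X_out, Y_out = [], []
    for x, y in zip(X, Y):
        reps = factor if contains_pe_positive(x, y) else 1
        X_out.extend([x] * reps)
        Y_out.extend([y] * reps)
    return X_out, Y_out
```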

Some background:
The data is acquired with an FDA IVD-cleared system, and the dataset contains images from thousands of patients, so I can't acquire new data (e.g. with an additional stain). After segmentation, I plan to use a DL classification network that takes the segmented event plus a margin around it, so some over/under-segmentation is not a big problem.

The stardist configuration is pretty close to default: grid = (2, 2); n_dim = 2; n_rays = 32; n_channel_in = 3; net_conv_after_unet = 128; train_background_reg = 0.0001; train_batch_size = 4; train_learning_rate = 0.0003; train_patch_size = (256, 256)
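
For reference, a sketch of how these settings map onto stardist's Config2D (n_dim = 2 is implied by Config2D; the model name and basedir are placeholders, not my actual script):

```python
from stardist.models import Config2D, StarDist2D

# Sketch of the configuration listed above; n_dim = 2 is implied
# by Config2D. Model name and basedir are placeholders.
conf = Config2D(
    n_rays               = 32,
    n_channel_in         = 3,
    grid                 = (2, 2),
    net_conv_after_unet  = 128,
    train_background_reg = 1e-4,
    train_batch_size     = 4,
    train_learning_rate  = 3e-4,
    train_patch_size     = (256, 256),
)
model = StarDist2D(conf, name='stardist_wbc_ctc', basedir='models')
```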

My training set contains 57 images of ~1250x1000 pixels with ~6000 events in total; ~100 are circulating tumor cells (CK-PE+ DAPI+), and the rest are CD45-APC+ DAPI+ or DAPI+ only, both probably white blood cells.

An update: training on a selection of smaller regions that contain the minority class (and some of the other classes), to reduce the class imbalance in the training set, fixed this issue.
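
In case it helps others, the region selection can be as simple as cutting a window of at least train_patch_size around each minority-class object. A sketch, where `ctc_ids` (the label values of the CK-PE+ objects) is an assumption that would come from your annotations:

```python
import numpy as np

# Sketch: cut a window of at least train_patch_size around each
# minority-class (CK-PE+) object to build a more balanced training
# set. `ctc_ids` (label values of the CK-PE+ objects) is an
# assumption; it would come from your annotations.
def crop_around_objects(img, lbl, ctc_ids, size=(256, 256)):
    crops = []
    for obj_id in ctc_ids:
        ys, xs = np.nonzero(lbl == obj_id)
        cy, cx = int(ys.mean()), int(xs.mean())
        # clamp the window to the image bounds
        y0 = min(max(cy - size[0] // 2, 0), img.shape[0] - size[0])
        x0 = min(max(cx - size[1] // 2, 0), img.shape[1] - size[1])
        crops.append((img[y0:y0 + size[0], x0:x0 + size[1]],
                      lbl[y0:y0 + size[0], x0:x0 + size[1]]))
    return crops
```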

Hi @frank_coumans, welcome to the forum, and sorry that your original post from January somehow slipped under my radar!

Dear Uwe @uschmidt83,

No worries, you guys are already quite active doing support, and I managed with your comments elsewhere. My final implementation beat several other algorithms for this segmentation task, so thank you for all the tools that made it possible. StarDist is extremely good at differentiating cells from debris and at making sense of very dense images; there is a lot of debris in my samples because these patients are very sick. The only case it doesn't solve very well is spindle-like cells (which look like the green staining in the images below). StarDist splits a spindle into multiple objects (it cannot be described by a single star-convex shape), so we're looking into merging adjacent outlines that contain green.
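
The merging we have in mind would look roughly like the sketch below; the green-intensity threshold and the one-pixel adjacency test are assumptions, not a finished implementation:

```python
from skimage.measure import regionprops
from skimage.segmentation import expand_labels

# Sketch: merge touching StarDist labels that are both "green"
# (high mean CK-PE intensity), to re-join spindle fragments.
# The threshold and the 1 px adjacency test are assumptions.
def merge_green_fragments(lbl, green, thresh=0.5):
    # mean green intensity per label
    is_green = {p.label: p.intensity_mean > thresh
                for p in regionprops(lbl, intensity_image=green)}
    # grow labels by 1 px so fragments separated by a background
    # seam become adjacent
    grown = expand_labels(lbl, distance=1)
    # union-find over pairs of green labels that touch
    parent = {l: l for l in is_green}
    def find(l):
        while parent[l] != l:
            l = parent[l]
        return l
    for a, b in ((grown[:, :-1], grown[:, 1:]), (grown[:-1, :], grown[1:, :])):
        touching = (a != b) & (a > 0) & (b > 0)
        for la, lb in set(zip(a[touching].tolist(), b[touching].tolist())):
            if is_green.get(la) and is_green.get(lb):
                parent[find(la)] = find(lb)
    # relabel each merged group to a single id
    out = lbl.copy()
    for l in parent:
        out[lbl == l] = find(l)
    return out
```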

I changed the normalization in your example to the function below, because some images are empty or may contain a single very bright event. I'm including it in case it is of use to you. Window was set to ~10x the cell diameter, delta to ~10x the SD of the background, and sigma to 4.

We also looked at SplineDist for this problem, but it didn't handle the background in these samples very well.

Best,

Frank

Image from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3589570/pdf/nihms443802.pdf/?tool=EBI

```python
import scipy.ndimage

def normalize_window(im, window, delta, sigma):
    # Local window normalization, robust to empty images and to
    # images containing a single very bright event.
    if im.ndim == 3:
        # 2D multi-channel image: don't filter across the channel axis
        patch  = (window, window, 1)
        smooth = (window / sigma, window / sigma, 0)
    else:
        patch  = (window, window)
        smooth = (window / sigma, window / sigma)
    # local background: smoothed minimum over the window
    im_min = scipy.ndimage.minimum_filter(im, size=patch)
    im_min = scipy.ndimage.gaussian_filter(im_min, sigma=smooth)
    # local dynamic range, clipped from below at delta so empty
    # regions are not stretched to full scale
    im_max = scipy.ndimage.maximum_filter(im, size=patch)
    im_del = im_max - im_min
    im_del[im_del < delta] = delta
    im_del = scipy.ndimage.gaussian_filter(im_del, sigma=smooth)
    # scale so each window spans roughly [0, 1]
    im_out = (im - im_min) / im_del
    return im_out
```
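
With the rules of thumb above, a call would look something like this (the ~25 px cell diameter and ~5-count background SD are placeholder numbers):

```python
# Placeholder numbers: ~25 px cell diameter -> window ~ 250,
# background SD ~ 5 counts -> delta ~ 50, sigma = 4 as above.
im_norm = normalize_window(im, window=250, delta=50, sigma=4)
```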
