Can't get ZeroCostDL4Mic to work with RGB images and Stardist "Versatile_H&E_nuclei"

Hello,

I am testing StarDist in this ZeroCostDL4Mic notebook in Google Colab, using the Warwick GlaS dataset and the Versatile_H&E_nuclei pre-trained model. However, training fails with a cryptic error which I can’t work out, but which I’ve seen described in an earlier thread: Stardist 3D Training Error with Original Config Values.

In case you want to test this at your end, you can add the dataset automatically to the Google Colab notebook using the following code in a cell:

#@markdown ###Dataset URL and destination folder:

#@markdown Files are extracted, converted to TIFF and saved in the Training_source or Training_target depending on the presence of "Annotation_pattern" in the filename.

import requests
from tqdm import tqdm
import os
from zipfile import ZipFile
import io
from PIL import Image

def download(file_name, chunk_size = 8192):
    headers = {'user-agent': 'Wget/1.16 (linux-gnu)'}
    r = requests.head(url)  # note: relies on the global `url` defined below
    file_size = int(r.headers['content-length'])
    r = requests.get(url, stream=True, headers=headers)

    with tqdm(total = file_size, desc=file_name, position=0, leave=True) as pbar:
        with open(file_name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                if chunk:
                    f.write(chunk)
                    pbar.update(len(chunk))  # update by actual bytes written, not chunk_size

url = "https://warwick.ac.uk/fac/sci/dcs/research/tia/glascontest/download/warwick_qu_dataset_released_2016_07_08.zip" #@param {type:"string"}
Annotation_pattern = "_anno" #@param {type:"string"}
Training_source = "/content/gdrive/MyDrive/projects/stardist/WarwickQUDataset" #@param {type:"string"}
Training_target = "/content/gdrive/MyDrive/projects/stardist/output" #@param {type:"string"}

file_name = url.split('/')[-1]

# make sure the destination folders exist, otherwise the save below fails
os.makedirs(Training_source, exist_ok=True)
os.makedirs(Training_target, exist_ok=True)

if not os.path.exists(file_name):
    download(file_name)

# Some inspiration from https://thispointer.com/python-how-to-unzip-a-file-extract-single-multiple-or-all-files-from-a-zip-archive/
with ZipFile(file_name, 'r') as zipObj:
    # Get a list of all archived file names from the zip
    listOfFileNames = zipObj.namelist()
    # Iterate over the file names
    imageFileNames = []
    for fileName in listOfFileNames:
        # keep only BMP images, skipping macOS resource-fork entries
        if fileName.endswith('.bmp') and '__MACOSX' not in fileName:
            imageFileNames.append(fileName)

    with tqdm(total = len(imageFileNames), desc="Converting", position=0, leave=True) as pbar:
        for fn in imageFileNames:
            # split off the directory and the filename extension
            f,ext = os.path.splitext(os.path.split(fn)[1])

            # where to save the converted image (depending on presence of annotation pattern)
            if Annotation_pattern == '' or Annotation_pattern not in f:
                pno = os.path.join(Training_source,f+".tif")
            else:
                pno = os.path.join(Training_target,f.replace(Annotation_pattern,'')+".tif")

            pbar.set_description("Saving %s" % pno)

            image_data = zipObj.read(fn)
            bytes_io = io.BytesIO(image_data)
            im = Image.open(bytes_io)
            im.save(pno,compression='tiff_lzw', tiffinfo={317: 2})
            bytes_io.close()
            pbar.update()

One small additional change to the ZeroCostDL4Mic notebook: you need to pip install and import imagecodecs before tifffile, so it can deal with the LZW-compressed TIFFs my code generates.

So, in section 2, “Install StarDist and dependencies”, add this:

!pip install imagecodecs # this is required by tifffile to open LZW compressed images

import imagecodecs

Then steps 3.1 and 3.2 seem to work with the GlaS dataset (3.1 shows training source / target, and 3.2 augments the data).

In 3.3, I thought I would use the pretrained_model_choice: Versatile_H&E_nuclei

But then in 4.2 (Start training), I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-41e3ae2ad566> in <module>()
     15 # 'input_epochs' and 'steps' refers to your input data in section 5.1
     16 history = model.train(X_trn, Y_trn, validation_data=(X_val,Y_val), augmenter=augmenter,
---> 17                       epochs=number_of_epochs, steps_per_epoch=number_of_steps)
     18 None;
     19 

3 frames
/usr/local/lib/python3.6/dist-packages/csbdeep/utils/utils.py in _raise(e)
     88 
     89 def _raise(e):
---> 90     raise e
     91 
     92 

TypeError: exceptions must derive from BaseException

In the other thread, @uschmidt83 suggested the X and Y data might not be the right size. Out of the box, my Xs are (ny,nx,3) and Ys are (ny,nx).
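
Assuming those shapes are actually fine, here is my (untested) understanding of how the plain StarDist API would be set up for RGB inputs, following the StarDist example notebooks (the model name and basedir below are just placeholders):

# untested sketch: configure StarDist 2D for RGB images of shape (ny, nx, 3)
from csbdeep.utils import normalize
from stardist.models import Config2D, StarDist2D

axis_norm = (0, 1)  # normalize each channel independently
X = [normalize(x, 1, 99.8, axis=axis_norm) for x in X]

conf = Config2D(n_rays=32, grid=(2, 2), n_channel_in=3)  # n_channel_in=3 for RGB
model = StarDist2D(conf, name='stardist_rgb', basedir='models')  # placeholder names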

Did I miss a colour deconvolution step somewhere? I’ll recheck the StarDist NEUBIAS presentation…

In 4.1, I tried changing the definition of X and Y to either of the following (neither worked):

X_val, Y_val = [X[i] for i in ind_val]  , [np.expand_dims(Y[i],axis=2) for i in ind_val]
X_trn, Y_trn = [X[i] for i in ind_train], [np.expand_dims(Y[i],axis=2) for i in ind_train] 

or

X_val, Y_val = [np.swapaxes(np.swapaxes(X[i],0,2),1,2) for i in ind_val]  , [np.expand_dims(Y[i],axis=0) for i in ind_val]
X_trn, Y_trn = [np.swapaxes(np.swapaxes(X[i],0,2),1,2) for i in ind_train], [np.expand_dims(Y[i],axis=0) for i in ind_train] 

So that the Xs and Ys I pass to model.train() have these shapes: (522, 775, 3) and (522, 775, 1), or (3, 522, 775) and (1, 522, 775).

Maybe I should try and fix that BaseException error first? Any help welcome! :slight_smile:

  1. It doesn’t look like StarDist is a good fit, since the glands are not (always) star-convex. Also, you will need to adjust the image size (and/or grid parameter) to let the neural network “see” an entire gland.
  2. The pre-trained H&E model has been trained to segment nuclei, hence might be a bad choice to start from for further training.

I don’t know what the ZeroCostDL4Mic notebooks do, so I can’t really comment here. But what does the config look like (model.config)?

That is already fixed in the StarDist repository and will be part of the next release.

I just looked at some of those images, and StarDist is really unsuitable for this task.

Thanks Uwe,

I was trying to tackle one problem at a time. :blush:

Patch size issue with model.train():

I wanted to see what I was dealing with, so I changed csbdeep.utils like this:

def _raise(e):
    print(type(e))
    if isinstance(e, str):  # wrap plain strings so they can actually be raised
        e = ValueError(e)
    raise e

which allowed me to confirm that the problem wasn’t with how X_trn and Y_trn are defined ((ny,nx,3) and (ny,nx) are absolutely fine) but with the patch size:

ValueError: Some images are too small for given patch_size (496, 496)

The following code confirmed it (tests taken from stardist/models/base.py, lines 94-95):

nD = 2
x_ndim = X_trn[0].ndim
patch_size = 496  # the patch size from the error message

for x, y in zip(X_trn, Y_trn):
    print(x.shape, y.shape, x.shape[:nD], y.shape,
          x.shape[:nD] == y.shape,
          y.ndim == nD and x.ndim == x_ndim and x.shape[:nD] == y.shape,
          x.shape[:nD] >= (patch_size, patch_size))

(442, 581, 3) (442, 581) (442, 581) (442, 581) True True False
(522, 775, 3) (522, 775) (522, 775) (522, 775) True True True
(522, 775, 3) (522, 775) (522, 775) (522, 775) True True True
(453, 589, 3) (453, 589) (453, 589) (453, 589) True True False
(522, 775, 3) (522, 775) (522, 775) (522, 775) True True True
(522, 775, 3) (522, 775) (522, 775) (522, 775) True True True
(433, 574, 3) (433, 574) (433, 574) (433, 574) True True False

So, let’s see if I can either zap some of these images or pad them (with white or black) to reach the patch size.
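
Something like this (an untested sketch) for the padding option:

# untested sketch: zero-pad (i.e. black) each image/mask pair up to the patch size
import numpy as np

patch_size = 496  # from the error message above

def pad_to_patch(x, y, size=patch_size):
    py = max(0, size - x.shape[0])
    px = max(0, size - x.shape[1])
    x = np.pad(x, ((0, py), (0, px), (0, 0)), mode='constant')  # don't pad channels
    y = np.pad(y, ((0, py), (0, px)), mode='constant')          # pads with background (0)
    return x, y

X_trn, Y_trn = map(list, zip(*(pad_to_patch(x, y) for x, y in zip(X_trn, Y_trn))))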

Then I’ll check how to resolve the:

WARNING: median object size larger than field of view of the neural network.

I saw people have already asked about this here on the forum and on GitHub, and it is one of the StarDist FAQs.
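
If I read the FAQ correctly, the fix is to enlarge the network’s field of view relative to the objects, e.g. with a coarser grid, or alternatively to downscale the images (again, untested on my side):

# untested: a coarser grid enlarges the field of view relative to object size
from stardist.models import Config2D
conf = Config2D(n_rays=32, grid=(4, 4), n_channel_in=3)  # e.g. (4, 4) instead of (2, 2)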

Glands are not convex / star-convex objects

You are absolutely right and I will take any advice on board. Yes, not the best dataset to check StarDist with, and I promise I won’t complain if StarDist doesn’t work here :slight_smile:

But even if this is not the best example, I want to check whether we can run an entire workflow, all the way from training StarDist on annotated images (through your notebooks or the ZeroCostDL4Mic one) to recognising glands on new images from within QuPath (at least some of the round ones).

config.json in 2D_versatile_he:

{"n_dim": 2, "axes": "YXC", "n_channel_in": 3, "n_channel_out": 33,
"train_checkpoint": "weights_best.h5", "train_checkpoint_last": "weights_last.h5", "train_checkpoint_epoch": "weights_now.h5", "n_rays": 32, "grid": [2, 2],
"backbone": "unet", "unet_n_depth": 3, "unet_kernel_size": [3, 3], "unet_n_filter_base": 32, "unet_n_conv_per_depth": 2, "unet_pool": [2, 2], "unet_activation": "relu",
"unet_last_activation": "relu", "unet_batch_norm": false, "unet_dropout": 0.0, "unet_prefix": "",
"net_conv_after_unet": 128, "net_input_shape": [null, null, 3], "net_mask_shape": [null, null, 1],
"train_shape_completion": false, "train_completion_crop": 32, "train_patch_size": [512, 512],
"train_background_reg": 0.0001, "train_foreground_only": 0.9, "train_dist_loss": "mae",
"train_loss_weights": [1, 0.1], "train_epochs": 200, "train_steps_per_epoch": 200,
"train_learning_rate": 0.0003, "train_batch_size": 8, "train_n_val_patches": 3,
"train_tensorboard": true, "train_reduce_lr": {"factor": 0.5, "patience": 50, "min_delta": 0},
"use_gpu": false}

Does this make some kind of sense?

Hi @EP.Zindy,

Sorry for the late answer. Yes, currently the #ZeroCostDL4Mic StarDist notebook has not been tested (or designed) to work with RGB images, principally because we do not have a dataset to play with.
If this is something you would like to be implemented, let me know.

Cheers
Guillaume

Hey @Guillaume_Jacquemet,

I understand that RGB histology images aren’t the prime focus of your notebook, but your code makes such a nice framework that I think it’s worth the effort for me to pursue this. The idea would be to train networks with your notebook and then to segment new images in QuPath. In my mind, this makes quite a nice workflow (we’ll see what happens).

I’ll get on with it and if there’s anything I think you might find useful, I’ll let you know.

Cheers,
Egor

Hi,

Actually, I think it would only take a very minor modification to make it work. If you are willing to share a few images, I could make it happen!

Cheers
Guillaume

This is the cell I’ve added to your notebook to download the GlaS dataset, but I think the PanNuke dataset would make a better candidate (for StarDist at least). I’ll check if I can write something quick :slight_smile:

Edit: booboo in my download function :roll_eyes:

#@markdown ###Dataset URL and destination folder:

#@markdown Files are extracted, converted to TIFF and saved in the Training_source or Training_target depending on the presence of "Annotation_pattern" in the filename.

import requests
from tqdm import tqdm
import os
from zipfile import ZipFile
import io
from PIL import Image

def download(url, file_name = None, chunk_size = 8192):
    if file_name is None:
        file_name = url.split('/')[-1]

    headers = {'user-agent': 'Wget/1.16 (linux-gnu)'}
    r = requests.head(url)
    file_size = int(r.headers['content-length'])
    r = requests.get(url, stream=True, headers=headers)

    with tqdm(total = file_size, desc=file_name, position=0, leave=True) as pbar:
        with open(file_name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                if chunk:
                    f.write(chunk)
                    pbar.update(len(chunk))  # update by actual bytes written, not chunk_size

url = "https://warwick.ac.uk/fac/sci/dcs/research/tia/glascontest/download/warwick_qu_dataset_released_2016_07_08.zip" #@param {type:"string"}
Annotation_pattern = "_anno" #@param {type:"string"}
Training_source = "/content/gdrive/MyDrive/projects/stardist/WarwickQUDataset" #@param {type:"string"}
Training_target = "/content/gdrive/MyDrive/projects/stardist/output" #@param {type:"string"}

file_name = url.split('/')[-1]

# make sure the destination folders exist, otherwise the save below fails
os.makedirs(Training_source, exist_ok=True)
os.makedirs(Training_target, exist_ok=True)

if not os.path.exists(file_name):
    download(url)

# Some inspiration from https://thispointer.com/python-how-to-unzip-a-file-extract-single-multiple-or-all-files-from-a-zip-archive/
with ZipFile(file_name, 'r') as zipObj:
    # Get a list of all archived file names from the zip
    listOfFileNames = zipObj.namelist()
    # Iterate over the file names
    imageFileNames = []
    for fileName in listOfFileNames:
        # keep only BMP images, skipping macOS resource-fork entries
        if fileName.endswith('.bmp') and '__MACOSX' not in fileName:
            imageFileNames.append(fileName)

    with tqdm(total = len(imageFileNames), desc="Converting", position=0, leave=True) as pbar:
        for fn in imageFileNames:
            # split off the directory and the filename extension
            f,ext = os.path.splitext(os.path.split(fn)[1])

            # where to save the converted image (depending on presence of annotation pattern)
            if Annotation_pattern == '' or Annotation_pattern not in f:
                pno = os.path.join(Training_source,f+".tif")
            else:
                pno = os.path.join(Training_target,f.replace(Annotation_pattern,'')+".tif")

            pbar.set_description("Saving %s" % pno)

            image_data = zipObj.read(fn)
            bytes_io = io.BytesIO(image_data)
            im = Image.open(bytes_io)
            #im.save(pno,compression='tiff_lzw', tiffinfo={317: 2})
            im.save(pno,compression=None)
            bytes_io.close()
            pbar.update()

Sorry if I’ve completely bastardized the set by combining all the types of nuclei into one. The images are 256x256x3, the masks 256x256. The complete set is actually 3 folders, but they’re really massive, so the cell below just extracts 250 images from the first folder (there’s a line to comment out to get the whole set).

#pannuke dataset

import requests
from tqdm import tqdm
import os
from zipfile import ZipFile
import io
from PIL import Image
import numpy as np

def download(url, file_name = None, chunk_size = 8192):
    if file_name is None:
        file_name = url.split('/')[-1]

    headers = {'user-agent': 'Wget/1.16 (linux-gnu)'}
    r = requests.head(url)
    file_size = int(r.headers['content-length'])
    r = requests.get(url, stream=True, headers=headers)

    with tqdm(total = file_size, desc=file_name, position=0, leave=True) as pbar:
        with open(file_name, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                if chunk:
                    f.write(chunk)
                    pbar.update(len(chunk))  # update by actual bytes written, not chunk_size


url = "https://warwick.ac.uk/fac/sci/dcs/research/tia/data/pannuke/fold_1.zip" #@param {type:"string"}
Training_source = "/content/gdrive/MyDrive/projects/stardist/pannuke" #@param {type:"string"}
Training_target = "/content/gdrive/MyDrive/projects/stardist/output" #@param {type:"string"}

file_name = url.split('/')[-1]
if not os.path.exists(file_name):
    download(url)

# Some inspiration from https://thispointer.com/python-how-to-unzip-a-file-extract-single-multiple-or-all-files-from-a-zip-archive/
with ZipFile(file_name, 'r') as zipObj:
    # Get a list of all archived file names from the zip
    listOfFileNames = zipObj.namelist()

    with tqdm(total = len(listOfFileNames), desc="Extracting", position=0, leave=True) as pbar:
        for fileName in listOfFileNames:
            if ".npy" in fileName:
                pbar.set_description("Extracting %s" % fileName)
                d,fn = os.path.split(fileName)
                prefix = d.split("/")[-1]  # remember the containing folder name; reused below for output file names
                if not os.path.exists(os.path.join('/content',fn)):
                    zipInfo = zipObj.getinfo(fileName)
                    zipInfo.filename = fn
                    zipObj.extract(zipInfo, '/content')
            pbar.update()

    types = np.load("/content/types.npy",mmap_mode='r')
    images = np.load("/content/images.npy",mmap_mode='r')
    masks = np.load("/content/masks.npy",mmap_mode='r')

    if not os.path.exists(Training_source):
        os.mkdir(Training_source)

    if not os.path.exists(Training_target):
        os.mkdir(Training_target)

    n_images = images.shape[0]
    n_images = 250  # comment this line out if you need all the images

    with tqdm(total = n_images, desc="Converting", position=0, leave=True) as pbar:
        for i in range(n_images):
            fn = prefix + "_" + types[i] + "_%d.tif" % i
            pbar.set_description("Saving %s" % fn)

            im = Image.fromarray(images[i,...].astype(np.uint8))
            pno = os.path.join(Training_source,fn)
            im.save(pno,compression=None)
            # `masks.npy` is an array of 6-channel instance-wise masks
            # (0: Neoplastic cells, 1: Inflammatory, 2: Connective/Soft tissue cells, 3: Dead Cells, 4: Epithelial, 5: Background)
            # merge the 5 cell-type channels into a single instance mask, offsetting
            # the labels so instance ids stay unique across channels
            mask = np.zeros((256,256),np.uint16)  # uint16: combined ids can exceed 255
            tmax = 0
            for j in range(5):
                tmask = masks[i,:,:,j]
                nz = np.nonzero(tmask)
                if nz[0].shape[0] > 0:
                    mask[nz] = tmask[nz]+tmax
                    tmax += np.max(tmask)

            im = Image.fromarray(mask)
            pno = os.path.join(Training_target,fn)
            im.save(pno,compression=None)

            pbar.update()

Hi,

Thanks a lot for the reply. For some reason I didn’t get any notification, though :frowning:

Anyhow, I slightly modified the notebook and it should now handle RGB images pretty well. I will test it on the dataset you suggested sometime soon!
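
Essentially, it comes down to detecting the number of channels and passing it to the config, something along these lines (a sketch of the idea, not the exact notebook code):

# sketch: detect RGB vs grayscale and configure accordingly
from stardist.models import Config2D
n_channel = 1 if X[0].ndim == 2 else X[0].shape[-1]  # 3 for RGB images
conf = Config2D(n_rays=32, grid=(2, 2), n_channel_in=n_channel)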

Cheers