Bio-formats `read_region` equivalent

I have a WSI (whole-slide image) that I would like to tile.

I wrote a Python function based on the read_region function from openslide-python, but I would like to port it to Bio-Formats.

Is there an equivalent to this function in the Bio-Formats library? I found some documentation that I think does something similar, but I am not sure if or how this could be done using python-bioformats. Some guidance would be much appreciated.

The Python script would be very similar to the Java example for OverlappedTiledWriter, which is in the docs you linked above.

I had also put together the following as an ImageJ macro example for converting to tiles with multiple sub resolutions. If you ignore the sub resolution parts then it should provide an example of how to achieve tiled conversion in python: https://github.com/dgault/bio-formats-examples/blob/6cdb11e8c64566611b18f384b3a257dab5037e90/src/main/macros/jython/OverlappedTiledPyramidConversion.py

Thanks for the feedback.

Is it necessary that the images are in OME-TIFF? Most of my images are in Ventana bif and it would take some time to convert them all. I also work with Leica svs. It would be great if it were a "generic" tiling tool.

Ah, perhaps I misunderstood. The Ventana bif files are likely already tiled, so if you simply want to read the image in tiles, you can use the code below.

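# Jython script: these are the Java classes from the Bio-Formats library
import math
from loci.formats import ImageReader, MetadataTools

file = "/path/to/inputFile.tiff"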

# setup reader
reader = ImageReader()
omeMeta = MetadataTools.createOMEXMLMetadata()
reader.setMetadataStore(omeMeta)
reader.setId(file)

# set the tile sizes to be used (can be replaced with your own hardcoded values if needs be)
tileSizeX = reader.getOptimalTileWidth()
tileSizeY = reader.getOptimalTileHeight()
type = reader.getPixelType()


# read the tiles
for series in range(reader.getSeriesCount()):
	reader.setSeries(series)

	for image in range(reader.getImageCount()):
		width = reader.getSizeX()
		height = reader.getSizeY()

		# Determine the number of tiles to read
		nXTiles = int(math.floor(width / tileSizeX))
		nYTiles = int(math.floor(height / tileSizeY))
		if nXTiles * tileSizeX != width:
			nXTiles = nXTiles + 1
		if nYTiles * tileSizeY != height:
			nYTiles = nYTiles + 1

		for y in range(nYTiles):
			for x in range(nXTiles):
				# The x and y coordinates for the current tile
				tileX = x * tileSizeX
				tileY = y * tileSizeY
				effTileSizeX = tileSizeX
				if (tileX + tileSizeX) >= width:
					effTileSizeX = width - tileX
				effTileSizeY = tileSizeY
				if (tileY + tileSizeY) >= height:
					effTileSizeY = height - tileY
				# Read the current tile from the input file
				buf = reader.openBytes(image, tileX, tileY, effTileSizeX, effTileSizeY)
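The tile-grid arithmetic in the loop above can be separated from the reader so that it is easy to check on its own. This is only a minimal sketch: `iter_tiles` is a hypothetical helper name (not part of Bio-Formats), and the sizes are assumed to be whatever `getSizeX()`, `getSizeY()` and the chosen tile size return.

```python
def iter_tiles(width, height, tile_w, tile_h):
    """Yield (x, y, w, h) for every tile covering a width x height image.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    if min(width, height, tile_w, tile_h) <= 0:
        raise ValueError("all sizes must be positive")
    for y in range(0, height, tile_h):
        h = min(tile_h, height - y)
        for x in range(0, width, tile_w):
            w = min(tile_w, width - x)
            yield x, y, w, h


# Example: a 5 x 3 image cut into 2 x 2 tiles gives a 3 x 2 grid of tiles.
tiles = list(iter_tiles(5, 3, 2, 2))
```

Each tuple could then be passed to the reader in the same order as in the loop above, for example `reader.openBytes(image, x, y, w, h)`.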

Sorry, I think I am not explaining myself clearly.

The goal I want to accomplish is to tile a WSI into patches of a given size.

In my current code I use the read_region function from openslide-python and save each tile with its .save method.

I would like to do this with Bio-Formats, since it handles Ventana bif files much better than OpenSlide does. Here is a code snippet of the function (adapted from here) to make things clearer:

def wsi2mosaic(image, size, overlap, level, drop_last=False, return_coords=False, only_list=False, check_tissue=True, prefix='', suffix='.png'):
    assert isinstance(image, openslide.OpenSlide), "input image should be an openslide wsi"
    assert level < len(image.level_dimensions), f"this image has only {len(image.level_dimensions)} levels"
    
    if type(size) is list:
        assert len(size) == 2, "size should be integer or [size_h, size_w]"
        s_h = size[0]
        s_w = size[1]
    else:
        assert isinstance(size, int), "size should be integer or [size_h, size_w]"
        s_h = size
        s_w = size
    
    if type(overlap) is list:
        assert len(overlap) == 2, "overlap should be integer or [overlap_h, overlap_w]"
        o_h = overlap[0]
        o_w = overlap[1]
    else:
        assert isinstance(overlap, int), "overlap should be integer or [overlap_h, overlap_w]"
        o_h = overlap
        o_w = overlap
    
    w_wsi, h_wsi = image.dimensions #! openslide image dimensions: WxH
    w_lvl, h_lvl = image.level_dimensions[level]
    
    box_coords_wsi = [0,0, h_wsi, w_wsi] #This way you avoid keeping only the biggest part of the tissue
    
    box_coords_wsi = [[box_coords_wsi[0], box_coords_wsi[1]],[box_coords_wsi[2], box_coords_wsi[3]]]
    box_coords_lvl = getScaledCoordinates(box_coords_wsi, [h_wsi,w_wsi], [h_lvl,w_lvl])
    h_box_lvl = box_coords_lvl[1][0] - box_coords_lvl[0][0]
    w_box_lvl = box_coords_lvl[1][1] - box_coords_lvl[0][1]
    assert h_box_lvl>s_h, f"tile height ({s_h}) should be less than box level height ({h_box_lvl})"
    assert w_box_lvl>s_w, f"tile width ({s_w}) should be less than box level width ({w_box_lvl})"
    
    x_ = np.arange(box_coords_lvl[0][0], box_coords_lvl[1][0]-s_h+1, s_h-o_h)
    y_ = np.arange(box_coords_lvl[0][1], box_coords_lvl[1][1]-s_w+1, s_w-o_w)
    
    if not drop_last:
        x_ = np.hstack([x_, [box_coords_lvl[1][0]-s_h]])
        y_ = np.hstack([y_, [box_coords_lvl[1][1]-s_w]])
    
    coords_ul = [(x,y) for x in x_ for y in y_]
    coords_br = [(x+s_h,y+s_w) for x in x_ for y in y_]
    coord_wsi_ul = getScaledCoordinates(coords_ul, [h_lvl, w_lvl], [h_wsi,w_wsi])
    coord_wsi_br = getScaledCoordinates(coords_br, [h_lvl, w_lvl], [h_wsi,w_wsi])

    coord_wsi = [(ul[0], ul[1], br[0], br[1]) for ul,br in zip(coord_wsi_ul, coord_wsi_br)]    
    
    if return_coords:
        return(coord_wsi)
    
    img_list = []
    f = open(f'{prefix}_coordinates.csv', 'w')
    f.write('coordinates\n')
    f.close()
        
    for COORD in coord_wsi:
        x_ul = COORD[0]
        y_ul = COORD[1]
        x_br = COORD[2]
        y_br = COORD[3]
        tile = image.read_region((y_ul, x_ul), level, (s_w, s_h))
        
        if check_tissue:
            tile_np = np.array(tile)
            if only_list:
                if hasEnoughTissue(tile_np):
                    # 'with' closes the CSV file after each append
                    with open(f'{prefix}_coordinates.csv', 'a') as f:
                        f.write('[{},{}]\n'.format(y_ul, x_ul))
            else:
                if hasEnoughTissue(tile_np):
                    tile.save(f'{prefix}_{level}_{x_ul}-{y_ul}-{x_br}-{y_br}_{suffix}')
                    with open(f'{prefix}_coordinates.csv', 'a') as f:
                        f.write('[{},{}]\n'.format(y_ul, x_ul))

        else:
            tile.save(f'{prefix}_{level}_{x_ul}-{y_ul}-{x_br}-{y_br}_{suffix}')
            with open(f'{prefix}_coordinates.csv', 'a') as f:
                # same (y_ul, x_ul) order as the branches above
                f.write('[{},{}]\n'.format(y_ul, x_ul))

Ignoring the helper functions that are not pasted, the idea of the script is to tile an image at a given level. Ideally, I would like to replace tile = image.read_region((y_ul, x_ul), level, (s_w, s_h)) inside the for loop with something equivalent in python-bioformats.
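The stride logic in the function above (tile size minus overlap, plus one extra tile flush with the far edge unless drop_last is set) can be written without NumPy for a single axis. This is a sketch under assumptions: `tile_origins` is a hypothetical name, and unlike the np.hstack version it skips the extra tile when the regular grid already ends at the edge.

```python
def tile_origins(length, size, overlap, drop_last=False):
    """Return the start offsets of tiles of `size` pixels along one axis.

    Consecutive tiles share `overlap` pixels. Unless `drop_last` is set, a
    final tile flush with the far edge is added when the regular grid does
    not already end there.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    if size > length:
        raise ValueError("tile is larger than the axis")
    origins = list(range(0, length - size + 1, size - overlap))
    last = length - size
    if not drop_last and origins[-1] != last:
        origins.append(last)
    return origins


# Example: 4-pixel tiles on a 10-pixel axis, without and with overlap.
no_overlap = tile_origins(10, 4, 0)
one_pixel_overlap = tile_origins(10, 4, 1)
```

Calling it once for the height and once for the width gives the two offset lists whose Cartesian product is the set of upper-left corners.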

That should be fairly straightforward. You will still need to set up the image reader at the start, but after that a single call to openBytes retrieves the tile:

# setup reader
reader = ImageReader()
omeMeta = MetadataTools.createOMEXMLMetadata()
reader.setMetadataStore(omeMeta)
reader.setId(file)

# rest of your code


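# read a specific region
# arguments are (plane index, x, y, width, height); the first one is the image
# plane, not the pyramid level, so select the resolution beforehand with
# reader.setSeries(...). The x/y order matches read_region((y_ul, x_ul), ...).
tile = reader.openBytes(0, y_ul, x_ul, s_w, s_h)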
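One difference from OpenSlide is worth spelling out: read_region takes its location in full-resolution pixels, whereas Bio-Formats coordinates refer to the currently selected series. The sketch below wraps that conversion. It assumes a reader object exposing setSeries, getSizeX, getSizeY and openBytesXYWH (the names of the python-bioformats wrapper), and that each pyramid level is exposed as its own series. The function name and the base_series parameter are hypothetical.

```python
def read_region(reader, location, series, size, base_series=0):
    """Read a region in the spirit of OpenSlide's read_region.

    location    -- (x, y) of the top-left corner, in `base_series` pixels
    series      -- series index holding the wanted resolution (assumption:
                   each pyramid level is exposed as its own series)
    size        -- (width, height) of the region, in `series` pixels
    base_series -- series index of the full-resolution image
    """
    x0, y0 = location
    w, h = size
    reader.setSeries(base_series)
    base_w, base_h = reader.getSizeX(), reader.getSizeY()
    reader.setSeries(series)
    lvl_w, lvl_h = reader.getSizeX(), reader.getSizeY()
    # Scale the full-resolution offset into the selected series' frame.
    x = x0 * lvl_w // base_w
    y = y0 * lvl_h // base_h
    # Clip so the request never runs past the image border.
    w = min(w, lvl_w - x)
    h = min(h, lvl_h - y)
    # Plane index 0: assumes the slide is stored as a single RGB plane.
    return reader.openBytesXYWH(0, x, y, w, h)
```

Unlike OpenSlide, which pads regions that extend beyond the slide, this sketch clips them, so edge tiles can come back smaller than requested.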

Sorry if the question is naive, but I am not able to run these commands.

The loci.formats imports do not work. What does loci mean? I am running the imports in a Jupyter notebook, in case that matters, using python-bioformats (version 1.5.2).

The loci.formats etc. are simply the package names for the particular classes being used. If you are using python-bioformats, you will instead need the following (from the python-bioformats docs: https://pythonhosted.org/python-bioformats/):

import javabridge
import bioformats
javabridge.start_vm(class_path=bioformats.JARS)

# your program goes here

javabridge.kill_vm()

And then import the different classes from the bioformats library, right? Something like:

import javabridge
import bioformats
javabridge.start_vm(class_path=bioformats.JARS)

# setup reader
reader = bioformats.ImageReader()
omeMeta = bioformats.metadatatools.createOMEXMLMetadata()
reader.setMetadataStore(omeMeta)
reader.setId(file)

If this is the case, I get an error related to the ImageReader class, which needs a path:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-f2349f40edb9> in <module>()
      1 # setup reader
----> 2 reader = bioformats.ImageReader()
      3 omeMeta = bioformats.metadatatools.createOMEXMLMetadata()
      4 reader.setMetadataStore(omeMeta)
      5 reader.setId(file)

~/anaconda3/envs/dlhisto_BioFormats/lib/python3.6/site-packages/bioformats/formatreader.py in __init__(self, path, url, perform_init)
    571         self.path = path
    572         if path is None:
--> 573             if url.lower().startswith("omero:"):
    574                 while True:
    575                     #

AttributeError: 'NoneType' object has no attribute 'lower'

I am working with Aperio (svs) and Roche (bif) files, but I am not sure how to specify that.

That error looks to be due to missing the path, so:

filename = "path/to/myFile.svs"
reader = bioformats.ImageReader(filename)

Coming back to this issue: so far I can open a specific region of a WSI using the following code:

import javabridge
import bioformats
import bioformats.formatreader as F
from bioformats import metadatatools
javabridge.start_vm(class_path=bioformats.JARS, max_heap_size="50G")

file = "image1.svs"

ImageReader = F.make_image_reader_class()
reader = ImageReader()

omeMeta = bioformats.metadatatools.createOMEXMLMetadata()
reader.setMetadataStore(omeMeta)
# the reader must be pointed at the file before any pixels can be read
reader.setId(file)
#isRGB() = True and isInterleaved() = False
image = reader.openBytesXYWH(0, 2000, 6000, 10000, 10000)
image.shape = (3,10000,10000)
image = image.transpose(1, 2, 0)

Note that openBytes gives me a JavaException no matter how much I increase the max_heap_size of the Java session, whereas openBytesXYWH works nicely (from here).
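The shape assignment and transpose in the snippet above are needed because the reader reports isInterleaved() = False, meaning the buffer holds all red samples, then all green, then all blue. As a standard-library illustration of the same reordering, here is a minimal sketch that assumes 8-bit samples. `planar_to_interleaved` is a hypothetical helper, not a python-bioformats function.

```python
def planar_to_interleaved(buf, width, height, channels=3):
    """Reorder planar 8-bit samples (RR..GG..BB..) into pixel order (RGBRGB..)."""
    data = bytes(buf)  # accepts bytes, bytearray or any buffer of 8-bit samples
    plane = width * height
    if len(data) != plane * channels:
        raise ValueError("buffer length does not match width * height * channels")
    out = bytearray(len(data))
    for c in range(channels):
        # Every `channels`-th byte of the output, starting at c, is channel c.
        out[c::channels] = data[c * plane:(c + 1) * plane]
    return bytes(out)


# Example: a 2 x 2 image with distinct values per channel.
planar = bytes([1, 2, 3, 4, 10, 20, 30, 40, 100, 101, 102, 103])
pixels = planar_to_interleaved(planar, 2, 2)
```

With NumPy available, the reshape to (3, height, width) followed by transpose(1, 2, 0) shown above does the same job and is the more practical choice for large tiles.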

As I mentioned, this works nicely with svs files, where reader.getImageCount() = 1 and reader.getSizeX() = 52199 in this case. My problem comes when I try to use this code on some Roche Ventana images (bif), which have reader.getImageCount() = 12 and reader.getSizeX() = 1009. Plotting the image read with openBytesXYWH shows only the sample ID label from the scanned slide rather than any H&E image, something like the upper part of the following image:
[image: slide]

How can I access the H&E image in this kind of file (and any of its resolutions, if possible) in order to extract tiles from it?

To access the different images you will need to use setSeries to select the desired image. In this case you will have 12 images you can select from or iterate over:

reader.setSeries(seriesIndex)
sizeX = reader.getSizeX()
sizeY = reader.getSizeY()
image = reader.openBytesXYWH(0, 0, 0, sizeX, sizeY)
# planar RGB: channels first, then rows (height), then columns (width)
image.shape = (3, sizeY, sizeX)
image = image.transpose(1, 2, 0)
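When a file holds a label, a thumbnail and several resolutions, it can help to list the series sizes before choosing one. The sketch below picks the series with the most pixels. That is only a heuristic, on the assumption that the full-resolution scan is the largest image in the file. `largest_series` is a hypothetical helper and expects a reader with getSeriesCount, setSeries, getSizeX and getSizeY.

```python
def largest_series(reader):
    """Return (index, width, height) of the series with the most pixels."""
    best = None
    for index in range(reader.getSeriesCount()):
        reader.setSeries(index)
        width, height = reader.getSizeX(), reader.getSizeY()
        if best is None or width * height > best[1] * best[2]:
            best = (index, width, height)
    if best is None:
        raise ValueError("reader reports no series")
    return best
```

The returned index can be passed to reader.setSeries before reading tiles. Reading the whole series in one call, as in the snippet above, is only practical for the smaller images.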

@dgault, thanks for your quick reply.
Sorry if I did not explain myself properly.

If I call reader.getSeriesCount() I get 1 for both svs and bif. However, reader.getImageCount() returns 1 and 12, respectively.

It seems to me that the same bif file contains at least two images: one with the sample ID data (as mentioned in the previous post) and another with the WSI itself. I am not able to access the second one, and I do not know how to select it in python-bioformats.

To make things a little clearer, this is the pop-up I get when I open the image in QuPath:

As you can see, there are at least a couple of images in the bif file that you can select to open in QuPath, so I guess this should also be specified in python-bioformats.

EDIT: Related to this, I found this issue, where it seems that Bio-Formats behaved similarly in previous versions. As mentioned in the issue:

"it seems that most of our Ventana samples have the XML on ifd 2 which is the first resolution (0 and 1 being a label and thumbnail)"

Could it be related?

Do you know which version of the Bio-Formats package you are using with python-bioformats? It may be worth upgrading: support for bif was added in Bio-Formats 6.2.0, which looks like it would only have been included in the latest python-bioformats, 4.0.0, from Sept 2020.


Thanks again. Updating to 4.0.0 (what a jump!) solves the issue. After the update, reader.getSeriesCount() returns 12. This seems to have been the issue in the previous version.
