Tifffile OME-TIFF generation is taking too much RAM

While generating an OME-TIFF with the tifffile library, the save method of TiffWriter asks for an n-dimensional numpy array. In my scenario there are 30k tile images of 768 KB each (512x512x3). To generate the OME-TIFF I have to load all the tile images into memory before I can pass them to the imsave method.

How can I solve this problem, or is it a limitation of the tifffile library?

There’s an example in the docstring where a generator of tiles is used instead of the whole array. In that case you need to specify the image dtype and shape in addition to the tile shape. For example:

import numpy
import tifffile

def tiles():
    for _ in range(2**15):
        yield numpy.arange(512*512*3, dtype='uint8').reshape(512, 512, 3)

tifffile.imwrite('slide.ome.tif', tiles(), bigtiff=True, dtype='uint8',
                 shape=(2**17, 2**16, 3), tile=(512, 512), compress='jpeg')

This will create an OME-TIFF compatible file. Note that tifffile cannot write pyramidal OME-TIFF because it cannot write SubIFDs.
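
A minimal sketch of the same generator pattern for tiles that live on disk, assuming the tiles are wanted row by row, left to right. The names tile_grid, lazy_tiles and demo_loader are hypothetical and not part of tifffile; demo_loader stands in for whatever reads one tile image from disk.

```python
import math

import numpy


def tile_grid(shape, tile):
    """Return (rows, cols) of the tile grid covering a (height, width, ...) image."""
    return math.ceil(shape[0] / tile[0]), math.ceil(shape[1] / tile[1])


def lazy_tiles(load_tile, shape, tile):
    """Yield one tile at a time, row by row and left to right."""
    rows, cols = tile_grid(shape, tile)
    for r in range(rows):
        for c in range(cols):
            # only the tile returned by this call needs to be in memory
            yield load_tile(r, c)


def demo_loader(r, c):
    # hypothetical stand-in for reading the tile at grid position (r, c) from disk
    return numpy.full((512, 512, 3), (r + c) % 256, dtype='uint8')


# the image from the example above: 256 x 128 tiles = 2**15 tiles
print(tile_grid((2**17, 2**16, 3), (512, 512)))
```

Passing lazy_tiles(demo_loader, shape, tile) as the data argument of tifffile.imwrite, together with shape, dtype and tile as in the example above, should keep only one tile in memory at a time.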


Thanks @cgohlke. It worked.

Hi @cgohlke,

this may be worth extracting to a separate topic, but can you explain briefly if there is an inherent blocker for writing subIFDs via tifffile, or is it more an issue of needing a parallel implementation to your work in 2020.7.4?

All the best,
~Josh

I have started working on writing images to SubIFDs. There’s no inherent blocker. It’s just not trivial and will complicate the API and the code even more. The TiffWriter class is very monolithic and the API is designed around writing numpy ndarrays to TIFF. There’s no low level API for creating files from TIFF structures (IFDs, SubIFDs, Tags…) like libtiff. Tifffile will not support general IFD trees, just branches off the main IFD chain. That should be enough to support OME pyramids.


Makes sense. Thanks for the clarification, @cgohlke! ~Josh

tifffile v2020.9.3 can write numpy arrays to SubIFDs so it is now possible to create OME-TIFF pyramids.


Wow! Awesome @cgohlke. Can’t wait to give it a try.

Here’s an example to generate a pyramidal OME-TIFF. Not sure how to formally validate the file but QuPath loads it as expected:

import tifffile
import cv2  # for fast resizing

# load a 65536 x 65536 RGB example image
image = tifffile.imread('noise_yxs.tif')

with tifffile.TiffWriter('pyramidal.ome.tif', bigtiff=True) as tif:
    # use tiles and JPEG compression
    options = {'tile': (256, 256), 'compress': 'jpeg'}
    # save the base image
    tif.save(image, subifds=8, **options)
    # successively generate and save pyramid levels to the 8 SubIFDs
    for _ in range(8):
        image = cv2.resize(
            image,
            (image.shape[1] // 2, image.shape[0] // 2),
            interpolation=cv2.INTER_LINEAR
        )
        tif.save(image, **options)
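
The level sizes produced by the loop above can be checked with plain arithmetic. This is a small sketch, not part of tifffile; pyramid_shapes is a hypothetical helper that assumes each level halves both dimensions with integer division, as in the resize call.

```python
def pyramid_shapes(height, width, levels):
    """Return [(height, width), ...] for the base image and each halved level."""
    shapes = [(height, width)]
    for _ in range(levels):
        height, width = height // 2, width // 2
        shapes.append((height, width))
    return shapes


# 65536 x 65536 base image with 8 SubIFD levels, as in the example
shapes = pyramid_shapes(65536, 65536, 8)
print(len(shapes), shapes[-1])
```

For the 65536 x 65536 example, the smallest of the 8 levels is 256 x 256, which is exactly one 256 x 256 tile.
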

Hi @cgohlke,

first of all, congrats on the release of tifffile v2020.9.3 and on implementing support for writing SubIFDs.

I did some testing and validation of the pyramidal data generated using the snippet mentioned above - see here for the exact conversion code.

Generally, the code worked well and without issue using both RGB and single-channel source images. Typically, pyramidal OME-TIFFs generated from images like this one (source: idr0082) are valid and render as expected using multi-resolution aware clients.

I encountered two categories of issues:

  • for very large image dimensions like this one or the Zebrafish TEM image (source: idr0053), imread fails with a memory error

    MemoryError: Unable to allocate 337. GiB for an array with shape (1, 1, 24, 70656, 71168, 3) and data type uint8
    
  • for images with asymmetrical XY dimensions like this one (source: idr0083) or this one (source: idr0082), a file is successfully created and the metadata reported by tiffinfo looks legitimate. However, some of the intermediate resolutions seem to be distorted and partial - see the two screenshots below, which navigate between two resolution levels. I am unclear whether this is related to the OpenCV downsampling or the TIFF saving


As we are getting out of the scope of the initial topic, happy to migrate this report into either a separate forum post or a GitHub issue and keep helping with the testing/validation if useful. Let us know what your preference is.


OpenCV resize requires the output image size in (columns, rows) order. I corrected the example code.
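
A short sketch of why the order matters for non-square images. half_dsize is a hypothetical helper, not an OpenCV or tifffile function; it only builds the (columns, rows) tuple that cv2.resize takes as its dsize argument.

```python
import numpy


def half_dsize(image):
    """Return the dsize argument, (columns, rows), for a 2x downsample."""
    rows, cols = image.shape[:2]
    return (cols // 2, rows // 2)


image = numpy.zeros((300, 500, 3), dtype='uint8')  # 300 rows, 500 columns
dsize = half_dsize(image)
print(dsize)
# cv2.resize(image, dsize) would then return an array of shape (150, 250, 3);
# swapping the two values would silently change the aspect ratio instead
```

For square images such as the 65536 x 65536 example both orders give the same result, which is why the problem only showed up with asymmetrical dimensions.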

The MemoryError is expected, as tifffile returns image data as numpy arrays. One option is to use a memory-mapped numpy array on a fast drive as the output. There’s currently no high-level API to access chunks of WSI images or nd-series, but the lower-level segments or decode functions can be used for individual tiles or strips. Rather than adding a proprietary chunked array API to tifffile, it’s probably better to map the tiles/strips in the TIFF file to a Zarr array, something like PyramidalOMETIFFStore.
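
To illustrate the memory-mapped idea in isolation, here is a minimal numpy-only sketch. It does not read a TIFF; fill_memmap and row_chunks are hypothetical helpers that show how an array backed by a file on disk can be filled chunk by chunk without holding the whole image in RAM.

```python
import os
import tempfile

import numpy


def fill_memmap(path, shape, dtype, chunks):
    """Create a file-backed array and copy row chunks into it one at a time."""
    out = numpy.memmap(path, dtype=dtype, mode='w+', shape=shape)
    row = 0
    for chunk in chunks:
        out[row:row + chunk.shape[0]] = chunk
        row += chunk.shape[0]
    out.flush()  # make sure the data is written to the file
    return out


def row_chunks(n_chunks, rows, cols):
    # hypothetical stand-in for decoding strips or rows of tiles
    for i in range(n_chunks):
        yield numpy.full((rows, cols), i, dtype='uint8')


path = os.path.join(tempfile.mkdtemp(), 'image.dat')
image = fill_memmap(path, (8, 4), 'uint8', row_chunks(4, 2, 4))
print(image.shape, os.path.getsize(path))
```

When reading with tifffile itself, the out argument of imread (for example out='memmap') serves the same purpose; see the tifffile docstring for the options supported by the installed version.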


I updated the code with the suggested fix and re-ran the conversion for the two files mentioned in Tifffile OME-TIFF generation is taking too much RAM. The generated pyramidal OME-TIFF files now validate and load as expected.


Hi @cgohlke,

Thanks for supporting saving Bio-Formats pyramidal TIFFs, it’s very helpful.

On a related note, I’m working with many-channel images and would like to save one channel at a time with the following example snippet:

import tifffile
import numpy

data = (
    numpy
    .arange(1024*1024*10, dtype='uint8')
    .reshape((10, 1024, 1024))
)

def per_channel(img):
    for c in range(len(img)):
        yield img[c]

with tifffile.TiffWriter('temp.ome.tif', bigtiff=True) as tif:
    tif.write(
        data=per_channel(data),
        shape=data.shape,
        dtype='uint8',
        tile=(256, 256),
        subifds=2
    )
    tif.write(
        data=per_channel(data[:, ::2, ::2]),
        shape=data[:, ::2, ::2].shape,
        dtype='uint8',
        tile=(256, 256),
        subfiletype=1
    )
    tif.write(
        data=per_channel(data[:, ::4, ::4]),
        shape=data[:, ::4, ::4].shape,
        dtype='uint8',
        tile=(256, 256),
        subfiletype=1
    )

but it errors out with this message:

ValueError                                Traceback (most recent call last)
<ipython-input-121-9c55a8981bd6> in <module>
     21         dtype='uint8',
     22         tile=(256, 256),
---> 23         subifds=2,
     24     )
     25 

~/anaconda3/envs/ashlar/lib/python3.7/site-packages/tifffile/tifffile.py in write(***failed resolving arguments***)
   2519                             continue
   2520                         if chunk.nbytes != tilesize:
-> 2521                             chunk = pad_tile(chunk, tileshape, datadtype)
   2522                         fh.write_array(chunk)
   2523             elif compress:

~/anaconda3/envs/ashlar/lib/python3.7/site-packages/tifffile/tifffile.py in pad_tile(tile, shape, dtype)
  15187     """Return tile padded to tile shape."""
  15188     if tile.dtype != dtype or tile.nbytes > product(shape) * dtype.itemsize:
> 15189         raise ValueError('invalid tile shape or dtype')
  15190     pad = tuple((0, i - j) for i, j in zip(shape, tile.shape))
  15191     return numpy.pad(tile, pad)

ValueError: invalid tile shape or dtype

It runs through when tile=None. Is it not possible to combine passing data as an iterable with saving a tiled TIFF, or was I doing it wrong?

Correct, that case is currently not supported. Tifffile expects an iterator of tiles:

def per_channel(img, tile=(256, 256)):
    for c in range(img.shape[0]):
        for y in range(0, img.shape[1], tile[0]):
            for x in range(0, img.shape[2], tile[1]):
                yield img[c, y : y + tile[0], x : x + tile[1]]
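
A quick sanity check of this tile iterator on a small array whose dimensions are not multiples of the tile size. expected_tiles is a hypothetical helper; the traceback above suggests that tifffile pads smaller edge tiles itself via pad_tile, so the generator does not need to.

```python
import math

import numpy


def per_channel(img, tile=(256, 256)):
    """Yield tiles channel by channel, row by row, left to right."""
    for c in range(img.shape[0]):
        for y in range(0, img.shape[1], tile[0]):
            for x in range(0, img.shape[2], tile[1]):
                yield img[c, y : y + tile[0], x : x + tile[1]]


def expected_tiles(shape, tile=(256, 256)):
    """Number of tiles for a (channels, height, width) image."""
    return shape[0] * math.ceil(shape[1] / tile[0]) * math.ceil(shape[2] / tile[1])


data = numpy.zeros((2, 600, 300), dtype='uint8')
tiles = list(per_channel(data))
# full tiles are 256 x 256; tiles on the bottom and right edges are smaller
print(len(tiles), tiles[0].shape, tiles[-1].shape)
```

The same count applies to each pyramid level, so the shape passed to tif.write and the array passed to per_channel must refer to the same level.
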

Thank you very much @cgohlke, replacing my per_channel function with yours worked. (I’m editing mine in the thread to match your better pattern.) Thanks for the prompt reply and the great tool!
