OME-TIFF Z-stack to BigTIFF

Hello,
We are recording volumes with µManager, 30 planes per volume, across time. The time series spans thousands of volumes, say 100000.
The data is bigger than one ome.tiff file can hold (4 GB), so it is written to several files: Stack.ome.tiff, Stack1.ome.tiff, Stack2.ome.tiff, etc.
We are now trying to convert this in Python to one BigTIFF file.
Ideally, for an image that is 600 pixels wide and 600 pixels high, we would like to save it with the shape (100000, 30, 600, 600), i.e. (t, z, y, x).

Previously we had code to convert ome.tiff to BigTIFF for a time series that did not have z-stacks:

import os
import tifffile as tiff
from natsort import natsorted

with tiff.TiffWriter(output_filename, bigtiff=True) as output_tif:
    for file in natsorted(os.listdir(path)):
        with tiff.TiffFile(os.path.join(path, file), multifile=False) as tif:
            hyperstack = tif.asarray()
            output_tif.save(hyperstack, photometric='minisblack')

This code, however, does not work with our z-stacks, we think because an ome.tiff file does not always end on a complete volume.

For example, Stack.ome.tiff might be full after 10 planes of its last volume, leaving 20 planes missing; the next file, Stack1.ome.tiff, then starts with those last 20 planes of the volume before continuing with new full volumes.

To solve that we declare a ‘buffer’ variable that collects the frames and only writes them once a full volume has accumulated:

import os
import tifffile as tiff
from natsort import natsorted

actually_write = True
num_slices = 30  # planes per volume
total_num_frames = 0
buffer = []
with tiff.TiffWriter(output_filename, bigtiff=True) as output_tif:
    for i_file, file in enumerate(natsorted(os.listdir(path))):
        this_ome_tiff = os.path.join(path, file)
        print("Currently reading:")
        print(this_ome_tiff)
        with tiff.TiffFile(this_ome_tiff, multifile=False) as tif:
            for i, page in enumerate(tif.pages):
                print(f'Page {i}/{len(tif.pages)} in file {i_file}')
                # Bottleneck line
                img = page.asarray()
                total_num_frames += 1
                if num_slices is None:
                    if actually_write:
                        output_tif.save(img, photometric='minisblack')
                else:
                    # hold frames until a complete volume has accumulated
                    buffer.append(img)
                    if len(buffer) >= num_slices:
                        print(f"Writing {num_slices} frames from buffer...")
                        for img in buffer:
                            if actually_write:
                                output_tif.save(img, photometric='minisblack')
                        buffer = []
        if len(buffer) > 0:
            print(f"{len(buffer)} frames not written")

This function works, but we would also like it to:

  1. Save the stacks as another dimension, so that instead of having dimensions (t=total_planes, height, width) the file would have (t=volumes, z=planes/volume, height, width).
  2. Write one series with several pages, which is what we are used to; somehow the current code saves every plane as its own TIFF series.
    Basically, if we run the code below on one of the generated BigTIFF files (with 640 planes):
import tifffile as tiff

input_filepath = 'result.btf'

with tiff.TiffFile(input_filepath) as tif:
    print(tif)
    print(len(tif.series))
    print(len(tif.series[0].pages))

we get:

TiffFile 'result.btf'  769.29 MiB  BigTiff  shaped  640 Pages  640 Series
640
1

And we would like to get 1 Series and 640 pages.
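
From the tifffile documentation it looks like passing contiguous=True to save might give this, by appending compatible pages to one series instead of starting a new series per call; a sketch of what we mean, untested:

with tiff.TiffWriter(output_filename, bigtiff=True) as output_tif:
    for img in frames:  # 'frames' is a placeholder for our plane-by-plane loop
        # contiguous=True should append pages with matching shape and
        # dtype to a single series (untested assumption)
        output_tif.save(img, photometric='minisblack', contiguous=True)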

When we do this for a file with 100000 volumes, it loads forever and we cannot see any output (the file is ~80GB). We would like to loop through every volume or plane to do some operations.

Any help would be highly appreciated, thank you!

I have only had a quick read through, but a couple of initial thoughts: standard TIFF/BigTIFF won't be able to recognise the Z dimension, so that is likely part of the problem. Also, the OME-TIFF format itself can write BigTIFF: attempting to write a file size > 4GB will automatically switch to using BigTIFF, and using the extensions .ome.tf2, .ome.tf8 or .ome.btf when writing the files will also use BigTIFF.

It doesn’t look like Micro-Manager currently supports this, which is why you get the files in 4GB blocks, but performing a single conversion with the bfconvert command line tool should give a single BigTIFF with the Z stacks (available from: Bio-Formats Downloads | Open Microscopy Environment (OME); docs: Converting a file to different format — Bio-Formats 6.6.1 documentation):

you may need to create a .pattern file for grouping the files (Grouping files using a pattern file — Bio-Formats 6.6.1 documentation),
then run bfconvert path/to/input/stack.pattern path/to/output/stack.ome.btf
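
For instance, with chunk names like the ones above, a hypothetical stack.pattern would be a plain text file containing a single line such as

Stack<0-9999>.ome.tiff

(the unnumbered first file, Stack.ome.tiff, may need a regex-style pattern instead; see the pattern file docs for the exact syntax), after which the conversion is:

bfconvert stack.pattern stack.ome.btf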


Storing an array of shape (100_000, 30, 600, 600) as BigTIFF is very inefficient. The file will contain 3_000_000 pages/IFDs. HDF5 or zarr should perform better…
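
For example, a minimal zarr sketch (assuming the zarr package is installed; t_index, z_index and image are placeholders for your reading loop):

import zarr

# one chunk per plane keeps single-plane reads and writes cheap
store = zarr.open(
    'stack.zarr',
    mode='w',
    shape=(100000, 30, 600, 600),
    chunks=(1, 1, 600, 600),
    dtype='uint16',  # use the dtype of your data
)
store[t_index, z_index] = image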

If you want to view or process the data in ImageJ, consider the TIFF-based ImageJ hyperstack format. It can be created and manipulated via memory mapping in tifffile and opened as a virtual stack in ImageJ. In the following example, hyperstack is a numpy array stored inside the TIFF file:

import tifffile

# create an empty TZYX ImageJ hyperstack
hyperstack = tifffile.memmap(
    'imagej.tif',
    shape=(100000, 30, 600, 600),
    dtype='uint8',
    imagej=True,
    metadata={'axes': 'TZYX'},
)

print(type(hyperstack), hyperstack.shape)

# write an image to the numpy.memmap array
# (t_index, z_index and image are placeholders for your loop)
hyperstack[t_index, z_index] = image
hyperstack.flush()

Hi, thank you all for the quick responses.

@dgault Thanks, this looks like a good option. I created the .pattern file and ran the command, but I get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
	at java.lang.StringCoding.encode(StringCoding.java:344)
	at java.lang.String.getBytes(String.java:918)
	at loci.common.xml.XMLTools.parseDOM(XMLTools.java:213)
	at loci.formats.services.OMEXMLServiceImpl.getOMEXMLVersion(OMEXMLServiceImpl.java:431)
	at loci.formats.services.OMEXMLServiceImpl.transformToLatestVersion(OMEXMLServiceImpl.java:199)
	at loci.formats.services.OMEXMLServiceImpl.createOMEXMLMetadata(OMEXMLServiceImpl.java:373)
	at loci.formats.services.OMEXMLServiceImpl.createOMEXMLMetadata(OMEXMLServiceImpl.java:360)
	at loci.formats.in.OMETiffReader.isThisType(OMETiffReader.java:232)
	at loci.formats.FormatReader.isThisType(FormatReader.java:631)
	at loci.formats.in.OMETiffReader.isThisType(OMETiffReader.java:172)
	at loci.formats.ImageReader.isThisType(ImageReader.java:864)
	at loci.formats.ImageReader.getReader(ImageReader.java:193)
	at loci.formats.ImageReader.fileGroupOption(ImageReader.java:614)
	at loci.formats.ReaderWrapper.fileGroupOption(ReaderWrapper.java:395)
	at loci.formats.FileStitcher.initFile(FileStitcher.java:946)
	at loci.formats.FileStitcher.setId(FileStitcher.java:925)
	at loci.formats.in.FilePatternReader.initFile(FilePatternReader.java:230)
	at loci.formats.FormatReader.setId(FormatReader.java:1421)
	at loci.formats.ImageReader.setId(ImageReader.java:849)
	at loci.formats.tools.ImageConverter.testConvert(ImageConverter.java:506)
	at loci.formats.tools.ImageConverter.main(ImageConverter.java:1095)

I ran this in a cluster job where I allocated 600GB (all the ome.tiff files together add up to 250GB), so I don’t understand where this OutOfMemoryError could originate.

The pattern file is a text file where I have written:

2021.*ome.tif

(All files start with 2021)

@cgohlke Thank you too. I was able to create the image with the following code:

import tifffile as tiff
from natsort import natsorted
import os
import numpy as np
import matplotlib.pyplot as plt

# create an empty TZYX ImageJ hyperstack
hyperstack_path = 'ImageJ_Stack16bit.tif'
hyperstack = tiff.memmap(
    hyperstack_path,
    shape=(10000, 20, 700, 900),  # the new recording has 20 planes per volume and 700 x 900 px frames
    dtype='uint16',  # the data is 16-bit
    imagej=True,
    metadata={'axes': 'TZYX'},
)

print(type(hyperstack), hyperstack.shape)

c = 0
z_index = 0
t_index = 0
for file in natsorted(os.listdir(path)):
    if file.endswith('ome.tif'):
        print(os.path.join(path, file))
        with tiff.TiffFile(os.path.join(path, file), multifile=False) as tif:
            for idx, page in enumerate(tif.pages):
                img = page.asarray()
                hyperstack[t_index, z_index] = img
                hyperstack.flush()
                c = c + 1
                z_index = z_index + 1
                if z_index == 20:  # 20 is the number of planes per volume
                    z_index = 0
                    t_index = t_index + 1

However, I still don’t manage to loop (in Python) through the file that I have created.

input_filepath = 'ImageJ_Stack16bit.tif'

with tiff.TiffFile(input_filepath) as tif:
    print('looping through reader object..')
    for idx, page in enumerate(tif.pages):
        img = page.asarray()
        plt.imshow(img)
        plt.show()

This only shows the first frame. I don’t know how to access the rest of the frames (which I guessed would be pages).
If I do:

tif.series[0].shape

(10000, 20, 700, 900)

I get the correct dimensions, but I don’t know how to go frame by frame or volume by volume.

Any help on how to loop through the big tiff file frames would be appreciated, Thank you.

(Minor thing: when I open it in Fiji I get 20 color channels and 10000 z-slices.)

You will need to set the BF_MAX_MEM environment variable (Command line tools — Bio-Formats 6.6.1 documentation).

You can do this using:
export BF_MAX_MEM=30g (or set BF_MAX_MEM=30g on Windows)

You are likely using an outdated version of tifffile that doesn’t recognize the axes metadata.

There is only one TIFF IFD/page in the ImageJ hyperstack format if the file size exceeds 4 GB. The idea is to avoid creating and looping through millions of TIFF pages, and instead access the image data through memory mapping (or the zarr interface):

import tifffile

hyperstack = tifffile.memmap('imagej.tif')
for t in range(hyperstack.shape[0]):
    for z in range(hyperstack.shape[1]):
        frame = hyperstack[t, z]
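
Or via the zarr interface mentioned above, a sketch assuming a tifffile version that provides aszarr:

import tifffile
import zarr

with tifffile.TiffFile('imagej.tif') as tif:
    z = zarr.open(tif.series[0].aszarr(), mode='r')
    frame = z[0, 0]  # first plane of the first volume, read on demand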

Thanks again for the quick replies.

@dgault I hope this will solve it. Unfortunately, I don’t know where I should write the command (I’m on a Unix server).
I have tried:

bash export BF_MAX_MEM=300g
bash /code/bftools/bfconvert file.pattern stack.ome.btf

and

bash /code/bftools/bfconvert -BF_MAX_MEM=300g file.pattern stack.ome.btf

or

bash /code/bftools/bfconvert -BF_MAX_MEM 300g file.pattern stack.ome.btf

None of them works. I am sure it is super easy, but I could not figure it out.

@cgohlke Thanks, this worked beautifully.

I checked the version of tifffile (2020.10.1) and upgraded it. It is running now…

The memory mapping is very cool. I guess in my task it does not make a huge difference, because I still want to loop through all planes, but if I only wanted to check the middle plane of each volume it should help a lot. Is this right?

Before memory mapping, I would do the following to access the middle plane:

input_filepath='ImageJ_Stack16bit.tif'

with tiff.TiffFile(input_filepath) as tif:
    for idx, page in enumerate(tif.pages):
        if idx % 20 == 10:  # only act on the middle plane of each 20-plane volume
            img = page.asarray()
            plt.imshow(img)
            plt.show()
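
With memory mapping, I guess the equivalent would be something like this (untested), where only the requested planes should actually be read from disk:

hyperstack = tiff.memmap(input_filepath)
for t in range(hyperstack.shape[0]):
    img = hyperstack[t, 10]  # middle plane of each 20-plane volume
    plt.imshow(img)
    plt.show()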

Where can I learn more about memory mapping, and about memory mapping with the tifffile package? I have read this thread: How to read TIFF files from MicroManager with memory-mapping · Issue #52 · cgohlke/tifffile · GitHub

You shouldn’t need the bash in front of the export; just export BF_MAX_MEM=300g should be fine, or alternatively BF_MAX_MEM=300g /code/bftools/bfconvert file.pattern stack.ome.btf should also work.