What is the recommended/best way to open OME-TIFFs recorded with MicroManager 1 and 2 from Python 3?

This is an (edited) cross-post of this one, for which I apologize; I could neither edit nor delete the original. Maybe @team can support me with this?

Dear all,

I am looking for a good (meaning simple and robust) way to read multi-file OME-TIFFs recorded with MicroManager from Python with support for memory mapping. It should support files from both MicroManager 1.x and 2.x.

The OME-TIFF stacks consist of >60 files with 4.3 GB each, so memory mapping is a must. The axis-order is 'STCZYX' with shape (32, 480, 2, 1, 2048, 2048) (see also below)
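As a rough back-of-the-envelope check of why loading everything into RAM is not an option, the sketch below estimates the size of the full stack and of a single YX plane. The 2 bytes per pixel (16-bit unsigned data) is an assumption, since the pixel type is not stated above.

```python
from functools import reduce
from operator import mul

shape = (32, 480, 2, 1, 2048, 2048)  # 'STCZYX', as reported above
bytes_per_pixel = 2  # assumption: 16-bit unsigned pixels

# Total number of pixels is the product of all axis lengths.
n_pixels = reduce(mul, shape, 1)
total_bytes = n_pixels * bytes_per_pixel

# A single YX plane is what is typically needed at any one time.
plane_bytes = shape[-2] * shape[-1] * bytes_per_pixel

print(f"full stack:      {total_bytes / 1e9:.1f} GB")
print(f"single YX plane: {plane_bytes / 1e6:.1f} MB")
```

Under that assumption the full stack comes out at roughly 258 GB, which is in line with >60 files of 4.3 GB each, while one plane is only about 8 MB.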

Overall, trying to get this to work is turning out to be much more painful than anticipated, so I would really appreciate it if anyone could help me or comment on options to get this working.

Possible options I have tried/found so far:

  1. I have seen that there is the python-bioformats library. I tried installing it from bioconda (following instructions here), but got an error saying that it only supports Python 2.x (although, reading this thread, it seems like Python 3 could/should be supported(?!)):
    UnsatisfiableError: The following specifications were found
    to be incompatible with the existing python installation in your environment:
    
    Specifications:
    
      - python-bioformats -> python[version='<3']
    
    Your python: python=3.7
    
  2. The tifffile library, which I have already used and like. It supports memory mapping and runs, but I am having trouble using it. See my support thread here.
  3. pycromanager: I am not sure if this can do what I need (and I have not yet tried it). It seems to be primarily targeted at controlling MicroManager through a wrapper, but it seems likely that it also supports reading files written by MicroManager(?).
  4. aicsimageio: This seems to correctly load the images using this example code from the Github page:
    >>> from aicsimageio import AICSImage
    >>> img = AICSImage('/path/to/first_ome_multifile.ome.tif')
    >>> img.dims
    'STCZYX'
    >>> img.shape
    (32, 480, 2, 1, 2048, 2048)  # these are the expected dimensions!
    
    >>> lazy_s0t0 = img.get_image_dask_data("CZYX", S=0, T=0)  # returns 4D CZYX dask array
    >>> s0t0 = lazy_s0t0.compute()  # returns 4D CZYX numpy array
    
    But it is extremely slow to load (on the order of minutes with significant CPU load), and for some reason s0t0 is empty even though img.shape reports the expected dimensions.

Best regards and thanks,

Michael

Thanks for notifying us. The default settings in this Discourse forum allow editing your posts for a limited time only.

(We might discuss whether we want to change this.)

@michaelmell I bumped your trust level from 1 to 2 now, hoping that this allows you to edit/delete the post as required.

Thank you. I just tried deleting it (I could now see the trash icon), but it mentioned that I have to raise a flag to the mods (which I did). I assume this is expected behavior(?).

@michaelmell

I deleted the previous post for you… so don’t worry. :slight_smile:

It’s straightforward to open pycromanager data in Python, but you have to have collected it with pycromanager in the first place. If you can do that, you can try using pycromanager to pull the data opened in Micro-Manager into Python (as in this example).

Opening data acquired with “standard” Micro-Manager is also straightforward with pycromanager. The API lets you specify the image (based on time, z, position, and channel coordinates) in the data set, which will then be read and transferred to Python, where you get it as a numpy array. I am not sure whether that approach is the memory mapping you were looking for.

Short example to get you started:

from pycromanager import Bridge
import numpy as np

bridge = Bridge()
mmc = bridge.get_core()
mm = bridge.get_studio()

data_path = "D:\\tmp\\MyData"

store = mm.data().load_data(data_path, True)
max_t = store.get_max_indices().get_t()
cb = mm.data().get_coords_builder()
cb.t(0).p(0).c(0).z(0)

for t in range(0, max_t):
    img = store.get_image(cb.t(t).build())
    # reshape the flat pixel buffer into a 2D (height, width) array
    pixels = np.reshape(img.get_raw_pixels(), (img.get_height(), img.get_width()))

Hi Michael,

We are also in this situation. I would love to hear any more conclusions you make as they come up.

Tifffile and aicsimageio are the two we’ve evaluated. Our take so far is aicsimageio needs an optimization for its chunked reads into dask (or into zarr). See an issue I started here

Tifffile can load as zarr in a memory-efficient way, but it does not properly parse micro-manager ome-tiff metadata – it does not recognize scenes, nor will it correctly parse non-master-ome-tiff files. In other words, I’m not sure it can be useful for multi-scene ome-tiffs from micro-manager. I plan to start an issue there as well.

Regarding pycromanager, I believe it uses the new NDTiffStorage here, which seems to save as multi-page tiff with ome-tiff-like metadata, but is not formally ome-tiff. @henrypinkard can chime in more here.

Please validate the OME-XML and/or post a dataset.

Hi Christoph,

I’m not sure how to validate the OME-XML other than to load the same data into Fiji/Bio-Formats and see if that properly parses it. It does.

This dataset is available here on my gcloud
dimensions: (T, S, Z, C, Y, X) = (50, 2, 5, 3, 512, 512)

Is “S” the accepted abbreviation for “position” (i.e. XY location in physical space)? Any idea where that came from and what “S” stands for?

Hi Nico!
I only recently discovered that S is used in place of P in some notations. I believe it means “scene”.

Hi @bchhun ,

This is tentative, but I believe I have found a solution that works (it still needs more testing, though). Thanks to the support of @cgohlke here (thank you very much!), I figured out that I can use tifffile after all to read images, as in this minimal example:

import tifffile as tff
import zarr
import matplotlib.pyplot as plt

tif = tff.TiffFile(image_path)

s = 0  # position/scene/series
t = 0  # time-frame
c = 0  # channel

len(tif.series)
#Out: 32  # my tiff has 32 positions
position1_series = tif.series[s]
position1_series.axes
#Out: 'TCYX'  # my tiffs do not have a z-axis

# this is not memory-mapped (judging from memory usage)
position1 = position1_series.asarray()  # read as numpy-array
position1.shape
#Out: (480, 2, 2048, 2048)
img = position1[t, c, ...]
plt.imshow(img)
plt.show()


# this should be memory-mapped (judging from memory usage)
position1_zarr = zarr.open(position1_series.aszarr(), mode='r')  # read as zarr
position1_zarr.shape
#Out: (480, 2, 2048, 2048)
img_mm = position1_zarr[t, c, ...]
plt.imshow(img_mm)
plt.show()

I am not 100% sure whether my comments are correct or whether this is the best (as in intended by the API) way of going about this. But it does what I need and reads TIFFs recorded with both MicroManager 1 and 2.
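To illustrate why a memory-mapped read keeps memory usage low, here is a minimal sketch that uses only the Python standard library, so it runs without tifffile or zarr. It is not how tifffile works internally; it writes a tiny raw file of 16-bit planes in TC order (toy dimensions and file layout are assumptions made for the illustration) and then maps the file and copies out only the bytes of one plane.

```python
import mmap
import os
import struct
import tempfile

# Toy dimensions (assumed for illustration; the real stack is far larger).
T, C, Y, X = 3, 2, 4, 5
BYTES_PER_PIXEL = 2  # assumes 16-bit unsigned pixels
PLANE_BYTES = Y * X * BYTES_PER_PIXEL


def write_toy_stack(path):
    """Write T*C raw uint16 planes; every pixel stores its plane index."""
    with open(path, "wb") as fh:
        for plane_index in range(T * C):
            fh.write(struct.pack(f"<{Y * X}H", *([plane_index] * (Y * X))))


def read_plane(path, t, c):
    """Memory-map the file and copy out only the bytes of plane (t, c)."""
    offset = (t * C + c) * PLANE_BYTES  # planes are stored in TC order
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            raw = mm[offset:offset + PLANE_BYTES]
    return struct.unpack(f"<{Y * X}H", raw)


with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_stack.raw")
    write_toy_stack(toy_path)
    plane = read_plane(toy_path, t=2, c=1)

print(len(plane), set(plane))
```

Only the slice `mm[offset:offset + PLANE_BYTES]` is copied into Python memory; the rest of the file is paged in by the operating system on demand, which matches the low memory usage seen with the zarr route above.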

I hope this helps others and I would love to get comments on this, since I am still new to this.
Also, I will post here, if I find out anything else/run into issues.

Best and thanks,
Michael

Agreed, S means scene. This term is used at least in aicsimageio and in the libCZI implementation of the Zeiss CZI format. In other open-source representations, I think it maps to the concepts of multi-position (P) in Micro-Manager, multi-image in the OME model or multi-series in the Bio-Formats API.

There might be small variations in terms of expectations between these concepts but as far as I am aware they are largely overlapping.

Hey everyone!

For those following this thread I just wanted to drop a link to where I am tracking this conversation as it relates to aicsimageio specifically.

Here is the GitHub issue for “AICSImageIO Dask Reads on TIFF are slow”.

Now, addressing the S dimension.

So don’t hate me, but yes, in the 3.x series of aicsimageio releases “S” means exactly what @s.besson describes: “Scene”, tifffile.series, or the OME-equivalent “multi-Image”. Unfortunately (or, depending on how you look at it, fortunately) we are changing this in 4.x.

Our 4.x API is now stateful to the selected image in a “multi-Image” file. I.E.
The planned aicsimageio 4.x API regarding “Scene” management is:

img = AICSImage("some-multi-image-file.ext")
img.scenes  # returns a tuple of all scene IDs
img.current_scene  # returns the scene ID for the active image
img.set_scene("some-other-image-id")  # updates the active image

This is because multi-image files can have different dtypes, shapes, dimensions, etc., and so we needed to address the fact that users were getting confused about why some multi-image files naively worked with our 3.x API and others didn’t (the most common occurrence being a multi-image file with variable shapes per image).
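To make the motivation concrete, here is a toy sketch of the stateful pattern. `ToyMultiImage` is a hypothetical stand-in written for this illustration, not aicsimageio code; it only mirrors the three names listed above (`scenes`, `current_scene`, `set_scene`) and the scene IDs and shapes are made up.

```python
class ToyMultiImage:
    """Hypothetical stand-in for a multi-image file; not the aicsimageio API."""

    def __init__(self, scene_shapes):
        # Map scene ID -> shape; shapes may differ between scenes.
        self._scene_shapes = dict(scene_shapes)
        # The first scene is active by default.
        self._current = next(iter(self._scene_shapes))

    @property
    def scenes(self):
        return tuple(self._scene_shapes)

    @property
    def current_scene(self):
        return self._current

    def set_scene(self, scene_id):
        if scene_id not in self._scene_shapes:
            raise KeyError(f"unknown scene: {scene_id!r}")
        self._current = scene_id

    @property
    def shape(self):
        # The shape always refers to the active scene only.
        return self._scene_shapes[self._current]


img = ToyMultiImage({
    "overview": (1, 1, 1, 512, 512),     # made-up TCZYX shape
    "detail": (10, 2, 5, 2048, 2048),    # made-up TCZYX shape
})

shapes = []
for scene_id in img.scenes:
    img.set_scene(scene_id)
    shapes.append((img.current_scene, img.shape))
print(shapes)
```

Because the two scenes have different shapes, they cannot be stacked along a single “S” axis, whereas selecting one scene at a time always yields a well-defined shape.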

So you might think, “no more S dimension then?”. Nope, in the upcoming 4.x API we now unpack RGB / RGBA / BGR data into an S dimension, which in the 4.x world now stands for “Samples”.

This would be a great time to get feedback though considering we haven’t released the 4.x API yet! If you hate the fact that we are remapping “S” to something else, please let me know and provide potentially a replacement dimension name / character.

Remapping S into something that is not Scenes is obviously not what I would like to see… (I will have to adapt quite some code I guess)

We at Zeiss invested a lot in making our data format open and transparent. We will most likely stick to using S as the dimension index for scenes for CZI, libCZI and ZeissImgLib.

I struggle a bit with an alternative solution… If in AICSImageIO the S now refers to something else, what would be the new index for scenes of a CZI?

If in AICSImageIO the S now refers to something else, what would be the new index for scenes of a CZI?

Can’t tell if this is a question or not but will answer.

Reiterating, we just don’t have a “Scene” dimension in the 4.x API. It’s handled through the state of the object (img.scenes, img.current_scene, and img.set_scene).

OK, that sounds reasonable. Maybe I did not think about it long enough.
Sure, it will require adaptations, but this is life. :blush:

Hi both,

I don’t have any immediate suggestions (especially not for OME-TIFF which is unlikely to see major model changes any time soon and especially not at this time on a Friday evening…) but I do wonder if there would be interest in capturing Zeiss’ learned wisdom of the S dimension and incorporate that into an #ome-ngff spec so that for the OME APIs the transformations cease to be lossy.

Hope all are well.
~Josh

cc: @dsudar

To follow up on this: at least for the files provided, tifffile does properly parse micro-manager ome-tiff metadata and does recognize all scenes. Tifffile does correctly parse the non-master-ome-tiff file. The file is a BinaryOnly OME-TIFF and does not contain usable OME metadata or a reference to a master OME-TIFF file. Tifffile used a generic TIFF series parser as a fallback after logging a warning. The latest version will also try to fall back to ImageJ metadata, if any. Besides that, both files contained corrupted MicroManager DisplaySettings metadata.
