Converting other IDR images into public .zarr

Hi all, hi @manics, @joshmoore ,

I am playing with zarr (like quite a lot of people, it seems?) and stumbled upon the images from IDR that you converted to zarr as examples (as described, for example, in Mirroring IDR .zarr datasets). I was wondering if you could give image 9846151 (from IDR0048) that public zarr treatment? I could then use it, along with the STLs of the segmented volumes, to build examples for our upcoming library; it could also serve as an example of re-analysis of IDR data.

Many thanks


Thanks for the request, @Anatole_Chessel. Great to hear what people are interested in trying out! I’ve kicked off the conversion and will get it uploaded to S3 when it’s done. If you or anyone else is interested in running the conversion locally, whether via IDR, OMERO, or bioformats2raw, just say the word.

All the best,

I would like to know how to convert datasets locally to zarr. Is there a script or Python package to achieve this?


It really seems that many people are interested in this at the same time; see this thread: Downloading RAW timelapse movies from IDR / Omero (ideally in Python).

Following that discussion last week, I wrote a Python function that will download an IDR image and return it as a numpy array. It is part of a larger example (that is still not that polished), but the download function works, and you can also see how to save the result as .zarr. One of the more difficult parts is installing omero-py for anything other than Python 3.6, due to dependencies.
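For anyone curious what such a function looks like, here is a minimal sketch. The assembly step is plain numpy; the connection details in the comments (public IDR credentials, websocket endpoint, `getPlanes`) are based on the omero-py `BlitzGateway` API, but check the IDR documentation for the exact endpoint for your omero-py version. All names here are illustrative, not @VolkerH's actual code.

```python
import numpy as np

def planes_to_stack(planes, nt, nc, nz, ny, nx):
    # Assemble an iterable of 2D planes (ordered t, then c, then z)
    # into a single (T, C, Z, Y, X) numpy array.
    return np.array([np.asarray(p) for p in planes]).reshape(nt, nc, nz, ny, nx)

# Fetching the planes from IDR would look roughly like this
# (requires omero-py; endpoint details may differ by version):
#
# from omero.gateway import BlitzGateway
# conn = BlitzGateway('public', 'public',
#                     host='ws://idr.openmicroscopy.org/omero-ws', secure=True)
# conn.connect()
# image = conn.getObject("Image", image_id)
# pixels = image.getPrimaryPixels()
# nt, nc, nz = image.getSizeT(), image.getSizeC(), image.getSizeZ()
# ny, nx = image.getSizeY(), image.getSizeX()
# zct = [(z, c, t) for t in range(nt) for c in range(nc) for z in range(nz)]
# stack = planes_to_stack(pixels.getPlanes(zct), nt, nc, nz, ny, nx)
# conn.close()
```

The resulting array can then be saved with `zarr.save("image.zarr", stack)`, which works fine as long as the whole stack fits in memory.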


I just noticed that this image has more than 4000 planes. This will require a more sophisticated approach than my code snippet above, i.e. writing into a .zarr store plane by plane rather than building a complete numpy array that is subsequently saved as .zarr.

1 Like

bioformats2raw --file_type=zarr --dimension-order=XYZCT … will produce a multi-series zarr; each of those series is an ome-zarr. The library has some methods for writing the multi-scale images, but it’s not yet a full API. I don’t imagine you would have issues adding to it, @sebi06.
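A full invocation might look like the following. The input/output paths and the resolution/tile values are illustrative; check `bioformats2raw --help` for the options available in your version.

```shell
# Convert a file to a multi-series zarr (paths are placeholders).
# --resolutions controls the pyramid depth; --tile_width/--tile_height
# control the chunk size of each plane.
bioformats2raw --file_type=zarr --dimension-order=XYZCT \
    --resolutions=4 --tile_width=512 --tile_height=512 \
    input.czi output.zarr
```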

Agreed! Thanks for the link. I was away last week and am still in the process of catching up. (This has not helped of course…)

If there’s anything we can do to help, let us know. Conda is usually the easiest way, but there are a number of wheels available for various platforms which speed things up tremendously. (Probably best to create a new thread if we want to go into that in more depth, though.)

Nice! I’ll try to go through the code, but I assume this is quite close to what the existing tooling is doing. There’s also work on downloading masks that matches Multi-scale image labels v0.1. If you see things that need refactoring or integrating from one location to another, let us know.

The process is on plane 1300 now… :smile: I’m pretty sure this will be an example where we should move to a different chunking. Very open to suggestions once it’s available for everyone to test.



@joshmoore thanks a lot!
@VolkerH useful bit of code, thanks for sharing :slight_smile:

And yes, when testing on similarly sized images I ended up rechunking, I think, although that would depend on the end usage…

9846151 has now been uploaded. Give it a try. Suggestions for better chunking are welcome.

A post was split to a new topic: Installing omero-py on python 3.7+

Hi @VolkerH,
Have you checked the IDR download tool for Galaxy?
@joshmoore As this seems a recurring issue, it seems to me that it may be worth combining all these download tools into one “official” IDR data download tool.

@jkh1, no I was not aware of that tool, thanks for the pointer.

It’s taking over a minute to load the chunks, but gets there in the end:

NB: multi-Z/T support for the vizarr viewer is in progress to allow you to see all 4000 planes

Certainly no objections, but there will be questions around the scope of that. Probably best to start outside this thread with a GH issue or another post if anyone wants to gather ideas on this.

Yeah, I’m seeing similar behavior on just switching a single Z. @will-moore is likely right that this is due to the flat structure producing many files in one directory.
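The file-count problem is easy to estimate: with one-plane chunks, a 4000-plane image puts thousands of chunk files into a single flat directory. A quick back-of-envelope helper (sizes are illustrative):

```python
import math

def n_chunks(shape, chunks):
    # Number of chunk files a zarr array of this shape/chunking produces.
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

# One file per plane: 4000 files in one directory.
print(n_chunks((4000, 1024, 1024), (1, 1024, 1024)))   # 4000
# 16 planes per chunk: 250 files.
print(n_chunks((4000, 1024, 1024), (16, 1024, 1024)))  # 250
```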