Converting PE Opera Phenix tiff exports to zarr


I’m generating ~250-500 GB/day on our Opera Phenix. Because of the PE/Harmony ecosystem, we save all data as 'Harmony Archives', essentially an SQLite dump of the Harmony database, which I then have the option of converting to TIFFs with human-readable filenames of the form r02c01f04p05-ch3sk1fk1fl1.tiff by looping through the SQLite and XML data.
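For illustration, here is a minimal sketch of how a filename of that form could be split into its numeric parts. It assumes that r, c, f, p and ch stand for row, column, field, plane and channel; the sk, fk and fl parts are kept as opaque integers because their meaning isn't stated here. The function name `parse_phenix_name` is made up for this example.

```python
import re
from typing import Dict, Optional

# Pattern for names like r02c01f04p05-ch3sk1fk1fl1.tiff.
# Assumption: r/c/f/p/ch = row/column/field/plane/channel.
PHENIX_NAME = re.compile(
    r"r(?P<row>\d+)c(?P<col>\d+)f(?P<field>\d+)p(?P<plane>\d+)"
    r"-ch(?P<channel>\d+)sk(?P<sk>\d+)fk(?P<fk>\d+)fl(?P<fl>\d+)\.tiff?$"
)


def parse_phenix_name(filename: str) -> Optional[Dict[str, int]]:
    """Return the numeric parts of an exported filename, or None if it doesn't match."""
    match = PHENIX_NAME.search(filename)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


print(parse_phenix_name("r02c01f04p05-ch3sk1fk1fl1.tiff"))
# {'row': 2, 'col': 1, 'field': 4, 'plane': 5, 'channel': 3, 'sk': 1, 'fk': 1, 'fl': 1}
```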

Even for small-to-medium datasets, this results in tens to hundreds of thousands of images in the same directory. I hate this, the filesystem hates this, and anyone who receives this data hates this. I was reading about zarr and was wondering whether anyone has stored data of this type as a (nested) directory store, with the data chunked per well (row, column). This would make data archival and transfers much more logical, and we could also benefit from the ecosystem of great tools that support zarr. Any alternatives, better formats, or considerations?
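To make the per-well idea concrete, here is a rough sketch of the bookkeeping only: it groups exported filenames by well and works out, for each well, a nested path plus the position each image would take in a (field, channel, plane) array. Nothing is written to zarr here. The `rRR/cCC` path scheme, the axis order and the function name `plan_well_layout` are assumptions for illustration, not an established layout, and r/c/f/p/ch are again assumed to mean row/column/field/plane/channel.

```python
import re
from collections import defaultdict

# Captures row, column, field, plane, channel; sk/fk/fl are matched but ignored.
NAME = re.compile(r"r(\d+)c(\d+)f(\d+)p(\d+)-ch(\d+)sk\d+fk\d+fl\d+\.tiff?$")


def plan_well_layout(filenames):
    """Map each well to a hypothetical store path, an array shape and per-file indices."""
    wells = defaultdict(list)
    for name in filenames:
        match = NAME.search(name)
        if match is None:
            continue  # skip files that don't follow the export naming
        row, col, field, plane, channel = map(int, match.groups())
        wells[(row, col)].append((name, field, channel, plane))

    plan = {}
    for (row, col), entries in sorted(wells.items()):
        fields = sorted({e[1] for e in entries})
        channels = sorted({e[2] for e in entries})
        planes = sorted({e[3] for e in entries})
        plan[f"r{row:02d}/c{col:02d}"] = {
            # leading axes only; the image's Y and X axes would follow
            "shape": (len(fields), len(channels), len(planes)),
            "index": {
                name: (fields.index(f), channels.index(c), planes.index(p))
                for name, f, c, p in entries
            },
        }
    return plan


files = [
    "r02c01f01p01-ch1sk1fk1fl1.tiff",
    "r02c01f01p02-ch1sk1fk1fl1.tiff",
    "r02c01f02p01-ch2sk1fk1fl1.tiff",
    "r03c05f01p01-ch1sk1fk1fl1.tiff",
]
for path, info in plan_well_layout(files).items():
    print(path, info["shape"])
```

Each planned path would then correspond to one array (or group) in the store, so a whole well can be copied or archived as a single subdirectory.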


I’ve tried searching the forums for related topics, but couldn’t find anything directly relevant; feel free to point me in other directions, as I’m a noob in this field.

Hi Bill,

You might be interested in the various posts under the #ome-ngff tag to read about our work to define an OME format using Zarr. Any format that can be read by Bio-Formats can be converted to this OME-Zarr format using bioformats2raw (GitHub: glencoesoftware/bioformats2raw), but that won’t cover the database export.

Happy to discuss more.

