I’m generating ~250-500GB/day using our Opera Phenix. Due to the PE/Harmony ecosystem, we’re saving all data as 'Harmony Archives', essentially an SQLite dump of the Harmony database, which I then have the option of converting to TIFFs with human-readable filenames of the form
r02c01f04p05-ch3sk1fk1fl1.tiff by looping through the SQLite and XML data.
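For context, this is roughly what the conversion loop has to do with each name. A minimal sketch of parsing that filename scheme; the meaning of each field (row, column, field, plane, channel, and the trailing sk/fk/fl tokens) is inferred from the one example above and may need adjusting against a real export:

```python
import re

# Assumed pattern, reverse-engineered from r02c01f04p05-ch3sk1fk1fl1.tiff
PATTERN = re.compile(
    r"r(?P<row>\d+)c(?P<col>\d+)f(?P<field>\d+)p(?P<plane>\d+)"
    r"-ch(?P<channel>\d+)sk(?P<sk>\d+)fk(?P<fk>\d+)fl(?P<fl>\d+)\.tiff"
)

def parse_name(name: str) -> dict:
    """Return the image indices encoded in a Harmony-style filename."""
    m = PATTERN.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name}")
    return {k: int(v) for k, v in m.groupdict().items()}

idx = parse_name("r02c01f04p05-ch3sk1fk1fl1.tiff")
# idx["row"] == 2, idx["col"] == 1, idx["channel"] == 3, ...
```

With indices like these in hand, grouping images per well (row, column) during conversion becomes trivial.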
Even for small-to-medium-sized datasets, this results in tens to hundreds of thousands of images in the same directory. I hate this, the filesystem hates this, and anyone who receives this data hates this. I was reading about zarr and was wondering if anyone has stored data of this type as a (nested) directory store, with data chunked per well (row, column). This would make data archival and transfers much more logical, and we could also benefit from the ecosystem of great tools that support zarr. Any alternatives, better formats, considerations?
I’ve tried searching around the forums for related topics but couldn’t find one directly related; feel free to point me in other directions, as I’m a noob in this field.