We had some discussions on packaging screen data from OMERO and transferring it to another OMERO server. Thanks to the OME team, two solutions for packaging were proposed:
- using downloader: ome/omero-downloader on GitHub (an OMERO client for downloading data in bulk from the server)
- using zarr-cli: omero-cli-zarr on PyPI
I have tried both options and would appreciate your help on the best way to proceed. It would also be nice to have an open conversation on this topic here on the forum.
Tried so far:
The files downloaded from the OMERO server using these options, together with the original data, can be found at the link below (they are not zipped, to allow easy/partial access): https://surfdrive.surf.nl/files/index.php/s/vrpKkBGZmYBMyzN (if it prompts for a password, it is “OMERO”).
The folder contains 3 subfolders, named accordingly:
- Download_zarr, using the zarr-cli option
- Download_downloader, using the downloader option
- Download_original, using direct download from the server in the original format
The downloader option gives binary data (.ids files) and OME-XML files for each image. No other metadata (screen, plate, well) could be downloaded using this option. It allows downloading an entire screen, but the data is not organized according to the plate layout (and plate metadata is not provided either). In addition, for this specific dataset (.ics/.ids file format), the .ics files are not retrieved by the downloader.
The Zarr option can only be run on individual plates, not on an entire screen. The nice thing about this option is that it already contains the main part of the plate and well metadata (images are exported for each well in appropriate subfolders, and attr data is included at each level). However, the original format (and its metadata) is lost, and it is up to users to decide whether they are comfortable with that. For most datasets it would only make sense if the data is analyzed/processed by accessing it directly from the OMERO server. For data processed in the original format outside of OMERO (e.g. deconvolutions, segmentations, etc.), the analysis pipeline might not be directly usable on data in Zarr format, which could reduce the reproducibility of current analysis pipelines/workflows. Besides that, it would be nice to look at how to preserve as much of the original metadata as possible, since Bio-Formats only extracts a portion of it.
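To see which attr data ends up at which level of an exported plate, something like the following could be used. It is only a minimal sketch: it assumes the export is a Zarr v2 style directory in which every group carries a `.zattrs` JSON file, and the function name `collect_zarr_attrs` is made up for this post.

```python
import json
from pathlib import Path


def collect_zarr_attrs(root):
    """Map each group path under `root` to the top-level keys of its .zattrs.

    Assumption: attributes are stored as `.zattrs` JSON files (Zarr v2 layout).
    """
    root = Path(root)
    found = {}
    for attrs_file in root.rglob(".zattrs"):
        # Path of the group relative to the plate root ("." is the plate itself).
        group = attrs_file.parent.relative_to(root).as_posix()
        with attrs_file.open() as fh:
            found[group] = sorted(json.load(fh).keys())
    return found
```

Calling it on one of the plate folders inside Download_zarr returns a dictionary that can be compared against what an importer would need at the plate, well and image level.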
Both options are missing the screen metadata and the attachments for the entire screen. These attachments (.txt/.xlsx files) include the compound/RNAi libraries used for the entire screen, as well as (minimal) metadata templates. Using the Python API, these data can be easily retrieved. For the Zarr-CLI option, I have added some screen metadata (screen_meta.json) and the files attached to the screen to the corresponding folder. Of course, this can be extended depending on the requirements for importing the package back into OMERO. Something similar can be included for the other options as well.
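For reference, retrieving the screen metadata and attachments with the Python API could look roughly like the sketch below. It is not the exact script used for the shared folder: the JSON layout and the function name `export_screen_metadata` are assumptions for illustration. The `screen` argument is expected to behave like the screen wrapper from omero-py's `BlitzGateway` (`getId`, `getName`, `getDescription`, `listChildren`, `listAnnotations`), and file attachments are recognised by duck typing rather than by importing `omero`.

```python
import json
from pathlib import Path


def export_screen_metadata(screen, out_dir):
    """Write screen_meta.json and the screen's file attachments to out_dir.

    Assumption: the JSON keys below are illustrative, not a fixed schema.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    attachments = []
    for ann in screen.listAnnotations():
        # Only file annotations can be downloaded; skip tags, comments, etc.
        if not hasattr(ann, "getFileInChunks"):
            continue
        name = ann.getFile().getName()
        with (out_dir / name).open("wb") as fh:
            for chunk in ann.getFileInChunks():
                fh.write(chunk)
        attachments.append(name)

    meta = {
        "id": screen.getId(),
        "name": screen.getName(),
        "description": screen.getDescription(),
        "plates": [
            {"id": plate.getId(), "name": plate.getName()}
            for plate in screen.listChildren()
        ],
        "attachments": attachments,
    }
    (out_dir / "screen_meta.json").write_text(json.dumps(meta, indent=2))
    return meta
```

With omero-py installed, the screen would come from a connected gateway, e.g. `conn = BlitzGateway(user, password, host=host, port=4064)`, `conn.connect()`, then `export_screen_metadata(conn.getObject("Screen", screen_id), "Download_zarr")`, where `user`, `password`, `host` and `screen_id` are placeholders.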
I think the major issue here is the requirements for importing the data back into OMERO. If we had requirements for importing screen data from a folder-like structure back into OMERO, we could work towards a solution for extracting those data appropriately for packaging. I have made a quick overview of the screen data structure produced by these two options (attached as a .png file).
- Is there a way to import these data back into OMERO, so we can visually inspect the data in OMERO after importing the package? Using the downloader option, or using the zarr-cli option?
- What would be the best option to select: keeping the original format, the Zarr format, or something else (e.g. OME-TIFF, or keeping both original and Zarr)? (Open discussion; suggestions/thoughts are welcome.)
- Are there any other known options/solutions for packaging that I have missed, or that you would like to share?
Your help and/or suggestions would be appreciated.