Packaging screen data for transfer between different OMERO-instances?

Introduction:

We had some discussions on packaging screen data from OMERO and transferring it to another OMERO server. Thanks to the OME team, two solutions for packaging were proposed:

  1. using downloader – GitHub - ome/omero-downloader: An OMERO client for downloading data in bulk from the server.
  2. using zarr-cli – omero-cli-zarr · PyPI

I have tried both options, and your help would be appreciated on what would be the best way to proceed. It would also be nice to have an open conversation on this topic here on the forum.

Tried so far:
The files downloaded from the OMERO server using these options, together with the original data, can be found at the link below (they are not zipped, to allow easy/partial access): https://surfdrive.surf.nl/files/index.php/s/vrpKkBGZmYBMyzN . If you are prompted for a password, it is “OMERO”.
The folder contains 3 subfolders, named accordingly:
Download_zarr, using zarr-cli option
Download_downloader, using downloader option
Download_original, using direct download from server as original format

The downloader option gives binary data (.ids files) and OME-XML files for each image. No other metadata (screen, plate, well) could be downloaded using this option. It allows downloading an entire screen, but the data is not organized according to the plate layout (and plate metadata is not provided either). In addition, for this specific dataset (.ics/.ids file format), the .ics files are not retrieved by downloader.

The Zarr option can only be run per individual plate, not for an entire screen. The nice thing about this option is that it already contains the main part of the plate and well metadata (images are exported for each well in appropriate subfolders, and attribute data is included at each level). However, the original format (and its metadata) is lost, and it is up to users to decide whether they are comfortable with that. For most datasets it would only make sense if the data is analyzed/processed by accessing it directly from the OMERO server. For data processed in its original format outside of OMERO (e.g. deconvolutions, segmentations, etc.), the analysis pipeline might not be directly usable on data in Zarr format, which could reduce the reproducibility of current analysis pipelines/workflows. Besides that, it would be nice to also look at how to preserve as much of the original metadata as possible, since Bio-Formats only extracts a portion of it.

Both options are missing the screen metadata and the attachments for the entire screen. These attachments (.txt/.xlsx files) include the compound/RNAi libraries used for the entire screen, as well as (minimal) metadata templates. Using the Python API, these data can be easily retrieved. For the Zarr-CLI option, I have included some screen metadata (screen_meta.json) and placed the files attached to the screen in the corresponding folder. Of course, this can be extended depending on the requirements for importing the package back into OMERO. Something similar can be included for the other options as well.
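To illustrate the kind of retrieval meant here, below is a minimal sketch (not the exact script used to produce screen_meta.json). The function names and the JSON layout are my own assumptions; the objects passed in are expected to behave like omero.gateway wrappers (ScreenWrapper, FileAnnotationWrapper), and the server connection is only shown as comments.

```python
import json
from pathlib import Path


def screen_summary(screen):
    """Collect basic Screen metadata into a JSON-serialisable dict.

    `screen` is expected to behave like an omero.gateway ScreenWrapper.
    The choice of fields is an assumption, not an agreed specification.
    """
    return {
        "id": screen.getId(),
        "name": screen.getName(),
        "description": screen.getDescription(),
        "plates": [
            {"id": plate.getId(), "name": plate.getName()}
            for plate in screen.listChildren()
        ],
    }


def save_attachments(file_annotations, out_dir):
    """Download each file annotation into out_dir and return the file names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for ann in file_annotations:
        # Use only the base name so an attachment cannot be written outside out_dir.
        name = Path(ann.getFile().getName()).name
        with open(out_dir / name, "wb") as handle:
            for chunk in ann.getFileInChunks():
                handle.write(chunk)
        saved.append(name)
    return saved


def export_screen(screen, file_annotations, out_dir):
    """Write the attachments plus a screen_meta.json file into out_dir."""
    meta = screen_summary(screen)
    meta["attachments"] = save_attachments(file_annotations, out_dir)
    (Path(out_dir) / "screen_meta.json").write_text(json.dumps(meta, indent=2))
    return meta


# Example use against a live server (HOST, USER, PASSWORD, SCREEN_ID are placeholders):
# from omero.gateway import BlitzGateway, FileAnnotationWrapper
# conn = BlitzGateway(USER, PASSWORD, host=HOST, port=4064)
# conn.connect()
# screen = conn.getObject("Screen", SCREEN_ID)
# files = [a for a in screen.listAnnotations()
#          if isinstance(a, FileAnnotationWrapper)]
# export_screen(screen, files, "Download_zarr")
# conn.close()
```

Note that attachments sharing the same file name would overwrite each other in this sketch.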

I think the major issue here is the requirements for importing the data back into OMERO. If we had requirements for importing screen data from a folder-like structure back into OMERO, we could work towards a solution for extracting the data appropriately for packaging. I have made a quick overview of the screen data structure using these two options (attached as a .png file).

Questions:

  1. Is there a way to import these data back into OMERO, so we can visually inspect the data in OMERO after importing the package? Using the downloader option, or using the zarr-cli option?
  2. What would be the best option to select? Keeping the original format, the Zarr format, or something else (e.g. OME-TIFF, or keeping both original and Zarr)? (Open discussion; suggestions/thoughts are welcome.)
  3. Are there any other known options/solutions for packaging that I have missed, or that you would like to share?

Your help and/or suggestions would be appreciated.

Best,

Rohola

Hi,
I wonder if something like this is useful: omero-web/util.py at b423a641b2d0ed50f8af29fdbd553911ff1e989e · ome/omero-web · GitHub

This is code that is used to download a zip of the original data for a bunch of files. It tries to preserve the directory structure that is in the OMERO managed repository, using the common parent directory of all the files you’re downloading. The idea for HCS formats is that this structure would allow re-import as HCS.
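As a rough illustration of the "common parent directory" idea (this is not the omero-web code itself, just a sketch using the standard library), the paths inside the managed repository can be made relative to the directory they all share. The example paths in the comment are hypothetical.

```python
import posixpath


def relative_to_common_parent(paths):
    """Strip the directory shared by all paths, keeping the rest of the tree.

    `paths` are POSIX-style file paths, either all relative or all absolute.
    """
    dirs = [posixpath.dirname(p) for p in paths]
    common = posixpath.commonpath(dirs)
    return [posixpath.relpath(p, common) for p in paths]


# Hypothetical managed-repository paths:
# relative_to_common_parent([
#     "user_2/2021-03/plate1/A01.ics",
#     "user_2/2021-03/plate2/B02.ics",
# ])
```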

So you could try adapting this to download all the .ics and .ids files for your Plate, possibly creating a directory structure that represents the Plate layout (similar to what we’ve done for OME-Zarr Plate).
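A directory structure along those lines could be sketched as below, with one folder per row and column as in the OME-Zarr plate layout. The helper names are hypothetical, and the sketch assumes zero-based row and column indices (as stored on an OMERO Well) and letter labels for rows, which may not match every plate's naming convention.

```python
def row_label(row):
    """Convert a zero-based row index to a letter label: 0 -> A, 25 -> Z, 26 -> AA."""
    label = ""
    row += 1
    while row > 0:
        row, rem = divmod(row - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def well_path(plate_name, row, column, filename):
    """Build plate/ROW/COLUMN/filename, with a one-based column number."""
    return f"{plate_name}/{row_label(row)}/{column + 1}/{filename}"


# For example, the file for the well at row 7, column 11 would be placed at
# well_path("plate1", 7, 11, "image.ics")
```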

You’d need additional code to export attachments and also some way to re-create the Screens and Plates on re-import. Again, this wouldn’t be too hard in Python for your use case, but making this generally useful with all formats would take a bit more thought.

Will

If we add in @will-moore’s suggestion as a third branch to your nice mindmap:

Currently, none of the above. There is a ZarrReader being developed which will allow the re-import of the exported ome-zarr though.

I guess this is the real question, right? As we discussed separately, nothing that is in place right now will provide a full working solution. As a community, something will need to be developed.

@ahamacher is working on a similar issue (copying data exactly between OMEROs, including DB primary keys). You’ve mentioned having a “must-save-the-original-files-for-10-years” requirement. If that’s the case, then either a scripted approach working from the original files or an extension to downloader might be best.

At the moment, however, much of our focus is on defining the OME-NGFF container that will be able to hold “all of the above” precisely because each partial solution (downloader, zipping, etc.) at the moment defines an ad hoc format that doesn’t interoperate with other solutions. Additionally, having a single format works along with our outlined strategy of moving away from proprietary file formats. By converting the images to a supported format on transfer between OMEROs (assuming no dataloss), the maintenance burden will decrease over time rather than the situation we have at the moment.

All that being said, it’s also clear that (a.) you have a job if not several to get done (b.) what you can contribute to depends on your familiarity with the libraries, etc. (c.) no one person is likely to build all of it.

My suggestion would be to decide first if you want to work with:

  • (1) transformed data (more modern, more hands, but more effort)
  • (2) the original data (more immediate but also more ad hoc)

If 1, then:

  • (1.1) OME-TIFFs (will require supporting attachment imports/exports)
  • (1.2) OME-Zarrs (will require defining metadata specifications for screens, etc.)

If 2, then

  • (2.1) downloader (Java, but usable by others)
  • (2.2) scripts (Python and fairly straight-forward)

Both 2.1 and 2.2 will require the complementary uploader function.

I leave this as an open question to the community. Anyone else working on moving data from one OMERO to another?
~J.

cc: @dsudar @nitschro and probably several others.

I wish I could provide a solution but alas. Since @joshmoore mentioned me, let me say what I currently do: we collect our screen data in a very structured directory structure on the “primary” OMERO server in its local file system (that filesystem is accessible from the various high-content microscopes (InCell, Nikon HCA, and ScanR) via Samba). Any analysis results from processing the SPW images are also deposited in a very structured way in that same directory structure (without clobbering the original image files from the microscopes). On that “primary” OMERO server we import everything using in-place import to avoid duplication, and we also use scripts to “import” the analysis results, such as segmentation masks, K-V pairs for metadata and results, and feature tables (as OMERO.tables), into that server. Then, when I need to “copy” all that to another OMERO server (e.g. our public-access server at lincsclarion.org), I simply run the same image import and other import scripts (NOT using in-place import) against that other OMERO server. I hope that made some sense, but I’m not sure it’s of any use for what you want to do.
Damir

Thanks for your suggestion, I will definitely have a look at this and try it as well. Are you aware of any example scripts for importing HCS datasets from a directory structure? That would be really appreciated.

Hi Josh,

Thanks for your detailed reply. The “must-save-the-original-files-for-10-years” requirement is something we have been discussing here, and it is based on an interpretation of the regulations. I think the arguments you mentioned:

next to having many proprietary file formats and insufficient support for these files 10 years from now, should justify going for an open file format where possible.

Regarding the possible solutions

I think we need both solutions: short-term (scripts might be a straightforward solution) and long-term (OME-Zarr). The short-term solutions might also help us define the metadata requirements and specifications that enable the long-term solutions.

I will also try @will-moore’s proposed solution and see how far I get. I can do some scripting, but developing tools is beyond my skills. I will be happy to get more involved and help where possible.

Hi Damir,
Thanks for sharing your experience and the way you organize your OMERO instance. It is a very interesting approach. We have considered using in-place imports here as well, but did not continue because, in our opinion, it requires more maintenance and is difficult to organize for a large number of users. Based on what you are describing, it does make things more flexible, so I am happy to see it works out fine with your setup. Maybe we should have a more detailed chat about different setups for OMERO, but then in another topic or a different setting.
Best,
Rohola

:+1:

Assuming we’d start with a short-term solution, it sounds like you are leaning toward 2.2?

It depends on how fast the ZarrReader development is going; I could not find anything about it yet. For the current screen that is going to IDR, I can send a package including all plates in Zarr format and the corresponding metadata files, in a similar way to the example screen with 2 plates.

As for the @will-moore option, I have the code now and I am trying to add screen metadata to it, as well as writing the files into a folder structure similar to the Zarr format.

For both options I cannot test the “re-import” into another OMERO instance. Any ideas on this?

Yes. I am freeing up some more time to work on it.

There is some code at import.py · master · openmicroscopy / incubator / omero-python-importer · GitLab that you might find useful.

That code should allow you to do:

# will get asked to login if not already logged-in
$ python import.py path/to/directory

And the files in that directory will be uploaded to create a new FileSet in OMERO, then that FileSet will be imported.
NB: this doesn’t use Bio-Formats, which is used on most OMERO imports to decide which files go together in a FileSet. So you should only include the correct files in each FileSet (any additional files will be ignored). E.g. an .ics and a .ids might make a FileSet?
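Since the directory passed to import.py becomes one FileSet, one way to prepare the data could be to stage each .ics/.ids pair in its own directory first. The sketch below assumes that a pair sharing the same base name belongs together (which is exactly the open question above); the function names are hypothetical and unpaired files are skipped.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def group_ics_ids(directory):
    """Group .ics/.ids files by base name and keep only complete pairs."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() in {".ics", ".ids"}:
            groups[path.stem].append(path)
    return {stem: files for stem, files in groups.items() if len(files) == 2}


def stage_filesets(src_dir, dest_dir):
    """Copy each pair into dest_dir/<base name>/ and return those directories."""
    staged = []
    for stem, files in group_ics_ids(src_dir).items():
        target = Path(dest_dir) / stem
        target.mkdir(parents=True, exist_ok=True)
        for path in files:
            shutil.copy2(path, target / path.name)
        staged.append(target)
    return staged
```

Each returned directory could then be passed to `python import.py` in turn.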

A short-term solution for re-importing zarr (in the absence of a ZarrReader) might be to read each zarr as a numpy array and use conn.createImageFromNumpySeq() to create each image in the Plate. However, this would “duplicate” your data, creating pixel data in OMERO instead of using the Zarr data.
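A minimal sketch of that idea is below. It assumes each exported image has already been read into a 5D array in (T, C, Z, Y, X) order, and that createImageFromNumpySeq() expects planes with T varying fastest, then C, then Z (my understanding of the gateway; worth checking against your OMERO version). The helper names are hypothetical, and linking the new image into a Well of a Plate is not shown.

```python
def zct_planes(data):
    """Yield 2D planes from a 5D TCZYX array: Z outermost, then C, T innermost."""
    size_t, size_c, size_z = len(data), len(data[0]), len(data[0][0])
    for z in range(size_z):
        for c in range(size_c):
            for t in range(size_t):
                yield data[t][c][z]


def create_image(conn, name, data):
    """Create an OMERO image from a 5D TCZYX numpy array.

    `conn` is a connected omero.gateway.BlitzGateway.
    """
    size_t, size_c, size_z = len(data), len(data[0]), len(data[0][0])
    return conn.createImageFromNumpySeq(
        zct_planes(data), name, sizeZ=size_z, sizeC=size_c, sizeT=size_t
    )
```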