Same image file in multiple OMERO groups

Hi OME team,
A few scientific projects at OHSU want to use the exact same images, each for their own purposes, but they do NOT want to share annotations or whatever else gets attached to those images (due to PHI concerns). Ideally, however, they would also NOT have to create multiple copies of the large image files. So is there a way to create separate Image objects (each in their own Group, with their own annotations, etc.) but have those Image objects point to the same file in the ManagedRepository? I’m imagining some kind of hard-link arrangement in the ManagedRepository, so that if one of the Image objects is deleted, the file remains on disk without trouble.
Thanks,
Damir

Damir,
I am not a part of the OME team, but I think I know the answer. You want to use “in-place” importing.

I recommend the soft-linking strategy, which we use a ton. But hard linking would work just fine if you aren’t linking across file systems.
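To be concrete, here is roughly what those imports look like (the path is just an example; see omero import -h and the in-place import docs for the full list of --transfer options):

$ # soft link: the ManagedRepository entry points at the original file
$ omero import --transfer=ln_s /shared/data/big_image.ome.tiff
$ # hard link: only possible within a single file system
$ omero import --transfer=ln /shared/data/big_image.ome.tiff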

Hi Dave,
Thanks, yes, I had been thinking about that option but wasn’t entirely sure how to implement the whole workflow. I guess I would do a regular import first, so the image lands in the ManagedRepository, and then do an in-place import with hard links from that original copy to get the second Image into the second Group, and so on. The specifics of that second step aren’t obvious to me, though; something like the sketch below is what I’m picturing.
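
(Paths and group names are hypothetical, and I’m assuming -g picks the import group and that I can look up where the first copy landed in the ManagedRepository.)

$ # step 1: regular import into the first group
$ omero import -g lab_A /data/big_image.ome.tiff
$ # step 2: in-place, hard-linked import of that ManagedRepository copy into the second group
$ omero import -g lab_B --transfer=ln /OMERO/ManagedRepository/user-1/2021-03/.../big_image.ome.tiff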

In the meantime I found the omero-cli-duplicate plugin and will see if I can adapt it to do what I need.
Thanks,
Damir

In general, I think the “re-import in-place” workflow has some merit, and I believe I’ve suggested it elsewhere on image.sc. However, there isn’t any formalized version of it yet.

However, before he left, @mtbc put significant work into making omero duplicate handle binary data as well. See the Release of OMERO.server 5.6.3 announcement for more information. I do think it should do what you want.

~Josh

@dsudar: In addition to what @joshmoore wrote: omero duplicate will let you duplicate the images with or without annotations, as you wish. Also, as far as the image files are concerned, omero duplicate attempts to create hard links in the ManagedRepository instead of doubling the necessary storage, whenever possible. Of course, after the duplication you still have to move the images into the desired group.
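
If you want to verify that the duplication really hard-linked rather than copied, the link count and inode of the two files in the ManagedRepository will match (the paths below are only an example for a typical /OMERO/ManagedRepository layout):

$ # %h = number of hard links, %i = inode
$ stat -c '%h %i %n' /OMERO/ManagedRepository/user-1/2021-03/.../big_image.ome.tiff
$ # list every path sharing that inode
$ find /OMERO/ManagedRepository -samefile /OMERO/ManagedRepository/user-1/2021-03/.../big_image.ome.tiff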

Best
Petr

Hi Josh and Petr,
Yes, it indeed looks like omero-cli-duplicate will already do exactly what I need. I only found out about it last night. One quick feature request after reading the docs in omero-guides: could there be an option to specify the destination Group for the duplicate as part of the duplication process?
Thanks,
Damir

Hi @dsudar,

“PRs welcome”? :smile: More seriously, at the CLI level, I’m hesitant to have each command learn how to run other tasks (chgrp, chmod) since they can be chained together fairly easily. That being said, the duplicate command’s output is currently not ideal:

$ omero duplicate Fileset:123
omero.cmd.Duplicate Fileset:123 ok

You can pass --report and capture the value you’re interested in:

$ DUPE=$(omero duplicate Fileset:123 --report | grep "  Fileset:")

but it’s not ideal. I opened https://github.com/ome/omero-cli-duplicate/pull/18 to allow the likes of:

$ DUPE=$(omero duplicate Fileset:123)
omero chgrp "New Group" $DUPE

when --report is not passed. We will need to review all of the CLI commands to make sure that “Class:ID on stdout” is a standard contract like it is with omero obj, omero import, etc.
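
For comparison, this is the kind of chaining that already works because omero obj prints exactly Class:ID on stdout (object names here are just examples):

$ project=$(omero obj new Project name='Duplicates')
$ dataset=$(omero obj new Dataset name='PHI-free')
$ omero obj new ProjectDatasetLink parent=$project child=$dataset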

In the web, it’s a different story since piping isn’t possible. We began scoping this work, but there were some concerns about the overall scaling of these long-running tasks in the web, which is something we need to consider first.

All the best,
~Josh

Hi Josh,

Yes, coming from a Unix background, I completely agree with that sentiment.

But the temporary workaround until your PR makes it through is perfectly fine for us.
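
For the record, this is the interim version we’ll use, assuming the --report output lists the new fileset indented as “  Fileset:<id>” like in your example:

$ DUPE=$(omero duplicate Fileset:123 --report | grep "  Fileset:" | tr -d ' ')
$ omero chgrp "New Group" "$DUPE"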

And yes, I can see that providing this functionality in the web is quite a non-trivial thing.

Thanks,
Damir

Hello,

Looks like @dsudar and I have similar workflows :slight_smile:
I like the OMERO.duplicate tool and see a lot of potential here. Have you considered adding a server URL as a parameter so data could be duplicated between servers? If so, could you point me to the source code so I can perhaps try to open a PR?

Thanks
Ola

Hi @olatarkowska,

The client code is in omero-cli-duplicate, but it’s currently an almost completely server-side command in omero-blitz: https://github.com/ome/omero-blitz/blob/5d27e4771af5c4c457464090edb5b450d1c11e1d/src/main/java/omero/cmd/graphs/DuplicateI.java

I think the first question here will be: are you looking for a push or a pull model, i.e. to which server are you giving the credentials for the other?

The easiest approach, I think, as things stand, would be to have a RemoteDuplicateI class (possibly a subclass of DuplicateI) which has remote versions of all the individual methods in https://github.com/ome/omero-blitz/blob/5d27e4771af5c4c457464090edb5b450d1c11e1d/src/main/java/omero/cmd/graphs/DuplicateI.java#L1034. I don’t know, though, to what extent the multiple transactions involved will cause issues.

However, I don’t know if this is the optimal solution in general. Two other options:

  • https://github.com/ome/omero-downloader uses a different strategy for loading collections of metadata & data. Combined with an “omero-uploader” this could provide a server-to-server streaming capability.
  • Work on the ZarrReader is underway. Combined with omero-ms-zarr, this should be a fairly straightforward way to move data if the original files are not necessary. (Conceivably, the data could even stay on the original server.)

~Josh

cc: @mtbc in case he has any thoughts before changing careers
cc: @ahamacher who’s interested in something similar (e.g. Transfer a project / dataset between OMERO instances)

I’m afraid that [trello] duplicate to another server is probably about as far as my thoughts got. For the original idea, the main issue I see is that the DuplicateI code writes via a Hibernate session, so that would somehow have to happen on the destination server; I’m not sure how well the update service API could substitute. Also, the managed-repository-aware parts will be a difficulty, so that the files can get written across into an appropriate place. There was much unavoidable hairiness in the implementation; altogether it’s probably the most complex thing I created at OME.

How much help omero-downloader can be depends very much on what needs to be transferred. For example, if there are millions of ROIs, its current approach of going via locally stored OME-XML may not be the way to go. It largely just reuses the metadata store/retrieve code from OMERO.server.

At least both codebases do have some associated developer documentation!
