Same image file in multiple OMERO groups

Hi OME team,
A few scientific projects at OHSU want to use the exact same images, each for their own purposes, but they do NOT want to share annotations or anything else that gets attached to those images (due to PHI concerns). Ideally, they would also like NOT to have to create multiple copies of the large image files. Is there a way to create separate Image objects (each in its own Group with its own annotations, etc.) that are all connected to the same file in the ManagedRepository? I’m imagining some kind of hard-link arrangement in the ManagedRepository, so that if one of the Image objects is deleted, the file remains on disk without trouble.
Thanks,
Damir

Damir,
I am not a part of the OME team, but I think I know the answer. You want to use “in-place” importing.

I recommend the soft-linking strategy, which we use a ton. But hard linking would work just fine if you aren’t linking across file systems.
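
For anyone unsure what the hard-link behaviour asked about above looks like at the filesystem level, here is a minimal Python sketch. It does not touch OMERO at all; the file names are made up, and it only illustrates two generic facts: a hard-linked file survives deletion of one of its names, and both names have to live on the same filesystem (compared here via `st_dev`).

```python
import os
import tempfile


def same_filesystem(path_a, path_b):
    """Hard links can only be created between paths on the same device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def demo_hard_link():
    """Create a file, hard-link it, delete the first name, read the second."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.tiff")    # hypothetical name
        second = os.path.join(tmp, "second_name.tiff")   # hypothetical name
        with open(original, "wb") as fh:
            fh.write(b"pixel data")

        os.link(original, second)           # second name for the same bytes
        links_before = os.stat(original).st_nlink

        os.remove(original)                 # removes one name, not the data
        with open(second, "rb") as fh:
            survived = fh.read() == b"pixel data"
        return links_before, survived


if __name__ == "__main__":
    print(demo_hard_link())
```

A soft link behaves differently: if the file it points to is deleted, the link dangles, which is one reason to think about which copy is the "original" before choosing a strategy.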

Hi Dave,
Thanks, yes, I had been thinking about that option but wasn’t entirely sure how to implement the entire workflow. So I guess I would do a regular import first, so that the image is in the ManagedRepository, and then do an in-place import with hard links from that original import to get the second copy into the second Group, and so on. The specifics of that second step aren’t obvious to me.

I did meanwhile find the omero-cli-duplicate plugin and will see if I can subvert that to do what I need.
Thanks,
Damir

In general, I think the “reimport in-place” workflow has some merit, and I think I’ve suggested it elsewhere on image.sc. There isn’t, however, any formalized version of it yet.

However, before he left, @mtbc put significant work into making omero duplicate also handle binary data. See “Release of OMERO.server 5.6.3” for more information. I do think it should do what you want.

~Josh

@dsudar: In addition to what @joshmoore wrote: omero duplicate lets you duplicate the images with or without annotations, as you wish. Also, as far as the images are concerned, omero duplicate attempts to create a hard link in the ManagedRepo whenever possible instead of doubling the storage needed. Of course, after the duplication, you still have to move the images into the desired group.

Best
Petr

Hi Josh and Petr,
Yes, it indeed looks like omero-cli-duplicate will already do exactly what I need. I just found out about it last night. One quick new feature request after reading the docs in omero-guides: can there be an option to provide the destination Group for the duplicate as part of the duplicate process?
Thanks,
Damir

Hi @dsudar,

“PRs welcome”? :smile: More seriously, at the CLI level, I’m hesitant to have each command learn how to run other tasks (chgrp, chmod) since they can be chained together fairly easily. That being said, the duplicate command’s output is currently not ideal:

$ omero duplicate Fileset:123
omero.cmd.Duplicate Fileset:123 ok

You can pass --report and capture the value you’re interested in:

$ DUPE=$(omero duplicate Fileset:123 --report | grep "  Fileset:")

but it’s not ideal. I opened https://github.com/ome/omero-cli-duplicate/pull/18 to allow the likes of:

$ DUPE=$(omero duplicate Fileset:123)
$ omero chgrp "New Group" $DUPE

when --report is not passed. We will need to review all of the CLI commands to make sure that “Class:ID on stdout” is a standard contract like it is with omero obj, omero import, etc.
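
For those who would rather drive the same two steps from Python than from the shell, here is a rough sketch of the chaining above. The only thing it assumes about the `--report` output is what the `grep "  Fileset:"` implies, namely that some line consists of indented `Fileset:<id>` text; the sample string in the comment is hypothetical, not real CLI output, and the helper names are mine.

```python
import subprocess


def extract_target(report_text, class_name="Fileset"):
    """Return the first line that starts with 'Class:' once stripped, or None.

    The status line 'omero.cmd.Duplicate Fileset:123 ok' is skipped because
    it does not start with the class name.
    """
    prefix = class_name + ":"
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            return stripped
    return None


def chgrp_command(group, target):
    """Build the argument list for: omero chgrp "<group>" <target>."""
    return ["omero", "chgrp", group, target]


def duplicate_and_move(source, group):
    """Run both CLI steps; needs a logged-in `omero` CLI on the PATH."""
    report = subprocess.run(
        ["omero", "duplicate", source, "--report"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    target = extract_target(report)
    if target is None:
        raise RuntimeError("no Fileset line found in the duplicate report")
    subprocess.run(chgrp_command(group, target), check=True)
    return target


# Hypothetical report text, for illustration only:
#   extract_target("omero.cmd.Duplicate Fileset:123 ok\n  Fileset:456\n")
```

Calling `duplicate_and_move("Fileset:123", "New Group")` would then correspond to the two shell lines above, with the same caveat that the parsing is a workaround.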

In the web, it’s a different story since piping isn’t possible. We began scoping this work, but there were some concerns about the overall scaling of these long-running tasks in the web, which is something we need to consider first.

All the best,
~Josh

Hi Josh,

Yes, coming from a Unix background, I completely agree with that sentiment.

But the temporary workaround until your PR makes it through is perfectly fine for us.

And yes, I see that providing this functionality on web is quite a non-trivial thing.

Thanks,
Damir

Hello,

Looks like @dsudar and I have similar workflows :slight_smile:
I like the OMERO.duplicate tool and see a lot of potential here. Did you consider adding the server URL as a parameter, to duplicate data between servers? If so, could you point me to the source code so I can try to open a PR?

Thanks
Ola

Hi @olatarkowska,

The client-code is in omero-cli-duplicate but it’s currently an almost completely server-side command in omero-blitz: https://github.com/ome/omero-blitz/blob/5d27e4771af5c4c457464090edb5b450d1c11e1d/src/main/java/omero/cmd/graphs/DuplicateI.java

I think the first question here will be: are you looking for a push or a pull model? That is, which server would be given credentials for the other?

The easiest, I think, as things stand would be to have a RemoteDuplicateI class (possibly a subclass of DuplicateI) which has remote versions of all the individual methods in https://github.com/ome/omero-blitz/blob/5d27e4771af5c4c457464090edb5b450d1c11e1d/src/main/java/omero/cmd/graphs/DuplicateI.java#L1034. I don’t know, though, to what extent the multiple transactions involved will cause issues.

However, I don’t know if this is the optimal solution in general. Two other options:

  • https://github.com/ome/omero-downloader uses a different strategy for loading collections of metadata & data. Combined with an “omero-uploader” this could provide a server-to-server streaming capability.
  • Work on the ZarrReader is underway. Combined with omero-ms-zarr, this should be a fairly straight-forward way to move data if the original files are not necessary. (Conceivably, the data could even stay on the original server.)

~Josh

cc: @mtbc in case he had any thoughts before changing careers
cc: @ahamacher who’s interested in something similar (e.g. Transfer a project / dataset between OMERO instances)

I’m afraid that [trello] duplicate to another server is probably about where my thoughts got to. For the original thought, the main issue I see is that the DuplicateI code writes to a Hibernate session, so that would somehow have to happen on the destination server; I’m not sure how well the update service API could substitute. The managed-repository-aware parts will also be a difficulty, since the files need to be written across to an appropriate place. There was much unavoidable hairiness in the implementation; altogether it’s probably the most complex thing I created at OME.

How much help omero-downloader can be depends very much on what needs to be transferred. For example, if there are millions of ROIs, its current approach via locally stored OME-XML may not be the way to go. It’s largely just reusing the metadata store/retrieve code from OMERO.server.

At least both codebases do have some associated developer documentation!

A bit delayed, but is there a server-side script available, from the OME team or elsewhere, that calls omero-cli-duplicate via the web interface? Until now we have used the duplicate function on the command line, but we now have more users requesting the feature who need some kind of web GUI to duplicate data across groups. I was wondering whether this could be implemented fairly easily as a server-side script that is available to all OMERO users.

Regards, Anna

Hi @ahamacher,

It doesn’t use the CLI (though that’s possible) but this is a rough outline of invoking omero.cmd.Duplicate via a script:

#!/usr/bin/env python

import omero.scripts as scripts
from omero.gateway import BlitzGateway
from omero.cmd import Duplicate
from omero.rtypes import rstring, rlist, rlong, robject
from collections import defaultdict


def duplicate(conn, script_params):
    dtype = script_params['Data_Type']
    ids = script_params['IDs']
    targets = defaultdict(list)
    for obj in conn.getObjects(dtype, ids):
        targets[dtype].append(obj.id)

    cmd = Duplicate()
    cmd.targetObjects = targets
    cb = conn.c.submit(cmd)
    return cb


def run_script():
    """The main entry point of the script."""
    data_types = [rstring('Image'), rstring('Dataset')]

    client = scripts.client(
        'dupe.py',
        """
Duplicate objects using omero.cmd.Duplicate
    """,

        scripts.String(
            "Data_Type", optional=False, grouping="1",
            description="The data you want to work with.", values=data_types,
            default="Image"),

        scripts.List(
            "IDs", optional=False, grouping="2",
            description="List of Image or Dataset IDs").ofType(rlong(0)),

        version="0.1.0",
        authors=["OME Team"],
        institutions=["University of Dundee"],
        contact="ome-users@lists.openmicroscopy.org.uk",
    )

    try:
        conn = BlitzGateway(client_obj=client)
        script_params = client.getInputs(unwrap=True)
        cb = duplicate(conn, script_params)

        objects = []
        for dtype, ids in cb.getResponse().duplicates.items():
            if "." in dtype:
                dtype = dtype[dtype.rindex(".")+1:]
            objects.extend(conn.getObjects(dtype, ids))

        message = ""
        if len(objects) == 0:
            message = ("Found no %ss with IDs: %s" % (script_params['Data_Type'], script_params['IDs']))
        else:
            client.setOutput("Target", rlist([robject(obj._obj) for obj in objects]))
        client.setOutput("Message", rstring(message))

    finally:
        client.closeSession()


if __name__ == "__main__":
    run_script()

~Josh

@joshmoore: Your response time is amazing! Even on Friday afternoons! The script works like a charm for Images. Unfortunately, it throws the following message every time it is executed:

Traceback (most recent call last):
  File "./script", line 71, in <module>
    run_script()
  File "./script", line 57, in run_script
    objects.extend(conn.getObjects(dtype, ids))
  File "/OMERO/server_venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 3307, in getObjects
    obj_type, ids, params, attributes, opts)
  File "/OMERO/server_venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 3350, in buildQuery
    "E.g. use 'Image' etc" % obj_type)
KeyError: "obj_type of ObjectiveSettings not supported by getOjbects(). E.g. use 'Image' etc"

In addition, the duplicate command does not work for Datasets (same message, but basically nothing is created). I will look into this in more detail next week, but if you have a hint straight away, I’m happy to hear it. And just to be sure: does omero.cmd.Duplicate create only a hard link, or does it replicate the underlying data?

Thank you very much and have a wonderful weekend
Anna

:heart: It was a nice distraction to do some coding!

For what it’s worth, I only tested on Datasets.

Interesting typo…

You updated the script for “ObjectiveSettings”? If that’s what you are trying to do, you’ll likely need to replace the use of conn.getObjects() with lower-level methods (conn.c.sf.getQueryService()) or not worry about looking up the objects.

It tries very hard to hardlink, yes.

You’re welcome, and ditto.
~J.

Hi Josh,

I have now had some more time to look into the topic:

No, I actually didn’t change anything. I tried to understand what this “ObjectiveSettings” parameter is supposed to be, but I have no clue, to be honest. Doing some more tests today, I realized that this error appears every time I call the script, not only for Datasets but also for Images.

The good news is that duplicating a dataset worked several times today. I have no clue why the duplication didn’t work on Friday, but I’m really happy that it does now.

So for me there is only one open question left: how can I get rid of the error message with the “ObjectiveSettings” without changing the functionality of the script? Any hints are much appreciated - as usual :slight_smile:

Thanks, Anna

:+1:

wow. I’m kinda stumped. I have no idea where that is coming from. You’re launching the script from the UI, right? Does that show “ObjectiveSettings” anywhere? What happens if you launch from the CLI? E.g.:

omero script launch /duplicate.py Data_Type=Dataset IDs=1390

where /duplicate.py is the path of your scripts under lib/scripts.

~Josh

Yes, I start the script via the OMERO.web UI. Unfortunately, I cannot see this parameter anywhere. I was already wondering whether this might be file-related and tried different file formats and source microscopes. Interestingly, with an EM image I got another error referring to the same source code but to a different parameter:

Traceback (most recent call last):
  File "./script", line 71, in <module>
    run_script()
  File "./script", line 57, in run_script
    objects.extend(conn.getObjects(dtype, ids))
  File "/OMERO.venv/server_venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 3307, in getObjects
    obj_type, ids, params, attributes, opts)
  File "/OMERO.venv/server_venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 3350, in buildQuery
    "E.g. use 'Image' etc" % obj_type)
KeyError: "obj_type of QuantumDef not supported by getOjbects(). E.g. use 'Image' etc"

And executing it on the command line as you suggested, I got yet another one, again referring to the same source code:

Job 9239 ready
Waiting....
Callback received: FINISHED

        *** start stderr (id=9204)***
        * b'Traceback (most recent call last):\n  File "./script", line 71, in <module>\n    run_script()\n  File "./script", line 57, in run_script\n    objects.extend(conn.getObjects(dtype, ids))\n  File "/OMERO.venv/server_venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 3307, in getObjects\n    obj_type, ids, params, attributes, opts)\n  File "/OMERO.venv/server_venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 3350, in buildQuery\n    "E.g. use \'Image\' etc" % obj_type)\nKeyError: "obj_type of FilesetJobLink not supported by getOjbects(). E.g. use \'Image\' etc"\n'
        *** end stderr ***


        *** out parameters ***
        ***  done ***

It is really strange and in both cases the data is duplicated… If I can provide you any more details, please let me know.

Anna

Ah, ok. I see. The problem is:

            objects.extend(conn.getObjects(dtype, ids))

it’s listing every object that got duplicated. You will need to simplify it to choose only the types of objects (e.g. just images) you want to see in the response (or remove this part completely).
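
One possible way to do that filtering, as a minimal sketch: keep only the entries of the response whose class is one the script can look up. It assumes the keys of `duplicates` may be package-qualified class names (which is what the `rindex(".")` handling in the script suggests); the sample dictionary in the comment is made up, using the class names from the tracebacks, and the helper names are mine.

```python
def short_name(class_name):
    """Strip any package prefix: 'a.b.Image' becomes 'Image'."""
    return class_name.rsplit(".", 1)[-1]


def pick_duplicates(duplicates, wanted=("Image", "Dataset")):
    """Keep only the duplicated IDs whose type is in `wanted`.

    `duplicates` maps class names to lists of IDs; everything else
    (ObjectiveSettings, QuantumDef, FilesetJobLink, ...) is dropped.
    """
    picked = {}
    for class_name, ids in duplicates.items():
        name = short_name(class_name)
        if name in wanted:
            picked.setdefault(name, []).extend(ids)
    return picked


# Hypothetical response content, for illustration only:
#   pick_duplicates({"ome.model.core.Image": [11, 12],
#                    "ome.model.acquisition.ObjectiveSettings": [5]})
# In the script, the loop would then iterate over
#   pick_duplicates(cb.getResponse().duplicates).items()
```

Restricting `wanted` to types that getObjects() knows about should avoid the KeyError while still reporting the new Images and Datasets.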

~Josh

Thanks Josh. I removed that part and modified the output message, so that in case of success only the new image and dataset IDs are posted in the Activities log. It is probably now not the most elegant code, but will make many people happy in our facility.

Still I have to bother you with one more thing. In some cases the script runs into a timeout (even without my modifications):

Traceback (most recent call last):
  File "./script", line 76, in <module>
    run_script()
  File "./script", line 51, in run_script
    cb = duplicate(conn, script_params)
  File "./script", line 19, in duplicate
    cb = conn.c.submit(cmd)
  File "/OMERO.venv/server_venv3/lib64/python3.6/site-packages/omero/clients.py", line 1001, in submit
    closehandle=True)
  File "/OMERO.venv/server_venv3/lib64/python3.6/site-packages/omero/clients.py", line 1020, in waitOnCmd
    callback.loop(loops, ms)  # Throw LockTimeout
  File "/OMERO.venv/server_venv3/lib64/python3.6/site-packages/omero/callbacks.py", line 260, in loop
    5000, int(waited))
omero.LockTimeout: exception ::omero::LockTimeout
{
    serverStackTrace = None
    serverExceptionClass = None
    message = Command unfinished after 5.0 seconds
    backOff = 5000
    seconds = 5
}

Is there an easy way to increase the waiting time?

Thanks, Anna
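
A possible approach to the timeout, sketched under an assumption: the traceback shows submit() handing `loops` and `ms` to waitOnCmd(), and the message "Command unfinished after 5.0 seconds" matches 10 loops of 500 ms, so this sketch assumes `conn.c.submit()` accepts `loops` and `ms` as keyword arguments. That should be checked against the installed omero-py before relying on it; the helper names are hypothetical.

```python
import math


def loops_for(timeout_seconds, ms=500):
    """Number of polling loops of `ms` milliseconds covering the timeout."""
    return max(1, math.ceil(timeout_seconds * 1000 / ms))


def submit_with_timeout(client, cmd, timeout_seconds=300, ms=500):
    """Submit `cmd` and wait up to roughly `timeout_seconds` for it.

    Assumption: client.submit() takes `loops` and `ms` keyword arguments,
    as the waitOnCmd(...) frames in the traceback suggest.
    """
    return client.submit(cmd, loops=loops_for(timeout_seconds, ms), ms=ms)


# In the script, the call in duplicate() would become something like:
#   cb = submit_with_timeout(conn.c, cmd, timeout_seconds=300)
```

Five minutes is an arbitrary choice here; large Datasets may need more, and the Activities panel will show the script as running for that long.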