Automated import strategies for OMERO

Continuing the discussion from Automated uploader to OMERO in python:

As @will-moore suggested, I continue the discussion in a new topic, with some more context.

The problem

I’m tasked with setting up OMERO across a rather large scope (multiple institutes), where, so far, people have chosen not to use a database for their microscopy data. To drive adoption, I’d like to make the import of data as seamless as possible, while gathering as much metadata as possible.

The solution I want to implement

In the institutes, the data are stored on a Windows network file system (Samba), usually with a research-group granularity.

An experimental form-filling strategy has already been tested here to help researchers set up tags / labels for their microscopy data. I plan on using this to generate a per-user configuration file that would also sit on the network drive (maybe on a per-project basis?).
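For illustration, such a per-user configuration file might look something like the sketch below. The schema, field names, and file location are purely hypothetical, not an existing OMERO convention:

```python
# Hypothetical per-user file, e.g. //share/grpA/alice/omero-import.yml:
#
#   omero_user: alice
#   omero_group: institute-a
#   project: cortical-organoids
#   default_tags: [confocal, live-imaging]

import yaml  # pip install pyyaml

def load_import_config(path):
    """Read a per-user (or per-project) import configuration from the share."""
    with open(path) as f:
        return yaml.safe_load(f)
```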

I want to task a bot / daemon / cron job with parsing these disks and automatically importing the data into the database. Note that I don’t want to treat those drives as DropBoxes, but to actually import the images into a ManagedRepository.

My main concern is that the import should be robust with regard to nested sub-directories on the source drive - maybe by translating anything below the group / user / project / dataset hierarchy into tags (a rough sketch of what I mean follows)?
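To make the idea concrete, here is a purely illustrative sketch, assuming a fixed group / user / project / dataset depth (all names below are hypothetical, not part of any existing tool):

```python
from pathlib import Path

HIERARCHY_DEPTH = 4  # group / user / project / dataset

def classify(root, file_path):
    """Map a file's path under `root` to the OMERO hierarchy.

    The first four directory levels are taken as group/user/project/dataset;
    anything deeper becomes a list of tag names.
    """
    parts = Path(file_path).relative_to(root).parts[:-1]  # drop the filename
    levels = list(parts[:HIERARCHY_DEPTH]) + [None] * HIERARCHY_DEPTH
    group, user, project, dataset = levels[:HIERARCHY_DEPTH]
    tags = list(parts[HIERARCHY_DEPTH:])
    return {"group": group, "user": user, "project": project,
            "dataset": dataset, "tags": tags}

classify("/mnt/smb", "/mnt/smb/grpA/alice/proj1/ds1/timepoint-2/img.tif")
# -> {'group': 'grpA', 'user': 'alice', 'project': 'proj1',
#     'dataset': 'ds1', 'tags': ['timepoint-2']}
```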

I’m quite proficient in Python - less so in bash, though I can read it - and even less in Java :slight_smile:

Are there similar use cases / code snippets out there? Any pointers?

Thanks a lot!

Guillaume

(As an intermezzo, I’ll shout out to @ehrenfeu by pointing to his 2018 talk and https://github.com/imcf/auto-tx)


Now this looks shiny :star_struck: !!

A slide says coupling with OMERO was in the pipeline?

Just a couple of thoughts / questions:

The Python import code I mentioned in the previous discussion doesn’t use Bio-Formats on the client. This is a major limitation and means the code has to know which files go together to make a Fileset.

So, if you’re using Python, you’re probably going to want to do the equivalent of omero import on the command line. The $ omero import CLI command is implemented at https://github.com/ome/omero-py/blob/f44d09e7c39d068a71a191ac979801dfed976705/src/omero/plugins/import.py
This means you’ll need the OMERO Java client jars available as described at https://docs.openmicroscopy.org/omero/5.6.1/users/cli/installation.html
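As a minimal sketch, you can drive that plugin from Python like this (assuming omero-py and the client jars are installed; authentication details such as a password prompt or session key are left out):

```python
from omero.cli import CLI

def import_path(path, host, port, user):
    cli = CLI()
    cli.loadplugins()  # registers the "import" plugin, among others
    # Equivalent to: $ omero import -s HOST -p PORT -u USER PATH
    cli.invoke(["import", "-s", host, "-p", str(port), "-u", user, path])
    return cli.rv  # non-zero on failure

import_path("/mnt/smb/grpA/alice/proj1/ds1", "omero.example.org", 4064, "alice")
```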

You can call the CLI from Python like this: https://github.com/ome/omero-py/blob/f44d09e7c39d068a71a191ac979801dfed976705/src/omero/util/import_candidates.py#L40
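For example, something like this lists which files Bio-Formats would group into Filesets without actually importing anything (it wraps the same Java code as `omero import -f`, so it also needs the client jars):

```python
from omero.util.import_candidates import as_dictionary

# Returns {fileset_key_file: [all files used by that Fileset], ...}
candidates = as_dictionary("/mnt/smb/grpA/alice/proj1")
for key_file, used_files in candidates.items():
    print(key_file, "->", len(used_files), "file(s)")
```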

I wonder how you might handle any import errors, e.g. how would you let the user know that something failed to import?

Are you typically importing large Filesets such as HCS data? Or single files?

Will.

So these are not finished products, nor do they do exactly what you want, but between my previous job and my current one we’ve built all the individual pieces of what you want:

  1. scripts that run as cron jobs to automate importing: we used the CLI importer (from Python scripts) on a Linux machine that just went through a bunch of SMB shares, imported what it found, and marked files as “imported” upon success by moving them to an “imported” folder on the same share that replicated the folder structure. https://github.com/erickmartins/omero_autoimport

  2. we regularly parse spreadsheets into key-value pairs to be added to imported data. Our import utils are at https://github.com/TheJacksonLaboratory/jax-omeroutils - we’ve been writing a lot of convenience functions around omero-py for our own work, so feel free to make use of them! (A sketch of the key-value part follows this list.)
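As a minimal sketch of the key-value part, here is one way to attach parsed pairs to an image as a MapAnnotation using plain omero-py (not necessarily how jax-omeroutils does it internally; hostnames, credentials, and IDs below are illustrative):

```python
import omero
from omero.gateway import BlitzGateway, MapAnnotationWrapper

def annotate_image(conn, image_id, kv_pairs):
    """kv_pairs: list of [key, value] string pairs, e.g. from a spreadsheet row."""
    map_ann = MapAnnotationWrapper(conn)
    # The "client" namespace makes the pairs editable in the OMERO clients.
    map_ann.setNs(omero.constants.metadata.NSCLIENTMAPANNOTATION)
    map_ann.setValue(kv_pairs)
    map_ann.save()
    image = conn.getObject("Image", image_id)
    image.linkAnnotation(map_ann)

conn = BlitzGateway("alice", "secret", host="omero.example.org", port=4064)
conn.connect()
annotate_image(conn, 123, [["organism", "mouse"], ["staining", "DAPI"]])
conn.close()
```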


Good question … I assume parsing the logs and sending emails? With @erickratamero’s solution I imagine files that fail to import are not moved to the ‘imported’ folder?

The instrument fleet is very diverse, but I am not aware of any HCS experiments (at least for now). We do have light-sheet microscopes and whole slides / histopathology, so yes to big files / filesets. I’m still prototyping, so I won’t run into performance issues just yet - is that why you ask? Does using Python imply a performance overhead?

Thanks a lot! That looks like exactly what I need to jump-start my project!


Yeah, that’s how we kept track of failures. Of course it’s not ideal, and with more time I’d have implemented log parsing and emailing (upon success or failure), but failed imports were relatively uncommon and easy to flag. I was also piping all import output to a daily log file, so after imports happened overnight I could come in in the morning, glance over the log, and see if anything looked out of place.
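If someone wanted to automate that morning glance, a hypothetical sketch along these lines could scan the nightly log and email any failures (the log path and the "ERROR" pattern are illustrative only, not what omero_autoimport actually emits):

```python
import smtplib
from email.message import EmailMessage

def notify_on_failures(log_path, sender, recipient, smtp_host="localhost"):
    """Email any error lines found in the nightly import log."""
    with open(log_path) as f:
        errors = [line for line in f if "ERROR" in line]
    if not errors:
        return
    msg = EmailMessage()
    msg["Subject"] = f"{len(errors)} import error(s) in {log_path}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("".join(errors))
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```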