# Setting up OMERO in a new lab - suggestions, stories and advice welcome

I’d say keeping everything on this thread at the moment could make for good reading for someone else trying to do the same thing. I’ve added the #correlative tag so hopefully others can find it. (“correlative-microscopy” is too long. Other suggestions?)

At the moment, OMERO doesn’t provide anything out of the box, so someone else will probably have to say if there’s a better choice out there. That being said, we’d certainly look forward both to capturing what you’d like OMERO to look like with your data, and to finding workarounds that would bridge the gap for you and others until there is a complete solution.

~Josh

2 Likes

I think since then the biggest thing for us was finding a way to get data automatically from the microscope computers - it lowered the entry barrier and solved our biggest problem at the time (adoption). Now we have a core of heavy users who love it and a large number of people who use it because they don’t need to worry about data transfer, external storage or anything else.

1 Like

That is very interesting!

What were the challenges there?
How would a workflow look for a user, from acquisition to analysis?

Cheers,
Christopher

There were a few different challenges - we’re running multiple operating systems on our microscopes, with computers that range from brand-new to, well, relatively old - there’s everything from Windows XP and CentOS 5 to up-to-date Windows 10. We’ve looked at adapting existing tools like AutoTx or the existing CLI importer, but we had nothing that would work on every single system. We settled for having an extra Ubuntu 18.04 machine that mounts data directories on the microscope computers as SMB shares inside a local network and runs the CLI importer for individual users. It’s extremely improvised and it breaks more often than I’d like, but 90% of the time it works every time.

For the user, they just need to save their data onto their folder inside a specified “Data” directory in the microscope computer - overnight, the data is imported to their user account in OMERO and (upon a successful import) moved to an “imported” directory on the microscope computer (in case they want a local copy). From there, they just need to access the OMERO server (using their university credentials) in any way they prefer.

4 Likes

Thank you!

Performing an automated import is for sure the best way to ensure proper usage.

I just wonder how useful it is to import everything directly from acquisition.
One needs some metadata from the experiment in order to make use of the data, and not all images are of useful quality… So I would be afraid of drowning in useless material, either through missing information or poor quality.

Do you have any feedback from the groups? How actively are they using the OMERO server or is this just a data graveyard?

I don’t think the “data graveyard” issue is any worse than it was before OMERO - people who aren’t doing much with their data now are the people who would just have transferred all their acquired data to a storage server somewhere and not done much with it anyway. We’ve found that even the “data graveyard” users still come back to OMERO to prepare figures for talks and publications. In addition, we have a good core of heavy users who actually care about their data management, and for them life is now much easier than it was. Sharing, annotating and publishing take a fraction of the time they used to. Of course, there’s no way to force anyone to use anything, so we still have users who will just acquire data, save it outside the specified Data folder and do whatever they used to anyway.

In general, I don’t think life is worse than it used to be for any kind of user, and it’s definitely much better for a certain subset. I’ll consider that a success!

2 Likes

Cool! Thanks a lot for the very nice discussion. I would love to push for such an image database at my place at some point too. I have to prepare the ground first, though.

1 Like

Thanks much for the information! How do you handle populating metadata? I am thinking about more study-specific data and not that which is auto-populated by the imaging software.

Many good comments, thanks for sharing everyone! We run OMERO on CentOS 7 in AWS using the dual server and webserver configuration. I struggled to install it, but it was only mildly difficult for the capable sysadmin I have access to. It’s been super stable and a big timesaver. Our images are often a few GB in size and viewing them is fast and powerful. I wish I had tried OMERO sooner. The Figure tool alone is worth the price of admission. Promote whoever designed that immediately.

Arborvitae’s question about study metadata was a big one for us. For plate-based work there are built-in annotation tools, but bulk-uploading annotations for a “Dataset” through the webclient is not currently possible. It is easy to annotate manually using the UI, but that is not sustainable with many images and/or many annotations. I tried using tables, but it turns out that the Parade tool for searching annotations can’t handle a mix of numbers and strings (numbers only for now; future versions may add this), so I switched to key-value pairs and figured out how to use the command-line interface (CLI). The CLI is bundled with the server and is very useful.

This is an example of how to make that work.

Prerequisites:

• OMERO and the metadata module. The metadata module is not the same as the Populate Metadata function, which can import metadata for a plate or a screen (many plates) but not for a Dataset at this time.

• Images that have been uploaded to OMERO. There are many ways to do this; we use the InSight client.

• A .csv with the annotations. If using Excel, save as “Comma Separated Values (.csv)”. Other formats might not work, and always use a text editor that plays nice with the unix kids in the sandbox. Image Name and Dataset Name columns are required; use the image name given by OMERO and the dataset created during image upload. This is a screenshot of what that might look like in Excel:
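As a stand-in for the screenshot, a minimal sketch of such a .csv - the Image Name and Dataset Name columns are the required ones, while the annotation columns (Treatment, Concentration) and all values here are hypothetical:

```csv
Image Name,Dataset Name,Treatment,Concentration
img_001.tif,omero_test,DMSO,0
img_002.tif,omero_test,DrugA,10
```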

• A .yml file to tell the metadata module what to do with the annotations. Shown is a screenshot in TextEdit that I adapted from a dataset in the IDR (the website mentioned earlier is worth investigating). Add as many columns as necessary, but make sure each name matches the column name (exactly) in the .csv file. Everything after the last “include: true” is required, but not doing anything in this example:
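As a stand-in for that screenshot, a rough sketch of what such a config might contain, loosely adapted from the IDR annotation configs - the column names are hypothetical and must match the .csv headers exactly, and the trailing required-but-inert section from the original screenshot is omitted here:

```yaml
# Sketch of a bulkmap config; column names must match the .csv exactly.
---
name: omero_test-bulkmap-config
version: 1

defaults:
    # columns are skipped unless explicitly included
    include: false
    type: string

columns:
  - name: Treatment
    include: true
  - name: Concentration
    include: true
```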

Both the .csv and .yml files need to be in a place that the CLI can access. It’s easy to upload files through the webclient or the InSight tool, but I struggled to link them to CLI commands, so I use sftp to move the files to the OMERO server from my laptop and make sure they are owned by the omero user.

Then, finally, these commands can be run on the server using the CLI (as the omero user with admin privileges):

```
[omero@ip home]$ ~/OMERO.server/bin/omero metadata populate Dataset:382 --cfg omero_test-bulkmap-config.yml --file omero_test_annotations.csv
[omero@ip home]$ ~/OMERO.server/bin/omero metadata populate --context bulkmap --cfg omero_test-bulkmap-config.yml Dataset:382
```

And then, if all goes well, the images are annotated with the key-value pairs from the .csv:

Now the annotations can be used to search for images with the Parade tool (another module that needs to be installed). Here, the image matching two key-value filters is shown in the middle panel:

If the annotations need to be updated, this command will delete them, and one can start over with the above commands:

```
~/OMERO.server/bin/omero delete Dataset/MapAnnotation:382
```

7 Likes

At the moment we don’t - we leave that up to the user. Now seeing the work @johnmc has done we might actually give it a shot!

1 Like

There are some nice OMERO scripts for users to add Key-Value pairs from CSV files by @evenhuis at

with user instructions at

4 Likes

This could be very handy, thanks for sharing!

Hi,

just a couple of points from our experience with OMERO at a ~100-user facility.

## Browsing and sharing

OMERO is fantastic for researchers to easily browse, organise and do basic adjustments to their images from a range of microscopes. This is great for correlative imaging, where you need to compare images from different instruments.
Make sure you install OMERO.figure. It is a great, intuitive way to prepare publication-quality figures in seconds and makes collaborating easy.

Integrating OMERO with Fiji (or setting up OMERO.grid) is great for automating analysis and avoiding having multiple copies of files scattered over computers.

The browsing capability and friction-free figure creation make it an easy sell to researchers who are reluctant to give up their USB drives.

## File hierarchy, metadata and structuring

The two levels of hierarchy (projects/datasets and screens/plates) make it challenging to handle data. Everyone is used to organising their files into a hierarchy; the median directory depth for users on our network drive is 8. Transferring this to OMERO means an exponential increase in the number of items at the project / dataset / image level, which makes organisation very difficult.

The best solution is to add metadata so sets of images can be pulled up by search. As @will-moore mentioned, there are a couple of scripts that I wrote for this that allow users to add key-value metadata from OMERO.web. Convincing users to curate metadata is a struggle, though.

In terms of organisation, we’ve been approaching it like a key-value store / NoSQL-style database rather than a traditional relational database with a schema. The reason is that getting researchers to agree on a schema is impossible, and the more detailed the schema, the lower the compliance.

We have been rather permissive and encourage research groups to come up with their own organisation conventions: basically, a set of keys that describes their research.

I plan to write some scripts to examine the keys from data in a group to:

• compare them against a list of agreed keys to check for typos
• check for consistency and emergence of new keys
• identify users that need assistance adding metadata to their images or deleting them
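The planned checks above could be sketched like this in pure Python - the agreed key list and the fuzzy-match cutoff are illustrative assumptions, not an agreed facility vocabulary:

```python
"""Sketch of the key-auditing idea: compare each user's annotation keys
against an agreed vocabulary and flag likely typos vs genuinely new keys."""
import difflib

AGREED_KEYS = {"Organism", "Treatment", "Objective", "Stain"}  # hypothetical


def audit_keys(user_keys: dict, cutoff: float = 0.8) -> dict:
    """For each user, split their keys into: agreed keys, likely typos
    (close string matches to an agreed key) and genuinely new keys."""
    report = {}
    for user, keys in user_keys.items():
        typos, new = {}, set()
        for key in keys - AGREED_KEYS:
            match = difflib.get_close_matches(key, AGREED_KEYS, n=1, cutoff=cutoff)
            if match:
                typos[key] = match[0]   # probable misspelling of an agreed key
            else:
                new.add(key)            # candidate new key to discuss with the group
        report[user] = {
            "agreed": keys & AGREED_KEYS,
            "typos": typos,
            "new": new,
        }
    return report
```

A report like this makes the three bullets above actionable: typos get corrected, new keys get discussed, and users with mostly-unrecognised keys get a visit.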

There is a spectrum of how to use OMERO, ranging between:

• OMERO stores every image captured
• OMERO is a showcase for final publication images

I would recommend sitting closer to the showcase end for a couple of reasons: the majority of images captured won’t get used, and as OMERO gets bigger, the IT expertise needed increases rapidly.

### 1. The Drake equation for images

I agree with @erickratamero that images follow something like the equation used to estimate the number of planets suitable for life. The relationship between the number of images captured in a facility and the number in publications might look something like:

N_{paper} = N_{captured} \cdot f_{suitable} \cdot f_{organised} \cdot f_{analysed} \cdot f_{prepared} \cdot f_{showcased} \cdot f_{published}

In a year, the number of images stored in OMERO or on the network share is in the low millions, and hopefully the number of published images is in the hundreds.
I think the sweet spot is for OMERO to store the images that were worth keeping organised (i.e. worth curating the metadata for).
Depending on how proficient your users are at microscopy, and how diligent they are at metadata curation, this probably reduces the number of files by 100×. We have been exploring having users import into a ‘last import’ directory and then use OMERO like a light table to review, cull and organise the images. It would be great to have a client-side version of OMERO.web for this, to avoid the churn of images on the server.

### 2. Complexity of the server setup

OMERO is a monolithic server, so the more demands you put on it, the more complicated the setup gets. If you upload every image in a facility, you soon run into network transfer, server-side processing and file storage issues, which require more system administration skill to handle.
There are many moving parts involved, so when something goes wrong and chokes the server, there are a lot of interactions to debug. For example, we have had problems from:
• corrupt files
• file types not being handled correctly by Bio-Formats
• issues in OMERO itself
• network latency between the servers
• network traffic between the servers being filtered by a firewall
• file system timeouts due to load on the filesystem
• unbalanced settings such as memory and threads in the JVM
You can soon end up in an IT service-ticket roundabout, with each department pointing the finger at the other.

Hope it helps,

Chris

2 Likes

Do you have an architecture diagram of how you scale to a big facility like yours?

Thanks,
T

Hi Trefex,

I’m not a sys admin so I’m not sure if this is what you’re after.

• The 3 servers are virtualised through VMware (I think). This allows the resources to be dynamically scaled.
• The storage is on an Isilon network share (NFS)

Cheers,

Chris

2 Likes

This is fantastic, thank you. We actually have the same type of firewalls and Isilon too, so I think we are on the right path.

No worries. Let me know if you’d like me to chase up more detailed specs or the Ansible playbooks that we use.

Again, I’m not a sys admin, so this is just my uninformed opinion… but I think things would have been smoother if the connections between the app server, database server and storage were more direct.

I suspect OMERO was written for the components being on the same rack rather than having the connection cross firewalls and data centers.

As with most things, it’s complicated.
If your underlying infrastructure is set up well, then within a datacentre there shouldn’t be too much impact, but the only way to know is to benchmark your OMERO performance. It will depend heavily on your OMERO usage.

For example, if you have very complex metadata, you may be bound by PostgreSQL CPU/memory (for the underlying query) or OMERO CPU/memory (when it converts the PostgreSQL result into OMERO objects) rather than network IO. Tuning the PostgreSQL configuration instead of using the defaults may also improve performance. If you have very large files, network IO or storage array performance might be the limiting factor. If you have a lot of very small files, the optimisations required for the storage array can be the opposite of those required for large files.
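To illustrate the kind of PostgreSQL tuning meant here, a sketch of the `postgresql.conf` knobs people usually start with - the values below are placeholders, not recommendations; the right settings depend entirely on your hardware and workload:

```ini
# Illustrative postgresql.conf tuning; values are placeholders only.
shared_buffers = 4GB            # the 128MB default is small for a busy server
effective_cache_size = 12GB     # hint to the planner about available OS cache
work_mem = 64MB                 # per-sort/hash memory for complex queries
maintenance_work_mem = 512MB    # speeds up VACUUM and index builds
```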

The only way to be sure is to benchmark everything!

3 Likes

@evenhuis I’d be curious to hear more about how your facility uses OMERO and Fiji together. Do you use ImageJ-OMERO?

2 Likes

Hi Curtis,

The broader goal of deploying OMERO is to make it easier to collaborate on, search and reproduce our scientific work. The dream would be to be able to do things like:

• given a figure in a paper, where is the original file, how was it processed, what experiment is it from, and is there a lab-book entry or protocol?
• given a raw data file, what analyses has it been used in?
• what has this script been used to process, and was it used in any publications?

Here’s a brief outline of what we’re doing and what we’d like to do:

Cheers,

Chris

### Current use

We are getting users to store and catalogue their important raw image files in OMERO and to either

• download a copy to their local machine and process it in ImageJ
• open it directly with the Fiji-OMERO plugin

By doing this, the local machine is just a temporary cache for raw image files. With the excellent OMERO.figure by @will-moore, a large fraction of users can produce montages etc. in OMERO without needing to open Fiji.

For analysis, though, there is no substitute for ImageJ. What we are currently working on is:

• develop and tune the image processing steps into a script
• save dataset specific parameters to a settings file

Once this is set up, then:

• the data is processed locally with Fiji
• the output images are uploaded to OMERO
• the script (or a link to the git version) and the settings files are added as annotations in OMERO
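The last step above - attaching the script reference and analysis settings to the output image - could be sketched with omero-py (`pip install omero-py`) like this. The namespace, image ID and settings keys are hypothetical; only the pure `settings_to_pairs` helper runs without a server:

```python
"""Sketch of attaching analysis provenance (script link + settings) to an
OMERO image as a key-value (Map) annotation, assuming omero-py is available.
Server details, namespace and IDs are placeholders."""


def settings_to_pairs(settings: dict) -> list:
    """Flatten a settings dict into the [key, value] string pairs that a
    MapAnnotation (key-value pairs) expects."""
    return [[str(k), str(v)] for k, v in sorted(settings.items())]


def annotate_output(conn, image_id: int, script_url: str, settings: dict):
    """Link a key-value annotation (script link + analysis settings) to an
    image. `conn` is a connected omero.gateway.BlitzGateway."""
    from omero.gateway import MapAnnotationWrapper  # needs omero-py installed

    image = conn.getObject("Image", image_id)
    pairs = [["script", script_url]] + settings_to_pairs(settings)
    map_ann = MapAnnotationWrapper(conn)
    map_ann.setNs("example.lab/analysis-provenance")  # hypothetical namespace
    map_ann.setValue(pairs)
    map_ann.save()
    image.linkAnnotation(map_ann)
```

Storing the script as a link plus a flat key-value settings dump keeps the provenance searchable with the same key-value tools (Parade, the CLI) discussed earlier in the thread.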

### Near future

Some things we are currently working on or planning to do: