Building an image database with bioimage informatics support for an individual experimental biology lab

Hello image.sc forum community,

I apologize in advance if the information I am looking for is already posted somewhere. I have been searching and reading the OME website and image.sc forum for many days, and what I have found makes me think I am going to have to build some things. While I enjoy trying to make wheels rounder than anyone else has, I would happily use established solutions here if there are any.

As the title indicates, I am trying to build an image database with bioimage informatics support for an individual lab. I already have a handbuilt system in place to deal with one of the bread-and-butter techniques in the lab, which generates widefield microscopy movies of a genetically encoded fluorescent protein sensor. During my PhD I taught myself computer programming and built this lab bioimage informatics system using Fiji and Matlab programs. The system organizes, processes, and analyzes the movies in a standardized and semi-automated manner; it has supported multiple publications and collaborations and is currently used by multiple people in the lab. With a couple of months before I leave the lab, I would really like to port this system into a better-developed and better-supported solution that will continue to serve the lab well after I leave. I would like to make it as easy as possible for future lab members to add new sensors and experiments for widefield studies, bring confocal data into the image database, and develop new processing and analysis workflows to extract biological insight from the imaging data.

OMERO seems like a beautiful solution, and I have played around with the demo server a little and it seems like it could do the job. My problem is that OMERO servers seem to require significant IT support, and all the mentions of OMERO servers I have found seem to be set up and managed at an institutional or microscopy-core level. My previous attempts to explore research IT support at our institution have been frustrating, and our general IT here is consistently underwhelming, so I’m not sure I would trust their support in setting up, maintaining, and/or administering an OMERO server. The shared microscopy facilities on campus often have trouble just training new users and keeping microscopes maintained, so I don’t think trying to add an OMERO server to their workload is viable. Besides that, one of the primary data generators for the lab is a microscope that is owned and maintained by the lab, so it is not associated with a microscopy core. Looking at setting up an OMERO server on my own, I find the prospect intimidating and potentially over my head, coming from an experimental biology background with a mostly self-taught computational mindset and programming skills. I imagine I could get through the setup eventually, but there is potential for it to take more time than I have. Additionally, I am the computational guy in the experimental biology lab and there is no replacement for me lined up; I would hate to leave behind a system that could need detailed computational / IT troubleshooting or reconfiguration with hardware / software changes, new types of experimental data, or new analyses. From playing with the demo server it seems like OMERO would be user friendly enough for non-computational experimental biologists to pick up with a small learning curve, and the OMERO.applications seem like they would be very powerful and helpful, but the concerns in this paragraph seem to make OMERO not a viable option.

Another option I see is to bring the images in line with the OME data model, either saving the movies as OME-TIFF files that carry the metadata with them, or leaving the movies as raw TIFF files and storing the OME-XML metadata separately, using a lab-unique identifier for each TIFF file to connect it with its metadata. My current system leverages a lab GitHub repository to generate a lab-unique identifier for each movie entered into the image database, and stores the acquisition, experiment, processing, and analysis metadata in text files in this lab GitHub repo. The movie files themselves are centrally stored in a RAID housed in a Synology Diskstation in the lab, which is only accessible on the lab wireless network or via hardwire. Movie files also live on individual lab and personal computers, where they are processed and analyzed, and the raw files are uploaded to AWS Glacier for permanent archiving. I see that local databases were discussed in some of the earlier OME papers, but nowadays OMERO seems to be the only option. Have I missed instances of groups using OME for local image databases, or does everyone just use OMERO?
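
As a concrete example of the first option, I imagine the Bio-Formats command-line tools could handle the conversion and let me check what metadata actually made it in. This is just a sketch; the file names are made up and I have not tested it on our data yet:

# convert a raw TIFF movie to OME-TIFF so the metadata travels with the pixels
bfconvert 2019-06-12_sensorX_movie.tif LABID-0042.ome.tif
# print the embedded OME-XML (without reading the pixel data) to check what was captured
showinf -nopix -omexml LABID-0042.ome.tif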

My thought is that a halfway approach of bringing the movies in line with the OME data model, without going all the way to OMERO, would facilitate processing and analysis operations on the movies using multiple programs. Fiji / ImageJ is used extensively in the lab, Matlab is used to some degree, and I’m trying to introduce KNIME before I leave, to provide a graphical programming interface and powerful computational tools and workflows for experimental biologists who may be intimidated by the thought of working in computer code. The Fiji / ImageJ and Matlab workflows established in the lab can be extended with some effort, but this is not straightforward even for me, the person who single-handedly built the system from the ground up. Incorporating new sensors, experiments, or analyses, or sharing processing and analysis workflows with collaborators, is quite difficult, if it’s even possible at all. It seems the OME data model was developed partly to deal with these very issues, and so I am curious about ways this could be implemented and set up to maximally benefit my PI and future lab members.

I think this is enough to get things started; I am happy to provide any further information that would be helpful. I look forward to thoughts or suggestions from the community about this situation, and any insights about how to proceed. Thanks for reading!

  • Jeff

Hi,
If your problem is setting up an OMERO server, you can get a server running very easily using Docker, at least on Linux. If you have Docker installed and an internet connection, you can just run:

# named volumes for the PostgreSQL database and the OMERO data repository
docker volume create --name omero-db
docker volume create --name omero-data
# PostgreSQL database backend
docker run -d --name postgres -e POSTGRES_PASSWORD=postgres -v omero-db:/var/lib/postgresql/data postgres
# OMERO.server, linked to the database and exposing the client ports 4063/4064
docker run -d --name omero-server --link postgres:db -e CONFIG_omero_db_user=postgres -e CONFIG_omero_db_pass=postgres -e CONFIG_omero_db_name=postgres -e ROOTPASS=omero-root-password -v omero-data:/OMERO -p 4063:4063 -p 4064:4064 openmicroscopy/omero-server:latest
# OMERO.web client, served on port 4080
docker run -d --name omero-web --link omero-server:omero -p 4080:4080 openmicroscopy/omero-web-standalone:latest
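
Once the containers are up, a quick check could look like this (a sketch; I have not verified every detail of the login flow here):

# all three containers (postgres, omero-server, omero-web) should be listed as running
docker ps
# the web client from the port mapping above should then be reachable at
# http://localhost:4080 (log in as "root" with the ROOTPASS value set above)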

To run it securely on a public server, you might still need an IT expert though…
Best regards,
Volker


@volker’s definitely right that this would be an easy way to get an OMERO server up and running. (https://github.com/ome/omero-deployment-examples contains similar snippets if anyone is interested.) In addition to @volker’s caveat about the security of the system (which requires regular software patching, etc.), however, I’d add that securing the data volumes (omero-db and omero-data) via backups, resilient filesystems, etc. is also vital. Judging by the above, @JeffB’s analysis that:

From playing with the demo server it seems like OMERO would be user friendly enough for non-computational experimental biologists to pick up with a small learning curve, and the OMERO.applications seem like they would be very powerful and helpful, but the concerns in this paragraph seem to make OMERO not a viable option.

might still hold, i.e. an investment in personnel is needed to manage a data management server.

Hi all,
Maybe this has already been discussed, but is there an estimate of the workload associated with server maintenance (once in production)? Barring any plugin development, the pure sysadmin work?

Hi @glyg

In my experience this question is not easy to answer. Depending on the requirements, server maintenance can be very complex.

A good comparison is the job of a caretaker:
If the house is to be in perfect condition and every request for alterations is to be implemented immediately, then this is a full-time job. However, if a broken window or a squeaking door does not bother you, then it is sufficient for the caretaker to check every day whether the roof is still tight and the entrance door is working.
In that case, maintaining a production server takes only a few minutes every day. But you must always be prepared to sacrifice a few hours in case of system trouble.

Regards,
Susanne

Thank you for the evocative answer, I think I get the idea :smiley:

Further, it helps to have a general idea of how OMERO works and is installed, so that when something does happen, one has a clue where to look and what to do. Using things like the Docker images is good because your setup will then at least be a known quantity that matches everyone else’s, so others can more easily help.

For end users, OME has a good amount of training material to help facilities get biologists up to speed with OMERO, and we do outreach. Also, institutions have the option of outsourcing OMERO server support to Glencoe Software: what it costs in money it may well save in local labor hours and downtime.

Without OMERO, making files (including their important metadata) readable by Bio-Formats is very enabling in itself; you’re absolutely right about that. If you go for separate metadata, note that OME companion files let you usefully place the OME-XML in a file alongside the pixel data in the TIFF.
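
For example, the Bio-Formats command-line tools can help you check what you end up with either way (a rough sketch with made-up file names):

# print the OME-XML block embedded in an OME-TIFF's header comment
tiffcomment movie.ome.tif
# check that a standalone companion file is valid OME-XML
xmlvalid movie.companion.ome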

Thank you for the replies and discussion; this is helpful! This seems like a popular topic, so I will keep posting here as I work through this.

@volker Thanks for the suggestion. Unfortunately we do not have any machines running Linux in the lab, and although I am theoretically familiar with Docker, I have never actually worked with it (or Linux, for that matter), and no one else in the lab has either, to my knowledge. It’s great fun that my background shows through so clearly :frowning_face:

The security of the system is certainly a concern. The person who introduced the Synology Diskstation to the lab had more experience in these matters and configured it for remote access. After he left, we noticed that the computer enabling the remote access was running so slowly that it was almost unusable, and it took me around two weeks of researching and troubleshooting to find that mystery IP addresses from the other side of the globe were constantly trying to brute-force the password. I disabled the remote access, and that fixed the problem. Re-enabling remote access with better security looked like it would take more time and effort than it would be worth, so we simply made the Diskstation accessible only on the lab wireless network or via hardwire, and this solution has been working well.

@joshmoore Thank you for your input and the link. I completely agree that a dedicated person to set up, manage, and administer a data management server is the ideal solution. Unfortunately, as a PhD student transitioning to a postdoc, this is not something I have any control over. We’re not a super large lab, so there wouldn’t be enough work to justify hiring a dedicated person for the lab. I hear that it can be a struggle to get institutional support or external funding for personnel in these areas, so I think I’m left with doing the best I can and making the most of the situation I find myself in.

@glyg That was a helpful question, and @sukunis, a helpful answer, so thank you both.

@mtbc Thanks for mentioning the option of outsourcing OMERO server support; I was not aware of that. We do have a new imaging facility that is in the process of coming online on campus, and I have an inquiry in with them to learn about any data management plans they may have, so if outsourcing OMERO support seems like it could be an option for them I will pass the info along.

From the replies and discussion here so far, I am concluding that an OMERO server is not feasible for an individual experimental biology lab unless there is someone in the lab with an IT background. Further discussion about this is of course welcome, but with my limited time I think it is best to focus on bringing my current system in line with the OME data model. I imagine I will have more questions, so I will post them as I encounter them, along with my progress, in case it is helpful for others.

Thanks!

  • Jeff

Hi everyone.

I took a small detour to see if I could find other potential image database solutions that may be accessible to individual microscopy labs with minimal IT support. Below are my thoughts on what I encountered; I hope they add to the conversation.

First, to avoid any ambiguity, what I mean by an image database is a standardized organization of image files and metadata. This organization could be instantiated in a centralized location, i.e. a server or a single lab computer with a lot of storage, or distributed over multiple computers used by individual researchers, or some combination. The purpose of establishing an image database is to make it easy and efficient for human beings and computer programs to find image files and metadata on demand.
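
To make that concrete, the kind of layout I am picturing looks something like the sketch below, with the lab-unique identifier tying the pixel data to its metadata (all names here are hypothetical):

# on the Diskstation: pixel data, one folder per lab-unique identifier
/volume1/movies/LABID-0042/LABID-0042.tif
/volume1/movies/LABID-0042/LABID-0042.companion.ome
# in the lab GitHub repo: acquisition / experiment / processing / analysis notes keyed to the same identifier
metadata/LABID-0042/acquisition.txt
metadata/LABID-0042/analysis.txt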

I started this thread with OMERO because it is the most popular image database solution I found, but sadly it does not seem feasible for individual labs with minimal IT support.

Another potential solution I found is BisQue (https://bioimage.ucsb.edu/bisque). Strangely, I can’t find any information about BisQue on this forum. One thing I am unsure about from my cursory examination is how easy it would be to find, load, process, analyze, and save output files programmatically. This is a big concern, because using programs like Fiji, Matlab, and KNIME to automate workflows brings huge benefits in terms of research efficiency and reproducibility, and I’ll build my own solution before I implement an image database where the files cannot be dealt with programmatically. Does anyone have experience working with BisQue, or thoughts on this matter?

Looking at other popular image analysis programs, specifically CellProfiler / CellProfiler Analyst and Icy, my cursory impression is that they are not concerned with setting up image databases, although they have really nice functionality for using image databases once they are set up. Does anyone have experience working with image databases and these programs?

Are there other solutions for setting up an image database that I missed? If anyone has other thoughts on this effort please feel free to share.

Thanks!

  • Jeff