OMERO stopping occasionally since the update to 5.6

Hi all

Since I performed our update to 5.6 a few weeks ago, our OMERO installation stops every few days and needs restarting. We have OMERO.web and OMERO.server installed on separate servers (and we have test and production versions of each) and all seem to have the problem.

A brief exploration suggests that OMERO.web and icegridnode are the parts that are stopping, but I’m not sure what’s causing this; it wasn’t a problem before the upgrade. Has anyone else had this issue?

I’ve inherited our OMERO installation and am maintaining it until our new sysadmin starts, but I am no expert in Linux or system administration, so it’s entirely possible that I’ve missed something pertinent to the upgrade along the way.

Thanks in advance, Claire

Which version did you upgrade from? Are you on new servers now, or did you upgrade the existing environment in place? I’m afraid I don’t know of any changes that might cause the effect you are seeing; were there any other changes on those machines as part of, or coinciding with, the upgrade?

What do you observe that makes you think those are “stopping”? What exactly happens? The output of omero admin diagnostics can be revealing, as can the server and web logs from around the time of the stoppage. If you see odd things in them that are difficult to interpret, we are of course happy to take a look if you zip them up and submit them to http://qa.openmicroscopy.org.uk/qa/upload/
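Since the diagnostics are most useful when captured at the moment of a stoppage, it may help to save them to a file straight away. The following is only a minimal sketch in Python: the helper name `save_command_output` and the `diagnostics-<timestamp>.txt` file naming are hypothetical, and it assumes the `omero` command is on the PATH of the user running it.

```python
import datetime
import pathlib
import subprocess


def save_command_output(command, out_dir="."):
    """Run a command and save its combined stdout/stderr to a timestamped file.

    `command` is a list such as ["omero", "admin", "diagnostics"].
    Returns the path of the file that was written.
    """
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    out_path = pathlib.Path(out_dir) / f"diagnostics-{stamp}.txt"
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge error output into the same file
        text=True,
        check=False,  # keep the output even if the command exits non-zero
    )
    out_path.write_text(result.stdout)
    return out_path
```

Calling `save_command_output(["omero", "admin", "diagnostics"])` before restarting anything would then leave a dated file that can be zipped up together with the logs.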

Thanks for the reply!

To answer your questions:

  1. We upgraded from 5.5.1 in the same environment. The servers are maintained by our IT services, so it could be that they did something in the background, unrelated to OMERO, that is affecting things.

  2. “Stopping” for me means that the webpage shows a “502 Bad gateway” error; this is fixed by restarting OMERO.web.

  3. The server also seems to “stop” at the same time. After I’ve restarted OMERO.web, I get a “cannot connect to server” issue when I try to log in. Again, restarting OMERO.server seems to fix this.

  4. I took a look at the diagnostics when it stopped working last time but unfortunately I think I forgot to save the results - I’ll take another look when it happens again.

  5. I’ve taken a look at the logs. There are no obvious errors in the web log; it simply stops writing anything to the log one day and then starts again once I notice and restart everything. The server log shows a “Failed to acquire connection after retries=3” error, so I would presume the problem is the connection to our servers dropping out?

I’ll speak with our IT in the first instance and upload the logs/diagnostic output to the link you provided if it happens again. Thanks!
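To see when the connection errors occurred relative to each stoppage, the server log can be scanned for the message quoted above. This is just a rough sketch: the function name `find_connection_failures` is hypothetical, and the default search string is taken from the error text in this thread rather than from any documented log format.

```python
def find_connection_failures(lines, needle="Failed to acquire connection"):
    """Return (line_number, text) pairs for log lines containing `needle`.

    `lines` can be any iterable of strings, e.g. an open log file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if needle in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

For example, assuming the server log is a file named Blitz-0.log in the current directory (adjust the path for your installation), `with open("Blitz-0.log") as fh: print(find_connection_failures(fh))` would list the matching lines, which can then be compared with the times at which the web log goes quiet.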

Hi @CMitchell,

Sounds good. As it stands, we don’t have any leads to go on. Maybe let’s hope it doesn’t happen again. :wink:

~Josh