Error 504 when many users log in to OMERO.web simultaneously

Hey @joshmoore,
“From time to time” was probably the wrong wording. Today I had to restart the server 3 times.

But thanks for the question about OMERO.insight. I hadn't tested it before. It turns out OMERO.insight still works when OMERO.web does not. As the students do not have to upload anything, we haven't shown them OMERO.insight yet. Maybe I should do that.

Understood. Can you try restarting just omero-web the next time it happens? If that solves the issue, then possibly what you need to do is set a timeout on the web threads so that download isn’t possible.

~J.

Just restarting omero-web does not solve the error. I really have to run omero admin restart and then restart omero-web to make OMERO.web accessible again.

Best
Thomas

Understood. Having the output of jstack $(omero admin ice server pid Blitz-0) before you restart would be useful then. ~J.

OK, let's wait for the next error then :smile:


We had the error again, but when I tried jstack $(omero admin ice server pid Blitz-0) I got bash: jstack: command not found
Just running omero admin ice server pid Blitz-0 as the omero user outputs 6366

I googled jstack and found that it might not be in my $PATH. I am not sure what exactly I have to do now. Since I am not very familiar with CentOS, I don't want to break anything (until last week I had never looked into a .log file or used grep… Don't know how I got OMERO running :sweat_smile:). Unfortunately my system administrator, who usually helps me, isn't responding.

OMERO.web started working again by itself after some time.

If you haven’t restarted, you can use kill -QUIT 6366. The output will be in master.out. J.

While you’re at it, can you show me

SHOW max_connections;

from psql? See https://stackoverflow.com/a/8288860

Hi Josh,
sorry for the stupid question, but as which user and in which folder do I have to run this? I couldn't figure it out.
I will contact my system administrator again tomorrow morning and keep trying until he responds. I think that is the best approach.

Thank you very much for your help.

Morning, @T-Zobel. Sorry for all the brevity. That was all happening on my phone. To show the max connections, you’ll need to connect to your Postgres, replacing the placeholders with the values from omero config get:

psql -h ${omero.db.host} -U ${omero.db.user} ${omero.db.name}

then in the shell enter SHOW max_connections;.
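
The two steps above could be combined into a small shell helper. This is only a sketch: show_max_connections is a made-up name, and it assumes that omero and psql are on the PATH and that the three omero.db.* values have been set explicitly (values left at their defaults may come back empty from omero config get).

```shell
# Hypothetical helper: look up the database settings in the OMERO
# configuration and ask PostgreSQL for its connection limit.
show_max_connections() {
    local host user name
    host=$(omero config get omero.db.host)
    user=$(omero config get omero.db.user)
    name=$(omero config get omero.db.name)

    # Stop early if any value is missing instead of guessing a default.
    if [ -z "$host" ] || [ -z "$user" ] || [ -z "$name" ]; then
        echo "omero.db.host, omero.db.user or omero.db.name is not set" >&2
        return 1
    fi

    # -A (unaligned) and -t (tuples only) print just the bare value.
    psql -h "$host" -U "$user" -d "$name" -At -c 'SHOW max_connections;'
}
```

Calling show_max_connections as the user that owns the OMERO installation should then print a single number; psql may still prompt for the database password.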

omero admin ice server pid Blitz-0 gave you the process ID of the main OMERO server. For running kill -QUIT with that PID, you’ll need to either be the operating system user root or the user running OMERO, likely omero-server. If you’ve set up sudo, then sudo kill -QUIT will also work.

Alternatively, you can install the Java JDK (rather than just the JRE) in order to have jstack installed. That will print the stack trace to the console rather than to the master.out file.
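
For the kill -QUIT route, a minimal sketch could look like the following. dump_blitz_threads is a hypothetical name, and the function assumes it is run as root or as the user running OMERO. The dump itself ends up in master.out, which usually sits in the var/log directory of the OMERO.server installation (an assumption about the default layout).

```shell
# Hypothetical helper: ask the running Blitz-0 JVM for a thread dump.
dump_blitz_threads() {
    local pid
    # Same command as above: prints the process ID of the main server.
    pid=$(omero admin ice server pid Blitz-0) || return 1
    if [ -z "$pid" ]; then
        echo "Blitz-0 does not seem to be running" >&2
        return 1
    fi

    # SIGQUIT makes the JVM write its stack traces; it does not stop the server.
    kill -QUIT "$pid"
    echo "Thread dump requested for PID $pid; check the end of master.out"
}
```

Run dump_blitz_threads while OMERO.web is unresponsive and before restarting anything, otherwise the interesting threads are gone.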

~Josh

All fine, I am glad that you are helping us.

I got help from another friend now.
SHOW max_connections; --> 100
He changed max_connections to 300
and shared_buffers to 256MB

Best,
Thomas

Ok. Let’s see how that works. Your omero.db.poolsize value should likely be slightly below your max_connections setting, but at 300 I would assume you’re ok. i.e. it’s possible that originally poolsize was too small and then too large. Let’s hope it’s now just right.

~J.

OK, now I (nearly) understand how it all works together. Let's see if it runs more stably now. Luis also changed some nginx timeout values.

I will report back here on how the system runs.

Thanks for your effort! :+1:
Best Thomas

If the root problem really is long running database connections, then it may become necessary to set a timeout at that layer.

OK, then omero.sessions.timeout will be our next optimization step.

That’s certainly an option, but since clients keep themselves alive, you may not see much change with it. The database-level timeout I was referring to is statement_timeout in PostgreSQL itself. The down-side is that valid long-running commands would also be cancelled.

You can see more here: https://github.com/IDR/deployment/blob/a903e028d6dedccc82e1dbff6779703f4a426df8/ansible/idr-omero-readonly.yml#L163

ALTER ROLE {{ idr_omero_readonly_database.user }} SET statement_timeout = ...

~Josh

Our server is running stably now.
Over the last 2 days, we had more than 100 active students simultaneously. At some point the server was kind of slow, but I guess that's OK. The “Load Average” displayed by htop was at 7 (15 min average, 8 cores), so the students were also quite active.

To sum up, we adjusted:
omero config set omero.db.poolsize 290
omero config set omero.threads.background_threads 100
omero config set omero.threads.max_threads 270
omero config set omero.web.wsgi_workers 17

Memory
omero config set omero.jvmcfg.percent 90
omero config set omero.jvmcfg.percent.blitz 30
omero config set omero.jvmcfg.percent.indexer 30
omero config set omero.jvmcfg.percent.pixeldata 20
omero config set omero.jvmcfg.system_memory 32000
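
As a rough sanity check of these memory settings, the arithmetic can be sketched as below. It assumes that each service simply receives its percentage of omero.jvmcfg.system_memory as heap, which is a simplification; omero admin jvmcfg should show what the server actually computes. heap_mb is a hypothetical helper name.

```shell
# heap_mb SYSTEM_MEMORY_MB PERCENT
# Prints the share of system memory (in MB) for one service,
# assuming a plain percentage calculation.
heap_mb() {
    echo $(( $1 * $2 / 100 ))
}

echo "blitz:     $(heap_mb 32000 30) MB"   # 9600 MB
echo "indexer:   $(heap_mb 32000 30) MB"   # 9600 MB
echo "pixeldata: $(heap_mb 32000 20) MB"   # 6400 MB
```

Under that assumption the three services together would claim 25600 MB of the 32000 MB, leaving the rest for PostgreSQL, OMERO.web and the operating system.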

Database
Adjust max_connections according to omero.db.poolsize in
/var/lib/pgsql/{version_number}/data/postgresql.conf:
max_connections = 300
shared_buffers = 256MB
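
The relationship between the two values (the pool size should stay slightly below max_connections) can be checked with a few lines of shell. This is just a sketch and check_poolsize is a made-up name; the numbers have to be passed in by hand or read from omero config get and psql.

```shell
# check_poolsize POOLSIZE MAX_CONNECTIONS
# Succeeds only if the OMERO pool size is below the PostgreSQL limit.
check_poolsize() {
    local pool=$1 max=$2
    if [ "$pool" -lt "$max" ]; then
        echo "ok: poolsize $pool leaves $((max - pool)) spare connections"
    else
        echo "too large: poolsize $pool is not below max_connections $max"
        return 1
    fi
}
```

With the values from this thread, check_poolsize 290 300 reports 10 spare connections, which remain available for psql sessions and other database clients.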

nginx
timeouts:
location / {
    proxy_connect_timeout 600;
    proxy_send_timeout 600;
    proxy_read_timeout 600;
    send_timeout 600;
}

One day we had 504 errors again; however, that was after the course, so there were not as many active users. We assume it was caused by downloads of big files. We will try to adjust the related settings and also try @sukunis's plugin in the near future.

Thanks for the help so far!
All the best,
Thomas


Glad to hear it.

Understood. Let’s start a dedicated thread when the time comes. Downloads are definitely their own special can of worms.

~J.