OMERO scripts producing "Failed to get results"

Hello, all,

We are running Omero 5.4.9 from an Ubuntu 16.04 server. When running Python scripts from Omero, we periodically get “Failed to get results” errors as the status response, though the script often seems successful. In these, cases we don’t get links to the error or stdout logs or links to the resulting data. It seems this occurs frequently when running many scripts at a time, either one user or multiple users.

For example, a user recently ran 10 simultaneous instances of the “Batch Image Export…” script on moderate size data (I don’t know the exact size, but each script run took many minutes). The resulting zip files appeared to be correctly attached to the datasets, but several of those exports reported “Failed to get Results”; see the image below. I’m also pasting some of our settings and a brief snippet of the OMEROweb.log below.

We have seen this behavior from our in-house scripts as well as Omero-provided scripts.

Do you have advice on what might be the cause or how we should go about determining the cause?

Thanks for any assistance,
Michael


Select settings from our Omero configuration:
omero.db.poolsize=64
omero.jvmcfg.max_system_memory=750000
omero.jvmcfg.percent.blitz=60
omero.jvmcfg.strategy=percent
omero.scripts.timeout=36000000
omero.sessions.timeout=36000000
omero.threads.background_threads=32
omero.threads.max_threads=64
omero.threads.min_threads=16
omero.web.application_server=wsgi-tcp
omero.web.apps=[“omero_fpbioimage”, “omero_iviewer”]
omero.web.wsgi_workers=8

Snippet of the OMEROweb.log:

2019-11-21 21:09:41,556 INFO [ omero.gateway] (proc.153537) connect():2219 created connection (uuid=3e4926ba-d282-4715-9310-61a8957bd758)
2019-11-21 21:09:41,556 INFO [ omero.gateway] (proc.69453) connect():2219 created connection (uuid=3e4926ba-d282-4715-9310-61a8957bd758)
2019-11-21 21:09:41,556 INFO [ omeroweb.webclient.views] (proc.153537) activities():3321 Check callback on script: ProcessCallback/79ebe936-c0e2-4938-b4da-392522c81059 -t -e 1.0:tcp -h 10.200.26.102 -p 38605 -t 60000
2019-11-21 21:09:41,556 INFO [ omero.gateway] (proc.98119) connect():2219 created connection (uuid=3e4926ba-d282-4715-9310-61a8957bd758)
2019-11-21 21:09:41,556 INFO [ omeroweb.webclient.views] (proc.69453) activities():3321 Check callback on script: ProcessCallback/79ebe936-c0e2-4938-b4da-392522c81059 -t -e 1.0:tcp -h 10.200.26.102 -p 38605 -t 60000
2019-11-21 21:09:41,557 INFO [ omeroweb.webclient.views] (proc.98119) activities():3321 Check callback on script: ProcessCallback/79ebe936-c0e2-4938-b4da-392522c81059 -t -e 1.0:tcp -h 10.200.26.102 -p 38605 -t 60000
2019-11-21 21:09:41,563 INFO [ omeroweb.webclient.views] (proc.153537) activities():3321 Check callback on script: ProcessCallback/de08ad37-5759-498a-bb65-b8d927fd1936 -t -e 1.0:tcp -h 10.200.26.102 -p 38605 -t 60000
2019-11-21 21:09:41,563 INFO [ omeroweb.webclient.views] (proc.69453) activities():3321 Check callback on script: ProcessCallback/de08ad37-5759-498a-bb65-b8d927fd1936 -t -e 1.0:tcp -h 10.200.26.102 -p 38605 -t 60000
2019-11-21 21:09:41,563 INFO [ omeroweb.webclient.views] (proc.98119) activities():3321 Check callback on script: ProcessCallback/de08ad37-5759-498a-bb65-b8d927fd1936 -t -e 1.0:tcp -h 10.200.26.102 -p 38605 -t 60000
2019-11-21 21:09:44,371 INFO [ omeroweb.webclient.views] (proc.154779) activities():3343 Failed on proc.getResults() for OMERO.script
2019-11-21 21:09:44,371 INFO [ omeroweb.webclient.views] (proc.153537) activities():3343 Failed on proc.getResults() for OMERO.script
2019-11-21 21:09:44,371 INFO [ omeroweb.webclient.views] (proc.187804) activities():3343 Failed on proc.getResults() for OMERO.script
2019-11-21 21:09:44,371 INFO [ omeroweb.webclient.views] (proc.98119) activities():3343 Failed on proc.getResults() for OMERO.script
2019-11-21 21:09:44,371 INFO [ omeroweb.webclient.views] (proc.69453) activities():3343 Failed on proc.getResults() for OMERO.script
2019-11-21 21:09:44,409 INFO [ omero.gateway] (proc.154779) close():1932 closed connection (uuid=3e4926ba-d282-4715-9310-61a8957bd758)
2019-11-21 21:09:44,410 INFO [ omero.gateway] (proc.153537) close():1932 closed connection (uuid=3e4926ba-d282-4715-9310-61a8957bd758)
2019-11-21 21:09:44,412 INFO [ omero.gateway] (proc.187804) close():1932 closed connection (uuid=3e4926ba-d282-4715-9310-61a8957bd758)
2019-11-21 21:09:44,414 INFO [ omero.gateway] (proc.69453) close():1932 closed connection (uuid=3e4926ba-d282-4715-9310-61a8957bd758)
2019-11-21 21:09:44,417 INFO [ omero.gateway] (proc.98119) close():1932 closed connection (uuid=3e4926ba-d282-4715-9310-61a8957bd758)

Can you also send the Processor-0.log ? ~J

Hi, JoshMoore, I’m attaching the Processor-0.log. It contains time-stamps similar to those in the OMEROweb.log snippet, so I think this is the correct version of the file. If it doesn’t look correct, please let me know.
–Michael
Processor-0.log.zip (45.4 KB)

I’m not sure if this is helpful, but I edited the line in lib/python/omeroweb/webclient/views.py to print the exception. It comes from the proc.getResults(0, conn.SERVICE_OPTS) call and is printed here

Failed in getResults with exception : exception ::omero::ApiUsageException
{
serverStackTrace =
serverExceptionClass =
message = No current process.
}

Where proc is

Proc = ProcessCallback/1255fcd2-b6e4-4821-b213-a457fcbd8504 -t -e 1.0:tcp -h 10.220.64.102 -p 45425 -t 60000

Thanks for the logs, @austinMLB.

Processor-0.log looks unusual but I couldn’t put my finger on what was going wrong. The ApiUsageException you mention sounds as if the process has been killed behind the scenes, which would match the log file.

Does this happen when you only execute a single thread?

Are there any system level logs (dmesg, etc) that could point to the Linux OOM killer or similar?

~Josh

Hi, Josh, thanks for the reply. I’ll run some tests this morning to try to answer those questions. It appears to me as if that try-catch block in question is hit more than once for some scripts. Yesterday, I ran 7 instances of a script, and it hit that code 9 times. If that is correct (and I’ll try to confirm), it suggests getResults is called twice for some individual runs. --Michael

I upgraded my test server to 5.5.1, and created simple scripts that essentially just sleeps a few times. If I run this script 7 times, as quickly as possible, it will sometimes give me the “Failed to get results” error. I’ve bundled my script, OMEROWeb.log and Processor.log. The only things that should have been done on these is two runs of 7. The first set gave no (visible) error, and the second gave the error on the last of the 7 scripts runs.

It may be that previous example I sent was a different problem; one where the particular scripts failed differently. I’m hoping this version of the problem is easier to diagnose.

If you see anything problematic with my Script, please let me know.

It appears that for some runs, it tries to get the results multiple times, but I can’t explain why that would be. I have even seen “Failed to get results” flash on the status line before the correct results line is added.

@joshmoore, I have not been able to reproduce the problem with 1 or 2 scripts runs in a set. Doing set of 1 through 10, it seemed to get progressively more likely that a give run would have an error. (That is a run is more likely to have the failure if it was running alongside more scripts.) Also, I didn’t see anything in the Ubuntu logs that would point me towards a specific failure.

Thanks for the help as this has been rather troubling to our users,
Michael
bundle.zip (53.6 KB)

So, it turns out that this was a result of the webclient requesting results from too many calls all at once. I’ve opened a fix PR at https://github.com/ome/omero-web/pull/87

Regards,
Will

1 Like