IJM macro crashes after a few loops

TL;DR: I’m trying to run a series of convolutions with CLIJ2 in a for() loop, but it seems that my computer “crashes” after a random number of loops. Introducing a wait(500) command after process-heavy steps seems to help somewhat (i.e. more loops get finished), but the code never finishes completely, so I have to tell it to pick up where it left off after each crash. @haesleinhuepf can you help?

Here are my computer specs:
OS: macOS Catalina version 10.15.5 (19F101)
CPU: 2.4 GHz Intel Core i9
RAM: 32 GB 2400 MHz DDR4
GPU1: Intel UHD Graphics 630 1536MB
GPU2: Radeon Pro Vega 20 4GB (I’m using this one for CLIJ2)

Fiji:
Version: 1.0.0-rc-69/1.53c
Build: 269a0ad53f

CLIJ2 Version 2.0.0.14

I’ve tested individual iterations of the for() loop manually, to check that my GPU has the memory to handle the convolution.

Unfortunately, Fiji doesn’t throw an error. My computer just goes to the login screen halfway through a random iteration of the for() loop, forcing me to log in again. Fiji re-opens on startup, but

Code is attached. The for() loop should go from slice = 1 to slice = 71, but I’ve changed the uploaded version so that it can pick up where it crashed.
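
For readers who want the "pick up where it crashed" behaviour without editing the loop bounds by hand, here is a minimal sketch of the idea. It is written in Python rather than the attached IJM macro, and the checkpoint file and the `process_slice` callback are hypothetical placeholders, not anything from the uploaded script: the last finished slice is written to disk after each iteration, and the next run starts one slice later.

```python
from pathlib import Path


def load_next_slice(checkpoint, first_slice=1):
    """Return the slice to start from, based on the last slice recorded on disk."""
    try:
        return int(Path(checkpoint).read_text().strip()) + 1
    except (FileNotFoundError, ValueError):
        # No usable checkpoint yet: start from the beginning.
        return first_slice


def run_resumable(process_slice, checkpoint, first_slice=1, last_slice=71):
    """Call process_slice(i) for every remaining slice, recording progress after each one.

    Returns the slice index this run started from.
    """
    start = load_next_slice(checkpoint, first_slice)
    for slice_index in range(start, last_slice + 1):
        process_slice(slice_index)  # hypothetical stand-in for one convolution
        Path(checkpoint).write_text(str(slice_index))
    return start
```

Because the checkpoint is only updated after a slice finishes, a crash in the middle of a slice simply causes that slice to be redone on the next run.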

Thanks for any advice!

Tanner

convolveLITE.ijm (2.4 KB) golubsheet_0.488umwvl_focus_12mm_a1_0.1783mm_d1_0.39465mm-1.tif (7.9 MB) Output.tif (4.0 MB) PSF_1.49NA_1.515RI_509nm_50x50x100nm.tif (4.0 MB)

Hey Tanner @fadero,

Wow. That sounds horrible. Apologies for the inconvenience.

Could you please check if some kind of log files are in the Fiji.app folder? Or under Contents? Not sure where crashlog files end up on Mac…

I’ll run your script in the meantime on my AMD test system.

Thanks for reporting! Stay tuned :slight_smile:

Cheers,
Robert

Thanks for taking a look at it for me. I did find a few of these crash files from today (see attached), but it doesn’t appear to be from the most recent crash (based on the time stamp). Maybe it doesn’t save all of them? From a cursory glance, this does seem like it’s reporting the crash as happening during the convolution.

Tanner

ImageJ-macosx_2020-07-09-135255_Tanner-Faderos-Computer.txt (140.1 KB)

Thanks! Perfect! It says

Crashed Thread:        47  Java: Convolve an image with another image on GPU  Dispatch queue: opencl_runtime

So I guess that has something to do with CLIJ :slight_smile: I’ll come back to you soon. Your computer may become my guinea pig.

Thanks again for reporting!

Hey Tanner @fadero,

I just tried a couple of times and “unfortunately” my Fiji doesn’t crash. I assume you ran the program a couple of times, right? Does it always crash “after a few loops” or is it sometimes earlier, sometimes later? If you have multiple crash log files (by chance), are they similar? Do they always say something like

Crashed Thread:        47  Java: Convolve an image with another image on GPU  Dispatch queue: opencl_runtime

and

Application Specific Information:
abort() called

Application Specific Signatures:
Graphics hardware encountered an error and was reset: 0x00000001

I’m preparing a verbose CLIJ for your system which will write a log file to disk. I’m just trying to narrow down where the issue might be :wink:

Thanks for your support!

Cheers,
Robert

Hey @haesleinhuepf

Yes it does crash sometimes earlier, sometimes later. Introducing the wait() commands seems to shift the number of loops it’s able to complete.

I’ve attached four more crash logs that I found below. They all contain the “Crashed Thread” and “Graphics hardware encountered an error and was reset” lines, though the crashed thread itself seems to differ between logs.
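
For anyone wanting to compare several crash reports at once, here is a minimal Python sketch (not part of CLIJ or Fiji) that pulls the “Crashed Thread” line out of each log and checks for the GPU reset message. The function names, the folder argument, and the `ImageJ-macosx_*.txt` file pattern are assumptions based on the attachment names in this thread; adjust them to wherever your reports are saved.

```python
import re
from pathlib import Path

# Matches e.g. "Crashed Thread:        47  Java: Convolve an image ..."
CRASHED_THREAD = re.compile(r"^Crashed Thread:\s+(\d+)\s*(.*)$", re.MULTILINE)
GPU_RESET = "Graphics hardware encountered an error and was reset"


def summarize_crash_log(text):
    """Return (thread_number, thread_description, gpu_reset_seen) for one report."""
    match = CRASHED_THREAD.search(text)
    thread_number = int(match.group(1)) if match else None
    description = match.group(2).strip() if match else ""
    return thread_number, description, GPU_RESET in text


def summarize_folder(folder, pattern="ImageJ-macosx_*.txt"):
    """Summarize every matching report in a folder, sorted by file name."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(errors="replace")
        rows.append((path.name, *summarize_crash_log(text)))
    return rows
```

Calling `summarize_folder("path/to/reports")` gives one tuple per report, which makes it easy to see whether the thread description and the reset message are the same across crashes.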

Let me know when/how I can install the verbose CLIJ! Thanks for trying it out on your end.

-Tanner

ImageJ-macosx_2020-07-09-133148_Tanner-Faderos-Computer.txt (155.2 KB) ImageJ-macosx_2020-07-09-133558_Tanner-Faderos-Computer.txt (144.2 KB) ImageJ-macosx_2020-07-09-134138_Tanner-Faderos-Computer.txt (145.8 KB) ImageJ-macosx_2020-07-09-134440_Tanner-Faderos-Computer.txt (141.3 KB)

@haesleinhuepf one thing that I just remembered is that I typically run the caffeinate command through Terminal while my CLIJ code runs so that the computer doesn’t go to sleep. I didn’t think that would affect anything (i.e. causing crashes) but it might be worth mentioning.

It could be a driver-related issue. Sometimes missing GPU drivers can cause such mishaps. Try updating your GPU driver to see if it helps.

@kapoorlab thanks for the suggestion. Do you know how I should do this specifically for an AMD Radeon Pro Vega 20 GPU on macOS Catalina? I tried looking it up and it seems like the only way to update GPU drivers is to update the general OS, which I have done. Just checking to see if there’s anything else I should be doing.

Maybe try to get it from here: https://www.amd.com/en/support/kb/release-notes/apple-boot-camp
Let’s see if this helps in stopping the random reboots.

This looks like it’s a driver for Boot Camp (i.e. running Windows on my machine), not for just running the GPU on the normal macOS. Or am I wrong? Just double checking before installing these drivers.

Ah right, but maybe here you can see if there are any drivers for Mac. I have an Intel GPU on my Mac, and with some driver installs such problems should be resolved.

I’m not seeing any drivers there for Mac. @haesleinhuepf do you know of any drivers that I need to install? I can’t find any via googling; it seems everything comes pre-packaged in macOS and you don’t need GPU drivers unless you’re running a different OS.

I’m not an expert in macOS. In general it could indeed be a driver issue, but I think Apple/AMD do not allow different drivers…

Not a CLIJ expert at all, but for debugging:

  • can you select the Intel GPU as OpenCL device and see whether it crashes there?

It may sound unlikely but I would not rule out hardware problems.
I had funny things happen over the years when stressing GPUs.

  • a 2012 MacBook Pro would crash when hammering the GPU. This became worse over time and is a known issue with a chip in the power supply for the GPU. Early on, it only happened when I ran code that was really stressing the GPU; later it happened so often that the computer was unusable.
  • a 2080Ti would sometimes create weird artefacts when running deconvolution code. This became progressively worse over a few weeks, eventually resulted in crashes and was also linked to a known hardware issue.
  • the Intel HD GPU in my Lenovo sometimes also creates artefacts when the machine is really hot. I could fairly reliably reproduce this by running a Unet for nuclei segmentation. It would produce perfect results for a few runs, but when trying to run things in batch on a laptop, artefacts appear after a while (on the same data).

The fact that putting a pause in the computations gives you more iterations could mean that your GPU has more time to cool down.

And for some reason that I do not understand, Apple allows a CUDA download, which then updates the graphics card driver. For me that fixes such glitches. In the screenshot you can see that a CUDA download for Mac is possible too:

Screenshot 2020-07-14 at 12.20.36

Hey @haesleinhuepf, @kapoorlab, and @VolkerH,

I think I may have figured out what is going wrong. I did some digging and found out that Macs, by default, try to switch between graphics cards to save on energy consumption based on the currently running software’s needs.

There’s a simple option in System Preferences under Energy Saver to disable graphics switching. I disabled that, commented out all of my wait() commands, and ran my code to completion without any hiccups.
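
For anyone who would rather check this setting from a script than from System Preferences, here is a rough Python sketch. It relies on an assumption I could not confirm in Apple's documentation: on dual-GPU MacBook Pros, `pmset -g` is commonly reported to list a `gpuswitch` entry, with 0 meaning integrated GPU only, 1 discrete GPU only, and 2 automatic switching. The function names are made up for this example, and the mapping should be treated as unverified.

```python
import re
import subprocess

# ASSUMPTION: commonly reported meaning of the undocumented gpuswitch values.
GPUSWITCH_MODES = {
    0: "integrated GPU only",
    1: "discrete GPU only",
    2: "automatic graphics switching",
}


def parse_gpuswitch(pmset_output):
    """Return the gpuswitch value found in `pmset -g` output, or None if absent."""
    match = re.search(r"^\s*gpuswitch\s+(\d+)\s*$", pmset_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def describe_gpuswitch(pmset_output):
    """Turn `pmset -g` output into a short human-readable description."""
    value = parse_gpuswitch(pmset_output)
    if value is None:
        return "no gpuswitch entry found"
    return GPUSWITCH_MODES.get(value, f"unknown gpuswitch value {value}")


def read_pmset_settings():
    """Run `pmset -g` (macOS only) and return its text output."""
    result = subprocess.run(["pmset", "-g"], capture_output=True, text=True, check=True)
    return result.stdout
```

On a Mac, `describe_gpuswitch(read_pmset_settings())` would report the current mode; if it says automatic switching while CLIJ is using the AMD card, that matches the situation described here.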

Attached is a screenshot of the Energy Saver menu where the option can be found. @haesleinhuepf it might be worth digging into why this setting causes crashes. I’m certainly no expert, but I can imagine this might cause crashes if macOS tries to switch between graphics cards mid-code while CLIJ is repeatedly accessing the AMD card.

Anyway, I believe this is the solution, but I’ll post a follow-up if this crash happens again.

Thanks everyone for all your suggestions! I wouldn’t have found this option without digging around in macOS’s graphics settings.

-Tanner

Also, I had Fiji write logs while my code ran in case it crashed (it didn’t). Here is the log anyway:

Log.txt (136.9 KB)

OMG. That also might be a solution for another issue I spotted earlier. @fadero I :heart: you :wink:

I will put that in the troubleshooting section of the FAQ. Thanks for letting me know!

Thanks to everyone for your suggestions. This community is awesome :slight_smile:
