MEM_OBJECT_ALLOCATION_FAILURE when running StarDist3D notebook on Windows 10

Hello again,

I tried to run this StarDist3D example notebook on Windows 10 using Visual Studio Code, to see whether it can use the GPU installed in our workstation (an RTX 2070).

I "turned on" GPU use by setting the use_gpu variable: use_gpu = True and gputools_available()

I also capped the total amount of GPU memory that TensorFlow 2 may use:
limit_gpu_memory(0.99, total_memory=8000)
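For reference, a minimal sketch of what that memory cap amounts to (the tf.config call in the comments is an assumption about the raw TF 2.x equivalent, not something taken from the notebook):

```python
# Minimal sketch (assumption: TF 2.x) of what limit_gpu_memory(0.99, total_memory=8000)
# boils down to: a fixed MiB cap on TensorFlow's share of the card.

def memory_cap_mib(fraction, total_memory_mb):
    """MiB limit implied by a fraction of the card's memory."""
    return int(round(fraction * total_memory_mb))

cap = memory_cap_mib(0.99, 8000)
print(cap)  # 7920 MiB requested on an 8 GB card

# The roughly equivalent raw TF2 configuration (run before any model is built):
# import tensorflow as tf
# gpu = tf.config.list_physical_devices('GPU')[0]
# tf.config.set_logical_device_configuration(
#     gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=cap)])
```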

When I try to run the training block (with quick_demo either True or False), it gives me the memory allocation error: MemoryError: clEnqueueNDRangeKernel failed: MEM_OBJECT_ALLOCATION_FAILURE

Relevant excerpt of the traceback:

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\pyopencl\__init__.py in kernel_call(self, queue, global_size, local_size, *args, **kwargs)
    863         # __call__ can't be overridden directly, so we need this
    864         # trampoline hack.
--> 865         return self._enqueue(self, queue, global_size, local_size, *args, **kwargs)
    866
    867     def kernel_capture_call(self, filename, queue, global_size, local_size,

in enqueue_knl_stardist3d(self, queue, global_size, local_size, arg0, arg1, arg2, arg3, arg4, arg5, global_offset, g_times_l, allow_empty_ndrange, wait_for)

MemoryError: clEnqueueNDRangeKernel failed: MEM_OBJECT_ALLOCATION_FAILURE

I do not know how to solve this issue. I made sure to install the CUDA and cuDNN versions specified on the TensorFlow install page: https://www.tensorflow.org/install/gpu#software_requirements
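As a quick sanity check (assuming TensorFlow 2.x is importable; this is a generic snippet, not from the notebook), it is worth confirming that TensorFlow actually sees the GPU before revisiting the CUDA/cuDNN versions:

```python
# Confirm TensorFlow can see the GPU at all; if this list is empty, the
# CUDA/cuDNN installation (not StarDist) is the first thing to fix.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices('GPU')
    print("TF version:", tf.__version__)
    print("GPUs visible to TensorFlow:", gpus)
except ImportError:
    gpus = []  # TensorFlow not installed in this environment
    print("TensorFlow is not installed here.")
```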

Thank you for your attention,
José Marques

P.S. - Maybe the only solution is to run this on a Linux machine.

Full error details:

Training block

quick_demo = True

if quick_demo:
    print(
        "NOTE: This is only for a quick demonstration!\n"
        "      Please set the variable 'quick_demo = False' for proper (long) training.",
        file=sys.stderr, flush=True
    )
    model.train(X_trn, Y_trn, validation_data=(X_val, Y_val), augmenter=augmenter,
                epochs=2, steps_per_epoch=5)

    print("====> Stopping training and loading previously trained demo model from disk.", file=sys.stderr, flush=True)
    model = StarDist3D.from_pretrained('3D_demo')

else:
    model.train(X_trn, Y_trn, validation_data=(X_val, Y_val), augmenter=augmenter)
None;

Full error message

NOTE: This is only for a quick demonstration!
      Please set the variable 'quick_demo = False' for proper (long) training.

MemoryError                               Traceback (most recent call last)
in <module>
      8     )
      9     model.train(X_trn, Y_trn, validation_data=(X_val,Y_val), augmenter=augmenter,
---> 10                 epochs=2, steps_per_epoch=5)
     11
     12     print("====> Stopping training and loading previously trained demo model from disk.", file=sys.stderr, flush=True)

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\stardist\models\model3d.py in train(self, X, Y, validation_data, augmenter, seed, epochs, steps_per_epoch)
    440         n_take = self.config.train_n_val_patches if self.config.train_n_val_patches is not None else n_data_val
    441         _data_val = StarDistData3D(validation_data, batch_size=n_take, length=1, **data_kwargs)
--> 442         data_val = _data_val[0]
    443
    444         data_train = StarDistData3D(X, Y, batch_size=self.config.train_batch_size, augmenter=augmenter, length=epochs*steps_per_epoch, **data_kwargs)

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\stardist\models\model3d.py in __getitem__(self, i)
     74         prob = np.stack(tmp, out=self.out_edt_prob[:len(Y)])
     75
---> 76         tmp = [star_dist3D(lbl, self.rays, mode=self.sd_mode) for lbl in Y]
     77         if len(Y) == 1:
     78             dist = tmp[0][np.newaxis]

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\stardist\models\model3d.py in <listcomp>(.0)
     74         prob = np.stack(tmp, out=self.out_edt_prob[:len(Y)])
     75
---> 76         tmp = [star_dist3D(lbl, self.rays, mode=self.sd_mode) for lbl in Y]
     77         if len(Y) == 1:
     78             dist = tmp[0][np.newaxis]

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\stardist\geometry\geom3d.py in star_dist3D(lbl, rays, grid, mode)
     92         return _cpp_star_dist3D(lbl, rays, grid=grid)
     93     elif mode == 'opencl':
---> 94         return _ocl_star_dist3D(lbl, rays, grid=grid)
     95     else:
     96         _raise(ValueError("Unknown mode %s" % mode))

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\stardist\geometry\geom3d.py in _ocl_star_dist3D(lbl, rays, grid)
     78     program.run_kernel('stardist3d', res_shape[::-1], None,
     79                        lbl_g, rays_g.data, dist_g.data,
---> 80                        np.int32(grid[0]), np.int32(grid[1]), np.int32(grid[2]))
     81
     82     return dist_g.get()

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\gputools\core\oclprogram.py in run_kernel(self, name, global_size, local_size, *args, **kwargs)
     44             self._kernel_dict[name] = getattr(self, name)
     45
---> 46         self._kernel_dict[name](self._dev.queue, global_size, local_size, *args, **kwargs)
     47
     48

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\pyopencl\__init__.py in kernel_call(self, queue, global_size, local_size, *args, **kwargs)
    863         # __call__ can't be overridden directly, so we need this
    864         # trampoline hack.
--> 865         return self._enqueue(self, queue, global_size, local_size, *args, **kwargs)
    866
    867     def kernel_capture_call(self, filename, queue, global_size, local_size,

in enqueue_knl_stardist3d(self, queue, global_size, local_size, arg0, arg1, arg2, arg3, arg4, arg5, global_offset, g_times_l, allow_empty_ndrange, wait_for)

MemoryError: clEnqueueNDRangeKernel failed: MEM_OBJECT_ALLOCATION_FAILURE

Could you try with the following?

limit_gpu_memory(0.9, total_memory=8000)

I tried it. A new error appears:

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node functional_1/conv3d/Conv3D (defined at C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\stardist\models\base.py:661) ]] [Op:__inference_predict_function_655]

The error appears in this block:

median_size = calculate_extents(Y, np.median)
fov = np.array(model._axes_tile_overlap('ZYX'))
print(f"median object size: {median_size}")
print(f"network field of view : {fov}")
if any(median_size > fov):
    print("WARNING: median object size larger than field of view of the neural network.")
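A commonly suggested workaround for this cuDNN "Failed to get convolution algorithm" failure on RTX cards (an assumption based on the generic TF 2.x fix, not something from this notebook) is to enable GPU memory growth before building any model:

```python
# Let TensorFlow allocate GPU memory on demand instead of grabbing it all at
# startup. This must run before any Keras model is created.
try:
    import tensorflow as tf
    applied = False
    for gpu in tf.config.list_physical_devices('GPU'):
        tf.config.experimental.set_memory_growth(gpu, True)
        applied = True
except ImportError:
    applied = False  # TensorFlow not installed in this environment
print("memory growth enabled on a GPU:", applied)
```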

I restarted the Jupyter kernel and ran everything from the beginning, but this new error still appears.

Full error

AttributeError: 'StarDist3D' object has no attribute '_tile_overlap'

During handling of the above exception, another exception occurred:

UnknownError                              Traceback (most recent call last)
in <module>
      1 median_size = calculate_extents(Y, np.median)
----> 2 fov = np.array(model._axes_tile_overlap('ZYX'))
      3 print(f"median object size: {median_size}")
      4 print(f"network field of view : {fov}")
      5 if any(median_size > fov):

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\stardist\models\base.py in _axes_tile_overlap(self, query_axes)
    674             self._tile_overlap
    675         except AttributeError:
--> 676             self._tile_overlap = self._compute_receptive_field()
    677         overlap = dict(zip(
    678             self.config.axes.replace('C',''),

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\stardist\models\base.py in _compute_receptive_field(self, img_size)
    659         z = np.zeros_like(x)
    660         x[(0,)+mid+(slice(None),)] = 1
--> 661         y  = self.keras_model.predict(x)[0][0,...,0]
    662         y0 = self.keras_model.predict(z)[0][0,...,0]
    663         grid = tuple((np.array(x.shape[1:-1])/np.array(y.shape)).astype(int))

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self, *args, **kwargs)
    128           raise ValueError('{} is not supported in multi-worker mode.'.format(
    129               method.__name__))
--> 130       return method(self, *args, **kwargs)
    131
    132   return tf_decorator.make_decorator(

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\training.py in predict(self, x, batch_size, verbose, steps, callbacks, max_queue_size, workers, use_multiprocessing)
   1597             for step in data_handler.steps():
   1598               callbacks.on_predict_batch_begin(step)
-> 1599               tmp_batch_outputs = predict_function(iterator)
   1600               if data_handler.should_sync:
   1601                 context.async_wait()

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
    778       else:
    779         compiler = "nonXla"
--> 780         result = self._call(*args, **kwds)
    781
    782       new_tracing_count = self._get_tracing_count()

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
    844               *args, **kwds)
    845       # If we did not create any variables the trace we have is good enough.
--> 846       return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
    847
    848     def fn_with_cond(*inner_args, **inner_kwds):

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eager\function.py in _filtered_call(self, args, kwargs, cancellation_manager)
   1846                     resource_variable_ops.BaseResourceVariable))],
   1847         captured_inputs=self.captured_inputs,
-> 1848         cancellation_manager=cancellation_manager)
   1849
   1850   def _call_flat(self, args, captured_inputs, cancellation_manager=None):

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1922       # No tape is watching; skip to running the function.
   1923       return self._build_call_outputs(self._inference_function.call(
-> 1924           ctx, args, cancellation_manager=cancellation_manager))
   1925     forward_backward = self._select_forward_and_backward_functions(
   1926         args,

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
    548               inputs=args,
    549               attrs=attrs,
--> 550               ctx=ctx)
    551         else:
    552           outputs = execute.execute_with_cancellation(

C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node functional_1/conv3d/Conv3D (defined at C:\Users\igcuser\AppData\Local\Programs\Python\Python37\lib\site-packages\stardist\models\base.py:661) ]] [Op:__inference_predict_function_655]

Function call stack:
predict_function

Sorry about the delay. Today has been quite hectic.

I suspect some other notebook was using the GPU (and its memory) when you tried this.


There were two other notebooks open (Data_visualization and Training_v2.1), but I made sure that no one was running them. Without closing them, I restarted the Jupyter kernel and ran everything again with
limit_gpu_memory(0.9, total_memory=8000)

The convolution error did not appear.
These were the values that cell yielded:
median object size: [10. 20. 20.]
network field of view : [17 30 30]

Unfortunately, the MEM_OBJECT_ALLOCATION_FAILURE error returned.
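Since the failure comes from the OpenCL star-distance kernel rather than from TensorFlow, one possible workaround (an editorial suggestion, not something proposed in the thread) is to keep TensorFlow training on the GPU but compute the star distances on the CPU, by configuring the model with use_gpu=False:

```python
# Sketch: build the StarDist3D config with use_gpu=False so star_dist3D falls
# back to the C++ ("cpp") kernel instead of the failing OpenCL one.
# rays=96 is an assumption, matching the example notebooks.
try:
    from stardist.models import Config3D
    conf = Config3D(rays=96, use_gpu=False)
    mode = "opencl" if conf.use_gpu else "cpp"
except Exception:  # stardist may not be importable in this environment
    mode = "cpp"
print("star-distance mode:", mode)
```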

Could you share the whole training notebook?

Yes, it’s here - https://drive.google.com/drive/folders/1kum6hS1aBrtVK4kW5xUPD-zBiLlwULRt?usp=sharing

Could you post the output of

from gputools import get_device
get_device()

-------- available devices -----------
platform: NVIDIA CUDA
device type: CPU
device type: GPU
GeForce RTX 2070

-------- currently used device -------
NAME: GeForce RTX 2070
GLOBAL_MEM_SIZE: 8589934592
MAX_MEM_ALLOC_SIZE: 2147483648
LOCAL_MEM_SIZE: 49152
IMAGE2D_MAX_WIDTH: 32768
IMAGE2D_MAX_HEIGHT: 32768
IMAGE3D_MAX_WIDTH: 16384
IMAGE3D_MAX_HEIGHT: 16384
IMAGE3D_MAX_DEPTH: 16384
MAX_WORK_GROUP_SIZE: 1024
MAX_WORK_ITEM_SIZES: [1024, 1024, 64]
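One detail worth noticing in that output: MAX_MEM_ALLOC_SIZE is 2 GiB, so a single OpenCL buffer cannot exceed a quarter of the card's 8 GiB even when plenty of total memory is free. A rough back-of-the-envelope check (the 96-ray count and float32 dtype are assumptions, matching the example notebooks):

```python
# Estimate the per-patch distance buffer the OpenCL kernel would allocate and
# compare it against the device's single-allocation limit reported above.
MAX_MEM_ALLOC = 2147483648  # bytes, MAX_MEM_ALLOC_SIZE from the gputools query

def dist_buffer_bytes(shape_zyx, n_rays=96, dtype_bytes=4):
    """Bytes needed for a float32 (z, y, x, n_rays) distance array."""
    z, y, x = shape_zyx
    return z * y * x * n_rays * dtype_bytes

vol = dist_buffer_bytes((128, 512, 512))  # a hypothetical 128x512x512 label volume
print(vol, "bytes; fits in one allocation:", vol <= MAX_MEM_ALLOC)
```

A volume that size needs roughly 12 GiB for its distance buffer, which would trip MEM_OBJECT_ALLOCATION_FAILURE regardless of how much memory TensorFlow was told to leave free.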

1. Does the same happen if you decrease the memory limit, e.g. to
   limit_gpu_memory(0.7, total_memory=8000)?

2. Can you post the output of nvidia-smi after you run the notebook?


NOTE: This is only for a quick demonstration!
      Please set the variable 'quick_demo = False' for proper (long) training.
Epoch 1/2
2/5 [===========>…] - ETA: 0s - loss: 2.8745 - prob_loss: 1.4001 - dist_loss: 7.3717 - prob_kld: 0.6709 - dist_relevant_mae: 7.3715 - dist_relevant_mse: 82.6570WARNING:tensorflow:Callbacks method on_train_batch_end is slow compared to the batch time (batch time: 0.2174s vs on_train_batch_end time: 0.4229s). Check your callbacks.
WARNING:tensorflow | Callbacks method on_train_batch_end is slow compared to the batch time (batch time: 0.2174s vs on_train_batch_end time: 0.4229s). Check your callbacks.
5/5 [==============================] - 6s 1s/step - loss: 2.2009 - prob_loss: 0.8308 - dist_loss: 6.8506 - prob_kld: 0.4528 - dist_relevant_mae: 6.8505 - dist_relevant_mse: 68.1177 - val_loss: 1.7500 - val_prob_loss: 0.4313 - val_dist_loss: 6.5933 - val_prob_kld: 0.2917 - val_dist_relevant_mae: 6.5933 - val_dist_relevant_mse: 59.3845
Epoch 2/2
5/5 [==============================] - 5s 945ms/step - loss: 1.8193 - prob_loss: 0.3736 - dist_loss: 7.2288 - prob_kld: 0.2431 - dist_relevant_mae: 7.2287 - dist_relevant_mse: 73.6275 - val_loss: 1.5985 - val_prob_loss: 0.3943 - val_dist_loss: 6.0213 - val_prob_kld: 0.2546 - val_dist_relevant_mae: 6.0212 - val_dist_relevant_mse: 52.9513
====> Stopping training and loading previously trained demo model from disk.

Loading network weights from 'weights_best.h5'.
Found model '3D_demo' for 'StarDist3D'.
Downloading data from https://github.com/stardist/stardist-models/releases/download/v0.1/python_3D_demo.zip
5750784/5749101 [==============================] - 1s 0us/step
Loading network weights from 'weights_best.h5'.
Loading thresholds from 'thresholds.json'.
Using default values: prob_thresh=0.707933, nms_thresh=0.3.

It worked, I would say. Wow.

Nvidia-smi output

C:\Users\igcuser>nvidia-smi
Sat Nov 14 11:09:01 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 456.71 Driver Version: 456.71 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 WDDM | 00000000:65:00.0 On | N/A |
| 0% 45C P8 28W / 185W | 6875MiB / 8192MiB | 2% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1428 C+G Insufficient Permissions N/A |
| 0 N/A N/A 2392 C+G …wekyb3d8bbwe\Video.UI.exe N/A |
| 0 N/A N/A 4192 C+G …lPanel\SystemSettings.exe N/A |
| 0 N/A N/A 6024 C+G Insufficient Permissions N/A |
| 0 N/A N/A 6996 C+G …8wekyb3d8bbwe\Cortana.exe N/A |
| 0 N/A N/A 7340 C+G …cw5n1h2txyewy\LockApp.exe N/A |
| 0 N/A N/A 7372 C+G …otePC Host\RemotePCUI.exe N/A |
| 0 N/A N/A 8352 C …ython\Python37\python.exe N/A |
| 0 N/A N/A 9740 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 10788 C+G …artMenuExperienceHost.exe N/A |
| 0 N/A N/A 11232 C+G …5n1h2txyewy\SearchApp.exe N/A |
| 0 N/A N/A 11544 C+G …me\Application\chrome.exe N/A |
| 0 N/A N/A 12072 C+G …nputApp\TextInputHost.exe N/A |
| 0 N/A N/A 13084 C+G …icrosoft VS Code\Code.exe N/A |
| 0 N/A N/A 13624 C+G …perience\NVIDIA Share.exe N/A |
| 0 N/A N/A 16120 C+G …bbwe\Microsoft.Photos.exe N/A |
| 0 N/A N/A 16496 C+G …ekyb3d8bbwe\YourPhone.exe N/A |
+-----------------------------------------------------------------------------+

C:\Users\igcuser>

I've set quick_demo = False and I'm currently running the full (long) training.

It’s currently using the GPU. Let’s see how long it takes

The training went without a hitch. No error message showed up.

I would say it took approximately 24 hours to do the whole training (400 epochs). Next time, I'll record a start and end time so I don't have to rely on the "guess o'meter".
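For the next run, a tiny timing wrapper around the training call avoids the guessing (the model.train line is left commented out as a placeholder for the notebook's own call):

```python
import time
from datetime import timedelta

start = time.perf_counter()
# model.train(X_trn, Y_trn, validation_data=(X_val, Y_val), augmenter=augmenter)
elapsed = timedelta(seconds=round(time.perf_counter() - start))
print(f"training took {elapsed}")
```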

Thanks so much for all the help.

Best,