cuDNN launch failure - don't think it's memory related?

Hi,

I’ve been unable to train a model because I consistently get a cuDNN launch failure. I don’t think it’s memory related, since reducing the batch size from 8 to 4 doesn’t seem to make any difference.

The output when I try to launch network training (from the GUI):

Selecting multi-animal trainer
Config:
{‘all_joints’: [[0], [1], [2], [3], [4], [5], [6], [7], [8]],
‘all_joints_names’: [‘nose’,
‘lefteye’,
‘righteye’,
‘leftear’,
‘rightear’,
‘lefthand’,
‘righthand’,
‘leftelbow’,
‘rightelbow’],
‘batch_size’: 4,
‘crop_pad’: 0,
‘cropratio’: 0.4,
‘dataset’: ‘training-datasets/iteration-0/UnaugmentedDataSet_DLCTest_multiDec2/DLCTest_multi_Marcus95shuffle1.pickle’,
‘dataset_type’: ‘multi-animal-imgaug’,
‘deterministic’: False,
‘display_iters’: 500,
‘fg_fraction’: 0.25,
‘global_scale’: 0.8,
‘init_weights’: ‘/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/models/pretrained/resnet_v1_50.ckpt’,
‘intermediate_supervision’: False,
‘intermediate_supervision_layer’: 12,
‘location_refinement’: True,
‘locref_huber_loss’: True,
‘locref_loss_weight’: 0.05,
‘locref_stdev’: 7.2801,
‘log_dir’: ‘log’,
‘max_input_size’: 1500,
‘mean_pixel’: [123.68, 116.779, 103.939],
‘metadataset’: ‘training-datasets/iteration-0/UnaugmentedDataSet_DLCTest_multiDec2/Documentation_data-DLCTest_multi_95shuffle1.pickle’,
‘min_input_size’: 64,
‘mirror’: False,
‘multi_step’: [[0.0001, 7500], [5e-05, 12000], [1e-05, 200000]],
‘net_type’: ‘resnet_50’,
‘num_joints’: 9,
‘num_limbs’: 12,
‘optimizer’: ‘adam’,
‘pafwidth’: 20,
‘pairwise_huber_loss’: False,
‘pairwise_loss_weight’: 0.1,
‘pairwise_predict’: False,
‘partaffinityfield_graph’: [[0, 1],
[0, 2],
[0, 3],
[0, 4],
[3, 4],
[3, 1],
[4, 2],
[1, 2],
[0, 7],
[0, 8],
[7, 5],
[8, 6]],
‘partaffinityfield_predict’: True,
‘pos_dist_thresh’: 17,
‘project_path’: ‘/home/marcus/DLC/DLCTest_multi-Marcus-2020-12-02’,
‘regularize’: False,
‘rotation’: 25,
‘rotratio’: 0.4,
‘save_iters’: 10000,
‘scale_jitter_lo’: 0.5,
‘scale_jitter_up’: 1.25,
‘scoremap_dir’: ‘test’,
‘shuffle’: True,
‘snapshot_prefix’: ‘/home/marcus/DLC/DLCTest_multi-Marcus-2020-12-02/dlc-models/iteration-0/DLCTest_multiDec2-trainset95shuffle1/train/snapshot’,
‘stride’: 8.0,
‘weigh_negatives’: False,
‘weigh_only_present_joints’: False,
‘weigh_part_predictions’: False,
‘weight_decay’: 0.0001}
Activating limb prediction…
Starting with multi-animal imaug + adam pose-dataset loader.
Batch Size is 4
Getting specs multi-animal-imgaug 12 9
Initializing ResNet
Loading ImageNet-pretrained resnet_50
2020-12-03 15:02:19.985816: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-12-03 15:02:20.006091: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2020-12-03 15:02:20.006561: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x556bb0d967d0 executing computations on platform Host. Devices:
2020-12-03 15:02:20.006573: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2020-12-03 15:02:20.074795: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-03 15:02:20.075019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.845
pciBusID: 0000:01:00.0
totalMemory: 5.80GiB freeMemory: 5.49GiB
2020-12-03 15:02:20.075032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-12-03 15:02:20.075315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-03 15:02:20.075322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-12-03 15:02:20.075326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-12-03 15:02:20.075402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5317 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-12-03 15:02:20.076466: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x556bb0aa2da0 executing computations on platform CUDA. Devices:
2020-12-03 15:02:20.076476: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1660 Ti, Compute Capability 7.5
Max_iters overwritten as 200000
Display_iters overwritten as 1000
Save_iters overwritten as 50000
Training parameters:
{‘stride’: 8.0, ‘weigh_part_predictions’: False, ‘weigh_negatives’: False, ‘fg_fraction’: 0.25, ‘mean_pixel’: [123.68, 116.779, 103.939], ‘shuffle’: True, ‘snapshot_prefix’: ‘/home/marcus/DLC/DLCTest_multi-Marcus-2020-12-02/dlc-models/iteration-0/DLCTest_multiDec2-trainset95shuffle1/train/snapshot’, ‘log_dir’: ‘log’, ‘global_scale’: 0.8, ‘location_refinement’: True, ‘locref_stdev’: 7.2801, ‘locref_loss_weight’: 0.05, ‘locref_huber_loss’: True, ‘optimizer’: ‘adam’, ‘intermediate_supervision’: False, ‘intermediate_supervision_layer’: 12, ‘regularize’: False, ‘weight_decay’: 0.0001, ‘crop_pad’: 0, ‘scoremap_dir’: ‘test’, ‘batch_size’: 4, ‘dataset_type’: ‘multi-animal-imgaug’, ‘deterministic’: False, ‘mirror’: False, ‘pairwise_huber_loss’: False, ‘weigh_only_present_joints’: False, ‘partaffinityfield_predict’: True, ‘pairwise_predict’: True, ‘all_joints’: [[0], [1], [2], [3], [4], [5], [6], [7], [8]], ‘all_joints_names’: [‘nose’, ‘lefteye’, ‘righteye’, ‘leftear’, ‘rightear’, ‘lefthand’, ‘righthand’, ‘leftelbow’, ‘rightelbow’], ‘cropratio’: 0.4, ‘dataset’: ‘training-datasets/iteration-0/UnaugmentedDataSet_DLCTest_multiDec2/DLCTest_multi_Marcus95shuffle1.pickle’, ‘display_iters’: 500, ‘init_weights’: ‘/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/models/pretrained/resnet_v1_50.ckpt’, ‘max_input_size’: 1500, ‘metadataset’: ‘training-datasets/iteration-0/UnaugmentedDataSet_DLCTest_multiDec2/Documentation_data-DLCTest_multi_95shuffle1.pickle’, ‘min_input_size’: 64, ‘multi_step’: [[0.0001, 7500], [5e-05, 12000], [1e-05, 200000]], ‘net_type’: ‘resnet_50’, ‘num_joints’: 9, ‘num_limbs’: 12, ‘pafwidth’: 20, ‘pairwise_loss_weight’: 0.1, ‘partaffinityfield_graph’: [[0, 1], [0, 2], [0, 3], [0, 4], [3, 4], [3, 1], [4, 2], [1, 2], [0, 7], [0, 8], [7, 5], [8, 6]], ‘pos_dist_thresh’: 17, ‘project_path’: ‘/home/marcus/DLC/DLCTest_multi-Marcus-2020-12-02’, ‘rotation’: 25, ‘rotratio’: 0.4, ‘save_iters’: 10000, ‘scale_jitter_lo’: 0.5, 
‘scale_jitter_up’: 1.25}
Starting multi-animal training…
2020-12-03 15:02:25.935324: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-12-03 15:02:25.935353: W ./tensorflow/stream_executor/stream.h:2099] attempting to perform DNN operation using StreamExecutor without DNN support
Traceback (most recent call last):
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/client/session.py”, line 1334, in _do_call
return fn(*args)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/client/session.py”, line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/client/session.py”, line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([4,3,388,388]) filter shape([7,7,3,64])
[[{{node resnet_v1_50/conv1/Conv2D}}]]
[[{{node mean_squared_error/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/gui/train_network.py”, line 329, in train_network
maxiters=maxiters,
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/training.py”, line 193, in train_network
raise e
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/training.py”, line 176, in train_network
allow_growth=allow_growth,
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/train_multianimal.py”, line 217, in train
feed_dict={learning_rate: current_lr},
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/client/session.py”, line 929, in run
run_metadata_ptr)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/client/session.py”, line 1152, in _run
feed_dict_tensor, options, run_metadata)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/client/session.py”, line 1328, in _do_run
run_metadata)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/client/session.py”, line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([4,3,388,388]) filter shape([7,7,3,64])
[[node resnet_v1_50/conv1/Conv2D (defined at /home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:130) ]]
[[node mean_squared_error/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims (defined at /home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:323) ]]

Caused by op ‘resnet_v1_50/conv1/Conv2D’, defined at:
File “”, line 1, in
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/gui/launch_script.py”, line 66, in launch_dlc
app.MainLoop()
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/wx/core.py”, line 2166, in MainLoop
rv = wx.PyApp.MainLoop(self)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/gui/train_network.py”, line 329, in train_network
maxiters=maxiters,
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/training.py”, line 176, in train_network
allow_growth=allow_growth,
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/train_multianimal.py”, line 141, in train
losses = pose_net(cfg).train(batch)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py”, line 279, in train
heads = self.get_net(batch[Batch.inputs])
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py”, line 178, in get_net
net, end_points = self.extract_features(inputs)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py”, line 130, in extract_features
im_centered, global_pool=False, output_stride=16, is_training=False
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_v1.py”, line 274, in resnet_v1_50
scope=scope)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_v1.py”, line 205, in resnet_v1
net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope=‘conv1’)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/contrib/slim/python/slim/nets/resnet_utils.py”, line 146, in conv2d_same
scope=scope)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py”, line 182, in func_with_args
return func(*args, **current_args)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py”, line 1155, in convolution2d
conv_dims=2)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py”, line 182, in func_with_args
return func(*args, **current_args)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py”, line 1058, in convolution
outputs = layer.apply(inputs)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py”, line 1227, in apply
return self.call(inputs, *args, **kwargs)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/layers/base.py”, line 530, in call
outputs = super(Layer, self).call(inputs, *args, **kwargs)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py”, line 554, in call
outputs = self.call(inputs, *args, **kwargs)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/keras/layers/convolutional.py”, line 194, in call
outputs = self._convolution_op(inputs, self.kernel)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py”, line 966, in call
return self.conv_op(inp, filter)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py”, line 591, in call
return self.call(inp, filter)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py”, line 208, in call
name=self.name)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py”, line 1026, in conv2d
data_format=data_format, dilations=dilations, name=name)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py”, line 788, in _apply_op_helper
op_def=op_def)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py”, line 507, in new_func
return func(*args, **kwargs)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/framework/ops.py”, line 3300, in create_op
op_def=op_def)
File “/home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/tensorflow/python/framework/ops.py”, line 1801, in init
self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): cuDNN launch failure : input shape([4,3,388,388]) filter shape([7,7,3,64])
[[node resnet_v1_50/conv1/Conv2D (defined at /home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:130) ]]
[[node mean_squared_error/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims (defined at /home/marcus/anaconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:323) ]]

Apologies if this is covered elsewhere; I’ve been googling to little avail.

Marcus

OK, I can confirm this IS a memory issue: when I run nvidia-smi while training starts, it shows GPU memory use climbing to the card's maximum within a couple of seconds. However, reducing the batch size from 8 to 2 has no effect.
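For anyone who wants to log that memory climb rather than watch it by eye, here is a minimal sketch that polls nvidia-smi from Python. The helper names (`parse_memory`, `read_gpu_memory`, `watch`) are made up for illustration; the only external piece is the `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` query, and the sketch assumes every GPU reports plain integer values in MiB.

```python
import subprocess
import time

# nvidia-smi query that prints one "used, total" line (in MiB) per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Turn 'used, total' lines into a list of (used_mib, total_mib) tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        readings.append((used, total))
    return readings


def read_gpu_memory():
    """Run nvidia-smi once and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory(result.stdout)


def watch(samples=10, interval=1.0):
    """Print memory use for each GPU, `samples` times, `interval` seconds apart."""
    for _ in range(samples):
        for gpu, (used, total) in enumerate(read_gpu_memory()):
            print(f"GPU {gpu}: {used}/{total} MiB ({100 * used / total:.0f}%)")
        time.sleep(interval)
```

Calling `watch()` in a second terminal while training starts should show whether memory is reserved all at once or grows gradually.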

This is a relatively small GPU (6 GB; I’m still waiting on my 24 GB Titan), but I’m not sure how to proceed. Apologies again for my ignorance.

OK… looks like I’ve figured this out. After further googling, I set the allow_growth flag to True in train.py, training.py, and train_multianimal.py, and training is now running. CPU usage is hovering between 10-15%, GPU memory use is steady at ~4800-4900 MB of the 6 GB, and there are no crash reports so far.
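For context on what that flag does: by default TensorFlow 1.x reserves nearly all GPU memory when a session is created, regardless of batch size, which is consistent with the nvidia-smi observation and may leave cuDNN without room to create its handle. With `allow_growth` set, memory is claimed on demand instead. A minimal sketch of the underlying TensorFlow 1.x setting follows; the function name `make_session` is hypothetical, and this is not necessarily how the DeepLabCut scripts build their session.

```python
def make_session(allow_growth=True):
    """Create a TF 1.x session whose GPU memory pool grows on demand.

    Assumes a TensorFlow 1.x install (tf.ConfigProto / tf.Session).
    """
    # Imported lazily so this file can be loaded on machines without TensorFlow.
    import tensorflow as tf

    config = tf.ConfigProto()
    # Allocate GPU memory as needed instead of reserving almost all of it up front.
    config.gpu_options.allow_growth = allow_growth
    return tf.Session(config=config)
```

The trade-off is that on-demand allocation can fragment GPU memory over a long run, so it is a workaround for tight memory rather than a free win.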

(I’ve been running from the GUI, so I’m not sure which of those three training scripts is the important one. I do know that setting it only in train_multianimal.py is not enough, although I thought it might be, as I’m training a multi-animal project.)
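An alternative to editing the installed scripts may be to start training from Python instead of the GUI. The traceback above shows training.py forwarding `allow_growth=allow_growth` from `train_network`, which suggests the flag is a keyword argument of `deeplabcut.train_network`. The sketch below assumes the installed version exposes that argument; the wrapper name `train_with_growth` and the config path are placeholders.

```python
def train_with_growth(config_path, **overrides):
    """Call deeplabcut.train_network with allow_growth=True by default.

    Extra keyword arguments (for example maxiters) are passed through unchanged.
    """
    # Imported lazily: importing deeplabcut pulls in TensorFlow and is slow.
    import deeplabcut

    kwargs = {"allow_growth": True}
    kwargs.update(overrides)
    return deeplabcut.train_network(config_path, **kwargs)


# Example call (placeholder path, not run here):
# train_with_growth("path/to/config.yaml", maxiters=200000)
```

Because the flag is passed at call time, nothing under site-packages needs to change, and the setting survives a reinstall of the package.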
