Train network error (cuDNN launch failure) - DLC 2.2b7

Hello,

I have used maDLC before, and this error popped up when I tried to train the network on my Windows 10 machine with a GPU. Everything up to this point had run smoothly. Any tips would be great!

Thanks,

Kyle

The training dataset is successfully created. Use the function ‘train_network’ to start training. Happy training!
Selecting multi-animal trainer
Config:
{‘all_joints’: [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]],
‘all_joints_names’: [‘LeftEye’,
‘RightEye’,
‘SwimBladder’,
‘Tail1’,
‘Tail2’,
‘Tail3’,
‘Tail4’,
‘TLcorner’,
‘TRcorner’,
‘BLcorner’,
‘BRcorner’],
‘batch_size’: 8,
‘bottomheight’: 400,
‘crop’: True,
‘crop_pad’: 0,
‘cropratio’: 0.4,
‘dataset’: ‘training-datasets\iteration-0\UnaugmentedDataSet_5LZF_MODEL_ShortVidsJul8\5LZF_MODEL_ShortVids_Kyle95shuffle1.pickle’,
‘dataset_type’: ‘multi-animal-imgaug’,
‘deterministic’: False,
‘display_iters’: 500,
‘fg_fraction’: 0.25,
‘global_scale’: 0.8,
‘init_weights’: ‘C:\Users\Fish_Behavior\.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\models\pretrained\resnet_v1_50.ckpt’,
‘intermediate_supervision’: False,
‘intermediate_supervision_layer’: 12,
‘leftwidth’: 400,
‘location_refinement’: True,
‘locref_huber_loss’: True,
‘locref_loss_weight’: 0.05,
‘locref_stdev’: 7.2801,
‘log_dir’: ‘log’,
‘max_input_size’: 1500,
‘mean_pixel’: [123.68, 116.779, 103.939],
‘metadataset’: ‘training-datasets\iteration-0\UnaugmentedDataSet_5LZF_MODEL_ShortVidsJul8\Documentation_data-5LZF_MODEL_ShortVids_95shuffle1.pickle’,
‘min_input_size’: 64,
‘minsize’: 100,
‘mirror’: False,
‘multi_step’: [[0.0001, 7500], [5e-05, 12000], [1e-05, 200000]],
‘net_type’: ‘resnet_50’,
‘num_joints’: 11,
‘num_limbs’: 14,
‘optimizer’: ‘adam’,
‘pafwidth’: 20,
‘pairwise_huber_loss’: False,
‘pairwise_loss_weight’: 0.1,
‘pairwise_predict’: False,
‘partaffinityfield_graph’: [[0, 1],
[1, 2],
[4, 5],
[5, 6],
[1, 4],
[0, 6],
[2, 4],
[2, 3],
[0, 5],
[1, 6],
[0, 4],
[0, 3],
[3, 4],
[0, 2]],
‘partaffinityfield_predict’: True,
‘pos_dist_thresh’: 17,
‘project_path’: ‘C:\Users\Fish_Behavior\Desktop\DLCwork\5LZF_MODEL_ShortVids-Kyle-2020-07-08’,
‘regularize’: False,
‘rightwidth’: 400,
‘save_iters’: 10000,
‘scale_jitter_lo’: 0.5,
‘scale_jitter_up’: 1.25,
‘scoremap_dir’: ‘test’,
‘shuffle’: True,
‘snapshot_prefix’: ‘C:\Users\Fish_Behavior\Desktop\DLCwork\5LZF_MODEL_ShortVids-Kyle-2020-07-08\dlc-models\iteration-0\5LZF_MODEL_ShortVidsJul8-trainset95shuffle1\train\snapshot’,
‘stride’: 8.0,
‘topheight’: 400,
‘weigh_negatives’: False,
‘weigh_only_present_joints’: False,
‘weigh_part_predictions’: False,
‘weight_decay’: 0.0001}
Activating limb prediction…
Starting with multi-animal imaug + adam pose-dataset loader.
Batch Size is 8
Getting specs multi-animal-imgaug 14 11
Initializing ResNet
Loading ImageNet-pretrained resnet_50
2020-07-16 16:52:57.387377: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-07-16 16:52:57.600334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.03GiB
2020-07-16 16:52:57.608316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-07-16 16:52:59.887774: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-16 16:52:59.891317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-07-16 16:52:59.893447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-07-16 16:52:59.898204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8697 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
Max_iters overwritten as 100000
Display_iters overwritten as 1000
Save_iters overwritten as 5000
Training parameters:
{‘stride’: 8.0, ‘weigh_part_predictions’: False, ‘weigh_negatives’: False, ‘fg_fraction’: 0.25, ‘mean_pixel’: [123.68, 116.779, 103.939], ‘shuffle’: True, ‘snapshot_prefix’: ‘C:\Users\Fish_Behavior\Desktop\DLCwork\5LZF_MODEL_ShortVids-Kyle-2020-07-08\dlc-models\iteration-0\5LZF_MODEL_ShortVidsJul8-trainset95shuffle1\train\snapshot’, ‘log_dir’: ‘log’, ‘global_scale’: 0.8, ‘location_refinement’: True, ‘locref_stdev’: 7.2801, ‘locref_loss_weight’: 0.05, ‘locref_huber_loss’: True, ‘optimizer’: ‘adam’, ‘intermediate_supervision’: False, ‘intermediate_supervision_layer’: 12, ‘regularize’: False, ‘weight_decay’: 0.0001, ‘mirror’: False, ‘crop_pad’: 0, ‘scoremap_dir’: ‘test’, ‘batch_size’: 8, ‘dataset_type’: ‘multi-animal-imgaug’, ‘deterministic’: False, ‘weigh_only_present_joints’: False, ‘pairwise_huber_loss’: False, ‘partaffinityfield_predict’: True, ‘pairwise_predict’: True, ‘crop’: True, ‘cropratio’: 0.4, ‘minsize’: 100, ‘leftwidth’: 400, ‘rightwidth’: 400, ‘topheight’: 400, ‘bottomheight’: 400, ‘all_joints’: [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]], ‘all_joints_names’: [‘LeftEye’, ‘RightEye’, ‘SwimBladder’, ‘Tail1’, ‘Tail2’, ‘Tail3’, ‘Tail4’, ‘TLcorner’, ‘TRcorner’, ‘BLcorner’, ‘BRcorner’], ‘dataset’: ‘training-datasets\iteration-0\UnaugmentedDataSet_5LZF_MODEL_ShortVidsJul8\5LZF_MODEL_ShortVids_Kyle95shuffle1.pickle’, ‘display_iters’: 500, ‘init_weights’: ‘C:\Users\Fish_Behavior\.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\models\pretrained\resnet_v1_50.ckpt’, ‘max_input_size’: 1500, ‘metadataset’: ‘training-datasets\iteration-0\UnaugmentedDataSet_5LZF_MODEL_ShortVidsJul8\Documentation_data-5LZF_MODEL_ShortVids_95shuffle1.pickle’, ‘min_input_size’: 64, ‘multi_step’: [[0.0001, 7500], [5e-05, 12000], [1e-05, 200000]], ‘net_type’: ‘resnet_50’, ‘num_joints’: 11, ‘num_limbs’: 14, ‘pafwidth’: 20, ‘pairwise_loss_weight’: 0.1, ‘partaffinityfield_graph’: [[0, 1], [1, 2], [4, 5], [5, 6], [1, 4], [0, 6], [2, 4], [2, 3], [0, 
5], [1, 6], [0, 4], [0, 3], [3, 4], [0, 2]], ‘pos_dist_thresh’: 17, ‘project_path’: ‘C:\Users\Fish_Behavior\Desktop\DLCwork\5LZF_MODEL_ShortVids-Kyle-2020-07-08’, ‘save_iters’: 10000, ‘scale_jitter_lo’: 0.5, ‘scale_jitter_up’: 1.25}
Starting multi-animal training…
2020-07-16 16:53:13.691604: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.20GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-07-16 16:53:13.702737: E tensorflow/stream_executor/cuda/cuda_dnn.cc:82] The primary convolution algorithm failed memory allocation, while a secondary algorithm is not provided.
Traceback (most recent call last):
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\client\session.py”, line 1334, in _do_call
return fn(*args)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\client\session.py”, line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\client\session.py”, line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([32,512,28,28]) filter shape([3,3,512,512])
[[{{node resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/Conv2D}}]]
[[{{node mean_squared_error/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims/concat}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\gui\train_network.py”, line 329, in train_network
maxiters=maxiters,
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\training.py”, line 193, in train_network
raise e
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\training.py”, line 176, in train_network
allow_growth=allow_growth,
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\train_multianimal.py”, line 217, in train
feed_dict={learning_rate: current_lr},
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\client\session.py”, line 929, in run
run_metadata_ptr)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\client\session.py”, line 1152, in _run
feed_dict_tensor, options, run_metadata)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\client\session.py”, line 1328, in _do_run
run_metadata)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\client\session.py”, line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([32,512,28,28]) filter shape([3,3,512,512])
[[node resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/Conv2D (defined at C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\nnet\pose_net.py:130) ]]
[[node mean_squared_error/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims/concat (defined at C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\nnet\pose_net.py:323) ]]

Caused by op ‘resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/Conv2D’, defined at:
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\runpy.py”, line 193, in _run_module_as_main
“__main__”, mod_spec)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\runpy.py”, line 85, in _run_code
exec(code, run_globals)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\__main__.py”, line 20, in <module>
deeplabcut.launch_dlc()
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\gui\launch_script.py”, line 66, in launch_dlc
app.MainLoop()
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\wx\core.py”, line 2166, in MainLoop
rv = wx.PyApp.MainLoop(self)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\gui\label_frames.py”, line 151, in label_frames
deeplabcut.label_frames(self.config)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\generate_training_dataset\trainingsetmanipulation.py”, line 429, in label_frames
multiple_individuals_labeling_toolbox.show(config)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\generate_training_dataset\multiple_individuals_labeling_toolbox.py”, line 1234, in show
app.MainLoop()
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\wx\core.py”, line 2166, in MainLoop
rv = wx.PyApp.MainLoop(self)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\gui\label_frames.py”, line 151, in label_frames
deeplabcut.label_frames(self.config)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\generate_training_dataset\trainingsetmanipulation.py”, line 429, in label_frames
multiple_individuals_labeling_toolbox.show(config)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\generate_training_dataset\multiple_individuals_labeling_toolbox.py”, line 1234, in show
app.MainLoop()
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\wx\core.py”, line 2166, in MainLoop
rv = wx.PyApp.MainLoop(self)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\gui\label_frames.py”, line 151, in label_frames
deeplabcut.label_frames(self.config)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\generate_training_dataset\trainingsetmanipulation.py”, line 429, in label_frames
multiple_individuals_labeling_toolbox.show(config)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\generate_training_dataset\multiple_individuals_labeling_toolbox.py”, line 1234, in show
app.MainLoop()
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\wx\core.py”, line 2166, in MainLoop
rv = wx.PyApp.MainLoop(self)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\gui\train_network.py”, line 329, in train_network
maxiters=maxiters,
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\training.py”, line 176, in train_network
allow_growth=allow_growth,
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\train_multianimal.py”, line 141, in train
losses = pose_net(cfg).train(batch)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\nnet\pose_net.py”, line 279, in train
heads = self.get_net(batch[Batch.inputs])
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\nnet\pose_net.py”, line 178, in get_net
net, end_points = self.extract_features(inputs)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\nnet\pose_net.py”, line 130, in extract_features
im_centered, global_pool=False, output_stride=16, is_training=False
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\contrib\slim\python\slim\nets\resnet_v1.py”, line 274, in resnet_v1_50
scope=scope)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\contrib\slim\python\slim\nets\resnet_v1.py”, line 207, in resnet_v1
net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py”, line 182, in func_with_args
return func(*args, **current_args)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\contrib\slim\python\slim\nets\resnet_utils.py”, line 211, in stack_blocks_dense
net = block.unit_fn(net, rate=rate, **dict(unit, stride=1))
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py”, line 182, in func_with_args
return func(*args, **current_args)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\contrib\slim\python\slim\nets\resnet_v1.py”, line 119, in bottleneck
residual, depth_bottleneck, 3, stride, rate=rate, scope=‘conv2’)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\contrib\slim\python\slim\nets\resnet_utils.py”, line 131, in conv2d_same
scope=scope)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py”, line 182, in func_with_args
return func(*args, **current_args)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py”, line 1155, in convolution2d
conv_dims=2)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py”, line 182, in func_with_args
return func(*args, **current_args)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py”, line 1058, in convolution
outputs = layer.apply(inputs)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\keras\engine\base_layer.py”, line 1227, in apply
return self.__call__(inputs, *args, **kwargs)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\layers\base.py”, line 530, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\keras\engine\base_layer.py”, line 554, in __call__
outputs = self.call(inputs, *args, **kwargs)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\keras\layers\convolutional.py”, line 194, in call
outputs = self._convolution_op(inputs, self.kernel)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\ops\nn_ops.py”, line 966, in __call__
return self.conv_op(inp, filter)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\ops\nn_ops.py”, line 591, in __call__
return self.call(inp, filter)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\ops\nn_ops.py”, line 576, in _with_space_to_batch_call
result = self.op(input_converted, filter)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\ops\nn_ops.py”, line 208, in __call__
name=self.name)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py”, line 1026, in conv2d
data_format=data_format, dilations=dilations, name=name)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\framework\op_def_library.py”, line 788, in _apply_op_helper
op_def=op_def)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\util\deprecation.py”, line 507, in new_func
return func(*args, **kwargs)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\framework\ops.py”, line 3300, in create_op
op_def=op_def)
File “C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\tensorflow\python\framework\ops.py”, line 1801, in __init__
self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): cuDNN launch failure : input shape([32,512,28,28]) filter shape([3,3,512,512])
[[node resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/Conv2D (defined at C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\nnet\pose_net.py:130) ]]
[[node mean_squared_error/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims/concat (defined at C:\Users\Fish_Behavior.conda\envs\dlc-WindowsGPU\lib\site-packages\deeplabcut\pose_estimation_tensorflow\nnet\pose_net.py:323) ]]
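
For scale, here is a back-of-the-envelope check of the two shapes named in the cuDNN message. It is only a sketch and rests on one assumption that the log does not state: that the tensors hold 4-byte float32 values. The helper name tensor_bytes is made up for this example and is not part of DeepLabCut or TensorFlow.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor with the given shape.

    bytes_per_element=4 assumes float32, which is an assumption here.
    """
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shapes copied from the "cuDNN launch failure" line in the traceback.
input_shape = (32, 512, 28, 28)
filter_shape = (3, 3, 512, 512)

MIB = 2 ** 20
input_mib = tensor_bytes(input_shape) / MIB
filter_mib = tensor_bytes(filter_shape) / MIB

print(f"conv input : {input_mib:.1f} MiB")
print(f"conv filter: {filter_mib:.1f} MiB")
```

Under that assumption the convolution input is about 49 MiB and the filter about 9 MiB. Both are small next to the 1.20GiB request in the allocator warning and the 9.03GiB the log reports as free at startup, so the tensors of this one layer are probably not what exhausted memory on their own. That reading is a guess, not a diagnosis.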