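I have recently been running experiments on a Jetson TX2 and ran into the problems below, so I am writing them down as notes.
Memory error, program aborts
# Errors like the following appear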
2019-01-11 19:41:46.959970: E tensorflow/stream_executor/cuda/cuda_driver.cc:1068] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2019-01-11 19:41:46.960033: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x367c800: CUDA_ERROR_LAUNCH_FAILED
2019-01-11 19:41:46.960059: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x367c800: CUDA_ERROR_LAUNCH_FAILED
2019-01-11 19:41:46.960185: F tensorflow/stream_executor/cuda/cuda_dnn.cc:2045] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED
[1] 10332 abort (core dumped) python monodepth_simple.py --image_path ./data/training/image_2/000000_10.jpg
Solution:
import tensorflow as tf
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True  # add this line
sess = tf.Session(config=config)
Error: failed to alloc 2304 bytes on host: CUDA_ERROR_UNKNOWN
- Same fix as above, use
config.gpu_options.allow_growth = True
nvidia-smi cannot be used on the TX2
Use sudo ~/tegrastats instead; the GR3D entry in its output represents the GPU.
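To watch only the GPU number while an experiment runs, the GR3D entry can be pulled out of each tegrastats line with a small script. This is a minimal sketch, not an official tool: the helper name parse_gr3d is hypothetical, the sample line is made up for illustration, and the pattern assumes the entry looks like "GR3D 37%@114" or "GR3D_FREQ 37%@114", which may differ between L4T releases, so check it against your own output.

```python
import re

# Assumed layouts: "GR3D 12%@114", "GR3D_FREQ 12%@114" or "GR3D_FREQ 12%".
# Verify against the output of your own tegrastats version.
_GR3D_RE = re.compile(r"GR3D(?:_FREQ)?\s+(\d+)%(?:@(\d+))?")


def parse_gr3d(line):
    """Return (load_percent, value_after_at_or_None) for one line, or None."""
    match = _GR3D_RE.search(line)
    if match is None:
        return None  # no GR3D entry on this line
    load = int(match.group(1))
    # The number after "@" is assumed to be the GPU clock; it may be absent.
    freq = int(match.group(2)) if match.group(2) else None
    return load, freq


# Made-up sample line for illustration only, not captured from a real device.
sample = "RAM 2270/7851MB cpu [0%@345,off,off,0%@345] EMC 3%@1600 GR3D 37%@114"
print(parse_gr3d(sample))
```

One way to use it would be to pipe the output of sudo ~/tegrastats into a loop over sys.stdin and call parse_gr3d on each line, skipping lines where it returns None.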