
A detailed guide to setting up a deep learning environment (TensorFlow, Keras) on a Linux server for Kaggle competitions


This article was first published on my personal blog at https://kezunlin.me/post/6b505d27/ ; read it there for the latest version.

 

A full guide to installing and configuring deep learning environments on a Linux server.

 


 

Quick Guide

 

prepare

 

tools

MobaXterm (for windows)
ssh + vscode

for windows:

 

drop files to MobaXterm to upload to server

 

use zip format

 

commands

 

view disk

 

du -d 1 -h
df -h
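
The same free-space check can be scripted, for example before downloading a large dataset. A minimal sketch using only the standard library; the helper name disk_usage_gib is made up for illustration:

```python
import shutil


def disk_usage_gib(path="."):
    """Return (total, used, free) in GiB for the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return tuple(round(v / gib, 2) for v in (usage.total, usage.used, usage.free))


if __name__ == "__main__":
    total, used, free = disk_usage_gib(".")
    print(f"total={total} GiB used={used} GiB free={free} GiB")
```

This reports on the whole filesystem like df -h, not on a single directory like du.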

 

gpu and cpu usage

 

watch -n 1 nvidia-smi
top

 

view files and count

 

wc -l data.csv
# count how many folders
ls -lR | grep '^d' | wc -l
17
# count how many jpg files
ls -lR | grep '\.jpg$' | wc -l
1360
# view 10 images 
ls train | head
ls test | head
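
The counts above can also be collected from Python, which avoids parsing ls output. A minimal sketch; count_by_suffix and count_dirs are hypothetical helper names, and extensions are compared case-insensitively (an assumption, the shell pipeline is case-sensitive):

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(root):
    """Count regular files under root, grouped by lowercase file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


def count_dirs(root):
    """Count directories below root (root itself is not counted)."""
    return sum(1 for path in Path(root).rglob("*") if path.is_dir())
```

For example, count_by_suffix("train")[".jpg"] gives the number of jpg files under train.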

 

link datasets

 

# link 
ln -s src dest
ln -s /data_1/kezunlin/datasets/ dl4cv/datasets
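
If the link is created from a setup script, a small wrapper can make the step repeatable. A sketch under the assumption that dest does not yet exist or is already a symlink; link_dataset is a hypothetical helper, not part of any library:

```python
import os


def link_dataset(src, dest):
    """Create dest as a symlink to src, similar to `ln -s src dest`."""
    src = os.path.abspath(src)
    if os.path.islink(dest):
        if os.path.realpath(dest) == os.path.realpath(src):
            return dest  # already linked to the same target, nothing to do
        raise FileExistsError(f"{dest} already links to another location")
    # Raises FileExistsError if dest exists as a regular file or directory.
    os.symlink(src, dest, target_is_directory=os.path.isdir(src))
    return dest
```

Unlike ln -s, this does not create the link inside dest when dest is an existing directory; it raises instead.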

 

scp

 

scp -r node17:~/dl4cv  ~/git/
scp -r node17:~/.keras ~/

 

tmux for background tasks

 

tmux new -s notebook
tmux ls 
tmux attach -t notebook
tmux detach

 

wget download

 

# wget 
# continue download
wget -c url 
# background download for large file
wget -b -c url
tail -f wget-log
# kill background wget
pkill -9 wget

 

tips about training large model

 

terminal 1:

 

tmux new -s train
conda activate keras
time python train_alexnet.py

 

terminal 2:

 

tmux detach
tmux attach -t train

 

Run the training inside tmux and detach from the session before closing vscode; otherwise the training process exits when vscode is closed.

 

cuda driver and toolkits

 

see cuda-toolkit for cuda driver version

 

The cudatoolkit version that can be used depends on the installed cuda driver version.
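
The dependency can be expressed as a small lookup. A sketch only: the minimum Linux driver versions below are assumptions recalled from NVIDIA's CUDA release notes and should be verified against the official table; MIN_DRIVER and driver_supports are hypothetical names.

```python
# Assumed minimum Linux driver version per CUDA toolkit release.
# Verify these numbers against NVIDIA's release notes before relying on them.
MIN_DRIVER = {
    "9.0": (384, 81),
    "9.2": (396, 26),
    "10.0": (410, 48),
    "10.1": (418, 39),
}


def parse_version(text):
    """Turn '418.87.00' into (418, 87, 0)."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(toolkit, driver):
    """True if the driver version meets the assumed minimum for the toolkit."""
    return parse_version(driver)[:2] >= MIN_DRIVER[toolkit]
```

With the driver used in this post, driver_supports("10.1", "418.87.00") is true, while a 410.48 driver would not be enough for toolkit 10.1.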

 

install nvidia-drivers

 

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-cache search nvidia-*
# nvidia-384
# nvidia-396
sudo apt-get -y install nvidia-418
# test 
nvidia-smi
Failed to initialize NVML: Driver/library version mismatch

 

If nvidia-smi reports this mismatch, reboot and test again.

https://stackoverflow.com/que…

install cuda-toolkit (drivers)

 

remove all previous nvidia drivers

 

sudo apt-get -y purge 'nvidia-*'

 

go to the NVIDIA CUDA toolkit download page and download cuda_10.1

 

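# -b downloads in the background; wait for it to finish (tail -f wget-log)
wget -b -c http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
# run the installer; use one of the two forms (the second needs chmod +x)
sudo sh cuda_10.1.243_418.87.00_linux.run
# sudo ./cuda_10.1.243_418.87.00_linux.run
vim ~/.bashrc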
# for cuda and cudnn
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

 

check cuda driver version

 

> cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  418.87.00  Thu Aug  8 15:35:46 CDT 2019
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 

>nvidia-smi
Tue Aug 27 17:36:35 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+

> nvidia-smi -L
GPU 0: Quadro RTX 8000 (UUID: GPU-acb01c1b-776d-cafb-ea35-430b3580d123)
GPU 1: Quadro RTX 8000 (UUID: GPU-df7f0fb8-1541-c9ce-e0f8-e92bccabf0ef)
GPU 2: Quadro RTX 8000 (UUID: GPU-67024023-20fd-a522-dcda-261063332731)
GPU 3: Quadro RTX 8000 (UUID: GPU-7f9d6a27-01ec-4ae5-0370-f0c356327913)
> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
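
When the check has to run from a script, the versions can be pulled out of the nvidia-smi banner. A minimal sketch that assumes the banner layout shown above; parse_smi_header is a hypothetical helper and the sample line is copied from the output in this post:

```python
import re


def parse_smi_header(text):
    """Extract driver and CUDA versions from nvidia-smi output, or None if absent."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    if match is None:
        return None
    return {"driver": match.group(1), "cuda": match.group(2)}


sample = "| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |"
print(parse_smi_header(sample))
```

On a real machine the text could come from subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout.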

 


 

install conda

 

./Anaconda3-2019.03-Linux-x86_64.sh 
[yes]
[yes]

 

config channels

 

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/menpo/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --set show_channel_urls yes

 

install libraries

 

conclusions:

py37/keras: conda install -y tensorflow-gpu keras==2.2.5
py37/torch: conda install -y pytorch torchvision
py36/mxnet: conda install -y mxnet

keras 2.2.5 was released on 2019/8/23.

 

common libraries

 

conda install -y scikit-learn scikit-image pandas matplotlib pillow opencv seaborn
pip install imutils progressbar pydot pylint

 

Install imutils with pip rather than conda to avoid downgrading tensorflow-gpu.

 

py37

 

cudatoolkit               10.0.130                  0    
cudnn                     7.6.0                cuda10.0_0    
tensorflow-gpu            1.13.1

 

py36

 

cudatoolkit        anaconda/pkgs/main/linux-64::cudatoolkit-10.1.168-0
cudnn              anaconda/pkgs/main/linux-64::cudnn-7.6.0-cuda10.1_0
tensorboard        anaconda/pkgs/main/linux-64::tensorboard-1.14.0-py36hf484d3e_0
tensorflow         anaconda/pkgs/main/linux-64::tensorflow-1.14.0-gpu_py36h3fb9ad6_0
tensorflow-base    anaconda/pkgs/main/linux-64::tensorflow-base-1.14.0-gpu_py36he45bfe2_0
tensorflow-estima~ anaconda/cloud/conda-forge/linux-64::tensorflow-estimator-1.14.0-py36h5ca1d4c_0
tensorflow-gpu     anaconda/pkgs/main/linux-64::tensorflow-gpu-1.14.0-h0d30ee6_0

 

The conda imutils package is only built for Python 3.6 and 3.7 (plus 2.7).

 

details

 

# remove py35
conda remove -n py35 --all
conda info --envs
conda create -n py37 python==3.7
conda activate py37
# common libraries
conda install -y scikit-learn pandas pillow opencv
pip install imutils
# imutils
conda search imutils  
# py36 and py37
# Name                       Version           Build  Channel             
imutils                        0.5.2          py27_0  anaconda/cloud/conda-forge
imutils                        0.5.2          py36_0  anaconda/cloud/conda-forge
imutils                        0.5.2          py37_0  anaconda/cloud/conda-forge
# tensorflow-gpu and keras
conda install -y tensorflow-gpu keras
# install pytorch
conda install -y pytorch torchvision
# install mxnet
# method 1: pip
pip search mxnet
mxnet-cu80[mkl]/mxnet-cu90[mkl]/mxnet-cu91[mkl]/mxnet-cu92[mkl]/mxnet-cu100[mkl]/mxnet-cu101[mkl]
# method 2: conda
conda install mxnet
# py35 and py36

 

TensorFlow Object Detection API

 

home page: home page

 

download tensorflow models and rename models-master to tfmodels

 

vim ~/.bashrc

 

export PYTHONPATH=/home/kezunlin/dl4cv:/data_1/kezunlin/tfmodels/research:$PYTHONPATH

 

source ~/.bashrc

 

jupyter notebook

 

conda activate py37
conda install -y jupyter

 

install kernels

 

python -m ipykernel install --user --name=py37
Installed kernelspec py37 in /home/kezunlin/.local/share/jupyter/kernels/py37

 

config for server

 

python -c "import IPython;print(IPython.lib.passwd())"
Enter password: 
Verify password: 
sha1:ef2fb2aacff2:4ea2998699638e58d10d594664bd87f9c3381c04
jupyter notebook --generate-config
Writing default config to: /home/kezunlin/.jupyter/jupyter_notebook_config.py
vim ~/.jupyter/jupyter_notebook_config.py
c.NotebookApp.ip = '*'  
c.NotebookApp.password = u'sha1:xxx:xxx' 
c.NotebookApp.open_browser = False 
c.NotebookApp.port = 8888 
c.NotebookApp.enable_mathjax = True
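
The hashed password above has the form algorithm:salt:digest. The sketch below illustrates that layout with the standard library only. It assumes the digest is the hash of the passphrase followed by the salt, which is my understanding of the format rather than something stated in this post; make_hash and check_hash are hypothetical names, and the real value for the config should still come from the library's own passwd function.

```python
import hashlib
import hmac
import secrets


def make_hash(passphrase, algorithm="sha1", salt_len=12):
    """Build an 'algorithm:salt:digest' string (assumed layout, for illustration)."""
    salt = secrets.token_hex(salt_len // 2)  # salt_len hex characters
    digest = hashlib.new(
        algorithm, passphrase.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    return f"{algorithm}:{salt}:{digest}"


def check_hash(hashed, passphrase):
    """Return True if passphrase matches an 'algorithm:salt:digest' string."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
        candidate = hashlib.new(
            algorithm, passphrase.encode("utf-8") + salt.encode("ascii")
        ).hexdigest()
    except ValueError:
        return False  # malformed string or unknown algorithm
    return hmac.compare_digest(candidate, digest)
```

A check like this is handy for confirming that the string pasted into c.NotebookApp.password matches the password you expect.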

 

run jupyter in the background

 

tmux new -s notebook
jupyter notebook
# ctrl+b, then d: detach from the session and keep it running
# ctrl+d: exit the shell and close the session

 

open http://<server-ip>:8888 in a browser and enter the password

 

test

 

py37

 

import cv2
cv2.__version__
import tensorflow as tf
import keras 
import torch
import torchvision

 

cat .keras/keras.json

 

{
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}

 

py36

 

import mxnet

 

train demo

 

export

 

# use CPU only
export CUDA_VISIBLE_DEVICES=""
# use gpu 0 1
export CUDA_VISIBLE_DEVICES="0,1"

 

code

 

import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0,1"
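
The variable is read when CUDA is initialised, so it is safest to set it before the framework is imported. A small sketch of a helper that does this and warns when a framework is already loaded; select_gpus is a hypothetical name and the list of framework modules is an assumption:

```python
import os
import sys
import warnings


def select_gpus(ids):
    """Set CUDA_VISIBLE_DEVICES from a list of GPU indices; [] means CPU only."""
    loaded = [m for m in ("tensorflow", "torch", "mxnet") if m in sys.modules]
    if loaded:
        # The setting may be ignored if CUDA was already initialised.
        warnings.warn(f"{loaded} already imported; the setting may have no effect")
    value = ",".join(str(i) for i in ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value
```

For example, call select_gpus([0, 1]) at the top of train.py, before import tensorflow.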

 

start train

 

python train.py

 

~/.keras folder

 

view keras models and datasets

 

ls .keras/
datasets  keras.json  models

 

models saved to /home/kezunlin/.keras/models/

datasets saved to /home/kezunlin/.keras/datasets/

 

models lists

vgg16

vgg19

resnet50

inceptionv3

xception

xxx_kernels_notop.h5 for include_top = False

xxx_kernels.h5 for include_top = True
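
To check which weight files are already cached (and so will not be downloaded again), the folder can be listed and split by the naming pattern above. A minimal sketch that assumes the "_notop" suffix convention; list_cached_weights is a hypothetical helper:

```python
from pathlib import Path


def list_cached_weights(models_dir=None):
    """Group cached .h5 weight files by whether they include the classifier top."""
    models_dir = Path(models_dir) if models_dir else Path.home() / ".keras" / "models"
    cached = {"with_top": [], "notop": []}
    for path in sorted(models_dir.glob("*.h5")):
        # Assumed convention: files for include_top=False end in "_notop".
        key = "notop" if path.stem.endswith("_notop") else "with_top"
        cached[key].append(path.name)
    return cached
```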

 


 

Datasets

 

mnist

 

cifar10

 

to skip download

 

wget http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
# keras looks for the archive under this name
mkdir -p ~/.keras/datasets
mv cifar-10-python.tar.gz ~/.keras/datasets/cifar-10-batches-py.tar.gz

 

to load data

 

from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

 

flowers-17

official flowers-17

data

train.csv

test.csv

animals

cat dog panda

panda images are WRONG !!!

 

counts

 

ls -lR animals/cat | grep ".jpg" | wc -l
1000
ls -lR animals/dog | grep ".jpg" | wc -l
1000
ls -lR animals/panda | grep ".jpg" | wc -l
1000

 

kaggle cats vs dogs

dogs-vs-cats

caltech101

caltech101

download in the background

 

wget -b -c http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz

 

Kaggle API

 

install and config

 

see kaggle-api

 

conda activate keras
conda install kaggle
# download kaggle.json
mkdir -p ~/.kaggle
mv kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
cat ~/.kaggle/kaggle.json
{"username":"xxx","key":"yyy"}

 

or by export

 

export KAGGLE_USERNAME=xxx
export KAGGLE_KEY=yyy
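
On a machine where the token has to be written by a script, the file and its permissions can be created in one step. A minimal sketch; write_kaggle_token is a hypothetical helper, and the config_dir parameter exists only so it can be tried outside the real ~/.kaggle folder:

```python
import json
import os
from pathlib import Path


def write_kaggle_token(username, key, config_dir=None):
    """Write kaggle.json with mode 0600 and return its path."""
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    token_path = config_dir / "kaggle.json"
    # Create the file with 0600 from the start so the key is never world-readable.
    fd = os.open(token_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump({"username": username, "key": key}, handle)
    os.chmod(token_path, 0o600)  # also fixes permissions of a pre-existing file
    return token_path
```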

 

tips

 

 

    1. Go to your Kaggle account page and select 'Create API Token'; kaggle.json will be downloaded.

    2. Ensure kaggle.json is in the location ~/.kaggle/kaggle.json to use the API.

 

 

check version

 

kaggle --version
Kaggle API 1.5.5

 

commands overview

 

commands

 

kaggle competitions {list, files, download, submit, submissions, leaderboard}
kaggle datasets {list, files, download, create, version, init}
kaggle kernels {list, init, push, pull, output, status}
kaggle config {view, set, unset}

 

download datasets

 

kaggle competitions download -c dogs-vs-cats

 

show leaderboard

 

kaggle competitions leaderboard dogs-vs-cats --show
teamId  teamName                           submissionDate       score    
------  ---------------------------------  -------------------  -------  
71046  Pierre Sermanet                    2014-02-01 21:43:19  0.98533  
66623  Maxim Milakov                      2014-02-01 18:20:58  0.98293  
72059  Owen                               2014-02-01 17:04:40  0.97973  
74563  Paul Covington                     2014-02-01 23:05:20  0.97946  
74298  we've been in KAIST                2014-02-01 21:15:30  0.97840  
71949  orchid                             2014-02-01 23:52:30  0.97733

 

set default competition

 

kaggle config set --name competition --value dogs-vs-cats
- competition is now set to: dogs-vs-cats
kaggle config set --name competition --value dogs-vs-cats-redux-kernels-edition

 

dogs-vs-cats

 

submit

 

kaggle c submissions
- Using competition: dogs-vs-cats
- No submissions found
kaggle c submit -f ./submission.csv -m "first submit"

 

The competition has already ended, so new submissions are not accepted.

 

Nvidia-docker and containers

 

install

 

# on Ubuntu the Docker engine package is docker.io (or docker-ce from Docker's repository)
sudo apt-get -y install docker.io
# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

 

restart (optional)

 

cat /etc/docker/daemon.json

 

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

 

sudo systemctl enable docker
sudo systemctl start docker

 

if errors occur:

 

Job for docker.service failed because the control process exited with error code.

 

See “systemctl status docker.service” and “journalctl -xe” for details.

 

check /etc/docker/daemon.json
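
A syntax error in daemon.json is a common reason for the service failing to start, and it can be checked before restarting docker. A minimal sketch that only looks at JSON validity and the nvidia runtime entry shown above; check_daemon_config is a hypothetical helper, not a docker tool:

```python
import json


def check_daemon_config(text):
    """Return a list of problems found in daemon.json content (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    runtimes = config.get("runtimes")
    runtime = runtimes.get("nvidia") if isinstance(runtimes, dict) else None
    if not isinstance(runtime, dict):
        problems.append('no "nvidia" entry under "runtimes"')
    elif "path" not in runtime:
        problems.append('"nvidia" runtime has no "path"')
    return problems
```

For example, check_daemon_config(open("/etc/docker/daemon.json").read()) returns an empty list for the file shown above.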

 

test

 

sudo docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
sudo nvidia-docker run --rm nvidia/cuda:10.1-base nvidia-smi
Thu Aug 29 00:11:32 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     Off  | 00000000:02:00.0 Off |                  Off |
| 43%   67C    P2   136W / 260W |  46629MiB / 48571MiB |     17%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 8000     Off  | 00000000:03:00.0 Off |                  Off |
| 34%   54C    P0    74W / 260W |      0MiB / 48571MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Quadro RTX 8000     Off  | 00000000:82:00.0 Off |                  Off |
| 34%   49C    P0    73W / 260W |      0MiB / 48571MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Quadro RTX 8000     Off  | 00000000:83:00.0 Off |                  Off |
| 33%   50C    P0    73W / 260W |      0MiB / 48571MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
                                                                            
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

 

add your user to the docker group so that docker commands no longer need sudo

 

command refs

 

sudo nvidia-docker run --rm nvidia/cuda:10.1-base nvidia-smi
sudo nvidia-docker run -t -i --privileged nvidia/cuda bash
sudo docker run -it --name kzl -v /home/kezunlin/workspace/:/home/kezunlin/workspace nvidia/cuda

 


 

History

20190821: created.

Copyright

Post author: kezunlin

Post link: https://kezunlin.me/post/6b505d27/

Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 3.0 unless stating additionally.
