
Kubeflow in Practice: Model Prediction with TensorFlow Serving


Introduction to Serving

TensorFlow Serving is a flexible, high-performance machine learning model serving system open-sourced by Google. It simplifies and accelerates the path from a trained model to a production application. Besides native support for TensorFlow models, it can be extended to serve other types of machine learning models.

Earlier articles in this series covered single-node and distributed model training, and showed how to place the exported model on distributed storage. This article describes how that model is published to a TensorFlow Serving server, how a gRPC client submits requests, and how the server returns prediction results.

Viewing the Trained Model on Distributed Storage

In the previous article we exported the trained model to NAS, so let's first take a look at it. The serving directory contains one folder per model name (here, mnist), and the level below mnist holds the model versions.

mkdir -p /nfs
mount -t nfs -o vers=4.0 0fc844b526-rqx39.cn-hangzhou.nas.aliyuncs.com:/ /nfs
cd /nfs
tree serving
serving
└── mnist
    └── 1
        ├── saved_model.pb
        └── variables
            ├── variables.data-00000-of-00001
            └── variables.index
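The numbered directories matter: by default, TensorFlow Serving scans the model base path and serves the numerically largest version subdirectory. A minimal sketch of that selection logic, using a throwaway temp directory rather than the real NAS path:

```python
# Sketch: how TensorFlow Serving's default version policy picks which
# version to serve -- the numerically largest subdirectory under the
# model base path (e.g. /nfs/serving/mnist above). Paths here are
# illustrative temp dirs, not the real NAS mount.
import os
import tempfile

def latest_version(model_base_path):
    """Return the largest numeric subdirectory name, as Serving would."""
    versions = [d for d in os.listdir(model_base_path) if d.isdigit()]
    if not versions:
        return None
    return max(versions, key=int)

# Recreate a layout like the tree above, with a few extra versions.
base = tempfile.mkdtemp()
for v in ("1", "2", "10"):
    os.makedirs(os.path.join(base, v, "variables"))

print(latest_version(base))  # "10" -- numeric compare, not lexicographic
```

Note the `key=int`: a lexicographic comparison would wrongly pick "2" over "10", which is why version directories must be plain integers.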

In the model-export article we already created the corresponding pv (tf-serving-pv) and pvc (tf-serving-pvc) on this NAS volume; TensorFlow Serving will load the model through the pvc.

kubectl get pv tf-serving-pv
NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                    STORAGECLASS   REASON    AGE
tf-serving-pv   10Gi       RWX            Retain           Bound     default/tf-serving-pvc   nas                      2d
kubectl get pvc tf-serving-pvc
NAME             STATUS    VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS   AGE
tf-serving-pvc   Bound     tf-serving-pv   10Gi       RWX            nas            2d

Launching TensorFlow Serving with Kubeflow

# Choose the namespace for TensorFlow Serving
export NAMESPACE=default
# Specify the Kubeflow version
VERSION=v0.2.0-rc.0
APP_NAME=tf-serving
# Initialize the Kubeflow application and point it at the default namespace
ks init ${APP_NAME} --api-spec=version:v1.9.3
cd ${APP_NAME}
ks env add ack
ks env set ack --namespace ${NAMESPACE}
# Install the tf-serving package
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
ks pkg install kubeflow/tf-serving@${VERSION}
# Set the environment variables TensorFlow Serving needs
MODEL_COMPONENT=mnist-serving
MODEL_NAME=mnist
MODEL_PATH=/mnt/mnist
MODEL_STORAGE_TYPE=nfs
SERVING_PVC_NAME=tf-serving-pvc
MODEL_SERVER_IMAGE=registry.aliyuncs.com/kubeflow-images-public/tensorflow-serving-1.7:v20180604-0da89b8a
# Generate the TensorFlow Serving component from the template
ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME}
ks param set ${MODEL_COMPONENT} modelPath ${MODEL_PATH}
ks param set ${MODEL_COMPONENT} modelStorageType ${MODEL_STORAGE_TYPE}
ks param set ${MODEL_COMPONENT} nfsPVC ${SERVING_PVC_NAME}
ks param set ${MODEL_COMPONENT} modelServerImage $MODEL_SERVER_IMAGE
# Point tf-serving at the ack environment
ks param set ${MODEL_COMPONENT} cloud ack
# To expose the service to external systems
ks param set ${MODEL_COMPONENT} serviceType LoadBalancer
# To use GPUs, use the following configuration instead
NUMGPUS=1
ks param set ${MODEL_COMPONENT} numGpus ${NUMGPUS}
MODEL_GPU_SERVER_IMAGE=registry.aliyuncs.com/kubeflow-images-public/tensorflow-serving-1.6gpu:v20180604-0da89b8a
ks param set ${MODEL_COMPONENT} modelServerImage $MODEL_GPU_SERVER_IMAGE
ks apply ack -c mnist-serving

Once the deployment completes, you can check the running status of TensorFlow Serving with kubectl get deploy:

# kubectl get deploy -lapp=$MODEL_NAME
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
mnist-v1   1         1         1            1           4m

The TensorFlow Serving logs show that the model has been loaded:

2018-06-19 06:50:19.185785: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-19 06:50:19.202907: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:161] Restoring SavedModel bundle.
2018-06-19 06:50:19.418625: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:196] Running LegacyInitOp on SavedModel bundle.
2018-06-19 06:50:19.425357: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:291] SavedModel load for tags { serve }; Status: success. Took 550707 microseconds.
2018-06-19 06:50:19.430435: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: mnist version: 1}

As does the externally exposed service IP and port:

kubectl get svc -lapp=$MODEL_NAME
NAME      TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                         AGE
mnist     LoadBalancer   172.19.4.241   xx.xx.xx.xx   9000:32697/TCP,8000:32166/TCP   7m

Here you can see that the gRPC service is exposed externally at IP xx.xx.xx.xx on port 9000.
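The client needs exactly two things from that listing: the EXTERNAL-IP and the first entry of PORT(S). A small convenience sketch that pulls them out of kubectl's default column layout (pure string parsing, not an official client; it assumes the gRPC port is listed first, as in this deployment):

```python
# Sketch: extract the external IP and gRPC port from a line of
# `kubectl get svc` output. Assumes kubectl's default column layout
# and that the first PORT(S) entry is the gRPC port.
def parse_svc_line(line):
    cols = line.split()
    name, svc_type, cluster_ip, external_ip, ports = cols[:5]
    # PORT(S) looks like "9000:32697/TCP,8000:32166/TCP"; take the
    # service port of the first entry.
    grpc_port = ports.split(",")[0].split(":")[0]
    return external_ip, int(grpc_port)

line = "mnist     LoadBalancer   172.19.4.241   xx.xx.xx.xx   9000:32697/TCP,8000:32166/TCP   7m"
ip, port = parse_svc_line(line)
print(ip, port)  # xx.xx.xx.xx 9000
```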

Accessing TensorFlow Serving from a gRPC Client

Start the gRPC client with kubectl run, then press Enter to get a shell inside the Pod:

kubectl run -i --tty mnist-client --image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/tf-mnist-client-demo --restart=Never --command -- /bin/bash
If you don't see a command prompt, try pressing enter.

Run the client Python code:

# export TF_MNIST_IMAGE_PATH=1.png
# export TF_MODEL_SERVER_HOST=172.19.4.241
# python mnist_client.py
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
outputs {
  key: "scores"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 10
      }
    }
    float_val: 1.0
    float_val: 0.0
    float_val: 9.85347854001e-34
    float_val: 1.00954509814e-35
    float_val: 0.0
    float_val: 0.0
    float_val: 1.5053762612e-14
    float_val: 0.0
    float_val: 5.21842267799e-22
    float_val: 0.0
  }
}
(28x28 ASCII rendering of the input digit, drawn with '.' and '@' characters)
Your model says the above number is... 1!
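The final step of the client is simple: the response carries ten scores, one per digit class, and the prediction is the index of the largest one. A minimal sketch of that step (the score values here are illustrative, not copied from the run above):

```python
# Sketch: turning a "scores" tensor from the PredictResponse into a
# predicted digit. The response holds 10 class scores; the prediction
# is the argmax. Values below are made up for illustration.
scores = [0.01, 0.93, 0.0, 0.0, 0.02, 0.0, 0.01, 0.03, 0.0, 0.0]

# Index of the largest score = predicted class.
predicted_digit = max(range(len(scores)), key=lambda i: scores[i])
print("Your model says the above number is... %d!" % predicted_digit)  # 1
```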

The model we trained and exported can now be accessed directly through a gRPC client, enabling online prediction. Together with the earlier articles, this completes the full path from deep-learning model training, through model export, to production deployment.

Removing TensorFlow Serving

ks delete ack -c mnist-serving

Summary

This example showed how to deploy TensorFlow Serving with Kubeflow, load a model stored on Alibaba Cloud NAS, and provide a model prediction service.

Deploying machine learning applications with Kubeflow is straightforward, but convenience at the application layer alone is not enough; automated integration with cloud infrastructure matters just as much, such as seamless use of GPUs, NAS, OSS, and load balancers. In this respect, Alibaba Cloud's Kubernetes container service can greatly reduce the model delivery burden on data scientists.
