bert-as-service uses BERT as a sentence encoder and hosts it as a service via ZeroMQ, letting you map sentences to fixed-length vector representations with just two lines of code.
Environment
Windows 10 + Python 3.5 + TensorFlow 1.10.0
Installation
- Install TensorFlow
- Install bert-as-service

bert-as-service requires Python >= 3.5 and TensorFlow >= 1.10:

```shell
pip install bert-serving-server
pip install bert-serving-client
```
Download a pre-trained BERT model (Chinese)
| Model | Details |
|---|---|
| BERT-Base, Uncased | 12-layer, 768-hidden, 12-heads, 110M parameters |
| BERT-Large, Uncased | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| BERT-Base, Cased | 12-layer, 768-hidden, 12-heads, 110M parameters |
| BERT-Large, Cased | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| BERT-Base, Multilingual Cased (New) | 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters |
| BERT-Base, Multilingual Cased (Old) | 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters |
| BERT-Base, Chinese | Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters |
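After unzipping a model such as chinese_L-12_H-768_A-12, the directory should contain the config, vocabulary, and checkpoint files that bert-serving-start expects. A minimal sanity-check sketch (the helper name and file list are my own; the file names match the server defaults ckpt_name=bert_model.ckpt and config_name=bert_config.json):

```python
import os

# Files a standard BERT release ships with and the server looks for by default.
REQUIRED = ["bert_config.json", "vocab.txt",
            "bert_model.ckpt.index", "bert_model.ckpt.meta",
            "bert_model.ckpt.data-00000-of-00001"]

def check_model_dir(model_dir):
    """Return the list of expected files missing from model_dir."""
    return [f for f in REQUIRED if not os.path.isfile(os.path.join(model_dir, f))]

missing = check_model_dir(r"D:\env\bert\chinese_L-12_H-768_A-12")
if missing:
    print("missing files:", missing)
```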
Start the bert-as-service server
```shell
bert-serving-start -model_dir /tmp/english_L-12_H-768_A-12/ -num_worker=2  # adjust model_dir to your own path
```

Startup output with the Chinese model:

```
usage: xxxx\Anaconda3\envs\py35\Scripts\bert-serving-start -model_dir D:\env\bert\chinese_L-12_H-768_A-12 -num_worker=2
                 ARG   VALUE
__________________________________________________
           ckpt_name = bert_model.ckpt
         config_name = bert_config.json
                cors = *
                 cpu = False
          device_map = []
       do_lower_case = True
  fixed_embed_length = False
                fp16 = False
 gpu_memory_fraction = 0.5
       graph_tmp_dir = None
    http_max_connect = 10
           http_port = None
        mask_cls_sep = False
      max_batch_size = 256
         max_seq_len = 25
           model_dir = D:\env\bert\chinese_L-12_H-768_A-12
no_position_embeddings = False
    no_special_token = False
          num_worker = 2
       pooling_layer = [-2]
    pooling_strategy = REDUCE_MEAN
                port = 5555
            port_out = 5556
       prefetch_size = 10
 priority_batch_size = 16
show_tokens_to_client = False
     tuned_model_dir = None
             verbose = False
                 xla = False

I:VENTILATOR:freeze, optimize and export graph, could take a while...
I:GRAPHOPT:model config: D:\env\bert\chinese_L-12_H-768_A-12\bert_config.json
I:GRAPHOPT:checkpoint: D:\env\bert\chinese_L-12_H-768_A-12\bert_model.ckpt
I:GRAPHOPT:build graph...
I:GRAPHOPT:load parameters from checkpoint...
I:GRAPHOPT:optimize...
I:GRAPHOPT:freeze...
I:GRAPHOPT:write graph to a tmp file: C:\Users\Memento\AppData\Local\Temp\tmpo07002um
I:VENTILATOR:bind all sockets
I:VENTILATOR:open 8 ventilator-worker sockets
I:VENTILATOR:start the sink
I:SINK:ready
I:VENTILATOR:get devices
W:VENTILATOR:no GPU available, fall back to CPU
I:VENTILATOR:device map:
    worker 0 -> cpu
    worker 1 -> cpu
I:WORKER-0:use device cpu, load graph from C:\Users\Memento\AppData\Local\Temp\tmpo07002um
I:WORKER-1:use device cpu, load graph from C:\Users\Memento\AppData\Local\Temp\tmpo07002um
I:WORKER-0:ready and listening!
I:WORKER-1:ready and listening!
I:VENTILATOR:all set, ready to serve request!
```
- Call the bert-as-service server from Python:

```python
from bert_serving.client import BertClient

bc = BertClient(ip="localhost", check_version=False, check_length=False)
vec = bc.encode(['你好', '你好呀', '我很好'])
print(vec)
```
Output:
```
[[ 0.2894022  -0.13572647  0.07591158 ... -0.14091237  0.54630077 -0.30118054]
 [ 0.4535432  -0.03180456  0.3459639  ... -0.3121457   0.42606848 -0.50814617]
 [ 0.6313594  -0.22302179  0.16799903 ... -0.1614125   0.23098437 -0.5840646 ]]
```
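The returned array has one fixed-length (768-dimensional) row per input sentence, so sentence similarity can be scored with cosine similarity between rows. A minimal sketch, using toy 3-dimensional vectors in place of the real bc.encode output:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In practice: vec = bc.encode(['你好', '你好呀', '我很好']); toy values here.
vec = np.array([[0.29, -0.14, 0.08],
                [0.45, -0.03, 0.35],
                [0.63, -0.22, 0.17]])
print(cosine_sim(vec[0], vec[1]))  # '你好' vs '你好呀'
print(cosine_sim(vec[0], vec[2]))  # '你好' vs '我很好'
```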
Highlights
- :telescope: State-of-the-art: built on the pretrained 12/24-layer BERT models released by Google AI, widely considered a milestone in the NLP community.
- :hatching_chick: Easy-to-use: requires only two lines of code to get sentence/token-level encodings.
- :zap: Fast: 900 sentences/s on a single Tesla M40 24GB. Low latency, optimized for speed. See benchmark.
- :octopus: Scalable: scales nicely and smoothly over multiple GPUs and multiple clients without worrying about concurrency. See benchmark.
- :gem: Reliable: tested on multi-billion sentences; days of running without a break, OOM, or any nasty exceptions.
Monitoring dashboard
Start the server with the extra flag -http_port 8081 to expose an HTTP query interface on port 8081.
Requesting http://localhost:8081/status/server returns the server status:
```json
{
  "ckpt_name": "bert_model.ckpt",
  "client": "7a033047-f177-45fd-9ef5-45781b10d322",
  "config_name": "bert_config.json",
  "cors": "*",
  "cpu": false,
  "device_map": [],
  "do_lower_case": true,
  "fixed_embed_length": false,
  "fp16": false,
  "gpu_memory_fraction": 0.5,
  "graph_tmp_dir": null,
  "http_max_connect": 10,
  "http_port": 8081,
  "mask_cls_sep": false,
  "max_batch_size": 256,
  "max_seq_len": 25,
  "model_dir": "D:\\env\\bert\\chinese_L-12_H-768_A-12",
  "no_position_embeddings": false,
  "no_special_token": false,
  "num_concurrent_socket": 8,
  "num_process": 3,
  "num_worker": 1,
  "pooling_layer": [-2],
  "pooling_strategy": 2,
  "port": 5555,
  "port_out": 5556,
  "prefetch_size": 10,
  "priority_batch_size": 16,
  "python_version": "3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 16:05:27) [MSC v.1900 64 bit (AMD64)]",
  "pyzmq_version": "20.0.0",
  "server_current_time": "2021-03-03 15:53:03.859211",
  "server_start_time": "2021-03-03 10:00:21.128310",
  "server_version": "1.10.0",
  "show_tokens_to_client": false,
  "statistic": {
    "avg_last_two_interval": 1665.306127225,
    "avg_request_per_client": 8.333333333333334,
    "avg_request_per_second": 0.09246377980293276,
    "avg_size_per_request": 102.58333333333333,
    "max_last_two_interval": 17484.7365829,
    "max_request_per_client": 53,
    "max_request_per_second": 0.9194538223647459,
    "max_size_per_request": 601,
    "min_last_two_interval": 1.087602199997491,
    "min_request_per_client": 2,
    "min_request_per_second": 0.00005719274038008647,
    "min_size_per_request": 1,
    "num_active_client": 0,
    "num_data_request": 12,
    "num_max_last_two_interval": 1,
    "num_max_request_per_client": 1,
    "num_max_request_per_second": 1,
    "num_max_size_per_request": 1,
    "num_min_last_two_interval": 1,
    "num_min_request_per_client": 6,
    "num_min_request_per_second": 1,
    "num_min_size_per_request": 1,
    "num_sys_request": 63,
    "num_total_client": 9,
    "num_total_request": 75,
    "num_total_seq": 1231
  },
  "status": 200,
  "tensorflow_version": ["1", "10", "0"],
  "tuned_model_dir": null,
  "ventilator -> worker": [
    "tcp://127.0.0.1:52440",
    "tcp://127.0.0.1:52441",
    "tcp://127.0.0.1:52442",
    "tcp://127.0.0.1:52443",
    "tcp://127.0.0.1:52444",
    "tcp://127.0.0.1:52445",
    "tcp://127.0.0.1:52446",
    "tcp://127.0.0.1:52447"
  ],
  "ventilator <-> sink": "tcp://127.0.0.1:52439",
  "verbose": false,
  "worker -> sink": "tcp://127.0.0.1:52467",
  "xla": false,
  "zmq_version": "4.3.3"
}
```
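Since /status/server returns plain JSON, a dashboard only needs to poll it and pick out the fields it cares about. A hedged sketch (the server is assumed at localhost:8081; here the HTTP call is stubbed with a small sample of the payload above, and the summarize helper is my own):

```python
import json

# In practice:
#   from urllib.request import urlopen
#   status = json.load(urlopen("http://localhost:8081/status/server"))
status = json.loads("""{
  "num_worker": 1,
  "server_start_time": "2021-03-03 10:00:21.128310",
  "statistic": {"num_total_request": 75, "num_total_seq": 1231,
                "num_active_client": 0}
}""")

def summarize(status):
    """Condense the status payload into a one-line report."""
    s = status["statistic"]
    return ("workers=%d requests=%d sequences=%d active_clients=%d"
            % (status["num_worker"], s["num_total_request"],
               s["num_total_seq"], s["num_active_client"]))

print(summarize(status))
```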
Then build a simple front-end to visualize this data, or use the plugin/dashboard shipped with the bert-as-service project.
See:
https://github.com/hanxiao/bert-as-service#monitoring-the-service-status-in-a-dashboard
https://bert-as-service.readthedocs.io/en/latest/tutorial/add-monitor.html
QA
Q: Starting the bert-as-service server fails with a missing cudart64_100.dll.
A: Download the dll (it is the CUDA 10.0 runtime library), place it in the C:\Windows\System32 directory, then open a new command-prompt window and rerun the command.
Q: "fail to optimize the graph!, TypeError: cannot unpack non-iterable NoneType object"
A: Downgrade to TensorFlow 1.10.0 and make sure the model path is an absolute path:

```shell
pip uninstall tensorflow
pip uninstall tensorflow-estimator
conda install --channel https://conda.anaconda.org/aaronzs tensorflow
```
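Before starting the server, you can fail fast if the installed TensorFlow falls outside the supported range (>= 1.10, and not the 2.x line). A minimal sketch; the version-parsing helpers are my own, not part of bert-as-service:

```python
def version_tuple(v):
    """'1.10.0' -> (1, 10, 0); non-numeric suffixes in a part are ignored."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

def tf_version_ok(v):
    """True if version v lies in the supported [1.10, 2.0) range."""
    return (1, 10) <= version_tuple(v)[:2] < (2, 0)

# In practice: import tensorflow as tf; tf_version_ok(tf.__version__)
print(tf_version_ok("1.10.0"))  # True
print(tf_version_ok("1.2.1"))   # False: too old, graph optimization fails
```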
See:
https://github.com/hanxiao/bert-as-service/issues/467
https://blog.csdn.net/cktcrawl/article/details/103028725
References
Elasticsearch meets BERT
Starting bert-serving-server on Windows
bert + es7 similarity search (the Chinese pre-trained BERT model is still to be tested and updated)
bert-as-service
How to use BERT for Chinese
Documentation