Press "Enter" to skip to content

Accelerating Deep Learning on M1: HanLP Now Officially Supports Apple Silicon GPUs


My MacBook Pro with the M1 Max finally arrived today, so I immediately added native M1 CPU + GPU support to HanLP. MacBook Pro users can now enjoy GPU-accelerated inference and training, and fine-tuning a BERT model is just as smooth. This post briefly walks through setting up and installing the native environment; it applies to all Apple Silicon chips, including the M1 series.

Let's start with some background. The Intel chips most of us use are amd64 (x86-64), whereas the M1 is an arm64 (AArch64) chip, so its binaries are usually labeled osx-arm64. In principle an amd64 binary cannot execute on an arm64 chip, but Apple provides a machine-instruction translation layer called Rosetta that translates amd64 instructions into arm64 on the fly. In my experience the vast majority of amd64 software runs seamlessly, and thanks to the raw power of the M1 Max it even feels faster than the Intel i7 in a 15-inch MacBook Pro. That includes Java, pyhanlp, and HanLP v2.x.

 

Yes, you read that right. If you migrate your data (JVM, Python) from an Intel MacBook Pro to an M1 MacBook Pro via Migration Assistant or a Time Capsule backup, pyhanlp and HanLP v2.x run out of the box with no extra steps. The catch is that every instruction passes through Rosetta rather than executing as a native binary, so there is some performance loss and no way to tap Apple's GPU acceleration.
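
If you are not sure which mode your interpreter is in, a quick check settles it. This is a minimal sketch, not part of HanLP: platform.machine() reports the architecture the process sees, and the macOS-only sysctl key sysctl.proc_translated reports whether Rosetta is translating the process.

import platform
import subprocess

# 'arm64' means a native build; 'x86_64' means an amd64 build running under Rosetta.
print(platform.machine())

# sysctl.proc_translated is 1 when the process is translated by Rosetta, 0 when native;
# the key does not exist on Intel Macs, and -i tells sysctl to ignore unknown keys.
out = subprocess.run(['sysctl', '-in', 'sysctl.proc_translated'],
                     capture_output=True, text=True)
print('translated by Rosetta' if out.stdout.strip() == '1' else 'native (or an Intel Mac)')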

 

This post sets out to solve these two problems:

How do we bypass Rosetta and run directly on the M1's CPU?

 

How do we use the M1's GPU cores to accelerate neural networks?

The first problem naturally means building or installing osx-arm64 versions of Python, PyTorch, TensorFlow, and so on. Fortunately, tools like conda now let us download officially compiled osx-arm64 binaries directly. The second problem requires installing tensorflow-macos, Apple's heavily modified fork of TensorFlow, together with tensorflow-metal, the plugin for Apple's Metal GPU runtime.

A note up front: every installation step below applies to running any deep learning workload on the M1 and is not specific to HanLP. On other platforms, installing HanLP has always been a single pip command; there is no such thing as a hard install. If you cannot follow this, please do not spread claims about things you do not understand.

 

Installation

 

Installing conda

 

Here we use a community-maintained conda distribution called Miniforge3, which ships with even fewer packages than Miniconda, making it about as lean as it gets.

 

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
sh ./Miniforge3-MacOSX-arm64.sh
source ~/miniforge3/bin/activate

 

Installing tensorflow-macos

 

conda install -c apple tensorflow-deps==2.6.0 -y
pip install tensorflow-macos==2.6.0
pip install tensorflow-metal
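
To confirm that tensorflow-metal registered the GPU, a quick sanity check like the one below should list one GPU device. This is a minimal sketch; the exact device string may differ on your machine.

import tensorflow as tf

# With tensorflow-macos + tensorflow-metal installed, the Metal-backed GPU shows up
# as a PhysicalDevice of type 'GPU'.
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))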

 

Installing PyTorch

 

conda install -c pytorch pytorch -y
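
PyTorch has no Metal backend at this point, so it runs on the CPU; the point of installing it from the pytorch channel is to get a native arm64 build instead of an amd64 one that would go through Rosetta. A minimal sketch to double-check what you got:

import platform
import torch

# A native install reports 'arm64'; 'x86_64' would mean the build is running under Rosetta.
print(torch.__version__, platform.machine())
# A small matmul just confirms the CPU build works.
x = torch.rand(1000, 1000)
print((x @ x.t()).shape)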

 

Installing the Hugging Face dependencies

 

As of today, Hugging Face's tokenizers depends on the Rust compiler but has not published an osx-arm64 wheel, so we have to install the Rust toolchain and build it ourselves. After updating Xcode to the latest version, run:

 

conda install -c conda-forge sentencepiece -y
conda install -c conda-forge rust -y
pip install transformers
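
Once the Rust build finishes, it is worth verifying that tokenizers compiled and that transformers can import it. A minimal sanity check (just version prints, nothing HanLP-specific):

import tokenizers
import transformers

# If the osx-arm64 build of tokenizers had failed, the first import would raise.
print(transformers.__version__, tokenizers.__version__)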

 

Installing other third-party dependencies

 

pip install bert-for-tf2-mod==0.14.10
pip install py-params==0.9.7
pip install params-flow==0.8.2

 

Installing HanLP

 

pip install hanlp

 

Inference

 

TensorFlow

 

import hanlp
# Load the individual TensorFlow models: tokenizer, POS tagger, dependency parser, semantic parser.
tokenizer = hanlp.load('LARGE_ALBERT_BASE')
tagger = hanlp.load('CTB9_POS_ALBERT_BASE')
syntactic_parser = hanlp.load('CTB7_BIAFFINE_DEP_ZH')
semantic_parser = hanlp.load('SEMEVAL16_TEXT_BIAFFINE_ZH')
# Chain them into a pipeline: split sentences, tokenize, tag, then parse.
pipeline = hanlp.pipeline() \
    .append(hanlp.utils.rules.split_sentence, output_key='sentences') \
    .append(tokenizer, output_key='tokens') \
    .append(tagger, output_key='part_of_speech_tags') \
    .append(syntactic_parser, input_key=('tokens', 'part_of_speech_tags'), output_key='syntactic_dependencies', conll=False) \
    .append(semantic_parser, input_key=('tokens', 'part_of_speech_tags'), output_key='semantic_dependencies', conll=False)
print(pipeline)
text = '''HanLP是一系列模型与算法组成的自然语言处理工具包,目标是普及自然语言处理在生产环境中的应用。
HanLP具备功能完善、性能高效、架构清晰、语料时新、可自定义的特点。
内部算法经过工业界和学术界考验,配套书籍《自然语言处理入门》已经出版。
'''
doc = pipeline(text)
print(doc)

 

You should see output along these lines:

 

Metal device set to: Apple M1 Max
systemMemory: 64.00 GB
maxCacheSize: 21.33 GB
[None->LambdaComponent->sentences, sentences->TransformerTokenizerTF->tokens, tokens->TransformerTaggerTF->part_of_speech_tags, ['tokens', 'part_of_speech_tags']->BiaffineDependencyParserTF->syntactic_dependencies, ['tokens', 'part_of_speech_tags']->BiaffineSemanticDependencyParserTF->semantic_dependencies]
{
  "sentences": [
    "HanLP是一系列模型与算法组成的自然语言处理工具包,目标是普及自然语言处理在生产环境中的应用。",
    "HanLP具备功能完善、性能高效、架构清晰、语料时新、可自定义的特点。",
    "内部算法经过工业界和学术界考验,配套书籍《自然语言处理入门》已经出版。"
  ],
  "tokens": [
    ["HanLP", "是", "一系列", "模型", "与", "算法", "组成", "的", "自然", "语言", "处理", "工具包", ",", "目标", "是", "普及", "自然", "语言", "处理", "在", "生产", "环境", "中", "的", "应用", "。"],
    ["HanLP", "具备", "功能", "完善", "、", "性能", "高效", "、", "架构", "清晰", "、", "语料", "时", "新", "、", "可", "自", "定义", "的", "特点", "。"],
    ["内部", "算法", "经过", "工业界", "和", "学术界", "考验", ",", "配套", "书籍", "《", "自然", "语言", "处理", "入门", "》", "已经", "出版", "。"]
  ],
  "part_of_speech_tags": [
    ["NR", "VC", "CD", "NN", "P", "NN", "VV", "DEC", "NN", "NN", "NN", "NN", "PU", "NN", "VC", "VV", "NN", "NN", "VV", "P", "NN", "NN", "LC", "DEG", "NN", "PU"],
    ["NR", "VV", "NN", "VA", "PU", "NN", "VA", "PU", "NN", "VA", "PU", "NN", "LC", "VA", "PU", "VV", "VV", "VV", "DEC", "NN", "PU"],
    ["NN", "NN", "P", "NN", "CC", "NN", "NN", "PU", "JJ", "NN", "PU", "NN", "NN", "NN", "VV", "PU", "AD", "VV", "PU"]
  ],
  "syntactic_dependencies": [
    [[0, "root"], [26, "dep"], [9, "dep"], [9, "dep"], [0, "root"], [22, "dep"], [0, "root"], [9, "dep"], [9, "dep"], [0, "root"], [0, "dep"], [15, "dep"], [9, "dep"], [15, "dep"], [15, "dep"], [9, "dep"], [9, "dep"], [0, "dep"], [0, "dep"], [0, "root"], [0, "root"], [0, "root"], [0, "root"], [0, "root"], [0, "dep"], [0, "dep"]],
    [[21, "dep"], [21, "dep"], [21, "dep"], [0, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"], [21, "dep"]],
    [[19, "dep"], [19, "dep"], [0, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"], [19, "dep"]]
  ],
  "semantic_dependencies": [
    [[[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[1], ["dFreq"]], [[1], ["dFreq"]], [[0], ["Root"]], [[1], ["dFreq"]], [[0], ["Root"]], [[1], ["dFreq"]], [[1], ["dFreq"]], [[0], ["Root"]], [[0], ["Root"]]],
    [[[0], ["Root"]], [[0], ["Root"]], [[12], ["rBelg"]], [[12], ["rBelg"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[12], ["dFreq"]], [[12], ["rBelg"]]],
    [[[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]], [[0], ["Root"]]]
  ]
}

 

The "Metal device set to: Apple M1 Max" line shows that the M1's GPU is already doing the work.
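
If you want TensorFlow itself to report where each op lands, device placement logging can be switched on before the models are loaded. This uses the standard TensorFlow API and is not HanLP-specific; expect very verbose output.

import tensorflow as tf

# Log the device assigned to every op; Metal-backed ops are reported on GPU:0.
tf.debugging.set_log_device_placement(True)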

 

PyTorch

 

Take the multi-task learning (MTL) model as an example:

 

import hanlp
from hanlp_common.document import Document
# Load the joint multi-task model (tok, POS, NER, SRL, dep, sdp, con) built on ELECTRA-base.
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_BASE_ZH)
doc: Document = HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。', '阿婆主来到北京立方庭参观自然语义科技公司。'])
doc.pretty_print()

 

Output:

 

Dep Tree    Token    RelatiPoSTok      NER TypeTok      SRL PA1     Tok      SRL PA2     Tok      PoS    3       4       5       6       7       8       9 
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 ┌─────────►2021年    tmod  NT 2021年    ───►DATE2021年    ───►ARGM-TMP2021年                2021年    NT ───────────────────────────────────────────►NP ───┐   
 │┌────────►HanLPv2.1nsubj NR HanLPv2.1───►WWW HanLPv2.1───►ARG0    HanLPv2.1            HanLPv2.1NR ───────────────────────────────────────────►NP────┤   
 ││┌─►┌─────为        prep  P  为                为        ◄─┐         为                    为        P ───────────┐                                       │   
 │││  │  ┌─►生产       nn    NN 生产               生产         ├►ARG2    生产                   生产       NN ──┐       ├────────────────────────►PP ───┐       │   
 │││  └─►└──环境       pobj  NN 环境               环境       ◄─┘         环境                   环境       NN ──┴►NP ───┘                               │       │   
┌┼┴┴────────带来       root  VV 带来               带来       ╟──►PRED    带来                   带来       VV ──────────────────────────────────┐       │       │   
││       ┌─►次        amod  JJ 次                次        ◄─┐         次                    次        JJ ───►ADJP──┐                       │       ├►VP────┤   
││  ┌───►└──世代       nn    NN 世代               世代         │         世代                   世代       NN ───►NP ───┴►NP ───┐               │       │       │   
││  │    ┌─►最        advmodAD 最                最          │         最        ───►ARGM-ADV最        AD ───────────►ADVP──┼►ADJP──┐       ├►VP ───┘       ├►IP
││  │┌──►├──先进       rcmod JJ 先进               先进         │         先进       ╟──►PRED    先进       JJ ───────────►VP ───┘       │       │               │   
││  ││   └─►的        assm  DEG的                的          ├►ARG1    的                    的        DEG──────────────────────────┤       │               │   
││  ││   ┌─►多        nummodCD 多                多          │         多                    多        CD ───►QP ───┐               ├►NP ───┘               │   
││  ││┌─►└──语种       nn    NN 语种               语种         │         语种                   语种       NN ───►NP ───┴────────►NP────┤                       │   
││  │││  ┌─►NLP      nn    NR NLP              NLP        │         NLP                  NLP      NR ──┐                       │                       │   
│└─►└┴┴──┴──技术       dobj  NN 技术               技术       ◄─┘         技术       ───►ARG0    技术       NN ──┴────────────────►NP ───┘                       │   
└──────────►。        punct PU 。                。                    。                    。        PU ──────────────────────────────────────────────────┘   
Dep Tree    TokRelatPoTokNER Type        TokSRL PA1 TokSRL PA2 TokPo    3       4       5       6 
──────────────────────────────────────────────────────────────────────────────────────────────────
         ┌─►阿婆主nsubjNN阿婆主                阿婆主───►ARG0阿婆主───►ARG0阿婆主NN───────────────────►NP ───┐   
┌┬────┬──┴──来到 root VV来到                 来到 ╟──►PRED来到         来到 VV──────────┐               │   
││    │  ┌─►北京 nn   NR北京 ───►LOCATION    北京 ◄─┐     北京         北京 NR──┐       ├►VP ───┐       │   
││    └─►└──立方庭dobj NR立方庭───►LOCATION    立方庭◄─┴►ARG1立方庭        立方庭NR──┴►NP ───┘       │       │   
│└─►┌───────参观 conj VV参观                 参观         参观 ╟──►PRED参观 VV──────────┐       ├►VP────┤   
│   │  ┌───►自然 nn   NN自然 ◄─┐             自然         自然 ◄─┐     自然 NN──┐       │       │       ├►IP
│   │  │┌──►语义 nn   NN语义   │             语义         语义   │     语义 NN  │       ├►VP ───┘       │   
│   │  ││┌─►科技 nn   NN科技   ├►ORGANIZATION科技         科技   ├►ARG1科技 NN  ├►NP ───┘               │   
│   └─►└┴┴──公司 dobj NN公司 ◄─┘             公司         公司 ◄─┘     公司 NN──┘                       │   
└──────────►。  punctPU。                  。          。          。  PU──────────────────────────┘

 

Because PyTorch does not support the M1's GPU yet, this has to run on the CPU; hopefully support arrives later. TensorFlow did add support quickly, but doing so in what amounts to a separate fork fragments the experience, which may not be the right approach.
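
If you want a script to pick up Metal support automatically whenever PyTorch ships it, a defensive check like the following costs nothing today. This is a sketch under the assumption that a future build exposes an MPS backend under torch.backends.mps; the hasattr guard keeps it working on the current CPU-only build.

import torch

# Prefer the Metal (MPS) backend when the installed PyTorch build provides it,
# otherwise fall back to the CPU.
if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    device = torch.device('mps')
else:
    device = torch.device('cpu')
print(device)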

 

Training

 

As an example, let's fine-tune BERT for sentiment analysis:

 

from hanlp.components.classifiers.transformer_classifier_tf import TransformerClassifierTF, TransformerTextTransform
from hanlp.datasets.classification.sentiment import CHNSENTICORP_ERNIE_TRAIN, CHNSENTICORP_ERNIE_TEST, \
    CHNSENTICORP_ERNIE_DEV
save_dir = 'data/model/classification/chnsenticorp_bert_base'
# Fine-tune a BERT-base classifier (chinese_L-12_H-768_A-12) on the ChnSentiCorp sentiment corpus.
classifier = TransformerClassifierTF(TransformerTextTransform(y_column=0))
classifier.fit(CHNSENTICORP_ERNIE_TRAIN, CHNSENTICORP_ERNIE_DEV, save_dir,
               transformer='chinese_L-12_H-768_A-12')
classifier.load(save_dir)
print(classifier.predict('前台客房服务态度非常好!早餐很丰富,房价很干净。再接再厉!'))
classifier.evaluate(CHNSENTICORP_ERNIE_TEST, save_dir=save_dir)

 

The speed:

 

Trained 3 epochs in 10 m 30 s, each epoch takes 3 m 30 s

 

Compared with a TITAN RTX 1080 on a server:

 

Trained 3 epochs in 6 m 13 s, each epoch takes 2 m 4 s

 

That already puts it in the same order of magnitude, and keep in mind the server draws far more power while this MacBook Pro is just a thin-and-light laptop.

 

Summary

 

The 2021 MacBook Pro is stunning out of the box and silky smooth to use. Apart from training runs, I have never heard the fans spin up. Rosetta is transparent; most software just runs. With a handful of deep learning libraries installed or compiled natively, HanLP can also use the native M1 and its GPU cores directly, with performance in the same order of magnitude as a TITAN RTX 1080. As for the notch, I have barely noticed it: the lid has stayed closed since the day it arrived, hooked up to a 5K display.

 

A fine sword belongs in a hero's hand; I hope HanLP paired with a MacBook Pro will speed up your research.

 
