## MeCab
!apt install aptitude swig
!apt install mecab libmecab-dev mecab-ipadic-utf8 git make curl xz-utils file -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
aptitude is already the newest version (0.8.10-6ubuntu1).
swig is already the newest version (3.0.12-1).
mecab is already installed at the requested version (0.996-5)
libmecab-dev is already installed at the requested version (0.996-5)
mecab-ipadic-utf8 is already installed at the requested version (2.7.0-20070801+main-1)
git is already installed at the requested version (1:2.17.1-1ubuntu0.7)
make is already installed at the requested version (4.1-9.1ubuntu1)
curl is already installed at the requested version (7.58.0-2ubuntu3.8)
xz-utils is already installed at the requested version (5.2.2-1.3)
file is already installed at the requested version (1:5.32-2ubuntu0.3)
mecab is already installed at the requested version (0.996-5)
libmecab-dev is already installed at the requested version (0.996-5)
mecab-ipadic-utf8 is already installed at the requested version (2.7.0-20070801+main-1)
git is already installed at the requested version (1:2.17.1-1ubuntu0.7)
make is already installed at the requested version (4.1-9.1ubuntu1)
curl is already installed at the requested version (7.58.0-2ubuntu3.8)
xz-utils is already installed at the requested version (5.2.2-1.3)
file is already installed at the requested version (1:5.32-2ubuntu0.3)
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 29 not upgraded.
Need to get 0 B of archives. After unpacking 0 B will be used.
!pip install mecab-python3
Requirement already satisfied: mecab-python3 in /usr/local/lib/python3.6/dist-packages (0.996.5)
!git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
!echo yes | mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -n -a
fatal: destination path 'mecab-ipadic-neologd' already exists and is not an empty directory. [install-mecab-ipadic-NEologd] : Start.. [install-mecab-ipadic-NEologd] : Check the existance of libraries [install-mecab-ipadic-NEologd] : find => ok [install-mecab-ipadic-NEologd] : sort => ok [install-mecab-ipadic-NEologd] : head => ok [install-mecab-ipadic-NEologd] : cut => ok [install-mecab-ipadic-NEologd] : egrep => ok [install-mecab-ipadic-NEologd] : mecab => ok [install-mecab-ipadic-NEologd] : mecab-config => ok [install-mecab-ipadic-NEologd] : make => ok [install-mecab-ipadic-NEologd] : curl => ok [install-mecab-ipadic-NEologd] : sed => ok [install-mecab-ipadic-NEologd] : cat => ok [install-mecab-ipadic-NEologd] : diff => ok [install-mecab-ipadic-NEologd] : tar => ok [install-mecab-ipadic-NEologd] : unxz => ok [install-mecab-ipadic-NEologd] : xargs => ok [install-mecab-ipadic-NEologd] : grep => ok [install-mecab-ipadic-NEologd] : iconv => ok [install-mecab-ipadic-NEologd] : patch => ok [install-mecab-ipadic-NEologd] : which => ok [install-mecab-ipadic-NEologd] : file => ok [install-mecab-ipadic-NEologd] : openssl => ok [install-mecab-ipadic-NEologd] : awk => ok [install-mecab-ipadic-NEologd] : mecab-ipadic-NEologd is already up-to-date [install-mecab-ipadic-NEologd] : mecab-ipadic-NEologd will be install to /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd [install-mecab-ipadic-NEologd] : Make mecab-ipadic-NEologd [make-mecab-ipadic-NEologd] : Start.. [make-mecab-ipadic-NEologd] : Check local seed directory [make-mecab-ipadic-NEologd] : Check local seed file [make-mecab-ipadic-NEologd] : Check local build directory [make-mecab-ipadic-NEologd] : Download original mecab-ipadic file [make-mecab-ipadic-NEologd] : Original mecab-ipadic file is already there. 
[make-mecab-ipadic-NEologd] : Decompress original mecab-ipadic file [make-mecab-ipadic-NEologd] : Delete old mecab-ipadic-2.7.0-20070801-neologd-20200430 directory mecab-ipadic-2.7.0-20070801/ mecab-ipadic-2.7.0-20070801/README mecab-ipadic-2.7.0-20070801/AUTHORS mecab-ipadic-2.7.0-20070801/COPYING mecab-ipadic-2.7.0-20070801/ChangeLog mecab-ipadic-2.7.0-20070801/INSTALL mecab-ipadic-2.7.0-20070801/Makefile.am mecab-ipadic-2.7.0-20070801/Makefile.in mecab-ipadic-2.7.0-20070801/NEWS mecab-ipadic-2.7.0-20070801/aclocal.m4 mecab-ipadic-2.7.0-20070801/config.guess mecab-ipadic-2.7.0-20070801/config.sub mecab-ipadic-2.7.0-20070801/configure mecab-ipadic-2.7.0-20070801/configure.in mecab-ipadic-2.7.0-20070801/install-sh mecab-ipadic-2.7.0-20070801/missing mecab-ipadic-2.7.0-20070801/mkinstalldirs mecab-ipadic-2.7.0-20070801/Adj.csv mecab-ipadic-2.7.0-20070801/Adnominal.csv mecab-ipadic-2.7.0-20070801/Adverb.csv mecab-ipadic-2.7.0-20070801/Auxil.csv mecab-ipadic-2.7.0-20070801/Conjunction.csv mecab-ipadic-2.7.0-20070801/Filler.csv mecab-ipadic-2.7.0-20070801/Interjection.csv mecab-ipadic-2.7.0-20070801/Noun.adjv.csv mecab-ipadic-2.7.0-20070801/Noun.adverbal.csv mecab-ipadic-2.7.0-20070801/Noun.csv mecab-ipadic-2.7.0-20070801/Noun.demonst.csv mecab-ipadic-2.7.0-20070801/Noun.nai.csv mecab-ipadic-2.7.0-20070801/Noun.name.csv mecab-ipadic-2.7.0-20070801/Noun.number.csv mecab-ipadic-2.7.0-20070801/Noun.org.csv mecab-ipadic-2.7.0-20070801/Noun.others.csv mecab-ipadic-2.7.0-20070801/Noun.place.csv mecab-ipadic-2.7.0-20070801/Noun.proper.csv mecab-ipadic-2.7.0-20070801/Noun.verbal.csv mecab-ipadic-2.7.0-20070801/Others.csv mecab-ipadic-2.7.0-20070801/Postp-col.csv mecab-ipadic-2.7.0-20070801/Postp.csv mecab-ipadic-2.7.0-20070801/Prefix.csv mecab-ipadic-2.7.0-20070801/Suffix.csv mecab-ipadic-2.7.0-20070801/Symbol.csv mecab-ipadic-2.7.0-20070801/Verb.csv mecab-ipadic-2.7.0-20070801/char.def mecab-ipadic-2.7.0-20070801/feature.def mecab-ipadic-2.7.0-20070801/left-id.def 
mecab-ipadic-2.7.0-20070801/matrix.def mecab-ipadic-2.7.0-20070801/pos-id.def mecab-ipadic-2.7.0-20070801/rewrite.def mecab-ipadic-2.7.0-20070801/right-id.def mecab-ipadic-2.7.0-20070801/unk.def mecab-ipadic-2.7.0-20070801/dicrc mecab-ipadic-2.7.0-20070801/RESULT [make-mecab-ipadic-NEologd] : Configure custom system dictionary on /content/mecab-ipadic-neologd/libexec/../build/mecab-ipadic-2.7.0-20070801-neologd-20200430 checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking whether make sets $(MAKE)... yes checking for working aclocal-1.4... missing checking for working autoconf... missing checking for working automake-1.4... missing checking for working autoheader... missing checking for working makeinfo... missing checking for a BSD-compatible install... /usr/bin/install -c checking for mecab-config... /usr/bin/mecab-config configure: creating ./config.status config.status: creating Makefile [make-mecab-ipadic-NEologd] : Encode the character encoding of system dictionary resources from EUC_JP to UTF-8 ./../../libexec/iconv_euc_to_utf8.sh ./Adverb.csv ./../../libexec/iconv_euc_to_utf8.sh ./Interjection.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.org.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.place.csv ./../../libexec/iconv_euc_to_utf8.sh ./Auxil.csv ./../../libexec/iconv_euc_to_utf8.sh ./Symbol.csv ./../../libexec/iconv_euc_to_utf8.sh ./Prefix.csv ./../../libexec/iconv_euc_to_utf8.sh ./Verb.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.proper.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.name.csv ./../../libexec/iconv_euc_to_utf8.sh ./Conjunction.csv ./../../libexec/iconv_euc_to_utf8.sh ./Others.csv ./../../libexec/iconv_euc_to_utf8.sh ./Filler.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.nai.csv ./../../libexec/iconv_euc_to_utf8.sh ./Suffix.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.verbal.csv ./../../libexec/iconv_euc_to_utf8.sh 
./Postp.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.demonst.csv ./../../libexec/iconv_euc_to_utf8.sh ./Adnominal.csv ./../../libexec/iconv_euc_to_utf8.sh ./Postp-col.csv ./../../libexec/iconv_euc_to_utf8.sh ./Adj.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.number.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.adjv.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.others.csv ./../../libexec/iconv_euc_to_utf8.sh ./Noun.adverbal.csv rm ./Adverb.csv rm ./Interjection.csv rm ./Noun.org.csv rm ./Noun.place.csv rm ./Auxil.csv rm ./Symbol.csv rm ./Prefix.csv rm ./Verb.csv rm ./Noun.proper.csv rm ./Noun.name.csv rm ./Conjunction.csv rm ./Others.csv rm ./Filler.csv rm ./Noun.csv rm ./Noun.nai.csv rm ./Suffix.csv rm ./Noun.verbal.csv rm ./Postp.csv rm ./Noun.demonst.csv rm ./Adnominal.csv rm ./Postp-col.csv rm ./Adj.csv rm ./Noun.number.csv rm ./Noun.adjv.csv rm ./Noun.others.csv rm ./Noun.adverbal.csv ./../../libexec/iconv_euc_to_utf8.sh ./pos-id.def ./../../libexec/iconv_euc_to_utf8.sh ./feature.def ./../../libexec/iconv_euc_to_utf8.sh ./right-id.def ./../../libexec/iconv_euc_to_utf8.sh ./char.def ./../../libexec/iconv_euc_to_utf8.sh ./left-id.def ./../../libexec/iconv_euc_to_utf8.sh ./matrix.def ./../../libexec/iconv_euc_to_utf8.sh ./rewrite.def ./../../libexec/iconv_euc_to_utf8.sh ./unk.def rm ./pos-id.def rm ./feature.def rm ./right-id.def rm ./char.def rm ./left-id.def rm ./matrix.def rm ./rewrite.def rm ./unk.def mv ./left-id.def.utf8 ./left-id.def mv ./Others.csv.utf8 ./Others.csv mv ./Postp.csv.utf8 ./Postp.csv mv ./char.def.utf8 ./char.def mv ./Filler.csv.utf8 ./Filler.csv mv ./Noun.csv.utf8 ./Noun.csv mv ./Symbol.csv.utf8 ./Symbol.csv mv ./Prefix.csv.utf8 ./Prefix.csv mv ./Adj.csv.utf8 ./Adj.csv mv ./Noun.name.csv.utf8 ./Noun.name.csv mv ./Interjection.csv.utf8 ./Interjection.csv mv ./matrix.def.utf8 ./matrix.def mv ./pos-id.def.utf8 ./pos-id.def mv ./Noun.proper.csv.utf8 ./Noun.proper.csv mv ./Noun.org.csv.utf8 ./Noun.org.csv mv ./Auxil.csv.utf8 ./Auxil.csv 
mv ./Noun.adjv.csv.utf8 ./Noun.adjv.csv mv ./Noun.place.csv.utf8 ./Noun.place.csv mv ./Noun.nai.csv.utf8 ./Noun.nai.csv mv ./Adverb.csv.utf8 ./Adverb.csv mv ./Adnominal.csv.utf8 ./Adnominal.csv mv ./Verb.csv.utf8 ./Verb.csv mv ./Noun.demonst.csv.utf8 ./Noun.demonst.csv mv ./Noun.verbal.csv.utf8 ./Noun.verbal.csv mv ./Noun.number.csv.utf8 ./Noun.number.csv mv ./Postp-col.csv.utf8 ./Postp-col.csv mv ./rewrite.def.utf8 ./rewrite.def mv ./Noun.others.csv.utf8 ./Noun.others.csv mv ./right-id.def.utf8 ./right-id.def mv ./unk.def.utf8 ./unk.def mv ./Conjunction.csv.utf8 ./Conjunction.csv mv ./Noun.adverbal.csv.utf8 ./Noun.adverbal.csv mv ./feature.def.utf8 ./feature.def mv ./Suffix.csv.utf8 ./Suffix.csv [make-mecab-ipadic-NEologd] : Fix yomigana field of IPA dictionary patching file Noun.csv patching file Noun.place.csv patching file Verb.csv patching file Noun.verbal.csv patching file Noun.name.csv patching file Noun.adverbal.csv patching file Noun.csv patching file Noun.name.csv patching file Noun.org.csv patching file Noun.others.csv patching file Noun.place.csv patching file Noun.proper.csv patching file Noun.verbal.csv patching file Prefix.csv patching file Suffix.csv patching file Noun.proper.csv patching file Noun.csv patching file Noun.name.csv patching file Noun.org.csv patching file Noun.place.csv patching file Noun.proper.csv patching file Noun.verbal.csv patching file Noun.name.csv patching file Noun.org.csv patching file Noun.place.csv patching file Noun.proper.csv patching file Suffix.csv patching file Noun.demonst.csv patching file Noun.csv patching file Noun.name.csv [make-mecab-ipadic-NEologd] : Copy user dictionary resource [make-mecab-ipadic-NEologd] : Install adverb entries using /content/mecab-ipadic-neologd/libexec/../seed/neologd-adverb-dict-seed.20150623.csv.xz [make-mecab-ipadic-NEologd] : Install interjection entries using /content/mecab-ipadic-neologd/libexec/../seed/neologd-interjection-dict-seed.20170216.csv.xz [make-mecab-ipadic-NEologd] : 
Install noun orthographic variant entries using /content/mecab-ipadic-neologd/libexec/../seed/neologd-common-noun-ortho-variant-dict-seed.20170228.csv.xz [make-mecab-ipadic-NEologd] : Install noun orthographic variant entries using /content/mecab-ipadic-neologd/libexec/../seed/neologd-proper-noun-ortho-variant-dict-seed.20161110.csv.xz [make-mecab-ipadic-NEologd] : Install entries of orthographic variant of a noun used as verb form using /content/mecab-ipadic-neologd/libexec/../seed/neologd-noun-sahen-conn-ortho-variant-dict-seed.20160323.csv.xz [make-mecab-ipadic-NEologd] : Install frequent adjective orthographic variant entries using /content/mecab-ipadic-neologd/libexec/../seed/neologd-adjective-std-dict-seed.20151126.csv.xz [make-mecab-ipadic-NEologd] : Install infrequent adjective orthographic variant entries using /content/mecab-ipadic-neologd/libexec/../seed/neologd-adjective-exp-dict-seed.20151126.csv.xz [make-mecab-ipadic-NEologd] : Install adjective verb orthographic variant entries using /content/mecab-ipadic-neologd/libexec/../seed/neologd-adjective-verb-dict-seed.20160324.csv.xz [make-mecab-ipadic-NEologd] : Install infrequent datetime representation entries using /content/mecab-ipadic-neologd/libexec/../seed/neologd-date-time-infreq-dict-seed.20190415.csv.xz [make-mecab-ipadic-NEologd] : Install infrequent quantity representation entries using /content/mecab-ipadic-neologd/libexec/../seed/neologd-quantity-infreq-dict-seed.20190415.csv.xz [make-mecab-ipadic-NEologd] : Install entries of ill formed words using /content/mecab-ipadic-neologd/libexec/../seed/neologd-ill-formed-words-dict-seed.20170127.csv.xz [make-mecab-ipadic-NEologd] : Re-Index system dictionary reading ./unk.def ... 40 emitting double-array: 100% |###########################################| ./model.def is not found. skipped. reading ./neologd-common-noun-ortho-variant-dict-seed.20170228.csv ... 152869 reading ./Adverb.csv ... 3032 reading ./Interjection.csv ... 
252 reading ./Noun.org.csv ... 17149 reading ./neologd-adjective-std-dict-seed.20151126.csv ... 507812 reading ./Noun.place.csv ... 73194 reading ./Auxil.csv ... 199 reading ./Symbol.csv ... 208 reading ./neologd-proper-noun-ortho-variant-dict-seed.20161110.csv ... 138379 reading ./neologd-quantity-infreq-dict-seed.20190415.csv ... 229216 reading ./Prefix.csv ... 224 reading ./neologd-date-time-infreq-dict-seed.20190415.csv ... 16866 reading ./Verb.csv ... 130750 reading ./Noun.proper.csv ... 27493 reading ./Noun.name.csv ... 34215 reading ./Conjunction.csv ... 171 reading ./neologd-adverb-dict-seed.20150623.csv ... 139792 reading ./Others.csv ... 2 reading ./Filler.csv ... 19 reading ./Noun.csv ... 60734 reading ./neologd-ill-formed-words-dict-seed.20170127.csv ... 60616 reading ./Noun.nai.csv ... 42 reading ./Suffix.csv ... 1448 reading ./neologd-adjective-verb-dict-seed.20160324.csv ... 20268 reading ./Noun.verbal.csv ... 12150 reading ./neologd-interjection-dict-seed.20170216.csv ... 4701 reading ./Postp.csv ... 146 reading ./Noun.demonst.csv ... 120 reading ./Adnominal.csv ... 135 reading ./Postp-col.csv ... 91 reading ./Adj.csv ... 27210 reading ./mecab-user-dict-seed.20200430.csv ... 3190186 reading ./neologd-adjective-exp-dict-seed.20151126.csv ... 1051146 reading ./Noun.number.csv ... 42 reading ./Noun.adjv.csv ... 3328 reading ./Noun.others.csv ... 153 reading ./Noun.adverbal.csv ... 808 reading ./neologd-noun-sahen-conn-ortho-variant-dict-seed.20160323.csv ... 26058 emitting double-array: 100% |###########################################| reading ./matrix.def ... 1316x1316 emitting matrix : 100% |###########################################| done! [make-mecab-ipadic-NEologd] : Make custom system dictionary on /content/mecab-ipadic-neologd/libexec/../build/mecab-ipadic-2.7.0-20070801-neologd-20200430 make: Nothing to be done for 'all'. [make-mecab-ipadic-NEologd] : Finish.. 
[install-mecab-ipadic-NEologd] : Get results of tokenize test [test-mecab-ipadic-NEologd] : Start.. [test-mecab-ipadic-NEologd] : Replace timestamp from 'git clone' date to 'git commit' date [test-mecab-ipadic-NEologd] : Get buzz phrases % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 1573 100 1573 0 0 1237 0 0:00:01 0:00:01 --:--:-- 1237 [test-mecab-ipadic-NEologd] : Get difference between default system dictionary and mecab-ipadic-NEologd [test-mecab-ipadic-NEologd] : Tokenize phrase using default system dictionary [test-mecab-ipadic-NEologd] : Tokenize phrase using mecab-ipadic-NEologd [test-mecab-ipadic-NEologd] : Get result of diff [test-mecab-ipadic-NEologd] : Please check difference between default system dictionary and mecab-ipadic-NEologd default system dictionary | mecab-ipadic-NEologd [test-mecab-ipadic-NEologd] : Finish.. [install-mecab-ipadic-NEologd] : Please check the list of differences in the upper part. [install-mecab-ipadic-NEologd] : Do you want to install mecab-ipadic-NEologd? Type yes or no. [install-mecab-ipadic-NEologd] : OK. Let's install mecab-ipadic-NEologd. [install-mecab-ipadic-NEologd] : Start.. [install-mecab-ipadic-NEologd] : /usr/lib/x86_64-linux-gnu/mecab/dic is current user's directory [install-mecab-ipadic-NEologd] : Make install to /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd make[1]: Entering directory '/content/mecab-ipadic-neologd/build/mecab-ipadic-2.7.0-20070801-neologd-20200430' make[1]: Nothing to be done for 'install-exec-am'. 
/bin/bash ./mkinstalldirs /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd /usr/bin/install -c -m 644 ./matrix.bin /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd/matrix.bin /usr/bin/install -c -m 644 ./char.bin /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd/char.bin /usr/bin/install -c -m 644 ./sys.dic /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd/sys.dic /usr/bin/install -c -m 644 ./unk.dic /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd/unk.dic /usr/bin/install -c -m 644 ./left-id.def /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd/left-id.def /usr/bin/install -c -m 644 ./right-id.def /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd/right-id.def /usr/bin/install -c -m 644 ./rewrite.def /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd/rewrite.def /usr/bin/install -c -m 644 ./pos-id.def /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd/pos-id.def /usr/bin/install -c -m 644 ./dicrc /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd/dicrc make[1]: Leaving directory '/content/mecab-ipadic-neologd/build/mecab-ipadic-2.7.0-20070801-neologd-20200430' [install-mecab-ipadic-NEologd] : Install completed. [install-mecab-ipadic-NEologd] : When you use MeCab, you can set '/usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd' as a value of '-d' option of MeCab. [install-mecab-ipadic-NEologd] : Usage of mecab-ipadic-NEologd is here. Usage: $ mecab -d /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd ... [install-mecab-ipadic-NEologd] : Finish.. [install-mecab-ipadic-NEologd] : Finish..
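import subprocess

# Ask mecab-config for the dictionary directory and append the NEologd folder name.
# strip() removes the trailing newline that echo adds to the path.
cmd = 'echo `mecab-config --dicdir`"/mecab-ipadic-neologd"'
path_neologd = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                shell=True).communicate()[0].decode('utf-8').strip()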
import MeCab
m=MeCab.Tagger("-Ochasen")
text = "예제。"
text_segmented = m.parse(text)
print(text_segmented)
EOS
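The `-Ochasen` output is plain text with one token per line, tab-separated columns, and a final `EOS` line. The following is a minimal sketch of turning such a string into Python records. The helper name `parse_chasen` and the field names are hypothetical, the column order assumes MeCab's standard ChaSen-compatible layout with IPADIC, and the sample string is hand-written for illustration rather than captured from this notebook.

```python
from typing import Dict, List

# Column names assumed for MeCab's ChaSen-compatible output with IPADIC
CHASEN_FIELDS = ["surface", "reading", "base_form", "pos", "conj_type", "conj_form"]


def parse_chasen(output: str) -> List[Dict[str, str]]:
    """Split -Ochasen style text into one dict per token (hypothetical helper)."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        columns = line.split("\t")
        # Pad short lines so that every field name receives a value
        columns += [""] * (len(CHASEN_FIELDS) - len(columns))
        tokens.append(dict(zip(CHASEN_FIELDS, columns)))
    return tokens


# Hand-written sample in the same layout (not real notebook output)
sample = "すもも\tスモモ\tすもも\t名詞-一般\t\t\nEOS\n"
parsed = parse_chasen(sample)
print(parsed[0]["surface"], parsed[0]["pos"])
```

In the notebook itself the same helper could be applied to `text_segmented` right after `m.parse(text)`.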
m=MeCab.Tagger("-Owakati")
text_segmented = m.parse(text)
print(text_segmented)
m=MeCab.Tagger("-Oyomi")
text_segmented = m.parse(text)
print(text_segmented)
m=MeCab.Tagger("-Ochasen -d "+str(path_neologd)) # NEologd
text = ""
text_segmented = m.parse(text)
print(text_segmented)
EOS
m=MeCab.Tagger("-Owakati -d "+str(path_neologd)) # add the path to NEologd
text_segmented = m.parse(text)
print(text_segmented)
!pip install transformers==2.9.0
Requirement already satisfied: transformers==2.9.0 in /usr/local/lib/python3.6/dist-packages (2.9.0)
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from transformers==2.9.0) (2.23.0)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.6/dist-packages (from transformers==2.9.0) (4.41.1)
Requirement already satisfied: sacremoses in /usr/local/lib/python3.6/dist-packages (from transformers==2.9.0) (0.0.43)
Requirement already satisfied: sentencepiece in /usr/local/lib/python3.6/dist-packages (from transformers==2.9.0) (0.1.86)
Requirement already satisfied: filelock in /usr/local/lib/python3.6/dist-packages (from transformers==2.9.0) (3.0.12)
Requirement already satisfied: tokenizers==0.7.0 in /usr/local/lib/python3.6/dist-packages (from transformers==2.9.0) (0.7.0)
Requirement already satisfied: dataclasses; python_version < "3.7" in /usr/local/lib/python3.6/dist-packages (from transformers==2.9.0) (0.7)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.6/dist-packages (from transformers==2.9.0) (2019.12.20)
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from transformers==2.9.0) (1.18.4)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->transformers==2.9.0) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->transformers==2.9.0) (2020.4.5.1)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->transformers==2.9.0) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->transformers==2.9.0) (2.9)
Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers==2.9.0) (0.14.1)
Requirement already satisfied: click in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers==2.9.0) (7.1.2)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers==2.9.0) (1.12.0)
import torch
from transformers.modeling_bert import BertModel
from transformers.tokenization_bert_japanese import BertJapaneseTokenizer
# Tokenizer that performs Japanese word segmentation (wakati-gaki)
tokenizer = BertJapaneseTokenizer.from_pretrained('bert-base-japanese-whole-word-masking')
# Pretrained Japanese BERT model
model = BertModel.from_pretrained('bert-base-japanese-whole-word-masking')
print(model)
BertModel( (embeddings): BertEmbeddings( (word_embeddings): Embedding(32000, 768, padding_idx=0) (position_embeddings): Embedding(512, 768) (token_type_embeddings): Embedding(2, 768) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (encoder): BertEncoder( (layer): ModuleList( (0): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (1): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (2): BertLayer( (attention): BertAttention( (self): BertSelfAttention( 
(query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (3): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (4): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, 
inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (5): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (6): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (7): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): 
Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (8): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (9): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, 
inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (10): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (11): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) (pooler): BertPooler( (dense): Linear(in_features=768, out_features=768, bias=True) 
(activation): Tanh() ) )
from transformers import BertConfig
# Load the configuration of the pretrained Japanese BERT model
config_japanese = BertConfig.from_pretrained('bert-base-japanese-whole-word-masking')
print(config_japanese)
BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "type_vocab_size": 2,
  "vocab_size": 32000
}
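The printed configuration fixes the tensor sizes seen in the model summary. As a small worked example, the arithmetic below derives the per-head attention dimension and the sizes of the three embedding tables from those values. The variable names are illustrative only, and the numbers are copied by hand from the printed `BertConfig`.

```python
# Values copied from the BertConfig printed above
hidden_size = 768
num_attention_heads = 12
vocab_size = 32000
max_position_embeddings = 512
type_vocab_size = 2

# Each attention head works on an equal slice of the hidden vector
head_dim = hidden_size // num_attention_heads

# Number of weights in each embedding table (rows * hidden_size)
word_embedding_params = vocab_size * hidden_size
position_embedding_params = max_position_embeddings * hidden_size
token_type_embedding_params = type_vocab_size * hidden_size

print(head_dim)                     # 64
print(word_embedding_params)        # 24576000
print(position_embedding_params)    # 393216
print(token_type_embedding_params)  # 1536
```

These match the `Embedding(32000, 768)`, `Embedding(512, 768)` and `Embedding(2, 768)` layers shown by `print(model)`.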
text1 = ""
text2 = ""
text3 = ""
# Tokenize sentence 1 and convert it to token IDs
input_ids1 = tokenizer.encode(text1, return_tensors='pt') # 'pt' stands for PyTorch
print(tokenizer.convert_ids_to_tokens(input_ids1[0].tolist())) # tokens of the sentence
print(input_ids1) # token IDs
['[CLS]', ] tensor([[ 2, 811, 11, 13700, 7, 58, 10, 8, 3]])
# Tokenize sentence 2 and convert it to token IDs
input_ids2 = tokenizer.encode(text2, return_tensors='pt') # 'pt' stands for PyTorch
print(tokenizer.convert_ids_to_tokens(input_ids2[0].tolist())) # tokens of the sentence
print(input_ids2) # token IDs
['[CLS]', '[SEP]'] tensor([[ 2, 5521, 3118, 4027, 12, 13700, 14, 4897, 28457, 8, 3]])
# Tokenize sentence 3 and convert it to token IDs
input_ids3 = tokenizer.encode(text3, return_tensors='pt') # 'pt' stands for PyTorch
print(tokenizer.convert_ids_to_tokens(input_ids3[0].tolist())) # tokens of the sentence
print(input_ids3) # token IDs
['[CLS]', '[SEP]'] tensor([[ 2, 811, 11, 7279, 26, 20, 10, 8, 3]])
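The ID tensors printed for the three sentences show that ID 13700 occurs in sentences 1 and 2 but at different positions, and does not occur in sentence 3. A minimal sketch of locating it programmatically follows. The lists are copied by hand from the printed tensors, and the variable names are illustrative.

```python
# Token IDs copied from the tensors printed for sentences 1, 2 and 3
ids1 = [2, 811, 11, 13700, 7, 58, 10, 8, 3]
ids2 = [2, 5521, 3118, 4027, 12, 13700, 14, 4897, 28457, 8, 3]
ids3 = [2, 811, 11, 7279, 26, 20, 10, 8, 3]

shared_id = 13700

# list.index returns the first position at which the ID occurs
pos1 = ids1.index(shared_id)
pos2 = ids2.index(shared_id)
in_sentence3 = shared_id in ids3

print(pos1, pos2, in_sentence3)  # 3 5 False
```

With the real tensors, `input_ids1[0].tolist()` gives the same kind of list, so the positions do not have to be counted by hand.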
# Feed sentence 1 into BERT
result1 = model(input_ids1)
print(result1[0].shape)
print(result1[1].shape)
# result is a tuple: sequence_output, pooled_output, (hidden_states), (attentions).
# With the default settings only the first two elements are returned.
torch.Size([1, 9, 768])
torch.Size([1, 768])
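# Feed sentences 2 and 3 into BERT as well
result2 = model(input_ids2)
result3 = model(input_ids3)
word_vec1 = result1[0][0][3][:] # sentence 1: vector of the token at position 3 (ID 13700)
word_vec2 = result2[0][0][5][:] # sentence 2: vector of the token at position 5 (ID 13700)
word_vec3 = result3[0][0][3][:] # sentence 3: vector of the token at position 3 (ID 7279)
# Compute the cosine similarity between the word vectors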
cos = torch.nn.CosineSimilarity(dim=0)
cos_sim_12 = cos(word_vec1, word_vec2)
cos_sim_13 = cos(word_vec1, word_vec3)
print(cos_sim_12)
print(cos_sim_13)
tensor(0.6647, grad_fn=<DivBackward0>)
tensor(0.7841, grad_fn=<DivBackward0>)
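The two scores above are cosine similarities, i.e. the dot product of two vectors divided by the product of their lengths. The sketch below spells out that formula in plain Python as a reference. The function name `cosine_similarity` is illustrative, and unlike `torch.nn.CosineSimilarity` this simplified version has no epsilon guard, so a zero vector raises `ZeroDivisionError`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Read this way, the 768-dimensional vector taken from sentence 1 points in a direction closer to the vector from sentence 3 (0.7841) than to the one from sentence 2 (0.6647), even though sentences 1 and 2 share the same token ID.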