Natural Language Processing in Python: An Introduction to the spaCy Library - Beginner-Friendly!
Hello, everyone! This time, I'd like to dive into the world of natural language processing (NLP).
I'm still a beginner myself, so if you spot any mistakes, please let me know in the comments.
Introduction
Lately, we hear the words "big data" and "AI" everywhere, right? Behind the scenes, natural language processing is doing much of the heavy lifting in these technologies. Social media posts, news articles, product reviews - we are surrounded by text data. Wouldn't it be fun if we could analyze it and extract useful information?
In this article, we'll learn the basics of natural language processing using Python, a popular programming language. In particular, I'll show you how to analyze Japanese text with a very handy library called spaCy.
Let's explore the world of natural language processing together!
What is natural language processing (NLP)?
"Natural language processing"... it sounds difficult, doesn't it? But the underlying idea is actually quite simple.
Put simply, natural language processing is the technology that lets computers understand human language. For example:
- Voice assistants on smartphones (such as Siri) understand our questions and answer them
- Google understands search keywords and shows us relevant results
- Machine translation services translate text from one language into another
These are all applications of natural language processing. Pretty amazing, right?
The main tasks in natural language processing include:
- Text classification: for example, deciding whether an email is spam
- Sentiment analysis: deciding whether a passage is positive or negative
- Named entity recognition: extracting person names, place names, organization names, and so on from text
- Summarization: condensing a long document into a short summary
- Question answering: generating an appropriate answer to a question
In this article, we'll learn the techniques that underlie these tasks using a library called spaCy.
Let's get set up!
First, let's get everything ready.
What you'll need
- Python (version 3.8 or later)
- spaCy (version 3.0 or later)
- The Japanese model (ja_core_news_sm)
If you haven't installed Python yet, download it from the official website and install it.
Installing spaCy
Once Python is installed, the next step is to install spaCy. Open Command Prompt (on Windows) or a terminal (on Mac or Linux) and enter the following command:
pip install spacy
Next, download the Japanese model:
python -m spacy download ja_core_news_sm
That's it - setup complete!
Let's try spaCy!
Now that we're set up, let's try spaCy right away, starting with the basics.
Open your Python code editor (PyCharm, VSCode, etc.) and enter the following code:
import spacy

# Load the Japanese model
nlp = spacy.load("ja_core_news_sm")

# Text to analyze
text = "私は東京で働いています。毎日電車で通勤しています。"

# Process the text
doc = nlp(text)

# Print the result (the text split into words)
print("Tokenization result:")
for token in doc:
    print(token.text)
When you run this code, you should see output like this:
Tokenization result:
私
は
東京
で
働い
て
い
ます
。
毎日
電車
で
通勤
し
て
い
ます
。
Look at that - the text has been split into individual words!
This step is called "tokenization", and it is the most fundamental operation in natural language processing.
Let's dig a little deeper
spaCy has many more features. Here, I'll introduce three especially useful ones.
- Part-of-speech tagging
This feature determines what kind of word each token is (noun, verb, adjective, and so on). Let's add the following to our earlier code:
print("\nPOS tagging result:")
for token in doc:
    print(f"{token.text}: {token.pos_}")
The result looks like this:
POS tagging result:
私: PRON
は: ADP
東京: PROPN
で: ADP
働い: VERB
て: SCONJ
い: VERB
ます: AUX
。: PUNCT
毎日: ADV
電車: NOUN
で: ADP
通勤: VERB
し: AUX
て: SCONJ
い: VERB
ます: AUX
。: PUNCT
PRON stands for pronoun, VERB for verb, and NOUN for noun. Now we can see the structure of the sentences in much more detail.
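If you forget what an abbreviation such as PRON or ADP stands for, spaCy's built-in glossary can tell you:

```python
import spacy

# spacy.explain maps a tag abbreviation to a human-readable description
print(spacy.explain("PRON"))   # pronoun
print(spacy.explain("ADP"))    # adposition
print(spacy.explain("PROPN"))  # proper noun
```

This works for dependency labels and entity labels as well, not just part-of-speech tags.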
- Named entity recognition
This feature extracts proper nouns such as person names, organization names, and place names. Let's add the following code:
print("\nNamed entity recognition result:")
for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")
Result:
Named entity recognition result:
東京: GPE
GPE is short for "Geo-Political Entity" and denotes a geographical or political entity (in this case, a city name). "東京" (Tokyo) was correctly recognized as a place name.
- Dependency parsing
This feature analyzes the relationships between the words in a sentence (such as the relation between subject and predicate). Add the following code:
print("\nDependency parsing result:")
for token in doc:
    print(f"{token.text} <- {token.head.text} ({token.dep_})")
Result:
Dependency parsing result:
私 <- 働い (nsubj)
は <- 私 (case)
東京 <- 働い (obl)
で <- 東京 (case)
働い <- 働い (ROOT)
て <- 働い (mark)
い <- て (fixed)
ます <- 働い (aux)
。 <- 働い (punct)
毎日 <- 通勤 (advmod)
電車 <- 通勤 (obl)
で <- 電車 (case)
通勤 <- 通勤 (ROOT)
し <- 通勤 (aux)
て <- 通勤 (mark)
い <- て (fixed)
ます <- 通勤 (aux)
。 <- 通勤 (punct)
From this result, we can tell that "私" (I) is the subject (nsubj) of "働い" (work), that "東京" (Tokyo) is the location (obl) of the working, and so on.
Hands-on: let's build a simple text analysis tool!
Now let's use what we've learned so far to build a simple text analysis tool. It extracts the nouns, verbs, and named entities from a given text.
import spacy

# Load the model once at module level; reloading it on every call would be slow
nlp = spacy.load("ja_core_news_sm")

def analyze_text(text):
    doc = nlp(text)
    # Extract nouns
    nouns = [token.text for token in doc if token.pos_ == "NOUN"]
    # Extract verbs
    verbs = [token.text for token in doc if token.pos_ == "VERB"]
    # Extract named entities
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return {
        "nouns": nouns,
        "verbs": verbs,
        "entities": entities
    }
# Run the analysis
text = "山田太郎は東京で働いています。彼は毎日コーヒーを飲みます。"
result = analyze_text(text)
print("Text analysis result:")
print(f"Nouns: {result['nouns']}")
print(f"Verbs: {result['verbs']}")
print(f"Named entities: {result['entities']}")
When you run this code, you get output like the following:
Text analysis result:
Nouns: ['コーヒー']
Verbs: ['働い', 'い', '飲み']
Named entities: [('山田太郎', 'PERSON'), ('東京', 'GPE')]
And that's our simple text analysis tool!
From this result, you can see the main content of the text (who is doing what, and where) at a glance.
Summary
How was that? I hope you've seen that natural language processing, which can look intimidating, is surprisingly easy to get started with using spaCy.
With what you've learned today, you could, for example:
- Extract the most frequently used nouns and verbs from social media posts to analyze trends
- Extract person and organization names from news articles to determine who an article is about
- Analyze the structure of product reviews to classify opinions as positive or negative
The world of natural language processing is truly deep and still rapidly evolving, and this article only scratches the surface. If it has sparked even a little interest, I'd be delighted.
For those who want to learn more
If natural language processing has caught your interest, I recommend the following resources:
- The official spaCy documentation: you can learn the library in much more depth.
- NLP 100 Exercises (自然言語処理100本ノック): you can sharpen your NLP skills by solving practical problems.
The world of natural language processing evolves every day. Depending on your ideas, a wonderful application that no one has thought of yet might be born.
I myself want to keep learning about natural language processing in more depth, and I hope to share what I learn with you along the way.
See you next time!