🖥

#Qiita #API で得たタグ一覧を元に #Twitter 的なハッシュタグをテキストに付与する #python スクリプトの例

2019/04/20に公開

こんな感じでタグの一覧を JSON で得てファイルに記録しておく ( #Qiita のタグ一覧の #json を得る簡単な #python スクリプトの例 · Issue #1333 · YumaInaura/YumaInaura )
日本語は正規表現的に特殊パターンとかが出てきそうなので除外して、英単語だけを対象にする

tags.json

一部

$ cat ../qiita/log/tags.json | jq '.[200:205]'
[
  {
    "followers_count": 236,
    "icon_url": "https://s3-ap-northeast-1.amazonaws.com/qiita-tag-image/dc4b674d7e818f250e75e2452c8fdabe117e3159/medium.jpg?1515773118",
    "id": "Raspberrypi3",
    "items_count": 648
  },
  {
    "followers_count": 229,
    "icon_url": "https://s3-ap-northeast-1.amazonaws.com/qiita-tag-image/51713336a1474837883624c6cc2b10de6a21ddd8/medium.jpg?1387910883",
    "id": "勉強会",
    "items_count": 647
  },
  {
    "followers_count": 296,
    "icon_url": "https://s3-ap-northeast-1.amazonaws.com/qiita-tag-image/ede8a283c83a86055fd62288526cb691dc8e678a/medium.jpg?1409291235",
    "id": "テスト",
    "items_count": 647
  },
  {
    "followers_count": 14061,
    "icon_url": "https://s3-ap-northeast-1.amazonaws.com/qiita-tag-image/10a6aaf785afbc2e1c7ee05a7edba804b5b518ef/medium.jpg?1456046980",
    "id": "Facebook",
    "items_count": 645
  },
  {
    "followers_count": 484,
    "icon_url": "https://s3-ap-northeast-1.amazonaws.com/qiita-tag-image/8e6240a62e8594ddc3fe4933701dad8b2170acb9/medium.jpg?1423025366",
    "id": "WebGL",
    "items_count": 642
  }
]

jq コマンドで英単語だけに絞り込んだファイルを作成

cat ../qiita/log/tags.json | jq '[.[] | select(.id | match("^[a-zA-z][a-zA-z0-9]+$"))]' > ../qiita/log/english-tags.json

python script

#!/usr/bin/env python3

import sys, json, os, re 
from funcy import pluck

seeds = json.loads(sys.stdin.read())

tags_file = sys.argv[1]
read_tags_file = open(tags_file, "r").read()
tags = json.loads(read_tags_file)

dictionary_json_key = os.environ.get('DICTIONARY_JSON_KEY') if os.environ.get('DICTIONARY_JSON_KEY') else "text"
json_key = os.environ.get('ADD_HASHTAG_JSON_KEY') if os.environ.get('ADD_HASHTAG_JSON_KEY') else "text"

regexp_or = '|'.join(list(pluck(dictionary_json_key, tags)))

regex_pattern = r'\b(?<!#)(' + regexp_or + r')\b'
pattern = re.compile(regex_pattern, re.IGNORECASE)

results = []

for seed in seeds:
  result = seed

  result[json_key] = re.sub(pattern, "#\\1", seed[json_key])

  results.append(result)

print(json.dumps(results))

exe

JSON オブジェクトの配列を標準出力で python script に与える
第一引数にタグ一覧のファイルパスを渡す
#Ruby #python などがタグ付けされているのが分かる

$ echo '[{"text":"i like Ruby and python and Perl and linux both" }]' | DICTIONARY_JSON_KEY=id ./add-hashtag.py ../qiita/log/english-tags.json
[{"text": "i like #Ruby and #python and #Perl and #linux both"}]

Original by Github issue

チャットメンバー募集

何か質問、悩み事、相談などあればLINEオープンチャットもご利用ください。

Twitter

https://twitter.com/YumaInaura

公開日時

2019-04-20

tags.json

jq コマンドで英単語だけに絞り込んだファイルを作成

python script

exe

Original by Github issue

チャットメンバー募集

Twitter

公開日時

Discussion