🧩

Predicting the Wordle answer from everyone's tweeted results, without playing

Published on 2022/01/25

This is a Wordle post. Fair warning: the title promises a lot more than the post delivers.

Quite a few people have already written solvers, so I tried a different approach (see here for the rules of Wordle).

Plenty of people tweet the colored-square grid from their game, so I wondered whether those tweets could be used to work out the answer. The plan is as follows.

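  • Use a list of five-letter words
    • Whether it is OK to use it is debatable, but I use the 12972-word list embedded in Wordle's source
  • Look for usable rows in everyone's tweets
    • Rows with many 🟩 and 🟨, such as "🟩🟨🟩🟨⬛️", are the most helpful for narrowing things down
  • Use the word list and the collected rows to narrow down the candidate answers
    • For example, suppose the answer is "MAKER". If a tweeted result contains "⬛️🟩🟩🟩🟩", then a word such as "FAKER" must have been guessed in that row. If no word in the list could have produced that row, the assumed word cannot be the answer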

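I compute this over "the 12972-word list × every collected row". It is pure brute force. The source is posted here for what it's worth, but it's messy. In hindsight, Python would probably have been the better choice.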
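The narrowing step described above can be sketched in Python roughly as follows. This is a minimal sketch and not the author's actual code, which is not reproduced here. The names `feedback`, `can_explain`, and `narrow` are made up for illustration, and the duplicate-letter handling follows the commonly described Wordle behaviour (greens first, then yellows limited by the remaining letter counts). The variation selector (U+FE0F) that often follows ⬛ in tweets is stripped before comparison.

```python
from collections import Counter

GREEN = "\U0001F7E9"   # 🟩
YELLOW = "\U0001F7E8"  # 🟨
BLACK = "\u2b1b"       # ⬛ (without the U+FE0F variation selector)
VS16 = "\ufe0f"


def feedback(guess, answer):
    """Return the colour row Wordle would show for `guess` against `answer`."""
    row = [BLACK] * 5
    # Answer letters that were not matched exactly can still earn a yellow.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            row[i] = GREEN
        elif remaining[g] > 0:
            row[i] = YELLOW
            remaining[g] -= 1
    return "".join(row)


def can_explain(answer, rows, words):
    """True if every observed row is produced by at least one guess in `words`."""
    missing = {row.replace(VS16, "") for row in rows}
    for guess in words:
        missing.discard(feedback(guess, answer))
        if not missing:
            return True  # every row has been accounted for
    return not missing


def narrow(words, rows):
    """Keep the words that remain possible answers given the observed rows."""
    return [w for w in words if can_explain(w, rows, words)]
```

With the full 12972-word list this is on the order of 12972 × 12972 calls to `feedback`, so pure Python will be slow. The early exit in `can_explain` only helps for candidates that survive.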

Results

I collected the usable-looking rows from tweets for #218.

"🟨🟩🟩🟩⬛️"
"⬛️🟩🟩🟩🟩"
"🟩⬛️🟩🟩🟩"
"🟩🟩🟩⬛️🟩"
"🟩🟩🟩🟨⬛️"
"🟩🟩⬛️🟩🟩"

I had a hunch about this even before collecting them, but there are quite few usable patterns. Things are already looking shaky at this point.
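For reference, rows like the six above could be pulled out of raw tweet text along these lines. This is only a sketch under assumptions: the article does not say how the rows were collected or what threshold was used, so `extract_rows`, `useful_rows`, and the `min_hits` parameter are hypothetical. The sketch also assumes that light-theme tweets use ⬜ where dark-theme tweets use ⬛, and treats the two as the same.

```python
import re

GREEN = "\U0001F7E9"   # 🟩
YELLOW = "\U0001F7E8"  # 🟨
BLACK = "\u2b1b"       # ⬛
WHITE = "\u2b1c"       # ⬜ (assumed light-theme equivalent of ⬛)
VS16 = "\ufe0f"        # variation selector that often follows ⬛ in tweets

ROW_RE = re.compile("[" + GREEN + YELLOW + BLACK + WHITE + "]{5}")


def extract_rows(tweet):
    """Return every run of five result squares in a tweet, normalized."""
    text = tweet.replace(VS16, "").replace(WHITE, BLACK)
    return ROW_RE.findall(text)


def hits(row):
    """Number of green or yellow squares in a row."""
    return sum(c in (GREEN, YELLOW) for c in row)


def useful_rows(tweets, min_hits=2):
    """Collect distinct rows with at least `min_hits` coloured squares.

    The all-green row is dropped: guessing the answer itself always
    produces it, so it rules nothing out.
    """
    rows = set()
    for tweet in tweets:
        for row in extract_rows(tweet):
            if hits(row) >= min_hits and row != GREEN * 5:
                rows.add(row)
    # Most informative first; ties broken by the row string for a stable order.
    return sorted(rows, key=lambda r: (-hits(r), r))
```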

The result: the 12972 words were narrowed down to 2068.

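That is not great, so I cheated some more. Wordle's source contains a 12972-word list, but it is actually split into a list of about 2300 words and a list of the rest, and the answer seems to come from the 2300-word list (words from the other list are still accepted as guesses). Filtering with that list cut the 2068 candidates down to 205.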
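The extra filtering step amounts to intersecting the surviving candidates with the smaller answer list. A minimal sketch, assuming both lists are already loaded as Python lists of lowercase strings; the function name `restrict_to_answers` and the sample words in the comment are illustrative only.

```python
def restrict_to_answers(survivors, answer_words):
    """Keep only the survivors that appear in the answer list, preserving order."""
    answer_set = set(answer_words)  # set lookup keeps this linear in len(survivors)
    return [word for word in survivors if word in answer_set]


# Illustrative call with made-up inputs:
# restrict_to_answers(["naval", "aahed", "grade"], ["grade", "naval", "croak"])
```

Since words outside the answer list are still valid guesses, the full list should still be used when checking which guesses could have produced a row. Only the set of candidate answers shrinks.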

["naval","grade","croak","sower","cluck","crass","bleed","alone","click","coast","clock","prove","solve","grime","craze","picky","crank","gorge","crimp","perky","shard","swill","hoard","slosh","purge","harry","black","chunk","slave","butch","glass","phase","payer","brink","beady","rusty","droll","flack","bland","liver","patty","unlit","gully","trice","saint","glory","grove","piney","fanny","folly","shock","crave","valve","mulch","blast","paste","carve","slack","minty","slick","chose","grace","chase","prone","diner","goody","elite","reset","purse","crock","slide","brawn","cress","elude","boule","clink","thank","poppy","track","caddy","dumpy","sport","brain","candy","march","silky","jolly","rouse","pudgy","laugh","block","taper","share","otter","tuber","blank","beach","graze","trade","modal","missy","party","touch","crest","ramen","glide","chuck","shift","shape","chore","bully","short","mammy","mange","flank","waist","slope","ditty","pence","sulky","trunk","merge","brash","swell","gassy","latch","holly","grass","shack","shark","sally","nerdy","shone","terse","shirt","chick","shale","glade","stoke","filly","billy","clout","lease","thick","blare","slept","sewer","parse","plank","verge","miner","brush","bossy","sense","taken","moral","cease","clack","flame","broke","sheer","brand","think","spine","smote","snack","crime","pinky","chock","stone","grant","cable","boast","toddy","daily","berry","gross","gloss","bulky","horse","clank","brunt","swash","clone","glaze","raven","slink","shorn","shore","mamma","lorry","chime","crack","bunch","broth","pasty","happy","golly","crone","mossy","sully","leach","blame","shine","shave"]

The answer is in there, but, hmm, not great.

Just to be sure, I tried another round. For the next puzzle, #219, I collected usable-looking rows in the same way.

"⬛️🟩⬛️🟩⬛️"
"🟩⬛️⬛️🟩⬛️"
"⬛️⬛️🟩🟩🟩"
"🟩🟩🟩⬛️⬛️"
"⬛️🟨🟩🟨🟨"
"🟨⬛️🟩🟨⬛️"
"⬛️🟩⬛️🟩🟨"

This time the 12972 words only went down to 10344. Even after filtering with the 2300-word list, I could only get down to 2102.

Summary

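Predicting the answer from result tweets alone looks a bit too hard. Instead of using rows in isolation, I also considered treating each user's tweet as a time series of up to six rows, or treating each result as a 5x6-pixel image and solving it with a CNN (though classifying into more than 10,000 categories seems like a poor fit), but this is where I'll stop.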
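If someone did want to try the image idea, each tweeted grid would first have to become a fixed-size numeric array. The sketch below covers only that encoding step, not a model. The numeric codes, the padding value, and the name `encode_grid` are all assumptions made for illustration.

```python
GREEN = "\U0001F7E9"   # 🟩
YELLOW = "\U0001F7E8"  # 🟨
BLACK = "\u2b1b"       # ⬛
WHITE = "\u2b1c"       # ⬜ (assumed light-theme equivalent of ⬛)
VS16 = "\ufe0f"

CODE = {BLACK: 0, WHITE: 0, YELLOW: 1, GREEN: 2}  # arbitrary numeric codes
PAD = -1                                          # marks rows the player never used


def encode_grid(rows, max_rows=6):
    """Encode up to `max_rows` result rows as a max_rows x 5 grid of ints."""
    grid = []
    for row in rows[:max_rows]:
        cells = row.replace(VS16, "")
        grid.append([CODE[c] for c in cells])
    while len(grid) < max_rows:
        grid.append([PAD] * 5)  # pad short games to a fixed height
    return grid
```

Keeping the rows in play order also preserves the per-user sequence mentioned above, so the same array could serve either idea.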
