🥇

集計とランキングの実装方法の考察

2023/04/03に公開

Redis

検証用のセットアップ

require "active_record"
require "table_format"
ActiveSupport::LogSubscriber.colorize_logging = false
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Migration.verbose = false
ActiveRecord::Schema.define do
  create_table :histories do |t|
    t.string :name
    t.string :ox
  end
end
class History < ActiveRecord::Base; end
History.create!(name: "a", ox: "x")
History.create!(name: "b", ox: "o")
History.create!(name: "b", ox: "o")
History.create!(name: "b", ox: "x")
History.create!(name: "c", ox: "o")
History.create!(name: "c", ox: "o")
ActiveRecord::Base.logger = ActiveSupport::Logger.new(STDOUT)

集計対象を確認する

tp History
# >   History Load (0.1ms)  SELECT "histories".* FROM "histories"
# > |----|------|----|
# > | id | name | ox |
# > |----|------|----|
# > |  1 | a    | x  |
# > |  2 | b    | o  |
# > |  3 | b    | o  |
# > |  4 | b    | x  |
# > |  5 | c    | o  |
# > |  6 | c    | o  |
# > |----|------|----|

勝敗が o x で入っている。

group + count → Hash は集計の方向が違う

tp h = History.group(:name, :ox).count  # => {["a", "x"]=>1, ["b", "o"]=>2, ["b", "x"]=>1, ["c", "o"]=>2}
# >   History Count (0.1ms)  SELECT COUNT(*) AS "count_all", "histories"."name" AS "histories_name", "histories"."ox" AS "histories_ox" FROM "histories" GROUP BY "histories"."name", "histories"."ox"
# > |------------|---|
# > | ["a", "x"] | 1 |
# > | ["b", "o"] | 2 |
# > | ["b", "x"] | 1 |
# > | ["c", "o"] | 2 |
# > |------------|---|

group(:name, :ox) で複合キーのハッシュが作れるので、

b が勝った回数

h[["b", "o"]]  # => 2

c が負けた回数

h[["c", "x"]]  # => nil

とすれば任意のユーザーの勝敗数が O(1) で求まる。しかし求めたいのは全体のランキングであって個々の勝敗数ではない。そこでハッシュキーに入っているユーザー名を利用してテーブルにしてみると、

tp h.collect { |k, _| k.first }.uniq.collect { |e|
  { name: e, o: h[[e, "o"]] || 0, x: h[[e, "x"]] || 0 }
}.sort_by { |e| -e[:o] }
# > |------|---|---|
# > | name | o | x |
# > |------|---|---|
# > | b    | 2 | 1 |
# > | c    | 2 | 0 |
# > | a    | 0 | 1 |
# > |------|---|---|

なんだこれ。first して uniq するなどあきらかに不自然である。ユニークなユーザー配列がどこにもないのが問題なので再度DBから引くと、

tp History.distinct.pluck("name").collect { |e|
  { name: e, o: h[[e, "o"]] || 0, x: h[[e, "x"]] || 0 }
}.sort_by { |e| -e[:o] }
# >   History Pluck (0.1ms)  SELECT DISTINCT "histories"."name" FROM "histories"
# > |------|---|---|
# > | name | o | x |
# > |------|---|---|
# > | b    | 2 | 1 |
# > | c    | 2 | 0 |
# > | a    | 0 | 1 |
# > |------|---|---|

となり、すこしはましになるがなぜ二度もDBにアクセスすることになってしまったのだろう。これはデータをハッシュで取得したのが失敗だった。

group + select → Array は正しい

count ではなく select に置き換える。そして勝った回数を score とする。

tp History.select([
    "name",
    "COUNT(ox = 'o') AS score",
  ]).group("name").collect(&:attributes)
# >   History Load (0.1ms)  SELECT "histories"."name", COUNT(ox = 'o') AS score FROM "histories" GROUP BY "histories"."name"
# > |------|-------|----|
# > | name | score | id |
# > |------|-------|----|
# > | a    |     1 |    |
# > | b    |     3 |    |
# > | c    |     2 |    |
# > |------|-------|----|

だが score がおかしい。これは次のように、

tp History.select([
    "name",
    "sum(CASE WHEN ox = 'o' THEN 1 ELSE 0 END) AS score",
  ]).group("name").collect(&:attributes)
# >   History Load (0.1ms)  SELECT "histories"."name", sum(CASE WHEN ox = 'o' THEN 1 ELSE 0 END) AS score FROM "histories" GROUP BY "histories"."name"
# > |------|-------|----|
# > | name | score | id |
# > |------|-------|----|
# > | a    |     0 |    |
# > | b    |     2 |    |
# > | c    |     2 |    |
# > |------|-------|----|

ox = 'o' の真偽を 1 0 としてその和を求めるとうまくいく。が、どういうわけか COUNT(ox = 'o' OR NULL) としても同様の結果になる。

tp items = History.select([
    "name",
    "COUNT(ox = 'o' OR NULL) AS score",
  ]).group("name").collect(&:attributes)
# >   History Load (0.1ms)  SELECT "histories"."name", COUNT(ox = 'o' OR NULL) AS score FROM "histories" GROUP BY "histories"."name"
# > |------|-------|----|
# > | name | score | id |
# > |------|-------|----|
# > | a    |     0 |    |
# > | b    |     2 |    |
# > | c    |     2 |    |
# > |------|-------|----|

不可解な構文だが SQL はそういうものなので考えてはいけない。

工夫していろいろ集計しておく (おまけ)

勝数を score とするだけではおもしろくないので他の情報も求めておく場合は、

tp History.select([
    "name",
    "COUNT(ox = 'o' OR NULL) AS o_count",
    "COUNT(ox = 'x' OR NULL) AS x_count",
    "COUNT(*)                AS total",
    "COUNT(ox = 'o' OR NULL) / CAST(COUNT(*) as REAL) AS ratio",
  ]).group("name").collect(&:attributes)
# >   History Load (0.1ms)  SELECT "histories"."name", COUNT(ox = 'o' OR NULL) AS o_count, COUNT(ox = 'x' OR NULL) AS x_count, COUNT(*)                AS total, COUNT(ox = 'o' OR NULL) / CAST(COUNT(*) as REAL) AS ratio FROM "histories" GROUP BY "histories"."name"
# > |------|---------|---------|-------|--------------------|----|
# > | name | o_count | x_count | total | ratio              | id |
# > |------|---------|---------|-------|--------------------|----|
# > | a    |       0 |       1 |     1 |                0.0 |    |
# > | b    |       2 |       1 |     3 | 0.6666666666666666 |    |
# > | c    |       2 |       0 |     2 |                1.0 |    |
# > |------|---------|---------|-------|--------------------|----|

などとする。割り算のときは CAST(xxx as REAL) として浮動小数点型にしないといけない。キャストするのは除数・被除数の一方だけでよい。

順序を考慮する

order("score DESC") を追加する。

tp items = History.select([
    "name",
    "COUNT(ox = 'o' OR NULL) AS score",
  ]).group("name").order("score DESC").collect(&:attributes)
# >   History Load (0.1ms)  SELECT "histories"."name", COUNT(ox = 'o' OR NULL) AS score FROM "histories" GROUP BY "histories"."name" ORDER BY score DESC
# > |------|-------|----|
# > | name | score | id |
# > |------|-------|----|
# > | c    |     2 |    |
# > | b    |     2 |    |
# > | a    |     0 |    |
# > |------|-------|----|

と、この時点でランキングの並びだけは取れる。さにらランクを求めるには、

tp items.collect { |e|
  e.merge(:rank => items.count { |o| o["score"] > e["score"] }.next)
}
# > |------|-------|----|------|
# > | name | score | id | rank |
# > |------|-------|----|------|
# > | c    |     2 |    |    1 |
# > | b    |     2 |    |    1 |
# > | a    |     0 |    |    3 |
# > |------|-------|----|------|

とする。ただ O(N^2) になっているのでかなり遅い。替わりに DB で計算するには、

ActiveRecord::Schema.define do
  create_table :rankings do |t|
    t.string :name
    t.integer :score
  end
end
class Ranking < ActiveRecord::Base; end
Ranking.create!(items)

tp Ranking.all.collect { |e|
  sql = "SELECT COUNT(*) + 1 FROM rankings WHERE score > #{e.score}"
  rank = ActiveRecord::Base.connection.select_value(sql)
  e.attributes.merge(rank: rank)
}
# >   Ranking Load (0.1ms)  SELECT "rankings".* FROM "rankings"
# >    (0.1ms)  SELECT COUNT(*) + 1 FROM rankings WHERE score > 2
# >    (0.0ms)  SELECT COUNT(*) + 1 FROM rankings WHERE score > 2
# >    (0.0ms)  SELECT COUNT(*) + 1 FROM rankings WHERE score > 0
# > |----|------|-------|------|
# > | id | name | score | rank |
# > |----|------|-------|------|
# > |  1 | c    |     2 |    1 |
# > |  2 | b    |     2 |    1 |
# > |  3 | a    |     0 |    3 |
# > |----|------|-------|------|

とすればよいが計算量は変わってないので実用に耐えない。

Redis の ZADD + ZCOUNT

名前とスコアのペアのソートしていない配列を用意し、

tp items = History.select([
    "name",
    "COUNT(ox = 'o' OR NULL) AS score",
  ]).group("name").collect(&:attributes)
# >   History Load (0.1ms)  SELECT "histories"."name", COUNT(ox = 'o' OR NULL) AS score FROM "histories" GROUP BY "histories"."name"
# > |------|-------|----|
# > | name | score | id |
# > |------|-------|----|
# > | a    |     0 |    |
# > | b    |     2 |    |
# > | c    |     2 |    |
# > |------|-------|----|

ZADD で投入し、

require "redis-client"
redis = RedisClient.new
redis.call("FLUSHDB")
items.each do |e|
  redis.call("ZADD", "key1", e["score"], e["name"])
end

ZRANGE を呼ぶと、

tp redis.call("ZRANGE", "key1", 0, -1, :rev, :withscores)
# > |------------|
# > | ["c", 2.0] |
# > | ["b", 2.0] |
# > | ["a", 0.0] |
# > |------------|

ソートされた状態で取得できる。

続いて ZCOUNT に score + 1 を与えると、

redis.call("ZCOUNT", "key1", 2 + 1, "+inf") + 1  # => 1
redis.call("ZCOUNT", "key1", 2 + 1, "+inf") + 1  # => 1
redis.call("ZCOUNT", "key1", 0 + 1, "+inf") + 1  # => 3

ランクが求まる。したがって ZRANGE の結果でループして ZCOUNT にスコアを与えれば、

tp redis.call("ZRANGE", "key1", 0, -1, :rev, :withscores).collect { |name, score|
  {
    name: name,
    score: score,
    rank: redis.call("ZCOUNT", "key1", score + 1, "+inf") + 1,
  }
}
# > |------|-------|------|
# > | name | score | rank |
# > |------|-------|------|
# > | c    |   2.0 |    1 |
# > | b    |   2.0 |    1 |
# > | a    |   0.0 |    3 |
# > |------|-------|------|

全体のランクが求まる。ZCOUNT は O(log(N)) なので一瞬でランクが取れる。

リアルタイムでランクを出す

上の例はランキングを作る直前でまとめて ZADD したけどこれをリアルタイムで更新していくとランクもリアルタイムに出せるようになる。リアルタイムで更新するには正誤を記録したタイミングが適しているので、

class History < ActiveRecord::Base
  after_create do
    score = History.where(name: name, ox: "o").count
    redis = RedisClient.new
    redis.call("ZADD", "key1", score, name)
  end
end

とする。

そこで仮にスコア100を出したユーザーのランクを調べるには、

redis.call("ZCOUNT", "key1", 100 + 1, "+inf") + 1