🐈

データサイエンス100本ノック(SQL)31~40

2021/11/30に公開

新しく出てきた内容

集約関数
 標準偏差[stddev_samp()]
 パーセンテージ[percentile_cont() within group (order by )]

with句を使用した副問い合わせ(サブクエリ)の方法
内部結合(inner join)、完全外部結合(full outer join)
左外部結合(left join)、クロス結合(cross join)
COALESCEを使用した欠損値も含めた値の置換方法

https://www.postgresql.jp/document/13/html/functions-conditional.html

追記
最近になってGCPのBigQueryでも回答を作ってみたので合わせてそのコードも記載する
基本的にPostgreSQLの下にGCPのコードを記していく


S-031:レシート明細テーブル(receipt)に対し、店舗コード(store_cd)ごとに
     売上金額(amount)の標本標準偏差を計算し、降順にTOP5を表示せよ。

PostgreSQL
 select store_cd,
        stddev_samp(amount) as std_amount
 from receipt
 group by store_cd
 order by std_amount desc
 limit 5;

BigQuery
 select store_cd,
        stddev_samp(amount) as std_amount
 from [データセット名].receipt
 group by store_cd
 order by std_amount desc
 limit 5;

stddev_samp() = stddev() どちらでも良い


S-032:レシート明細テーブル(receipt)に対し、
     売上金額(amount)について25%刻みでパーセンタイル値を求めよ。

PostgreSQL
 select percentile_cont(0.25) within group (order by amount) as per_25,
        percentile_cont(0.5) within group (order by amount) as per_50,
        percentile_cont(0.75) within group (order by amount) as per_75,
        percentile_cont(1.0) within group (order by amount) as per_100
 from receipt;

BigQuery
 select distinct percentile_cont(amount, 0) over() as min_,
                 percentile_cont(amount, 0.25) over() as per_25,
                 percentile_cont(amount, 0.5) over() as median,
                 percentile_cont(amount, 0.75) over() as per_75,
                 percentile_cont(amount, 1) over() as max_,
 from [データセット名].receipt;

S-033:レシート明細テーブル(receipt)に対し、店舗コード(store_cd)ごとに
     売上金額(amount)の平均を計算し、330以上のものを抽出せよ。

PostgreSQL
 select store_cd,
        avg(amount) as avg_amount
 from receipt
 group by store_cd
 having avg(amount) >= 330;

BigQuery
 select store_cd,
        avg(amount) as avg_amount
 from [データセット名].receipt
 group by store_cd
 having avg(amount) >= 330;

S-034:レシート明細テーブル(receipt)に対し、顧客ID(customer_id)ごとに
     売上金額(amount)を合計して全顧客の平均を求めよ。
    ただし、顧客IDが"Z"から始まるのものは非会員を表すため、除外して計算すること。

PostgreSQL
 with customer_amount as (
                          select customer_id,
                                 sum(amount) as sum_amount
                          from receipt
                          where customer_id not like 'Z%'
                          group by customer_id
                         )
 select avg(sum_amount)
 from customer_amount;

BigQuery
 with customer_amount as (
                          select customer_id,
                                 sum(amount) as sum_amount
                          from [データセット名].receipt
                          where customer_id not like 'Z%'
                          group by customer_id
                         )
 select avg(sum_amount) as avg_
 from customer_amount;

--with句を使ってサブクエリ(副問い合わせ)を作ると読みやすくて良い。


S-035:レシート明細テーブル(receipt)に対し、顧客ID(customer_id)ごとに
     販売金額(amount)を合計して全顧客の平均を求め、
     平均以上に買い物をしている顧客を抽出せよ。
    ただし、顧客IDが"Z"から始まるのものは非会員を表すため、除外して計算すること。
    なお、データは10件だけ表示させれば良い。

PostgreSQL
 with customer_amount as (
                          select customer_id,
                                 sum(amount) as sum_amount
                          from receipt
                          where customer_id not like 'Z%'
                          group by customer_id
                         )
 select customer_id,
        sum_amount
 from customer_amount
 where sum_amount >= (select avg(sum_amount) from customer_amount)
 limit 10;

BigQuery
 with customer_amount as (
                          select customer_id,
                                 sum(amount) as sum_amount
                          from [データセット名].receipt
                          where customer_id not like 'Z%'
                          group by customer_id
                         )
 select customer_id,
        sum_amount
 from customer_amount
 where sum_amount >= (select avg(sum_amount) from customer_amount)
 limit 10;

S-036:レシート明細テーブル(receipt)と店舗テーブル(store)を内部結合し、
     レシート明細テーブルの全項目と店舗テーブルの店舗名(store_name)を10件表示せよ。

PostgreSQL
 select r.*,
        s.store_name
 from receipt as r
 inner join store as s
 on r.store_cd = s.store_cd
 limit 10;

BigQuery
 select r.*,
        s.store_name
 from [データセット名].receipt as r
 inner join [データセット名].store as s
 on r.store_cd = s.store_cd
 limit 10;

S-037:商品テーブル(product)とカテゴリテーブル(category)を内部結合し、
     商品テーブルの全項目と
     カテゴリテーブルの小区分名(category_small_name)を10件表示させよ。

PostgreSQL
 select p.*,
        c.category_small_name
 from product as p
 inner join category as c
 on p.category_small_cd = c.category_small_cd
 limit 10;

BigQuery
 select p.*,
        c.category_small_name
 from [データセット名].product as p
 inner join [データセット名].category as c
 on p.category_small_cd = c.category_small_cd
 limit 10;

S-038:顧客テーブル(customer)とレシート明細テーブル(receipt)から、
     各顧客ごとの売上金額合計を求めよ。
    ただし、買い物の実績がない顧客については売上金額を0として表示させること。
    また、顧客は性別コード(gender_cd)が女性(1)であるものを対象とし、
     非会員(顧客IDが'Z'から始まるもの)は除外すること。
    なお、結果は10件だけ表示させれば良い。

PostgreSQL
 with customer_sum_amount as (
                              select customer_id,
                                     sum(amount) as sum_amount
                              from receipt
                              group by customer_id
                             )
 select cu.customer_id,
        coalesce(csa.sum_amount, '0')
 from customer as cu
 left join customer_sum_amount as csa
 on cu.customer_id = csa.customer_id
 where cu.customer_id not like 'Z%' and cu.gender_cd = '1'
 limit 10;

BigQuery
 with customer_sum_amount as (
                              select customer_id,
                                     sum(amount) as sum_amount
                              from [データセット名].receipt
                             group by customer_id
                             )
 select cu.customer_id,
        coalesce(csa.sum_amount, 0) as customer_sum_amount
 from [データセット名].customer as cu
 left join customer_sum_amount as csa
 on cu.customer_id = csa.customer_id
 where cu.customer_id not like 'Z%' and cu.gender_cd = '1'
 limit 10;

coalesce関数で欠損値を置換できる。


S-039:レシート明細テーブル(receipt)から売上日数の多い顧客の上位20件と、売上金額合計の多い顧客の上位20件を抽出し、完全外部結合せよ。
ただし、非会員(顧客IDが'Z'から始まるもの)は除外すること。

PostgreSQL
 with customer_days as (
                        select customer_id,
                               count(distinct sales_ymd) as come_days
                        from receipt
                        where customer_id not like 'Z%'
                        group by customer_id
                        order by come_days desc
                        limit 20
                       ),
      customer_amount as (
                          select customer_id,
                                 sum(amount) as buy_amount
                          from receipt
                          where customer_id not like 'Z%'
                          group by customer_id
                          order by buy_amount desc
                          limit 20
                         )
 select c_d.*,
        c_a.buy_amount
 from customer_days as c_d
 full outer join customer_amount as c_a
 on c_d.customer_id = c_a.customer_id;

BigQuery
 with customer_days as (
                        select customer_id,
                               count(distinct sales_ymd) as come_days
                        from [データセット名].receipt
                        where customer_id not like 'Z%'
                        group by customer_id
                        order by come_days desc
                        limit 20
                       ),
     customer_amount as (
                         select customer_id,
                                sum(amount) as buy_amount
                         from [データセット名].receipt
                         where customer_id not like 'Z%'
                         group by customer_id
                         order by buy_amount desc
                         limit 20
                        )
 select c_d.*,
        c_a.buy_amount
 from customer_days as c_d
 full outer join customer_amount as c_a
 on c_d.customer_id = c_a.customer_id
 order by c_d.come_days desc;

S-040:全ての店舗と全ての商品を組み合わせると何件のデータとなるか調査したい。店舗(store)と商品(product)を直積した件数を計算せよ。

PostgreSQL
 select count(*)
 from store as s
 cross join product as p;

BigQuery
 select count(*) as count
 from [データセット名].store as s
 cross join [データセット名].product as p;

Discussion