データサイエンス100本ノック(SQL)31~40
新しく出てきた内容
集約関数
標準偏差[stddev_samp()]
パーセンテージ[percentile_cont() within group (order by )]
with句を使用した副問い合わせ(サブクエリ)の方法
内部結合(inner join)、完全外部結合(full outer join)
左外部結合(left join)、クロス結合(cross join)
COALESCEを使用した欠損値も含めた値の置換方法
追記
最近になってGCPのBigQueryでも回答を作ってみたので合わせてそのコードも記載する
基本的にPostgreSQLの下にGCPのコードを記していく
S-031:レシート明細テーブル(receipt)に対し、店舗コード(store_cd)ごとに
売上金額(amount)の標本標準偏差を計算し、降順にTOP5を表示せよ。
PostgreSQL
select store_cd,
stddev_samp(amount) as std_amount
from receipt
group by store_cd
order by std_amount desc
limit 5;
BigQuery
select store_cd,
stddev_samp(amount) as std_amount
from [データセット名].receipt
group by store_cd
order by std_amount desc
limit 5;
stddev_samp() = stddev() どちらでも良い
S-032:レシート明細テーブル(receipt)に対し、
売上金額(amount)について25%刻みでパーセンタイル値を求めよ。
PostgreSQL
select percentile_cont(0.25) within group (order by amount) as per_25,
percentile_cont(0.5) within group (order by amount) as per_50,
percentile_cont(0.75) within group (order by amount) as per_75,
percentile_cont(1.0) within group (order by amount) as per_100
from receipt;
BigQuery
select distinct percentile_cont(amount, 0) over() as min_,
percentile_cont(amount, 0.25) over() as per_25,
percentile_cont(amount, 0.5) over() as median,
percentile_cont(amount, 0.75) over() as per_75,
percentile_cont(amount, 1) over() as max_,
from [データセット名].receipt;
S-033:レシート明細テーブル(receipt)に対し、店舗コード(store_cd)ごとに
売上金額(amount)の平均を計算し、330以上のものを抽出せよ。
PostgreSQL
select store_cd,
avg(amount) as avg_amount
from receipt
group by store_cd
having avg(amount) >= 330;
BigQuery
select store_cd,
avg(amount) as avg_amount
from [データセット名].receipt
group by store_cd
having avg(amount) >= 330;
S-034:レシート明細テーブル(receipt)に対し、顧客ID(customer_id)ごとに
売上金額(amount)を合計して全顧客の平均を求めよ。
ただし、顧客IDが"Z"から始まるのものは非会員を表すため、除外して計算すること。
PostgreSQL
with customer_amount as (
select customer_id,
sum(amount) as sum_amount
from receipt
where customer_id not like 'Z%'
group by customer_id
)
select avg(sum_amount)
from customer_amount;
BigQuery
with customer_amount as (
select customer_id,
sum(amount) as sum_amount
from [データセット名].receipt
where customer_id not like 'Z%'
group by customer_id
)
select avg(sum_amount) as avg_
from customer_amount;
--with句を使ってサブクエリ(副問い合わせ)を作ると読みやすくて良い。
S-035:レシート明細テーブル(receipt)に対し、顧客ID(customer_id)ごとに
販売金額(amount)を合計して全顧客の平均を求め、
平均以上に買い物をしている顧客を抽出せよ。
ただし、顧客IDが"Z"から始まるのものは非会員を表すため、除外して計算すること。
なお、データは10件だけ表示させれば良い。
PostgreSQL
with customer_amount as (
select customer_id,
sum(amount) as sum_amount
from receipt
where customer_id not like 'Z%'
group by customer_id
)
select customer_id,
sum_amount
from customer_amount
where sum_amount >= (select avg(sum_amount) from customer_amount)
limit 10;
BigQuery
with customer_amount as (
select customer_id,
sum(amount) as sum_amount
from [データセット名].receipt
where customer_id not like 'Z%'
group by customer_id
)
select customer_id,
sum_amount
from customer_amount
where sum_amount >= (select avg(sum_amount) from customer_amount)
limit 10;
S-036:レシート明細テーブル(receipt)と店舗テーブル(store)を内部結合し、
レシート明細テーブルの全項目と店舗テーブルの店舗名(store_name)を10件表示せよ。
PostgreSQL
select r.*,
s.store_name
from receipt as r
inner join store as s
on r.store_cd = s.store_cd
limit 10;
BigQuery
select r.*,
s.store_name
from [データセット名].receipt as r
inner join [データセット名].store as s
on r.store_cd = s.store_cd
limit 10;
S-037:商品テーブル(product)とカテゴリテーブル(category)を内部結合し、
商品テーブルの全項目と
カテゴリテーブルの小区分名(category_small_name)を10件表示させよ。
PostgreSQL
select p.*,
c.category_small_name
from product as p
inner join category as c
on p.category_small_cd = c.category_small_cd
limit 10;
BigQuery
select p.*,
c.category_small_name
from [データセット名].product as p
inner join [データセット名].category as c
on p.category_small_cd = c.category_small_cd
limit 10;
S-038:顧客テーブル(customer)とレシート明細テーブル(receipt)から、
各顧客ごとの売上金額合計を求めよ。
ただし、買い物の実績がない顧客については売上金額を0として表示させること。
また、顧客は性別コード(gender_cd)が女性(1)であるものを対象とし、
非会員(顧客IDが'Z'から始まるもの)は除外すること。
なお、結果は10件だけ表示させれば良い。
PostgreSQL
with customer_sum_amount as (
select customer_id,
sum(amount) as sum_amount
from receipt
group by customer_id
)
select cu.customer_id,
coalesce(csa.sum_amount, '0')
from customer as cu
left join customer_sum_amount as csa
on cu.customer_id = csa.customer_id
where cu.customer_id not like 'Z%' and cu.gender_cd = '1'
limit 10;
BigQuery
with customer_sum_amount as (
select customer_id,
sum(amount) as sum_amount
from [データセット名].receipt
group by customer_id
)
select cu.customer_id,
coalesce(csa.sum_amount, 0) as customer_sum_amount
from [データセット名].customer as cu
left join customer_sum_amount as csa
on cu.customer_id = csa.customer_id
where cu.customer_id not like 'Z%' and cu.gender_cd = '1'
limit 10;
coalesce関数で欠損値を置換できる。
S-039:レシート明細テーブル(receipt)から売上日数の多い顧客の上位20件と、売上金額合計の多い顧客の上位20件を抽出し、完全外部結合せよ。
ただし、非会員(顧客IDが'Z'から始まるもの)は除外すること。
PostgreSQL
with customer_days as (
select customer_id,
count(distinct sales_ymd) as come_days
from receipt
where customer_id not like 'Z%'
group by customer_id
order by come_days desc
limit 20
),
customer_amount as (
select customer_id,
sum(amount) as buy_amount
from receipt
where customer_id not like 'Z%'
group by customer_id
order by buy_amount desc
limit 20
)
select c_d.*,
c_a.buy_amount
from customer_days as c_d
full outer join customer_amount as c_a
on c_d.customer_id = c_a.customer_id;
BigQuery
with customer_days as (
select customer_id,
count(distinct sales_ymd) as come_days
from [データセット名].receipt
where customer_id not like 'Z%'
group by customer_id
order by come_days desc
limit 20
),
customer_amount as (
select customer_id,
sum(amount) as buy_amount
from [データセット名].receipt
where customer_id not like 'Z%'
group by customer_id
order by buy_amount desc
limit 20
)
select c_d.*,
c_a.buy_amount
from customer_days as c_d
full outer join customer_amount as c_a
on c_d.customer_id = c_a.customer_id
order by c_d.come_days desc;
S-040:全ての店舗と全ての商品を組み合わせると何件のデータとなるか調査したい。店舗(store)と商品(product)を直積した件数を計算せよ。
PostgreSQL
select count(*)
from store as s
cross join product as p;
BigQuery
select count(*) as count
from [データセット名].store as s
cross join [データセット名].product as p;
Discussion