Memo on converting NaN to None in a DataFrame loaded with Pandas read_csv()

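df.where(df.notnull(), None) converted only some of the columns.
df.replace([np.nan], [None]) converted every column.
I couldn't quickly pin down the difference between the two functions, so I'll look into it at some point.
Maybe df.where with df.notnull() doesn't pick up columns where all the data is NaN...?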
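One way to test that guess is to check which columns are entirely NaN and what dtype they were read as. The sketch below is only an illustration: the CSV text and the column names (id, quality, device) are hypothetical stand-ins for filename.csv. As far as I can tell, recent pandas versions keep a float64 column as float64 in where(), so None is coerced back to NaN there, which would match the output further down.

```python
import io

import pandas as pd

# Hypothetical stand-in for filename.csv: the "device" column is empty in every row.
csv_text = "id,quality,device\n0,1080p,\n1,,\n2,404p,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Columns in which every value is missing.
all_nan_cols = [col for col in df.columns if df[col].isna().all()]
print(all_nan_cols)

# An all-empty column is parsed as float64; a column mixing text and NaN is not.
print(df.dtypes)
```

If the columns that stay NaN after df.where are exactly the ones listed in all_nan_cols, the dtype is the likely cause rather than df.notnull() itself.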
Original data
- code
import pandas as pd
df = pd.read_csv("filename.csv")
print(df)
- output
0 log 2025-03-01 06:58:30 60.101.xxx.xxx Mozilla/5.0... ... NaN NaN NaN 1080p NaN 2025-03-01
1 log 2025-03-01 19:36:33 103.246.xx.xx Mozilla/5.0... ... NaN NaN NaN NaN mouse 2025-03-01
2 log 2025-03-01 08:05:38 106.73.xx.xx Mozilla/5.0... ... NaN NaN NaN 404p NaN 2025-03-01
3 log 2025-03-01 22:52:31 123.48.xxx.xx Mozilla/5.0... ... NaN NaN NaN 576p NaN 2025-03-01
4 log 2025-03-01 01:10:24 175.177.xx.xx Mozilla/5.0... ... NaN NaN NaN 1080p NaN 2025-03-01
5 log 2025-03-01 20:43:52 120.138.xxx.xxx Mozilla/5.0... ... NaN NaN NaN 576p NaN 2025-03-01
6 log 2025-03-01 07:57:31 61.205.xxx.xxx Mozilla/5.0... ... NaN NaN NaN 1080p NaN 2025-03-01
7 log 2025-03-01 13:58:47 121.87.xx.xxx Mozilla/5.0... ... NaN NaN NaN 1080p NaN 2025-03-01
8 log 2025-03-01 09:01:08 59.139.xxx.xxx Mozilla/5.0... ... NaN NaN NaN 1080p NaN 2025-03-01
9 log 2025-03-01 04:14:18 58.91.xxx.xx Mozilla/5.0... ... NaN NaN NaN 1080p NaN 2025-03-01
df.where
- code
import pandas as pd
df = pd.read_csv("filename.csv")
df = df.where(df.notnull(), None)
print(df)
- output
0 log 2025-03-01 06:58:30 60.101.xxx.xxx Mozilla/5.0... ... NaN NaN NaN 1080p None 2025-03-01
1 log 2025-03-01 19:36:33 103.246.xx.xx Mozilla/5.0... ... NaN NaN NaN None mouse 2025-03-01
2 log 2025-03-01 08:05:38 106.73.xx.xx Mozilla/5.0... ... NaN NaN NaN 404p None 2025-03-01
3 log 2025-03-01 22:52:31 123.48.xxx.xx Mozilla/5.0... ... NaN NaN NaN 576p None 2025-03-01
4 log 2025-03-01 01:10:24 175.177.xx.xx Mozilla/5.0... ... NaN NaN NaN 1080p None 2025-03-01
5 log 2025-03-01 20:43:52 120.138.xxx.xxx Mozilla/5.0... ... NaN NaN NaN 576p None 2025-03-01
6 log 2025-03-01 07:57:31 61.205.xxx.xxx Mozilla/5.0... ... NaN NaN NaN 1080p None 2025-03-01
7 log 2025-03-01 13:58:47 121.87.xx.xxx Mozilla/5.0... ... NaN NaN NaN 1080p None 2025-03-01
8 log 2025-03-01 09:01:08 59.139.xxx.xxx Mozilla/5.0... ... NaN NaN NaN 1080p None 2025-03-01
9 log 2025-03-01 04:14:18 58.91.xxx.xx Mozilla/5.0... ... NaN NaN NaN 1080p None 2025-03-01
df.replace
- code
import pandas as pd
import numpy as np
df = pd.read_csv("filename.csv")
df = df.replace([np.nan], [None])
print(df)
- output
0 log 2025-03-01 06:58:30 60.101.xxx.xxx Mozilla/5.0... ... None None None 1080p None 2025-03-01
1 log 2025-03-01 19:36:33 103.246.xx.xx Mozilla/5.0... ... None None None None mouse 2025-03-01
2 log 2025-03-01 08:05:38 106.73.xx.xx Mozilla/5.0... ... None None None 404p None 2025-03-01
3 log 2025-03-01 22:52:31 123.48.xxx.xx Mozilla/5.0... ... None None None 576p None 2025-03-01
4 log 2025-03-01 01:10:24 175.177.xx.xx Mozilla/5.0... ... None None None 1080p None 2025-03-01
5 log 2025-03-01 20:43:52 120.138.xxx.xxx Mozilla/5.0... ... None None None 576p None 2025-03-01
6 log 2025-03-01 07:57:31 61.205.xxx.xxx Mozilla/5.0... ... None None None 1080p None 2025-03-01
7 log 2025-03-01 13:58:47 121.87.xx.xxx Mozilla/5.0... ... None None None 1080p None 2025-03-01
8 log 2025-03-01 09:01:08 59.139.xxx.xxx Mozilla/5.0... ... None None None 1080p None 2025-03-01
9 log 2025-03-01 04:14:18 58.91.xxx.xx Mozilla/5.0... ... None None None 1080p None 2025-03-01
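If the dtype really is the cause, casting to object before calling where should make it convert every column as well. This is a minimal sketch under that assumption, not something verified against the original CSV; the small DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "quality" mixes strings and NaN, "device" is all NaN (float64).
df = pd.DataFrame({
    "quality": pd.Series(["1080p", np.nan, "404p"], dtype=object),
    "device": pd.Series([np.nan, np.nan, np.nan], dtype="float64"),
})

# Cast every column to object first so None can be stored instead of NaN,
# then replace the missing cells with None.
converted = df.astype(object).where(df.notna(), None)
print(converted)
```

The trade-off is that every column becomes object dtype, so this fits best right before handing the rows to something that expects None (for example a database insert), not in the middle of numeric processing.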