
Pandas-style sum in Rust (targeting JavaScript -> WASM)

Published 2022/08/02

Rust

WebAssembly

idea

I have only just started learning Rust, and this post collects things I personally can't do yet.

Assumptions

Can be embedded in a web app that runs entirely on the client
The Rust code can be compiled to WASM
JSON data is passed from JavaScript to WASM

Goal 1: use sum the way Pandas does

Python

import json
import numpy as np
import pandas as pd
json_str = """[ {"a": 1, "b": "hello1"},
                {"a": 2, "b": "hello2"},
                {"a": 3, "b": "hello3"} ]"""
df = pd.DataFrame(json.loads(json_str))
print(df["a"].sum())
# 6

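Pandas is impressive: no column declarations or anything, you just run the aggregation.
In this post I implement this sum in Rust.
Rust also has a Pandas-like crate, polars, but it does not support wasm, so I don't use it here.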

Rust take01: a simple implementation

#[derive(Debug)]
struct Dtypes {
    a: u32,
    b: String,
}

#[derive(Debug)]
struct DataFrameJson {
    rec: Vec<Dtypes>,
}

fn sum_u32(df: DataFrameJson) -> u32 {
    let mut result = 0;
    for i in df.rec {
        result += i.a;
    }
    result
}

fn main(){
    let df = DataFrameJson{rec: vec![
        Dtypes{a:1,b:"hello1".to_string()},
        Dtypes{a:2,b:"hello2".to_string()},
        Dtypes{a:3,b:"hello3".to_string()},
    ]};
    println!("{:?}",sum_u32(df));
}
// 6

Drawback: compared with Pandas this takes much more code, and it can't cope with changes to the columns
- -> For now, try implementing it as a method
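On the "more code" point, here is a minimal sketch (not part of the original take01) that replaces the manual loop with the standard `Iterator::map` and `Iterator::sum`, and borrows the frame instead of consuming it. The struct layout is copied from take01; only `sum_u32` changes.

```rust
// Same layout as take01.
#[derive(Debug)]
struct Dtypes {
    a: u32,
    #[allow(dead_code)] // b is not used by the sum
    b: String,
}

#[derive(Debug)]
struct DataFrameJson {
    rec: Vec<Dtypes>,
}

// Borrow the frame (&DataFrameJson) and let Iterator::sum do the loop.
fn sum_u32(df: &DataFrameJson) -> u32 {
    df.rec.iter().map(|row| row.a).sum()
}

fn main() {
    let df = DataFrameJson {
        rec: vec![
            Dtypes { a: 1, b: "hello1".to_string() },
            Dtypes { a: 2, b: "hello2".to_string() },
            Dtypes { a: 3, b: "hello3".to_string() },
        ],
    };
    println!("{}", sum_u32(&df)); // 6
    // df is still usable here because sum_u32 only borrowed it.
    println!("{}", df.rec.len()); // 3
}
```

Borrowing also means the same `df` can be summed several times, which the take01 signature (taking `df` by value) does not allow.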

Rust take02: a method

#[derive(Debug)]
struct Dtypes {
    a: u32,
    b: String,
}

#[derive(Debug)]
struct DataFrameJson {
    rec: Vec<Dtypes>,
}

impl DataFrameJson {
    fn sum_u32(&self) -> u32 {
        let mut result = 0;
        for i in &self.rec {
            result += i.a;
        }
        result
    }
}

fn main(){
    let df = DataFrameJson{rec: vec![
        Dtypes{a:1,b:"hello1".to_string()},
        Dtypes{a:2,b:"hello2".to_string()},
        Dtypes{a:3,b:"hello3".to_string()},
    ]};
    println!("{:?}",df.sum_u32());
}
// 6

As the number of tables grows, so does the amount of code:

#[derive(Debug)]
struct Dtypes {
    a: u32,
    b: String,
}
#[derive(Debug)]
struct Dtypes02 {
    d: u32,
    e: String,
}

#[derive(Debug)]
struct DataFrameJson {
    rec: Vec<Dtypes>,
}
#[derive(Debug)]
struct DataFrameJson02 {
    rec: Vec<Dtypes02>,
}

impl DataFrameJson {
    fn sum_u32(&self) -> u32 {
        let mut result = 0;
        for i in &self.rec {
            result += i.a;
        }
        result
    }
}
impl DataFrameJson02 {
    fn sum_u32(&self) -> u32 {
        let mut result = 0;
        for i in &self.rec {
            result += i.d;
        }
        result
    }
}


fn main(){
    let df = DataFrameJson{rec: vec![
        Dtypes{a:1,b:"hello1".to_string()},
        Dtypes{a:2,b:"hello2".to_string()},
        Dtypes{a:3,b:"hello3".to_string()},
    ]};
    println!("{:?}",df.sum_u32());
    
    let df = DataFrameJson02{rec: vec![
        Dtypes02{d:10,e:"hello1".to_string()},
        Dtypes02{d:20,e:"hello2".to_string()},
        Dtypes02{d:30,e:"hello3".to_string()},
    ]};
    println!("{:?}",df.sum_u32());
}
//6
//60

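Good: the function can be called with "." (df.sum_u32()).
Drawback: the implementation cost is high as the number of tables grows
- -> Generics
  - ↑ NG: can't be used, because a type that appears neither in the arguments nor in self can't be put into the signature (and the number of columns varies, which looks even harder)
- -> Traits
  - ↑ Can't be used, for the same reason as above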
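One workaround worth trying (not from the original post) is to let the caller pass the column as a closure, so the generic types do appear in the signature. A minimal sketch, assuming a hypothetical generic `DataFrameJson<T>` in place of `DataFrameJson`/`DataFrameJson02` and a hypothetical `sum_by` method. It still needs one row struct per table, but only one sum implementation.

```rust
use std::iter::Sum;

#[derive(Debug)]
struct Dtypes {
    a: u32,
    #[allow(dead_code)]
    b: String,
}

#[derive(Debug)]
struct Dtypes02 {
    d: u32,
    #[allow(dead_code)]
    e: String,
}

// Hypothetical: one generic container replaces DataFrameJson and DataFrameJson02.
#[derive(Debug)]
struct DataFrameJson<T> {
    rec: Vec<T>,
}

impl<T> DataFrameJson<T> {
    // Hypothetical helper: `col` picks the value to sum out of each row.
    // T comes from self, N from the closure's return type.
    fn sum_by<N, F>(&self, col: F) -> N
    where
        N: Sum<N>,
        F: Fn(&T) -> N,
    {
        self.rec.iter().map(col).sum()
    }
}

fn main() {
    let df = DataFrameJson {
        rec: vec![
            Dtypes { a: 1, b: "hello1".to_string() },
            Dtypes { a: 2, b: "hello2".to_string() },
            Dtypes { a: 3, b: "hello3".to_string() },
        ],
    };
    let df02 = DataFrameJson {
        rec: vec![
            Dtypes02 { d: 10, e: "hello1".to_string() },
            Dtypes02 { d: 20, e: "hello2".to_string() },
            Dtypes02 { d: 30, e: "hello3".to_string() },
        ],
    };
    println!("{}", df.sum_by(|r| r.a)); // 6
    println!("{}", df02.sum_by(|r| r.d)); // 60
}
```

The trade-off is that the call site names the column (`|r| r.a`) instead of the method name doing so, which is roughly what `df["a"]` does in Pandas.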

Is the structure itself the problem?
-> Change the fields of DataFrameJson to Vecs (one Vec per column)

Rust take03: fields as Vecs

#[derive(Debug)]
struct Dtypes {
    a: u32,
    b: String,
}

#[derive(Debug)]
struct DataFrameJson {
    rec: Vec<Dtypes>,
}

#[derive(Debug)]
struct DataFrame {
    a: Vec<u32>,
    b: Vec<String>,
}

fn json2dataframe(df_json: DataFrameJson) -> DataFrame {
    let length = df_json.rec.len();
    let mut df = DataFrame {
        a: Vec::with_capacity(length),
        b: Vec::with_capacity(length),
    };
    for i in df_json.rec {
        df.a.push(i.a);
        df.b.push(i.b);
    }
    df
}

fn main(){
    let df = DataFrameJson{rec: vec![
        Dtypes{a:1,b:"hello1".to_string()},
        Dtypes{a:2,b:"hello2".to_string()},
        Dtypes{a:3,b:"hello3".to_string()},
    ]};
    println!("{:?}",&df);
    
    let df = json2dataframe(df);
    println!("{:?}",&df);
    
    let result:u32 = df.a.iter().sum();
    println!("{:?}",result);
    println!("{:?}",df.a.iter().min());
    println!("{:?}",df.a.iter().max());
}
// DataFrameJson { rec: [Dtypes { a: 1, b: "hello1" }, Dtypes { a: 2, b: "hello2" }, Dtypes { a: 3, b: "hello3" }] }
// DataFrame { a: [1, 2, 3], b: ["hello1", "hello2", "hello3"] }
// 6
// Some(1)
// Some(3)

There is still an iter() in there, but it now feels like pandas! As a bonus, min and max are already built in (as Iterator methods), which is handy.
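To get rid of the iter() call at the call site, one option is to wrap each column in a small newtype. A minimal sketch assuming a hypothetical `Series` type with `sum`/`min`/`max` methods; the names only imitate pandas and do not come from any crate.

```rust
// Hypothetical newtype that imitates a pandas Series for u32 columns.
#[derive(Debug)]
struct Series(Vec<u32>);

impl Series {
    fn sum(&self) -> u32 {
        self.0.iter().sum()
    }
    // min/max return None for an empty column, like Iterator::min/max.
    fn min(&self) -> Option<u32> {
        self.0.iter().copied().min()
    }
    fn max(&self) -> Option<u32> {
        self.0.iter().copied().max()
    }
}

#[derive(Debug)]
struct DataFrame {
    a: Series,
    #[allow(dead_code)]
    b: Vec<String>,
}

fn main() {
    let df = DataFrame {
        a: Series(vec![1, 2, 3]),
        b: vec!["hello1".to_string(), "hello2".to_string(), "hello3".to_string()],
    };
    println!("{}", df.a.sum()); // 6
    println!("{:?}", df.a.min()); // Some(1)
    println!("{:?}", df.a.max()); // Some(3)
}
```

The iteration is still there, just moved inside `Series`, so each aggregate is written once per column type rather than once per call.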

Drawback: a json2dataframe function has to be written for every struct
- -> No ideas come to mind at the moment...
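One idea not tried in the post is a declarative macro (`macro_rules!`) that generates the row struct, the column struct and the conversion from a single field list. A minimal sketch; the macro name `define_frame!` is hypothetical, and the conversion is written as a `From` impl instead of a free `json2dataframe` function.

```rust
// Hypothetical macro: from one field list, generate a row struct,
// a column-oriented struct, and a From conversion between them.
macro_rules! define_frame {
    ($row:ident, $frame:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        #[allow(dead_code)]
        #[derive(Debug)]
        struct $row {
            $($field: $ty,)*
        }

        #[allow(dead_code)]
        #[derive(Debug)]
        struct $frame {
            $($field: Vec<$ty>,)*
        }

        impl From<Vec<$row>> for $frame {
            fn from(rows: Vec<$row>) -> Self {
                let len = rows.len();
                let mut frame = $frame {
                    $($field: Vec::with_capacity(len),)*
                };
                // Move each field of each row into its column.
                for row in rows {
                    $(frame.$field.push(row.$field);)*
                }
                frame
            }
        }
    };
}

define_frame!(Dtypes, DataFrame { a: u32, b: String });
define_frame!(Dtypes02, DataFrame02 { d: u32, e: String });

fn main() {
    let df: DataFrame = vec![
        Dtypes { a: 1, b: "hello1".to_string() },
        Dtypes { a: 2, b: "hello2".to_string() },
        Dtypes { a: 3, b: "hello3".to_string() },
    ]
    .into();
    let total: u32 = df.a.iter().sum();
    println!("{}", total); // 6

    let df02 = DataFrame02::from(vec![
        Dtypes02 { d: 10, e: "hello1".to_string() },
        Dtypes02 { d: 20, e: "hello2".to_string() },
    ]);
    let total02: u32 = df02.d.iter().sum();
    println!("{}", total02); // 30
}
```

Adding a table then costs one `define_frame!` line, at the price of macro code that is harder to read and debug.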

This is out of scope, but up to take02 row-level operations looked easy, whereas with this version they look a bit harder.
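For row-level access on the column-oriented DataFrame from take03, one option is a small accessor that reads the same index from every column. A sketch; the `row`/`rows` methods and the tuple return type are assumptions, not part of the original.

```rust
#[derive(Debug)]
struct DataFrame {
    a: Vec<u32>,
    b: Vec<String>,
}

impl DataFrame {
    // Hypothetical accessor: row i as a tuple, or None if i is out of range.
    fn row(&self, i: usize) -> Option<(u32, &str)> {
        let a = *self.a.get(i)?;
        let b = self.b.get(i)?;
        Some((a, b.as_str()))
    }

    // Hypothetical row iterator: zips the columns together.
    // zip stops at the shorter column if the lengths ever differ.
    fn rows(&self) -> impl Iterator<Item = (u32, &str)> + '_ {
        self.a.iter().copied().zip(self.b.iter().map(|s| s.as_str()))
    }
}

fn main() {
    let df = DataFrame {
        a: vec![1, 2, 3],
        b: vec!["hello1".to_string(), "hello2".to_string(), "hello3".to_string()],
    };
    println!("{:?}", df.row(1)); // Some((2, "hello2"))
    // Row-wise filter, similar to df[df["a"] >= 2] in pandas.
    for (a, b) in df.rows().filter(|(a, _)| *a >= 2) {
        println!("{} {}", a, b);
    }
}
```

Each new column still has to be added to the tuple type by hand, so this only partly recovers the convenience of the row-oriented take02.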

Rust take04: xxx
