🌟

【Python】Split data into Train data and Test data for Cross-validation

Published on 2022/04/05

Environment

  • Python3
  • Anaconda
  • Jupyter Notebook

Packages

$ pip install flicker
$ pip install pillow
$ pip install scikit-learn

Assumption

This article assumes you have already collected the images by following the article below.

↓↓↓

https://zenn.dev/deeprecommend/articles/e857b15463835c

Coding

gen_data.ipynb

from PIL import Image
import os, glob
import numpy as np
from sklearn import model_selection

classes = ["apple", "banana", "grape"]
num_classes = len(classes)
image_size = 50

X = []
Y = []
for index, classlabel in enumerate(classes):
    photos_dir = "./" + classlabel
    files = glob.glob(photos_dir + "/*.jpg")
    # Use at most 200 images per class
    for i, file in enumerate(files):
        if i >= 200:
            break
        image = Image.open(file)
        image = image.convert("RGB")
        image = image.resize((image_size, image_size))
        data = np.asarray(image)
        X.append(data)
        # Store the class index (0, 1, 2) as the label, not the file counter
        Y.append(index)


X = np.array(X)
Y = np.array(Y)

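# Split into training and test sets (by default 75% / 25%, shuffled)
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y)
# The four arrays have different shapes, so pack them into an object array;
# recent NumPy versions refuse to build such a ragged array implicitly
xy = np.array((X_train, X_test, Y_train, Y_test), dtype=object)
np.save("./fruit.npy", xy)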
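
As a side note, `train_test_split` shuffles the samples and, by default, holds out 25% of them for testing. The following is a minimal sketch, not the code used in this article. It assumes small synthetic arrays (`X_demo`, `Y_demo`, made up so the example runs without the image folders) and passes the optional parameters `test_size`, `random_state`, and `stratify` to get a reproducible split that keeps the class ratio the same in both sets.

```python
import numpy as np
from sklearn import model_selection

# Synthetic stand-in for the image data (an assumption for illustration):
# 30 "images" of 50x50 RGB pixels, 10 per class (labels 0, 1, 2).
X_demo = np.zeros((30, 50, 50, 3), dtype=np.uint8)
Y_demo = np.repeat([0, 1, 2], 10)

X_tr, X_te, Y_tr, Y_te = model_selection.train_test_split(
    X_demo, Y_demo,
    test_size=0.2,      # hold out 20% of the samples for testing
    random_state=0,     # fix the shuffle so the split is reproducible
    stratify=Y_demo,    # keep the class ratio the same in both sets
)

print(X_tr.shape, X_te.shape)                 # (24, 50, 50, 3) (6, 50, 50, 3)
print(np.bincount(Y_tr), np.bincount(Y_te))   # [8 8 8] [2 2 2]
```

Without `stratify`, a random split of a small dataset may leave one class under-represented in the test set, which is why it can be worth setting when the classes are few.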

Check data

print(len(X_train), len(X_test))
print(len(Y_train), len(Y_test))
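
Beyond checking the lengths, it may also help to confirm that a file saved this way can be read back. The sketch below is only an illustration under stated assumptions: it uses tiny made-up arrays and a temporary file named `fruit_demo.npy` (both hypothetical) instead of the real `fruit.npy`. Because the four arrays have different shapes, they are stored as an object array, which NumPy writes via pickle, so `np.load` needs `allow_pickle=True`.

```python
import os
import tempfile
import numpy as np

# Hypothetical small arrays standing in for the real split
X_train = np.zeros((6, 50, 50, 3), dtype=np.uint8)
X_test = np.zeros((2, 50, 50, 3), dtype=np.uint8)
Y_train = np.array([0, 1, 2, 0, 1, 2])
Y_test = np.array([0, 1])

# Pack the four differently shaped arrays into a 1-D object array
xy = np.empty(4, dtype=object)
for k, arr in enumerate((X_train, X_test, Y_train, Y_test)):
    xy[k] = arr

# Save to a temporary location so the real fruit.npy is not touched
path = os.path.join(tempfile.mkdtemp(), "fruit_demo.npy")
np.save(path, xy)

# Object arrays are stored via pickle, so loading needs allow_pickle=True
X_train2, X_test2, Y_train2, Y_test2 = np.load(path, allow_pickle=True)
print(X_train2.shape, X_test2.shape, Y_train2.shape, Y_test2.shape)
```

Only load pickled `.npy` files that you created yourself or otherwise trust, since unpickling can execute arbitrary code.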
