【Python】Split data into Train data and Test data for Cross-validation
Environment
- Python3
- Anaconda
- Jupyter Notebook
Packages
$ pip install flickrapi
$ pip install pillow
$ pip install scikit-learn
Assumption
You already have the image data: one folder per class (./apple, ./banana, ./grape), each containing .jpg files.
Coding
gen_data.ipynb
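from PIL import Image
import glob
import numpy as np
from sklearn import model_selection

classes = ["apple", "banana", "grape"]
num_classes = len(classes)
image_size = 50  # images are resized to image_size x image_size pixels

X = []  # image arrays
Y = []  # class indices (0: apple, 1: banana, 2: grape)

for index, classlabel in enumerate(classes):
    photos_dir = "./" + classlabel
    files = glob.glob(photos_dir + "/*.jpg")
    for i, file in enumerate(files):
        # Use at most 200 images per class
        if i >= 200:
            break
        image = Image.open(file)
        image = image.convert("RGB")
        image = image.resize((image_size, image_size))
        data = np.asarray(image)
        X.append(data)
        # Label with the class index, not the per-file counter
        Y.append(index)

X = np.array(X)
Y = np.array(Y)

# Hold out part of the data for testing (25% by default)
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y)

# The four arrays have different shapes, so store them as an object array;
# load later with np.load("./fruit.npy", allow_pickle=True)
xy = np.array((X_train, X_test, Y_train, Y_test), dtype=object)
np.save("./fruit.npy", xy)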
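With no extra arguments, train_test_split shuffles the data with a different seed on every run and holds out 25% of the samples. The following is a minimal sketch, assuming you want a reproducible split that keeps the class ratio the same in both parts. The synthetic arrays and the test_size and random_state values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn import model_selection

# Synthetic stand-in for the image data: 30 "images" of 50x50 RGB, 3 classes.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(30, 50, 50, 3), dtype=np.uint8)
Y_demo = np.repeat(np.arange(3), 10)  # 10 samples per class

X_tr, X_te, Y_tr, Y_te = model_selection.train_test_split(
    X_demo,
    Y_demo,
    test_size=0.2,    # assumption: hold out 20% instead of the default 25%
    random_state=42,  # assumption: any fixed seed makes the split repeatable
    stratify=Y_demo,  # keep the same class proportions in train and test
)

print(X_tr.shape, X_te.shape)
print(np.bincount(Y_tr), np.bincount(Y_te))  # samples per class in each part
```

Stratifying matters most when classes are small or unbalanced, since a purely random split can leave a class underrepresented in the test set.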
Check data
print(len(X_train), len(X_test))
print(len(Y_train), len(Y_test))
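The saved split can also be checked by loading the file back, for example from another notebook. This is a minimal sketch, assuming the four arrays were stored as a single object array. The tiny placeholder arrays and the temporary file path are made up for illustration, so the block runs without the real images.

```python
import os
import tempfile
import numpy as np

# Placeholder arrays standing in for X_train, X_test, Y_train, Y_test.
parts = [
    np.zeros((6, 50, 50, 3), dtype=np.uint8),
    np.zeros((2, 50, 50, 3), dtype=np.uint8),
    np.array([0, 1, 2, 0, 1, 2]),
    np.array([0, 1]),
]

# Pack the arrays into a 1-D object array, one element per array.
packed = np.empty(len(parts), dtype=object)
for k, arr in enumerate(parts):
    packed[k] = arr

# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "fruit_demo.npy")
np.save(path, packed)

# Object arrays are pickled, so loading needs allow_pickle=True.
X_tr, X_te, Y_tr, Y_te = np.load(path, allow_pickle=True)
print(X_tr.shape, X_te.shape, Y_tr.shape, Y_te.shape)
```

Only load pickled .npy files that you created yourself, since unpickling untrusted data can execute arbitrary code.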