【A little knowledge】How to save a Python object in a smaller size
1. Preface
Have you ever needed to store the feature data you created in a smaller size while developing a machine learning model? I have.
2. Method
You can compress the data with a general-purpose compression format such as gzip, bzip2, or xz.
Compress
import gzip
import bz2
import numpy as np
# Example with gzip
data = np.random.rand(1000, 1000)  # Create a numpy array
with gzip.open('data.gz', 'wb') as f:
    np.save(f, data)  # Save with gzip compression
# Example with bz2
with bz2.open('data.bz2', 'wb') as f:
    np.save(f, data)  # Save with bz2 compression
If you want to save a more complex Python object, you can combine compression with pickle.
import pickle
import gzip
import bz2
import numpy as np
data = np.random.rand(1000, 1000)  # Create a numpy array
dictionary = {'sample': data}  # Wrap it in a dict as an example of a complex object
with gzip.open('data.pickle.gz', 'wb') as f:
    pickle.dump(dictionary, f)  # Compress with gzip
with bz2.open('data.pickle.bz2', 'wb') as f:
    pickle.dump(dictionary, f)  # Compress with bz2
Decompress
You can decompress the files in a similar way.
import gzip
import bz2
import numpy as np
# Example with gzip
with gzip.open('data.gz', 'rb') as f:
    data = np.load(f)  # Load from the gzip-compressed file
print(data)
# Example with bz2
with bz2.open('data.bz2', 'rb') as f:
    data = np.load(f)  # Load from the bz2-compressed file
print(data)
Objects saved with pickle can be loaded in the same way.
import pickle
import gzip
import bz2
with gzip.open('data.pickle.gz', 'rb') as f:
    data = pickle.load(f)  # Load from the gzip-compressed file
print(data)
with bz2.open('data.pickle.bz2', 'rb') as f:
    data = pickle.load(f)  # Load from the bz2-compressed file
print(data)
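xz is listed among the formats in the Method section but has no example, so here is a minimal round-trip sketch using the standard-library lzma module. The file name data.pickle.xz, the temporary directory, and the smaller array size are my own assumptions for illustration.

```python
import lzma
import os
import pickle
import tempfile

import numpy as np

# Smaller array than in the examples above, just to keep the sketch fast (assumption)
data = np.random.rand(100, 100)
dictionary = {'sample': data}

# Write into a temporary directory; the file name is hypothetical
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'data.pickle.xz')

# Compress with xz
with lzma.open(path, 'wb') as f:
    pickle.dump(dictionary, f)

# Decompress and load from the xz-compressed file
with lzma.open(path, 'rb') as f:
    restored = pickle.load(f)

# Check that the round trip preserved the array
print(np.array_equal(restored['sample'], data))
```

The usage mirrors gzip.open and bz2.open, so switching formats should mostly be a matter of changing the module name.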
3. Which should you use?
Let's compare these compression methods.
Here I introduce the comparison table from this article.
・Comparison
・Summary
gzip is a good first choice because it offers a good balance between time (both compression and decompression) and compression ratio.
If you need a higher compression ratio, you can consider another method, but the decompression time can become a hindrance when you run inference with a machine learning model.
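If you want to check the trade-off on your own data, a rough sketch like the following may help. It measures compression time, decompression time, and compressed size in memory with gzip, bz2, and lzma (xz). The benchmark helper, the sample array, and the rounding step are assumptions of mine, not part of the comparison referenced above, and the numbers will vary with your data and machine.

```python
import bz2
import gzip
import lzma
import pickle
import time

import numpy as np

# Hypothetical sample data: rounded values compress better than pure random noise
data = np.round(np.random.rand(200, 200), 2)
raw = pickle.dumps({'sample': data})


def benchmark(name, compress, decompress, payload):
    """Time one compress/decompress round trip and record the compressed size."""
    start = time.perf_counter()
    packed = compress(payload)
    compress_time = time.perf_counter() - start

    start = time.perf_counter()
    unpacked = decompress(packed)
    decompress_time = time.perf_counter() - start

    # The round trip must give back the original bytes
    assert unpacked == payload
    return {
        'name': name,
        'size': len(packed),
        'compress_time': compress_time,
        'decompress_time': decompress_time,
    }


results = [
    benchmark('gzip', gzip.compress, gzip.decompress, raw),
    benchmark('bz2', bz2.compress, bz2.decompress, raw),
    benchmark('xz', lzma.compress, lzma.decompress, raw),
]

print(f"raw size: {len(raw)} bytes")
for r in results:
    print(
        f"{r['name']}: {r['size']} bytes, "
        f"compress {r['compress_time']:.4f}s, "
        f"decompress {r['decompress_time']:.4f}s"
    )
```

A single run like this is noisy, so treat it as a quick sanity check rather than a proper benchmark.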