F. Binary Files
We used ParaText to load each CSV file into a Pandas data frame. Each data frame was then converted to NPY, HDF5, Pickle, and Feather formats using the functions below. The conversion uses the minimum bit depths inferred by ParaText, so the binary files are more compact than those produced from a data frame loaded with Pandas' own CSV reader.
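To illustrate why a minimum bit depth yields smaller binary files, the sketch below narrows a 64-bit integer column to the smallest integer type that holds its values. It is not ParaText's inference procedure, which is not shown in this appendix; the helper `smallest_int_dtype` and the sample data are hypothetical and use only NumPy.

```python
import numpy as np

# Candidate dtypes ordered from narrowest to widest.
_INT_DTYPES = (np.uint8, np.int8, np.uint16, np.int16,
               np.uint32, np.int32, np.uint64, np.int64)

def smallest_int_dtype(values):
    """Return the narrowest integer dtype that holds every value (hypothetical helper)."""
    lo, hi = int(values.min()), int(values.max())
    for candidate in _INT_DTYPES:
        info = np.iinfo(candidate)
        if info.min <= lo and hi <= info.max:
            return np.dtype(candidate)
    raise ValueError("values do not fit in a 64-bit integer")

# A column of small counts stored at 64 bits per value.
wide = np.arange(200, dtype=np.int64)
# The same values stored at the narrowest width that fits them.
narrow = wide.astype(smallest_int_dtype(wide))

print(wide.nbytes, narrow.nbytes)  # 1600 200
```

In this example the values 0 to 199 fit in 8 bits, so the narrowed array occupies one eighth of the memory, and a raw binary dump of it shrinks by the same factor.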
Feather
import feather

def convert_feather(df, output_filename):
    feather.write_dataframe(df, output_filename)  # write the data frame to a Feather file
HDF5
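import h5py

def convert_hdf5(df, output_filename):
    # Store the data frame's values as a single HDF5 dataset.
    X = df.values
    # The context manager closes the file so the data is flushed to disk.
    with h5py.File(output_filename, "w") as f:
        ds = f.create_dataset("mydataset", X.shape, dtype=X.dtype)
        ds[...] = X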
NPY
import numpy as np

def convert_npy(df, output_filename):
    # Save the data frame's values as one NumPy array.
    X = df.values
    np.save(output_filename, X)
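As a rough check that a narrow type survives the NPY conversion, the following sketch writes a small data frame and reads it back with `np.load`. The data frame, its column names, and the temporary file name are illustrative assumptions, not data from the experiments.

```python
import os
import tempfile

import numpy as np
import pandas as pd

def convert_npy(df, output_filename):
    # Same conversion as above: save the values as one NumPy array.
    np.save(output_filename, df.values)

# Illustrative data frame whose columns share one narrow dtype.
df = pd.DataFrame({"a": np.arange(5, dtype=np.uint8),
                   "b": np.arange(5, 10, dtype=np.uint8)})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")
    convert_npy(df, path)
    restored = np.load(path)

print(restored.dtype, restored.shape)
```

The NPY file holds only the array, so column names are not recovered on load. Note also that `np.save` appends `.npy` to a file name that does not already end with it.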
Pickle
import pickle

def convert_pkl(df, output_filename):
    # Serialize the whole data frame; the file is closed even on error.
    with open(output_filename, "wb") as fid:
        pickle.dump(df, fid)
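Because Pickle serializes the data frame itself rather than its values, each column's type and name should come back unchanged. The sketch below checks this on a small, made-up data frame with two different narrow types; the column names and file name are assumptions for illustration only.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Illustrative data frame with one narrow integer and one narrow float column.
df = pd.DataFrame({"count": np.array([1, 2, 3], dtype=np.uint8),
                   "score": np.array([0.5, 1.5, 2.5], dtype=np.float32)})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pkl")
    # Write the data frame, then read it back from the same file.
    with open(path, "wb") as fid:
        pickle.dump(df, fid)
    with open(path, "rb") as fid:
        restored = pickle.load(fid)

print(restored.dtypes)
```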