F. Binary Files

We used ParaText to load each CSV file into a Pandas data frame, then converted each data frame to the NPY, HDF5, Pickle, and Feather formats with the functions below. The conversion keeps the minimum bit depths inferred by ParaText, so the resulting binary files are more compact than they would be if Pandas had been used to load the CSV.
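To make the effect of bit depth concrete, here is a minimal sketch that does not involve ParaText. It assumes a hypothetical column of small non-negative integers and compares the number of bytes `np.save` writes when that column is stored as `int64` (the width Pandas' CSV reader typically assigns to integer columns) and as `uint8`. The helper name `npy_size_bytes` and the sample data are made up for illustration.

```python
import io

import numpy as np


def npy_size_bytes(arr):
    """Return the number of bytes np.save writes for `arr` (hypothetical helper)."""
    buf = io.BytesIO()
    np.save(buf, arr)
    return buf.getbuffer().nbytes


# Hypothetical column: one million integers in the range 0..199.
values = np.arange(1_000_000) % 200

as_int64 = values.astype(np.int64)  # 8 bytes per value
as_uint8 = values.astype(np.uint8)  # 1 byte per value, still holds 0..199

size_int64 = npy_size_bytes(as_int64)
size_uint8 = npy_size_bytes(as_uint8)
print(size_int64, size_uint8)
```

Both arrays hold the same values, but the `uint8` file is roughly one eighth the size of the `int64` file, apart from a small fixed header.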

Feather

import feather

def convert_feather(df, output_filename):
    feather.write_dataframe(df, output_filename)

HDF5

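import h5py

def convert_hdf5(df, output_filename):
    # Store the values as a single array, keeping the array's dtype.
    X = df.values
    # The context manager closes the file so the data is flushed to disk.
    with h5py.File(output_filename, "w") as f:
        ds = f.create_dataset("mydataset", X.shape, dtype=X.dtype)
        ds[...] = X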

NPY

import numpy as np

def convert_npy(df, output_filename):
    X = df.values  # a single array holding every column
    np.save(output_filename, X)
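As a quick check of what an NPY file preserves, the following sketch writes a small array and reads it back. It is only an illustration: the array contents and the file name `matrix` are hypothetical. It relies on two documented NumPy behaviors: `np.save` appends `.npy` when the given file name lacks that extension, and `np.load` restores the stored dtype.

```python
import os
import tempfile

import numpy as np

# Hypothetical data: a small matrix stored at a narrow bit depth.
X = np.array([[1, 2], [3, 4]], dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    # The name has no ".npy" suffix, so np.save appends one.
    target = os.path.join(tmp, "matrix")
    np.save(target, X)
    written = os.listdir(tmp)

    # Read the file back using the name that was actually written.
    Y = np.load(target + ".npy")

print(written, Y.dtype)
```

The narrow dtype chosen before saving is the dtype seen after loading, so no bit depth is lost or widened by the round trip.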

Pickle

import pickle

def convert_pkl(df, output_filename):
    # Pickle the whole data frame; the file is closed even if dump fails.
    with open(output_filename, "wb") as fid:
        pickle.dump(df, fid)
