Methods

We compared ParaText against seven other CSV readers commonly used for data science: Pandas read_csv, R's built-in read.csv, readr read_csv for R, data.table fread for R, NumPy's loadtxt, DataBricks SparkCSV, and Dato SFrame read_csv.

| Software | Version | Installation Method |
|---|---|---|
| ParaText | 0.1.1 | python setup.py build install |
| Pandas read_csv | 0.18.0 | bundled with Anaconda |
| R read.csv | 3.0.2 | bundled with R |
| readr read_csv | 0.2.2 | installed via CRAN |
| data.table fread | 1.9.6 | installed via CRAN |
| NumPy loadtxt | 1.10.4 | bundled with Anaconda |
| DataBricks SparkCSV | com.databricks:spark-csv_2.11:1.4.0 | bundled with Spark |
| Dato SFrame.read_csv | 1.9 | pip install sframe |
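
To illustrate what comparing readers can look like in practice, here is a minimal timing sketch. It is not the harness used for this comparison. The helper names (`time_reader`, `stdlib_csv_reader`, `write_sample_csv`) are hypothetical, the stand-in reader is Python's built-in `csv` module rather than any reader from the table, and the code assumes Python 3 for `time.perf_counter`.

```python
import csv
import os
import tempfile
import time


def time_reader(reader, path, repeats=3):
    """Return the best wall-clock time in seconds of reader(path) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        reader(path)  # the loaded result is discarded; only load time is measured
        best = min(best, time.perf_counter() - start)
    return best


def stdlib_csv_reader(path):
    """Stand-in reader (an assumption): load every row with the built-in csv module."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


def write_sample_csv(path, rows=1000, cols=5):
    """Write a small numeric CSV with a header row, for demonstration only."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["c%d" % i for i in range(cols)])
        for r in range(rows):
            writer.writerow([r * cols + c for c in range(cols)])


# Demonstration on a throwaway file.
fd, sample_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
write_sample_csv(sample_path)
print("csv module: %.4f s" % time_reader(stdlib_csv_reader, sample_path))
os.remove(sample_path)
```

Any callable that takes a file path can be passed as `reader`, so a reader from the table (for example `pandas.read_csv` or `numpy.loadtxt`, where installed) can be timed the same way. Taking the best of several runs reduces noise from other processes, but repeated reads are likely to hit the operating system's file cache, so this sketch measures warm-cache performance only.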

We also compared ParaText against 5 binary readers of numeric and data frame data.

| Format | Software | Version | Installation Method |
|---|---|---|---|
| HDF5 | h5py | 2.5.0 | Anaconda |
| Feather | feather | 0.2.0 | pip install feather-format |
| NPY | NumPy | 1.10.4 | Anaconda |
| Pickle | Python | 2.7.11 | Anaconda |

Additionally, the following software was required:

| Software | Version | Installation Method |
|---|---|---|
| Ubuntu | 14.04 | Launch ami-fce3c696 via AWS |
| Spark | PySpark 1.6.1, Hadoop 2.6 | Download binary |
| Anaconda | 4.0.0 | Download binary |
| NumPy | 1.10.4 | Anaconda |
| mdadm | 3.2.5 | apt-get install mdadm |
| g++ | 4.8.4 | apt-get install g++ |
| python | 2.7.11 | Anaconda |
| R | 3.0.2 | bundled with standard AWS Ubuntu AMI |
| SWIG | 2.0.11 | apt-get install swig |
| libcurl | 2.0.11 | apt-get install libcurl4-openssl-dev |
| Java (OpenJDK) | 1.7.0_101 | apt-get install openjdk-7-jre |
