mlxtend version: 0.23.0
autompg_data
autompg_data()
Auto MPG dataset.
- Source: https://archive.ics.uci.edu/ml/datasets/Auto+MPG
- Number of samples: 392
- Continuous target variable: mpg
Dataset Attributes:
- 1) cylinders: multi-valued discrete
- 2) displacement: continuous
- 3) horsepower: continuous
- 4) weight: continuous
- 5) acceleration: continuous
- 6) model year: multi-valued discrete
- 7) origin: multi-valued discrete
- 8) car name: string (unique for each instance)
Returns
- X, y : [n_samples, n_features], [n_targets]
  X is the feature matrix with 392 auto samples as rows and 8 feature columns (6 rows with NaNs removed). y is a 1-dimensional array of the target MPG values.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/autompg_data/
boston_housing_data
boston_housing_data()
Boston Housing dataset.
- Source: https://archive.ics.uci.edu/ml/datasets/Housing
- Number of samples: 506
- Continuous target variable: MEDV
  MEDV = Median value of owner-occupied homes in $1000's
Dataset Attributes:
- 1) CRIM per capita crime rate by town
- 2) ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- 3) INDUS proportion of non-retail business acres per town
- 4) CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- 5) NOX nitric oxides concentration (parts per 10 million)
- 6) RM average number of rooms per dwelling
- 7) AGE proportion of owner-occupied units built prior to 1940
- 8) DIS weighted distances to five Boston employment centres
- 9) RAD index of accessibility to radial highways
- 10) TAX full-value property-tax rate per $10,000
- 11) PTRATIO pupil-teacher ratio by town
- 12) B 1000(Bk - 0.63)^2 where Bk is the prop. of b. by town
- 13) LSTAT % lower status of the population
Returns
- X, y : [n_samples, n_features], [n_class_labels]
  X is the feature matrix with 506 housing samples as rows and 13 feature columns. y is a 1-dimensional array of the continuous target variable MEDV.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/boston_housing_data/
iris_data
iris_data(version='uci')
Iris flower dataset.
- Source: https://archive.ics.uci.edu/ml/datasets/Iris
- Number of samples: 150
- Class labels: {0, 1, 2}, distribution: [50, 50, 50]
  0 = setosa, 1 = versicolor, 2 = virginica.
Dataset Attributes:
- 1) sepal length [cm]
- 2) sepal width [cm]
- 3) petal length [cm]
- 4) petal width [cm]
Parameters
- version : string, optional (default: 'uci')
  Version to use {'uci', 'corrected'}. 'uci' loads the dataset as deposited on the UCI machine learning repository, and 'corrected' provides the version that is consistent with Fisher's original paper. See Note for details.
Returns
- X, y : [n_samples, n_features], [n_class_labels]
  X is the feature matrix with 150 flower samples as rows and 4 feature columns: sepal length, sepal width, petal length, and petal width. y is a 1-dimensional array of the class labels {0, 1, 2}.
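Because y holds integer class labels, it can be handy to translate them back into species names. The following is a minimal sketch using the label mapping stated above; the helper name `decode_labels` and the short stand-in label list are illustrative assumptions, not part of mlxtend.

```python
# Mapping taken from the class-label description of iris_data.
SPECIES = {0: "setosa", 1: "versicolor", 2: "virginica"}


def decode_labels(y):
    """Translate integer class labels into species names (hypothetical helper)."""
    return [SPECIES[int(label)] for label in y]


# Stand-in labels for illustration only; in practice y would be the
# second value returned by iris_data().
y_example = [0, 0, 1, 2]
names = decode_labels(y_example)
print(names)
```

The `int(label)` conversion is there so the helper also accepts NumPy integer labels.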
Note
The Iris dataset (originally collected by Edgar Anderson) that is available in UCI's machine learning repository differs from the Iris dataset described in the original paper by R. A. Fisher [1].
Specifically, two data points (row numbers 34 and 37) in UCI's Machine Learning repository differ from the originally published Iris dataset.
The original version of the Iris dataset, which can be loaded via version='corrected', is the same as the one in R.
[1] R. A. Fisher (1936). "The use of multiple measurements in taxonomic problems". Annals of Eugenics. 7 (2): 179–188.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/iris_data/
loadlocal_mnist
loadlocal_mnist(images_path, labels_path)
Read MNIST from ubyte files.
Parameters
- images_path : str
  Path to the test or train MNIST ubyte file.
- labels_path : str
  Path to the test or train MNIST class labels file.
Returns
- images : [n_samples, n_pixels] numpy.array
  Pixel values of the images.
- labels : [n_samples] numpy.array
  Target class labels.
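To illustrate what reading MNIST "ubyte" files involves, here is a standard-library sketch of the IDX layout: a big-endian header (magic number 2051 for images, 2049 for labels) followed by raw unsigned bytes. This is not mlxtend's implementation. The helper names `read_idx_images` and `read_idx_labels` and the tiny synthetic files are assumptions made for the example, and the sketch returns plain lists rather than NumPy arrays.

```python
import os
import struct
import tempfile


def read_idx_images(path):
    """Parse an IDX image file into a list of flat pixel rows (hypothetical helper)."""
    with open(path, "rb") as f:
        # Header: magic number, image count, rows, columns (big-endian uint32).
        magic, n_images, n_rows, n_cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError(f"unexpected image magic number: {magic}")
        n_pixels = n_rows * n_cols
        data = f.read(n_images * n_pixels)
    return [list(data[i * n_pixels:(i + 1) * n_pixels]) for i in range(n_images)]


def read_idx_labels(path):
    """Parse an IDX label file into a list of integer labels (hypothetical helper)."""
    with open(path, "rb") as f:
        # Header: magic number, label count (big-endian uint32).
        magic, n_labels = struct.unpack(">II", f.read(8))
        if magic != 2049:
            raise ValueError(f"unexpected label magic number: {magic}")
        return list(f.read(n_labels))


# Write two synthetic 2x2 "images" and their labels, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    images_path = os.path.join(tmp, "toy-images-idx3-ubyte")
    labels_path = os.path.join(tmp, "toy-labels-idx1-ubyte")
    with open(images_path, "wb") as f:
        f.write(struct.pack(">IIII", 2051, 2, 2, 2))
        f.write(bytes([0, 255, 128, 64, 1, 2, 3, 4]))
    with open(labels_path, "wb") as f:
        f.write(struct.pack(">II", 2049, 2))
        f.write(bytes([7, 3]))
    images = read_idx_images(images_path)
    labels = read_idx_labels(labels_path)

print(images)
print(labels)
```

With the real MNIST files, each image row would hold 784 values (28x28 pixels), which matches the [n_samples, n_pixels] shape described above.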
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/loadlocal_mnist/
make_multiplexer_dataset
make_multiplexer_dataset(address_bits=2, sample_size=100, positive_class_ratio=0.5, shuffle=False, random_seed=None)
Function to create a binary n-bit multiplexer dataset.
New in mlxtend v0.9
Parameters
- address_bits : int (default: 2)
  A positive integer that determines the number of address bits in the multiplexer, which in turn determines the n-bit capacity of the multiplexer and therefore the number of features. For example, 2 address bits result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). If address_bits=3, the result is an 11-bit multiplexer with 11 features (3 + 2^3 = 11).
- sample_size : int (default: 100)
  The total number of samples generated.
- positive_class_ratio : float (default: 0.5)
  The fraction (a float between 0 and 1) of samples in the dataset of size sample_size that have class label 1. If positive_class_ratio=0.5 (default), then class 0 and class 1 samples are perfectly balanced.
- shuffle : Bool (default: False)
  Whether or not to shuffle the features and labels. If False (default), the samples are returned in sorted order, starting with sample_size/2 samples with class label 0, followed by sample_size/2 samples with class label 1.
- random_seed : int (default: None)
  Random seed used for generating the multiplexer samples and shuffling.
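The interplay of sample_size, positive_class_ratio, and shuffle=False can be sketched as follows. This only illustrates the label ordering described above and is not mlxtend's code. The helper name `sorted_labels` is hypothetical, and truncating the positive-class count to an integer is an assumption about how fractional counts are handled.

```python
def sorted_labels(sample_size=100, positive_class_ratio=0.5):
    """Return class labels in sorted order: all 0s first, then all 1s."""
    # Assumption: the number of positive samples is truncated to an integer.
    n_positive = int(sample_size * positive_class_ratio)
    n_negative = sample_size - n_positive
    return [0] * n_negative + [1] * n_positive


y = sorted_labels()           # defaults: 100 samples, balanced classes
y_skewed = sorted_labels(8, 0.25)
print(y[:3], y[-3:])
print(y_skewed)
```

With the defaults, the first 50 labels are 0 and the last 50 are 1, matching the sorted order described for shuffle=False.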
Returns
- X, y : [n_samples, n_features], [n_class_labels]
  X is the feature matrix with the number of samples equal to sample_size. The number of features is determined by the number of address bits. For instance, 2 address bits result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). All features are binary (values in {0, 1}). y is a 1-dimensional array of class labels in {0, 1}.
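The feature-count arithmetic and the multiplexer rule itself can be made concrete with a short sketch. In a multiplexer, the address bits select one of the data bits, and that bit is the output (here, the class label). The function names are hypothetical, and the bit layout (address bits first, most significant bit first, followed by the data bits) is an assumption that may not match mlxtend's internal ordering.

```python
def n_features(address_bits):
    """Total bits of the multiplexer: address bits plus 2^address_bits data bits."""
    return address_bits + 2 ** address_bits


def multiplexer_output(bits, address_bits):
    """Return the data bit selected by the address bits.

    Assumption: the first `address_bits` entries form the address
    (most significant bit first); the remaining entries are data bits.
    """
    address = bits[:address_bits]
    data = bits[address_bits:]
    index = int("".join(str(b) for b in address), 2)
    return data[index]


print(n_features(2), n_features(3))
# Address bits 1, 0 encode index 2, so the third data bit is selected.
print(multiplexer_output([1, 0, 0, 0, 1, 0], address_bits=2))
```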
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/make_multiplexer_dataset
mnist_data
mnist_data()
5000 samples from the MNIST handwritten digits dataset.
Data Source: https://yann.lecun.com/exdb/mnist/
Returns
- X, y : [n_samples, n_features], [n_class_labels]
  X is the feature matrix with 5000 image samples as rows; each row consists of 28x28 pixels that were unrolled into a 784-pixel feature vector. y contains the 10 unique class labels 0-9.
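To view a sample as an image again, the 784-value row has to be folded back into 28 rows of 28 pixels. The sketch below does this with plain lists. The helper name `to_image` is hypothetical, row-major unrolling is an assumption, and the placeholder row contains index values rather than real pixel intensities.

```python
def to_image(row, height=28, width=28):
    """Fold a flat pixel vector back into a height x width grid (row-major)."""
    if len(row) != height * width:
        raise ValueError("row length does not match height * width")
    return [row[r * width:(r + 1) * width] for r in range(height)]


# Placeholder row: each value equals its own position in the flat vector.
row = list(range(784))
image = to_image(row)
print(len(image), len(image[0]))
```

With NumPy arrays, as returned by mnist_data(), the equivalent operation would be `X[i].reshape(28, 28)`.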
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/mnist_data/
three_blobs_data
three_blobs_data()
A random dataset of 3 2D blobs for clustering.
- Number of samples: 150
- Suggested labels: {0, 1, 2}, distribution: [50, 50, 50]
Returns
- X, y : [n_samples, n_features], [n_cluster_labels]
  X is the feature matrix with 150 samples as rows and 2 feature columns. y is a 1-dimensional array of the 3 suggested cluster labels 0, 1, 2.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/three_blobs_data
wine_data
wine_data()
Wine dataset.
- Source: https://archive.ics.uci.edu/ml/datasets/Wine
- Number of samples: 178
- Class labels: {0, 1, 2}, distribution: [59, 71, 48]
Dataset Attributes:
- 1) Alcohol
- 2) Malic acid
- 3) Ash
- 4) Alcalinity of ash
- 5) Magnesium
- 6) Total phenols
- 7) Flavanoids
- 8) Nonflavanoid phenols
- 9) Proanthocyanins
- 10) Color intensity
- 11) Hue
- 12) OD280/OD315 of diluted wines
- 13) Proline
Returns
- X, y : [n_samples, n_features], [n_class_labels]
  X is the feature matrix with 178 wine samples as rows and 13 feature columns. y is a 1-dimensional array of the 3 class labels 0, 1, 2.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/wine_data