iris_data: The 3-class iris dataset for classification

A function that loads the iris dataset into NumPy arrays.

from mlxtend.data import iris_data

Overview

The Iris dataset for classification.

Features

Sepal length
Sepal width
Petal length
Petal width
Number of samples: 150
Target variable (discrete): {50x Setosa, 50x Versicolor, 50x Virginica}

References

Source: https://archive.ics.uci.edu/ml/datasets/Iris
Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

Example 1 - Dataset overview

from mlxtend.data import iris_data
X, y = iris_data()

print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('\nHeader: %s' % ['sepal length', 'sepal width',
                        'petal length', 'petal width'])
print('1st row', X[0])

Dimensions: 150 x 4

Header: ['sepal length', 'sepal width', 'petal length', 'petal width']
1st row [5.1 3.5 1.4 0.2]

import numpy as np
print('Classes: Setosa, Versicolor, Virginica')
print(np.unique(y))
print('Class distribution: %s' % np.bincount(y))

Classes: Setosa, Versicolor, Virginica
[0 1 2]
Class distribution: [50 50 50]

API

iris_data(version='uci')

Iris flower dataset.

Source : https://archive.ics.uci.edu/ml/datasets/Iris
Number of samples : 150
Class labels : {0, 1, 2}, distribution: [50, 50, 50]

0 = setosa, 1 = versicolor, 2 = virginica.

Dataset Attributes:
- 1) sepal length [cm]
- 2) sepal width [cm]
- 3) petal length [cm]
- 4) petal width [cm]

Parameters

version : string, optional (default: 'uci').

Version to use {'uci', 'corrected'}. 'uci' loads the dataset as deposited on the UCI machine learning repository, and 'corrected' provides the version that is consistent with Fisher's original paper. See Note for details.

Returns

X, y : [n_samples, n_features], [n_samples]

X is the feature matrix with 150 flower samples as rows and 4 feature columns: sepal length, sepal width, petal length, and petal width. y is a 1-dimensional array of the class labels {0, 1, 2}.
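As a minimal sketch of working with the returned arrays, the snippet below maps the integer labels to the species names documented above and computes the mean feature vector per class. The helper name `class_means` and the small stand-in arrays are hypothetical and chosen only for illustration; in practice X and y would come from iris_data().

```python
import numpy as np

# Label-to-name mapping as documented above:
# 0 = setosa, 1 = versicolor, 2 = virginica
CLASS_NAMES = ('setosa', 'versicolor', 'virginica')


def class_means(X, y):
    """Return a dict mapping each class name to the mean feature vector of its rows.

    Hypothetical helper for illustration; not part of mlxtend.
    """
    return {CLASS_NAMES[label]: X[y == label].mean(axis=0)
            for label in np.unique(y)}


# Small stand-in arrays with the same layout as the output of iris_data():
# rows are samples, columns are the 4 features, y holds integer class labels.
X_demo = np.array([[5.0, 3.0, 1.0, 0.0],
                   [5.0, 4.0, 2.0, 0.0],
                   [6.0, 3.0, 4.0, 1.0],
                   [7.0, 3.0, 6.0, 2.0]])
y_demo = np.array([0, 0, 1, 2])

means = class_means(X_demo, y_demo)
for name, mean in means.items():
    print(name, mean)
```

Replacing X_demo and y_demo with the arrays returned by iris_data() should give one mean vector per species.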

Note

The Iris dataset available in UCI's Machine Learning Repository (originally collected by Edgar Anderson) differs from the Iris dataset described in the original paper by R. A. Fisher [1]. Specifically, two data points (row numbers 34 and 37) in the UCI version differ from the originally published Iris dataset. The original version of the Iris dataset, which can be loaded via version='corrected', is the same as the one in R.

[1] R. A. Fisher (1936). "The use of multiple measurements in taxonomic problems". Annals of Eugenics. 7 (2): 179–188.
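To see which rows the note refers to, one option is to load both versions and compare them row by row. The sketch below assumes mlxtend is installed; `differing_rows` and `report_version_differences` are hypothetical helper names, not part of mlxtend.

```python
import numpy as np


def differing_rows(X_a, X_b):
    """Return the zero-based indices of rows where two equally shaped matrices differ."""
    return np.flatnonzero(np.any(X_a != X_b, axis=1))


def report_version_differences():
    """Print every row that differs between the 'uci' and 'corrected' versions.

    Requires mlxtend; the import is kept local so that differing_rows
    can be used without it.
    """
    from mlxtend.data import iris_data

    X_uci, _ = iris_data(version='uci')
    X_corrected, _ = iris_data(version='corrected')
    for i in differing_rows(X_uci, X_corrected):
        print('row %d  uci: %s  corrected: %s' % (i, X_uci[i], X_corrected[i]))
```

Calling report_version_differences() in an environment with mlxtend installed should list the rows mentioned in the note, with the values from each version side by side.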

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/iris_data/
