iris_data: The 3-class iris dataset for classification

A function that loads the iris dataset into NumPy arrays.

from import iris_data


The Iris dataset for classification.

Features (all measured in cm)

  1. Sepal length
  2. Sepal width
  3. Petal length
  4. Petal width

Number of samples: 150

Target variable (discrete): {50x Setosa, 50x Versicolor, 50x Virginica}
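As a quick sanity check, the description above can be turned into a few checks on the loaded arrays. The sketch below uses a hypothetical helper, `matches_iris_description`, and synthetic stand-in arrays so that it runs without the loader. With real data you would pass the `X, y` returned by `iris_data()`.

```python
import numpy as np


def matches_iris_description(X, y):
    """Hypothetical helper: check that (X, y) match the dataset description,
    i.e. 150 samples, 4 features, and 50 samples in each of 3 classes."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.shape != (150, 4):  # 150 samples x 4 features
        return False
    if y.shape != (150,):  # exactly one label per sample
        return False
    counts = np.bincount(y)  # counts[k] = number of samples with label k
    return counts.tolist() == [50, 50, 50]


# Synthetic stand-ins shaped like the real data (not the actual measurements).
X_demo = np.zeros((150, 4))
y_demo = np.repeat([0, 1, 2], 50)
print(matches_iris_description(X_demo, y_demo))
```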


Example 1 - Dataset overview

from import iris_data
X, y = iris_data()

print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('\nHeader: %s' % ['sepal length', 'sepal width',
                        'petal length', 'petal width'])
print('1st row', X[0])

Dimensions: 150 x 4

Header: ['sepal length', 'sepal width', 'petal length', 'petal width']
1st row [5.1 3.5 1.4 0.2]

import numpy as np
print('Classes: Setosa, Versicolor, Virginica')
print(np.unique(y))
print('Class distribution: %s' % np.bincount(y))

Classes: Setosa, Versicolor, Virginica
[0 1 2]
Class distribution: [50 50 50]
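The class distribution above comes from `np.bincount`, which counts the occurrences of each non-negative integer label. To show those counts next to the class names, a small sketch like the following could be used. It assumes the label order 0 = Setosa, 1 = Versicolor, 2 = Virginica (matching the class list printed above), and `class_summary` is a hypothetical helper run here on a synthetic label vector in place of `y`.

```python
import numpy as np

# Assumption: integer labels follow the order of the class list above.
CLASS_NAMES = np.array(['Setosa', 'Versicolor', 'Virginica'])


def class_summary(y):
    """Return {class name: sample count} for an integer label vector y."""
    labels, counts = np.unique(y, return_counts=True)  # sorted distinct labels and their counts
    names = CLASS_NAMES[labels]  # fancy indexing maps 0/1/2 to class names
    return dict(zip(names.tolist(), counts.tolist()))


y_demo = np.repeat([0, 1, 2], 50)  # synthetic stand-in for the loaded y
print(class_summary(y_demo))
# {'Setosa': 50, 'Versicolor': 50, 'Virginica': 50}
```

Unlike `np.bincount`, `np.unique(..., return_counts=True)` reports only the labels that actually occur, so a class missing from a subset of the data simply does not appear in the result.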



Iris flower dataset.




The Iris dataset (originally collected by Edgar Anderson) as available in the UCI Machine Learning Repository differs from the Iris dataset described in R. A. Fisher's original paper [1]. Specifically, two data points (rows 34 and 37, counting from zero) in the UCI version differ from the originally published dataset. The originally published version, which can be loaded via version='corrected', is the same as the one shipped with R.
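To make the difference concrete, the sketch below patches the two affected rows of a UCI-style feature array with the values from Fisher's paper, as listed in the UCI dataset description (the 35th and 38th samples, i.e. zero-based rows 34 and 37). The helper `apply_fisher_corrections` and the `FISHER_CORRECTIONS` table are hypothetical illustrations; this is not necessarily how the loader implements version='corrected'.

```python
import numpy as np

# Values as published by Fisher (1936), per the UCI dataset description.
# Keys are zero-based row indices into the 150 x 4 feature array.
FISHER_CORRECTIONS = {
    34: [4.9, 3.1, 1.5, 0.2],  # UCI row differs in the 4th feature (petal width)
    37: [4.9, 3.6, 1.4, 0.1],  # UCI row differs in the 2nd and 3rd features
}


def apply_fisher_corrections(X):
    """Hypothetical helper: return a copy of X with the two rows replaced."""
    X_corrected = np.array(X, dtype=float, copy=True)  # leave the input untouched
    for row, values in FISHER_CORRECTIONS.items():
        X_corrected[row] = values
    return X_corrected


X_uci = np.zeros((150, 4))  # placeholder for the UCI feature array
X_fixed = apply_fisher_corrections(X_uci)
print(X_fixed[34], X_fixed[37])
```

Returning a copy rather than modifying the array in place keeps both versions available, which is convenient when comparing results obtained on the UCI and the corrected data.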

[1] R. A. Fisher (1936). "The use of multiple measurements in taxonomic problems". Annals of Eugenics. 7 (2): 179–188


For usage examples, please see