wine_data: A 3-class wine dataset for classification

A function that loads the Wine dataset into NumPy arrays.

from mlxtend.data import wine_data

Overview

The Wine dataset for classification.


Samples	178
Features	13
Classes	3
Data set characteristics	Multivariate
Attribute characteristics	Integer, Real
Associated tasks	Classification
Missing values	None

column	attribute
1)	Class Label
2)	Alcohol
3)	Malic acid
4)	Ash
5)	Alcalinity of ash
6)	Magnesium
7)	Total phenols
8)	Flavanoids
9)	Nonflavanoid phenols
10)	Proanthocyanins
11)	Color intensity
12)	Hue
13)	OD280/OD315 of diluted wines
14)	Proline
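
The class label occupies column 1 of the table above, while the feature matrix described on this page has only the 13 feature columns, which Python indexes from 0. The sketch below pairs each feature name with its 0-based column index and labels the first sample. `FEATURE_NAMES`, `COLUMN_INDEX`, and `first_row` are illustrative names, not part of mlxtend, and the values are copied from the "1st row" output in Example 1.

```python
# Illustrative helper names; none of these are provided by mlxtend.
FEATURE_NAMES = [
    'alcohol', 'malic acid', 'ash', 'alcalinity of ash', 'magnesium',
    'total phenols', 'flavanoids', 'nonflavanoid phenols',
    'proanthocyanins', 'color intensity', 'hue',
    'OD280/OD315 of diluted wines', 'proline',
]

# Map each feature name to its 0-based column index in X.
COLUMN_INDEX = {name: i for i, name in enumerate(FEATURE_NAMES)}

# First sample, copied from the "1st row" output in Example 1.
first_row = [14.23, 1.71, 2.43, 15.6, 127.0, 2.8, 3.06, 0.28,
             2.29, 5.64, 1.04, 3.92, 1065.0]

# Attach a name to each value of the first sample.
labeled = dict(zip(FEATURE_NAMES, first_row))
print(COLUMN_INDEX['proline'])   # 12
print(labeled['alcohol'])        # 14.23
```

With the real arrays, an expression such as `X[:, COLUMN_INDEX['proline']]` would select the proline column for all samples.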

class	samples
0	59
1	71
2	48

References

Forina, M. et al. PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.
Source: https://archive.ics.uci.edu/ml/datasets/Wine
Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

Example 1 - Dataset overview

from mlxtend.data import wine_data
X, y = wine_data()

print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('\nHeader: %s' % ['alcohol', 'malic acid', 'ash', 'ash alcalinity',
                        'magnesium', 'total phenols', 'flavanoids',
                        'nonflavanoid phenols', 'proanthocyanins',
                        'color intensity', 'hue', 'OD280/OD315 of diluted wines',
                        'proline'])
print('1st row', X[0])

Dimensions: 178 x 13

Header: ['alcohol', 'malic acid', 'ash', 'ash alcalinity', 'magnesium', 'total phenols', 'flavanoids', 'nonflavanoid phenols', 'proanthocyanins', 'color intensity', 'hue', 'OD280/OD315 of diluted wines', 'proline']
1st row [  1.42300000e+01   1.71000000e+00   2.43000000e+00   1.56000000e+01
   1.27000000e+02   2.80000000e+00   3.06000000e+00   2.80000000e-01
   2.29000000e+00   5.64000000e+00   1.04000000e+00   3.92000000e+00
   1.06500000e+03]

import numpy as np
print('Classes: %s' % np.unique(y))
print('Class distribution: %s' % np.bincount(y))

Classes: [0 1 2]
Class distribution: [59 71 48]
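
The class distribution printed above is mildly imbalanced, which is worth keeping in mind when judging a classifier. As a rough sketch, the counts can be turned into class proportions and a majority-class baseline accuracy. The variable names are illustrative, and the counts are hard-coded from the output above so the snippet runs without mlxtend.

```python
# Class counts copied from the output above (labels 0, 1, 2).
counts = [59, 71, 48]
total = sum(counts)  # 178 samples

# Fraction of samples that belongs to each class.
proportions = [count / total for count in counts]

# Accuracy obtained by always predicting the most frequent class.
majority_label = counts.index(max(counts))
baseline_accuracy = max(counts) / total

for label, share in enumerate(proportions):
    print('class %d: %.1f%%' % (label, 100 * share))
print('majority class: %d, baseline accuracy: %.3f'
      % (majority_label, baseline_accuracy))
```

A model evaluated on this dataset should score clearly above the baseline printed here to be considered useful.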

API

wine_data()

Wine dataset.

Source : https://archive.ics.uci.edu/ml/datasets/Wine
Number of samples : 178
Class labels : {0, 1, 2}, distribution: [59, 71, 48]

Dataset Attributes:
- 1) Alcohol
- 2) Malic acid
- 3) Ash
- 4) Alcalinity of ash
- 5) Magnesium
- 6) Total phenols
- 7) Flavanoids
- 8) Nonflavanoid phenols
- 9) Proanthocyanins
- 10) Color intensity
- 11) Hue
- 12) OD280/OD315 of diluted wines
- 13) Proline

Returns

X, y : [n_samples, n_features], [n_samples]

X is the feature matrix with 178 wine samples as rows and 13 feature columns. y is a 1-dimensional array holding one class label (0, 1, or 2) per sample.
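
The return values described above can be expressed as a small sanity check. The sketch below uses a hypothetical helper, `check_wine_arrays`, that is not part of mlxtend. It assumes `y` holds one label per sample, and to stay runnable without the library it is exercised on placeholder arrays that only mimic the documented shapes and class counts, not the real measurements.

```python
import numpy as np

def check_wine_arrays(X, y):
    """Return True if X and y match the documented shapes and labels."""
    return (
        X.shape == (178, 13)                          # samples x features
        and y.shape == (178,)                         # one label per sample
        and np.array_equal(np.unique(y), [0, 1, 2])   # three classes
    )

# Placeholder arrays (NOT the real data): zeros for the features and
# labels repeated according to the documented class distribution.
X_stub = np.zeros((178, 13))
y_stub = np.repeat([0, 1, 2], [59, 71, 48])

print(check_wine_arrays(X_stub, y_stub))
```

With mlxtend installed, the same check could be applied to the real data as `check_wine_arrays(*wine_data())`.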

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/wine_data
