wine_data: A 3-class wine dataset for classification

A function that loads the Wine dataset into NumPy arrays.

from mlxtend.data import wine_data

Overview

The Wine dataset for classification.

Samples: 178
Features: 13
Classes: 3
Data Set Characteristics: Multivariate
Attribute Characteristics: Integer, Real
Associated Tasks: Classification
Missing Values: None

Column  Attribute
1) Class Label
2) Alcohol
3) Malic acid
4) Ash
5) Alcalinity of ash
6) Magnesium
7) Total phenols
8) Flavanoids
9) Nonflavanoid phenols
10) Proanthocyanins
11) Color intensity
12) Hue
13) OD280/OD315 of diluted wines
14) Proline

Class  Samples
0      59
1      71
2      48
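
The class counts above show a mildly imbalanced dataset. As a rough sketch in plain Python, with the counts copied from the table rather than loaded through mlxtend, the class proportions and the accuracy of an always-predict-the-majority baseline could be computed as follows. The variable names are illustrative only.

```python
# Class counts copied from the table above (class label -> number of samples).
class_counts = {0: 59, 1: 71, 2: 48}

n_samples = sum(class_counts.values())

# Fraction of the dataset that belongs to each class.
proportions = {label: count / n_samples
               for label, count in class_counts.items()}

# A classifier that always predicts the most frequent class is correct
# for exactly that class's share of the samples.
majority_class = max(class_counts, key=class_counts.get)
baseline_accuracy = proportions[majority_class]

for label, share in proportions.items():
    print('Class %d: %.1f%%' % (label, 100 * share))
print('Majority class: %d (baseline accuracy: %.3f)'
      % (majority_class, baseline_accuracy))
```

The majority class is class 1, so a useful classifier on this dataset should score clearly above roughly 40% accuracy.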

References

  • Forina, M. et al. PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.
  • Source: https://archive.ics.uci.edu/ml/datasets/Wine
  • Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

Example 1 - Dataset overview

from mlxtend.data import wine_data
X, y = wine_data()

print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('\nHeader: %s' % ['alcohol', 'malic acid', 'ash', 'ash alcalinity',
                        'magnesium', 'total phenols', 'flavanoids',
                        'nonflavanoid phenols', 'proanthocyanins',
                        'color intensity', 'hue', 'OD280/OD315 of diluted wines',
                        'proline'])
print('1st row', X[0])

Dimensions: 178 x 13

Header: ['alcohol', 'malic acid', 'ash', 'ash alcalinity', 'magnesium', 'total phenols', 'flavanoids', 'nonflavanoid phenols', 'proanthocyanins', 'color intensity', 'hue', 'OD280/OD315 of diluted wines', 'proline']
1st row [  1.42300000e+01   1.71000000e+00   2.43000000e+00   1.56000000e+01
   1.27000000e+02   2.80000000e+00   3.06000000e+00   2.80000000e-01
   2.29000000e+00   5.64000000e+00   1.04000000e+00   3.92000000e+00
   1.06500000e+03]

import numpy as np
print('Classes: %s' % np.unique(y))
print('Class distribution: %s' % np.bincount(y))

Classes: [0 1 2]
Class distribution: [59 71 48]
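
Because the feature matrix carries no column labels, it can help to pair the feature names with their column positions. The sketch below is self-contained: it reuses the header list and the first-row values printed above instead of calling wine_data(). The names feature_names, first_row, column_index, and labeled_row are hypothetical helpers, not part of mlxtend.

```python
# Feature names in column order, copied from the header printed above.
feature_names = ['alcohol', 'malic acid', 'ash', 'ash alcalinity',
                 'magnesium', 'total phenols', 'flavanoids',
                 'nonflavanoid phenols', 'proanthocyanins',
                 'color intensity', 'hue', 'OD280/OD315 of diluted wines',
                 'proline']

# First row of X, copied from the output above (assumption: in real use
# this would be X[0] from wine_data()).
first_row = [14.23, 1.71, 2.43, 15.6, 127.0, 2.8, 3.06, 0.28,
             2.29, 5.64, 1.04, 3.92, 1065.0]

# Map each feature name to its 0-based column position in X.
column_index = {name: i for i, name in enumerate(feature_names)}

# Attach a name to each value of the first sample.
labeled_row = dict(zip(feature_names, first_row))

print('Column index of "hue":', column_index['hue'])
print('Alcohol of 1st sample:', labeled_row['alcohol'])
```

With the loaded array, the same lookup would select a whole column, for example X[:, column_index['hue']].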

API

wine_data()

Wine dataset.

  • Source : https://archive.ics.uci.edu/ml/datasets/Wine

  • Number of samples : 178

  • Class labels : {0, 1, 2}, distribution: [59, 71, 48]

    Dataset Attributes:

    • 1) Alcohol
    • 2) Malic acid
    • 3) Ash
    • 4) Alcalinity of ash
    • 5) Magnesium
    • 6) Total phenols
    • 7) Flavanoids
    • 8) Nonflavanoid phenols
    • 9) Proanthocyanins
    • 10) Color intensity
    • 11) Hue
    • 12) OD280/OD315 of diluted wines
    • 13) Proline

Returns

  • X, y : [n_samples, n_features], [n_class_labels]

    X is the feature matrix with 178 wine samples as rows and 13 feature columns. y is a 1-dimensional array of 178 class labels, each of which is 0, 1, or 2.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/wine_data