boston_housing_data: The Boston housing dataset for regression

A function that loads the boston_housing_data dataset into NumPy arrays.

from mlxtend.data import boston_housing_data

Overview

The Boston Housing dataset for regression analysis.

Features

  1. CRIM: per capita crime rate by town
  2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
  3. INDUS: proportion of non-retail business acres per town
  4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  5. NOX: nitric oxides concentration (parts per 10 million)
  6. RM: average number of rooms per dwelling
  7. AGE: proportion of owner-occupied units built prior to 1940
  8. DIS: weighted distances to five Boston employment centres
  9. RAD: index of accessibility to radial highways
  10. TAX: full-value property-tax rate per $10,000
  11. PTRATIO: pupil-teacher ratio by town
  12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of b. by town
  13. LSTAT: % lower status of the population

  14. Number of samples: 506

  15. Target variable (continuous): MEDV, Median value of owner-occupied homes in $1000's

References

Example 1 - Dataset overview

from mlxtend.data import boston_housing_data
X, y = boston_housing_data()

print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('1st row', X[0])
(506, 14)
Dimensions: 506 x 13
1st row [  6.32000000e-03   1.80000000e+01   2.31000000e+00   0.00000000e+00
   5.38000000e-01   6.57500000e+00   6.52000000e+01   4.09000000e+00
   1.00000000e+00   2.96000000e+02   1.53000000e+01   3.96900000e+02
   4.98000000e+00]

API

boston_housing_data()

Boston Housing dataset.

  • Source : https://archive.ics.uci.edu/ml/datasets/Housing

  • Number of samples : 506

  • Continuous target variable : MEDV

    MEDV = Median value of owner-occupied homes in $1000's

    Dataset Attributes:

    • 1) CRIM per capita crime rate by town
    • 2) ZN proportion of residential land zoned for lots over 25,000 sq.ft.
    • 3) INDUS proportion of non-retail business acres per town
    • 4) CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
    • 5) NOX nitric oxides concentration (parts per 10 million)
    • 6) RM average number of rooms per dwelling
    • 7) AGE proportion of owner-occupied units built prior to 1940
    • 8) DIS weighted distances to five Boston employment centres
    • 9) RAD index of accessibility to radial highways
    • 10) TAX full-value property-tax rate per $10,000
    • 11) PTRATIO pupil-teacher ratio by town
    • 12) B 1000(Bk - 0.63)^2 where Bk is the prop. of b. by town
    • 13) LSTAT % lower status of the population

Returns

  • X, y : [n_samples, n_features], [n_class_labels]

    X is the feature matrix with 506 housing samples as rows and 13 feature columns. y is a 1-dimensional array of the continuous target variable MEDV

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/boston_housing_data/