OneRClassifier -- "One Rule" for Classification

An implementation of the One Rule (OneR) method for classification.

from mlxtend.classifier import OneRClassifier

Overview

"OneR" stands for One Rule (by Robert Holte [1]), which is a classic algorithm for supervised learning. Note that this algorithm is not known for its good prediction performance; thus, it is rather recommended for teaching purposes and for lower-bound performance baselines in real-world applications.

The name "OneRule" can be a bit misleading, because it is technically about "one feature" and not about "one rule." I.e., OneR returns a feature for which one or more decision rules are defined. Essentially, as a simple classifier, it finds exactly one feature (and one or more feature values for that feature) to classify data instances.

The basic procedure is as follows (a short code sketch follows the list):

- For each feature, group the training examples by feature value and count the class labels within each group.
- For each feature value, create a rule that predicts the majority class (the most frequent class label) of its group.
- Count the errors of a rule as the number of training examples in its group that do not belong to the majority class, and sum these errors over all values of a feature to obtain the feature's total error.
- Select the feature with the lowest total error and keep only its rules; this single feature constitutes the "one rule" classifier.
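As a concrete illustration of this procedure, below is a minimal sketch of the rule-learning step in plain NumPy. The function name one_r is made up for illustration; it is not part of the mlxtend API, and the actual OneRClassifier implementation may differ in details such as tie handling:

import numpy as np


def one_r(X, y):
    # Track the best (lowest-error) feature found so far
    best_feature, best_rules, best_error = None, None, np.inf
    for col in range(X.shape[1]):
        rules, error = {}, 0
        for value in np.unique(X[:, col]):
            # Class labels of all training examples with this feature value
            labels = y[X[:, col] == value]
            classes, counts = np.unique(labels, return_counts=True)
            # Rule: predict the majority class for this feature value
            rules[value] = classes[np.argmax(counts)]
            # Errors: examples with this value outside the majority class
            error += labels.shape[0] - counts.max()
        if error < best_error:
            best_feature, best_rules, best_error = col, rules, error
    return best_feature, best_rules, best_error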

Please note that the OneR algorithm assumes categorical (or discretized) feature values. A nice explanation of the OneR classifier can be found in the Interpretable Machine Learning online chapter "4.5.1 Learn Rules from a Single Feature (OneR)" (https://christophm.github.io/interpretable-ml-book/rules.html) [2].

References

[1] Holte, Robert C. "Very simple classification rules perform well on most commonly used datasets." Machine learning 11.1 (1993): 63-90.

[2] Interpretable Machine Learning (2018) by Christoph Molnar: https://christophm.github.io/interpretable-ml-book/rules.html

Example 1 -- Demonstrating OneR on a discretized Iris dataset

As mentioned in the overview above, the OneR algorithm expects categorical or discretized features. The OneRClassifier implementation in MLxtend does not modify the features in the dataset; ensuring that the features are categorical is the responsibility of the user.

In the following example, we will discretize the Iris dataset by converting each feature into quartiles. In other words, each feature value is replaced by one of four categorical values, 0 to 3. For sepal length (the first column in Iris), values up to the first quartile are mapped to 0, values up to the median to 1, values up to the third quartile to 2, and all remaining values to 3.
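To make this mapping concrete, we can print the quartile boundaries for the first column. This quick check uses NumPy's quantile function and the iris_data loader introduced just below:

import numpy as np
from mlxtend.data import iris_data


X, y = iris_data()
# 25th, 50th, and 75th percentiles of the first column (sepal length)
print(np.quantile(X[:, 0], q=[0.25, 0.5, 0.75]))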

Below are the first 15 rows (flowers) of the original Iris data:

from mlxtend.data import iris_data


X, y = iris_data()
X[:15]
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2]])

Below is the discretized dataset: each feature is divided into quartiles, encoded as the categorical values 0 to 3.

import numpy as np


def get_feature_quartiles(X):
    X_discretized = X.copy()
    for col in range(X.shape[1]):
        # Assign labels from the largest quantile down so that smaller
        # values are overwritten with smaller labels:
        # <= Q1 -> 0, <= median -> 1, <= Q3 -> 2, remaining -> 3
        for q, class_label in zip([1.0, 0.75, 0.5, 0.25], [3, 2, 1, 0]):
            threshold = np.quantile(X[:, col], q=q)
            X_discretized[X[:, col] <= threshold, col] = class_label
    return X_discretized.astype(int)  # np.int was removed in NumPy 1.24

Xd = get_feature_quartiles(X)
Xd[:15]
array([[0, 3, 0, 0],
       [0, 1, 0, 0],
       [0, 2, 0, 0],
       [0, 2, 0, 0],
       [0, 3, 0, 0],
       [1, 3, 1, 1],
       [0, 3, 0, 0],
       [0, 3, 0, 0],
       [0, 1, 0, 0],
       [0, 2, 0, 0],
       [1, 3, 0, 0],
       [0, 3, 0, 0],
       [0, 1, 0, 0],
       [0, 1, 0, 0],
       [1, 3, 0, 0]])

Given a dataset with categorical features, we can use the OneRClassifier similar to a scikit-learn estimator for classification. First, let's divide the dataset into training and test data:

from sklearn.model_selection import train_test_split


Xd_train, Xd_test, y_train, y_test = train_test_split(Xd, y, random_state=0, stratify=y)

Next, we can train a OneRClassifier model on the training set using the fit method:

from mlxtend.classifier import OneRClassifier
oner = OneRClassifier()

oner.fit(Xd_train, y_train);

The column index of the selected feature is accessible via the feature_idx_ attribute after model fitting:

oner.feature_idx_
2
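For readability, we can translate this column index into the corresponding Iris feature name. The feature_names list below is defined here just for illustration (column order in iris_data: sepal length, sepal width, petal length, petal width):

feature_names = ['sepal length', 'sepal width', 'petal length', 'petal width']
print(feature_names[oner.feature_idx_])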

There is also a prediction_dict_ attribute available after model fitting. It lists the total error for the selected feature (i.e., the feature listed under feature_idx_), and it also provides the classification rules:

oner.prediction_dict_
{'total error': 16, 'rules (value: class)': {0: 0, 1: 1, 2: 1, 3: 2}}

I.e., 'rules (value: class)': {0: 0, 1: 1, 2: 1, 3: 2} means that there are four rules for the selected feature (petal length):

- if value is 0, classify as 0 (Iris-setosa)
- if value is 1, classify as 1 (Iris-versicolor)
- if value is 2, classify as 1 (Iris-versicolor)
- if value is 3, classify as 2 (Iris-virginica)
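These rules can also be applied by hand. The short sketch below uses only the attributes shown above and reproduces what the predict method does in the next step:

rules = oner.prediction_dict_['rules (value: class)']
feature_column = Xd_train[:, oner.feature_idx_]
manual_pred = np.array([rules[value] for value in feature_column])
# manual_pred matches the output of oner.predict(Xd_train) shown below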

After model fitting we can use the oner object for making predictions:

oner.predict(Xd_train)
array([1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 2, 2, 1,
       0, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 0, 0, 1, 2, 1, 1, 2, 2, 1, 0, 1,
       1, 1, 2, 0, 1, 2, 1, 2, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 2, 0, 1, 1,
       0, 1, 2, 1, 2, 0, 1, 2, 1, 1, 2, 0, 1, 0, 0, 1, 1, 2, 0, 0, 0, 1,
       0, 1, 2, 2, 2, 0, 1, 0, 2, 0, 1, 1, 1, 1, 0, 2, 2, 0, 1, 1, 0, 2,
       1, 2])
y_pred = oner.predict(Xd_train)
train_acc = np.mean(y_pred == y_train)  
print(f'Training accuracy {train_acc*100:.2f}%')
Training accuracy 85.71%
y_pred = oner.predict(Xd_test)
test_acc = np.mean(y_pred == y_test)  
print(f'Test accuracy {test_acc*100:.2f}%')
Test accuracy 84.21%

Instead of computing the prediction accuracy manually as shown above, we can also use the score method:

test_acc = oner.score(Xd_test, y_test)
print(f'Test accuracy {test_acc*100:.2f}%')
Test accuracy 84.21%
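Since OneRClassifier implements the scikit-learn estimator API (fit, predict, score, get_params, and set_params, as listed in the API section below), it should also plug into scikit-learn utilities. As a small sketch, here is 5-fold cross-validation on the discretized dataset:

from sklearn.model_selection import cross_val_score


cv_scores = cross_val_score(OneRClassifier(), Xd, y, cv=5)
print(f'Average CV accuracy: {cv_scores.mean()*100:.2f}%')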

API

OneRClassifier(resolve_ties='first')

OneR (One Rule) Classifier.

Parameters

resolve_ties : str (default: 'first')
    Option for how to resolve ties if two or more features have the same total error; 'first' selects the feature with the lower column index.

Attributes

feature_idx_ : int
    The column index of the selected feature in the training set; available after calling fit.

prediction_dict_ : dict
    The total error and the value-to-class rules of the selected feature, e.g., {'total error': 16, 'rules (value: class)': {0: 0, 1: 1, 2: 1, 3: 2}}; available after calling fit.

Methods


fit(X, y)

Learn rule from training data.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]
    Target values.

Returns

self : object


get_params(deep=True)

Get parameters for this estimator.

Parameters

deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : mapping of string to any
    Parameter names mapped to their values.


predict(X)

Predict class labels for X.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Feature vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

class_labels : array-like, shape = [n_samples]
    Predicted class labels.


score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

X : array-like of shape (n_samples, n_features)
    Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.

sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.

Returns

score : float
    Mean accuracy of self.predict(X) with respect to y.


set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

**params : dict
    Estimator parameters.

Returns

self : object
    Estimator instance.
