OneRClassifier: One Rule (OneR) method for classification

And implementation of the One Rule (OneR) method for classification.

from mlxtend.classifier import OneRClassifier

Overview

"OneR" stands for One Rule (by Robert Holte [1]), which is a classic algorithm for supervised learning. Note that this algorithm is not known for its good prediction performance; thus, it is rather recommended for teaching purposes and for lower-bound performance baselines in real-world applications.

The name "OneRule" can be a bit misleading, because it is technically about "one feature" and not about "one rule." I.e., OneR returns a feature for which one or more decision rules are defined. Essentially, as a simple classifier, it finds exactly one feature (and one or more feature values for that feature) to classify data instances.

The basic procedure is as follows:

• For each feature among all features (columns) in the dataset:
• For each feature value for the given feature:
1. Obtain the training examples with that feature value.
2. Obtain the class labels (and class label counts) corresponding to the training examples identified in the previous step.
3. Regard the class label with the highest frequency (count) as the majority class.
4. Record the number of errors as the number of training examples that have the given feature value but are not the majority class.
• Compute the error of the feature by summing the errors for all possible feature values for that feature.
• Return the best feature, which is defined as the feature with the lowest error.

Please note that the OneR algorithm assumes categorical (or discretized) feature values. A nice explanation and OneR classifier can be found in the Interpretable Machine Learning online chapter "4.5.1 Learn Rules from a Single Feature (OneR)" (https://christophm.github.io/interpretable-ml-book/rules.html, [2]).

References

[1] Holte, Robert C. "Very simple classification rules perform well on most commonly used datasets." Machine learning 11.1 (1993): 63-90.

[2] Interpretable Machine Learning (2018) by Christoph Molnar: https://christophm.github.io/interpretable-ml-book/rules.html

Example 1 -- Demonstrating OneR on a discretized Iris dataset

As mentioned in the overview text above, the OneR algorithm expects categorical or discretized features. The OneRClassifier implementation in MLxtend does not modify features in the dataset, and ensuring that the features are categorical is a responsibility of the user.

In the following example, we will discretize the Iris dataset. In particular, we will convert the dataset into quartiles. In other words each feature value gets replaced by a categorical value. For sepal width (the first column in Iris), this would be

• (0, 5.1] => 0
• (5.1, 5.8] => 1
• (5.8, 6.4] => 2
• (6,4, 7.9] => 3

Below are the first 15 rows (flowers) of the original Iris data:

from mlxtend.data import iris_data

X, y = iris_data()
X[:15]

array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
[4.4, 2.9, 1.4, 0.2],
[4.9, 3.1, 1.5, 0.1],
[5.4, 3.7, 1.5, 0.2],
[4.8, 3.4, 1.6, 0.2],
[4.8, 3. , 1.4, 0.1],
[4.3, 3. , 1.1, 0.1],
[5.8, 4. , 1.2, 0.2]])


Below is the discretized dataset. Each feature is divided into 4 quartiles.

import numpy as np

def get_feature_quartiles(X):
X_discretized = X.copy()
for col in range(X.shape[1]):
for q, class_label in zip([1.0, 0.75, 0.5, 0.25], [3, 2, 1, 0]):
threshold = np.quantile(X[:, col], q=q)
X_discretized[X[:, col] <= threshold, col] = class_label
return X_discretized.astype(np.int)

Xd = get_feature_quartiles(X)
Xd[:15]

array([[0, 3, 0, 0],
[0, 1, 0, 0],
[0, 2, 0, 0],
[0, 2, 0, 0],
[0, 3, 0, 0],
[1, 3, 1, 1],
[0, 3, 0, 0],
[0, 3, 0, 0],
[0, 1, 0, 0],
[0, 2, 0, 0],
[1, 3, 0, 0],
[0, 3, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0],
[1, 3, 0, 0]])


Given a dataset with categorical features, we can use the OneR classifier like similar to a scikit-learn estimator for classification. First, let's divide the dataset into training and test data:

from sklearn.model_selection import train_test_split

Xd_train, Xd_test, y_train, y_test = train_test_split(Xd, y, random_state=0, stratify=y)


Next, we can train a OneRClassifier model on the training set using the fit method:

from mlxtend.classifier import OneRClassifier
oner = OneRClassifier()

oner.fit(Xd_train, y_train);


The column index of the selected feature is accessible via the feature_idx_ attribute after model fitting:

oner.feature_idx_

2


There is also a prediction_dict_ available after model fitting. It lists the total error for the selected feature (i.e., the feature listed under feature_idx_). Also it provides the classification rules:

oner.prediction_dict_

{'total error': 16, 'rules (value: class)': {0: 0, 1: 1, 2: 1, 3: 2}}


I.e., 'rules (value: class)': {0: 0, 1: 1, 2: 1, 3: 2} means that there are 3 rules for the selected feature (petal length): - if value is 0, classify as 0 (Iris-setosa) - if value is 1, classify as 1 (Iris-versicolor) - if value is 2, classify as 1 (Iris-versicolor) - if value is 3, classify as 2 (Iris-virginica)

After model fitting we can use the oner object for making predictions:

oner.predict(Xd_train)

array([1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 2, 2, 1,
0, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 0, 0, 1, 2, 1, 1, 2, 2, 1, 0, 1,
1, 1, 2, 0, 1, 2, 1, 2, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 2, 0, 1, 1,
0, 1, 2, 1, 2, 0, 1, 2, 1, 1, 2, 0, 1, 0, 0, 1, 1, 2, 0, 0, 0, 1,
0, 1, 2, 2, 2, 0, 1, 0, 2, 0, 1, 1, 1, 1, 0, 2, 2, 0, 1, 1, 0, 2,
1, 2])

y_pred = oner.predict(Xd_train)
train_acc = np.mean(y_pred == y_train)
print(f'Training accuracy {train_acc*100:.2f}%')

Training accuracy 85.71%

y_pred = oner.predict(Xd_test)
test_acc = np.mean(y_pred == y_test)
print(f'Test accuracy {test_acc*100:.2f}%')

Test accuracy 84.21%


Instead of computing the prediction accuracy manually as shown above, we can also use the score method:

test_acc = oner.score(Xd_test, y_test)
print(f'Test accuracy {test_acc*100:.2f}%')

Test accuracy 84.21%


API

OneRClassifier(resolve_ties='first')

OneR (One Rule) Classifier.

Parameters

• resolve_ties : str (default: 'first')

Option for how to resolve ties if two or more features have the same error. Options are - 'first' (default): chooses first feature in the list, i.e., feature with the lower column index. - 'chi-squared': performs a chi-squared test for each feature against the target and selects the feature with the lowest p-value.

Attributes

• self.classes_labels_ : array-like, shape = [n_labels]

Array containing the unique class labels found in the training set.

• self.feature_idx_ : int

The index of the rules' feature based on the column in the training set.

• self.p_value_ : float

The p value for a given feature. Only available after calling fit when the OneR attribute resolve_ties = 'chi-squared' is set.

• self.prediction_dict_ : dict

Dictionary containing information about the feature's (self.feature_idx_) rules and total error. E.g., {'total error': 37, 'rules (value: class)': {0: 0, 1: 2}} means the total error is 37, and the rules are "if feature value == 0 classify as 0" and "if feature value == 1 classify as 2". (And classify as class 1 otherwise.)

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/classifier/OneRClassifier/

Methods

fit(X, y)

Learn rule from training data.

Parameters

• X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

• y : array-like, shape = [n_samples]

Target values.

Returns

• self : object

get_params(deep=True)

Get parameters for this estimator.

Parameters

• deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

• params : mapping of string to any

Parameter names mapped to their values.

predict(X)

Predict class labels for X.

Parameters

• X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

• maj : array-like, shape = [n_samples]

Predicted class labels.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters

• X : array-like of shape (n_samples, n_features)

Test samples.

• y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

• sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns

• score : float

Mean accuracy of self.predict(X) wrt. y.

set_params(params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters

• **params : dict

Estimator parameters.

Returns

• self : object

Estimator instance.

ython