OneRClassifier: One Rule (OneR) method for classification
An implementation of the One Rule (OneR) method for classification.
from mlxtend.classifier import OneRClassifier
Overview
"OneR" stands for One Rule (by Robert Holte [1]), which is a classic algorithm for supervised learning. Note that this algorithm is not known for its good prediction performance; thus, it is rather recommended for teaching purposes and for lower-bound performance baselines in real-world applications.
The name "OneRule" can be a bit misleading, because it is technically about "one feature" and not about "one rule." I.e., OneR returns a feature for which one or more decision rules are defined. Essentially, as a simple classifier, it finds exactly one feature (and one or more feature values for that feature) to classify data instances.
The basic procedure is as follows (a minimal code sketch follows the list):
- For each feature among all features (columns) in the dataset:
    - For each feature value of the given feature:
        - Obtain the training examples with that feature value.
        - Obtain the class labels (and class label counts) corresponding to the training examples identified in the previous step.
        - Regard the class label with the highest frequency (count) as the majority class.
        - Record the number of errors as the number of training examples that have the given feature value but are not in the majority class.
    - Compute the error of the feature by summing the errors over all possible feature values of that feature.
- Return the best feature, which is defined as the feature with the lowest error.
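For illustration, here is a minimal NumPy sketch of this procedure (the helper name one_r_fit and its return format are made up for this illustration; the actual OneRClassifier implementation differs in its details):
import numpy as np

def one_r_fit(X, y):
    best = None  # (feature index, total error, rules)
    for col in range(X.shape[1]):
        rules, error = {}, 0
        for value in np.unique(X[:, col]):
            labels = y[X[:, col] == value]  # labels of examples with this value
            classes, counts = np.unique(labels, return_counts=True)
            rules[value] = classes[np.argmax(counts)]  # majority class
            error += labels.size - counts.max()  # count non-majority examples
        if best is None or error < best[1]:
            best = (col, error, rules)
    return best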
Please note that the OneR algorithm assumes categorical (or discretized) feature values. A nice explanation of the OneR classifier can be found in the Interpretable Machine Learning online chapter "4.5.1 Learn Rules from a Single Feature (OneR)" (https://christophm.github.io/interpretable-ml-book/rules.html, [2]).
References
[1] Holte, Robert C. "Very simple classification rules perform well on most commonly used datasets." Machine learning 11.1 (1993): 63-90.
[2] Interpretable Machine Learning (2018) by Christoph Molnar: https://christophm.github.io/interpretable-ml-book/rules.html
Example 1 -- Demonstrating OneR on a discretized Iris dataset
As mentioned in the overview text above, the OneR algorithm expects categorical or discretized features. The OneRClassifier implementation in MLxtend does not modify the features in the dataset; ensuring that the features are categorical is the user's responsibility.
In the following example, we will discretize the Iris dataset. In particular, we will convert the dataset into quartiles. In other words, each feature value gets replaced by a categorical value. For sepal length (the first column in Iris), this would be
- (0, 5.1] => 0
- (5.1, 5.8] => 1
- (5.8, 6.4] => 2
- (6.4, 7.9] => 3
Below are the first 15 rows (flowers) of the original Iris data:
from mlxtend.data import iris_data
X, y = iris_data()
X[:15]
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
[4.4, 2.9, 1.4, 0.2],
[4.9, 3.1, 1.5, 0.1],
[5.4, 3.7, 1.5, 0.2],
[4.8, 3.4, 1.6, 0.2],
[4.8, 3. , 1.4, 0.1],
[4.3, 3. , 1.1, 0.1],
[5.8, 4. , 1.2, 0.2]])
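As a quick sanity check, the quartile boundaries listed above can be recovered via np.quantile:
import numpy as np
# quartile boundaries of the first Iris column (sepal length)
print(np.quantile(X[:, 0], [0.25, 0.5, 0.75, 1.0]))
# expected output: [5.1 5.8 6.4 7.9]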
Below is the discretized dataset. Each feature is divided into quartiles.
import numpy as np

def get_feature_quartiles(X):
    X_discretized = X.copy()
    for col in range(X.shape[1]):
        # Assign bin labels from the highest quartile down, so that each
        # subsequent overwrite leaves the correct label for smaller values
        for q, class_label in zip([1.0, 0.75, 0.5, 0.25], [3, 2, 1, 0]):
            threshold = np.quantile(X[:, col], q=q)
            X_discretized[X[:, col] <= threshold, col] = class_label
    return X_discretized.astype(int)  # note: np.int was removed in NumPy 1.24
Xd = get_feature_quartiles(X)
Xd[:15]
array([[0, 3, 0, 0],
[0, 1, 0, 0],
[0, 2, 0, 0],
[0, 2, 0, 0],
[0, 3, 0, 0],
[1, 3, 1, 1],
[0, 3, 0, 0],
[0, 3, 0, 0],
[0, 1, 0, 0],
[0, 2, 0, 0],
[1, 3, 0, 0],
[0, 3, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0],
[1, 3, 0, 0]])
Given a dataset with categorical features, we can use the OneR classifier similar to a scikit-learn estimator for classification. First, let's divide the dataset into training and test data:
from sklearn.model_selection import train_test_split
Xd_train, Xd_test, y_train, y_test = train_test_split(Xd, y, random_state=0, stratify=y)
Next, we can train a OneRClassifier model on the training set using the fit method:
from mlxtend.classifier import OneRClassifier
oner = OneRClassifier()
oner.fit(Xd_train, y_train);
The column index of the selected feature is accessible via the feature_idx_ attribute after model fitting:
oner.feature_idx_
2
There is also a prediction_dict_ attribute available after model fitting. It lists the total error for the selected feature (i.e., the feature listed under feature_idx_), and it also provides the classification rules:
oner.prediction_dict_
{'total error': 16, 'rules (value: class)': {0: 0, 1: 1, 2: 1, 3: 2}}
I.e., 'rules (value: class)': {0: 0, 1: 1, 2: 1, 3: 2} means that there are 4 rules for the selected feature (petal length):
- if value is 0, classify as 0 (Iris-setosa)
- if value is 1, classify as 1 (Iris-versicolor)
- if value is 2, classify as 1 (Iris-versicolor)
- if value is 3, classify as 2 (Iris-virginica)
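To see how these rules are applied, we can look up each value of the selected feature column in the rules dictionary manually (a small illustration; this is effectively the lookup that OneRClassifier.predict performs for us):
rules = oner.prediction_dict_['rules (value: class)']
manual_pred = np.array([rules[value] for value in Xd_train[:, oner.feature_idx_]])
# should match the classifier's own predictions; expected: True
np.array_equal(manual_pred, oner.predict(Xd_train))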
After model fitting, we can use the oner object for making predictions:
oner.predict(Xd_train)
array([1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 2, 2, 1,
0, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 0, 0, 1, 2, 1, 1, 2, 2, 1, 0, 1,
1, 1, 2, 0, 1, 2, 1, 2, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 2, 0, 1, 1,
0, 1, 2, 1, 2, 0, 1, 2, 1, 1, 2, 0, 1, 0, 0, 1, 1, 2, 0, 0, 0, 1,
0, 1, 2, 2, 2, 0, 1, 0, 2, 0, 1, 1, 1, 1, 0, 2, 2, 0, 1, 1, 0, 2,
1, 2])
y_pred = oner.predict(Xd_train)
train_acc = np.mean(y_pred == y_train)
print(f'Training accuracy {train_acc*100:.2f}%')
Training accuracy 85.71%
y_pred = oner.predict(Xd_test)
test_acc = np.mean(y_pred == y_test)
print(f'Test accuracy {test_acc*100:.2f}%')
Test accuracy 84.21%
Instead of computing the prediction accuracy manually as shown above, we can also use the score method:
test_acc = oner.score(Xd_test, y_test)
print(f'Test accuracy {test_acc*100:.2f}%')
Test accuracy 84.21%
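If two or more features yield the same lowest error, the resolve_ties parameter (see the API section below) controls which feature is selected. The following sketch requests tie-breaking via a chi-squared test; per the API documentation, the p_value_ attribute is available after fitting with resolve_ties='chi-squared':
oner_chi2 = OneRClassifier(resolve_ties='chi-squared')
oner_chi2.fit(Xd_train, y_train)
print(oner_chi2.feature_idx_)
print(getattr(oner_chi2, 'p_value_', None))  # p value of the selected feature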
API
OneRClassifier(resolve_ties='first')
OneR (One Rule) Classifier.
Parameters
- resolve_ties : str (default: 'first')
    Option for how to resolve ties if two or more features have the same error. Options are:
    - 'first' (default): chooses the first feature in the list, i.e., the feature with the lowest column index.
    - 'chi-squared': performs a chi-squared test for each feature against the target and selects the feature with the lowest p-value.
Attributes
- self.classes_labels_ : array-like, shape = [n_labels]
    Array containing the unique class labels found in the training set.
- self.feature_idx_ : int
    The index of the rules' feature based on the column in the training set.
- self.p_value_ : float
    The p value for a given feature. Only available after calling fit when the OneR attribute resolve_ties = 'chi-squared' is set.
- self.prediction_dict_ : dict
    Dictionary containing information about the feature's (self.feature_idx_) rules and total error. E.g., {'total error': 37, 'rules (value: class)': {0: 0, 1: 2}} means the total error is 37, and the rules are "if feature value == 0, classify as 0" and "if feature value == 1, classify as 2". (And classify as class 1 otherwise.)

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/classifier/OneRClassifier/
Methods
fit(X, y)
Learn rule from training data.
Parameters

- X : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is the number of features.
- y : array-like, shape = [n_samples]
    Target values.

Returns

- self : object
get_params(deep=True)
Get parameters for this estimator.
Parameters

- deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

- params : mapping of string to any
    Parameter names mapped to their values.
predict(X)
Predict class labels for X.
Parameters

- X : {array-like, sparse matrix}, shape = [n_samples, n_features]
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

- maj : array-like, shape = [n_samples]
    Predicted class labels.
score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters

- X : array-like of shape (n_samples, n_features)
    Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.

Returns

- score : float
    Mean accuracy of self.predict(X) wrt. y.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters

- **params : dict
    Estimator parameters.

Returns

- self : object
    Estimator instance.