Exhaustive Feature Selector

Implementation of an exhaustive feature selector for evaluating all possible feature combinations within a specified size range.

from mlxtend.feature_selection import ExhaustiveFeatureSelector

Overview

This exhaustive feature selection algorithm is a wrapper approach for brute-force evaluation of feature subsets; the best subset is selected by optimizing a specified performance metric given an arbitrary regressor or classifier. For instance, if the classifier is a logistic regression and the dataset consists of 4 features, the algorithm will evaluate all 15 feature combinations (if min_features=1 and max_features=4):

  • {0}
  • {1}
  • {2}
  • {3}
  • {0, 1}
  • {0, 2}
  • {0, 3}
  • {1, 2}
  • {1, 3}
  • {2, 3}
  • {0, 1, 2}
  • {0, 1, 3}
  • {0, 2, 3}
  • {1, 2, 3}
  • {0, 1, 2, 3}

and select the one that results in the best performance (e.g., classification accuracy) of the logistic regression classifier.
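
As a quick sanity check of this combinatorial count (independent of mlxtend and purely illustrative), the same candidate subsets can be enumerated with the standard library:

from itertools import combinations

n_features = 4
candidate_subsets = [c for k in range(1, n_features + 1)
                     for c in combinations(range(n_features), k)]
print('Number of candidate subsets:', len(candidate_subsets))  # 15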

Example 1 - A simple Iris Example

Initializing a simple classifier from scikit-learn:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

iris = load_iris()
X = iris.data
y = iris.target

knn = KNeighborsClassifier(n_neighbors=3)

efs1 = EFS(knn, 
           min_features=1,
           max_features=4,
           scoring='accuracy',
           print_progress=True,
           cv=5)

efs1 = efs1.fit(X, y)

print('Best accuracy score: %.2f' % efs1.best_score_)
print('Best subset:', efs1.best_idx_)
Features: 15/15

Best accuracy score: 0.97
Best subset: (0, 2, 3)

Via the subsets_ attribute, we can take a look at the selected feature indices at each step:

efs1.subsets_
{0: {'avg_score': 0.65999999999999992,
  'cv_scores': array([ 0.53333333,  0.63333333,  0.73333333,  0.76666667,  0.63333333]),
  'feature_idx': (0,)},
 1: {'avg_score': 0.56666666666666665,
  'cv_scores': array([ 0.53333333,  0.63333333,  0.6       ,  0.5       ,  0.56666667]),
  'feature_idx': (1,)},
 2: {'avg_score': 0.95333333333333337,
  'cv_scores': array([ 0.93333333,  1.        ,  0.9       ,  0.93333333,  1.        ]),
  'feature_idx': (2,)},
 3: {'avg_score': 0.94666666666666666,
  'cv_scores': array([ 0.96666667,  0.96666667,  0.93333333,  0.86666667,  1.        ]),
  'feature_idx': (3,)},
 4: {'avg_score': 0.72666666666666668,
  'cv_scores': array([ 0.66666667,  0.8       ,  0.63333333,  0.86666667,  0.66666667]),
  'feature_idx': (0, 1)},
 5: {'avg_score': 0.94666666666666666,
  'cv_scores': array([ 0.96666667,  1.        ,  0.86666667,  0.93333333,  0.96666667]),
  'feature_idx': (0, 2)},
 6: {'avg_score': 0.95333333333333337,
  'cv_scores': array([ 0.96666667,  0.96666667,  0.9       ,  0.93333333,  1.        ]),
  'feature_idx': (0, 3)},
 7: {'avg_score': 0.94666666666666666,
  'cv_scores': array([ 0.96666667,  1.        ,  0.9       ,  0.93333333,  0.93333333]),
  'feature_idx': (1, 2)},
 8: {'avg_score': 0.94000000000000006,
  'cv_scores': array([ 0.96666667,  0.96666667,  0.86666667,  0.93333333,  0.96666667]),
  'feature_idx': (1, 3)},
 9: {'avg_score': 0.95333333333333337,
  'cv_scores': array([ 0.96666667,  0.96666667,  0.9       ,  0.93333333,  1.        ]),
  'feature_idx': (2, 3)},
 10: {'avg_score': 0.94000000000000006,
  'cv_scores': array([ 0.96666667,  0.96666667,  0.86666667,  0.93333333,  0.96666667]),
  'feature_idx': (0, 1, 2)},
 11: {'avg_score': 0.94666666666666666,
  'cv_scores': array([ 0.93333333,  0.96666667,  0.9       ,  0.93333333,  1.        ]),
  'feature_idx': (0, 1, 3)},
 12: {'avg_score': 0.97333333333333338,
  'cv_scores': array([ 0.96666667,  0.96666667,  0.96666667,  0.96666667,  1.        ]),
  'feature_idx': (0, 2, 3)},
 13: {'avg_score': 0.95999999999999996,
  'cv_scores': array([ 0.96666667,  0.96666667,  0.93333333,  0.93333333,  1.        ]),
  'feature_idx': (1, 2, 3)},
 14: {'avg_score': 0.96666666666666679,
  'cv_scores': array([ 0.96666667,  0.96666667,  0.93333333,  0.96666667,  1.        ]),
  'feature_idx': (0, 1, 2, 3)}}
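
Because subsets_ stores the average cross-validation score for every evaluated subset, the result reported via best_idx_ and best_score_ can also be recovered by hand, for example:

best = max(efs1.subsets_.values(), key=lambda s: s['avg_score'])
print('Best subset:', best['feature_idx'])
print('Best accuracy score: %.2f' % best['avg_score'])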

Example 2 - Visualizing the feature selection results

For our convenience, we can visualize the output from the feature selection in a pandas DataFrame format using the get_metric_dict method of the ExhaustiveFeatureSelector object. The columns std_dev and std_err represent the standard deviation and standard error of the cross-validation scores, respectively.

Below, we see the DataFrame for the Exhaustive Feature Selector from Example 1:

import pandas as pd

iris = load_iris()
X = iris.data
y = iris.target

knn = KNeighborsClassifier(n_neighbors=3)

efs1 = EFS(knn, 
           min_features=1,
           max_features=4,
           scoring='accuracy',
           print_progress=True,
           cv=5)

efs1 = efs1.fit(X, y)

df = pd.DataFrame.from_dict(efs1.get_metric_dict()).T
df.sort_values('avg_score', inplace=True, ascending=False)
df
Features: 15/15
avg_score ci_bound cv_scores feature_idx std_dev std_err
12 0.973333 0.0171372 [0.966666666667, 0.966666666667, 0.96666666666... (0, 2, 3) 0.0133333 0.00666667
14 0.966667 0.0270963 [0.966666666667, 0.966666666667, 0.93333333333... (0, 1, 2, 3) 0.0210819 0.0105409
13 0.96 0.0320608 [0.966666666667, 0.966666666667, 0.93333333333... (1, 2, 3) 0.0249444 0.0124722
2 0.953333 0.0514116 [0.933333333333, 1.0, 0.9, 0.933333333333, 1.0] (2,) 0.04 0.02
6 0.953333 0.0436915 [0.966666666667, 0.966666666667, 0.9, 0.933333... (0, 3) 0.0339935 0.0169967
9 0.953333 0.0436915 [0.966666666667, 0.966666666667, 0.9, 0.933333... (2, 3) 0.0339935 0.0169967
3 0.946667 0.0581151 [0.966666666667, 0.966666666667, 0.93333333333... (3,) 0.0452155 0.0226078
5 0.946667 0.0581151 [0.966666666667, 1.0, 0.866666666667, 0.933333... (0, 2) 0.0452155 0.0226078
7 0.946667 0.0436915 [0.966666666667, 1.0, 0.9, 0.933333333333, 0.9... (1, 2) 0.0339935 0.0169967
11 0.946667 0.0436915 [0.933333333333, 0.966666666667, 0.9, 0.933333... (0, 1, 3) 0.0339935 0.0169967
8 0.94 0.0499631 [0.966666666667, 0.966666666667, 0.86666666666... (1, 3) 0.038873 0.0194365
10 0.94 0.0499631 [0.966666666667, 0.966666666667, 0.86666666666... (0, 1, 2) 0.038873 0.0194365
4 0.726667 0.11623 [0.666666666667, 0.8, 0.633333333333, 0.866666... (0, 1) 0.0904311 0.0452155
0 0.66 0.106334 [0.533333333333, 0.633333333333, 0.73333333333... (0,) 0.0827312 0.0413656
1 0.566667 0.0605892 [0.533333333333, 0.633333333333, 0.6, 0.5, 0.5... (1,) 0.0471405 0.0235702
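
The std_dev and std_err columns appear to be related by std_err = std_dev / sqrt(n_folds - 1); treating that relationship as an assumption about mlxtend's internals rather than documented behavior, both values can be recomputed from the stored cv_scores and compared against the table:

import numpy as np

scores = np.array(efs1.get_metric_dict()[12]['cv_scores'])
std_dev = scores.std()                        # population standard deviation (ddof=0)
std_err = std_dev / np.sqrt(len(scores) - 1)  # assumed relationship; compare to the table row above
print(std_dev, std_err)
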
import matplotlib.pyplot as plt

metric_dict = efs1.get_metric_dict()

fig = plt.figure()
k_feat = sorted(metric_dict.keys())
avg = [metric_dict[k]['avg_score'] for k in k_feat]

upper, lower = [], []
for k in k_feat:
    upper.append(metric_dict[k]['avg_score'] +
                 metric_dict[k]['std_dev'])
    lower.append(metric_dict[k]['avg_score'] -
                 metric_dict[k]['std_dev'])

plt.fill_between(k_feat,
                 upper,
                 lower,
                 alpha=0.2,
                 color='blue',
                 lw=1)

plt.plot(k_feat, avg, color='blue', marker='o')
plt.ylabel('Accuracy +/- Standard Deviation')
plt.xlabel('Number of Features')
feature_min = len(metric_dict[k_feat[0]]['feature_idx'])
feature_max = len(metric_dict[k_feat[-1]]['feature_idx'])
plt.xticks(k_feat, 
           [str(metric_dict[k]['feature_idx']) for k in k_feat], 
           rotation=90)
plt.show()

[Figure: average CV accuracy per feature subset with a +/- standard deviation band]

Example 3 - Exhaustive Feature Selection for Regression

Similar to the classification examples above, the ExhaustiveFeatureSelector also supports scikit-learn's estimators for regression.

from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston

boston = load_boston()
X, y = boston.data, boston.target

lr = LinearRegression()

efs = EFS(lr, 
          min_features=10,
          max_features=12,
          scoring='neg_mean_squared_error',
          cv=10)

efs.fit(X, y)

print('Best MSE score: %.2f' % (efs.best_score_ * (-1)))
print('Best subset:', efs.best_idx_)
Features: 377/377


Best subset: (0, 1, 4, 6, 7, 8, 9, 10, 11, 12)
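
Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2. As an illustrative alternative (not part of the original example), the same pattern can be run on the California housing data; since that dataset has 8 features, the subset range below is adjusted accordingly:

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

# California housing: 8 features, so scan subsets of size 6 to 8
X, y = fetch_california_housing(return_X_y=True)

efs = EFS(LinearRegression(),
          min_features=6,
          max_features=8,
          scoring='neg_mean_squared_error',
          cv=10)
efs = efs.fit(X, y)

print('Best MSE score: %.2f' % (efs.best_score_ * (-1)))
print('Best subset:', efs.best_idx_)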

Example 4 - Using the Selected Feature Subset For Making New Predictions

# Initialize the dataset

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
         X, y, test_size=0.33, random_state=1)

knn = KNeighborsClassifier(n_neighbors=3)
# Select the "best" three features via
# 5-fold cross-validation on the training set.

from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

efs1 = EFS(knn, 
           min_features=1,
           max_features=4,
           scoring='accuracy',
           cv=5)
efs1 = efs1.fit(X_train, y_train)
Features: 15/15
print('Selected features:', efs1.best_idx_)
Selected features: (2, 3)
# Generate the new subsets based on the selected features
# Note that the transform call is equivalent to
# X_train[:, efs1.best_idx_]

X_train_efs = efs1.transform(X_train)
X_test_efs = efs1.transform(X_test)

# Fit the estimator using the new feature subset
# and make a prediction on the test data
knn.fit(X_train_efs, y_train)
y_pred = knn.predict(X_test_efs)

# Compute the accuracy of the prediction
acc = float((y_test == y_pred).sum()) / y_pred.shape[0]
print('Test set accuracy: %.2f %%' % (acc*100))
Test set accuracy: 96.00 %
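
As a quick consistency check within the same session, transform is expected to match plain NumPy column indexing with best_idx_ (a sketch, assuming the arrays from above):

import numpy as np

# transform() selects exactly the columns listed in best_idx_
assert np.array_equal(X_train_efs, X_train[:, list(efs1.best_idx_)])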

Example 5 - Exhaustive Feature Selection and GridSearch

# Initialize the dataset

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
         X, y, test_size=0.33, random_state=1)

Use scikit-learn's GridSearch to tune the hyperparameters of the LogisticRegression estimator inside the ExhaustiveFeatureSelector and use it for prediction in the pipeline. Note that the clone_estimator attribute needs to be set to False.

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

lr = LogisticRegression(multi_class='multinomial', 
                        solver='lbfgs', 
                        random_state=123)

efs1 = EFS(estimator=lr, 
           min_features=2,
           max_features=3,
           scoring='accuracy',
           print_progress=False,
           clone_estimator=False,
           cv=5,
           n_jobs=1)

pipe = make_pipeline(efs1, lr)

param_grid = {'exhaustivefeatureselector__estimator__C': [0.1, 1.0, 10.0]}

gs = GridSearchCV(estimator=pipe, 
                  param_grid=param_grid, 
                  scoring='accuracy', 
                  n_jobs=1, 
                  cv=2, 
                  verbose=1, 
                  refit=False)

# run gridsearch
gs = gs.fit(X_train, y_train)
Fitting 2 folds for each of 3 candidates, totalling 6 fits


[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    3.1s finished

... and the "best" parameters determined by GridSearch are ...

print("Best parameters via GridSearch", gs.best_params_)
Best parameters via GridSearch {'exhaustivefeatureselector__estimator__C': 1.0}
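
The parameter name above follows scikit-learn's nested naming convention: the pipeline step name ('exhaustivefeatureselector'), the EFS parameter holding the inner model ('estimator'), and finally the LogisticRegression parameter ('C'). As a sketch, the available names can be listed from the pipeline itself:

# lists e.g. 'exhaustivefeatureselector__estimator__C' and 'logisticregression__C'
print([k for k in pipe.get_params().keys() if k.endswith('__C')])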

Obtaining the best k feature indices after GridSearch

If we are interested in the best k feature indices via ExhaustiveFeatureSelector.best_idx_, we have to initialize a GridSearchCV object with refit=True. Now, the grid search object will take the complete training dataset and the best parameters, which it found via cross-validation, to train the estimator pipeline.

gs = GridSearchCV(estimator=pipe, 
                  param_grid=param_grid, 
                  scoring='accuracy', 
                  n_jobs=1, 
                  cv=2, 
                  verbose=1, 
                  refit=True)

After running the grid search, we can access the individual pipeline objects of the best_estimator_ via the steps attribute.

gs = gs.fit(X_train, y_train)
gs.best_estimator_.steps
Fitting 2 folds for each of 3 candidates, totalling 6 fits


[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    2.6s finished





[('exhaustivefeatureselector',
  ExhaustiveFeatureSelector(clone_estimator=False, cv=5,
               estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
            intercept_scaling=1, max_iter=100, multi_class='multinomial',
            n_jobs=1, penalty='l2', random_state=123, solver='lbfgs',
            tol=0.0001, verbose=0, warm_start=False),
               max_features=3, min_features=2, n_jobs=1,
               pre_dispatch='2*n_jobs', print_progress=False,
               scoring='accuracy')),
 ('logisticregression',
  LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
            intercept_scaling=1, max_iter=100, multi_class='multinomial',
            n_jobs=1, penalty='l2', random_state=123, solver='lbfgs',
            tol=0.0001, verbose=0, warm_start=False))]

Via sub-indexing, we can then obtain the best-selected feature subset:

print('Best features:', gs.best_estimator_.steps[0][1].best_idx_)
Best features: (2, 3)

During cross-validation, this feature combination had a CV accuracy of:

print('Best score:', gs.best_score_)
Best score: 0.97
gs.best_params_
{'exhaustivefeatureselector__estimator__C': 1.0}

Alternatively, we can set the "best grid search parameters" in our pipeline manually if we ran GridSearchCV with refit=False. It should yield the same results:

pipe.set_params(**gs.best_params_).fit(X_train, y_train)
print('Best features:', pipe.steps[0][1].best_idx_)
Best features: (2, 3)
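
Either way, the fitted pipeline can then be used for prediction on the held-out data, for example (a sketch using the objects defined above):

y_pred = pipe.predict(X_test)
print('Test set accuracy: %.2f %%' % ((y_pred == y_test).mean() * 100))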

API

ExhaustiveFeatureSelector(estimator, min_features=1, max_features=1, print_progress=True, scoring='accuracy', cv=5, n_jobs=1, pre_dispatch='2*n_jobs', clone_estimator=True)

Exhaustive Feature Selection for Classification and Regression. (new in v0.4.3)

Parameters

  • estimator : scikit-learn classifier or regressor

  • min_features : int (default: 1)

    Minimum number of features to select

  • max_features : int (default: 1)

    Maximum number of features to select

  • print_progress : bool (default: True)

    Prints progress as the number of evaluated feature subsets to stderr.

  • scoring : str or callable (default: 'accuracy')

    Scoring metric in {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'} for regressors, or a callable object or function with signature scorer(estimator, X, y).

  • cv : int (default: 5)

    Scikit-learn cross-validation generator or int. If estimator is a classifier (or y consists of integer class labels), stratified k-fold is performed, and regular k-fold cross-validation otherwise. No cross-validation if cv is None, False, or 0. (A short usage sketch with a splitter object and a callable scorer follows this parameter list.)

  • n_jobs : int (default: 1)

    The number of CPUs to use for evaluating different feature subsets in parallel. -1 means 'all CPUs'.

  • pre_dispatch : int, or string (default: '2*n_jobs')

    Controls the number of jobs that get dispatched during parallel execution if n_jobs > 1 or n_jobs=-1. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be None, in which case all the jobs are immediately created and spawned (use this for lightweight and fast-running jobs to avoid delays due to on-demand spawning of the jobs); an int, giving the exact number of total jobs that are spawned; or a string, giving an expression as a function of n_jobs, as in '2*n_jobs'.

  • clone_estimator : bool (default: True)

    Clones estimator if True; works with the original estimator instance if False. Set to False if the estimator doesn't implement scikit-learn's set_params and get_params methods. In addition, it is required to set cv=0 and n_jobs=1.
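
For instance, here is a minimal sketch (assuming callables and splitter objects are accepted for scoring and cv as described above) that uses an explicit StratifiedKFold splitter and a callable with the documented scorer(estimator, X, y) signature:

from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS

def accuracy_scorer(estimator, X, y):
    # callable scoring function with the documented scorer(estimator, X, y) signature
    return (estimator.predict(X) == y).mean()

X, y = load_iris(return_X_y=True)

efs_custom = EFS(KNeighborsClassifier(n_neighbors=3),
                 min_features=1,
                 max_features=2,
                 scoring=accuracy_scorer,
                 cv=StratifiedKFold(n_splits=3))
efs_custom = efs_custom.fit(X, y)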

Attributes

  • best_idx_ : array-like, shape = [n_predictions]

    Feature indices of the selected feature subset.

  • best_score_ : float

    Cross validation average score of the selected subset.

  • subsets_ : dict

    A dictionary of all evaluated feature subsets, where the dictionary keys are the enumeration indices of the evaluated subset combinations (as shown in Example 1). The dictionary values are dictionaries themselves with the following keys: 'feature_idx' (tuple of indices of the feature subset), 'cv_scores' (list of individual cross-validation scores), 'avg_score' (average cross-validation score).

Examples

For usage examples, please see http://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/

Methods


fit(X, y, **fit_params)

Perform feature selection and learn model from training data.

Parameters

  • X : {array-like, sparse matrix}, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

  • y : array-like, shape = [n_samples]

    Target values.

  • fit_params : dict of string -> object, optional

    Parameters to pass to the fit method of the classifier.

Returns

  • self : object

fit_transform(X, y, **fit_params)

Fit to training data and return the best selected features from X.

Parameters

  • X : {array-like, sparse matrix}, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

  • y : array-like, shape = [n_samples]

    Target values.

  • fit_params : dict of string -> object, optional

    Parameters to pass to the fit method of the classifier.

Returns

Feature subset of X, shape={n_samples, k_features}


get_metric_dict(confidence_interval=0.95)

Return metric dictionary

Parameters

  • confidence_interval : float (default: 0.95)

    A positive float between 0.0 and 1.0 to compute the confidence interval bounds of the CV score averages.

Returns

Dictionary with items where each dictionary value is a list with the number of iterations (number of feature subsets) as its length. The dictionary keys corresponding to these lists are as follows: 'feature_idx' (tuple of the indices of the feature subset), 'cv_scores' (list of individual CV scores), 'avg_score' (average of the CV scores), 'std_dev' (standard deviation of the CV score average), 'std_err' (standard error of the CV score average), 'ci_bound' (confidence interval bound of the CV score average).
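
For example (a sketch assuming a fitted selector such as efs1 from the examples above), a wider confidence interval can be requested and the corresponding bound inspected:

metric_dict_99 = efs1.get_metric_dict(confidence_interval=0.99)
print(metric_dict_99[12]['ci_bound'])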


get_params(deep=True)

Get parameters for this estimator.

Parameters

  • deep : boolean, optional

    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any

    Parameter names mapped to their values.


set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns

self


transform(X)

Return the best selected features from X.

Parameters

  • X : {array-like, sparse matrix}, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

Feature subset of X, shape={n_samples, k_features}