mlxtend version: 0.23.1

LinearRegression

LinearRegression(method='direct', eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0)

Ordinary least squares linear regression.

Parameters

  • method : string (default: 'direct')

    For gradient descent-based optimization, use sgd (see minibatch parameter for further options). Otherwise, if direct (default), the analytical method is used. For alternative, numerically more stable solutions, use either qr (QR decomopisition) or svd (Singular Value Decomposition).

  • eta : float (default: 0.01)

    solver learning rate (between 0.0 and 1.0). Used with method = 'sgd'. (See methods parameter for details)

  • epochs : int (default: 50)

    Passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent. Used with method = 'sgd'. (See methods parameter for details)

  • minibatches : int (default: None)

    The number of minibatches for gradient-based optimization. If None: Direct method, QR, or SVD method (see method parameter for details) If 1: Gradient Descent learning If len(y): Stochastic Gradient Descent learning If 1 < minibatches < len(y): Minibatch learning

  • random_seed : int (default: None)

    Set random state for shuffling and initializing the weights. Used in method = 'sgd'. (See methods parameter for details)

  • print_progress : int (default: 0)

    Prints progress in fitting to stderr if method = 'sgd'. 0: No output 1: Epochs elapsed and cost 2: 1 plus time elapsed 3: 2 plus estimated time until completion

Attributes

  • w_ : 2d-array, shape={n_features, 1}

    Model weights after fitting.

  • b_ : 1d-array, shape={1,}

    Bias unit after fitting.

  • cost_ : list

    Sum of squared errors after each epoch; ignored if solver='normal equation'

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/regressor/LinearRegression/

Methods


fit(X, y, init_params=True)

Learn model from training data.

Parameters

  • X : {array-like, sparse matrix}, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

  • y : array-like, shape = [n_samples]

    Target values.

  • init_params : bool (default: True)

    Re-initializes model parameters prior to fitting. Set False to continue training with weights from a previous model fitting.

Returns

  • self : object

get_params(deep=True)

Get parameters for this estimator.

Parameters

  • deep : boolean, optional

    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any

    Parameter names mapped to their values.'

    adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause


predict(X)

Predict targets from X.

Parameters

  • X : {array-like, sparse matrix}, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

  • target_values : array-like, shape = [n_samples]

    Predicted target values.


set_params(params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns

self

adapted from
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py
Author: Gael Varoquaux <gael.varoquaux@normalesup.org>
License: BSD 3 clause

StackingCVRegressor

StackingCVRegressor(regressors, meta_regressor, cv=5, shuffle=True, random_state=None, verbose=0, refit=True, use_features_in_secondary=False, store_train_meta_features=False, n_jobs=None, pre_dispatch='2n_jobs', multi_output=False)*

A 'Stacking Cross-Validation' regressor for scikit-learn estimators.

Parameters

  • regressors : array-like, shape = [n_regressors]

    A list of regressors. Invoking the fit method on the StackingCVRegressor will fit clones of these original regressors that will be stored in the class attribute self.regr_.

  • meta_regressor : object

    The meta-regressor to be fitted on the ensemble of regressor

  • cv : int, cross-validation generator or iterable, optional (default: 5)

    Determines the cross-validation splitting strategy. Possible inputs for cv are: - None, to use the default 5-fold cross validation, - integer, to specify the number of folds in a KFold, - An object to be used as a cross-validation generator. - An iterable yielding train, test splits. For integer/None inputs, it will use KFold cross-validation

  • shuffle : bool (default: True)

    If True, and the cv argument is integer, the training data will be shuffled at fitting stage prior to cross-validation. If the cv argument is a specific cross validation technique, this argument is omitted.

  • random_state : int, RandomState instance or None, optional (default: None)

    Constrols the randomness of the cv splitter. Used when cv is integer and shuffle=True. New in v0.16.0.

  • verbose : int, optional (default=0)

    Controls the verbosity of the building process. New in v0.16.0

  • refit : bool (default: True)

    Clones the regressors for stacking regression if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Setting refit=False is recommended if you are working with estimators that are supporting the scikit-learn fit/predict API interface but are not compatible to scikit-learn's clone function.

  • use_features_in_secondary : bool (default: False)

    If True, the meta-regressor will be trained both on the predictions of the original regressors and the original dataset. If False, the meta-regressor will be trained only on the predictions of the original regressors.

  • store_train_meta_features : bool (default: False)

    If True, the meta-features computed from the training data used for fitting the meta-regressor stored in the self.train_meta_features_ array, which can be accessed after calling fit.

  • n_jobs : int or None, optional (default=None)

    The number of CPUs to use to do the computation. None means 1 unless in a :obj:joblib.parallel_backend context. -1 means using all processors. See :term:Glossary <n_jobs> for more details. New in v0.16.0.

  • pre_dispatch : int, or string, optional

    Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: - None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs - An int, giving the exact number of total jobs that are spawned - A string, giving an expression as a function of n_jobs, as in '2*n_jobs'

  • multi_output : bool (default: False)

    If True, allow multi-output targets, but forbid nan or inf values. If False, y will be checked to be a vector. (New in v0.19.0.)

Attributes

  • train_meta_features : numpy array, shape = [n_samples, n_regressors]

    meta-features for training data, where n_samples is the number of samples in training data and len(self.regressors) is the number of regressors.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/regressor/StackingCVRegressor/

Methods


fit(X, y, groups=None, sample_weight=None)

Fit ensemble regressors and the meta-regressor.

Parameters

  • X : numpy array, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

  • y : numpy array, shape = [n_samples] or [n_samples, n_targets]

    Target values. Multiple targets are supported only if self.multi_output is True.

  • groups : numpy array/None, shape = [n_samples]

    The group that each sample belongs to. This is used by specific folding strategies such as GroupKFold()

  • sample_weight : array-like, shape = [n_samples], optional

    Sample weights passed as sample_weights to each regressor in the regressors list as well as the meta_regressor. Raises error if some regressor does not support sample_weight in the fit() method.

Returns

  • self : object

fit_transform(X, y=None, fit_params)

Fit to data, then transform it.

Fits transformer to `X` and `y` with optional parameters `fit_params`
and returns a transformed version of `X`.

Parameters

  • X : array-like of shape (n_samples, n_features)

    Input samples.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

    Target values (None for unsupervised transformations).

  • **fit_params : dict

    Additional fit parameters.

Returns

  • X_new : ndarray array of shape (n_samples, n_features_new)

    Transformed array.


get_params(deep=True)

Get parameters for this estimator.

Parameters

  • deep : bool, default=True

    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : dict

    Parameter names mapped to their values.


predict(X)

Predict target values for X.

Parameters

  • X : {array-like, sparse matrix}, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

  • y_target : array-like, shape = [n_samples] or [n_samples, n_targets]

    Predicted target values.


predict_meta_features(X)

Get meta-features of test-data.

Parameters

  • X : numpy array, shape = [n_samples, n_features]

    Test vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

  • meta-features : numpy array, shape = [n_samples, len(self.regressors)]

    meta-features for test data, where n_samples is the number of samples in test data and len(self.regressors) is the number of regressors. If self.multi_output is True, then the number of columns is len(self.regressors) * n_targets.


score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination :math:`R^2` is defined as
:math:`(1 - \frac{u}{v})`, where :math:`u` is the residual

sum of squares ((y_true - y_pred)** 2).sum() and :math:v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum().

The best possible score is 1.0 and it can be negative (because the

model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a :math:R^2 score of 0.0.

Parameters

  • X : array-like of shape (n_samples, n_features)

    Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs)

    True values for X.

  • sample_weight : array-like of shape (n_samples,), default=None

    Sample weights.

Returns

  • score : float

    :math:R^2 of self.predict(X) w.r.t. y.

Notes

The :math:R^2 score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of :func:~sklearn.metrics.r2_score. This influences the score method of all the multioutput regressors (except for :class:~sklearn.multioutput.MultiOutputRegressor).


set_output(, transform=None)*

Set output container.

See :ref:`sphx_glr_auto_examples_miscellaneous_plot_set_output.py`
for an example on how to use the API.

Parameters

  • transform : {"default", "pandas"}, default=None

    Configure output of transform and fit_transform.

    • "default": Default output format of a transformer
    • "pandas": DataFrame output
    • None: Transform configuration is unchanged

Returns

  • self : estimator instance

    Estimator instance.


set_params(params)

Set the parameters of this estimator.

Valid parameter keys can be listed with ``get_params()``.

Returns

self

Properties


named_regressors

Returns

List of named estimator tuples, like [('svc', SVC(...))]

StackingRegressor

StackingRegressor(regressors, meta_regressor, verbose=0, use_features_in_secondary=False, store_train_meta_features=False, refit=True, multi_output=False)

A Stacking regressor for scikit-learn estimators for regression.

Parameters

  • regressors : array-like, shape = [n_regressors]

    A list of regressors. Invoking the fit method on the StackingRegressor will fit clones of those original regressors that will be stored in the class attribute self.regr_.

  • meta_regressor : object

    The meta-regressor to be fitted on the ensemble of regressors

  • verbose : int, optional (default=0)

    Controls the verbosity of the building process. - verbose=0 (default): Prints nothing - verbose=1: Prints the number & name of the regressor being fitted - verbose=2: Prints info about the parameters of the regressor being fitted - verbose>2: Changes verbose param of the underlying regressor to self.verbose - 2

  • use_features_in_secondary : bool (default: False)

    If True, the meta-regressor will be trained both on the predictions of the original regressors and the original dataset. If False, the meta-regressor will be trained only on the predictions of the original regressors.

  • store_train_meta_features : bool (default: False)

    If True, the meta-features computed from the training data used for fitting the meta-regressor stored in the self.train_meta_features_ array, which can be accessed after calling fit.

Attributes

  • regr_ : list, shape=[n_regressors]

    Fitted regressors (clones of the original regressors)

  • meta_regr_ : estimator

    Fitted meta-regressor (clone of the original meta-estimator)

  • coef_ : array-like, shape = [n_features]

    Model coefficients of the fitted meta-estimator

  • intercept_ : float

    Intercept of the fitted meta-estimator

  • train_meta_features : numpy array,

    shape = [n_samples, len(self.regressors)] meta-features for training data, where n_samples is the number of samples in training data and len(self.regressors) is the number of regressors.

  • refit : bool (default: True)

    Clones the regressors for stacking regression if True (default) or else uses the original ones, which will be refitted on the dataset upon calling the fit method. Setting refit=False is recommended if you are working with estimators that are supporting the scikit-learn fit/predict API interface but are not compatible to scikit-learn's clone function.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/regressor/StackingRegressor/

Methods


fit(X, y, sample_weight=None)

Learn weight coefficients from training data for each regressor.

Parameters

  • X : {array-like, sparse matrix}, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

  • y : numpy array, shape = [n_samples] or [n_samples, n_targets]

    Target values. Multiple targets are supported only if self.multi_output is True.

  • sample_weight : array-like, shape = [n_samples], optional

    Sample weights passed as sample_weights to each regressor in the regressors list as well as the meta_regressor. Raises error if some regressor does not support sample_weight in the fit() method.

Returns

  • self : object

fit_transform(X, y=None, fit_params)

Fit to data, then transform it.

Fits transformer to `X` and `y` with optional parameters `fit_params`
and returns a transformed version of `X`.

Parameters

  • X : array-like of shape (n_samples, n_features)

    Input samples.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

    Target values (None for unsupervised transformations).

  • **fit_params : dict

    Additional fit parameters.

Returns

  • X_new : ndarray array of shape (n_samples, n_features_new)

    Transformed array.


get_params(deep=True)

Return estimator parameter names for GridSearch support.


predict(X)

Predict target values for X.

Parameters

  • X : {array-like, sparse matrix}, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

  • y_target : array-like, shape = [n_samples] or [n_samples, n_targets]

    Predicted target values.


predict_meta_features(X)

Get meta-features of test-data.

Parameters

  • X : numpy array, shape = [n_samples, n_features]

    Test vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

  • meta-features : numpy array, shape = [n_samples, len(self.regressors)]

    meta-features for test data, where n_samples is the number of samples in test data and len(self.regressors) is the number of regressors. If self.multi_output is True, then the number of columns is len(self.regressors) * n_targets


score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination :math:`R^2` is defined as
:math:`(1 - \frac{u}{v})`, where :math:`u` is the residual

sum of squares ((y_true - y_pred)** 2).sum() and :math:v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum().

The best possible score is 1.0 and it can be negative (because the

model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a :math:R^2 score of 0.0.

Parameters

  • X : array-like of shape (n_samples, n_features)

    Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y : array-like of shape (n_samples,) or (n_samples, n_outputs)

    True values for X.

  • sample_weight : array-like of shape (n_samples,), default=None

    Sample weights.

Returns

  • score : float

    :math:R^2 of self.predict(X) w.r.t. y.

Notes

The :math:R^2 score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of :func:~sklearn.metrics.r2_score. This influences the score method of all the multioutput regressors (except for :class:~sklearn.multioutput.MultiOutputRegressor).


set_output(, transform=None)*

Set output container.

See :ref:`sphx_glr_auto_examples_miscellaneous_plot_set_output.py`
for an example on how to use the API.

Parameters

  • transform : {"default", "pandas"}, default=None

    Configure output of transform and fit_transform.

    • "default": Default output format of a transformer
    • "pandas": DataFrame output
    • None: Transform configuration is unchanged

Returns

  • self : estimator instance

    Estimator instance.


set_params(params)

Set the parameters of this estimator.

Valid parameter keys can be listed with ``get_params()``.

Returns

self

Properties


coef_

None


intercept_

None


named_regressors

None