mlxtend version: 0.23.1
CopyTransformer
CopyTransformer()
Transformer that returns a copy of the input array
For usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/preprocessing/CopyTransformer/
Methods
fit(X, y=None)
Mock method. Does nothing.
Parameters
-
X
: {array-like, sparse matrix}, shape = [n_samples, n_features]
Training vectors, where n_samples is the number of samples and n_features is the number of features.
-
y
: array-like, shape = [n_samples] (default: None)
Returns
self
fit_transform(X, y=None)
Return a copy of the input array.
Parameters
-
X
: {array-like, sparse matrix}, shape = [n_samples, n_features]
Training vectors, where n_samples is the number of samples and n_features is the number of features.
-
y
: array-like, shape = [n_samples] (default: None)
Returns
X_copy
: copy of the input X array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
-
deep
: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
-
params
: dict
Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as sklearn.pipeline.Pipeline). The latter have
parameters of the form <component>__<parameter> so that it's
possible to update each component of a nested object.
Parameters
-
**params
: dict
Estimator parameters.
Returns
-
self
: estimator instance
Estimator instance.
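To illustrate the <component>__<parameter> naming, the plain-Python sketch below routes such keys to nested components. It is not scikit-learn's implementation: route_params and the dict-based "pipeline" are hypothetical, chosen only to show how the prefix before the first double underscore selects the component and the rest names the parameter.

```python
def route_params(components, **params):
    """Hypothetical helper: route '<component>__<parameter>' keys to components.

    `components` maps each component name to a dict of its parameters.
    Only one level of nesting is handled in this sketch.
    """
    for key, value in params.items():
        if "__" not in key:
            raise ValueError(f"Expected '<component>__<parameter>', got {key!r}")
        # Split on the first '__' only: the prefix names the component.
        name, sub_key = key.split("__", 1)
        if name not in components:
            raise ValueError(f"Unknown component: {name!r}")
        components[name][sub_key] = value
    return components


pipeline = {"scaler": {"with_mean": True}, "clf": {"C": 1.0}}
route_params(pipeline, clf__C=10.0, scaler__with_mean=False)
print(pipeline)  # {'scaler': {'with_mean': False}, 'clf': {'C': 10.0}}
```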
transform(X, y=None)
Return a copy of the input array.
Parameters
-
X
: {array-like, sparse matrix}, shape = [n_samples, n_features]
Training vectors, where n_samples is the number of samples and n_features is the number of features.
-
y
: array-like, shape = [n_samples] (default: None)
Returns
X_copy
: copy of the input X array.
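The copy semantics can be pictured with a minimal stand-in class that follows the same fit/transform contract. This is a hypothetical sketch, not mlxtend's source; CopyTransformerSketch is an illustrative name, and the real class may treat lists or sparse matrices differently.

```python
import numpy as np


class CopyTransformerSketch:
    """Hypothetical stand-in mimicking CopyTransformer's fit/transform contract."""

    def fit(self, X, y=None):
        # Nothing to learn; returning self allows call chaining.
        return self

    def transform(self, X, y=None):
        # Return a copy so later steps cannot mutate the caller's array.
        return X.copy()

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_copy = CopyTransformerSketch().fit_transform(X)
X_copy[0, 0] = 99.0
print(X[0, 0])  # 1.0: the original array is untouched
```

A copy step like this is mainly useful in a pipeline whose later steps modify their input in place.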
DenseTransformer
DenseTransformer(return_copy=True)
Convert a sparse array into a dense array.
For usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/preprocessing/DenseTransformer/
Methods
fit(X, y=None)
Mock method. Does nothing.
Parameters
-
X
: {array-like, sparse matrix}, shape = [n_samples, n_features]
Training vectors, where n_samples is the number of samples and n_features is the number of features.
-
y
: array-like, shape = [n_samples] (default: None)
Returns
self
fit_transform(X, y=None)
Return a dense version of the input array.
Parameters
-
X
: {array-like, sparse matrix}, shape = [n_samples, n_features]
Training vectors, where n_samples is the number of samples and n_features is the number of features.
-
y
: array-like, shape = [n_samples] (default: None)
Returns
X_dense
: dense version of the input X array.
get_params(deep=True)
Get parameters for this estimator.
Parameters
-
deep
: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
-
params
: dict
Parameter names mapped to their values.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as sklearn.pipeline.Pipeline). The latter have
parameters of the form <component>__<parameter> so that it's
possible to update each component of a nested object.
Parameters
-
**params
: dict
Estimator parameters.
Returns
-
self
: estimator instance
Estimator instance.
transform(X, y=None)
Return a dense version of the input array.
Parameters
-
X
: {array-like, sparse matrix}, shape = [n_samples, n_features]
Training vectors, where n_samples is the number of samples and n_features is the number of features.
-
y
: array-like, shape = [n_samples] (default: None)
Returns
X_dense
: dense version of the input X array.
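The sparse-to-dense step can be sketched with duck typing: SciPy sparse matrices provide a .toarray() method that returns a dense NumPy array. This is a hypothetical sketch, not mlxtend's code; to_dense is an illustrative name, and the assumption that return_copy only affects input that is already dense is this sketch's, not a documented guarantee.

```python
import numpy as np


def to_dense(X, return_copy=True):
    """Hypothetical sketch of DenseTransformer.transform (not mlxtend's code).

    Sparse input is detected by duck typing: SciPy sparse matrices expose a
    .toarray() method that returns a new dense NumPy array.
    """
    if hasattr(X, "toarray"):
        return X.toarray()
    # Already dense (assumption: return_copy only matters in this branch).
    return np.array(X, copy=True) if return_copy else X


dense = np.array([[0, 1], [2, 0]])
print(to_dense(dense) is dense)                     # False: a new array
print(to_dense(dense, return_copy=False) is dense)  # True: the same object
```

With SciPy installed, to_dense(scipy.sparse.csr_matrix(dense)) would take the .toarray() branch instead.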
MeanCenterer
MeanCenterer()
Column centering of vectors and matrices.
Attributes
-
col_means
: numpy.ndarray [n_columns]
NumPy array storing the mean values for centering after fitting the MeanCenterer object.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/MeanCenterer/
Methods
fit(X)
Gets the column means for mean centering.
Parameters
-
X
: {array-like, sparse matrix}, shape = [n_samples, n_features]
Array of data vectors, where n_samples is the number of samples and n_features is the number of features.
Returns
self
fit_transform(X)
Fits and transforms an array.
Parameters
-
X
: {array-like, sparse matrix}, shape = [n_samples, n_features]
Array of data vectors, where n_samples is the number of samples and n_features is the number of features.
Returns
-
X_tr
: {array-like, sparse matrix}, shape = [n_samples, n_features]
A copy of the input array with the columns centered.
transform(X)
Centers a NumPy array.
Parameters
-
X
: {array-like, sparse matrix}, shape = [n_samples, n_features]
Array of data vectors, where n_samples is the number of samples and n_features is the number of features.
Returns
-
X_tr
: {array-like, sparse matrix}, shape = [n_samples, n_features]
A copy of the input array with the columns centered.
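The fit/transform split described above can be sketched in NumPy: fit stores one mean per column (the role of col_means), and transform subtracts those stored means. This is a hypothetical stand-in, not mlxtend's source; MeanCentererSketch is an illustrative name and handles dense input only, whereas the documented class also lists sparse matrices.

```python
import numpy as np


class MeanCentererSketch:
    """Hypothetical stand-in for MeanCenterer (dense NumPy input only)."""

    def fit(self, X):
        # Store one mean per column, as the col_means attribute does.
        self.col_means = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Subtract the stored means; broadcasting applies them to every row.
        return np.asarray(X, dtype=float) - self.col_means

    def fit_transform(self, X):
        return self.fit(X).transform(X)


X = np.array([[1.0, 2.0], [3.0, 6.0]])
mc = MeanCentererSketch()
X_tr = mc.fit_transform(X)
print(mc.col_means)        # [2. 4.]
print(X_tr.mean(axis=0))   # [0. 0.]
```

Because transform reuses the stored means, new data is centered with the means learned during fit rather than with its own.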
TransactionEncoder
TransactionEncoder()
Encoder class for transaction data in Python lists
Parameters
None
Attributes
columns_ : list
List of the unique item names found in the input list of lists X.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/
Methods
fit(X)
Learn the unique column names (items) from the transaction data.
Parameters
-
X
: list of lists
A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction.
For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']]
fit_transform(X, sparse=False)
Fit a TransactionEncoder encoder and transform a dataset.
get_params(deep=True)
Get parameters for this estimator.
Parameters
-
deep
: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
-
params
: dict
Parameter names mapped to their values.
inverse_transform(array)
Transforms an encoded NumPy array back into transactions.
Parameters
-
array
: NumPy array [n_transactions, n_unique_items]
The NumPy one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order.
For example,
array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]])
The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
Returns
-
X
: list of lists
A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction. Within each transaction, the items appear in column (alphabetic) order.
For example, for the array above:
[['Apple', 'Beer', 'Chicken', 'Rice'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Beer', 'Chicken', 'Milk', 'Rice'], ['Beer', 'Milk', 'Rice'], ['Beer', 'Milk'], ['Apple', 'Bananas']]
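Decoding is a lookup from each True position back to the matching column label. The pure-Python sketch below shows the idea; inverse_transform_sketch is a hypothetical name, and passing the column labels explicitly (instead of reading self.columns_) is a simplification of this sketch.

```python
def inverse_transform_sketch(array, columns):
    """Hypothetical sketch: map each boolean row back to its item names.

    Items come out in column order (alphabetic), not in the order in
    which they appeared in the original transaction.
    """
    return [
        [columns[idx] for idx, present in enumerate(row) if present]
        for row in array
    ]


columns = ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
encoded = [[True, False, True, True, False, True],
           [False, False, True, False, True, False]]
print(inverse_transform_sketch(encoded, columns))
# [['Apple', 'Beer', 'Chicken', 'Rice'], ['Beer', 'Milk']]
```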
set_output(*, transform=None)
Set output container.
See the scikit-learn gallery example plot_set_output.py ("Introducing the set_output API")
for an example of how to use the API.
Parameters
-
transform
: {"default", "pandas"}, default=None
Configure the output of transform and fit_transform.
- "default": Default output format of a transformer
- "pandas": DataFrame output
- None: Transform configuration is unchanged
Returns
-
self
: estimator instanceEstimator instance.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as sklearn.pipeline.Pipeline). The latter have
parameters of the form <component>__<parameter> so that it's
possible to update each component of a nested object.
Parameters
-
**params
: dict
Estimator parameters.
Returns
-
self
: estimator instance
Estimator instance.
transform(X, sparse=False)
Transform transactions into a one-hot encoded NumPy array.
Parameters
-
X
: list of lists
A Python list of lists, where the outer list stores the n transactions and the inner list stores the items in each transaction.
For example, [['Apple', 'Beer', 'Rice', 'Chicken'], ['Apple', 'Beer', 'Rice'], ['Apple', 'Beer'], ['Apple', 'Bananas'], ['Milk', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer', 'Rice'], ['Milk', 'Beer'], ['Apple', 'Bananas']]
-
sparse
: bool (default=False)
If True, transform returns a Compressed Sparse Row (CSR) matrix instead of a dense NumPy array.
Returns
-
array
: NumPy array [n_transactions, n_unique_items] if sparse=False (default), a Compressed Sparse Row matrix otherwise
The one-hot encoded boolean array of the input transactions, where the columns represent the unique items found in the input array in alphabetic order. The exact representation depends on the sparse argument.
For example, array([[True , False, True , True , False, True ], [True , False, True , False, False, True ], [True , False, True , False, False, False], [True , True , False, False, False, False], [False, False, True , True , True , True ], [False, False, True , False, True , True ], [False, False, True , False, True , False], [True , True , False, False, False, False]])
The corresponding column labels are available as self.columns_, e.g., ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
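The dense encoding can be sketched in pure Python: fitting collects the unique items in alphabetic order, and transforming sets one boolean per known item. This is a hypothetical sketch, not mlxtend's implementation; fit_columns and encode are illustrative names, the output is a list of lists rather than a NumPy array or CSR matrix, and raising KeyError for items unseen during fitting is behavior of this sketch only.

```python
def fit_columns(transactions):
    # Unique items across all transactions, sorted alphabetically.
    return sorted({item for t in transactions for item in t})


def encode(transactions, columns):
    """Hypothetical sketch of the dense (sparse=False) one-hot encoding."""
    index = {item: i for i, item in enumerate(columns)}
    rows = []
    for t in transactions:
        row = [False] * len(columns)
        for item in t:
            row[index[item]] = True  # KeyError for items unseen during fit
        rows.append(row)
    return rows


dataset = [['Apple', 'Beer', 'Rice', 'Chicken'], ['Milk', 'Beer']]
columns = fit_columns(dataset)
print(columns)  # ['Apple', 'Beer', 'Chicken', 'Milk', 'Rice']
print(encode(dataset, columns))
# [[True, True, True, False, True], [False, True, False, True, False]]
```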
minmax_scaling
minmax_scaling(array, columns, min_val=0, max_val=1)
Min-max scaling of pandas DataFrames and NumPy arrays.
Parameters
-
array
: pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns]
-
columns
: array-like, shape = [n_columns]
Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...]
-
min_val
: int or float, optional (default=0)
Minimum value after rescaling.
-
max_val
: int or float, optional (default=1)
Maximum value after rescaling.
Returns
-
df_new
: pandas DataFrame object
Copy of the array or DataFrame with rescaled columns.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/minmax_scaling/
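Min-max scaling maps each selected column linearly from [column min, column max] to [min_val, max_val], i.e. x' = (x - min) / (max - min) * (max_val - min_val) + min_val. The NumPy sketch below applies that formula; minmax_sketch is a hypothetical name, it returns the full array with the selected columns rescaled, and it does not guard against constant columns (which would divide by zero).

```python
import numpy as np


def minmax_sketch(X, columns, min_val=0.0, max_val=1.0):
    """Hypothetical NumPy sketch of column-wise min-max scaling."""
    X_new = np.array(X, dtype=float)  # work on a float copy
    sub = X_new[:, columns]
    col_min = sub.min(axis=0)
    col_max = sub.max(axis=0)
    # Assumes col_max > col_min for every selected column.
    scaled = (sub - col_min) / (col_max - col_min)
    X_new[:, columns] = scaled * (max_val - min_val) + min_val
    return X_new


X = np.array([[1.0, 10.0, 5.0],
              [2.0, 20.0, 5.0],
              [3.0, 30.0, 5.0]])
# Columns 0 and 1 now span [0, 1]; column 2 is left unchanged.
print(minmax_sketch(X, columns=[0, 1]))
```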
one_hot
one_hot(y, num_labels='auto', dtype='float')
One-hot encoding of class labels
Parameters
-
y
: array-like, shape = [n_classlabels]
Python list or NumPy array consisting of class labels.
-
num_labels
: int or 'auto'
Number of unique labels in the class label array. Infers the number of unique labels from the input array if set to 'auto'.
-
dtype
: str
NumPy array type (float, float32, float64) of the output array.
Returns
-
ary
: numpy.ndarray, shape = [n_classlabels, num_labels]
One-hot encoded array, where each sample is represented as a row vector in the returned array.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/one_hot/
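One-hot encoding gives each label its own row with a single 1 at the label's index. The NumPy sketch below assumes integer labels 0, 1, ..., k-1; one_hot_sketch is a hypothetical name, and using max(y) + 1 columns for 'auto' is an assumption of this sketch that matches the number of unique labels only when every label in that range occurs.

```python
import numpy as np


def one_hot_sketch(y, num_labels="auto", dtype="float"):
    """Hypothetical sketch: one row per label, a single 1 at the label's index."""
    y = np.asarray(y, dtype=int)
    n_cols = int(y.max()) + 1 if num_labels == "auto" else num_labels
    ary = np.zeros((y.shape[0], n_cols), dtype=dtype)
    ary[np.arange(y.shape[0]), y] = 1  # row i gets a 1 in column y[i]
    return ary


print(one_hot_sketch([0, 1, 2, 2]))
```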
shuffle_arrays_unison
shuffle_arrays_unison(arrays, random_seed=None)
Shuffle NumPy arrays in unison.
Parameters
-
arrays
: array-like, shape = [n_arrays]
A list of NumPy arrays.
-
random_seed
: int (default: None)
Sets the random state.
Returns
shuffled_arrays
: A list of NumPy arrays after shuffling.
Examples
```
>>> import numpy as np
>>> from mlxtend.preprocessing import shuffle_arrays_unison
>>> X1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> y1 = np.array([1, 2, 3])
>>> X2, y2 = shuffle_arrays_unison(arrays=[X1, y1], random_seed=3)
>>> # Rows stay paired with their labels: the row labeled k starts with 3k - 2
>>> assert np.array_equal(X2[:, 0], 3 * y2 - 2)
>>> # Every original label is still present exactly once
>>> assert np.array_equal(np.sort(y2), y1)
```
For more usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/preprocessing/shuffle_arrays_unison/
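Shuffling in unison amounts to drawing a single permutation of the row indices and applying it to every array. The sketch below is hypothetical (shuffle_unison_sketch is an illustrative name) and uses NumPy's Generator API, so for the same seed its order will generally differ from the order produced by shuffle_arrays_unison.

```python
import numpy as np


def shuffle_unison_sketch(arrays, random_seed=None):
    """Hypothetical sketch: draw one permutation and apply it to every array."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("All arrays must have the same length")
    rng = np.random.default_rng(random_seed)
    idx = rng.permutation(n)  # a single shared index order
    return [np.asarray(a)[idx] for a in arrays]


X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([1, 2, 3])
X_s, y_s = shuffle_unison_sketch([X, y], random_seed=3)
```

Sharing one index array is what keeps each row of X_s aligned with its label in y_s.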
standardize
standardize(array, columns=None, ddof=0, return_params=False, params=None)
Standardize columns in pandas DataFrames and NumPy arrays.
Parameters
-
array
: pandas DataFrame or NumPy ndarray, shape = [n_rows, n_columns]
-
columns
: array-like, shape = [n_columns] (default: None)
Array-like with column names, e.g., ['col1', 'col2', ...] or column indices [0, 2, 4, ...]. If None, standardizes all columns.
-
ddof
: int (default: 0)
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
-
return_params
: bool (default: False)
If set to True, a dictionary is returned in addition to the standardized array. The parameter dictionary contains the column means ('avgs') and standard deviations ('stds') of the individual columns.
-
params
: dict (default: None)
A dictionary with column means and standard deviations as returned by the standardize function if return_params was set to True. If a params dictionary is provided, the standardize function uses these values instead of computing them from the current array.
Notes
If all values in a given column are the same, these values are all set to 0.0. The standard deviation in the parameters dictionary is consequently set to 1.0 to avoid dividing by zero.
Returns
-
df_new
: pandas DataFrame object
Copy of the array or DataFrame with standardized columns.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/standardize/
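The interplay of return_params, params, ddof and the constant-column rule from the Notes can be sketched in NumPy. This is a hypothetical sketch, not mlxtend's implementation: standardize_sketch always returns the parameter dictionary (instead of only when return_params=True) and works on all columns of a NumPy array; only the 'avgs' and 'stds' keys are taken from the description above.

```python
import numpy as np


def standardize_sketch(X, ddof=0, params=None):
    """Hypothetical NumPy sketch of column-wise standardization.

    Returns the standardized array and the parameters used, so the same
    means and standard deviations can be reapplied to new data.
    """
    X = np.asarray(X, dtype=float)
    if params is None:
        avgs = X.mean(axis=0)
        stds = X.std(axis=0, ddof=ddof)
        # Constant columns: use std 1.0 so they become 0.0 instead of NaN.
        stds[stds == 0] = 1.0
        params = {"avgs": avgs, "stds": stds}
    X_std = (X - params["avgs"]) / params["stds"]
    return X_std, params


train = np.array([[1.0, 5.0], [3.0, 5.0]])
train_std, p = standardize_sketch(train)
# Reuse the training statistics on new data.
test_std, _ = standardize_sketch(np.array([[2.0, 5.0]]), params=p)
print(test_std)  # [[0. 0.]]
```

Reusing params this way keeps training and test data on the same scale, which is the purpose of passing a params dictionary.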