DenseTransformer: Transforms a sparse into a dense NumPy array, e.g., in a scikit-learn pipeline

A simple transformer that converts a sparse into a dense numpy array, e.g., required for scikit-learn's Pipeline when, for example, CountVectorizers are used in combination with estimators that are not compatible with sparse matrices.

from mlxtend.preprocessing import DenseTransformer

Example 1

from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from mlxtend.preprocessing import DenseTransformer
import re
import numpy as np

X_train = np.array(['abc def ghi', 'this is a test',
                    'this is a test', 'this is a test'])
y_train = np.array([0, 0, 1, 1])

pipe_1 = Pipeline([
    ('vect', CountVectorizer()),
    ('to_dense', DenseTransformer()),
    ('clf', RandomForestClassifier())
])

parameters_1 = dict(
    clf__n_estimators=[50, 100, 200],
    clf__max_features=['sqrt', 'log2', None],)

grid_search_1 = GridSearchCV(pipe_1, 
                             parameters_1, 
                             n_jobs=1, 
                             verbose=1,
                             scoring='accuracy',
                             cv=2)


print("Performing grid search...")
print("pipeline:", [name for name, _ in pipe_1.steps])
print("parameters:")
grid_search_1.fit(X_train, y_train)
print("Best score: %0.3f" % grid_search_1.best_score_)
print("Best parameters set:")
best_parameters_1 = grid_search_1.best_estimator_.get_params()
for param_name in sorted(parameters_1.keys()):
    print("\t%s: %r" % (param_name, best_parameters_1[param_name]))
Performing grid search...
pipeline: ['vect', 'to_dense', 'clf']
parameters:
Fitting 2 folds for each of 9 candidates, totalling 18 fits
Best score: 0.500
Best parameters set:
    clf__max_features: 'sqrt'
    clf__n_estimators: 50


[Parallel(n_jobs=1)]: Done  18 out of  18 | elapsed:    3.9s finished

API

DenseTransformer(return_copy=True)

Convert a sparse array into a dense array.

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/preprocessing/DenseTransformer/

Methods


fit(X, y=None)

Mock method. Does nothing.

Parameters

  • X : {array-like, sparse matrix}, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

  • y : array-like, shape = [n_samples] (default: None)

Returns

self


fit_transform(X, y=None)

Return a dense version of the input array.

Parameters

  • X : {array-like, sparse matrix}, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

  • y : array-like, shape = [n_samples] (default: None)

Returns

  • X_dense : dense version of the input X array.

get_params(deep=True)

Get parameters for this estimator.

Parameters

  • deep : boolean, optional

    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

  • params : mapping of string to any

    Parameter names mapped to their values.


set_params(params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns

self


transform(X, y=None)

Return a dense version of the input array.

Parameters

  • X : {array-like, sparse matrix}, shape = [n_samples, n_features]

    Training vectors, where n_samples is the number of samples and n_features is the number of features.

  • y : array-like, shape = [n_samples] (default: None)

Returns

  • X_dense : dense version of the input X array.