Release Notes
The CHANGELOG for the current development version is available at https://github.com/rasbt/mlxtend/blob/master/docs/sources/CHANGELOG.md.
Version 0.23.3 (15 Nov 2024)
Downloads
New Features and Enhancements
Files updated:
- mlxtend.evaluate.time_series.plot_splits
- Improved plot_splits for better visualization of time series splits
Changes
- mlxtend/feature_selection/exhaustive_feature_selector.py: updated the use of np.inf to support NumPy 2.0
Version 0.23.2 (5 Nov 2024)
Downloads
New Features and Enhancements
- Implement the FP-Growth and FP-Max algorithms with the possibility of missing values in the input dataset. Added a new metric, Representativity, for the generated association rules (#1004 via zazass8).
Files updated:
- mlxtend.frequent_patterns.fpcommon
- mlxtend.frequent_patterns.fpgrowth
- mlxtend.frequent_patterns.fpmax
- mlxtend/feature_selection/utilities.py: modified the _calc_score function to ensure compatibility with scikit-learn versions 1.4 and above by dynamically selecting between fit_params and params in cross_val_score.
- mlxtend.feature_selection.SequentialFeatureSelector: updated the negative infinity constant to be compatible with old and new (>=2.0) NumPy versions.
- mlxtend.frequent_patterns.association_rules: implemented three new metrics: Jaccard, Certainty, and Kulczynski. (#1096)
- Integrated scikit-learn's set_output method into TransactionEncoder. (#1087 via it176131)
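The three rule metrics named above can be sketched from their usual textbook definitions. This is an illustration only, not mlxtend's implementation: the function name rule_metrics and its arguments are hypothetical, and the inputs are assumed to be relative supports in [0, 1].

```python
def rule_metrics(support_a, support_c, support_ac):
    """Jaccard, certainty, and Kulczynski for a rule A -> C.

    support_a, support_c: relative supports of antecedent and consequent.
    support_ac: relative support of the combined itemset (A and C together).
    """
    confidence_ac = support_ac / support_a  # P(C | A)
    confidence_ca = support_ac / support_c  # P(A | C)

    jaccard = support_ac / (support_a + support_c - support_ac)
    kulczynski = 0.5 * (confidence_ac + confidence_ca)

    # Assumption: return NaN when the consequent occurs in every transaction,
    # because the certainty denominator is zero in that case.
    if support_c < 1.0:
        certainty = (confidence_ac - support_c) / (1.0 - support_c)
    else:
        certainty = float("nan")

    return {"jaccard": jaccard, "certainty": certainty, "kulczynski": kulczynski}


metrics = rule_metrics(support_a=0.5, support_c=0.4, support_ac=0.3)
```

All three values are computed from supports alone, so they can be derived from the same frequent itemset table that the existing metrics use.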
Changes
- mlxtend.frequent_patterns.fpcommon: Added the null_values parameter to the valid_input_check signature to check whether the input also includes null values. Changed the return statements and function signatures of setup_fptree and generated_itemsets, respectively, to return the created disabled array and to accept it as a parameter. Added code in mlxtend.frequent_patterns.fpcommon and mlxtend.frequent_patterns.association_rules to run the algorithms on data with null values when null_values is True.
- mlxtend.frequent_patterns.association_rules: Added the optional parameter 'return_metrics' to only return a given list of metrics, rather than every possible metric.
- Add n_classes_ attribute to stacking classifiers for compatibility with scikit-learn 1.3 (#1091)
- Use SciPy's instead of NumPy's decompositions in PCA for improved accuracy in edge cases (#1080 via fkdosilovic)
Version 0.23.1 (5 Jan 2024)
Downloads
Changes
Version 0.23.0 (22 Sep 2023)
Downloads
Changes
- Address NumPy deprecations to make mlxtend compatible with NumPy 1.24
- Changed the signature of the LinearRegression model of sklearn in the test, removing the normalize parameter as it is deprecated. (#1036)
- Add pyproject.toml to support PEP 518 builds (#1065 via jmahlik)
- Fixed installation from sdist failing (#1065 via jmahlik)
- Converted configuration to pyproject.toml (#1065 via jmahlik)
- Remove mlxtend.image submodule with face recognition functions due to poor dlib support in modern environments.
New Features and Enhancements
- Document how to use SequentialFeatureSelector and multiclass ROC AUC.
Version 0.22.0 (4 April 2023)
Downloads
Changes
- When ExhaustiveFeatureSelector is run with n_jobs == 1, joblib is now disabled, which enables more immediate (live) feedback when the verbose mode is enabled. (#985 via Nima Sarajpoor)
- Disabled unnecessary warning in EnsembleVoteClassifier (#941)
- Fixed various documentation issues (#849 and #951 via Lekshmanan Natarajan)
- Fixed "Edit on GitHub" button (#1024)
New Features and Enhancements
- The mlxtend.frequent_patterns.association_rules function has a new metric, Zhang's metric, which measures both association and dissociation. (#980)
- Internal mlxtend.frequent_patterns.fpmax code improvement that avoids casting a sparse DataFrame into a dense NumPy array. (#1000 via Tim Kellogg)
- The plot_decision_regions function now has an n_jobs parameter to parallelize the computation. (In a particular use case, on a small dataset, there was a 21x speed-up: 449 seconds vs. 21 seconds on a local HPC instance with 36 cores.) (#998 via Khalid ElHaj)
- Added mlxtend.frequent_patterns.hmine algorithm and documentation for mining frequent itemsets using the H-Mine algorithm. (#1020 via Fatih Sen)
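As a rough illustration of how Zhang's metric captures both association and dissociation, the sketch below uses the commonly cited definition (leverage divided by the larger of two normalizing terms). The function name and the zero-denominator fallback are assumptions made for this example, not mlxtend's code.

```python
def zhangs_metric(support_a, support_c, support_ac):
    """Zhang's metric for a rule A -> C, ranging from -1 to 1."""
    # Leverage: observed joint support minus the support expected
    # if A and C were independent.
    numerator = support_ac - support_a * support_c
    denominator = max(
        support_ac * (1.0 - support_a),
        support_a * (support_c - support_ac),
    )
    # Assumption: report 0.0 when the denominator vanishes.
    if denominator == 0:
        return 0.0
    return numerator / denominator


positive = zhangs_metric(0.5, 0.4, 0.3)      # items tend to co-occur
independent = zhangs_metric(0.5, 0.5, 0.25)  # no association either way
negative = zhangs_metric(0.5, 0.5, 0.0)      # items never co-occur
```

Positive values indicate association, negative values indicate dissociation, and zero indicates independence.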
Version 0.21.0 (09/17/2022)
Downloads
New Features and Enhancements
- The mlxtend.evaluate.feature_importance_permutation function has a new feature_groups argument to treat user-specified feature groups as single features, which is useful for one-hot encoded features. (#955)
- The mlxtend.feature_selection.ExhaustiveFeatureSelector and SequentialFeatureSelector also gained support for feature_groups with a behavior similar to the one described above. (#957 and #965 via Nima Sarajpoor)
Changes
- The custom_feature_names parameter was removed from the ExhaustiveFeatureSelector due to redundancy and to simplify the code base. The ExhaustiveFeatureSelector documentation illustrates how the same behavior and outcome can be achieved using pandas DataFrames. (#957)
Bug Fixes
- None
Version 0.20.0 (05/26/2022)
Downloads
New Features and Enhancements
- The mlxtend.evaluate.bootstrap_point632_score now supports fit_params. (#861)
- The mlxtend/plotting/decision_regions.py function now has a contourf_kwargs for matplotlib to change the look of the decision boundaries if desired. (#881 via pbloem)
- Add a norm_colormap parameter to mlxtend.plotting.plot_confusion_matrix, to allow normalizing the colormap, e.g., using matplotlib.colors.LogNorm() (#895)
- Add new GroupTimeSeriesSplit class for evaluation in time series tasks with support of custom groups and additional parameters in comparison with scikit-learn's TimeSeriesSplit. (#915 via Dmitry Labazkin)
Changes
- Due to compatibility issues with newer package versions, certain functions from six.py have been removed so that mlxtend may not work anymore with Python 2.7.
- As an internal change to speed up unit testing, unit testing is now facilitated by GitHub workflows, and Travis CI and Appveyor hooks have been removed.
- Improved axis label rotation in mlxtend.plotting.heatmap and mlxtend.plotting.plot_confusion_matrix (#872)
- Fix various typos in McNemar guides.
- Raises a warning if non-bool arrays are used in the frequent pattern functions apriori, fpmax, and fpgrowth. (#934 via NimaSarajpoor)
Bug Fixes
- Fix unreadable labels in heatmap for certain colormaps. (#852)
- Fix an issue in mlxtend.plotting.plot_confusion_matrix when string class names are passed (#894)
Version 0.19.0 (2021-09-02)
Downloads
New Features
- Adds a second "balanced accuracy" interpretation ("balanced") to evaluate.accuracy_score in addition to the existing "average" option to compute the scikit-learn-style balanced accuracy. (#764)
- Adds new scatter_hist function to mlxtend.plotting for generating a scattered histogram. (#757 via Maitreyee Mhasaka)
- The evaluate.permutation_test function now accepts a paired argument to support paired permutation/randomization tests. (#768)
- The StackingCVRegressor now also supports multi-dimensional targets similar to StackingRegressor via StackingCVRegressor(..., multi_output=True). (#802 via Marco Tiraboschi)
Changes
- Updates unit tests for scikit-learn 0.24.1 compatibility. (#774)
- StackingRegressor now requires setting StackingRegressor(..., multi_output=True) if the target is multi-dimensional; this allows for better input validation. (#802)
- Removes deprecated res argument from plot_decision_regions. (#803)
- Adds a title_fontsize parameter to plot_learning_curves for controlling the title font size; also the plot style is now the matplotlib default. (#818)
- Internal change using 'c': 'none' instead of 'c': '' in mlxtend.plotting.plot_decision_regions's scatterplot highlights to stay compatible with Matplotlib 3.4 and newer. (#822)
- Adds a fontcolor_threshold parameter to the mlxtend.plotting.plot_confusion_matrix function as an additional option for determining the font color cut-off manually. (#827)
- The frequent_patterns.association_rules now raises a ValueError if an empty frequent itemset DataFrame is passed. (#843)
- The .632 and .632+ bootstrap methods implemented in the mlxtend.evaluate.bootstrap_point632_score function now use the whole training set for the resubstitution weighting term instead of the internal training set that is a new bootstrap sample in each round. (#844)
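The weighting in the .632 bootstrap can be sketched as follows. This is a minimal illustration of the standard formula, assuming the resubstitution accuracy is computed once on the whole training set and the out-of-bag accuracies come from the individual bootstrap rounds. The function and argument names are hypothetical.

```python
def point632_estimate(resubstitution_accuracy, oob_accuracies):
    """Combine resubstitution and out-of-bag accuracy with .632 weights.

    resubstitution_accuracy: accuracy of a model fit and evaluated on the
        whole training set (a single number, not one per bootstrap round).
    oob_accuracies: one out-of-bag accuracy per bootstrap round.
    """
    mean_oob = sum(oob_accuracies) / len(oob_accuracies)
    return 0.632 * mean_oob + 0.368 * resubstitution_accuracy


estimate = point632_estimate(1.0, [0.8, 0.9, 0.7])
```

Because the resubstitution term is a constant here, weighting each round and then averaging gives the same result as averaging the out-of-bag scores first.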
Bug Fixes
- Fixes a typo in the SequentialFeatureSelector documentation (#835 via João Pedro Zanlorensi Cardoso)
Version 0.18.0 (2020-11-25)
Downloads
New Features
- The bias_variance_decomp function now supports optional fit_params for the estimators that are fit on bootstrap samples. (#748)
- The bias_variance_decomp function now supports Keras estimators. (#725 via @hanzigs)
- Adds new mlxtend.classifier.OneRClassifier (One Rule Classifier) class, a simple rule-based classifier that is often used as a performance baseline or simple interpretable model. (#726)
- Adds new create_counterfactual method for creating counterfactuals to explain model predictions. (#740)
Changes
- permutation_test (mlxtend.evaluate.permutation) is corrected to give the proportion of permutations whose statistic is at least as extreme as the one observed. (#721 via Florian Charlier)
- Fixes the McNemar confusion matrix layout to match the convention (and documentation), swapping the upper left and lower right cells. (#744 via mmarius)
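To make the "at least as extreme" wording concrete, here is a small exact two-sided permutation test on the absolute difference of means. It is a sketch under simple assumptions (small samples, all splits enumerated) and does not reproduce mlxtend's function; the name exact_permutation_p_value is hypothetical.

```python
from itertools import combinations


def exact_permutation_p_value(x, y):
    """Proportion of group splits whose statistic is >= the observed one."""
    pooled = list(x) + list(y)
    n_total, n_x = len(pooled), len(x)

    def abs_mean_difference(indices_x):
        chosen = set(indices_x)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(n_total) if i not in chosen]
        return abs(sum(group_x) / len(group_x) - sum(group_y) / len(group_y))

    observed = abs_mean_difference(range(n_x))
    n_splits = 0
    n_extreme = 0
    for indices_x in combinations(range(n_total), n_x):
        n_splits += 1
        # ">=" rather than ">": ties with the observed statistic count.
        if abs_mean_difference(indices_x) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits


p_value = exact_permutation_p_value([1, 2, 3], [4, 5, 6])
```

The observed split is itself one of the enumerated splits, so the p-value can never be zero.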
Bug Fixes
- The loss in LogisticRegression for logging purposes didn't include the L2 penalty for the first weight in the weight vector (this is not the bias unit). However, since this loss function was only used for logging purposes, and the gradient remains correct, this does not have an effect on the main code. (#741)
- Fixes a bug in bias_variance_decomp where, when the mse loss was used, downcasting to integers caused imprecise results for small numbers. (#749)
Version 0.17.3 (2020-07-27)
Downloads
New Features
- Add predict_proba kwarg to bootstrap methods, to allow bootstrapping of scoring functions that take in probability values. (#700 via Adam Li)
- Add a cell_values parameter to mlxtend.plotting.heatmap() to optionally suppress cell annotations by setting cell_values=False. (#703)
Changes
- Implemented both use_clones and fit_base_estimators (previously refit in EnsembleVoteClassifier) for EnsembleVoteClassifier and StackingClassifier. (#670 via Katrina Ni)
- Switched to using raw strings for regex in mlxtend.text to prevent deprecation warning in Python 3.8 (#688)
- Slice data in sequential forward selection before sending to parallel backend, reducing memory consumption.
Bug Fixes
- Fixes axis DeprecationWarning in matplotlib v3.1.0 and newer. (#673)
- Fixes an issue with using meshgrid in the no_information_rate function used by the bootstrap_point632_score function for the .632+ estimate. (#688)
- Fixes an issue in fpmax that could lead to incorrect support values. (#692 via Steve Harenberg)
Version 0.17.2 (2020-02-24)
Downloads
New Features
- -
Changes
- The previously deprecated OnehotTransactions has been removed in favor of the TransactionEncoder.
- Removed SparseDataFrame support in frequent pattern mining functions in favor of pandas >=1.0's new way of working with sparse data. If you used SparseDataFrame formats, please see pandas' migration guide at https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating. (#667)
- The plot_confusion_matrix.py now also accepts a matplotlib figure and axis as input to which the confusion matrix plot can be added. (#671 via Vahid Mirjalili)
Bug Fixes
- -
Version 0.17.1 (2020-01-28)
Downloads
New Features
- The SequentialFeatureSelector now supports using pre-specified feature sets via the fixed_features parameter. (#578)
- Adds a new accuracy_score function to mlxtend.evaluate for computing basic classification accuracy, per-class accuracy, and average per-class accuracy. (#624 via Deepan Das)
- StackingClassifier and StackingCVClassifier now have a decision_function method, which serves as a preferred choice over predict_proba in calculating roc_auc and average_precision scores when the meta estimator is a linear model or support vector classifier. (#634 via Qiang Gu)
Changes
- Improve the runtime performance for the apriori frequent itemset generating function when low_memory=True. Setting low_memory=False (default) is still faster for small itemsets, but low_memory=True can be much faster for large itemsets and requires less memory. Also, input validation for apriori, fpgrowth and fpmax takes a significant amount of time when the input pandas DataFrame is large; this is now dramatically reduced when the input contains boolean values (and not zeros/ones), which is the case when using TransactionEncoder. (#619 via Denis Barbier)
- Add support for newer sparse pandas DataFrame for frequent itemset algorithms. Also, input validation for apriori, fpgrowth and fpmax runs much faster on sparse DataFrame when the input pandas DataFrame contains integer values. (#621 via Denis Barbier)
- Let fpgrowth and fpmax work directly on sparse DataFrames; they were previously converted into dense NumPy arrays. (#622 via Denis Barbier)
Bug Fixes
- Fixes a bug in mlxtend.plotting.plot_pca_correlation_graph that caused the explained variances not summing up to 1. Also, improves the runtime performance of the correlation computation and adds a missing function argument for the explained variances (eigenvalues) if users provide their own principal components. (#593 via Gabriel Azevedo Ferreira)
- Behavior of fpgrowth and apriori is now consistent for edge cases such as min_support=0. (#573 via Steve Harenberg)
- fpmax returns an empty data frame now instead of raising an error if the frequent itemset set is empty. (#573 via Steve Harenberg)
- Fixes an issue in mlxtend.plotting.plot_confusion_matrix, where the font-color choice for medium-dark cells was not ideal and hard to read. (#588 via sohrabtowfighi)
- The svd mode of mlxtend.feature_extraction.PrincipalComponentAnalysis now also uses n-1 degrees of freedom instead of n d.o.f. when computing the eigenvalues to match the behavior of eigen. (#595)
- Disable input validation for StackingCVClassifier because it causes issues if pipelines are used as input. (#606)
Version 0.17.0 (2019-07-19)
Downloads
New Features
- Added an enhancement to the existing iris_data() such that both the UCI Repository version of the Iris dataset as well as the corrected, original version of the dataset can be loaded, which has a slight difference in two data points (consistent with Fisher's paper; this is also the same as in R). (#539 via janismdhanbad)
- Added optional groups parameter to SequentialFeatureSelector and ExhaustiveFeatureSelector fit() methods for forwarding to sklearn CV (#537 via arc12)
- Added a new plot_pca_correlation_graph function to the mlxtend.plotting submodule for plotting a PCA correlation graph. (#544 via Gabriel-Azevedo-Ferreira)
- Added a zoom_factor parameter to the mlxtend.plotting.plot_decision_regions function that allows users to zoom in and out of the decision region plots. (#545)
- Added a function fpgrowth that implements the FP-Growth algorithm for mining frequent itemsets as a drop-in replacement for the existing apriori algorithm. (#550 via Steve Harenberg)
- New heatmap function in mlxtend.plotting. (#552)
- Added a function fpmax that implements the FP-Max algorithm for mining maximal itemsets as a drop-in replacement for the fpgrowth algorithm. (#553 via Steve Harenberg)
- New figsize parameter for the plot_decision_regions function in mlxtend.plotting. (#555 via Mirza Hasanbasic)
- New low_memory option for the apriori frequent itemset generating function. Setting low_memory=False (default) uses a substantially optimized version of the algorithm that is 3-6x faster than the original implementation (low_memory=True). (#567 via jmayse)
- Added numerically stable OLS methods which use QR decomposition and Singular Value Decomposition (SVD) methods to LinearRegression in mlxtend.regressor.linear_regression. (#575 via PuneetGrov3r)
Changes
- Now uses the latest joblib library under the hood for multiprocessing instead of sklearn.externals.joblib. (#547)
- Changes to StackingCVClassifier and StackingCVRegressor such that first-level models are allowed to generate output of non-numeric type. (#562)
Bug Fixes
- Fixed documentation of iris_data() under iris.py by adding a note about differences in the iris data in R and UCI machine learning repo.
- Make sure that if the 'svd' mode is used in PCA, the number of eigenvalues is the same as when using 'eigen' (append zeros in that case) (#565)
Version 0.16.0 (2019-05-12)
Downloads
New Features
- StackingCVClassifier and StackingCVRegressor now support a random_state parameter, which, together with shuffle, controls the randomness in the cv splitting. (#523 via Qiang Gu)
- StackingCVClassifier and StackingCVRegressor now have a new drop_last_proba parameter. If True, it drops the last "probability" column in the feature set because it is redundant: p(y_c) = 1 - (p(y_1) + p(y_2) + ... + p(y_{c-1})). This can be useful for meta-classifiers that are sensitive to perfectly collinear features. (#532)
- Other stacking estimators, including StackingClassifier, StackingCVClassifier and StackingRegressor, support grid search over the regressors and even a single base regressor. (#522 via Qiang Gu)
- Adds multiprocessing support to StackingCVClassifier. (#522 via Qiang Gu)
- Adds multiprocessing support to StackingCVRegressor. (#512 via Qiang Gu)
- Now, the StackingCVRegressor also enables grid search over the regressors and even a single base regressor. When there are level-mixed parameters, GridSearchCV will try to replace hyperparameters in a top-down order (see the documentation for example details). (#515 via Qiang Gu)
- Adds a verbose parameter to apriori to show the current iteration number as well as the itemset size currently being sampled. (#519)
- Adds an optional class_name parameter to the confusion matrix function to display class names on the axis as tick marks. (#487 via sandpiturtle)
- Adds a pca.e_vals_normalized_ attribute to PCA for storing the eigenvalues also in normalized form; this is commonly referred to as variance explained ratios. (#545)
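The redundancy behind drop_last_proba can be shown with a few lines of plain Python. This is only a sketch of the idea: the helper names are hypothetical, and each row is assumed to hold class probabilities that sum to 1.

```python
def drop_last_probability(proba_rows):
    """Remove the last class-probability column from every row."""
    return [row[:-1] for row in proba_rows]


def recover_last_probability(reduced_row):
    """Rebuild the dropped value: p(y_c) = 1 - (p(y_1) + ... + p(y_{c-1}))."""
    return 1.0 - sum(reduced_row)


probabilities = [[0.2, 0.3, 0.5], [0.7, 0.1, 0.2]]
reduced = drop_last_probability(probabilities)
recovered = [recover_last_probability(row) for row in reduced]
```

Since the dropped column is fully determined by the remaining ones, keeping it would make the meta-features perfectly collinear.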
Changes
- Due to new features, restructuring, and better scikit-learn support (for GridSearchCV, etc.) the StackingCVRegressor's meta regressor is now being accessed via 'meta_regressor__*' in the parameter grid. E.g., if a RandomForestRegressor as meta-regressor was previously tuned via 'randomforestregressor__n_estimators', this has now changed to 'meta_regressor__n_estimators'. (#515 via Qiang Gu)
- The same change mentioned above is now applied to other stacking estimators, including StackingClassifier, StackingCVClassifier and StackingRegressor. (#522 via Qiang Gu)
- Automatically performs mean centering for PCA solver 'SVD' such that using SVD is always equal to using the covariance matrix approach (#545)
Bug Fixes
- The feature_selection.ColumnSelector now also supports column names of type int (in addition to str names) if the input is a pandas DataFrame. (#500 via tetrar124)
- Fix unreadable labels in plot_confusion_matrix for imbalanced datasets if show_absolute=True and show_normed=True. (#504)
- Raises a more informative error if a SparseDataFrame is passed to apriori and the dataframe has integer column names that don't start with 0 due to current limitations of the SparseDataFrame implementation in pandas. (#503)
- SequentialFeatureSelector now supports DataFrame as input for all operating modes (forward/backward/floating). (#506)
- mlxtend.evaluate.feature_importance_permutation now correctly accepts scoring functions with proper function signature as metric argument. (#528)
Version 0.15.0 (2019-01-19)
Downloads
New Features
- Adds a new transformer class to mlxtend.image, EyepadAlign, that aligns face images based on the location of the eyes. (#466 by Vahid Mirjalili)
- Adds a new function, mlxtend.evaluate.bias_variance_decomp, that decomposes the loss of a regressor or classifier into bias and variance terms. (#470)
- Adds a whitening parameter to PrincipalComponentAnalysis, to optionally whiten the transformed data such that the features have unit variance. (#475)
Changes
- Changed the default solver in PrincipalComponentAnalysis to 'svd' instead of 'eigen' to improve numerical stability. (#474)
- The mlxtend.image.extract_face_landmarks now returns None if no facial landmarks were detected instead of an array of all zeros. (#466)
Bug Fixes
- The eigenvectors may not have been sorted in certain edge cases if solver was 'eigen' in PrincipalComponentAnalysis and LinearDiscriminantAnalysis. (#477, #478)
Version 0.14.0 (2018-11-09)
Downloads
New Features
- Added a scatterplotmatrix function to the plotting module. (#437)
- Added sample_weight option to StackingRegressor, StackingClassifier, StackingCVRegressor, StackingCVClassifier, EnsembleVoteClassifier. (#438)
- Added a RandomHoldoutSplit class to perform a random train/valid split without rotation in SequentialFeatureSelector, scikit-learn GridSearchCV etc. (#442)
- Added a PredefinedHoldoutSplit class to perform a train/valid split, based on user-specified indices, without rotation in SequentialFeatureSelector, scikit-learn GridSearchCV etc. (#443)
- Created a new mlxtend.image submodule for working on image processing-related tasks. (#457)
- Added a new convenience function extract_face_landmarks based on dlib to mlxtend.image. (#458)
- Added a method='oob' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the classic out-of-bag bootstrap estimate (#459)
- Added a method='.632+' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the .632+ bootstrap estimate that addresses the optimism bias of the .632 bootstrap (#459)
- Added a new mlxtend.evaluate.ftest function to perform an F-test for comparing the accuracies of two or more classification models. (#460)
- Added a new mlxtend.evaluate.combined_ftest_5x2cv function to perform a combined 5x2cv F-Test for comparing the performance of two models. (#461)
- Added a new mlxtend.evaluate.difference_proportions test for comparing two proportions (e.g., classifier accuracies) (#462)
Changes
- Addressed deprecation warnings in NumPy 1.15. (#425)
- Because of complications in PR (#459), Python 2.7 support has now been dropped; since official support for Python 2.7 by the Python Software Foundation is ending in approx. 12 months anyway, this refocusing will hopefully free up some developer time by removing the need to worry about backward compatibility.
Bug Fixes
- Fixed an issue with a missing import in mlxtend.plotting.plot_confusion_matrix. (#428)
Version 0.13.0 (2018-07-20)
Downloads
New Features
- A meaningful error message is now raised when a cross-validation generator is used with SequentialFeatureSelector. (#377)
- The SequentialFeatureSelector now accepts custom feature names via the fit method for more interpretable feature subset reports. (#379)
- The SequentialFeatureSelector is now also compatible with Pandas DataFrames and uses DataFrame column-names for more interpretable feature subset reports. (#379)
- ColumnSelector now works with Pandas DataFrames columns. (#378 by Manuel Garrido)
- The ExhaustiveFeatureSelector estimator in mlxtend.feature_selection now is safely stoppable mid-process by control+c. (#380)
- Two new functions, vectorspace_orthonormalization and vectorspace_dimensionality, were added to mlxtend.math to use the Gram-Schmidt process to convert a set of linearly independent vectors into a set of orthonormal basis vectors, and to compute the dimensionality of a vectorspace, respectively. (#382)
- mlxtend.frequent_patterns.apriori now supports pandas SparseDataFrames to generate frequent itemsets. (#404 via Daniel Morales)
- The plot_confusion_matrix function now has the ability to show normalized confusion matrix coefficients in addition to or instead of absolute confusion matrix coefficients with or without a colorbar. The text display method has been changed so that the full range of the colormap is used. The default size is also now set based on the number of classes.
- Added support for merging the meta features with the original input features in StackingRegressor (via use_features_in_secondary) like it is already supported in the other Stacking classes. (#418)
- Added a support_only parameter to the association_rules function, which allows constructing association rules (based on the support metric only) for cropped input DataFrames that don't contain a complete set of antecedent and consequent support values. (#421)
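The Gram-Schmidt process mentioned for the two vector space functions can be sketched in plain Python. This is an illustrative version (the modified variant, which projects against the running remainder) with hypothetical names; it does not mirror the signatures in mlxtend.math.

```python
import math


def gram_schmidt(vectors, eps=1e-10):
    """Return an orthonormal basis for the span of the given vectors."""
    basis = []
    for vector in vectors:
        remainder = [float(value) for value in vector]
        # Subtract the component along every basis vector found so far.
        for b in basis:
            projection = sum(r * bi for r, bi in zip(remainder, b))
            remainder = [r - projection * bi for r, bi in zip(remainder, b)]
        norm = math.sqrt(sum(r * r for r in remainder))
        # A (numerically) zero remainder means the vector was dependent.
        if norm > eps:
            basis.append([r / norm for r in remainder])
    return basis


# The third vector is the sum of the first two, so the span has dimension 2.
basis = gram_schmidt([[1, 1, 0], [1, 0, 1], [2, 1, 1]])
dimension = len(basis)
```

Counting the basis vectors that survive the tolerance check gives the dimensionality of the spanned space.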
Changes
- Itemsets generated with apriori are now frozensets (#393 by William Laney and #394)
- Now raises an error if an input DataFrame to apriori contains non 0, 1, True, False values. (#419)
Bug Fixes
- Allow mlxtend estimators to be cloned via scikit-learn's clone function. (#374)
- Fixes bug to allow the correct use of refit=False in StackingRegressor and StackingCVRegressor (#384 and #385 by selay01)
- Allow StackingClassifier to work with sparse matrices when use_features_in_secondary=True (#408 by Floris Hoogenbook)
- Allow StackingCVRegressor to work with sparse matrices when use_features_in_secondary=True (#416)
- Allow StackingCVClassifier to work with sparse matrices when use_features_in_secondary=True (#417)
Version 0.12.0 (2018-04-21)
Downloads
New Features
- A new feature_importance_permutation function to compute the feature importance in classifiers and regressors via the permutation importance method (#358)
- The fit method of the ExhaustiveFeatureSelector now optionally accepts **fit_params for the estimator that is used for the feature selection. (#354 by Zach Griffith)
- The fit method of the SequentialFeatureSelector now optionally accepts **fit_params for the estimator that is used for the feature selection. (#350 by Zach Griffith)
Changes
- Replaced plot_decision_regions colors by a colorblind-friendly palette and adds contour lines for decision regions. (#348)
- All stacking estimators now raise NonFittedErrors if any method for inference is called prior to fitting the estimator. (#353)
- Renamed the refit parameter of both the StackingClassifier and StackingCVClassifier to use_clones to be more explicit and less misleading. (#368)
Bug Fixes
- Various changes in the documentation and documentation tools to fix formatting issues (#363)
- Fixes a bug where the StackingCVClassifier's meta features were not stored in the original order when shuffle=True (#370)
- Many documentation improvements, including links to the User Guides in the API docs (#371)
Version 0.11.0 (2018-03-14)
Downloads
New Features
- New function implementing the resampled paired t-test procedure (paired_ttest_resampled) to compare the performance of two models. (#323)
- New function implementing the k-fold paired t-test procedure (paired_ttest_kfold_cv) to compare the performance of two models (also called k-hold-out paired t-test). (#324)
- New function implementing the 5x2cv paired t-test procedure (paired_ttest_5x2cv) proposed by Dietterich (1998) to compare the performance of two models. (#325)
- A refit parameter was added to stacking classes (similar to the refit parameter in the EnsembleVoteClassifier), to support classifiers and regressors that follow the scikit-learn API but are not compatible with scikit-learn's clone function. (#322)
- The ColumnSelector now has a drop_axis argument to use it in pipelines with CountVectorizers. (#333)
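For orientation, the test statistic of the 5x2cv paired t-test can be written in a few lines. The sketch follows the formula from Dietterich (1998); the function name and the input layout (five pairs of score differences, one pair per repetition of 2-fold cross-validation) are assumptions for this example.

```python
import math


def ttest_5x2cv_statistic(differences):
    """t statistic from five (fold_1_diff, fold_2_diff) pairs.

    Each value is the score of model A minus the score of model B
    on one fold of one 2-fold cross-validation repetition.
    """
    variances = []
    for diff_1, diff_2 in differences:
        mean = (diff_1 + diff_2) / 2.0
        variances.append((diff_1 - mean) ** 2 + (diff_2 - mean) ** 2)
    # Numerator: the difference from the first fold of the first repetition.
    first_difference = differences[0][0]
    return first_difference / math.sqrt(sum(variances) / len(variances))


t_statistic = ttest_5x2cv_statistic([(0.02, 0.04)] * 5)
```

Under the null hypothesis, this statistic is compared against a t distribution with 5 degrees of freedom.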
Changes
- Raises an informative error message if predict or predict_meta_features is called prior to calling the fit method in StackingRegressor and StackingCVRegressor. (#315)
- The plot_decision_regions function now automatically determines the optimal setting based on the feature dimensions and supports anti-aliasing. The old res parameter has been deprecated. (#309 by Guillaume Poirier-Morency)
- Apriori code is faster due to optimization in onehot transformation and the amount of candidates generated by the apriori algorithm. (#327 by Jakub Smid)
- The OnehotTransactions class (which is often used in combination with the apriori function for association rule mining) is now more memory efficient as it uses boolean arrays instead of integer arrays. In addition, the OnehotTransactions class can now be provided with a sparse argument to generate sparse representations of the onehot matrix to further improve memory efficiency. (#328 by Jakub Smid)
- The OnehotTransactions has been deprecated and replaced by the TransactionEncoder. (#332)
- The plot_decision_regions function now has three new parameters, scatter_kwargs, contourf_kwargs, and scatter_highlight_kwargs, that can be used to modify the plotting style. (#342 by James Bourbeau)
Bug Fixes
- Fixed issue when class labels were provided to the EnsembleVoteClassifier when refit was set to false. (#322)
- Allow arrays with 16-bit and 32-bit precision in plot_decision_regions function. (#337)
- Fixed bug that raised an indexing error if the number of items was <= 1 when computing association rules using the conviction metric. (#340)
Version 0.10.0 (2017-12-22)
Downloads
New Features
- New store_train_meta_features parameter for fit in StackingCVRegressor. If True, train meta-features are stored in self.train_meta_features_. New pred_meta_features method for StackingCVRegressor. People can get test meta-features using this method. (#294 via takashioya)
- The new store_train_meta_features attribute and pred_meta_features method for the StackingCVRegressor were also added to the StackingRegressor, StackingClassifier, and StackingCVClassifier (#299 & #300)
- New function (evaluate.mcnemar_tables) for creating multiple 2x2 contingency tables from model prediction arrays that can be used in multiple McNemar (post-hoc) tests or Cochran's Q or F tests, etc. (#307)
- New function (evaluate.cochrans_q) for performing Cochran's Q test to compare the accuracy of multiple classifiers. (#310)
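The 2x2 contingency tables used by McNemar-style tests can be built as sketched below. The helper names are hypothetical, and the cell layout (rows for the first model, columns for the second, "correct" before "wrong") is an assumption chosen for this example rather than a statement about mlxtend's output.

```python
from itertools import combinations


def contingency_table(y_true, pred_1, pred_2):
    """2x2 table: rows = model 1 correct/wrong, cols = model 2 correct/wrong."""
    table = [[0, 0], [0, 0]]
    for target, p1, p2 in zip(y_true, pred_1, pred_2):
        row = 0 if p1 == target else 1
        col = 0 if p2 == target else 1
        table[row][col] += 1
    return table


def pairwise_contingency_tables(y_true, *predictions):
    """One table per unordered pair of models, keyed by their positions."""
    return {
        (i, j): contingency_table(y_true, predictions[i], predictions[j])
        for i, j in combinations(range(len(predictions)), 2)
    }


y_true = [0, 0, 1, 1, 1]
model_1 = [0, 1, 1, 1, 0]
model_2 = [0, 0, 1, 0, 0]
model_3 = [0, 0, 1, 1, 1]
tables = pairwise_contingency_tables(y_true, model_1, model_2, model_3)
```

Only the off-diagonal cells, where exactly one model is correct, carry information about a difference between the two models.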
Changes
- Added requirements.txt to setup.py. (#304 via Colin Carrol)
Bug Fixes
- Improved numerical stability for p-values computed via the exact McNemar test (#306)
- nose is not required to use the library (#302)
Version 0.9.1 (2017-11-19)
Downloads
New Features
- Added mlxtend.evaluate.bootstrap_point632_score to evaluate the performance of estimators using the .632 bootstrap. (#283)
- New max_len parameter for the frequent itemset generation via the apriori function to allow for early stopping. (#270)
Changes
- All feature index tuples in SequentialFeatureSelector are now in sorted order. (#262)
- The SequentialFeatureSelector now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases. (#262)
- utils.Counter now accepts a name variable to help distinguish between multiple counters, time precision can be set with the 'precision' kwarg and the new attribute end_time holds the time the last iteration completed. (#278 via Mathew Savage)
Bug Fixes
- Fixed a deprecation error that occurred with the McNemar test when using SciPy 1.0. (#283)
Version 0.9.0 (2017-10-21)
Downloads
New Features
- Added evaluate.permutation_test, a permutation test for hypothesis testing (or A/B testing) to test if two samples come from the same distribution. In other words, a procedure to test the null hypothesis that two groups are not significantly different (e.g., a treatment and a control group). (#250)
- Added 'leverage' and 'conviction' as evaluation metrics to the frequent_patterns.association_rules function. (#246 & #247)
- Added a loadings_ attribute to PrincipalComponentAnalysis to compute the factor loadings of the features on the principal components. (#251)
- Allow grid search over classifiers/regressors in ensemble and stacking estimators. (#259)
- New make_multiplexer_dataset function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms. (#263)
- Added a new BootstrapOutOfBag class, an implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. (#265)
- The parameters for StackingClassifier, StackingCVClassifier, StackingRegressor, StackingCVRegressor, and EnsembleVoteClassifier can now be tuned using scikit-learn's GridSearchCV (#254 via James Bourbeau)
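To make the permutation test entry more concrete, here is a minimal sketch of an approximate two-sided permutation test on the difference in means, using only the standard library. The function signature, the num_rounds and seed parameters, and the toy data are assumptions for illustration and do not claim to match evaluate.permutation_test.

```python
import random


def permutation_test(x, y, num_rounds=1000, seed=0):
    """Approximate two-sided p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    at_least_as_extreme = 0
    for _ in range(num_rounds):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        perm_x, perm_y = pooled[:len(x)], pooled[len(x):]
        diff = abs(sum(perm_x) / len(perm_x) - sum(perm_y) / len(perm_y))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_rounds


# Hypothetical measurements for a treatment and a control group.
treatment = [28, 30, 31, 33, 35, 36]
control = [20, 21, 22, 24, 25, 26]
p_value = permutation_test(treatment, control)
```

A small p-value suggests that a difference as large as the observed one rarely arises when group labels are assigned at random.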
Changes
- The 'support' column returned by frequent_patterns.association_rules was changed to compute the support of "antecedent union consequent", and new 'antecedant support' and 'consequent support' columns were added to avoid ambiguity. (#245)
- Allow the OnehotTransactions to be cloned via scikit-learn's clone function, which is required by e.g., scikit-learn's FeatureUnion or GridSearchCV (via Iaroslav Shcherbatyi). (#249)
Bug Fixes
- Fix issues with self._init_time parameter in _IterativeModel subclasses. (#256)
- Fix imprecision bug that occurred in plot_ecdf when run on Python 2.7. (#264)
- The vectors from SVD in PrincipalComponentAnalysis are now being scaled so that solver='eigen' and solver='svd' store eigenvalues that have the same magnitudes. (#251)
Version 0.8.0 (2017-09-09)
Downloads
New Features
- Added a mlxtend.evaluate.bootstrap that implements the ordinary nonparametric bootstrap to bootstrap a single statistic (for example, the mean, median, R^2 of a regression fit, and so forth) #232
- SequentialFeatureSelector's k_features now accepts a string argument "best" or "parsimonious" for more "automated" feature selection. For instance, if "best" is provided, the feature selector will return the feature subset with the best cross-validation performance. If "parsimonious" is provided as an argument, the smallest feature subset that is within one standard error of the cross-validation performance will be selected. #238
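The difference between the "best" and "parsimonious" options can be sketched as follows. The select_subset helper and the results dictionary (mapping feature index tuples to a cross-validation score and its standard error) are assumptions for illustration; in particular, measuring "within one standard error" against the best subset's standard error is an interpretation, not a statement about the library's internals.

```python
def select_subset(metrics, mode="best"):
    """Pick a feature subset from {subset: (avg_score, std_err)} results."""
    best_subset = max(metrics, key=lambda s: metrics[s][0])
    if mode == "best":
        return best_subset
    if mode == "parsimonious":
        best_score, best_std_err = metrics[best_subset]
        threshold = best_score - best_std_err
        candidates = [s for s in metrics if metrics[s][0] >= threshold]
        # Smallest subset wins; ties are broken by the higher score.
        return min(candidates, key=lambda s: (len(s), -metrics[s][0]))
    raise ValueError("mode must be 'best' or 'parsimonious'")


# Hypothetical cross-validation results per feature subset.
results = {
    (0,): (0.80, 0.02),
    (0, 2): (0.93, 0.02),
    (0, 2, 3): (0.94, 0.02),
    (0, 1, 2, 3): (0.92, 0.03),
}
best = select_subset(results, "best")
parsimonious = select_subset(results, "parsimonious")
```

Here "best" picks the three-feature subset, while "parsimonious" settles for the two-feature subset whose score is within one standard error of it.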
Changes
- SequentialFeatureSelector now uses np.nanmean over normal mean to support scorers that may return np.nan #211 (via mrkaiser)
- The skip_if_stuck parameter was removed from SequentialFeatureSelector in favor of a more efficient implementation comparing the conditional inclusion/exclusion results (in the floating versions) to the performances of previously sampled feature sets that were cached #237
- ExhaustiveFeatureSelector was modified to consume substantially less memory #195 (via Adam Erickson)
Bug Fixes
- Fixed a bug where the SequentialFeatureSelector selected a feature subset larger than specified via the k_features tuple max-value #213
Version 0.7.0 (2017-06-22)
Downloads
New Features
- New mlxtend.plotting.ecdf function for plotting empirical cumulative distribution functions (#196).
- New StackingCVRegressor for stacking regressors with out-of-fold predictions to prevent overfitting (#201 via Eike Dehling).
Changes
- The TensorFlow estimators have been removed from mlxtend, since TensorFlow now has very convenient ways to build on estimators, which render those implementations obsolete.
- plot_decision_regions now supports plotting decision regions for more than 2 training features (#189, via James Bourbeau).
- Parallel execution in mlxtend.feature_selection.SequentialFeatureSelector and mlxtend.feature_selection.ExhaustiveFeatureSelector is now performed over different feature subsets instead of the different cross-validation folds to better utilize machines with multiple processors if the number of features is large (#193, via @whalebot-helmsman).
- Raise meaningful error messages if pandas DataFrames or Python lists of lists are fed into the StackingCVClassifier as fit arguments (#198).
- The n_folds parameter of the StackingCVClassifier was changed to cv and can now accept any kind of cross validation technique that is available from scikit-learn. For example, StackingCVClassifier(..., cv=StratifiedKFold(n_splits=3)) or StackingCVClassifier(..., cv=GroupKFold(n_splits=3)) (#203, via Konstantinos Paliouras).
Bug Fixes
- SequentialFeatureSelector now correctly accepts a None argument for the scoring parameter to infer the default scoring metric from scikit-learn classifiers and regressors (#171).
- The plot_decision_regions function now supports pre-existing axes objects generated via matplotlib's plt.subplots. (#184, see example)
- Made math.num_combinations and math.num_permutations numerically stable for large numbers of combinations and permutations (#200).
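One common way to keep such counts exact for large inputs is to build them up with integer arithmetic instead of dividing large factorials. The sketch below shows that idea; the two functions reuse the names from the entry above for readability, but their signatures and bodies are assumptions and not necessarily how mlxtend implements them.

```python
def num_combinations(n, k):
    """Number of ways to choose k items from n without replacement."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # exploit symmetry: C(n, k) == C(n, n - k)
    result = 1
    for i in range(1, k + 1):
        # The running value is always an exact binomial coefficient,
        # so the integer division never discards a remainder.
        result = result * (n - k + i) // i
    return result


def num_permutations(n, k):
    """Number of ordered arrangements of k items drawn from n."""
    if k < 0 or k > n:
        return 0
    result = 1
    for value in range(n - k + 1, n + 1):
        result *= value
    return result
```

Because Python integers have arbitrary precision, calls such as num_combinations(1000, 500) return an exact value rather than overflowing a float.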
Version 0.6.0 (2017-03-18)
Downloads
New Features
- Implemented an association_rules function that generates rules from a list of frequent itemsets (via Joshua Goerner).
Changes
- Adds a black edgecolor to plots via plotting.plot_decision_regions to make markers more distinguishable from the background in matplotlib>=2.0.
- The association submodule was renamed to frequent_patterns.
Bug Fixes
- The DataFrame index of apriori results is now unique and ordered.
- Fixed typos in autompg and wine datasets (via James Bourbeau).
Version 0.5.1 (2017-02-14)
Downloads
New Features
- The EnsembleVoteClassifier has a new refit attribute that prevents refitting classifiers if refit=False to save computational time.
- Added a new lift_score function in evaluate to compute lift score (via Batuhan Bardak).
- StackingClassifier and StackingRegressor support multivariate targets if the underlying models do (via kernc).
- StackingClassifier has a new use_features_in_secondary attribute like StackingCVClassifier.
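For readers unfamiliar with the lift score, the sketch below computes it as the precision of the positive predictions divided by the base rate of the positive class. The signature and the toy labels are assumptions for illustration; this is a plain-Python sketch rather than the evaluate.lift_score implementation.

```python
def lift_score(y_target, y_predicted, positive_label=1):
    """Lift = precision divided by the base rate of the positive class."""
    pairs = list(zip(y_target, y_predicted))
    true_pos = sum(t == positive_label and p == positive_label for t, p in pairs)
    pred_pos = sum(p == positive_label for _, p in pairs)
    actual_pos = sum(t == positive_label for t, _ in pairs)
    if pred_pos == 0 or actual_pos == 0:
        raise ValueError("lift is undefined without positive labels and predictions")
    precision = true_pos / pred_pos
    base_rate = actual_pos / len(pairs)
    return precision / base_rate


# Hypothetical true and predicted class labels.
y_true = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
score = lift_score(y_true, y_pred)
```

A lift above 1 means the positive predictions are enriched in true positives compared with picking samples at random.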
Changes
- Changed default verbosity level in SequentialFeatureSelector to 0
- The EnsembleVoteClassifier now raises a NotFittedError if the estimator wasn't fit before calling predict. (via Anton Loss)
- Added new TensorFlow variable initialization syntax to guarantee compatibility with TensorFlow 1.0
Bug Fixes
- Fixed wrong default value for k_features in SequentialFeatureSelector
- Cast selected feature subsets in the SequentialFeatureSelector as sets to prevent the iterator from getting stuck if the k_idx are different permutations of the same combination (via Zac Wellmer).
- Fixed an issue with learning curves that caused the performance metrics to be reversed (via ipashchenko)
- Fixed a bug that could occur in the SequentialFeatureSelector if there are similarly-well performing subsets in the floating variants (via Zac Wellmer).
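The reason for casting feature subsets to sets can be seen in a few lines: tuples compare by order, so two permutations of the same feature combination look like different subsets, whereas (frozen) sets do not. The variable names below are illustrative only.

```python
seen_tuples = set()
seen_sets = set()

# Two orderings of the same (hypothetical) feature combination.
candidates = [(0, 2, 5), (5, 0, 2)]

for subset in candidates:
    seen_tuples.add(subset)            # order-sensitive: both orderings are kept
    seen_sets.add(frozenset(subset))   # order-insensitive: collapses to one entry

assert len(seen_tuples) == 2
assert len(seen_sets) == 1
```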
Version 0.5.0 (2016-11-09)
Downloads
New Features
- New ExhaustiveFeatureSelector estimator in mlxtend.feature_selection for evaluating all feature combinations in a specified range
- The StackingClassifier has a new parameter average_probas that is set to True by default to maintain the current behavior. A deprecation warning was added though, and it will default to False in future releases (0.6.0); average_probas=False will result in stacking of the level-1 predicted probabilities rather than averaging these.
- New StackingCVClassifier estimator in 'mlxtend.classifier' for implementing a stacking ensemble that uses cross-validation techniques for training the meta-estimator to avoid overfitting (Reiichiro Nakano)
- New OnehotTransactions encoder class added to the preprocessing submodule for transforming transaction data into a one-hot encoded array
- The SequentialFeatureSelector estimator in mlxtend.feature_selection now is safely stoppable mid-process by control+c, and deprecated print_progress in favor of a more tunable verbose parameter (Will McGinnis)
- New apriori function in association to extract frequent itemsets from transaction data for association rule mining
- New checkerboard_plot function in plotting to plot checkerboard tables / heat maps
- New mcnemar_table and mcnemar functions in evaluate to compute 2x2 contingency tables and McNemar's test
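As a sketch of what a 2x2 contingency table and McNemar's test statistic look like, the code below counts how often two models agree or disagree on correctness and computes the chi-squared statistic from the discordant counts. The table layout ([[both correct, only model 1 correct], [only model 2 correct, both wrong]]), the function names, and the toy predictions are assumptions for illustration and may differ from the library's conventions.

```python
def mcnemar_table(y_target, y_model1, y_model2):
    """Return [[a, b], [c, d]] counting agreement between two models."""
    a = b = c = d = 0
    for target, pred1, pred2 in zip(y_target, y_model1, y_model2):
        ok1, ok2 = pred1 == target, pred2 == target
        if ok1 and ok2:
            a += 1      # both models correct
        elif ok1:
            b += 1      # only model 1 correct
        elif ok2:
            c += 1      # only model 2 correct
        else:
            d += 1      # both models wrong
    return [[a, b], [c, d]]


def mcnemar_chi2(table, corrected=True):
    """McNemar's chi-squared statistic from the discordant counts b and c."""
    b, c = table[0][1], table[1][0]
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    # The continuity correction subtracts 1 from the absolute difference.
    numerator = (abs(b - c) - 1) ** 2 if corrected else (b - c) ** 2
    return numerator / (b + c)


# Hypothetical targets and predictions from two classifiers.
y_target = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_model1 = [0, 1, 0, 0, 0, 1, 1, 0, 0, 0]
y_model2 = [0, 0, 1, 1, 0, 1, 1, 0, 0, 0]
table = mcnemar_table(y_target, y_model1, y_model2)
chi2 = mcnemar_chi2(table)
```

Only the off-diagonal cells enter the statistic, since samples on which both models are right or both are wrong carry no information about which model is better.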
Changes
- All plotting functions have been moved to mlxtend.plotting for compatibility reasons with continuous integration services and to make the installation of matplotlib optional for users of mlxtend's core functionality
- Added a compatibility layer for scikit-learn 0.18 using the new model_selection module while maintaining backwards compatibility to scikit-learn 0.17.
Bug Fixes
- mlxtend.plotting.plot_decision_regions now draws decision regions correctly if more than 4 class labels are present
- Raise AttributeError in plot_decision_regions when the X_highlight argument is a 1D array (chkoar)
Version 0.4.2 (2016-08-24)
Downloads
New Features
- Added preprocessing.CopyTransformer, a mock class that returns copies of input arrays via transform and fit_transform
Changes
- Added AppVeyor to CI to ensure MS Windows compatibility
- Datasets are now saved as compressed .txt or .csv files rather than being imported as Python objects
- feature_selection.SequentialFeatureSelector now supports the selection of k_features using a tuple to specify a "min-max" k_features range
- Added "SVD solver" option to the PrincipalComponentAnalysis
- Raise an AttributeError with "not fitted" message in SequentialFeatureSelector if transform or get_metric_dict are called prior to fit
- Use small, positive bias units in TfMultiLayerPerceptron's hidden layer(s) if the activations are ReLUs in order to avoid dead neurons
- Added an optional clone_estimator parameter to the SequentialFeatureSelector that defaults to True, avoiding the modification of the original estimator objects
- More rigorous type and shape checks in the evaluate.plot_decision_regions function
- DenseTransformer now doesn't raise an error if the input array is not sparse
- API clean-up using scikit-learn's BaseEstimator as parent class for feature_selection.ColumnSelector
Bug Fixes
- Fixed a problem when a tuple-range was provided as argument to the SequentialFeatureSelector's k_features parameter and the scoring metric was more negative than -1 (e.g., as in scikit-learn's MSE scoring function) (wahutch, https://github.com/wahutch)
- Fixed an AttributeError issue when verbose > 1 in StackingClassifier
- Fixed a bug in classifier.SoftmaxRegression where the mean values of the offsets were used to update the bias units rather than their sum
- Fixed rare bug in MLP _layer_mapping functions that caused a swap between the random number generation seed when initializing weights and biases
Version 0.4.1 (2016-05-01)
Downloads
New Features
- New TensorFlow estimator for Linear Regression (tf_regressor.TfLinearRegression)
- New k-means clustering estimator (cluster.Kmeans)
- New TensorFlow k-means clustering estimator (tf_cluster.Kmeans)
Changes
- Due to refactoring of the estimator classes, the init_weights parameter of the fit methods was globally renamed to init_params
- Overall performance improvements of estimators due to code clean-up and refactoring
- Added several additional checks for correct array types and more meaningful exception messages
- Added optional dropout to the tf_classifier.TfMultiLayerPerceptron classifier for regularization
- Added an optional decay parameter to the tf_classifier.TfMultiLayerPerceptron classifier for adaptive learning via an exponential decay of the learning rate eta
- Replaced old NeuralNetMLP by more streamlined MultiLayerPerceptron (classifier.MultiLayerPerceptron); now also with softmax in the output layer and categorical cross-entropy loss.
- Unified init_params parameter for fit functions to continue training where the algorithm left off (if supported)
Version 0.4.0 (2016-04-09)
New Features
- New TfSoftmaxRegression classifier using Tensorflow (tf_classifier.TfSoftmaxRegression)
- New SoftmaxRegression classifier (classifier.SoftmaxRegression)
- New TfMultiLayerPerceptron classifier using Tensorflow (tf_classifier.TfMultiLayerPerceptron)
- New StackingRegressor (regressor.StackingRegressor)
- New StackingClassifier (classifier.StackingClassifier)
- New function for one-hot encoding of class labels (preprocessing.one_hot)
- Added GridSearch support to the SequentialFeatureSelector (feature_selection.SequentialFeatureSelector)
- evaluate.plot_decision_regions improvements:
  - Function now handles class labels in y correctly if the array is of type float
  - Correct handling of input arguments markers and colors
  - Accept an existing Axes via the ax argument
- New print_progress parameter for all generalized models and multi-layer neural networks for printing time elapsed, ETA, and the current cost of the current epoch
- Minibatch learning for classifier.LogisticRegression, classifier.Adaline, and regressor.LinearRegression plus streamlined API
- New Principal Component Analysis class via mlxtend.feature_extraction.PrincipalComponentAnalysis
- New RBF Kernel Principal Component Analysis class via mlxtend.feature_extraction.RBFKernelPCA
- New Linear Discriminant Analysis class via mlxtend.feature_extraction.LinearDiscriminantAnalysis
Changes
- The column parameter in mlxtend.preprocessing.standardize now defaults to None to standardize all columns more conveniently
Version 0.3.0 (2016-01-31)
Downloads
New Features
- Added a progress bar tracker to classifier.NeuralNetMLP
- Added a function to score predicted vs. target class labels (evaluate.scoring)
- Added confusion matrix functions to create (evaluate.confusion_matrix) and plot (evaluate.plot_confusion_matrix) confusion matrices
- New style parameter and improved axis scaling in mlxtend.evaluate.plot_learning_curves
- Added loadlocal_mnist to mlxtend.data for streaming MNIST from local byte files into numpy arrays
- New NeuralNetMLP parameters: random_weights, shuffle_init, shuffle_epoch
- New SFS features such as the generation of pandas DataFrame results tables and plotting functions (with confidence intervals, standard deviation, and standard error bars)
- Added support for regression estimators in SFS
- Added Boston housing dataset
- New shuffle parameter for classifier.NeuralNetMLP
Changes
- The mlxtend.preprocessing.standardize function now optionally returns the parameters, which are estimated from the array, for re-use. A further improvement makes the standardize function smarter in order to avoid zero-division errors
- Cosmetic improvements to the evaluate.plot_decision_regions function such as hiding plot axes
- Renaming of classifier.EnsembleClassifier to classifier.EnsembleVoteClassifier
- Improved random weight initialization in Perceptron, Adaline, LinearRegression, and LogisticRegression
- Changed learning parameter of mlxtend.classifier.Adaline to solver and added "normal equation" as closed-form solution solver
- Hide y-axis labels in mlxtend.evaluate.plot_decision_regions in 1 dimensional evaluations
- Sequential Feature Selection algorithms were unified into a single SequentialFeatureSelector class with parameters to enable floating selection and toggle between forward and backward selection.
- Stratified sampling of MNIST (now 500x random samples from each of the 10 digit categories)
- Renaming mlxtend.plotting to mlxtend.general_plotting in order to distinguish general plotting functions from specialized utility functions such as evaluate.plot_decision_regions
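The standardize entry above mentions two ideas: returning the estimated parameters for re-use and avoiding zero-division errors. A minimal single-column sketch of both, using only the standard library, could look like this. The function signature and the "avg"/"std" dictionary keys are hypothetical and do not mirror the real mlxtend.preprocessing.standardize interface.

```python
import statistics


def standardize(column, params=None):
    """Z-score a list of numbers; return the values and the parameters used."""
    if params is None:
        # Estimate the parameters from the data when none are supplied.
        params = {
            "avg": statistics.fmean(column),
            "std": statistics.pstdev(column),
        }
    # A constant column has zero spread; divide by 1 instead of 0.
    divisor = params["std"] if params["std"] != 0 else 1.0
    return [(x - params["avg"]) / divisor for x in column], params


train = [1.0, 2.0, 3.0, 4.0, 5.0]
train_z, params = standardize(train)

# Re-use the training parameters on new data.
test_z, _ = standardize([3.0, 6.0], params=params)

# A constant column is centered but not divided by zero.
constant_z, _ = standardize([7.0, 7.0, 7.0])
```

Re-using the parameters matters because test data should be scaled with the statistics of the training data, not with its own.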
Version 0.2.9 (2015-07-14)
Downloads
New Features
- Sequential Feature Selection algorithms: SFS, SFFS, SBS, and SFBS
Changes
- Changed regularization & lambda parameters in LogisticRegression to single parameter l2_lambda
Version 0.2.8 (2015-06-27)
- API changes:
  - mlxtend.sklearn.EnsembleClassifier -> mlxtend.classifier.EnsembleClassifier
  - mlxtend.sklearn.ColumnSelector -> mlxtend.feature_selection.ColumnSelector
  - mlxtend.sklearn.DenseTransformer -> mlxtend.preprocessing.DenseTransformer
  - mlxtend.pandas.standardizing -> mlxtend.preprocessing.standardizing
  - mlxtend.pandas.minmax_scaling -> mlxtend.preprocessing.minmax_scaling
  - mlxtend.matplotlib -> mlxtend.plotting
- Added momentum learning parameter (alpha coefficient) to mlxtend.classifier.NeuralNetMLP.
- Added adaptive learning rate (decrease constant) to mlxtend.classifier.NeuralNetMLP.
- mlxtend.pandas.minmax_scaling became mlxtend.preprocessing.minmax_scaling and also supports NumPy arrays now
- mlxtend.pandas.standardizing became mlxtend.preprocessing.standardizing and now supports both NumPy arrays and pandas DataFrames; also added a ddof parameter to set the degrees of freedom when calculating the standard deviation
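For context, min-max scaling linearly maps the smallest value of a column to a lower bound and the largest to an upper bound. The single-column sketch below shows the arithmetic; the signature is an assumption for illustration and is not the mlxtend.preprocessing.minmax_scaling interface.

```python
def minmax_scaling(values, min_val=0.0, max_val=1.0):
    """Linearly rescale values so they span [min_val, max_val]."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot rescale a constant column")
    return [
        (v - low) / (high - low) * (max_val - min_val) + min_val
        for v in values
    ]


# Hypothetical column values.
scaled = minmax_scaling([10, 20, 40])
```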
Version 0.2.7 (2015-06-20)
- Added multilayer perceptron (feedforward artificial neural network) classifier as mlxtend.classifier.NeuralNetMLP.
- Added 5000 labeled training samples from the MNIST handwritten digits dataset to mlxtend.data
Version 0.2.6 (2015-05-08)
- Added ordinary least square regression using different solvers (gradient and stochastic gradient descent, and the closed form solution (normal equation))
- Added option for random weight initialization to logistic regression classifier and updated l2 regularization
- Added wine dataset to mlxtend.data
- Added invert_axes parameter to mlxtend.matplotlib.enrichment_plot to optionally plot the "Count" on the x-axis
- New verbose parameter for mlxtend.sklearn.EnsembleClassifier by Alejandro C. Bahnsen
- Added mlxtend.pandas.standardizing to standardize columns in a Pandas DataFrame
- Added parameters linestyles and markers to mlxtend.matplotlib.enrichment_plot
- mlxtend.regression.lin_regplot automatically adds np.newaxis and works with Python lists
- Added tokenizers: mlxtend.text.extract_emoticons and mlxtend.text.extract_words_and_emoticons
Version 0.2.5 (2015-04-17)
- Added Sequential Backward Selection (mlxtend.sklearn.SBS)
- Added X_highlight parameter to mlxtend.evaluate.plot_decision_regions for highlighting test data points.
- Added mlxtend.regression.lin_regplot to plot the fitted line from linear regression.
- Added mlxtend.matplotlib.stacked_barplot to conveniently produce stacked barplots using pandas DataFrames.
- Added mlxtend.matplotlib.enrichment_plot
Version 0.2.4 (2015-03-15)
- Added scoring to mlxtend.evaluate.learning_curves (by user pfsq)
- Fixed setup.py bug caused by the missing README.html file
- Added matplotlib.category_scatter for pandas DataFrames and NumPy arrays
Version 0.2.3 (2015-03-11)
- Added Logistic regression
- Gradient descent and stochastic gradient descent perceptrons were changed to Adaline (Adaptive Linear Neuron)
- Perceptron and Adaline for {0, 1} classes
- Added mlxtend.preprocessing.shuffle_arrays_unison function to shuffle one or more NumPy arrays.
- Added shuffle and random seed parameter to stochastic gradient descent classifier.
- Added rstrip parameter to mlxtend.file_io.find_filegroups to allow trimming of base names.
- Added ignore_substring parameter to mlxtend.file_io.find_filegroups and find_files.
- Replaced .rstrip in mlxtend.file_io.find_filegroups with more robust regex.
- Gridsearch support for mlxtend.sklearn.EnsembleClassifier
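Shuffling arrays in unison means applying one shared permutation to several sequences so that rows and labels stay aligned. A standard-library sketch of the idea follows; it works on Python lists rather than NumPy arrays, and the signature is an assumption rather than the actual shuffle_arrays_unison interface.

```python
import random


def shuffle_arrays_unison(arrays, random_seed=None):
    """Shuffle several equal-length sequences with one shared permutation."""
    length = len(arrays[0])
    if any(len(a) != length for a in arrays):
        raise ValueError("all arrays must have the same length")
    order = list(range(length))
    random.Random(random_seed).shuffle(order)
    # Apply the same index order to every sequence.
    return [[a[i] for i in order] for a in arrays]


# Hypothetical feature rows and labels; label i belongs to row i.
X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [0, 1, 2, 3]
X_shuffled, y_shuffled = shuffle_arrays_unison([X, y], random_seed=3)
```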
Version 0.2.2 (2015-03-01)
- Improved robustness of EnsembleClassifier.
- Extended plot_decision_regions() functionality for plotting 1D decision boundaries.
- Function matplotlib.plot_decision_regions was reorganized to evaluate.plot_decision_regions.
- evaluate.plot_learning_curves() function added.
- Added Rosenblatt, gradient descent, and stochastic gradient descent perceptrons.
Version 0.2.1 (2015-01-20)
- Added mlxtend.pandas.minmax_scaling - a function to rescale pandas DataFrame columns.
- Slight update to the EnsembleClassifier interface (additional voting parameter)
- Fixed EnsembleClassifier to return correct class labels if class labels are not integers from 0 to n.
- Added new matplotlib function to plot decision regions of classifiers.
Version 0.2.0 (2015-01-13)
- Improved mlxtend.text.generalize_duplcheck to remove duplicates and prevent endless looping issue.
- Added recursive search parameter to mlxtend.file_io.find_files.
- Added check_ext parameter to mlxtend.file_io.find_files to search based on file extensions.
- Default parameter to ignore invisible files for mlxtend.file_io.find.
- Added transform and fit_transform to the EnsembleClassifier.
- Added mlxtend.file_io.find_filegroups function.
Version 0.1.9 (2015-01-10)
- Implemented scikit-learn EnsembleClassifier (majority voting rule) class.
Version 0.1.8 (2015-01-07)
- Improvements to mlxtend.text.generalize_names to handle certain Dutch last name prefixes (van, van der, de, etc.).
- Added mlxtend.text.generalize_name_duplcheck function to apply mlxtend.text.generalize_names function to a pandas DataFrame without creating duplicates.
Version 0.1.7 (2015-01-07)
- Added text utilities with name generalization function.
- Added file_io utilities.
Version 0.1.6 (2015-01-04)
- Added combinations and permutations estimators.
Version 0.1.5 (2014-12-11)
- Added DenseTransformer for pipelines and grid search.
Version 0.1.4 (2014-08-20)
- mean_centering function is now a class that creates MeanCenterer objects that can be used to fit data via the fit method, and center data at the column means via the transform and fit_transform methods.
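The fit / transform / fit_transform pattern described for MeanCenterer can be sketched in a few lines of plain Python. The class below borrows the name for readability, but its body (including the col_means attribute and the list-of-rows input) is an assumption for illustration, not the original implementation.

```python
class MeanCenterer:
    """Column-wise mean centering for a list of equal-length rows."""

    def fit(self, rows):
        # Learn one mean per column.
        n_rows = len(rows)
        self.col_means = [sum(col) / n_rows for col in zip(*rows)]
        return self

    def transform(self, rows):
        if not hasattr(self, "col_means"):
            raise AttributeError("MeanCenterer has not been fitted yet")
        return [
            [value - mean for value, mean in zip(row, self.col_means)]
            for row in rows
        ]

    def fit_transform(self, rows):
        return self.fit(rows).transform(rows)


# Hypothetical two-column data.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
centered = MeanCenterer().fit_transform(data)
```

Separating fit from transform lets the column means learned on one dataset be applied to another.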
Version 0.1.3 (2014-08-19)
- Added preprocessing module and mean_centering function.
Version 0.1.2 (2014-08-19)
- Added matplotlib utilities and remove_borders function.
Version 0.1.1 (2014-08-13)
- Simplified code for ColumnSelector.