mlxtend version: 0.23.1
category_scatter
category_scatter(x, y, label_col, data, markers='sxo^v', colors=('blue', 'green', 'red', 'purple', 'gray', 'cyan'), alpha=0.7, markersize=20.0, legend_loc='best')
Scatter plot to plot categories in different colors/markerstyles.
Parameters
-
x
: str or int
DataFrame column name of the x-axis values, or integer column index for a NumPy ndarray.
-
y
: str or int
DataFrame column name of the y-axis values, or integer column index for a NumPy ndarray.
-
data
: Pandas DataFrame object or NumPy ndarray.
-
markers
: str
Markers that are cycled through the label categories.
-
colors
: tuple
Colors that are cycled through the label categories.
-
alpha
: float (default: 0.7)
Parameter to control the transparency.
-
markersize
: float (default: 20.0)
Parameter to control the marker size.
-
legend_loc
: str (default: 'best')
Location of the plot legend: {'best', 'upper left', 'upper right', 'lower left', 'lower right'}. No legend if legend_loc=False.
Returns
fig
: matplotlib.pyplot figure object
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/category_scatter/
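The way markers and colors are "cycled through" the label categories can be sketched in plain Python. This is only an illustration of the cycling idea, not mlxtend's implementation; the helper name assign_styles and the sorted category order are assumptions made for the example.

```python
from itertools import cycle


def assign_styles(labels, markers="sxo^v",
                  colors=("blue", "green", "red", "purple", "gray", "cyan")):
    """Map each distinct label to a (marker, color) pair.

    Hypothetical helper: markers and colors restart from the beginning
    once there are more categories than styles.
    """
    marker_cycle = cycle(markers)
    color_cycle = cycle(colors)
    styles = {}
    # Sorting the categories is an assumption made for reproducibility.
    for label in sorted(set(labels)):
        styles[label] = (next(marker_cycle), next(color_cycle))
    return styles


styles = assign_styles(["setosa", "versicolor", "setosa", "virginica"])
for label, (marker, color) in styles.items():
    print(label, marker, color)

# With only two markers, the third category reuses the first marker.
wrapped = assign_styles(list("abcdefg"), markers="sx")
```

If there are more categories than markers or colors, styles repeat, so passing longer markers and colors arguments keeps categories visually distinct.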
checkerboard_plot
checkerboard_plot(ary, cell_colors=('white', 'black'), font_colors=('black', 'white'), fmt='%.1f', figsize=None, row_labels=None, col_labels=None, fontsize=None)
Plot a checkerboard table / heatmap via matplotlib.
Parameters
-
ary
: array-like, shape = [n, m]
A 2D NumPy array.
-
cell_colors
: tuple or list (default: ('white', 'black'))
Tuple or list containing the two colors of the checkerboard pattern.
-
font_colors
: tuple or list (default: ('black', 'white'))
Font colors corresponding to the cell colors.
-
figsize
: tuple (default: (2.5, 2.5))
Height and width of the figure.
-
fmt
: str (default: '%.1f')
Python string formatter for cell values. The default '%.1f' results in floats with 1 digit after the decimal point. Use '%d' to show numbers as integers.
-
row_labels
: list (default: None)
List of the row labels. Uses the array row indices 0 to n-1 by default.
-
col_labels
: list (default: None)
List of the column labels. Uses the array column indices 0 to m-1 by default.
-
fontsize
: int (default: None)
Specifies the font size of the checkerboard table. Uses matplotlib's default if None.
Returns
fig
: matplotlib Figure object.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/checkerboard_plot/
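The interplay of fmt, cell_colors, and font_colors can be illustrated without any plotting. The sketch below is not mlxtend's code: the helper checkerboard_cells is hypothetical, and it assumes that the top-left cell uses the first color of each pair and that colors alternate with (row + column) parity.

```python
def checkerboard_cells(ary, cell_colors=("white", "black"),
                       font_colors=("black", "white"), fmt="%.1f"):
    """Return (text, cell_color, font_color) for every cell of a 2D list.

    Hypothetical helper; the alternation rule (row + column parity) is an
    assumption about how a checkerboard pattern is laid out.
    """
    cells = []
    for i, row in enumerate(ary):
        out_row = []
        for j, value in enumerate(row):
            k = (i + j) % 2  # 0 on "even" squares, 1 on "odd" squares
            out_row.append((fmt % value, cell_colors[k], font_colors[k]))
        cells.append(out_row)
    return cells


cells = checkerboard_cells([[3.14159, 2.0], [10.0, 7.5]])
for row in cells:
    print(row)

# '%d' drops the fractional part instead of showing one decimal place.
as_int = "%d" % 3.7
print(as_int)
```

Pairing each cell color with a contrasting font color at the same index is what keeps the cell text readable on both kinds of squares.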
ecdf
ecdf(x, y_label='ECDF', x_label=None, ax=None, percentile=None, ecdf_color=None, ecdf_marker='o', percentile_color='black', percentile_linestyle='--')
Plots an Empirical Cumulative Distribution Function
Parameters
-
x
: array or list, shape=[n_samples,]
Array-like object containing the feature values.
-
y_label
: str (default='ECDF')
Text label for the y-axis.
-
x_label
: str (default=None)
Text label for the x-axis.
-
ax
: matplotlib.axes.Axes (default: None)
An existing matplotlib Axes. Creates one if ax=None.
-
percentile
: float (default=None)
Float between 0 and 1 for plotting a percentile threshold line.
-
ecdf_color
: matplotlib color (default=None)
Color for the ECDF plot; uses matplotlib defaults if None.
-
ecdf_marker
: matplotlib marker (default='o')
Marker style for the ECDF plot.
-
percentile_color
: matplotlib color (default='black')
Color for the percentile threshold if percentile is not None.
-
percentile_linestyle
: matplotlib linestyle (default='--')
Line style for the percentile threshold if percentile is not None.
Returns
-
ax
: matplotlib.axes.Axes object
-
percentile_threshold
: float
Feature threshold at the percentile, or None if percentile=None.
-
percentile_count
: int
Number of samples with a feature value less than or equal to the feature threshold at the given percentile, or None if percentile=None.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/ecdf/
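To make the returned percentile_threshold and percentile_count more concrete, here is a minimal sketch of how an ECDF and a percentile cut-off can be computed in plain Python. The function names are hypothetical and the exact rule (keep all sorted values whose cumulative proportion is at most the percentile) is an assumption, not a statement about mlxtend's internals.

```python
def ecdf_points(x):
    """Return sorted values and their cumulative proportions i/n."""
    xs = sorted(x)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys


def percentile_summary(x, percentile):
    """Return (threshold, count) for a percentile between 0 and 1.

    Assumed rule: the threshold is the largest value whose cumulative
    proportion does not exceed the percentile; the count is the number
    of samples at or below that threshold in the sorted data.
    """
    xs, ys = ecdf_points(x)
    below = [v for v, p in zip(xs, ys) if p <= percentile]
    if not below:
        return None, 0
    return max(below), len(below)


xs, ys = ecdf_points([4, 1, 3, 2])
threshold, count = percentile_summary([4, 1, 3, 2], percentile=0.5)
print(xs, ys)
print(threshold, count)
```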
enrichment_plot
enrichment_plot(df, colors='bgrkcy', markers=' ', linestyles='-', alpha=0.5, lw=2, where='post', grid=True, count_label='Count', xlim='auto', ylim='auto', invert_axes=False, legend_loc='best', ax=None)
Plot step plots of cumulative counts (enrichment plot).
Parameters
-
df
: pandas.DataFrame
A pandas DataFrame where columns represent the different categories.
-
colors
: str (default: 'bgrkcy')
The colors of the bars.
-
markers
: str (default: ' ')
Matplotlib markerstyles, e.g., 'sov' for square, circle, and triangle markers.
-
linestyles
: str (default: '-')
Matplotlib linestyles, e.g., '-,--' to cycle normal and dashed lines. Note that the different linestyles need to be separated by commas.
-
alpha
: float (default: 0.5)
Transparency level from 0.0 to 1.0.
-
lw
: int or float (default: 2)
Linewidth parameter.
-
where
: {'post', 'pre', 'mid'} (default: 'post')
Starting location of the steps.
-
grid
: bool (default: True)
Plots a grid if True.
-
count_label
: str (default: 'Count')
Label for the "Count"-axis.
-
xlim
: 'auto' or array-like [min, max] (default: 'auto')
Minimum and maximum position of the x-axis range.
-
ylim
: 'auto' or array-like [min, max] (default: 'auto')
Minimum and maximum position of the y-axis range.
-
invert_axes
: bool (default: False)
Plots count on the x-axis if True.
-
legend_loc
: str (default: 'best')
Location of the plot legend: {'best', 'upper left', 'upper right', 'lower left', 'lower right'}. No legend if legend_loc=False.
-
ax
: matplotlib axis, optional (default: None)
Use this axis for plotting or make a new one otherwise.
Returns
ax
: matplotlib axis
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/enrichment_plot/
heatmap
heatmap(matrix, hide_spines=False, hide_ticks=False, figsize=None, cmap=None, colorbar=True, row_names=None, column_names=None, column_name_rotation=45, cell_values=True, cell_fmt='.2f', cell_font_size=None, text_color_threshold=None)
Plot a heatmap via matplotlib.
Parameters
-
matrix
: array-like, shape = [n_rows, n_columns]
An arbitrary 2D array.
-
hide_spines
: bool (default: False)
Hides axis spines if True.
-
hide_ticks
: bool (default: False)
Hides axis ticks if True.
-
figsize
: tuple (default: (2.5, 2.5))
Height and width of the figure.
-
cmap
: matplotlib colormap (default: None)
Uses matplotlib.pyplot.cm.viridis if None.
-
colorbar
: bool (default: True)
Shows a colorbar if True.
-
row_names
: array-like, shape = [n_rows] (default: None)
List of row names to be used as y-axis tick labels.
-
column_names
: array-like, shape = [n_columns] (default: None)
List of column names to be used as x-axis tick labels.
-
column_name_rotation
: int (default: 45)
Number of degrees for rotating column x-tick labels.
-
cell_values
: bool (default: True)
Plots cell values if True.
-
cell_fmt
: string (default: '.2f')
Format specification for cell values (if cell_values=True).
-
cell_font_size
: int (default: None)
Font size for cell values (if cell_values=True).
-
text_color_threshold
: float (default: None)
Threshold for switching between black and white text in the cell annotations. The default (None) tries to infer a good threshold automatically using np.max(normed_matrix) / 2.
Returns
-
fig, ax
: matplotlib.pyplot subplot objects
Figure and axis elements of the subplot.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/heatmap/
plot_confusion_matrix
plot_confusion_matrix(conf_mat, hide_spines=False, hide_ticks=False, figsize=None, cmap=None, colorbar=False, show_absolute=True, show_normed=False, norm_colormap=None, class_names=None, figure=None, axis=None, fontcolor_threshold=0.5)
Plot a confusion matrix via matplotlib.
Parameters
-
conf_mat
: array-like, shape = [n_classes, n_classes]
Confusion matrix from evaluate.confusion_matrix.
-
hide_spines
: bool (default: False)
Hides axis spines if True.
-
hide_ticks
: bool (default: False)
Hides axis ticks if True.
-
figsize
: tuple (default: (2.5, 2.5))
Height and width of the figure.
-
cmap
: matplotlib colormap (default: None)
Uses matplotlib.pyplot.cm.Blues if None.
-
colorbar
: bool (default: False)
Shows a colorbar if True.
-
show_absolute
: bool (default: True)
Shows absolute confusion matrix coefficients if True. At least one of show_absolute or show_normed must be True.
-
show_normed
: bool (default: False)
Shows normed confusion matrix coefficients if True. The normed confusion matrix coefficients give the proportion of training examples per class that are assigned the correct label. At least one of show_absolute or show_normed must be True.
-
norm_colormap
: matplotlib color normalization object (default: None)
Matplotlib color normalization object to normalize the color scale, e.g., matplotlib.colors.LogNorm().
-
class_names
: array-like, shape = [n_classes] (default: None)
List of class names. If not None, ticks will be set to these values.
-
figure
: None or Matplotlib figure (default: None)
If None, a new figure is created.
-
axis
: None or Matplotlib figure axis (default: None)
If None, a new axis is created.
-
fontcolor_threshold
: float (default: 0.5)
Sets a threshold for choosing black and white font colors for the cells. By default, all values larger than 0.5 times the maximum cell value are shown in white, and all values equal to or smaller than 0.5 times the maximum cell value are shown in black.
Returns
-
fig, ax
: matplotlib.pyplot subplot objects
Figure and axis elements of the subplot.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/plot_confusion_matrix/
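The normed coefficients and the fontcolor_threshold rule described above can be sketched in a few lines of plain Python. Both helpers are hypothetical; in particular, dividing each row by its row total (one row per true class) is an assumption about what "normed" means here, and the sketch is not mlxtend's implementation.

```python
def normalize_rows(conf_mat):
    """Divide every row by its total, so each row sums to 1.

    Assumes rows correspond to true classes; empty rows stay at 0.0.
    """
    normed = []
    for row in conf_mat:
        total = sum(row)
        normed.append([v / total if total else 0.0 for v in row])
    return normed


def font_colors(matrix, fontcolor_threshold=0.5):
    """White text above threshold * max cell value, black text otherwise."""
    cutoff = fontcolor_threshold * max(max(row) for row in matrix)
    return [["white" if v > cutoff else "black" for v in row]
            for row in matrix]


conf_mat = [[8, 2], [1, 9]]
normed = normalize_rows(conf_mat)
colors = font_colors(conf_mat)
print(normed)
print(colors)
```

With the default threshold of 0.5 and a maximum cell value of 9, the cut-off is 4.5, so only the diagonal cells (8 and 9) would get white text in this example.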
plot_decision_regions
plot_decision_regions(X, y, clf, feature_index=None, filler_feature_values=None, filler_feature_ranges=None, ax=None, X_highlight=None, zoom_factor=1.0, legend=1, hide_spines=True, markers='s^oxv<>', colors='#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf', scatter_kwargs=None, contourf_kwargs=None, contour_kwargs=None, scatter_highlight_kwargs=None, n_jobs=None)
Plot decision regions of a classifier.
Please note that this function assumes that class labels are
labeled consecutively, e.g., 0, 1, 2, 3, 4, and 5. If you have class
labels with integer labels > 4, you may want to provide additional colors
and/or markers as `colors` and `markers` arguments.
See https://matplotlib.org/examples/color/named_colors.html for more
information.
Parameters
-
X
: array-like, shape = [n_samples, n_features]
Feature matrix.
-
y
: array-like, shape = [n_samples]
True class labels.
-
clf
: Classifier object.
Must have a .predict method.
-
feature_index
: array-like (default: (0,) for 1D, (0, 1) otherwise)
Feature indices to use for plotting. The first index in feature_index will be on the x-axis, the second index will be on the y-axis.
-
filler_feature_values
: dict (default: None)
Only needed if the number of features is greater than 2. Dictionary of feature index-value pairs for the features not being plotted.
-
filler_feature_ranges
: dict (default: None)
Only needed if the number of features is greater than 2. Dictionary of feature index-value pairs for the features not being plotted. Will use the ranges provided to select training samples for plotting.
-
ax
: matplotlib.axes.Axes (default: None)
An existing matplotlib Axes. Creates one if ax=None.
-
X_highlight
: array-like, shape = [n_samples, n_features] (default: None)
An array with data points that are used to highlight samples in X.
-
zoom_factor
: float (default: 1.0)
Controls the scale of the x- and y-axis of the decision plot.
-
hide_spines
: bool (default: True)
Hide axis spines if True.
-
legend
: int (default: 1)
Integer to specify the legend location. No legend if legend is 0.
-
markers
: str (default: 's^oxv<>')
Scatterplot markers.
-
colors
: str (default: '#1f77b4,#ff7f0e,#3ca02c,#d62728,#9467bd,#8c564b,#e377c2,#7f7f7f,#bcbd22,#17becf')
Comma-separated list of colors.
-
scatter_kwargs
: dict (default: None)
Keyword arguments for underlying matplotlib scatter function.
-
contourf_kwargs
: dict (default: None)
Keyword arguments for underlying matplotlib contourf function.
-
contour_kwargs
: dict (default: None)
Keyword arguments for underlying matplotlib contour function (which draws the lines between decision regions).
-
scatter_highlight_kwargs
: dict (default: None)
Keyword arguments for underlying matplotlib scatter function.
-
n_jobs
: int or None, optional (default=None)
The number of CPUs to use to do the computation using Python's multiprocessing library. None means 1. -1 means using all processors. New in v0.22.0.
Returns
ax
: matplotlib.axes.Axes object
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/plot_decision_regions/
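When a dataset has more than two features, the two filler dictionaries are keyed by the indices of the features that are not plotted. The sketch below shows one way such dictionaries might be built and applied. Both helpers are hypothetical, using the column mean as the filler value is just one possible choice, and reading a range as "keep samples within value ± range" is an assumption rather than something stated above.

```python
def make_filler_values(X, feature_index=(0, 1)):
    """Build {feature index: value} for every feature not being plotted.

    Hypothetical helper; the column mean is used as the fixed value.
    """
    n_features = len(X[0])
    fillers = {}
    for j in range(n_features):
        if j in feature_index:
            continue
        column = [row[j] for row in X]
        fillers[j] = sum(column) / len(column)
    return fillers


def select_near_fillers(X, filler_values, filler_ranges):
    """Keep samples whose non-plotted features lie within value +/- range.

    This interpretation of the ranges is an assumption for illustration.
    """
    return [
        row for row in X
        if all(abs(row[j] - filler_values[j]) <= filler_ranges[j]
               for j in filler_ranges)
    ]


X = [[0.0, 1.0, 2.0, 10.0],
     [1.0, 0.0, 4.0, 30.0],
     [2.0, 2.0, 6.0, 20.0]]
filler_values = make_filler_values(X, feature_index=(0, 1))
nearby = select_near_fillers(X, filler_values, {2: 2.0, 3: 5.0})
print(filler_values)
print(nearby)
```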
plot_learning_curves
plot_learning_curves(X_train, y_train, X_test, y_test, clf, train_marker='o', test_marker='^', scoring='misclassification error', suppress_plot=False, print_model=True, title_fontsize=12, style='default', legend_loc='best')
Plots learning curves of a classifier.
Parameters
-
X_train
: array-like, shape = [n_samples, n_features]
Feature matrix of the training dataset.
-
y_train
: array-like, shape = [n_samples]
True class labels of the training dataset.
-
X_test
: array-like, shape = [n_samples, n_features]
Feature matrix of the test dataset.
-
y_test
: array-like, shape = [n_samples]
True class labels of the test dataset.
-
clf
: Classifier object.
Must have .fit and .predict methods.
-
train_marker
: str (default: 'o')
Marker for the training set line plot.
-
test_marker
: str (default: '^')
Marker for the test set line plot.
-
scoring
: str (default: 'misclassification error')
If not 'misclassification error', accepts the following metrics (from scikit-learn): {'accuracy', 'average_precision', 'f1_micro', 'f1_macro', 'f1_weighted', 'f1_samples', 'log_loss', 'precision', 'recall', 'roc_auc', 'adjusted_rand_score', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'r2'}
-
suppress_plot
: bool (default: False)
Suppress matplotlib plots if True. Recommended for testing purposes.
-
print_model
: bool (default: True)
Print model parameters in plot title if True.
-
title_fontsize
: int (default: 12)
Determines the size of the plot title font.
-
style
: str (default: 'default')
Matplotlib style. For more styles, please see https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html
-
legend_loc
: str (default: 'best')
Where to place the plot legend: {'best', 'upper left', 'upper right', 'lower left', 'lower right'}
Returns
errors
: tuple of lists
(training_error, test_error)
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/plot_learning_curves/
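The default scoring value, 'misclassification error', is the fraction of wrong predictions (1 - accuracy). The sketch below computes it in plain Python. The second helper, subset_sizes, is hypothetical: it assumes the curve is evaluated on training subsets growing in 10% steps, which is not stated in the text above.

```python
def misclassification_error(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def subset_sizes(n_samples, step_percent=10):
    """Hypothetical schedule of training subset sizes: 10%, 20%, ..., 100%."""
    return [n_samples * pct // 100
            for pct in range(step_percent, 101, step_percent)]


y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0]
error = misclassification_error(y_true, y_pred)
sizes = subset_sizes(50)
print(error)
print(sizes)
```

Under that assumption, each list in the returned (training_error, test_error) tuple would hold one error value per training subset size.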
plot_linear_regression
plot_linear_regression(X, y, model=LinearRegression(), corr_func='pearsonr', scattercolor='blue', fit_style='k--', legend=True, xlim='auto')
Plot a linear regression line fit.
Parameters
-
X
: numpy array, shape = [n_samples,]
Samples.
-
y
: numpy array, shape (n_samples,)
Target values.
-
model
: object (default: sklearn.linear_model.LinearRegression)
Estimator object for regression. Must implement a .fit() and .predict() method.
-
corr_func
: str or function (default: 'pearsonr')
Uses pearsonr from scipy.stats if corr_func='pearsonr' to compute the correlation coefficient. If not 'pearsonr', the corr_func parameter expects a function of the form func(<x-array>, <y-array>), which is expected to return a tuple (<correlation_coefficient>, <some_unused_value>).
-
scattercolor
: string (default: blue)
Color of scatter plot points.
-
fit_style
: string (default: k--)
Style for the line fit.
-
legend
: bool (default: True)
Plots legend with correlation coefficient, fit coefficient, and intercept values.
-
xlim
: array-like (x_min, x_max) or 'auto' (default: 'auto')
X-axis limits for the linear line fit.
Returns
-
regression_fit
: tuple
intercept, slope, corr_coeff (float, float, float)
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/plot_linear_regression/
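A custom corr_func has to return a tuple whose first element is the correlation coefficient. As a minimal sketch of a function with that shape, the example below computes the Pearson correlation in plain Python and returns None as the unused second value. The name pearson_corr is hypothetical, and the sketch has not been checked against plot_linear_regression itself.

```python
import math


def pearson_corr(x, y):
    """Return (correlation_coefficient, unused_value) for two sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered products and squares (the 1/n factors cancel out).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    r = cov / math.sqrt(var_x * var_y)
    return r, None


r_up, unused = pearson_corr([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down, _ = pearson_corr([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(r_up, r_down)
```

A function like this could, in principle, be passed as corr_func in place of the 'pearsonr' string; the second tuple element is ignored according to the parameter description.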
plot_pca_correlation_graph
plot_pca_correlation_graph(X, variables_names, dimensions=(1, 2), figure_axis_size=6, X_pca=None, explained_variance=None)
Computes the PCA for X and plots the correlation graph.
Parameters
-
X
: 2D array-like.
The columns represent the different variables and the rows are the samples of those variables.
-
variables_names
: array-like
Names of the columns (the variables) of X.
-
dimensions
: tuple with two elements.
Dimensions to be plotted (x, y).
-
figure_axis_size
: size of the final frame.
The figure created is a square with length and width equal to figure_axis_size.
-
X_pca
: np.ndarray, shape = [n_samples, n_components].
Optional. X_pca is the matrix of the transformed components from X. If not provided, the function computes PCA automatically using mlxtend.feature_extraction.PrincipalComponentAnalysis. Expected n_components >= max(dimensions).
-
explained_variance
: 1-dimensional np.ndarray, length = n_components
Optional. explained_variance contains the eigenvalues from the diagonalized covariance matrix of the PCA transformation. If not provided, the function computes PCA independently. Expected n_components == X.shape[1].
Returns
matplotlib_figure, correlation_matrix
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/plot_pca_correlation_graph/
plot_sequential_feature_selection
plot_sequential_feature_selection(metric_dict, figsize=None, kind='std_dev', color='blue', bcolor='steelblue', marker='o', alpha=0.2, ylabel='Performance', confidence_interval=0.95)
Plot feature selection results.
Parameters
-
metric_dict
: mlxtend.SequentialFeatureSelector.get_metric_dict() object
-
figsize
: tuple (default: None)
Height and width of the figure.
-
kind
: str (default: "std_dev")
The kind of error bar or confidence interval in {'std_dev', 'std_err', 'ci', None}.
-
color
: str (default: "blue")
Color of the line plot (accepts any matplotlib color name).
-
bcolor
: str (default: "steelblue")
Color of the error bars / confidence intervals (accepts any matplotlib color name).
-
marker
: str (default: "o")
Marker of the line plot (accepts any matplotlib marker name).
-
alpha
: float in [0, 1] (default: 0.2)
Transparency of the error bars / confidence intervals.
-
ylabel
: str (default: "Performance")
Y-axis label.
-
confidence_interval
: float (default: 0.95)
Confidence level if kind='ci'.
Returns
fig
: matplotlib.pyplot.figure() object
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/plot_sequential_feature_selection/
remove_borders
remove_borders(axes, left=False, bottom=False, right=True, top=True)
Remove chart junk from matplotlib plots.
Parameters
-
axes
: iterable
An iterable containing plt.gca() or plt.subplot() objects, e.g. [plt.gca()].
-
left
: bool (default: False)
Hide left axis spine if True.
-
bottom
: bool (default: False)
Hide bottom axis spine if True.
-
right
: bool (default: True)
Hide right axis spine if True.
-
top
: bool (default: True)
Hide top axis spine if True.
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/remove_chartjunk/
scatter_hist
scatter_hist(x, y, xlabel=None, ylabel=None, figsize=(5, 5))
Scatter plot and individual feature histograms along axes.
Parameters
-
x
: 1D array-like or Pandas Series
X-axis values.
-
y
: 1D array-like or Pandas Series
Y-axis values.
-
xlabel
: str (default: None)
Label for the X-axis values. If x is a pandas Series and xlabel is None, the label is inferred automatically.
-
ylabel
: str (default: None)
Label for the Y-axis values. If y is a pandas Series and ylabel is None, the label is inferred automatically.
-
figsize
: tuple (default: (5, 5))
Matplotlib figure size.
Returns
plot
: Matplotlib Figure object
scatterplotmatrix
scatterplotmatrix(X, fig_axes=None, names=None, figsize=(8, 8), alpha=1.0, **kwargs)
Lower triangular of a scatterplot matrix
Parameters
-
X
: array-like, shape={num_examples, num_features}
Design matrix containing data instances (examples) with multiple exploratory variables (features).
-
fig_axes
: tuple (default: None)
A (fig, axes) tuple, where fig is a figure object and axes is an axes object created via matplotlib, for example, by calling the pyplot subplots function: fig, axes = plt.subplots(...)
-
names
: list (default: None)
A list of string names, which should have the same number of elements as there are features (columns) in X.
-
figsize
: tuple (default: (8, 8))
Height and width of the subplot grid. Ignored if fig_axes is not None.
-
alpha
: float (default: 1.0)
Transparency for both the scatter plots and the histograms along the diagonal.
-
**kwargs
: kwargs
Keyword arguments for the scatterplots.
Returns
-
fig_axes
: tuple
A (fig, axes) tuple, where fig is a figure object and axes is an axes object created via matplotlib, for example, by calling the pyplot subplots function: fig, axes = plt.subplots(...)
Examples
For more usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/scatterplotmatrix/
stacked_barplot
stacked_barplot(df, bar_width='auto', colors='bgrcky', labels='index', rotation=90, legend_loc='best')
Function to plot stacked barplots
Parameters
-
df
: pandas.DataFrame
A pandas DataFrame where the index denotes the x-axis labels, and the columns contain the different measurements for each row.
-
bar_width
: 'auto' or float (default: 'auto')
Parameter to set the widths of the bars. If 'auto', the width is automatically determined by the number of columns in the dataset.
-
colors
: str (default: 'bgrcky')
The colors of the bars.
-
labels
: 'index' or iterable (default: 'index')
If 'index', the DataFrame index will be used as x-tick labels.
-
rotation
: int (default: 90)
Parameter to rotate the x-axis labels.
-
legend_loc
: str (default: 'best')
Location of the plot legend: {'best', 'upper left', 'upper right', 'lower left', 'lower right'}. No legend if legend_loc=False.
Returns
fig
: matplotlib.pyplot figure object
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/plotting/stacked_barplot/