association_rules
association_rules(df: pandas.core.frame.DataFrame, num_itemsets: Union[int, NoneType] = 1, df_orig: Union[pandas.core.frame.DataFrame, NoneType] = None, null_values=False, metric='confidence', min_threshold=0.8, support_only=False, return_metrics: list = ['antecedent support', 'consequent support', 'support', 'confidence', 'lift', 'representativity', 'leverage', 'conviction', 'zhangs_metric', 'jaccard', 'certainty', 'kulczynski']) -> pandas.core.frame.DataFrame
Generates a DataFrame of association rules including the metrics 'support', 'confidence', and 'lift'.
Parameters
- df : pandas DataFrame
  DataFrame of frequent itemsets with columns ['support', 'itemsets'].
- df_orig : pandas DataFrame (default: None)
  DataFrame with the original input data. Only needed when null_values=True.
- num_itemsets : int (default: 1)
  Number of transactions in the original input data (df_orig).
- null_values : bool (default: False)
  Set to True if the original input data contains null values (NaNs).
- metric : string (default: 'confidence')
  Metric used to evaluate whether a rule is of interest. It is automatically set to 'support' if support_only=True. Otherwise, the supported metrics are 'support', 'confidence', 'lift', 'leverage', 'conviction' and 'zhangs_metric'. These metrics are computed as follows:
  - support(A->C) = support(A+C) [aka 'support'], range: [0, 1]
  - confidence(A->C) = support(A+C) / support(A), range: [0, 1]
  - lift(A->C) = confidence(A->C) / support(C), range: [0, inf]
  - leverage(A->C) = support(A->C) - support(A)*support(C), range: [-1, 1]
  - conviction(A->C) = [1 - support(C)] / [1 - confidence(A->C)], range: [0, inf]
  - zhangs_metric(A->C) = leverage(A->C) / max(support(A->C)*(1-support(A)), support(A)*(support(C)-support(A->C))), range: [-1, 1]
- min_threshold : float (default: 0.8)
  Minimal threshold for the evaluation metric, selected via the metric parameter, used to decide whether a candidate rule is of interest.
- support_only : bool (default: False)
  Only computes the rule support and fills the other metric columns with NaNs. This is useful if:
  a) the input DataFrame is incomplete, e.g., it does not contain support values for all rule antecedents and consequents, or
  b) you simply want to speed up the computation because you do not need the other metrics.
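The metric formulas above can be sketched in plain Python. The function below, `rule_metrics`, is a hypothetical helper (not part of mlxtend) that takes the three supports a rule A->C depends on and returns the six documented metrics. It assumes support(A) > 0 and support(C) > 0, returns infinity for conviction when confidence is 1, and returns 0.0 for zhangs_metric when its denominator is zero (a choice made for this sketch; in that case the leverage is also zero). With support_only=True it mirrors the documented behaviour of reporting only the support and NaN for the other metrics. It illustrates the formulas only and is not mlxtend's vectorized implementation.

```python
import math


def rule_metrics(sup_a, sup_c, sup_ac, support_only=False):
    """Hypothetical helper: metrics for the rule A->C.

    sup_a  = support(A)
    sup_c  = support(C)
    sup_ac = support(A+C), i.e. the support of the rule itself
    """
    nan = float("nan")
    if support_only:
        # Mirror the documented support_only behaviour: only support is filled.
        return {"support": sup_ac, "confidence": nan, "lift": nan,
                "leverage": nan, "conviction": nan, "zhangs_metric": nan}

    confidence = sup_ac / sup_a
    lift = confidence / sup_c
    leverage = sup_ac - sup_a * sup_c
    # Conviction is unbounded when the rule always holds (confidence == 1).
    conviction = math.inf if confidence == 1 else (1 - sup_c) / (1 - confidence)
    # Zhang's metric normalizes leverage by the larger of the two terms.
    denom = max(sup_ac * (1 - sup_a), sup_a * (sup_c - sup_ac))
    # Assumption of this sketch: define 0/0 as 0.0 (leverage is 0 whenever denom is 0).
    zhangs = leverage / denom if denom != 0 else 0.0

    return {"support": sup_ac, "confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction,
            "zhangs_metric": zhangs}
```

For example, rule_metrics(0.5, 0.25, 0.25) gives confidence 0.5, lift 2.0, leverage 0.125, conviction 1.5 and zhangs_metric 1.0.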
Returns
pandas DataFrame with the columns "antecedents" and "consequents", which store itemsets, plus the metric columns listed in return_metrics (by default "antecedent support", "consequent support", "support", "confidence", "lift", "representativity", "leverage", "conviction", "zhangs_metric", "jaccard", "certainty" and "kulczynski"), containing all rules for which metric(rule) >= min_threshold.
Each entry in the "antecedents" and "consequents" columns is of type frozenset, a Python built-in type that behaves like a set except that it is immutable (for more info, see https://docs.python.org/3.6/library/stdtypes.html#frozenset).
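To illustrate how candidate rules can be enumerated and filtered with metric(rule) >= min_threshold, the sketch below splits every frequent itemset of two or more items into each non-empty antecedent/consequent pair and keeps the rules that pass the threshold. `generate_rules` is a hypothetical helper, not mlxtend's code: it handles only the 'support', 'confidence' and 'lift' metrics, takes the frequent itemsets as a plain dict mapping frozenset to support instead of a DataFrame, and assumes every subset of a frequent itemset is also present (which holds for Apriori output). Because frozensets are hashable, antecedents and consequents can be used directly as dictionary keys to look up their supports, which shows why an immutable set type is convenient here.

```python
from itertools import combinations


def generate_rules(frequent, metric="confidence", min_threshold=0.8):
    """Hypothetical helper: rules A->C from {frozenset(itemset): support}."""
    if metric not in ("support", "confidence", "lift"):
        raise ValueError(f"unsupported metric in this sketch: {metric!r}")

    rules = []
    for itemset, sup_ac in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        # Try every non-empty proper subset as the antecedent.
        for k in range(1, len(itemset)):
            for ante in combinations(itemset, k):
                a = frozenset(ante)
                c = itemset - a  # the remaining items form the consequent
                sup_a, sup_c = frequent[a], frequent[c]
                conf = sup_ac / sup_a
                values = {"support": sup_ac, "confidence": conf,
                          "lift": conf / sup_c}
                if values[metric] >= min_threshold:
                    rules.append({"antecedents": a, "consequents": c,
                                  "antecedent support": sup_a,
                                  "consequent support": sup_c, **values})
    return rules
```

For instance, with supports {milk}: 0.75, {bread}: 0.75, {eggs}: 0.5, {milk, bread}: 0.5 and {bread, eggs}: 0.5, a confidence threshold of 0.8 keeps only the rule eggs -> bread (confidence 1.0).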
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/