Classification

class rulekit.classification.RuleClassifier(minsupp_new: int = 5, induction_measure: rulekit.params.Measures = <Measures.Correlation: 'Correlation'>, pruning_measure: Union[rulekit.params.Measures, str] = <Measures.Correlation: 'Correlation'>, voting_measure: rulekit.params.Measures = <Measures.Correlation: 'Correlation'>, max_growing: float = 0.0, enable_pruning: bool = True, ignore_missing: bool = False, max_uncovered_fraction: float = 0.0, select_best_candidate: bool = False, min_rule_covered: Optional[int] = None)

Classification model.

Parameters:
  • minsupp_new (int = 5) – positive integer representing minimum number of previously uncovered examples to be covered by a new rule (positive examples for classification problems); default: 5

  • induction_measure (rulekit.params.Measures = rulekit.params.Measures.Correlation) – measure used during induction; default measure is correlation

  • pruning_measure (Union[rulekit.params.Measures, str] = rulekit.params.Measures.Correlation) – measure used during pruning. Can also be a user-defined measure passed as a string, for example 2 * p / n; default measure is correlation

  • voting_measure (rulekit.params.Measures = rulekit.params.Measures.Correlation) – measure used during voting; default measure is correlation

  • max_growing (int = 0.0) – non-negative integer representing the maximum number of conditions that can be added to a rule in the growing phase (use this parameter for large datasets if execution time is prohibitive); 0 indicates no limit; default: 0.

  • enable_pruning (bool = True) – enable or disable pruning, default is True.

  • ignore_missing (bool = False) – boolean telling whether missing values should be ignored (by default, a missing value of a given attribute is always considered as not fulfilling the condition built upon that attribute); default: False.

  • max_uncovered_fraction (float = 0.0) – Floating-point number from [0,1] interval representing maximum fraction of examples that may remain uncovered by the rule set, default: 0.0.

  • select_best_candidate (bool = False) – Flag determining if best candidate should be selected from growing phase; default: False.

  • min_rule_covered (int = None) –

    alias for minsupp_new. This parameter is deprecated and will be removed in the next major version; use minsupp_new instead.

    Deprecated since version 1.7.0: Use parameter minsupp_new instead.
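
The roles of minsupp_new and max_uncovered_fraction can be pictured with a small sketch. This is not RuleKit's induction code: the helper names and the set-based bookkeeping of covered examples are assumptions made purely for illustration.

```python
def covers_enough_new(rule_covered, already_covered, minsupp_new=5):
    """Return True if the rule covers at least `minsupp_new` examples
    that no earlier rule has covered (illustrative check only)."""
    newly_covered = set(rule_covered) - set(already_covered)
    return len(newly_covered) >= minsupp_new


def may_stop(already_covered, n_examples, max_uncovered_fraction=0.0):
    """Return True once the uncovered share is within the allowed fraction."""
    uncovered = n_examples - len(set(already_covered))
    return uncovered / n_examples <= max_uncovered_fraction


# Hypothetical indices of examples covered so far and by a candidate rule.
covered_so_far = {0, 1, 2, 3}
candidate = {2, 3, 4, 5, 6}

# The candidate adds three new examples: 4, 5 and 6.
accepted = covers_enough_new(candidate, covered_so_far, minsupp_new=3)
rejected = covers_enough_new(candidate, covered_so_far, minsupp_new=5)

# After accepting the candidate, 3 of 10 examples remain uncovered.
done = may_stop(covered_so_far | candidate, n_examples=10,
                max_uncovered_fraction=0.5)
```

With the default max_uncovered_fraction of 0.0, the stopping check in this sketch succeeds only when every example is covered.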

fit(values: Union[numpy.ndarray, pandas.core.frame.DataFrame, list], labels: Union[numpy.ndarray, pandas.core.frame.DataFrame, list]) → rulekit.classification.RuleClassifier

Trains the model on the given dataset.

Parameters:
  • values (rulekit.operator.Data) – attributes

  • labels (rulekit.operator.Data) – labels

Returns:

self

Return type:

RuleClassifier

get_coverage_matrix(values: Union[numpy.ndarray, pandas.core.frame.DataFrame, list]) → numpy.ndarray

Calculates coverage matrix for ruleset.

Parameters:

values (rulekit.operator.Data) – dataset

Returns:

coverage_matrix – Each row of the matrix represents a single example from the dataset and each column represents one rule from the rule set. A value of 1 in a cell means that the rule covers the example; 0 means that it doesn’t.

Return type:

np.ndarray
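
The layout of the coverage matrix can be sketched without the library. In the example below, rules are modelled as plain Python predicates and examples as dictionaries; both are assumptions for illustration and do not reflect how RuleKit represents rules internally.

```python
def coverage_matrix(examples, rules):
    """Build a 0/1 matrix: rows are examples, columns are rules."""
    return [[1 if rule(example) else 0 for rule in rules]
            for example in examples]


# Hypothetical rules expressed as predicates over dict-like examples.
rules = [
    lambda ex: ex["age"] >= 30,
    lambda ex: ex["income"] < 50,
]
examples = [
    {"age": 25, "income": 40},
    {"age": 35, "income": 60},
    {"age": 40, "income": 30},
]

matrix = coverage_matrix(examples, rules)

# Rows containing only zeros correspond to examples no rule covers.
uncovered_rows = [i for i, row in enumerate(matrix) if not any(row)]
```

Summing a column gives the number of examples a rule covers, and summing a row gives the number of rules that fire for an example.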

get_params() → dict

Returns:

hyperparameters – Dictionary containing model hyperparameters.

Return type:

dict

predict(values: Union[numpy.ndarray, pandas.core.frame.DataFrame, list], return_metrics: bool = False) → Union[numpy.ndarray, tuple]

Performs prediction and returns the predicted labels.

Parameters:
  • values (rulekit.operator.Data) – attributes

  • return_metrics (bool = False) – Optional flag. If set to True, the method also calculates additional model metrics and returns a tuple instead of just the predicted labels.

Returns:

result – If the return_metrics flag is not set, only the predictions are returned; otherwise a tuple is returned whose first element is the predictions and whose second element is the metrics.

Return type:

Union[np.ndarray, Tuple[np.ndarray, Dict[str, float]]]
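
Because the return shape depends on return_metrics, calling code may want to normalise it. The helper below is a minimal sketch under that assumption; unpack_prediction is a hypothetical name, not part of RuleKit, and the metric key used in the example is invented.

```python
def unpack_prediction(result):
    """Split the value returned by predict into (labels, metrics).

    `metrics` is None when only labels were returned.
    """
    if isinstance(result, tuple):
        labels, metrics = result
        return labels, metrics
    return result, None


# Stand-in values mimicking the two documented return shapes.
labels_only, no_metrics = unpack_prediction(["a", "b"])
labels, metrics = unpack_prediction((["a", "b"], {"example_metric": 1.0}))
```

The check relies on the documented contract that a tuple is returned only when metrics are requested.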

predict_proba(values: Union[numpy.ndarray, pandas.core.frame.DataFrame, list], return_metrics: bool = False) → Union[numpy.ndarray, tuple]

Performs prediction and returns class probabilities for each example.

Parameters:
  • values (rulekit.operator.Data) – attributes

  • return_metrics (bool = False) – Optional flag. If set to True, the method also calculates additional model metrics and returns a tuple instead of just the probabilities.

Returns:

result – If the return_metrics flag is not set, only the probabilities matrix is returned; otherwise a tuple is returned whose first element is the probabilities matrix and whose second element is the metrics.

Return type:

Union[np.ndarray, Tuple[np.ndarray, Dict[str, float]]]
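
A probabilities matrix can be turned back into labels by picking the most probable class per row. The sketch below assumes that column j of the matrix corresponds to class_names[j]; the source does not state the column order, so that mapping and the helper name are assumptions.

```python
def labels_from_probabilities(probabilities, class_names):
    """Pick, for every row, the class with the highest probability.

    Ties are resolved in favour of the first (lowest-index) class.
    """
    labels = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(class_names[best])
    return labels


# Hypothetical probabilities for three examples and two classes.
probabilities = [
    [0.8, 0.2],
    [0.1, 0.9],
    [0.5, 0.5],
]
predicted = labels_from_probabilities(probabilities, ["no", "yes"])
```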

score(values: Union[numpy.ndarray, pandas.core.frame.DataFrame, list], labels: Union[numpy.ndarray, pandas.core.frame.DataFrame, list]) → float

Return the accuracy on the given test data and labels.

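Parameters:
  • values (rulekit.operator.Data) – attributes

  • labels (rulekit.operator.Data) – true labels

Returns:

score – Accuracy of self.predict(values) with respect to labels.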

Return type:

float
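
Accuracy here is the fraction of examples whose predicted label equals the true label. The function below is a minimal stand-alone sketch of that definition, not the library's implementation.

```python
def accuracy(predicted, labels):
    """Return the fraction of positions where predicted == labels."""
    if len(predicted) != len(labels):
        raise ValueError("predicted and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy of an empty dataset")
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return correct / len(labels)


# Three of the four hypothetical predictions match the true labels.
value = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])
```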

set_params(**kwargs) → object

Sets the model's hyperparameters. The parameters are the same as in the constructor.

class rulekit.classification.ExpertRuleClassifier(minsupp_new: int = 5, induction_measure: rulekit.params.Measures = <Measures.Correlation: 'Correlation'>, pruning_measure: Union[rulekit.params.Measures, str] = <Measures.Correlation: 'Correlation'>, voting_measure: rulekit.params.Measures = <Measures.Correlation: 'Correlation'>, max_growing: float = 0.0, enable_pruning: bool = True, ignore_missing: bool = False, max_uncovered_fraction: float = 0.0, select_best_candidate: bool = False, extend_using_preferred: Optional[bool] = None, extend_using_automatic: Optional[bool] = None, induce_using_preferred: Optional[bool] = None, induce_using_automatic: Optional[bool] = None, consider_other_classes: Optional[bool] = None, preferred_conditions_per_rule: Optional[int] = None, preferred_attributes_per_rule: Optional[int] = None, min_rule_covered: Optional[int] = None)

Classification model using expert knowledge.

Parameters:
  • minsupp_new (int = 5) – positive integer representing minimum number of previously uncovered examples to be covered by a new rule (positive examples for classification problems); default: 5

  • induction_measure (rulekit.params.Measures = rulekit.params.Measures.Correlation) – measure used during induction; default measure is correlation

  • pruning_measure (Union[rulekit.params.Measures, str] = rulekit.params.Measures.Correlation) – measure used during pruning. Could be user defined (string), for example 2 * p / n; default measure is correlation

  • voting_measure (rulekit.params.Measures = rulekit.params.Measures.Correlation) – measure used during voting; default measure is correlation

  • max_growing (int = 0.0) – non-negative integer representing the maximum number of conditions that can be added to a rule in the growing phase (use this parameter for large datasets if execution time is prohibitive); 0 indicates no limit; default: 0.

  • enable_pruning (bool = True) – enable or disable pruning, default is True.

  • ignore_missing (bool = False) – boolean telling whether missing values should be ignored (by default, a missing value of a given attribute is always considered as not fulfilling the condition built upon that attribute); default: False.

  • max_uncovered_fraction (float = 0.0) – Floating-point number from [0,1] interval representing maximum fraction of examples that may remain uncovered by the rule set, default: 0.0.

  • select_best_candidate (bool = False) – Flag determining if best candidate should be selected from growing phase; default: False.

  • extend_using_preferred (bool = False) – boolean indicating whether initial rules should be extended using preferred conditions and attributes; default is False

  • extend_using_automatic (bool = False) – boolean indicating whether initial rules should be extended using automatic conditions and attributes; default is False

  • induce_using_preferred (bool = False) – boolean indicating whether new rules should be induced using preferred conditions and attributes; default is False

  • induce_using_automatic (bool = False) – boolean indicating whether new rules should be induced using automatic conditions and attributes; default is False

  • consider_other_classes (bool = False) – boolean indicating whether automatic induction should be performed for classes for which no user’s knowledge has been defined (classification only); default is False.

  • preferred_conditions_per_rule (int = None) – maximum number of preferred conditions per rule; default: unlimited.

  • preferred_attributes_per_rule (int = None) – maximum number of preferred attributes per rule; default: unlimited.

  • min_rule_covered (int = None) –

    alias for minsupp_new. This parameter is deprecated and will be removed in the next major version; use minsupp_new instead.

    Deprecated since version 1.7.0: Use parameter minsupp_new instead.

fit(values: Union[numpy.ndarray, pandas.core.frame.DataFrame, list], labels: Union[numpy.ndarray, pandas.core.frame.DataFrame, list], expert_rules: Optional[list] = None, expert_preferred_conditions: Optional[list] = None, expert_forbidden_conditions: Optional[list] = None) → rulekit.classification.ExpertRuleClassifier

Trains the model on the given dataset.

Parameters:
  • values (rulekit.operator.Data) – attributes

  • labels (rulekit.operator.Data) – labels

  • expert_rules (List[Union[str, Tuple[str, str]]]) – set of initial rules, passed either as a list of strings representing rules or as a list of tuples where the first element is the name of the rule and the second is the rule string.

  • expert_preferred_conditions (List[Union[str, Tuple[str, str]]]) – multiset of preferred conditions (also used to specify preferred attributes by using the special value Any). Passed either as a list of strings representing rules or as a list of tuples where the first element is the name of the rule and the second is the rule string.

  • expert_forbidden_conditions (List[Union[str, Tuple[str, str]]]) – set of forbidden conditions (also used to specify forbidden attributes by using the special value Any). Passed either as a list of strings representing rules or as a list of tuples where the first element is the name of the rule and the second is the rule string.

Returns:

self

Return type:

ExpertRuleClassifier
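
The three expert-knowledge arguments accept either bare strings or (name, rule string) tuples. The sketch below shows one way calling code might bring both forms to a uniform list of pairs before passing them on. The helper name, the automatic naming scheme and the placeholder rule strings are all assumptions; the actual rule syntax is not covered here.

```python
def normalize_named_items(items, prefix):
    """Return a list of (name, rule_string) pairs.

    Bare strings receive a generated name of the form '<prefix>-<index>';
    tuples are kept as given. None is treated as an empty list.
    """
    normalized = []
    for index, item in enumerate(items or []):
        if isinstance(item, tuple):
            name, body = item
        else:
            name, body = f"{prefix}-{index}", item
        normalized.append((name, body))
    return normalized


# Placeholder strings stand in for real rule strings.
expert_rules = normalize_named_items(
    ["<rule string A>", ("my-rule", "<rule string B>")],
    prefix="rule",
)
no_forbidden = normalize_named_items(None, prefix="forbidden-condition")
```

Keeping explicit names makes it easier to trace which expert rule or condition an induced rule originated from.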

get_coverage_matrix(values: Union[numpy.ndarray, pandas.core.frame.DataFrame, list]) → numpy.ndarray

Calculates coverage matrix for ruleset.

Parameters:

values (rulekit.operator.Data) – dataset

Returns:

coverage_matrix – Each row of the matrix represents a single example from the dataset and each column represents one rule from the rule set. A value of 1 in a cell means that the rule covers the example; 0 means that it doesn’t.

Return type:

np.ndarray

get_params() → dict

Returns:

hyperparameters – Dictionary containing model hyperparameters.

Return type:

dict

predict(values: Union[numpy.ndarray, pandas.core.frame.DataFrame, list], return_metrics: bool = False) → Union[numpy.ndarray, tuple]

Performs prediction and returns the predicted labels.

Parameters:
  • values (rulekit.operator.Data) – attributes

  • return_metrics (bool = False) – Optional flag. If set to True, the method also calculates additional model metrics and returns a tuple instead of just the predicted labels.

Returns:

result – If the return_metrics flag is not set, only the predictions are returned; otherwise a tuple is returned whose first element is the predictions and whose second element is the metrics.

Return type:

Union[np.ndarray, Tuple[np.ndarray, Dict[str, float]]]

predict_proba(values: Union[numpy.ndarray, pandas.core.frame.DataFrame, list], return_metrics: bool = False) → Union[numpy.ndarray, tuple]

Performs prediction and returns class probabilities for each example.

Parameters:
  • values (rulekit.operator.Data) – attributes

  • return_metrics (bool = False) – Optional flag. If set to True, the method also calculates additional model metrics and returns a tuple instead of just the probabilities.

Returns:

result – If the return_metrics flag is not set, only the probabilities matrix is returned; otherwise a tuple is returned whose first element is the probabilities matrix and whose second element is the metrics.

Return type:

Union[np.ndarray, Tuple[np.ndarray, Dict[str, float]]]

score(values: Union[numpy.ndarray, pandas.core.frame.DataFrame, list], labels: Union[numpy.ndarray, pandas.core.frame.DataFrame, list]) → float

Return the accuracy on the given test data and labels.

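Parameters:
  • values (rulekit.operator.Data) – attributes

  • labels (rulekit.operator.Data) – true labels

Returns:

score – Accuracy of self.predict(values) with respect to labels.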

Return type:

float