Classification
- class rulekit.classification.RuleClassifier(minsupp_new: float = 5.0, induction_measure: Measures = Measures.Correlation, pruning_measure: Measures | str = Measures.Correlation, voting_measure: Measures = Measures.Correlation, max_growing: float = 0.0, enable_pruning: bool = True, ignore_missing: bool = False, max_uncovered_fraction: float = 0.0, select_best_candidate: bool = False, complementary_conditions: bool = False, control_apriori_precision: bool = True, max_rule_count: int = 0, approximate_induction: bool = False, approximate_bins_count: int = 100)
Classification model.
- Parameters:
minsupp_new (float = 5.0) – minimum number (or fraction, if the value is < 1.0) of previously uncovered examples that a new rule must cover (positive examples for classification problems); default: 5.
induction_measure (rulekit.params.Measures = rulekit.params.Measures.Correlation) – measure used during induction; default measure is correlation.
pruning_measure (Union[rulekit.params.Measures, str] = rulekit.params.Measures.Correlation) – measure used during pruning. Can be user-defined (passed as a string), for example 2 * p / n; default measure is correlation.
voting_measure (rulekit.params.Measures = rulekit.params.Measures.Correlation) – measure used during voting; default measure is correlation.
max_growing (float = 0.0) – non-negative integer value representing the maximum number of conditions that can be added to a rule in the growing phase (use this parameter for large datasets if execution time is prohibitive); 0 indicates no limit; default: 0.
enable_pruning (bool = True) – enable or disable pruning, default is True.
ignore_missing (bool = False) – boolean telling whether missing values should be ignored (by default, a missing value of a given attribute is always considered as not fulfilling the condition built upon that attribute); default: False.
max_uncovered_fraction (float = 0.0) – Floating-point number from [0,1] interval representing maximum fraction of examples that may remain uncovered by the rule set, default: 0.0.
select_best_candidate (bool = False) – Flag determining if best candidate should be selected from growing phase; default: False.
complementary_conditions (bool = False) – If enabled, complementary conditions in the form a = !{value} for nominal attributes are supported.
control_apriori_precision (bool = True) – When inducing classification rules, verify that candidate precision is higher than the a priori precision of the investigated class.
max_rule_count (int = 0) – Maximum number of rules to be generated (for classification data sets it applies to a single class); 0 indicates no limit.
approximate_induction (bool = False) – Use an approximate induction heuristic which does not check all possible splits. Note: this is an experimental feature that currently works only for classification data sets; results may change in the future.
approximate_bins_count (int = 100) – maximum number of bins for an attribute evaluated in the approximate induction.
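The pruning_measure description above gives 2 * p / n as an example of a user-defined expression. As a rough illustration only, and assuming that p and n stand for the numbers of positive and negative examples covered by a rule (the parameter description does not define them), the expression can be evaluated by hand as follows. The helper names are hypothetical and are not part of rulekit.

```python
def covered_counts(covers, is_positive):
    """Count covered positives (p) and covered negatives (n) for one rule.

    `covers[i]` says whether the rule covers example i and
    `is_positive[i]` says whether example i belongs to the rule's class.
    Both are plain Python lists used as stand-ins for real data.
    """
    p = sum(1 for c, pos in zip(covers, is_positive) if c and pos)
    n = sum(1 for c, pos in zip(covers, is_positive) if c and not pos)
    return p, n


def custom_measure(p, n):
    """Hand-written equivalent of the example expression "2 * p / n".

    Raises ZeroDivisionError when n == 0; how the library treats that
    case is not covered by this sketch.
    """
    return 2 * p / n


# Five examples: the rule covers four of them, three of which are positive.
covers = [True, True, True, False, True]
is_positive = [True, True, False, False, True]

p, n = covered_counts(covers, is_positive)
quality = custom_measure(p, n)
print(p, n, quality)
```

Under the same assumption, the expression would be passed to the constructor as a string, for example pruning_measure='2 * p / n'.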
- add_event_listener(listener: RuleInductionProgressListener)
Add an event listener object to the operator, which allows rule induction progress to be monitored.
Example
>>> from rulekit.events import RuleInductionProgressListener
>>> from rulekit.classification import RuleClassifier
>>>
>>> class MyEventListener(RuleInductionProgressListener):
...     def on_new_rule(self, rule):
...         print('Do something with new rule', rule)
...
>>> operator = RuleClassifier()
>>> operator.add_event_listener(MyEventListener())
- Parameters:
listener (RuleInductionProgressListener) – listener object
- fit(values: ndarray | DataFrame | list, labels: ndarray | DataFrame | list) RuleClassifier
Train model on given dataset.
- Parameters:
values (rulekit.operator.Data) – attributes
labels (rulekit.operator.Data) – labels
- Returns:
self
- Return type:
RuleClassifier
- get_coverage_matrix(values: ndarray | DataFrame | list) ndarray
Calculates coverage matrix for ruleset.
- Parameters:
values (rulekit.operator.Data) – dataset
- Returns:
coverage_matrix – Each row of the matrix represents a single example from the dataset and each column represents one rule from the rule set. A value of 1 in a cell means that the rule covers the example; a value of 0 means that it does not.
- Return type:
np.ndarray
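The layout of the coverage matrix described above can be illustrated without the library. This is a minimal sketch in plain Python: the rules are hypothetical predicates and the examples are made-up dictionaries, so nothing here reflects how rulekit represents rules internally.

```python
# Made-up dataset: one dictionary per example.
examples = [
    {"age": 25, "income": 30_000},
    {"age": 47, "income": 82_000},
    {"age": 63, "income": 41_000},
]

# Hypothetical rules, each reduced to a predicate over one example.
rules = [
    lambda ex: ex["age"] < 50,
    lambda ex: ex["income"] >= 40_000,
]

# Rows correspond to examples, columns to rules; 1 means "rule covers example".
coverage_matrix = [
    [1 if rule(ex) else 0 for rule in rules]
    for ex in examples
]

# Row sums: how many rules cover each example.
rules_per_example = [sum(row) for row in coverage_matrix]
# Column sums: how many examples each rule covers.
examples_per_rule = [sum(col) for col in zip(*coverage_matrix)]

print(coverage_matrix)
```

Row and column sums of such a matrix give a quick view of overlapping rules and of examples that no rule covers.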
- get_metadata_routing() None
Warning
Scikit-learn metadata routing is not supported yet.
- Raises:
NotImplementedError – raised because scikit-learn metadata routing is not supported.
- get_params(deep: bool = True) dict[str, Any]
- Parameters:
deep (bool = True) – Parameter for scikit-learn compatibility. Not used.
- Returns:
hyperparameters – Dictionary containing model hyperparameters.
- Return type:
dict[str, Any]
- predict(values: ndarray | DataFrame | list, return_metrics: bool = False) ndarray | tuple[ndarray, ClassificationPredictionMetrics]
Performs prediction and returns predicted labels.
- Parameters:
values (rulekit.operator.Data) – attributes
return_metrics (bool = False) – Optional flag. If set to True, the method also calculates additional model metrics and returns a tuple instead of just the predicted labels.
- Returns:
result – If return_metrics is False, only the predictions are returned; otherwise a tuple is returned whose first element is the predictions and whose second element is the metrics.
- Return type:
Union[np.ndarray, tuple[np.ndarray, rulekit.classification.ClassificationPredictionMetrics]]
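Because the return value is either the predictions alone or a (predictions, metrics) tuple, calling code has to handle both shapes. The helper below is a hypothetical convenience function, not part of rulekit; it uses plain lists and a placeholder dictionary as stand-ins for the real array and metrics objects.

```python
def split_prediction(result):
    """Return (predictions, metrics), with metrics set to None if absent.

    Assumes, as the signature above states, that a tuple result always has
    the form (predictions, metrics) and that a non-tuple result is the
    predictions alone.
    """
    if isinstance(result, tuple):
        predictions, metrics = result
        return predictions, metrics
    return result, None


# Stand-in for a result obtained with return_metrics=False.
labels_only = ["a", "b", "a"]
# Stand-in for a result obtained with return_metrics=True.
placeholder_metrics = {"note": "placeholder for the metrics object"}
labels_with_metrics = (["a", "b", "a"], placeholder_metrics)

print(split_prediction(labels_only))
print(split_prediction(labels_with_metrics))
```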
- predict_proba(values: ndarray | DataFrame | list, return_metrics: bool = False) ndarray | tuple[ndarray, ClassificationPredictionMetrics]
Performs prediction and returns class probabilities for each example.
- Parameters:
values (rulekit.operator.Data) – attributes
return_metrics (bool = False) – Optional flag. If set to True, the method also calculates additional model metrics and returns a tuple instead of just the probabilities.
- Returns:
result – If return_metrics is False, only the probabilities matrix is returned; otherwise a tuple is returned whose first element is the probabilities matrix and whose second element is the metrics.
- Return type:
Union[np.ndarray, tuple[np.ndarray, rulekit.classification.ClassificationPredictionMetrics]]
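A probabilities matrix of this kind is often reduced to hard labels by picking the most probable class per row. The sketch below uses made-up values and assumes that column j corresponds to classes[j]; the column order and the tie-breaking rule used by the library are not stated above, so both are assumptions of this example.

```python
# Hypothetical class names, in the assumed column order.
classes = ["no", "yes"]

# Made-up probabilities: one row per example, one column per class.
probabilities = [
    [0.8, 0.2],
    [0.3, 0.7],
    [0.5, 0.5],
]


def most_probable_class(row, class_names):
    """Return the class with the highest probability in one row.

    Ties are resolved in favour of the first class, which is simply how
    Python's max() behaves here, not a documented library rule.
    """
    best_index = max(range(len(row)), key=lambda j: row[j])
    return class_names[best_index]


predicted = [most_probable_class(row, classes) for row in probabilities]
print(predicted)
```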
- score(values: ndarray | DataFrame | list, labels: ndarray | DataFrame | list) float
Return the accuracy on the given test data and labels.
- Parameters:
values (rulekit.operator.Data) – attributes
labels (rulekit.operator.Data) – true labels
- Returns:
score – Accuracy of self.predict(values) with respect to labels.
- Return type:
float
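Accuracy here is the fraction of examples whose predicted label equals the true label. The function below is a minimal stand-alone sketch of that definition in plain Python; it is not the library's implementation, and the input validation is an addition made for this example.

```python
def accuracy(predicted, true_labels):
    """Fraction of positions where predicted and true labels agree."""
    if len(predicted) != len(true_labels):
        raise ValueError("predicted and true_labels must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty dataset")
    correct = sum(1 for p, t in zip(predicted, true_labels) if p == t)
    return correct / len(true_labels)


# Made-up labels: three of the four predictions are correct.
predicted = ["a", "b", "a", "a"]
true_labels = ["a", "b", "b", "a"]
print(accuracy(predicted, true_labels))
```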
- set_params(**kwargs) object
Set the model's hyperparameters. Parameters are the same as in the constructor.
- class rulekit.classification.ExpertRuleClassifier(minsupp_new: float = 5.0, induction_measure: Measures = Measures.Correlation, pruning_measure: Measures | str = Measures.Correlation, voting_measure: Measures = Measures.Correlation, max_growing: float = 0.0, enable_pruning: bool = True, ignore_missing: bool = False, max_uncovered_fraction: float = 0.0, select_best_candidate: bool = False, complementary_conditions: bool = False, control_apriori_precision: bool = True, max_rule_count: int = 0, approximate_induction: bool = False, approximate_bins_count: int = 100, extend_using_preferred: bool = False, extend_using_automatic: bool = False, induce_using_preferred: bool = False, induce_using_automatic: bool = False, consider_other_classes: bool = False, preferred_conditions_per_rule: int = 2147483647, preferred_attributes_per_rule: int = 2147483647)
Classification model using expert knowledge.
- Parameters:
minsupp_new (float = 5.0) – minimum number (or fraction, if the value is < 1.0) of previously uncovered examples that a new rule must cover (positive examples for classification problems); default: 5.
induction_measure (rulekit.params.Measures = rulekit.params.Measures.Correlation) – measure used during induction; default measure is correlation.
pruning_measure (Union[rulekit.params.Measures, str] = rulekit.params.Measures.Correlation) – measure used during pruning. Can be user-defined (passed as a string), for example 2 * p / n; default measure is correlation.
voting_measure (rulekit.params.Measures = rulekit.params.Measures.Correlation) – measure used during voting; default measure is correlation.
max_growing (float = 0.0) – non-negative integer value representing the maximum number of conditions that can be added to a rule in the growing phase (use this parameter for large datasets if execution time is prohibitive); 0 indicates no limit; default: 0.
enable_pruning (bool = True) – enable or disable pruning, default is True.
ignore_missing (bool = False) – boolean telling whether missing values should be ignored (by default, a missing value of a given attribute is always considered as not fulfilling the condition built upon that attribute); default: False.
max_uncovered_fraction (float = 0.0) – Floating-point number from [0,1] interval representing maximum fraction of examples that may remain uncovered by the rule set, default: 0.0.
select_best_candidate (bool = False) – Flag determining if best candidate should be selected from growing phase; default: False.
complementary_conditions (bool = False) – If enabled, complementary conditions in the form a = !{value} for nominal attributes are supported.
control_apriori_precision (bool = True) – When inducing classification rules, verify that candidate precision is higher than the a priori precision of the investigated class.
max_rule_count (int = 0) – Maximum number of rules to be generated (for classification data sets it applies to a single class); 0 indicates no limit.
approximate_induction (bool = False) – Use an approximate induction heuristic which does not check all possible splits. Note: this is an experimental feature that currently works only for classification data sets; results may change in the future.
approximate_bins_count (int = 100) – maximum number of bins for an attribute evaluated in the approximate induction.
extend_using_preferred (bool = False) – boolean indicating whether initial rules should be extended with a use of preferred conditions and attributes; default is False
extend_using_automatic (bool = False) – boolean indicating whether initial rules should be extended with a use of automatic conditions and attributes; default is False
induce_using_preferred (bool = False) – boolean indicating whether new rules should be induced with a use of preferred conditions and attributes; default is False
induce_using_automatic (bool = False) – boolean indicating whether new rules should be induced with a use of automatic conditions and attributes; default is False
consider_other_classes (bool = False) – boolean indicating whether automatic induction should be performed for classes for which no user’s knowledge has been defined (classification only); default is False.
preferred_conditions_per_rule (int = 2147483647) – maximum number of preferred conditions per rule; default: 2147483647 (effectively unlimited).
preferred_attributes_per_rule (int = 2147483647) – maximum number of preferred attributes per rule; default: 2147483647 (effectively unlimited).
- add_event_listener(listener: RuleInductionProgressListener)
Add an event listener object to the operator, which allows rule induction progress to be monitored.
Example
>>> from rulekit.events import RuleInductionProgressListener
>>> from rulekit.classification import RuleClassifier
>>>
>>> class MyEventListener(RuleInductionProgressListener):
...     def on_new_rule(self, rule):
...         print('Do something with new rule', rule)
...
>>> operator = RuleClassifier()
>>> operator.add_event_listener(MyEventListener())
- Parameters:
listener (RuleInductionProgressListener) – listener object
- fit(values: ndarray | DataFrame | list, labels: ndarray | DataFrame | list, expert_rules: list[str | tuple[str, str]] | None = None, expert_preferred_conditions: list[str | tuple[str, str]] | None = None, expert_forbidden_conditions: list[str | tuple[str, str]] | None = None) ExpertRuleClassifier
Train model on given dataset.
- Parameters:
values (rulekit.operator.Data) – attributes
labels (rulekit.operator.Data) – labels
expert_rules (List[Union[str, Tuple[str, str]]]) – set of initial rules, passed either as a list of strings representing rules or as a list of tuples in which the first element is the name of the rule and the second is the rule string.
expert_preferred_conditions (List[Union[str, Tuple[str, str]]]) – multiset of preferred conditions (also used for specifying preferred attributes by using the special value Any). Passed either as a list of strings representing rules or as a list of tuples in which the first element is the name of the rule and the second is the rule string.
expert_forbidden_conditions (List[Union[str, Tuple[str, str]]]) – set of forbidden conditions (also used for specifying forbidden attributes by using the special value Any). Passed either as a list of strings representing rules or as a list of tuples in which the first element is the name of the rule and the second is the rule string.
- Returns:
self
- Return type:
ExpertRuleClassifier
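The expert knowledge arguments accept either bare rule strings or (name, rule string) tuples. The sketch below shows one way to bring a mixed list into a uniform (name, rule) form before passing it on. The helper, the generated names, and the rule strings themselves are illustrative assumptions; in particular, the rule syntax shown is not specified in this section.

```python
from typing import List, Tuple, Union

# Either a bare rule string or a (name, rule string) pair.
ExpertEntry = Union[str, Tuple[str, str]]


def normalize_expert_entries(
    entries: List[ExpertEntry], prefix: str
) -> List[Tuple[str, str]]:
    """Give every entry a name, generating one for bare strings.

    Hypothetical helper: generated names have the form "<prefix>-<position>".
    """
    normalized = []
    for index, entry in enumerate(entries, start=1):
        if isinstance(entry, str):
            normalized.append((f"{prefix}-{index}", entry))
        else:
            name, rule = entry
            normalized.append((name, rule))
    return normalized


# Illustrative rule strings; the attribute name and syntax are assumptions.
expert_rules = [
    "IF [[gimpuls = (-inf, 750)]] THEN class = {0}",
    ("high-energy", "IF [[gimpuls = (750, inf)]] THEN class = {1}"),
]

named_rules = normalize_expert_entries(expert_rules, prefix="rule")
for name, rule in named_rules:
    print(name, "->", rule)
```

Naming every entry up front can make it easier to trace which expert rule or condition a later message refers to.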
- get_coverage_matrix(values: ndarray | DataFrame | list) ndarray
Calculates coverage matrix for ruleset.
- Parameters:
values (rulekit.operator.Data) – dataset
- Returns:
coverage_matrix – Each row of the matrix represents a single example from the dataset and each column represents one rule from the rule set. A value of 1 in a cell means that the rule covers the example; a value of 0 means that it does not.
- Return type:
np.ndarray
- get_metadata_routing() None
Warning
Scikit-learn metadata routing is not supported yet.
- Raises:
NotImplementedError – raised because scikit-learn metadata routing is not supported.
- get_params(deep: bool = True) dict[str, Any]
- Parameters:
deep (bool = True) – Parameter for scikit-learn compatibility. Not used.
- Returns:
hyperparameters – Dictionary containing model hyperparameters.
- Return type:
dict[str, Any]
- predict(values: ndarray | DataFrame | list, return_metrics: bool = False) ndarray | tuple[ndarray, ClassificationPredictionMetrics]
Performs prediction and returns predicted labels.
- Parameters:
values (rulekit.operator.Data) – attributes
return_metrics (bool = False) – Optional flag. If set to True, the method also calculates additional model metrics and returns a tuple instead of just the predicted labels.
- Returns:
result – If return_metrics is False, only the predictions are returned; otherwise a tuple is returned whose first element is the predictions and whose second element is the metrics.
- Return type:
Union[np.ndarray, tuple[np.ndarray, rulekit.classification.ClassificationPredictionMetrics]]
- predict_proba(values: ndarray | DataFrame | list, return_metrics: bool = False) ndarray | tuple[ndarray, ClassificationPredictionMetrics]
Performs prediction and returns class probabilities for each example.
- Parameters:
values (rulekit.operator.Data) – attributes
return_metrics (bool = False) – Optional flag. If set to True, the method also calculates additional model metrics and returns a tuple instead of just the probabilities.
- Returns:
result – If return_metrics is False, only the probabilities matrix is returned; otherwise a tuple is returned whose first element is the probabilities matrix and whose second element is the metrics.
- Return type:
Union[np.ndarray, tuple[np.ndarray, rulekit.classification.ClassificationPredictionMetrics]]
- score(values: ndarray | DataFrame | list, labels: ndarray | DataFrame | list) float
Return the accuracy on the given test data and labels.
- Parameters:
values (rulekit.operator.Data) – attributes
labels (rulekit.operator.Data) – true labels
- Returns:
score – Accuracy of self.predict(values) with respect to labels.
- Return type:
float
- set_params(**kwargs) object
Set the model's hyperparameters. Parameters are the same as in the constructor.