{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# What's new in RuleKit version 1.7.6?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Manually initializing RuleKit is no longer necessary.\n", "\n", "Prior to this version, RuleKit had to be manually initialized using the `rulekit.RuleKit.init` method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from rulekit import RuleKit\n", "from rulekit.classification import RuleClassifier\n", "\n", "RuleKit.init()  # explicit initialization, required before 1.7.6\n", "\n", "clf = RuleClassifier()\n", "clf.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This call is no longer necessary, and you can simply use any of the RuleKit operators directly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from rulekit.classification import RuleClassifier\n", "\n", "clf = RuleClassifier()\n", "clf.fit(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Introducing negated conditions for nominal attributes in rules.\n", "\n", "The new `complementary_conditions` parameter enables the induction of negated conditions for nominal attributes. Such conditions have the form **attribute = !{value}**, meaning the attribute takes any value other than the listed one. This parameter has been added to all operator classes." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "IF stalk_surface_below_ring = !{y} AND spore_print_color = !{u} AND odor = !{n} AND stalk_root = !{c} THEN type = {p}\n", "IF bruises = {f} AND odor = !{n} THEN type = {p}\n", "IF stalk_surface_above_ring = {k} AND gill_spacing = {c} THEN type = {p}\n", "IF bruises = {f} AND stalk_surface_above_ring = !{f} AND stalk_surface_below_ring = !{f} AND ring_number = !{t} AND stalk_root = !{e} AND gill_attachment = {f} THEN type = {p}\n", "IF stalk_surface_below_ring = !{f} AND stalk_color_below_ring = !{n} AND spore_print_color = !{u} AND odor = !{a} AND gill_size = {n} AND cap_surface = !{f} THEN type = {p}\n", "IF cap_shape = !{s} AND cap_color = !{c} AND habitat = !{w} AND stalk_color_below_ring = !{g} AND stalk_surface_below_ring = !{y} AND spore_print_color = !{n} AND gill_spacing = {c} AND gill_color = !{u} AND stalk_root = !{c} AND stalk_color_above_ring = !{g} AND ring_type = !{f} AND veil_color = {w} THEN type = {p}\n", "IF cap_shape = !{c} AND stalk_surface_below_ring = !{y} AND spore_print_color = !{r} AND odor = {n} AND cap_surface = !{g} THEN type = {e}\n", "IF cap_color = !{y} AND cap_shape = !{c} AND stalk_color_below_ring = !{y} AND spore_print_color = !{r} AND odor = {n} AND cap_surface = !{g} THEN type = {e}\n", "IF spore_print_color = !{r} AND odor = !{f} AND stalk_color_above_ring = !{c} AND gill_size = {b} THEN type = {e}\n", "IF cap_color = !{p} AND cap_shape = !{c} AND habitat = !{u} AND stalk_color_below_ring = !{y} AND gill_color = !{b} AND spore_print_color = !{r} AND ring_number = !{n} AND odor = !{f} AND cap_surface = !{g} THEN type = {e}\n" ] } ], "source": [ "import pandas as pd\n", "from rulekit.classification import RuleClassifier\n", "\n", "df = pd.read_csv('https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/mushrooms.csv')\n", "X = df.drop('type', axis=1)\n", "y = 
df['type']\n", "\n", "clf = RuleClassifier(complementary_conditions=True)\n", "clf.fit(X, y)\n", "\n", "for rule in clf.model.rules:\n", " print(rule)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Approximate induction for classification rule sets.\n", "\n", "To reduce training time on classification datasets, the so-called *approximate induction* can now be used. It keeps the algorithm from checking every possible numerical condition during the rule induction phase. You can configure the number of bins used as candidate splits to limit the computation.\n", "\n", "To enable *approximate induction*, use the `approximate_induction` parameter. To configure the maximum number of bins, use the `approximate_bins_count` parameter. At the moment, *approximate induction* is only available for classification rule sets.\n", "\n", "The following example shows how using this feature can reduce training time without sacrificing predictive accuracy." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | Variant | Training time [s] | BAcc on test dataset |
|---|---|---|---|
| 0 | Without approximate induction | 5.730046 | 0.688744 |
| 1 | With approximate induction | 0.142259 | 0.703959 |