Summary by Cubs Reading Group
#### Problem addressed:
Neural nets are good at learning complex functions, but they are black boxes: they cannot easily be used to explain why a particular decision was made. This paper addresses the problem by learning a decision tree that approximates the hypothesis learned by the NN.
#### Summary:
While NNs have been shown to learn complex functions efficiently, we cannot easily recover the logic behind why a particular test sample is classified into a particular class. Thus, when a network makes a mistake, we cannot identify the reason behind it in order to rectify it. This raises critical concerns regarding the reliability of such learning algorithms. This paper introduces the idea of inducing a decision tree that approximates the hypothesis of the trained network. This has 2 advantages:
1. The decision-tree learner can query the trained NN for labels on as many instances as it needs, since the task is to learn the network's hypothesis rather than the original target concept.
2. This means many more training examples are available at the leaf nodes than in conventional tree-induction algorithms (a minimal sketch of this oracle-querying idea follows the list).
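To make the oracle-querying idea concrete, here is a minimal sketch, not the authors' code: the toy problem, layer sizes, and sample counts are arbitrary choices of mine, and a standard axis-aligned tree stands in for the paper's algorithm, which differs in its use of m-of-n splits and best-first expansion. The point is only that the tree is fit to labels produced by the trained network on freshly sampled instances.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stand-in "trained network" on a toy 2-feature problem.
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)

# Query the network as an oracle on freshly sampled instances; the tree
# trains on these network-assigned labels, so its nodes are never starved of data.
X_query = rng.uniform(-1, 1, size=(5000, 2))
y_oracle = net.predict(X_query)

tree = DecisionTreeClassifier(max_depth=3).fit(X_query, y_oracle)
print("fidelity to the network:", tree.score(X_query, y_oracle))
```

The score printed here measures fidelity to the network rather than accuracy on the original labels, which is exactly the quantity the induced tree is meant to maximize.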
Another difference from traditional methods is that node splits use 'm-of-n expressions': each test consists of n conditions, and an example follows one branch if at least m of them are satisfied, the other branch otherwise. These tests are constructed in a greedy manner.
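A rough illustration of what an m-of-n test looks like; the feature names and thresholds below are hypothetical and not taken from the paper.

```python
def m_of_n_test(x, conditions, m):
    """Return True if at least m of the n boolean conditions hold for input x."""
    return sum(cond(x) for cond in conditions) >= m

# Example: a 2-of-3 test over three (made-up) feature thresholds.
conditions = [
    lambda x: x["age"] > 50,
    lambda x: x["cholesterol"] > 240,
    lambda x: x["resting_bp"] > 140,
]
patient = {"age": 63, "cholesterol": 230, "resting_bp": 150}
print(m_of_n_test(patient, conditions, m=2))  # True: 2 of the 3 conditions hold
```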
A node is evaluated by its fidelity to the NN combined with how many training examples reach it: a node reached by many examples whose current label frequently disagrees with the network is not yet doing its job of classification, so it is prioritized for further splitting.
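As a hedged sketch of this evaluation criterion, one can score a node by the product of the fraction of examples that reach it and its disagreement with the network; this particular product form is my reading of the best-first priority, not something stated in the summary above.

```python
import numpy as np

def priority(n_reaching, n_total, net_labels, node_label):
    """Best-first expansion score: high when many examples reach the node
    but the node's current label often disagrees with the network."""
    reach = n_reaching / n_total                          # fraction of data reaching the node
    fidelity = float(np.mean(net_labels == node_label))   # agreement with the NN at this node
    return reach * (1.0 - fidelity)

# A node reached by 400 of 1000 examples, where the NN predicts class 1 on 70%
# of them but the node currently labels them class 0, gets a high score.
net_labels = np.array([1] * 280 + [0] * 120)
print(priority(400, 1000, net_labels, node_label=0))  # 0.4 * 0.7 = 0.28
```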
#### Novelty:
Learning a decision tree to approximate the hypothesis learned by a NN, and the algorithm used to induce that tree.
#### Drawbacks:
1. As this paper is from 1996, the empirical results are weak by today's standards: the dataset sizes are in the hundreds and the feature dimensions are under 100.
2. The algorithm is also not explained very clearly.
3. No qualitative results are shown for the core claim of explaining the decisions of a NN.
#### Datasets:
Congressional voting dataset and Cleveland heart-disease dataset (both from the UC Irvine repository), plus an unnamed dataset for recognizing protein-coding regions in DNA.
#### Additional remarks:
Since every node tests raw input features and the reasoning is expressed directly in terms of them, this algorithm seems most suitable for domains with interpretable features, such as medical data.
A good extension of this work would be to learn a decision tree that approximates a NN's hypothesis in a more semantic feature space.
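One way to picture that suggested extension, purely as an assumption-laden sketch (sklearn stand-ins, arbitrary toy data, not anything proposed in the paper): fit the tree on the network's own hidden-layer activations instead of the raw inputs, still using the network's predictions as targets so the tree approximates the network's hypothesis.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 4))
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(int)
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=3000).fit(X, y)

# Map inputs into the network's hidden-layer ("semantic") feature space
# by applying the first layer's weights, biases, and relu activation.
hidden = np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])

# Fit the tree on the learned features, with the net's own predictions as targets.
tree = DecisionTreeClassifier(max_depth=3).fit(hidden, net.predict(X))
print("fidelity in hidden-feature space:", tree.score(hidden, net.predict(X)))
```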
#### Presenter:
Devansh Arpit