Metrics for Splitting
- Gini Impurity: Measures the likelihood of incorrectly classifying a new instance if it were labeled randomly according to the distribution of classes in the dataset.
- $\text{Gini} = 1 - \sum_{i=1}^{n} (p_i)^2$, where $p_i$ is the probability of an instance being classified into a particular class.
- Entropy: Measures the amount of uncertainty or impurity in the dataset.
- $\text{Entropy} = -\sum_{i=1}^{n} p_i \log_2 (p_i)$, where $p_i$ is the probability of an instance being classified into a particular class.
- Information Gain: Measures the reduction in entropy or Gini impurity after a dataset is split on an attribute.
- $\text{Information Gain} = \text{Entropy}_{\text{parent}} - \sum_{i=1}^{n} \frac{|D_i|}{|D|} \cdot \text{Entropy}(D_i)$, where $D_i$ is the subset of $D$ obtained by splitting on an attribute; all three metrics are computed in the sketch below.
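To make these formulas concrete, here is a minimal Python sketch that computes all three metrics for a toy binary split. The data, function names, and the 4/6 class split are illustrative assumptions, not taken from the article:

```python
import numpy as np

def gini(labels):
    # Gini = 1 - sum(p_i^2), using class frequencies as probabilities
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy = -sum(p_i * log2(p_i)); p_i > 0 is guaranteed by np.unique
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, subsets):
    # Gain = Entropy(parent) - sum(|D_i|/|D| * Entropy(D_i))
    n = len(parent)
    weighted = sum(len(d) / n * entropy(d) for d in subsets)
    return entropy(parent) - weighted

# Toy node with 4 instances of class 0 and 6 of class 1 (illustrative data)
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]  # a perfectly pure split

print(gini(parent))                              # 0.48
print(entropy(parent))                           # ~0.971 bits
print(information_gain(parent, [left, right]))   # ~0.971, since both children are pure
```

A pure split drives the children's entropy to zero, so the information gain equals the parent's entropy, the maximum possible for this node.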
Decision Tree
Decision trees are a popular and powerful tool used in fields such as machine learning, data mining, and statistics. They provide a clear, intuitive way to make decisions based on data by modeling the relationships between variables. This article covers what decision trees are, how they work, their advantages and disadvantages, and their applications.