# Boosting in Machine Learning: An AdaBoost Guide

**Boosting** is one of the most widely used classes of algorithms in machine learning, applied globally to tackle a variety of complex problems. It involves combining multiple **weak learners**—typically simple models that perform just slightly better than random guessing—to collectively form a strong predictive model. Each learner is trained sequentially, with a focus on correcting the mistakes made by its predecessors. Misclassified or hard-to-learn data points are given more importance in subsequent rounds. In this blog, we’ll explore one of the simplest and most well-known boosting algorithms, **AdaBoost** (Adaptive Boosting). We'll also implement it from scratch and demonstrate how it can significantly improve model accuracy.

Before diving into AdaBoost, let’s first understand the core principle of boosting through the following illustration.

[![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/boost.png?w=544 align="left")](https://criticalmind.tech.blog/wp-content/uploads/2021/09/9ca3e-1m2uhkzwwj0kfqyl5tbfnsq.png)

In the illustration above, Boxes 1, 2, and 3 represent classifications made by individual models—D1, D2, and D3—each of which is a weak learner and performs poorly when used alone. However, when these models are combined, as shown in Box 4, they work together to make highly accurate predictions.

This is the core idea behind **AdaBoost** as well. In AdaBoost, every training instance carries a weight, and these weights are updated after each round according to whether the current model classified the instance correctly. Incorrectly classified instances are given higher weights, making them more influential in the next iteration. Each model also receives its own score, the "amount of say", which reflects how accurate it was.

We’ll walk through each step of the AdaBoost algorithm and its implementation. For practical understanding, we’ll use the **Breast Cancer** dataset from the `sklearn` library. Let's start by loading and visualizing the dataset.

```python
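import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()  # load once and reuse
df = pd.DataFrame(data.data, columns=data.feature_names)
df['label'] = data.target
copy = df.copy()  # untouched copy, used later for the final evaluation

df.head()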
```

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/da1.png?w=951 align="left")

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/da2.png?w=399 align="left")

Since the dataset contains a large number of features, we’ll primarily focus on the `label` column along with the new columns that will be appended next to it during the boosting process.

To begin, let’s use a **Decision Tree** as our initial weak learner to make a first-round prediction and observe the accuracy achieved.

```python
dt = DecisionTreeClassifier(random_state=50, min_samples_split=100)
dt.fit(df.iloc[:,:-1],df.iloc[:,-1])
df['prediction'] = dt.predict(df.iloc[:,:-1])
df.iloc[:5,-2:]
```

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/pr-1.png?w=139 align="left")

```bash
Accuracy = 0.945518453427065
```

The single tree reaches an accuracy of around 95% (0.9455) on the data it was trained on.

Let’s now apply AdaBoost and see whether it improves the model's accuracy.

## Step 1

Assign an initial weight of 1/m to each data point, where m is the total number of data points in the dataset.

```python
df['weight'] = 1/len(df)
df.iloc[:5,-3:]
```

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/wei.png?w=199 align="left")

## Step 2

Calculate the number of incorrectly classified data points.

```python
	
no_of_errors = len(df[df.label != df.prediction])
no_of_errors
```

**There are 31 incorrectly classified data points.**

Next, let's calculate the total error, where the total error is given by:

**total error = no\_of\_errors/total number of data points**

```python
total_errors = no_of_errors/len(df)
total_errors
```

The total error is calculated to be **0.0545**.
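As a quick sanity check, the total error and the baseline accuracy are two views of the same count. The sketch below plugs in the 31 misclassified points reported above and the 569 rows of the Breast Cancer dataset; the variable names are illustrative only.

```python
# Figures from the walkthrough: 31 errors out of 569 rows.
n_samples = 569
n_errors = 31

total_error = n_errors / n_samples
accuracy = 1 - total_error

print(round(total_error, 4))  # 0.0545
print(round(accuracy, 4))     # 0.9455
```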

## Step 3

Calculate the amount of say (α) from the total error:

**α = ½ × ln((1 – total error) / total error)**

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/say.png?w=976 align="left")

```python
alpha = 0.5 * np.log((1-total_errors)/total_errors)
alpha
```

With 31 errors out of 569 data points, the amount of say comes out to approximately 1.427.
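To get a feel for how the amount of say responds to the error rate, here is a small sketch using only the standard library. The helper name `amount_of_say` is hypothetical; it simply wraps the same formula used in the code above.

```python
import math

def amount_of_say(total_error):
    """Return alpha = 0.5 * ln((1 - error) / error) for 0 < error < 1."""
    return 0.5 * math.log((1 - total_error) / total_error)

# A learner no better than a coin flip gets no say at all.
print(amount_of_say(0.5))          # 0.0
# The first learner of the walkthrough: 31 errors out of 569 points.
print(amount_of_say(31 / 569))
# A learner that is wrong more often than right gets a negative say.
print(amount_of_say(0.7) < 0)      # True
```

The lower the error, the larger α becomes, so accurate learners trigger stronger weight updates in the next step.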

## Step 4

**The weight update is performed using the following rules:**

* For a correct prediction:
    
    **updated weight = old weight × e<sup>–α</sup>**
    
* For an incorrect prediction:
    
    **updated weight = old weight × e<sup>α</sup>**
    

```python
df['weight_updated'] = df.loc[df.label != df.prediction].weight * np.exp(alpha)
df.weight_updated = df['weight_updated'].fillna(df[df.label == df.prediction].weight * np.exp(-alpha))
df.iloc[:5,-4:]
```

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/upda.png?w=300 align="left")

The updated weights are then normalized, ensuring that the sum of all the weights in the column equals 1.

```python
df.weight_updated = df.weight_updated/df.weight_updated.sum()
df.iloc[:5,-4:]
```

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/nor.png?w=304 align="left")

Now, we can observe that in the updated weights column, the values for the incorrectly classified data points are significantly higher compared to those for the correctly classified ones.
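The update and normalization can also be worked through by hand. The sketch below assumes the figures from this walkthrough (31 misclassified points out of 569) and uses illustrative variable names; it is a check on the arithmetic, not a replacement for the DataFrame code.

```python
import math

n_samples, n_errors = 569, 31          # figures from the walkthrough
alpha = 0.5 * math.log((n_samples - n_errors) / n_errors)

w_wrong = (1 / n_samples) * math.exp(alpha)    # misclassified point
w_right = (1 / n_samples) * math.exp(-alpha)   # correctly classified point

# Normalize so that all 569 weights sum to 1.
total = n_errors * w_wrong + (n_samples - n_errors) * w_right
w_wrong, w_right = w_wrong / total, w_right / total

print(round(w_wrong, 6))   # 0.016129
print(round(w_right, 6))   # 0.000929
```

After normalization the 31 misclassified points together hold exactly half of the total weight (31 × 1/62), which is why they dominate the next round even though they are few.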

## Step 5

Next, a range is created for each updated weight from the cumulative sum of the values in the column. For example, the range for index 0 is '0 to 0.016129', the range for index 1 is '0.016129 to (0.016129 + 0.000929)', the range for index 2 is '(0.016129 + 0.000929) to ((0.016129 + 0.000929) + 0.000929)', and so on.

In this manner, the upper bound of the range for the last index will equal 1, because the weights have been normalized to sum to 1.
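The ranges described above can be sketched with a cumulative sum. The example below uses a toy array of four normalized weights rather than the real DataFrame; the names `weights`, `lower`, and `upper` are illustrative.

```python
import numpy as np

# Toy normalized weights: index 0 was misclassified, the rest were not.
weights = np.array([0.4, 0.2, 0.2, 0.2])

upper = np.cumsum(weights)                   # upper bound of each range
lower = np.concatenate(([0.0], upper[:-1]))  # lower bound of each range

for i, (lo, hi) in enumerate(zip(lower, upper)):
    print(f"index {i}: {lo:.1f} to {hi:.1f}")
```

In the walkthrough's DataFrame, assuming the `weight_updated` column from Step 4, the same upper bounds can be stored with `df['ranges'] = df.weight_updated.cumsum()`. This is the `ranges` column that the resampling code in Step 6 reads.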

## Step 6

Resampling of data is performed by selecting a random number between 0 and 1. The range in which this number falls determines which index from the 'df' dataframe is included in the new resampled dataframe. Since the weights of the incorrectly predicted data points are higher, the corresponding ranges will also be larger. As a result, many of the randomly chosen numbers will fall within the ranges of the incorrectly predicted data points. Consequently, these data points will be repeated more frequently in the resampled dataframe, giving them more priority in subsequent iterations.

```python
resampled = pd.DataFrame(columns=df.columns[:31])
for i in range(len(df)):
    index = df[df.ranges == df[np.random.rand()<df.ranges].ranges.min()].index
    resampled.loc[i] = list(df.iloc[index,:31].values[0])
     
resampled.head()
```

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/res.png?w=944 align="left")

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/res2.png?w=394 align="left")
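The row-by-row loop above is easy to follow but slow. As a hedged alternative, the same range lookup can be sketched in vectorized form with NumPy. The toy weights and the seed below are assumptions for illustration, not values from the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)               # seeded for repeatability

weights = np.array([0.4, 0.2, 0.2, 0.2])     # toy normalized weights
upper = np.cumsum(weights)                   # range upper bounds (Step 5)

draws = rng.random(10_000)                   # random numbers in [0, 1)
# For each draw, find the first range whose upper bound exceeds it.
picked = np.searchsorted(upper, draws, side="right")
# Guard against float round-off leaving the last bound just under 1.
picked = np.minimum(picked, len(weights) - 1)

counts = np.bincount(picked, minlength=len(weights))
print(counts / len(draws))                   # close to the weights
```

Index 0, which has the largest weight, is picked most often, which is exactly the effect the resampling step relies on. `rng.choice(len(weights), size=10_000, p=weights)` would produce the same kind of weighted sample in a single call.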

The resampled data is then processed in the same way, starting again from step 1. These 6 steps are repeated until the total error becomes zero or a preset maximum number of iterations is reached. To automate this process, let's build a function that combines all the above steps into a single iteration.

```python
def adaboost(df):
    # Fit a weak learner on the 30 feature columns and predict on the same data
    dt = DecisionTreeClassifier(random_state=50, min_samples_split=100)
    dt.fit(df.iloc[:,:30],df.iloc[:,30])
    df['prediction'] = dt.predict(df.iloc[:,:30])

    # Step 1: equal initial weights
    df['weight'] = 1/len(df)

    # Step 2: total error
    no_of_errors = len(df[df.label != df.prediction])
    total_errors = no_of_errors/len(df)

    # Step 3: amount of say (raises ZeroDivisionError when total_errors is 0)
    alpha = 0.5 * np.log((1-total_errors)/total_errors)

    # Step 4: raise the weights of misclassified points, lower the rest, normalize
    df['weight_updated'] = df.loc[df.label != df.prediction].weight * np.exp(alpha)
    df.weight_updated = df['weight_updated'].fillna(df[df.label == df.prediction].weight * np.exp(-alpha))
    df.weight_updated = df.weight_updated/df.weight_updated.sum()

    # Step 5: cumulative ranges (upper bound of each point's range)
    p = 0
    for i in range(len(df)):
        df.loc[i,'ranges'] = df.loc[i,'weight_updated'] + p
        p = df.loc[i,'ranges']

    # Step 6: weighted resampling of the 31 original columns (features + label)
    resampled = pd.DataFrame(columns=df.columns[:31])
    for i in range(len(df)):
        index = df[df.ranges == df[np.random.rand()<df.ranges].ranges.min()].index
        resampled.loc[i] = list(df.iloc[index,:31].values[0])

    df = resampled

    return [df, dt]
```

The above function runs one boosting iteration and returns the resampled DataFrame together with the model trained in that iteration. The loop below calls it up to 20 times, keeping the latest resampled DataFrame and collecting the trained models in a list.

```python
df = copy.copy()

models = []

try:
    for i in range(20):
        ada = adaboost(df)
        df = ada[0]
        models.append(ada[1])
        print('Decision stump {0}'.format(i+1))

except Exception:
    # Stop early if adaboost() fails, e.g. with a division by zero
    # in alpha once the total error reaches 0
    pass
```

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/dec-1.png?w=154 align="left")

We have obtained 10 decision stumps (weak learners), which will now be used collectively to make future predictions. These models will be applied to the same dataset on which we initially observed an accuracy of 95%. Let's aggregate the outputs from all these models to evaluate the performance of the boosted ensemble.

```python
pred = np.zeros(len(copy))  # one running vote total per original data point
for i in range(len(models)):
    pred += models[i].predict(copy.iloc[:,:-1])
 
pred
```

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/out-1.png?w=550 align="left")

These values represent the aggregated predictions from all the models. Since each model outputs either a 0 or 1, a value like 2 in the array indicates that 2 models classified the instance as class 1, while a value of 6 means 6 out of the 10 models predicted class 1 for that particular instance.

Based on the number of models used, a threshold is set at half that number. If the aggregate output for any data point exceeds this threshold, it is classified as 1; otherwise, it is classified as 0. In our case, since we used 10 models, any output value greater than 5 is considered class 1, and the rest are classified as class 0.

```python
threshold = len(models)/2
vec = np.vectorize(lambda x: 1 if x>threshold else 0)
final_prediction = vec(pred)
final_prediction
```

![](https://criticalmind.tech.blog/wp-content/uploads/2021/09/out2.png?w=669 align="left")

Now, using the output above, we calculate the accuracy.

```python
copy['final_prediction'] = final_prediction
 
print('Accuracy =',accuracy_score(copy.label, copy.final_prediction))
```

```bash
Accuracy = 0.9753954305799648
```

This time we obtained an accuracy of about 97.5%.

Thus, boosting has raised the accuracy on this dataset from 0.9455 with a single tree to 0.9754 with the ensemble.
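The vote above counts every learner equally. In the standard formulation of AdaBoost, the amount of say from Step 3 also weights each learner's vote: labels are encoded as -1/+1 and the final prediction is the sign of the α-weighted sum. The sketch below illustrates the difference on hypothetical predictions and hypothetical α values; none of these numbers come from the walkthrough.

```python
import numpy as np

# Hypothetical predictions of three learners on four points, encoded as -1/+1.
predictions = np.array([
    [+1, -1, +1, -1],   # learner 1
    [-1, -1, +1, +1],   # learner 2
    [-1, +1, +1, +1],   # learner 3
])
# Hypothetical amounts of say: learner 1 is trusted the most.
alphas = np.array([1.4, 0.5, 0.3])

weighted_sum = alphas @ predictions          # sum of alpha_t * h_t(x) per point
weighted_vote = np.where(weighted_sum > 0, 1, -1)
majority_vote = np.where(predictions.sum(axis=0) > 0, 1, -1)

print(weighted_vote)   # [ 1 -1  1 -1]
print(majority_vote)   # [-1 -1  1  1]
```

On the first and last points the highly trusted learner overrides the other two, which a plain majority vote would not allow.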

*Point to note:* This entire process is purely for demonstration and conceptual understanding. It is **not recommended** to use this manual implementation for solving real-world problems. For practical purposes, you should use the automated and optimized `AdaBoostClassifier` provided by the `sklearn` library.
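For reference, a minimal sketch of the `sklearn` route might look like the following. The train/test split, `n_estimators=50`, and `random_state=0` are illustrative assumptions rather than tuned or recommended values, and the score will differ from the figures above because it is measured on held-out data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
# Hold out a test set so the score is not measured on the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# n_estimators and random_state are illustrative choices, not tuned values.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)
print(test_accuracy)
```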

I hope I was able to explain the algorithm clearly enough for you to understand and experiment with.  
Your valuable feedback is always appreciated!

<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><em>For more insights, projects, and articles, visit my portfolio at </em><a target="_new" rel="noopener" class="" href="https://www.tuhindutta.com" style="pointer-events: none"><em>www.tuhindutta.com</em></a><em>.</em></div>
</div>
