ISME

Explore - Experience - Excel

Introduction to Random Forest and Its Applications – Prof. Sanjit Kumar Ghosh

29th January 2026

https://medium.com/@sanjit.kr.ghosh12/introduction-to-random-forest-and-its-applications-ae3fb2bffd2f

Course Relevance: This caselet will be useful for PGDM Analytics (especially the Advanced Machine Learning course) and for MCA and BCA Artificial Intelligence courses. 

Academic Concepts: In this caselet, students will gain an understanding of what Random Forest is, its advantages and disadvantages, and how to implement a Random Forest problem using Python. 

Teaching Note:  

The caselet presents the concept of a Random Forest along with its step-by-step implementation in Python. It outlines the key advantages and limitations of the approach while emphasizing its business relevance. A Python program is included to demonstrate house price prediction. Additionally, the caselet highlights diverse applications across domains such as healthcare, retail, finance, environmental science, and engineering. 

Learning Objectives: 

By the end of this caselet, students should be able to: 

  • Understand the fundamentals of Random Forest in machine learning. 
  • Implement Random Forest step by step using Python. 
  • Evaluate the advantages and disadvantages of Random Forest. 
  • Recognize the business relevance of Random Forest. 
  • Apply Random Forest to predict house prices with Python. 
  • Identify applications of Random Forest in healthcare, retail, finance, environmental science, and engineering. 

Introduction: 

Machine learning plays an important role in decision making and analytics. In my previous blog I discussed the decision tree and its applications in business analytics. As a logical extension, in this blog I will explain another key machine learning algorithm, "Random Forest", and its various uses in industry. 

Random Forest is one of the most powerful algorithms used in machine learning for data-driven analysis. It is a versatile and reliable tool in analytics, and one of its main attractions is that it is comparatively easy to understand. It can be used to solve both classification and regression problems. Let us explore Random Forest in more detail. 

What is Random Forest? 

Random Forest is a supervised machine learning algorithm based on ensemble learning. Because it combines multiple decision trees, each trained on a different sample of the data, we use the term "forest". This is the essence of ensemble learning: multiple models are combined to solve a complex problem with greater efficiency and accuracy, the power of a collective of models over a single model. For classification and regression, the trees are aggregated in different ways. 

For classification problems, we use a voting method: the final prediction is determined by the majority vote across the trees. 

For regression problems, we average the predictions from all the trees in the forest to obtain the final output. 

To make the model accurate and to avoid overfitting, the trees must be diverse, and this diversity is achieved through randomness in data sampling and feature selection. 
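As a quick illustration (this sketch is an addition, not part of the original blog; the synthetic data and parameter values are assumed), the two aggregation modes can be seen side by side with scikit-learn: 

# Minimal sketch of the two modes (synthetic data, assumed for illustration) 
import numpy as np 
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor 

rng = np.random.default_rng(0) 
X = rng.normal(size=(200, 4)) 

# Classification: every tree casts a vote and the majority class wins 
y_class = (X[:, 0] + X[:, 1] > 0).astype(int) 
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_class) 
print("Voted classes:", clf.predict(X[:3])) 

# Regression: every tree predicts a number and the forest reports the average 
y_reg = 3 * X[:, 0] + rng.normal(scale=0.1, size=200) 
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y_reg) 
print("Averaged values:", reg.predict(X[:3])) 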

Steps involved in implementing Random Forest: 

Let us discuss how we can implement Random Forest step by step: 

  1. Bagging (the bootstrap sample): In this first step, the algorithm draws random subsets, with replacement, from the available training dataset; each subset is used to train a different decision tree. 

  2. Random feature selection: For each split in a tree, only a random subset of features is considered. This step is important because it keeps the trees from becoming too similar to one another, ensures more diversity across the forest, and allows less common features to contribute to the splitting process. 

  3. Tree construction: Each decision tree is grown independently until it meets a stopping criterion or the maximum allowed depth. 

  4. Prediction through aggregation: Random Forest handles classification and regression with different strategies. For classification the final outcome is decided by majority voting; for regression it is simply the average of the predictions from all the trees. A hands-on sketch of these four steps is shown below. 
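As a rough, hand-rolled illustration of these four steps (an addition to the caselet: the data, the number of trees, and the per-tree feature subset are all assumptions, and note that scikit-learn's Random Forest actually re-samples features at every split rather than once per tree): 

# Sketch: the four steps done by hand with plain decision trees 
import numpy as np 
from sklearn.tree import DecisionTreeRegressor 

rng = np.random.default_rng(42) 
X = rng.uniform(0, 10, size=(100, 5)) 
y = X[:, 0] * 2 + X[:, 1] + rng.normal(scale=0.5, size=100) 

trees = [] 
n_trees, n_subset = 25, 3          # assumed values, not from the caselet 
for _ in range(n_trees): 
    # Step 1 (bagging): bootstrap sample -- draw rows with replacement 
    rows = rng.integers(0, len(X), size=len(X)) 
    # Step 2: random subset of features for this tree (simplified) 
    cols = rng.choice(X.shape[1], size=n_subset, replace=False) 
    # Step 3: grow one tree independently on its own bootstrap sample 
    tree = DecisionTreeRegressor(random_state=0).fit(X[rows][:, cols], y[rows]) 
    trees.append((tree, cols)) 

# Step 4 (aggregation): average the predictions of all trees (regression case) 
x_new = X[:1] 
forest_prediction = np.mean([t.predict(x_new[:, c])[0] for t, c in trees]) 
print("Averaged prediction:", forest_prediction) 

In practice RandomForestRegressor performs all four steps internally; the loop above is only meant to make the mechanics visible. 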

Advantages of Random Forest: 

  • High level of accuracy: Because multiple decision trees are combined, variance is reduced and the accuracy of the overall prediction increases. 
  • Works with large datasets: Random Forest can handle huge datasets with many features, since the work is spread across multiple trees. 
  • Prevents overfitting: The randomness and diversity introduced during sampling make the model much less prone to overfitting. 
  • Feature importance: Random Forest provides a measure of feature importance, giving insight into the most influential variables in a prediction. 
  • Wide usage scope: Random Forest can be applied widely, since it works for both classification and regression problems. 
  • Missing value handling: Because many decision trees are combined, the model can retain good accuracy even when some proportion of the data is missing. 

What are the disadvantages of Random Forest? 

Here are a few disadvantages of Random Forest: 

  1. Complexity and interpretability: A single decision tree is easy to understand, but a forest of hundreds or even thousands of trees makes the model a "black box", so it is hard to see why it made a specific decision, which matters in fields like medicine or finance. 
  2. Computational resources: Building many trees requires significant memory and processing power, especially with large datasets, which increases training time and resource demands. 
  3. Slower prediction speed: While training can be parallelized, making a prediction means running the input through every tree in the forest, which can be slow for real-time applications where quick responses are needed. 

Implementing Random Forest through a Python program: 

Let us consider the case of predicting house prices based on area and number of bedrooms. In the following program we use a simple dataset for training and scikit-learn's RandomForestRegressor. 

# Example of Random Forest Regression 
# Problem: predicting house prices based on simple features 

from sklearn.ensemble import RandomForestRegressor 
from sklearn.model_selection import train_test_split 
from sklearn.metrics import mean_squared_error 
import pandas as pd 

# Step 1: Create a small dataset 
data = { 
    'size_sqft': [850, 900, 1200, 1500, 2000, 2500, 3000], 
    'bedrooms': [2, 2, 3, 3, 4, 4, 5], 
    'price': [150000, 160000, 200000, 250000, 320000, 400000, 480000] 
} 
df = pd.DataFrame(data) 

# Features (X) and target (y) 
X = df[['size_sqft', 'bedrooms']] 
y = df['price'] 

# Step 2: Split into train and test sets 
X_train, X_test, y_train, y_test = train_test_split( 
    X, y, test_size=0.3, random_state=42 
) 

# Step 3: Initialize RandomForestRegressor 
model = RandomForestRegressor(n_estimators=100, random_state=42) 

# Step 4: Train the model on the training data 
model.fit(X_train, y_train) 

# Step 5: Make predictions on the test set 
y_pred = model.predict(X_test) 

# Step 6: Evaluate performance 
mse = mean_squared_error(y_test, y_pred) 
print("Mean Squared Error:", mse) 

# Step 7: Predict with a DataFrame (avoids the feature-name warning) 
new_house = pd.DataFrame([[1800, 3]], columns=['size_sqft', 'bedrooms']) 
predicted_price = model.predict(new_house) 
print("Predicted price for new house:", predicted_price[0]) 

# Step 8: Show feature importance (extra insight) 
importance = model.feature_importances_ 
for feature, score in zip(X.columns, importance): 
    print(f"Feature: {feature}, Importance: {score:.4f}") 

Output:  

Mean Squared Error: 4608580000.0 

Predicted price for new house: 269300.0 

Feature: size_sqft, Importance: 0.5352 

Feature: bedrooms, Importance: 0.4648 

Note: The high Mean Squared Error occurs because the dataset is too small (only 7 samples), so the train-test split leaves very few examples to learn from. With such limited data, predictions deviate heavily from actual prices, and since errors are squared, even moderate differences produce a very large MSE. 
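One way to get a somewhat more stable error estimate on so few samples (this snippet is an addition to the original program, reusing X and y defined above) is cross-validation instead of a single split: 

# Sketch: cross-validated error on the same tiny dataset (assumes X, y above) 
from sklearn.ensemble import RandomForestRegressor 
from sklearn.model_selection import cross_val_score 

cv_model = RandomForestRegressor(n_estimators=100, random_state=42) 
cv_scores = cross_val_score(cv_model, X, y, cv=3, 
                            scoring='neg_mean_squared_error') 
print("Mean cross-validated MSE:", -cv_scores.mean()) 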

Applications of Random Forest 

Random Forest has many applications across sectors; the following are some important areas where Random Forest based applications can be used: 

1. Healthcare and Pharma: 

Random Forest has many applications in this sector, a few of which are the following: 

  • Disease prediction: Random Forest based models can predict the likelihood of diseases such as cancer, diabetes, or heart conditions. 
  • Medical imaging: Random Forest can classify images of different types of tumours, for example benign vs. malignant, with a high level of accuracy. 
  • Drug discovery: It can help scientists design potential drugs by analysing data on how various drugs interact with the human body. 

2. Finance sector: 

  • Credit scoring: Random Forest can help determine credit scores in the finance sector, reducing the risk involved in lending. 
  • Fraud detection: Proactive detection of fraudulent transactions is very important. Random Forest helps here because it can analyse huge volumes of data accurately and pick out unusual patterns. 
  • Stock market analysis: Predicting stock price movements is central to market analysis, and Random Forest can support this task. 

3. Retail sector: 

  • Customer segmentation: Proper segmentation of customers is needed for planning and targeted marketing. Segmentation can be done by analysing customers' purchasing patterns, and Random Forest is useful here. 
  • Recommendation systems: A recommendation system suggests products a customer might find useful based on their purchase or browsing history; Random Forest can drive the analysis of that customer data. 
  • Customer churn prediction: To predict which customers might discontinue a service, a large amount of data must be checked for patterns that signal churn. Random Forest is well suited to this task; a minimal sketch of such a churn model follows this list. 
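The sketch below is purely illustrative and not from the caselet: the customer table, column names, and values are invented to show how a churn classifier might be set up. 

# Sketch: churn prediction on a tiny, made-up customer table 
import pandas as pd 
from sklearn.ensemble import RandomForestClassifier 

customers = pd.DataFrame({ 
    'monthly_spend': [20, 85, 40, 15, 95, 60, 10, 70], 
    'support_calls': [5, 0, 2, 6, 1, 1, 7, 0], 
    'tenure_months': [3, 36, 12, 2, 48, 24, 1, 30], 
    'churned':       [1, 0, 0, 1, 0, 0, 1, 0], 
}) 
X_cust = customers[['monthly_spend', 'support_calls', 'tenure_months']] 
y_cust = customers['churned'] 

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_cust, y_cust) 
new_customer = pd.DataFrame([[25, 4, 5]], columns=X_cust.columns) 
print("Estimated churn probability:", clf.predict_proba(new_customer)[0][1]) 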

4. Engineering sector: 

  • Fault detection: Proactive fault detection can save an industry a great deal of money by avoiding service interruptions. It is achieved by analysing machine data and performance, and Random Forest can play an important role here. 
  • Quality control: In automated quality control, Random Forest based applications can classify products as defective or non-defective, leading to efficient quality control without manual intervention. 

5. Environmental Research  

  • Climate modelling: This requires crunching huge amounts of data on different atmospheric parameters in order to predict weather, and Random Forest can analyse such data efficiently. 
  • Ecology: Classifying species and predicting biodiversity in different regions is important in ecological research. This also requires analysing large amounts of data, which Random Forest can do efficiently. 

Let us now discuss the limitations of Random Forest. 

Here are some drawbacks: 

  • Model complexity: For large datasets the model can be complex and expensive in terms of computational resources. 
  • Interpretability: A Random Forest is considerably harder to interpret than a model using a single decision tree. 
  • Higher training time: A Random Forest may use hundreds or thousands of trees, so the overall training time can be much higher than for a single decision tree. 

Conclusion: 

In this blog we have seen the important role Random Forest based applications play in analytics and their use across different sectors. Although training time is high and the model is comparatively complex, it remains one of the most popular machine learning models. Nowadays, with the growth of big data and advances in machine learning, Random Forest is gaining even more importance for handling complex datasets across domains with a high level of accuracy. 
