Phishing detection elevated: Machine learning’s vigilant eye never sleeps - Prof S. Chithra

18th June 2024

Phishing emails: the only place where ‘You’ve won a million dollars!’ is as believable as ‘Your package is arriving today.’ Talk about real-time scams with real-time laughs!

Phishing attacks continue to proliferate, posing a growing threat to organizations of all sizes worldwide. These attacks are constantly evolving, combining vulnerability exploits with social engineering tactics. As a result, critical data and sensitive user information are increasingly at risk. It’s crucial for organizations to understand phishing thoroughly as a foundation for fortifying their security posture and mitigating potential threats.

Have you or anyone you know ever been a victim of a phishing attack? It seems to be such a regular threat these days. Do you understand what hackers hope to achieve with phishing attacks? Their goals can vary from stealing personal information to installing malware on your device.

I’m always worried about accidentally clicking on a malicious link or giving away sensitive information.

Figure: Types of phishing attack

Phishing attacks commonly target three kinds of data: login credentials, personal information, and medical records. These are valuable resources for hackers seeking to exploit individuals and organizations. Login credentials for online accounts, personal information such as addresses and phone numbers, and sensitive medical data are all routinely compromised through deceptive phishing tactics. This underscores the need to be cautious and take proactive steps to protect our digital identities and personal information from cyber threats.

In 2023, phishing attacks increased globally, with India ranking third among targeted countries after the United States and the United Kingdom and receiving over 79 million phishing attacks. India’s technology sector bore the brunt, accounting for nearly 33% of these attacks and making it the most vulnerable industry. The finance and insurance sector also saw an enormous surge in phishing attempts, making it a top target for cyber attackers, and the manufacturing industry followed closely with a 31% increase in attacks. These industries, which rely heavily on AI technology, have become appealing targets because of their reliance on data. With such rising dangers, governments are implementing measures like the Digital Personal Data Protection Act.

In the realm of cyber threats, social media platforms have increasingly become targets for phishing attacks. Among the most exploited platforms globally, Telegram led the pack with a staggering 7,92,883 phishing hits. Facebook followed with 5,32,243 phishing hits, while WhatsApp faced 3,78,968 phishing incidents. Other platforms such as Instagram, Twitter, LinkedIn, and Snapchat have also experienced a significant number of phishing attempts, underlining how pervasive this threat is across social media channels.

It is important to protect ourselves from phishing attacks by staying cautious and sceptical of unsolicited emails; remember, our data is the prize, and phishing is the game of chance hackers play to exploit vulnerabilities.

Identifying phishing websites has traditionally relied on keeping up-to-date blacklists of URLs and IP addresses in antivirus databases, a technique known as the “blacklist” method. However, fraudsters are always innovating to get around these measures. They use a variety of approaches, such as obfuscating URLs to appear legitimate, using fast-flux techniques to quickly change hosting proxies, and using algorithms to generate new URLs on the fly. The primary drawback of the blacklist approach is that, in the face of these evasion techniques, it cannot reliably identify zero-hour phishing attempts, because a newly created URL is not yet on any list. To address the limitations of blacklist and heuristic-based methods, many security experts are increasingly focusing on machine learning techniques.
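To make the zero-hour limitation concrete, here is a minimal sketch of an exact-match blacklist lookup. The blacklist entries, the URLs, and the helper names `host_of` and `is_blacklisted` are all hypothetical and chosen only for illustration; real antivirus databases are far larger and use more elaborate matching.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist entries, for illustration only.
BLACKLISTED_HOSTS = {"login-secure-bank.example", "203.0.113.7"}


def host_of(url: str) -> str:
    """Return the lower-cased host part of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()


def is_blacklisted(url: str) -> bool:
    """Exact-match lookup of the URL's host against the blacklist."""
    return host_of(url) in BLACKLISTED_HOSTS


known = "http://login-secure-bank.example/verify"
fresh = "http://login-secure-bank-x7q2.example/verify"  # newly generated variant

print(is_blacklisted(known))  # True: the host is already listed
print(is_blacklisted(fresh))  # False: a zero-hour URL slips through
```

The second lookup fails even though the URL is an obvious variant of a known phishing host, which is the gap that learned, feature-based detection tries to close.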

In cybersecurity, machine learning is transforming phishing detection approaches by employing powerful algorithms to improve predictive capabilities. As we face an ever-changing world of cyber dangers, machine learning emerges as a strong sentinel against phishing attempts, applying advanced tools to scan large datasets and identify fraudulent patterns with pinpoint accuracy. Phishing detection systems are strengthened by the inclusion of machine learning intelligence, allowing for proactive threat detection and mitigation. This revolutionary approach enables enterprises to remain ahead of adversaries by securing sensitive data and improving security defences in the face of increasing cyber hazards.

Popular Phishing Datasets for Machine Learning Model Training

AlexaRank: A popular benchmarking dataset for web analytics. Alexa is a business entity that specializes in web data analytics. It combines user browsing patterns from numerous channels and uses objective analysis to categorize and rank internet URLs. Researchers use Alexa’s rankings to assemble a collection of high-quality websites that serves as a standard dataset of legitimate sites for website classification and testing. The dataset is typically provided as raw text, with lines sorted in ascending rank order, each giving a website’s rank and domain name.

Phishtank: A well-known benchmark dataset of phishing sites. It works as a collaborative system for monitoring the web for phishing attempts. Users and third-party contributors report potential phishing sites, which are then validated by the community before being confirmed as real or fraudulent. This real-time process offers researchers a consistent dataset for testing and detecting phishing websites. The Phishtank dataset is available in CSV format, with complete descriptions of each item, such as ID, URL, submission time, verification status, online availability, and target.
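As a rough sketch of how the two sources might be combined into one labelled set, the snippet below parses a rank list and a phishing CSV. The sample rows are made up, and the column names (`phish_id`, `url`, `submission_time`, `verified`, `online`, `target`) and the `rank,domain` line layout are assumptions based on the descriptions above; check them against the files you actually download.

```python
import csv
import io

# Made-up sample data in the assumed layouts (not real records).
SAMPLE_RANKS = "1,example.com\n2,example.org\n"
SAMPLE_PHISH_CSV = (
    "phish_id,url,submission_time,verified,online,target\n"
    "1001,http://secure-login.example/signin,2024-06-01T10:15:00+00:00,yes,yes,Other\n"
    "1002,http://old-campaign.example/reset,2024-05-20T08:00:00+00:00,yes,no,Other\n"
)


def load_benign_domains(text: str) -> list[str]:
    """Parse 'rank,domain' lines and return the domains in rank order."""
    rows = [line.split(",", 1) for line in text.splitlines() if "," in line]
    rows.sort(key=lambda row: int(row[0]))
    return [domain.strip() for _rank, domain in rows]


def load_live_phish_urls(csv_text: str) -> list[str]:
    """Return URLs of entries that are both verified and still online."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["url"]
        for row in reader
        if row["verified"] == "yes" and row["online"] == "yes"
    ]


def build_labeled_urls(rank_text: str, phish_csv: str) -> list[tuple[str, int]]:
    """Label ranked domains 0 (legitimate) and live phishing URLs 1."""
    benign = [(f"http://{d}/", 0) for d in load_benign_domains(rank_text)]
    phish = [(url, 1) for url in load_live_phish_urls(phish_csv)]
    return benign + phish


for url, label in build_labeled_urls(SAMPLE_RANKS, SAMPLE_PHISH_CSV):
    print(label, url)
```

Filtering on verification and online status is one possible choice; keeping offline entries as well would give a larger but staler set of positives.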

Features extracted from a URL and its content fall into four groups:

Address Bar Features: Having IP address, URL length, Shortening service, Having @ symbol, Double-slash redirection, Prefix/suffix, Having subdomain, SSL state, Domain registration length, Favicon

Abnormal Features: Using non-standard port, HTTPS token, Request URL, URL of anchor, Links in tags, Server Form Handler, Submitting information to email, Abnormal URL

HTML and JavaScript Based Features: Website redirect count, Status bar customization, Disabling right click, Using pop-up window, Iframe

Domain Based Features: Age of domain, DNS record, Web traffic, Page rank, Google index, Links pointing to page, Statistical report
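Several of the address bar features can be computed from the URL string alone. The sketch below shows one possible encoding; the dictionary keys, the short `SHORTENERS` list, and the 0/1 encoding are assumptions made for illustration, and features such as SSL state or domain registration length are left out because they need network or WHOIS lookups.

```python
import ipaddress
from urllib.parse import urlsplit

# Illustrative list only; a real system would use a much longer one.
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co"}


def has_ip_host(host: str) -> bool:
    """True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def address_bar_features(url: str) -> dict[str, int]:
    """Compute a few string-only address bar features for one URL."""
    host = (urlsplit(url).hostname or "").lower()
    after_scheme = url.split("://", 1)[-1]
    return {
        "having_ip_address": int(has_ip_host(host)),
        "url_length": len(url),
        "shortening_service": int(host in SHORTENERS),
        "having_at_symbol": int("@" in url),
        # A "//" beyond the scheme separator hints at an embedded redirect.
        "double_slash_redirect": int("//" in after_scheme),
        # A hyphen in the host is the usual prefix/suffix signal.
        "prefix_suffix": int("-" in host),
        # Number of dots in the host, a rough proxy for subdomain depth.
        "subdomain_dots": host.count("."),
    }


suspicious = "http://203.0.113.5/secure-login//redirect@account"
print(address_bar_features(suspicious))
```

Each URL becomes a small numeric record, which is the form most classifiers expect as input.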

Predicting Phishing Attacks Using Machine Learning Algorithms:

“In a world full of phish, be the ML magician who predicts their tricks before they even cast their lines.”

First, a comprehensive dataset of labelled legitimate and fraudulent websites is gathered. The data is pre-processed to ensure consistency and remove noise, followed by feature selection or extraction to identify relevant attributes. Next, suitable machine learning methods, such as logistic regression or decision trees, are chosen and trained on the dataset, with hyperparameters tuned to avoid overfitting. The models’ performance is measured using multiple metrics, and their interpretability is examined to better understand the factors that influence predictions. Finally, the trained models are put into production for real-time prediction, with continual monitoring and retraining to respond to changing phishing methods.
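The train-and-evaluate part of this workflow can be sketched in plain Python. This is only an illustration under stated assumptions: the tiny dataset is synthetic, the three binary features (IP host, @ symbol, hyphenated host) are chosen arbitrarily, and logistic regression is fitted with simple batch gradient descent. Hyperparameter tuning, interpretability checks, and deployment are not shown.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, x):
    """Predicted probability that feature vector x is phishing."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias with batch gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (phishing) or 0 (legitimate) for one feature vector."""
    return int(score(w, b, x) >= threshold)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (phishing) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic toy data: [having_ip, having_at, prefix_suffix] -> label.
X_train = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1],
           [0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 0]]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
X_test = [[1, 1, 1], [0, 0, 0], [0, 1, 1], [0, 0, 1]]
y_test = [1, 0, 1, 0]

w, b = train_logistic_regression(X_train, y_train)
y_pred = [predict(w, b, x) for x in X_test]
print("precision, recall:", precision_recall(y_test, y_pred))
```

Precision and recall are reported separately because the two kinds of error differ in cost: a missed phishing page harms the user, while a false alarm blocks a legitimate site.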

The ML algorithms researchers have used to detect phishing attacks include Logistic Regression, the Naïve Bayes classifier, Decision Trees, and ensemble classifiers.


The internet’s spread throughout society continues unabated, and cyber threats are rising alongside it. Malicious and suspicious URLs appear constantly, creating serious problems for the quality of services provided by internet and industrial enterprises. These URLs masquerade as legitimate sources, heightening the cybersecurity dangers. Incorporating Machine Learning (ML) into phishing detection and prevention, however, constitutes a major leap in cybersecurity. Organizations can combat the changing threat landscape more effectively by leveraging AI and ML technology. These technologies improve detection capability, predict potential attacks, and provide real-time threat intelligence. The future of phishing protection depends on continual learning and adaptation, as well as the use of cutting-edge technologies to defeat cyber adversaries.

Relevant Queries 

  1. What are the potential consequences of clicking on a link or attachment in a phishing email?
  2. What are the signs of a fraudulent URL?
  3. How can you protect yourself from clicking on a phishing link?
