Malicious URL Detection Models
This directory contains trained machine learning models for detecting malicious URLs. Each model classifies a URL into one of four categories:
- benign
- defacement
- malware
- phishing
Model Performance Summary
The following table lists each model's accuracy on the test set of 130,239 URLs, sorted from highest to lowest:
| Model | Accuracy |
|---|---|
| Extra Trees Classifier | 97% |
| Random Forest | 97% |
| Decision Tree | 96% |
| MLP Classifier | 96% |
| XGBoost | 96% |
| Gradient Boosting Classifier | 94% |
| Logistic Regression | 87% |
| SGD Classifier | 87% |
| AdaBoost | 85% |
| Gaussian Naive Bayes | 80% |
Detailed Performance Reports
AdaBoost
- Accuracy: 0.85
- Report:
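| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| benign | 0.90 | 0.97 | 0.93 | 85778 |
| defacement | 0.82 | 0.76 | 0.79 | 19104 |
| malware | 0.55 | 0.74 | 0.63 | 6521 |
| phishing | 0.68 | 0.42 | 0.52 | 18836 |
| accuracy | | | 0.85 | 130239 |
| macro avg | 0.74 | 0.72 | 0.72 | 130239 |
| weighted avg | 0.84 | 0.85 | 0.84 | 130239 |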
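The "macro avg" and "weighted avg" rows can be reproduced from the per-class rows, which helps when reading the other reports. The sketch below recomputes both averages for recall using the AdaBoost figures above; the variable names are illustrative, and the inputs are the rounded values from the report, so results are only accurate to about two decimals.

```python
# Per-class recall and support copied from the AdaBoost report above.
recall = {"benign": 0.97, "defacement": 0.76, "malware": 0.74, "phishing": 0.42}
support = {"benign": 85778, "defacement": 19104, "malware": 6521, "phishing": 18836}

total = sum(support.values())

# Macro average: unweighted mean over the four classes.
macro_recall = sum(recall.values()) / len(recall)

# Weighted average: each class counts in proportion to its share of the test set.
weighted_recall = sum(recall[c] * support[c] for c in recall) / total

print(f"test set size:   {total}")
print(f"macro recall:    {macro_recall:.2f}")
print(f"weighted recall: {weighted_recall:.2f}")
```

The macro average treats every class equally, so the weak phishing recall pulls it down to about 0.72. The support-weighted recall is dominated by the large benign class and comes out at about 0.85, which matches the reported accuracy because support-weighted recall and accuracy are the same quantity.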
Decision Tree
- Accuracy: 0.96
- Report:
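| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| benign | 0.97 | 0.98 | 0.98 | 85778 |
| defacement | 0.98 | 0.99 | 0.98 | 19104 |
| malware | 0.95 | 0.94 | 0.95 | 6521 |
| phishing | 0.87 | 0.85 | 0.86 | 18836 |
| accuracy | | | 0.96 | 130239 |
| macro avg | 0.95 | 0.94 | 0.94 | 130239 |
| weighted avg | 0.96 | 0.96 | 0.96 | 130239 |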
Extra Trees Classifier
- Accuracy: 0.97
- Report:
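| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| benign | 0.97 | 0.98 | 0.98 | 85778 |
| defacement | 0.98 | 0.99 | 0.99 | 19104 |
| malware | 0.98 | 0.94 | 0.96 | 6521 |
| phishing | 0.91 | 0.86 | 0.88 | 18836 |
| accuracy | | | 0.97 | 130239 |
| macro avg | 0.96 | 0.95 | 0.95 | 130239 |
| weighted avg | 0.97 | 0.97 | 0.97 | 130239 |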
Gaussian Naive Bayes
- Accuracy: 0.80
- Report:
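| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| benign | 0.86 | 0.90 | 0.88 | 85778 |
| defacement | 0.67 | 0.99 | 0.80 | 19104 |
| malware | 0.63 | 0.69 | 0.66 | 6521 |
| phishing | 0.68 | 0.19 | 0.29 | 18836 |
| accuracy | | | 0.80 | 130239 |
| macro avg | 0.71 | 0.69 | 0.66 | 130239 |
| weighted avg | 0.80 | 0.80 | 0.77 | 130239 |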
Gradient Boosting Classifier
- Accuracy: 0.94
- Report:
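| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| benign | 0.96 | 0.99 | 0.97 | 85778 |
| defacement | 0.92 | 0.97 | 0.94 | 19104 |
| malware | 0.94 | 0.80 | 0.87 | 6521 |
| phishing | 0.89 | 0.78 | 0.83 | 18836 |
| accuracy | | | 0.94 | 130239 |
| macro avg | 0.93 | 0.88 | 0.90 | 130239 |
| weighted avg | 0.94 | 0.94 | 0.94 | 130239 |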
Logistic Regression
- Accuracy: 0.87
- Report:
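| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| benign | 0.89 | 0.97 | 0.93 | 85778 |
| defacement | 0.85 | 0.95 | 0.90 | 19104 |
| malware | 0.81 | 0.69 | 0.74 | 6521 |
| phishing | 0.77 | 0.42 | 0.55 | 18836 |
| accuracy | | | 0.87 | 130239 |
| macro avg | 0.83 | 0.76 | 0.78 | 130239 |
| weighted avg | 0.87 | 0.87 | 0.86 | 130239 |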
MLP Classifier
- Accuracy: 0.96
- Report:
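| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| benign | 0.97 | 0.98 | 0.98 | 85778 |
| defacement | 0.97 | 0.97 | 0.97 | 19104 |
| malware | 0.95 | 0.90 | 0.92 | 6521 |
| phishing | 0.88 | 0.83 | 0.86 | 18836 |
| accuracy | | | 0.96 | 130239 |
| macro avg | 0.94 | 0.92 | 0.93 | 130239 |
| weighted avg | 0.96 | 0.96 | 0.96 | 130239 |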
Random Forest
- Accuracy: 0.97
- Report:
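| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| benign | 0.98 | 0.98 | 0.98 | 85778 |
| defacement | 0.98 | 0.99 | 0.99 | 19104 |
| malware | 0.98 | 0.94 | 0.96 | 6521 |
| phishing | 0.91 | 0.87 | 0.89 | 18836 |
| accuracy | | | 0.97 | 130239 |
| macro avg | 0.96 | 0.95 | 0.95 | 130239 |
| weighted avg | 0.97 | 0.97 | 0.97 | 130239 |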
SGD Classifier
- Accuracy: 0.87
- Report:
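| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| benign | 0.89 | 0.96 | 0.93 | 85778 |
| defacement | 0.83 | 0.95 | 0.89 | 19104 |
| malware | 0.79 | 0.71 | 0.75 | 6521 |
| phishing | 0.74 | 0.40 | 0.52 | 18836 |
| accuracy | | | 0.87 | 130239 |
| macro avg | 0.81 | 0.76 | 0.77 | 130239 |
| weighted avg | 0.86 | 0.87 | 0.85 | 130239 |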
XGBoost
- Accuracy: 0.96
- Report:
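| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| benign | 0.97 | 0.99 | 0.98 | 85778 |
| defacement | 0.97 | 0.99 | 0.98 | 19104 |
| malware | 0.98 | 0.92 | 0.95 | 6521 |
| phishing | 0.91 | 0.84 | 0.88 | 18836 |
| accuracy | | | 0.96 | 130239 |
| macro avg | 0.96 | 0.93 | 0.95 | 130239 |
| weighted avg | 0.96 | 0.96 | 0.96 | 130239 |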
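Phishing has the lowest F1-score of the four classes in every report above, so overall accuracy alone can hide large differences between models. As a rough illustration, the sketch below ranks the models by phishing recall, using only the accuracy and phishing recall figures copied from the reports; the `results` dictionary and the 0.5 threshold are illustrative choices, not part of the original evaluation.

```python
# (accuracy, phishing recall) copied from the detailed reports above.
results = {
    "AdaBoost": (0.85, 0.42),
    "Decision Tree": (0.96, 0.85),
    "Extra Trees Classifier": (0.97, 0.86),
    "Gaussian Naive Bayes": (0.80, 0.19),
    "Gradient Boosting Classifier": (0.94, 0.78),
    "Logistic Regression": (0.87, 0.42),
    "MLP Classifier": (0.96, 0.83),
    "Random Forest": (0.97, 0.87),
    "SGD Classifier": (0.87, 0.40),
    "XGBoost": (0.96, 0.84),
}

# Rank models by phishing recall, highest first.
ranked = sorted(results.items(), key=lambda item: item[1][1], reverse=True)
for name, (accuracy, phishing_recall) in ranked:
    print(f"{name:<30} accuracy={accuracy:.2f}  phishing recall={phishing_recall:.2f}")

# Models that miss more than half of the phishing URLs in the test set
# (0.5 is an arbitrary cut-off chosen for this illustration).
weak = [name for name, (_, phishing_recall) in results.items() if phishing_recall < 0.5]
print("phishing recall below 0.5:", ", ".join(weak))
```

Four models fall below the cut-off even though their accuracy is between 80% and 87%, because the benign class makes up about two thirds of the test set and dominates the accuracy figure.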
Usage
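To load a model in Python, use joblib or pickle. The examples below assume the model file is at models/random_forest.pkl and that X_test is a feature matrix prepared the same way as the data the model was trained on.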
Using joblib
import joblib
# Load the model
model = joblib.load('models/random_forest.pkl')
# Make predictions
prediction = model.predict(X_test)
Using pickle
import pickle
# Load the model
with open('models/random_forest.pkl', 'rb') as f:
    model = pickle.load(f)
# Make predictions
prediction = model.predict(X_test)
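When loading models in a script, it can help to fail early with a clear message if the file is missing or the loaded object is not a model. The following is a minimal sketch using pickle only; `load_model` and its `required_method` parameter are hypothetical helpers introduced here for illustration and are not part of this repository.

```python
import pickle
from pathlib import Path


def load_model(path, required_method="predict"):
    """Load a pickled object and check that it exposes the expected method.

    Hypothetical helper for illustration; not part of this repository.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")

    # Only unpickle files you trust: unpickling can execute arbitrary code.
    with path.open("rb") as f:
        model = pickle.load(f)

    # Scikit-learn style estimators expose predict(); reject anything else.
    if not callable(getattr(model, required_method, None)):
        raise TypeError(
            f"{type(model).__name__} loaded from {path} "
            f"has no {required_method}() method"
        )
    return model
```

With this helper, the pickle example above would become `model = load_model('models/random_forest.pkl')`, followed by the same `model.predict(X_test)` call.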