Malicious URL Detection Models

This directory contains trained machine learning models that classify URLs into four categories:

benign
defacement
malware
phishing

Model Performance Summary

The following table lists the accuracy of each model on the test dataset (130,239 URLs), sorted from highest to lowest:

Model	Accuracy
Extra Trees Classifier	97%
Random Forest	97%
Decision Tree	96%
MLP Classifier	96%
XGBoost	96%
Gradient Boosting Classifier	94%
Logistic Regression	87%
SGD Classifier	87%
AdaBoost	85%
Gaussian Naive Bayes	80%

Detailed Performance Reports

AdaBoost

Accuracy: 0.85
Report:

              precision    recall  f1-score   support

      benign       0.90      0.97      0.93     85778
  defacement       0.82      0.76      0.79     19104
     malware       0.55      0.74      0.63      6521
    phishing       0.68      0.42      0.52     18836

    accuracy                           0.85    130239
   macro avg       0.74      0.72      0.72    130239
weighted avg       0.84      0.85      0.84    130239
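
The macro avg and weighted avg rows can be reproduced from the per-class rows. The sketch below recomputes them from the rounded figures in the report above; the variable and function names are illustrative only, and because the inputs are already rounded to two decimals, the results agree with the report only to that precision.

```python
# Per-class rows copied from the report above:
# class -> (precision, recall, f1-score, support)
rows = {
    "benign":     (0.90, 0.97, 0.93, 85778),
    "defacement": (0.82, 0.76, 0.79, 19104),
    "malware":    (0.55, 0.74, 0.63, 6521),
    "phishing":   (0.68, 0.42, 0.52, 18836),
}

total_support = sum(r[3] for r in rows.values())


def macro_avg(col):
    """Unweighted mean over classes: every class counts equally."""
    return sum(r[col] for r in rows.values()) / len(rows)


def weighted_avg(col):
    """Mean over classes, weighted by support (test URLs per class)."""
    return sum(r[col] * r[3] for r in rows.values()) / total_support


for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:<10} macro={macro_avg(col):.2f} weighted={weighted_avg(col):.2f}")
```

Weighted recall is the same quantity as overall accuracy, so the 0.85 figure is dominated by the benign class, which makes up about two thirds of the test set. The macro average of 0.72 gives each class equal weight and therefore reflects the low phishing recall of 0.42 more directly.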

Decision Tree

Accuracy: 0.96
Report:

              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.98     19104
     malware       0.95      0.94      0.95      6521
    phishing       0.87      0.85      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.95      0.94      0.94    130239
weighted avg       0.96      0.96      0.96    130239

Extra Trees Classifier

Accuracy: 0.97
Report:

              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.86      0.88     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239

Gaussian Naive Bayes

Accuracy: 0.80
Report:

              precision    recall  f1-score   support

      benign       0.86      0.90      0.88     85778
  defacement       0.67      0.99      0.80     19104
     malware       0.63      0.69      0.66      6521
    phishing       0.68      0.19      0.29     18836

    accuracy                           0.80    130239
   macro avg       0.71      0.69      0.66    130239
weighted avg       0.80      0.80      0.77    130239

Gradient Boosting Classifier

Accuracy: 0.94
Report:

              precision    recall  f1-score   support

      benign       0.96      0.99      0.97     85778
  defacement       0.92      0.97      0.94     19104
     malware       0.94      0.80      0.87      6521
    phishing       0.89      0.78      0.83     18836

    accuracy                           0.94    130239
   macro avg       0.93      0.88      0.90    130239
weighted avg       0.94      0.94      0.94    130239

Logistic Regression

Accuracy: 0.87
Report:

              precision    recall  f1-score   support

      benign       0.89      0.97      0.93     85778
  defacement       0.85      0.95      0.90     19104
     malware       0.81      0.69      0.74      6521
    phishing       0.77      0.42      0.55     18836

    accuracy                           0.87    130239
   macro avg       0.83      0.76      0.78    130239
weighted avg       0.87      0.87      0.86    130239

MLP Classifier

Accuracy: 0.96
Report:

              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.97      0.97      0.97     19104
     malware       0.95      0.90      0.92      6521
    phishing       0.88      0.83      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.94      0.92      0.93    130239
weighted avg       0.96      0.96      0.96    130239

Random Forest

Accuracy: 0.97
Report:

              precision    recall  f1-score   support

      benign       0.98      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.87      0.89     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239

SGD Classifier

Accuracy: 0.87
Report:

              precision    recall  f1-score   support

      benign       0.89      0.96      0.93     85778
  defacement       0.83      0.95      0.89     19104
     malware       0.79      0.71      0.75      6521
    phishing       0.74      0.40      0.52     18836

    accuracy                           0.87    130239
   macro avg       0.81      0.76      0.77    130239
weighted avg       0.86      0.87      0.85    130239

XGBoost

Accuracy: 0.96
Report:

              precision    recall  f1-score   support

      benign       0.97      0.99      0.98     85778
  defacement       0.97      0.99      0.98     19104
     malware       0.98      0.92      0.95      6521
    phishing       0.91      0.84      0.88     18836

    accuracy                           0.96    130239
   macro avg       0.96      0.93      0.95    130239
weighted avg       0.96      0.96      0.96    130239
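
The reports above match the layout produced by scikit-learn's classification_report. A minimal sketch of how a report in this format might be generated follows. The label lists are a tiny made-up stand-in, not the real test set; in practice y_pred would come from calling model.predict on the held-out features.

```python
from sklearn.metrics import accuracy_score, classification_report

CLASSES = ["benign", "defacement", "malware", "phishing"]

# Hypothetical toy labels for illustration only (not the real test data).
y_true = ["benign", "benign", "benign", "benign",
          "defacement", "malware", "phishing", "phishing"]
y_pred = ["benign", "benign", "benign", "benign",
          "defacement", "malware", "benign", "phishing"]

accuracy = accuracy_score(y_true, y_pred)

# digits=2 gives two decimals, as in the reports above;
# zero_division=0 avoids warnings if a class is never predicted.
report = classification_report(
    y_true, y_pred, labels=CLASSES, digits=2, zero_division=0
)

print(f"Accuracy: {accuracy:.2f}")
print("Report:")
print(report)
```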

Usage

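The models are saved as .pkl files and can be loaded in Python with joblib or pickle. Use the library the file was saved with: joblib.load can generally read plain pickle files, but pickle.load may fail on files written by joblib.dump.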

Using joblib

import joblib

# Load the trained model
model = joblib.load('models/random_forest.pkl')

# X_test must be a feature matrix built the same way as the training data
prediction = model.predict(X_test)

Using pickle

import pickle

# Load the trained model
with open('models/random_forest.pkl', 'rb') as f:
    model = pickle.load(f)

# X_test must be a feature matrix built the same way as the training data
prediction = model.predict(X_test)
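
This README does not document the feature extraction step, so the sketch below substitutes a small synthetic feature matrix and a freshly trained stand-in model to show the save, load, and predict round trip end to end. The feature count, the temporary file name, and the training data are all hypothetical; only the joblib and scikit-learn calls are real.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CLASSES = ["benign", "defacement", "malware", "phishing"]

# Hypothetical stand-in data: 200 samples with 6 numeric features each.
rng = np.random.default_rng(0)
X_train = rng.random((200, 6))
y_train = rng.choice(CLASSES, size=200)

# Train and save a stand-in model (the real models are already trained).
stand_in = RandomForestClassifier(n_estimators=10, random_state=0)
stand_in.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stand_in_model.pkl")
    joblib.dump(stand_in, path)
    model = joblib.load(path)

# The input must have the same number of columns the model was trained on.
X_test = rng.random((5, 6))
assert X_test.shape[1] == model.n_features_in_

predictions = model.predict(X_test)          # one class label per row
probabilities = model.predict_proba(X_test)  # columns follow model.classes_

for label, probs in zip(predictions, probabilities):
    print(label, dict(zip(model.classes_, probs.round(2))))
```

Checking n_features_in_ before predicting catches a mismatched feature matrix early, and predict_proba exposes per-class scores for models that support it, which can be useful when a fixed class label is too coarse.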
