Malicious URL Detection Models

This directory contains trained machine learning models that classify URLs into four categories:

  • benign
  • defacement
  • malware
  • phishing

Model Performance Summary

The following table lists each model's accuracy on the test dataset of 130,239 URLs, sorted from highest to lowest:

Model                           Accuracy
Extra Trees Classifier          97%
Random Forest                   97%
Decision Tree                   96%
MLP Classifier                  96%
XGBoost                         96%
Gradient Boosting Classifier    94%
Logistic Regression             87%
SGD Classifier                  87%
AdaBoost                        85%
Gaussian Naive Bayes            80%
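
Accuracy is easier to interpret next to the class balance of the test set. The sketch below takes the per-class support counts from the detailed reports in this README and computes each class's share of the test set, along with the accuracy of a hypothetical baseline that always predicts the most frequent class. The variable names are illustrative and not part of the models.

```python
# Per-class support counts, copied from the detailed reports in this README.
support = {
    "benign": 85778,
    "defacement": 19104,
    "malware": 6521,
    "phishing": 18836,
}

total = sum(support.values())  # size of the test set

# Share of the test set taken up by each class.
shares = {label: count / total for label, count in support.items()}

# A hypothetical baseline that always predicts the most frequent class
# is correct exactly as often as that class occurs.
majority_label = max(support, key=support.get)
baseline_accuracy = shares[majority_label]

for label, share in shares.items():
    print(f"{label:<12}{share:.1%}")
print(f"always-'{majority_label}' baseline accuracy: {baseline_accuracy:.1%}")
```

Roughly two thirds of the test URLs are benign, so the per-class precision and recall in the detailed reports are worth reading alongside overall accuracy.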

Detailed Performance Reports

AdaBoost

  • Accuracy: 0.85
  • Report:
              precision    recall  f1-score   support

      benign       0.90      0.97      0.93     85778
  defacement       0.82      0.76      0.79     19104
     malware       0.55      0.74      0.63      6521
    phishing       0.68      0.42      0.52     18836

    accuracy                           0.85    130239
   macro avg       0.74      0.72      0.72    130239
weighted avg       0.84      0.85      0.84    130239
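
The "macro avg" and "weighted avg" rows in each report can be reproduced from the per-class rows. As a minimal sketch, the code below recomputes both averages for recall using the AdaBoost figures above. The helper names are hypothetical, and because the inputs are already rounded to two decimals, the results may differ from a report in the last digit.

```python
# Per-class recall and support for AdaBoost, copied from the report above.
recall = {"benign": 0.97, "defacement": 0.76, "malware": 0.74, "phishing": 0.42}
support = {"benign": 85778, "defacement": 19104, "malware": 6521, "phishing": 18836}


def macro_average(scores):
    """Unweighted mean: every class counts equally, regardless of size."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, weights):
    """Mean weighted by class support: large classes dominate."""
    total = sum(weights.values())
    return sum(scores[label] * weights[label] for label in scores) / total


macro_recall = macro_average(recall)
weighted_recall = weighted_average(recall, support)

print(f"macro recall:    {macro_recall:.2f}")
print(f"weighted recall: {weighted_recall:.2f}")
```

The macro average gives the small malware and phishing classes the same weight as benign, which is why it sits well below the weighted average. Support-weighted recall is the same quantity as overall accuracy, so both read 0.85 in the report.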

Decision Tree

  • Accuracy: 0.96
  • Report:
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.98     19104
     malware       0.95      0.94      0.95      6521
    phishing       0.87      0.85      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.95      0.94      0.94    130239
weighted avg       0.96      0.96      0.96    130239

Extra Trees Classifier

  • Accuracy: 0.97
  • Report:
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.86      0.88     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239

Gaussian Naive Bayes

  • Accuracy: 0.80
  • Report:
              precision    recall  f1-score   support

      benign       0.86      0.90      0.88     85778
  defacement       0.67      0.99      0.80     19104
     malware       0.63      0.69      0.66      6521
    phishing       0.68      0.19      0.29     18836

    accuracy                           0.80    130239
   macro avg       0.71      0.69      0.66    130239
weighted avg       0.80      0.80      0.77    130239

Gradient Boosting Classifier

  • Accuracy: 0.94
  • Report:
              precision    recall  f1-score   support

      benign       0.96      0.99      0.97     85778
  defacement       0.92      0.97      0.94     19104
     malware       0.94      0.80      0.87      6521
    phishing       0.89      0.78      0.83     18836

    accuracy                           0.94    130239
   macro avg       0.93      0.88      0.90    130239
weighted avg       0.94      0.94      0.94    130239

Logistic Regression

  • Accuracy: 0.87
  • Report:
              precision    recall  f1-score   support

      benign       0.89      0.97      0.93     85778
  defacement       0.85      0.95      0.90     19104
     malware       0.81      0.69      0.74      6521
    phishing       0.77      0.42      0.55     18836

    accuracy                           0.87    130239
   macro avg       0.83      0.76      0.78    130239
weighted avg       0.87      0.87      0.86    130239

MLP Classifier

  • Accuracy: 0.96
  • Report:
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.97      0.97      0.97     19104
     malware       0.95      0.90      0.92      6521
    phishing       0.88      0.83      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.94      0.92      0.93    130239
weighted avg       0.96      0.96      0.96    130239

Random Forest

  • Accuracy: 0.97
  • Report:
              precision    recall  f1-score   support

      benign       0.98      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.87      0.89     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239

SGD Classifier

  • Accuracy: 0.87
  • Report:
              precision    recall  f1-score   support

      benign       0.89      0.96      0.93     85778
  defacement       0.83      0.95      0.89     19104
     malware       0.79      0.71      0.75      6521
    phishing       0.74      0.40      0.52     18836

    accuracy                           0.87    130239
   macro avg       0.81      0.76      0.77    130239
weighted avg       0.86      0.87      0.85    130239

XGBoost

  • Accuracy: 0.96
  • Report:
              precision    recall  f1-score   support

      benign       0.97      0.99      0.98     85778
  defacement       0.97      0.99      0.98     19104
     malware       0.98      0.92      0.95      6521
    phishing       0.91      0.84      0.88     18836

    accuracy                           0.96    130239
   macro avg       0.96      0.93      0.95    130239
weighted avg       0.96      0.96      0.96    130239

Usage

To load a model in Python, you can use joblib or pickle.

Using joblib

import joblib

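# Load the trained model from disk
model = joblib.load('models/random_forest.pkl')

# Make predictions; X_test must hold the same features, in the same order, that the model was trained on
prediction = model.predict(X_test)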

Using pickle

import pickle

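# Load the trained model from disk
with open('models/random_forest.pkl', 'rb') as f:
    model = pickle.load(f)

# Make predictions; X_test must hold the same features, in the same order, that the model was trained on
prediction = model.predict(X_test)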
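
Evaluating predictions

Once predictions are available, they can be compared against the true labels. The reports above appear to follow the layout of scikit-learn's classification_report. The sketch below computes the same per-class precision, recall, and support in plain Python so the definitions are explicit. The label lists are made-up toy data, not output from these models, and per_class_metrics is a hypothetical helper.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, support)} for every true label."""
    true_counts = Counter(y_true)   # support: how often each label occurs
    pred_counts = Counter(y_pred)   # how often each label was predicted
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    metrics = {}
    for label in sorted(true_counts):
        # Precision: of everything predicted as `label`, how much was right.
        predicted = pred_counts[label]
        precision = correct[label] / predicted if predicted else 0.0
        # Recall: of everything that truly is `label`, how much was found.
        recall = correct[label] / true_counts[label]
        metrics[label] = (precision, recall, true_counts[label])
    return metrics


# Toy labels for illustration only (assumed, not real model output).
y_true = ["benign", "benign", "benign", "phishing", "phishing", "malware"]
y_pred = ["benign", "benign", "phishing", "phishing", "benign", "malware"]

metrics = per_class_metrics(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for label, (precision, recall, support) in metrics.items():
    print(f"{label:<12}{precision:>10.2f}{recall:>10.2f}{support:>10}")
print(f"accuracy: {accuracy:.2f}")
```

With real data, y_true would be the held-out labels and y_pred the output of model.predict.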