---
language: en
license: mit
library_name: sklearn
tags:
- text-classification
- ensemble
- scikit-learn
datasets:
- Xerv-AI/netuark-posts-6000
model_details:
  parameters: {params}
model-index:
- name: netuark-classifier-ensemble
  results:
  - task:
      type: text-classification
    dataset:
      name: netuark-posts-6000
      type: Xerv-AI/netuark-posts-6000
    metrics:
    - type: accuracy
      value: 93.75
---
# NetuArk Posts Classifier (Ensemble Architecture)
This model is an ensemble classifier designed to categorize technology-related social media posts by their news source. The model is trained to classify the following sources:

- ArsTechnica
- FT
- GuardianTech
- HackerNews
- Slashdot
- TechCrunch
- TheVerge
## Model Details

- **Architecture:** Voting Classifier (Multinomial Naive Bayes + Logistic Regression)
- **Vectorization:** TF-IDF (n-grams 1–3)
- **Accuracy:** 94.81% on the NetuArk-6000 dataset
- **Classes:** HackerNews, TechCrunch, TheVerge, FT, GuardianTech, Slashdot, ArsTechnica
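
The architecture listed above can be sketched in scikit-learn. This is a minimal reconstruction under stated assumptions — the card does not publish the actual hyperparameters, the voting mode, or the text-cleaning step, so the choices below (soft voting, default estimator settings, toy training data) are illustrative, not the shipped model:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

# Hypothetical reconstruction: TF-IDF over word 1-3 n-grams feeding a
# voting ensemble of Multinomial Naive Bayes and Logistic Regression.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
    ("ensemble", VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",  # assumption; the card does not state hard vs. soft voting
    )),
])

# Toy fit/predict to show the interface; the real model is trained on
# the netuark-posts-6000 dataset with all seven source classes.
texts = ["Show HN: a fast Rust parser", "Hands-on with the new Pixel camera"]
labels = ["HackerNews", "TheVerge"]
pipeline.fit(texts, labels)
print(pipeline.predict(["Show HN: my weekend project"]))
```

With `voting="soft"` the ensemble averages the two estimators' class probabilities rather than taking a majority vote, which also makes `predict_proba` available on the fitted pipeline.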
## Training Data

Trained on the [Xerv-AI/netuark-posts-6000](https://huggingface.co/datasets/Xerv-AI/netuark-posts-6000) dataset.

## Usage
```python
import joblib
from huggingface_hub import hf_hub_download

# Define the custom function required by the unpickler
def advanced_clean(text):
    return text

# Assign it to __main__ to ensure joblib can find it during loading
import __main__
__main__.advanced_clean = advanced_clean

# Repository and filename
repo_id = 'Phase-Technologies/netuark-classifier-ensemble'
filename = 'netuark_ensemble_classifier.joblib'

try:
    # Download the file from Hugging Face
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)

    # Load the model
    model = joblib.load(file_path)

    prediction = model.predict(["📰 Perplexity's 'Personal Computer' Lets AI Agents Access Your Local Files #slashdot"])
    print(f"Prediction: {prediction}")
except Exception as e:
    import traceback
    print(f"An error occurred: {e}")
    traceback.print_exc()
```
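
The `__main__` shim in the usage snippet is needed because joblib pickles store functions by module and name, not by their code: if the training script defined `advanced_clean` in `__main__`, the saved pipeline can only be unpickled when `__main__.advanced_clean` exists again. A small self-contained demonstration of the pattern, using a stand-in `FunctionTransformer` rather than the actual model:

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import FunctionTransformer

def advanced_clean(text):
    return text  # stand-in; the real cleaning logic lives in the training script

# Make the function resolvable as __main__.advanced_clean, mirroring
# how the pickle created at training time references it.
import __main__
__main__.advanced_clean = advanced_clean

# Round-trip an object that holds a reference to the function.
transformer = FunctionTransformer(advanced_clean)
path = os.path.join(tempfile.mkdtemp(), "demo.joblib")
joblib.dump(transformer, path)

reloaded = joblib.load(path)  # succeeds because __main__.advanced_clean resolves
print(reloaded.transform("Hello"))
```

Without the shim, loading in a fresh process raises an `AttributeError` from the unpickler because the reference `__main__.advanced_clean` cannot be resolved.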