---
language: en
tags:
  - scientific-papers
  - arxiv
  - classification
  - scibert
  - text-classification
datasets:
  - arxiv
license: mit
---

SciBERT Fine-tuned for arXiv Paper Classification

This model is a fine-tuned version of allenai/scibert_scivocab_uncased for classifying scientific papers into arXiv categories.

Model Description

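  • Base Model: SciBERT (allenai/scibert_scivocab_uncased), a BERT model pretrained on scientific text with an uncased scientific vocabulary
  • Task: Multi-class Text Classification (one label per paper)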
  • Training Data: arXiv scientific papers
  • Number of Classes: 20 arXiv categories
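For reference, here is how a single prediction is formed. This assumes the checkpoint uses the standard `BertForSequenceClassification` head that `AutoModelForSequenceClassification` loads for BERT-family models; the card does not state this explicitly. Here $h_{\text{[CLS]}}$ is the final hidden state of the `[CLS]` token, $W_p, b_p$ are the pooler weights, and $W, b$ are the classification-layer weights (dropout is inactive at inference):

$$
u = \tanh\left(W_p\, h_{\text{[CLS]}} + b_p\right), \qquad z = W u + b \in \mathbb{R}^{20}
$$

$$
p_k = \frac{\exp(z_k)}{\sum_{j=1}^{20} \exp(z_j)}, \qquad \hat{y} = \arg\max_{k} p_k = \arg\max_{k} z_k
$$

Because softmax is monotonic, taking the argmax of the probabilities or of the raw logits $z$ gives the same class.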

Intended Use

This model classifies a scientific paper abstract into its primary arXiv subject category.

Usage

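```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "YOUR_USERNAME/scibert-arxiv-classifier"  # replace with the actual Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout

text = "Your scientific paper abstract here..."
# BERT-style models accept at most 512 tokens; longer inputs are truncated
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():  # gradients are not needed for prediction
    outputs = model(**inputs)

probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).item()
# Map the class index to its label name (generic "LABEL_<i>" if the config defines no names)
predicted_label = model.config.id2label[predicted_class]
```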
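The usage example keeps only the single most likely class. When an abstract plausibly fits several categories, it can help to look at the top few. The helper below is a hypothetical, minimal sketch (`top_k_categories` is not part of this model or of `transformers`): it takes one row of logits as a plain Python list, for example `outputs.logits[0].tolist()`, plus a label map such as `model.config.id2label`, and recomputes the softmax in pure Python. The toy logits and label names in the demo are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def top_k_categories(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k most probable labels with softmax probabilities.

    `logits` is one row of model output (e.g. outputs.logits[0].tolist()) and
    `id2label` maps class indices to names (e.g. model.config.id2label).
    """
    # Subtract the largest logit before exponentiating to avoid overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Toy values for illustration only; these are not real model outputs.
    toy_logits = [2.0, 0.5, 1.0]
    toy_labels = {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"}
    for label, p in top_k_categories(toy_logits, toy_labels, k=2):
        print(f"{label}: {p:.3f}")
```

With `outputs` and `model` from the usage example, `top_k_categories(outputs.logits[0].tolist(), model.config.id2label, k=3)` would list the three most probable categories. For batched inputs, calling `torch.topk` on the probability tensor does the same ranking without leaving PyTorch.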

Training Details

  • Fine-tuned on an arXiv paper dataset
  • Targeted at scientific-domain text classification

Limitations

  • Best suited for scientific/academic paper abstracts
  • Accuracy may degrade on non-scientific text, which differs from the training data