leonardlin's Collections
llmjudge

updated 11 days ago

  • Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge

    Paper • 2510.18196 • Published Oct 21

  • Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?

    Paper • 2503.17039 • Published Mar 21

  • Can You Trick the Grader? Adversarial Persuasion of LLM Judges

    Paper • 2508.07805 • Published Aug 11

  • Checklist Engineering Empowers Multilingual LLM Judges

    Paper • 2507.06774 • Published Jul 9

  • JuStRank: Benchmarking LLM Judges for System Ranking

    Paper • 2412.09569 • Published Dec 12, 2024 • 20

  • When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity

    Paper • 2509.20293 • Published Sep 24 • 7

  • Quantitative LLM Judges

    Paper • 2506.02945 • Published Jun 3 • 5