Commit · 9fa29df
1 Parent(s): e5acaa3
style: Rename ScandEval to EuroEval
app.py CHANGED
@@ -26,18 +26,18 @@ INTRO_MARKDOWN = """
 
 This demo allows you to generate a radial plot comparing the performance of different
 language models on different tasks. It is based on the generative results from the
-[ScandEval benchmark](https://scandeval.com).
+[EuroEval benchmark](https://euroeval.com).
 """
 
 
 ABOUT_MARKDOWN = """
-## About the ScandEval Benchmark
+## About the EuroEval Benchmark
 
-The [ScandEval benchmark](https://scandeval.com) is used to compare pretrained language
-models on tasks in Danish,
-
+The [EuroEval benchmark](https://euroeval.com) is used to compare pretrained language
+models on tasks in Danish, Dutch, English, Faroese, French, German, Icelandic, Italian,
+Norwegian and Swedish. The benchmark supports both encoder models (such as
 BERT) and generative models (such as GPT), and leaderboards for both kinds [are
-available](https://scandeval.com).
+available](https://euroeval.com).
 
 The generative models are evaluated using in-context learning with few-shot prompts.
 The few-shot examples are sampled randomly from the training split, and we benchmark
@@ -54,10 +54,8 @@ the worst performing models having rank scores close to 0.
 
 ## The Benchmark Datasets
 
-
-
-consists of 7 different tasks, each of which consists of 1-2 datasets. The tasks are
-the following:
+For each language, the benchmark consists of 7 different tasks, each of which consists
+of 1-2 datasets. The tasks are the following:
 
 ### Text Classification
 Given a piece of text, classify it into a number of classes. For this task we extract
@@ -110,7 +108,7 @@ Correlation Coefficient (MCC) as the evaluation metric.
 
 ## Citation
 
-If you use the ScandEval benchmark in your work, please cite [the
+If you use the EuroEval benchmark in your work, please cite [the
 paper](https://aclanthology.org/2023.nodalida-1.20):
 
 ```
@@ -741,16 +739,16 @@ def produce_radial_plot(
 
 
 def fetch_results() -> dict[Language, pd.DataFrame]:
-    """Fetch the results from the ScandEval benchmark.
+    """Fetch the results from the EuroEval benchmark.
 
     Returns:
         A dictionary of languages -> results-dataframes, whose indices are the
         models and columns are the tasks.
     """
-    logger.info("Fetching results from ScandEval benchmark...")
+    logger.info("Fetching results from EuroEval benchmark...")
 
     response = requests.get(
-        "https://raw.githubusercontent.com/ScandEval/leaderboards/refs/heads/main/results/results.jsonl"
+        "https://raw.githubusercontent.com/EuroEval/leaderboards/refs/heads/main/results/results.jsonl"
     )
     response.raise_for_status()
     records = [
@@ -804,7 +802,7 @@ def fetch_results() -> dict[Language, pd.DataFrame]:
         ).dropna()
         results_dfs[language] = results_df
 
-    logger.info("Successfully fetched results from ScandEval benchmark.")
+    logger.info("Successfully fetched results from EuroEval benchmark.")
 
     return results_dfs
 
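The endpoint introduced here serves the leaderboard results as JSON Lines, which `fetch_results` downloads with `requests` before assembling the per-language dataframes. Below is a minimal sketch of consuming that endpoint outside the app, assuming one JSON object per line; the record fields are not visible in this diff, so the resulting dataframe columns are simply whatever the file provides:

```python
import json

import pandas as pd
import requests

# URL added by this commit (the only change is the ScandEval -> EuroEval org rename).
RESULTS_URL = (
    "https://raw.githubusercontent.com/EuroEval/leaderboards/"
    "refs/heads/main/results/results.jsonl"
)


def fetch_records() -> list[dict]:
    """Download the leaderboard results and parse one JSON object per line."""
    response = requests.get(RESULTS_URL)
    response.raise_for_status()
    return [json.loads(line) for line in response.text.splitlines() if line.strip()]


if __name__ == "__main__":
    records = fetch_records()
    # Column names depend entirely on the fields present in the JSONL file.
    df = pd.DataFrame.from_records(records)
    print(df.shape)
```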
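For context, the `produce_radial_plot` function named in the hunk header draws the radial comparison described in the intro: one axis per task, one trace per model, fed from the results-dataframes returned by `fetch_results`. Nothing in this diff shows the app's actual plotting stack, so the following is only an illustrative sketch using Plotly's `Scatterpolar`, with made-up model names and scores:

```python
import plotly.graph_objects as go

# Hypothetical scores purely for illustration; the app computes these from the
# fetched results-dataframes (models as rows, tasks as columns).
tasks = ["Task A", "Task B", "Task C", "Task D"]
scores = {
    "model-a": [0.8, 0.7, 0.6, 0.9],
    "model-b": [0.6, 0.9, 0.7, 0.5],
}

fig = go.Figure()
for model, values in scores.items():
    fig.add_trace(
        go.Scatterpolar(
            r=values + values[:1],  # repeat the first point to close the polygon
            theta=tasks + tasks[:1],
            name=model,
            fill="toself",
        )
    )
fig.update_layout(polar=dict(radialaxis=dict(range=[0, 1])))
fig.show()
```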