Commit
·
84fadb9
1
Parent(s):
a5e37f8
Add description for the library-based code generation task
Browse files- src/tasks_content.py +16 -7
src/tasks_content.py
CHANGED
|
@@ -11,13 +11,22 @@ TASKS_PRETTY = {
|
|
| 11 |
TASKS_PRETTY_REVERSE = {value: key for key, value in TASKS_PRETTY.items()}
|
| 12 |
|
| 13 |
TASKS_DESCRIPTIONS = {
|
| 14 |
-
"library_based_code_generation": "
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
|
| 16 |
"ci_builds_repair": "cool description for Bug Localization on Build Logs task",
|
| 17 |
|
| 18 |
"project_code_completion": """# Project-Level Code Completion\n
|
| 19 |
|
| 20 |
-
Our Project-Level Code Completion 🤗 [JetBrains-Research/lca-code-completion](https://huggingface.co/datasets/JetBrains-Research/lca-code-completion) includes four datasets:
|
| 21 |
* `small-context`: 144 data points,
|
| 22 |
* `medium-context`: 224 data points,
|
| 23 |
* `large-context`: 270 data points,
|
|
@@ -34,7 +43,7 @@ TASKS_DESCRIPTIONS = {
|
|
| 34 |
|
| 35 |
For further details on the dataset and the baselines from 🏟️ Long Code Arena Team, refer to `code_completion` folder in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines) or to our preprint (TODO).
|
| 36 |
""",
|
| 37 |
-
|
| 38 |
"commit_message_generation": """# Commit Message Generation\n
|
| 39 |
|
| 40 |
Our Commit Message Generation benchmark 🤗 [JetBrains-Research/lca-commit-message-generation](https://huggingface.co/datasets/JetBrains-Research/lca-commit-message-generation) includes 163 manually curated commits from Python projects.
|
|
@@ -49,15 +58,15 @@ TASKS_DESCRIPTIONS = {
|
|
| 49 |
|
| 50 |
**Note.** The leaderboard is sorted by ROUGE-1 metric by default.
|
| 51 |
""",
|
| 52 |
-
|
| 53 |
"bug_localization": """# Bug Localization\n
|
| 54 |
|
| 55 |
Our Bug Localization benchmark 🤗 [JetBrains-Research/lca-bug-localization](https://huggingface.co/datasets/JetBrains-Research/lca-bug-localization) includes 7,479 bug issue descriptions with information about the pull requests that fix them for Python, Java and Kotlin projects.
|
| 56 |
|
| 57 |
Moreover, 150 data points from the test split were manually verified and can be used for bug localization approaches evaluation.
|
| 58 |
-
We used information retrieval metrics such as R@k, P@k and F1-score for evaluation, taking k
|
| 59 |
""",
|
| 60 |
-
|
| 61 |
"module_summarization": """# Module Summarization\n
|
| 62 |
Our Module-to-Text benchmark 🤗 [JetBrains-Research/lca-module-summarization](https://huggingface.co/datasets/JetBrains-Research/lca-module-summarization) includes 216 manually curated text files with documentation of open-source permissive Python projects.
|
| 63 |
|
|
@@ -77,5 +86,5 @@ def get_submission_text_files_for_task(task_pretty: Optional[str]) -> str:
|
|
| 77 |
|
| 78 |
if task_id == "commit_message_generation":
|
| 79 |
return f"""**{task_pretty} Instructions:**\n\n* Please, attach files in [JSONLines format](https://jsonlines.org/). For an example, check the predictions provided by 🏟️ Long Code Arena Team in 🤗 [JetBrains-Research/lca-results](https://huggingface.co/datasets/JetBrains-Research/lca-results/tree/main/commit_message_generation/predictions). Make sure to include `"prediction"` and `"reference"` fields for each example, the rest are optional."""
|
| 80 |
-
|
| 81 |
return f"**{task_pretty} Instructions:**\n\n* 🚧 There are no instructions for the current task yet."
|
|
|
|
| 11 |
TASKS_PRETTY_REVERSE = {value: key for key, value in TASKS_PRETTY.items()}
|
| 12 |
|
| 13 |
TASKS_DESCRIPTIONS = {
|
| 14 |
+
"library_based_code_generation": """# Library-Based Code Generation\n
|
| 15 |
+
|
| 16 |
+
Our Library-Based Code Generation benchmark 🤗 [JetBrains-Research/lca-library-based-code-generation](https://huggingface.co/datasets/JetBrains-Research/lca-library-based-code-generation) includes 150 manually curated instructions asking a model to generate Python code using a particular library. Samples come from 62 Python repositories. All the samples in the dataset are based on reference example programs written by authors of the respective libraries.
|
| 17 |
+
|
| 18 |
+
For evaluation we use two metrics:
|
| 19 |
+
* `API Recall`: share of library-specific API calls used in the reference program that appear in the generated code,
|
| 20 |
+
* `ChrF`: textual similarity between the generated code and the reference program.
|
| 21 |
+
|
| 22 |
+
For further details on the dataset and the baselines from 🏟️ Long Code Arena Team, refer to `library_based_code_generation` folder in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines) or to our preprint (TODO).
|
| 23 |
+
""",
|
| 24 |
|
| 25 |
"ci_builds_repair": "cool description for Bug Localization on Build Logs task",
|
| 26 |
|
| 27 |
"project_code_completion": """# Project-Level Code Completion\n
|
| 28 |
|
| 29 |
+
Our Project-Level Code Completion benchmark 🤗 [JetBrains-Research/lca-code-completion](https://huggingface.co/datasets/JetBrains-Research/lca-code-completion) includes four datasets:
|
| 30 |
* `small-context`: 144 data points,
|
| 31 |
* `medium-context`: 224 data points,
|
| 32 |
* `large-context`: 270 data points,
|
|
|
|
| 43 |
|
| 44 |
For further details on the dataset and the baselines from 🏟️ Long Code Arena Team, refer to `code_completion` folder in [our baselines repository](https://github.com/JetBrains-Research/lca-baselines) or to our preprint (TODO).
|
| 45 |
""",
|
| 46 |
+
|
| 47 |
"commit_message_generation": """# Commit Message Generation\n
|
| 48 |
|
| 49 |
Our Commit Message Generation benchmark 🤗 [JetBrains-Research/lca-commit-message-generation](https://huggingface.co/datasets/JetBrains-Research/lca-commit-message-generation) includes 163 manually curated commits from Python projects.
|
|
|
|
| 58 |
|
| 59 |
**Note.** The leaderboard is sorted by ROUGE-1 metric by default.
|
| 60 |
""",
|
| 61 |
+
|
| 62 |
"bug_localization": """# Bug Localization\n
|
| 63 |
|
| 64 |
Our Bug Localization benchmark 🤗 [JetBrains-Research/lca-bug-localization](https://huggingface.co/datasets/JetBrains-Research/lca-bug-localization) includes 7,479 bug issue descriptions with information about the pull requests that fix them for Python, Java and Kotlin projects.
|
| 65 |
|
| 66 |
Moreover, 150 data points from the test split were manually verified and can be used for bug localization approaches evaluation.
|
| 67 |
+
We used information retrieval metrics such as R@k, P@k and F1-score for evaluation, taking k equal to 1 and 2.
|
| 68 |
""",
|
| 69 |
+
|
| 70 |
"module_summarization": """# Module Summarization\n
|
| 71 |
Our Module-to-Text benchmark 🤗 [JetBrains-Research/lca-module-summarization](https://huggingface.co/datasets/JetBrains-Research/lca-module-summarization) includes 216 manually curated text files with documentation of open-source permissive Python projects.
|
| 72 |
|
|
|
|
| 86 |
|
| 87 |
if task_id == "commit_message_generation":
|
| 88 |
return f"""**{task_pretty} Instructions:**\n\n* Please, attach files in [JSONLines format](https://jsonlines.org/). For an example, check the predictions provided by 🏟️ Long Code Arena Team in 🤗 [JetBrains-Research/lca-results](https://huggingface.co/datasets/JetBrains-Research/lca-results/tree/main/commit_message_generation/predictions). Make sure to include `"prediction"` and `"reference"` fields for each example, the rest are optional."""
|
| 89 |
+
|
| 90 |
return f"**{task_pretty} Instructions:**\n\n* 🚧 There are no instructions for the current task yet."
|