Upload README.md with huggingface_hub

## Fine-tune

### Data Format

The training data should be a JSON Lines file (one JSON object per line), where each line is a dict like this:

```
{"query": str, "pos": List[str], "neg": List[str], "prompt": str}
```

`query` is the query, `pos` is a list of positive texts, `neg` is a list of negative texts, and `prompt` describes the relationship between the query and the texts. If you have no negative texts for a query, you can randomly sample some from the entire corpus to use as negatives.
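
A minimal Python sketch of the negative-sampling advice above, assuming the corpus fits in memory as a list of passages. `build_record`, `num_neg`, and the example prompt string are hypothetical names and values chosen for illustration; they are not part of FlagEmbedding. The sketch removes the query's own positives from the candidate pool so a known positive is never used as a negative. Uniform random sampling can still pick passages that are relevant but unlabeled (false negatives), which is usually tolerable when the corpus is large.

```python
import json
import random
from typing import List, Optional


def build_record(query: str, pos: List[str], corpus: List[str], prompt: str,
                 neg: Optional[List[str]] = None, num_neg: int = 3,
                 seed: Optional[int] = None) -> dict:
    """Hypothetical helper: build one training record, sampling negatives if none are given."""
    if not neg:
        rng = random.Random(seed)
        # Exclude the known positives so they are never sampled as negatives.
        positives = set(pos)
        candidates = [doc for doc in corpus if doc not in positives]
        # Sample without replacement; take fewer if the candidate pool is small.
        neg = rng.sample(candidates, k=min(num_neg, len(candidates)))
    return {"query": query, "pos": pos, "neg": neg, "prompt": prompt}


if __name__ == "__main__":
    corpus = [
        "Paris is the capital of France.",
        "The Nile is a river in Africa.",
        "Python is a programming language.",
        "Mount Everest is the highest mountain.",
    ]
    record = build_record(
        query="What is the capital of France?",
        pos=["Paris is the capital of France."],
        corpus=corpus,
        # Illustrative prompt only; write one that fits your own task.
        prompt="Given a query A and a passage B, determine whether the passage answers the query.",
        seed=0,
    )
    # Each record is serialized as one line of the JSONL training file.
    print(json.dumps(record, ensure_ascii=False))
```

Passing `seed` makes the sampled negatives reproducible across runs; leave it as `None` to get different negatives each time.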

See [toy_finetune_data.jsonl](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_reranker/toy_finetune_data.jsonl) for a toy data file.
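
Before training, it can help to check that every line of your file matches the format above. The sketch below is a hypothetical validator (`validate_line` and `load_train_file` are illustrative names, not FlagEmbedding functions). It treats all four fields as required, which may be stricter than the trainer itself.

```python
import json
from typing import List


def validate_line(line: str) -> dict:
    """Parse one JSONL line and check it against the documented schema."""
    record = json.loads(line)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    for key in ("query", "prompt"):
        if not isinstance(record.get(key), str):
            raise ValueError(f"'{key}' must be a string")
    for key in ("pos", "neg"):
        value = record.get(key)
        if not isinstance(value, list) or not all(isinstance(t, str) for t in value):
            raise ValueError(f"'{key}' must be a list of strings")
    return record


def load_train_file(path: str) -> List[dict]:
    """Read a JSONL training file, skipping blank lines and reporting the bad line number."""
    records = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                records.append(validate_line(line))
            except ValueError as err:
                raise ValueError(f"{path}:{lineno}: {err}") from err
    return records
```

For example, `load_train_file("toy_finetune_data.jsonl")` (with the toy file downloaded locally) returns the parsed records, or raises a `ValueError` naming the line number of the first record that does not match.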

### Train

You can fine-tune the reranker with the following code:

**For LLM-based reranker**