transformers

foldl committed 13 days ago

Commit 2fba224 (verified)

1 Parent(s): 1793966

Fixed typo in MQA explanation (#5)

- Fixed typo in MQA explanation (fe37f33f00089015b1b020c87f82b4adcfba6cbc)

Co-authored-by: Egor Konovalov <foldl@users.noreply.huggingface.co>

Files changed (1)

app/src/content/article.mdx CHANGED

@@ -532,7 +532,7 @@ The [GQA paper](https://arxiv.org/abs/2305.13245) explains how grouped-query att
 </Sidenote>
-NanoChat uses Multi-Query Attention (MQA) to reduce the memory footprint of the KV cache, using 6 query heads but only 6 key/value heads (in the default config). This is a common configuration for smaller models like nanochat.
+NanoChat uses Multi-Query Attention (MQA) to reduce the memory footprint of the KV cache, using 6 query heads but only 1 key/value head (in the default config). This is a common configuration for smaller models like nanochat.
 <Sidenote>
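
Review note: the corrected sentence is easier to check with numbers. The sketch below is not taken from the article or from nanochat; it only illustrates why sharing a single key/value head across 6 query heads shrinks the KV cache. The function `kv_cache_bytes` is a hypothetical helper, and the layer count, head dimension, sequence length, and 2-byte element size are assumptions chosen for illustration. Only the head counts (6 query heads, 1 key/value head) come from the changed line.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV cache size in bytes (hypothetical helper).

    The leading factor of 2 accounts for storing both keys and values.
    Query heads do not appear here: only key/value heads are cached.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Head counts from the changed line of the article.
n_query_heads = 6
n_kv_heads_mqa = 1

# Illustrative assumptions, not nanochat's actual configuration.
n_layers = 20
head_dim = 128
seq_len = 2048

# Multi-head attention baseline: one key/value head per query head.
mha_bytes = kv_cache_bytes(n_layers, n_query_heads, head_dim, seq_len)

# Multi-query attention: all query heads share a single key/value head.
mqa_bytes = kv_cache_bytes(n_layers, n_kv_heads_mqa, head_dim, seq_len)

# The reduction factor equals n_query_heads / n_kv_heads, here 6.
reduction = mha_bytes // mqa_bytes
print(f"MHA: {mha_bytes} bytes, MQA: {mqa_bytes} bytes, reduction: {reduction}x")
```

The reduction factor depends only on the ratio of query heads to key/value heads, so the 6x figure holds whatever the assumed layer count, head dimension, or sequence length. With "6 key/value heads", as in the removed line, the ratio would be 1 and there would be no saving, which is why the old wording contradicted the MQA description.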