burtenshaw (HF Staff) and foldl committed
Commit 2fba224 (verified) · 1 parent: 1793966

Fixed typo in MQA explanation (#5)

- Fixed typo in MQA explanation (fe37f33f00089015b1b020c87f82b4adcfba6cbc)


Co-authored-by: Egor Konovalov <foldl@users.noreply.huggingface.co>

Files changed (1)
  1. app/src/content/article.mdx +1 -1
app/src/content/article.mdx CHANGED
@@ -532,7 +532,7 @@ The [GQA paper](https://arxiv.org/abs/2305.13245) explains how grouped-query att
 
   </Sidenote>
 
- NanoChat uses Multi-Query Attention (MQA) to reduce the memory footprint of the KV cache, using 6 query heads but only 6 key/value heads (in the default config). This is a common configuration for smaller models like nanochat.
+ NanoChat uses Multi-Query Attention (MQA) to reduce the memory footprint of the KV cache, using 6 query heads but only 1 key/value head (in the default config). This is a common configuration for smaller models like nanochat.
 
   <Sidenote>
 
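The fix hinges on the difference between the two versions of the line. With 6 query heads, 6 key/value heads is ordinary multi-head attention (MHA), whereas a single key/value head shared by every query head is MQA. The Python sketch below compares KV-cache sizes for the two settings. The layer count, head dimension, sequence length, and bf16 element size are hypothetical values chosen only for illustration, not nanochat's actual configuration, and `kv_cache_bytes` / `kv_head_for_query` are hypothetical helpers.

```python
# Hypothetical sizes for illustration only; NOT taken from nanochat's config.
N_LAYERS = 12
HEAD_DIM = 128
SEQ_LEN = 2048
BYTES_PER_ELEM = 2  # assumes bf16 storage


def kv_cache_bytes(n_kv_heads: int, n_layers: int = N_LAYERS,
                   seq_len: int = SEQ_LEN, head_dim: int = HEAD_DIM,
                   bytes_per_elem: int = BYTES_PER_ELEM, batch: int = 1) -> int:
    """Bytes needed to cache keys and values for one full context."""
    # Leading factor 2: one cached tensor for keys, one for values.
    return 2 * batch * n_layers * n_kv_heads * seq_len * head_dim * bytes_per_elem


def kv_head_for_query(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Index of the key/value head that a given query head reads from."""
    # Each KV head serves a contiguous group of n_q_heads // n_kv_heads query heads.
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    return q_head // (n_q_heads // n_kv_heads)


N_Q_HEADS = 6
mha = kv_cache_bytes(n_kv_heads=6)  # one KV head per query head (MHA)
mqa = kv_cache_bytes(n_kv_heads=1)  # one KV head shared by all query heads (MQA)

print(f"MHA: {mha / 2**20:.0f} MiB, MQA: {mqa / 2**20:.0f} MiB, ratio {mha // mqa}x")
print("MHA mapping:", [kv_head_for_query(i, N_Q_HEADS, 6) for i in range(N_Q_HEADS)])
print("MQA mapping:", [kv_head_for_query(i, N_Q_HEADS, 1) for i in range(N_Q_HEADS)])
```

With these assumed sizes the script reports 72 MiB for MHA and 12 MiB for MQA, a 6x reduction. The ratio depends only on the head counts, not on the other hypothetical values.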