Init
- README.md +60 -51
- package-lock.json +0 -0
- package.json +3 -1
- src/App.css +157 -25
- src/App.js +311 -16
README.md
CHANGED
|
@@ -8,76 +8,85 @@ pinned: false
|
|
| 8 |
app_build_command: npm run build
|
| 9 |
app_file: build/index.html
|
| 10 |
license: mit
|
| 11 |
-
short_description: NVIDIA Parakeet speech recognition for the browser (WebGPU)
|
| 12 |
---
|
| 13 |
|
| 14 |
-
#
|
| 15 |
|
| 16 |
-
|
| 17 |
|
| 18 |
-
|
| 19 |
|
| 20 |
-
|
| 21 |
|
| 22 |
-
|
| 23 |
|
| 24 |
-
|
| 25 |
-
Open [http://localhost:3000](http://localhost:3000) to view it in your browser.
|
| 26 |
|
| 27 |
-
|
| 28 |
-
|
| 29 |
|
| 30 |
-
|
| 31 |
|
| 32 |
-
|
| 33 |
-
See the section about [running tests](https://facebook.github.io/create-react-app/docs/running-tests) for more information.
|
| 34 |
|
| 35 |
-
|
| 36 |
|
| 37 |
-
|
| 38 |
-
|
| 39 |
|
| 40 |
-
|
| 41 |
-
|
|
|
|
| 42 |
|
| 43 |
-
|
| 44 |
|
| 45 |
-
|
| 46 |
|
| 47 |
-
|
| 48 |
|
| 49 |
-
|
| 50 |
|
| 51 |
-
|
| 52 |
|
| 53 |
-
|
| 54 |
|
| 55 |
-
|
| 56 |
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
To learn React, check out the [React documentation](https://reactjs.org/).
|
| 60 |
-
|
| 61 |
-
### Code Splitting
|
| 62 |
-
|
| 63 |
-
This section has moved here: [https://facebook.github.io/create-react-app/docs/code-splitting](https://facebook.github.io/create-react-app/docs/code-splitting)
|
| 64 |
-
|
| 65 |
-
### Analyzing the Bundle Size
|
| 66 |
-
|
| 67 |
-
This section has moved here: [https://facebook.github.io/create-react-app/docs/analyzing-the-bundle-size](https://facebook.github.io/create-react-app/docs/analyzing-the-bundle-size)
|
| 68 |
-
|
| 69 |
-
### Making a Progressive Web App
|
| 70 |
-
|
| 71 |
-
This section has moved here: [https://facebook.github.io/create-react-app/docs/making-a-progressive-web-app](https://facebook.github.io/create-react-app/docs/making-a-progressive-web-app)
|
| 72 |
-
|
| 73 |
-
### Advanced Configuration
|
| 74 |
-
|
| 75 |
-
This section has moved here: [https://facebook.github.io/create-react-app/docs/advanced-configuration](https://facebook.github.io/create-react-app/docs/advanced-configuration)
|
| 76 |
-
|
| 77 |
-
### Deployment
|
| 78 |
-
|
| 79 |
-
This section has moved here: [https://facebook.github.io/create-react-app/docs/deployment](https://facebook.github.io/create-react-app/docs/deployment)
|
| 80 |
-
|
| 81 |
-
### `npm run build` fails to minify
|
| 82 |
|
| 83 |
-
|
|
|
|
| 8 |
app_build_command: npm run build
|
| 9 |
app_file: build/index.html
|
| 10 |
license: mit
|
| 11 |
+
short_description: NVIDIA Parakeet speech recognition for the browser (WebGPU/WASM)
|
| 12 |
+
models:
|
| 13 |
+
- ysdede/parakeet-tdt-0.6b-v2-onnx
|
| 14 |
+
tags:
|
| 15 |
+
- parakeet
|
| 16 |
+
- speech
|
| 17 |
+
- onnx
|
| 18 |
+
- webgpu
|
| 19 |
+
- wasm
|
| 20 |
+
- transcription
|
| 21 |
+
- nvidia
|
| 22 |
+
- speech-recognition
|
| 23 |
+
- browser
|
| 24 |
---
|
| 25 |
|
| 26 |
+
# 🐠 Parakeet.js - HF Spaces Demo
|
| 27 |
|
| 28 |
+
> **NVIDIA Parakeet speech recognition for the browser using WebGPU/WASM**
|
| 29 |
|
| 30 |
+
This demo showcases the **[parakeet.js](https://www.npmjs.com/package/parakeet.js)** library, which brings NVIDIA's Parakeet speech recognition models to the browser using ONNX Runtime Web with WebGPU and WASM backends.
|
| 31 |
|
| 32 |
+
## 🚀 Features
|
| 33 |
|
| 34 |
+
- **🖥️ Browser-based**: Runs entirely in your browser - no server required
|
| 35 |
+
- **⚡ WebGPU acceleration**: Fast inference using WebGPU when available
|
| 36 |
+
- **🔧 WASM fallback**: CPU-based inference using WebAssembly
|
| 37 |
+
- **📱 Multiple formats**: Supports various audio formats (WAV, MP3, etc.)
|
| 38 |
+
- **🎯 Real-time performance**: Optimized for fast transcription
|
| 39 |
+
- **📊 Performance metrics**: Shows detailed timing information
|
| 40 |
+
- **🎛️ Configurable**: Adjustable quantization, preprocessing, and backend settings
|
| 41 |
|
| 42 |
+
## 🔧 How to Use
|
|
|
|
| 43 |
|
| 44 |
+
1. **Click "Load Model"** to download and initialize the speech recognition model
|
| 45 |
+
2. **Select your preferences**:
|
| 46 |
+
- **Backend**: Choose WebGPU (faster) or WASM (more compatible)
|
| 47 |
+
- **Quantization**: fp32 (higher quality) or int8 (faster)
|
| 48 |
+
- **Preprocessor**: Different audio processing options
|
| 49 |
+
3. **Upload an audio file** using the file input
|
| 50 |
+
4. **View the transcription** in real-time with performance metrics
|
| 51 |
|
| 52 |
+
## 📦 Integration
|
| 53 |
|
| 54 |
+
You can use parakeet.js in your own projects:
|
|
|
|
| 55 |
|
| 56 |
+
```bash
|
| 57 |
+
npm install parakeet.js onnxruntime-web
|
| 58 |
+
```
|
| 59 |
|
| 60 |
+
```javascript
|
| 61 |
+
import { ParakeetModel, getParakeetModel } from 'parakeet.js';
|
| 62 |
|
| 63 |
+
// Load model from HuggingFace Hub
|
| 64 |
+
const modelUrls = await getParakeetModel('ysdede/parakeet-tdt-0.6b-v2-onnx');
|
| 65 |
+
const model = await ParakeetModel.fromUrls(modelUrls);
|
| 66 |
|
| 67 |
+
// Transcribe audio
|
| 68 |
+
const result = await model.transcribe(audioData, sampleRate);
|
| 69 |
+
console.log(result.utterance_text);
|
| 70 |
+
```
|
| 71 |
|
| 72 |
+
## 🔗 Links
|
| 73 |
|
| 74 |
+
- **📚 [GitHub Repository](https://github.com/ysdede/parakeet.js)** - Source code and documentation
|
| 75 |
+
- **📦 [npm Package](https://www.npmjs.com/package/parakeet.js)** - Install via npm
|
| 76 |
+
- **🤖 [NVIDIA Parakeet Model](https://huggingface.co/nvidia/parakeet-tdt-1.1b)** - Original model on HuggingFace
|
| 77 |
|
| 78 |
+
## 🧠 Model Information
|
| 79 |
|
| 80 |
+
This demo uses the **ysdede/parakeet-tdt-0.6b-v2-onnx** model, which is an ONNX-converted version of NVIDIA's Parakeet speech recognition model optimized for browser deployment.
|
| 81 |
|
| 82 |
+
## 💡 Technical Details
|
| 83 |
|
| 84 |
+
- **Model Format**: ONNX for cross-platform compatibility
|
| 85 |
+
- **Backends**: WebGPU (GPU acceleration) and WASM (CPU fallback)
|
| 86 |
+
- **Quantization**: Support for both fp32 and int8 precision
|
| 87 |
+
- **Audio Processing**: Built-in preprocessing for various audio formats
|
| 88 |
+
- **Performance**: Real-time factor (RTF) typically < 1.0x for fast transcription
|
| 89 |
|
| 90 |
+
---
|
| 91 |
|
| 92 |
+
*Built with ❤️ using React and deployed on Hugging Face Spaces*
|
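The README's "RTF typically < 1.0x" line and the metrics panel in `src/App.js` both depend on a real-time factor. A minimal sketch of that figure follows, assuming the conventional definition (processing time divided by audio duration). `realTimeFactor` is a hypothetical helper, not part of parakeet.js, and the library's own `metrics.rtf` may be computed differently.

```javascript
// Hypothetical helper (not parakeet.js API): conventional real-time factor.
// RTF = processing time / audio duration; values below 1 mean the
// transcription ran faster than the audio plays back.
function realTimeFactor(processingMs, sampleCount, sampleRate) {
  if (sampleCount <= 0 || sampleRate <= 0) {
    throw new RangeError('sampleCount and sampleRate must be positive');
  }
  const audioMs = (sampleCount / sampleRate) * 1000;
  return processingMs / audioMs;
}

// 10 s of 16 kHz mono audio transcribed in 2.5 s
const rtf = realTimeFactor(2500, 160000, 16000);
console.log(rtf.toFixed(2)); // 0.25
```

Under this definition a lower number is better, which matches the README's "< 1.0x" wording.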
package-lock.json
ADDED
|
The diff for this file is too large to render.
|
|
|
package.json
CHANGED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
{
|
| 2 |
-
"name": "
|
| 3 |
"version": "0.1.0",
|
| 4 |
"private": true,
|
| 5 |
"dependencies": {
|
|
@@ -7,6 +7,8 @@
|
|
| 7 |
"@testing-library/jest-dom": "^6.6.3",
|
| 8 |
"@testing-library/react": "^16.3.0",
|
| 9 |
"@testing-library/user-event": "^13.5.0",
|
| 10 |
"react": "^19.1.0",
|
| 11 |
"react-dom": "^19.1.0",
|
| 12 |
"react-scripts": "5.0.1",
|
|
|
|
| 1 |
{
|
| 2 |
+
"name": "parakeet-js-hf-spaces-demo",
|
| 3 |
"version": "0.1.0",
|
| 4 |
"private": true,
|
| 5 |
"dependencies": {
|
|
|
|
| 7 |
"@testing-library/jest-dom": "^6.6.3",
|
| 8 |
"@testing-library/react": "^16.3.0",
|
| 9 |
"@testing-library/user-event": "^13.5.0",
|
| 10 |
+
"parakeet.js": "^0.0.1",
|
| 11 |
+
"onnxruntime-web": "1.22.0-dev.20250409-89f8206ba4",
|
| 12 |
"react": "^19.1.0",
|
| 13 |
"react-dom": "^19.1.0",
|
| 14 |
"react-scripts": "5.0.1",
|
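The two added dependencies are versioned differently: `parakeet.js` uses a caret range (`^0.0.1`), while `onnxruntime-web` is an exact dev-build version. Under npm's semver rules a caret keeps the left-most non-zero component fixed, so `^0.0.1` only admits `>=0.0.1 <0.0.2`. The sketch below illustrates that rule for plain `major.minor.patch` strings; `caretUpperBound` is a hypothetical helper written for this note, not an npm API.

```javascript
// Hypothetical helper: exclusive upper bound of an npm caret range for a
// plain "major.minor.patch" version (prerelease and build tags are rejected).
function caretUpperBound(version) {
  const parts = version.split('.').map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new TypeError(`expected "major.minor.patch", got "${version}"`);
  }
  const [major, minor, patch] = parts;
  // The caret keeps the left-most non-zero component fixed.
  if (major > 0) return `${major + 1}.0.0`;
  if (minor > 0) return `0.${minor + 1}.0`;
  return `0.0.${patch + 1}`;
}

console.log(caretUpperBound('0.0.1'));  // 0.0.2
console.log(caretUpperBound('19.1.0')); // 20.0.0
```

In practice this means both new dependencies resolve to a single version each, even though only one of them is written as an exact pin.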
src/App.css
CHANGED
|
@@ -1,38 +1,170 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
}
|
| 4 |
|
| 5 |
-
.
|
| 6 |
-
|
| 7 |
-
|
| 8 |
}
|
| 9 |
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
}
|
| 15 |
|
| 16 |
-
.
|
| 17 |
-
|
| 18 |
-
min-height: 100vh;
|
| 19 |
display: flex;
|
| 20 |
-
flex-direction: column;
|
| 21 |
align-items: center;
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
}
|
| 26 |
|
| 27 |
-
.
|
| 28 |
-
color: #
|
|
|
|
| 29 |
}
|
| 30 |
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
transform: rotate(0deg);
|
| 34 |
-
}
|
| 35 |
-
to {
|
| 36 |
-
transform: rotate(360deg);
|
| 37 |
-
}
|
| 38 |
}
|
|
|
|
| 1 |
+
:root {
|
| 2 |
+
font-family: Inter, system-ui, sans-serif;
|
| 3 |
+
line-height: 1.4;
|
| 4 |
+
color: #222;
|
| 5 |
+
background: #f3f6f8;
|
| 6 |
}
|
| 7 |
|
| 8 |
+
.app {
|
| 9 |
+
max-width: 760px;
|
| 10 |
+
margin: 2rem auto;
|
| 11 |
+
background: #ffffff;
|
| 12 |
+
border-radius: 8px;
|
| 13 |
+
padding: 1.5rem 2rem;
|
| 14 |
+
box-shadow: 0 4px 14px rgba(0, 0, 0, 0.06);
|
| 15 |
}
|
| 16 |
|
| 17 |
+
.controls {
|
| 18 |
+
display: flex;
|
| 19 |
+
flex-wrap: wrap;
|
| 20 |
+
gap: 0.75rem;
|
| 21 |
+
align-items: center;
|
| 22 |
+
margin-bottom: 1rem;
|
| 23 |
}
|
| 24 |
|
| 25 |
+
.controls label {
|
| 26 |
+
font-size: 0.9rem;
|
|
|
|
| 27 |
display: flex;
|
|
|
|
| 28 |
align-items: center;
|
| 29 |
+
gap: 0.35rem;
|
| 30 |
+
}
|
| 31 |
+
|
| 32 |
+
.controls select,
|
| 33 |
+
.controls input[type="number"] {
|
| 34 |
+
padding: 0.25rem 0.5rem;
|
| 35 |
+
border: 1px solid #d1d5db;
|
| 36 |
+
border-radius: 4px;
|
| 37 |
+
background: #fff;
|
| 38 |
+
}
|
| 39 |
+
|
| 40 |
+
button.primary {
|
| 41 |
+
padding: 0.4rem 0.9rem;
|
| 42 |
+
background: #3b82f6;
|
| 43 |
+
color: #ffffff;
|
| 44 |
+
border: none;
|
| 45 |
+
border-radius: 4px;
|
| 46 |
+
cursor: pointer;
|
| 47 |
+
}
|
| 48 |
+
|
| 49 |
+
button.primary:hover {
|
| 50 |
+
background: #2563eb;
|
| 51 |
+
}
|
| 52 |
+
|
| 53 |
+
.status {
|
| 54 |
+
margin-top: 0.5rem;
|
| 55 |
+
font-weight: 500;
|
| 56 |
+
}
|
| 57 |
+
|
| 58 |
+
.progress-wrapper {
|
| 59 |
+
margin: 0.5rem 0;
|
| 60 |
+
}
|
| 61 |
+
|
| 62 |
+
.progress-bar {
|
| 63 |
+
height: 8px;
|
| 64 |
+
background: #e2e8f0;
|
| 65 |
+
border-radius: 4px;
|
| 66 |
+
overflow: hidden;
|
| 67 |
+
}
|
| 68 |
+
|
| 69 |
+
.progress-bar > div {
|
| 70 |
+
height: 100%;
|
| 71 |
+
background: #10b981;
|
| 72 |
+
transition: width 0.2s;
|
| 73 |
+
}
|
| 74 |
+
|
| 75 |
+
.progress-text {
|
| 76 |
+
font-size: 0.8rem;
|
| 77 |
+
color: #555;
|
| 78 |
+
margin-top: 0.25rem;
|
| 79 |
+
}
|
| 80 |
+
|
| 81 |
+
.textarea {
|
| 82 |
+
width: 100%;
|
| 83 |
+
height: 6rem;
|
| 84 |
+
resize: vertical;
|
| 85 |
+
padding: 0.75rem;
|
| 86 |
+
border: 1px solid #d1d5db;
|
| 87 |
+
border-radius: 4px;
|
| 88 |
+
font-family: inherit;
|
| 89 |
+
font-size: 0.9rem;
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
.performance {
|
| 93 |
+
font-size: 0.85rem;
|
| 94 |
+
background: #ecfdf5;
|
| 95 |
+
padding: 0.5rem 0.75rem;
|
| 96 |
+
border-radius: 6px;
|
| 97 |
+
border: 1px solid #d1fae5;
|
| 98 |
+
margin-bottom: 1rem;
|
| 99 |
+
}
|
| 100 |
+
|
| 101 |
+
.history {
|
| 102 |
+
margin-top: 1rem;
|
| 103 |
+
}
|
| 104 |
+
|
| 105 |
+
.history h3 {
|
| 106 |
+
margin-bottom: 0.5rem;
|
| 107 |
+
color: #333;
|
| 108 |
+
}
|
| 109 |
+
|
| 110 |
+
.history-item {
|
| 111 |
+
padding: 1rem;
|
| 112 |
+
border-bottom: 1px solid #f1f5f9;
|
| 113 |
+
background: #ffffff;
|
| 114 |
+
}
|
| 115 |
+
|
| 116 |
+
.history-item:last-child {
|
| 117 |
+
border-bottom: none;
|
| 118 |
+
}
|
| 119 |
+
|
| 120 |
+
.history-meta {
|
| 121 |
+
display: flex;
|
| 122 |
+
justify-content: space-between;
|
| 123 |
+
font-size: 0.9rem;
|
| 124 |
+
color: #666;
|
| 125 |
+
margin-bottom: 0.5rem;
|
| 126 |
+
}
|
| 127 |
+
|
| 128 |
+
.history-stats {
|
| 129 |
+
font-size: 0.75rem;
|
| 130 |
+
color: #666;
|
| 131 |
+
margin-bottom: 0.5rem;
|
| 132 |
+
}
|
| 133 |
+
|
| 134 |
+
.history-text {
|
| 135 |
+
background: #f9fafb;
|
| 136 |
+
padding: 0.5rem 0.75rem;
|
| 137 |
+
border-radius: 4px;
|
| 138 |
+
border: 1px solid #e5e7eb;
|
| 139 |
+
font-size: 0.9rem;
|
| 140 |
+
}
|
| 141 |
+
|
| 142 |
+
/* HF Spaces specific styles */
|
| 143 |
+
.app h2 {
|
| 144 |
+
margin-top: 0;
|
| 145 |
+
color: #1f2937;
|
| 146 |
+
}
|
| 147 |
+
|
| 148 |
+
.app p {
|
| 149 |
+
margin-bottom: 1rem;
|
| 150 |
+
color: #6b7280;
|
| 151 |
+
}
|
| 152 |
+
|
| 153 |
+
.app h3 {
|
| 154 |
+
color: #374151;
|
| 155 |
+
margin-bottom: 0.5rem;
|
| 156 |
+
}
|
| 157 |
+
|
| 158 |
+
.app h4 {
|
| 159 |
+
color: #374151;
|
| 160 |
+
margin-bottom: 0.5rem;
|
| 161 |
}
|
| 162 |
|
| 163 |
+
.app a {
|
| 164 |
+
color: #3b82f6;
|
| 165 |
+
text-decoration: none;
|
| 166 |
}
|
| 167 |
|
| 168 |
+
.app a:hover {
|
| 169 |
+
text-decoration: underline;
|
| 170 |
}
|
src/App.js
CHANGED
|
@@ -1,25 +1,320 @@
|
|
| 1 |
-
import
|
|
|
|
| 2 |
import './App.css';
|
| 3 |
|
| 4 |
-
function App() {
|
| 5 |
return (
|
| 6 |
-
<div className="
|
| 7 |
-
<
|
| 8 |
-
|
| 9 |
<p>
|
| 10 |
-
|
| 11 |
</p>
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
>
|
| 18 |
-
|
| 19 |
-
</
|
| 20 |
-
</
|
| 21 |
</div>
|
| 22 |
);
|
| 23 |
}
|
| 24 |
-
|
| 25 |
-
export default App;
|
|
|
|
| 1 |
+
import React, { useState, useRef, useEffect } from 'react';
|
| 2 |
+
import { ParakeetModel, getParakeetModel } from 'parakeet.js';
|
| 3 |
import './App.css';
|
| 4 |
|
| 5 |
+
export default function App() {
|
| 6 |
+
const repoId = 'ysdede/parakeet-tdt-0.6b-v2-onnx';
|
| 7 |
+
const [backend, setBackend] = useState('webgpu-hybrid');
|
| 8 |
+
const [quant, setQuant] = useState('fp32');
|
| 9 |
+
const [preprocessor, setPreprocessor] = useState('nemo128');
|
| 10 |
+
const [status, setStatus] = useState('Idle');
|
| 11 |
+
const [progress, setProgress] = useState('');
|
| 12 |
+
const [progressText, setProgressText] = useState('');
|
| 13 |
+
const [progressPct, setProgressPct] = useState(null);
|
| 14 |
+
const [text, setText] = useState('');
|
| 15 |
+
const [latestMetrics, setLatestMetrics] = useState(null);
|
| 16 |
+
const [transcriptions, setTranscriptions] = useState([]);
|
| 17 |
+
const [isTranscribing, setIsTranscribing] = useState(false);
|
| 18 |
+
const [verboseLog, setVerboseLog] = useState(false);
|
| 19 |
+
const [decoderInt8, setDecoderInt8] = useState(true);
|
| 20 |
+
const [frameStride, setFrameStride] = useState(1);
|
| 21 |
+
const [dumpDetail, setDumpDetail] = useState(false);
|
| 22 |
+
const maxCores = navigator.hardwareConcurrency || 8;
|
| 23 |
+
const [cpuThreads, setCpuThreads] = useState(Math.max(1, maxCores - 2));
|
| 24 |
+
const modelRef = useRef(null);
|
| 25 |
+
const fileInputRef = useRef(null);
|
| 26 |
+
|
| 27 |
+
// Auto-adjust quant preset when backend changes
|
| 28 |
+
useEffect(() => {
|
| 29 |
+
if (backend.startsWith('webgpu')) {
|
| 30 |
+
setQuant('fp32');
|
| 31 |
+
} else if (backend === 'wasm') {
|
| 32 |
+
setQuant('int8');
|
| 33 |
+
}
|
| 34 |
+
}, [backend]);
|
| 35 |
+
|
| 36 |
+
async function loadModel() {
|
| 37 |
+
setStatus('Loading model…');
|
| 38 |
+
setProgress('');
|
| 39 |
+
setProgressText('');
|
| 40 |
+
setProgressPct(0);
|
| 41 |
+
console.time('LoadModel');
|
| 42 |
+
|
| 43 |
+
try {
|
| 44 |
+
const progressCallback = ({ loaded, total, file }) => {
|
| 45 |
+
const pct = total > 0 ? Math.round((loaded / total) * 100) : 0;
|
| 46 |
+
setProgressText(`${file}: ${pct}%`);
|
| 47 |
+
setProgressPct(pct);
|
| 48 |
+
};
|
| 49 |
+
|
| 50 |
+
// 1. Download all model files from HuggingFace Hub
|
| 51 |
+
const modelUrls = await getParakeetModel(repoId, {
|
| 52 |
+
quantization: quant,
|
| 53 |
+
preprocessor,
|
| 54 |
+
backend, // Pass backend to enable automatic fp32 selection for WebGPU
|
| 55 |
+
decoderInt8,
|
| 56 |
+
progress: progressCallback
|
| 57 |
+
});
|
| 58 |
+
|
| 59 |
+
// Show compiling sessions stage
|
| 60 |
+
setStatus('Creating sessions…');
|
| 61 |
+
setProgressText('Compiling model (this may take ~10 s)…');
|
| 62 |
+
setProgressPct(null);
|
| 63 |
+
|
| 64 |
+
// 2. Create the model instance with all file URLs
|
| 65 |
+
modelRef.current = await ParakeetModel.fromUrls({
|
| 66 |
+
...modelUrls.urls,
|
| 67 |
+
filenames: modelUrls.filenames,
|
| 68 |
+
backend,
|
| 69 |
+
verbose: verboseLog,
|
| 70 |
+
decoderOnWasm: decoderInt8, // if we selected int8 decoder, keep it on WASM
|
| 71 |
+
decoderInt8,
|
| 72 |
+
cpuThreads,
|
| 73 |
+
});
|
| 74 |
+
|
| 75 |
+
// 3. Warm-up and verify
|
| 76 |
+
setStatus('Warming up & verifying…');
|
| 77 |
+
setProgressText('Model ready! Upload an audio file to transcribe.');
|
| 78 |
+
setProgressPct(null);
|
| 79 |
+
|
| 80 |
+
console.timeEnd('LoadModel');
|
| 81 |
+
setStatus('Model ready ✔');
|
| 82 |
+
setProgressText('');
|
| 83 |
+
} catch (e) {
|
| 84 |
+
console.error(e);
|
| 85 |
+
setStatus(`Failed: ${e.message}`);
|
| 86 |
+
setProgress('');
|
| 87 |
+
}
|
| 88 |
+
}
|
| 89 |
+
|
| 90 |
+
async function transcribeFile(e) {
|
| 91 |
+
if (!modelRef.current) return alert('Load model first');
|
| 92 |
+
const file = e.target.files?.[0];
|
| 93 |
+
if (!file) return;
|
| 94 |
+
|
| 95 |
+
setIsTranscribing(true);
|
| 96 |
+
setStatus(`Transcribing "${file.name}"…`);
|
| 97 |
+
|
| 98 |
+
try {
|
| 99 |
+
const buf = await file.arrayBuffer();
|
| 100 |
+
const audioCtx = new AudioContext({ sampleRate: 16000 });
|
| 101 |
+
const decoded = await audioCtx.decodeAudioData(buf);
|
| 102 |
+
const pcm = decoded.getChannelData(0);
|
| 103 |
+
|
| 104 |
+
console.time(`Transcribe-${file.name}`);
|
| 105 |
+
const res = await modelRef.current.transcribe(pcm, 16_000, {
|
| 106 |
+
returnTimestamps: true,
|
| 107 |
+
returnConfidences: true,
|
| 108 |
+
frameStride
|
| 109 |
+
});
|
| 110 |
+
console.timeEnd(`Transcribe-${file.name}`);
|
| 111 |
+
|
| 112 |
+
if (dumpDetail) {
|
| 113 |
+
console.log('[Parakeet] Detailed transcription output', res);
|
| 114 |
+
}
|
| 115 |
+
setLatestMetrics(res.metrics);
|
| 116 |
+
// Add to transcriptions list
|
| 117 |
+
const newTranscription = {
|
| 118 |
+
id: Date.now(),
|
| 119 |
+
filename: file.name,
|
| 120 |
+
text: res.utterance_text,
|
| 121 |
+
timestamp: new Date().toLocaleTimeString(),
|
| 122 |
+
duration: pcm.length / 16000, // duration in seconds
|
| 123 |
+
wordCount: res.words?.length || 0,
|
| 124 |
+
confidence: res.confidence_scores?.overall_log_prob || null,
|
| 125 |
+
metrics: res.metrics
|
| 126 |
+
};
|
| 127 |
+
|
| 128 |
+
setTranscriptions(prev => [newTranscription, ...prev]);
|
| 129 |
+
setText(res.utterance_text); // Show latest transcription
|
| 130 |
+
setStatus('Model ready ✔'); // Ready for next file
|
| 131 |
+
|
| 132 |
+
} catch (error) {
|
| 133 |
+
console.error('Transcription failed:', error);
|
| 134 |
+
setStatus('Transcription failed');
|
| 135 |
+
alert(`Failed to transcribe "${file.name}": ${error.message}`);
|
| 136 |
+
} finally {
|
| 137 |
+
setIsTranscribing(false);
|
| 138 |
+
// Clear the file input so the same file can be selected again
|
| 139 |
+
if (fileInputRef.current) {
|
| 140 |
+
fileInputRef.current.value = '';
|
| 141 |
+
}
|
| 142 |
+
}
|
| 143 |
+
}
|
| 144 |
+
|
| 145 |
+
function clearTranscriptions() {
|
| 146 |
+
setTranscriptions([]);
|
| 147 |
+
setText('');
|
| 148 |
+
}
|
| 149 |
+
|
| 150 |
return (
|
| 151 |
+
<div className="app">
|
| 152 |
+
<h2>🐠 Parakeet.js - HF Spaces Demo</h2>
|
| 153 |
+
<p>NVIDIA Parakeet speech recognition for the browser using WebGPU/WASM</p>
|
| 154 |
+
|
| 155 |
+
<div className="controls">
|
| 156 |
<p>
|
| 157 |
+
<strong>Model:</strong> {repoId}
|
| 158 |
</p>
|
| 159 |
+
</div>
|
| 160 |
+
|
| 161 |
+
<div className="controls">
|
| 162 |
+
<label>
|
| 163 |
+
Backend:
|
| 164 |
+
<select value={backend} onChange={e=>setBackend(e.target.value)}>
|
| 165 |
+
<option value="webgpu-hybrid">WebGPU (Hybrid)</option>
|
| 166 |
+
<option value="webgpu-strict">WebGPU (Strict)</option>
|
| 167 |
+
<option value="wasm">WASM (CPU)</option>
|
| 168 |
+
</select>
|
| 169 |
+
</label>
|
| 170 |
+
{' '}
|
| 171 |
+
<label>
|
| 172 |
+
Quant:
|
| 173 |
+
<select value={quant} onChange={e=>setQuant(e.target.value)}>
|
| 174 |
+
<option value="int8">int8 (faster)</option>
|
| 175 |
+
<option value="fp32">fp32 (higher quality)</option>
|
| 176 |
+
</select>
|
| 177 |
+
</label>
|
| 178 |
+
{' '}
|
| 179 |
+
{backend.startsWith('webgpu') && (
|
| 180 |
+
<label style={{ fontSize:'0.9em' }}>
|
| 181 |
+
<input type="checkbox" checked={decoderInt8} onChange={e=>setDecoderInt8(e.target.checked)} />
|
| 182 |
+
Decoder INT8 on CPU
|
| 183 |
+
</label>
|
| 184 |
+
)}
|
| 185 |
+
{' '}
|
| 186 |
+
<label>
|
| 187 |
+
Preprocessor:
|
| 188 |
+
<select value={preprocessor} onChange={e=>setPreprocessor(e.target.value)}>
|
| 189 |
+
<option value="nemo80">nemo80 (smaller)</option>
|
| 190 |
+
<option value="nemo128">nemo128 (default)</option>
|
| 191 |
+
</select>
|
| 192 |
+
</label>
|
| 193 |
+
{' '}
|
| 194 |
+
<label>
|
| 195 |
+
Stride:
|
| 196 |
+
<select value={frameStride} onChange={e=>setFrameStride(Number(e.target.value))}>
|
| 197 |
+
<option value={1}>1</option>
|
| 198 |
+
<option value={2}>2</option>
|
| 199 |
+
<option value={4}>4</option>
|
| 200 |
+
</select>
|
| 201 |
+
</label>
|
| 202 |
+
{' '}
|
| 203 |
+
<label>
|
| 204 |
+
<input type="checkbox" checked={verboseLog} onChange={e => setVerboseLog(e.target.checked)} />
|
| 205 |
+
Verbose Log
|
| 206 |
+
</label>
|
| 207 |
+
{' '}
|
| 208 |
+
<label style={{fontSize:'0.9em'}}>
|
| 209 |
+
<input type="checkbox" checked={dumpDetail} onChange={e=>setDumpDetail(e.target.checked)} />
|
| 210 |
+
Dump result to console
|
| 211 |
+
</label>
|
| 212 |
+
{(backend === 'wasm' || decoderInt8) && (
|
| 213 |
+
<label style={{fontSize:'0.9em'}}>
|
| 214 |
+
Threads:
|
| 215 |
+
<input type="number" min="1" max={maxCores} value={cpuThreads} onChange={e=>setCpuThreads(Number(e.target.value))} style={{width:'4rem'}} />
|
| 216 |
+
</label>
|
| 217 |
+
)}
|
| 218 |
+
<button
|
| 219 |
+
onClick={loadModel}
|
| 220 |
+
disabled={!status.toLowerCase().includes('fail') && status !== 'Idle'}
|
| 221 |
+
className="primary"
|
| 222 |
>
|
| 223 |
+
{status === 'Model ready ✔' ? 'Model Loaded' : 'Load Model'}
|
| 224 |
+
</button>
|
| 225 |
+
</div>
|
| 226 |
+
|
| 227 |
+
{typeof SharedArrayBuffer === 'undefined' && backend === 'wasm' && (
|
| 228 |
+
<div style={{
|
| 229 |
+
marginBottom: '1rem',
|
| 230 |
+
padding: '0.5rem',
|
| 231 |
+
backgroundColor: '#fff3cd',
|
| 232 |
+
border: '1px solid #ffeaa7',
|
| 233 |
+
borderRadius: '4px',
|
| 234 |
+
fontSize: '0.9em'
|
| 235 |
+
}}>
|
| 236 |
+
⚠️ <strong>Performance Note:</strong> SharedArrayBuffer is not available.
|
| 237 |
+
WASM will run single-threaded. For better performance, use WebGPU.
|
| 238 |
+
</div>
|
| 239 |
+
)}
|
| 240 |
+
|
| 241 |
+
<div className="controls">
|
| 242 |
+
<input
|
| 243 |
+
ref={fileInputRef}
|
| 244 |
+
type="file"
|
| 245 |
+
accept="audio/*"
|
| 246 |
+
onChange={transcribeFile}
|
| 247 |
+
disabled={status !== 'Model ready ✔' || isTranscribing}
|
| 248 |
+
/>
|
| 249 |
+
{transcriptions.length > 0 && (
|
| 250 |
+
<button
|
| 251 |
+
onClick={clearTranscriptions}
|
| 252 |
+
style={{ marginLeft: '1rem', padding: '0.25rem 0.5rem' }}
|
| 253 |
+
>
|
| 254 |
+
Clear History
|
| 255 |
+
</button>
|
| 256 |
+
)}
|
| 257 |
+
</div>
|
| 258 |
+
|
| 259 |
+
<p>Status: {status}</p>
|
| 260 |
+
{progressPct!==null && (
|
| 261 |
+
<div className="progress-wrapper">
|
| 262 |
+
<div className="progress-bar"><div style={{ width: `${progressPct}%` }} /></div>
|
| 263 |
+
<p className="progress-text">{progressText}</p>
|
| 264 |
+
</div>
|
| 265 |
+
)}
|
| 266 |
+
|
| 267 |
+
{/* Latest transcription */}
|
| 268 |
+
<div className="controls">
|
| 269 |
+
<h3>Latest Transcription:</h3>
|
| 270 |
+
<textarea
|
| 271 |
+
value={text}
|
| 272 |
+
readOnly
|
| 273 |
+
className="textarea"
|
| 274 |
+
placeholder="Transcribed text will appear here..."
|
| 275 |
+
/>
|
| 276 |
+
</div>
|
| 277 |
+
|
| 278 |
+
{/* Latest transcription performance info */}
|
| 279 |
+
{latestMetrics && (
|
| 280 |
+
<div className="performance">
|
| 281 |
+
<strong>RTF:</strong> {latestMetrics.rtf?.toFixed(2)}x | Total: {latestMetrics.total_ms} ms<br/>
|
| 282 |
+
Preprocess {latestMetrics.preprocess_ms} ms · Encode {latestMetrics.encode_ms} ms · Decode {latestMetrics.decode_ms} ms · Tokenize {latestMetrics.tokenize_ms} ms
|
| 283 |
+
</div>
|
| 284 |
+
)}
|
| 285 |
+
|
| 286 |
+
{/* Transcription history */}
|
| 287 |
+
{transcriptions.length > 0 && (
|
| 288 |
+
<div className="history">
|
| 289 |
+
<h3>Transcription History ({transcriptions.length} files):</h3>
|
| 290 |
+
<div style={{ maxHeight: '400px', overflowY: 'auto', border: '1px solid #ddd', borderRadius: '4px' }}>
|
| 291 |
+
{transcriptions.map((trans) => (
|
| 292 |
+
<div className="history-item" key={trans.id}>
|
| 293 |
+
<div className="history-meta"><strong>{trans.filename}</strong><span>{trans.timestamp}</span></div>
|
| 294 |
+
<div className="history-stats">Duration: {trans.duration.toFixed(1)}s | Words: {trans.wordCount}{trans.confidence && ` | Confidence: ${trans.confidence.toFixed(2)}`}{trans.metrics && ` | RTF: ${trans.metrics.rtf?.toFixed(2)}x`}</div>
|
| 295 |
+
<div className="history-text">{trans.text}</div>
|
| 296 |
+
</div>
|
| 297 |
+
))}
|
| 298 |
+
</div>
|
| 299 |
+
</div>
|
| 300 |
+
)}
|
| 301 |
+
|
| 302 |
+
<div style={{ marginTop: '2rem', padding: '1rem', backgroundColor: '#f8f9fa', borderRadius: '4px', fontSize: '0.9em' }}>
|
| 303 |
+
<h4>🔗 Links:</h4>
|
| 304 |
+
<p>
|
| 305 |
+
<a href="https://github.com/ysdede/parakeet.js" target="_blank" rel="noopener noreferrer">
|
| 306 |
+
GitHub Repository
|
| 307 |
+
</a>
|
| 308 |
+
{' | '}
|
| 309 |
+
<a href="https://www.npmjs.com/package/parakeet.js" target="_blank" rel="noopener noreferrer">
|
| 310 |
+
npm Package
|
| 311 |
+
</a>
|
| 312 |
+
{' | '}
|
| 313 |
+
<a href="https://huggingface.co/nvidia/parakeet-tdt-1.1b" target="_blank" rel="noopener noreferrer">
|
| 314 |
+
NVIDIA Parakeet Model
|
| 315 |
+
</a>
|
| 316 |
+
</p>
|
| 317 |
+
</div>
|
| 318 |
</div>
|
| 319 |
);
|
| 320 |
}
|
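Three small decisions are embedded in the `App` component: the quantization preset that follows the backend, the default thread count, and the download percentage. A minimal sketch that pulls them out as pure functions is given here so they can be read and tested in isolation. The function names are hypothetical and do not appear in the commit; the logic mirrors the `useEffect`, `useState` initializer, and `progressCallback` in the diff.

```javascript
// Hypothetical pure helpers mirroring logic in the App component.

// Quantization preset per backend: fp32 for WebGPU variants, int8 for WASM.
function defaultQuantFor(backend, current = 'fp32') {
  if (backend.startsWith('webgpu')) return 'fp32';
  if (backend === 'wasm') return 'int8';
  return current; // unknown backend: leave the setting unchanged
}

// Reserve two cores (the component's choice), never below one thread.
function defaultCpuThreads(hardwareConcurrency) {
  const cores = hardwareConcurrency || 8; // same fallback as the component
  return Math.max(1, cores - 2);
}

// Download progress as an integer percentage; 0 when the total is unknown.
function percentLoaded(loaded, total) {
  return total > 0 ? Math.round((loaded / total) * 100) : 0;
}

console.log(defaultQuantFor('webgpu-hybrid')); // fp32
console.log(defaultCpuThreads(2));             // 1
console.log(percentLoaded(512, 2048));         // 25
```

Keeping these as plain functions would let the component call them from its hooks while unit tests exercise them without a browser or a model download.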