File size: 5,795 Bytes
8c2765a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
# Music Flamingo Code Flow

```mermaid
flowchart TD
    Start([App Starts]) --> Init[Initialize App]
    Init --> LoadModel[Load Music Flamingo Model<br/>processor & model from MODEL_ID]
    LoadModel --> SetupProxy{Check for<br/>SSH Proxy?}
    SetupProxy -->|Yes| CreateTunnel[Create SSH Tunnel]
    SetupProxy -->|No| Ready[App Ready]
    CreateTunnel --> Ready
    
    Ready --> UI[Gradio UI Loaded]
    UI --> UserInput{User Input}
    
    UserInput -->|Upload Audio| AudioFile[Audio File Path]
    UserInput -->|YouTube URL| YouTubeURL[YouTube URL String]
    UserInput -->|Load Button| LoadYouTube[Load YouTube Audio]
    
    LoadYouTube --> DownloadYT[download_youtube_audio]
    DownloadYT --> CheckCache{URL in<br/>Cache?}
    CheckCache -->|Yes & Exists| ReturnCached[Return Cached File]
    CheckCache -->|No| ValidateURL[Validate YouTube URL<br/>with Regex]
    ValidateURL -->|Invalid| Error1[Return Error Message]
    ValidateURL -->|Valid| YTDL[yt-dlp Download]
    YTDL --> ExtractAudio[Extract Audio to MP3]
    ExtractAudio --> CacheFile[Cache File Path]
    CacheFile --> ReturnFile[Return File Path]
    ReturnCached --> AudioFile
    ReturnFile --> AudioFile
    
    AudioFile --> UserPrompt[User Enters Prompt]
    UserPrompt --> ClickGenerate[Click Generate Button]
    
    ClickGenerate --> Infer[infer Function]
    Infer --> DetermineSource{Audio Source?}
    DetermineSource -->|File Upload| UseFile[Use audio_path]
    DetermineSource -->|YouTube| DownloadIfNeeded[Download if not cached]
    DownloadIfNeeded --> UseFile
    
    UseFile --> CreateConversation[Create Conversation Format]
    CreateConversation --> FormatInput["conversations = [<br/>  [{<br/>    'role': 'user',<br/>    'content': [<br/>      {'type': 'text', 'text': prompt},<br/>      {'type': 'audio', 'path': file}<br/>    ]<br/>  }]<br/>]"]
    
    FormatInput --> ApplyTemplate[processor.apply_chat_template]
    ApplyTemplate --> Tokenize[Tokenize Input]
    Tokenize --> MoveToDevice[Move to model.device]
    
    MoveToDevice --> Generate[model.generate<br/>max_new_tokens=4096]
    Generate --> Decode[processor.batch_decode]
    Decode --> FormatOutput[Format Result with Status]
    FormatOutput --> Display[Display in Gradio UI]
    
    Error1 --> Display
    
    style Start fill:#90EE90
    style LoadModel fill:#FFD700
    style Generate fill:#FF6B6B
    style Display fill:#4ECDC4
    style Error1 fill:#FF6B6B
```

## Detailed Function Flow

### 1. Initialization Flow
```mermaid
sequenceDiagram
    participant App
    participant Model
    participant Proxy
    
    App->>Proxy: Check SSH environment variables
    alt Proxy Available
        Proxy->>Proxy: Create SSH tunnel
        Proxy->>App: PROXY_URL set
    end
    App->>Model: Load processor from MODEL_ID
    App->>Model: Load model with device_map="auto"
    Model->>App: Model ready
    App->>App: Launch Gradio UI
```

### 2. YouTube Download Flow
```mermaid
flowchart LR
    A[YouTube URL] --> B{Valid URL?}
    B -->|No| C[Return Error]
    B -->|Yes| D{Cached?}
    D -->|Yes| E{File Exists?}
    E -->|Yes| F[Return Cached]
    E -->|No| G[Download]
    D -->|No| G
    G --> H[yt-dlp Download]
    H --> I[Extract to MP3]
    I --> J[Cache File]
    J --> K[Return Path]
    
    style C fill:#FF6B6B
    style F fill:#90EE90
    style K fill:#90EE90
```

### 3. Model Inference Flow
```mermaid
sequenceDiagram
    participant User
    participant UI
    participant Download
    participant Processor
    participant Model
    
    User->>UI: Upload audio or YouTube URL
    UI->>Download: Get audio file path
    Download->>UI: Return file path
    User->>UI: Enter prompt
    User->>UI: Click Generate
    UI->>Processor: Create conversation format
    Processor->>Processor: apply_chat_template()
    Processor->>Processor: Tokenize input
    Processor->>Model: Send batch to device
    Model->>Model: Generate tokens (max 4096)
    Model->>Processor: Return token IDs
    Processor->>Processor: batch_decode()
    Processor->>UI: Return text result
    UI->>User: Display response
```

## Key Functions

### download_youtube_audio()
```mermaid
flowchart TD
    Start[download_youtube_audio] --> Validate[Validate URL with Regex]
    Validate -->|Invalid| ReturnError[Return None, Error]
    Validate -->|Valid| CheckCache{URL in Cache?}
    CheckCache -->|Yes| CheckFile{File Exists?}
    CheckFile -->|Yes| ReturnCached[Return Cached Path]
    CheckFile -->|No| Download[Download Audio]
    CheckCache -->|No| Download
    Download --> YTDL[yt-dlp with Options]
    YTDL --> Extract[Extract to MP3]
    Extract --> Cache[Store in Cache]
    Cache --> ReturnPath[Return Path, Status]
    
    style ReturnError fill:#FF6B6B
    style ReturnCached fill:#90EE90
    style ReturnPath fill:#90EE90
```

### infer()
```mermaid
flowchart TD
    Start[infer Function] --> GetAudio{Get Audio}
    GetAudio -->|File Upload| UseFile[Use audio_path]
    GetAudio -->|YouTube| DownloadYT[Download YouTube]
    DownloadYT -->|Success| UseFile
    DownloadYT -->|Error| ReturnError[Return Error]
    UseFile --> CreateConv[Create Conversation]
    CreateConv --> ApplyTemplate[Apply Chat Template]
    ApplyTemplate --> Generate[Model Generate]
    Generate --> Decode[Decode Output]
    Decode --> Format[Format Result]
    Format --> Return[Return Text]
    
    style ReturnError fill:#FF6B6B
    style Return fill:#90EE90
```

## Data Flow

```mermaid
flowchart LR
    A[User Input] --> B{Input Type}
    B -->|Audio File| C[File Path]
    B -->|YouTube URL| D[Download Function]
    D --> C
    C --> E[Conversation Format]
    E --> F[Processor]
    F --> G[Model]
    G --> H[Generated Text]
    H --> I[UI Display]
    
    style A fill:#4ECDC4
    style G fill:#FF6B6B
    style I fill:#90EE90
```