Convert voice style with gender, age, and pitch controls
Process audio to extract F0 stats and speaker embeddings
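
As a rough illustration of the F0 statistics and pitch control mentioned above, here is a minimal sketch in plain Python. It assumes the F0 contour has already been estimated by some pitch tracker and is given as one value in Hz per frame, with 0 marking unvoiced frames. The function names `f0_stats` and `shift_semitones`, the choice of statistics, and the unvoiced convention are hypothetical and not taken from the source. Speaker embedding extraction is not covered here.

```python
import math
import statistics


def f0_stats(f0_hz):
    """Summarize a frame-level F0 contour (Hz).

    Assumption: frames with F0 <= 0 are unvoiced and are excluded.
    Returns None when the contour has no voiced frames.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return None
    log_f0 = [math.log(f) for f in voiced]
    return {
        "mean_hz": statistics.fmean(voiced),
        "median_hz": statistics.median(voiced),
        "min_hz": min(voiced),
        "max_hz": max(voiced),
        # Log-domain statistics are often more useful for pitch,
        # since perceived pitch is roughly logarithmic in frequency.
        "log_mean": statistics.fmean(log_f0),
        "log_std": statistics.pstdev(log_f0),
        "voiced_ratio": len(voiced) / len(f0_hz),
    }


def shift_semitones(f0_hz, semitones):
    """Scale voiced frames by 2**(semitones / 12); unvoiced frames stay 0."""
    factor = 2.0 ** (semitones / 12.0)
    return [f * factor if f > 0 else 0.0 for f in f0_hz]


if __name__ == "__main__":
    contour = [0.0, 100.0, 200.0, 0.0]  # toy contour, not real data
    print(f0_stats(contour))
    print(shift_semitones(contour, 12))  # one octave up
```

A pitch control expressed in semitones maps to a multiplicative factor on F0, so shifting leaves `log_std` unchanged and moves `log_mean` by `semitones / 12 * ln(2)`.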