The Voice Activity Detection (VAD) system determines when a user is speaking and when they’ve stopped. Proper configuration is crucial for smooth conversation experiences.
The Problem
By default, the Gemini Live API’s VAD can be overly sensitive to background noise, causing:
- Interpretation of background sounds as user speech
- Interruption of the agent mid-response
- “Stuttering” where the agent repeatedly stops and starts
The Solution: VAD Presets
We’ve implemented configurable VAD sensitivity presets to handle different noise environments effectively.
Available Presets
Low Sensitivity (Default - Recommended)
Best for noisy environments or when experiencing stuttering issues.
Settings:
- Start Sensitivity: LOW
- End Sensitivity: LOW
- Prefix Padding: 150ms (requires sustained speech)
- Silence Duration: 500ms (longer pause needed to end turn)
Use when:
- You are working in noisy environments (cafes, open offices)
- You are experiencing frequent false interruptions
- Background noise is causing stuttering
- You are using speakers instead of headphones (echo/feedback issues)
Medium Sensitivity
Balanced for typical home/office environments.
Settings:
- Start Sensitivity: MEDIUM
- End Sensitivity: MEDIUM
- Prefix Padding: 100ms
- Silence Duration: 400ms
Use when:
- You are working in moderately quiet environments
- You want balanced responsiveness
- Background noise is occasional but not constant
High Sensitivity
For very quiet environments where quick response is needed.
Settings:
- Start Sensitivity: HIGH
- End Sensitivity: HIGH
- Prefix Padding: 40ms (quick response)
- Silence Duration: 300ms (quick turn ending)
Use when:
- You are working in silent environments
- You are using a high-quality microphone with noise cancellation
- You need an immediate response to speech
- No background noise is present
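The three presets above can be summarized as a lookup table. The following is a minimal sketch using the values listed in this guide; the type and constant names (`VadSettings`, `VadPresetName`, `VAD_PRESETS`) and the field name `silenceDurationMs` are illustrative assumptions, not necessarily the package's actual exports.

```typescript
// Hypothetical names: the package's real types and exports may differ.
type Sensitivity = "LOW" | "MEDIUM" | "HIGH";

interface VadSettings {
  startSensitivity: Sensitivity;
  endSensitivity: Sensitivity;
  prefixPaddingMs: number;   // speech must be sustained this long before a turn starts
  silenceDurationMs: number; // pause length that ends the user's turn
}

type VadPresetName = "low" | "medium" | "high";

// Values copied from the preset descriptions in this guide.
const VAD_PRESETS: Record<VadPresetName, VadSettings> = {
  low: { startSensitivity: "LOW", endSensitivity: "LOW", prefixPaddingMs: 150, silenceDurationMs: 500 },
  medium: { startSensitivity: "MEDIUM", endSensitivity: "MEDIUM", prefixPaddingMs: 100, silenceDurationMs: 400 },
  high: { startSensitivity: "HIGH", endSensitivity: "HIGH", prefixPaddingMs: 40, silenceDurationMs: 300 },
};
```

Reading the table from low to high, both durations shrink: the agent reacts faster, but shorter sounds and shorter pauses are enough to start or end a turn.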
Custom Settings
Custom Configuration
Fine-tune VAD parameters for your specific needs.
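One way to think about a custom configuration is as a set of overrides layered on top of a preset. The sketch below assumes that model; `customVad`, `LOW_PRESET`, and the `silenceDurationMs` field name are hypothetical, and the 200 ms override is only an illustrative value.

```typescript
// Hypothetical shape: the package's real configuration object may differ.
interface VadSettings {
  startSensitivity: "LOW" | "MEDIUM" | "HIGH";
  endSensitivity: "LOW" | "MEDIUM" | "HIGH";
  prefixPaddingMs: number;
  silenceDurationMs: number;
}

// Values of the 'low' preset described in this guide, used as the base.
const LOW_PRESET: VadSettings = {
  startSensitivity: "LOW",
  endSensitivity: "LOW",
  prefixPaddingMs: 150,
  silenceDurationMs: 500,
};

// Overlay custom overrides on a base preset and reject invalid durations.
function customVad(overrides: Partial<VadSettings>, base: VadSettings = LOW_PRESET): VadSettings {
  const merged: VadSettings = { ...base, ...overrides };
  for (const key of ["prefixPaddingMs", "silenceDurationMs"] as const) {
    const value = merged[key];
    if (!Number.isFinite(value) || value < 0) {
      throw new RangeError(`${key} must be a non-negative number of milliseconds`);
    }
  }
  return merged;
}

// Example: keep the 'low' preset but require longer sustained speech.
const tuned = customVad({ prefixPaddingMs: 200 });
```

Starting from a preset keeps every field defined, so a custom config only has to state what differs.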
Implementation Examples
Basic Usage
Dynamic Environment Switching
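A minimal sketch of how a preset could be chosen from a description of the current environment, following the "use when" guidance for each preset. The `Environment` shape and `choosePreset` function are hypothetical, and how the chosen preset is applied to a running session depends on the package's API, which is not shown here.

```typescript
type VadPresetName = "low" | "medium" | "high";

// Hypothetical description of the user's current setup.
interface Environment {
  noise: "noisy" | "moderate" | "silent";
  usingSpeakers: boolean;      // speakers can feed the agent's audio back into the mic
  noiseCancellingMic: boolean;
}

// Conservative reading of this guide: anything risky falls back to 'low',
// and 'high' is chosen only for a silent room with a noise-cancelling mic.
function choosePreset(env: Environment): VadPresetName {
  if (env.usingSpeakers || env.noise === "noisy") return "low";
  if (env.noise === "silent" && env.noiseCancellingMic) return "high";
  return "medium";
}
```

Calling `choosePreset` again whenever the environment changes (for example, when the user unplugs their headphones) gives the preset to switch to.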
Troubleshooting Guide
Common Issues
- Agent Stops Mid-Sentence
- Slow Response to Interruptions
- Background Noise Interruptions
- Cuts Off During Pauses
Problem: Agent keeps stopping mid-sentence.
Solution: Use the ‘low’ sensitivity preset or increase prefixPaddingMs in a custom config.
Additional Audio Optimizations
Hardware Solutions
Use Headphones
Prevents speaker feedback
Directional Mic
Use a directional microphone
Hardware Cancellation
Enable hardware noise cancellation
Acoustic Treatment
Consider room acoustics
Browser Noise Suppression
The package includes built-in noise suppression that works alongside VAD.
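For reference, browsers expose noise suppression and echo cancellation as standard audio constraints on `getUserMedia`. The sketch below shows how such constraints are typically requested; it is not the package's own implementation, and the helper names (`micConstraints`, `openMicrophone`, `MediaDevicesLike`) are hypothetical.

```typescript
// Minimal structural types so the sketch type-checks outside a browser too.
interface MicConstraints {
  audio: { noiseSuppression: boolean; echoCancellation: boolean };
}

interface MediaDevicesLike<S> {
  getUserMedia(constraints: MicConstraints): Promise<S>;
}

// Standard constraint flags; browsers treat them as requests and may not honor them.
function micConstraints(): MicConstraints {
  return { audio: { noiseSuppression: true, echoCancellation: true } };
}

// Open the microphone with suppression requested.
function openMicrophone<S>(devices: MediaDevicesLike<S>): Promise<S> {
  return devices.getUserMedia(micConstraints());
}

// In a browser: const stream = await openMicrophone(navigator.mediaDevices);
```

Passing the devices object in as a parameter keeps the helper testable without a real microphone.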
Testing Your Configuration
1. Start with Low Sensitivity
Begin with the ‘low’ preset to establish a baseline without stuttering.
2. Test in Your Environment
Make some background noise typical of your environment:
- Type on your keyboard
- Move papers around
- Have background conversations
3. Gradually Increase Sensitivity
If the agent is not responsive enough, try ‘medium’ sensitivity.
4. Fine-tune with Custom Settings
If the presets don’t work perfectly, use custom settings to dial in the exact behavior you need.
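The testing procedure amounts to walking up a ladder one rung at a time and falling back to custom settings when no preset fits. A small sketch of that ladder follows; the `nextStep` helper is hypothetical, and continuing from ‘medium’ to ‘high’ is an extrapolation of the "gradually increase" advice.

```typescript
type TuningStep = "low" | "medium" | "high" | "custom";

// Order in which to try configurations, least sensitive first.
const TUNING_ORDER: readonly TuningStep[] = ["low", "medium", "high", "custom"];

// Returns the next configuration to try when the current one
// is not responsive enough; 'custom' is the final step.
function nextStep(current: TuningStep): TuningStep {
  const index = TUNING_ORDER.indexOf(current);
  return TUNING_ORDER[Math.min(index + 1, TUNING_ORDER.length - 1)] ?? "custom";
}
```

Moving one step at a time, and re-running the background-noise test after each change, makes it easier to see which setting reintroduces stuttering.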
Best Practices
Default to Low
Start with ‘low’ sensitivity and only increase if needed
Consider Environment
Choose presets based on your typical working environment
Use Headphones
This prevents feedback loops and false interruptions
Monitor Logs
Check the console logs for VAD configuration details.
Combine with Noise Gates
The audio pipeline includes noise gates that work with VAD to filter out background noise
Test Thoroughly
Test in your actual working environment, not just quiet spaces
Migration Guide
If you’re upgrading from a version without VAD configuration:
- Before (Stuttering)
- After (With VAD Control)
VAD Parameters Reference
- Start Sensitivity
- End Sensitivity
- Prefix Padding
- Silence Duration
Start Sensitivity controls how easily VAD detects the start of speech:
- LOW: Requires clear, sustained speech
- MEDIUM: Balanced detection
- HIGH: Detects even slight sounds
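To build intuition for the Prefix Padding and Silence Duration parameters, here is a toy model of turn detection over a sequence of fixed-length audio frames already labeled as voiced or not. It is an illustration only, not the Gemini Live API's actual algorithm: a turn starts once voiced audio has been sustained for the prefix padding, and ends once silence has lasted for the silence duration. The `detectTurns` function and `Turn` type are hypothetical.

```typescript
interface Turn {
  startMs: number; // where the sustained speech began
  endMs: number;   // end of the last voiced frame in the turn
}

// Toy model: voiced[i] says whether frame i (frameMs long) contains speech.
function detectTurns(
  voiced: boolean[],
  frameMs: number,
  prefixPaddingMs: number,
  silenceDurationMs: number,
): Turn[] {
  const turns: Turn[] = [];
  let inTurn = false;
  let runStartMs = 0;    // start of the current unbroken voiced run
  let voicedMs = 0;      // length of that run while waiting for a turn to start
  let silentMs = 0;      // length of the current pause inside a turn
  let turnStartMs = 0;
  let lastVoicedEndMs = 0;

  for (let i = 0; i < voiced.length; i++) {
    const t = i * frameMs;
    if (!inTurn) {
      if (voiced[i]) {
        if (voicedMs === 0) runStartMs = t;
        voicedMs += frameMs;
        if (voicedMs >= prefixPaddingMs) {
          // Speech was sustained long enough: commit the start of a turn.
          inTurn = true;
          turnStartMs = runStartMs;
          silentMs = 0;
          lastVoicedEndMs = t + frameMs;
        }
      } else {
        voicedMs = 0; // a short sound that stopped early is ignored
      }
    } else if (voiced[i]) {
      silentMs = 0;
      lastVoicedEndMs = t + frameMs;
    } else {
      silentMs += frameMs;
      if (silentMs >= silenceDurationMs) {
        // The pause was long enough: commit the end of the turn.
        turns.push({ startMs: turnStartMs, endMs: lastVoicedEndMs });
        inTurn = false;
        voicedMs = 0;
      }
    }
  }
  if (inTurn) turns.push({ startMs: turnStartMs, endMs: lastVoicedEndMs });
  return turns;
}
```

In this model, a single 50 ms keyboard click starts a turn with the ‘high’ preset's 40 ms prefix padding but is ignored with the ‘low’ preset's 150 ms, and a 400 ms mid-sentence pause ends the turn at a 300 ms silence duration but not at 500 ms.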
Debug Mode
Enable debug logging to see VAD behavior.
Conclusion
Proper VAD configuration is crucial for a smooth conversation experience with the Gemini Live API. By defaulting to ‘low’ sensitivity and providing easy configuration options, we’ve addressed the common stuttering issues while maintaining flexibility for different use cases.
Remember: When in doubt, use ‘low’ sensitivity - it’s better to require clearer speech than to have constant false interruptions.