The Voice Activity Detection (VAD) system determines when a user is speaking and when they’ve stopped. Proper configuration is crucial for smooth conversation experiences.

The Problem

By default, the Gemini Live API’s VAD can be overly sensitive to background noise, causing:
  • Interpretation of background sounds as user speech
  • Interruption of the agent mid-response
  • “Stuttering” where the agent repeatedly stops and starts

The Solution: VAD Presets

We’ve implemented configurable VAD sensitivity presets to handle different noise environments effectively.

Available Presets

Low Sensitivity

Best for noisy environments or when experiencing stuttering issues.
vadConfig: {
  sensitivity: 'low'
}
Settings:
  • Start Sensitivity: LOW
  • End Sensitivity: LOW
  • Prefix Padding: 150ms (requires sustained speech)
  • Silence Duration: 500ms (longer pause needed to end turn)
Use when:
  • Working in noisy environments (cafes, open offices)
  • Experiencing frequent false interruptions
  • Background noise is causing stuttering
  • Using speakers instead of headphones (echo/feedback issues)

Medium Sensitivity

Balanced for typical home/office environments.
vadConfig: {
  sensitivity: 'medium'
}
Settings:
  • Start Sensitivity: MEDIUM
  • End Sensitivity: MEDIUM
  • Prefix Padding: 100ms
  • Silence Duration: 400ms
Use when:
  • Working in moderately quiet environments
  • You want balanced responsiveness
  • Background noise is occasional rather than constant

High Sensitivity

For very quiet environments where quick response is needed.
vadConfig: {
  sensitivity: 'high'
}
Settings:
  • Start Sensitivity: HIGH
  • End Sensitivity: HIGH
  • Prefix Padding: 40ms (quick response)
  • Silence Duration: 300ms (quick turn ending)
Use when:
  • Working in silent environments
  • Using a high-quality microphone with noise cancellation
  • You need an immediate response to speech
  • No background noise is present
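The three presets are easier to compare side by side. The sketch below encodes the values listed above as a typed lookup table and checks one property they share: each more sensitive preset waits no longer than the one before it. The names VadPreset, VadSettings, VAD_PRESETS and waitsShrinkWithSensitivity are hypothetical and are not part of the package's documented API. The 'LOW' | 'MEDIUM' | 'HIGH' strings only mirror the labels used above.

```typescript
// Hypothetical types mirroring the preset lists above; not the package's own API.
type Level = 'LOW' | 'MEDIUM' | 'HIGH';
type VadPreset = 'low' | 'medium' | 'high';

interface VadSettings {
  startSensitivity: Level;
  endSensitivity: Level;
  prefixPaddingMs: number; // sustained speech required before a turn starts
  silenceDurationMs: number; // pause required before a turn ends
}

// Values copied from the Low, Medium, and High preset descriptions.
const VAD_PRESETS: Record<VadPreset, VadSettings> = {
  low: { startSensitivity: 'LOW', endSensitivity: 'LOW', prefixPaddingMs: 150, silenceDurationMs: 500 },
  medium: { startSensitivity: 'MEDIUM', endSensitivity: 'MEDIUM', prefixPaddingMs: 100, silenceDurationMs: 400 },
  high: { startSensitivity: 'HIGH', endSensitivity: 'HIGH', prefixPaddingMs: 40, silenceDurationMs: 300 },
};

// Neighbouring presets, from less sensitive to more sensitive.
const NEIGHBOURS: Array<[VadPreset, VadPreset]> = [
  ['low', 'medium'],
  ['medium', 'high'],
];

// True if each more sensitive preset waits no longer than its less sensitive neighbour.
function waitsShrinkWithSensitivity(): boolean {
  return NEIGHBOURS.every(([less, more]) => {
    const a = VAD_PRESETS[less];
    const b = VAD_PRESETS[more];
    return b.prefixPaddingMs <= a.prefixPaddingMs && b.silenceDurationMs <= a.silenceDurationMs;
  });
}
```

Read this way, the trade-off is explicit. Each step toward higher sensitivity shortens both waits. That makes the agent quicker to respond, and also quicker to treat noise as an interruption.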

Custom Settings

Fine-tune VAD parameters for your specific needs.
vadConfig: {
  sensitivity: 'custom',
  customSettings: {
    startSensitivity: StartSensitivity.START_SENSITIVITY_LOW,
    endSensitivity: EndSensitivity.END_SENSITIVITY_MEDIUM,
    prefixPaddingMs: 200,
    silenceDurationMs: 600
  }
}
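A custom configuration is easiest to reason about as a preset plus a few overrides. The sketch below shows one way to build it that way and to reject invalid durations before they reach the agent. It makes several assumptions. mergeCustomSettings and its validation rules are illustrative; the package is not documented to perform them. Sensitivity levels are kept as plain strings so the example runs without the StartSensitivity and EndSensitivity enums.

```typescript
// Hypothetical shape; the real customSettings object uses StartSensitivity/EndSensitivity enums.
interface CustomVadSettings {
  startSensitivity: string;
  endSensitivity: string;
  prefixPaddingMs: number;
  silenceDurationMs: number;
}

// Starting point: the values of the 'low' preset described above.
const LOW_PRESET: CustomVadSettings = {
  startSensitivity: 'START_SENSITIVITY_LOW',
  endSensitivity: 'END_SENSITIVITY_LOW',
  prefixPaddingMs: 150,
  silenceDurationMs: 500,
};

// Apply overrides on top of a base preset, then validate both durations.
function mergeCustomSettings(
  base: CustomVadSettings,
  overrides: Partial<CustomVadSettings>,
): CustomVadSettings {
  const merged: CustomVadSettings = { ...base, ...overrides };
  for (const key of ['prefixPaddingMs', 'silenceDurationMs'] as const) {
    const value = merged[key];
    if (!Number.isInteger(value) || value < 0) {
      throw new RangeError(`${key} must be a non-negative integer, got ${value}`);
    }
  }
  return merged;
}

// Reproduces the two durations from the custom example above.
const custom = mergeCustomSettings(LOW_PRESET, { prefixPaddingMs: 200, silenceDurationMs: 600 });
```

Starting from a known preset means that any field you leave out keeps a tested value instead of an unspecified default.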

Implementation Examples

Basic Usage

import { SammyProvider } from '@sammy-three/sammy-three';

function App() {
  return (
    <SammyProvider
      config={{
        auth: { /* your auth config */ },
        // Use low sensitivity to prevent stuttering
        vadConfig: {
          sensitivity: 'low'
        }
      }}
    >
      {/* Your app */}
    </SammyProvider>
  );
}

Dynamic Environment Switching

import { useSammyAgent } from '@sammy-three/sammy-three';

function VADControl() {
  const { updateConfig } = useSammyAgent();
  
  const handleEnvironmentChange = (environment) => {
    let sensitivity;
    switch(environment) {
      case 'noisy':
        sensitivity = 'low';
        break;
      case 'office':
        sensitivity = 'medium';
        break;
      case 'quiet':
        sensitivity = 'high';
        break;
      default:
        sensitivity = 'low';
    }
    
    updateConfig({
      vadConfig: { sensitivity }
    });
  };
  
  return (
    <select onChange={(e) => handleEnvironmentChange(e.target.value)}>
      <option value="noisy">Noisy Environment</option>
      <option value="office">Office</option>
      <option value="quiet">Quiet Room</option>
    </select>
  );
}
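The switch in handleEnvironmentChange can also be written as a typed lookup table. The compiler can then check that every environment maps to a valid preset, and the ‘low’ fallback lives in one place. This is a sketch: Environment, ENVIRONMENT_PRESETS, isEnvironment and sensitivityFor are hypothetical names. The result would still be passed to updateConfig as in the component above.

```typescript
type Sensitivity = 'low' | 'medium' | 'high';
type Environment = 'noisy' | 'office' | 'quiet';

// Same mapping as the switch statement in VADControl.
const ENVIRONMENT_PRESETS: Record<Environment, Sensitivity> = {
  noisy: 'low',
  office: 'medium',
  quiet: 'high',
};

// Narrow an arbitrary string (e.g. a <select> value) to a known environment.
function isEnvironment(value: string): value is Environment {
  return Object.prototype.hasOwnProperty.call(ENVIRONMENT_PRESETS, value);
}

// Unknown values fall back to 'low', matching the switch's default branch.
function sensitivityFor(value: string): Sensitivity {
  return isEnvironment(value) ? ENVIRONMENT_PRESETS[value] : 'low';
}
```

The sketch uses hasOwnProperty rather than the in operator. With in, inherited keys such as 'toString' would be mistaken for environments.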

Troubleshooting Guide

Common Issues

  • Agent Stops Mid-Sentence
  • Slow Response to Interruptions
  • Background Noise Interruptions
  • Cuts Off During Pauses
Problem: The agent keeps stopping mid-sentence.
Solution: Use the ‘low’ sensitivity preset, or increase prefixPaddingMs in a custom configuration.
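The two remedies for mid-sentence stops can be applied in order. First switch to the ‘low’ preset. Only if the problem persists, lengthen prefixPaddingMs so that more sustained speech is needed before an interruption counts. The helper below sketches that escalation. The name nextSettingsForMidSentenceStops and the VadState shape are hypothetical. The 50ms step and the 400ms ceiling are arbitrary illustrative values, not recommendations from the package.

```typescript
type Sensitivity = 'low' | 'medium' | 'high';

// Hypothetical simplified state: a preset (or 'custom') plus its prefix padding.
interface VadState {
  sensitivity: Sensitivity | 'custom';
  prefixPaddingMs: number;
}

const LOW_PREFIX_PADDING_MS = 150; // 'low' preset value listed above
const STEP_MS = 50; // illustrative increment
const MAX_PREFIX_PADDING_MS = 400; // illustrative ceiling

function nextSettingsForMidSentenceStops(current: VadState): VadState {
  // First remedy: move to the 'low' preset if a more sensitive preset is active.
  if (current.sensitivity !== 'low' && current.sensitivity !== 'custom') {
    return { sensitivity: 'low', prefixPaddingMs: LOW_PREFIX_PADDING_MS };
  }
  // Second remedy: require more sustained speech via a longer custom prefix padding.
  return {
    sensitivity: 'custom',
    prefixPaddingMs: Math.min(current.prefixPaddingMs + STEP_MS, MAX_PREFIX_PADDING_MS),
  };
}
```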

Additional Audio Optimizations

Hardware Solutions

  • Use headphones: prevents speaker feedback.
  • Directional mic: use a directional microphone.
  • Hardware cancellation: enable hardware noise cancellation.
  • Acoustic treatment: consider room acoustics.

Browser Noise Suppression

The package includes built-in noise suppression that works alongside VAD:
// In noise-config.ts
noiseSuppression: {
  enabled: true,
  enhancementLevel: 'medium', // 'light', 'medium', or 'aggressive'
  fallbackToBasic: true
}
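If you keep this configuration in your own code, a small type makes the allowed enhancementLevel values explicit and catches typos at compile time. The sketch below assumes the three fields shown above make up the whole object. NoiseSuppressionConfig and parseEnhancementLevel are hypothetical names. parseEnhancementLevel is only needed for values that arrive untyped, such as from an environment variable or a settings UI.

```typescript
type EnhancementLevel = 'light' | 'medium' | 'aggressive';

// Hypothetical type mirroring the fields in the snippet above.
interface NoiseSuppressionConfig {
  enabled: boolean;
  enhancementLevel: EnhancementLevel;
  fallbackToBasic: boolean;
}

const LEVELS: readonly EnhancementLevel[] = ['light', 'medium', 'aggressive'];

// Validate a level read from untyped input; throws on anything unknown.
function parseEnhancementLevel(value: string): EnhancementLevel {
  const match = LEVELS.find((level) => level === value);
  if (match === undefined) {
    throw new Error(`Unknown enhancementLevel "${value}"; expected one of ${LEVELS.join(', ')}`);
  }
  return match;
}

const noiseSuppression: NoiseSuppressionConfig = {
  enabled: true,
  enhancementLevel: parseEnhancementLevel('medium'),
  fallbackToBasic: true,
};
```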

Testing Your Configuration

1. Start with Low Sensitivity

Begin with the ‘low’ preset to establish a stutter-free baseline.

2. Test in Your Environment

Reproduce the background noise typical of your environment:
  • Type on your keyboard
  • Move papers around
  • Have background conversations

3. Gradually Increase Sensitivity

If the agent is not responsive enough, try ‘medium’ sensitivity.

4. Fine-tune with Custom Settings

If the presets don’t give you the behavior you need, use custom settings to dial it in exactly.
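The steps above amount to walking up the presets one at a time and switching to custom settings only once the presets run out. Below is a minimal sketch: nextStep is a hypothetical helper, and the judgment of "responsive enough" is left to whoever is testing. The sketch also tries ‘high’ before falling back to custom settings. That is an assumption, and it only suits very quiet rooms.

```typescript
type Preset = 'low' | 'medium' | 'high';
type Step = Preset | 'custom';

// Presets in the order they are tried, starting from the 'low' baseline.
const PRESET_ORDER: readonly Preset[] = ['low', 'medium', 'high'];

// Given the current step and the tester's verdict, pick what to try next.
function nextStep(current: Step, responsiveEnough: boolean): Step {
  if (responsiveEnough) return current; // keep what works
  if (current === 'custom') return 'custom'; // already fine-tuning by hand
  // Reading past the last preset yields undefined, which means "switch to custom".
  const next: Preset | undefined = PRESET_ORDER[PRESET_ORDER.indexOf(current) + 1];
  return next ?? 'custom';
}
```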

Best Practices

  • Default to low: start with ‘low’ sensitivity and only increase it if needed.
  • Consider your environment: choose presets based on your typical working environment.
  • Use headphones: they prevent feedback loops and false interruptions.
  • Monitor logs: check the console logs for VAD configuration details:
    [Agent Config] VAD settings: { preset: 'low', config: {...} }
  • Combine with noise gates: the audio pipeline includes noise gates that work with VAD to filter out background noise.
  • Test thoroughly: test in your actual working environment, not just in quiet spaces.

Migration Guide

If you’re upgrading from a version without VAD configuration:
Before (stuttering):
// No VAD configuration available
// Uses HIGH sensitivity by default
<SammyProvider config={config}>
  {/* Your app */}
</SammyProvider>
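After upgrading, the provider config needs a vadConfig entry, for example sensitivity: ‘low’ as in the basic usage example. If the config object is assembled elsewhere, a helper can add that default without overwriting an explicit choice. The sketch below uses a hypothetical helper, withVadDefaults. Its SammyConfig type is a simplified stand-in, not the package's real config type.

```typescript
type Sensitivity = 'low' | 'medium' | 'high' | 'custom';

// Simplified stand-in for the provider config; the real type has more fields.
interface SammyConfig {
  auth: Record<string, unknown>;
  vadConfig?: { sensitivity: Sensitivity };
}

// Add the recommended 'low' preset only when no vadConfig was supplied.
function withVadDefaults(config: SammyConfig): SammyConfig {
  return config.vadConfig ? config : { ...config, vadConfig: { sensitivity: 'low' } };
}
```

In the component, this would be used as <SammyProvider config={withVadDefaults(config)}>.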

VAD Parameters Reference

  • Start Sensitivity
  • End Sensitivity
  • Prefix Padding
  • Silence Duration
Start Sensitivity controls how easily VAD detects the start of speech:
  • LOW: Requires clear, sustained speech
  • MEDIUM: Balanced detection
  • HIGH: Detects even slight sounds

Debug Mode

Enable debug logging to see VAD behavior:
vadConfig: {
  sensitivity: 'low',
  debug: true
}

// Console output:
// [VAD] Speech detected with confidence: 0.92
// [VAD] Silence detected, duration: 450ms
// [VAD] Turn ended after 500ms silence
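The debug output above reflects two timers. Speech must be sustained for prefixPaddingMs before a turn starts, and silence must last silenceDurationMs before it ends. The simulation below is a deliberately simplified model of those two timers over fixed-size audio frames, meant only to build intuition. The real detector also weighs the start and end sensitivity settings, which this sketch ignores. simulateTurns, segments and the 10ms frame size are hypothetical.

```typescript
// true = frame classified as speech, false = silence (classification itself is out of scope).
type Frame = boolean;

interface TurnEvent {
  type: 'start' | 'end';
  atMs: number;
}

// Simplified two-timer model of prefix padding and silence duration.
function simulateTurns(
  frames: Frame[],
  frameMs: number,
  prefixPaddingMs: number,
  silenceDurationMs: number,
): TurnEvent[] {
  const events: TurnEvent[] = [];
  let inTurn = false;
  let speechRunMs = 0; // consecutive speech while waiting for a turn to start
  let silenceRunMs = 0; // consecutive silence while a turn is active

  frames.forEach((isSpeech, i) => {
    const nowMs = (i + 1) * frameMs; // time at the end of this frame
    if (!inTurn) {
      speechRunMs = isSpeech ? speechRunMs + frameMs : 0;
      if (speechRunMs >= prefixPaddingMs) {
        inTurn = true;
        silenceRunMs = 0;
        events.push({ type: 'start', atMs: nowMs });
      }
    } else {
      silenceRunMs = isSpeech ? 0 : silenceRunMs + frameMs;
      if (silenceRunMs >= silenceDurationMs) {
        inTurn = false;
        speechRunMs = 0;
        events.push({ type: 'end', atMs: nowMs });
      }
    }
  });
  return events;
}

// Build a frame sequence from [isSpeech, durationMs] segments.
function segments(parts: Array<[boolean, number]>, frameMs: number): Frame[] {
  const frames: Frame[] = [];
  for (const [isSpeech, ms] of parts) {
    for (let t = 0; t < ms; t += frameMs) frames.push(isSpeech);
  }
  return frames;
}
```

The test input is a 100ms cough, 200ms of quiet, 1000ms of speech, a 450ms pause, 200ms more speech, and 600ms of silence. With the ‘low’ values (150ms padding, 500ms silence), the cough is ignored and the pause does not end the turn: there is one turn, from 450ms to 2450ms. With the ‘high’ values (40ms, 300ms), the cough starts a turn, and the pause splits the speech into a second turn starting at 1790ms. This is the kind of stop-and-start the low preset is meant to prevent.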

Conclusion

Proper VAD configuration is crucial for a smooth conversation experience with the Gemini Live API. By defaulting to ‘low’ sensitivity and providing easy configuration options, we’ve addressed the common stuttering issues while keeping the flexibility needed for different use cases.

Remember: when in doubt, use ‘low’ sensitivity. It’s better to require clearer speech than to suffer constant false interruptions.