Changelog
Keep track of all the important changes and updates to Sammy Three.
Stay up to date with the latest features, improvements, and bug fixes. Each update includes detailed information about what’s changed and how it affects your implementation.
Version 0.1.x Series
Configurable Screen Capture System
Overhaul of screen capture system with full configuration support and performance optimization.
Features
- Fully Configurable Screen Capture - Complete implementation of all documented ScreenCaptureConfig options
- Performance Tuning - Configurable intervals, quality settings, and DOM change detection for optimal performance
- Smart Frame Management - Configurable frame deduplication with useHashing option to skip identical frames
- Adaptive Capture Timing - Separate minInterval/maxInterval settings for responsive vs idle capture behavior
- Advanced DOM Observation - Configurable MutationObserver settings with debouncing and attribute filtering
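As an illustration of the frame deduplication described above, here is a minimal sketch of hash-based frame skipping. The class name, the method name, and the FNV-1a-style hash are assumptions made for this example; the changelog does not say which hash the package uses or how frames are represented internally.

```typescript
// Hypothetical sketch: skip a frame when its hash matches the previous frame.
// The hash below is an FNV-1a-style hash over UTF-16 code units, chosen only
// for illustration; it is not taken from the package.
function hashFrame(data: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < data.length; i++) {
    hash ^= data.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the result as unsigned 32-bit
  }
  return hash;
}

class FrameDeduplicator {
  private lastHash: number | null = null;

  // useHashing mirrors the option name from the changelog.
  constructor(private readonly useHashing: boolean = true) {}

  /** Returns true when the frame should be sent, false when it repeats the previous frame. */
  shouldSend(frameData: string): boolean {
    if (!this.useHashing) return true; // deduplication disabled: send everything
    const hash = hashFrame(frameData);
    if (hash === this.lastHash) return false;
    this.lastHash = hash;
    return true;
  }
}
```

In this sketch `frameData` stands in for an encoded frame such as a JPEG data URL. Only the previous hash is remembered, so a frame is skipped only when it is identical to the one sent immediately before it.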
Configuration Options
- Timing Control - checkInterval (1000ms), minInterval (100ms), maxInterval (5000ms) for precise capture timing
- Quality Management - jpegQuality (0.5), maxWidth (1920px), maxHeight (1080px) for size/quality balance
- Change Detection - domChangeDetection.debounceMs (300ms) and customizable observerConfig for DOM monitoring
- Frame Optimization - useHashing (true) for intelligent frame skipping and bandwidth optimization
- Target Element Support - Enhanced targetElement and scope options for focused capture areas
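To show how these options and defaults could fit together, here is a hedged sketch of a config type and a helper that merges user overrides onto the defaults. The interface shape is inferred from the option names and default values in this changelog, and `resolveScreenCaptureConfig` is a hypothetical helper, not a documented export. `targetElement`, `scope`, and `observerConfig` are left out because their types are not stated here.

```typescript
// Assumed shape, inferred from the option names listed in the changelog.
interface ScreenCaptureConfig {
  checkInterval?: number; // ms
  minInterval?: number;   // ms
  maxInterval?: number;   // ms
  jpegQuality?: number;   // 0..1
  maxWidth?: number;      // px
  maxHeight?: number;     // px
  useHashing?: boolean;
  domChangeDetection?: { debounceMs?: number };
}

// Default values as stated in the changelog entry.
const SCREEN_CAPTURE_DEFAULTS = {
  checkInterval: 1000,
  minInterval: 100,
  maxInterval: 5000,
  jpegQuality: 0.5,
  maxWidth: 1920,
  maxHeight: 1080,
  useHashing: true,
  domChangeDetection: { debounceMs: 300 },
};

// Hypothetical helper: user overrides win, everything else keeps its default.
function resolveScreenCaptureConfig(overrides: ScreenCaptureConfig = {}) {
  return {
    ...SCREEN_CAPTURE_DEFAULTS,
    ...overrides,
    // Merge the nested object separately so a partial override keeps debounceMs.
    domChangeDetection: {
      ...SCREEN_CAPTURE_DEFAULTS.domChangeDetection,
      ...overrides.domChangeDetection,
    },
  };
}

// Example: a performance-focused preset that lowers quality and captures less often.
const performancePreset = resolveScreenCaptureConfig({ jpegQuality: 0.3, checkInterval: 2000 });
```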
Implementation Improvements
- Fixed Documentation Accuracy - All documented configuration options now properly implemented in useSimpleRenderCapture
- Backward Compatibility - Existing implementations continue to work with same defaults
- Type Safety - All configuration options properly typed and validated
- Performance Presets - Example configurations for different performance scenarios (high-quality, balanced, performance-focused)
Integration Enhancements
Improvements
- Data Validation - Added validation for empty data in video and render capture hooks
- Error Handling - Enhanced GenAILiveClient and wrapGenAIClient to skip empty media chunks
- Data Integrity - Improved error handling and data integrity across media processing components
Scratchpad Tool & Model Precedence System
Major enhancements with new interactive tools and intelligent model selection system.
Features
- Scratchpad Tool Integration - Added support for new Scratchpad tool in SammyAgentCore with user configuration options
- Model Precedence System - Introduced three-tier model precedence: Package Default, API Recommended, and User Override
- Enhanced Tool Management - Updated ToolManager to handle enableScratchpad option with intelligent tool filtering
- Agent Configuration Overrides - Added comprehensive user override support for Gemini Live API configuration
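The three-tier model precedence can be pictured with a small resolver. This is a sketch under one assumption: that a User Override beats the API Recommended model, which in turn beats the Package Default. The type and function names are hypothetical, and the model strings in the usage lines are placeholders.

```typescript
type ModelSource = "user-override" | "api-recommended" | "package-default";

interface ModelCandidates {
  packageDefault: string;  // always present
  apiRecommended?: string; // supplied by the server, if any
  userOverride?: string;   // supplied by the integrator, if any
}

// Assumed order: user override > API recommended > package default.
function resolveModel(candidates: ModelCandidates): { model: string; source: ModelSource } {
  if (candidates.userOverride) {
    return { model: candidates.userOverride, source: "user-override" };
  }
  if (candidates.apiRecommended) {
    return { model: candidates.apiRecommended, source: "api-recommended" };
  }
  return { model: candidates.packageDefault, source: "package-default" };
}

// Placeholder model names, for illustration only.
const effective = resolveModel({ packageDefault: "model-default", apiRecommended: "model-recommended" });
console.log(`Using ${effective.model} (${effective.source})`);
```

Returning the source alongside the model makes it straightforward to log which tier produced the effective model for a session.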
Architecture Improvements
- Flexible Model Selection - Optimized model selection with comprehensive documentation and implementation guides
- Enhanced API Integration - Improved client-server configuration integration with recommended models
- Type Safety - Updated interfaces and types to reflect new model selection capabilities
- User Customization - Better customization options through SammyAgentProvider configuration
Developer Experience
- Comprehensive Documentation - Added MODEL_PRECEDENCE_GUIDE.md and MODEL_PRECEDENCE_IMPLEMENTATION.md
- Enhanced Logging - Improved logging of effective model selection during agent sessions
- Configuration Flexibility - Support for audio, generation, safety settings, and VAD configurations
Interactive Elements & Highlighting System
Comprehensive interactive elements management with advanced highlighting capabilities and context injection improvements.
Features
- Interactive Elements Manager - New system for managing and highlighting interactive page elements
- Proactive UI Highlighting - Automatic element highlighting during user interactions with structured XML rules
- Context Injection Enhancements - Improved context injection timing and memory management
- Development Overlay - Enhanced DevOverlay component with conversation ID display and trace link copying
Interactive System
- Click Detection Architecture - Advanced click handling with intelligent aggregation and filtering
- Memory Injection Timing Fix - Prevents infinite loops by ensuring memories inject only after user turns
- State-Aware Context Injection - Smart timing based on agent and user speaking states
- DOM Processing Optimization - Unified rendering and highlighting processes to reduce overhead
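One way to read the memory injection timing fix and the state-aware injection together is as a gate that holds queued context until a user turn has completed and nobody is speaking. The sketch below is an assumption about the general idea, not the package's implementation; `ContextInjectionGate` and its methods are hypothetical names.

```typescript
type Speaker = "agent" | "user" | "none";

// Hypothetical gate: queued context is only released after a completed user turn,
// and only while neither the agent nor the user is speaking.
class ContextInjectionGate {
  private pending: string[] = [];
  private userTurnCompleted = false;
  private speaking: Speaker = "none";

  constructor(private readonly send: (context: string) => void) {}

  enqueue(context: string): void {
    this.pending.push(context);
    this.flush();
  }

  setSpeaking(speaker: Speaker): void {
    this.speaking = speaker;
    this.flush();
  }

  onUserTurnComplete(): void {
    this.userTurnCompleted = true;
    this.flush();
  }

  private flush(): void {
    if (!this.userTurnCompleted || this.speaking !== "none" || this.pending.length === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = [];
    // Require a fresh user turn before the next injection. Without this, an
    // injection that triggers an agent reply could trigger another injection.
    this.userTurnCompleted = false;
    for (const context of batch) this.send(context);
  }
}
```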
Developer Experience
- Comprehensive Documentation - Detailed highlighting system architecture and performance guides
- Enhanced Debugging - Improved logging and error handling for interactive elements
- Performance Optimizations - Reduced DOM processing overhead with intelligent element detection
- Development Tools - Better development overlay with conversation tracking and debugging features
Enhanced Agent Configuration & Prompt System
Major improvements to agent configuration with user overrides and enhanced prompt composition.
Features
- Agent Configuration Overrides - Introduced AgentConfigOverrides interface for comprehensive customization
- Enhanced Prompt Schema - Updated agent session initialization with new composition-based prompts
- Flexible Configuration - Support for audio, generation, safety settings, and VAD configurations
- Canvas Ref Requirements - Added documentation for capture method requirements
Configuration Improvements
- User Override Support - Allow users to customize Gemini Live API configuration aspects
- Type Safety - Enhanced type exports with AgentConfigOverrides for better development experience
- Default Preservation - Maintains default settings while allowing targeted customization
- Documentation Updates - Comprehensive guides for configuration options and usage patterns
Type Safety & Package Optimization
Major improvements to type safety and package size reduction with architectural streamlining.
Features
- Navigation API Types - Added minimal type definitions for Navigation API to improve type safety in navigation event handling
- Enhanced Navigation Detection - Updated NavigationDetector to prioritize Navigation API with intelligent fallback options
- Audio Processing Components - Introduced new modular audio components: AudioRecorder, AudioStreamer, NoiseSuppressionProcessor, and NoiseGateProcessor
- Duplicate Prevention - Added URL tracking in NavigationManager to prevent duplicate context injections
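The duplicate prevention item can be sketched as a tracker that remembers the last URL for which context was injected. This is a simplified illustration: the class name is hypothetical, and comparing URLs by exact string equality is an assumption, since the changelog does not say how NavigationManager normalizes URLs.

```typescript
// Hypothetical sketch: allow a context injection only when the URL differs
// from the one most recently injected.
class NavigationInjectionTracker {
  private lastInjectedUrl: string | null = null;

  /** Returns true (and records the URL) when an injection should happen. */
  shouldInject(url: string): boolean {
    if (url === this.lastInjectedUrl) return false; // same page reported twice
    this.lastInjectedUrl = url;
    return true;
  }
}

const tracker = new NavigationInjectionTracker();
const decisions = ["/home", "/home", "/settings"].map((url) => tracker.shouldInject(url));
```

A tracker like this helps when more than one detection path (for example the Navigation API and a fallback) reports the same URL change.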
Performance & Size Reduction
- Removed Deprecated Components - Eliminated audio stutter analyzer and related legacy components
- Cleaned Up Unused Hooks - Removed unused hooks for audio adaptation and click detection
- Streamlined Architecture - Simplified audio handling architecture for better maintainability
- Enhanced Logging - Improved observability during navigation detection and context injection processes
Developer Experience
- Better Type Safety - Navigation event handling now has proper TypeScript support
- Improved Documentation - Updated audio processing documentation to reflect new component structure
- Enhanced QA Checklist - Added new quality assurance items for balancing proactivity and user guidance
- Cleaner Codebase - Removed legacy components and improved code organization
Proactive Navigation & Enhanced User Interaction
Major improvements to navigation handling with proactive context injection and enhanced user interaction patterns.
Features
- Navigation Manager - New NavigationManager class for automatic URL change tracking and context injection
- Proactive Navigation Context - Automatic context injection during page transitions for immediate user guidance
- Enhanced GenAI Context Adapter - Improved logging and structured navigation context responses
- Navigation Event Instructions - Detailed system prompt augmentations for better navigation event handling
Improvements
- Better layer separation with dedicated navigation management
- Immediate and proactive responses when users navigate to new pages
- Differentiated responses for navigation vs click events based on context
- Enhanced user experience during page transitions with contextual guidance
- Improved observability with detailed navigation event logging
Developer Experience
- New QA checklist for quality assurance coverage
- Enhanced documentation for navigation event behavior
- Improved debugging capabilities for navigation context injection
- Better separation of concerns between navigation and click handling
Advanced Context Injection & Click Resolution
Major improvements to context management and user interaction handling with comprehensive developer tools.
Features
- Context Injection System - Complete context injection system for live sessions using sendClientContent API
- Click Resolution Architecture - New click handling system with ClickDetectionManager for enhanced interactivity
- Navigation Context - Automatic context injection during page navigation for better user experience
- Session Resumption - Added session resumption update event handling for improved session management
- Development Overlay - New DevOverlay component with conversation ID display and trace link copying
- XML Context Formatting - Structured prompt guidelines with XML-like tags for better clarity and token efficiency
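To make the XML context formatting concrete, here is a small sketch that wraps context fields in XML-like tags before they are sent to the model. The tag names (`navigation_event`, `url`, `title`) and the helper names are made up for this example; the changelog does not list the actual tags.

```typescript
// Escape characters that would otherwise be read as markup.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: render a flat record as one XML-like block.
function formatContext(tag: string, fields: Record<string, string>): string {
  const body = Object.entries(fields)
    .map(([name, value]) => `  <${name}>${escapeXml(value)}</${name}>`)
    .join("\n");
  return `<${tag}>\n${body}\n</${tag}>`;
}

const navigationContext = formatContext("navigation_event", {
  url: "/billing?plan=pro&cycle=yearly",
  title: "Billing",
});
console.log(navigationContext);
```

Keeping each field in its own tag gives the model clear boundaries between pieces of context without needing long prose labels.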
Improvements
- Enhanced microphone permission management with comprehensive state tracking
- Improved click detection with intelligent aggregation and filtering
- Better memory injection timing to prevent infinite loops
- Enhanced observability tracking for context injections
- Streamlined click handling to prevent repetitive speaking and stuttering
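Click aggregation can be illustrated by collapsing bursts of rapid clicks into a single event, so the agent reacts once rather than once per click. This is a sketch of the general technique only: the `ClickRecord` shape, the `aggregateClicks` helper, and the 300 ms window in the usage lines are assumptions, not values from the package.

```typescript
interface ClickRecord {
  selector: string;  // which element was clicked
  timestamp: number; // ms since some fixed origin
}

// Collapse bursts: consecutive clicks closer together than windowMs are treated
// as one burst, and only the latest click of each burst is kept.
// Assumes the input is sorted by timestamp.
function aggregateClicks(clicks: ClickRecord[], windowMs: number): ClickRecord[] {
  const result: ClickRecord[] = [];
  for (const click of clicks) {
    const previous = result[result.length - 1];
    if (previous && click.timestamp - previous.timestamp < windowMs) {
      result[result.length - 1] = click; // same burst: replace with the newer click
    } else {
      result.push(click);
    }
  }
  return result;
}

const aggregated = aggregateClicks(
  [
    { selector: "#save", timestamp: 0 },
    { selector: "#save", timestamp: 100 },
    { selector: "#save", timestamp: 200 },
    { selector: "#next", timestamp: 1000 },
  ],
  300,
);
```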
Developer Experience
- Comprehensive documentation for context injection architecture
- Click overload prevention guides and troubleshooting
- Enhanced logging and debugging capabilities
- Updated AI model references to ‘gemini-2.5-flash-preview-native-audio-dialog’
XML Prompt Formatting & Observability Improvements
Enhanced system prompt structure and improved observability session handling.
Features
- XML-like Prompt Tags - Refactored system prompt augmentations to use XML-like tags for improved structure and readability
- Enhanced Observability - Improved observability session handling with proper cleanup and better error tracking
Improvements
- Better structure and maintainability of prompt augmentation system
- Enhanced context, memory management, and tool usage instructions
- Improved reliability of observability features
- Better logging for observability session lifecycle
Lazy Loading & Documentation Enhancements
Performance optimizations through lazy loading and comprehensive documentation updates.
Features
- Lazy Loading for Guides - Implemented lazy loading for user guides, fetching only when requested via refreshUserGuides()
- Microphone Permissions Documentation - Added comprehensive guide for managing microphone permissions with implementation patterns
- Enhanced Documentation Structure - Updated documentation branding to “SAMMY Labs Docs” with improved navigation
Performance
- Reduced unnecessary API calls through on-demand guide fetching
- Better caching of guide data in the useGuides hook
- Optimized API load with lazy loading approach
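The on-demand fetching and caching described in this entry might look roughly like the store below. Only the `refreshUserGuides` name comes from the changelog; the `Guide` type, the `LazyGuideStore` class, and the injected `fetchGuides` function are assumptions for this sketch, and the real useGuides hook may behave differently.

```typescript
interface Guide {
  id: string;
  title: string;
}

// Hypothetical store: nothing is fetched until refreshUserGuides() is called,
// and concurrent calls share a single in-flight request.
class LazyGuideStore {
  private cache: Guide[] | null = null;
  private inFlight: Promise<Guide[]> | null = null;

  constructor(private readonly fetchGuides: () => Promise<Guide[]>) {}

  /** Cached guides, or null if they have not been requested yet. */
  get guides(): Guide[] | null {
    return this.cache;
  }

  refreshUserGuides(): Promise<Guide[]> {
    if (this.inFlight) return this.inFlight; // reuse the pending request
    const request = this.fetchGuides().then(
      (guides) => {
        this.cache = guides;
        this.inFlight = null;
        return guides;
      },
      (error) => {
        this.inFlight = null; // allow a retry after a failure
        throw error;
      },
    );
    this.inFlight = request;
    return request;
  }
}

// Usage with a stubbed fetcher that counts how often it is called.
let fetchCalls = 0;
const store = new LazyGuideStore(async () => {
  fetchCalls += 1;
  return [{ id: "g1", title: "Getting started" }];
});
const callsBeforeRefresh = fetchCalls;
const firstRequest = store.refreshUserGuides();
const secondRequest = store.refreshUserGuides();
```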
Developer Experience
- Migration guide for transitioning from automatic to lazy loading
- Detailed microphone permissions implementation examples
- Enhanced troubleshooting tips and best practices
Audio Stutter Prevention & Agent Configuration
Audio quality improvements and refined agent behavior configuration.
Features
- Audio Stutter Prevention - Added comprehensive audio stutter prevention system for smoother playback
- Agent Configuration Refinements - Updated agent configuration to reduce proactive behavior and improve conversation flow
- Documentation Restructure - Major documentation reorganization with new MDX structure and improved navigation
Audio Improvements
- Enhanced audio buffer management to prevent stuttering
- Better audio streaming order and processing
- Improved audio quality during extended conversations
Developer Experience
- Streamlined codebase by removing unused click handling tools
- Enhanced documentation structure with feature-focused organization
- Added comprehensive changelog tracking system
- Updated documentation icons for better visual clarity
Long Conversations & Performance Improvements
This release brings significant performance improvements and better support for extended conversations.
Features
- Long Conversation Support - Enhanced support for extended conversations without degradation
- Voice Activity Detection (VAD) - Improved voice activity detection for better conversation flow
- Click-Turn-Complete - New interaction model for better turn management
- API Optimization - Removed unnecessary validation checks for faster API requests
- Simplified Observability - Streamlined saveTraceEvents and flushAudio methods by removing skipValidation parameter
Improvements
- Better error handling during disconnection scenarios
- Enhanced cleanup logging with reason tracking
- Optimized API call performance
System Prompt Management & Theming
Major improvements to prompt management and UI customization capabilities.
Features
- System Prompt Augmentor - Centralized prompt management system for better control
- Ranking System - Added ranking capabilities for better content prioritization
- Customizable Themes - SammyGuideBubble component now supports fully customizable themes
- Enhanced Observability - Improved documentation and default configuration for observability
Developer Experience
- Better theme configuration options
- Improved documentation structure
- Enhanced default configurations
Screen Awareness & Language Support
Introducing screen capture capabilities and multi-language support.
Features
- Screen Capture - Added screen awareness and capture functionality
- Language Settings - New language settings package for internationalization
- Large DOM Support - Fixed render capture for large DOM structures
Improvements
- Better handling of screen sharing scenarios
- Enhanced audio processing capabilities
- Improved observability tracking
Permission Management & Theme Configuration
Enhanced security and user experience improvements.
Features
- Permission Checks - Added comprehensive permission checking system
- Theme Configuration - Refactored Sammy Agent integration with enhanced theme support
- Microphone Permissions - Improved microphone permission handling and callbacks
- Microphone Hook - New hook for better microphone state management
Bug Fixes
- Fixed multiple Sammy-related bugs
- Resolved microphone hang issues
- Fixed “try it now” bug with screen share timing
Major Migration & Architecture Improvements
A significant architectural overhaul bringing better performance and maintainability.
Features
- Mega Migration - Complete architectural migration for better performance
- Screen Share Improvements - Multiple fixes and enhancements for screen sharing
- Conversation Escalation - New API to escalate conversations when needed
- Session Instantiation - Improved session management and instantiation
Technical Improvements
- Removed legacy package references
- Updated dependencies
- Better error handling for destroy method with reason parameters
Guides System & Context Management
Introduction of the guides system and improved context handling.
Features
- Sammy Guides Context - New context system for guided experiences
- Exposed Types - More TypeScript types exposed for better developer experience
- Provider Hook - New provider hook pattern for cleaner integration
Improvements
- Better Vite compatibility
- Enhanced build process
- Improved type definitions
Production Build & Logging
Enhanced build process and developer tools.
Features
- Production Build Mode - Added production build mode for optimized deployments
- Console Logging Management - Enhanced console logging with better control
- Environment Build Support - Better environment-specific build configurations
Improvements
- Cleaner build outputs
- Better logging management
- Improved production optimizations
Context Tools & System Prompts
Major improvements to context management and tool registration.
Features
- Context Tool Registration - Re-enabled context tool registration in SammyAgentCore
- System Prompt Integration - Folded system prompts into the main flow
- Observability Fixes - Multiple improvements to observability system
Technical Improvements
- Better tool management
- Improved context handling
- Enhanced system prompt processing
Service Architecture & Memory System
Introduction of service-based architecture and memory management.
Features
- Conversation Service - New conversation service initialization
- Memory Service - Integrated memory service for better state management
- Context Management - Improved context handling system
- Sammy API Client - New dedicated API client for Sammy interactions
Architecture
- Service layer pattern implementation
- Better separation of concerns
- Improved API abstractions
Video Capture & Package Optimization
Performance improvements and video capabilities.
Features
- Video Capture - Fixed and enhanced video capture functionality
- Type Improvements - Better TypeScript type definitions
- Package Slimming - Significantly reduced package size
Improvements
- Optimized bundle size
- Better video handling
- Improved type safety
Audio & Authentication Improvements
Enhanced audio processing and security features.
Features
- Audio Aggregation - Enhanced audio aggregation with timestamp support
- JWT Authentication - Moved ephemeral tokens to JWT authentication
- Screen Timing - Better screen capture timing controls
- Observability - New observability system for monitoring
Technical Improvements
- Better audio timestamp handling
- Improved security with JWT
- Enhanced monitoring capabilities
Initial Package Release
The first official release of Sammy Three as an NPM package.
Core Features
- Service Layer Pattern - Abstracted all external API calls through services
- Lifecycle Callbacks - Comprehensive callback system for all major events
- Tool Handlers - Support for custom tool integration
- HTML2Canvas Integration - Screen capture capabilities
- Audio Settings - Configurable audio processing
Architecture
- Clean separation between React components and domain logic
- Service-based architecture for flexibility
- Manager classes for specific domains (Screen, Memory, Tools)
- Default service implementations for quick start
Developer Experience
- Simple provider + hook pattern
- TypeScript support throughout
- Comprehensive documentation
- Backward compatibility with existing implementations