Text Mode & Tour System
The Text Mode agent provides a lightweight, text-only alternative to the voice agent. It includes a Tour System that visually guides users through web interfaces by highlighting DOM elements and providing contextual instructions.
Text mode is a good fit where audio isn’t suitable, where users prefer typing, or where bandwidth is limited.
Quick Start
Enable text mode in your Sammy Agent by setting the mode prop:
Basic Text Mode Setup
Text Mode Features
Streaming Responses
Text mode provides real-time streaming of agent responses through the streamingMessage property:
Streaming Messages
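As a framework-free illustration of consuming streamed text, the sketch below accumulates incoming chunks into a streamingMessage value and moves the finished response into a message list when the turn ends. Only the name streamingMessage comes from this page; the event names (chunk, turnComplete) and the ChatState shape are assumptions made for the example and may not match the SDK.

```typescript
// Hypothetical event and state shapes; only `streamingMessage` is named on this page.
type StreamEvent =
  | { type: "chunk"; text: string }
  | { type: "turnComplete" };

interface ChatState {
  streamingMessage: string; // partial response currently being streamed
  messages: string[];       // completed agent responses
}

const initialChatState: ChatState = { streamingMessage: "", messages: [] };

function reduceStream(state: ChatState, event: StreamEvent): ChatState {
  if (event.type === "chunk") {
    // Append the new fragment to the partial response.
    return { ...state, streamingMessage: state.streamingMessage + event.text };
  }
  // Turn complete: commit the finished response, ignoring empty turns.
  if (state.streamingMessage === "") return state;
  return {
    streamingMessage: "",
    messages: [...state.messages, state.streamingMessage],
  };
}
```

Keeping the partial text separate from the committed list lets the UI render a typing state while streamingMessage is non-empty, without rewriting history on every chunk.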
Starting a Text Session
Starting Text Agent
Message Handling
Text mode provides a simple interface for sending and receiving messages:
Message Management
Tour System
The Tour System enables agents to create interactive guides by highlighting elements on the page and providing step-by-step instructions.
Tours are perfect for onboarding, troubleshooting, form assistance, and interactive tutorials.
How Tours Work
- Agent identifies elements - The agent receives DOM context with numbered interactive elements
- Highlights target - Purple border appears around the selected element
- Shows instructions - Popover displays contextual guidance
- Waits for user - User performs action or asks questions
- Proceeds to next step - Process repeats for multi-step workflows
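The five steps above can be read as a small state machine. The sketch below models that loop in plain TypeScript; the phase and event names are hypothetical labels chosen to mirror the list, not identifiers from the SDK.

```typescript
// Hypothetical phase and event names that mirror the steps listed above.
type TourPhase = "identifying" | "highlighting" | "instructing" | "waiting" | "done";
type TourEvent = "elementChosen" | "highlightShown" | "popoverShown" | "userActed";

interface TourProgress {
  phase: TourPhase;
  stepIndex: number;  // zero-based index of the current step
  totalSteps: number; // number of steps in the workflow
}

function advanceTour(p: TourProgress, event: TourEvent): TourProgress {
  if (p.phase === "identifying" && event === "elementChosen") {
    return { ...p, phase: "highlighting" };
  }
  if (p.phase === "highlighting" && event === "highlightShown") {
    return { ...p, phase: "instructing" };
  }
  if (p.phase === "instructing" && event === "popoverShown") {
    return { ...p, phase: "waiting" };
  }
  if (p.phase === "waiting" && event === "userActed") {
    // Either repeat the cycle for the next step or finish the tour.
    const next = p.stepIndex + 1;
    return next < p.totalSteps
      ? { ...p, phase: "identifying", stepIndex: next }
      : { ...p, phase: "done" };
  }
  return p; // ignore events that do not apply to the current phase
}
```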
Tour Tool Usage
The agent can trigger tours using the built-in tour() tool:
Tour Tool Schema
Tour Callbacks
Set up callbacks to handle tour interactions:
Tour Callbacks
Manual Tour Creation
You can also create tours programmatically:
Manual Tour
Advanced Usage
Direct Core Usage
For advanced scenarios, use the text agent core directly:
Text Agent Core
Context Injection
Text mode supports all context injection features:
Context Management
Screen Capture
Text mode can capture screenshots for visual context:
Screen Capture Config
Configuration Reference
Provider Props
Selects between text-only or voice agent modes
Enables the tour tool for interactive guidance (text mode only)
Current streaming agent response (available via useSammyAgent hook)
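Because the tour tool is described as text mode only, it can help to check that combination before rendering the provider. The sketch below is one way to do that. The mode prop is named on this page, but the values "text" and "voice" and the prop name enableTour are assumptions for illustration; check the SDK types for the real names.

```typescript
// Assumed mode values; the page names the `mode` prop but not its literals.
type Mode = "text" | "voice";

interface ProviderOptions {
  mode: Mode;
  enableTour?: boolean; // hypothetical prop name for the tour toggle
}

// Returns a list of human-readable problems; an empty list means the options are consistent.
function checkProviderOptions(options: ProviderOptions): string[] {
  const problems: string[] = [];
  if (options.enableTour === true && options.mode !== "text") {
    problems.push("The tour tool is only available in text mode.");
  }
  return problems;
}
```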
Text Start Options
The agent mode (typically AgentMode.USER)
Identifies the source of the conversation for analytics
Optional guide ID for guided experiences
Tour Configuration
The DOM element to highlight
Short title displayed in the popover header
Detailed instruction or description for the user
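The three fields above, combined with the numbered DOM context the agent receives, suggest a simple validation step before anything is highlighted. The sketch below shows that check in isolation. The field names (elementIndex, title, description) and the NumberedElement shape are hypothetical, since this page describes the fields without naming them.

```typescript
// Hypothetical shapes: the page describes these fields but does not name them.
interface NumberedElement { index: number; label: string }
interface TourRequest { elementIndex: number; title: string; description: string }

type TourResolution =
  | { kind: "resolved"; target: NumberedElement; title: string; description: string }
  | { kind: "rejected"; reason: string };

function resolveTourRequest(context: NumberedElement[], request: TourRequest): TourResolution {
  const title = request.title.trim();
  const description = request.description.trim();
  if (title === "") return { kind: "rejected", reason: "title is empty" };
  if (description === "") return { kind: "rejected", reason: "description is empty" };
  // Look up the requested element among the numbered interactive elements.
  for (const element of context) {
    if (element.index === request.elementIndex) {
      return { kind: "resolved", target: element, title, description };
    }
  }
  return { kind: "rejected", reason: `no element numbered ${request.elementIndex}` };
}
```

Rejecting a request with a reason, rather than throwing, makes it straightforward to report the problem back to the agent so it can pick another element.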
Tour UI Components
The tour system includes several UI elements:
Highlight Border
- Purple border (#8b5cf6) around target element
- 8px padding for visibility
- Dark overlay on rest of page
- Smooth transitions when moving between elements
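To make the geometry concrete, the sketch below computes the padded rectangle a highlight border would occupy, using the colour and padding values listed above. The Rect type and function name are illustrative, not SDK exports.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

const HIGHLIGHT_COLOR = "#8b5cf6"; // purple border colour listed above
const HIGHLIGHT_PADDING = 8;       // padding in px listed above

// Expands the target rectangle by `padding` on every side.
function highlightRect(target: Rect, padding: number = HIGHLIGHT_PADDING): Rect {
  return {
    x: target.x - padding,
    y: target.y - padding,
    width: target.width + 2 * padding,
    height: target.height + 2 * padding,
  };
}
```

In a browser, the target rectangle would typically come from the element's getBoundingClientRect() result.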
Instruction Popover
- Smart positioning - Automatically positions to avoid viewport edges
- Arrow indicator - Points to highlighted element
- Chat interface - Users can ask questions directly
- Response display - Shows agent responses in real-time
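One common way to implement smart positioning is to try a preferred side first and fall back to whichever side has the most room. The sketch below illustrates that approach; it is not the SDK's actual placement algorithm, and the preference order (bottom, top, right, left) and the 12px gap are assumptions.

```typescript
type Placement = "bottom" | "top" | "right" | "left";
interface Box { x: number; y: number; width: number; height: number }
interface Size { width: number; height: number }

function choosePlacement(target: Box, popover: Size, viewport: Size, gap: number = 12): Placement {
  // Free space on each side of the target, and the space the popover needs there.
  const sides: { side: Placement; space: number; needed: number }[] = [
    { side: "bottom", space: viewport.height - (target.y + target.height), needed: popover.height + gap },
    { side: "top", space: target.y, needed: popover.height + gap },
    { side: "right", space: viewport.width - (target.x + target.width), needed: popover.width + gap },
    { side: "left", space: target.x, needed: popover.width + gap },
  ];
  // Take the first preferred side where the popover fits completely.
  for (const s of sides) {
    if (s.space >= s.needed) return s.side;
  }
  // Nothing fits: fall back to the side with the most free space.
  let best = sides[0];
  for (const s of sides) {
    if (s.space > best.space) best = s;
  }
  return best.side;
}
```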
User Interactions
- Click detection - Tracks when users click highlighted elements
- Cancellation - Detects when users click outside to cancel
- Chat input - Allows questions at any step
- Progress tracking - Maintains state across steps
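Click detection and cancellation both come down to deciding where a click landed. The sketch below shows one way to classify a click as hitting the highlighted element, the popover (for example its chat input), or the area outside both. The type names and three-way outcome are assumptions for illustration.

```typescript
interface Point { x: number; y: number }
interface Area { x: number; y: number; width: number; height: number }
type ClickOutcome = "target" | "popover" | "outside";

// True when the point lies inside the area, edges included.
function contains(area: Area, p: Point): boolean {
  return (
    p.x >= area.x && p.x <= area.x + area.width &&
    p.y >= area.y && p.y <= area.y + area.height
  );
}

function classifyClick(p: Point, highlighted: Area, popover: Area): ClickOutcome {
  if (contains(highlighted, p)) return "target";   // user acted on the highlighted element
  if (contains(popover, p)) return "popover";      // user is interacting with the popover
  return "outside";                                // treated as a cancellation
}
```

A caller could advance the step on "target", leave state unchanged on "popover", and end the tour on "outside".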
Best Practices
For Text Mode
1. Choose the right mode
Use text mode when:
- Audio is not suitable (quiet environments, accessibility)
- Users prefer typing over speaking
- Lower bandwidth is required
- You need precise text logs for compliance
2. Handle streaming properly
- Show typing indicators during streaming
- Update UI smoothly as text arrives
- Handle partial responses gracefully
3. Manage conversation history
- Keep reasonable history limits (e.g., last 50 messages)
- Persist important conversations
- Clear history on mode switches
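The history advice above can be sketched as two small helpers: one that caps the list at the most recent messages, and one that clears it when the mode changes. The StoredMessage shape, the role values, and the mode strings are assumptions for the example. The limit of 50 is the figure suggested above.

```typescript
interface StoredMessage { role: "user" | "agent"; text: string }

const MAX_HISTORY = 50; // limit suggested above

// Appends a message and keeps only the most recent `limit` entries.
function appendWithLimit(
  history: StoredMessage[],
  message: StoredMessage,
  limit: number = MAX_HISTORY
): StoredMessage[] {
  const next = [...history, message];
  return next.length > limit ? next.slice(next.length - limit) : next;
}

// Clears the history when the mode actually changes; otherwise returns it unchanged.
function historyAfterModeChange(
  history: StoredMessage[],
  previousMode: string,
  nextMode: string
): StoredMessage[] {
  return previousMode === nextMode ? history : [];
}
```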
For Tours
1. Write clear instructions
- Use imperative language (“Click the Submit button”)
- Be specific about what users should do
- Keep instructions concise and actionable
2. Handle dynamic content
- Wait for elements to load before highlighting
- Gracefully handle missing elements
- Provide fallback text instructions
3. Test thoroughly
- Verify tours work on different screen sizes
- Test with various page layouts
- Ensure z-index doesn’t conflict with your app
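The "Handle dynamic content" advice above can be sketched as a step planner that retries the element lookup a few times and falls back to a text-only instruction if the element never appears. Everything here is hypothetical: the PlannedStep shape, the injected isPresent lookup, and the retry count. In a real page each retry would be separated by a delay or driven by a MutationObserver. The lookup is injected so the sketch stays synchronous and testable.

```typescript
// Hypothetical step shape for illustration.
interface PlannedStep { elementIndex: number; title: string; instruction: string }

type StepPresentation =
  | { kind: "highlight"; elementIndex: number; title: string; instruction: string }
  | { kind: "textOnly"; message: string };

function presentStep(
  step: PlannedStep,
  isPresent: (elementIndex: number) => boolean, // injected lookup, e.g. a DOM query
  attempts: number = 3
): StepPresentation {
  for (let i = 0; i < attempts; i++) {
    // Highlight as soon as the element is found.
    if (isPresent(step.elementIndex)) return { kind: "highlight", ...step };
  }
  // Element never appeared: degrade to a plain text instruction.
  return { kind: "textOnly", message: `${step.title}: ${step.instruction}` };
}
```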
Examples
Complete Chat Implementation
Tour-Enabled Form Assistant
Form Assistant with Tours
Troubleshooting
Common issues and their solutions
Tour not highlighting elements
- Check DOM context: Ensure elements are numbered in context
- Wait for render: Elements must exist before highlighting
- Verify z-index: Tour overlays need high z-index (999999+)
Streaming messages not appearing
- Check connection: Ensure agent is connected before sending
- Verify mode: Streaming only works in text mode
- Handle errors: Check onError callback for issues
Chat messages mixed with tour messages
- Filter by source: Check the source field in onTurnComplete
- Separate handlers: Use different callbacks for chat vs tour
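As an illustration of filtering by source, the sketch below routes a completed turn to a chat handler or a tour handler. The source field and onTurnComplete are mentioned on this page, but the payload shape and the values "chat" and "tour" are placeholders, since the actual values are not listed here.

```typescript
// Assumed payload shape; "chat" and "tour" are placeholder source values.
interface CompletedTurn { source: "chat" | "tour"; text: string }

interface TurnHandlers {
  onChat: (text: string) => void;
  onTour: (text: string) => void;
}

// Sends each completed turn to exactly one handler based on its source.
function routeTurn(turn: CompletedTurn, handlers: TurnHandlers): void {
  if (turn.source === "tour") {
    handlers.onTour(turn.text);
  } else {
    handlers.onChat(turn.text);
  }
}
```

Calling routeTurn from a single onTurnComplete callback keeps chat transcripts and tour popover text in separate lists.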
Migration Guide
From Voice to Text Mode
Migration Example
Adding Tours to Existing Text Implementation
Adding Tours
Performance
Text mode offers several performance advantages:
Lower Memory Usage
~30% less memory than voice mode due to no audio buffers
Reduced Bandwidth
Text-only WebSocket traffic is typically under 1 KB per message
Minimal CPU
No audio processing overhead, just text handling
Fast Response
Typical message round-trip latency is under 200 ms
Summary
Text mode with the Tour System provides a powerful combination for creating accessible, bandwidth-efficient conversational experiences with visual guidance capabilities. It’s ideal for:
- Customer support - Guide users through troubleshooting
- Onboarding - Interactive product tours
- Form assistance - Step-by-step form completion
- Training - Interactive tutorials and learning paths
- Accessibility - Text-based alternative for all users