Screen Capture System - SAMMY Labs Docs

The Simple Render Capture System is a streamlined screen capture implementation that captures DOM content every second and sends it to the Gemini AI service. This is the current active implementation used in the sammy-three package.

Performance Optimization

Optimize for Your Use Case: Screen capture performance can be tuned based on your specific needs. The system offers flexible configuration options to balance quality, frequency, and performance.

Performance Tuning Options

import { SammyAgentProvider } from '@sammy-labs/sammy-three';

const config = {
  auth: { /* your auth config */ },
  
  // Screen capture performance optimization
  screenCapture: {
    method: 'render',
    
    // Optimize capture frequency for your needs
    checkInterval: 5000,  // 5 seconds for less frequent updates
    
    // Balance quality and performance
    jpegQuality: 0.3,     // Lower quality for faster processing
    
    // Optimize dimensions for your use case
    maxWidth: 1280,       // Smaller dimensions for better performance
    maxHeight: 720,       // Adjust based on your UI complexity
    
    // Fine-tune DOM observation
    domChangeDetection: {
      debounceMs: 500,    // Optimize mutation debouncing
      observerConfig: {
        subtree: false,   // Focus on direct changes only
        childList: true,  // Monitor content changes
        attributes: false,// Skip attribute changes if not needed
        characterData: false
      }
    }
  }
};

<SammyAgentProvider config={config}>
  <App />
</SammyAgentProvider>

Architecture Overview

System Features

The system uses a simplified approach for reliable screen capture:

1-second capture interval by default (configurable via checkInterval)
domToPng from modern-screenshot library for DOM-to-image conversion
Fallback mechanism when capture times out
Configurable DOM change detection and performance settings
Support for targetElement to capture specific DOM elements

Component Flow

Configuration Flow

Provider Config
Hook Flow

// SammyAgentProvider
config: {
  captureMethod: 'render',     // Use render method (not video)
  debugLogs: true,              // Enable debug logging
  targetElement: '#my-div',     // Optional: specific element to capture
}

// useScreenCapture
{
  method: config?.captureMethod ?? 'render',
  debugLogs: config?.debugLogs ?? false,
  targetElement: config?.targetElement,
}
↓
// useSimpleRenderCapture
{
  ...config,
  scope: config.scope || 'document',
  debugLogs: config.debugLogs ?? true,
}
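The hand-off above can be sketched as two plain functions. This is only an illustration of how the defaults combine: the interface names and the functions `toScreenCaptureOptions` and `toSimpleRenderOptions` are hypothetical stand-ins for the hooks, and only the fields shown above are modelled.

```typescript
// Hypothetical shapes: only the fields shown in the flow above are modelled.
interface ProviderConfig {
  captureMethod?: 'render' | 'video';
  debugLogs?: boolean;
  targetElement?: string | undefined;
}

interface ScreenCaptureOptions {
  method: 'render' | 'video';
  debugLogs: boolean;
  targetElement?: string | undefined;
}

interface SimpleRenderOptions extends ScreenCaptureOptions {
  scope: 'context' | 'document';
}

// Mirrors the useScreenCapture step: fill in defaults for missing provider values.
function toScreenCaptureOptions(config?: ProviderConfig): ScreenCaptureOptions {
  return {
    method: config?.captureMethod ?? 'render',
    debugLogs: config?.debugLogs ?? false,
    targetElement: config?.targetElement,
  };
}

// Mirrors the useSimpleRenderCapture step: default the scope and debug flag.
function toSimpleRenderOptions(
  options: Partial<SimpleRenderOptions> & { method: 'render' | 'video' }
): SimpleRenderOptions {
  return {
    ...options,
    scope: options.scope || 'document',
    debugLogs: options.debugLogs ?? true,
  };
}
```

One consequence worth noting: because the first step always produces a boolean `debugLogs`, the `?? true` default in the second step only applies when the second step is called directly without that field.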

Capture Element Resolution

The system determines which element to capture using a priority-based resolution system.

Priority Order

targetElement (Highest Priority)

Can be: CSS selector string, HTMLElement, or RefObject<HTMLElement>

Examples: '#my-capture-area', document.getElementById('app'), useRef()

scope: 'context'

Uses the contextElementRef from the hook

Internal wrapper element managed by sammy-three

scope: 'document' (Default)

Searches for common app containers in order:

#root
#app
[data-testid="app"]
<main>
document.body (last resort)

Resolution Logic

// Resolution logic
if (stableConfig.targetElement) {
  captureElement = resolveTargetElement(stableConfig.targetElement);
} else if (effectiveScope === 'context' && contextElementRef.current) {
  captureElement = contextElementRef.current;
} else {
  // Find first available app container
  const candidates = ['#root', '#app', '[data-testid="app"]', 'main', 'body'];
  for (const candidate of candidates) {
    const element = document.querySelector(candidate);
    if (element) {
      captureElement = element;
      break;
    }
  }
}
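The snippet above calls `resolveTargetElement` without showing it. A minimal sketch of what such a resolver might look like follows, assuming the three accepted input types (selector string, element, ref object). The names `RefLike`, `resolveTarget`, and `resolveCaptureElement` are hypothetical, the DOM lookup is passed in as a `query` function so the logic can run outside a browser, and the 'context' scope branch is left out for brevity.

```typescript
// Minimal stand-in for React's RefObject so the sketch has no dependencies.
interface RefLike<E> {
  current: E | null;
}

type TargetSpec<E> = string | E | RefLike<E>;

// Assumption: a plain element never carries a `current` property.
function isRefLike<E extends object>(value: E | RefLike<E>): value is RefLike<E> {
  return 'current' in value;
}

function resolveTarget<E extends object>(
  target: TargetSpec<E>,
  query: (selector: string) => E | null
): E | null {
  if (typeof target === 'string') return query(target); // CSS selector
  if (isRefLike<E>(target)) return target.current;      // ref object
  return target;                                        // direct element
}

const DEFAULT_CANDIDATES = ['#root', '#app', '[data-testid="app"]', 'main', 'body'];

// Try the explicit target first, then fall back to the first matching container.
function resolveCaptureElement<E extends object>(
  target: TargetSpec<E> | undefined,
  query: (selector: string) => E | null
): E | null {
  const explicit = target === undefined ? null : resolveTarget(target, query);
  if (explicit) return explicit;
  for (const selector of DEFAULT_CANDIDATES) {
    const found = query(selector);
    if (found) return found;
  }
  return null;
}
```

In a browser, `query` would typically be `(selector) => document.querySelector<HTMLElement>(selector)`.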

Capture Process

Interval Setup

// When capturing starts
if (isCapturing && !intervalRef.current && clientRef.current) {
  // Set up 1-second interval
  intervalRef.current = window.setInterval(() => {
    captureAndSend();
  }, 1000);
  
  // Perform initial capture immediately
  captureAndSend();
}
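The same start-once pattern can be sketched outside React as a small controller. This is an illustration rather than the hook's actual code: `createCaptureLoop` and its `intervalMs` parameter are hypothetical names, and the capture callback is supplied by the caller.

```typescript
interface CaptureLoop {
  start(): void;
  stop(): void;
  isRunning(): boolean;
}

function createCaptureLoop(captureAndSend: () => void, intervalMs = 1000): CaptureLoop {
  let timer: ReturnType<typeof setInterval> | null = null;
  return {
    start() {
      if (timer !== null) return; // already running: avoid stacking intervals
      timer = setInterval(captureAndSend, intervalMs);
      captureAndSend(); // initial capture without waiting a full interval
    },
    stop() {
      if (timer === null) return;
      clearInterval(timer);
      timer = null; // resetting the handle lets start() run again later
    },
    isRunning: () => timer !== null,
  };
}
```

Resetting the handle to `null` in `stop()` plays the same role as clearing `intervalRef.current`: without it, the guard in `start()` would block any restart.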

Capture Function Steps

Pre-flight Checks

Verify isCapturing is true
Verify clientRef.current exists
Verify captureElement exists

DOM to PNG Conversion

// Attempt capture with 2-second timeout
const capturePromise = domToPng(captureElement, {
  width: Math.min(captureElement.scrollWidth, 1920),
  height: Math.min(captureElement.scrollHeight, 1080),
});

const timeoutPromise = new Promise((_, reject) =>
  setTimeout(() => reject(new Error('domToPng timeout after 2 seconds')), 2000)
);

dataUrl = await Promise.race([capturePromise, timeoutPromise]);
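The race above leaves its timer pending even when the capture finishes first. A reusable variant that clears the timer might look like the sketch below; `withTimeout` and its `label` parameter are hypothetical names, not part of sammy-three or modern-screenshot.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timeout after ${ms}ms`)), ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer on both paths so it does not linger after success.
    clearTimeout(timer);
  }
}
```

Under these assumptions, the capture call could be written as `await withTimeout(domToPng(captureElement, options), 2000, 'domToPng')`. Note that the race does not cancel the underlying capture; it only stops waiting for it.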

Fallback Mechanism

When timeout occurs:

// Create fallback canvas
const canvas = document.createElement('canvas');
canvas.width = 800;
canvas.height = 600;
const ctx = canvas.getContext('2d');

// Draw placeholder content
ctx.fillStyle = '#f8f9fa';
ctx.fillRect(0, 0, 800, 600);
ctx.fillText('Screen capture in progress...', 400, 280);
ctx.fillText(`Element: ${captureElement.tagName}`, 400, 310);
ctx.fillText(`Size: ${width}x${height}`, 400, 330);
ctx.fillText(`Timestamp: ${time}`, 400, 350);

dataUrl = canvas.toDataURL('image/png');

Data Transmission

// Extract base64 data (remove data:image/png;base64, prefix)
const base64Data = dataUrl.split(',')[1];

// Send to Gemini
clientRef.current.sendRealtimeInput([
  { mimeType: 'image/png', data: base64Data }
]);
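Splitting on the comma works, but it hard-codes the MIME type separately from the data. A slightly more defensive sketch reads both from the data URL; `parseDataUrl` and `InlineImage` are hypothetical names introduced for illustration.

```typescript
interface InlineImage {
  mimeType: string;
  data: string; // base64 payload without the "data:...;base64," prefix
}

function parseDataUrl(dataUrl: string): InlineImage {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  const mimeType = match?.[1];
  const data = match?.[2];
  if (!mimeType || !data) {
    throw new Error('Expected a base64-encoded data URL');
  }
  return { mimeType, data };
}
```

The result has the same `{ mimeType, data }` shape as the object passed to `sendRealtimeInput` above, so the MIME type stays correct if the capture format changes.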

Fallback Canvas

The fallback canvas with “Screen capture in progress…” message appears under these conditions:

domToPng Timeout

Large or complex DOM structure
Heavy CSS animations or transforms
Many external resources
Browser performance issues

domToPng Errors

CORS issues with external resources
Invalid DOM structure
Memory constraints
Browser security restrictions

Element Issues

Element has zero dimensions
Element is hidden or off-screen
Element contains problematic content (iframes, canvas, video)

Fallback Canvas Details

Canvas Specifications

Size: 800x600 pixels
Background: #f8f9fa (light gray)
Text Color: #333 (dark gray)
Font: 16px/12px Arial
Content:
  - "Screen capture in progress..."
  - Element type and ID
  - Original element dimensions
  - Current timestamp

Modern Screenshot Process

Detailed Steps

DOM Cloning

Deep clone with computed styles

Resource Embedding

Inline external resources

SVG Creation

Wrap in ForeignObject SVG

Data URL Conversion

Convert SVG to data URL

Image Loading

Create image from data URL

Canvas Rendering

Draw image to canvas

PNG Export

Export canvas as PNG data URL
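The SVG Creation and Data URL Conversion steps can be illustrated with two string helpers. This is a simplified sketch of the general foreignObject technique, not modern-screenshot's actual code; the function names are hypothetical, and the markup is assumed to be well-formed XHTML (for example, the output of `XMLSerializer`).

```typescript
// Wrap serialized XHTML markup in an SVG <foreignObject> of the given size.
function wrapInForeignObjectSvg(xhtml: string, width: number, height: number): string {
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" ` +
    `viewBox="0 0 ${width} ${height}">` +
    `<foreignObject x="0" y="0" width="100%" height="100%">` +
    `<div xmlns="http://www.w3.org/1999/xhtml">${xhtml}</div>` +
    `</foreignObject></svg>`
  );
}

// Percent-encode the SVG so it can be used as an image source.
function svgToDataUrl(svg: string): string {
  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
}
```

In a browser, the resulting URL would be assigned to an image's `src`, drawn to a canvas once loaded, and exported with `canvas.toDataURL('image/png')`, matching the last three steps.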

Usage Examples

Basic Usage

<SammyAgentProvider
  config={{
    auth: authConfig,
    screenCapture: {
      method: 'render',
      // Default settings - captures every 1 second
    },
    debugLogs: true,
  }}
>
  <App />
</SammyAgentProvider>

With Target Element

<SammyAgentProvider
  config={{
    auth: authConfig,
    captureMethod: 'render',
    targetElement: '#dashboard-content',
    debugLogs: true,
  }}
>
  <div id="dashboard-content">
    {/* This will be captured */}
  </div>
  <div id="sidebar">
    {/* This won't be captured */}
  </div>
</SammyAgentProvider>

Configuration Options

Complete Configuration Reference

Option	Type	Default	Description
`method`	`'render' | 'video'`	`'render'`	Capture method. Use 'video' to avoid DOM cloning entirely.
`checkInterval`	`number`	`1000`	How often to capture in milliseconds. Adjust based on your update frequency needs.
`jpegQuality`	`number`	`0.5`	JPEG quality from 0.0 to 1.0. Balance quality and file size for your use case.
`maxWidth`	`number`	`1920`	Maximum width for captured images. Optimize based on your UI requirements.
`maxHeight`	`number`	`1080`	Maximum height for captured images. Adjust for your display needs.
`minInterval`	`number`	`100`	Minimum time between captures in milliseconds when changes are detected.
`maxInterval`	`number`	`5000`	Maximum time between captures in milliseconds when no changes are detected.
`useHashing`	`boolean`	`true`	Enable smart deduplication using image hashing to skip identical frames.
`scope`	`'context' | 'document'`	`'document'`	Scope of capture. Use 'context' to capture only wrapped content.
`domChangeDetection.debounceMs`	`number`	`300`	Debounce time after DOM mutations stop before capturing.
`domChangeDetection.observerConfig`	`MutationObserverInit`	(not specified)	MutationObserver configuration for detecting DOM changes.
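To make the defaults concrete, here is a sketch of how a partial configuration might be merged with them. The `ScreenCaptureSettings` type and `withDefaults` function are hypothetical, `debounceMs` is flattened to the top level for brevity, and clamping `jpegQuality` into the documented 0.0 to 1.0 range is an assumption about sensible handling, not confirmed library behavior.

```typescript
interface ScreenCaptureSettings {
  method: 'render' | 'video';
  checkInterval: number; // ms
  jpegQuality: number;   // 0.0 to 1.0
  maxWidth: number;      // px
  maxHeight: number;     // px
  minInterval: number;   // ms
  maxInterval: number;   // ms
  useHashing: boolean;
  scope: 'context' | 'document';
  debounceMs: number;    // ms, flattened from domChangeDetection.debounceMs
}

// Default values as listed in the configuration reference.
const DEFAULT_SETTINGS: ScreenCaptureSettings = {
  method: 'render',
  checkInterval: 1000,
  jpegQuality: 0.5,
  maxWidth: 1920,
  maxHeight: 1080,
  minInterval: 100,
  maxInterval: 5000,
  useHashing: true,
  scope: 'document',
  debounceMs: 300,
};

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Overrides win over defaults; jpegQuality is kept inside its documented range.
function withDefaults(overrides: Partial<ScreenCaptureSettings> = {}): ScreenCaptureSettings {
  const merged = { ...DEFAULT_SETTINGS, ...overrides };
  return { ...merged, jpegQuality: clamp(merged.jpegQuality, 0, 1) };
}
```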

Target Element Options

The targetElement option accepts three types of values:

Type	Example	Description
`string`	`'#my-div'`, `'.capture-area'`	CSS selector string
`HTMLElement`	`document.getElementById('my-div')`	Direct element reference
`RefObject<HTMLElement>`	`useRef<HTMLDivElement>()`	React ref object

Configuration Presets

Performance Focused
Balanced
Quality Focused

Performance Focused: optimized for maximum performance and efficiency

const config = {
  auth: authConfig,
  screenCapture: {
    method: 'video',        // Native browser capture
    checkInterval: 10000,   // 10 seconds for efficiency
    jpegQuality: 0.3,
    maxWidth: 854,
    maxHeight: 480,
    useHashing: true,
    domChangeDetection: {
      debounceMs: 1000,
      observerConfig: {
        subtree: false,
        attributes: false,
        characterData: false
      }
    }
  }
};

Balanced: good balance of quality and performance

const config = {
  auth: authConfig,
  screenCapture: {
    method: 'render',
    checkInterval: 3000,    // 3 seconds
    jpegQuality: 0.5,
    maxWidth: 1280,
    maxHeight: 720,
    useHashing: true,
    domChangeDetection: {
      debounceMs: 500,
      observerConfig: {
        subtree: false,
        childList: true,
        attributes: false
      }
    }
  }
};

Quality Focused: optimized for maximum visual quality and detail

const config = {
  auth: authConfig,
  screenCapture: {
    method: 'render',
    checkInterval: 1000,    // 1 second for responsiveness
    jpegQuality: 0.8,
    maxWidth: 1920,
    maxHeight: 1080,
    useHashing: true,
    domChangeDetection: {
      debounceMs: 300,
      observerConfig: {
        subtree: true,
        childList: true,
        attributes: true
      }
    }
  }
};

Key Differences from Complex Render

Feature	Simple Render	Complex Render
Capture Interval	Fixed 1s	Dynamic (100ms-2s)
DOM Change Detection	No	Yes (MutationObserver)
Audio Adaptation	No	Yes (throttling)
Frame Hashing	No	Yes (deduplication)
Critical Renders	No	Yes (conversation events)
Worker Threads	No	Yes (optional)
Performance Mode	Single	Multiple (worker/main)
Capture Method	domToPng only	domToCanvas with fallbacks

Debug Logging

When debugLogs: true, the system provides extensive logging:

🔧 [EXPLICIT-RENDER-abc123] Hook initialized with config
🎯 [EXPLICIT-RENDER-abc123] Resolving targetElement
🎯 [EXPLICIT-RENDER-abc123] ✅ Successfully resolved targetElement
🎬 [EXPLICIT-RENDER] captureAndSend called
🎨 [EXPLICIT-RENDER] Starting domToPng capture...
✅ [EXPLICIT-RENDER] domToPng succeeded
📤 [EXPLICIT-RENDER] Sending to client...
✅ [EXPLICIT-RENDER] Frame sent successfully
📸 [EXPLICIT-RENDER-abc123] Currently capturing: #my-div

Performance Characteristics

Timing Analysis

Capture Frequency

Every 1000ms (1 FPS)

domToPng Timeout

2000ms maximum

Fallback Canvas

~5-10ms creation

Base64 Encoding

~10-20ms

Resource Usage

CPU

Medium (spikes during capture)

Memory

Low-Medium (temporary canvas/image)

Network

~50-200KB per frame (compressed PNG)

UI Impact

Minimal (no worker threads)
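Taking the per-frame size estimate above at face value, the upload volume for a given capture interval is simple arithmetic. The helper names below are hypothetical, and the calculation ignores protocol overhead and any frames skipped by deduplication.

```typescript
// Frames sent per minute for a capture interval given in milliseconds.
function framesPerMinute(intervalMs: number): number {
  return 60000 / intervalMs;
}

// Approximate upload volume per minute, in KB, for a given average frame size.
function uploadPerMinuteKB(intervalMs: number, frameKB: number): number {
  return framesPerMinute(intervalMs) * frameKB;
}
```

At the default 1000 ms interval this gives 60 frames per minute, or roughly 3,000 to 12,000 KB per minute for 50 to 200 KB frames. At a 5000 ms interval the same frames come to roughly 600 to 2,400 KB per minute.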

Error Handling

Timeout Recovery

if (error.message.includes('timeout')) {
  // Clear and restart interval
  clearInterval(intervalRef.current);
  intervalRef.current = null;
  // Will be restarted by useEffect
}

Client Loss Protection

// Check client exists during interval tick
if (!clientRef.current) {
  console.error('Client lost during interval');
  clearInterval(intervalRef.current);
  return;
}

Capture Element Validation

if (!captureElement || captureElement.offsetHeight === 0) {
  console.log('Capture skipped - invalid element');
  return;
}
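The individual guards in this section can be gathered into one pre-flight function that reports why a capture was skipped. This is a sketch under assumptions: `preflight`, `SkipReason`, and `MeasurableElement` are hypothetical names, and checking the width as well as the height is slightly stricter than the validation shown above.

```typescript
// Only the layout fields the check needs, so the sketch runs without a DOM.
interface MeasurableElement {
  offsetWidth: number;
  offsetHeight: number;
}

type SkipReason = 'not-capturing' | 'no-client' | 'no-element' | 'zero-size';

interface CaptureState {
  isCapturing: boolean;
  hasClient: boolean;
  element: MeasurableElement | null;
}

// Returns null when the capture may proceed, otherwise the reason to skip it.
function preflight(state: CaptureState): SkipReason | null {
  if (!state.isCapturing) return 'not-capturing';
  if (!state.hasClient) return 'no-client';
  if (!state.element) return 'no-element';
  if (state.element.offsetWidth === 0 || state.element.offsetHeight === 0) {
    return 'zero-size';
  }
  return null;
}
```

Returning a reason rather than a boolean makes it easier to log why a frame was skipped when debug logging is enabled.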

Benefits of Target Element Capture

Reduced Size

Capture only the relevant part of your UI

Better Performance

Smaller capture area means faster processing

Focused Context

AI agent sees only the important content

Flexible Integration

Works with any existing DOM structure

Important Notes

Keep these points in mind when using targetElement:

targetElement only works with captureMethod: 'render' (explicit render capture)
The target element must exist in the DOM when capture starts
If the target element is not found, the system falls back to the default scope behavior
Enable debugLogs: true to see which element is being captured in the console

Optimization Guide

Performance Tuning

Optimize for slower devices

When to use: Targeting older devices or complex UIs

Optimizations:

Increase checkInterval to 5000ms or higher
Switch to method: 'video' for native browser capture
Reduce maxWidth and maxHeight to match your UI needs
Lower jpegQuality to 0.3 for smaller files
Set domChangeDetection.observerConfig.subtree: false
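When choosing `maxWidth` and `maxHeight`, it can help to see what output size a given element would produce. The sketch below scales a source size down to fit both limits while preserving aspect ratio. `fitWithin` is a hypothetical helper, and aspect-preserving scaling is an assumption: the capture snippet earlier in this page caps width and height independently.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale `source` down (never up) so it fits inside maxWidth x maxHeight.
function fitWithin(source: Size, maxWidth: number, maxHeight: number): Size {
  const scale = Math.min(1, maxWidth / source.width, maxHeight / source.height);
  return {
    width: Math.round(source.width * scale),
    height: Math.round(source.height * scale),
  };
}
```

For example, a 2560x1440 element limited to 1280x720 comes out at 1280x720, while an 800x600 element is left unchanged.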

Optimize capture responsiveness

When to use: Need faster updates for dynamic content

Optimizations:

Reduce domChangeDetection.debounceMs to 200ms
Enable more mutation observer options
Decrease checkInterval for more frequent captures

Optimize bandwidth usage

When to use: Limited bandwidth or high traffic applications

Optimizations:

Enable useHashing: true for smart deduplication
Increase minInterval to reduce capture frequency
Use restrictive domChangeDetection.observerConfig
Lower jpegQuality to reduce file sizes
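Hash-based deduplication can be illustrated with a small sketch that skips a frame when it is identical to the previous one. How `useHashing` is actually implemented is not shown on this page; the example below is an assumption that uses a 32-bit FNV-1a hash over the base64 payload, and `fnv1a` and `createFrameDeduper` are hypothetical names.

```typescript
// 32-bit FNV-1a hash over the code units of a string (base64 text is ASCII).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // convert to an unsigned 32-bit value
}

// Returns a function that answers: should this frame be sent?
function createFrameDeduper(): (base64Frame: string) => boolean {
  let lastHash: number | null = null;
  return (base64Frame) => {
    const hash = fnv1a(base64Frame);
    if (hash === lastHash) return false; // identical to the previous frame: skip
    lastHash = hash;
    return true;
  };
}
```

Only the previous frame's hash is kept, so a frame that reappears after a different one is sent again. An exact hash like this catches byte-identical frames only; visually similar frames with different bytes still count as new.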

Performance Monitoring

// Monitor capture performance in browser DevTools
// Use Performance tab to analyze capture timing

// Enable debug logs to track performance metrics
const config = {
  screenCapture: {
    debugLogs: true
  }
};

// Console output shows capture timing:
// ✅ [EXPLICIT-RENDER] Capture in 45.2ms  <- Good performance
// 📊 [EXPLICIT-RENDER] Capture in 150.3ms <- Consider optimization

Best Practices

Choose the right method - Use video for simplicity, render for DOM-specific features

Profile your specific use case - Use Chrome DevTools to understand your app’s needs

Test across devices - Validate performance on your target device range

Monitor performance metrics - Enable debugLogs to track capture timing

Enable smart optimizations - Use useHashing: true for automatic deduplication

Tune DOM observation - Configure observerConfig based on your UI update patterns

Summary

The Screen Capture system provides a flexible, configurable approach to screen capture that can be tuned for your specific performance and quality requirements. Choose from multiple presets or create custom configurations to match your application’s needs.

Key features:

Flexible intervals from 1-10+ seconds based on your needs
High-quality captures using modern DOM-to-image conversion
Reliable fallback mechanism ensures continuous operation
Targeted capture support for focused UI regions
Performance tuning options for various device capabilities
Multiple configuration presets for common use cases
