The Simple Render Capture System is a streamlined screen capture implementation that captures DOM content every second and sends it to the Gemini AI service. This is the current active implementation used in the sammy-three package.

Performance Optimization

Optimize for Your Use Case: Screen capture performance can be tuned based on your specific needs. The system offers flexible configuration options to balance quality, frequency, and performance.

Performance Tuning Options

import { SammyAgentProvider } from '@sammy-labs/sammy-three';

const config = {
  auth: { /* your auth config */ },
  
  // Screen capture performance optimization
  screenCapture: {
    method: 'render',
    
    // Optimize capture frequency for your needs
    checkInterval: 5000,  // 5 seconds for less frequent updates
    
    // Balance quality and performance
    jpegQuality: 0.3,     // Lower quality for faster processing
    
    // Optimize dimensions for your use case
    maxWidth: 1280,       // Smaller dimensions for better performance
    maxHeight: 720,       // Adjust based on your UI complexity
    
    // Fine-tune DOM observation
    domChangeDetection: {
      debounceMs: 500,    // Optimize mutation debouncing
      observerConfig: {
        subtree: false,   // Focus on direct changes only
        childList: true,  // Monitor content changes
        attributes: false, // Skip attribute changes if not needed
        characterData: false
      }
    }
  }
};

<SammyAgentProvider config={config}>
  <App />
</SammyAgentProvider>

Architecture Overview

System Features

The system uses a simplified approach for reliable screen capture:
  • 1-second capture interval by default (configurable via checkInterval)
  • domToPng from modern-screenshot library for DOM-to-image conversion
  • Fallback mechanism when capture times out
  • Configurable DOM change detection and performance settings
  • Support for targetElement to capture specific DOM elements

Component Flow

Configuration Flow

  • Provider Config
  • Hook Flow
// SammyAgentProvider
config: {
  captureMethod: 'render',     // Use render method (not video)
  debugLogs: true,              // Enable debug logging
  targetElement: '#my-div',     // Optional: specific element to capture
}

Capture Element Resolution

The system determines which element to capture using a priority-based resolution system.

Priority Order

1

targetElement (Highest Priority)

Can be: CSS selector string, HTMLElement, or RefObject<HTMLElement>
Examples: '#my-capture-area', document.getElementById('app'), useRef()
2

scope: 'context'

Uses the contextElementRef from the hook
Internal wrapper element managed by sammy-three
3

scope: 'document' (Default)

Searches for common app containers in order:
  • #root
  • #app
  • [data-testid="app"]
  • <main>
  • document.body (last resort)

Resolution Logic

// Resolution logic
if (stableConfig.targetElement) {
  captureElement = resolveTargetElement(stableConfig.targetElement);
} else if (effectiveScope === 'context' && contextElementRef.current) {
  captureElement = contextElementRef.current;
} else {
  // Find first available app container
  const candidates = ['#root', '#app', '[data-testid="app"]', 'main', 'body'];
  for (const candidate of candidates) {
    // Declare the lookup result locally instead of assigning to an undeclared variable
    const element = document.querySelector(candidate);
    if (element) {
      captureElement = element;
      break;
    }
  }
}
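The priority order above can be sketched as a pure function, which makes it easy to test outside a browser. This is a minimal sketch, not the package's implementation: `ElementLike`, `RefLike`, `resolveCaptureElement`, and the injected `query` callback are hypothetical names introduced here, and the `target` argument is assumed to be already resolved to an element (or null when it was not found).

```typescript
// Minimal structural stand-ins so the sketch runs without a DOM (assumptions).
interface ElementLike { tagName: string }
interface RefLike<T> { current: T | null }
type Query = (selector: string) => ElementLike | null;

// Fallback selectors, in the order listed in the priority section.
const APP_CONTAINER_CANDIDATES = ['#root', '#app', '[data-testid="app"]', 'main', 'body'];

function resolveCaptureElement(
  target: ElementLike | null,       // resolved targetElement, or null if unset/not found
  scope: 'context' | 'document',
  contextRef: RefLike<ElementLike>,
  query: Query,                     // in a browser: (s) => document.querySelector(s)
): ElementLike | null {
  // 1. An explicit target always wins.
  if (target) return target;
  // 2. Context scope uses the wrapper element when it is mounted.
  if (scope === 'context' && contextRef.current) return contextRef.current;
  // 3. Otherwise take the first app container that exists.
  for (const selector of APP_CONTAINER_CANDIDATES) {
    const found = query(selector);
    if (found) return found;
  }
  return null;
}
```

Passing the query function in, rather than calling `document.querySelector` directly, is only a testing convenience; the decision logic is the same either way.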

Capture Process

Interval Setup

// When capturing starts
if (isCapturing && !intervalRef.current && clientRef.current) {
  // Set up 1-second interval
  intervalRef.current = window.setInterval(() => {
    captureAndSend();
  }, 1000);
  
  // Perform initial capture immediately
  captureAndSend();
}
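The same start-once behavior can be illustrated outside React. The following is a framework-free sketch under stated assumptions: `CaptureLoop` is a hypothetical helper, not part of sammy-three, and it only mirrors the guard, the interval, and the immediate first capture shown above.

```typescript
// Hypothetical helper: owns a single interval and refuses to start it twice.
class CaptureLoop {
  private timer: ReturnType<typeof setInterval> | null = null;
  private readonly capture: () => void;
  private readonly intervalMs: number;

  constructor(capture: () => void, intervalMs: number = 1000) {
    this.capture = capture;
    this.intervalMs = intervalMs;
  }

  start(): void {
    // Mirrors the !intervalRef.current guard: a second start() is a no-op.
    if (this.timer !== null) return;
    this.timer = setInterval(this.capture, this.intervalMs);
    // Initial capture, without waiting for the first tick.
    this.capture();
  }

  stop(): void {
    if (this.timer === null) return;
    clearInterval(this.timer);
    this.timer = null;
  }

  get running(): boolean {
    return this.timer !== null;
  }
}
```

Resetting the handle to null in `stop()` matters: it is what allows a later `start()` to create a fresh interval.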

Capture Function Steps

1

Pre-flight Checks

  • Verify isCapturing is true
  • Verify clientRef.current exists
  • Verify captureElement exists
2

DOM to PNG Conversion

// Attempt capture with 2-second timeout
const capturePromise = domToPng(captureElement, {
  width: Math.min(captureElement.scrollWidth, 1920),
  height: Math.min(captureElement.scrollHeight, 1080),
});

const timeoutPromise = new Promise((_, reject) =>
  setTimeout(() => reject(new Error('domToPng timeout after 2 seconds')), 2000)
);

dataUrl = await Promise.race([capturePromise, timeoutPromise]);
3

Fallback Mechanism

When timeout occurs:
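// Create fallback canvas
const canvas = document.createElement('canvas');
canvas.width = 800;
canvas.height = 600;
const ctx = canvas.getContext('2d');

if (ctx) {
  // Draw placeholder background
  ctx.fillStyle = '#f8f9fa';
  ctx.fillRect(0, 0, 800, 600);

  // Switch to the text color before drawing text; otherwise it would be
  // painted in the background color and be invisible
  ctx.fillStyle = '#333';
  ctx.textAlign = 'center'; // assumed: x = 400 is the middle of the 800px canvas
  ctx.font = '16px Arial';
  ctx.fillText('Screen capture in progress...', 400, 280);
  ctx.font = '12px Arial';
  ctx.fillText(`Element: ${captureElement.tagName}`, 400, 310);
  ctx.fillText(`Size: ${width}x${height}`, 400, 330);
  ctx.fillText(`Timestamp: ${time}`, 400, 350);
}

dataUrl = canvas.toDataURL('image/png');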
4

Data Transmission

// Extract base64 data (remove data:image/png;base64, prefix)
const base64Data = dataUrl.split(',')[1];

// Send to Gemini
clientRef.current.sendRealtimeInput([
  { mimeType: 'image/png', data: base64Data }
]);
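Step 4 splits the data URL on the first comma and hard-codes the MIME type. A slightly more defensive variant is sketched below; `parseBase64DataUrl` and `InlineImage` are hypothetical names, and the sketch assumes the input is always a base64-encoded data URL. Reading the MIME type from the URL itself may help if the output format is ever changed (for example to JPEG, which the jpegQuality option suggests is possible).

```typescript
// Shape of one item passed to sendRealtimeInput in the step above.
interface InlineImage { mimeType: string; data: string }

// Hypothetical helper: validates the data URL and extracts both parts.
function parseBase64DataUrl(dataUrl: string): InlineImage {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) {
    throw new Error('Expected a base64 data URL');
  }
  return { mimeType: match[1], data: match[2] };
}

// Example: the PNG data URL produced by the capture step.
const frame = parseBase64DataUrl('data:image/png;base64,iVBORw0KGgo=');
// frame.mimeType is 'image/png', frame.data is 'iVBORw0KGgo='
```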

Fallback Canvas

The fallback canvas with the "Screen capture in progress..." message appears under these conditions:

domToPng Timeout

  • Large or complex DOM structure
  • Heavy CSS animations or transforms
  • Many external resources
  • Browser performance issues
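The timeout race used for domToPng can be wrapped in a reusable helper. This is a sketch, not the package's code: `withTimeout` is a hypothetical name, and unlike the inline version it clears its timer once the race settles, so a fast capture does not leave a pending timer behind.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timeout after ${ms}ms`)), ms);
  });
  // Whichever promise settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A call such as `withTimeout(domToPng(captureElement), 2000, 'domToPng')` would then reject with a message containing "timeout", which is what the timeout recovery check in the error handling section looks for. Note that the race does not cancel the underlying capture; it only stops waiting for it.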

domToPng Errors

  • CORS issues with external resources
  • Invalid DOM structure
  • Memory constraints
  • Browser security restrictions

Element Issues

  • Element has zero dimensions
  • Element is hidden or off-screen
  • Element contains problematic content (iframes, canvas, video)

Fallback Canvas Details

Canvas Specifications

Size: 800x600 pixels
Background: #f8f9fa (light gray)
Text Color: #333 (dark gray)
Font: 16px/12px Arial
Content:
  - "Screen capture in progress..."
  - Element type and ID
  - Original element dimensions
  - Current timestamp

Modern Screenshot Process

Detailed Steps

1

DOM Cloning

Deep clone with computed styles
2

Resource Embedding

Inline external resources
3

SVG Creation

Wrap in ForeignObject SVG
4

Data URL Conversion

Convert SVG to data URL
5

Image Loading

Create image from data URL
6

Canvas Rendering

Draw image to canvas
7

PNG Export

Export canvas as PNG data URL
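Steps 3 and 4 are plain string manipulation and can be sketched without a DOM. The code below illustrates the general foreignObject technique, not modern-screenshot's actual internals; `wrapInForeignObjectSvg` and `svgToDataUrl` are hypothetical names, and the input is assumed to be well-formed XHTML with inlined styles.

```typescript
// Step 3: wrap serialized XHTML in an SVG <foreignObject>.
// The markup must carry the XHTML namespace to be rendered inside SVG.
function wrapInForeignObjectSvg(xhtml: string, width: number, height: number): string {
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">` +
    `<foreignObject x="0" y="0" width="100%" height="100%">${xhtml}</foreignObject>` +
    `</svg>`
  );
}

// Step 4: turn the SVG text into a data URL that an <img> can load.
function svgToDataUrl(svg: string): string {
  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
}

const markup = '<div xmlns="http://www.w3.org/1999/xhtml" style="color:#333">Hello</div>';
const svgUrl = svgToDataUrl(wrapInForeignObjectSvg(markup, 320, 240));
```

In a browser, steps 5 to 7 would load `svgUrl` into an image, draw it onto a canvas, and call `canvas.toDataURL('image/png')`. External resources that are not inlined are one reason this pipeline can fail or time out.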

Usage Examples

Basic Usage

<SammyAgentProvider
  config={{
    auth: authConfig,
    screenCapture: {
      method: 'render',
      // Default settings - captures every 1 second
    },
    debugLogs: true,
  }}
>
  <App />
</SammyAgentProvider>

With Target Element

<SammyAgentProvider
  config={{
    auth: authConfig,
    captureMethod: 'render',
    targetElement: '#dashboard-content',
    debugLogs: true,
  }}
>
  <div id="dashboard-content">
    {/* This will be captured */}
  </div>
  <div id="sidebar">
    {/* This won't be captured */}
  </div>
</SammyAgentProvider>

Configuration Options

Complete Configuration Reference

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| method | 'render' \| 'video' | 'render' | Capture method. Use 'video' to avoid DOM cloning entirely. |
| checkInterval | number | 1000 | How often to capture in milliseconds. Adjust based on your update frequency needs. |
| jpegQuality | number | 0.5 | JPEG quality from 0.0 to 1.0. Balance quality and file size for your use case. |
| maxWidth | number | 1920 | Maximum width for captured images. Optimize based on your UI requirements. |
| maxHeight | number | 1080 | Maximum height for captured images. Adjust for your display needs. |
| minInterval | number | 100 | Minimum time between captures in milliseconds when changes are detected. |
| maxInterval | number | 5000 | Maximum time between captures in milliseconds when no changes are detected. |
| useHashing | boolean | true | Enable smart deduplication using image hashing to skip identical frames. |
| scope | 'context' \| 'document' | 'document' | Scope of capture. Use 'context' to capture only wrapped content. |
| domChangeDetection.debounceMs | number | 300 | Debounce time after DOM mutations stop before capturing. |
| domChangeDetection.observerConfig | MutationObserverInit | | MutationObserver configuration for detecting DOM changes. |
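For reference, the documented options and defaults can be written down as a TypeScript type with a default-filling helper. This is a sketch: the option names and default values are taken from the reference above, but `ScreenCaptureOptions`, `ScreenCaptureOverrides`, and `withDefaults` are hypothetical names, not types exported by the package, and observerConfig is omitted because no default is listed for it.

```typescript
interface DomChangeDetection {
  debounceMs: number; // ms
}

interface ScreenCaptureOptions {
  method: 'render' | 'video';
  checkInterval: number; // ms
  jpegQuality: number;   // 0.0 to 1.0
  maxWidth: number;      // px
  maxHeight: number;     // px
  minInterval: number;   // ms
  maxInterval: number;   // ms
  useHashing: boolean;
  scope: 'context' | 'document';
  domChangeDetection: DomChangeDetection;
}

type ScreenCaptureOverrides =
  Partial<Omit<ScreenCaptureOptions, 'domChangeDetection'>> & {
    domChangeDetection?: Partial<DomChangeDetection>;
  };

// Defaults as listed in the configuration reference.
const DEFAULT_SCREEN_CAPTURE: ScreenCaptureOptions = {
  method: 'render',
  checkInterval: 1000,
  jpegQuality: 0.5,
  maxWidth: 1920,
  maxHeight: 1080,
  minInterval: 100,
  maxInterval: 5000,
  useHashing: true,
  scope: 'document',
  domChangeDetection: { debounceMs: 300 },
};

// Hypothetical helper: fills in documented defaults for anything not overridden.
function withDefaults(overrides: ScreenCaptureOverrides = {}): ScreenCaptureOptions {
  return {
    ...DEFAULT_SCREEN_CAPTURE,
    ...overrides,
    // Merge the nested object separately so a partial override keeps debounceMs.
    domChangeDetection: {
      ...DEFAULT_SCREEN_CAPTURE.domChangeDetection,
      ...overrides.domChangeDetection,
    },
  };
}
```

For example, `withDefaults({ checkInterval: 5000 })` keeps every other documented default while slowing capture to once every five seconds.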

Target Element Options

The targetElement option accepts three types of values:
| Type | Example | Description |
| --- | --- | --- |
| string | '#my-div', '.capture-area' | CSS selector string |
| HTMLElement | document.getElementById('my-div') | Direct element reference |
| RefObject<HTMLElement> | useRef<HTMLDivElement>() | React ref object |
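Handling the three accepted value types comes down to discriminating a union. The sketch below shows one way to do that; `resolveTargetSketch`, `ElementLike`, and `RefLike` are hypothetical stand-ins so the example runs without a DOM, and the package's own resolveTargetElement may work differently.

```typescript
// Structural stand-ins for HTMLElement and React's RefObject (assumptions).
interface ElementLike { tagName: string }
interface RefLike { current: ElementLike | null }
type TargetElement = string | ElementLike | RefLike;

function resolveTargetSketch(
  target: TargetElement,
  query: (selector: string) => ElementLike | null, // e.g. document.querySelector
): ElementLike | null {
  // CSS selector string: look it up.
  if (typeof target === 'string') return query(target);
  // Ref object: unwrap it; current is null until the element is mounted.
  if ('current' in target) return target.current;
  // Otherwise it is already an element.
  return target;
}
```

A null result corresponds to the "target element not found" case, where the system falls back to the default scope behavior.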

Configuration Presets

Presets: Performance Focused, Balanced, Quality Focused

Performance Focused
Optimized for maximum performance and efficiency:
const config = {
  auth: authConfig,
  screenCapture: {
    method: 'video',        // Native browser capture
    checkInterval: 10000,   // 10 seconds for efficiency
    jpegQuality: 0.3,
    maxWidth: 854,
    maxHeight: 480,
    useHashing: true,
    domChangeDetection: {
      debounceMs: 1000,
      observerConfig: {
        subtree: false,
        attributes: false,
        characterData: false
      }
    }
  }
};

Key Differences from Complex Render

| Feature | Simple Render | Complex Render |
| --- | --- | --- |
| Capture Interval | Fixed 1s | Dynamic (100ms-2s) |
| DOM Change Detection | No | Yes (MutationObserver) |
| Audio Adaptation | No | Yes (throttling) |
| Frame Hashing | No | Yes (deduplication) |
| Critical Renders | No | Yes (conversation events) |
| Worker Threads | No | Yes (optional) |
| Performance Mode | Single | Multiple (worker/main) |
| Capture Method | domToPng only | domToCanvas with fallbacks |

Debug Logging

When debugLogs: true, the system provides extensive logging:
🔧 [EXPLICIT-RENDER-abc123] Hook initialized with config
🎯 [EXPLICIT-RENDER-abc123] Resolving targetElement
🎯 [EXPLICIT-RENDER-abc123] ✅ Successfully resolved targetElement
🎬 [EXPLICIT-RENDER] captureAndSend called
🎨 [EXPLICIT-RENDER] Starting domToPng capture...
✅ [EXPLICIT-RENDER] domToPng succeeded
📤 [EXPLICIT-RENDER] Sending to client...
✅ [EXPLICIT-RENDER] Frame sent successfully
📸 [EXPLICIT-RENDER-abc123] Currently capturing: #my-div

Performance Characteristics

Timing Analysis

Capture Frequency

Every 1000ms (1 FPS)

domToPng Timeout

2000ms maximum

Fallback Canvas

~5-10ms creation

Base64 Encoding

~10-20ms

Resource Usage

CPU

Medium (spikes during capture)

Memory

Low-Medium (temporary canvas/image)

Network

~50-200KB per frame (compressed PNG)

UI Impact

Minimal (no worker threads)
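The per-frame network figure translates directly into sustained bandwidth. The arithmetic below is a worked sketch that assumes 1 KB = 1000 bytes and that every tick sends a frame (no deduplication); the function names are hypothetical.

```typescript
// Sustained upload rate for a given frame size and capture interval.
function kilobytesPerSecond(frameKB: number, intervalMs: number): number {
  return (frameKB * 1000) / intervalMs;
}

function megabitsPerSecond(frameKB: number, intervalMs: number): number {
  return (kilobytesPerSecond(frameKB, intervalMs) * 8) / 1000;
}

// Default 1000 ms interval: 50-200 KB/s, i.e. 0.4-1.6 Mbit/s.
const defaultLow = megabitsPerSecond(50, 1000);   // 0.4
const defaultHigh = megabitsPerSecond(200, 1000); // 1.6

// checkInterval of 5000 ms: 10-40 KB/s.
const relaxedLow = kilobytesPerSecond(50, 5000);   // 10
const relaxedHigh = kilobytesPerSecond(200, 5000); // 40
```

Raising the interval from 1 second to 5 seconds therefore cuts the sustained upload rate by a factor of five for the same frame size.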

Error Handling

Timeout Recovery

if (error.message.includes('timeout')) {
  // Clear and restart interval
  clearInterval(intervalRef.current);
  intervalRef.current = null;
  // Will be restarted by useEffect
}

Client Loss Protection

// Check client exists during interval tick
if (!clientRef.current) {
  console.error('Client lost during interval');
  clearInterval(intervalRef.current);
  return;
}

Capture Element Validation

if (!captureElement || captureElement.offsetHeight === 0) {
  console.log('Capture skipped - invalid element');
  return;
}
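The validation above can also be expressed as a reusable type guard. This sketch is slightly stricter than the snippet, since it also rejects zero width (consistent with the "zero dimensions" condition listed under Element Issues); `isCapturable` and `Measurable` are hypothetical names.

```typescript
// Only the two layout properties the check needs; an HTMLElement satisfies this shape.
interface Measurable { offsetWidth: number; offsetHeight: number }

// Type guard: true only for elements that exist and have a non-zero layout box.
function isCapturable(el: Measurable | null | undefined): el is Measurable {
  return !!el && el.offsetWidth > 0 && el.offsetHeight > 0;
}
```

Elements hidden with `display: none` report zero for both properties, so they are skipped by this check as well.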

Benefits of Target Element Capture

Reduced Size

Capture only the relevant part of your UI

Better Performance

Smaller capture area means faster processing

Focused Context

AI agent sees only the important content

Flexible Integration

Works with any existing DOM structure

Important Notes

Keep these points in mind when using screen capture:
  • targetElement only works with captureMethod: 'render' (explicit render capture)
  • The target element must exist in the DOM when capture starts
  • If the target element is not found, the system falls back to the default scope behavior
  • Enable debugLogs: true to see which element is being captured in the console

Optimization Guide

Performance Tuning

When to use: Targeting older devices or complex UIs
Optimizations:
  1. Increase checkInterval to 5000ms or higher
  2. Switch to method: 'video' for native browser capture
  3. Reduce maxWidth and maxHeight to match your UI needs
  4. Lower jpegQuality to 0.3 for smaller files
  5. Set domChangeDetection.observerConfig.subtree: false
When to use: Need faster updates for dynamic content
Optimizations:
  1. Reduce domChangeDetection.debounceMs to 200ms
  2. Enable more mutation observer options
  3. Decrease checkInterval for more frequent captures
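To reason about how debounceMs, minInterval, and maxInterval might interact, it can help to model the capture decision as a pure function. The sketch below is an assumption about how such options could combine, based only on their descriptions in the configuration reference; it is not the package's scheduling logic, and `shouldCapture`, `Timing`, and `CaptureState` are hypothetical names.

```typescript
interface Timing { debounceMs: number; minInterval: number; maxInterval: number }

interface CaptureState {
  now: number;                   // current time, ms
  lastCaptureAt: number;         // time of the previous capture, ms
  lastMutationAt: number | null; // latest DOM mutation since that capture, or null
}

function shouldCapture(state: CaptureState, timing: Timing): boolean {
  const sinceCapture = state.now - state.lastCaptureAt;
  // No capture for maxInterval: capture even if nothing changed.
  if (sinceCapture >= timing.maxInterval) return true;
  // Nothing changed since the last capture: keep waiting.
  if (state.lastMutationAt === null) return false;
  // Changes detected: wait until mutations have been quiet for debounceMs,
  // and never capture more often than minInterval.
  const quietFor = state.now - state.lastMutationAt;
  return quietFor >= timing.debounceMs && sinceCapture >= timing.minInterval;
}

const documentedDefaults: Timing = { debounceMs: 300, minInterval: 100, maxInterval: 5000 };
```

Under this model, lowering debounceMs from 300 to 200 makes a capture fire 100 ms sooner after a burst of mutations ends, at the cost of more captures during stop-and-go updates.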
When to use: Limited bandwidth or high traffic applications
Optimizations:
  1. Enable useHashing: true for smart deduplication
  2. Increase minInterval to reduce capture frequency
  3. Use restrictive domChangeDetection.observerConfig
  4. Lower jpegQuality to reduce file sizes
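Hash-based deduplication, as enabled by useHashing, can be illustrated with a small sketch. The hash algorithm the package uses is not documented here, so 32-bit FNV-1a over the base64 string is swapped in purely for illustration; `fnv1a` and `FrameDeduplicator` are hypothetical names.

```typescript
// 32-bit FNV-1a over the UTF-16 code units of the string (ASCII for base64).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned
  }
  return hash;
}

class FrameDeduplicator {
  private lastHash: number | null = null;

  // Returns true when the frame differs from the previous one and should be sent.
  shouldSend(base64Frame: string): boolean {
    const hash = fnv1a(base64Frame);
    if (hash === this.lastHash) return false;
    this.lastHash = hash;
    return true;
  }
}

const dedupe = new FrameDeduplicator();
const first = dedupe.shouldSend('iVBORw0KGgoAAAA');  // true: nothing sent yet
const repeat = dedupe.shouldSend('iVBORw0KGgoAAAA'); // false: identical frame
```

Comparing only against the previous frame keeps memory use constant. A 32-bit hash can collide, so a changed frame could in rare cases be skipped; a maxInterval-style forced capture would bound the effect of such a miss.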

Performance Monitoring

// Monitor capture performance in browser DevTools
// Use Performance tab to analyze capture timing

// Enable debug logs to track performance metrics
const config = {
  screenCapture: {
    debugLogs: true
  }
};

// Console output shows capture timing:
// ✅ [EXPLICIT-RENDER] Capture in 45.2ms  <- Good performance
// 📊 [EXPLICIT-RENDER] Capture in 150.3ms <- Consider optimization

Best Practices

Choose the right method - Use video for simplicity, render for DOM-specific features
Profile your specific use case - Use Chrome DevTools to understand your app's needs
Test across devices - Validate performance on your target device range
Monitor performance metrics - Enable debugLogs to track capture timing
Enable smart optimizations - Use useHashing: true for automatic deduplication
Tune DOM observation - Configure observerConfig based on your UI update patterns

Summary

The Screen Capture system provides a flexible, configurable approach to screen capture that can be tuned for your specific performance and quality requirements. Choose from multiple presets or create custom configurations to match your application's needs.
Key features:
  • Flexible intervals from 1-10+ seconds based on your needs
  • High-quality captures using modern DOM-to-image conversion
  • Reliable fallback mechanism ensures continuous operation
  • Targeted capture support for focused UI regions
  • Performance tuning options for various device capabilities
  • Multiple configuration presets for common use cases