Scene Format
The JSON schema for a single shot — canvas dimensions, duration, asset references, camera directives, and an ordered layer stack — that feeds into the sequence planner and Remotion renderer.
Overview
Scenes can contain HTML elements, video, images, or any combination. Each scene is the equivalent of a single shot in film. Scenes are authored as JSON, validated against the scene format schema, and consumed by the sequence planner and Remotion renderer.
Scene IDs follow the convention sc_{descriptive_name} (e.g., sc_hero_product, sc_brand_mark).
Top-level schema
{
  "scene_id": "sc_hero_product",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 3.0,
  "assets": [...],
  "camera": { "move": "push_in", "intensity": 0.2, "easing": "cinematic_scurve" },
  "layout": { "template": "device-mockup", "slots": {...} },
  "layers": [...],
  "metadata": { "content_type": "product_shot", "visual_weight": "dark", ... }
}
| Field | Type | Required | Description |
|---|---|---|---|
| scene_id | string | Yes | Unique identifier. Pattern: sc_[a-z0-9_]+ |
| canvas | object | No | Dimensions in pixels. Defaults to 1920x1080 |
| duration_s | number | Yes | Hold duration in seconds (0.5–30). Can be overridden by the sequence manifest |
| assets | array | No | External media referenced by layers |
| camera | object | No | Camera directive. Can be overridden by the sequence manifest |
| layout | object | No | Layout template. Layers reference named slots that auto-resolve to pixel positions |
| layers | array | No | Ordered layer stack. First = deepest (background), last = topmost (foreground) |
| metadata | object | No | Content classification. Optional for hand-authored scenes, required for AI-planned sequences |
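The constraints in the table above can be checked before a scene reaches the planner. The following is a minimal sketch, not the project's actual validator: the names `SceneInput`, `checkScene`, and `resolveCanvas` are hypothetical, and it assumes the 0.5–30 duration range is inclusive at both ends.

```typescript
// Hypothetical types mirroring part of the field table; not the real schema.
interface SceneCanvas { w: number; h: number }
interface SceneInput {
  scene_id: string;
  duration_s: number;
  canvas?: SceneCanvas;
}

// Pattern and canvas default are taken from the table above.
const SCENE_ID_PATTERN = /^sc_[a-z0-9_]+$/;
const DEFAULT_CANVAS: SceneCanvas = { w: 1920, h: 1080 };

// Returns human-readable problems; an empty array means these checks passed.
function checkScene(scene: SceneInput): string[] {
  const problems: string[] = [];
  if (!SCENE_ID_PATTERN.test(scene.scene_id)) {
    problems.push(`scene_id "${scene.scene_id}" does not match sc_[a-z0-9_]+`);
  }
  // Assumption: both bounds of the documented range are inclusive.
  if (!(scene.duration_s >= 0.5 && scene.duration_s <= 30)) {
    problems.push(`duration_s ${scene.duration_s} is outside 0.5 to 30`);
  }
  return problems;
}

// Applies the documented canvas default when the field is omitted.
function resolveCanvas(scene: SceneInput): SceneCanvas {
  return scene.canvas ?? DEFAULT_CANVAS;
}
```

This only covers scene_id, duration_s, and canvas; a full check would validate against the scene format schema itself.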
Assets
Assets declare external media that layers reference by ID.
{
  "id": "hero_video",
  "type": "video",
  "src": "assets/hero-reel.mp4",
  "trim": { "start_s": 2.0, "end_s": 8.0 },
  "loop": false,
  "muted": true
}
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique within this scene |
| type | video / image / audio | Yes | Media type |
| src | string | Yes | File path (relative to scene) or URL |
| trim | object | No | For video/audio: start_s and end_s subrange |
| loop | boolean | No | Loop for the scene duration. Default: false |
| muted | boolean | No | Mute video audio track. Default: true |
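Because layers point at assets by ID, two mistakes are easy to make: a layer that references an undeclared asset, and two assets sharing an ID. The sketch below shows one way to catch both and to compute the length of a trimmed subrange. All type and function names are hypothetical and only model the fields listed above.

```typescript
// Hypothetical shapes covering only the fields needed for these checks.
interface AssetDecl {
  id: string;
  type: "video" | "image" | "audio";
  src: string;
  trim?: { start_s: number; end_s: number };
}
interface LayerRef { id: string; asset?: string }

// IDs of layers whose `asset` does not match any declared asset.
function findDanglingAssetRefs(assets: AssetDecl[], layers: LayerRef[]): string[] {
  return layers
    .filter((layer) => layer.asset !== undefined && !assets.some((a) => a.id === layer.asset))
    .map((layer) => layer.id);
}

// Asset IDs that appear more than once (IDs must be unique within a scene).
function findDuplicateAssetIds(assets: AssetDecl[]): string[] {
  return assets.map((a) => a.id).filter((id, i, all) => all.indexOf(id) !== i);
}

// Length in seconds of the trimmed subrange, or undefined when no trim is set.
function trimmedLength(asset: AssetDecl): number | undefined {
  return asset.trim ? asset.trim.end_s - asset.trim.start_s : undefined;
}
```

For the hero_video asset above, the trim of 2.0 to 8.0 leaves a 6-second subrange.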
Camera
Camera directives control movement applied to the entire scene via the <CameraRig> component.
{
  "move": "push_in",
  "intensity": 0.2,
  "easing": "cinematic_scurve"
}
| Field | Values | Default | Description |
|---|---|---|---|
| move | static, push_in, pull_out, pan_left, pan_right, drift | static | Movement type |
| intensity | 0.0–1.0 | 0.5 | Movement magnitude. 0 = imperceptible, 1 = maximum |
| easing | linear, ease_out, cinematic_scurve | cinematic_scurve | Interpolation curve |
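Since every camera field has a default, a partial directive such as { "move": "drift", "intensity": 0.3 } is still complete once the defaults are filled in. The sketch below illustrates that resolution step. The `resolveCamera` helper is hypothetical, and clamping out-of-range intensity to 0–1 is an assumption; the real pipeline may reject such values instead.

```typescript
type CameraMove = "static" | "push_in" | "pull_out" | "pan_left" | "pan_right" | "drift";
type CameraEasing = "linear" | "ease_out" | "cinematic_scurve";

interface CameraDirective {
  move?: CameraMove;
  intensity?: number;
  easing?: CameraEasing;
}

// Fills in the documented defaults: static, 0.5, cinematic_scurve.
function resolveCamera(camera: CameraDirective = {}): Required<CameraDirective> {
  return {
    move: camera.move ?? "static",
    // Assumption: out-of-range intensity is clamped rather than rejected.
    intensity: Math.min(1, Math.max(0, camera.intensity ?? 0.5)),
    easing: camera.easing ?? "cinematic_scurve",
  };
}
```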
Intensity mapping
How intensity translates to actual transforms per move type:
| Move | 0.0 | 0.5 | 1.0 |
|---|---|---|---|
| push_in | scale 1.0 to 1.005 | scale 1.0 to 1.03 | scale 1.0 to 1.08 |
| pull_out | scale 1.005 to 1.0 | scale 1.03 to 1.0 | scale 1.08 to 1.0 |
| pan_left | translateX 0 to -5px | translateX 0 to -30px | translateX 0 to -80px |
| pan_right | translateX 0 to 5px | translateX 0 to 30px | translateX 0 to 80px |
| drift | ±0.2px sinusoidal | ±1px sinusoidal | ±3px sinusoidal |
| static | No transform | No transform | No transform |
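The table gives transform values at three intensities only. How values in between are derived is not stated here, so the sketch below assumes piecewise-linear interpolation between the three documented points; the actual <CameraRig> curve may differ. The constant and function names are hypothetical.

```typescript
// Values at intensity 0.0, 0.5, and 1.0, copied from the table above.
type Anchors = [number, number, number];

const PUSH_IN_SCALE: Anchors = [1.005, 1.03, 1.08];   // end scale for push_in
const PAN_OFFSET_PX: Anchors = [5, 30, 80];           // absolute translateX for pans
const DRIFT_AMPLITUDE_PX: Anchors = [0.2, 1, 3];      // sinusoidal amplitude for drift

// Assumption: linear interpolation within each half of the intensity range.
function interpolateAnchors(anchors: Anchors, intensity: number): number {
  const t = Math.min(1, Math.max(0, intensity));
  const [low, mid, high] = anchors;
  return t <= 0.5
    ? low + (mid - low) * (t / 0.5)
    : mid + (high - mid) * ((t - 0.5) / 0.5);
}
```

Under this assumption, a push_in at intensity 0.2 would end at a scale of about 1.015.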
Layers
Layers form an ordered stack. Each layer has a type, positioning, and optional animation.
Layer fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique within this scene |
| type | html / video / image / text / svg | Layer content type |
| slot | string | Layout slot name. Overrides position when a layout template is active |
| asset | string | Asset ID reference (for video/image layers) |
| src | string | HTML file path (for html layers) |
| content | string | Text content (required for text layers) |
| animation | word-reveal / scale-cascade / weight-morph | Text animation primitive |
| style | object | Typography styles: fontFamily, fontSize, fontWeight, color, textAlign, letterSpacing, lineHeight |
| depth_class | background / midground / foreground | Depth for parallax and camera rig behavior. Default: midground |
| fit | cover / contain / fill / none | How content fits the canvas. Default: cover |
| position | object | x, y, w, h in pixels. Omit for full-canvas layers |
| opacity | number | 0–1. Default: 1 |
| blend_mode | normal / screen / multiply / overlay | Compositing mode. Default: normal |
| mask_layer | string | ID of another layer to use as mask source |
| mask_type | alpha / luminance | Mask compositing mode. Default: alpha |
| entrance | object | primitive (animation ID from catalog) + delay_ms |
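Several layer fields have defaults, and stacking follows array order (first = deepest). The sketch below resolves both for a subset of the fields. The `resolveLayers` helper is hypothetical, and using the array index as a zIndex is an assumption about how the renderer might realize the ordering.

```typescript
// Hypothetical subset of the layer fields that carry defaults.
interface LayerInput {
  id: string;
  type: "html" | "video" | "image" | "text" | "svg";
  depth_class?: "background" | "midground" | "foreground";
  fit?: "cover" | "contain" | "fill" | "none";
  opacity?: number;
  blend_mode?: "normal" | "screen" | "multiply" | "overlay";
  mask_type?: "alpha" | "luminance";
}

type ResolvedLayer = Required<LayerInput> & { zIndex: number };

// Applies the documented defaults and records stacking order.
function resolveLayers(layers: LayerInput[]): ResolvedLayer[] {
  return layers.map((layer, index): ResolvedLayer => ({
    id: layer.id,
    type: layer.type,
    depth_class: layer.depth_class ?? "midground",
    fit: layer.fit ?? "cover",
    opacity: layer.opacity ?? 1,
    blend_mode: layer.blend_mode ?? "normal",
    mask_type: layer.mask_type ?? "alpha",
    // Assumption: array position maps directly to stacking order.
    zIndex: index,
  }));
}
```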
Weight-morph text
For text layers using weight-morph, two additional style fields control the animation:
fontWeightStart: starting font weight (e.g., 100)
fontWeightEnd: ending font weight (e.g., 900)
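To illustrate how the two fields might drive the animation, the sketch below interpolates the weight from a 0–1 progress value. It assumes a linear ramp and a variable font that supports intermediate weights; the actual weight-morph primitive may apply its own easing. `morphWeight` is a hypothetical helper.

```typescript
// Returns the font weight at a given animation progress (0 = start, 1 = end).
// Assumption: linear ramp, rounded to a whole weight value.
function morphWeight(fontWeightStart: number, fontWeightEnd: number, progress: number): number {
  const t = Math.min(1, Math.max(0, progress));
  return Math.round(fontWeightStart + (fontWeightEnd - fontWeightStart) * t);
}
```

With the example values, the midpoint of a 100 to 900 morph is weight 500.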
Metadata
Content classification used by the scene analyzer and sequence planner. Optional for hand-authored scenes, but required when scenes enter the AI planning pipeline.
| Field | Values | Description |
|---|---|---|
| content_type | portrait, ui_screenshot, typography, brand_mark, data_visualization, moodboard, product_shot, notification, device_mockup, split_panel, collage | Primary content classification |
| visual_weight | light, dark, mixed | Overall tonal weight |
| motion_energy | static, subtle, moderate, high | Amount of movement |
| complexity | minimal, moderate, complex | Visual complexity |
| intent_tags | array of strings | Narrative role: opening, closing, hero, emotional, detail, informational, transition |
| style_override | string | Per-scene style pack override for blended sequences |
When scenes enter the AI pipeline, analyze_scene re-derives metadata independently and validates it. Hand-setting metadata is fine, but the analyzer has the final word.
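Hand-set metadata can still be checked for typos before it reaches the analyzer. The sketch below compares the four enumerated fields against the values listed in the table. It is not analyze_scene and does not re-derive anything; `checkMetadata` is a hypothetical helper, and treating absent fields as acceptable is an assumption.

```typescript
// Allowed values copied from the metadata table above.
const METADATA_ENUMS: { [field: string]: string[] } = {
  content_type: [
    "portrait", "ui_screenshot", "typography", "brand_mark", "data_visualization",
    "moodboard", "product_shot", "notification", "device_mockup", "split_panel", "collage",
  ],
  visual_weight: ["light", "dark", "mixed"],
  motion_energy: ["static", "subtle", "moderate", "high"],
  complexity: ["minimal", "moderate", "complex"],
};

// Returns one message per enumerated field holding an unlisted value.
function checkMetadata(metadata: { [field: string]: unknown }): string[] {
  const problems: string[] = [];
  for (const field of Object.keys(METADATA_ENUMS)) {
    const allowed = METADATA_ENUMS[field] ?? [];
    const value = metadata[field];
    if (value === undefined) continue; // assumption: absent fields are acceptable here
    if (typeof value !== "string" || allowed.indexOf(value) === -1) {
      problems.push(`${field}: unexpected value ${JSON.stringify(value)}`);
    }
  }
  return problems;
}
```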
Example scenes
HTML-only scene
A full-canvas HTML prototype used as a single shot:
{
  "scene_id": "sc_product_demo",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 3.5,
  "camera": { "move": "push_in", "intensity": 0.2, "easing": "cinematic_scurve" },
  "layers": [
    {
      "id": "prototype",
      "type": "html",
      "src": "prototypes/dashboard.html",
      "fit": "cover",
      "depth_class": "midground"
    }
  ],
  "metadata": {
    "content_type": "ui_screenshot",
    "visual_weight": "dark",
    "motion_energy": "subtle",
    "intent_tags": ["detail"]
  }
}
Mixed media scene
Video background with composited text overlay:
{
  "scene_id": "sc_hero_intro",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 4.0,
  "assets": [
    {
      "id": "bg_video",
      "type": "video",
      "src": "assets/aerial-city.mp4",
      "trim": { "start_s": 5.0, "end_s": 9.0 },
      "muted": true
    }
  ],
  "camera": { "move": "drift", "intensity": 0.3 },
  "layers": [
    {
      "id": "background",
      "type": "video",
      "asset": "bg_video",
      "fit": "cover",
      "depth_class": "background"
    },
    {
      "id": "headline",
      "type": "text",
      "content": "The Future of Design",
      "animation": "word-reveal",
      "style": {
        "fontFamily": "Inter",
        "fontSize": 96,
        "fontWeight": 700,
        "color": "#ffffff",
        "textAlign": "center"
      },
      "depth_class": "foreground",
      "entrance": { "primitive": "as-fadeInUp", "delay_ms": 400 }
    }
  ],
  "metadata": {
    "content_type": "typography",
    "visual_weight": "dark",
    "motion_energy": "moderate",
    "intent_tags": ["opening", "hero"]
  }
}
Kinetic type scene
Weight-morphing typography with a brand mark:
{
  "scene_id": "sc_brand_type",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 3.0,
  "assets": [
    { "id": "logo", "type": "image", "src": "assets/logo-mark.svg" }
  ],
  "camera": { "move": "static" },
  "layers": [
    {
      "id": "bg",
      "type": "html",
      "src": "prototypes/dark-gradient.html",
      "fit": "cover",
      "depth_class": "background"
    },
    {
      "id": "brand_text",
      "type": "text",
      "content": "ANIMATIC",
      "animation": "weight-morph",
      "style": {
        "fontFamily": "Inter",
        "fontSize": 120,
        "fontWeightStart": 100,
        "fontWeightEnd": 900,
        "color": "#ffffff",
        "textAlign": "center",
        "letterSpacing": "0.08em"
      },
      "depth_class": "foreground"
    },
    {
      "id": "logo_mark",
      "type": "image",
      "asset": "logo",
      "position": { "x": 860, "y": 620, "w": 200, "h": 200 },
      "opacity": 0.8,
      "depth_class": "foreground",
      "entrance": { "primitive": "as-fadeIn", "delay_ms": 800 }
    }
  ],
  "metadata": {
    "content_type": "brand_mark",
    "visual_weight": "dark",
    "motion_energy": "moderate",
    "intent_tags": ["closing"]
  }
}
Try it
Try asking your AI:
Show me the scene format schema and explain the layer types.
Create a scene JSON for a split-panel layout with a portrait photo on the left and a quote on the right.
Validate this scene JSON against the scene format spec.