Scene Format

The JSON schema for a single shot: canvas dimensions, duration, asset references, camera directives, and an ordered layer stack. Scene files feed into the sequence planner and the Remotion renderer.

Overview

Each scene is the equivalent of a single shot in film. A scene can contain HTML, video, images, text, SVG, or any combination of these. Scenes are authored as JSON, validated against the scene format schema, and consumed by the sequence planner and the Remotion renderer.

Scene IDs follow the convention sc_{descriptive_name} (e.g., sc_hero_product, sc_brand_mark).

Top-level schema

{
  "scene_id": "sc_hero_product",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 3.0,
  "assets": [...],
  "camera": { "move": "push_in", "intensity": 0.2, "easing": "cinematic_scurve" },
  "layout": { "template": "device-mockup", "slots": {...} },
  "layers": [...],
  "metadata": { "content_type": "product_shot", "visual_weight": "dark", ... }
}

Field	Type	Required	Description
`scene_id`	string	Yes	Unique identifier. Pattern: `sc_[a-z0-9_]+`
`canvas`	object	No	Dimensions in pixels. Defaults to 1920x1080
`duration_s`	number	Yes	Hold duration in seconds (0.5–30). Can be overridden by the sequence manifest
`assets`	array	No	External media referenced by layers
`camera`	object	No	Camera directive. Can be overridden by the sequence manifest
`layout`	object	No	Layout template. Layers reference named slots that auto-resolve to pixel positions
`layers`	array	No	Ordered layer stack. First = deepest (background), last = topmost (foreground)
`metadata`	object	No	Content classification. Optional for hand-authored scenes, required for AI-planned sequences
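
The two required fields and the canvas default from the table above can be checked in a few lines. The following is a minimal Python sketch, not the project's validator: the function names `check_top_level` and `canvas_of` are hypothetical, and treating the 0.5–30 range as inclusive is an assumption.

```python
import re

# Pattern and range copied from the field table; bounds are assumed inclusive.
SCENE_ID_RE = re.compile(r"sc_[a-z0-9_]+")
DEFAULT_CANVAS = {"w": 1920, "h": 1080}


def check_top_level(scene):
    """Return a list of problems with the required top-level fields."""
    problems = []

    scene_id = scene.get("scene_id")
    if not isinstance(scene_id, str) or not SCENE_ID_RE.fullmatch(scene_id):
        problems.append("scene_id must match sc_[a-z0-9_]+")

    duration = scene.get("duration_s")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, (int, float)):
        problems.append("duration_s is required and must be a number")
    elif not 0.5 <= duration <= 30:
        problems.append("duration_s must be between 0.5 and 30")

    return problems


def canvas_of(scene):
    """Canvas size, falling back to the documented 1920x1080 default."""
    return scene.get("canvas", DEFAULT_CANVAS)


print(check_top_level({"scene_id": "sc_hero_product", "duration_s": 3.0}))
print(check_top_level({"scene_id": "Hero Shot", "duration_s": 45}))
```

The first call reports no problems. The second reports both the ID pattern and the duration range.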

Assets

Assets declare external media that layers reference by ID.

{
  "id": "hero_video",
  "type": "video",
  "src": "assets/hero-reel.mp4",
  "trim": { "start_s": 2.0, "end_s": 8.0 },
  "loop": false,
  "muted": true
}

Field	Type	Required	Description
`id`	string	Yes	Unique within this scene
`type`	`video` / `image` / `audio`	Yes	Media type
`src`	string	Yes	File path (relative to scene) or URL
`trim`	object	No	For video and audio: the subrange to play, given as `start_s` and `end_s` in seconds
`loop`	boolean	No	Loop for the scene duration. Default: `false`
`muted`	boolean	No	Mute video audio track. Default: `true`
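
When authoring a scene it can help to compare a trimmed clip against the scene's hold duration. The Python sketch below computes the trimmed length and flags clips that are shorter than the scene and not looped. The helper names are hypothetical, and what the renderer actually does with a clip that is too short is not specified here.

```python
def trimmed_length_s(asset):
    """Length of the trimmed subrange in seconds, or None if no trim is set."""
    trim = asset.get("trim")
    if trim is None:
        return None
    length = trim["end_s"] - trim["start_s"]
    if length <= 0:
        raise ValueError(f"asset {asset['id']!r}: end_s must be greater than start_s")
    return length


def fills_scene(asset, duration_s):
    """True if the asset loops or its trimmed range spans the scene duration.

    Untrimmed assets also pass: their length cannot be judged without
    reading the media file, which this sketch does not do.
    """
    if asset.get("loop", False):  # documented default: false
        return True
    length = trimmed_length_s(asset)
    return length is None or length >= duration_s


hero_video = {
    "id": "hero_video",
    "type": "video",
    "src": "assets/hero-reel.mp4",
    "trim": {"start_s": 2.0, "end_s": 8.0},
    "loop": False,
    "muted": True,
}
print(trimmed_length_s(hero_video))   # 6.0
print(fills_scene(hero_video, 3.0))   # True
```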

Camera

Camera directives control movement applied to the entire scene via the <CameraRig> component.

{
  "move": "push_in",
  "intensity": 0.2,
  "easing": "cinematic_scurve"
}

Field	Values	Default	Description
`move`	`static`, `push_in`, `pull_out`, `pan_left`, `pan_right`, `drift`	`static`	Movement type
`intensity`	0.0–1.0	0.5	Movement magnitude. 0 = imperceptible, 1 = maximum
`easing`	`linear`, `ease_out`, `cinematic_scurve`	`cinematic_scurve`	Interpolation curve
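
Because every camera field has a default, a partial directive such as { "move": "drift", "intensity": 0.3 } is complete once the defaults are applied. The Python sketch below shows one way to resolve and check a directive. `resolve_camera` is a hypothetical helper, not part of the renderer.

```python
# Defaults and allowed values copied from the camera field table.
CAMERA_DEFAULTS = {"move": "static", "intensity": 0.5, "easing": "cinematic_scurve"}
MOVES = {"static", "push_in", "pull_out", "pan_left", "pan_right", "drift"}
EASINGS = {"linear", "ease_out", "cinematic_scurve"}


def resolve_camera(camera=None):
    """Fill in documented defaults and reject values outside the table."""
    resolved = {**CAMERA_DEFAULTS, **(camera or {})}
    if resolved["move"] not in MOVES:
        raise ValueError(f"unknown move: {resolved['move']!r}")
    if resolved["easing"] not in EASINGS:
        raise ValueError(f"unknown easing: {resolved['easing']!r}")
    if not 0.0 <= resolved["intensity"] <= 1.0:
        raise ValueError("intensity must be between 0.0 and 1.0")
    return resolved


print(resolve_camera({"move": "drift", "intensity": 0.3}))
```

The missing `easing` resolves to `cinematic_scurve`, and a scene with no camera object at all resolves to a static camera.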

Intensity mapping

How intensity translates to actual transforms per move type:

Move	Intensity 0.0	Intensity 0.5	Intensity 1.0
`push_in`	scale 1.0 to 1.005	scale 1.0 to 1.03	scale 1.0 to 1.08
`pull_out`	scale 1.005 to 1.0	scale 1.03 to 1.0	scale 1.08 to 1.0
`pan_left`	translateX 0 to -5px	translateX 0 to -30px	translateX 0 to -80px
`pan_right`	translateX 0 to 5px	translateX 0 to 30px	translateX 0 to 80px
`drift`	±0.2px sinusoidal	±1px sinusoidal	±3px sinusoidal
`static`	No transform	No transform	No transform
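
The table lists transforms at only three intensities. For values in between, one plausible reading is piecewise-linear interpolation between the listed points. The Python sketch below implements that reading. The linear interpolation is an assumption (the renderer may use a different curve), and `magnitude` is a hypothetical helper name.

```python
# Values at intensity 0.0, 0.5, and 1.0, copied from the table above.
# push_in: end scale (start is 1.0); pull_out: start scale (end is 1.0);
# pan_left / pan_right: final translateX in px; drift: sinusoid amplitude in px.
ANCHORS = {
    "push_in": (1.005, 1.03, 1.08),
    "pull_out": (1.005, 1.03, 1.08),
    "pan_left": (-5.0, -30.0, -80.0),
    "pan_right": (5.0, 30.0, 80.0),
    "drift": (0.2, 1.0, 3.0),
}


def magnitude(move, intensity):
    """Interpolate the transform magnitude for a move at a given intensity.

    Assumes straight lines between the three documented points.
    Returns None for "static", which applies no transform.
    """
    if move == "static":
        return None
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0.0 and 1.0")
    low, mid, high = ANCHORS[move]
    if intensity <= 0.5:
        return low + (mid - low) * (intensity / 0.5)
    return mid + (high - mid) * ((intensity - 0.5) / 0.5)


# A push_in at intensity 0.2 lands between the 0.0 and 0.5 columns.
print(round(magnitude("push_in", 0.2), 4))
print(magnitude("pan_left", 1.0))
```

Under this assumption a `push_in` at intensity 0.2 ends at a scale of about 1.015, and the three documented intensities reproduce the table values.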

Layers

Layers form an ordered stack. Each layer has a type, positioning, and optional animation.

Layer fields

Field	Type	Description
`id`	string	Unique within this scene
`type`	`html` / `video` / `image` / `text` / `svg`	Layer content type
`slot`	string	Layout slot name. Overrides `position` when a layout template is active
`asset`	string	Asset ID reference (for video/image layers)
`src`	string	HTML file path (for html layers)
`content`	string	Text content (required for text layers)
`animation`	`word-reveal` / `scale-cascade` / `weight-morph`	Text animation primitive
`style`	object	Typography styles: `fontFamily`, `fontSize`, `fontWeight`, `color`, `textAlign`, `letterSpacing`, `lineHeight`
`depth_class`	`background` / `midground` / `foreground`	Depth for parallax and camera rig behavior. Default: `midground`
`fit`	`cover` / `contain` / `fill` / `none`	How content fits the canvas. Default: `cover`
`position`	object	`x`, `y`, `w`, `h` in pixels. Omit for full-canvas layers
`opacity`	number	0–1. Default: `1`
`blend_mode`	`normal` / `screen` / `multiply` / `overlay`	Compositing mode. Default: `normal`
`mask_layer`	string	ID of another layer to use as mask source
`mask_type`	`alpha` / `luminance`	Mask compositing mode. Default: `alpha`
`entrance`	object	Entrance animation: `primitive` (animation ID from the catalog) and `delay_ms` (delay before it starts, in milliseconds)
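
Several layer fields refer to other parts of the scene: `asset` points at an entry in `assets`, and `mask_layer` points at another layer. The Python sketch below checks those references along with ID uniqueness and the `content` requirement for text layers. `check_layers` is a hypothetical helper, not the schema validator.

```python
def check_layers(scene):
    """Return a list of cross-reference problems in the layer stack."""
    problems = []
    asset_ids = {asset["id"] for asset in scene.get("assets", [])}
    layers = scene.get("layers", [])
    layer_ids = [layer["id"] for layer in layers]

    # Layer IDs must be unique within the scene.
    for layer_id in sorted(set(layer_ids)):
        if layer_ids.count(layer_id) > 1:
            problems.append(f"duplicate layer id {layer_id!r}")

    for layer in layers:
        layer_id = layer["id"]
        asset = layer.get("asset")
        if asset is not None and asset not in asset_ids:
            problems.append(f"layer {layer_id!r}: unknown asset {asset!r}")
        if layer.get("type") == "text" and not layer.get("content"):
            problems.append(f"layer {layer_id!r}: text layers need content")
        mask = layer.get("mask_layer")
        if mask is not None and (mask == layer_id or mask not in layer_ids):
            problems.append(f"layer {layer_id!r}: mask_layer must name another layer")

    return problems


scene = {
    "assets": [{"id": "logo", "type": "image", "src": "assets/logo-mark.svg"}],
    "layers": [
        {"id": "brand_text", "type": "text", "content": "ANIMATIC"},
        {"id": "logo_mark", "type": "image", "asset": "logo"},
    ],
}
print(check_layers(scene))
```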

Weight-morph text

For text layers using weight-morph, two additional style fields control the animation:

fontWeightStart — starting font weight (e.g., 100)
fontWeightEnd — ending font weight (e.g., 900)
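
To illustrate how the two fields bound the animation, the Python sketch below computes the font weight at a given point in the morph. It assumes a linear ramp over a progress value from 0 to 1. The actual easing used by the renderer is not documented here, and `morph_weight` is a hypothetical name.

```python
def morph_weight(style, progress):
    """Font weight at a point in the morph (progress 0.0 = start, 1.0 = end).

    Assumes linear interpolation; progress is clamped to the 0..1 range.
    """
    start = style["fontWeightStart"]
    end = style["fontWeightEnd"]
    t = min(max(progress, 0.0), 1.0)
    return round(start + (end - start) * t)


style = {"fontFamily": "Inter", "fontWeightStart": 100, "fontWeightEnd": 900}
for progress in (0.0, 0.5, 1.0):
    print(progress, morph_weight(style, progress))
```

With the example values, the weight runs from 100 through 500 at the midpoint to 900 at the end.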

Metadata

Content classification used by the scene analyzer and sequence planner. Optional for hand-authored scenes, but required when scenes enter the AI planning pipeline.

Field	Values	Description
`content_type`	`portrait`, `ui_screenshot`, `typography`, `brand_mark`, `data_visualization`, `moodboard`, `product_shot`, `notification`, `device_mockup`, `split_panel`, `collage`	Primary content classification
`visual_weight`	`light`, `dark`, `mixed`	Overall tonal weight
`motion_energy`	`static`, `subtle`, `moderate`, `high`	Amount of movement
`complexity`	`minimal`, `moderate`, `complex`	Visual complexity
`intent_tags`	array of strings	Narrative role: `opening`, `closing`, `hero`, `emotional`, `detail`, `informational`, `transition`
`style_override`	string	Per-scene style pack override for blended sequences

When a scene enters the AI pipeline, analyze_scene re-derives its metadata independently and validates the result. Hand-set metadata is allowed, but where the two disagree, the analyzer's values take precedence.
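
Before a hand-authored scene reaches the analyzer, its metadata can be checked against the values in the table above. The Python sketch below does only that: it does not reproduce what analyze_scene does, and `check_metadata` is a hypothetical helper. Intent tags are checked for type only, since the table may not list every tag.

```python
# Allowed values copied from the metadata field table.
ALLOWED = {
    "content_type": {
        "portrait", "ui_screenshot", "typography", "brand_mark",
        "data_visualization", "moodboard", "product_shot", "notification",
        "device_mockup", "split_panel", "collage",
    },
    "visual_weight": {"light", "dark", "mixed"},
    "motion_energy": {"static", "subtle", "moderate", "high"},
    "complexity": {"minimal", "moderate", "complex"},
}


def check_metadata(metadata):
    """Return a list of metadata fields whose values fall outside the table."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = metadata.get(field)
        if value is None:
            continue  # this sketch only validates fields that are present
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")

    tags = metadata.get("intent_tags", [])
    if not isinstance(tags, list) or not all(isinstance(tag, str) for tag in tags):
        problems.append("intent_tags must be an array of strings")

    return problems


metadata = {
    "content_type": "ui_screenshot",
    "visual_weight": "dark",
    "motion_energy": "subtle",
    "intent_tags": ["detail"],
}
print(check_metadata(metadata))
```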

Example scenes

HTML-only scene

A full-canvas HTML prototype used as a single shot:

{
  "scene_id": "sc_product_demo",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 3.5,
  "camera": { "move": "push_in", "intensity": 0.2, "easing": "cinematic_scurve" },
  "layers": [
    {
      "id": "prototype",
      "type": "html",
      "src": "prototypes/dashboard.html",
      "fit": "cover",
      "depth_class": "midground"
    }
  ],
  "metadata": {
    "content_type": "ui_screenshot",
    "visual_weight": "dark",
    "motion_energy": "subtle",
    "intent_tags": ["detail"]
  }
}

Mixed media scene

Video background with composited text overlay:

{
  "scene_id": "sc_hero_intro",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 4.0,
  "assets": [
    {
      "id": "bg_video",
      "type": "video",
      "src": "assets/aerial-city.mp4",
      "trim": { "start_s": 5.0, "end_s": 9.0 },
      "muted": true
    }
  ],
  "camera": { "move": "drift", "intensity": 0.3 },
  "layers": [
    {
      "id": "background",
      "type": "video",
      "asset": "bg_video",
      "fit": "cover",
      "depth_class": "background"
    },
    {
      "id": "headline",
      "type": "text",
      "content": "The Future of Design",
      "animation": "word-reveal",
      "style": {
        "fontFamily": "Inter",
        "fontSize": 96,
        "fontWeight": 700,
        "color": "#ffffff",
        "textAlign": "center"
      },
      "depth_class": "foreground",
      "entrance": { "primitive": "as-fadeInUp", "delay_ms": 400 }
    }
  ],
  "metadata": {
    "content_type": "typography",
    "visual_weight": "dark",
    "motion_energy": "moderate",
    "intent_tags": ["opening", "hero"]
  }
}

Kinetic type scene

Weight-morphing typography with a brand mark:

{
  "scene_id": "sc_brand_type",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 3.0,
  "assets": [
    { "id": "logo", "type": "image", "src": "assets/logo-mark.svg" }
  ],
  "camera": { "move": "static" },
  "layers": [
    {
      "id": "bg",
      "type": "html",
      "src": "prototypes/dark-gradient.html",
      "fit": "cover",
      "depth_class": "background"
    },
    {
      "id": "brand_text",
      "type": "text",
      "content": "ANIMATIC",
      "animation": "weight-morph",
      "style": {
        "fontFamily": "Inter",
        "fontSize": 120,
        "fontWeightStart": 100,
        "fontWeightEnd": 900,
        "color": "#ffffff",
        "textAlign": "center",
        "letterSpacing": "0.08em"
      },
      "depth_class": "foreground"
    },
    {
      "id": "logo_mark",
      "type": "image",
      "asset": "logo",
      "position": { "x": 860, "y": 620, "w": 200, "h": 200 },
      "opacity": 0.8,
      "depth_class": "foreground",
      "entrance": { "primitive": "as-fadeIn", "delay_ms": 800 }
    }
  ],
  "metadata": {
    "content_type": "brand_mark",
    "visual_weight": "dark",
    "motion_energy": "moderate",
    "intent_tags": ["closing"]
  }
}

Try it

Try asking your AI:

Show me the scene format schema and explain the layer types.

Create a scene JSON for a split-panel layout with a portrait photo on the left and a quote on the right.

Validate this scene JSON against the scene format spec.