Scene Format

The JSON schema for a single shot (canvas dimensions, duration, asset references, camera directives, and an ordered layer stack) that feeds into the sequence planner and the Remotion renderer.

Overview

A scene can contain HTML, video, image, text, and SVG layers in any combination, and is the equivalent of a single shot in film. Scenes are authored as JSON, validated against the scene format schema, and consumed by the sequence planner and the Remotion renderer.

Scene IDs follow the convention sc_{descriptive_name} (e.g., sc_hero_product, sc_brand_mark).
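A minimal sketch of checking an ID against the pattern sc_[a-z0-9_]+ given for scene_id in the table below. The helper name isValidSceneId is hypothetical, and anchoring the pattern with ^ and $ assumes the whole string must match.

```typescript
// Hypothetical helper: checks a scene ID against the documented pattern sc_[a-z0-9_]+.
// ^ and $ assume the entire ID must match, not just a substring of it.
const SCENE_ID_PATTERN = /^sc_[a-z0-9_]+$/;

function isValidSceneId(id: string): boolean {
  return SCENE_ID_PATTERN.test(id);
}

// IDs from this page, plus a few that break the convention.
console.log(isValidSceneId("sc_hero_product")); // true
console.log(isValidSceneId("sc_brand_mark"));   // true
console.log(isValidSceneId("hero_product"));    // false: missing sc_ prefix
console.log(isValidSceneId("sc_Hero"));         // false: uppercase is not allowed
console.log(isValidSceneId("sc_"));             // false: needs at least one character after sc_
```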

Top-level schema

{
  "scene_id": "sc_hero_product",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 3.0,
  "assets": [...],
  "camera": { "move": "push_in", "intensity": 0.2, "easing": "cinematic_scurve" },
  "layout": { "template": "device-mockup", "slots": {...} },
  "layers": [...],
  "metadata": { "content_type": "product_shot", "visual_weight": "dark", ... }
}
| Field | Type | Required | Description |
|---|---|---|---|
| scene_id | string | Yes | Unique identifier. Pattern: sc_[a-z0-9_]+ |
| canvas | object | No | Dimensions in pixels (w, h). Defaults to 1920x1080 |
| duration_s | number | Yes | Hold duration in seconds (0.5–30). Can be overridden by the sequence manifest |
| assets | array | No | External media referenced by layers |
| camera | object | No | Camera directive. Can be overridden by the sequence manifest |
| layout | object | No | Layout template. Layers reference named slots that auto-resolve to pixel positions |
| layers | array | No | Ordered layer stack. First = deepest (background), last = topmost (foreground) |
| metadata | object | No | Content classification. Optional for hand-authored scenes, required for AI-planned sequences |
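The defaults and override rules in the table can be sketched as a small resolver. This is an illustration under stated assumptions, not the planner's code: ManifestOverride is a hypothetical shape (this page does not define the sequence manifest format), a manifest value is assumed to replace the scene value outright rather than merge with it, and the 0.5–30 range check is applied to the final duration.

```typescript
// Hypothetical types covering only the top-level fields this sketch needs.
interface Canvas { w: number; h: number; }
interface CameraDirective { move?: string; intensity?: number; easing?: string; }

interface SceneTopLevel {
  scene_id: string;
  canvas?: Canvas;
  duration_s: number;
  camera?: CameraDirective;
}

// HYPOTHETICAL: per-scene values a sequence manifest might override.
interface ManifestOverride {
  duration_s?: number;
  camera?: CameraDirective;
}

interface ResolvedTopLevel {
  scene_id: string;
  canvas: Canvas;
  duration_s: number;
  camera?: CameraDirective;
}

const DEFAULT_CANVAS: Canvas = { w: 1920, h: 1080 };

function resolveTopLevel(scene: SceneTopLevel, override: ManifestOverride = {}): ResolvedTopLevel {
  // ASSUMPTION: manifest values replace scene values outright (no deep merge).
  const duration = override.duration_s ?? scene.duration_s;
  if (!(duration >= 0.5 && duration <= 30)) {
    throw new RangeError(`duration_s ${duration} is outside 0.5–30`);
  }
  return {
    scene_id: scene.scene_id,
    canvas: scene.canvas ?? DEFAULT_CANVAS, // documented default: 1920x1080
    duration_s: duration,
    camera: override.camera ?? scene.camera,
  };
}

// A scene without a canvas, retimed by the manifest.
const resolved = resolveTopLevel({ scene_id: "sc_hero_product", duration_s: 3.0 }, { duration_s: 4.5 });
console.log(resolved.canvas.w, resolved.canvas.h, resolved.duration_s); // 1920 1080 4.5
```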

Assets

Assets declare external media that layers reference by ID.

{
  "id": "hero_video",
  "type": "video",
  "src": "assets/hero-reel.mp4",
  "trim": { "start_s": 2.0, "end_s": 8.0 },
  "loop": false,
  "muted": true
}
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique within this scene |
| type | video / image / audio | Yes | Media type |
| src | string | Yes | File path (relative to the scene) or URL |
| trim | object | No | Video/audio only: start_s and end_s of the subrange to play |
| loop | boolean | No | Loop playback for the scene duration. Default: false |
| muted | boolean | No | Mute the video's audio track. Default: true |
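A sketch of resolving an assets array: it applies the documented defaults (loop: false, muted: true), enforces unique IDs within the scene, and checks that trim describes a non-empty subrange. The function name is hypothetical, and rejecting trim on images is this sketch's own choice; the table only says trim applies to video and audio.

```typescript
// Hypothetical asset type mirroring the fields in the table above.
type AssetType = "video" | "image" | "audio";

interface Asset {
  id: string;
  type: AssetType;
  src: string;
  trim?: { start_s: number; end_s: number };
  loop?: boolean;
  muted?: boolean;
}

type ResolvedAsset = Asset & { loop: boolean; muted: boolean };

function resolveAssets(assets: Asset[]): ResolvedAsset[] {
  const seen = new Set<string>();
  return assets.map((a) => {
    // IDs must be unique within the scene.
    if (seen.has(a.id)) throw new Error(`duplicate asset id: ${a.id}`);
    seen.add(a.id);
    if (a.trim) {
      // ASSUMPTION: trim on an image is treated as an error rather than ignored.
      if (a.type === "image") throw new Error(`trim applies only to video/audio: ${a.id}`);
      if (a.trim.start_s < 0 || a.trim.end_s <= a.trim.start_s) {
        throw new Error(`invalid trim on ${a.id}`);
      }
    }
    // Documented defaults: loop false, muted true.
    return { ...a, loop: a.loop ?? false, muted: a.muted ?? true };
  });
}

// The hero_video asset from the example above, with loop and muted omitted.
const [hero] = resolveAssets([
  { id: "hero_video", type: "video", src: "assets/hero-reel.mp4", trim: { start_s: 2.0, end_s: 8.0 } },
]);
console.log(hero.loop, hero.muted); // false true
```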

Camera

Camera directives control movement applied to the entire scene via the <CameraRig> component.

{
  "move": "push_in",
  "intensity": 0.2,
  "easing": "cinematic_scurve"
}
| Field | Values | Default | Description |
|---|---|---|---|
| move | static, push_in, pull_out, pan_left, pan_right, drift | static | Movement type |
| intensity | 0.0–1.0 | 0.5 | Movement magnitude. 0 = imperceptible, 1 = maximum |
| easing | linear, ease_out, cinematic_scurve | cinematic_scurve | Interpolation curve |

Intensity mapping

How intensity translates to actual transforms per move type:

| Move | 0.0 | 0.5 | 1.0 |
|---|---|---|---|
| push_in | scale 1.0 to 1.005 | scale 1.0 to 1.03 | scale 1.0 to 1.08 |
| pull_out | scale 1.005 to 1.0 | scale 1.03 to 1.0 | scale 1.08 to 1.0 |
| pan_left | translateX 0 to -5px | translateX 0 to -30px | translateX 0 to -80px |
| pan_right | translateX 0 to 5px | translateX 0 to 30px | translateX 0 to 80px |
| drift | ±0.2px sinusoidal | ±1px sinusoidal | ±3px sinusoidal |
| static | No transform | No transform | No transform |
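The table fixes the transform only at intensities 0.0, 0.5, and 1.0. A minimal sketch, assuming linear interpolation between those three anchors, turns a move, an intensity, and an already-eased progress value (0 to 1 through the shot) into a CSS transform string. The drift axis and period are not specified here, so one horizontal sine cycle per shot is a hypothetical choice; magnitude and cameraTransform are illustrative names, not the CameraRig API.

```typescript
type CameraMove = "static" | "push_in" | "pull_out" | "pan_left" | "pan_right" | "drift";
type MovingCamera = Exclude<CameraMove, "static">;

// Documented anchor values at intensity 0.0, 0.5 and 1.0.
// push_in/pull_out: scale added to 1.0; pans: pixels; drift: sine amplitude in px.
const ANCHORS: Record<MovingCamera, [number, number, number]> = {
  push_in: [0.005, 0.03, 0.08],
  pull_out: [0.005, 0.03, 0.08],
  pan_left: [5, 30, 80],
  pan_right: [5, 30, 80],
  drift: [0.2, 1, 3],
};

// ASSUMPTION: values between anchors are linearly interpolated.
function magnitude(move: MovingCamera, intensity: number): number {
  const [lo, mid, hi] = ANCHORS[move];
  const i = Math.min(1, Math.max(0, intensity));
  return i <= 0.5 ? lo + (mid - lo) * (i / 0.5) : mid + (hi - mid) * ((i - 0.5) / 0.5);
}

// progress is the eased position in the shot, from 0 (first frame) to 1 (last frame).
function cameraTransform(move: CameraMove, intensity: number, progress: number): string {
  if (move === "static") return "none";
  const m = magnitude(move, intensity);
  switch (move) {
    case "push_in": return `scale(${1 + m * progress})`;
    case "pull_out": return `scale(${1 + m * (1 - progress)})`;
    case "pan_left": return `translateX(${-m * progress}px)`;
    case "pan_right": return `translateX(${m * progress}px)`;
    // ASSUMPTION: one horizontal sine cycle per shot; axis and period are unspecified.
    case "drift": return `translateX(${m * Math.sin(2 * Math.PI * progress)}px)`;
  }
}

console.log(cameraTransform("pan_left", 0.5, 1)); // translateX(-30px)
```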

Layers

Layers form an ordered stack. Each layer has a type, positioning, and optional animation.

Layer fields

| Field | Type | Description |
|---|---|---|
| id | string | Unique within this scene |
| type | html / video / image / text / svg | Layer content type |
| slot | string | Layout slot name. Overrides position when a layout template is active |
| asset | string | Asset ID reference (for video/image layers) |
| src | string | HTML file path (for html layers) |
| content | string | Text content (required for text layers) |
| animation | word-reveal / scale-cascade / weight-morph | Text animation primitive |
| style | object | Typography styles: fontFamily, fontSize, fontWeight, color, textAlign, letterSpacing, lineHeight |
| depth_class | background / midground / foreground | Depth for parallax and camera rig behavior. Default: midground |
| fit | cover / contain / fill / none | How content fills its area (the full canvas when position is omitted). Default: cover |
| position | object | x, y, w, h in pixels. Omit for full-canvas layers |
| opacity | number | 0–1. Default: 1 |
| blend_mode | normal / screen / multiply / overlay | Compositing mode. Default: normal |
| mask_layer | string | ID of another layer to use as the mask source |
| mask_type | alpha / luminance | Mask compositing mode. Default: alpha |
| entrance | object | primitive (animation ID from the catalog) and delay_ms |
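A sketch of cross-checking a layer stack against the table: duplicate layer IDs, text layers without content, video/image layers whose asset is not a declared asset ID, html layers without src, and mask_layer values that do not name another layer. Only content is explicitly marked required above; treating asset and src as required for their layer types is an assumption. checkLayers is a hypothetical name, and it returns a list of problems so that every issue surfaces at once.

```typescript
type LayerType = "html" | "video" | "image" | "text" | "svg";

// Hypothetical subset of the layer fields that this check reads.
interface Layer {
  id: string;
  type: LayerType;
  asset?: string;
  src?: string;
  content?: string;
  mask_layer?: string;
}

function checkLayers(layers: Layer[], assetIds: Set<string>): string[] {
  const problems: string[] = [];
  // First pass: collect IDs so that mask_layer can point to any layer in the stack.
  const layerIds = new Set<string>();
  for (const layer of layers) {
    if (layerIds.has(layer.id)) problems.push(`duplicate layer id: ${layer.id}`);
    layerIds.add(layer.id);
  }
  for (const layer of layers) {
    if (layer.type === "text" && !layer.content) problems.push(`${layer.id}: text layer needs content`);
    // ASSUMPTION: asset is required for video/image and src for html.
    if ((layer.type === "video" || layer.type === "image") && !(layer.asset && assetIds.has(layer.asset))) {
      problems.push(`${layer.id}: asset must reference a declared asset ID`);
    }
    if (layer.type === "html" && !layer.src) problems.push(`${layer.id}: html layer needs src`);
    if (layer.mask_layer !== undefined && (layer.mask_layer === layer.id || !layerIds.has(layer.mask_layer))) {
      problems.push(`${layer.id}: mask_layer must name another layer`);
    }
  }
  return problems;
}

// The layer stack from the kinetic type example, which declares one asset: "logo".
const issues = checkLayers(
  [
    { id: "bg", type: "html", src: "prototypes/dark-gradient.html" },
    { id: "brand_text", type: "text", content: "ANIMATIC" },
    { id: "logo_mark", type: "image", asset: "logo" },
  ],
  new Set(["logo"]),
);
console.log(issues.length); // 0
```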

Weight-morph text

For text layers using weight-morph, two additional style fields control the animation:

  • fontWeightStart — starting font weight (e.g., 100)
  • fontWeightEnd — ending font weight (e.g., 900)
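One way to compute the weight for a given frame is sketched below. The page does not say how weight-morph is timed or eased, so the linear ramp from the layer's first frame to its last is an assumption, and weightAt is a hypothetical name. The clamp uses 1–1000, the range CSS Fonts Level 4 allows for numeric font-weight. Intermediate weights only render smoothly with a variable font.

```typescript
interface WeightMorphStyle { fontWeightStart: number; fontWeightEnd: number; }

// ASSUMPTION: weight changes linearly from the first frame (0) to the last (totalFrames - 1).
function weightAt(style: WeightMorphStyle, frame: number, totalFrames: number): number {
  const t = totalFrames <= 1 ? 1 : Math.min(1, Math.max(0, frame / (totalFrames - 1)));
  const w = style.fontWeightStart + (style.fontWeightEnd - style.fontWeightStart) * t;
  // CSS Fonts Level 4 accepts numeric font-weight values from 1 to 1000.
  return Math.min(1000, Math.max(1, w));
}

// brand_text from the kinetic type example: 100 to 900 over a 3.0 s shot,
// here assumed to be 90 frames (30 fps, which this page does not specify).
const morph = { fontWeightStart: 100, fontWeightEnd: 900 };
console.log(weightAt(morph, 0, 90), weightAt(morph, 89, 90)); // 100 900
```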

Metadata

Content classification used by the scene analyzer and sequence planner. Optional for hand-authored scenes, but required when scenes enter the AI planning pipeline.

| Field | Values | Description |
|---|---|---|
| content_type | portrait, ui_screenshot, typography, brand_mark, data_visualization, moodboard, product_shot, notification, device_mockup, split_panel, collage | Primary content classification |
| visual_weight | light, dark, mixed | Overall tonal weight |
| motion_energy | static, subtle, moderate, high | Amount of movement |
| complexity | minimal, moderate, complex | Visual complexity |
| intent_tags | array of strings | Narrative role: opening, closing, hero, emotional, detail, informational, transition |
| style_override | string | Per-scene style pack override for blended sequences |
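This page does not list which metadata fields the planning pipeline requires, so the sketch below only checks that any values present come from the table. It also treats the listed intent tags as a closed set, which is an assumption: the table may be listing examples rather than every permitted tag. checkMetadata is a hypothetical name.

```typescript
// Allowed values copied from the metadata table.
const METADATA_ENUMS = {
  content_type: ["portrait", "ui_screenshot", "typography", "brand_mark", "data_visualization",
    "moodboard", "product_shot", "notification", "device_mockup", "split_panel", "collage"],
  visual_weight: ["light", "dark", "mixed"],
  motion_energy: ["static", "subtle", "moderate", "high"],
  complexity: ["minimal", "moderate", "complex"],
} as const;

// ASSUMPTION: the intent tags in the table are the complete set.
const INTENT_TAGS = ["opening", "closing", "hero", "emotional", "detail", "informational", "transition"];

type EnumField = keyof typeof METADATA_ENUMS;
type Metadata = Partial<Record<EnumField, string>> & { intent_tags?: string[]; style_override?: string };

function checkMetadata(meta: Metadata): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(METADATA_ENUMS) as EnumField[]) {
    const value = meta[key];
    const allowed: readonly string[] = METADATA_ENUMS[key];
    if (value !== undefined && !allowed.includes(value)) problems.push(`${key}: unknown value "${value}"`);
  }
  for (const tag of meta.intent_tags ?? []) {
    if (!INTENT_TAGS.includes(tag)) problems.push(`intent_tags: unknown tag "${tag}"`);
  }
  return problems;
}

// Metadata from the HTML-only example below, then one with two unknown values.
console.log(checkMetadata({ content_type: "ui_screenshot", visual_weight: "dark", motion_energy: "subtle", intent_tags: ["detail"] }).length); // 0
console.log(checkMetadata({ visual_weight: "bright", intent_tags: ["finale"] }).length); // 2
```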

Example scenes

HTML-only scene

A full-canvas HTML prototype used as a single shot:

{
  "scene_id": "sc_product_demo",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 3.5,
  "camera": { "move": "push_in", "intensity": 0.2, "easing": "cinematic_scurve" },
  "layers": [
    {
      "id": "prototype",
      "type": "html",
      "src": "prototypes/dashboard.html",
      "fit": "cover",
      "depth_class": "midground"
    }
  ],
  "metadata": {
    "content_type": "ui_screenshot",
    "visual_weight": "dark",
    "motion_energy": "subtle",
    "intent_tags": ["detail"]
  }
}

Mixed media scene

Video background with composited text overlay:

{
  "scene_id": "sc_hero_intro",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 4.0,
  "assets": [
    {
      "id": "bg_video",
      "type": "video",
      "src": "assets/aerial-city.mp4",
      "trim": { "start_s": 5.0, "end_s": 9.0 },
      "muted": true
    }
  ],
  "camera": { "move": "drift", "intensity": 0.3 },
  "layers": [
    {
      "id": "background",
      "type": "video",
      "asset": "bg_video",
      "fit": "cover",
      "depth_class": "background"
    },
    {
      "id": "headline",
      "type": "text",
      "content": "The Future of Design",
      "animation": "word-reveal",
      "style": {
        "fontFamily": "Inter",
        "fontSize": 96,
        "fontWeight": 700,
        "color": "#ffffff",
        "textAlign": "center"
      },
      "depth_class": "foreground",
      "entrance": { "primitive": "as-fadeInUp", "delay_ms": 400 }
    }
  ],
  "metadata": {
    "content_type": "typography",
    "visual_weight": "dark",
    "motion_energy": "moderate",
    "intent_tags": ["opening", "hero"]
  }
}
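In this scene, the trim window 5.0–9.0 s supplies exactly 4.0 s of footage for a 4.0 s shot, and the headline entrance starts 400 ms in. Remotion schedules by frame, with the rate set on the composition (available via useVideoConfig().fps). This page does not state a frame rate, so the sketch assumes 30 fps. The helper names are hypothetical.

```typescript
interface Trim { start_s: number; end_s: number; }

// ASSUMPTION: 30 fps; the render frame rate is not specified on this page.
const FPS = 30;

const toFrames = (seconds: number, fps: number = FPS): number => Math.round(seconds * fps);

// True when a non-looping trimmed clip supplies footage for the whole shot.
// What the renderer does when it does not (hold, black, loop) is not specified here.
function trimCoversShot(trim: Trim, duration_s: number): boolean {
  return trim.end_s - trim.start_s >= duration_s;
}

// Values from sc_hero_intro above.
const bgTrim: Trim = { start_s: 5.0, end_s: 9.0 };
console.log(trimCoversShot(bgTrim, 4.0)); // true: 4.0 s of footage for a 4.0 s shot
console.log(toFrames(4.0));               // 120 frames for the whole shot
console.log(toFrames(400 / 1000));        // 12: the headline entrance starts on frame 12
```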

Kinetic type scene

Weight-morphing typography with a brand mark:

{
  "scene_id": "sc_brand_type",
  "canvas": { "w": 1920, "h": 1080 },
  "duration_s": 3.0,
  "assets": [
    { "id": "logo", "type": "image", "src": "assets/logo-mark.svg" }
  ],
  "camera": { "move": "static" },
  "layers": [
    {
      "id": "bg",
      "type": "html",
      "src": "prototypes/dark-gradient.html",
      "fit": "cover",
      "depth_class": "background"
    },
    {
      "id": "brand_text",
      "type": "text",
      "content": "ANIMATIC",
      "animation": "weight-morph",
      "style": {
        "fontFamily": "Inter",
        "fontSize": 120,
        "fontWeightStart": 100,
        "fontWeightEnd": 900,
        "color": "#ffffff",
        "textAlign": "center",
        "letterSpacing": "0.08em"
      },
      "depth_class": "foreground"
    },
    {
      "id": "logo_mark",
      "type": "image",
      "asset": "logo",
      "position": { "x": 860, "y": 620, "w": 200, "h": 200 },
      "opacity": 0.8,
      "depth_class": "foreground",
      "entrance": { "primitive": "as-fadeIn", "delay_ms": 800 }
    }
  ],
  "metadata": {
    "content_type": "brand_mark",
    "visual_weight": "dark",
    "motion_energy": "moderate",
    "intent_tags": ["closing"]
  }
}
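Layers without position cover the full canvas, array order decides stacking, and the blend_mode and fit values share their names with CSS mix-blend-mode and object-fit. Building on those facts, the sketch below turns a layer into a React-style inline style object. The function, the absolute positioning, and zIndex taken from the stack index are hypothetical, not the renderer's documented behavior. In this scene, logo_mark has x + w/2 = 860 + 100 = 960, half of the 1920px canvas width, so it sits horizontally centered.

```typescript
interface Position { x: number; y: number; w: number; h: number; }

interface StyledLayer {
  position?: Position;
  opacity?: number;
  blend_mode?: "normal" | "screen" | "multiply" | "overlay";
  fit?: "cover" | "contain" | "fill" | "none";
}

// React-style camelCase keys; numbers are treated as pixels by React.
interface LayerCss {
  position: "absolute";
  left: number;
  top: number;
  width: number | string;
  height: number | string;
  opacity: number;
  mixBlendMode: string;
  objectFit: string; // applies to replaced content such as <img> and <video>
  zIndex: number;
}

// ASSUMPTION: absolute positioning inside the canvas, z-index from the layer's index.
function layerCss(layer: StyledLayer, stackIndex: number): LayerCss {
  const p = layer.position;
  return {
    position: "absolute",
    left: p ? p.x : 0,
    top: p ? p.y : 0,
    width: p ? p.w : "100%",  // omitted position = full canvas
    height: p ? p.h : "100%",
    opacity: layer.opacity ?? 1,
    mixBlendMode: layer.blend_mode ?? "normal",
    objectFit: layer.fit ?? "cover",
    zIndex: stackIndex,       // index 0 = deepest layer
  };
}

// logo_mark is the third layer in sc_brand_type (index 2).
const logo = layerCss({ position: { x: 860, y: 620, w: 200, h: 200 }, opacity: 0.8 }, 2);
console.log(logo.left, logo.zIndex, logo.opacity); // 860 2 0.8
```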

Try it

Try asking your AI:

Show me the scene format schema and explain the layer types.
Create a scene JSON for a split-panel layout with a portrait photo on the left and a quote on the right.
Validate this scene JSON against the scene format spec.
