
Constraints

Constraints are spatial pins layered on top of segments. Three primitives exist in v1: root_path, effector_target, and pose_keyframe.

Every constraint carries a type discriminator. Frames are zero-indexed and refer to the request's global timeline, that is, the timeline after segments have been composed. The total number of frames is:

  • sum(segments[].duration_frames) when segments is non-empty, or
  • request.duration_frames when segments is empty (constraints-only).

A frame index outside [0, total_frames - 1] causes a frame_out_of_range error.
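
The frame-count rule above can be sketched on the client as a pre-flight check. This is a minimal sketch, not the server's validation logic: it assumes the request is a plain dict and that constraints live under a top-level "constraints" key, which this section does not confirm. The function names are hypothetical.

```python
def total_frames(request: dict) -> int:
    """Total timeline length, following the two-case rule above."""
    segments = request.get("segments") or []
    if segments:
        return sum(seg["duration_frames"] for seg in segments)
    # Constraints-only request: fall back to the request-level duration.
    return request["duration_frames"]


def constraint_frames(constraint: dict) -> list[int]:
    """pose_keyframe carries a single "frame"; the other primitives carry "frames"."""
    if "frame" in constraint:
        return [constraint["frame"]]
    return list(constraint.get("frames", []))


def out_of_range_frames(request: dict) -> list[int]:
    """Frame indices that fall outside [0, total_frames - 1]."""
    total = total_frames(request)
    bad = []
    # ASSUMPTION: constraints are stored under request["constraints"].
    for constraint in request.get("constraints", []):
        bad.extend(f for f in constraint_frames(constraint) if not 0 <= f < total)
    return bad
```

Running a check like this before sending lets a client catch the frames that would otherwise trigger frame_out_of_range.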

World frame

Positions in constraints are in the client's world coordinate frame.

Servers MAY re-center the motion internally, for example by translating constraint positions so that the earliest-frame root pin sits at (x=0, z=0) and the motion stays inside the model's training distribution, as long as the response is translated back into the client's frame before encoding.

Clients MUST NOT assume generated motion starts at the world origin; it starts wherever the constraints place the root.
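
The optional re-centering permitted above amounts to a floor-plane translation that is undone before the response is returned. The following is a rough sketch of that round trip, assuming a root_path constraint is the source of the earliest-frame root pin. The function names are hypothetical and say nothing about how any particular server implements it.

```python
def recenter_offset(root_path: dict) -> tuple[float, float]:
    """(x, z) of the root pin with the smallest frame index."""
    frames = root_path["frames"]
    earliest = min(range(len(frames)), key=lambda i: frames[i])
    x, z = root_path["positions_xz"][earliest]
    return (x, z)


def to_model_frame(positions_xz, offset):
    """Translate client-frame positions so the earliest root pin lands on (0, 0)."""
    ox, oz = offset
    return [[x - ox, z - oz] for x, z in positions_xz]


def to_client_frame(positions_xz, offset):
    """Inverse translation, applied to the output before encoding."""
    ox, oz = offset
    return [[x + ox, z + oz] for x, z in positions_xz]
```

Only x and z are shifted here; the sketch leaves height and heading untouched.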

Best-effort

Constraints are best-effort: the backbone honors them as closely as it can within the prompt and the model's training distribution. There is no protocol-level guarantee of exact satisfaction.

root_path

Floor-plane trajectory for the character's root.

{
  "type": "root_path",
  "frames": [0, 30, 60, 90],
  "positions_xz": [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [2.0, 2.0]],
  "heading_radians": [0.0, 0.0, 0.5, 1.0]
}

positions_xz is the horizontal world position at each frame. heading_radians is optional; when present it specifies the body's facing angle as a right-handed rotation about the +Y axis, where 0 faces the +Z direction and positive values rotate +Z toward −X.

effector_target

World-space targets for a named joint, typically a hand or foot.

{
  "type": "effector_target",
  "joint": "LeftHand",
  "frames": [40, 60, 80],
  "positions": [[0.5, 1.2, 0.3], [0.6, 1.4, 0.4], [0.5, 1.2, 0.3]],
  "rotations": [[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1]]
}

positions are world-space (x, y, z) coordinates. rotations is optional; when present, it holds local-to-parent quaternions for the joint at each frame (same convention as pose_keyframe.joint_rotations).

The joint name MUST exist in the request's skeleton.joints.
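
A client can check an effector_target before sending it. The sketch below is illustrative only. The joint-membership check follows the rule above. The length checks are an assumption inferred from the example, where positions and rotations each have one entry per frame. The function name and the joint_names parameter (the set of names taken from skeleton.joints) are hypothetical.

```python
def check_effector_target(constraint: dict, joint_names: set[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if constraint["joint"] not in joint_names:
        problems.append(f"unknown joint: {constraint['joint']}")

    n = len(constraint["frames"])
    # ASSUMPTION: one position (and, if present, one rotation) per frame entry.
    if len(constraint["positions"]) != n:
        problems.append("positions length differs from frames")
    rotations = constraint.get("rotations")  # optional field
    if rotations is not None and len(rotations) != n:
        problems.append("rotations length differs from frames")
    return problems
```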

pose_keyframe

A pose at a single frame.

{
  "type": "pose_keyframe",
  "frame": 30,
  "joint_rotations": {
    "LeftArm": [0.707, 0.0, 0.707, 0.0],
    "RightArm": [0.707, 0.0, -0.707, 0.0]
  },
  "root_position": [1.0, 0.95, 0.0],
  "fill_mode": "generate"
}

joint_rotations is a sparse map of {joint_name: local_rotation_quat}. root_position is optional.

Field      Type                 Default     Notes
fill_mode  "rest" | "generate"  "generate"  Behavior for joints not listed in joint_rotations

Fill mode semantics:

  • rest — unspecified joints are pinned to their rest_rotation at this frame.
  • generate — unspecified joints are unconstrained; the backbone generates them as part of the motion. The constraint pins only the listed joints.
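
One way to read the two fill modes is as a function that returns the set of joints a keyframe actually pins. The sketch below illustrates that reading and is not the server's implementation. rest_rotations is a hypothetical mapping from joint name to that joint's rest_rotation, and pinned_rotations is a hypothetical name.

```python
def pinned_rotations(keyframe: dict, rest_rotations: dict) -> dict:
    """Joints constrained at this keyframe, after applying fill_mode."""
    mode = keyframe.get("fill_mode", "generate")  # "generate" is the documented default
    pinned = dict(keyframe["joint_rotations"])
    if mode == "rest":
        # Unlisted joints are pinned to their rest rotation.
        for name, rest in rest_rotations.items():
            pinned.setdefault(name, rest)
    elif mode != "generate":
        raise ValueError(f"unknown fill_mode: {mode!r}")
    # In "generate" mode only the listed joints are pinned.
    return pinned
```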

Densifying poses on the client

If you want "key pose A at frame 30, key pose B at frame 60, with the listed joints interpolated in between and the model generating everything else," densify on the client: emit one pose_keyframe per intermediate frame, with joint rotations slerped between A and B (interpolation parameter linear in frame index) and fill_mode: "generate". This pattern is intentionally client-side; the server does not collate keyframes.
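
The densification pattern above can be sketched as follows. This is a minimal client-side sketch under stated assumptions, not a reference implementation: slerp and densify are hypothetical helper names, quaternions are treated as unit 4-vectors (so the component order does not matter for the interpolation), and only joints listed in both key poses are interpolated.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to take the shorter arc.
        q1 = [-c for c in q1]
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: fall back to a linear blend.
        out = [a + t * (b - a) for a, b in zip(q0, q1)]
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        out = [s0 * a + s1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in out))
    return [c / norm for c in out]


def densify(pose_a: dict, pose_b: dict) -> list[dict]:
    """One pose_keyframe per frame strictly between pose_a and pose_b."""
    fa, fb = pose_a["frame"], pose_b["frame"]
    if fb <= fa:
        raise ValueError("pose_b must come after pose_a")
    rot_a, rot_b = pose_a["joint_rotations"], pose_b["joint_rotations"]
    # ASSUMPTION: interpolate only joints that appear in both key poses.
    shared = sorted(rot_a.keys() & rot_b.keys())
    keyframes = []
    for frame in range(fa + 1, fb):
        t = (frame - fa) / (fb - fa)  # linear in frame index
        keyframes.append({
            "type": "pose_keyframe",
            "frame": frame,
            "joint_rotations": {j: slerp(rot_a[j], rot_b[j], t) for j in shared},
            "fill_mode": "generate",
        })
    return keyframes
```

The original key poses A and B are sent alongside the returned keyframes, so the request carries one pose_keyframe for every frame from 30 through 60.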