Raw motion capture data is a goldmine of realistic human movement, but it rarely comes out of the capture session ready for production. Between marker noise, data gaps, and drift artifacts, getting from raw capture to game-ready animation requires a structured cleanup workflow. This guide walks through every stage of MoCap data cleanup, from initial marker reconstruction to final quality assurance.
What Raw MoCap Data Actually Looks Like
Before cleanup, raw MoCap data is messy. Expect to see:
- Marker jitter (small high-frequency noise on every joint)
- Trajectory gaps where markers were occluded
- Positional drift over long takes
- T-pose or reference frame misalignment
- Swapped or mislabeled markers
None of this is unusual. Even a top-tier studio with 50+ Vicon cameras will have some cleanup to do. The question is how much, and having a repeatable workflow makes all the difference.
Marker Gap Filling Techniques
The first pass in any cleanup pipeline is filling gaps in marker trajectories. Common approaches include:
- Linear interpolation — Simple straight-line fill between the last known and next known position. Works for very short gaps (1–3 frames) but creates unnatural movement on longer ones.
- Spline interpolation — Cubic or B-spline interpolation that follows the curve of surrounding data. Better for gaps up to 10–15 frames.
- Pattern fill — Uses the trajectory of a neighboring marker (e.g., filling a left knee gap using the right knee mirrored). Best for longer gaps where context matters.
- Rigid body fill — Reconstructs the missing marker position based on other markers in the same rigid body segment. The gold standard for accuracy.
Most professional packages (Vicon Nexus, OptiTrack Motive, Cortex) offer all of these. The key is using rigid body fill wherever possible and falling back to spline for isolated markers.
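For scripted pipelines outside those packages, a spline fill is easy to prototype. Below is a minimal sketch using NumPy and SciPy, assuming gaps are marked as NaN in the trajectory array; the array shapes are illustrative and not tied to any particular capture format.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def spline_fill(trajectory):
    """Fill NaN gaps in an (N, 3) marker trajectory with a cubic spline.
    Assumes a gap sets all three components of a frame to NaN."""
    frames = np.arange(len(trajectory))
    known = ~np.isnan(trajectory[:, 0])
    spline = CubicSpline(frames[known], trajectory[known])
    filled = trajectory.copy()
    filled[~known] = spline(frames[~known])
    return filled
```

As the list above notes, spline fill should be the fallback; prefer rigid body fill whenever the missing marker belongs to a tracked segment.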
Rigid Body Solving and Skeleton Fitting
Once gaps are filled, the next step is solving marker data into a skeleton. This means mapping raw 3D marker positions to a defined skeleton hierarchy. The solver calculates joint rotations that best match where the markers ended up each frame. A clean solve depends on accurate marker placement during capture, a well-calibrated skeleton template, and consistent marker labeling throughout the take. When the solve is off, you get joints that pop, limbs that stretch, or hips that drift. Always check the solve residuals — high residual frames indicate the solver is struggling to match the data.
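That residual check is easy to script if your pipeline can export both the observed marker positions and the positions the solved skeleton predicts for each marker. A minimal sketch; the array shapes and the 15 mm threshold are illustrative assumptions, not a standard:

```python
import numpy as np

def flag_high_residual_frames(observed, solved, threshold_mm=15.0):
    """Flag frames whose mean marker residual exceeds the threshold.
    observed, solved: (frames, markers, 3) positions in millimetres."""
    residuals = np.linalg.norm(observed - solved, axis=2).mean(axis=1)
    return np.flatnonzero(residuals > threshold_mm)
```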
Filtering and Smoothing Curves
After solving, the rotation curves still carry high-frequency noise from marker vibration. Filtering removes this noise while preserving intentional movement. The two most common filters are:
- Butterworth filter — A low-pass filter that attenuates frequencies above a specified cutoff. Excellent for removing high-frequency marker noise while preserving fast intentional movements. Typical cutoff: 6–12 Hz for body motion, higher for fingers.
- Gaussian filter — A time-domain smoothing filter that averages neighboring frames using a bell curve weighting. Gentler than Butterworth, good for polishing curves without introducing phase shift.
The critical rule: filter conservatively. Over-filtering makes motion feel floaty and robotic. You want to remove the noise without killing the micro-movements that make MoCap feel alive. Always compare filtered vs. unfiltered playback before committing.
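As a concrete example, here is a hedged sketch of zero-phase Butterworth filtering on a single rotation channel using SciPy. The 120 Hz capture rate and 8 Hz cutoff are illustrative values; running the filter forward and backward (filtfilt) cancels the phase shift a single-pass Butterworth would introduce.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def smooth_channel(samples, capture_hz=120.0, cutoff_hz=8.0, order=4):
    """Zero-phase low-pass filter for one per-frame rotation channel."""
    nyquist = capture_hz / 2.0
    b, a = butter(order, cutoff_hz / nyquist, btype="low")
    # filtfilt runs the filter forward then backward, cancelling phase shift
    return filtfilt(b, a, samples)

# Compare filtered vs. unfiltered before committing, as noted above
noisy = np.sin(np.linspace(0, 4 * np.pi, 240)) + 0.05 * np.random.randn(240)
clean = smooth_channel(noisy)
```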
Foot Contact Cleanup
Foot sliding is the most visible MoCap artifact and the one audiences notice instantly. Even small amounts of foot slide destroy the illusion of weight and contact. Cleanup involves:
- Detecting foot plant frames — Identifying when each foot should be stationary (heel strike, toe push-off, flat plant)
- Locking foot position — Pinning the foot IK target to a fixed world position during plant frames
- Blending in and out — Smoothly transitioning between locked and free states to avoid pops
- Adjusting pelvis height — Compensating for the leg length changes that locking introduces
MotionBuilder has excellent foot contact tools built in. In Maya, you will typically use HumanIK floor contacts or custom scripts. In Blender, add-ons like Auto-Rig Pro include foot locking utilities.
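Whatever the tool, the detection step reduces to a velocity test. A minimal sketch, assuming metre units and a Y-up world; the speed and height thresholds are illustrative and should be tuned per capture:

```python
import numpy as np

def detect_plants(foot_positions, fps=30.0, speed_thresh=0.05, height_thresh=0.08):
    """Boolean mask of frames where the foot is likely planted.
    foot_positions: (frames, 3) world-space positions in metres, Y-up."""
    velocity = np.gradient(foot_positions, 1.0 / fps, axis=0)
    speed = np.linalg.norm(velocity, axis=1)
    low_speed = speed < speed_thresh                      # nearly stationary
    near_ground = foot_positions[:, 1] < height_thresh    # close to the floor
    return low_speed & near_ground
```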
Hand Cleanup for Prop Interaction
Hand and finger MoCap data is notoriously noisy because finger markers are small and frequently occluded. Cleanup priorities include stabilizing wrist rotation so hands do not wobble, smoothing finger curls while preserving intentional grip changes, fixing thumb trajectories that cross through the palm, and ensuring prop contact frames show clean gripping without interpenetration. For scenes with prop interaction, you often need to constrain the hand to the prop object during contact frames and blend that constraint in and out.
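The constrain-and-blend step can be prototyped directly on world-space curves. A minimal sketch, where the contact frame range is assumed to be known (from annotation or automatic detection) and the five-frame ease is an illustrative choice:

```python
import numpy as np

def constrain_hand_to_prop(hand_pos, prop_pos, start, end, blend=5):
    """Pin the hand to the prop over frames [start, end], easing the
    constraint weight in and out over `blend` frames to avoid pops.
    hand_pos, prop_pos: (frames, 3) world-space positions."""
    out = hand_pos.copy()
    for f in range(start, end + 1):
        # Weight ramps 0 -> 1 entering contact and back to 0 leaving it
        w = min(1.0, (f - start + 1) / blend, (end - f + 1) / blend)
        out[f] = (1.0 - w) * hand_pos[f] + w * prop_pos[f]
    return out
```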
Facial MoCap Cleanup Workflow
Facial capture generates dense data with dozens to hundreds of markers or blend shape channels. The cleanup workflow differs from body capture:
- Stabilize the head — Remove head translation and rotation so facial deformations are isolated (see the sketch after this list)
- Clean individual channels — Each blend shape or marker region (brows, eyelids, lips, jaw) gets filtered independently
- Fix asymmetry artifacts — Marker-based face capture often produces unintended asymmetry from marker slip
- Match audio sync — Verify lip sync timing against the audio reference, adjusting any drift
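The head stabilization step reduces to applying the inverse of the head's rigid transform to every facial marker. A minimal sketch, assuming you have per-frame head rotation matrices (local-to-world) and translations from a rigid head solve:

```python
import numpy as np

def stabilize_face(face_markers, head_rotations, head_translations):
    """Re-express facial markers in head-local space so only facial
    deformation remains. face_markers: (frames, markers, 3);
    head_rotations: (frames, 3, 3); head_translations: (frames, 3)."""
    stabilized = np.empty_like(face_markers)
    for f in range(len(face_markers)):
        # Inverse rigid transform: R^T @ (p - t), written in row-vector form
        stabilized[f] = (face_markers[f] - head_translations[f]) @ head_rotations[f]
    return stabilized
```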
Batch Processing Tools and Scripts
On a production with hundreds of takes, manual cleanup is not feasible. Studios rely on batch processing scripts that apply consistent filter settings across all takes, auto-detect and fill gaps below a threshold, run foot contact detection with preset parameters, and export to target formats (FBX, BVH) with standardized naming. Python scripting in MotionBuilder, Maya, or Blender is the standard approach. A well-built batch pipeline can reduce per-take cleanup from hours to minutes.
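A batch driver can be as simple as a loop over takes with a fixed stage list. The sketch below is a skeleton only: the stage functions are hypothetical stubs standing in for the DCC-specific implementations, and the .c3d glob is an illustrative choice of input format.

```python
from pathlib import Path

# Hypothetical stage stubs; in production each wraps a DCC-specific tool
def fill_gaps(take): print(f"  gap fill: {take.name}")
def filter_curves(take): print(f"  filter:   {take.name}")
def fix_foot_contacts(take): print(f"  contacts: {take.name}")
def export_fbx(take): print(f"  export:   {take.name}")

STAGES = [fill_gaps, filter_curves, fix_foot_contacts, export_fbx]

def batch_clean(take_dir):
    for take in sorted(Path(take_dir).glob("*.c3d")):
        print(f"Processing {take.name}")
        for stage in STAGES:
            stage(take)

batch_clean("raw_takes")
```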
Quality Assurance Pass Checklist
Before any cleaned animation leaves the pipeline, it should pass these checks:
- No visible foot sliding on any surface contacts
- No joint pops or sudden rotation spikes
- No limb stretching or compression beyond 1–2% (see the scripted check after this list)
- Hips maintain consistent height during standing poses
- Fingers do not interpenetrate on grip poses
- Root motion path is smooth and logical
- Animation loops seamlessly (if cyclic)
- Frame rate matches target project (30 or 60 fps)
- Bone naming matches the target skeleton standard
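Several of these checks can be scripted. For example, the limb-stretch check reduces to comparing per-frame bone lengths against a rest length. A minimal sketch, assuming you can export world-space joint positions per frame:

```python
import numpy as np

def check_bone_lengths(joint_positions, bones, tolerance=0.02):
    """Return frames where any bone deviates from its rest length by more
    than `tolerance` (2% by default). joint_positions: dict of joint name
    -> (frames, 3) world positions; bones: list of (parent, child) names."""
    bad_frames = set()
    for parent, child in bones:
        lengths = np.linalg.norm(joint_positions[child] - joint_positions[parent], axis=1)
        rest = np.median(lengths)  # robust estimate of the true bone length
        deviation = np.abs(lengths - rest) / rest
        bad_frames.update(np.flatnonzero(deviation > tolerance).tolist())
    return sorted(bad_frames)
```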
Common MoCap Artifacts and Fixes
Marker Swap
The tracking system confuses two nearby markers, causing limbs to cross. Fix by relabeling the markers at the swap frame and re-solving.
Gimbal Lock Flips
Sudden 360-degree rotation spikes from Euler angle singularities. Fix by changing rotation order or converting to quaternion interpolation.
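When the artifact is a pure 360-degree wraparound in a curve (rather than true gimbal lock), unwrapping the channel often repairs it directly. A minimal sketch using NumPy:

```python
import numpy as np

def unwrap_channel(degrees):
    """Remove 360-degree wraparound flips from one Euler rotation curve."""
    return np.degrees(np.unwrap(np.radians(degrees)))
```

True gimbal-lock singularities still need the rotation-order or quaternion fix described above; unwrapping only removes the curve discontinuity.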
Drift
Gradual positional shift over a long take. Fix by anchoring the root to known reference positions at key frames.
Jitter on Slow Movements
Low-velocity movements where noise becomes proportionally large. Fix with adaptive filtering that increases smoothing at low velocities.
Timeline: Capture to Game-Ready
For a typical body MoCap take, expect roughly 2–4 hours of cleanup per minute of final animation. Breakdown: gap filling and labeling (30–60 min), skeleton solve and adjustment (20–40 min), curve filtering and manual polish (30–60 min), foot and hand contact cleanup (30–60 min), QA and export (15–30 min). Facial capture adds another 1–3 hours per minute on top of body cleanup. This is why pre-cleaned MoCap packs save enormous production time — the cleanup has already been done by specialists, giving you game-ready animation without the pipeline overhead.
Why Pre-Cleaned MoCap Packs Save Production Time
Building and maintaining a MoCap cleanup pipeline requires specialized skills, expensive software licenses, and significant per-take labor. For most game studios and indie developers, purchasing professionally cleaned MoCap animation packs is dramatically more cost-effective than capturing and cleaning data in-house. A pre-cleaned pack from MoCap Online delivers game-ready animations with consistent quality, standard skeleton hierarchies, clean foot contacts, and export-ready FBX files — all the work described in this guide, already done.
Frequently Asked Questions
How long does MoCap cleanup take per animation?
For body capture, expect 2–4 hours of cleanup work per minute of final animation. Simple locomotion cycles clean up faster (1–2 hours), while complex fight choreography or prop interaction can take 4–6 hours per minute. Facial capture adds another 1–3 hours per minute.
Can MoCap cleanup be fully automated?
Partially. Gap filling, initial filtering, and format conversion can be automated with batch scripts. However, artistic decisions like how much to smooth a performance, fixing subtle foot contacts, and handling edge cases like marker swaps still require human judgment. AI-assisted cleanup tools are improving but have not replaced manual QA.
What software is best for MoCap cleanup?
MotionBuilder is the industry standard for MoCap cleanup, with purpose-built tools for retargeting, filtering, and foot contacts. Maya with HumanIK is a strong second option. Blender is increasingly viable with community addons. For marker-level work, the capture system software (Vicon Nexus, OptiTrack Motive) is essential.
Does buying pre-cleaned MoCap data eliminate all cleanup work?
Pre-cleaned packs eliminate the capture-to-clean pipeline entirely. You may still need to retarget animations to your specific character skeleton and adjust root motion settings for your engine, but the labor-intensive noise removal, gap filling, and contact cleanup is already handled.
Combining Facial and Body Animation
While MoCap Online specializes in body motion capture rather than facial animation, the two animation layers must work together for believable characters. Our Conversation and Meeting animation packs capture the body language that accompanies speech — hand gestures, weight shifts, head nods, and postural changes. Layer your lip sync and facial animation on top of these body performances for characters that feel naturally engaged in dialogue.
In both Unreal Engine and Unity, facial animation runs as a separate animation layer from body animation. Use our body mocap on the base layer and your lip sync system (Audio2Face, FaceFX, Oculus LipSync, or custom blend shape drivers) on the facial layer. The two systems work independently, so you can swap body animations without re-doing facial work and vice versa. This layered approach is industry standard for dialogue sequences in games.
Automated Motion Capture Cleanup Pipelines
Manual motion capture cleanup consumes 2 to 8 hours per minute of captured data depending on marker occlusion rates and movement complexity. Automated pipelines reduce this to minutes by applying rule-based and machine learning corrections in sequence, reserving manual intervention for edge cases that automated tools cannot resolve. Studios processing large volumes of mocap data — game animation libraries, film production, virtual production shoots — rely heavily on automation to meet production timelines.
Gap filling is the first automated step after marker reconstruction. When markers become occluded during capture, the tracking software records gaps in the trajectory data. Simple gap filling uses cubic spline interpolation between the last known and next known positions. Advanced methods incorporate biomechanical constraints — an elbow marker gap is filled using the known positions of the shoulder and wrist markers combined with the joint's rotational limits, producing a physically plausible trajectory rather than a mathematically smooth but anatomically impossible one.
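A geometry-constrained fill sits between those two extremes: if the missing marker belongs to a rigid segment, its position can be reconstructed from the segment's other markers. A minimal sketch using the Kabsch algorithm, assuming at least three visible markers on the segment and a reference frame where all of them were tracked:

```python
import numpy as np

def rigid_body_fill(ref_visible, cur_visible, ref_missing):
    """Reconstruct a missing marker from its rigid segment's other markers.
    ref_visible, cur_visible: (K, 3) matching markers in a fully tracked
    reference frame and the gap frame; ref_missing: (3,) reference position."""
    ref_c, cur_c = ref_visible.mean(axis=0), cur_visible.mean(axis=0)
    # Kabsch: optimal rotation mapping centred reference points to current
    H = (ref_visible - ref_c).T @ (cur_visible - cur_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R @ (ref_missing - ref_c) + cur_c
```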
Noise filtering removes high-frequency jitter from marker trajectories without destroying intentional quick movements. Butterworth low-pass filters at 10 to 15 Hz cutoff work well for slow movements like walking and talking, but aggressive filtering destroys the sharp accelerations present in combat and sports captures. Adaptive filtering adjusts the cutoff frequency based on movement velocity — slow segments receive heavy smoothing while fast movements pass through with minimal filtering. This preserves the snap and impact that makes action animations feel powerful.
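One simple way to approximate adaptive filtering is to blend between a heavily smoothed and a lightly smoothed copy of each curve based on local velocity. A hedged sketch; the sigma values and speed thresholds are illustrative, not production-tuned:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def adaptive_smooth(curve, fps=120.0, low_speed=10.0, high_speed=90.0):
    """Velocity-adaptive smoothing for one rotation channel (degrees).
    Slow segments get heavy smoothing; fast segments pass nearly untouched."""
    heavy = gaussian_filter1d(curve, sigma=4.0)
    light = gaussian_filter1d(curve, sigma=1.0)
    speed = np.abs(np.gradient(curve)) * fps  # degrees per second
    # Blend weight: 0 below low_speed (use heavy), 1 above high_speed (use light)
    w = np.clip((speed - low_speed) / (high_speed - low_speed), 0.0, 1.0)
    return (1.0 - w) * heavy + w * light
```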
Skeleton solving converts cleaned marker data into joint rotations suitable for game engines. The solver maps each marker to its corresponding skeleton joint and computes the rotation needed to position the virtual bone at the marker location. Redundant markers (multiple markers per body segment) improve solve accuracy by over-constraining the solution — if one marker drifts, the others provide enough information to maintain correct joint angles. Modern solvers like those in Vicon Shogun and OptiTrack Motive handle real-time solving at capture speeds.
Foot contact detection and correction prevents the ground sliding that plagues raw mocap data. Automated systems analyze the vertical velocity of foot markers to detect plant events — moments when the foot should be stationary on the ground. During detected plant phases, the solver locks the foot position and adjusts the hip and leg chain using inverse kinematics to maintain a natural pose. This foot-locking step is critical for any animation that will be used with root motion in a game engine.
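The leg adjustment is typically a two-bone IK solve. The sketch below shows the planar (sagittal-plane) version driven by the law of cosines; production solvers work in 3D with pole vectors and joint limits, so treat this as the core idea only:

```python
import numpy as np

def two_bone_ik_2d(hip, target, l1, l2):
    """Planar two-bone IK: place knee and foot so the leg reaches the target.
    hip, target: (2,) positions in the leg's plane; l1 thigh, l2 shin length."""
    to_target = target - hip
    d = np.clip(np.linalg.norm(to_target), 1e-6, l1 + l2 - 1e-6)
    base = np.arctan2(to_target[1], to_target[0])
    # Law of cosines gives the interior hip angle of the thigh/shin triangle
    cos_hip = (l1 ** 2 + d ** 2 - l2 ** 2) / (2.0 * l1 * d)
    hip_angle = base + np.arccos(np.clip(cos_hip, -1.0, 1.0))
    knee = hip + l1 * np.array([np.cos(hip_angle), np.sin(hip_angle)])
    foot = knee + l2 * (target - knee) / np.linalg.norm(target - knee)
    return knee, foot
```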
Batch processing frameworks orchestrate the full cleanup pipeline across hundreds of captured takes. A typical batch job runs gap filling, noise filtering, skeleton solving, foot contact correction, and format export as sequential stages with validation checks between each step. Failed validations flag takes for manual review rather than halting the entire batch, allowing clean takes to proceed through the pipeline while problematic captures queue for artist attention. This parallel processing approach means a day's capture session of 200 takes can be cleaned overnight with only 10 to 15 takes requiring manual intervention the following morning.
Motion capture data standards vary significantly between providers and hardware systems, creating interoperability challenges for studios that source animation from multiple vendors. The FBX format has become the de facto interchange standard, but different export settings for axis orientation, scale units, and rotation order can produce incompatible files even within the same format. Establishing a studio-wide import specification that defines the expected coordinate system, unit scale, frame rate, and naming convention eliminates these compatibility issues at the pipeline boundary.
All incoming mocap data passes through a validation and conversion step before entering the production asset library, ensuring that clips from any source integrate seamlessly with the project's character rigs. Automated validation scripts that check bone counts, hierarchy structure, frame ranges, and naming patterns catch format discrepancies at import time, before they propagate through the animation pipeline and surface as retargeting failures on production characters during crunch, when debugging bandwidth is at its lowest.
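Such a validation gate can start as a simple spec comparison. In the sketch below, the EXPECTED spec values and the clip metadata dict are hypothetical placeholders for whatever your FBX inspection tooling reports:

```python
# Expected spec values are hypothetical examples; match them to your project
EXPECTED = {
    "frame_rate": 30,
    "unit_scale": "centimeters",
    "root_bone": "Hips",
    "bone_count": 65,
}

def validate_clip(meta):
    """Compare one clip's metadata dict against the studio import spec."""
    errors = []
    for key, expected in EXPECTED.items():
        actual = meta.get(key)
        if actual != expected:
            errors.append(f"{key}: expected {expected!r}, got {actual!r}")
    return errors

# Example: a clip exported at the wrong frame rate gets flagged
print(validate_clip({"frame_rate": 24, "unit_scale": "centimeters",
                     "root_bone": "Hips", "bone_count": 65}))
```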

