Introduction: Why Facial Animation Matters
Players read faces. Long before they process dialogue or read subtitles, they're watching a character's eyes, mouth, and brow for emotional truth. An incredible voice performance paired with a blank, static face feels hollow. Conversely, even mediocre audio can be elevated by expressive, well-crafted facial animation. In the era of photorealistic game characters — MetaHuman, Character Creator, Ziva-simulated faces — the bar for facial animation quality has never been higher, and the gap between amateur and professional work has never been more visible.
This guide covers facial animation methods, hardware, software implementation, and practical workflows for game developers working in Unreal Engine and Unity.
Facial Animation Methods
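There are two primary ways to deform a face in games, blend shapes and bone-based rigs, each with distinct strengths and limitations. A third topic, FACS, is not a deformation method but the framework most facial rigs use to organize their controls.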
Blend Shapes (Morph Targets)
Blend shapes are an industry-standard technique for facial animation. Each blend shape stores a deformation of the whole face mesh (an offset for each vertex) that moves the face from a neutral pose to a specific expression or speech mouth shape (viseme). By blending multiple shapes together at different weights, you can create any expression that falls within the combination space of your shape library.
Advantages of blend shapes:
- Excellent visual quality — each shape is hand-sculpted or mocap-derived to look exactly right
- Predictable performance cost — blending is GPU-friendly and well-optimized in both UE5 and Unity
- The ARKit standard provides 52 shapes derived from FACS (Facial Action Coding System), covering a practical subset of its vocabulary
- Easy to author and edit in 3D software (Maya, Blender, ZBrush)
Limitations: without additional techniques (corrective shapes, dynamic wrinkle maps), blend shapes struggle to represent wrinkles and skin sliding, or to preserve volume under extreme deformations.
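In its simplest form, the weighted blending described above is a linear sum: each vertex ends up at its neutral position plus the weighted offsets of every active shape. The following minimal pure-Python sketch shows that math. The two-vertex mesh and the shape assignments are hypothetical, and UE5 and Unity evaluate the equivalent on the GPU rather than in a loop like this.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def evaluate_blend_shapes(
    neutral: List[Vec3],
    targets: Dict[str, List[Vec3]],
    weights: Dict[str, float],
) -> List[Vec3]:
    """Linear blend shapes: v = neutral + sum_i w_i * (target_i - neutral)."""
    result = [list(v) for v in neutral]
    for name, w in weights.items():
        if w == 0.0:
            continue  # inactive shapes contribute nothing, so skip them
        for i, (n, t) in enumerate(zip(neutral, targets[name])):
            for axis in range(3):
                result[i][axis] += w * (t[axis] - n[axis])
    return [(v[0], v[1], v[2]) for v in result]


# Hypothetical two-vertex "face": vertex 0 is a lip corner, vertex 1 the chin.
neutral = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
targets = {
    "mouthSmileLeft": [(0.5, 0.0, 0.0), (0.0, 1.0, 0.0)],  # moves the lip corner outward
    "jawOpen": [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],         # drops the chin by 1 unit
}
posed = evaluate_blend_shapes(neutral, targets, {"mouthSmileLeft": 1.0, "jawOpen": 0.5})
```

Because the sum is linear, two strong shapes that move the same vertices simply add together. That can overshoot, and it is one reason rigs add corrective shapes.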
Bone-Based Facial Rigs
Instead of mesh deformation through blend shapes, bone-based rigs use a skeleton of facial bones (jaw bone, eyelid bones, brow bones, cheek bones, lip bones) driven by an animator or by curves. This approach is common in real-time games because it's computationally lighter than blend shape evaluation on complex meshes and works natively with the same animation systems used for body animation.
Bone rigs are easier to keyframe animate and work well for stylized characters. The limitation is that bone deformation can look less organic than blend shapes for photorealistic faces — bone transformations create angular deformations that require careful weight painting to smooth out.
Many shipped games use hybrid approaches: a skeletal rig for the face driven by an Animation Blueprint (in Unreal), with blend shapes for expressions and corrective shapes for problem deformations.
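Corrective shapes are often driven automatically by the shapes they fix. One common approach multiplies the driver weights, so the corrective appears only when all of its drivers are active at once. Here is a minimal sketch of that idea. The corrective name and its driver pairing are hypothetical.

```python
from typing import Dict, Tuple


def apply_correctives(
    weights: Dict[str, float],
    combos: Dict[str, Tuple[str, ...]],
) -> Dict[str, float]:
    """Return a copy of `weights` with each corrective set to the product of its drivers.

    The product is zero whenever any driver is zero, so a corrective sculpted for
    "jaw open while smiling" stays off during a plain smile or a plain jaw open.
    """
    result = dict(weights)
    for corrective, drivers in combos.items():
        product = 1.0
        for driver in drivers:
            product *= weights.get(driver, 0.0)
        result[corrective] = product
    return result


# Hypothetical corrective that fixes the cheek when the jaw opens during a smile.
COMBOS = {"corr_jawOpen_smileLeft": ("jawOpen", "mouthSmileLeft")}
both = apply_correctives({"jawOpen": 0.8, "mouthSmileLeft": 0.5}, COMBOS)
smile_only = apply_correctives({"mouthSmileLeft": 1.0}, COMBOS)
```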
FACS (Facial Action Coding System)
FACS is a scientific system developed by psychologists Paul Ekman and Wallace Friesen and published in 1978 to systematically describe all visually distinguishable human facial movements. It defines 44 Action Units (AUs), each corresponding to the action of one or more facial muscles, and any human facial expression can be decomposed into combinations of these units.
In games, FACS is used as the organizing framework for blend shape libraries. Rather than having shapes named "happy" or "angry" (which are interpretations), FACS-based rigs have shapes named after the underlying muscle movements: AU6 (cheek raiser), AU12 (lip corner puller), AU17 (chin raiser). Combining AU6 + AU12 produces a smile. This modularity lets animators build expressions the rigger never anticipated, instead of being limited to a fixed set of predefined ones.
Apple's ARKit face tracking uses a simplified but FACS-derived set of 52 blend shapes, which has become a de facto standard for real-time facial animation pipelines.
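This modular idea can be expressed as data. An expression is a set of Action Unit intensities, and each AU drives one or more blend shapes. The AU-to-ARKit pairings below are approximate correspondences chosen for illustration. They are an assumption, not an official Apple or FACS mapping. Taking the maximum when two AUs share a shape is just one reasonable choice.

```python
from typing import Dict, List

# Approximate, illustrative AU -> ARKit shape correspondences (an assumption).
AU_TO_ARKIT: Dict[str, List[str]] = {
    "AU1": ["browInnerUp"],                          # inner brow raiser
    "AU4": ["browDownLeft", "browDownRight"],        # brow lowerer
    "AU6": ["cheekSquintLeft", "cheekSquintRight"],  # cheek raiser
    "AU12": ["mouthSmileLeft", "mouthSmileRight"],   # lip corner puller
    "AU17": ["mouthShrugLower"],                     # chin raiser
    "AU26": ["jawOpen"],                             # jaw drop
}


def compose(aus: Dict[str, float]) -> Dict[str, float]:
    """Turn AU intensities (assumed 0..1) into blend shape weights.

    If two AUs drive the same shape, the larger value wins, which avoids
    pushing a shape past the strongest request.
    """
    weights: Dict[str, float] = {}
    for au, intensity in aus.items():
        for shape in AU_TO_ARKIT[au]:
            weights[shape] = max(weights.get(shape, 0.0), intensity)
    return weights


# A smile built from modular units rather than a single "happy" shape.
smile = compose({"AU6": 0.7, "AU12": 1.0})
```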
Facial Mocap Hardware
Facial motion capture ranges from consumer-grade to production-level, with significant cost differences.
iPhone Face Tracking (ARKit)
The TrueDepth camera in iPhones (iPhone X and later) tracks 52 ARKit blend shape weights in real time. Free apps like Live Link Face (from Epic Games) stream this data wirelessly to Unreal Engine, where it can drive a character's facial rig in real time or be recorded for later use.
This is remarkably capable for the price. ARKit-based performances can look professional when the fundamentals are in place: good lighting, an expressive performance, and a character rig that maps cleanly to the 52 ARKit shapes. The main limitations are head rotation tracking, which is less stable than with marker-based systems; range of expression, which is capped by the 52 standard shapes; and the need for the actor to hold the phone or wear a mount that keeps it near the face.
Faceware Technologies
Faceware is a professional facial mocap system using a head-mounted camera that records the actor's face during performance. Video is processed through Faceware's Analyzer and Retargeter software to extract facial motion data, which is then mapped to custom blend shape rigs. Faceware is used in major game productions (Call of Duty, NBA 2K) and provides significantly higher quality and flexibility than ARKit at a corresponding price increase.
Xsens and Marker-Based Systems
Traditional marker-based facial mocap (small reflective dots placed on the actor's face) provides the highest accuracy and the greatest freedom in data mapping. However, it requires a full mocap studio with multiple cameras and significant post-processing, so it is typically reserved for the largest productions. Inertial suits such as Xsens capture body motion only. For all-in-one body and face capture, they are paired with a separate facial system, such as a head-mounted camera.
ARKit Blend Shape Standard
Because ARKit's 52 shapes have become an industry standard (supported by MetaHuman, Character Creator, VTuber tools, and many commercial rigs), understanding what they cover is valuable for any facial animation work:
- Eyes (14): eyeBlink, eyeLookUp, eyeLookDown, eyeLookIn, eyeLookOut, eyeSquint, and eyeWide, each with Left and Right variants
- Brow (5): browDownLeft/Right, browInnerUp, browOuterUpLeft/Right
- Nose (2): noseSneerLeft/Right
- Jaw (4): jawOpen, jawForward, jawLeft, jawRight
- Mouth (23): mouthClose, mouthFunnel, mouthPucker, mouthLeft, mouthRight, mouthSmileLeft/Right, mouthFrownLeft/Right, mouthDimpleLeft/Right, mouthStretchLeft/Right, mouthPressLeft/Right, mouthLowerDownLeft/Right, mouthUpperUpLeft/Right, mouthRollLower/Upper, mouthShrugLower/Upper
- Cheek (3): cheekPuff, cheekSquintLeft/Right
- Tongue (1): tongueOut
If your character rig uses these exact 52 shape names, it will work with ARKit data, Live Link Face, and a wide ecosystem of ARKit-aware tools with little or no custom remapping.
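A quick way to check a rig against this list is to compare its shape names with the 52 expected names. The sketch below builds that list from the groups in this section, using the Left/Right spelling shown here. Some exporters and tools use other spellings, such as different suffixes or capitalization, so treat the exact strings as an assumption to adapt to your pipeline. The helper names are hypothetical.

```python
from typing import Iterable, List


def _lr(*bases: str) -> List[str]:
    """Expand base names into Left/Right variants."""
    return [f"{base}{side}" for base in bases for side in ("Left", "Right")]


ARKIT_52: List[str] = (
    _lr("eyeBlink", "eyeLookUp", "eyeLookDown", "eyeLookIn", "eyeLookOut", "eyeSquint", "eyeWide")
    + _lr("browDown", "browOuterUp") + ["browInnerUp"]
    + _lr("noseSneer")
    + ["jawOpen", "jawForward", "jawLeft", "jawRight"]
    + ["mouthClose", "mouthFunnel", "mouthPucker", "mouthLeft", "mouthRight",
       "mouthRollLower", "mouthRollUpper", "mouthShrugLower", "mouthShrugUpper"]
    + _lr("mouthSmile", "mouthFrown", "mouthDimple", "mouthStretch",
          "mouthPress", "mouthLowerDown", "mouthUpperUp")
    + _lr("cheekSquint") + ["cheekPuff"]
    + ["tongueOut"]
)


def missing_arkit_shapes(rig_shape_names: Iterable[str]) -> List[str]:
    """Return the ARKit names a rig lacks (exact, case-sensitive match)."""
    have = set(rig_shape_names)
    return [name for name in ARKIT_52 if name not in have]
```

For example, running `missing_arkit_shapes` on a rig exported without a tongue shape would report `["tongueOut"]`.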
Implementing Facial Animation in Unreal Engine 5: The MetaHuman Pipeline
MetaHuman Creator, Epic's free character creation platform, produces photorealistic human characters that come pre-rigged with a complete facial animation system. This is a sophisticated joint-based facial rig, supplemented by blend shapes, that can be driven by ARKit data.
Live Link Face for Real-Time or Recorded Performance
Epic's free Live Link Face iOS app streams ARKit face tracking data to UE5 via the Live Link plugin. Setup:
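- Enable the Live Link, Apple ARKit, and Apple ARKit Face Support plugins in your UE5 project
- In the Live Link Face app's settings, add your computer's IP address as a Live Link target (the phone and computer must be on the same network)
- Open the Live Link window in the editor and confirm the iPhone appears as a subject
- In your MetaHuman character Blueprint, set the Live Link face subject property to the iPhone's subject name
- The MetaHuman's facial animation setup then maps the incoming ARKit data onto the character's facial controls, which drive its joints and blend shapes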
For cinematic production, record the performance with Take Recorder. It captures the Live Link data into Sequencer as animation curves that can be edited frame by frame.
The MetaHuman Face Board
MetaHumans include a "Face Board" — a Sequencer-friendly control rig that exposes every facial control as an animatable property. This allows animators to keyframe facial animation directly in Sequencer without mocap hardware. The Face Board works alongside or instead of Live Link data.
Unity's Face Animation Tools
Unity doesn't have a MetaHuman equivalent, but several tools and workflows support high-quality facial animation:
Blend Shape Curves in the Animator
Unity's SkinnedMeshRenderer exposes blend shape weights (0 to 100 by default) as animatable properties, so animation clips can key them directly. Create clips in the Animation window that key blend shape weights, then play them through the Animator state machine or Timeline for cinematics. At runtime, scripts can also set weights with SkinnedMeshRenderer.SetBlendShapeWeight.
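One detail is worth handling explicitly when feeding ARKit data to Unity: ARKit reports each coefficient in the 0 to 1 range, while Unity blend shape weights default to 0 to 100. The Python sketch below shows the conversion and a lookup by shape name. In a real Unity script, the index would come from Mesh.GetBlendShapeIndex, which returns -1 for unknown names, and the value would be passed to SkinnedMeshRenderer.SetBlendShapeWeight. The `shape_index` dictionary here is a hypothetical stand-in for that lookup.

```python
from typing import Dict, List, Tuple

UNITY_MAX_WEIGHT = 100.0  # Unity blend shape weights default to a 0..100 range


def arkit_to_unity(
    frame: Dict[str, float],
    shape_index: Dict[str, int],
) -> List[Tuple[int, float]]:
    """Convert ARKit coefficients (0..1) to (blend shape index, Unity weight) pairs.

    Shapes the mesh lacks (index -1, as Mesh.GetBlendShapeIndex would report)
    are skipped; out-of-range inputs are clamped before scaling.
    """
    out: List[Tuple[int, float]] = []
    for name, value in frame.items():
        idx = shape_index.get(name, -1)
        if idx < 0:
            continue
        clamped = min(max(value, 0.0), 1.0)
        out.append((idx, clamped * UNITY_MAX_WEIGHT))
    return out
```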
Character Creator + iClone Export
Reallusion's Character Creator generates characters with ARKit-compatible facial rigs. iClone supports iPhone face tracking through its LIVE FACE app, while AccuFACE tracks faces from webcam or video. Exporting to Unity with the CC&iClone Pipeline plugin preserves the facial rig.
Unity Face Capture
Unity's Live Capture package (available on GitHub) works with the Unity Face Capture iOS companion app. The app streams ARKit face data to the Unity Editor so you can record facial animation directly into Timeline. This mirrors UE5's Live Link Face workflow.
Lip Sync: Viseme-Based Systems
Lip sync — matching mouth shapes to speech audio — is a specialized subset of facial animation. Rather than animating every phoneme individually, most game lip sync systems use visemes: visual speech units that correspond to groups of phonemes that produce similar mouth shapes.
A typical viseme set for English has around 15 shapes (OVRLipSync, for example, uses 15). Tools like SALSA LipSync (Unity) and Meta's OVRLipSync (with Unity and Unreal plugins) analyze audio in real time and drive viseme blend shape weights automatically. For pre-rendered dialogue, tools like Annosoft and Auto Lip Sync generate viseme timing curves from the audio and a transcript, and those curves can then be refined by hand.
The key to good lip sync is anticipation: the mouth should begin forming the next viseme slightly before the audio for that phoneme arrives. Real speakers shape their mouths ahead of the sound they are about to make.
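The phoneme-to-viseme grouping and the anticipation offset can be sketched together. The table below maps ARPABET phonemes (the symbol set used by the CMU Pronouncing Dictionary) onto the 15 OVRLipSync-style viseme names. The grouping is an approximation for illustration. The 50 ms default lead is an assumed starting point to tune by eye, not a published value.

```python
from typing import Dict, List, Tuple

# Illustrative ARPABET -> viseme grouping (an approximation, not a standard table).
PHONEME_TO_VISEME: Dict[str, str] = {
    "P": "PP", "B": "PP", "M": "PP",
    "F": "FF", "V": "FF",
    "TH": "TH", "DH": "TH",
    "T": "DD", "D": "DD",
    "K": "kk", "G": "kk",
    "CH": "CH", "JH": "CH", "SH": "CH", "ZH": "CH",
    "S": "SS", "Z": "SS",
    "N": "nn", "NG": "nn", "L": "nn",
    "R": "RR", "ER": "RR",
    "AA": "aa", "AE": "aa", "AH": "aa",
    "EH": "E", "EY": "E",
    "IH": "ih", "IY": "ih",
    "AO": "oh", "OW": "oh",
    "UW": "ou", "UH": "ou", "W": "ou",
}


def viseme_keys(
    phonemes: List[Tuple[str, float]], lead: float = 0.05
) -> List[Tuple[float, str]]:
    """Turn (phoneme, start_time_s) pairs into (key_time_s, viseme) keys.

    Each key is shifted `lead` seconds earlier so the mouth starts forming the
    shape before the sound; unknown phonemes fall back to silence ("sil").
    """
    keys: List[Tuple[float, str]] = []
    for phoneme, start in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "sil")
        if keys and keys[-1][1] == viseme:
            continue  # same mouth shape as the previous key: hold it instead of re-keying
        keys.append((max(0.0, start - lead), viseme))
    return keys


# "mama" as ARPABET phonemes with hypothetical start times in seconds.
keys = viseme_keys([("M", 0.0), ("AA", 0.12), ("M", 0.24), ("AA", 0.36)])
```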
Emotion States and Facial Layers
Facial animation in games often involves blending a base emotion state (the character's current mood) with moment-to-moment expression changes (reacting to dialogue). A typical layered setup includes:
- Base emotion layer: Low-weight, slow-changing blend of shapes that establishes mood (slight brow furrow for tension, relaxed for content)
- Dialogue layer: Lip sync and conversational expressions driven by audio analysis
- Reaction layer: High-weight, fast, triggered reactions to specific events (flinch, surprise, laugh)
- Blink layer: Autonomous random blinks (critical — characters without blinking look dead)
- Eye tracking layer: Eyes following points of interest (speaker, objects, player)
Managing all these simultaneously requires a facial animation controller that blends these layers without conflict — for example, a surprise reaction shouldn't override lip sync in a way that makes the mouth snap shut during speech.
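One simple way to get that behavior is to evaluate the layers from bottom to top. Each layer pulls shape weights toward its own values by an alpha and carries a mask of shapes it may not touch. The sketch below is a minimal, hypothetical controller: the layer names, the abbreviated mouth-shape mask, and the alpha values are all assumptions. While the character is speaking, the reaction layer is masked off the mouth, so a surprise widens the eyes without closing the jaw.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable

# Abbreviated set of mouth/jaw shapes owned by the dialogue layer during speech.
MOUTH_SHAPES: FrozenSet[str] = frozenset(
    {"jawOpen", "mouthClose", "mouthFunnel", "mouthPucker"}
)


@dataclass
class Layer:
    name: str
    weights: Dict[str, float]
    alpha: float = 1.0                    # 0 = no effect, 1 = fully override layers below
    masked: FrozenSet[str] = frozenset()  # shapes this layer may not touch


def blend_layers(layers: Iterable[Layer]) -> Dict[str, float]:
    """Blend layers bottom to top, lerping each shape toward the layer's value."""
    result: Dict[str, float] = {}
    for layer in layers:
        for shape, target in layer.weights.items():
            if shape in layer.masked:
                continue
            below = result.get(shape, 0.0)
            result[shape] = below + (target - below) * layer.alpha
    # Clamp to the valid 0..1 weight range.
    return {shape: min(max(w, 0.0), 1.0) for shape, w in result.items()}


speaking = True
emotion = Layer("emotion", {"browDownLeft": 0.2, "browDownRight": 0.2})
dialogue = Layer("dialogue", {"jawOpen": 0.6})
reaction = Layer(
    "reaction",
    {"browInnerUp": 1.0, "eyeWideLeft": 0.8, "eyeWideRight": 0.8, "jawOpen": 0.0},
    alpha=0.9,
    masked=MOUTH_SHAPES if speaking else frozenset(),
)
face = blend_layers([emotion, dialogue, reaction])
```

Without the mask, the reaction layer's `jawOpen: 0.0` would pull the jaw nearly shut mid-sentence. That is exactly the conflict the controller is meant to prevent.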
Combining Body and Facial Mocap
Full-performance capture records body and face simultaneously. The actor wears a full-body mocap suit while also having facial markers or a head-mounted camera. This is the highest quality approach because body language and facial expression are recorded together as a unified performance — the slight lean forward as a character makes an important point, the body tension that matches a pained expression.
When combining separate body and facial captures, the performances must be synchronized. This is done either by timecode synchronization during recording (all systems share a master clock) or in post-production by manually aligning emotional beats between the body and face recordings.
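When the systems share a master clock, alignment reduces to arithmetic on the start timecodes of each take. This is a minimal sketch that assumes both recordings use the same frame rate and non-drop-frame timecode. Drop-frame rates such as 29.97 fps need a different conversion. The function names are hypothetical.

```python
def timecode_to_frames(tc: str, fps: int) -> int:
    """Convert non-drop-frame SMPTE timecode "HH:MM:SS:FF" to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} out of range for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def face_offset_frames(body_start_tc: str, face_start_tc: str, fps: int) -> int:
    """Frames to skip into the face take so it lines up with the body take.

    A negative result means the face take started later and must be delayed instead.
    """
    return timecode_to_frames(body_start_tc, fps) - timecode_to_frames(face_start_tc, fps)


# The face camera started rolling 45 frames (1.5 s at 30 fps) before the body suit.
offset = face_offset_frames("01:00:02:15", "01:00:01:00", 30)
```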
Body Mocap Without Facial Mocap: Workarounds
Many mocap animation packs, including standard body mocap packs, are captured without facial data. Games that use these assets for character animation must handle facial animation separately:
- Procedural emotion: Use an emotion state system to drive facial blend shapes based on gameplay state (combat = tense expression, exploring = curious expression)
- Scripted facial animation: Add facial animation only during scripted moments (cutscenes, dialogue) rather than throughout gameplay
- Audio-driven visemes: Implement lip sync that runs automatically on any character speech, providing a basic level of facial movement without per-clip animation
- Eye tracking and blinks: Even without expression animation, adding procedural eye tracking (eyes follow nearby characters or the player) and random blinks dramatically increases perceived life
The last point is worth emphasizing: autonomous blink generation is one of the highest ROI improvements you can make to a character that otherwise has no facial animation. Characters who never blink look like mannequins.
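A minimal autonomous blink generator only needs randomized intervals and a short close-and-open curve. In the sketch below, the 2 to 6 second gap range and the 150 ms blink length are assumed starting values to tune by eye. The triangular curve is a simplification, since real blinks close faster than they reopen. The function names are hypothetical, and the output weight would drive eyeBlinkLeft and eyeBlinkRight.

```python
import random
from typing import List, Tuple

Blink = Tuple[float, float]  # (start_s, end_s)


def schedule_blinks(
    duration_s: float,
    rng: random.Random,
    min_gap: float = 2.0,
    max_gap: float = 6.0,
    blink_len: float = 0.15,
) -> List[Blink]:
    """Place blinks over a clip with randomized gaps so they never feel metronomic."""
    blinks: List[Blink] = []
    t = rng.uniform(min_gap, max_gap)
    while t + blink_len <= duration_s:
        blinks.append((t, t + blink_len))
        t += blink_len + rng.uniform(min_gap, max_gap)
    return blinks


def blink_weight(t: float, blinks: List[Blink]) -> float:
    """eyeBlink weight at time t: 0 -> 1 over the first half of a blink, 1 -> 0 over the second."""
    for start, end in blinks:
        if start <= t <= end:
            half = (end - start) / 2
            return 1.0 - abs(t - (start + half)) / half
    return 0.0


# One minute of blinks with a seeded generator, so the result is reproducible.
blinks = schedule_blinks(60.0, random.Random(7))
```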
Facial Animation Optimization
High-fidelity facial animation on complex meshes can be performance-intensive:
- LOD facial rigs: Use full blend shape evaluation for hero characters at close range; switch to a simplified bone-only rig at distance
- Blend shape limit: Actively blending shapes has a GPU cost proportional to the number of active shapes and vertex count. Keep non-zero blend shapes minimal at any given time
- Baked expression textures: For extreme LOD distances, bake the character's current expression into a texture and use a simple shader rather than real-time deformation
- Cinematic vs gameplay rigs: Use a high-resolution cinematic rig (MetaHuman quality) for cutscenes and a simplified gameplay rig for real-time play, swapping at cutscene transition
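The advice to keep non-zero shapes minimal can be enforced with a small pruning pass before weights are sent to the renderer. The following is a minimal sketch. The 0.01 threshold and the optional cap are assumptions. Whether it saves anything depends on whether your engine already skips zero-weight morph targets. Dropping a shape abruptly can also cause a visible pop, so keep thresholds small.

```python
from typing import Dict, Optional


def prune_weights(
    weights: Dict[str, float],
    epsilon: float = 0.01,
    max_active: Optional[int] = None,
) -> Dict[str, float]:
    """Drop near-zero blend shape weights, optionally keeping only the strongest N."""
    active = {name: w for name, w in weights.items() if abs(w) >= epsilon}
    if max_active is not None and len(active) > max_active:
        strongest = sorted(active.items(), key=lambda kv: abs(kv[1]), reverse=True)
        active = dict(strongest[:max_active])
    return active
```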
Frequently Asked Questions
Do I need expensive hardware to do facial mocap?
No. An iPhone with ARKit (iPhone X or later) and the free Live Link Face app provides surprisingly professional facial capture when used correctly. Good lighting on the actor's face, a stable head mount for the phone, and an expressive performer can produce results approaching much more expensive systems. For non-cinematic games, procedural emotion systems and audio-driven lip sync may eliminate the need for facial mocap entirely.
How many blend shapes does a good facial rig need?
For ARKit compatibility, 52 shapes. For a production-quality rig that can express any emotion convincingly, 50–80 shapes is typical. Fewer shapes than this result in limited expressiveness; more shapes only add value if you have the tools and workflow to author them efficiently. Avoid "expression shapes" (a single shape for "happy face") in favor of modular FACS-style shapes that combine flexibly.
What's the difference between live link and baked facial animation?
Live Link streams facial data in real time from a device (iPhone) to the engine — useful for previewing and recording. Baked facial animation is the recorded result: animation curves or blend shape weight curves stored in an animation asset and played back deterministically. Production pipelines use Live Link to record, then bake to animation assets for final use in the game.
Can I use facial animation on non-human characters?
Absolutely. The FACS system and ARKit blend shapes are human-specific, but the techniques (blend shapes, bone rigs, layered animation systems) apply to any character. Non-human characters (animals, monsters, aliens) need custom blend shape sets designed for their anatomy. Retargeting ARKit data to a non-human rig requires custom mapping of which ARKit shapes drive which character shapes, often with creative reinterpretation.
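Such a mapping is often just a table. Each ARKit source shape feeds zero or more creature shapes, each with a scale factor, and contributions to the same target are summed and clamped. Here is a minimal sketch. Every creature shape name below is hypothetical, and summing then clamping is one simple choice among several.

```python
from typing import Dict, List, Tuple

# Hypothetical creature rig: each ARKit shape feeds (target_shape, scale) pairs.
REMAP: Dict[str, List[Tuple[str, float]]] = {
    "jawOpen": [("creature_jawOpen", 1.0), ("creature_gillFlare", 0.3)],
    "mouthSmileLeft": [("creature_lipCurl", 0.5)],
    "mouthSmileRight": [("creature_lipCurl", 0.5)],
    "browDownLeft": [("creature_crestLower", 0.5)],
    "browDownRight": [("creature_crestLower", 0.5)],
}


def retarget(arkit_frame: Dict[str, float]) -> Dict[str, float]:
    """Map one frame of ARKit weights onto creature shapes; unmapped sources are ignored."""
    out: Dict[str, float] = {}
    for source, value in arkit_frame.items():
        for target, scale in REMAP.get(source, []):
            out[target] = out.get(target, 0.0) + value * scale
    return {target: min(max(w, 0.0), 1.0) for target, w in out.items()}


frame = retarget({"jawOpen": 0.5, "mouthSmileLeft": 1.0, "mouthSmileRight": 0.6, "eyeBlinkLeft": 1.0})
```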
How do I prevent uncanny valley in facial animation?
The uncanny valley in facial animation is most often caused by: eyes that don't move naturally (no micro-movements, no blinks, no saccades), lip sync that's slightly off, expressions that don't reach the eyes (a mouth smiling with dead eyes), and lack of micro-expressions (brief flashes of emotion before the main expression settles). Solutions: implement procedural eye behavior (blinks, subtle drift, focus shifts), use slightly anticipatory lip sync, and layer subtle ambient animation on brows and cheeks even during neutral states.
Conclusion: Bringing Characters to Life
Facial animation is one of the most human-specific skills in game development. Players have evolved to read faces with extraordinary sensitivity — we notice the slightest wrongness in an expression. Getting facial animation right requires understanding both the technical systems (blend shapes, ARKit, Live Link, bone rigs) and the artistic principles (FACS action units, emotion layering, lip sync best practices).
Start with the fundamentals: get eye tracking, blinks, and basic viseme-driven lip sync working before adding complex expression systems. Each improvement compounds. A character with natural eyes, blinking, and basic lip sync already looks far more alive than one without these basics, regardless of how sophisticated the rest of the animation system is.
Browse our complete motion capture animation library for professional body animation assets compatible with Unreal Engine and Unity, and build your facial animation system on top of our high-quality mocap body performances.
Professional Body Motion Capture for Games
While facial animation captures emotion, body motion capture delivers the foundational character movement that brings games to life. MoCap Online offers professionally captured full-body animation packs in FBX, BIP, Unreal Engine, Unity, Blender, and iClone formats — providing the locomotion, combat, and interaction animations that complement your facial animation pipeline.
