What Is Face Capture — and Why Does It Matter for Game Dev?
Face capture is the process of recording and translating human facial movement into digital animation data. Where traditional body motion capture tracks skeletal movement from the neck down, face capture focuses on the dozens of micro-expressions, muscle contractions, and jaw movements that make a character feel alive rather than plastic. It is the difference between a character that walks realistically and one that genuinely communicates emotion.
For game developers and 3D animators, face capture has gone from a technique reserved for major studios to something achievable with consumer hardware. An iPhone 12 or newer can deliver broadcast-quality facial animation data. That shift has changed how indie developers, VTubers, cinematic animators, and real-time production teams approach character work.
This guide covers the full landscape: how face capture technology works, the iPhone and ARKit pipeline, Unreal Engine's Live Link Face workflow, combining facial capture with body mocap, and professional tools like Faceware and Reallusion. If you are building a production workflow from scratch, this is where to start.
How Face Capture Technology Works
All face capture systems share the same fundamental goal: map real human facial movement onto a digital character's rig. They differ in how they collect that data.
Marker-based face mocap uses reflective dots applied to the performer's face, tracked by a helmet-mounted camera rig or a surrounding camera array. The system resolves each marker's 3D position every frame and uses that point cloud to drive a custom facial rig. This approach, used on major productions such as The Last of Us Part II and Spider-Man: Miles Morales, delivers extremely high-fidelity results because each marker is tracked explicitly, staying frame-accurate even during fast motion, and the solve is not tied to any fixed blend shape standard. The tradeoff is cost: hardware rigs start in the thousands and require trained operators.
Depth camera face capture uses structured light or time-of-flight sensors (like the TrueDepth camera in iPhones) to build a real-time 3D mesh of the face. The ARKit framework on iPhone processes that depth data against 52 predefined blend shape coefficients — values like jawOpen, eyeBlinkLeft, mouthSmileLeft — and outputs them as normalized floats between 0 and 1. This is machine-readable facial animation data, generated with no post-processing required, at 60 frames per second.
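To make that data format concrete, here is a minimal Python sketch of what a single frame of ARKit-style blend shape output looks like. The coefficient names are real ARKit blend shape names; the frame values and the clamping helper are invented for illustration.

```python
def clamp_coefficients(frame: dict[str, float]) -> dict[str, float]:
    """Clamp every coefficient to the normalized 0.0-1.0 range ARKit uses."""
    return {name: min(max(value, 0.0), 1.0) for name, value in frame.items()}

# One hypothetical frame of blend shape data (names are real ARKit coefficients).
sample_frame = {
    "jawOpen": 0.42,         # jaw partially open
    "eyeBlinkLeft": 0.05,    # left eye almost fully open
    "mouthSmileLeft": 0.61,  # left mouth corner raised
}

print(clamp_coefficients(sample_frame))
```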
Video-based facial tracking (also called markerless face mocap) uses AI and computer vision to infer 3D facial position from standard video. Tools in this category include Reallusion's AccuFACE, DeepMotion's face module, and the Metaface pipeline in iClone. Quality is improving rapidly, but results remain less stable than depth camera systems during fast head rotation or partial occlusion.
For most game developers and animators working today, the iPhone ARKit pipeline sits at the sweet spot of accessibility and quality.
iPhone ARKit Face Capture: The Accessible Entry Point
Apple's TrueDepth camera, available on the iPhone X and every Face ID-enabled iPhone since, projects an infrared dot pattern onto the face and reads how that pattern deforms to reconstruct the face's geometry 60 times per second. ARKit then maps that geometry onto 52 blend shape coefficients, the same blend shape set used by Epic's MetaHuman standard.
This alignment is not an accident. Epic designed the MetaHuman facial rig to be ARKit-compatible, which means data recorded from an iPhone maps directly onto a MetaHuman face without any retargeting step. The values Apple generates are the values the rig expects.
The practical implications are significant. A developer with an iPhone 12 or newer, the free Live Link Face app, and Unreal Engine 5 has a complete real-time facial capture pipeline that, only a few years ago, would have cost on the order of $40,000 in dedicated hardware and required a studio day to operate.
For teams that do not need real-time monitoring — for example, animators capturing performances to bake into animation assets — the iPhone pipeline is even simpler: record in the Live Link Face app, export the take as a CSV of blend shape curves, and import directly into Unreal Engine or any DCC that accepts ARKit blend shape data.
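For the offline route, the export is plain tabular data. Below is a minimal Python sketch of loading such a CSV into per-curve float lists, assuming the common layout of a header row of curve names plus a timecode column; the exact columns can vary by app version, and MyTake.csv is a hypothetical filename.

```python
import csv
from collections import defaultdict

def load_blendshape_curves(path: str) -> dict[str, list[float]]:
    """Read a blend shape CSV export into {curve_name: [value per frame]}."""
    curves = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for name, value in row.items():
                try:
                    curves[name].append(float(value))
                except (TypeError, ValueError):
                    continue  # skip non-numeric columns such as Timecode
    return dict(curves)

curves = load_blendshape_curves("MyTake.csv")  # hypothetical take export
print(len(curves["jawOpen"]), "frames of jawOpen data")
```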
Unreal Engine Live Link Face Workflow
Live Link Face is Epic Games' free iOS application that streams ARKit face capture data directly into Unreal Engine over Wi-Fi. It is the standard entry point for facial capture in Unreal Engine 5 and works in two modes: live streaming for real-time preview and recording, and file export for offline use.
Setting Up the Live Link Face Pipeline
Prerequisites:
- iPhone with Face ID (iPhone X or newer; iPhone 12+ recommended for improved depth sensor accuracy)
- Unreal Engine 5 with the Live Link plugin enabled (Edit → Plugins → search for Live Link)
- A MetaHuman character or a skeletal mesh using ARKit's 52 blend shapes
- Both devices on the same Wi-Fi network
Step 1: Configure Live Link in Unreal Engine.
Open the Live Link panel (Window → Virtual Production → Live Link). You do not need to add a source manually or enter the iPhone's IP on the engine side; once the app on the phone targets your machine, the iPhone appears in this panel as a subject.
Step 2: Configure the Live Link Face app.
Install the app from the App Store. In the app's Live Link settings, add a target with the IP address of your development machine and the port (11111 by default). Enable streaming. In UE5's Live Link panel, you should see the subject appear with a green dot.
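If the subject never appears, a quick diagnostic is to confirm the phone's packets are reaching the machine at all (firewalls are the usual culprit). A minimal sketch that listens on the app's default Live Link port; run it with the editor closed, since only one process can bind the port at a time:

```python
import socket

# Bind the Live Link Face app's default UDP port (11111) and report
# anything that arrives. Seeing packets here but no subject in Unreal
# usually points at the engine side rather than the network.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 11111))
print("Listening for Live Link Face packets on UDP 11111...")
while True:
    data, addr = sock.recvfrom(65535)
    print(f"{len(data)} bytes from {addr[0]}")
```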
Step 3: Connect the stream to your character.
On your MetaHuman Blueprint (or any Blueprint using a Live Link Pose node), add a Live Link Component and set the Subject Name to match what appeared in the Live Link panel. Add a Live Link Pose node to the Animation Blueprint, connect the ARKit curve names to your facial rig's morph targets, and compile.
At this point, your character's face should mirror the performer's expressions in real time.
Step 4: Record and export a take.
In the Live Link Face app, tap the record button before the performance. The app stores takes locally and can export them as .csv files (blend shape curve data per frame) or sync wirelessly to the engine's Take Recorder. For production use, Take Recorder captures both the facial stream and body animation data in the same session, which is essential when combining with body mocap.
Real-Time Face Capture for VTubers and Virtual Production
Live Link Face is not limited to game production pipelines. VTubers use the same iPhone ARKit stack — often through apps like VTube Studio or direct Live Link integration — to drive 2D and 3D avatars in real time. The face capture data drives mouth sync, eye tracking, brow expressions, and head rotation, creating the animated presence that defines the VTuber format.
For virtual production stages using Unreal Engine's nDisplay or LED volume setups, real-time face capture streams directly into the live render, eliminating the post-production step entirely.
Combining Face Capture with Body Mocap
The most common gap in a production pipeline is the disconnect between facial performance and body motion. Body mocap suits track skeletal movement beautifully but ignore the face. Face capture solves the performance layer above the neck. The challenge is synchronization and combining both data streams into a unified character animation.
Timecode synchronization is the professional solution. Both the body mocap system (e.g., Rokoko, Xsens, OptiTrack) and the face capture app must stamp their data with the same timecode signal. UE5's Take Recorder accepts SMPTE timecode from sources that provide it, and a timecode-stamped Rokoko Smartsuit Pro II recording can be aligned frame-accurately with a Live Link Face recording in post.
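Once both streams carry timecode, alignment reduces to arithmetic. A minimal sketch, assuming non-drop-frame SMPTE timecode at an integer frame rate (drop-frame 29.97 needs extra handling not shown here); the take start values are hypothetical:

```python
def timecode_to_frames(tc: str, fps: int) -> int:
    """Convert non-drop-frame 'HH:MM:SS:FF' timecode to an absolute frame index."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames

# Hypothetical take starts, stamped by the two capture systems at 60 fps.
body_start = timecode_to_frames("01:02:03:12", fps=60)
face_start = timecode_to_frames("01:02:05:00", fps=60)
print("face take starts", face_start - body_start, "frames after the body take")
```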
For indie setups without timecode hardware, the practical approach is:
- Record the body performance first using your body mocap suit, or start from a pre-built take in a motion capture animation library.
- Record the facial performance separately while the performer watches playback of the body session, matching energy and sync to a reference audio track.
- Combine both in Unreal Engine's Sequencer or in a DCC like Maya or Blender, manually aligning the curves where needed.
This decoupled workflow is slower than simultaneous capture but requires no additional hardware and works well for cinematics, cutscenes, and any content where slight manual sync adjustments are acceptable.
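The manual alignment step can also be semi-automated. One approach is to estimate the offset between the two takes by cross-correlating their audio around a shared clap or vocal cue. The sketch below assumes numpy and two mono audio arrays at the same sample rate; the demo data is synthetic.

```python
import numpy as np

def estimate_offset_samples(ref: np.ndarray, other: np.ndarray) -> int:
    """Return how many samples `other` lags behind `ref` (negative = leads)."""
    corr = np.correlate(other, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)

# Synthetic demo: the same noise burst, delayed by 480 samples
# (10 ms at a 48 kHz sample rate).
rng = np.random.default_rng(0)
burst = rng.standard_normal(2000)
ref = np.concatenate([np.zeros(1000), burst, np.zeros(5000)])
other = np.concatenate([np.zeros(1480), burst, np.zeros(4520)])
print(estimate_offset_samples(ref, other))  # -> 480
```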
For teams using pre-built body animation assets, the workflow is even more accessible. Pull a body animation from your motion capture animation library, apply it to your skeletal mesh in Unreal Engine, then layer a recorded face capture performance on top using Sequencer's curve tracks. The two streams operate independently on the skeleton — body translation and rotation data lives on the skeletal bones, facial expression data lives on the morph target curves — so they compose cleanly without blending complexity.
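A toy sketch of why that composition is clean: the two streams address disjoint channel sets, so merging a frame is a simple union. The record layout here is invented for illustration, not an engine format.

```python
def compose_frame(body_bones: dict, face_curves: dict) -> dict:
    """Merge one frame of body and face data; the channel sets never collide."""
    overlap = body_bones.keys() & face_curves.keys()
    assert not overlap, f"expected disjoint channels, found: {overlap}"
    return {"bones": body_bones, "curves": face_curves}

frame = compose_frame(
    {"pelvis": (0.0, 0.0, 92.0), "spine_01": (0.0, 4.5, 0.1)},  # bone transforms
    {"jawOpen": 0.42, "mouthSmileLeft": 0.61},                  # morph target curves
)
print(frame)
```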
Professional Face Capture Tools
Faceware Technologies
Faceware is a professional video-based system used on major game productions. Its Analyzer software processes footage from a head-mounted camera rig worn by the performer and produces a dense markerless tracking solve that can drive any facial rig; its Retargeter applies that solve to custom character rigs. The advantage over ARKit is stability under extreme motion and independence from any specific hardware or blend shape standard. The disadvantage is cost (Analyzer and Retargeter licenses run several thousand dollars annually) and the operational overhead of running the tracking and retargeting pipeline.
For studios shipping AAA titles where facial performance is a primary production value, Faceware is an industry standard. For indie developers and small teams, ARKit on iPhone is a more practical starting point.
Reallusion AccuFACE and iClone
Reallusion's AccuFACE is a markerless face mocap solution built into iClone 8. It drives Reallusion's CC4 character facial rigs from a standard webcam or recorded video; iPhone-based capture in the Reallusion ecosystem runs through the Motion LIVE plugin. The pipeline is fully integrated: capture, animation, and export to FBX or BVH are all first-party operations inside iClone.
For animators working in the Reallusion ecosystem (Character Creator 4, iClone, Cartoon Animator), AccuFACE is the native face capture path. For Unreal Engine pipelines, the ARKit/Live Link Face route is generally cleaner.
MetaHuman Animator
Epic released MetaHuman Animator in UE5.2, a markerless face capture tool that processes iPhone video footage (not just real-time streaming) through a high-fidelity solve pipeline. Unlike Live Link Face, which streams ARKit blend shape coefficients directly, MetaHuman Animator performs a full offline solve against the MetaHuman performance capture model, then bakes the result onto the MetaHuman rig as animation curves.
The output quality is noticeably higher than raw ARKit streaming, particularly on subtle lip and eye movements. The tradeoff is that it is an offline process — you capture the footage on iPhone, import it into UE5, run the solve (which can take several minutes depending on take length), and then receive the final baked animation. This is well suited to cinematic content and less suited to real-time VTubing or live virtual production.
MetaHuman Animator requires a MetaHuman character created through the MetaHuman Creator portal and is free to use within Unreal Engine.
Practical Workflow Recommendations by Use Case
VTubing and live streaming: Use Live Link Face app on iPhone with VTube Studio (for 2D/3D avatars) or direct UE5 Live Link integration for Unreal-based avatars. Real-time, free, and requires only the iPhone and app.
Indie game cinematics: Use MetaHuman Animator for highest-quality offline results. Capture on iPhone, solve in UE5, combine with body animation from a professional motion capture library. No hardware investment beyond the iPhone.
AA/AAA game production: Budget for a Faceware or equivalent marker-based system for hero characters. Use ARKit/Live Link Face for supporting cast, background characters, or rapid iteration during pre-production.
VR/XR real-time applications: Live Link Face streaming is natively supported in UE5 XR templates. The latency of the ARKit stream (typically under 100ms on a solid 5GHz Wi-Fi connection) is suitable for interactive real-time applications.
For more context on how facial animation fits into broader character production, the MoCap Online animation blog covers ongoing workflow guides for game developers and 3D artists.
Getting Started Without Capturing Your Own Faces
Not every project needs captured facial performance. For game characters where facial animation is secondary to body performance — combat, locomotion, crowd systems — professionally produced body motion capture assets carry the majority of the character animation work.
If you are early in production and want to validate your character pipeline before investing in face mocap hardware, start with a free animation pack to verify your skeleton setup, retargeting, and Animation Blueprint architecture are solid. Facial capture layered on top of a broken body rig produces broken results — getting the skeletal foundation right first saves significant iteration time.
FAQ: Face Capture for Game Developers
What iPhone do I need for face capture in Unreal Engine?
Any iPhone with Face ID — starting from the iPhone X — supports ARKit face tracking. In practice, iPhone 12 or newer delivers more stable depth sensor results, particularly at oblique angles and in mixed lighting. The Live Link Face app runs on iOS 14 or later and is free to download.
Is Live Link Face the same as face mocap?
Live Link Face is the name of Epic's iOS app and the Live Link protocol source type it creates in Unreal Engine. It uses iPhone ARKit face tracking (which is a type of face mocap) to stream 52 ARKit blend shape coefficients into UE5. The terms are often used interchangeably in UE5 documentation and community discussions.
Can I use face capture data on non-MetaHuman characters?
Yes. ARKit outputs 52 named blend shape coefficients as normalized float values. Any facial rig that exposes morph targets with matching names (e.g., jawOpen, mouthSmileLeft) will receive the values correctly. MetaHuman uses these names natively; custom rigs require a remapping step in the Animation Blueprint or via a Live Link Remap Asset, which maps incoming curve names to your rig's morph target names.
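For baked data (such as a CSV export), the same remapping idea can be applied as a simple rename pass. In the sketch below the custom rig names on the right are hypothetical; inside Unreal Engine the live equivalent is the Live Link Remap Asset mentioned above.

```python
# Hypothetical custom-rig morph target names on the right-hand side.
ARKIT_TO_CUSTOM = {
    "jawOpen": "Jaw_Open",
    "mouthSmileLeft": "Smile_L",
    "eyeBlinkLeft": "Blink_L",
}

def remap_curves(frame: dict[str, float]) -> dict[str, float]:
    """Rename curves the rig knows about; drop the ones it doesn't."""
    return {ARKIT_TO_CUSTOM[name]: value
            for name, value in frame.items() if name in ARKIT_TO_CUSTOM}

print(remap_curves({"jawOpen": 0.42, "mouthSmileLeft": 0.61, "noseSneerRight": 0.1}))
```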
How do I sync face capture with body mocap?
The cleanest method is SMPTE timecode synchronization — both capture systems stamp their data with the same time reference, allowing frame-accurate alignment in post. For indie setups, record body and face separately and align manually in Unreal Engine Sequencer or your DCC of choice using audio reference (a clapper or vocal cue at the start of each take). Allow extra takes and record clean audio to make manual sync easier.
What is the difference between MetaHuman Animator and Live Link Face?
Live Link Face streams real-time ARKit blend shape data into UE5 — what the iPhone's TrueDepth sensor measures, applied directly to the rig. MetaHuman Animator is an offline solve tool: you import raw iPhone video footage and Epic's pipeline computes a higher-quality facial solve using a neural model trained on performance capture data. MetaHuman Animator produces better results; Live Link Face is suitable for real-time applications and faster iteration.
Do I need to buy face mocap hardware for indie game development?
Not necessarily. An iPhone with the free Live Link Face app is sufficient for many indie use cases, including cinematics, character reveals, and VTubing. Dedicated marker-based systems like Faceware are valuable when facial performance is a primary production value for hero characters in narrative games, but for supporting characters, crowd systems, and most gameplay animation, body mocap data combined with blend shape-driven emotion states handles the majority of requirements.
Build Your Character Animation Foundation First
Face capture brings performance to life, but it works best when it is layered on top of a solid body animation system. Whether you are building a cinematic character or a real-time avatar, the skeletal body animation underneath the facial performance needs to be clean, correctly retargeted, and production-quality.
Explore the full MoCap Online motion capture animation library for professional FBX, Unreal Engine, Unity, and Blender animation packs — from locomotion and combat to character-specific movement sets — that give your characters the physical foundation that makes facial capture worth the effort.

