Blend Shapes & Morph Targets Guide

Blend shapes (also called morph targets or shape keys depending on your software) are one of the most powerful tools in a character animator's toolkit. They allow you to define specific mesh deformations as targets and blend between them, enabling everything from subtle facial expressions to dramatic body deformations that bones alone cannot achieve.

What Are Blend Shapes?

A blend shape is a stored vertex offset target. You start with a base mesh and create duplicate meshes where you have moved vertices into new positions. Each modified duplicate becomes a target shape. At runtime, the engine interpolates vertex positions between the base mesh and the target shapes based on a weight value (0 to 1). At weight 0, the mesh looks like the base. At weight 1, it matches the target shape exactly. Values in between produce proportional blending.
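As a minimal sketch of the interpolation described above, the following Python function blends one target against the base. The function name blend_single and the tuple-based vertex layout are illustrative assumptions, not any engine's API.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def blend_single(base: List[Vec3], target: List[Vec3], weight: float) -> List[Vec3]:
    """Linearly interpolate every vertex from the base toward the target."""
    if len(base) != len(target):
        raise ValueError("base and target must have the same vertex count")
    w = max(0.0, min(1.0, weight))  # clamp to the 0..1 range
    return [
        tuple(b + w * (t - b) for b, t in zip(base_vertex, target_vertex))
        for base_vertex, target_vertex in zip(base, target)
    ]


# Two-vertex example mesh (hypothetical data).
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 2.0, 0.0), (1.0, 0.0, 4.0)]
half = blend_single(base, target, 0.5)  # each vertex sits halfway to its target
```

At weight 0.5 every vertex lands at the midpoint between its base and target positions, which is the proportional blending the paragraph describes.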

The key distinction from bone-based deformation is that blend shapes move vertices directly to predetermined positions rather than transforming them through a skeletal hierarchy. This makes blend shapes ideal for deformations that are difficult or impossible to achieve with bones, such as facial expressions, muscle bulging, and corrective deformation fixes.

Blend Shapes vs. Bones for Deformation

Bones excel at broad, structural deformation like limb movement, spine bending, and joint articulation. Blend shapes excel at surface-level deformation like facial expressions, subtle muscle definition, and corrective adjustments. In practice, professional character rigs use both systems together. The skeleton handles large movements while blend shapes add nuance, correct artifacts, and drive facial animation.

The decision of when to use which system comes down to the type of deformation. If the deformation follows a rotational or translational pattern, use bones. If the deformation requires vertices to move to specific sculpted positions regardless of the skeletal pose, use blend shapes.

Facial Blend Shape Workflows

ARKit and the 52 Blend Shapes

Apple's ARKit facial tracking system uses 52 standardized blend shapes that correspond to specific facial movements. This has become a de facto industry standard for real-time facial animation. The 52 shapes cover eye movement (blinks, gaze direction), brow expressions, nose movement, mouth shapes (including jaw open, lips, cheeks), and a single tongue-out shape. Creating an ARKit-compatible face rig means sculpting all 52 targets with matching names, which makes the rig usable with iPhone face capture and with the many game engine facial animation systems that accept ARKit data.

FACS (Facial Action Coding System)

FACS is a scientific system that categorizes facial movement into Action Units (AUs), each corresponding to specific muscle contractions. Professional facial rigs often map blend shapes directly to FACS AUs, providing anatomically accurate facial animation. FACS-based rigs are particularly valuable for performance capture workflows where actor expressions need to be faithfully reproduced on digital characters.

Corrective Blend Shapes

One of the most valuable applications of blend shapes is corrective deformation. When a skeleton rig produces poor deformation at extreme joint angles (volume loss at the shoulder, pinching at the elbow), corrective blend shapes can fix these issues automatically. A corrective blend shape is sculpted to show the correct deformation at a specific pose and is driven by the joint rotation value. When the joint reaches that angle, the corrective shape activates and pushes the geometry into the correct position.
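A minimal sketch of such a driver, assuming a simple linear ramp between two angles. The function name corrective_weight and the 60 and 120 degree values are hypothetical; production rigs often use smoother falloff curves or pose-space interpolation instead.

```python
def corrective_weight(angle_deg: float, start_deg: float, full_deg: float) -> float:
    """Map a joint angle to a 0..1 corrective weight with a linear ramp.

    The shape is off at or below start_deg and fully on at or above full_deg.
    """
    if full_deg == start_deg:
        raise ValueError("start_deg and full_deg must differ")
    t = (angle_deg - start_deg) / (full_deg - start_deg)
    return max(0.0, min(1.0, t))


# Hypothetical elbow corrective: begins at 60 degrees of bend, fully on at 120.
elbow_weight = corrective_weight(90.0, 60.0, 120.0)  # halfway through the ramp
```

The result is fed to the corrective target as its blend weight, so the fix fades in as the joint approaches the problem pose.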

This approach is far more art-directable than trying to solve every deformation problem with bones alone. Corrective shapes are standard in film production and increasingly common in high-end game character rigs.

Body Blend Shapes

Beyond facial animation, blend shapes are used for body deformation effects including muscle flexion (biceps, pectorals, quadriceps tensing during exertion), breathing animation (chest and abdomen expansion), body type variation (adjusting a base mesh between muscular, slim, and heavy builds), and secondary motion cues. These body blend shapes can be driven by animation data, physics systems, or gameplay parameters.

Blend Shapes Across Applications

Maya: Blend Shapes

Maya calls them blend shapes. You create them using the Blend Shape deformer, which supports multiple target shapes, in-between targets, and combination targets. Maya's Shape Editor provides a visual interface for managing complex setups with dozens or hundreds of targets. Targets can be added from separate meshes or sculpted directly on the base mesh from the Shape Editor.

Blender: Shape Keys

Blender calls them shape keys. The basis shape key represents the default mesh state, and additional shape keys define target deformations. Shape keys can have driver expressions that automatically activate based on bone rotations or other parameters. Blender supports both relative shape keys (most common, blend from basis) and absolute shape keys (sequence-based, used for animation).

Unreal Engine: Morph Targets

Unreal Engine calls them morph targets. They are imported from FBX files and can be driven by Animation Blueprints, Anim Curves, or gameplay code. Unreal supports GPU-accelerated morph target evaluation for efficient real-time performance. Morph targets can be previewed and tested directly in the Morph Target Previewer panel.

Creating Blend Shapes from Sculpted Poses

The typical workflow for creating blend shapes involves duplicating the base mesh, sculpting the duplicate into the desired target shape, and registering it as a blend shape target. The topology and vertex count must remain identical between the base and all targets. Only vertex positions change. Sculpting tools in ZBrush, Mudbox, or Blender Sculpt mode are commonly used for this process. Artists focus on creating clean, isolated deformations that combine well with other targets.

Combining Multiple Blend Shapes

The power of blend shapes comes from combining them. A smile target plus a brow raise target plus a squint target creates a complex expression that would be extremely difficult to sculpt as a single shape. This additive nature means a relatively small set of well-designed blend shapes can produce an enormous range of expressions and deformations. However, some combinations can produce undesirable results (vertices stretching too far), which is where combination corrective shapes come in.
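The additive combination can be sketched as the base position plus a weighted sum of per-target deltas. The names combine_shapes, smile, and brow_raise below are illustrative assumptions, and each delta is stored as the target position minus the base position.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def combine_shapes(
    base: List[Vec3],
    deltas: Dict[str, List[Vec3]],
    weights: Dict[str, float],
) -> List[Vec3]:
    """Return base + sum(weight * delta) for every vertex."""
    result = [list(vertex) for vertex in base]
    for name, weight in weights.items():
        if weight == 0.0:
            continue  # inactive shapes contribute nothing
        for i, delta in enumerate(deltas[name]):
            for axis in range(3):
                result[i][axis] += weight * delta[axis]
    return [tuple(vertex) for vertex in result]


# One-vertex example with two hypothetical targets.
base = [(0.0, 0.0, 0.0)]
deltas = {
    "smile": [(1.0, 0.0, 0.0)],
    "brow_raise": [(0.0, 2.0, 0.0)],
}
expression = combine_shapes(base, deltas, {"smile": 1.0, "brow_raise": 0.5})
```

Because the contributions simply add, two shapes that move the same vertices in the same direction can overshoot, which is the stretching problem that combination correctives address.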

In-Between Shapes

In-between shapes (also called intermediate targets) define specific mesh states at partial blend weights. Instead of linearly interpolating between 0 and 1, you can define what the mesh should look like at weight 0.3 or 0.7. This gives artists precise control over the deformation path. For example, a mouth opening blend shape might need a specific lip curl at 50% that simple linear interpolation would not produce.
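One way to picture in-betweens is as extra keys along the weight axis, with linear interpolation between neighboring keys. The sketch below assumes that model; the function name blend_with_inbetweens and the key layout are hypothetical, and real tools may interpolate differently.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Key = Tuple[float, List[Vec3]]  # (weight, vertex positions at that weight)


def blend_with_inbetweens(keys: List[Key], weight: float) -> List[Vec3]:
    """Piecewise-linear blend through keys sorted by weight.

    keys must start at weight 0.0 (the base) and end at 1.0 (the full target).
    """
    w = max(0.0, min(1.0, weight))
    for (w0, p0), (w1, p1) in zip(keys, keys[1:]):
        if w0 <= w <= w1:
            t = (w - w0) / (w1 - w0)
            return [
                tuple(a + t * (b - a) for a, b in zip(va, vb))
                for va, vb in zip(p0, p1)
            ]
    raise ValueError("keys must cover the full 0..1 weight range")


# One vertex whose 50% in-between bulges out along z before settling back.
keys = [
    (0.0, [(0.0, 0.0, 0.0)]),   # base
    (0.5, [(0.0, 1.0, 1.0)]),   # in-between target
    (1.0, [(0.0, 2.0, 0.0)]),   # full target
]
quarter = blend_with_inbetweens(keys, 0.25)
three_quarters = blend_with_inbetweens(keys, 0.75)
```

Without the key at 0.5 the vertex would travel in a straight line from base to target and never leave the z = 0 plane.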

Performance Cost in Games

Each active blend shape adds computational cost because the engine must calculate vertex offsets for every affected vertex. A face rig with 52 blend shapes on a 10,000 vertex mesh means potentially 520,000 vertex offset calculations per frame. Modern GPUs handle this efficiently through compute shaders, but optimization still matters. Common strategies include limiting the number of simultaneously active blend shapes, using LOD to reduce blend shape complexity at distance, and compressing sparse blend shape data.

LOD and Blend Shapes

At a distance, subtle facial expressions and muscle deformations are invisible to the player. LOD systems can reduce blend shape cost by disabling blend shapes entirely at far distances, reducing the number of available targets at medium distances, and using simplified meshes with fewer affected vertices at each LOD level. This tiered approach ensures that blend shape quality is highest where it matters most: up close.

Exporting Blend Shapes in FBX

FBX is the standard format for transferring blend shapes between applications and into game engines. When exporting, ensure that blend shape names are clean and descriptive (engines will use these names), all targets are included in the export, animation curves driving blend shapes are exported alongside them, and the base mesh and targets share identical vertex counts. Both Unreal Engine and Unity import FBX blend shapes reliably, though naming conventions may need adjustment.
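A pre-export check along these lines can catch the most common problems before the file leaves the DCC tool. This is a sketch under assumptions: the function name validate_targets is hypothetical, it works on plain name and vertex-count data rather than a real FBX SDK, and the naming rule (letters, digits, underscores) is only one possible convention.

```python
import re
from typing import Dict, List

# Assumed naming convention: starts with a letter or underscore,
# then letters, digits, or underscores only.
NAME_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def validate_targets(base_vertex_count: int, targets: Dict[str, int]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    targets maps each blend shape name to its vertex count.
    """
    problems = []
    for name, vertex_count in targets.items():
        if not NAME_PATTERN.match(name):
            problems.append(f"{name!r}: name is not a clean identifier")
        if vertex_count != base_vertex_count:
            problems.append(
                f"{name!r}: has {vertex_count} vertices, base has {base_vertex_count}"
            )
    return problems


issues = validate_targets(100, {"jawOpen": 100, "jaw open": 100, "smile": 99})
```

Here the second target is flagged for the space in its name and the third for its mismatched vertex count.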

Blend Shapes and MoCap Facial Data

Facial motion capture data is typically delivered as blend shape weights. Performance capture systems (like those from Faceware, Dynamixyz, or Apple ARKit) track an actor's facial movement and output weight curves for each blend shape target. These curves are applied to the character's blend shape rig to reproduce the actor's performance. High-quality MoCap facial data combined with a well-built blend shape rig can produce highly realistic facial animation, which is why this combination is standard in AAA games and film production.
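Applying a captured curve at playback time amounts to sampling it at the current time. The sketch below assumes a curve stored as sorted (time, weight) keys with linear interpolation between them; the function name sample_curve and the jaw_open data are hypothetical, and capture tools may export other interpolation modes.

```python
from bisect import bisect_right
from typing import List, Tuple


def sample_curve(keys: List[Tuple[float, float]], time: float) -> float:
    """Sample a weight curve at the given time.

    keys is a list of (time_seconds, weight) sorted by time. Values are held
    constant before the first key and after the last key.
    """
    times = [t for t, _ in keys]
    if time <= times[0]:
        return keys[0][1]
    if time >= times[-1]:
        return keys[-1][1]
    i = bisect_right(times, time)  # index of the first key after `time`
    (t0, v0), (t1, v1) = keys[i - 1], keys[i]
    return v0 + (v1 - v0) * (time - t0) / (t1 - t0)


# Hypothetical jaw-open curve: closed, fully open at 0.5 s, closed again at 1 s.
jaw_open = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
weight_now = sample_curve(jaw_open, 0.25)
```

Each blend shape target has its own curve, and the sampled values become that frame's weights.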

Frequently Asked Questions

How many blend shapes does a typical game character need?

A character with full facial animation typically needs 50-70 blend shapes (52 for ARKit standard plus correctives). A character with only body corrective shapes might need 20-40. The total depends on the required expression range and quality bar. Mobile games may use as few as 15-20 for basic facial animation, while film characters can have 200 or more.

Can I use blend shapes and bones together on the same mesh?

Yes, and this is standard practice. The skeleton handles structural movement while blend shapes handle surface-level deformation, facial expressions, and corrective adjustments. Real-time engines typically add blend shape deltas to the mesh before skinning, so corrective shapes are authored relative to the bind pose and are then carried along by the bones. DCC tools such as Maya let you reorder the deformation stack if a rig needs shapes applied after skinning instead. Most game engines support this combined approach natively.

Do blend shapes work with motion capture data?

Absolutely. Facial motion capture systems specifically output blend shape weight curves. These curves are recorded at high frame rates (often 60 or 120 fps) and applied directly to the character's blend shape rig. This workflow is the industry standard for realistic facial animation in games and film. Body motion capture can also drive corrective blend shapes through pose-space deformation setups.

What is the performance difference between blend shapes and bones?

Bones are generally cheaper because vertex skinning is highly optimized on modern GPUs. Blend shapes add per-vertex offset calculations on top of skinning. However, for the quality of deformation blend shapes provide (especially for faces), the cost is usually well justified. The key is managing how many blend shapes are active simultaneously and using LOD to reduce cost at distance. On modern desktop and console hardware, a single hero character with 50 or more active blend shapes is generally not a bottleneck, though crowds and mobile targets need tighter budgets.

Blend Shape Performance Optimization Techniques

Blend shape evaluation scales linearly with the number of active targets and affected vertices. A character face with 50 blend shapes affecting 5,000 vertices evaluates 250,000 vertex deltas per frame when all shapes are active. Reducing either the number of simultaneous active shapes or the vertex count of the blend shape mesh directly improves performance. Many engines skip evaluation for blend shapes whose weight is zero or below a small threshold (0.01 is a commonly cited value), so authoring shapes that return cleanly to zero when not needed provides automatic culling.
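The cost model and the threshold culling can be sketched as follows. The function names and the shape_N naming are hypothetical, and the 0.01 default simply mirrors the figure quoted above rather than any specific engine's setting.

```python
from typing import Dict


def active_shapes(weights: Dict[str, float], threshold: float = 0.01) -> Dict[str, float]:
    """Keep only the shapes whose weight magnitude reaches the threshold."""
    return {name: w for name, w in weights.items() if abs(w) >= threshold}


def delta_evaluations(
    weights: Dict[str, float],
    affected_vertices: Dict[str, int],
    threshold: float = 0.01,
) -> int:
    """Count the per-frame vertex delta evaluations after culling."""
    return sum(affected_vertices[name] for name in active_shapes(weights, threshold))


# 50 shapes that each touch 5,000 vertices, as in the example above.
affected = {f"shape_{i}": 5000 for i in range(50)}
all_on = {name: 1.0 for name in affected}
# Same rig, but only the first 8 shapes are active this frame.
mostly_off = {name: (1.0 if i < 8 else 0.0) for i, name in enumerate(affected)}

full_cost = delta_evaluations(all_on, affected)        # 50 * 5000
culled_cost = delta_evaluations(mostly_off, affected)  # 8 * 5000
```

With only 8 of the 50 shapes active, the per-frame work drops from 250,000 delta evaluations to 40,000.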

Compressed blend shape storage reduces memory footprint without visible quality loss. Full-precision blend shapes store three 32-bit float values per vertex per target. For a 10,000-vertex mesh with 60 targets, that is 7.2 megabytes of position delta data. Sparse storage formats record only the vertices that actually move in each target, which can cut the delta data by roughly 80% (before a small per-vertex index overhead) since most facial expressions affect less than 20% of the face mesh. Unity and Unreal both support sparse blend shape formats natively.
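The arithmetic above, and the effect of sparse storage, can be checked with a short sketch. The layout assumed here (three 32-bit floats per delta plus one 32-bit vertex index per stored entry) is an illustrative assumption, not the on-disk format of any particular engine.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
BYTES_PER_FLOAT = 4


def dense_bytes(vertex_count: int, target_count: int) -> int:
    """Size of full-precision position deltas for every vertex of every target."""
    return vertex_count * target_count * 3 * BYTES_PER_FLOAT


def sparse_bytes(moved_vertices: int, target_count: int, index_bytes: int = 4) -> int:
    """Size when each target stores only moved vertices plus a vertex index."""
    return target_count * moved_vertices * (3 * BYTES_PER_FLOAT + index_bytes)


def to_sparse(deltas: List[Vec3], epsilon: float = 1e-6) -> List[Tuple[int, Vec3]]:
    """Keep (vertex_index, delta) pairs only for vertices that actually move."""
    return [
        (i, d) for i, d in enumerate(deltas)
        if any(abs(component) > epsilon for component in d)
    ]


dense = dense_bytes(10_000, 60)   # 7,200,000 bytes, i.e. 7.2 MB
sparse = sparse_bytes(2_000, 60)  # 20% of vertices moved per target
saving = 1.0 - sparse / dense     # fraction of memory saved
```

Under these assumptions the sparse layout needs 1,920,000 bytes, a saving of about 73%. The gap to the 80% figure is the index stored alongside each delta.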

LOD transitions for blend shapes prevent wasted evaluation on distant characters. At close range, the full blend shape set drives facial animation with per-frame updates. At medium distance, the system reduces to a subset of 10 to 15 primary shapes (jaw open, smile, blink, brow raise) updated every other frame. Beyond a distance threshold, blend shape evaluation stops entirely and the face holds a neutral expression. These LOD tiers are essential for scenes with many visible characters.
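The three tiers can be sketched as a small selection routine. The distance thresholds, the tier names, and the shape names below are hypothetical placeholders; real projects tune these per platform and per character.

```python
from typing import List, Set


def lod_tier(distance: float, near: float = 5.0, far: float = 20.0) -> str:
    """Pick a blend shape LOD tier from camera distance (assumed units: meters)."""
    if distance < near:
        return "full"
    if distance < far:
        return "reduced"
    return "off"


def shapes_for_tier(tier: str, all_shapes: List[str], primary: Set[str]) -> List[str]:
    """Full set up close, primary subset at medium range, nothing far away."""
    if tier == "full":
        return list(all_shapes)
    if tier == "reduced":
        return [name for name in all_shapes if name in primary]
    return []


def should_update(tier: str, frame_index: int) -> bool:
    """Update every frame up close, every other frame at medium range."""
    if tier == "full":
        return True
    if tier == "reduced":
        return frame_index % 2 == 0
    return False


all_shapes = ["jawOpen", "smile", "blink", "browRaise", "cheekPuff", "noseSneer"]
primary = {"jawOpen", "smile", "blink", "browRaise"}
medium = shapes_for_tier(lod_tier(12.0), all_shapes, primary)
```

On frames where should_update returns False for the reduced tier, the previous frame's result is reused. In the off tier the face stays at its neutral pose.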

GPU-accelerated blend shape evaluation moves the vertex delta computation from CPU to compute shaders. CPU evaluation often processes blend shapes sequentially on one core per mesh, creating a bottleneck when many characters animate simultaneously. GPU evaluation parallelizes across thousands of shader cores and can batch the vertices of many characters into a small number of dispatch calls. This approach requires storing blend shape deltas in structured buffers and outputting modified vertex positions to a vertex buffer that the rendering pipeline consumes directly.

Corrective blend shapes also fix deformation artifacts that occur when multiple shapes combine. A smile shape and a jaw-open shape may individually look correct, but combining them produces an unnatural result because the corner of the mouth stretches beyond its natural range. A corrective shape activates when both smile and jaw-open exceed a combined threshold, pulling the mouth corner back to an anatomically correct position. Authoring corrective shapes is tedious but essential for high-quality facial animation. AAA games typically include 20 to 30 corrective shapes per face rig.
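A common way to drive a combination corrective is to multiply the two driver weights, so the corrective is only significant when both shapes are strongly active. The sketch below assumes that product rule with an optional activation threshold; the function name combo_weight and the threshold value are hypothetical, and some rigs use the minimum of the two weights instead.

```python
def combo_weight(smile: float, jaw_open: float, threshold: float = 0.0) -> float:
    """Weight for a smile + jaw-open corrective.

    Uses the product of the two clamped driver weights. Below `threshold`
    the corrective stays off; above it, the product is used directly.
    """
    s = max(0.0, min(1.0, smile))
    j = max(0.0, min(1.0, jaw_open))
    product = s * j
    return product if product > threshold else 0.0


both_full = combo_weight(1.0, 1.0)        # corrective fully on
only_smile = combo_weight(1.0, 0.0)       # corrective off
partial = combo_weight(0.5, 0.5)          # product of the two weights
gated = combo_weight(0.5, 0.5, threshold=0.3)  # below the gate, so off
```

The resulting value is applied to the corrective target like any other blend weight, on top of the two shapes that triggered it.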

Procedural blend shape generation from machine learning models is an emerging technique that creates new expressions by interpolating between captured shapes. Rather than hand-sculpting every possible expression, an ML model trained on facial scan data can generate intermediate expressions that blend naturally between authored targets. This approach significantly reduces the artist workload for creating comprehensive facial rigs while maintaining the quality standard expected in modern games.

The choice between blend shapes and bone-based facial animation affects every downstream decision in the character pipeline. Blend shapes offer artistic precision because each expression is hand-sculpted exactly as the artist intends, but they consume more memory and scale poorly when many face variations are needed. Bone-based facial rigs use a skeleton hierarchy to deform the face mesh, offering compact data representation and easy runtime manipulation but requiring careful weight painting to avoid unnatural deformations.

Most AAA productions use a hybrid approach where primary expressions are driven by bones for efficient runtime evaluation while corrective blend shapes handle the subtle deformation fixes that bones alone cannot achieve. Indie developers typically choose blend shapes for simplicity since the workflow is more intuitive and modern engines handle the performance overhead efficiently for single-character close-up shots during dialogue sequences.

The critical decision factor is how many unique faces the project requires, as blend shape data multiplies linearly with character count while bone-based rigs share the same skeleton definition across all characters.