Volumetric Video Capture: Where We Are and Where It's Going
If you’ve spent any time in VR or AR in the past year, you’ve probably encountered volumetric video—even if you didn’t know what to call it. It’s the technology that captures real people and environments as three-dimensional objects that you can view from any angle, walk around, and interact with in immersive space.
Unlike traditional 360-degree video, which captures a scene from a fixed point, volumetric video captures the full three-dimensional shape and appearance of subjects. The result is content that feels fundamentally different from anything else in immersive media. A performer isn’t a flat image projected onto a sphere. They’re a three-dimensional presence in your space.
The technology has improved dramatically in the past two years. But it’s still far from mainstream, and understanding why requires looking at the production pipeline, the economics, and the distribution challenges.
How It Works
Volumetric capture typically involves surrounding a subject with an array of cameras—often 50 to 100 or more—that simultaneously record from every angle. Software then processes these multiple viewpoints to reconstruct the subject as a 3D mesh with texture (colour and detail) applied to its surface.
The result is a sequence of 3D models that can be played back as video in any immersive environment. The viewer can move freely around the captured subject, seeing it from any angle, at any distance.
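To make the reconstruction step concrete: recovering a surface point's 3D position from its 2D projections in two calibrated cameras is a small least-squares problem (triangulation via the Direct Linear Transform). Real pipelines do this densely across dozens of views, but the geometry is the same. A toy NumPy sketch, with made-up camera matrices and a made-up point:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Recover a 3D point from its 2D projections in two cameras via the
    Direct Linear Transform: stack the linear constraints x * P[2] - P[row]
    and take the least-squares null vector."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenise

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras: one at the origin, one translated 1 unit along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.3, -0.2, 4.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.round(X_est, 6))  # recovers the original point
```

With noise-free projections the recovery is exact; with real images, the residuals from many such constraints across many cameras are what the reconstruction software is minimising.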
There are several technical approaches to volumetric capture.
Studio-based multi-camera rigs use arrays of synchronised cameras (RGB, depth, or both) in a controlled environment. Microsoft's Mixed Reality Capture Studios and Dimension Studios (UK) operate dedicated volumetric capture facilities. The quality is high but the cost is substantial—studio time typically runs $10,000-$50,000 per day.
LiDAR-based capture uses laser scanning technology to build point cloud representations of subjects. iPhone and iPad Pro devices with LiDAR sensors have made basic volumetric capture accessible to anyone, though the quality gap between consumer and professional capture is enormous.
Neural radiance fields (NeRFs) and Gaussian splatting represent newer approaches that use AI to reconstruct 3D scenes from a smaller number of camera viewpoints. These techniques have progressed remarkably quickly and are reducing the hardware requirements for acceptable-quality volumetric content.
What’s Changed in 2026
The most significant development in the past year has been the improvement of AI-based reconstruction methods.
Gaussian splatting, which produces 3D representations using collections of 3D Gaussian functions rather than traditional polygon meshes, has become the preferred method for many volumetric content producers. The rendering quality is excellent, the file sizes are more manageable than mesh-based formats, and the processing pipeline is faster.
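The core rendering idea is simple enough to sketch: each Gaussian is projected to the image, sorted by depth, and alpha-composited front to back. A minimal single-pixel version (the splat values here are invented, and real rasterisers add tiling and GPU parallelism):

```python
import numpy as np

# Each splat: projected 2D centre, inverse covariance, colour, opacity, depth
splats = [
    dict(mu=np.array([4.0, 4.0]), inv_cov=np.eye(2) / 2.0,
         colour=np.array([1.0, 0.0, 0.0]), alpha=0.8, depth=1.0),
    dict(mu=np.array([5.0, 5.0]), inv_cov=np.eye(2) / 4.0,
         colour=np.array([0.0, 0.0, 1.0]), alpha=0.6, depth=2.0),
]

def shade_pixel(px, splats):
    """Front-to-back alpha compositing of depth-sorted Gaussian splats:
    each splat contributes its colour weighted by its Gaussian falloff,
    attenuated by the transmittance left over from nearer splats."""
    colour = np.zeros(3)
    transmittance = 1.0
    for s in sorted(splats, key=lambda s: s["depth"]):
        d = px - s["mu"]
        weight = s["alpha"] * np.exp(-0.5 * d @ s["inv_cov"] @ d)
        colour += transmittance * weight * s["colour"]
        transmittance *= 1.0 - weight
    return colour

print(shade_pixel(np.array([4.5, 4.5]), splats))
```

Because the scene is just a flat list of Gaussian parameters rather than a connected mesh, it compresses and streams more gracefully, which is part of why producers have adopted it.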
Several startups have built commercial tools around these techniques. Luma AI offers cloud-based processing that can turn a set of photos or a video into a viewable 3D scene. Polycam and similar apps have added volumetric capture features accessible to non-technical users.
The quality from a handful of consumer cameras is nowhere near studio-grade volumetric capture. But it’s good enough for many applications—virtual property tours, product visualisation, event documentation, and social sharing.
Professional studios have also benefited. Studios that previously needed 100 cameras can now achieve comparable results with 30-40 cameras plus AI reconstruction, which reduces both capital costs and physical space requirements. AI applied to the processing pipeline is also helping studios cut the time between capture and delivery from weeks to days.
Production Economics
The cost structure of volumetric video production remains a significant barrier to mainstream adoption.
Capture costs. Professional studio capture runs $10,000-$50,000 per day depending on the facility and the complexity of the shoot. A one-minute volumetric performance might take half a day of studio time, plus preparation and reset time.
Processing costs. Converting raw capture data into a deliverable 3D asset requires substantial compute. Cloud processing has reduced the capital expenditure, but processing a single minute of high-quality volumetric video still costs $1,000-$5,000 in compute time.
Storage and bandwidth. Volumetric video files are large. A minute of high-quality volumetric content can be 500MB to several gigabytes, depending on the format and compression. Streaming this to end users requires substantial bandwidth and CDN capacity.
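The bandwidth implication is worth making explicit. Taking a mid-range figure from the numbers above, one gigabyte per minute:

```python
# Back-of-envelope: what bitrate does a 1 GiB-per-minute volumetric clip imply?
size_bytes = 1 * 1024**3   # 1 GiB for one minute of content
duration_s = 60
bitrate_mbps = size_bytes * 8 / duration_s / 1e6
print(f"{bitrate_mbps:.0f} Mbps")  # ~143 Mbps, vs roughly 15-25 Mbps for 4K video
```

That is roughly an order of magnitude more than a typical 4K video stream, which is why compression formats and CDN support matter so much for this medium.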
Compare this to traditional video production: a well-equipped video crew can capture broadcast-quality content for $5,000-$15,000 per day, and the resulting files are easily distributed through existing infrastructure.
The economics are improving. AI-based reconstruction reduces both capture and processing costs. Better compression formats reduce storage and bandwidth requirements. But the gap between volumetric and traditional video production costs remains wide.
Applications Making Progress
Despite the cost challenges, several application areas are finding sustainable models for volumetric video.
Performance capture for gaming and VFX. Film and game studios use volumetric capture to create realistic character animations and digital doubles. The production budgets in these industries can absorb the costs, and the quality requirements justify the investment.
Heritage and cultural preservation. Museums, cultural institutions, and heritage organisations are using volumetric capture to create permanent 3D records of performers, elders, and cultural practices. The Australian Institute of Aboriginal and Torres Strait Islander Studies has explored volumetric capture for preserving cultural knowledge.
Live events. Experimental broadcasts of live performances using volumetric capture allow remote audiences to view performers from any angle. The technical challenges are substantial, since processing and streaming volumetric data in real time is computationally demanding, but the experience is compelling.
Medical and scientific visualisation. Capturing surgical procedures, anatomical demonstrations, and scientific experiments in volumetric format creates training resources that flat video can’t match.
Distribution Challenges
Capturing volumetric video is only half the problem. Getting it to end users is the other half, and it’s arguably harder.
There’s no universal format for volumetric video. Different playback environments—Meta Quest, Apple Vision Pro, web browsers, game engines—support different formats and have different performance constraints. Content producers often need to create multiple versions of the same asset for different platforms.
Streaming infrastructure for volumetric content is immature. Netflix, YouTube, and other platforms have spent decades optimising video compression and delivery. Volumetric streaming is years behind in terms of compression efficiency, adaptive bitrate algorithms, and CDN support.
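The adaptive-bitrate machinery those platforms take for granted is conceptually simple, which makes its absence in volumetric stacks conspicuous. A toy rung-selection function (the ladder values are hypothetical):

```python
def pick_rung(ladder_mbps, measured_mbps, headroom=0.8):
    """Pick the highest-bitrate encode that fits within a safety margin of
    measured throughput: the basic move in adaptive bitrate streaming,
    which volumetric delivery stacks are only beginning to replicate."""
    affordable = [r for r in sorted(ladder_mbps) if r <= measured_mbps * headroom]
    return affordable[-1] if affordable else min(ladder_mbps)

# A hypothetical ladder of volumetric encodes (Mbps) on a 100 Mbps link
print(pick_rung([20, 50, 120, 250], 100))  # → 50
```

The hard part for volumetric content isn't the selection logic but producing the ladder itself: re-encoding meshes or splat sets at multiple quality levels is far less mature than video transcoding.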
The installed base of devices capable of high-quality volumetric playback is still small. Apple Vision Pro and Meta Quest 3 can render volumetric content well, but these devices have limited market penetration compared to smartphones and televisions.
Where This Is Heading
I think volumetric video will follow a similar adoption curve to 3D printing: rapid improvement in the technology, gradual reduction in costs, and eventual mainstream adoption in specific use cases rather than as a universal replacement for existing formats.
Within five years, expect AI-based capture methods to bring professional-quality volumetric video within reach of small production companies. Expect format standardisation to improve through industry groups and platform convergence. Expect the first genuine consumer-facing volumetric video platforms to emerge.
But don’t expect volumetric video to replace traditional video. It won’t. It’s a different medium suited to different applications. The most exciting work will happen when creators start treating it as its own form rather than a 3D version of flat video.