How do virtual reality and augmented reality worlds tell us where we are within a given space? They do it through virtual camera projection. Virtual cameras provide perspective by capturing objects, backgrounds, and scenes from different angles and positions. Through this process, virtual cameras help users feel as though they are moving around in 3D space.
Virtual camera systems tell your game engine what parts of your 3D scene to render to your screen or to your headset.
When you take a picture with a digital camera, it captures visual data within its “field of view”. The real world gets “projected” onto a two-dimensional photo or video, stored as a grid of pixels.
As the camera lens moves around, it captures different things in its field of view: different objects, backgrounds, and scenes, from different positions and angles. Objects that are farther away look smaller; objects up close look larger. This is called “perspective”.
Similarly, virtual cameras have a position and rotation in 3D space, as well as a field of view. Together, these define a pyramid-like volume called a frustum. Everything inside the frustum gets rendered into an image to be displayed on a screen or headset; anything out of view is not rendered.
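In a virtual camera, the perspective effect described above comes from dividing a point’s offsets by its depth. Here is a minimal sketch in Python, using the common convention that the camera looks down the negative z-axis; the function name and focal length are illustrative, not any particular engine’s API:

```python
import numpy as np

# Perspective in one step: a point's horizontal and vertical offsets are
# divided by its distance from the camera, so farther objects land closer
# to the center of the image and therefore look smaller.
def project(point_cam, focal_length=1.0):
    x, y, z = point_cam  # camera space; the camera looks down -z
    return np.array([focal_length * x / -z, focal_length * y / -z])

# The same 1-unit offset shrinks on screen as the object moves away:
print(project(np.array([1.0, 0.0, -2.0])))   # [0.5 0. ]
print(project(np.array([1.0, 0.0, -10.0])))  # [0.1 0. ]
```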
In addition to where the camera is pointing, two “clipping planes” determine how close and how far the camera can see: the “near clipping plane” and the “far clipping plane”. Any object beyond the far clipping plane is considered too far away to be worth rendering, so the game engine simply skips it to save computational resources.
Objects closer than the “near clipping plane” are also not rendered, because they are too close to the camera and can cause significant discomfort, especially in VR.
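As a rough sketch of the idea, an object is only worth drawing if its depth falls between the two planes. Real engines perform this test per-vertex in clip space rather than per-object, and the `NEAR` and `FAR` values below are just example settings:

```python
NEAR = 0.1    # near clipping plane distance (example value)
FAR = 1000.0  # far clipping plane distance (example value)

def should_render(depth):
    """depth: an object's distance from the camera along its view direction."""
    return NEAR <= depth <= FAR

print(should_render(0.05))    # False: closer than the near plane
print(should_render(50.0))    # True: inside the clipping range
print(should_render(5000.0))  # False: beyond the far plane
```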
The clipping planes, together with the camera’s position, rotation, and field of view, determine the frustum that describes the camera’s perspective. In most game engines, you can visualize the frustum to preview exactly what the virtual camera will capture in the virtual scene. Everything inside the frustum is rendered by the graphics processing unit (GPU) as the foreground. The GPU applies all of the right materials and lighting to make your virtual scene look the way it should on its way to becoming screen pixels for your displays.
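Those four ingredients (field of view, aspect ratio, and the two clipping planes) are enough to build a projection matrix and test whether a point falls inside the frustum. Below is a minimal sketch using the standard OpenGL-style matrix; the function names and parameter values are illustrative:

```python
import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    # Standard OpenGL-style projection matrix built from the field of
    # view, the viewport's aspect ratio, and the two clipping planes.
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0, 0, -1, 0],
    ])

def inside_frustum(point_cam, proj):
    # A camera-space point is inside the frustum if its clip-space
    # coordinates satisfy -w <= x, y, z <= w after projection.
    x, y, z, w = proj @ np.append(point_cam, 1.0)
    return all(-w <= c <= w for c in (x, y, z))

proj = perspective(fov_y_deg=60, aspect=16 / 9, near=0.1, far=100.0)
print(inside_frustum(np.array([0.0, 0.0, -10.0]), proj))   # True: in view
print(inside_frustum(np.array([0.0, 0.0, -500.0]), proj))  # False: past the far plane
print(inside_frustum(np.array([50.0, 0.0, -10.0]), proj))  # False: outside the side planes
```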
The background for the camera is usually filled in with a skybox, a flat image that wraps around the user at all angles. Just as a physical landscape fills in the backdrop for physical cameras, a virtual landscape of skies, buildings, mountains, or other faraway objects can serve as the skybox for virtual cameras.
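One common way to wrap a flat image around the viewer is an equirectangular skybox, where each viewing direction maps to a pixel in the image. Here is a minimal sketch of that mapping; the function name and coordinate conventions are illustrative, not a specific engine’s API:

```python
import numpy as np

def skybox_uv(direction):
    # Map a unit view direction to (u, v) coordinates in an
    # equirectangular skybox image, so the flat image appears wrapped
    # around the viewer at every angle.
    x, y, z = direction / np.linalg.norm(direction)
    u = 0.5 + np.arctan2(x, -z) / (2 * np.pi)  # horizontal angle -> [0, 1]
    v = 0.5 - np.arcsin(y) / np.pi             # vertical angle -> [0, 1]
    return u, v

print(skybox_uv(np.array([0.0, 0.0, -1.0])))  # looking straight ahead -> (0.5, 0.5)
print(skybox_uv(np.array([0.0, 1.0, 0.0])))   # looking straight up -> (0.5, 0.0)
```

Because the skybox is treated as infinitely far away, only the camera’s rotation, not its position, affects which part of the image is visible.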
By updating what the virtual camera sees and compositing the foreground and background in real time, virtual camera projection creates the illusion that users are moving around in 3D space, whether on desktop, in augmented reality, or in virtual reality.