Stereoscopic volume perception

Over the past five years, technological improvements have made it possible to capture, modify and present technically sound stereoscopic images. With devices that capture high-resolution digital pictures and robust computer vision algorithms in postproduction, we can now produce high-quality live-action stereoscopic content, at least from a technical point of view.

This post takes an analytical look at a question that belongs to the artistic side of stereoscopic cinema: when does a ball look flat like a frisbee, and when does it appear elongated like an egg?

To analyze volume perception we introduce a simple but powerful basic concept:

A 1x1x1 meter cube is swept virtually through space, away from the observer’s eyes to infinity. For each distance we measure two angles on the observer’s retina: the angle subtended by the cube’s width, and the angle produced by its depth, which is the retinal disparity that elicits stereopsis. Plotting the ratio of these two angles against distance gives a sweep through space (width to depth ratio vs. distance).

The same test setup is then duplicated. Instead of the eyes we have two cameras connected to a virtual cinema where our observer sits. Again we sweep our cube through space and measure the width and depth angles.

Having established this basic concept we then modify the parameters focal length and interaxial distance and see how they affect the »width to depth« ratio. During evaluation we find new dependencies between all parameters.

Moreover, new measurement parameters are presented that simplify working with depth volume in stereoscopic photography.

1. The basic concept of the cube

To explore stereoscopic volume perception we imagine a cube that is aligned on the axis of the left eye (figure 1.1). The cube is moved from the left eyeball to infinity. Although this approach causes minor errors when the cube is very close, it simplifies the math for distances that are a multiple of the cube’s width. As absolute parallax occurs in only one eye, the calculations have to be done only once. For later equations we scale the cube down to improve our results.

One binocular depth cue for the observer is the disparity caused by the upper right and lower right corners of the cube. This retinal disparity can be measured as the angle ∢δ.

To put this absolute angle into relation, we compare it with the angle ∢ω that the width of the cube subtends on the retina. The ratio δ/ω then gives us an idea of the volume while the cube moves through space.

Figure 1.1

The theoretical concept of a cube sweeping through space. This sweep or chirp will give us a better understanding of how we perceive volume.

The following equations are used to compute plot 1.1.

  • pd = pupil (interocular) distance of the observer (65mm)
  • d = distance to the first surface of the cube
  • width = width of the cube (equal to the height and the depth of the cube)

, (1.1)

, (1.2)

, (1.3)

 , (1.4)

, (1.5)

 , (1.6)

As we can see in plot 1.1, the depth of the cube decreases in inverse proportion to its distance from the observer.

Plot 1.1

The x-axis shows the distance of the cube to the eyeball. The y-axis is the ratio δ/ω. 

The cube has neither constant nor linear depth volume.

The cube’s width to depth ratio is affected by the distance to the observer.

A cube one meter away has a greater depth to width ratio than the same cube several meters away. Because the relation to distance is reciprocal, the volume decreases sharply within the first few meters. Our cognitive system compensates for this inverse relation, so the perceived volume of »objects we know« stays constant. Objects must follow this curve to appear consistent while traveling through space. Similar behavior of the human cognitive process can be seen in size constancy: objects that come closer to the observer change their image size on the observer’s retina, while their perceived size stays constant.
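The sweep behind plot 1.1 can be approximated numerically. The following is a minimal sketch, not the original calculation. It assumes that the cube’s right-hand face lies on the left eye’s line of sight (so that all disparity arises in the right eye), that ω is measured from the left eye, and that the cube has been scaled down to 1 cm. All function and parameter names are illustrative.

```python
import math


def eye_angles(d, width=0.01, pd=0.065):
    """Return (delta, omega) in radians for a cube whose front face is d meters away.

    Assumed top-view geometry: left eye at x=0, right eye at x=pd, and the
    cube's right-hand face on the line x=0, spanning depths d to d+width.
    """
    # Disparity: angle in the right eye between the near and far right corners.
    delta = math.atan(pd / d) - math.atan(pd / (d + width))
    # Width: angle in the left eye subtended by the cube's front face.
    omega = math.atan(width / d)
    return delta, omega


def depth_to_width_ratio(d, width=0.01, pd=0.065):
    """Ratio delta/omega, the quantity plotted against distance."""
    delta, omega = eye_angles(d, width, pd)
    return delta / omega


if __name__ == "__main__":
    for d in (1, 2, 5, 10, 20):
        print(f"{d:>3} m  ratio = {depth_to_width_ratio(d):.4f}")
```

Under these assumptions the ratio approaches pd/d for a small cube, which reproduces the reciprocal shape of the curve.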

2. The cube model in stereoscopic photography

The same setup is used for the next step. Only the observer’s eyes are replaced by two cameras.

Imagine the cameras are connected to a cinema theater, in which the observer sits. Again we examine the ratio »width to depth« on the observer’s retina.

Figure 2.1

The eyes are replaced by two cameras in this setup.

2.1 Basic setup – ortho-stereoscopic image

For the first setup we use the following parameters:

Camera

  • Sensor width = 24mm (roughly an S35mm film back)
  • Sensor horizontal resolution = 2048 pixels
  • Focal length = 29mm
  • Interaxial distance = 65mm

Cinema

The observer has an angle of view of 45°. This is the arithmetic average of the theoretical sweet spot for a 2048 pixel image resolution (36°, the last row of a standard cinema) and the middle of a cinema, where the distance to the screen equals the screen width (53°). This seat is placed in the second half of a standard cinema and is therefore likely to represent a good average. The whole setup is independent of screen size: for every screen size, one pixel of relative parallax causes about 1.5 minutes of arc of disparity in the observer’s eye.

After the projection of the three-dimensional cube onto a two dimensional image plane, the depth information lives as a relative parallax between the two images. The depth in the stereo image is the projection of the top right and bottom right corner (T: red line figure 2.1). In our setup the left camera captures no parallax, because it does not see the right top corner. All the absolute parallax is created in the right camera.

The two-dimensional projection of the cube’s width on the camera’s image plane (in pixels) can be computed by equation 2.2. The two-dimensional projection of the cube’s depth on the camera’s image plane (in pixels) can be computed by equation 2.3. The final ratio on the observer’s retina is shown in equation 2.4.

  • T = real projection dimension of the cube’s depth in world space
  • hSensorPx = horizontal spatial image resolution of the sensor
  • sensorWidth = sensor width
  • widthProjectionPx = dimension of the cube’s width on the image sensor (in pixel)
  • depthProjectionPx = dimension of the cube’s depth on the image sensor (in pixel)

,(2.1)

,(2.2)

,(2.3)

, (2.4)

The ratio of relative parallax on the screen to disparity in the eye is not important, as it cancels out in the quotient width/depth. The plot of this »width to depth« ratio is exactly the same as plot 1.1. This way of capturing and screening is called ortho-stereoscopy: the angle of view at capture (set by the focal length) equals the angle of view at the screening, and the interaxial distance equals the interocular distance.
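The projection step can be sketched with a simple pinhole model. This is an illustrative reconstruction under stated assumptions, not the original equations 2.1 to 2.4: parallel cameras, the cube’s right-hand face on the left camera’s optical axis, and a cube scaled down to 1 cm. The Python names are hypothetical; the comments map them to the quantities defined above.

```python
def projections_px(d, width=0.01, focal=29.0, interaxial=65.0,
                   sensor_width=24.0, h_sensor_px=2048):
    """Return (widthProjectionPx, depthProjectionPx) for a cube d meters away.

    focal, interaxial and sensor_width are in millimeters; d and width are
    in meters. Assumes parallel pinhole cameras.
    """
    px_per_mm = h_sensor_px / sensor_width
    ia_m = interaxial / 1000.0
    # T: the cube's depth as seen from the right camera, projected onto the
    # plane of the cube's front face (similar triangles).
    t = ia_m * width / (d + width)
    # Pinhole projection: size on sensor = focal * (object size / distance).
    width_px = focal * (width / d) * px_per_mm
    depth_px = focal * (t / d) * px_per_mm
    return width_px, depth_px


def camera_ratio(d, **kwargs):
    """depthProjectionPx / widthProjectionPx for the given setup."""
    width_px, depth_px = projections_px(d, **kwargs)
    return depth_px / width_px


if __name__ == "__main__":
    for d in (1, 2, 5, 10):
        print(f"{d:>3} m  ratio = {camera_ratio(d):.4f}")
```

In this sketch the focal length, sensor width and pixel count appear in both projections and cancel in the ratio, while the interaxial distance enters only through T.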

2.2 Changing one parameter

If we choose a setup that corresponds to our human factors, we get a natural reproduction of the depth volume. In cinematographic work, however, these camera parameters are rarely used. Limits such as binocular rivalry or diplopia usually prohibit them, due to the mismatch of »camera to object« vs. »observer to screen« distance.

Let us analyze the effects of changing these parameters.

2.2.1 Changing the interaxial distance

How does the interaxial distance affect volume perception? Plot 2.1 shows the resulting depth change.

Plot 2.1

The x-axis shows the cube’s distance to the camera rig. The y-axis shows the ratio depthProjectionPx/widthProjectionPx (equation 2.4). The colored lines represent the different interaxial distances of the camera rig. The dark blue curve in the middle represents the ortho-stereoscopic volume.

Increasing the interaxial distance increases the »depth to width« ratio, and with it the perceived volume. The depth volume is directly proportional to the interaxial distance. The mathematical expression is shown below. Equation 2.5 is the first fundamental depth volume expression.

, (2.5)

The y=1/x shape of the curve is maintained, as seen visually in plot 2.1 and described mathematically in equation 2.5. We can compare a change of interaxial distance with a gain applied to the curve: doubling the interaxial distance doubles the depth volume.
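The gain behavior can be illustrated with a short sketch. It assumes the small-cube limit, in which the ratio reduces to the interaxial distance divided by the object distance. This assumption is consistent with the value of 0.013 quoted in chapter 4 for 5 meters and 65mm, but it is not necessarily the exact form of equation 2.5. The function name is hypothetical.

```python
def depth_volume(d_m, interaxial_m=0.065):
    """Approximate depth-to-width ratio of a small cube d_m meters away.

    Assumed small-cube form: interaxial / distance (both in meters).
    """
    return interaxial_m / d_m


if __name__ == "__main__":
    for ia in (0.0325, 0.065, 0.130):
        values = ", ".join(f"{depth_volume(d, ia):.4f}" for d in (1, 2, 5, 10))
        print(f"interaxial {ia * 1000:.1f} mm: {values}")
```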

2.2.2 Changing focal length

Does focal length behave in a similar way? Plot 2.2 shows different focal lengths with a constant interaxial distance (65mm).

Plot 2.2

The x-axis shows the cube’s distance to the camera rig. The y-axis shows the ratio depthProjectionPx/widthProjectionPx (equation 2.4). The colored lines represent the different focal lengths of the camera rig. All of them fall on the same curve.

Surprisingly, the depth plot is not altered by different focal lengths. The change in magnification affects depthProjectionPx and widthProjectionPx in the same way (compare equations 2.2 and 2.3). As the depth plot is the quotient »depth divided by width«, the focal length cancels out and affects neither the depth plot nor the depth volume.

However, this result does not correspond to our daily experience in stereoscopic cinematography. It is well known that long lenses produce a strongly reduced volume perception (cardboard effect). For this reason we have to examine focal length in greater detail.

2.3 Focal length and perspective

Section 2.2.2 is mathematically and analytically correct, but it does not represent the typical use of different focal lengths when framing a picture. Normally a change of focal length comes along with a change of distance to an object. We have to take this change of distance into account if we want to formulate a meaningful statement.

Example:

We frame a picture and our object of interest is 3 meters away. When we increase our focal length by a factor of 2, we also have to change the distance to 6 meters to achieve the same size of our object of interest within the picture. All other objects will have different sizes in the two pictures. See figure 2.2.

Figure 2.2

The same scene captured with different focal lengths. In both pictures the white guitar on the left has the same image size. To achieve this the distance from camera to guitar was altered while changing the focal length. All other objects have different image sizes.

2.3.1 Natural focal length

Which focal length produces natural looking pictures? This focal length is determined by the angle of view of the observer in the cinema. If the angle of view at capture and the angle of view at the screening are equal, the picture will have a natural appearance. This does not necessarily match the normal focal length known in photography. If we have a smaller angle of view (greater focal length), the objects will appear closer than they were when shot (telephoto), and vice versa.

So the appearance of distance changes; and we know (plot 1.1) that the distance to the object alters the volume significantly.

For our setup the natural focal length is 29mm (the focal length that matches the predefined observer’s angle of view).
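The 29mm figure follows from the standard relation between sensor width, focal length and horizontal angle of view. A minimal sketch, with illustrative function names:

```python
import math


def natural_focal_length(sensor_width_mm, observer_fov_deg):
    """Focal length whose horizontal angle of view equals the observer's."""
    half_angle = math.radians(observer_fov_deg) / 2.0
    return sensor_width_mm / (2.0 * math.tan(half_angle))


def angle_of_view_deg(sensor_width_mm, focal_mm):
    """Horizontal angle of view of a lens on a given sensor width."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_mm)))


if __name__ == "__main__":
    print(f"natural focal length: {natural_focal_length(24.0, 45.0):.2f} mm")
    print(f"angle of view at 29 mm: {angle_of_view_deg(24.0, 29.0):.2f} deg")
```

For the 24mm sensor and the 45° observer used here, this gives a focal length just under 29mm.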

2.3.2 Object of interest and focal length compensation

We start with an example:

We have framed a picture with a focal length twice the natural length (≈ 60mm in our case) and the object of interest is 6 meters away. We could have achieved the same object size at 3 meters distance using a natural focal length (29mm). With 60mm focal length we are 3 meters farther away from the scene (all objects) than with natural focal length.

If we used a 60mm lens and our object of interest was 10 meters away, we would be 5 meters farther away from all objects. So the focal length compensation depends on the distance to the object of interest. Equation 2.6 shows this relation.

  • focalCompensate = focal compensation described above
  • distObjectOfIntend = distance to the object of interest
  • angleOfViewCamera = camera’s angle of view
  • fieldOfViewObserver = observer’s angle of view

, (2.6)
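The two examples above (3 meters at 6 meters distance, 5 meters at 10 meters distance) can be reproduced with the following sketch. The formula is an assumption chosen to match those examples: the object keeps its image size when the distance scales with the ratio of the half-angle tangents. It is not necessarily the exact form of equation 2.6, and the helper aov_deg is hypothetical.

```python
import math


def aov_deg(sensor_width_mm, focal_mm):
    """Horizontal angle of view in degrees (helper for the example)."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_mm)))


def focal_compensation(dist_object_of_interest, aov_camera_deg, fov_observer_deg):
    """Extra camera distance (same unit as the input distance) compared
    with a shot at the natural focal length that frames the object of
    interest at the same image size.

    Positive for lenses longer than natural, negative for shorter ones.
    """
    cam = math.tan(math.radians(aov_camera_deg) / 2.0)
    obs = math.tan(math.radians(fov_observer_deg) / 2.0)
    return dist_object_of_interest * (1.0 - cam / obs)


if __name__ == "__main__":
    camera = aov_deg(24.0, 58.0)    # lens with twice the natural focal length
    observer = aov_deg(24.0, 29.0)  # natural angle of view (about 45 degrees)
    for dist in (6.0, 10.0):
        comp = focal_compensation(dist, camera, observer)
        print(f"object at {dist} m: compensation = {comp:.2f} m")
```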

3. Depth Budget

To determine a meaningful object of interest, we briefly examine the concepts of stereoscopic depth budget and screen plane. One possible stereoscopic strategy (among others) is to allow a certain amount of relative parallax between the farthest object in the scene and the screen plane (= no absolute parallax). The stereographer defines this depth budget, usually in pixels or as a percentage of the horizontal image dimension. The depth budget then determines where the screen plane lies relative to the farthest object for the given camera parameters (focal length and interaxial distance). Equation 3.1 shows the mathematical expression used to calculate the screen plane distance from the camera with the given parameters and a far point at infinity.

, (3.1)
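A minimal sketch of such a calculation, assuming parallel pinhole cameras and a far point at infinity, so that the relative parallax between infinity and a point at distance d is focal length times interaxial distance divided by d on the sensor. This is an assumed form, not necessarily identical to equation 3.1, and the names are illustrative.

```python
def screen_plane_distance_m(depth_budget_px, focal_mm=29.0, interaxial_mm=65.0,
                            sensor_width_mm=24.0, h_sensor_px=2048):
    """Distance (meters) at which the relative parallax against infinity
    equals the depth budget, for parallel pinhole cameras."""
    # Convert the pixel budget to a length on the sensor.
    budget_mm = depth_budget_px * sensor_width_mm / h_sensor_px
    # Parallax on the sensor is focal * interaxial / distance; solve for distance.
    return focal_mm * interaxial_mm / budget_mm / 1000.0


if __name__ == "__main__":
    print(f"ortho setup, 25 px: {screen_plane_distance_m(25):.2f} m")
    print(f"130 mm interaxial:  {screen_plane_distance_m(25, interaxial_mm=130.0):.2f} m")
    print(f"58 mm focal length: {screen_plane_distance_m(25, focal_mm=58.0):.2f} m")
```

In this form the screen plane distance is linear in both the interaxial distance and the focal length.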

3.1 Interaxial distance and screen plane

For plot 3.1 we have chosen a depth budget of 25 pixels. At the intersection of each curve with the 25 pixel threshold we draw a straight line down to the x-axis. This gives us the screen plane distance for each interaxial distance.

The screen plane distance grows as the interaxial distance increases. There is a linear relation between screen plane and interaxial distance: if we double the interaxial distance, the distance to the screen plane doubles as well.

3.2 Focal length and screen plane

For plot 3.2 we have chosen the same depth budget of 25 pixels. Again we determine the screen plane distance for each focal length.

The screen plane distance grows as the focal length increases. There is the same linear relation between screen plane and focal length: if we double the focal length, the distance to the screen plane doubles as well.

Plot 3.1

The x-axis is the distance to the camera rig and the y-axis is the relative parallax in pixel. The colored lines represent depth plots for different interaxial distances.

Plot 3.2

The x-axis is the distance to the camera and the y-axis is the relative parallax in pixel. The colored lines represent depth plots for different focal lengths.

4. Changing focal length and staging – the density of roundness

We have discussed the relation between depth budget, screen plane, interaxial distance and focal length. Now we use the screen plane strategy to estimate a suitable object of interest. We define that the object of interest is at screen plane distance. As we change focal length, we have to change the distance to the screen plane. In a large share of today’s stereoscopic cinema pictures the object of interest is near the screen plane. Again, this is just one possible style of stereoscopic photography among others.

The approach will be as follows:

First we set the camera parameters (focal length and interaxial distance), then we calculate the screen plane distance. With this value we can calculate the focal length compensation. Keeping this in mind, we revisit our depth plots and change the depth volume values according to the focal length compensation. If we have a compensation of 5 meters, the new value for 5 meters is the original 10 meter value, the new 10 meter value is the old 15 meter value, et cetera. Graphically, we shift the curve along the x-axis by the amount of the focal compensation, which is calculated from the screen plane distance (equation 4.1).

, (4.1)

The x-axis is no longer the real distance [d] for all focal lengths. It just shows the same objects vertically aligned for all focal lengths.

Let’s say we have an actor 5 meters away in an ortho-stereoscopic setup. He has an absolute volume of 0.013 (dashed blue line). With a 50 mm lens (green curve) and the same image size, the same actor has an absolute volume of 0.0069, because the focal length compensation has increased the distance to the camera rig to 9.5 meters. The actor therefore appears with strongly reduced volume: he is 9.5 meters away from the camera rig, whereas the observer believes that he is 5 meters away.

This mismatch causes the unnatural distortion of stereoscopic depth perception (card-boarding).
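The shift of the curve can be sketched as follows. The sketch assumes the small-cube form of the depth volume (interaxial distance divided by real distance). The 4.5 meter compensation is inferred from the 5 and 9.5 meter figures in the example above. Names are illustrative.

```python
def shifted_depth_volume(apparent_distance_m, focal_compensation_m,
                         interaxial_m=0.065):
    """Depth volume of an object that appears at apparent_distance_m but was
    actually shot from apparent_distance_m + focal_compensation_m."""
    real_distance_m = apparent_distance_m + focal_compensation_m
    return interaxial_m / real_distance_m


if __name__ == "__main__":
    ortho = shifted_depth_volume(5.0, 0.0)
    long_lens = shifted_depth_volume(5.0, 4.5)
    print(f"ortho setup: {ortho:.4f}")
    print(f"long lens:   {long_lens:.4f}")
    print(f"ratio:       {long_lens / ortho:.2f}")
```

With these inputs the long lens value comes out at roughly half of the ortho-stereoscopic value, in line with the numbers quoted in the example.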

Equation 4.2 shows how to calculate this mismatch caused by the unnatural focal length:

, (4.2)

Plot 4.1

The x-axis is the distance d from the cube to the camera rig only for the natural focal length (blue dashed curve). The y-axis shows the ratio depthProjectionPx/widthProjectionPx (equation 2.4). The colored lines represent different focal lengths of the camera rig.

Remember equation 2.6 for focal length compensation. Now we know why we cannot draw a general conclusion about focal length and volume perception. We can state that a focal length longer than the natural focal length produces a flattened depth perception, and vice versa. But the extent depends on the distance to the object of interest.

The information provided in chapters 1 to 4 enables us to develop many creative and artistic concepts and to formulate some key parameters that describe depth volume within a scene.

5. Stereoscopic volume parameters

This chapter tries to provide some useful and practical parameters in order to sufficiently describe depth volume and depth volume perception for each point within a scene. The following three parameters offer valuable information to judge stereoscopic volume perception:

  • specific volume
  • volume gain
  • volume density

5.1 Specific volume 

A suggestion for a specific volume parameter is a slight modification of the depth volume in equation 2.5.

, (5.1)

Plot 5.1

The x-axis is the distance d from the cube to the camera rig. The y-axis shows the specific volume (equation 5.1). The colored lines represent different interaxial distances.

Normalizing the interaxial distance (equation 5.1) makes the numbers for the depth volume easier to handle. For an ortho-stereoscopic setup the volume is simply 1 divided by the distance in meters. The specific volume of an object which is 3 meters away is 1/3. If we now decrease the interaxial distance to 22 mm (≈ a third), the volume decreases by the same factor, to ≈ 1/9.

The volume for each individual point can easily be calculated. This is the absolute value from which all others are derived. It is only affected by the interaxial distance and the distance to the camera rig.
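The worked numbers above can be checked with a short sketch. It assumes that the normalization divides the interaxial distance by the 65mm interocular distance and that the object distance is given in meters. This is inferred from the examples and is not necessarily the exact form of equation 5.1.

```python
def specific_volume(d_m, interaxial_mm=65.0, interocular_mm=65.0):
    """Specific volume: normalized interaxial distance divided by the
    distance to the camera rig in meters (assumed form)."""
    return (interaxial_mm / interocular_mm) / d_m


if __name__ == "__main__":
    print(f"ortho setup at 3 m:       {specific_volume(3.0):.3f}")
    print(f"22 mm interaxial at 3 m:  {specific_volume(3.0, interaxial_mm=22.0):.3f}")
```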

5.2 Volume gain

The volume gain is the ratio between the specific volume in the scene and the specific volume in the same scene captured with an ortho-stereoscopic setup, referenced to the object of interest. If the focal length is natural, the volume gain is the same for all points in space (the curve becomes a line parallel to the x-axis). Values below 1 indicate an overall shallower depth; values above 1 indicate an overall greater depth. A »cinematic picture« might well have a volume gain below 1 for most shots. More important is the consistency of the gain. If a long lens is used, the volume gain is smaller for near objects, which means that close objects have a more strongly reduced specific volume than objects farther away. If the focal length is shorter than the natural focal length, the volume gain is greater for near objects and smaller for distant ones. Equation 5.2 shows the mathematical relation.

, (5.2)

Plot 5.2

The x-axis is the distance d from the cube to the camera rig. The y-axis shows the volume gain (equation 5.2). The colored lines represent different focal lengths at a fixed interaxial distance.

The volume gain gives us a good idea of the scene. It tells us the depth volume amplification caused by the interaxial distance and the focal length/staging. When the curve is a horizontal line we have chosen the natural focal length and therefore no unnatural depth volume will be perceived.

However, when you look at a single value for one specific object (without knowing the curve), the volume gain alone does not tell you whether the gain is natural or unnatural.

It might therefore be useful to have an indicator just for the mismatch caused by the focal length/staging.

5.3 Volume density

The volume density is the normalized mismatch described in chapter 4. A value below 1 indicates a reduced depth volume perception (card-boarding) and a value above 1 will elongate the volume. The volume density does not account for the change of volume gain caused by a change of interaxial distance.

Equation 5.3 shows the normalization:

, (5.3)

Plot 5.3

The x-axis is the distance d from the cube to the camera rig. The y-axis shows the volume density (equation 5.3). The colored lines represent different focal lengths at a fixed interaxial distance.

A natural focal length therefore produces the value 1 for all objects in the scene. Changing the interaxial distance can move the screen plane, and with it the distance to the object of interest and the focal compensation, so the interaxial distance may still affect the volume density indirectly.
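Volume gain and volume density can be sketched together. The forms below are assumptions derived from the verbal descriptions in sections 5.2 and 5.3 and the shifted curve of chapter 4, not the original equations 5.2 and 5.3. The density compares the volume at the real distance with the volume at the apparent distance, and the gain additionally scales it by the normalized interaxial distance.

```python
def volume_density(apparent_distance_m, focal_compensation_m):
    """Mismatch caused by focal length and staging only (assumed form).

    1 for the natural focal length, below 1 for longer lenses
    (card-boarding), above 1 for shorter lenses (elongation).
    """
    return apparent_distance_m / (apparent_distance_m + focal_compensation_m)


def volume_gain(apparent_distance_m, focal_compensation_m,
                interaxial_mm=65.0, interocular_mm=65.0):
    """Specific volume relative to an ortho-stereoscopic setup (assumed form)."""
    normalized_interaxial = interaxial_mm / interocular_mm
    return normalized_interaxial * volume_density(apparent_distance_m,
                                                  focal_compensation_m)


if __name__ == "__main__":
    for d in (2.0, 5.0, 10.0, 20.0):
        density = volume_density(d, 4.5)
        gain = volume_gain(d, 4.5, interaxial_mm=32.5)
        print(f"{d:>5.1f} m  density = {density:.3f}  gain = {gain:.3f}")
```

In this sketch a long lens (positive compensation) lowers the values for near objects more than for distant ones, and a natural focal length gives a constant gain equal to the normalized interaxial distance.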

6. Other depth cues

We should keep in mind that changing the camera parameters also affects other depth cues. As the final cognitive depth evaluation is a sum of all perceptual inputs, we should take a brief qualitative look at other depth cues.

Table 5.1 shows how depth cues are altered by interaxial distance and »focal length/staging«.

Table 5.1

Qualitative enumeration of some depth cues and how they are affected by the creative camera parameters.

Table 5.1 makes the importance of focal length even more evident. Many depth cues are strongly affected by a change of »focal length and staging«. Monocular depth cues in particular can only be modified with the lens. Monocular depth cues and their perception are strongly cognitive, so it is hard to develop a quantitative statement with an analytical approach. More empirical research has to be done in this area.

7. Conclusion 

The simple equations 5.1, 5.2 and 5.3 describe the complex interplay of interaxial distance, focal length and staging. Perfection in stereoscopic photography can be achieved only if every one of these parameters is available to the stereoscopic creative staff.

The most dramatic effect is caused by the focal length, because it can distort the volume perception in an unnatural way. Whoever chooses the lens on a stereoscopic show must be aware of its behavior. The lens, when chosen wisely, provides the best opportunity for a creative engagement with stereoscopic photography.

Of similar importance in a »stereoscopic picture« is the staging of action and camera within space. The alignment of »things« is as important as the focal length. Again this is an opportunity for creative engagement with depth and volume.

With the three parameters specific volume, volume gain and volume density the artist has a model that can assist in making creative decisions and make stereoscopic photography more transparent in terms of volume perception.

A color-driven way of thinking about stereoscopic volume capture could be:

Staging is like lighting your scene,

focal length is your contrast correction,

interaxial is your gain correction,

…and you can hardly alter them in post…

References:

  • Julesz, Béla: Foundations of Cyclopean Perception. MIT Press. London 2006.
  • Kuhn, Gerhard: Stereofotografie und Raumbildprojektion. Vfv Verlag. Gilching 1999.
  • Siragusano, Daniele: Target Screensize for stereoscopic feature film. SMPTE Paper 2010.
  • Swartz, Charles S.: Understanding Digital Cinema. Focal Press. Oxford 2005.
  • Tauer, Holger: Stereo 3D Grundlagen, Technik und Bildgestaltung. Schiele & Schön. Berlin 2010.