From photon to byte – part 1

Oh, of course, everybody knows how a point is transformed into a pixel. But personally, for a pretty long time I had a more or less blurred picture of what exactly is done by which matrix. Maybe it would be useful to write it down in a consistent way? In the next few articles I would like to trace the path of data from photons in the 3D world to precise information in computer memory. So let’s start!

As an example we will use a handful of points: the corners of a cube. So let’s start with some code to draw them. We will use matplotlib to see the real coordinates.


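import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
import numpy as np

# The 8 corners of a cube with side 2, centered at the origin.
r = [-1, 1]
cube = [[x, y, z] for x in r for y in r for z in r]

# Camera center C plus a second point 20% closer to the origin;
# the segment between them marks the camera position and viewing direction.
C = np.array([3*np.sqrt(3), 3/np.sqrt(2), 3/np.sqrt(2)])
camera_dir = [C, 0.8*C]

# Corner indices are 3-bit numbers; two corners share an edge when they differ
# in exactly one bit, so connect x to x + 2**d whenever bit d of x is 0.
edges = [[x, x + 2**d] for x in range(8) for d in range(3) if not (2**d) & x]
edges.append([len(cube), len(cube) + 1])  # the camera segment
cube.extend(camera_dir)

def plot(points):
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    for e in edges:
        ax.plot3D(*zip(points[e[0]], points[e[1]]))
    ax.auto_scale_xyz([-4, 4], [-4, 4], [-4, 4])
    # Equal limits plus a cubic box give an equal scale on all axes;
    # set_aspect("equal") is not implemented for 3D axes in some matplotlib versions.
    ax.set_box_aspect((1, 1, 1))
    plt.show()

plot(cube)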

Okay, so we’re ready. At the start we have a scene with 3D points that emit photons in the direction of the camera (that’s why we see them). We want to look at the world from the camera’s perspective, but usually our point of reference is something other than the camera center (let’s say, the corner of a desk or of the room). Because all later computations need to be done in a camera-centered frame of reference, we have to transform the scene. And here the first matrix appears: K.

K is the extrinsic parameters matrix. (Many texts reserve the letter K for the intrinsic matrix and write the extrinsic one as \([R | t]\); in this post K is the extrinsic one.) It maps the whole world into a frame where the point \([0,0,0]\) is the camera center and the camera looks exactly along the Z axis. How can we do this transformation? The easiest way is to first rotate the points so that the Z axis is aligned with the camera direction, and then shift all points so that the camera sits at the origin of the coordinate system. Why is that easy? Because, as usual, we’re working with homogeneous coordinates, which means that our point \([x, y, z]\) is in fact the point \([x, y, z, 1]\). Thanks to this, translation is simple: we append a column with the proper translation \(t\) to the 3×3 rotation matrix \(R\), and it works like a charm:

$$ [ R | t ] \cdot [x, y, z, 1] = [ R | t ] \cdot ([x, y, z, 0] + [0, 0, 0, 1]) = R \cdot [x, y, z] + t $$
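The identity above is easy to check numerically. Here is a minimal sketch with made-up values: the 90° rotation about Z, the translation `t` and the point `p` are arbitrary choices for illustration, not values used elsewhere in this post.

```python
import numpy as np

# Hypothetical example values: a 90-degree rotation about the Z axis
# and an arbitrary translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([10.0, 20.0, 30.0])
p = np.array([1.0, 2.0, 3.0])

# Build the 3x4 matrix [R | t] by appending t as a fourth column.
Rt = np.hstack([R, t.reshape(3, 1)])

# Homogeneous version of p: [x, y, z, 1].
p_hmg = np.append(p, 1.0)

lhs = Rt @ p_hmg   # one matrix product on the homogeneous point
rhs = R @ p + t    # rotate first, then translate

print(lhs)
print(rhs)
```

Both lines should print the same vector, because the trailing 1 in the homogeneous point is exactly what picks up the column \(t\).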

Okay, so first we need a rotation matrix. The easiest way to get one is to compose it from three matrices, each representing a rotation around one axis. For the spicy math details I will direct you to Wikipedia.


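def rotation2d(angle):
    # Counter-clockwise rotation in the plane.
    return np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
def rotation3d(angle_x, angle_y, angle_z):
    Rx = np.identity(3)
    Ry = np.identity(3)
    Rz = np.identity(3)
    Rx[1:,1:] = rotation2d(angle_x)  # acts on the (y, z) block
    # Acts on the (x, z) block; these index arrays write the 2x2 block transposed,
    # which gives the usual Ry with +sin in the top-right corner.
    Ry[[[0,2],[0,2]],[[0,0],[2,2]]] = rotation2d(angle_y)
    Rz[:2,:2] = rotation2d(angle_z)  # acts on the (x, y) block
    # Rx is applied first, then Ry, then Rz.
    return np.dot(Rz, np.dot(Ry, Rx))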
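The line that fills `Ry` is probably the least obvious one in the listing above, so here is a small standalone sketch of what that indexing does. The angle is an arbitrary example value, and `block` and `expected` are names introduced only for this illustration.

```python
import numpy as np

angle = np.pi / 6  # arbitrary example angle
c, s = np.cos(angle), np.sin(angle)
block = np.array([[c, -s], [s, c]])  # same layout as rotation2d(angle)

Ry = np.identity(3)
# Row indices [[0, 2], [0, 2]] paired with column indices [[0, 0], [2, 2]]
# address the cells (0,0), (2,0) and then (0,2), (2,2),
# so the 2x2 block is written into Ry transposed.
Ry[[[0, 2], [0, 2]], [[0, 0], [2, 2]]] = block

# The usual rotation about the Y axis, written out explicitly.
expected = np.array([[c, 0, s],
                     [0, 1, 0],
                     [-s, 0, c]])

print(np.allclose(Ry, expected))           # prints True
print(np.allclose(Ry @ Ry.T, np.eye(3)))   # prints True: Ry is orthonormal
```

The transposition matters: in the \((x, z)\) block the sine terms of a Y rotation have the opposite signs compared with the X and Z rotations, and writing the block transposed is one way to get that.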

Okay, so now we can calculate the proper angles for our camera. After a moment of thinking we may find out that we first need to rotate the points by \(\pi/4\) around the \(x\) axis, then by \( \pi/6 \) around the \(y\) axis, to get the vector \(C\) pointing in the X direction. But we want the Z direction, not X, so we add a further \(-\pi/2 \) of rotation around the \(y\) axis. And we have the proper direction, nice.
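If the “moment of thinking” feels too quick, the rotations can be applied to \(C\) one at a time. This is a minimal sketch that writes the X and Y rotation matrices out explicitly; `rot_x` and `rot_y` are helper names introduced just for this check.

```python
import numpy as np

def rot_x(a):
    """Rotation by angle a around the X axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    """Rotation by angle a around the Y axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

C = np.array([3*np.sqrt(3), 3/np.sqrt(2), 3/np.sqrt(2)])

step1 = rot_x(np.pi/4) @ C        # roughly [5.196, 0, 3]: y component removed
step2 = rot_y(np.pi/6) @ step1    # roughly [6, 0, 0]: pointing along X
step3 = rot_y(-np.pi/2) @ step2   # roughly [0, 0, 6]: pointing along Z

# The two Y rotations can be merged into one, because rotations
# around the same axis simply add their angles.
combined = rot_y(np.pi/6 - np.pi/2) @ rot_x(np.pi/4) @ C

print(step1, step2, step3, combined)
```

The length of \(C\) is 6, which is why the final vector is \([0, 0, 6]\) up to floating-point rounding.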

What about the shift? We have rotated all the points, so it may seem that we no longer know how to shift the scene to put the camera center at (0,0,0). Or do we? It’s enough to take the opposite of the rotated vector C: \(t = -R \cdot C\). Then the camera center maps to \(R \cdot C + t = 0\).


R = rotation3d(np.pi/4, np.pi/6 - np.pi/2, 0)
t = -np.dot(R, C)
K = np.hstack([R, t.reshape(3,1)])

After a quick verification:


>>> np.dot(K, np.append(C, 1))
array([ 0., 0., 0.])

Up to floating-point rounding the camera center lands at the origin, so we can be happy that everything works. Now let’s transform the whole scene:


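def to_hmg(arr):
    # Append a column of ones: (N, 3) points -> (N, 4) homogeneous points.
    return np.append(arr, np.ones((len(arr),1)), axis=1)
def from_hmg(arr):
    # Drop the last (homogeneous) column; valid as long as it equals 1.
    return arr[:,:-1]
# K is 3x4, so each product already yields a plain 3D point.
transformed_cube = [np.dot(K, point) for point in to_hmg(cube)]
plot(transformed_cube)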
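The per-point loop above can also be written as a single matrix product. The following is only a sketch: to stay self-contained it uses a stand-in extrinsic matrix (a 90° rotation about Z and a shift of -6 along Z), not the K computed earlier, and `to_hmg` is restated with `np.hstack`.

```python
import numpy as np

def to_hmg(arr):
    """Append a column of ones: (N, 3) -> (N, 4)."""
    arr = np.asarray(arr, dtype=float)
    return np.hstack([arr, np.ones((len(arr), 1))])

# Stand-in extrinsic matrix (an assumption for this sketch, not the K above).
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.0, 0.0, -6.0])
K = np.hstack([R, t.reshape(3, 1)])

r = [-1, 1]
cube = np.array([[x, y, z] for x in r for y in r for z in r], dtype=float)

# Per-point version, as in the listing above.
looped = np.array([K @ p for p in to_hmg(cube)])

# Batched version: (N, 4) @ (4, 3) -> (N, 3).
batched = to_hmg(cube) @ K.T

same_result = np.allclose(looped, batched)

# A rotation plus a translation is a rigid motion, so edge lengths stay the same.
edge_before = np.linalg.norm(cube[0] - cube[1])
edge_after = np.linalg.norm(batched[0] - batched[1])

print(same_result, edge_before, edge_after)
```

The batched form gives an `(N, 3)` array instead of a list of vectors; indexing it as `points[i]` still returns one point, so it should be usable with a plotting helper like the one above.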

Oh, great, it works! We transformed the whole world into a frame where the camera is at the origin. How is the data processed next? From this moment on we start to think ‘as a camera’ and map points to pixels. But that is the task of another matrix, which I will describe in the next post.