In recent years, many educational institutions have tried to archive lectures and distribute them as e-learning content. When users watch a lecture video, the objects they are interested in differ from user to user. However, the objects included in the video and its resolution are determined by the producer of the video. Existing archiving systems therefore cannot satisfy every user's demands regarding the desired objects and the resolution of the video.
To satisfy all users' demands, we consider acquiring an image that includes every object users might be interested in and that captures those objects at high resolution. Such an image allows users to choose objects from it and to select the resolution at which they are shown. We therefore aim to obtain a wide-field image of the lecture room as seen from the rear center of the room.
Image mosaicing is often used to acquire wide-field images. When the whole scene in an image is, or can be approximated by, a single plane in the 3D world, images can be synthesized using a planar projective transformation. Because a lecture room is basically composed of planes, we use planar projective transformation to synthesize the background image of the lecture room.
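As a brief illustration of what a planar projective transformation does to image coordinates, the following is a minimal sketch, assuming NumPy and pixel coordinates given as (x, y) pairs. The function name `apply_homography` and the example matrix are hypothetical and are not taken from the thesis.

```python
import numpy as np

def apply_homography(H, points):
    """Map Nx2 pixel coordinates through a 3x3 planar projective transformation."""
    pts = np.asarray(points, dtype=float)
    # Append w = 1 to obtain homogeneous coordinates (x, y, 1).
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ np.asarray(H, dtype=float).T
    # Divide by the third component to return to pixel coordinates.
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical matrix for illustration only: a pure translation by (10, 5).
H_example = np.array([[1.0, 0.0, 10.0],
                      [0.0, 1.0, 5.0],
                      [0.0, 0.0, 1.0]])
corners = np.array([[0.0, 0.0], [100.0, 50.0]])
mapped_corners = apply_homography(H_example, corners)
```

The mapping is exact only for points that lie on the plane for which the matrix was computed, which is why a single plane (or a set of planes) is assumed for the scene.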
When the image to which the planar projective transformation is applied includes a non-planar object, the region containing that object is distorted. In a lecture room, screens and walls can be assumed to be planar, but the lecturer cannot. Existing methods apply planar projective transformation to images that include the lecturer, which results in distortion in the lecturer region.
In this paper, we propose a method for synthesizing a wide-field image of a lecture room that has no distortion in the lecturer region. First, by shooting and tracking the lecturer with a camera whose pose (pan, tilt, and zoom) can be controlled, we acquire an image in which the whole body of the lecturer appears without distortion. In addition, on the assumption that all objects other than the lecturer can be approximated by a plane or a set of planes, we apply planar projective transformation to images captured by several fixed cameras that shoot all objects other than the lecturer, and synthesize a wide-field background image of the lecture room. Finally, we project the wide-field background image onto the image plane of the lecturer tracking camera at its current pose. This yields the wide-field image of the lecture room.
To acquire a wide-field background image of the lecture room, we calculate the planar projective transformation matrix H between the image of the lecturer tracking camera at a chosen base pose and the image of each fixed camera, specifying the coordinates of corresponding points by hand. Using H, we obtain a wide-field background image that lies on the image plane of the lecturer tracking camera at the base pose. The wide-field background image can therefore be projected onto the image plane of the lecturer tracking camera at the current pose with a single projective transformation matrix M. Next, we estimate M from the current pose of the lecturer tracking camera. We select several poses of the lecturer tracking camera and calculate the M corresponding to each of them in advance. We then calculate the M corresponding to the current pose by bilinear interpolation from the matrices M of the neighboring sampled poses.
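The two computations described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: H is estimated with the basic direct linear transformation (DLT) from at least four hand-picked point pairs, the sampled poses are assumed to form a regular grid over pan and tilt at a fixed zoom, and M is interpolated element by element after scaling each matrix so that its bottom-right entry is 1. The names `estimate_homography` and `interpolate_M` are hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate H with dst ~ H @ src from >= 4 point pairs (basic DLT)."""
    rows = []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # assumes H[2, 2] is not zero

def interpolate_M(pan, tilt, pans, tilts, M_grid):
    """Bilinearly interpolate 3x3 matrices sampled on a pan/tilt grid.

    pans, tilts: sorted 1-D arrays of sampled poses (assumed grid layout).
    M_grid[i, j]: matrix for pose (pans[i], tilts[j]), scaled so M[2, 2] == 1.
    """
    pans = np.asarray(pans, float)
    tilts = np.asarray(tilts, float)
    M_grid = np.asarray(M_grid, float)
    # Index of the grid cell that contains the current pose.
    i = int(np.clip(np.searchsorted(pans, pan) - 1, 0, len(pans) - 2))
    j = int(np.clip(np.searchsorted(tilts, tilt) - 1, 0, len(tilts) - 2))
    # Fractional position of the pose inside that cell.
    a = (pan - pans[i]) / (pans[i + 1] - pans[i])
    b = (tilt - tilts[j]) / (tilts[j + 1] - tilts[j])
    return ((1 - a) * (1 - b) * M_grid[i, j]
            + a * (1 - b) * M_grid[i + 1, j]
            + (1 - a) * b * M_grid[i, j + 1]
            + a * b * M_grid[i + 1, j + 1])
```

Element-wise interpolation is only meaningful when the sampled matrices share a consistent scale, which is why each one is normalized before it is stored in the grid.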
First, to evaluate the precision of the estimated M, we calculated the Euclidean distance between corresponding points in the wide-field background image and the lecturer tracking image. With the proposed method the Euclidean distance was about 10 px, almost the same as the distance obtained with the M calculated from the data of strongly calibrated cameras. Next, we synthesized a wide-field image of a lecture scene using the proposed method. The resulting image had no distortion in either the lecturer region or the other regions. On the other hand, a misalignment occurred between the projected wide-field background image and the lecturer tracking image, caused by the error in the estimation of M. Finally, to confirm that the image synthesized by our method has higher resolution than an image captured by a general camera, we compared the region of the lecturer's face in both and confirmed that the image acquired by our method has higher resolution.
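The precision measure used above can be sketched as follows. This is a minimal sketch, assuming NumPy; the abstract does not state how the distances over several point pairs are aggregated, so taking the mean is an assumption, and the name `mean_reprojection_error` is hypothetical.

```python
import numpy as np

def mean_reprojection_error(M, bg_points, cam_points):
    """Mean Euclidean distance (px) between projected and observed points.

    bg_points: Nx2 points in the wide-field background image.
    cam_points: the corresponding Nx2 points in the lecturer tracking image.
    """
    bg = np.asarray(bg_points, dtype=float)
    cam = np.asarray(cam_points, dtype=float)
    # Project the background points with M using homogeneous coordinates.
    homog = np.hstack([bg, np.ones((len(bg), 1))])
    proj = homog @ np.asarray(M, dtype=float).T
    proj = proj[:, :2] / proj[:, 2:3]
    # Averaging the per-point distances is an assumption of this sketch.
    return float(np.mean(np.linalg.norm(proj - cam, axis=1)))
```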