This paper describes a method for reconstructing the shape of rigid objects. In Internet and virtual reality (VR) applications, shape reconstruction is critical because the reconstructed shapes can be used to create more realistic virtual objects.
Several methods have been proposed. They fall into three categories: laser range finders, multi-baseline stereo, and the volume intersection technique.
The laser range finder technique is not suitable for objects that do not reflect the laser illumination. The multi-baseline stereo technique is based on photometric consistency, so it is not robust to specular reflection or changes in illumination. Compared with these two methods, the volume intersection technique can reconstruct a wider variety of objects because it is robust to specular reflection and changes in illumination. Therefore, we use the volume intersection technique to reconstruct the shape of objects.
In the volume intersection technique, we obtain the intersection of two or more conic volumes, each created from the silhouette of the object in one image. This intersection is called the visual hull. In general, the more viewpoints are used, the more accurate the reconstructed shape. However, spatial constraints limit the number of cameras that can be used. These constraints arise from the size of the camera bodies and from the calibration of the camera locations, and they limit how far the accuracy of the visual hull can be improved.
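The intersection step can be illustrated with a minimal sketch. It assumes a small voxel grid and orthographic cameras aligned with the coordinate axes, so each silhouette sweeps out a prism rather than the conic volume of a real perspective camera. The grid size `N`, the spherical `target`, and the helper names are hypothetical and chosen only for illustration.

```python
from itertools import product

N = 16  # voxel grid resolution (assumed for this sketch)

def project(voxel, axis):
    """Orthographic projection: drop the coordinate along the viewing axis."""
    return tuple(c for i, c in enumerate(voxel) if i != axis)

def silhouette(voxels, axis):
    """Set of 2D pixels covered by the object when viewed along `axis`."""
    return {project(v, axis) for v in voxels}

def visual_hull(silhouettes):
    """Keep every voxel whose projection lies inside the silhouette in all views."""
    return {v for v in product(range(N), repeat=3)
            if all(project(v, axis) in sil for axis, sil in silhouettes.items())}

# Hypothetical target object: a voxelized sphere.
target = {(x, y, z) for x, y, z in product(range(N), repeat=3)
          if (x - 7.5) ** 2 + (y - 7.5) ** 2 + (z - 7.5) ** 2 <= 36.0}

hull_1 = visual_hull({0: silhouette(target, 0)})                      # one view
hull_3 = visual_hull({a: silhouette(target, a) for a in (0, 1, 2)})   # three views
```

In this sketch the visual hull always contains the target, and the hull from three views is a strict subset of the hull from one view, which matches the observation that more viewpoints give a tighter reconstruction.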
This paper describes how to improve the accuracy of shape reconstruction by integrating visual hulls over a time sequence. The visual hull calculated at one time differs in shape from the visual hull calculated at any other time, because the relative location of the object and the cameras changes. If the rigid motion of the target object is known, we can integrate the visual hulls over the time sequence. Integrating visual hulls over a time sequence is equivalent to increasing the number of cameras.
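The equivalence between integration over time and additional cameras can be seen in a small 2D sketch. It assumes a single orthographic camera, a pixel grid instead of voxels, and a known motion that is an exact 90-degree rotation of the grid. All names and the rectangular object are hypothetical.

```python
N = 8  # side length of the 2D grid (assumed for this sketch)

def rotate90(p):
    """Known rigid motion of the object between time 0 and time 1 (assumed)."""
    x, y = p
    return (N - 1 - y, x)

def rotate90_inv(p):
    """Inverse of the known motion, mapping time-1 coordinates back to time 0."""
    x, y = p
    return (y, N - 1 - x)

def hull_from_camera(obj):
    """Single orthographic camera looking along x: it only observes y extents."""
    sil = {y for (_, y) in obj}
    return {(x, y) for x in range(N) for y in range(N) if y in sil}

obj_t0 = {(x, y) for x in range(2, 4) for y in range(1, 6)}  # 2 x 5 rectangle
obj_t1 = {rotate90(p) for p in obj_t0}                       # object after motion

hull_t0 = hull_from_camera(obj_t0)
hull_t1 = hull_from_camera(obj_t1)

# Map the hull at time 1 back into the object frame of time 0, then intersect.
hull_t1_in_t0 = {rotate90_inv(p) for p in hull_t1}
integrated = hull_t0 & hull_t1_in_t0
```

Here the fixed camera at time 1 acts like a second camera rotated by 90 degrees at time 0, so the integrated hull is tighter than either single-time hull. For this particular rectangle it recovers the object exactly, which will not hold for general shapes.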
We acquire the rigid motion of the target object from the visual hulls in the time sequence. The motion is calculated by tracking feature points on the visual hulls; matching the feature points of one visual hull against those of the visual hull at another time enables this tracking. The tracking of landmarks is affected by the locations of the object and the viewpoints, because landmarks on the object correspond to points on the visual hull. The landmarks on a visual hull are adopted as feature points.
A landmark has two features. The first is that landmarks are projected onto the images as elements of the silhouette edges. A voxel that is projected onto the edges in many images is likely to be a landmark. However, when many of the view angles are similar, such a voxel is not likely to be a landmark. This problem is solved by weighting the view angles. The voxels with high likelihood are extracted as landmark candidates.
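The effect of weighting the view angles can be sketched as follows. The abstract does not state the weighting scheme, so the one used here is an assumption: each view shares its vote with all views whose angle lies within `similar_deg` of its own. The function names and the example angles are hypothetical.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def view_weights(angles_deg, similar_deg=15.0):
    """Assumed scheme: a view's weight is 1 / (number of similar views, itself included)."""
    return [1.0 / sum(1 for b in angles_deg if angular_diff(a, b) <= similar_deg)
            for a in angles_deg]

def landmark_likelihood(on_edge, weights):
    """Sum the weights of the views in which the voxel projects onto a silhouette edge."""
    return sum(w for hit, w in zip(on_edge, weights) if hit)

angles = [0.0, 5.0, 10.0, 90.0, 180.0]   # three clustered views, two isolated views
weights = view_weights(angles)

# Both voxels lie on the silhouette edge in three views.
clustered = landmark_likelihood([True, True, True, False, False], weights)
spread = landmark_likelihood([True, False, False, True, True], weights)
```

With an unweighted count the two voxels would tie at three edge hits each. With the weights, the voxel supported by well-separated views scores higher than the one supported only by nearly identical views.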
The second feature is that landmarks tend to occur in the direction of the thin line of the visual hull. The thin line is a line one voxel wide that lies at the center of the object. To extract the direction of the thin line, thick parts and short branches must be removed from it. We therefore construct a tree from the thin line. Using the tree, thick parts and short branches are removed and the direction of the thin line is estimated. After that, feature points are extracted from the landmark candidates.
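Removing short branches from the tree can be sketched as below. The sketch assumes the thin line is already stored as a rooted tree that maps each node to its list of children. It covers only the short-branch step, not the removal of thick parts or the direction estimate. The representation, the function name, and the `min_len` threshold are hypothetical.

```python
def prune_short_branches(children, min_len):
    """Return a copy of the tree without leaf branches shorter than `min_len` nodes.

    A leaf branch starts at a child of a junction (a node with two or more
    children) and follows single-child nodes until it reaches a leaf.
    """
    pruned = {node: list(kids) for node, kids in children.items()}
    for junction, kids in children.items():
        if len(kids) < 2:
            continue  # not a junction, nothing branches off here
        for kid in kids:
            chain = [kid]
            while len(children.get(chain[-1], [])) == 1:
                chain.append(children[chain[-1]][0])
            ends_in_leaf = not children.get(chain[-1])
            if ends_in_leaf and len(chain) < min_len:
                pruned[junction].remove(kid)
                for node in chain:
                    pruned.pop(node, None)
    return pruned

# Main line 0-1-2-3-4-5 with a one-node spur (node 10) attached at node 2.
tree = {0: [1], 1: [2], 2: [3, 10], 3: [4], 4: [5], 5: [], 10: []}
cleaned = prune_short_branches(tree, min_len=3)
```

In this example the spur at node 10 is removed while the main line is kept, because the branch 3-4-5 has exactly `min_len` nodes.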
The feature points are matched using Powell's convergence test. The error function measures the difference between the feature points.
If a feature point extracted at one time is not detected at a later time, a mismatch occurs. A robust coordinate registration technique solves this problem: it ignores feature points that have no counterpart within a defined neighborhood. The rigid object motion is obtained through these processes.
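The idea of ignoring feature points without a counterpart can be sketched with a simple error function. The exact error measure is not given in this abstract, so the mean squared nearest-neighbor distance used here is an assumption, as are the function name, the `radius` parameter, and the example points.

```python
import math

def robust_error(points_a, points_b, transform, radius):
    """Mean squared nearest-neighbor distance after applying `transform` to points_a.

    Points whose nearest neighbor in points_b is farther than `radius` are
    ignored. Returns (error, number_of_points_used).
    """
    total, used = 0.0, 0
    for p in points_a:
        q = transform(p)
        d = min(math.dist(q, r) for r in points_b)
        if d <= radius:
            total += d * d
            used += 1
    return (total / used if used else float("inf")), used

def shift_x(p):
    """Candidate motion: translation by one unit along x (hypothetical)."""
    return (p[0] + 1.0, p[1], p[2])

def identity(p):
    return p

# The last point of points_a is not detected at the later time.
points_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
points_b = [shift_x(p) for p in points_a[:3]]

err_true, used_true = robust_error(points_a, points_b, shift_x, radius=1.5)
err_wrong, used_wrong = robust_error(points_a, points_b, identity, radius=1.5)
```

The undetected point does not contribute to the error under either candidate motion, so the correct translation still scores a lower error than the identity. The count of points used is returned as well, because a motion that matches very few points can also produce a small error.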
In the experiment, we reconstruct shapes in a simulated environment. As a result, landmarks are extracted and accurate rigid object motion is obtained. After integrating the visual hulls using the extracted motion, the integrated visual hull yields a more accurate shape than the visual hull from a single time.