Abstract


High Speed 3D Reconstruction by Pipelining of Video Image Processing and Division of Spatio-Temporal Space


With the improvement in the processing speed of computers and the increase in their storage capacity, it may become possible in the near future to synchronize a virtual space inside computers with a real 3D space. For instance, in computer-supported tele-conferencing, there are attempts to update the virtual meeting space according to the poses of remote participants and the positions of remote objects. In these attempts, remote computers each reconstruct the remote site and then transmit the information of the reconstructed site to the local computers. Once a real 3D space has been reconstructed, we can process it with computer graphics techniques. To synchronize a virtual space with a real 3D space, the real 3D space must be reconstructed in real time.

Conventional reconstruction methods require too much data and too much computation to run in real time. We resolve this problem by distributed computing: we reconstruct a real 3D space with several video cameras and several computers in a distributed computing environment.

In our method, the reconstruction algorithm has to be suitable for distributed computing, so it should have the following two characteristics: locality in time (the estimate at one time instant does not depend on other time instants) and locality in space (the estimate at one point does not depend on neighboring points).

The viewing frustum method (VFM) satisfies these two characteristics. With VFM, objects in a real 3D space are reconstructed from several images taken at the same time. We describe the real 3D space with voxels and estimate whether the space corresponding to each voxel is occupied by objects or not. Our method can make this estimation regardless of the voxel's value at other times or the values of the neighboring voxels. We therefore regard the space as a spatio-temporal space and divide it based on this locality. The latency is decreased by reconstructing these spatio-temporal subspaces with several computers simultaneously.
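
The per-voxel test at the heart of VFM can be illustrated with a short sketch. This is only an illustration under assumed inputs, not the actual implementation: we assume calibrated cameras given as 3x4 projection matrices and binary silhouette images of the moving regions, and all function and variable names here are our own.

    import numpy as np

    def voxel_is_occupied(voxel_center, projections, silhouettes):
        """Return True if the voxel projects inside the moving-object
        silhouette of every camera (intersection of viewing frustums).

        voxel_center : (3,) world coordinates of the voxel center
        projections  : list of 3x4 camera projection matrices (assumed calibrated)
        silhouettes  : list of binary images, 1 = moving region
        """
        p_hom = np.append(voxel_center, 1.0)           # homogeneous coordinates
        for P, sil in zip(projections, silhouettes):
            u, v, w = P @ p_hom                        # project into the image
            if w <= 0:                                 # behind the camera
                return False
            x, y = int(round(u / w)), int(round(v / w))
            h, wid = sil.shape
            if not (0 <= x < wid and 0 <= y < h):      # outside the image
                return False
            if sil[y, x] == 0:                         # outside this silhouette
                return False
        return True                                    # inside every viewing frustum

    def reconstruct_subspace(voxel_centers, silhouette_sequence, projections):
        """Reconstruct one spatio-temporal subspace: a block of voxels over a
        range of frames.  Every (frame, voxel) pair is independent, so different
        subspaces can be handled by different computers simultaneously."""
        return {
            (t, i): voxel_is_occupied(center, projections, silhouettes)
            for t, silhouettes in enumerate(silhouette_sequence)
            for i, center in enumerate(voxel_centers)
        }

Since the test for one voxel at one time instant reads only the silhouettes of that instant, blocks of voxels and frames (spatio-temporal subspaces) can be reconstructed on different computers without any communication between them.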

We consider a situation in which a model of the static objects in the real 3D space is available in advance. In this case, the real 3D space can be reconstructed by reconstructing only the dynamic objects. To reconstruct the dynamic objects by VFM, we have to detect the moving region in each video image. Therefore, we reconstruct a real 3D space in the following three stages:
1. capture of video images
2. extraction of the moving region
3. reconstruction by VFM
These three stages are executed by three processes: Video Server, Extractor, and 3D Composer. We construct a pipeline with these processes and thereby improve the throughput.
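
To make the pipelining concrete, the sketch below runs the three stages as threads connected by bounded queues on a single machine. The stage functions are trivial stand-ins for the actual Video Server, Extractor, and 3D Composer, which run as separate processes on networked computers; this is a minimal illustration of the pipeline structure, not the system itself.

    import queue, threading

    def capture_frame(i):
        """Stand-in for the Video Server: here it just produces a frame index."""
        return i

    def extract_moving_region(frame):
        """Stand-in for the Extractor (e.g. background subtraction)."""
        return ("moving region of frame", frame)

    def reconstruct(region):
        """Stand-in for the 3D Composer (reconstruction by VFM)."""
        return ("voxels for", region)

    def run_stage(work, inbox, outbox):
        """Generic pipeline stage: pull an item, process it, pass it on."""
        while (item := inbox.get()) is not None:
            outbox.put(work(item))
        outbox.put(None)                     # propagate shutdown downstream

    def video_server(outbox, n_frames):
        """Source stage: capture frames and feed them into the pipeline."""
        for i in range(n_frames):
            outbox.put(capture_frame(i))
        outbox.put(None)

    # Bounded queues connect the stages; each stage works on a different
    # frame at the same time, which is what raises the overall throughput.
    frames, regions, voxels = (queue.Queue(maxsize=2) for _ in range(3))

    threading.Thread(target=video_server, args=(frames, 10)).start()
    threading.Thread(target=run_stage, args=(extract_moving_region, frames, regions)).start()
    threading.Thread(target=run_stage, args=(reconstruct, regions, voxels)).start()

    while (result := voxels.get()) is not None:
        print(result)

Because each stage works on a different frame at the same time, a new reconstruction leaves the pipeline roughly once per stage time rather than once per total per-frame processing time, which is how the throughput improves.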

We experimentally reconstructed a lecture room of the Department of Information Science at Kyoto University with this method. We used 4 computers for video image processing and 4 more computers for VFM. All 8 computers were connected by a LAN (100Base-T Ethernet and 155 Mbps ATM). Each voxel was a cube 5 centimeters on a side. The resulting throughput was 7.2 frames per second, and the latency was 0.5 seconds. When 1 computer was used for VFM, the throughput was 2.2 frames per second and the latency was 1.4 seconds. Compared with the 1-computer result, our method improved the throughput by about 3 times and reduced the latency to about 1/3.

