The author proposes a method for estimating the position and pose of a real object, for the purpose of manipulating virtual objects in a virtual studio. Virtual studio systems are now widely used in the image processing industry, in TV broadcasting, in teaching video production, in film production, and so on. In scenes created with a virtual studio system, the audience sees the virtual objects as if they and the real objects existed in the same world. In the studio, however, the actor cannot see the virtual objects from his own point of view, nor manipulate them with a real object as he would ordinarily manipulate real things. A situation is required in which the actor can manipulate virtual objects as comfortably as if they were real. At present, the actor must act according to the movement of virtual objects positioned in advance, or a studio engineer must adapt the virtual objects to the actor's movement. With these methods the actor does not actually manipulate the virtual objects, so it is difficult for him to act naturally. To solve this problem, a system for visualization and manipulation is required, by which the actor can look at the virtual objects from his own point of view and manipulate them as he would real objects. The visualization part is easy to realize, because the see-through wearable displays used in existing Augmented Reality (AR) technology can serve this purpose. The manipulation part, however, is difficult to realize with AR. To let the actor manipulate virtual objects like real ones, a method of manipulating them through a real object is required, and this method in turn requires estimation of the position and pose of the real object. Conventionally, position sensors, stereo vision, optical flow, and three-dimensional geometric models have been used for this kind of estimation.
However, it is not desirable that a position sensor or the cameras for stereo vision come within the range of the studio camera that captures the broadcast image, and the installation locations of sensors and cameras in a virtual studio are restricted. In addition, since the actor's hands occlude the real object while it is being manipulated, it is difficult to constantly track the specific feature points on the object that stereo vision and optical flow methods rely on. Moreover, a method based on an accurate three-dimensional model is undesirable, because the system is expected to allow actors to use real objects of various shapes to manipulate virtual ones. On the other hand, in the virtual studio situation the author can use the images captured by the studio camera, together with data from an ultrasonic position sensor installed outside the studio camera's range. Attaching the ultrasonic position sensor (an ultrasonic beacon) to the actor's forearm, covered by a sleeve, provides the position and pose of the wrist wirelessly and without affecting the broadcast image.
Furthermore, the author can make an assumption about the physical relationship between the actor's hand and the grasped real object, because the hand takes only a limited range of forms when the actor grasps a real object in order to manipulate virtual ones.
In this research, to estimate the position and pose of the real object grasped by the actor, the author uses a single studio camera and the ultrasonic position sensor, and applies to their data an assumed physical relation among the wrist, the palm, and the grasped object. First, the author obtains the position and pose of the actor's wrist from the ultrasonic sensor on his forearm, and then applies several constraints sequentially: the centroid position of the actor's hand in the studio camera image; the distance between the wrist and the hand centroid, measured beforehand; the assumed physical relation between the hand and the grasped object; and the region of the grasped object in the camera image. From these, the author obtains an estimate of the position and pose of the grasped object. To formulate the assumption above, the author referred to classifications of grasping hand poses developed in robotics and occupational therapy.
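The first two constraints above (the hand centroid's position in the camera image, and its pre-measured distance from the wrist) can be combined geometrically: the centroid must lie on the camera ray through its image position and on a sphere of fixed radius around the wrist. The following is a minimal sketch of that intersection step, not the author's actual implementation; the function names and the simple pinhole-ray setup are assumptions for illustration.

```python
import numpy as np

def ray_sphere_intersect(origin, direction, center, radius):
    """Intersect a ray (origin + t*direction, t >= 0) with a sphere.

    Returns the intersection point nearest the ray origin, or None
    if the ray misses the sphere entirely.
    """
    d = direction / np.linalg.norm(direction)
    oc = origin - center
    b = 2.0 * d.dot(oc)
    c = oc.dot(oc) - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0:
        return None  # camera ray does not reach the wrist sphere
    t = (-b - np.sqrt(disc)) / 2.0
    if t < 0:
        t = (-b + np.sqrt(disc)) / 2.0  # camera is inside the sphere
    if t < 0:
        return None
    return origin + t * d

def estimate_hand_centroid(cam_pos, ray_to_centroid, wrist_pos, wrist_to_centroid):
    """Constraints 1 and 2 combined: the hand centroid lies on the camera
    ray through its image position AND at a fixed, pre-measured distance
    from the wrist position given by the ultrasonic sensor.
    """
    return ray_sphere_intersect(cam_pos, ray_to_centroid,
                                wrist_pos, wrist_to_centroid)
```

In the author's method, this estimated centroid would then be refined further using the assumed hand-object relation and the object's region in the image.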
The author chose a cylindrical object as the real object whose position and pose are estimated, because its shape is simple and such objects are often used by actors, for example in pointing gestures. The author implemented the method described above and tested its effectiveness experimentally. The experiment was conducted in a virtual studio, with the actor pointing with a stick shaped like an ordinary pen. An ultrasonic position sensor was attached to the tip of the stick, and the author compared the position reported by this sensor with the estimated tip position.
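A comparison like the one in this experiment is commonly summarized as the root-mean-square Euclidean distance between the estimated tip trajectory and the sensor-measured reference. The sketch below shows that metric; the source does not state which error measure the author used, so this is an assumed, illustrative choice.

```python
import numpy as np

def rms_position_error(estimated, reference):
    """Root-mean-square Euclidean distance between two trajectories.

    estimated, reference: (N, 3) arrays of 3D positions, sampled at the
    same time instants (estimated tip positions vs. ultrasonic sensor data).
    """
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    per_frame = np.sum((est - ref) ** 2, axis=1)  # squared distance per frame
    return float(np.sqrt(np.mean(per_frame)))
```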