
Detecting Start and End Times of Object-Handlings on Table By Integrating Camera and Load Information

Recently, many systems have been proposed to support human activities in real time through multimedia guidance such as sound and video. An intuitive way to operate such systems is to use the actions a user naturally performs during the activity; such intuitive operation helps users carry out their tasks. As these natural actions, we focus on putting and taking objects on a working table, since we naturally handle objects while working at a table.
Detecting the putting/taking action on each object helps predict what task the user intends to do next, so an intuitive interface can be realized if the system is operated through these actions. Our goal is to detect, in real time and through observation, the putting/taking action on each object on the table.
There are two traditional approaches to detecting these actions: with cameras and with load sensors. The camera-based approach observes the objects to estimate their positions and to detect putting/taking actions on each object. Just as humans understand various events from visual information, an image contains rich information and is used for a wide variety of applications. However, cameras cannot track objects while they are occluded, so this approach cannot detect putting/taking actions until the occlusion ends. In this sense, it cannot detect these actions in real time.
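To illustrate the camera-based idea, the following is a minimal sketch, not the thesis's implementation: put/take candidates are flagged by differencing per-frame detections. The per-frame detector and object identifiers are hypothetical assumptions introduced only for this example.

```python
# Minimal sketch: camera-only put/take detection by differencing per-frame
# object sets. The frame dictionaries stand in for a hypothetical detector
# that returns {object_id: (x, y)} positions on the table plane.

def camera_events(prev_objects, curr_objects):
    """Compare two frames' detections and report put/take candidates."""
    put = set(curr_objects) - set(prev_objects)    # newly visible -> possibly put
    taken = set(prev_objects) - set(curr_objects)  # vanished -> possibly taken, or just occluded
    return put, taken

# Example: a cup appears and a pen disappears between two frames.
prev = {"pen": (0.12, 0.40), "bowl": (0.55, 0.30)}
curr = {"bowl": (0.55, 0.30), "cup": (0.20, 0.10)}
print(camera_events(prev, curr))  # ({'cup'}, {'pen'})
```

Note that a vanished detection is ambiguous: the object may have been taken or merely occluded, which is exactly the limitation described above.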
The other approach, using load sensors, relies on variation in load. Putting and taking objects on a table causes load variation, which is good evidence for detecting putting/taking actions, and the variation is easy to observe because load signals are one-dimensional. However, this approach cannot distinguish the putting/taking actions when two or more objects are put and/or taken at once: the load changes caused by the objects' displacements are all combined into a single variation of load.
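As a rough illustration of how load variation can signal put/take events, here is a minimal step-detection sketch on a one-dimensional load signal. The window length and threshold are illustrative assumptions, not parameters from the thesis.

```python
# Minimal sketch: detecting put/take events as steps in a 1-D load signal.
import numpy as np

def load_events(load, window=10, threshold=0.05):
    """Return (sample index, load step) pairs where the mean load shifts.

    A positive step suggests something was put on the table, a negative
    step suggests something was taken. Simultaneous displacements show up
    as one combined step, which is the ambiguity noted above.
    """
    events = []
    for i in range(window, len(load) - window):
        before = np.mean(load[i - window:i])
        after = np.mean(load[i:i + window])
        step = after - before
        # Debounce: skip samples too close to the previous detected event.
        if abs(step) > threshold and (not events or i - events[-1][0] > window):
            events.append((i, step))
    return events
```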
We overcome these difficulties by integrating the two sensors in a statistical model. Cameras can distinguish the putting/taking action on each object as long as the objects are not occluded, even when two or more objects are displaced at once. Load sensors can detect, in real time, at least that some object has been displaced, by focusing on the variation of load. Hence the two sensors complement each other through integration. Moreover, our statistical model enables the objects' locations to be recovered from the camera information once an occlusion ends; this recovery is necessary when two objects are displaced at once during the occlusion.
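The thesis's statistical model is not reproduced here, but the following sketch shows one plausible integration rule under simplifying assumptions: while the camera view is occluded, a measured load step is explained by the subset of known object weights that best matches it. The object weights, tolerance, and helper name are introduced only for illustration.

```python
# Sketch of a simple camera/load integration rule (not the thesis's model):
# explain a load step by the best-matching subset of known object weights.
from itertools import combinations

def explain_load_step(step, candidate_weights, tol=0.02):
    """Return the candidate subset whose total weight best matches |step|."""
    best, best_err = (), float("inf")
    names = list(candidate_weights)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            err = abs(abs(step) - sum(candidate_weights[n] for n in subset))
            if err < best_err:
                best, best_err = subset, err
    return best if best_err <= tol else ()

# Example: a 0.35 kg load increase during an occlusion is best explained by
# the cup and the bowl being put down together.
weights = {"cup": 0.20, "bowl": 0.15, "pen": 0.01}
print(explain_load_step(0.35, weights))  # ('cup', 'bowl')
```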
We evaluated our approach in three experiments: one covering situations that are difficult for a single sensor alone, and two others observing real activities at a working table. In the first experiment, our approach correctly detected the putting/taking action on each object while the objects were occluded and two or more objects were displaced at once. It also correctly estimated the objects' locations while they were occluded.
In the second and third experiments, we applied our approach to two real activities: a paper craft task and a cooking task. In the paper craft task, detection of put objects achieved a recall of 64.3% (9/14) and a precision of 34.6% (9/26), and detection of taken objects achieved a recall of 30.0% (3/10) and a precision of 14.3% (3/21). In the cooking task, detection of put objects achieved a recall of 58.3% (7/12) and a precision of 29.2% (7/24), and detection of taken objects achieved a recall of 56.3% (9/16) and a precision of 47.4% (9/19). Load changes caused by disturbances were also rejected at a rate of 63.0% (17/27). In future work, we need to improve the algorithm for separating such disturbing load vibrations.
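For reference, the recall and precision figures quoted above follow the standard definitions; the snippet below simply reproduces the arithmetic for the paper craft put-object case.

```python
# Recall/precision arithmetic for the paper craft task, put-object detection:
# 9 correct detections, 14 actual put events, 26 reported detections.
true_positives, actual_events, detections = 9, 14, 26
recall = true_positives / actual_events      # 9/14 ≈ 64.3%
precision = true_positives / detections      # 9/26 ≈ 34.6%
print(f"recall={recall:.1%}, precision={precision:.1%}")
```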