{Recognition of ``Stirring'' Actions based on the Positional Relations between Cooks' Hands and Cooking Containers}

{Interest in cooking support systems is growing, and several prototypes have been developed in recent years. These prototypes assume that preparing a dish is a totally ordered sequence of ``unit processes,'' each of which combines a culinary action (such as ``cutting,'' ``peeling,'' or ``stirring'') with its target ingredients. However, this assumption does not always hold, because a cook can partially reorder the unit processes. To handle such cases, a cooking support system needs to estimate which unit process is currently in progress and to predict which one will begin next. A key technique for both the estimation and the prediction task is the recognition of culinary actions. One of the most frequent culinary actions is ``stirring,'' which can influence the quality of the finished dish. If a system can automatically recognize the cook's ``stirring'' action and give him/her advice about it, he/she will be able to make a better dish. In this paper, we therefore propose a method for recognizing ``stirring'' actions, defined as the action of mixing or beating the ingredients in a cooking container such as a bowl or a pan with the cook's hands or with cooking utensils such as chopsticks. Several methods have previously been proposed for recognizing culinary actions including ``stirring.'' However, these methods sometimes judge that the cook is stirring when he/she is actually not. This kind of misrecognition arises because the previous methods do not use the characteristic features that can be observed only while the cook is stirring.
In this paper we address the misrecognition problem using the following two characteristic features of the positional relations between the cook's hands and a cooking container: (A) the cook's hands are in or just above the cooking container, and (B) the cook's hands and the ingredients are continuously moving in the cooking container. The constraints (A) and (B) are satisfied when and only when the cook is stirring the ingredients. We observe these positional relations in depth maps obtained from a depth sensor mounted on the ceiling of a kitchen. The proposed method works as follows. First, the cooking container is detected in each frame of the depth map sequence using a circle detection technique. Next, the cook's hands are detected around the detected container region in each frame, based on the difference in depth values between the container region and the hand region. If either the container detection or the hand detection fails on a frame, constraint (A) is judged not to be satisfied on that frame. Using these detection results, the time intervals during which constraint (A) is continuously satisfied are extracted from the entire depth map sequence. Each extracted interval becomes a candidate recognition result. Finally, from the candidate set, we select the intervals during which constraint (B) is satisfied, based on the number of pixels whose inter-frame differences are large. The intervals selected in this last step are the final results. To evaluate the performance of the proposed method, we carried out an experiment. We used a Kinect sensor as the depth sensor and captured three depth map sequences of a cook making a salad.
We showed the depth map sequences to a human evaluator and asked him to manually extract the intervals during which the cook was actually stirring; these intervals were compared with the outputs of the proposed method. As a result, the proposed method output essentially the same intervals as the human evaluator. On average, each pair of corresponding intervals shared more than 90\% of each other's duration. This indicates that the proposed method is effective for recognizing ``stirring'' actions. In a few of the intervals output by the proposed method, however, the cook paused in her ``stirring'' action. The proposed method currently cannot detect such pauses. Addressing this problem, for example by analyzing the inter-frame differences in the cooking container region in more detail, is left for future work.}
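
The interval-based part of the method summarized in the abstract (candidate extraction under constraint (A), selection under constraint (B), and the overlap comparison against the human evaluator) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the container and hand detection has already produced one boolean flag per frame, and that depth frames are small 2D lists of depth values. All function names and the thresholds `diff_thresh`, `count_thresh`, and `min_ratio` are hypothetical, since the abstract does not state the exact decision rule or any parameter values.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open frame range [start, end)
Frame = Sequence[Sequence[float]]  # 2D grid of depth values (assumed layout)


def extract_intervals(flags: Sequence[bool], min_len: int = 1) -> List[Interval]:
    """Return maximal runs of frames on which constraint (A) holds."""
    intervals: List[Interval] = []
    start = None
    for i, ok in enumerate(flags):
        if ok and start is None:
            start = i  # a new run begins
        elif not ok and start is not None:
            if i - start >= min_len:
                intervals.append((start, i))
            start = None  # the run ended on the previous frame
    if start is not None and len(flags) - start >= min_len:
        intervals.append((start, len(flags)))  # run reaches the last frame
    return intervals


def changed_pixels(prev: Frame, curr: Frame, diff_thresh: float) -> int:
    """Count pixels whose absolute inter-frame depth change exceeds diff_thresh."""
    return sum(
        1
        for row_p, row_c in zip(prev, curr)
        for p, c in zip(row_p, row_c)
        if abs(c - p) > diff_thresh
    )


def is_moving(frames: Sequence[Frame], interval: Interval, diff_thresh: float,
              count_thresh: int, min_ratio: float) -> bool:
    """Assumed rule for constraint (B): enough frame pairs show many changed pixels."""
    start, end = interval
    pairs = range(start + 1, end)  # index i compares frame i-1 with frame i
    if len(pairs) == 0:
        return False
    active = sum(
        1 for i in pairs
        if changed_pixels(frames[i - 1], frames[i], diff_thresh) >= count_thresh
    )
    return active / len(pairs) >= min_ratio


def recognize_stirring(flags: Sequence[bool], frames: Sequence[Frame],
                       diff_thresh: float = 10.0, count_thresh: int = 3,
                       min_ratio: float = 0.5) -> List[Interval]:
    """Keep the constraint-(A) candidates that also satisfy constraint (B)."""
    candidates = extract_intervals(flags, min_len=2)
    return [iv for iv in candidates
            if is_moving(frames, iv, diff_thresh, count_thresh, min_ratio)]


def mutual_overlap(a: Interval, b: Interval) -> Tuple[float, float]:
    """Return the fraction of each interval's duration covered by the other."""
    shared = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return shared / (a[1] - a[0]), shared / (b[1] - b[0])


def make_demo_frames() -> List[Frame]:
    """Nine 2x2 synthetic frames: depth oscillates on frames 1-3, static elsewhere."""
    frames: List[Frame] = []
    for i in range(9):
        depth = 100 + 20 * (i % 2) if 1 <= i <= 3 else 100
        frames.append([[depth, depth], [depth, depth]])
    return frames


demo_flags = [False, True, True, True, False, True, True, True, False]
demo_result = recognize_stirring(demo_flags, make_demo_frames())
```

In this synthetic example both runs of frames satisfy constraint (A), but only the first shows inter-frame depth changes, so only that run is kept. `mutual_overlap` mirrors the evaluation criterion, in which a detected interval and a manually extracted interval are compared by the share of each other's duration that they cover.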