Thesis | Researches in Minoh Lab | Minoh Lab
Because cooking is complicated daily work, a system is desired that gives advice to the cook according to his or her situation. To realize such a system, we address the problem of identifying which food product from the recipe's ingredient list the cook is handling. Because cutting is one of the most basic processes in cooking, in this research we aim to classify the food product the cook is cutting.
In previous research, food products were classified by their visual features using a camera. However, a cook often handles two or more food products with similar visual features at the same time, and in such cases classification by visual features fails.
We propose a method for classifying a food product using the vibration sounds generated when the cook cuts it. The vibration sounds contain the features of the kitchen knife making a slit in the surface, passing through the interior, and hitting the cutting board; that is, the vibration sounds change depending on the internal structure and size of the food product. However, when captured with a general microphone, the vibration sounds include irrelevant noises such as voices and the sound of a ventilation fan. To solve this problem, we capture the vibration sounds directly with a contact microphone attached to the underside of the cooking counter.
The basic vibration sound generated by one cutting stroke consists of two peaks and the vibration between them. The first peak occurs when the kitchen knife makes a slit in the surface of the food product, and the second occurs when the knife hits the cutting board. The vibration between them occurs while the knife passes through the interior of the food product. Based on this analysis, we selected three features for classifying food products: the amplitude of the first peak, the amplitude of the second peak, and the waveform during passage.
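The two-peak structure described above can be located automatically, for example by picking the two most prominent maxima of the signal envelope. The following is a minimal sketch, not the thesis's actual detector; the minimum gap and prominence threshold are assumptions for illustration.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_cut_peaks(signal, fs, min_gap_s=0.05):
    """Locate the two peaks of a single cutting stroke.

    Assumption (not from the thesis): the first peak (knife slitting the
    surface) and the second peak (knife hitting the cutting board) are the
    two most prominent maxima of the envelope, at least min_gap_s apart.
    """
    envelope = np.abs(signal)
    peaks, props = find_peaks(envelope,
                              distance=int(min_gap_s * fs),
                              prominence=0.1 * envelope.max())
    if len(peaks) < 2:
        raise ValueError("fewer than two peaks found")
    # keep the two most prominent peaks, returned in temporal order
    top2 = peaks[np.argsort(props["prominences"])[-2:]]
    first, second = np.sort(top2)
    return first, second
```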
First, we extracted the amplitudes of the first and second peaks for many kinds of food products and compared them. We found that these two features vary widely even among samples of the same kind of food product, which means that the system cannot classify a food product by these two features alone.
Second, we evaluated the classification ability of the waveform during passage. For various kinds of food products, we extracted as an analytical segment the vibration from 0.2 seconds before the second peak up to the second peak. We then calculated a spectrogram for each analytical segment and compared them, and discovered that features depending on the kind of food product appear in the low-frequency part. As classification features, we used a 16-dimensional feature vector consisting of the mean and variance of the spectrogram over the lowest 8 of 128 frequency bins.
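The feature extraction above can be sketched as follows. The window length (256 samples, giving 128 positive-frequency bins) and the overlap are assumptions; the thesis only states a 128-bin frequency resolution.

```python
import numpy as np
from scipy.signal import spectrogram

def passing_features(segment, fs, n_bins=8, nperseg=256):
    """16-dimensional feature vector from the 'wave at passing'.

    Compute a spectrogram of the 0.2 s segment ending at the second
    peak, then take the mean and variance over time of the lowest
    n_bins frequency rows (8 means + 8 variances = 16 dimensions).
    """
    f, t, Sxx = spectrogram(segment, fs=fs, nperseg=nperseg,
                            noverlap=nperseg // 2)
    low = Sxx[:n_bins, :]                      # lowest 8 frequency rows
    return np.concatenate([low.mean(axis=1),   # 8 means
                           low.var(axis=1)])   # 8 variances
```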
We conducted the following experiment to evaluate the classification ability of the feature vector described above. We chose cabbage, carrot, cucumber, onion, green pepper, potato, and tomato as subjects, since they are typical food products that appear frequently in many recipes. We prepared 4 to 6 specimens of each food product and extracted sets of analytical segments from the vibration sounds recorded while cutting each specimen. We then converted each analytical segment into the 16-dimensional feature vector described above, and trained a classifier for each food product using a support vector machine (SVM).
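One classifier per food product suggests a one-vs-rest scheme, which can be sketched as below. The data here is synthetic, and the kernel and parameters are assumptions for illustration, not values from the thesis.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical data: X holds 16-dimensional feature vectors, y the food names.
rng = np.random.default_rng(0)
foods = ["cabbage", "carrot", "cucumber", "onion",
         "green pepper", "potato", "tomato"]
X = rng.normal(size=(140, 16))
y = np.repeat(foods, 20)

# One binary (one-vs-rest) SVM per food product, as described in the text.
classifiers = {}
for food in foods:
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X, (y == food).astype(int))   # 1 = this food, 0 = all others
    classifiers[food] = clf

def classify(x):
    """Classify one segment as the food whose SVM gives the largest margin."""
    scores = {f: c.decision_function(x.reshape(1, -1))[0]
              for f, c in classifiers.items()}
    return max(scores, key=scores.get)
```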
As a result, the classifier trained on all analytical segments correctly classified more than 75.0% of analytical segments for every food product. Under leave-one-out cross-validation, although the classification rate for tomato was as low as 17.8% of analytical segments on average, the other food products were classified correctly at rates above 52.9% on average. When the system decided the classification result by a majority vote over the set of analytical segments extracted from one specimen, it correctly classified 88.9% of specimens on average, excluding tomato.
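The per-specimen decision rule is a plain majority vote over the per-segment results, which can be sketched as:

```python
from collections import Counter

def vote(segment_labels):
    """Majority vote over the per-segment classification results obtained
    from one specimen (ties broken by first occurrence)."""
    return Counter(segment_labels).most_common(1)[0][0]
```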
We also evaluated the classification ability for two pairs of similarly colored food products: potato versus onion, and cucumber versus green pepper. The classification rates for both pairs exceeded 81.4%. This suggests that combining this method with image recognition can improve the overall classification rate.
As future work, we will construct a framework that integrates the classification results obtained from images and from vibration sounds.