We study a method for estimating the object of students' attention so that it can be used as an index in a lecture archive.
It has become possible to store multimedia data, such as video of a lecture and the voice of the lecturer, on computers. We call this kind of data a lecture archive. With a lecture archive, students who attended the lecture can review it, and students who did not attend can study the subject of the lecture by themselves. Lecturers can also evaluate their lectures in order to improve them.
Usually, a user of a lecture archive does not watch the whole archive but only the parts he or she is interested in. In other words, the user searches the lecture archive for a particular part. This is made possible by attaching indexes to the lecture archive.
Students in a lecture room pay attention to the objects they are interested in. Hence, we consider generating indexes of the lecture archive from the objects of the students' attention. With these indexes, users of a lecture archive can search for a part of the archive based on which object the students in the lecture room were interested in. To generate such indexes, it is necessary to estimate the object of the students' attention.
In a conventional method, the face direction of each student is estimated first. Then, from the position of each student, the position of each object, and the face direction of each student, the object gazed at by the most students is estimated. This object is regarded as the object of the students' attention.
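The conventional method described above can be illustrated with a minimal sketch. It assumes 2-D floor coordinates, face directions that are already known, and a simple rule that a student gazes at the object whose bearing is closest to his or her face direction. The function names, the coordinate layout, and the majority vote are illustrative assumptions, not details taken from the thesis.

```python
import math
from collections import Counter

def gazed_object(student_pos, face_dir, objects):
    """Return the name of the object whose bearing from the student
    is closest in angle to the student's face direction.
    Assumes the student does not stand exactly on an object."""
    def angle_to(obj_pos):
        dx = obj_pos[0] - student_pos[0]
        dy = obj_pos[1] - student_pos[1]
        dot = dx * face_dir[0] + dy * face_dir[1]
        norm = math.hypot(dx, dy) * math.hypot(face_dir[0], face_dir[1])
        # Clamp to [-1, 1] to guard against rounding error before acos.
        return math.acos(max(-1.0, min(1.0, dot / norm)))
    return min(objects, key=lambda name: angle_to(objects[name]))

def attention_object(students, objects):
    """Majority vote over the objects gazed at by the individual students."""
    votes = Counter(gazed_object(pos, d, objects) for pos, d in students)
    return votes.most_common(1)[0][0]

# Hypothetical layout: two objects at the front, three students in one row.
objects = {"lecturer": (0.0, 0.0), "screen": (4.0, 0.0)}
students = [
    ((1.0, 5.0), (-0.2, -1.0)),  # faces the lecturer
    ((2.0, 5.0), (0.4, -1.0)),   # faces the screen
    ((3.0, 5.0), (0.2, -1.0)),   # faces the screen
]
result = attention_object(students, objects)
```

The sketch makes the dependency visible: every student needs a reliable face direction, which is the requirement that becomes hard to meet as the class grows.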
Because this approach needs the face direction of each student, each student's face must occupy a large enough area in the image for its direction to be estimated. As the number of students grows, this requirement becomes difficult to satisfy in a lecture room.
To solve this problem, we propose a method that estimates the object of the students' attention without estimating the face direction of each student. We treat all the students as a single entity and use the direction of this entity to estimate the object of the students' attention.
Our method uses the fact that the skin area in an image changes depending on the position of the camera that took it. Hence, we use not one camera but several cameras placed at different positions. We extract the skin area from the image of each camera and then weight those skin areas to reduce the differences caused by the different camera positions. Finally, by comparing the skin areas from the images, we estimate the object of the students' attention. In addition, to prevent occlusion of the students' faces, we place the cameras high enough to look down on the students.
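The comparison step can be sketched as follows. This is only an illustration under stated assumptions: skin detection is taken as already done and given as binary masks, the weights are hypothetical per-camera correction factors, and the camera with the largest weighted skin area is assumed to lie in the direction the students are facing. The thesis abstract does not specify the weighting scheme or the decision rule, so the names and values below are placeholders.

```python
def skin_area(mask):
    """Count skin pixels in a binary mask given as rows of 0/1 values."""
    return sum(sum(row) for row in mask)

def estimate_direction(masks, weights):
    """Weight each camera's skin area and return the camera with the
    largest weighted area, together with all weighted scores.
    Assumption: more visible skin means the students face that camera."""
    scores = {cam: weights[cam] * skin_area(mask) for cam, mask in masks.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical 2x2 skin masks from three cameras (1 = skin pixel).
masks = {
    "left":   [[0, 1], [0, 0]],
    "center": [[1, 1], [1, 0]],
    "right":  [[1, 1], [0, 0]],
}
# Hypothetical weights compensating for the side cameras' positions.
weights = {"left": 1.2, "center": 1.0, "right": 1.2}

direction, scores = estimate_direction(masks, weights)
```

Because the scores are computed over the whole image rather than per face, no individual face needs to be large enough for direction estimation.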
First, to verify our method under a condition where only the skin area of faces is detected, we performed experiments using a 3-D face model. We simulated a lecture room with its seats and cameras, placed 3-D face models on the seats, and rotated the models. By comparing the skin areas captured by the virtual cameras, we verified that our method can be used to estimate the direction of a student's face.
Next, we estimated the object of the students' attention using videos taken by several cameras placed in a real lecture room. We used three cameras. To capture all the students, we placed one center camera on the wall on the platform side of the lecture room, facing the students. We placed the other two cameras to the right and left of the center camera. There were nine students.
In the experiments, we succeeded in estimating the direction of the students at a rate of about 80% in the ideal environment. On the other hand, the success rate was only about 50% in the experiments in the real lecture room. The accuracy is lower than in the ideal environment because the detection of skin area from the images is not accurate. Hence, future work needs to improve the accuracy of skin area detection from images.