Supporting Student Pose Tagging by Small Number of Training Samples

In recent years, Faculty Development (FD), which aims to improve lecture content, has become mandatory in universities and graduate schools. Many lectures have been analyzed as part of FD, and many of these analyses focus on observing student behavior, which indicates the attitude of the students attending the lecture.

Observing student behavior requires recording and classifying it. However, because both tasks have been done manually, they place a heavy burden on the observer, who would benefit if they could be done automatically. A method for automatically recording and classifying student behavior has been proposed based on posture recognition. This posture recognition method relies on the positions of body parts (head, torso, etc.) extracted from video of the students. A spatio-temporal constraint is introduced to reduce errors in the body part positions and improve the accuracy of posture recognition. In this approach, the same constraints are applied to every student in the video. However, poses vary from student to student, and the positional relationship between camera and seat differs for each student. For this reason, the accuracy of posture recognition is not high enough.

In this paper, we use posture recognition for semi-automatic student posture tagging in order to obtain high accuracy in behavior recognition. We extract a small number of samples for each student from the video and use them to train a classifier specialized for that student. The specialized classifier is expected to improve accuracy. Although all training samples must be tagged manually, the number of samples is limited, so tagging is no longer a heavy burden for the observer.
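The per-student training idea can be illustrated with a minimal sketch. The abstract does not name the classifier or the feature representation, so the nearest-centroid classifier below is only a stand-in, and the function names, student IDs, and posture tags ("writing", "listening") are hypothetical.

```python
from collections import defaultdict


def train_student_classifier(samples):
    """Train a stand-in nearest-centroid model for ONE student.

    samples: list of (feature_vector, posture_tag) pairs tagged manually.
    Returns a dict mapping each posture tag to its mean feature vector.
    """
    grouped = defaultdict(list)
    for features, tag in samples:
        grouped[tag].append(features)
    return {
        tag: [sum(column) / len(vectors) for column in zip(*vectors)]
        for tag, vectors in grouped.items()
    }


def predict_posture(model, features):
    """Return the tag whose centroid is closest to the feature vector."""
    return min(
        model,
        key=lambda tag: sum((a - b) ** 2 for a, b in zip(features, model[tag])),
    )


def train_all(tagged_by_student):
    """Train one specialized model per student (student IDs are hypothetical)."""
    return {
        student_id: train_student_classifier(samples)
        for student_id, samples in tagged_by_student.items()
    }


# Toy data: the same feature vector can mean different postures for
# different students, which is the motivation for per-student models.
tagged_by_student = {
    "s1": [([0.0, 0.0], "writing"), ([1.0, 1.0], "listening")],
    "s2": [([0.0, 0.0], "listening"), ([1.0, 1.0], "writing")],
}
models = train_all(tagged_by_student)
```

In this toy setup, `predict_posture(models["s1"], [0.1, 0.1])` and `predict_posture(models["s2"], [0.1, 0.1])` give different tags for the same features, which a single shared model could not do.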

To obtain high accuracy in behavior recognition, the extracted samples should be representative, reflecting the tendency of the poses taken by the student. In this paper, we focus on three points when considering sample extraction methods: "covering all postures that appear (Completeness)", "reflecting the variation of the features obtained from the same student (Diversity)", and "using as few samples as possible (Limitation)".

Considering these three points, we propose three methods for extracting samples from a given sequence of frames. In "equal time interval sampling", we simply extract one frame from each fixed time interval. In "variation weighted sampling", the variation of a frame is calculated from the variance of the features extracted from the frames in a given time period, and a frame with a larger variation has a higher probability of being selected as a sample. In "clustering sampling", we generate several clusters of frames that share similar features and extract samples from each cluster. Each of the three methods has its own advantages: "equal time interval sampling" has high completeness, while "variation weighted sampling" has high diversity. Combining the methods can combine their advantages. In this paper, we evaluate five methods obtained from the three methods mentioned above: "equal time interval sampling", "variation weighted sampling", "clustering sampling", "clustering + equal time interval sampling", and "clustering + variation weighted sampling".
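The three basic sampling methods can be sketched as follows. This is an illustration under assumptions, not the thesis implementation: the abstract does not specify the features, the window size, the clustering algorithm, or how a sample is picked within a cluster. Here frames are plain lists of floats, the window is symmetric, the clustering is a simple k-means, and the frame nearest each centroid is taken. All function names are hypothetical.

```python
import random
from statistics import pvariance


def equal_interval_sampling(n_frames, n_samples):
    """Take the middle frame of each of n_samples equal time intervals."""
    step = n_frames / n_samples
    return [int(i * step + step / 2) for i in range(n_samples)]


def frame_variation(features, half_window):
    """Variation of frame t: summed per-dimension variance over a window around t."""
    dims = len(features[0])
    variation = []
    for t in range(len(features)):
        window = features[max(0, t - half_window): t + half_window + 1]
        variation.append(
            sum(pvariance([frame[d] for frame in window]) for d in range(dims))
        )
    return variation


def variation_weighted_sampling(features, n_samples, half_window=2, rng=None):
    """Draw frames without replacement, with probability proportional to variation."""
    rng = rng or random.Random(0)
    # A small constant keeps zero-variation frames selectable (assumption).
    weights = [v + 1e-9 for v in frame_variation(features, half_window)]
    candidates = list(range(len(features)))
    chosen = []
    for _ in range(n_samples):
        pick = rng.choices(range(len(candidates)), weights=weights)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)
    return sorted(chosen)


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, iterations=20):
    """Plain k-means with evenly spaced initial centroids (assumed algorithm)."""
    n = len(features)
    centroids = [list(features[(i * n) // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sq_dist(frame, centroids[c]))
            for frame in features
        ]
        for c in range(k):
            members = [f for f, label in zip(features, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def clustering_sampling(features, k):
    """Take, from each cluster, the frame closest to the cluster centroid."""
    labels, centroids = kmeans(features, k)
    samples = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            samples.append(
                min(members, key=lambda i: sq_dist(features[i], centroids[c]))
            )
    return sorted(samples)


# Toy sequence: 30 frames with a 1-D feature that changes at frames 10 and 20.
toy_features = [[0.0]] * 10 + [[5.0]] * 10 + [[10.0]] * 10
```

On the toy sequence, the equal-interval method spreads samples evenly over time, while the variation-weighted method concentrates on the frames around the two pose changes, where the windowed variance is non-zero.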

In the experiment, we apply the five sample extraction methods to one student in a video taken during a real lecture. In terms of average accuracy, "equal time interval sampling" shows better and more stable accuracy. However, when the number of samples is restricted to a smaller value, "clustering + variation weighted sampling" shows better accuracy.

In future work, we should apply our method to more students and investigate how few samples are needed for accurate student pose tagging.