Various methods have been proposed for automatically shooting human activities such as lectures, concerts, and plays. We study the case in which a single camera shoots them.
Shooting can be defined by a camera-work, which is described by 2D information on a video image. The shooting camera is controlled so as to realize a given camera-work, which is chosen to suit the situation of the activities. In automatic shooting with a pan/tilt/zoom camera, the given camera-work is determined by referring to the 3D location of the shooting target and the current direction of the camera. The camera-work sometimes cannot be fully realized because of uncontrollable factors, such as noise in the measured location of the target and uncertainties in the camera rotation mechanism. It is therefore necessary to evaluate whether the given camera-work has been fully realized and to adjust the camera control during automatic shooting. Since the video images are the result of the automatic shooting, it is reasonable to use image information extracted directly from them for this evaluation and adjustment.
In this paper, we propose a method to evaluate the camera-work and to adjust the camera control. We assume that the pan/tilt/zoom camera shoots a single target whose location and velocity are unknown, and that the camera-work to be realized has already been determined to suit the shooting situation.
With a pan/tilt/zoom camera, the camera-work can be defined by camera-work state variables: the 2D location and 2D size of the target, the velocity and magnification rate of the target as their respective time derivatives, and the velocity of the background in the video images.
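The state variables listed above can be pictured as a small record. The following is only an illustrative sketch: the class name `CameraWorkState`, the field names, the units, and the per-variable absolute difference used in `deviation` are assumptions made for this example and are not taken from the thesis.

```python
from dataclasses import dataclass, fields


@dataclass
class CameraWorkState:
    """Hypothetical container for the camera-work state variables (image coordinates)."""
    x: float         # target location, horizontal (pixels, assumed unit)
    y: float         # target location, vertical (pixels)
    width: float     # target size, horizontal (pixels)
    height: float    # target size, vertical (pixels)
    vx: float        # target velocity, horizontal (pixels/frame)
    vy: float        # target velocity, vertical (pixels/frame)
    mag_rate: float  # target magnification rate (relative size change per frame)
    bg_vx: float     # background velocity, horizontal (pixels/frame)
    bg_vy: float     # background velocity, vertical (pixels/frame)


def deviation(desired: CameraWorkState, estimated: CameraWorkState) -> dict:
    """Per-variable absolute difference between the given and the estimated camera-work."""
    return {
        f.name: abs(getattr(desired, f.name) - getattr(estimated, f.name))
        for f in fields(desired)
    }
```

Comparing a desired state with an estimated one in this way gives one simple notion of how far the realized camera-work is from the given one; other error measures are equally possible.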
Evaluating the camera-work requires estimating the current values of the camera-work state variables. To do so, the target region must be separated from the background region in the image. This is difficult without prior knowledge of the target, such as its color, shape, or texture.
We overcome this difficulty by noting that the camera-work involves at most one target. The image is therefore expected to contain at most two regions, each with its own distinct motion: the background region and, if a target is present, the target region. Hence, there are at most two sets of optical flow. Moreover, the optical flow allows us to estimate the velocity and magnification rate of the target region and the velocity of the background region. Optical flow estimates are often corrupted by noise such as outliers. We therefore develop an estimation method, based on M-estimators, that extracts two different velocities and magnification rates despite the influence of the outliers.
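One way to picture robust two-motion estimation is the sketch below. It is not the estimation method of the thesis. It assumes a simple flow model, u = vx + m·x and v = vy + m·y with (x, y) measured from the image centre, uses the Tukey biweight as the M-estimator, solves it by iteratively reweighted least squares, and assumes that the background covers more than half of the flow vectors so that it is found first. The function names `fit_flow` and `estimate_two_motions` are hypothetical.

```python
import math
from statistics import median


def fit_flow(points, iters=30, c=4.685):
    """Robustly fit u = vx + m*x, v = vy + m*y to flow vectors (Tukey biweight, IRLS).

    points: list of (x, y, u, v), with (x, y) relative to the image centre.
    Returns ((vx, vy, m), weights); weight 0.0 marks a vector rejected as an outlier.
    """
    w = [1.0] * len(points)
    params = (0.0, 0.0, 0.0)
    for _ in range(iters):
        # Weighted sums for the closed-form normal equations of the 3-parameter model.
        sw = sum(w)
        swx = sum(wi * x for wi, (x, y, u, v) in zip(w, points))
        swy = sum(wi * y for wi, (x, y, u, v) in zip(w, points))
        swu = sum(wi * u for wi, (x, y, u, v) in zip(w, points))
        swv = sum(wi * v for wi, (x, y, u, v) in zip(w, points))
        swr = sum(wi * (x * x + y * y) for wi, (x, y, u, v) in zip(w, points))
        swc = sum(wi * (x * u + y * v) for wi, (x, y, u, v) in zip(w, points))
        den = swr - (swx ** 2 + swy ** 2) / sw
        m = (swc - (swx * swu + swy * swv) / sw) / den if abs(den) > 1e-12 else 0.0
        vx = (swu - m * swx) / sw
        vy = (swv - m * swy) / sw
        params = (vx, vy, m)
        # Residual magnitudes and a robust scale (median-based, floored to avoid zero).
        res = [math.hypot(u - vx - m * x, v - vy - m * y) for (x, y, u, v) in points]
        scale = max(1.4826 * median(res), 1e-6)
        cut = c * scale
        # Tukey biweight: smooth down-weighting, zero weight beyond the cutoff.
        w = [(1.0 - (r / cut) ** 2) ** 2 if r < cut else 0.0 for r in res]
    return params, w


def estimate_two_motions(points):
    """Fit the dominant motion (assumed background), then refit on the rejected vectors."""
    background, w = fit_flow(points)
    rest = [p for p, wi in zip(points, w) if wi == 0.0]
    target = fit_flow(rest)[0] if len(rest) >= 3 else None
    return background, target
```

The second fit is run only on the vectors that the first fit rejected, so vectors that belong to neither motion are again down-weighted as outliers. If no coherent second motion exists, `target` is `None` or a poorly supported fit, and a real system would need an additional check on how many vectors support it.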
To adjust the camera control after the camera-work evaluation, we calculate camera control parameters from the estimated values of the camera-work state variables. We use a Kalman filter, which is well suited to filtering both the location and the velocity of the target at discrete time intervals.
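As a rough illustration of filtering location and velocity at discrete time steps, the sketch below runs a constant-velocity Kalman filter on one image axis. The constant-velocity motion model, the assumption that both location and velocity are measured directly, the noise values `q`, `r_loc`, `r_vel`, `p0`, and the class name `AxisKalman` are all choices made for this example and are not taken from the thesis.

```python
def _mul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def _add(A, B):
    """2x2 matrix sum."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]


def _transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]


def _inv2(A):
    """Inverse of a 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


class AxisKalman:
    """Constant-velocity Kalman filter for one image axis; state = [location, velocity]."""

    def __init__(self, dt=1.0, q=0.01, r_loc=1.0, r_vel=1.0, p0=100.0):
        self.F = [[1.0, dt], [0.0, 1.0]]           # state transition (assumed model)
        self.Q = [[q, 0.0], [0.0, q]]              # process noise (hypothetical value)
        self.R = [[r_loc, 0.0], [0.0, r_vel]]      # measurement noise (hypothetical values)
        self.x = [0.0, 0.0]                        # initial state estimate
        self.P = [[p0, 0.0], [0.0, p0]]            # initial uncertainty

    def step(self, z_loc, z_vel):
        """One predict/update cycle with a measured location and velocity."""
        F = self.F
        # Predict the state and its covariance one time step ahead.
        x = [F[0][0] * self.x[0] + F[0][1] * self.x[1],
             F[1][0] * self.x[0] + F[1][1] * self.x[1]]
        P = _add(_mul(_mul(F, self.P), _transpose(F)), self.Q)
        # Update; the measurement matrix is the identity because both states are measured.
        K = _mul(P, _inv2(_add(P, self.R)))
        y = [z_loc - x[0], z_vel - x[1]]
        self.x = [x[0] + K[0][0] * y[0] + K[0][1] * y[1],
                  x[1] + K[1][0] * y[0] + K[1][1] * y[1]]
        I_K = [[1.0 - K[0][0], -K[0][1]], [-K[1][0], 1.0 - K[1][1]]]
        self.P = _mul(I_K, P)
        return tuple(self.x)
```

For a 2D target, one such filter per axis could be run, feeding each with the location and velocity estimated from the video images.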
We conducted two experiments to validate our method. The first shows that the camera-work evaluation was carried out correctly on video images for which the camera-work had been determined in advance. In the second, both the camera-work evaluation and the camera control adjustment were achieved properly, so that the given camera-work was realized.