With recent progress in multimedia technology, various media, including images, have come into wide use. When we need images, we often ask designers to produce them by describing the imagery we have in mind. This paper focuses on the case in which that imagery can be described by visual features, namely the shapes, colors, and layouts of the objects in the images. We call imagery that can be described by these visual features ``visual imagery.''
We usually use linguistic expressions to describe our visual imagery when we communicate it to designers. Because it is difficult to describe visual imagery in words without ambiguity, we need to interact with the designers, checking and modifying the prototype images they produce, to verify that we and the designers share the same visual imagery. Reaching a shared imagery through this interaction takes a lot of time.
In addition, it has become common to contact designers directly over the Internet to ask them to produce images. Although various kinds of media can be used for communication over the Internet, the number of interactions is limited compared with face-to-face communication. Under these circumstances, it is not realistic to rely on frequent interactions to share the same visual imagery with the designers.
This paper proposes using sample images together with linguistic expressions so that visual imagery can be shared with others through fewer interactions. The system synthesizes these sample images by combining image regions stored in a database.
Compared with linguistic expressions, sample images describe visual imagery with less ambiguity. However, precisely because a sample image is so specific, a single one often represents only part of the client's imagery. In this paper, the imagery is therefore represented by a set of sample images, which we call ``representative images.''
The representative images corresponding to a user's imagery are obtained by selecting images that differ from one another in the $image$ $feature$ $space$, a space whose axes correspond to the feature parameters of the images. This process consists of the following steps.
The user draws a rough sketch, called a $query$, to describe his/her imagery. The system synthesizes various sample images by replacing each region of the query with an image region from the database. The user then selects, from among these sample images, those that correspond to his/her imagery. The selected sample images serve as positive examples of the user's visual imagery, and the unselected ones serve as negative examples. The system synthesizes further sample images so that they are close to the positive examples and different from the negative examples obtained so far. Differences between sample images are evaluated based on their visual features.
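The feedback loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system's actual implementation: each database region is assumed to be already reduced to a numeric feature vector (for example, color, shape, and layout parameters), a synthesized image is described by concatenating its regions' vectors, and distance in the image feature space is taken to be Euclidean. The scoring rule `score` and its weight `alpha` are hypothetical, and greedy farthest-point selection is swapped in as one simple way to pick images that differ from one another. All function names are hypothetical.

```python
import itertools
import math
from typing import Sequence

Vector = tuple[float, ...]


def synthesize(region_options: Sequence[Sequence[Vector]]) -> list[Vector]:
    """Replace each query region by one database region, in every combination.
    A synthesized image is described by concatenating the feature vectors
    of its regions (an assumed representation)."""
    return [tuple(itertools.chain.from_iterable(combo))
            for combo in itertools.product(*region_options)]


def score(c: Vector, positives: Sequence[Vector], negatives: Sequence[Vector],
          alpha: float = 1.0) -> float:
    """Hypothetical rule, lower is better: close to the nearest positive
    example, far from the nearest negative example."""
    d_pos = min((math.dist(c, p) for p in positives), default=0.0)
    d_neg = min((math.dist(c, n) for n in negatives), default=0.0)
    return d_pos - alpha * d_neg


def farthest_point(points: Sequence[Vector], k: int) -> list[Vector]:
    """Greedily pick up to k points that are mutually far apart."""
    points = list(dict.fromkeys(points))  # drop duplicates, keep order
    if not points or k <= 0:
        return []
    chosen = [points[0]]  # seed with the best-ranked point
    while len(chosen) < min(k, len(points)):
        rest = [p for p in points if p not in chosen]
        # next pick maximizes its distance to the nearest already-chosen point
        nxt = max(rest, key=lambda p: min(math.dist(p, q) for q in chosen))
        chosen.append(nxt)
    return chosen


def next_samples(region_options, positives, negatives, k, pool=10, alpha=1.0):
    """Propose k new, mutually different samples from the best-scoring
    candidates that the user has not yet seen."""
    shown = set(positives) | set(negatives)
    candidates = [c for c in synthesize(region_options) if c not in shown]
    candidates.sort(key=lambda c: score(c, positives, negatives, alpha))
    return farthest_point(candidates[:pool], k)
```

Restricting the diversity step to a `pool` of top-scoring candidates keeps proposals near the positive examples while still spreading them out; a larger pool favors variety, a smaller one favors closeness to what the user has already accepted.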
This method is evaluated by experiments with pairs of subjects. In each experiment, one subject describes his/her visual imagery successively with linguistic expressions, a query, and representative images. The other subject estimates the visual imagery from these descriptions. The correspondence between the two subjects' visual imagery is evaluated by comparing the images that each subject selects from a set of test images as matching his/her own imagery. The results show that the imagery estimated from linguistic expressions alone depends strongly on the subject and often does not match the original imagery to be communicated. When a query is used together with linguistic expressions, the estimated imagery comes closer to the original, but it covers only a limited range of it. When a few representative images are added to the description, the estimated imagery covers a broader range of the original. However, the correspondence between the original imagery and the imagery estimated from the representative images decreases when the number of representative images increases and they become very similar to one another.
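The text does not state how the two subjects' selections are compared numerically. One plausible measure, assumed here purely for illustration, treats the describer's selection from the test images as the reference: recall is the fraction of the describer's picks that the estimator also picked (how much of the imagery is covered), and precision is the fraction of the estimator's picks that the describer also picked (how well the estimate corresponds). The function `correspondence` and the test-image identifiers are hypothetical.

```python
def correspondence(original: set[str], estimated: set[str]) -> tuple[float, float]:
    """Compare test images picked by the describer (original) and by the
    estimator (estimated). Returns (precision, recall); an assumed metric."""
    hits = len(original & estimated)  # test images both subjects picked
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(original) if original else 0.0
    return precision, recall


p, r = correspondence({"t1", "t2", "t3", "t4"}, {"t1", "t2", "t9"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Under this assumed measure, an estimate that is accurate but narrow would show high precision and low recall, while one that is broad but loose would show the reverse.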