Modeling tourist activity via probe-car data and spot similarity

The number of individual tourists, who freely decide which spots to visit, has been increasing in recent years. Conventionally, individual tourism has been surveyed and analyzed with the questionnaire method, in which tourists record their touristic activities from memory. However, tourists cannot remember all of their activities precisely, and collecting a large amount of data by this method is not realistic.
On the other hand, a large amount of GPS-based probe-car data has been accumulated. Probe-car data provides cars' locations, speeds, and so on. Because probe-car data is recorded automatically by car navigation systems, it has the advantage that highly objective data can be collected at low cost compared with the questionnaire method.
Thus, we aim to model tourist activities using probe-car data from users who travel by car for sightseeing. A tourist activity model helps in understanding which tourism spots tend to be visited by tourists. Furthermore, it can be used in a spot recommendation system by predicting the tourism spot that a user will visit next. We treat tourist activities as spot transitions and build a model that can predict the user's next transition.
In previous research on tourist activity modeling, tourist activities are modeled with a first-order Markov model (which we simply call the Markov model). The Markov model assumes that the next spot visited depends only on the current spot. However, the Markov model suffers from a sparsity problem: it is difficult to predict the next transition from a minor spot that only a few users visit. In this paper, we assume that users at similar spots tend to move to the same spot, and we address the sparsity problem of the Markov model using spot similarities.
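The first-order Markov baseline and its sparsity problem can be illustrated with a minimal sketch. The spot names and histories below are hypothetical toy data, and the function names `fit_markov` and `predict_next` are introduced here only for illustration.

```python
from collections import Counter, defaultdict


def fit_markov(sequences):
    """Count spot-to-spot transitions in a list of spot sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequent successor of `current`, or None if unseen."""
    successors = counts.get(current)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


# Hypothetical toy histories; the spot names are illustrative only.
histories = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "B", "C"],
    ["E", "C"],
]
counts = fit_markov(histories)

major = predict_next(counts, "B")   # backed by three observed transitions
minor = predict_next(counts, "E")   # backed by a single observed transition
unseen = predict_next(counts, "F")  # no observed transition at all
print(major, minor, unseen)
```

The prediction from spot "E" rests on one observation and the prediction from spot "F" is impossible, which is the situation the sparsity problem describes.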
In this paper, probe-car data consists of two kinds of data. One is the stop location data, which records when and where a car stops for more than 10 minutes. A user's stop location data ordered by time can be regarded as his/her location history for the tour. The other is the place of interest (POI) data, which records the name of the POI that a user set as his/her destination. We relate the stop locations to spots and calculate spot similarities using the POI data in order to model tourist activities. Details of the proposed method are as follows.
First, areas where many users stop are extracted as spots in order to model tourist activities as spot transitions. We cluster the stop location data with the mean shift procedure and define each resulting cluster as a spot. Once a set of spots has been extracted, each location history, which is a sequence of locations, is converted to a sequence of spots.
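The spot extraction step can be sketched as follows. This is a simplified flat-kernel mean shift written for illustration, not the thesis implementation. It assumes stop locations are already projected to planar coordinates in metres, and the `bandwidth` value, the merge threshold, and the toy stops are all hypothetical.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: shift each point to the mean of its
    neighbours within `bandwidth` until it stops moving."""
    modes = []
    for x, y in points:
        for _ in range(max_iter):
            near = [q for q in points if math.dist((x, y), q) <= bandwidth]
            nx = sum(q[0] for q in near) / len(near)
            ny = sum(q[1] for q in near) / len(near)
            shift = math.dist((x, y), (nx, ny))
            x, y = nx, ny
            if shift < tol:
                break
        modes.append((x, y))

    # Merge modes that converged to nearly the same location into one spot.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical stop locations (metres in a local planar frame).
stops = [(0, 0), (10, 5), (5, 10), (1000, 1000), (1010, 995), (995, 1005)]
centers, labels = mean_shift(stops, bandwidth=100)

# A location history given as indices into `stops` becomes a spot sequence.
history = [1, 5]
spot_sequence = [labels[i] for i in history]
print(centers, labels, spot_sequence)
```

With raw latitude and longitude, the Euclidean distance used here would have to be replaced by a geodesic distance or preceded by a map projection.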
Next, spot similarities are calculated from external information on the Web. Using external data acquired from the Web, we can (1) obtain fine-grained features of spots, (2) obtain features that reflect the impressions people generally have of the POIs, and (3) relate POIs to sightseeing areas in the real world.
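The abstract does not state which features or similarity measure are used. As one possible illustration only, the sketch below assumes that each spot is described by a bag of words taken from Web text about its POIs and compares spots by cosine similarity. The spot names and texts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical Web-derived descriptions of three spots.
spot_text = {
    "spot_1": "temple garden autumn leaves",
    "spot_2": "temple garden moss",
    "spot_3": "shopping mall parking",
}
features = {spot: Counter(text.split()) for spot, text in spot_text.items()}

sim_12 = cosine_similarity(features["spot_1"], features["spot_2"])
sim_13 = cosine_similarity(features["spot_1"], features["spot_3"])
print(sim_12, sim_13)
```

In this toy data the two temple spots share vocabulary and receive a higher similarity than the temple and the shopping mall, which share none.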
Finally, scores are calculated for the transitions from a user's current spot to each other spot. Our model predicts that the user will move to the spot with the highest score. The conventional Markov model uses only data on transitions from the current spot. In contrast, our model also uses data on transitions from other spots that are similar to the current spot. This enables us to predict the next spot properly even when the user is currently at a minor spot.
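The exact scoring formula is not given in the abstract. The sketch below shows one plausible form, stated here as an assumption: the score of a target spot is the sum, over all source spots, of the similarity between the current spot and the source spot multiplied by the observed transition probability from the source spot to the target. All names and values are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probs(sequences):
    """Estimate P(next | current) from observed spot sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        src: {dst: n / sum(c.values()) for dst, n in c.items()}
        for src, c in counts.items()
    }


def smoothed_scores(probs, similarity, current):
    """Assumed scoring rule: similarity-weighted sum of transition
    probabilities from every spot that resembles `current`."""
    scores = defaultdict(float)
    for source, successors in probs.items():
        if source == current:
            weight = 1.0
        else:
            weight = similarity.get(frozenset((current, source)), 0.0)
        if weight == 0.0:
            continue
        for target, p in successors.items():
            scores[target] += weight * p
    return dict(scores)


# Hypothetical data: "minor" is never observed as a departure spot.
histories = [["major", "X"], ["major", "X"], ["major", "Y"], ["Z", "minor"]]
similarity = {frozenset(("minor", "major")): 0.9}

probs = transition_probs(histories)
scores = smoothed_scores(probs, similarity, "minor")
best = max(scores, key=scores.get)
print(scores, best)
```

A plain Markov model has no transitions out of "minor" to draw on, while the similarity-weighted scores borrow the transitions observed from the similar spot "major".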
To examine the effectiveness of the proposed method, we conducted an experiment in which we evaluated the accuracy of transition prediction by our model. We used probe-car data from Kyoto in 2011 recorded by car navigation systems produced by Pioneer. We identified tourists among all the car navigation users by selecting users who live outside Kyoto Prefecture and who stopped in the Kyoto city area on at most 7 days. We applied the proposed method to 4 data sets that differ in the spot extraction parameters, and for all 4 data sets the prediction accuracy was higher than that of the Markov model. This result shows that our model describes tourist activities better than the conventional method.
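The tourist filter described above can be sketched as a simple predicate. The record layout, the field names, and the requirement of at least one stop day are assumptions made for this illustration, and the user records are hypothetical.

```python
def is_tourist(home_prefecture, kyoto_stop_dates, max_days=7):
    """Assumed filter: the user lives outside Kyoto Prefecture and stopped
    in the Kyoto city area on at least 1 and at most `max_days` distinct days."""
    distinct_days = set(kyoto_stop_dates)
    return home_prefecture != "Kyoto" and 1 <= len(distinct_days) <= max_days


# Hypothetical user records.
users = [
    {"id": "u1", "home": "Osaka", "kyoto_stop_dates": ["2011-04-02", "2011-04-03"]},
    {"id": "u2", "home": "Kyoto", "kyoto_stop_dates": ["2011-04-02"]},
    {"id": "u3", "home": "Tokyo",
     "kyoto_stop_dates": [f"2011-05-{d:02d}" for d in range(1, 11)]},
]

tourist_ids = [
    u["id"] for u in users if is_tourist(u["home"], u["kyoto_stop_dates"])
]
print(tourist_ids)
```

Here "u2" is excluded as a resident and "u3" is excluded for stopping on 10 distinct days, leaving only "u1".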