[Japanese | Thesis | Researches in Minoh Lab | Minoh Lab]
This paper presents an algorithm of learning next hop for file discovery to reduce network load in Peer-To-Peer network in which each computer communicate with each other on an equivalent basis, and verifies effectiveness of the algorithm by simulation experiments.
With increase of computers connected to the network, users want to access files in the network more. Client-server system has been commonly used to discover and utilize files via the network. Computers in a system on the network are classified into two kinds; a server to store, search and provide files, and clients to utilize them. In this system the server is usually under heavy load, while with the advancement of technology the clients have surplus performance of computing, storage, network bandwidth and so on.
Two types of file discovery protocol are suggested to deterilize its surplus resources. One is that clients store and provide files and a server only searches files to find the location with its index of files in clients. the other is that without any server, all computers connected to a network have the same functionality and collaborate one another to store, discover and provide files.
This paper puts focuses on the latter type for the following three reasons. First, it is easy to manage the system. Second, it has so high anonymity that it is impossible to keep watch and collect logs on the whole network. Third, no single node fauilure stops the whole functionality of the system.
Gnutella is one of the well-known file discovery protocols among the latter network protocols. In the Gnutella network, each node collaborates one another for the file discovery with a Query -- a packet for the file discovery.
The file discovery mechanism of Gnutella is as follows. When users request a file a Query is generated on the node. The node forwards the Query to all of its neighboring nodes -- the nodes it is connected to. The nodes that receive the Query forward copies of all of its neighboring nodes . This is broadcasting.
The nodes that have the requested file respond the Query by a QueryHit -- a packet to indicate where and what file is found. The nodes forward the QueryHit back by way of a reverse path where the corresponding Query was forwarded. The node that generates the Query gets QueryHits and gets the requested file.
The broadcasting method increases the number of Query packets exponentially per one hop, which causes many problems. First, computing resources for forwarding the Query and QueryHits may disturb applications users run at the same time. Second, it takes longer time to get QueryHits for the node that generates a corresponding Query. Third, network load increases.
This paper suggests a mechanism of select neighboring nodes not "a node forwards copies of a Query to all the neighboring nodes", but "a node forwards copies of a Query to the limited neighboring nodes to get QueryHit more and faster". It is expected that limiting the number of nodes to forward Queries avoids the exponential increase of the number of packets.
To select a better neighboring node for the next hop, each node assignes probability is assigned to every neighboring node. The probabilities of neighboring nodes gets lower where a few QueryHits get back slowly, and the probabilities of nodes gets higher where many QueryHits get back early.
To refine the probabilities of forwarding on a node, the author proposes a reinforcement learning algorithm based on AntNet which shows excellent performance in IP routing.
This algorithm uses turn around time of QueryHit to determine how good the neighboring node is. The nodes refine the probability of forwarding according to information of how fast the QueryHit gets back on all nodes where a QueryHit is forwarded back on the path that the corresponding Query is forwarded. A node selects neighboring nodes to forward a Query according to the probability.
Simulation experiments show that our method can discover the six times more number of nodes that provide a target file with same network load. Under a situation that nodes generates Queries very frequently, the number of packets in queue increased rapidly in broadcasting but the number is more stable in the proposed method. Also,the proposed method can discover files more quickly than the broadcasting method.