Server-less VoIP and Instant Messaging


Voice over IP (VoIP) and Instant Messaging (IM) are very popular communication services for individuals and corporations. A VoIP/IM system needs to provide the following capabilities: resource identification, session establishment, and presence indication. Resource identification identifies and locates other users who are connected to the system so that conversations can be established. Session establishment and management allow a user to initiate a session (e.g. an audio call or a text message) with another user and manage that connection through to the end of the session. Presence indication refers to the ability to be notified when other users have arrived at or left the system. For example, a user may have a list of people with whom he wants to communicate (so-called "buddies" or "friends"). Presence indication allows the user to see whether a buddy has entered the system (e.g. by highlighting the buddy in the GUI of the software).

Most existing VoIP systems (such as Skype) and IM systems (such as MSN Messenger and Yahoo Messenger) are highly centralized. Users first have to connect to a central server. The server is responsible for registering and authenticating users, identifying the location of users within the system, and routing signaling traffic between users. These servers are typically hosted at the provider's site or operated by a third party.

Such a centralized approach could potentially limit the growth of VoIP/IM systems. First, many companies ban the use of VoIP/IM systems for security reasons: because private conversations flow through a third-party location, they worry that confidential corporate information might be stolen. Second, individual users who wish to communicate with other VoIP users are required to connect to the public Internet first, even if they just want to talk to someone next door. Finally, the clients of different systems cannot communicate with each other (e.g. an MSN user cannot talk to a Skype user), since each service provider typically implements its servers differently using proprietary protocols. Hence, freeing VoIP/IM from the need for a central server is important for making these technologies pervasive.

In this project, we plan to develop a fully decentralized, standards-based VoIP and IM system to address the above issues. We propose to combine standards-based VoIP and IM technologies with the self-organizing properties of existing P2P systems to provide a standards-driven solution that allows us to leverage existing applications and protocols. Specifically, we plan to combine the standard Session Initiation Protocol (SIP) with open-source Distributed Hash Table (DHT) protocols to reach this objective.

Related Work

SIP is a text-based protocol, similar to HTTP, for establishing and controlling multimedia sessions, and it is widely used for VoIP. Two SIP devices can be configured to communicate directly with each other. However, a central server (or proxy) is required when there are more than two SIP users. SIP messages are sent from the caller to the server, which is responsible for locating the callee. The messages are then passed from the server to the callee, possibly through additional intermediary proxies. The SIP messages flowing through the central servers are only responsible for setting up a VoIP session. The actual voice packets travel directly between the end users, usually using the Real-time Transport Protocol (RTP).

Distributed hash tables (DHTs) are a class of decentralized distributed systems that provide a lookup service similar to a hash table: (name, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given name. Responsibility for maintaining the mapping from names to values is distributed among the nodes in such a way that a change in the set of participants causes a minimal amount of disruption. This allows DHTs to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
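The DHT behaviour described above can be illustrated with a minimal single-process sketch. It assumes a consistent-hashing ring in which each name is stored on the first node whose identifier follows the name's hash; real DHT protocols add network routing and replication, which are omitted here. The class name `ToyDHT`, the node names, and the stored contact addresses are all hypothetical and chosen only for illustration.

```python
import bisect
import hashlib


def key_id(name: str) -> int:
    """Map a name onto the identifier ring using SHA-1 (a 160-bit space)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class ToyDHT:
    """Single-process stand-in for a DHT (illustrative only, no networking).

    Each (name, value) pair lives on the first node whose ID is greater than
    or equal to the name's ID, wrapping around the ring.
    """

    def __init__(self, node_names):
        self.ring = sorted((key_id(n), n) for n in node_names)
        self.store = {n: {} for n in node_names}

    def owner(self, name: str) -> str:
        """Return the node responsible for a name."""
        ids = [node_id for node_id, _ in self.ring]
        idx = bisect.bisect_left(ids, key_id(name)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, name: str, value: str) -> None:
        self.store[self.owner(name)][name] = value

    def get(self, name: str):
        return self.store[self.owner(name)].get(name)

    def add_node(self, node_name: str) -> int:
        """Join a node and return how many pairs had to move to it."""
        bisect.insort(self.ring, (key_id(node_name), node_name))
        self.store[node_name] = {}
        moved = 0
        for node, table in self.store.items():
            if node == node_name:
                continue
            # Only names that now hash to the new node change owner.
            for name in [k for k in table if self.owner(k) == node_name]:
                self.store[node_name][name] = table.pop(name)
                moved += 1
        return moved


dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
dht.put("alice@example.com", "192.0.2.10:5060")
print(dht.get("alice@example.com"))
```

When a node joins, only the pairs that fall into its segment of the ring are transferred; every other pair stays where it was, which is the "minimal disruption" property mentioned above.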


In our architecture, the user first selects a unique identifier (such as an email address). When initiating a new call, the caller consults the overlay via the DHT to locate the callee and then connects to the callee directly. For backward compatibility, some of the nodes in the overlay can act as proxies that receive queries from existing SIP clients and translate those queries into search requests in the overlay. These special nodes then return a redirect message to the caller with the location of the callee. Therefore, our design requires little or no modification to existing SIP clients and is fully decentralized.
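The backward-compatibility path described above might look roughly like the following sketch. It assumes that the proxy node answers a legacy client's INVITE with a standard SIP 302 (Moved Temporarily) response carrying the callee's location in a Contact header, and with 404 (Not Found) when the lookup fails. The overlay is represented by a plain dictionary standing in for the DHT lookup, and the function names, addresses, and simplified header parsing (one value per header name) are illustrative assumptions rather than the project's actual implementation.

```python
def parse_invite(message: str):
    """Return (request_uri, headers) for a SIP INVITE; raise ValueError otherwise."""
    head = message.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or parts[0] != "INVITE" or parts[2] != "SIP/2.0":
        raise ValueError("not a SIP INVITE")
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return parts[1], headers


def handle_invite(message: str, overlay: dict) -> str:
    """Translate an INVITE into an overlay lookup and build a redirect response.

    `overlay` is a hypothetical stand-in for the DHT: identifier -> contact.
    """
    uri, headers = parse_invite(message)
    user = uri[len("sip:"):] if uri.startswith("sip:") else uri
    contact = overlay.get(user)
    status = "302 Moved Temporarily" if contact else "404 Not Found"
    out = [f"SIP/2.0 {status}"]
    # Echo the headers a client needs to match the response to its request.
    for name in ("Via", "From", "To", "Call-ID", "CSeq"):
        if name in headers:
            out.append(f"{name}: {headers[name]}")
    if contact:
        out.append(f"Contact: <sip:{contact}>")
    out.append("Content-Length: 0")
    return "\r\n".join(out) + "\r\n\r\n"


invite = "\r\n".join([
    "INVITE sip:bob@example.com SIP/2.0",
    "Via: SIP/2.0/UDP 192.0.2.10:5060",
    "From: <sip:alice@example.com>",
    "To: <sip:bob@example.com>",
    "Call-ID: 1234@192.0.2.10",
    "CSeq: 1 INVITE",
    "Content-Length: 0",
]) + "\r\n\r\n"

overlay = {"bob@example.com": "bob@192.0.2.20:5060"}
print(handle_invite(invite, overlay))
```

Because the proxy only redirects, the legacy client then contacts the callee itself, so the proxy node stays out of the rest of the signaling and never handles the voice traffic.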