Improving End-to-End Availability Using Overlay Networks

David G. Andersen
Massachusetts Institute of Technology, Cambridge, MA,

The end-to-end availability of Internet services is between two and three orders of magnitude worse than other important engineered systems, including the US airline system, the 911 emergency response system, and the US public telephone system. This dissertation explores three systems designed to mask Internet failures, and, through a study of three years of data collected on a 31-site testbed, why these failures happen and how effectively they can be masked.

A core aspect of many of the failures that interrupt end-to-end communication is that they fall outside the expected domain of well-behaved network failures. Many traditional techniques cope with link and router failures; as a result, the remaining failures are those caused by software and hardware bugs, misconfiguration, malice, or the inability of current routing systems to cope with persistent congestion. The effects of these failures are exacerbated because Internet services depend upon the proper functioning of many components---wide-area routing, access links, the domain name system, and the servers themselves---and a failure in any of them can prove disastrous to the proper functioning of the service.

This dissertation describes three complementary systems to increase Internet availability in the face of such failures. Each system builds upon the idea of an overlay network, a network created dynamically between a group of cooperating Internet hosts. The first two systems, Resilient Overlay Networks (RON) and Multi-homed Overlay Networks (MONET) determine whether the Internet path between two hosts is working on an end-to-end basis. Both systems exploit the considera ble redundancy available in the underlying Internet to find failure-disjoint paths between nodes, and forward traffic along a working path. RON is able to avoid 50\% of the Internet outages that interrupt communication between a small group of communicating nodes. MONET is more aggressive, combining an overlay network of Web proxies with expli citly engineered redundant links to the Internet to also mask client access link failures. Eighteen months of measurements from a six-site deployment of MONET show that it increases a client's ability to access working Web sites by nearly an order of magnitude.

Where RON and MONET combat accidental failures, the Mayday system guards against denial-of-service attacks by surrounding a vulnerable Internet server with a ring of filtering routers. Mayday then uses a set of overlay nodes to act as mediators between the service and its clients, permitting only properly authenticated traffic to reach the server.

[PDF (2212KB)] [PostScript (5406KB)] [Gzipped PostScript (1410KB)]