Monday, November 14, 2011

VL2: A Scalable and Flexible Data Center Network and PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric and c-Through: Part-time Optics in Data Centers

VL2 is a datacenter network designed to be cost effective and agile, improving utilization and performance of the cluster.  Current datacenter networks usually use high-cost hardware in a tree-like structure and cannot provide the agility required by today's services.  The high costs limit the available capacity between any two nodes, the network does little to isolate traffic floods, and the fragmented address space is a nightmare for configuration.  VL2 addresses these problems by providing uniform high capacity, performance isolation, and layer-2 semantics.  It uses low-cost switch ASICs in a Clos topology and Valiant Load Balancing for decentralized coordination, and it chooses paths per flow rather than per packet for better TCP performance.  Measurements of datacenter traffic patterns show that traffic is highly variable and that the hierarchical topology is inherently unreliable near the top.  VL2 scales out with many low-cost switches instead of scaling up to expensive hardware, which provides high aggregate and bisection bandwidth in the network.  In an all-to-all data shuffle, VL2 achieved 94% efficiency, and a conventional hierarchical topology would have taken 11 times longer.  Experiments also showed that VL2 provides fairness and performance isolation.
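The per-flow path choice can be sketched roughly as follows: hash a flow's 5-tuple to pick one intermediate switch in the Clos topology, so every packet of a TCP flow takes the same path while different flows spread across the fabric. This is a minimal illustration, not VL2's actual implementation; the switch names and the hash function are assumptions.

```python
import hashlib

# Hypothetical list of Clos intermediate switches (names are made up).
INTERMEDIATE_SWITCHES = ["int-1", "int-2", "int-3", "int-4"]

def pick_intermediate(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Hash the flow 5-tuple so all packets of one flow share one path,
    approximating Valiant Load Balancing at flow granularity."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}".encode()
    digest = hashlib.sha256(key).digest()
    idx = int.from_bytes(digest[:4], "big") % len(INTERMEDIATE_SWITCHES)
    return INTERMEDIATE_SWITCHES[idx]
```

Because the choice is deterministic per flow, TCP never sees reordering across paths, while the hash spreads many flows roughly uniformly over the intermediate switches.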

PortLand is a scalable, easy-to-maintain, fault-tolerant layer-2 network fabric for datacenters.  The key insight is that the baseline topology and the growth model of the network are known in advance, so they can be leveraged.  Current network protocols incur lots of overhead to route and manage the hundreds of thousands of machines in a datacenter.  PortLand uses a centralized fabric manager that holds soft state about the topology, and pseudo MAC (PMAC) addresses that separate host location from host identity and reduce forwarding table sizes.  The fabric manager reduces broadcast overhead in the common case because it can answer ARP requests directly from its PMAC mappings.  Switches run a distributed location discovery protocol to determine where in the topology they sit, which enables efficient routing with no manual configuration.  PortLand keeps routing loop-free with a simple up/down protocol: packets are routed up through aggregation switches, then down to the destination node.  Faults are detected during location discovery and reported to the fabric manager, which then notifies the rest of the network; this takes only O(n) messages instead of O(n^2) to resolve a failure.  Experiments show that PortLand is scalable, fault-tolerant, and easy to manage.
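The PMAC idea can be illustrated with the paper's 48-bit layout, pod.position.port.vmid (16+8+8+16 bits): the address encodes where a host sits in the fat tree, so switches can forward on prefixes instead of flat MAC entries. A minimal sketch of the packing and unpacking:

```python
def encode_pmac(pod, position, port, vmid):
    """Pack location fields into a 48-bit PMAC: pod(16).position(8).port(8).vmid(16).
    The address itself tells switches where in the topology the host lives."""
    assert pod < (1 << 16) and position < (1 << 8)
    assert port < (1 << 8) and vmid < (1 << 16)
    return (pod << 32) | (position << 24) | (port << 16) | vmid

def decode_pmac(pmac):
    """Recover (pod, position, port, vmid) from a 48-bit PMAC."""
    return (pmac >> 32, (pmac >> 24) & 0xFF, (pmac >> 16) & 0xFF, pmac & 0xFFFF)
```

Edge switches rewrite between a host's actual MAC and its PMAC, so hosts are unmodified while the fabric routes on the hierarchical address.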

c-Through is a new system that augments traditional packet switching with optical circuit switching in a hybrid network.  Traditional hierarchical topologies have cost and capacity limitations, and fat tree topologies require many more switches and links, which makes wiring harder to manage.  Optical fibers can support far greater capacity, but circuit switching requires coarser-grained flows instead of per-packet switching.  HyPaC is a hybrid architecture that keeps a traditional packet-switched network but adds a second high-speed rack-to-rack optical circuit-switched network.  The optical network reconfigures much more slowly, so the high bandwidth between two racks is provided on demand.  c-Through implements the HyPaC architecture, using enlarged buffers both to estimate rack-to-rack traffic demand and to keep the optical links fully utilized, and it uses those estimates to compute a perfect matching between racks.  This can be done within a few hundred milliseconds for 1000 racks.  The HyPaC architecture is emulated, and experiments show that the emulation behaves as expected and that the large buffers do not cause large packet delays.  Several experiments on over-subscribed networks showed that c-Through performed similarly to a network with full bisection bandwidth.
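The matching step can be sketched in miniature: given an estimated demand matrix, pair up racks so the total traffic carried by the optical circuits is maximized. This toy version brute-forces all pairings, which only works for a handful of racks; at real scale a proper maximum-weight matching algorithm is needed, and the demand numbers here are made up.

```python
def all_pairings(racks):
    """Yield every way to pair up an even-length list of racks."""
    if not racks:
        yield []
        return
    first, rest = racks[0], racks[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in all_pairings(remaining):
            yield [(first, partner)] + sub

def best_matching(demand):
    """Pick the rack pairing that carries the most estimated traffic.
    demand[i][j] is the estimated queued bytes from rack i to rack j;
    circuits are bidirectional, so both directions count."""
    racks = list(range(len(demand)))
    def weight(pairing):
        return sum(demand[a][b] + demand[b][a] for a, b in pairing)
    return max(all_pairings(racks), key=weight)
```

With four racks where 0↔2 and 1↔3 exchange heavy traffic, the best pairing connects exactly those rack pairs, and the packet-switched network carries the leftover light traffic.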

Future networks will move towards lots of lower-cost network hardware, just as clusters have moved towards many low-cost commodity machines.  The challenge, however, will be providing large capacity out of many commodity switches.  Multi-rooted topologies like VL2's Clos network will be popular since they use lots of commodity switches.  Large companies like Facebook, Google, and Microsoft can easily purchase switches and cabling in bulk for their datacenters, making them very cost effective, so the extra hardware and wiring will not be a huge problem.  Easier maintenance will be crucial for the network to be sustainable, so techniques similar to those in PortLand will be required.  Having easy mechanisms for fault tolerance, fairness, and isolation will be important; a resource manager like Mesos will need such mechanisms to allocate network resources to applications.  I don't think special optical switches will make it into large datacenters, because they would add the burden of managing new kinds of hardware; instead, more switches and wires could make it into the cluster.
