Microsoft’s Albert Greenberg, Distinguished Engineer, Networking Development, has been chosen to receive the 2015 ACM SIGCOMM Award for pioneering the theory and practice of operating carrier and data center networks. ACM SIGCOMM’s highest honor, the award recognizes lifetime achievement and contributions to the field of computer networking. It is awarded annually to a person whose work, over the course of his or her career, represents a significant contribution and a substantial influence on the work and perceptions of others in the field. Albert will accept the award at the 2015 ACM SIGCOMM conference in London, UK, where he also delivered the conference keynote address. Below, Albert shares more about Microsoft’s innovative approach to networking and what’s going on at SIGCOMM.
Microsoft Showcases Software Defined Networking Innovation at SIGCOMM
This week I had the privilege of delivering the keynote address at ACM SIGCOMM, one of our industry’s premier networking events. My colleagues and I are onsite to talk about and demonstrate some of Microsoft’s latest research and innovations, and to share more about how we leverage software defined networking (SDN) to power Microsoft Azure.
Microsoft Bets Big on SDN
To meet growing cloud demand for Azure, we have invested over $15 billion in building our global cloud infrastructure since we opened our first datacenter in 1989. Our datacenters hold over a million physical servers, and it is unthinkable to run infrastructure at this scale using the legacy designs the industry produced before the cloud revolution. In my keynote, I discussed how we applied the principles of virtualized, scale-out, partitioned cloud design and central control to everything: to the Azure compute plane implementation, to cloud compute and storage, and, of course, to networking. Given the scale we had to build to and the need to create software defined datacenters for millions of customers, we had to change everything in networking, and so we did: from optics to servers to NICs to datacenter networks to the WAN to the Edge/CDN to the last mile.
It is always a pleasure to speak at SIGCOMM, since key ideas of hyperscale SDN were put forward in the VL2 (Virtual Layer 2) paper at SIGCOMM 2009: (a) build a massive, uniform, high-bandwidth Clos network to provide the physical datacenter fabric, and (b) build for each and every customer a virtual network through software running on every host. Together, these two ideas enabled SDN and Network Function Virtualization (NFV), realized through iterative development by amazing teams of talented Microsoft Azure engineers. In particular, over the last ten years we have revised the physical network design every six months, constantly improving scale and reliability. By virtualizing the network on the host, we ship new network virtualization capabilities weekly, updating the capabilities of services such as Azure ExpressRoute.
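To make the host-based half of that design concrete, here is a minimal C sketch of the kind of per-tenant address mapping VL2 describes. It is illustrative only, not Azure code: the table, addresses, and field names are made up. Each tenant VM uses a customer address, and software on the host resolves it to the physical address of the destination server before sending the packet across the Clos fabric.

```c
/* Illustrative sketch only -- not Azure source. Tenant VMs use customer
 * addresses (CAs), and software on each host maps a (vnet, CA) pair to the
 * physical address (PA) of the hosting server before encapsulating the
 * packet onto the Clos fabric. */
#include <stdint.h>
#include <stdio.h>

struct mapping {
    uint32_t vnet_id;   /* which tenant virtual network */
    uint32_t ca;        /* customer address the VM uses */
    uint32_t pa;        /* physical address of the host that runs the VM */
};

/* A toy directory; in practice this state is pushed by a controller. */
static struct mapping directory[] = {
    { 1, 0x0A000004, 0xC0A80A02 },  /* vnet 1: 10.0.0.4 -> host 192.168.10.2 */
    { 2, 0x0A000004, 0xC0A80B07 },  /* vnet 2 reuses 10.0.0.4 on another host */
};

/* Resolve the physical host for a tenant destination; 0 means "unknown". */
static uint32_t lookup_pa(uint32_t vnet_id, uint32_t ca)
{
    for (size_t i = 0; i < sizeof(directory) / sizeof(directory[0]); i++)
        if (directory[i].vnet_id == vnet_id && directory[i].ca == ca)
            return directory[i].pa;
    return 0;
}

int main(void)
{
    /* Two tenants can use the same 10.0.0.4 without conflict. */
    printf("vnet 1 -> host %08X\n", (unsigned)lookup_pa(1, 0x0A000004));
    printf("vnet 2 -> host %08X\n", (unsigned)lookup_pa(2, 0x0A000004));
    return 0;
}
```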
First, in the keynote, I talked about the challenges of managing massive-scale Clos networks built with commodity components (in optics, merchant silicon, and switches) to achieve 100X improvements in capex and opex compared to prior art. Indeed, we now think back on the prior art of datacenter networking as ‘snowflakes’: closed, scale-up designs lacking real vendor interoperability, each specialized and fragile, requiring frequent human intervention to manage. In contrast, our cloud scale physical networks are managed via a simple, common platform (the network state service), which abstracts away the complexity of individual networks and allows us to build applications for infrastructure deployment, fault management, and traffic engineering as loosely coupled applications on the platform.
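The sketch below illustrates what “loosely coupled applications on a common platform” means in code. The interface and names are hypothetical stand-ins written for this post, not the actual network state service API: the fault-management logic sees only abstract link state and a drain operation, never device-specific details.

```c
/* Hypothetical sketch of the "platform + loosely coupled apps" idea. */
#include <stdbool.h>
#include <stdio.h>

/* Abstract view of one link as exposed by the platform. */
struct link_state {
    const char *id;      /* opaque identifier, e.g. "t1-7:port12" */
    bool        healthy; /* rolled-up health from monitoring */
};

/* The interface an application programs against; the platform hides
 * device- and vendor-specific details behind it. */
struct network_state_service {
    int  (*list_links)(struct link_state *out, int max);
    void (*drain_link)(const char *link_id);   /* steer traffic away */
};

/* A fault-management "app": pure policy, no device knowledge. */
static void drain_unhealthy_links(struct network_state_service *nss)
{
    struct link_state links[64];
    int n = nss->list_links(links, 64);
    for (int i = 0; i < n; i++)
        if (!links[i].healthy)
            nss->drain_link(links[i].id);
}

/* A toy in-memory implementation standing in for the real platform. */
static int mock_list(struct link_state *out, int max)
{
    if (max < 2) return 0;
    out[0] = (struct link_state){ "t1-7:port12", true  };
    out[1] = (struct link_state){ "t2-3:port4",  false };
    return 2;
}
static void mock_drain(const char *link_id) { printf("draining %s\n", link_id); }

int main(void)
{
    struct network_state_service nss = { mock_list, mock_drain };
    drain_unhealthy_links(&nss);
    return 0;
}
```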
I talked about the Switch Abstraction Interface (SAI) and the Azure Cloud Switch (Microsoft’s own switching software that runs on top of the Switch Abstraction Interface) inside the switch. At Microsoft, we are big believers in the power of open technology, and networking is no exception. The SAI is the first open-standard C API for programming network switching ASICs. With ASIC vendors innovating ferociously fast, the formerly strict coupling of switch hardware to protocol stack software prevented us from choosing the best combination of hardware and software to build networks, because we couldn’t port our software fast enough. SAI enables software to program multiple switch chips without any changes, making the base router platform simple and consistent. Tomorrow, we will give a SAI and Azure Cloud Switch demonstration with many industry collaborators.
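To show the shape of the idea, here is a highly simplified sketch of a vendor-neutral switch-programming interface. The types and functions below are stand-ins written for this post, not the actual sai.h definitions: the routing stack programs forwarding state through a function table, and each ASIC vendor supplies its own implementation behind that table.

```c
/* Simplified illustration of the idea behind SAI; not the real header. */
#include <stdint.h>
#include <stdio.h>

/* Vendor-neutral view of a route entry. */
struct route_entry {
    uint32_t prefix;
    uint8_t  prefix_len;
    uint32_t next_hop;
};

/* The abstraction: a table of function pointers the stack calls. */
struct switch_api {
    int (*create_route)(const struct route_entry *r);
    int (*remove_route)(const struct route_entry *r);
};

/* One vendor's implementation (here just a stub that logs). */
static int asic_a_create_route(const struct route_entry *r)
{
    printf("ASIC A: program %08X/%u -> %08X\n",
           (unsigned)r->prefix, (unsigned)r->prefix_len, (unsigned)r->next_hop);
    return 0;
}
static int asic_a_remove_route(const struct route_entry *r)
{
    printf("ASIC A: delete %08X/%u\n", (unsigned)r->prefix, (unsigned)r->prefix_len);
    return 0;
}
static struct switch_api asic_a = { asic_a_create_route, asic_a_remove_route };

int main(void)
{
    /* The routing stack is written once against switch_api; swapping in a
     * different ASIC means swapping the table, not rewriting the stack. */
    struct route_entry r = { 0x0A140000, 16, 0xC0A80001 };
    asic_a.create_route(&r);
    return 0;
}
```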
Managing massive-scale Clos networks has called for new innovations in management, monitoring, and analytics. We took on those challenges by using technologies developed for cloud scale, leveraging the same big data and monitoring capabilities that Azure makes available to our customers. At cloud scale, component failures will happen, and Azure is built to tolerate that as we scale out across numerous components. Our systems detect, pinpoint, isolate, and route around the faulty components. At SIGCOMM this year, we are presenting two such technologies, PingMesh and EverFlow, which are used every day to manage the Azure network.
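As a rough illustration of the PingMesh idea, the sketch below times a TCP connect from one server to a peer. It is not the real agent (peer lists, probe scheduling, and the analytics pipeline are all omitted), but it shows the basic measurement that, aggregated across every server pair, turns latency into a powerful fault-localization signal.

```c
/* Minimal sketch of a latency probe in the spirit of PingMesh (Linux). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Time one TCP connect to peer_ip:port; returns microseconds or -1. */
static long probe_us(const char *peer_ip, uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, peer_ip, &addr.sin_addr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    if (rc != 0) return -1;

    return (t1.tv_sec - t0.tv_sec) * 1000000L +
           (t1.tv_nsec - t0.tv_nsec) / 1000L;
}

int main(void)
{
    /* In a real deployment the peer list comes from a central service and
     * samples are shipped to a big-data pipeline for per-pair analysis. */
    const char *peers[] = { "127.0.0.1" };
    for (size_t i = 0; i < sizeof(peers) / sizeof(peers[0]); i++)
        printf("%s rtt_us=%ld\n", peers[i], probe_us(peers[i], 80));
    return 0;
}
```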
In the second part of the talk, I focused on network virtualization, which allows customers to create, on shared cloud infrastructure, the full range of networking capabilities that are available on private, dedicated infrastructure. Through virtual networks (VNets), customers can seamlessly extend their enterprises to the cloud, which allows them to protect existing investments while moving to the cloud at their own pace. To make VNets work at high levels of reliability, we needed to develop two technologies: (a) scalable controllers capable of managing 500,000 servers per regional datacenter, and (b) fast packet processing technologies on every host that light up the functionality through controller APIs. All of this is stitched together through the same principles of cloud design applied to our physical network, as well as to our compute and storage services.
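The host-side half of this is easiest to see as an encapsulation step. The sketch below is illustrative only; the header layout is invented for clarity and differs from the encapsulation formats Azure actually uses. The point is that the tenant’s packet travels host-to-host across the physical fabric, carrying the VNet identity in an outer header.

```c
/* Illustrative host-side encapsulation; field layout is made up. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct outer_hdr {
    uint32_t src_pa;   /* physical address of the sending host */
    uint32_t dst_pa;   /* physical address of the receiving host */
    uint32_t vnet_id;  /* identifies the tenant virtual network */
};

/* Prepend the outer header to the tenant frame; returns new length. */
static size_t encapsulate(uint8_t *out, size_t out_cap,
                          const struct outer_hdr *hdr,
                          const uint8_t *inner, size_t inner_len)
{
    if (sizeof(*hdr) + inner_len > out_cap) return 0;
    memcpy(out, hdr, sizeof(*hdr));
    memcpy(out + sizeof(*hdr), inner, inner_len);
    return sizeof(*hdr) + inner_len;
}

int main(void)
{
    uint8_t inner[64] = { 0 };           /* pretend tenant packet */
    uint8_t wire[128];
    struct outer_hdr hdr = { 0xC0A80A01, 0xC0A80A02, 42 };
    size_t n = encapsulate(wire, sizeof(wire), &hdr, inner, sizeof(inner));
    printf("encapsulated %zu bytes for vnet %u\n", n, (unsigned)hdr.vnet_id);
    return 0;
}
```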
Again, we leverage cloud technologies to build the Azure SDN. In particular, Azure Service Fabric provides the micro-service platform on which we have built our Virtual Networking SDN controller. Service Fabric takes care of scale-up, scale-down, load balancing, fault management, leader election, key-value stores, and more, so that our controllers can focus on the key virtual functions needed to light up networking features on demand and at huge scale.
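Service Fabric’s real programming model is a managed-code API, so the C fragment below is only a conceptual sketch of one idea it gives us for free: each VNet maps deterministically to a controller partition, and only the current primary replica of that partition acts on its state. The names and numbers are illustrative.

```c
/* Conceptual sketch of a partitioned controller; not Service Fabric code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_PARTITIONS 8

/* Deterministic VNet -> partition mapping. */
static unsigned partition_of(uint32_t vnet_id)
{
    return vnet_id % NUM_PARTITIONS;   /* a real system would use a stable hash */
}

/* In a real deployment this answer comes from the platform's leader election. */
static bool i_am_primary_for(unsigned partition)
{
    return partition == 3;             /* pretend this node owns partition 3 */
}

int main(void)
{
    uint32_t vnet_id = 1234567;
    unsigned p = partition_of(vnet_id);
    if (i_am_primary_for(p))
        printf("handling updates for vnet %u in partition %u\n", vnet_id, p);
    else
        printf("forwarding vnet %u to the primary of partition %u\n", vnet_id, p);
    return 0;
}
```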
ExpressRoute, where we essentially create a datacenter-scale router through virtualization and networking capabilities on every host, enables customers that have an ISP or IXP partner to attach Azure to their enterprises immediately, for both their VNets and Azure’s native compute and storage services. It’s been a little over a year since we announced ExpressRoute, and in that time adoption has been phenomenal, with new ISP and IXP partners onboarding at an amazing pace.
This gives me the opportunity to talk about our Virtual Filtering Platform (VFP). With VFP, we have developed the SDN extensions that run on every host in the datacenter. VFP provides a simple set of networking APIs and abstractions that allow Microsoft to introduce new networking features in an agile and efficient way, by chaining typed match-action tables. VFP is fast and simple because it focuses on packet transformations and forwarding; all service semantics are removed from the host and located in the SDN controller.
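A minimal sketch of the match-action idea follows; it is written for this post and is not VFP code. Each table holds typed (match, action) rules, the first matching rule in a table transforms the packet, and tables are chained in a fixed order, here a NAT layer followed by an ACL layer.

```c
/* Illustrative chained match-action tables; not VFP source. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct pkt {
    uint32_t dst_ip;
    uint16_t dst_port;
};

typedef bool (*match_fn)(const struct pkt *);
typedef void (*action_fn)(struct pkt *);

struct rule  { match_fn match; action_fn action; };
struct table { const char *name; struct rule *rules; int n; };

/* Example NAT table: rewrite a public VIP to an internal address. */
static bool match_vip(const struct pkt *p)  { return p->dst_ip == 0x08080808; }
static void act_dnat(struct pkt *p)         { p->dst_ip = 0x0A000004; }
static struct rule nat_rules[] = { { match_vip, act_dnat } };

/* Example ACL table: drop anything that is not port 443. */
static bool match_not_443(const struct pkt *p) { return p->dst_port != 443; }
static void act_drop(struct pkt *p)            { p->dst_ip = 0; /* mark dropped */ }
static struct rule acl_rules[] = { { match_not_443, act_drop } };

static void process(struct pkt *p, struct table *tables, int ntables)
{
    for (int t = 0; t < ntables; t++)
        for (int r = 0; r < tables[t].n; r++)
            if (tables[t].rules[r].match(p)) {
                tables[t].rules[r].action(p);
                break;                    /* first match wins per table */
            }
}

int main(void)
{
    struct table chain[] = {
        { "nat", nat_rules, 1 },
        { "acl", acl_rules, 1 },
    };
    struct pkt p = { 0x08080808, 443 };
    process(&p, chain, 2);
    printf("final dst_ip=%08X dst_port=%u\n", (unsigned)p.dst_ip, (unsigned)p.dst_port);
    return 0;
}
```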
That said, there are limits to what can be done purely in software with tight controls on cost and low latency. As a result, we introduced new super-low-latency RDMA technologies and a new congestion control protocol, DCQCN, that we run in Azure’s NICs and that is also being presented at this year’s SIGCOMM. I showcased some performance measurements taken for Bing, showing that we dramatically improve latency, going from 700us to 90us at the 99th percentile. That brings me to the question of how we can leverage hardware to offload the VFP technologies and get even better performance as networking continues on its journey to ever greater speed, richer feature sets, and support for larger numbers of VMs and containers.
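Before turning to the hardware answer, here is a simplified sketch of the DCQCN-style rate reaction described in the SIGCOMM paper; the constants and units are illustrative, and the real logic runs in the NIC. On each congestion notification the sender cuts its rate in proportion to an estimate of congestion severity, and it recovers toward the previous target when notifications stop.

```c
/* Simplified, illustrative DCQCN-style sender reaction; not the NIC code. */
#include <stdio.h>

struct dcqcn_state {
    double rc;     /* current sending rate (Gbps) */
    double rt;     /* target rate used during recovery (Gbps) */
    double alpha;  /* estimate of congestion severity, in [0, 1] */
};

static const double G = 1.0 / 256.0;    /* gain used to update alpha (illustrative) */

/* Called when a congestion notification (CNP) arrives. */
static void on_cnp(struct dcqcn_state *s)
{
    s->rt = s->rc;
    s->rc = s->rc * (1.0 - s->alpha / 2.0);  /* cut in proportion to alpha */
    s->alpha = (1.0 - G) * s->alpha + G;
}

/* Called each timer period with no CNP: decay alpha, recover toward rt. */
static void on_quiet_period(struct dcqcn_state *s)
{
    s->alpha = (1.0 - G) * s->alpha;
    s->rc = (s->rc + s->rt) / 2.0;           /* simplified recovery step */
}

int main(void)
{
    struct dcqcn_state s = { 40.0, 40.0, 1.0 };
    on_cnp(&s);
    printf("after CNP: rc=%.2f Gbps alpha=%.3f\n", s.rc, s.alpha);
    on_quiet_period(&s);
    printf("after quiet period: rc=%.2f Gbps alpha=%.3f\n", s.rc, s.alpha);
    return 0;
}
```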
Azure SmartNIC meets these challenges. Our SmartNIC incorporates Field Programmable Gate Arrays (FPGAs), which enable reconfigurable network hardware. Through FPGAs, we can create new hardware capabilities at the speed of software. No one knows what SDN capabilities will be needed a year from now. Our FPGA-based SmartNIC allows us to reprogram the hardware to meet new needs as they appear: reprogramming, not redeploying, hardware. To make the enormous potential clear to the audience, I demonstrated an encrypted VNet, providing strong security for all communications within a VNet.
I will update this post soon with a link to my keynote and slides, so keep an eye on this space.
Celebrating Networking Innovation
What I am most excited about at SIGCOMM is seeing many of my colleagues recognized for innovative, ground-breaking research. Microsoft has a proud history of publishing early and working with the industry to deliver innovation. This year marks the ten-year anniversary of the paper “A Clean Slate 4D Approach to Network Control and Management,” which I wrote with Gisli Hjalmtysson, David A. Maltz, Andy Myers, Jennifer Rexford, Geoffrey Xie, Hong Yan, Jibin Zhan, and Hui Zhang. This paper was selected by SIGCOMM for this year’s “Test of Time Award.”
In the paper, we proposed the key design principles of centralized control: programming the network to meet network-wide objectives, based on network-wide views. This research gave rise to SDN and NFV. It also illustrates that the best way to have impact is to imagine the future and then work on the engineering and products to make it happen. We provided the scenario, the team, and the systems and tools to turn the dream of the 4D approach into the SDN and NFV reality for Azure.
This year’s SIGCOMM is no different, with many publications from our Microsoft Research colleagues and collaborators in Academia being presented, providing insightful measurements, and opening up new challenges and promising innovations. I hope you’ll check out these papers and let us know what you think in the comments.
- Pingmesh: A Large-Scale System for Data Center Network Latency Measurement and Analysis
- Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can
- Low Latency Geo-distributed Data Analytics
- Silo: Predictable Message Latency in the Cloud
- Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale
- Packet-Level Telemetry in Large Datacenter Networks
- Enabling End-Host Network Functions
- Congestion Control for Large-Scale RDMA Deployments
- R2C2: A Network Stack for Rack-scale Computers
- Programming Protocol-Independent Packet Processors