Introduction
Cloud networking is one of the areas where AWS has had to lead, delivering on the promises of cloud computing on top of traditional networking technologies. This has led to years of growth and innovation, illustrated later in this post with real-world examples and behind-the-scenes challenges.
In this article, we'll delve into the most important announcements from Dave Brown, VP of Amazon EC2 Networking and Compute Services, during his AWS re:Invent 2023 innovation talk in Las Vegas, and explore their potential impact on businesses, developers, and the overall cloud ecosystem.
In today’s world where Data, Generative AI, and Security dominate all news headlines and business objectives, is AWS’s global infrastructure ready to support this new trend? Let’s find out.
Why does networking matter?
Dave goes back to the year 1440 in Germany, when Johannes Gutenberg invented the movable-type printing press, a revolutionary invention that drastically reduced the cost of printing and increased its speed. Although Gutenberg eventually died penniless, the printing industry found new hope when other entrepreneurs moved to Venice and sold books to ship captains who traveled the world. It was this network of vessels that distributed the books to other markets, something Gutenberg unfortunately didn’t have access to in his lifetime. How is this related to networking? Dave says, “…even if you have the greatest product in the world, without connectivity, it can be very difficult to succeed.” So AWS has built an extensive dark fiber network around the world to carry your data securely. Networking drives collaboration and the exchange of ideas, and creates communities that transcend geographies.
AWS has total control of its global network. It controls every single inch of the optical dark fiber cables and connections to its Regions, Availability Zones, and Edge Locations. AWS continues to extend this network both in reach and capacity. This enabled AWS to support Fox News in moving from HD to 4K, which doubled the bitrate needed to carry the workload. Dave also mentioned that AWS's network supports a single customer moving over 74 zettabytes of enterprise data every single year. As more devices that require cloud connectivity, such as driverless cars, continue to emerge, the network will need to keep expanding to meet the increased demand for capacity.
The network growth in numbers
Core network Capacity
It took AWS until 2019 to double the core network capacity it had when its first services launched in 2005, but the next doubling took only 3 years. Unfortunately, no absolute numbers were given to back this claim.
Regions
More Regions mean more choices. Customers can deploy applications closer to the end-users to improve performance or meet regulatory and compliance requirements around data sovereignty and residency. More Regions also means higher availability.
AWS now has 32 Regions with 4 more Regions coming up in Canada, Malaysia, New Zealand, and [Thailand](https://press.aboutamazon.com/2022/10/aws-to-launch-an-infrastructure-region-in-thailand). The Tel Aviv Region was the only Region that went into general availability this year.
Local Zones
No signal travels faster than the speed of light, so the best way to deliver value to end-users faster is to shorten the cable. Local Zones were created primarily to serve customers in the media and entertainment industry who require single-digit millisecond round trip times for their applications.
The first Local Zone was launched in 2019 in Los Angeles and helped reduce the round trip time from 25ms to 1-2ms. Today there are over 35 Local Zones around the world with 19 more coming up. The general availability for AWS Local Zones in Lima, Lagos, and Queretaro was announced in the first quarter of 2023.
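To make the "shorten the cable" argument concrete, here is a rough back-of-the-envelope sketch. It assumes light in optical fiber travels at the vacuum speed divided by a refractive index of about 1.47, and it ignores switching, queuing, and routing detours. The path lengths are hypothetical and are not figures from the talk.

```python
# Speed of light in vacuum, in kilometres per second.
C_VACUUM_KM_S = 299_792.458
# Assumption: typical refractive index of optical fiber (about 1.47).
FIBER_REFRACTIVE_INDEX = 1.47


def fiber_rtt_ms(distance_km: float) -> float:
    """Best-case round-trip time in milliseconds over a fiber path."""
    speed_km_s = C_VACUUM_KM_S / FIBER_REFRACTIVE_INDEX
    one_way_s = distance_km / speed_km_s
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    for km in (50, 500, 2500):  # hypothetical path lengths
        print(f"{km:>5} km -> {fiber_rtt_ms(km):.2f} ms round trip")
```

Under these assumptions, a 2,500 km fiber path costs roughly 24.5 ms of round trip time from propagation alone, while a 50 km path costs about 0.5 ms. That is the same order of magnitude as the 25ms to 1-2ms improvement quoted above.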
Direct Connect
Connecting to AWS over the internet comes with some unpredictability in both end-to-end performance and security. AWS Direct Connect provides customers with direct fiber connections to AWS’s routers in colocation facilities around the world. Like traditional MPLS networks, these connections are private, and they support speeds up to 100Gbps.
20 new Direct Connect locations were launched in 2023.
Amazon CloudFront
Amazon CloudFront helps serve your data to your end-users. Today, there are 600 CloudFront points of presence and 13 regional edge caches that handle over 3 trillion requests every day. CloudFront played a critical role in streaming the ICC Cricket World Cup on Disney+ Hotstar, which set a record of over 59 million people watching India vs Australia in the final.
AWS Nitro System
AWS launched the first Nitro chips 11 years ago to overcome the limitations of the software-based virtualization that the initial EC2 instances were built on. They had to redefine virtualization from scratch, and Dave claims AWS is the only cloud provider that doesn’t use any of the host machine's resources for virtualization. Security, storage, and networking functions do not run on the Intel processor or GPU that comes with the machine; they run entirely on AWS's own Nitro hardware.
The result of the improvements delivered by every iteration of the Nitro System can be seen in the growth of network bandwidth over the years. According to AWS, relying solely on software-based virtualization wouldn’t have given them the possibility to exceed 100Gbps. Now it is possible to launch EC2 instances with 400 Gbps of networking throughput.
But it doesn’t end there. The rise of AI/ML workloads and the need for lower latency and higher throughput have pushed AWS to innovate yet again. The Trainium machine learning accelerator instances, released in 2022, support 800Gbps of throughput, and today the P5 instances, built on Nvidia’s H100 Tensor Core GPUs, support up to 3200 Gbps.
With the single-instance throughput requirement for AI/ML workloads solved, the next challenges were on the core network: traffic congestion risks, inefficient performance, and scaling limits. AWS introduced a new network topology called the Amazon EC2 UltraCluster, which provides scale, performance, and availability improvements dedicated to AI/ML workloads running on tens of thousands of GPUs.
New Feature: Amazon EC2 Instance Topology API in General Availability
This API provides detailed information on where your instances sit in the network, so that you can place them on the same network nodes to optimize network performance.
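As an illustration of how topology data could be used, the sketch below groups instances by the network node closest to them. It does not call AWS. The dictionary shape (an `Instances` list whose entries carry `InstanceId` and an ordered `NetworkNodes` list) follows my understanding of the EC2 `DescribeInstanceTopology` response and should be checked against the API reference. The instance IDs and node IDs are made up.

```python
from collections import defaultdict


def group_by_closest_node(response: dict) -> dict[str, list[str]]:
    """Group instance IDs by the last entry of their NetworkNodes list.

    Assumption: NetworkNodes is ordered from the top of the network
    hierarchy down, so the last entry is the node nearest the instance.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for instance in response.get("Instances", []):
        nodes = instance.get("NetworkNodes", [])
        if nodes:
            groups[nodes[-1]].append(instance["InstanceId"])
    return dict(groups)


# Made-up sample data shaped like the assumed API response.
SAMPLE_RESPONSE = {
    "Instances": [
        {"InstanceId": "i-aaa", "NetworkNodes": ["nn-1", "nn-10", "nn-100"]},
        {"InstanceId": "i-bbb", "NetworkNodes": ["nn-1", "nn-10", "nn-100"]},
        {"InstanceId": "i-ccc", "NetworkNodes": ["nn-1", "nn-20", "nn-200"]},
    ]
}

if __name__ == "__main__":
    for node, instance_ids in group_by_closest_node(SAMPLE_RESPONSE).items():
        print(node, instance_ids)
```

Instances that land in the same group share their lowest-level network node, which makes them good candidates for the tightly coupled parts of a distributed training job.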
Redefining traditional packet flows
Traditional TCP/IP routing does not account for congestion along the forwarding path, which leads to inefficient use of the interconnected links. AWS invented SRD (Scalable Reliable Datagram), which optimizes performance by spreading data across as many network paths as possible, thereby avoiding overloading any particular segment of the forwarding path. It is a key technology for supporting AI/ML workloads. Though SRD was initially used for HPC workloads, it has been added to 58 new instance types to support various types of workloads.
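The load-spreading intuition can be shown with a toy model. This is not AWS's SRD algorithm, which also reacts to congestion and handles out-of-order delivery. The sketch only compares pinning each flow to one path with spraying every packet across all paths. The flow sizes and path count are hypothetical.

```python
def per_flow_load(flows: list[tuple[int, int]], n_paths: int) -> list[int]:
    """Pin each flow to a single path, chosen from its flow ID."""
    load = [0] * n_paths
    for flow_id, packets in flows:
        load[flow_id % n_paths] += packets
    return load


def sprayed_load(flows: list[tuple[int, int]], n_paths: int) -> list[int]:
    """Spray packets round-robin over all paths, regardless of flow."""
    load = [0] * n_paths
    next_path = 0
    for _flow_id, packets in flows:
        for _ in range(packets):
            load[next_path] += 1
            next_path = (next_path + 1) % n_paths
    return load


# Hypothetical (flow_id, packet_count) pairs: two large flows, two small.
FLOWS = [(0, 900), (4, 900), (1, 100), (2, 50)]

if __name__ == "__main__":
    print("per-flow:", per_flow_load(FLOWS, 4))
    print("sprayed: ", sprayed_load(FLOWS, 4))
```

With per-flow pinning, the two large flows collide on one path while another path sits idle. With spraying, the busiest and quietest paths differ by at most one packet.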
Amazon VPC
EC2 was launched on one flat network segmented only by security groups until the VPC was introduced. AWS thought that no customer would need more than one VPC, but today some run thousands of VPCs. The same assumption was made about connectivity between VPCs. VPC peering, which launched in 2013, was the first attempt at providing a simple peer-to-peer connection between a handful of customer VPCs. VPC peering doesn’t support transitive routing, which means that if VPC A is peered with VPC B and VPC B is peered with VPC C, VPC A cannot reach VPC C by transiting through VPC B. As a result, the number of peering connections becomes unmanageable once you have over a hundred VPCs. So in 2018, AWS launched the Transit Gateway to manage not only connections between hundreds of VPCs across multiple Regions, but also connections to on-premises networks and VPN terminations.
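A short sketch shows why peering stops scaling. Because peering is non-transitive, full connectivity needs one peering per pair of VPCs, whereas a hub such as a transit gateway needs one attachment per VPC. The function names and VPC names are illustrative only.

```python
def full_mesh_peerings(n_vpcs: int) -> int:
    """Peering connections needed so every VPC can reach every other VPC."""
    return n_vpcs * (n_vpcs - 1) // 2


def hub_attachments(n_vpcs: int) -> int:
    """Attachments needed when every VPC connects to one central hub."""
    return n_vpcs


def can_reach_via_peering(peerings: set[frozenset[str]], src: str, dst: str) -> bool:
    """Peering is non-transitive: only a direct peering gives reachability."""
    return frozenset((src, dst)) in peerings


if __name__ == "__main__":
    peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}
    print(can_reach_via_peering(peerings, "A", "B"))  # direct peering exists
    print(can_reach_via_peering(peerings, "A", "C"))  # no transit through B
    for n in (5, 100, 1000):
        print(n, full_mesh_peerings(n), hub_attachments(n))
```

For 100 VPCs, a full mesh needs 4,950 peering connections, compared with 100 attachments to a hub.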
AWS Cloud WAN
AWS Cloud WAN is the next step towards interconnecting larger customer networks globally through AWS’s backbone network to overcome the Region-specific limitations of VPCs. It was first announced at AWS re:Invent 2022 to simplify the management of global networks. Think of AWS Cloud WAN as your global AWS MPLS provider, giving you a unified interface to manage all your network connections as a single unit. Before AWS Cloud WAN, users could still build global networks with multiple Transit Gateways peered across Regions and connected to on-premises networks, but Cloud WAN brings the management of this global network into one simple interface. You can segment traffic among your various VPCs and departments and automate the implementation of guardrails and security controls through network configuration policies.
New Feature: AWS Cloud WAN Tunnel-less Connect
AWS Cloud WAN Tunnel-less Connect allows you to extend your SD-WAN infrastructure into AWS. AWS Cloud WAN launched last year with the AWS Transit Gateway Connect feature, which relied on Generic Routing Encapsulation (GRE) tunnels for the SD-WAN infrastructure extension. With the new AWS Cloud WAN Tunnel-less Connect feature, as the name implies, GRE tunnels are no longer required. This simplifies the integration process, and the absence of the GRE overhead increases your bandwidth by up to 5x.
More Features for Better IP Management and Automation
Manually tracking IP addresses with spreadsheets is not practical, so AWS released Amazon VPC IP Address Manager (IPAM) in 2021 to simplify IP tasks such as assigning, tracking, and monitoring IP addresses. This year, the service has been improved with automatic IP address assignment for your VPC subnets, Bring Your Own ASN, and a Free Tier to get visibility of public IP usage.
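To illustrate what automatic subnet assignment replaces, here is a minimal sketch that carves non-overlapping subnets out of a pool using Python's standard `ipaddress` module. It models the idea only and does not use the IPAM API. The pool CIDR and prefix length are hypothetical.

```python
import ipaddress


def allocate_subnets(pool_cidr: str, prefix_len: int, count: int) -> list[str]:
    """Return the first `count` non-overlapping subnets of the pool."""
    pool = ipaddress.ip_network(pool_cidr)
    candidates = pool.subnets(new_prefix=prefix_len)
    allocated = []
    for _ in range(count):
        try:
            allocated.append(str(next(candidates)))
        except StopIteration:
            raise ValueError("pool exhausted") from None
    return allocated


if __name__ == "__main__":
    # Hypothetical pool: three /24 subnets out of a /16.
    for cidr in allocate_subnets("10.0.0.0/16", 24, 3):
        print(cidr)
```

A spreadsheet does the same bookkeeping by hand. The value of a managed service is doing it centrally, without overlaps, across many accounts and Regions.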
More features were also added to improve the migration and management of IPv6.
Load Balancing
New Feature: Anomaly Detection with Automatic Target Weight.
Anomaly detection allows you to automatically detect and mitigate grey failures for Application Load Balancer targets. Grey failures are ambiguous situations, such as backend application issues or network issues between the load balancer and its targets, that simple health checks can miss. The ALB identifies failing targets and reduces the traffic sent to them. Detecting these grey failures is possible thanks to the insights AWS has gained from its experience with load balancing, coupled with Machine Learning algorithms. Once an anomaly is detected, you are notified via CloudWatch while the service shifts weight away from the failing targets and distributes the rest of the traffic to healthy targets.
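The weighting idea can be sketched in a few lines. This is not the ALB's algorithm, which is not described in the talk. It is a simple stand-in that compares each target's error rate with the median and down-weights outliers. The tolerance, the reduced weight, and the target names are assumptions chosen for illustration.

```python
from statistics import median


def target_weights(
    error_rates: dict[str, float],
    tolerance: float = 0.05,
    reduced_weight: float = 0.1,
) -> dict[str, float]:
    """Give full weight to normal targets and a reduced weight to outliers."""
    baseline = median(error_rates.values())
    return {
        target: reduced_weight if rate > baseline + tolerance else 1.0
        for target, rate in error_rates.items()
    }


def traffic_share(weights: dict[str, float]) -> dict[str, float]:
    """Convert weights into the fraction of traffic each target receives."""
    total = sum(weights.values())
    return {target: weight / total for target, weight in weights.items()}


# Hypothetical per-target error rates; target-c is misbehaving.
ERROR_RATES = {"target-a": 0.01, "target-b": 0.02, "target-c": 0.30, "target-d": 0.01}

if __name__ == "__main__":
    weights = target_weights(ERROR_RATES)
    print(weights)
    print(traffic_share(weights))
```

The outlier keeps a small share of traffic instead of being removed entirely, which lets the load balancer notice when it recovers.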
New Feature: Application Load Balancer Mutual TLS Authentication
AWS Application Load Balancer now supports fully managed authentication for certificate-based identities with mTLS (Mutual TLS). With this feature, you get client certificate authentication powered by either AWS Certificate Manager or other integrated 3rd party certificate authorities. You can also integrate it with secure libraries to avoid issues like in the case of the Heartbleed bug.
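For readers unfamiliar with mutual TLS, the sketch below shows what the server side of mTLS looks like when you configure it yourself with Python's standard `ssl` module: the server presents its own certificate and also requires a valid certificate from the client. This is the kind of setup the managed ALB feature takes off your hands. It is not ALB configuration, and the file path parameters are hypothetical.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    server_cert: Optional[str] = None,
    server_key: Optional[str] = None,
    client_ca: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that requires client certificates."""
    # Purpose.CLIENT_AUTH yields a context for the server side of a connection.
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # The "mutual" part: reject clients that present no valid certificate.
    context.verify_mode = ssl.CERT_REQUIRED
    if server_cert and server_key:
        # Certificate and key the server presents to clients.
        context.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if client_ca:
        # CA bundle used to validate client certificates.
        context.load_verify_locations(cafile=client_ca)
    return context


if __name__ == "__main__":
    ctx = build_mtls_server_context()
    print(ctx.verify_mode)
```

In a real deployment you would pass the server certificate, key, and client CA bundle paths, then wrap the listening socket with this context.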
Amazon VPC Lattice
Amazon VPC Lattice was announced last year and became generally available earlier this year. The goal is to simplify the developer experience for building service-to-service communication, enabling developers to achieve their goals without worrying about IP address assignment and networking.
Security
At AWS, security is job zero. So how does AWS protect its network? It has built an internal tool called Project MadPot. AWS launched this pioneering initiative in the late 2010s, leveraging its vast cloud infrastructure to attract and observe potential cyberattacks. This "honeypot" strategy enabled AWS to analyze attacker methods and develop more effective countermeasures. MadPot doesn’t only detect threats; it also alerts other AWS preventive services like AWS WAF, AWS Security Hub, AWS Network Firewall, Amazon GuardDuty, and AWS Shield. It also shares insights with the intelligence community and partners, leading to the dismantling of botnets, DDoS attacks, and command-and-control centers.
AWS Network Firewall
AWS Network Firewall is an AWS-managed service that makes it easy for customers to deploy network protections to inspect and filter traffic to and from or between their VPCs. It was launched in 2020 and this year, AWS added 3 key features.
- Decrypt and inspect TLS Connection. The majority of internet traffic is SSL/TLS encrypted so without deep packet inspection (DPI), there is no visibility into encrypted traffic. This presents challenges to customers who haven’t set up alternative means to inspect traffic as TLS encryption can also hide malware and conceal data theft. This new feature adds a step in the ingress routing into your VPC. Traffic in the scope of TLS encryption configuration is forwarded for a decrypt operation. Performing DPI without this feature would have required using AWS WAF or implementing 3rd party security appliances behind a Gateway Load Balancer.
- Resource tagging. This enables the implementation of micro-segmentation policies in your environment. With this feature, you can tag and filter AWS resources to centrally manage and reference sets of resources in your stateful firewall rules, instead of manually updating your rule groups every time you make changes to a set of resources. You can tag your EC2 instances and ENIs as resource groups and reference the tag in your AWS Network Firewall rule groups. Previously, updating individual firewall rules with changes to your AWS resources was tedious and error-prone. Now, AWS Network Firewall automatically keeps your rules current by syncing them with the latest IP addresses and CIDR ranges of the resources in your specified resource groups. This ensures consistent firewall protection as your infrastructure evolves, saving you time and effort.
- Multiple administrator support. This enables customers to create AWS Firewall Manager administrator accounts from AWS Organizations to manage their firewall policies. Customers can delegate responsibility for firewall administration by restricting access based on organizational unit, account, policy type, and Region.
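The resource tagging feature in the list above can be pictured with a small in-memory model: a rule references a tag instead of a fixed list of addresses, and the address set is resolved from whatever resources currently carry that tag. This is a conceptual sketch, not the AWS Network Firewall API. The field names, tags, and IP addresses are hypothetical.

```python
def resolve_tag_to_ips(resources: list[dict], key: str, value: str) -> set[str]:
    """Collect the private IPs of all resources carrying the given tag."""
    return {
        resource["private_ip"]
        for resource in resources
        if resource.get("tags", {}).get(key) == value
    }


def rule_allows(rule: dict, source_ip: str, resources: list[dict]) -> bool:
    """Check a source IP against a rule that references a tag, not IPs."""
    key, value = rule["source_tag"]
    return source_ip in resolve_tag_to_ips(resources, key, value)


# Hypothetical inventory and rule.
RESOURCES = [
    {"id": "i-web1", "private_ip": "10.0.1.10", "tags": {"tier": "web"}},
    {"id": "i-db1", "private_ip": "10.0.2.10", "tags": {"tier": "db"}},
]
RULE = {"name": "allow-web-to-db", "source_tag": ("tier", "web")}

if __name__ == "__main__":
    print(rule_allows(RULE, "10.0.1.10", RESOURCES))
    # A new instance with the same tag is covered without editing the rule.
    RESOURCES.append({"id": "i-web2", "private_ip": "10.0.1.11", "tags": {"tier": "web"}})
    print(rule_allows(RULE, "10.0.1.11", RESOURCES))
```

The rule never changes. Only the inventory does, which is what removes the manual, error-prone rule updates described above.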
AWS Gateway Load Balancer
AWS Gateway Load Balancer helps customers easily deploy, scale, and manage third-party virtual appliances. Before this service was introduced, customers who depended on software-based appliances from vendors they were familiar with in their on-premises networks had to rely on EC2 in combination with some hacks to provide automatic failover.
No new feature was announced, but the takeaway from this part of the presentation is that software appliances from partners like Cisco, Fortinet, Palo Alto, Check Point, Trend, Aviatrix, etc. represent the highest segment in the AWS Marketplace today.
AWS Verified Access
AWS Verified Access is a service built on Zero Trust guiding principles to provide secure access to corporate applications without a VPN. It helps reduce the risks associated with remote connectivity, which traditionally relied only on perimeter security. AWS Verified Access also allows you to integrate with trust providers like Okta and CyberArk as well as SIEM providers like Datadog and New Relic, just to name a few.
This service was launched last year and went into general availability this year with the following new features.
- Support for AWS Web Application Firewall (WAF). This integration helps protect applications from application-layer threats by filtering out common exploits such as SQL injection and cross-site scripting.
- Policy Assistant. Policy Assistant for AWS Verified Access makes it easier to express, troubleshoot, and simulate application access policies. With Policy Assistant, you can accelerate the validation and authoring of your application access policies while adhering to Zero Trust Principles.
Conclusion
While no groundbreaking new service was announced, Dave Brown's talk at AWS re:Invent 2023 highlighted significant advancements in existing services and infrastructure. AWS continues to invest heavily in network expansion and performance, with key developments including plans for 19 more Local Zones, the "Trainium" machine learning accelerator instances, and the EC2 UltraCluster topology for AI/ML workloads. Additionally, AWS is focused on improving the developer experience with features like Amazon VPC Lattice and the IP Address Manager. Security remains a top priority, with enhancements to services like AWS Network Firewall and the general availability of AWS Verified Access, which promotes Zero Trust principles for secure application access. Overall, these advancements demonstrate AWS's continued commitment to providing a robust, secure, and scalable network infrastructure for its customers.