A Lemongrass success story: Enhancing Multi-Region SD-WAN failover with AWS Cloud WAN

Managing multi-Region network connectivity at scale is a critical challenge for modern enterprises. At Lemongrass Consulting, we enhanced our Amazon Web Services (AWS) network architecture by implementing AWS Cloud WAN. This implementation enabled intent-based routing between multiple AWS Regions while providing seamless on-premises integration through SD-WAN in our multi-Region AWS environment. Throughout this transformation, we maintained our existing AWS Transit Gateway infrastructure without disruption. This solution provided dynamic failover capabilities and improved our overall network resilience.

In this post, we demonstrate how AWS Cloud WAN helped us create an efficient cross-Region network architecture with automated failover mechanisms. We walk through our implementation process, sharing key insights and best practices that allowed us to achieve this transformation while minimizing disruption to our customers’ environments.

Customer environment

At Lemongrass, we manage AWS environments for numerous enterprise customers, using Transit Gateway to implement hub-and-spoke network architectures. This architecture consists of Transit Gateway as the hub, connecting to spoke VPCs within each AWS Region, with inter-Region connectivity established through Transit Gateway peering.

One of our customers operates a hybrid cloud network where SD-WAN appliances facilitate connectivity between Lemongrass’s AWS environment and the customer’s on-premises network. These appliances operate in a high-availability configuration, with primary and secondary instances distributed across two Availability Zones (AZs). They connect to Transit Gateway using Transit Gateway Connect attachments, with each appliance maintaining two Generic Routing Encapsulation (GRE) tunnels for redundancy. This setup enabled seamless Border Gateway Protocol (BGP) route propagation between on-premises and AWS environments.

The SD-WAN appliances are set up in active-passive mode with two Amazon Elastic Compute Cloud (Amazon EC2) based instances, with each establishing two GRE tunnels to the Transit Gateway. Each instance is deployed in a different AZ and advertises on-premises prefixes over BGP to the Transit Gateway. Both appliances continuously advertise the same prefixes, but the primary/secondary behavior is controlled using BGP AS_PATH prepending. For example, SD-WAN appliance 1 advertises prefixes with an AS_PATH of 65000, while SD-WAN appliance 2 prepends another AS number, resulting in an AS_PATH of 65000 65000. This AS_PATH length difference makes sure that SD-WAN appliance 1 is preferred under normal conditions. If appliance 1 fails or its BGP session drops, then BGP routing automatically converges to choose SD-WAN appliance 2 as the next-best path, because its routes continue to be advertised. Furthermore, the SD-WAN appliances maintain connectivity to the broader SD-WAN fabric and propagate routes learned from AWS over BGP.
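
The AS_PATH preference described above can be illustrated with a minimal sketch. It models only the AS_PATH length comparison; real BGP implementations evaluate other attributes as well, and the appliance names and the advertised prefix here are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Advertisement:
    appliance: str            # hypothetical appliance name
    prefix: str               # prefix advertised over BGP
    as_path: Tuple[int, ...]  # AS numbers in the AS_PATH attribute


def best_path(advertisements):
    """Return the advertisement with the shortest AS_PATH, or None if empty."""
    if not advertisements:
        return None
    return min(advertisements, key=lambda adv: len(adv.as_path))


# Appliance 1 advertises AS_PATH 65000; appliance 2 prepends once more.
primary = Advertisement("sdwan-1", "0.0.0.0/0", (65000,))
secondary = Advertisement("sdwan-2", "0.0.0.0/0", (65000, 65000))

# Both sessions up: the shorter AS_PATH wins.
print(best_path([primary, secondary]).appliance)

# Appliance 1's route withdrawn: the remaining advertisement is selected.
print(best_path([secondary]).appliance)
```

Because both appliances advertise continuously, the secondary route is already known when the primary route is withdrawn, so no configuration change is needed for the switch.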

Figure 1: A three-Region (US-WEST-2, US-EAST-2, US-EAST-1) AWS network infrastructure connecting to a corporate data center through SD-WAN tunnels and Transit Gateway peering, with public/private subnet routing.

To further enhance resilience and support their disaster recovery capabilities, the customer needed a tertiary SD-WAN appliance in a second Region. This appliance would protect against regional outages affecting both existing SD-WAN appliances in the US-EAST-2 Region. It would also maintain connectivity when the connection to on-premises is impaired or when issues arise at the on-premises data center.

Challenge

The primary challenge in this architecture was ensuring seamless failover between SD-WAN appliances across AWS Regions without manual intervention. Transit Gateway is a regional networking service by design, so inter-Region connectivity relies on Transit Gateway peering, which uses static route configurations. Although this approach works well for predictable traffic patterns, it presents limitations when handling dynamic failover scenarios.

In this customer’s environment, the SD-WAN appliances advertise only a default route to AWS, because AWS Regions function as spokes in the SD-WAN topology and need only a default route for on-premises connectivity. When the customer decided to implement a tertiary SD-WAN appliance in the secondary Region for regional redundancy, they needed a solution to dynamically route traffic from workloads in Region-A to the SD-WAN appliance in Region-B.

To address this requirement, we needed an approach that would enable dynamic route propagation across AWS Regions while preserving the customer’s existing network architecture and default route advertisement strategy. This solution would need to complement the existing Transit Gateway infrastructure that connects application VPCs in each Region, while adding support for dynamic failover scenarios between SD-WAN appliances.

Solution

With critical workloads running in production, our customer needed a solution that would enable dynamic regional failover while ensuring minimal disruption to their operations. The design needed to reduce complexity and provide end-to-end dynamic routing failover, making AWS native services an ideal choice for this transformation.

AWS Cloud WAN proved to be the optimal solution, offering a fully managed wide-area networking service that streamlines multi-Region connectivity through centralized management and automated route propagation. This allowed us to extend the customer’s AWS network with BGP-based dynamic routing while preserving their existing Transit Gateway infrastructure.

Our implementation strategy prioritized minimal disruption to the production environment. We designed a phased approach to integrate AWS Cloud WAN alongside the existing network architecture, keeping operations running throughout the transition. The service's native capabilities provided end-to-end route awareness across AWS Regions, enabling dynamic path selection for SD-WAN failover scenarios without adding complexity. We used AS_PATH prepending on the SD-WAN devices to designate the primary, secondary, and tertiary nodes that advertise and receive prefixes between on-premises and AWS.

Figure 2: Transitional architecture showing AWS Cloud WAN integration into the existing three-Region infrastructure, introducing segmented routing between corporate data centers (East and West) while maintaining SD-WAN connectivity.

Implementation approach

Our implementation followed a carefully planned phased approach to minimize disruption to the production environment, as shown in the preceding figure. We maintained existing SD-WAN connections through Transit Gateway until the AWS Cloud WAN setup was fully operational and validated.

Phase 1: AWS Cloud WAN core network setup

  • Provisioned an AWS Cloud WAN core network in AWS Network Manager
  • Deployed Core Network Edges (CNEs) in each participating AWS Region
  • Defined and configured a unique BGP Autonomous System Number (ASN) range to prevent routing conflicts with existing Transit Gateways
  • Defined dedicated IP ranges for GRE tunnels to enable BGP connectivity with SD-WAN appliances
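
The Phase 1 settings can be sketched as a fragment of a core network policy document. This is a minimal illustration, not the customer's configuration: the ASN values, CIDR block, and Region list are hypothetical, and the helper functions are our own names for checks you might run before applying a policy.

```python
import ipaddress
import json

# All values below are hypothetical placeholders.
EXISTING_TGW_ASNS = [64512, 64513, 64514]   # ASNs already used by Transit Gateways
CORE_ASN_RANGE = "64520-64540"              # range reserved for Core Network Edges
INSIDE_CIDR = "10.255.0.0/16"               # addresses for GRE tunnel endpoints
EDGE_REGIONS = ["us-east-1", "us-east-2", "us-west-2"]


def parse_asn_range(text):
    """Split a 'low-high' string into two integers."""
    low, high = (int(part) for part in text.split("-"))
    return low, high


def conflicting_asns(asn_range, existing_asns):
    """Return the existing ASNs that fall inside the proposed range."""
    low, high = parse_asn_range(asn_range)
    return [asn for asn in existing_asns if low <= asn <= high]


def build_core_network_configuration(asn_range, inside_cidr, regions):
    """Assemble the core-network-configuration part of a policy document."""
    ipaddress.ip_network(inside_cidr)  # raises ValueError if the CIDR is malformed
    return {
        "asn-ranges": [asn_range],
        "inside-cidr-blocks": [inside_cidr],
        "edge-locations": [{"location": region} for region in regions],
    }


conflicts = conflicting_asns(CORE_ASN_RANGE, EXISTING_TGW_ASNS)
config = build_core_network_configuration(CORE_ASN_RANGE, INSIDE_CIDR, EDGE_REGIONS)

print(conflicts)
print(json.dumps({"core-network-configuration": config}, indent=2))
```

An empty conflict list indicates that the proposed range does not collide with the ASNs already assigned to the Transit Gateways.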

Phase 2: Network segmentation and route management

  • Implemented two distinct AWS Cloud WAN network segments:
    • Egress Segment for SD-WAN connectivity through GRE tunnels
    • Transit Segment for route table attachments between Transit Gateways and AWS Cloud WAN
  • Enabled automatic route sharing between segments to streamline route management
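
The two segments and the route sharing between them might look like the following policy fragment. The segment names are assumptions chosen to match the description above, and `undefined_segments` is a hypothetical helper that checks every segment referenced by a share action is actually defined. It assumes `share-with` is given as a list of segment names.

```python
import json

# Hypothetical segment names matching the Egress and Transit segments above.
SEGMENTS = [
    {"name": "egress"},
    {"name": "transit"},
]

# Share routes between the egress and transit segments.
SEGMENT_ACTIONS = [
    {
        "action": "share",
        "mode": "attribute-route",
        "segment": "egress",
        "share-with": ["transit"],
    },
]


def undefined_segments(segments, actions):
    """Return segment names used in share actions but never defined."""
    defined = {segment["name"] for segment in segments}
    referenced = set()
    for action in actions:
        referenced.add(action["segment"])
        referenced.update(action.get("share-with", []))
    return sorted(referenced - defined)


policy_fragment = {"segments": SEGMENTS, "segment-actions": SEGMENT_ACTIONS}

print(undefined_segments(SEGMENTS, SEGMENT_ACTIONS))
print(json.dumps(policy_fragment, indent=2))
```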

Phase 3: Integrating AWS Cloud WAN with Transit Gateway

  • Created dedicated Transit Gateway route tables for AWS Cloud WAN connectivity
  • Established route propagation from AWS Cloud WAN’s transit segment to Transit Gateway route tables
  • Validated route synchronization between AWS Cloud WAN and Transit Gateway

Phase 4: SD-WAN connection migration

  • Deployed new Elastic Network Interfaces (ENIs) on SD-WAN appliances for connectivity to AWS Cloud WAN
  • Established GRE tunnels from SD-WAN appliances to the AWS Cloud WAN egress segment
  • Maintained parallel connectivity through the existing Transit Gateway during the transition. The Transit Gateway continued to prefer routes learned from the SD-WAN appliances over the Connect attachment to routes received over the AWS Cloud WAN peering attachment, in line with the Transit Gateway route evaluation order.
  • Verified BGP route propagation across all network components, including the on-premises network
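
The parallel-connectivity behavior in Phase 4 can be sketched as a small route selection function. It is a simplified model under stated assumptions: it covers only the two attachment types discussed here (Connect and the AWS Cloud WAN peering attachment), applies longest-prefix match first, and uses hypothetical attachment identifiers. The complete evaluation order is described in the Transit Gateway documentation.

```python
import ipaddress
from dataclasses import dataclass

# Lower number means more preferred. Only the two attachment types
# discussed above are modeled here.
ATTACHMENT_PRIORITY = {"connect": 0, "peering": 1}


@dataclass(frozen=True)
class Route:
    prefix: str
    attachment_type: str
    attachment_id: str  # hypothetical identifier


def select_route(routes, destination):
    """Pick the most specific matching route, then the preferred attachment type."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r.prefix).prefixlen,
            ATTACHMENT_PRIORITY[r.attachment_type],
        ),
    )


routes = [
    Route("0.0.0.0/0", "peering", "cloudwan-peering"),
    Route("0.0.0.0/0", "connect", "sdwan-connect"),
]

# Both default routes present: the Connect attachment is preferred.
print(select_route(routes, "192.0.2.10").attachment_id)

# Connect route gone (for example, after BGP to Transit Gateway is disabled).
print(select_route(routes[:1], "192.0.2.10").attachment_id)
```

This ordering is what makes the cutover safe to stage: the AWS Cloud WAN route is already present in the route table and takes over only once the Connect route is withdrawn.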

Phase 5: Production cutover and validation

  • Executed staged migration:
    1. Disabled BGP sessions to Transit Gateway
    2. Activated route advertisement through AWS Cloud WAN
    3. Validated route propagation across all Transit Gateway route tables (Route Analyzer can help with this validation)
    4. Removed legacy Transit Gateway Connect attachments for SD-WAN
  • Performed comprehensive failover testing of all three SD-WAN appliances in both AWS Regions

Figure 3: Final architecture demonstrating fully integrated AWS Cloud WAN deployment across three AWS Regions, with optimized segmentation (Transit and Egress), connecting dual corporate data centers through SD-WAN tunnels and streamlined Transit Gateway route table attachments.

Outcome

Implementing AWS Cloud WAN allowed us to successfully enable dynamic cross-Region failover, thereby eliminating the need for manual static route updates and enhancing network resilience. The AWS Cloud WAN centralized network management and BGP-based route propagation provided the customer with improved fault tolerance, operational efficiency, and streamlined network management.

The new architecture integrates AWS Cloud WAN with Transit Gateways and three SD-WAN appliances across multiple AWS Regions, following a primary-secondary-tertiary model. The SD-WAN appliances are deployed as follows:

  • Primary and secondary SD-WAN appliances reside in Region-A
  • Tertiary SD-WAN appliance is deployed in Region-B

Transit Gateways and route tables are linked to AWS Cloud WAN, enabling dynamic route learning from on-premises (0.0.0.0/0). Using BGP, the active SD-WAN node becomes the default egress point, enabling seamless failover and regional redundancy. This dynamic routing approach eliminates reliance on static routing, allowing the network to adapt automatically to the changes in regional availability.

To illustrate the effectiveness of this architecture, we examine a typical failover scenario:

When the primary SD-WAN appliance in Region-A is handling all traffic between AWS and the on-premises network, the following failover sequence occurs if issues arise:

  1. If a failure occurs in Region-A (whether due to an AZ failure or appliance failure), then the BGP session between the primary SD-WAN appliance and AWS Cloud WAN terminates.
  2. AWS Cloud WAN detects the BGP session drop and automatically withdraws routes associated with the primary appliance.
  3. Through BGP route selection, AWS Cloud WAN immediately shifts traffic to the secondary SD-WAN appliance in Region-A, which maintains active BGP sessions.
  4. In a scenario where both Region-A appliances become unavailable, the AWS Cloud WAN dynamic routing capabilities automatically redirect traffic to the tertiary SD-WAN appliance in Region-B, establishing it as the active path.
  5. This entire failover process occurs automatically without manual intervention, maintaining continuous connectivity between AWS and on-premises networks.
  6. When Region-A recovers, BGP reconverges, and AWS Cloud WAN restores traffic flow to the primary SD-WAN appliance based on BGP path selection criteria.
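
The failover sequence above can be replayed with a short simulation. This is a sketch under assumptions: the AS_PATH lengths (1, 2, and 3) are illustrative values consistent with a primary-secondary-tertiary prepending scheme, and path selection is reduced to the shortest AS_PATH among appliances with an established BGP session.

```python
# Hypothetical appliances; AS_PATH lengths are illustrative assumptions.
APPLIANCES = {
    "primary": {"region": "Region-A", "as_path_len": 1},
    "secondary": {"region": "Region-A", "as_path_len": 2},
    "tertiary": {"region": "Region-B", "as_path_len": 3},
}


def active_appliance(sessions_up):
    """Return the appliance with the shortest AS_PATH among live BGP sessions."""
    candidates = [name for name in APPLIANCES if name in sessions_up]
    if not candidates:
        return None
    return min(candidates, key=lambda name: APPLIANCES[name]["as_path_len"])


def replay(events):
    """Apply session up/down events and record the active appliance after each."""
    sessions_up = set(APPLIANCES)
    timeline = [active_appliance(sessions_up)]
    for action, name in events:
        if action == "down":
            sessions_up.discard(name)  # route withdrawn when the session drops
        else:
            sessions_up.add(name)      # route re-advertised on recovery
        timeline.append(active_appliance(sessions_up))
    return timeline


timeline = replay([
    ("down", "primary"),    # steps 1-3: traffic shifts to the secondary
    ("down", "secondary"),  # step 4: both Region-A appliances unavailable
    ("up", "primary"),      # step 6: Region-A recovers
    ("up", "secondary"),
])
print(timeline)
```

Each entry in `timeline` is the active egress appliance after the corresponding event, starting with the steady state before any failure.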

Looking ahead, AWS Cloud WAN supports further enhancements, including inter-Region communication that does not depend solely on Transit Gateway peering. Although AWS Cloud WAN can replace Transit Gateway in certain scenarios, we retained the Transit Gateway in this environment for intra-Region VPC-to-VPC connectivity. AWS Cloud WAN handles inter-Region and on-premises routing, optimizing traffic control through the SD-WAN solution.

The implementation makes sure that the customer’s network is future-ready, capable of scaling without regional limitations, and equipped to handle complex network traffic management requirements while maintaining operational clarity. The AWS Cloud WAN intent-based routing policies enable the customer to define network behavior through business-centric rules rather than complex routing configurations. This significantly streamlines network management while providing the flexibility to adapt to changing business needs.

Summary

This implementation successfully met the customer’s goal of achieving multi-Region SD-WAN redundancy with dynamic failover. Key benefits of using AWS Cloud WAN in this setup include the following:

  • Seamless multi-Region failover: The solution provides automatic failover across AWS Regions without manual intervention, improving network resilience.
  • Simplified network management: The AWS Cloud WAN network segmentation capabilities provide enhanced flexibility for future scaling and efficient traffic management.
  • Minimal downtime during migration: The transition from Transit Gateway-based SD-WAN to AWS Cloud WAN-based SD-WAN was smooth, with minimal disruption to existing services.
  • Preserved existing infrastructure: We successfully integrated AWS Cloud WAN while maintaining Transit Gateway for intra-Region VPC connectivity.

During the testing phase, we validated redundancy by failing over traffic to a different Region, confirming that routing behaved as expected and no traffic was lost. The customer was highly satisfied with the outcome, as this solution not only solved the immediate challenge of regional redundancy but also paved the way for a more flexible and scalable network design moving forward.

Adopting AWS Cloud WAN made the customer's network resilient, scalable, and ready for future expansion. The architecture can handle increased global traffic flows and more advanced network segmentation as the customer's business grows. To learn more about implementing AWS Cloud WAN for your organization, visit the AWS Cloud WAN documentation.

About the authors

Hardik Shah

Hardik is a Sr. Technical Account Manager at AWS. He brings extensive experience from finance, travel, and retail industries to support customers on their cloud journey. With a deep passion for technology and networking, he enjoys solving complex technical challenges and helping customers optimize their AWS infrastructure. Outside of work, Hardik likes to spend time with his family, traveling, and exploring cultures and cuisines.

Ankush Goyal

Ankush is an Enterprise Support Lead in AWS Enterprise Support who helps Enterprise Support customers streamline their cloud operations on AWS. He enjoys working with customers to help them design, implement, and support cloud infrastructure. He is a results-driven IT professional with over 20 years of experience.

Ronnie Butler (Guest)

Ronnie Butler is a Senior AWS Infrastructure Architect at Lemongrass Consulting, based in the Philadelphia Metro area, with over 15 years of experience in IT and more than 6 years of hands-on AWS expertise. Ronnie is a seasoned Cloud Infrastructure technologist with deep proficiency in infrastructure architecture design, cloud networking, consulting, and professional services. Ronnie is passionate about helping organizations accelerate their cloud adoption journey by using AWS technologies to build scalable, secure, and resilient environments, and holds a Bachelor of Science in Information Systems and multiple AWS certifications.