March 02, 2021
Choosing Your Next Data Center Fabric: Cisco’s ACI or EVPN, Part 3
Key differentiators between Cisco’s fabric options, Application Centric Infrastructure and Ethernet VPN.
Cisco customers have a choice between Application Centric Infrastructure (ACI) and Ethernet VPN (EVPN) for building their data center network fabrics. Since these technologies are thought to be interchangeable, the choice often falls to cultural, political or other “Layer 8” decision-making. But the fabric technologies are, in fact, different in ways both subtle and significant, and one’s choice has a profound impact on fabric orchestration, operations, and staffing.
I hope you’ll join me all the way through this three-part blog series as I compare and contrast these technologies and provide a foundation for deciding on the best choice for your organization’s needs.
Security for ACI and EVPN
ACI is widely lauded for its fabric integrity. Switches and APICs are hardened with certificate-based discovery, SSL encryption, image signing and verification, authentication, throttling, control plane policing, underlay/overlay separation, etc. ACI also provides automatic security in the overlay, such as DHCP snooping, ARP proxy, ARP inspection, IP source guard, broadcast suppression, subnet enforcement and control plane policing. Finally, ACI also has optional security features, such as blacklist “taboo” filters, L4 contracts between and within endpoint groups (EPGs and ESGs), VRF route leaking and Tenant/VRF separation.
EVPN’s only out-of-box security features are the EVPN control plane itself, which provides a level of endpoint integrity, and VTEP authentication. NX-OS has many optional security features that can incrementally harden EVPN, but these are seldom deployed to the full extent of what a default ACI fabric provides. EVPN also has no equivalent to ACI’s contracts; EVPN only supports normal access lists, which are powerful but difficult to manage.
ACI and EVPN both offer MACsec link encryption for fabric links and CloudSec for VTEP-to-VTEP encryption over IP backbones. With ACI, CloudSec is only supported with Multi-Site. With EVPN, CloudSec can be used in any topology, including Campus/WAN/MAN.
ACI and EVPN both tunnel traffic in VXLAN, limiting visibility in firewalls and IPS tools. This doesn’t impact data centers where security appliances are engineered at the fabric edge, or within the fabric. But it can affect fabrics extended over MAN/WAN links. As of this writing, Palo Alto Networks is the only firewall vendor able to inspect within VXLAN tunnels. Other firewalls can build their own VXLAN endpoints, but that is a different use case that is incompatible with ACI and EVPN.
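To make the visibility problem concrete, here is a minimal Python sketch of the first step an inspecting device has to take before it can see the tenant's traffic: decoding the outer VXLAN header. It assumes the standard RFC 7348 layout (8 bytes, carried in UDP to port 4789); the function name is hypothetical, and any fabric-specific header extensions are not modeled.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag defined in RFC 7348


def parse_vxlan_header(header: bytes) -> int:
    """Return the 24-bit VNI from the first 8 bytes of a VXLAN payload.

    Layout: 1 byte of flags, 3 reserved bytes, 3 bytes of VNI, 1 reserved byte.
    Raises ValueError if the data is too short or the VNI flag is not set.
    """
    if len(header) < 8:
        raise ValueError("a VXLAN header is 8 bytes long")
    # "!B3xI": one flags byte, three skipped bytes, then a 32-bit big-endian word
    flags, vni_field = struct.unpack("!B3xI", header[:8])
    if not flags & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    # The VNI occupies the top 24 bits of the word; the low 8 bits are reserved
    return vni_field >> 8
```

Only after this step does the inner Ethernet frame, with the addresses the security policy actually cares about, become visible. A device that stops at the outer UDP header sees nothing but VTEP-to-VTEP traffic.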
Finally, EVPN carries the additional risk of a sloppy fabric deployment that doesn’t maintain underlay/overlay separation. Such a fabric is vulnerable to VXLAN attacks, a new threat unique to fabrics, in which the attacker can inject traffic into overlay networks in any security zone. If a VTEP can be compromised, the attacker may even be able to insert or hijack endpoints. VXLAN attacks are in their infancy and not on the radar of most security teams, but that may change as insecure fabrics become attractive targets.
Cisco Fabric Topologies Beyond the Data Center
Cisco’s ACI Anywhere extends ACI’s prescriptive approach to a variety of topologies.
A single APIC cluster can manage:
- One ACI fabric – up to 400 switches/20k ports in a spine/leaf fabric.
- ACI Multi-Pod – up to twelve pods containing up to 500 total switches/24k ports, all within 2000 miles and interconnected by 10+ Gbps links.
- ACI Remote Leaf – up to 64 leaf pairs can be connected to a parent fabric over an IP backbone, extending the fabric into small server rooms and other satellite locations. Remote leaves are fully survivable if the parent fabric is multi-pod.
- ACI vPod – This niche solution extends partial fabric capabilities to a virtual spine/leaf that can run on top of VMware vSphere, such as on a bare metal cloud.
The Multi-Site Orchestrator (MSO) is a higher-level orchestration plane that supports:
- ACI Multi-Site – up to 12 sites (APIC clusters), each of which could be a multi-pod. This allows for a maximum scale of 6000 switches/288k ports. Fabric objects can stretch across sites, or the sites can remain independent islands. There are no bandwidth or latency requirements between sites.
- ACI Cloud APIC – This extends partial fabric capabilities to up to two clouds and four regions (AWS and Azure today, GCP in the future).
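The Multi-Site maximum quoted above follows directly from the Multi-Pod limits. A small Python sketch of the arithmetic, using only the figures given in this post (the constant and function names are made up for illustration, and the limits themselves change between ACI releases):

```python
# Limits as quoted in this post; check Cisco's current scalability guide before relying on them
MULTI_POD_MAX_SWITCHES = 500
MULTI_POD_MAX_PORTS = 24_000
MULTI_SITE_MAX_SITES = 12


def multi_site_ceiling(sites: int = MULTI_SITE_MAX_SITES) -> tuple[int, int]:
    """Return (switches, ports) when every site is a fully built-out multi-pod."""
    if not 1 <= sites <= MULTI_SITE_MAX_SITES:
        raise ValueError("site count is outside the range quoted in this post")
    return sites * MULTI_POD_MAX_SWITCHES, sites * MULTI_POD_MAX_PORTS


switches, ports = multi_site_ceiling()
print(f"{switches} switches / {ports} ports")  # 6000 switches / 288000 ports
```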
EVPN is topology-agnostic with no deployment restrictions, but the Cisco DCNM fabric manager works best with a few prescribed topologies.
A single DCNM cluster can manage:
- One EVPN fabric with up to 150 leaves/7k ports. This is Cisco’s recommendation.
- Any number of EVPN fabrics, so long as the total switch count is under 350. These fabrics can be independent islands or can be configured with Multi-Site (see footnote 1).
The Multi-Site Orchestrator used for ACI (above) can also orchestrate DCNM:
- Up to six DCNM clusters, each with a single EVPN site of 150 switches. This allows for a maximum scale of 900 switches. In this mode, MSO also manages the Multi-Site connectivity.
- Cisco plans to offer both DCNM-to-Cloud and DCNM-to-ACI connectivity in the future.
DCNM can also provide light management of individual switches that are not part of the EVPN fabrics, such as core/MAN/WAN switches, and even switches from other vendors.
1. Cisco’s EVPN Multi-Site architecture is unique to Cisco and efficiently manages the connectivity between multiple local EVPN fabrics each in its own availability zone. Other vendors prefer to flatten multiple sites into a single EVPN fabric for simplicity, telling customers wanting true availability zones to daisy-chain their EVPN fabrics together.
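The DCNM limits above can be turned into a quick planning check. The following Python sketch uses only the numbers quoted in this post; the function names are hypothetical, and it treats the 150-per-site figure as a switch count, which is an assumption on my part since the post gives that limit in both leaves and switches.

```python
# Figures as quoted in this post; they are release-dependent
DCNM_CLUSTER_SWITCH_LIMIT = 350   # total switch count must stay under this
MSO_MAX_DCNM_CLUSTERS = 6
MSO_SWITCHES_PER_SITE = 150


def fits_one_dcnm_cluster(fabric_switch_counts: list[int]) -> bool:
    """True if the fabrics' combined switch count is under the per-cluster limit."""
    return sum(fabric_switch_counts) < DCNM_CLUSTER_SWITCH_LIMIT


def mso_dcnm_ceiling() -> int:
    """Maximum switches when MSO orchestrates the largest supported set of DCNM clusters."""
    return MSO_MAX_DCNM_CLUSTERS * MSO_SWITCHES_PER_SITE


print(fits_one_dcnm_cluster([150, 150]))      # True: 300 switches in total
print(fits_one_dcnm_cluster([150, 150, 50]))  # False: 350 is not under 350
print(mso_dcnm_ceiling())                     # 900
```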
All external ACI/EVPN connectivity is handled through standard L2/L3 protocols: static routes, BGP, OSPF, EIGRP, VRF-Lite with 802.1q, LACP and so on. With EVPN, the demarc between fabric and external is flexible and can take many different logical and physical forms; it is somewhat analogous to the flexibility one has with SVI placement in a traditional network. In contrast, the ACI demarc is rigid, involving ACI border leaves on one side and the external network or device on the other.
For service providers needing VRF at high scale, Segment Routing (SR-MPLS) can be used with both ACI and EVPN. ACI also has “GOLF,” an engineered EVPN/VXLAN handoff that behaves similarly.
An emerging area of external connectivity is multi-domain integration, where fabric handoffs are orchestrated with other domain managers. For example:
- Cisco ACI and Campus integration links APIC with the Cisco Identity Service Engine (ISE) to exchange endpoint security information. This allows ACI security policy to apply to campus users, and campus security policy to ACI endpoints.
- Cisco ACI and Viptela SD-WAN integration links APIC with the vManage SD-WAN controller so that ACI endpoints can be assigned a WAN SLA policy for loss/jitter/latency that is honored by SD-WAN.
Multi-domain integration is not limited to external connectivity, nor to Cisco solutions. For example:
- Nutanix ACI Integration allows Nutanix Prism to create and manage an ACI Tenant for physically attaching a Nutanix AHV cluster.
- ACI’s own VMM feature is essentially a “multi-domain manager” for vSphere, NSX, KVM, etc.
Things to Watch Out For, ACI and EVPN
With ACI or EVPN fabrics, network administrators should prepare themselves for a laundry list of quirks. Below are some that CDW engineers have encountered:
VPC: ACI and EVPN do their VPC peering over the fabric and do not need physical VPC peer links. VPC should always be used to connect other L2-adjacent network devices (firewalls, L2 switches, etc.), but never to connect L3-adjacent devices (routers, L3 switches/firewalls). Within the fabric, VPC to hosts is something the admin should consider eliminating, since the benefits of LACP/VPC do not justify its complexity. Instead, fabrics should standardize on simple active/failover teaming with bare metal hosts and MAC-based teaming with hypervisors.
NIC teaming: EVPN and ACI both support active/failover (Linux bonding mode 1), LACP (mode 4) and MAC-based teaming (on hypervisors). All other bonding modes should be avoided. In particular, “smart” or “adaptive” teaming (mode 6) will not work as expected with EVPN and ACI.
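For Linux hosts, that guidance maps onto the kernel's numbered bonding modes. Below is a minimal Python sketch that classifies a mode number according to the advice above. The mode names are the standard Linux bonding driver names; the function and the "fabric-safe" set are hypothetical helpers that only encode this post's recommendation, and hypervisor MAC-based teaming is not a Linux bonding mode, so it is out of scope here.

```python
# Standard Linux bonding driver modes
BONDING_MODES = {
    0: "balance-rr",
    1: "active-backup",   # active/failover
    2: "balance-xor",
    3: "broadcast",
    4: "802.3ad",         # LACP
    5: "balance-tlb",
    6: "balance-alb",     # "adaptive" load balancing
}

# Modes this post recommends for hosts attached to ACI or EVPN fabrics
FABRIC_SAFE_MODES = {1, 4}


def check_bonding_mode(mode: int) -> tuple[str, bool]:
    """Return (mode name, whether this post considers it safe on a fabric)."""
    if mode not in BONDING_MODES:
        raise ValueError(f"unknown bonding mode: {mode}")
    return BONDING_MODES[mode], mode in FABRIC_SAFE_MODES


for mode in (1, 4, 6):
    name, ok = check_bonding_mode(mode)
    print(f"mode {mode} ({name}): {'ok' if ok else 'avoid'}")
```

On a running host, the active mode of a bond can usually be read from /sys/class/net/<bond>/bonding/mode, which reports both the name and the number.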
Host-routing: ACI and EVPN both route endpoints (MAC or IP+MAC), rather than using traditional L2/L3 flood-and-learn semantics. The result is a cleaner fabric that supports all hypervisors and simple endpoints, such as Windows and Linux VMs. The problem lies with endpoints that are either silent or that do their own network trickery: virtual IPs, IP pools, proxy ARP, local static routes, routing protocols, anycast loopbacks, broadcast discovery, etc. These problematic endpoints can still attach to the fabric, but fabric and/or host changes may be needed. EVPN fabrics are more tolerant of such endpoints than ACI is.
Microsoft Network Load Balancing (NLB): Surprisingly, ACI and Cisco EVPN both support NLB in unicast, multicast and IGMP mode. In many ways, NLB is now a first-class network citizen.
Out-of-band (OOB) management: ACI and EVPN should both have out-of-band management Ethernet and terminal server connections to all switches. In-band management is possible and may be needed outside of the data center, but extreme care is needed to ensure it will work properly during emergencies.
MTU: All EVPN networks require an additional 50 bytes of MTU to carry the VXLAN encapsulation, because VTEPs don’t support fragmentation and NX-OS does not support maximum segment size (MSS) clamping (“ip tcp adjust-mss …”). ACI similarly needs the extra 50 bytes for multi-pod, multi-site and remote leaf. The presence of NSX or another software overlay will require an even larger MTU. Note that ACI’s Cloud APIC for public cloud works with any MTU.
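The 50-byte figure is the sum of the headers that VXLAN encapsulation adds to each frame. A short Python sketch of that arithmetic, assuming an IPv4 underlay and no 802.1Q tag (both assumptions; an IPv6 underlay or a tag adds more), with hypothetical names throughout:

```python
# Per-frame headers added by VXLAN encapsulation (IPv4 underlay, untagged)
ETHERNET_HEADER = 14  # one extra Ethernet header travels with every encapsulated frame
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8

VXLAN_OVERHEAD = ETHERNET_HEADER + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER  # 50 bytes


def required_underlay_mtu(host_mtu: int = 1500) -> int:
    """Smallest underlay MTU that carries a host MTU without fragmentation."""
    return host_mtu + VXLAN_OVERHEAD


print(required_underlay_mtu())      # 1550
print(required_underlay_mtu(9000))  # 9050
```

In practice many operators simply set the underlay to the largest MTU the hardware supports rather than the exact minimum, which also leaves room for a second overlay such as NSX.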
Multi-vendor EVPN: Although many of this blog’s points about EVPN are applicable to Arista, Juniper and other vendors, some content is very Cisco-specific. For example, Cisco’s handling of CloudSec, multicast and NLB is very different from that of other vendors. And, almost more importantly, Cisco’s DCNM is very different from Arista’s CloudVision Portal, Juniper’s Contrail Enterprise Multicloud and Apstra’s AOS (now owned by Juniper). Each fabric manager represents a slightly different philosophy.
Conclusion: Which Is Right for You?
Although ACI and EVPN are functionally similar, they couldn’t be more different operationally.
EVPN is favored by teams of skilled and savvy engineers who are comfortable being responsible for the nuts and bolts of fabric forwarding. They are either most comfortable with the CLI or have a proven track record of using tools and systems to orchestrate and operate their current network. EVPN’s non-prescriptive nature also appeals to those wanting to deploy EVPN over campus, MAN, WAN and other ad hoc topologies.
ACI is favored for its secure, cookie-cutter approach and its cloud-like automation. ACI’s prescriptive nature also extends to “ACI Anywhere” topologies such as Multi-Site, Multi-Pod and Cloud APIC. ACI still requires plenty of engineering skill, but those skills are directed at the logical overlay networking, rather than wrangling the underlay fabric.
It’s no surprise that CDW strongly recommends ACI for most enterprise data centers. Time and time again, CDW has heard from EVPN operators struggling with painful problems that simply don’t exist with ACI (see footnote 2). It’s not that EVPN is broken in some way, only that EVPN takes much more effort to get right and keep right.
2. To be fair, CDW also hears from ACI operators struggling with its problems. The difference is that ACI problems are usually localized to a narrow part of the fabric overlay and fixing them does not imperil other workloads or the entire fabric.
Blog Series Links
This blog series begins in Choosing Your Next Data Center Fabric: Cisco’s ACI or EVPN, Part 1, where I explain the high-level differences between these two technologies, and then continues in Choosing Your Next Data Center Fabric: Cisco’s ACI or EVPN, Part 2, where I discuss the technical differences between them.