Leveraging data is critical to success in today’s data-driven world, and the explosion of AI/ML workloads is accelerating the need for data centers that can support them while streamlining operations. According to the Cisco AI Readiness Index, 84% of enterprises believe AI will have a significant impact on their business, yet only 14% of organizations worldwide say they are ready to integrate AI into their operations.
The rapid adoption of large language models (LLMs) trained on massive data sets has made production environments more complex to manage. Better performance and long-term sustainability require data center strategies that embrace agility, resilience, and cognitive intelligence capabilities.
The Impact of AI on Enterprises and Data Centers
As AI continues to drive growth, realign priorities, and accelerate operations, organizations often face three key challenges:
- How can they modernize their data center networks to handle changing demands, especially AI workloads?
- How can they scale the infrastructure of their AI/ML clusters sustainably?
- How can they ensure end-to-end visibility and security across their data center infrastructure?
AI visibility and observability are essential to support AI/ML applications in production, but challenges remain. There is still no universal agreement on which metrics to monitor or on monitoring best practices. Moreover, most organizations are still debating how to define monitoring roles and which organizational model best suits ML deployments. With data and data centers everywhere, including distributed environments that span colocation and edge sites, and with traffic flowing between sites and clouds, encrypting connections with IPsec or similar services is essential.
AI workloads that use inference or retrieval-augmented generation (RAG) require distributed and edge data centers with robust infrastructure for processing, security, and connectivity. Encrypting communication across multiple sites (private or public clouds) is critical for GPU-to-GPU, application-to-application, and traditional-workload-to-AI-workload interactions. Meeting these requirements calls for advances in networking.
Cisco’s AI/ML Approach Transforms Data Center Networking
At Cisco Live 2024, we announced several advancements in data center networking, specifically for AI/ML applications. These include Cisco Nexus One Fabric Experience, which simplifies configuration, monitoring, and maintenance across all fabric types through a single point of control, the Cisco Nexus Dashboard. With unified policies, it eases the management of diverse data center requirements, reducing complexity and improving security. Cisco Nexus HyperFabric also extends the Cisco Nexus portfolio with an easy-to-deploy, as-a-service offering that enhances private cloud delivery.
Nexus Dashboard integrates services to streamline software installations and upgrades, creating a more user-friendly experience that requires fewer IT resources. It also serves as a comprehensive operations and automation platform for on-premises data center networks, providing valuable capabilities such as network visualization, faster deployments, switch-level energy management, and AI-based root cause analysis for rapid performance troubleshooting.
As new deployments accelerate to support AI workloads and related data trust domains, much of the network focus is on the physical infrastructure and the ability to build non-blocking, low-latency Ethernet fabrics. Ethernet’s ubiquity, component reliability, and superior cost-effectiveness will keep it in the lead, with 800G and 1.6T on the roadmap.
With the right congestion management mechanisms, telemetry capabilities, port speeds, and latency characteristics, operators can build AI-centric clusters. Our customers are already telling us that their discussions are moving rapidly toward extending their management paradigm so these clusters fit into existing operational models. That’s why it’s also essential to innovate toward a simpler operator experience with new AIOps capabilities.
Cisco Validated Designs (CVDs) provide pre-configured solutions optimized for AI/ML workloads. They help ensure that the network meets the specific infrastructure requirements of your AI/ML cluster, minimizing latency and packet loss so that data flows smoothly and jobs complete more efficiently.
Our goal is to protect and connect traditional and new AI workloads in a single data center environment (edge, colocation, public or private cloud) that exceeds customer requirements for reliability, performance, operational simplicity, and sustainability. We focus on delivering operational simplicity and networking innovations such as seamless local area network (LAN), storage area network (SAN), AI/ML, and Cisco IP Fabric for Media (IPFM) implementations. This enables new use cases and greater value creation.
Our cutting-edge infrastructure and operational capabilities, including our platform vision called Cisco Networking Cloud, will be showcased at the 2024 Open Compute Project (OCP) Summit. We look forward to meeting you and sharing these developments.