Artificial Intelligence (AI) and Machine Learning (ML) are here, and companies are doing everything in their power to leverage them. Among the torchbearers are service providers in Asia Pacific who are building large language models (LLMs), a flagship application of AI/ML, and rapidly deploying the infrastructure to support their training and use.

The leaders driving this transition don’t see AI/ML as a trend. They believe it will transform everything they do. Once operational, they expect their LLMs to provide new and unique insights into their business, helping them identify new revenue sources, find efficiencies, highlight potential synergies, and more.

It’s no secret that LLMs are powerful. Just look at the impact ChatGPT, the world’s most popular LLM, has had on individuals, companies, and governments around the world. It remains the simplest way to experience the power of AI/ML first-hand. For businesses to leverage such publicly available LLMs, however, they would need to send their data to the public cloud where the LLM is hosted. Organizations with sensitive data, like service providers, often have privacy, security, and regulatory concerns that prevent them from doing so.

This is why leaders in the service provider space are working on their own LLMs, and they are choosing Cisco as a preferred partner on this journey. Let’s see what makes us a partner of choice.

Training, Inferencing, and Everything in Between

Over the last few years, companies preparing to work with AI/ML have organized their data and are ready to use it to train their LLMs. To get started on that journey, however, they first need to deploy the right infrastructure.

Training an LLM is a resource-intensive process, demanding robust storage, compute, and network capabilities. While GPUs are a known critical component, it’s the synergy between CPUs, GPUs, Ethernet with RoCEv2 transport, and smart networking tools that accelerates AI/ML jobs.

AI/ML clusters have stringent infrastructure requirements, and every component plays a part in making large AI/ML jobs complete more quickly, so it is important to understand those requirements when sizing the infrastructure. Using RoCEv2 as the transport, AI/ML clusters can run efficiently over Ethernet while still getting the low latency and high throughput their traffic demands.
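To make the sizing point concrete, here is a back-of-the-envelope sketch (not a Cisco sizing tool; the model size, node count, and link speeds are illustrative assumptions) of how link bandwidth bounds the gradient-synchronization time of each training step when nodes perform a ring all-reduce over an Ethernet/RoCEv2 fabric:

```python
# Back-of-the-envelope sizing sketch (illustrative numbers, not a
# Cisco tool): per-iteration gradient synchronization time for a
# ring all-reduce over an Ethernet/RoCEv2 fabric.

def ring_allreduce_seconds(model_bytes: float, nodes: int,
                           link_gbps: float) -> float:
    """Time to all-reduce one set of gradients.

    A ring all-reduce moves 2 * (N - 1) / N times the model size
    over each node's link, so link bandwidth directly bounds how
    quickly every training step can complete.
    """
    traffic_bytes = 2 * (nodes - 1) / nodes * model_bytes
    link_bytes_per_s = link_gbps * 1e9 / 8
    return traffic_bytes / link_bytes_per_s

# Example: a 7B-parameter model with 16-bit gradients is ~14 GB
# of state to synchronize across 8 nodes.
model_bytes = 7e9 * 2
for gbps in (100, 400):
    t = ring_allreduce_seconds(model_bytes, nodes=8, link_gbps=gbps)
    print(f"{gbps} Gb/s links: {t:.2f} s per sync")
```

Even this toy arithmetic shows why the network fabric, not just the GPUs, gates job completion time: quadrupling link speed cuts the communication phase of every step by the same factor.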

As critical as the network is, the tools that provide visibility are equally important. Cisco provides best-in-class network fabric infrastructure along with tools that give visibility into traffic, congestion, and QoS policies, bringing network observability to your AI infrastructure. This holistic approach ensures the infrastructure is finely tuned to meet the intensive demands of AI workloads.

Our own journey with AI/ML has taught us that these workloads exhibit unique behaviors, and thus have unique requirements, especially when training LLMs. Cisco brings that intelligence to the network fabric to deliver the shortest job completion time for AI/ML workloads.

As a result, we have made enhancements across our portfolio that are engineered to optimize AI/ML workflows at scale. For example, our data center switches meant for use with AI workloads support intelligent congestion-management mechanisms such as Priority Flow Control (PFC) and Explicit Congestion Notification (ECN). Together, they optimize the flow of data within LLM training clusters, making it easier and faster to train the LLMs.
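As a rough mental model of how these two mechanisms cooperate (a simplified sketch, not a switch implementation; the queue thresholds are illustrative assumptions), a switch marks packets with increasing probability as its queue grows between a minimum and maximum threshold, prompting senders to slow down, while PFC pauses the upstream link only as a lossless last resort:

```python
import random

# Simplified sketch of WRED-style ECN marking with a PFC backstop
# (illustrative thresholds, not a real switch implementation).

ECN_MIN_KB = 150       # below this queue depth, never mark
ECN_MAX_KB = 3000      # at or above this depth, always mark
PFC_PAUSE_KB = 4000    # last-resort lossless pause threshold

def ecn_mark_probability(queue_kb: float) -> float:
    """Marking probability ramps linearly between the thresholds."""
    if queue_kb <= ECN_MIN_KB:
        return 0.0
    if queue_kb >= ECN_MAX_KB:
        return 1.0
    return (queue_kb - ECN_MIN_KB) / (ECN_MAX_KB - ECN_MIN_KB)

def handle_packet(queue_kb: float) -> str:
    """Decide what the switch does for one arriving RoCEv2 packet."""
    if queue_kb >= PFC_PAUSE_KB:
        return "pfc-pause"   # pause the upstream port instead of dropping
    if random.random() < ecn_mark_probability(queue_kb):
        return "ecn-marked"  # sender reacts by backing off
    return "forwarded"

print(handle_packet(100))    # shallow queue: always "forwarded"
```

The design intent is that ECN does the everyday congestion control by marking early and letting senders throttle themselves, while PFC's pause frames guarantee the losslessness that RoCEv2 requires, firing only when a queue is about to overflow.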

Of course, there are smart configurations across our portfolio, in storage, compute, and networking. And they come together to help service providers and other organizations build the robust training infrastructure they need.

But training isn’t where the money is. The real value of an LLM is in its inferencing capability once it has been trained. This is where service providers will gain the most, leveraging the LLM to maximize benefits to all stakeholders.

The infrastructure needed for inferencing, however, is a little different from what is used for training. Since the model only needs to be fed real-time data, the compute and storage capacity required is far lower. However, the speed of the compute, storage, and networking modules needs to be top-notch for the LLM to generate insights fast enough to deliver the quantum leap organizations dream of with AI/ML.

Having talked about the infrastructure needs of AI/ML workloads, especially LLMs, it’s important to call out that most organizations don’t yet have the infrastructure needed to get the job done, for either training or inferencing. According to Cisco’s recently published AI Readiness Index, infrastructure readiness for AI/ML adoption is low, with just 17% of organizations globally categorized as Pacesetters and more than half (53%) as Followers or Laggards.

So, for organizations that want to transform themselves with AI/ML in this new era, now is the time. Service providers are leading the flock because their industry faces a complex landscape of challenges, demands, and pressures, and LLMs offer an important lever for unlocking growth and success in the future.

Getting Rid of Unwanted Distractions

Cisco is synonymous with networking. It’s also a robust player in the storage and compute space. Most service providers – and most large companies across industries – rely on Cisco’s technologies to power their ‘business as usual’. Their teams know they can trust Cisco’s solutions. But most importantly, they’re familiar with Cisco’s portfolio and know that the networking, storage, and compute modules fit together and integrate seamlessly with one another.

Not to mention that Cisco has been a torchbearer for AI/ML workloads, using the technology in its own operations. This has allowed us to capture those learnings and bring smart configurations to our infrastructure that ultimately benefit customers.

Finally, for most organizations, sustainability is a priority. Cisco’s technology stack, powered by Cisco Silicon One, provides energy efficiency like no other. This again makes Cisco a natural choice for organizations exploring their path to AI/ML.

As a result, Cisco helps companies focus on the LLM they’re building and the AI/ML workloads they’re creating. The technology stack behind it, namely the storage, compute, and networking modules for the training and inferencing phases, fades into the background because it’s cutting-edge yet reliable and familiar. This is how the service providers we work with are leapfrogging into the AI/ML era with LLMs today and joining the Pacesetters. Those that move too slowly, the Followers and Laggards, may find it hard to catch up if they wait too long.