
A leader in hybrid, full-stack infrastructure observability, Paul brings career expertise in scaling high-growth businesses that fuels Virtana’s mission to advance observability for AI-powered and hybrid IT environments, establishing the company as a trusted partner to Global 2000 enterprises.
A seasoned technology executive, Paul has a track record of building high-performing teams and driving profitable growth across AI factories, IT infrastructure, cloud platforms, and enterprise applications at some of the industry’s most respected companies.
Prior to joining Virtana, Paul held leadership roles at Elastic, SAP, Salesforce, and BMC, where he led customer-focused teams and shaped successful growth strategies.
On a personal note, Paul splits his time between San Francisco and New York, where he enjoys spending time with his two college-aged children. He serves on the board of the John Maclean Foundation and is an avid surfer, skier, and cyclist.
M.R. Rangaswami: For readers who may not be familiar with Virtana, can you give us a quick picture of what the company does and the market you operate in?
Paul: The world’s largest banks, telcos, healthcare systems and airlines run on extraordinarily complex digital infrastructure with hundreds of interdependent services, networks, databases and compute environments all operating simultaneously. When those systems degrade or fail, they result in more than just technical incidents. The impacts are business-critical events that halt revenue, disrupt customer experience and put brand reputation at risk.
Virtana operates in the observability market, a category of software that monitors, analyzes and helps organizations govern their entire technology environment. However, observability is undergoing a fundamental shift. Legacy approaches focused on monitoring discrete components in isolation: the network, the application and the storage layer. That model was built for a simpler era and it is no longer sufficient. AI has turned this evolution into an operational mandate, as system-level understanding is now required to achieve autonomous IT operations and control performance, cost and risk.
Virtana delivers end-to-end AI-powered observability across the full system, including AI infrastructure. We collect approximately 20,000 metrics at subsecond intervals, use AI agents to discover system components, map their dependencies and identify the root cause of performance degradation or failure in hours rather than days. Our platform helps organizations reduce mean time to resolution, eliminate infrastructure waste and operate their most critical services with the visibility and control that modern enterprises demand.
The Global 2000 relies on us for the operational excellence that determines enterprise value at scale.
M.R.: “AI factories” are the next evolution of enterprise infrastructure, where AI workloads move from experimentation to full-scale production. Given that roughly a quarter of AI jobs fail today, what fundamentally breaks when organizations try to industrialize AI and what distinguishes the companies that are successfully making that transition?
Paul: The shift from AI experimentation to industrialization is where most organizations encounter their first serious operational reckoning. For the past several years, enterprises have been running AI in controlled and relatively forgiving environments: cloud sandboxes, pilot programs and proof-of-concept deployments. Those environments masked a problem that scale now exposes. The infrastructure governance required to run AI in production is categorically different from anything most organizations have built before.
An AI factory is not simply a cluster of GPUs. It is a highly complex interdependent system spanning compute, networking, storage, orchestration layers and the workloads themselves, all operating under continuous load with real business outcomes riding on its performance. When 25% of AI jobs fail, the cause isn’t the model. It’s the lack of end-to-end system visibility. Organizations cannot identify where jobs are stalling, which dependencies are creating bottlenecks or whether the infrastructure is operating efficiently because they are still relying on legacy monitoring tools designed to watch discrete components rather than govern interconnected systems. AI doesn’t break at scale because models fail. It breaks because the system running them cannot see or manage its own constraints.
The companies making the transition successfully share a common discipline: they treat the AI factory as infrastructure. They have invested in observability that captures telemetry across the entire system, uses AI agents to identify risk and causality in real time, and provides the controls needed to ensure that AI workloads are available, performant and efficient. They have also closed the disconnect we consistently see in our research, where executives believe their organizations are AI-ready while their IT leadership recognizes significant gaps in governance and operational readiness.
M.R.: CIOs are under pressure to simultaneously deliver more powerful AI capabilities while reducing operational costs. In an AI factory model, where are the biggest hidden inefficiencies today, whether in GPU utilization, energy consumption or orchestration, and how should leaders rethink ROI when the cost of under-optimized infrastructure is so high?
Paul: The tension CIOs are facing is real. They are being asked to scale AI capabilities rapidly while reducing operational costs. That looks contradictory, but it isn’t if the system is properly understood end to end.
The biggest inefficiencies in AI factories today are not where most leaders are looking. GPU utilization is the most visible example, but it is often misunderstood. These are expensive assets, yet in many environments they are idle or fragmented because workloads are not orchestrated at the system level. The issue is not procurement. It is the lack of visibility into how GPUs are actually being used relative to the workloads they support.
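The idle-versus-fragmented distinction Paul draws can be made concrete with a short sketch. This is an illustrative example, not Virtana’s implementation: the telemetry fields, thresholds, and sample values below are all hypothetical, and real data would come from a GPU telemetry collector.

```python
from dataclasses import dataclass

# Hypothetical per-GPU telemetry for one sample window; field names
# and thresholds are illustrative, not from any specific product.
@dataclass
class GpuSample:
    gpu_id: str
    busy_pct: float      # % of the window spent executing kernels
    mem_used_gb: float
    mem_total_gb: float

def fleet_utilization(samples: list[GpuSample]) -> dict:
    """Summarize how busy a GPU fleet actually is versus its capacity."""
    avg_busy = sum(s.busy_pct for s in samples) / len(samples)
    # "Fragmented" here means a GPU holding memory for a workload while
    # doing little compute: capacity that is reserved but not producing.
    fragmented = [s.gpu_id for s in samples
                  if s.busy_pct < 20 and s.mem_used_gb / s.mem_total_gb > 0.5]
    idle = [s.gpu_id for s in samples if s.busy_pct < 5]
    return {"avg_busy_pct": avg_busy, "fragmented": fragmented, "idle": idle}

samples = [
    GpuSample("gpu-0", busy_pct=92.0, mem_used_gb=70, mem_total_gb=80),
    GpuSample("gpu-1", busy_pct=12.0, mem_used_gb=60, mem_total_gb=80),
    GpuSample("gpu-2", busy_pct=2.0,  mem_used_gb=1,  mem_total_gb=80),
]
print(fleet_utilization(samples))
```

In this toy fleet, gpu-1 is the expensive failure mode: it looks allocated from a procurement view, but system-level visibility shows it is reserving memory while doing almost no compute.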
Energy consumption is a related challenge. Underutilized or poorly scheduled workloads drive unnecessary power, cooling and data movement costs across the entire environment. This is not just a sustainability issue. It is a direct consequence of system inefficiency. Where leaders need to rethink ROI is in how they measure value. It is no longer about how much is spent on infrastructure or even how busy it is. It is about what the system is producing.
Metrics like token throughput, cost per inference, job completion rates and workload efficiency are far more meaningful indicators of return. Without connecting those outcomes to the underlying infrastructure and system behavior, inefficiencies remain hidden and costs continue to scale. The organizations that get this right are the ones that move beyond monitoring individual components and instead understand how the entire system operates.
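The output-based metrics named above can be sketched in a few lines. The figures and function below are illustrative assumptions for a single reporting window, not Virtana’s methodology:

```python
def ai_factory_roi(tokens_served: int, inferences: int,
                   jobs_submitted: int, jobs_completed: int,
                   infra_cost_usd: float, window_hours: float) -> dict:
    """Measure what the system produced, not how much was spent or how
    busy it looked. All inputs cover one reporting window."""
    return {
        "tokens_per_hour": tokens_served / window_hours,
        "cost_per_inference_usd": infra_cost_usd / inferences,
        "job_completion_rate": jobs_completed / jobs_submitted,
        "cost_per_million_tokens_usd": infra_cost_usd / tokens_served * 1e6,
    }

# Hypothetical 24-hour window for illustration only.
m = ai_factory_roi(tokens_served=500_000_000, inferences=2_000_000,
                   jobs_submitted=400, jobs_completed=300,
                   infra_cost_usd=25_000, window_hours=24)
print(m)
```

The point of computing ROI this way is that a falling completion rate or a rising cost per inference surfaces a system inefficiency even when spend and GPU busyness both look flat.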
That is what allows them to reduce cost and scale AI at the same time.