Why Model Context Protocol is the runtime for Agentic FinOps
- Jean Latiere
- Dec 18, 2025
- 6 min read
What you'll learn: Why MCP solves the 24-48 hour billing lag and how it enables real-time cost intelligence for AI workloads.
Part of the FinOps for AI series.
→ Start here: The Dawn of Agentic FinOps
The Real-Time Problem
AWS Cost Explorer updates once daily. CUR files arrive in batches. The billing system operates on a delay ranging from 4 hours to 5 days, depending on the service.
For traditional workloads, this lag is tolerable. For AI workloads, it creates blind spots that cost thousands.

The problem intensifies with data-intensive ML pipelines such as genomics. Modern sequencers generate terabytes per day. A single NovaSeq 6000 run produces up to 6 Tb (terabases) of data. Population-scale studies routinely process petabytes. When these datasets move across regions for GPU access, costs accumulate faster than billing systems can report.
Consider a genomics research team running an ML-enhanced variant calling pipeline. On Friday at 6 PM, they launch the job on 200 whole genomes. Each genome generates approximately 50 GB of sequencing data. Total dataset: 10 TB.

The data sits in us-east-1, where the team's primary S3 buckets reside. GPU capacity for the deep learning variant caller is available only in us-west-2. This is not unusual: GPU instance availability varies by region, and teams take capacity where they can find it. Cross-region transfer costs $0.02 per GB.
The pipeline pulls the full dataset each training epoch. Four epochs over the weekend: 10 TB × 4 × $0.02/GB = $800 in cross-region transfer alone.
Compounding the problem: a misconfigured checkpoint export. The pipeline was set to export model checkpoints every 15 minutes instead of every 4 hours. Each checkpoint: 100 GB. Over 48 hours: 192 exports × 100 GB × $0.02/GB = $384 in checkpoint data movement. A simple configuration error, invisible to traditional monitoring.
Total weekend surprise: $1,184 in unplanned data transfer costs, before any compute. With CUDOS, detection happens on Tuesday at 10 AM, four days later. With an MCP-enabled agent, detection happens in near real time on Friday at 9 PM, three hours in, before the second epoch completes. Potential savings: $900+; roughly 80% of those data transfer costs could have been avoided.
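The weekend arithmetic can be verified in a few lines of Python. The 100 GB checkpoint size is an assumption, chosen as the size implied by the $384 figure:

```python
# Back-of-the-envelope check of the weekend transfer costs.
RATE_PER_GB = 0.02                 # USD per GB for cross-region transfer

dataset_gb = 200 * 50              # 200 genomes x ~50 GB each = 10,000 GB (10 TB)
epochs = 4
epoch_cost = dataset_gb * epochs * RATE_PER_GB   # full dataset pulled each epoch

exports = 48 * 4                   # one export every 15 minutes over 48 hours
checkpoint_gb = 100                # assumed checkpoint size
checkpoint_cost = exports * checkpoint_gb * RATE_PER_GB

total = epoch_cost + checkpoint_cost
print(round(epoch_cost), round(checkpoint_cost), round(total))  # 800 384 1184
```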
This pattern repeats across organisations. The billing lag that served traditional IT workloads creates unacceptable exposure for AI operations.
What Is MCP?
Model Context Protocol (MCP) is an open standard that connects large language models to external data sources and tools. Developed by Anthropic and now adopted by AWS, Google, and Microsoft, MCP provides a universal interface for AI agents to query systems in real time.

The protocol operates on a simple principle: standardise the interface, not the implementation. An MCP server exposes data through a consistent API. The agent queries that API without knowing or caring about the underlying system. Cost Explorer, CloudWatch, custom databases, all appear identical to the agent.
For FinOps, MCP decouples the agent from the data source. The agent does not need custom integrations for Cost Explorer, CloudWatch, or billing APIs. It speaks MCP. Each data source implements its own MCP server. The result is real-time access without vendor lock-in.
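The decoupling can be sketched in a few lines. This is an illustration of the pattern, not the actual MCP SDK; the class and tool names are hypothetical:

```python
# Illustrative sketch of the MCP principle: standardise the interface,
# not the implementation. Every data source exposes the same minimal
# surface, so the agent code never changes when a backend is swapped.
from typing import Protocol

class McpServer(Protocol):
    def call_tool(self, name: str, args: dict) -> dict: ...

class CostExplorerServer:
    def call_tool(self, name: str, args: dict) -> dict:
        # A real server would call the Cost Explorer API here.
        return {"source": "cost-explorer", "tool": name, "args": args}

class CloudWatchServer:
    def call_tool(self, name: str, args: dict) -> dict:
        # A real server would fetch CloudWatch metrics here.
        return {"source": "cloudwatch", "tool": name, "args": args}

def agent_query(server: McpServer, tool: str, **args) -> dict:
    """The agent speaks one protocol; the backend is interchangeable."""
    return server.call_tool(tool, args)

result = agent_query(CostExplorerServer(), "get_cost", service="AmazonS3")
print(result["source"])  # cost-explorer
```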
Three advantages define MCP for cost management:
Real-time data access.
The agent queries Cost Explorer APIs directly, not yesterday's SPICE cache. Anomalies surface in seconds, not days. When a training job starts consuming unexpected resources, the agent detects it immediately, not 48 hours later when the CUR refreshes.
Governance loops at the agent level.
Before executing any action, the agent can check budget constraints, approval requirements, and policy rules. Governance becomes part of the workflow, not an afterthought. The agent that detects the anomaly can also enforce the policy response.
Reusable, version-controlled integrations.
The same MCP server works with Claude Desktop, AWS Bedrock, or custom agents. Deploy once, use everywhere. Update once, benefit everywhere.
MCP transforms FinOps from scheduled reporting to continuous monitoring. The agent does not wait for dashboards to refresh. It queries live APIs the moment anomalies emerge.
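A governance loop of this kind amounts to a pre-execution check. The policy fields and thresholds below are hypothetical, sketched only to show the shape of the guard:

```python
# Hypothetical pre-execution governance check: before acting on an
# anomaly, the agent verifies budget headroom and approval policy.
from dataclasses import dataclass

@dataclass
class Policy:
    monthly_budget_usd: float
    auto_action_limit_usd: float   # actions above this need human approval

def authorise(policy: Policy, spent_usd: float, action_cost_usd: float) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a proposed action."""
    if spent_usd + action_cost_usd > policy.monthly_budget_usd:
        return "deny"
    if action_cost_usd > policy.auto_action_limit_usd:
        return "needs_approval"
    return "allow"

policy = Policy(monthly_budget_usd=10_000, auto_action_limit_usd=500)
print(authorise(policy, spent_usd=9_000, action_cost_usd=200))  # allow
print(authorise(policy, spent_usd=9_000, action_cost_usd=800))  # needs_approval
print(authorise(policy, spent_usd=9_800, action_cost_usd=300))  # deny
```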
Why Standardisation Matters
Traditional FinOps tools require separate connectors for each cloud, each billing API, each monitoring system. Every connector demands maintenance. API changes break pipelines. The integration tax compounds with every new data source.

MCP inverts this model. The agent speaks one protocol. Each data source implements its own MCP server.
Adding Azure to an AWS-only setup requires deploying an Azure Cost MCP server: no changes to the agent, no new dashboards, no retraining.
The same principle applies across clouds. AWS, Azure, GCP, each provider's cost data becomes accessible through the same interface. Multi-cloud cost intelligence without multi-vendor tooling. The 87% of enterprises operating across multiple clouds gain unified visibility without unified lock-in.
Version control applies to integrations as it does to application code. MCP servers can be tested, deployed, and rolled back independently. When AWS updates its Cost Explorer API, you update one MCP server. The agent continues operating without modification. When a new cost dimension becomes available, you extend the server. The agent gains capability without redeployment.
This modularity matters for enterprise adoption. Security teams can audit MCP servers independently. Platform teams can manage server deployments through existing CI/CD pipelines. FinOps practitioners can extend capabilities without engaging development resources. The separation of concerns that makes software maintainable applies equally to AI integrations.
Architecture: Minimal Working Example
A production-ready FinOps agent requires three components: a frontend for user interaction, MCP servers for data access, and AWS credentials with appropriate permissions.
The architecture follows a straightforward pattern. User queries flow to the LLM (Claude Desktop for local use, a model endpoint on an EC2 instance for enterprise deployments). The LLM routes requests through the MCP protocol to specialised servers: AWS Cost Explorer MCP for billing data, CloudWatch MCP for utilisation metrics, Bedrock MCP for inference costs.
The minimal stack requires no custom development:
Frontend: Claude Desktop (free) or Chainlit for custom interfaces
MCP servers: Open-source implementations for AWS Cost Explorer, CloudWatch, and API operations
Backend: Your AWS account with read-only IAM credentials
Total setup time: 15 minutes.
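As a concrete example, wiring a Cost Explorer MCP server into Claude Desktop is a single entry in `claude_desktop_config.json`. The package name and profile below are illustrative; check your chosen open-source implementation for the exact command:

```json
{
  "mcpServers": {
    "aws-cost-explorer": {
      "command": "uvx",
      "args": ["awslabs.cost-explorer-mcp-server@latest"],
      "env": {
        "AWS_PROFILE": "finops-readonly",
        "AWS_REGION": "us-east-1"
      }
    }
  }
}
```

Restart Claude Desktop and the agent can query billing data directly, using the read-only credentials of the named profile.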

Compare this with traditional dashboard deployment. CUDOS requires CloudFormation templates, Athena views, and QuickSight SPICE configuration. Minimum deployment time: 60-90 minutes. Data latency: 24-48 hours. Customisation demands SQL and QuickSight expertise. Each modification requires understanding the underlying data model.
The MCP approach inverts the complexity profile. Initial setup is trivial. Customisation happens through natural language. The agent interprets intent and constructs appropriate queries. No SQL required. No dashboard design skills needed.
The genomics team from the opening example could have asked: "What is my current cross-region data transfer for the variant-calling job?"
The agent queries Cost Explorer, correlates with the running job, and responds in seconds, not days.
Follow-up questions refine the analysis: "Break that down by hour." "Which S3 buckets are the source?" "What would this cost if I moved the data to us-west-2 first?"
Each query executes against live data.
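Under the hood, the MCP server translates that question into a Cost Explorer request. A sketch of the request it might build follows; the usage-type filter value is illustrative (cross-region traffic from us-east-1 to us-west-2 appears under usage types of that shape), and a real server would pass this dict to boto3's `get_cost_and_usage`:

```python
# Sketch: building a Cost Explorer GetCostAndUsage request for
# cross-region data transfer over the last few days.
from datetime import date, timedelta

def transfer_cost_request(days: int = 3) -> dict:
    end = date.today()
    start = end - timedelta(days=days)
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "HOURLY",          # hourly granularity requires opt-in
        "Metrics": ["UnblendedCost"],
        "Filter": {
            "Dimensions": {
                "Key": "USAGE_TYPE",
                # Illustrative: us-east-1 -> us-west-2 outbound transfer
                "Values": ["USE1-USW2-AWS-Out-Bytes"],
            }
        },
    }

req = transfer_cost_request()
print(req["Granularity"])  # HOURLY
```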
GPU Capacity Reservation Monitoring
Organisations purchase EC2 Capacity Reservations or Capacity Blocks for ML to guarantee GPU access for training jobs. P4d instances cost $32 per hour. P5 instances reach $98 per hour. Unused reservations burn cash at industrial scale!
The business case for reservations is sound. GPU availability remains constrained, and Capacity Blocks guarantee access for scheduled training runs. The risk emerges when reservations sit idle: training jobs finish early, experiments get delayed, teams forget to release capacity.
CUDOS provides an 'Unused On-Demand Capacity Reservations Cost per Account' visual. The data refreshes daily. CloudWatch captures GPU utilisation metrics per instance in real time. The gap between these systems creates a blind spot. You know what you paid. You know what you used. Connecting those facts requires manual effort.
Cost data lives in the Cost and Usage Report. Utilisation data lives in CloudWatch. CUDOS shows you paid for capacity. CloudWatch shows GPUs sat idle. Correlating them manually takes hours of spreadsheet work. By the time you complete the analysis, the situation has changed.
An MCP-enabled agent queries both sources simultaneously. The question: "Are my reserved P4d instances being utilised?" The agent fetches reservation costs from Cost Explorer and GPU metrics from CloudWatch. Correlation completes in seconds: "4 of 8 reserved GPUs averaged 15% utilisation over the past 6 hours. Estimated waste: $480."
You do not wait for tomorrow's dashboard. The agent surfaces underutilisation while the training job is still running; time enough to reassign capacity, adjust workloads, or release unused reservations before the next billing hour.
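The correlation itself is simple enough to sketch. The per-GPU-hour rate and idle threshold below are hypothetical, chosen only to make the arithmetic of the example concrete:

```python
# Sketch: correlating reservation spend with GPU utilisation to flag waste.
HYPOTHETICAL_RATE = 20.0     # USD per reserved GPU-hour (assumption)
IDLE_THRESHOLD = 20.0        # % utilisation below which capacity is flagged

def estimate_waste(utilisation: dict, hours: float) -> tuple:
    """Return (underutilised GPU ids, estimated cost of that idle window)."""
    idle = sorted(g for g, u in utilisation.items() if u < IDLE_THRESHOLD)
    return idle, len(idle) * HYPOTHETICAL_RATE * hours

# CloudWatch-style samples: average GPU utilisation over the past 6 hours.
samples = {"gpu-1": 92.0, "gpu-2": 15.0, "gpu-3": 14.0, "gpu-4": 88.0,
           "gpu-5": 16.0, "gpu-6": 12.0, "gpu-7": 95.0, "gpu-8": 90.0}
idle, waste = estimate_waste(samples, hours=6)
print(len(idle), waste)  # 4 480.0
```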
Conclusion and the dawn of Agentic FinOps
MCP closes the 24-48 hour billing gap. It transforms FinOps from scheduled reporting to continuous intervention. Genomics pipelines pulling terabytes cross-region, GPU capacity reservations sitting idle, misconfigured checkpoint exports, all those unexpected costs are avoidable when dealt with in real time.
The economic case is straightforward. Traditional dashboards report what happened. MCP-enabled agents intervene while it is still happening. The genomics team saves $900 not by optimising after the fact, but by catching the anomaly three hours after launch instead of four days later. The ML platform team reclaims idle GPU capacity the same day, not the following week.
The protocol is open. The implementations are available. The setup takes 15 minutes. The barrier to adoption is not technology; it is awareness that the alternative exists.
Next week: How MCP unifies cost, security, and compliance governance into a single policy layer.
Explore our open-source MCP implementations: github.com/optimnow/finops-mcp-resources