Your cloud bill says $47,000/month. But is that good or bad? You have 500 customers—is that $94 per customer? Or do 10 enterprise clients account for 80% of costs? When an AI feature suddenly adds $15,000 to the bill, which customers caused it?
Traditional FinOps stops at the monthly bill. Modern SaaS companies need unit economics: cost-per-tenant, cost-per-request, cost-per-AI-token. This guide shows you how to implement granular cost tracking in Kubernetes using resource tagging, Prometheus metrics, and Grafana dashboards that reveal your true margins.
Traditional cloud cost management focuses on aggregate numbers. A typical monthly breakdown, combining AWS Cost Explorer with third-party API invoices, shows:
| Service | Cost |
|---|---|
| EC2 / EKS Compute | $18,500 |
| RDS Database | $8,200 |
| S3 Storage | $3,400 |
| Data Transfer | $6,100 |
| OpenAI API Costs | $14,800 |
| Total | $51,000 |
This tells you nothing about profitability. Critical questions remain unanswered:
Instead of tracking total spend, track these unit economics:
CPT = Total Infrastructure Cost / Active Tenants
If infrastructure costs $50K/month and you have 500 active tenants: CPT = $100
But averages lie. You need per-tenant costs:
| Customer | Plan | Monthly Cost | MRR | Margin |
|---|---|---|---|---|
| Acme Corp | Enterprise | $850 | $5,000 | 83% |
| Beta Inc | Pro | $120 | $299 | 60% |
| Gamma LLC | Starter | $68 | $49 | -39% |
| Delta Co | Free | $22 | $0 | n/a (no revenue) |
Now you can make informed decisions: move Gamma to a higher tier, add usage limits to the free tier, or optimize infrastructure for high-cost tenants.
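The margin column above can be reproduced with a few lines of code. This is a minimal sketch: the `TenantEconomics` shape and function names are assumptions for illustration, and margin is treated as undefined when a tenant has no revenue.

```typescript
// Hypothetical record shape; field names are illustrative assumptions.
interface TenantEconomics {
  name: string;
  plan: string;
  monthlyCost: number; // attributed infrastructure cost, USD
  mrr: number;         // monthly recurring revenue, USD
}

// Gross margin as a fraction of revenue.
// Returns null when MRR is zero, because a percentage margin is undefined there.
function grossMargin(t: TenantEconomics): number | null {
  return t.mrr > 0 ? (t.mrr - t.monthlyCost) / t.mrr : null;
}

// Tenants whose attributed cost exceeds what they pay.
function unprofitableTenants(tenants: TenantEconomics[]): TenantEconomics[] {
  return tenants.filter((t) => t.monthlyCost > t.mrr);
}

// The four example tenants from the table.
const tenants: TenantEconomics[] = [
  { name: "Acme Corp", plan: "Enterprise", monthlyCost: 850, mrr: 5000 },
  { name: "Beta Inc", plan: "Pro", monthlyCost: 120, mrr: 299 },
  { name: "Gamma LLC", plan: "Starter", monthlyCost: 68, mrr: 49 },
  { name: "Delta Co", plan: "Free", monthlyCost: 22, mrr: 0 },
];

for (const t of tenants) {
  const m = grossMargin(t);
  console.log(`${t.name}: ${m === null ? "n/a" : Math.round(m * 100) + "%"}`);
}
```

Running `unprofitableTenants(tenants)` on this data flags Gamma LLC and Delta Co, the two accounts where cost exceeds revenue.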
CPR = Total Infrastructure Cost / Total API Requests
If you handle 50 million requests/month at $50K cost: CPR = $0.001 (0.1¢)
This metric reveals efficiency trends. If CPR increases over time, your costs are growing faster than your traffic, which means the infrastructure isn't scaling efficiently. You might need better caching, database query optimization, or architectural changes.
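To watch the CPR trend rather than a single value, you can compute it per month and compare consecutive months. The sketch below assumes a hypothetical `MonthlySnapshot` record; the sample figures other than the $50K / 50 million example are made up for illustration.

```typescript
// Hypothetical monthly rollup; field names are assumptions.
interface MonthlySnapshot {
  month: string;
  totalCostUsd: number;
  totalRequests: number;
}

// CPR = Total Infrastructure Cost / Total API Requests
function costPerRequest(s: MonthlySnapshot): number {
  if (s.totalRequests <= 0) {
    throw new Error(`no requests recorded for ${s.month}`);
  }
  return s.totalCostUsd / s.totalRequests;
}

// Relative month-over-month change in CPR.
// A positive value means cost grew faster than traffic.
function cprTrend(snapshots: MonthlySnapshot[]): number[] {
  const cprs = snapshots.map(costPerRequest);
  return cprs.slice(1).map((cpr, i) => (cpr - cprs[i]) / cprs[i]);
}

const history: MonthlySnapshot[] = [
  { month: "2025-01", totalCostUsd: 50_000, totalRequests: 50_000_000 },
  { month: "2025-02", totalCostUsd: 60_000, totalRequests: 55_000_000 }, // illustrative
];

console.log(cprTrend(history).map((c) => `${(c * 100).toFixed(1)}%`));
```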
AI features (LLM APIs, embeddings, image generation) can explode costs unpredictably. You need:
Here's how to implement granular cost tracking in Kubernetes:
Kubernetes labels enable cost attribution. Add labels to all pods, namespaces, and persistent volumes:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server
  labels:
    app: api-server
    tenant: acme-corp        # Customer identifier
    environment: production
    cost-center: engineering
    feature: core-api        # Feature attribution
spec:
  containers:
    - name: api
      image: myapp:v1.2.3
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "2000m"
          memory: "4Gi"
```

The tenant label is crucial: it ties infrastructure resources to specific customers. For multi-tenant applications, inject this label dynamically during deployment.
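When labels are injected by deployment tooling, the tenant name has to be turned into a valid Kubernetes label value first (at most 63 characters, alphanumeric at both ends, with `-`, `_`, and `.` allowed in between). The following is a minimal sketch of that step; `toLabelValue` and `withTenantLabels` are hypothetical helper names, not part of any Kubernetes client library.

```typescript
// Kubernetes label value rule: 1-63 chars, alphanumeric at both ends,
// with '-', '_' and '.' allowed in the middle.
const LABEL_VALUE = /^[A-Za-z0-9]([A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?$/;

// Normalize a free-form customer name into a label-safe value.
function toLabelValue(raw: string): string {
  const cleaned = raw
    .toLowerCase()
    .replace(/[^a-z0-9_.-]+/g, "-")          // collapse disallowed runs into '-'
    .replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, "") // trim non-alphanumeric ends
    .slice(0, 63)
    .replace(/[^a-z0-9]+$/, "");             // re-trim after truncation
  if (!LABEL_VALUE.test(cleaned)) {
    throw new Error(`cannot derive a label value from "${raw}"`);
  }
  return cleaned;
}

// Merge cost-attribution labels into an existing label set.
function withTenantLabels(
  labels: Record<string, string>,
  tenantName: string,
  feature: string,
): Record<string, string> {
  return {
    ...labels,
    tenant: toLabelValue(tenantName),
    feature: toLabelValue(feature),
  };
}

console.log(withTenantLabels({ app: "api-server" }, "Acme Corp", "core-api"));
```

Failing loudly on names that cannot be normalized is a deliberate choice here: an unlabeled pod silently ends up in the unattributed cost bucket.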
AWS, GCP, and Azure allow tagging resources for cost reporting. Sync Kubernetes labels to cloud tags:
```bash
# AWS EKS: Use Cost Allocation Tags
# In your node group / EC2 instances, add tags:
aws ec2 create-tags \
  --resources i-1234567890abcdef0 \
  --tags Key=tenant,Value=acme-corp \
         Key=environment,Value=production

# Enable Cost Allocation Tags in AWS Cost Explorer
aws ce list-cost-allocation-tags
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status Status=Active,TagKey=tenant
```

Use OpenCost (open-source, CNCF project) to track Kubernetes resource costs:

```bash
# Install OpenCost with Helm
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --set opencost.prometheus.external.url=http://prometheus.monitoring:9090
```
OpenCost calculates the cost of each pod based on CPU/memory requests and cloud provider pricing. It exposes Prometheus metrics you can query and visualize in Grafana.
Create Grafana dashboards that overlay cost data with business metrics. Example PromQL queries:
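```promql
# Total hourly CPU cost by tenant
# (assumes a tenant label has been propagated onto the allocation metric)
sum by (tenant) (container_cpu_allocation * on (node) group_left node_cpu_hourly_cost)

# Cost-per-request for the API service:
# hourly cost divided by the number of requests in the same hour
sum(container_cpu_allocation * on (node) group_left node_cpu_hourly_cost)
  / sum(increase(http_requests_total[1h]))

# Daily AI token costs
sum(increase(ai_tokens_total[24h]) * 0.00003)  # Assumes $0.03/1K tokens
```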
The key insight: Your cost dashboard should look like your business dashboard. Overlay MRR, active users, and feature usage with infrastructure costs to see margin trends in real-time.
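To join cost data with business metrics outside Grafana, you can read the same queries through the Prometheus HTTP API (`/api/v1/query`) and combine the result with MRR from your billing system. The sketch below only covers building the URL and parsing an instant-vector response; the helper names and the sample payload are assumptions for illustration.

```typescript
// Shape of a Prometheus instant-query response with resultType "vector".
interface PromVectorSample {
  metric: Record<string, string>;
  value: [number, string]; // [unix timestamp, sample value as a string]
}

interface PromQueryResponse {
  status: "success" | "error";
  data?: { resultType: string; result: PromVectorSample[] };
  error?: string;
}

// Build the instant-query URL for a PromQL expression.
function buildQueryUrl(baseUrl: string, promql: string): string {
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/api/v1/query?query=${encodeURIComponent(promql)}`;
}

// Sum sample values per label value (e.g. per tenant).
// Series without the label are grouped under "unattributed".
function costByLabel(resp: PromQueryResponse, label: string): Map<string, number> {
  if (resp.status !== "success" || !resp.data || resp.data.resultType !== "vector") {
    throw new Error(`unexpected Prometheus response: ${resp.error ?? resp.status}`);
  }
  const totals = new Map<string, number>();
  for (const sample of resp.data.result) {
    const key = sample.metric[label] ?? "unattributed";
    totals.set(key, (totals.get(key) ?? 0) + Number(sample.value[1]));
  }
  return totals;
}

// Illustrative payload, as it would come back from fetch(buildQueryUrl(...)).
const sampleResponse: PromQueryResponse = {
  status: "success",
  data: {
    resultType: "vector",
    result: [
      { metric: { tenant: "acme-corp" }, value: [1700000000, "12.5"] },
      { metric: {}, value: [1700000000, "1.5"] },
    ],
  },
};

console.log(costByLabel(sampleResponse, "tenant"));
```

Keeping an explicit "unattributed" bucket makes missing labels visible instead of quietly dropping that spend from the per-tenant view.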
Cloud provider bills don't show the cost of third-party AI APIs that you call directly (OpenAI, Anthropic, etc.). You need to instrument your application:
```typescript
// TypeScript / Node.js example
import OpenAI from 'openai';
import { Counter } from 'prom-client';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const tokenCounter = new Counter({
  name: 'ai_tokens_total',
  help: 'Total AI tokens consumed',
  labelNames: ['tenant', 'model', 'feature', 'direction'] // direction = input/output
});

// A Counter rather than a Histogram, so spend can be summed with increase()
const costCounter = new Counter({
  name: 'ai_cost_dollars_total',
  help: 'AI API cost in dollars',
  labelNames: ['tenant', 'model', 'feature']
});

async function callOpenAI(prompt: string, tenant: string) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }]
  });

  // usage is optional in the SDK types, so fall back to zero
  const inputTokens = response.usage?.prompt_tokens ?? 0;
  const outputTokens = response.usage?.completion_tokens ?? 0;
  // GPT-4 pricing: $0.03/1K input tokens, $0.06/1K output tokens
  const cost = (inputTokens * 0.00003) + (outputTokens * 0.00006);

  // Record token counts and dollar cost per tenant
  tokenCounter.inc({ tenant, model: 'gpt-4', feature: 'chat', direction: 'input' }, inputTokens);
  tokenCounter.inc({ tenant, model: 'gpt-4', feature: 'chat', direction: 'output' }, outputTokens);
  costCounter.inc({ tenant, model: 'gpt-4', feature: 'chat' }, cost);

  return response;
}
```

Now you can query Prometheus for per-tenant AI costs and set budget alerts when customers exceed thresholds.
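The budget check itself can be a small pure function run over per-tenant spend figures. This is a sketch under assumptions: the `BudgetAlert` shape, the 80% warning threshold, and the sample numbers are all illustrative, and in practice the same rule could live in a Prometheus alerting rule instead.

```typescript
// Hypothetical alert record; names are illustrative assumptions.
interface BudgetAlert {
  tenant: string;
  spentUsd: number;
  budgetUsd: number;
  level: "warning" | "exceeded";
}

// Compare month-to-date AI spend per tenant against configured budgets.
// warnAt is the fraction of the budget that triggers an early warning.
function evaluateBudgets(
  spend: Record<string, number>,
  budgets: Record<string, number>,
  warnAt = 0.8,
): BudgetAlert[] {
  const alerts: BudgetAlert[] = [];
  for (const tenant of Object.keys(spend)) {
    const spentUsd = spend[tenant];
    const budgetUsd = budgets[tenant];
    if (budgetUsd === undefined) continue; // no budget configured for this tenant
    if (spentUsd >= budgetUsd) {
      alerts.push({ tenant, spentUsd, budgetUsd, level: "exceeded" });
    } else if (spentUsd >= warnAt * budgetUsd) {
      alerts.push({ tenant, spentUsd, budgetUsd, level: "warning" });
    }
  }
  return alerts;
}

// Illustrative numbers only.
const alerts = evaluateBudgets(
  { "acme-corp": 50, "beta-inc": 85, "gamma-llc": 120, "delta-co": 10 },
  { "acme-corp": 100, "beta-inc": 100, "gamma-llc": 100 },
);
console.log(alerts);
```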
Once you have visibility into unit economics, you can optimize strategically:
If your Starter plan costs $49/month and infrastructure cost is $35/tenant, you have $14 margin. If a user makes 10,000 AI requests, you lose money. Set limits:
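One way to derive such a limit is to work backwards from the margin: divide what is left after base infrastructure cost by the cost of a single AI request. The sketch below assumes a hypothetical average of $0.01 per AI request; substitute your own measured figure.

```typescript
// Largest number of AI requests a plan can absorb before margin reaches zero.
function breakEvenRequests(
  planPriceUsd: number,
  baseCostUsd: number,
  costPerRequestUsd: number,
): number {
  if (costPerRequestUsd <= 0) {
    throw new Error("costPerRequestUsd must be positive");
  }
  const marginUsd = planPriceUsd - baseCostUsd;
  return Math.max(0, Math.floor(marginUsd / costPerRequestUsd));
}

// Simple quota gate to call before serving an AI request.
function isWithinQuota(usedThisMonth: number, monthlyLimit: number): boolean {
  return usedThisMonth < monthlyLimit;
}

// Starter plan from the text: $49 price, $35 infrastructure cost per tenant.
// ASSUMPTION: $0.01 average cost per AI request (illustrative only).
const starterLimit = breakEvenRequests(49, 35, 0.01);
console.log(`Starter plan breaks even at ${starterLimit} AI requests/month`);
console.log(isWithinQuota(10_000, starterLimit)); // a 10,000-request tenant is over the limit
```

In practice you would set the enforced limit below the break-even point so the plan keeps a positive margin.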
LLM responses to similar prompts can be cached with semantic similarity matching:
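A minimal sketch of such a cache is shown below. It assumes prompt embeddings are computed elsewhere (by whichever embedding model you use) and passed in as plain number arrays; the `SemanticCache` class, the 0.95 threshold, and the linear scan are illustrative choices, not a production design.

```typescript
type Embedding = number[];

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("embeddings must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory cache keyed by embedding similarity instead of exact prompt text.
class SemanticCache {
  private entries: { embedding: Embedding; response: string }[] = [];
  private threshold: number;

  constructor(threshold = 0.95) {
    this.threshold = threshold;
  }

  // Return the cached response of the most similar prompt, if it is similar enough.
  lookup(embedding: Embedding): string | undefined {
    let best: string | undefined;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= bestScore) {
        bestScore = score;
        best = entry.response;
      }
    }
    return best;
  }

  store(embedding: Embedding, response: string): void {
    this.entries.push({ embedding, response });
  }
}

// Toy 2-dimensional embeddings for illustration.
const cache = new SemanticCache();
cache.store([1, 0], "cached answer");
console.log(cache.lookup([0.99, 0.05])); // near-duplicate prompt: cache hit
console.log(cache.lookup([0, 1]));       // unrelated prompt: cache miss
```

A linear scan is fine for a few thousand entries; larger caches would typically use a vector index, and the threshold needs tuning so that dissimilar prompts never share an answer.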
Most pods are over-provisioned. Use Vertical Pod Autoscaler (VPA) recommendations:
```bash
# Install VPA (the project ships an install script rather than a single manifest)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Analyze a deployment
kubectl describe vpa my-deployment-vpa

# Typical findings:
#   Requested: 2 CPU, 4Gi RAM
#   Actual usage: 0.3 CPU, 1.2Gi RAM
#   Potential savings: 65%
```
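To estimate what a right-sizing pass is worth before applying it, you can compare current requests with observed usage plus some headroom. This is a rough sketch and not the algorithm VPA itself uses; the 20% headroom and the function names are assumptions.

```typescript
// Recommended request = observed usage plus a safety headroom.
function recommendRequest(observedUsage: number, headroom = 0.2): number {
  return observedUsage * (1 + headroom);
}

// Fraction of the current request that would be freed by right-sizing.
function savingsFraction(requested: number, recommended: number): number {
  if (requested <= 0) return 0;
  return Math.max(0, 1 - recommended / requested);
}

// Figures from the example findings: 2 CPU / 4Gi requested, 0.3 CPU / 1.2Gi used.
const cpuSavings = savingsFraction(2, recommendRequest(0.3));
const memSavings = savingsFraction(4, recommendRequest(1.2));

console.log(`CPU request could shrink by ${(cpuSavings * 100).toFixed(0)}%`);
console.log(`Memory request could shrink by ${(memSavings * 100).toFixed(0)}%`);
```

Actual bill savings depend on how the freed capacity translates into fewer or smaller nodes, so treat these fractions as an upper bound.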
Not all queries need GPT-4. Implement tiered model routing:
| Use Case | Model | Cost/1K Tokens | Savings vs GPT-4 Turbo |
|---|---|---|---|
| Simple classification | GPT-3.5 Turbo | $0.002 | 93% |
| Summarization | Claude Haiku | $0.0008 | 97% |
| Complex reasoning | GPT-4 Turbo | $0.03 | baseline |
| Code generation | GPT-4 | $0.06 | -100% (2x the baseline cost) |
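The routing table above maps directly onto a lookup keyed by use case. The sketch below copies the prices from the table; the `UseCase` names are assumptions, and `claude-haiku` is a placeholder identifier rather than an exact API model name.

```typescript
type UseCase = "classification" | "summarization" | "reasoning" | "codegen";

interface ModelChoice {
  model: string;
  costPer1kTokensUsd: number;
}

// Prices taken from the table above; model identifiers are illustrative.
const ROUTES: Record<UseCase, ModelChoice> = {
  classification: { model: "gpt-3.5-turbo", costPer1kTokensUsd: 0.002 },
  summarization: { model: "claude-haiku", costPer1kTokensUsd: 0.0008 }, // placeholder id
  reasoning: { model: "gpt-4-turbo", costPer1kTokensUsd: 0.03 },
  codegen: { model: "gpt-4", costPer1kTokensUsd: 0.06 },
};

// Pick the model for a request based on its use case.
function routeModel(useCase: UseCase): ModelChoice {
  return ROUTES[useCase];
}

// Estimated cost of a request with the given total token count.
function estimateCostUsd(useCase: UseCase, tokens: number): number {
  return (tokens / 1000) * ROUTES[useCase].costPer1kTokensUsd;
}

// Savings of one route relative to the GPT-4 Turbo baseline.
function savingsVsBaseline(useCase: UseCase): number {
  return 1 - ROUTES[useCase].costPer1kTokensUsd / ROUTES.reasoning.costPer1kTokensUsd;
}

console.log(routeModel("classification").model);
console.log(`${(savingsVsBaseline("classification") * 100).toFixed(0)}% cheaper than baseline`);
```

How a request gets its use case (an explicit parameter, the calling feature, or a cheap classifier) is left open here and is usually the harder design decision.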
A B2B analytics platform serving 800 customers implemented unit economics tracking. Here's what they discovered:
Before (Aggregate Metrics Only):
After (Per-Tenant Unit Economics):
Root cause: Low-tier customers were running unoptimized queries generating massive database load. High-tier customers had dedicated infrastructure and query optimization.
Action taken:
Implementing comprehensive FinOps requires expertise in Kubernetes, Prometheus, cloud billing APIs, and cost allocation strategies. It typically takes engineering teams 2-3 months to build and requires ongoing maintenance.
HostingX's Managed FinOps Service includes:
Our FinOps service typically identifies 25-40% in cost savings within the first month through right-sizing, idle resource cleanup, and commitment discounts. Most clients are ROI-positive within 2 weeks.
Infrastructure teams are often seen as cost centers—necessary overhead that doesn't directly generate revenue. But when you implement unit economics, infrastructure becomes a strategic profit driver.
You can answer questions that define product strategy:
Traditional FinOps stops at monthly bills. Modern SaaS companies need per-tenant, per-request, per-feature visibility. The infrastructure to build this exists—Kubernetes labels, Prometheus metrics, OpenCost, Grafana. What's missing is the expertise and time to implement it. That's where platform engineering partners become invaluable.
HostingX IL provides Platform Engineering and FinOps services for B2B SaaS companies running on Kubernetes. We implement granular cost tracking, optimization strategies, and predictive budgeting so your team can focus on product, not cloud bills. Learn more about our FinOps & Cost Optimization Services.
Copyright © 2025 HostingX IL. All Rights Reserved.