Traditional monitoring isn't enough for AI applications. Learn how to build production-grade AI observability in ASP.NET Core using OpenTelemetry, Application Insights, Serilog, distributed tracing, token metrics, latency breakdowns, cost monitoring, dashboards, and intelligent alerting.
Traditional APIs expose metrics such as request rate, error rate, response time, and CPU and memory usage.
Those metrics are still important for AI applications.
However, they are no longer enough.
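Once AI features reach production, teams start asking very different questions: Why did the model generate this answer? Which stage of the request is slow? How many tokens, and how much money, is each feature consuming? Which provider is performing best?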
Traditional monitoring cannot answer these questions.
Without observability, debugging AI systems becomes expensive guesswork.
In this article, we'll build a complete observability layer for AI applications in ASP.NET Core using OpenTelemetry, Application Insights, structured logging, distributed tracing, token monitoring, cost tracking, and dashboards.
The goal is not simply to know when something fails.
The goal is to understand why.
Our architecture looks like this:
```text
Client
│
ASP.NET Core API
│
AI Service Layer
│
OpenTelemetry Activity
│
Prompt & Token Metrics
│
Azure App Insights
│
Grafana / Seq / Kibana
│
Dashboards & Alerts
```
Every AI request becomes observable from end to end.
Traditional monitoring answers questions like:
Did the API fail?
AI monitoring answers questions like:
Why did the model generate this answer?
Those are very different problems.
Traditional applications typically execute deterministic logic.
AI systems execute probabilistic logic.
Two identical requests can produce different outputs.
This introduces entirely new operational challenges.
One of the most important concepts in observability is correlation.
Every request should receive a unique identifier.
Example:
```text
CorrelationId
↓
API
↓
RAG
↓
Embedding
↓
LLM
↓
Database
↓
Response
```
Instead of investigating five separate systems, you can trace the entire request lifecycle using a single identifier.
Imagine a customer reports: "The AI response took 12 seconds."
Without correlation IDs:
```text
Search Logs
↓
Guess
↓
Search More Logs
↓
Guess Again
```
With correlation IDs:
```text
Find Request
↓
View Complete Trace
↓
Identify Bottleneck
```
Debugging becomes dramatically easier.
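
To make the idea concrete, here is a minimal sketch in Python, chosen for brevity. The article's stack is ASP.NET Core, where a middleware plus `ILogger.BeginScope`, or the trace ID on `Activity.Current`, usually plays this role. The sketch assumes a single process: a `ContextVar` holds the current request's ID, a logging filter stamps it onto every record, and an in-memory handler stands in for the log store. All names are illustrative.

```python
from __future__ import annotations

import contextvars
import logging
import uuid

# Correlation ID of the request currently being handled. A ContextVar keeps
# the value isolated per thread and per asyncio task.
correlation_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "correlation_id", default="-"
)


class CorrelationIdFilter(logging.Filter):
    """Stamps the current correlation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


class InMemoryHandler(logging.Handler):
    """Stands in for a log store (Seq, Kibana, ...) so records can be queried."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("ai")
logger.setLevel(logging.INFO)
store = InMemoryHandler()
store.addFilter(CorrelationIdFilter())
logger.addHandler(store)


def handle_request(incoming_id: str | None = None) -> str:
    """Simulates one AI request flowing through several stages."""
    # Reuse the caller's ID if one was supplied, otherwise mint a new one.
    cid = incoming_id or uuid.uuid4().hex
    token = correlation_id.set(cid)
    try:
        for stage in ("API", "RAG", "Embedding", "LLM", "Database"):
            logger.info("stage completed: %s", stage)
        return cid
    finally:
        correlation_id.reset(token)


def find_request(cid: str) -> list[str]:
    """'Find request, view complete trace': every message for one ID."""
    return [
        r.getMessage()
        for r in store.records
        if getattr(r, "correlation_id", None) == cid
    ]
```

`find_request(handle_request("abc123"))` returns the five stage messages for that request, in order. Across services, the ID would also have to travel with each outgoing call (for example in an HTTP header), which this single-process sketch does not show.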
Many applications still generate logs like:
```text
Request Started...
Request Finished...
```
These logs are difficult to query.
Instead, use structured logging.
Example:
```json
{
  "Provider": "OpenAI",
  "Model": "gpt-4o",
  "Latency": 1450,
  "PromptTokens": 832,
  "CompletionTokens": 251,
  "Cost": 0.013,
  "RequestId": "abc123"
}
```
Now logs become searchable.
Questions become easy to answer, for example: show all GPT-4o requests with latency above 3,000 ms.
Structured logs turn debugging into data analysis.
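
A minimal Python sketch of the idea (illustrative only, not Serilog's API): a formatter turns each record and its attached fields into one JSON object, and the query from the text becomes a plain filter. The first request reuses the example values above; the second request's values are hypothetical. In ASP.NET Core, Serilog message templates such as `{Model}` and `{Latency}` capture the same fields as named properties.

```python
from __future__ import annotations

import json
import logging


class JsonFormatter(logging.Formatter):
    """Renders each record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload: dict[str, object] = {
            "Level": record.levelname,
            "Message": record.getMessage(),
        }
        # Structured fields passed as extra={"props": {...}} become top-level keys.
        payload.update(getattr(record, "props", {}))
        return json.dumps(payload)


class ListHandler(logging.Handler):
    """Collects formatted lines in memory instead of shipping them anywhere."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


logger = logging.getLogger("ai.requests")
logger.setLevel(logging.INFO)
logger.propagate = False
sink = ListHandler()
sink.setFormatter(JsonFormatter())
logger.addHandler(sink)


def log_ai_request(**props: object) -> None:
    logger.info("AI request completed", extra={"props": props})


# Values from the example above, plus a second, hypothetical slow request.
log_ai_request(Provider="OpenAI", Model="gpt-4o", Latency=1450,
               PromptTokens=832, CompletionTokens=251, Cost=0.013,
               RequestId="abc123")
log_ai_request(Provider="OpenAI", Model="gpt-4o", Latency=3400,
               PromptTokens=1900, CompletionTokens=610, Cost=0.031,
               RequestId="def456")

# "Show all GPT-4o requests with latency > 3000 ms" is now a simple filter.
records = [json.loads(line) for line in sink.lines]
slow = [r["RequestId"] for r in records
        if r["Model"] == "gpt-4o" and r["Latency"] > 3000]
```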
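For every AI request, consider tracking the provider, model, latency, prompt tokens, completion tokens, estimated cost, and a request or correlation ID.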
These attributes become essential as systems grow.
Prompt logging deserves special attention.
Many teams either log everything or log nothing.
Both approaches are problematic.
Prompts may contain personal data, such as customer email addresses, or other sensitive information.
Logging everything can create compliance issues.
Store only a portion of the prompt, for example the first 250 characters.
Replace sensitive values, for example by masking a customer email as `********`.
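Store a hash instead of raw content. Identical prompts produce identical hashes, so you can still group, count, and compare prompts without keeping their text.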
This provides observability without exposing sensitive information.
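
The three techniques above (truncation, redaction, and hashing) can be combined in one function. This Python sketch is illustrative: the email regex is an assumption and catches only one kind of sensitive value, and the field names are hypothetical.

```python
from __future__ import annotations

import hashlib
import re

# Deliberately simple email pattern. Real PII detection needs much broader
# coverage (names, phone numbers, account IDs, ...).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MAX_PREVIEW_CHARS = 250  # "first 250 characters" policy from the text


def sanitize_prompt(prompt: str) -> dict[str, object]:
    """Returns prompt fields that are safer to put in a log entry."""
    # Redact before truncating so a cut-off address cannot slip past the regex.
    redacted = EMAIL.sub("********", prompt)
    return {
        "PromptPreview": redacted[:MAX_PREVIEW_CHARS],
        "PromptLength": len(prompt),
        # Identical prompts share a hash; the raw text never reaches the log.
        "PromptHash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }
```

For example, `sanitize_prompt("Contact jane.doe@example.com about order 42")["PromptPreview"]` returns `"Contact ******** about order 42"`. One caveat: a plain SHA-256 of a short, predictable prompt can be reversed by hashing likely candidates, so a keyed hash (Python's `hmac` module with a secret key) is a safer choice when that matters.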
Most AI costs are directly tied to token usage.
Track prompt, completion, and total tokens for every request, and aggregate them by model and by day.
Example dashboard:
| Metric | Value |
|---|---|
| Daily Tokens | 1.8M |
| Daily Cost | $42 |
| Avg Tokens/Request | 1,120 |
| GPT-4o Requests | 5,800 |
Without token visibility, cost optimization becomes impossible.
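
Once per-token prices are known, cost follows directly from the token counts. This is a minimal Python sketch that assumes hypothetical model names and prices; real prices vary by provider and change over time.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens: (input, output). Real prices
# differ by provider, model, and date, so load them from configuration.
PRICES_PER_MILLION: dict[str, tuple[float, float]] = {
    "model-a": (2.50, 10.00),
    "model-b": (0.15, 0.60),
}


@dataclass(frozen=True)
class TokenUsage:
    model: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def estimate_cost(usage: TokenUsage) -> float:
    """Input and output tokens are usually priced differently."""
    input_price, output_price = PRICES_PER_MILLION[usage.model]
    return (usage.prompt_tokens * input_price
            + usage.completion_tokens * output_price) / 1_000_000


def daily_summary(usages: list[TokenUsage]) -> dict[str, float]:
    """The figures behind 'Daily Tokens', 'Daily Cost', and 'Avg Tokens/Request'."""
    total = sum(u.total_tokens for u in usages)
    return {
        "DailyTokens": total,
        "DailyCost": sum(estimate_cost(u) for u in usages),
        "AvgTokensPerRequest": total / len(usages) if usages else 0.0,
    }
```

Providers usually return token counts in the API response. Recording those counts is more reliable than re-counting tokens yourself.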
One of the biggest observability mistakes is measuring only total latency.
For example, a total latency of 2,073 ms is useful, but incomplete.
Instead, measure each stage:
```text
Embedding
320ms
↓
Vector Search
41ms
↓
LLM
1700ms
↓
Formatting
12ms
```
Now the bottleneck becomes obvious.
The model call accounts for most of the delay: 1,700 ms of the 2,073 ms total, or about 82%.
Without breakdowns, teams often optimize the wrong component.
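
Per-stage timing needs little more than a timer wrapped around each stage. The Python sketch below assumes the stages run sequentially, and `time.sleep` stands in for the real calls. In .NET, the same measurements usually come from the durations of child `Activity` spans.

```python
from __future__ import annotations

import time
from collections.abc import Iterator
from contextlib import contextmanager


class LatencyBreakdown:
    """Times each named stage of one request, in milliseconds."""

    def __init__(self) -> None:
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the stage raises, so failed stages keep timings.
            self.stages[name] = (time.perf_counter() - start) * 1000

    def total_ms(self) -> float:
        # Sum of measured stages; time spent between stages is not included.
        return sum(self.stages.values())

    def bottleneck(self) -> str:
        return max(self.stages, key=lambda s: self.stages[s])


breakdown = LatencyBreakdown()
with breakdown.stage("Embedding"):
    time.sleep(0.01)  # stand-ins for the real calls
with breakdown.stage("Vector Search"):
    time.sleep(0.001)
with breakdown.stage("LLM"):
    time.sleep(0.05)
with breakdown.stage("Formatting"):
    pass
```

Logging `breakdown.stages` next to the total makes the bottleneck visible per request instead of only in aggregate.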
Modern AI applications span multiple systems.
A typical request might look like:
```text
HTTP Request
↓
AI Gateway
↓
RAG Retrieval
↓
OpenAI
↓
Redis
↓
PostgreSQL
↓
Response
```
Distributed tracing allows us to visualize the entire execution path.
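Create activities for each stage, such as the AI gateway, RAG retrieval, embedding generation, vector search, the model call, and cache and database access.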
Each activity contributes to a complete trace.
This makes complex AI workflows understandable.
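
The core mechanism of a trace is simple: each span records its parent and shares the root's trace ID. This hand-rolled Python sketch illustrates only that propagation and is not the OpenTelemetry API. In .NET, `ActivitySource.StartActivity` creates the spans and the OpenTelemetry SDK exports them. The nesting below is an assumed shape for the request flow above.

```python
from __future__ import annotations

import contextvars
import time
import uuid
from collections.abc import Iterator
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: str | None
    start: float = field(default_factory=time.perf_counter)
    duration_ms: float = 0.0


# The span that is "current" for this thread/task, similar to Activity.Current.
_current: contextvars.ContextVar[Span | None] = contextvars.ContextVar(
    "current_span", default=None
)
finished: list[Span] = []  # stands in for an exporter


@contextmanager
def start_span(name: str) -> Iterator[Span]:
    parent = _current.get()
    span = Span(
        name=name,
        # A root span opens a new trace; children inherit its trace ID.
        # IDs are sized like W3C trace-context IDs (32 and 16 hex chars).
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
    )
    token = _current.set(span)
    try:
        yield span
    finally:
        span.duration_ms = (time.perf_counter() - span.start) * 1000
        _current.reset(token)
        finished.append(span)


with start_span("HTTP Request"):
    with start_span("AI Gateway"):
        with start_span("RAG Retrieval"):
            pass
        with start_span("OpenAI"):
            pass
        with start_span("Redis"):
            pass
    with start_span("PostgreSQL"):
        pass
```

A trace viewer rebuilds the tree from `parent_id` links and draws each span's duration as a bar, which is what makes the slow step visible.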
Traditional health checks aren't sufficient.
AI systems require additional signals.
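Monitor signals such as whether the model providers, the embedding service, and local runtimes like Ollama are reachable, and whether model latency and error rates stay within limits.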
Health monitoring should answer:
Can the AI system actually serve requests?
Not merely:
Is the API running?
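
One way to answer "can the AI system actually serve requests?" is to roll several dependency probes up into a single status. The Python sketch below borrows the Healthy / Degraded / Unhealthy levels that ASP.NET Core's `HealthCheckResult` uses. The probes, the set of required dependencies, and the 5-second slow threshold are all assumptions; a real probe might send a tiny test completion or embedding request.

```python
from __future__ import annotations

import time
from collections.abc import Callable
from typing import NamedTuple

# A probe returns True when a dependency can actually do work, e.g. by sending
# a tiny test completion (the probes themselves are hypothetical).
Probe = Callable[[], bool]


class ProbeResult(NamedTuple):
    ok: bool
    latency_ms: float


def check_health(
    probes: dict[str, Probe], required: set[str], slow_ms: float = 5000.0
) -> tuple[str, dict[str, ProbeResult]]:
    """Runs every probe and rolls the results up into one status.

    Every name in `required` must also be a key of `probes`.
    """
    report: dict[str, ProbeResult] = {}
    for name, probe in probes.items():
        start = time.perf_counter()
        try:
            ok = bool(probe())
        except Exception:  # an unreachable dependency counts as a failed probe
            ok = False
        report[name] = ProbeResult(ok, (time.perf_counter() - start) * 1000)

    if any(not report[name].ok for name in required):
        status = "Unhealthy"  # a required dependency is down: cannot serve AI requests
    elif any(not r.ok or r.latency_ms > slow_ms for r in report.values()):
        status = "Degraded"  # serving, but a dependency is slow or an optional one is down
    else:
        status = "Healthy"
    return status, report
```

Separating required from optional dependencies is the key design choice: a local Ollama fallback being down should degrade the system, not take it out of a load balancer's rotation.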
Many teams discover their AI costs only after receiving a cloud bill.
By then it is too late.
Cost should be monitored continuously.
Track:

- Today's Cost
- Cost Per User
- Cost Per Feature
- Cost Per Provider
- Cost Per Model

Example:
| Feature | Daily Cost |
|---|---|
| Chat | $28 |
| RAG | $9 |
| Embeddings | $4 |
| Summaries | $2 |
Now optimization decisions become data-driven.
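
Cost per feature, user, provider, and model is the same aggregation run over different dimensions, provided each cost record carries those tags. Here is a minimal Python sketch with a hypothetical record shape and hypothetical sample data:

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRecord:
    """One request's cost, tagged with every dimension we want to slice by."""
    feature: str
    user: str
    provider: str
    model: str
    cost_usd: float


def cost_by(records: Iterable[CostRecord], dimension: str) -> dict[str, float]:
    """Totals cost per value of `dimension`, largest spender first.

    `dimension` must be a CostRecord field name ("feature", "user", ...).
    """
    totals: defaultdict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, dimension)] += record.cost_usd
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical sample data.
records = [
    CostRecord("Chat", "user-1", "provider-x", "model-a", 0.020),
    CostRecord("Chat", "user-2", "provider-x", "model-a", 0.015),
    CostRecord("RAG", "user-1", "provider-x", "model-a", 0.010),
    CostRecord("Embeddings", "user-1", "provider-y", "model-e", 0.001),
]
```

`cost_by(records, "feature")` produces the kind of table shown above. The important design choice is tagging cost when the request happens; attributing spend after the fact from a cloud bill is much harder.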
If your application supports multiple providers, compare them.
Track latency, error rate, token usage, and cost per provider.
Example:
| Provider | Avg Latency |
|---|---|
| OpenAI | 1.7s |
| Claude | 2.1s |
| Gemini | 1.2s |
| Ollama | 0.8s |
These metrics help determine routing strategies.
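
Here is a minimal Python sketch of the comparison plus a simple routing rule, assuming each call is recorded as (provider, latency in seconds, success). The 5% error budget and the preference for the lowest p95 latency are illustrative choices. Averages hide slow outliers, and a fast provider that fails often is not actually fast.

```python
from __future__ import annotations

import math
from collections.abc import Iterable
from statistics import mean


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def compare_providers(
    calls: Iterable[tuple[str, float, bool]],
) -> dict[str, dict[str, float]]:
    """calls: (provider, latency in seconds, succeeded) for each request."""
    grouped: dict[str, list[tuple[float, bool]]] = {}
    for provider, latency_s, ok in calls:
        grouped.setdefault(provider, []).append((latency_s, ok))
    stats: dict[str, dict[str, float]] = {}
    for provider, rows in grouped.items():
        latencies = [lat for lat, _ in rows]
        stats[provider] = {
            "avg_latency_s": mean(latencies),
            "p95_latency_s": p95(latencies),
            "error_rate": sum(1 for _, ok in rows if not ok) / len(rows),
        }
    return stats


def pick_provider(
    stats: dict[str, dict[str, float]], max_error_rate: float = 0.05
) -> str:
    """Fastest p95 among providers within the error budget."""
    eligible = {p: s for p, s in stats.items() if s["error_rate"] <= max_error_rate}
    pool = eligible or stats  # if every provider is failing, still pick one
    return min(pool, key=lambda p: pool[p]["p95_latency_s"])
```

Real routing would usually weigh cost and answer quality as well; latency and error rate are simply the easiest signals to measure.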
Observability without alerts is incomplete.
Teams need proactive notifications.
Examples:

- GPT Latency > 5 Seconds
- Daily Token Usage Doubles
- Error Rate > 10%
- Embedding Service Unavailable
- Ollama Unreachable

Good alerts identify issues before customers do.
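
The example alerts can be expressed as rules over a snapshot of current metrics. This Python sketch assumes that "latency" means the p95 of model calls and that the token rule uses yesterday's total as its baseline. In practice these rules usually live in the monitoring platform (for example, Azure Monitor alert rules or Grafana alerting) rather than in application code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Current values for the signals behind each alert (hypothetical shape)."""
    model_p95_latency_s: float
    error_rate: float  # 0.0 to 1.0
    tokens_today: int
    tokens_yesterday: int
    embedding_service_up: bool
    ollama_up: bool


def evaluate_alerts(s: Snapshot) -> list[str]:
    """Returns the alerts that should fire for this snapshot."""
    alerts: list[str] = []
    if s.model_p95_latency_s > 5:
        alerts.append("Model latency > 5 seconds")
    # Compare against a baseline; skip when there is no history yet.
    if s.tokens_yesterday > 0 and s.tokens_today >= 2 * s.tokens_yesterday:
        alerts.append("Daily token usage doubled")
    if s.error_rate > 0.10:
        alerts.append("Error rate > 10%")
    if not s.embedding_service_up:
        alerts.append("Embedding service unavailable")
    if not s.ollama_up:
        alerts.append("Ollama unreachable")
    return alerts
```

Comparing a partial day against a full previous day understates growth early in the day, so a production rule would compare equal time windows.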
Useful dashboards typically include:

- Requests per minute
- Latency and success rates per provider
- Daily and monthly token consumption
- Feature-level spending
- End-to-end request traces
- Retries and exceptions

These views provide operational confidence.
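These issues appear frequently in production environments:

- Logging full prompts creates privacy and compliance risks.
- Not tracking token usage often leads to unpleasant surprises on the cloud bill.
- Missing correlation IDs make debugging significantly harder.
- Without per-provider metrics, provider-specific issues become invisible.
- Without distributed tracing, complex workflows become difficult to diagnose.
- Without error and retry tracking, failures appear random.
- Measuring only total latency leads teams to optimize the wrong components.
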
The reference implementation includes:
The goal is to provide complete visibility into AI workloads.
Useful visuals for this setup include:

- A distributed trace showing the complete request lifecycle
- Request telemetry and failures
- Prompt and completion token trends
- Latency breakdown by component
- Provider latency and success rate metrics
- Structured log entries with provider, tokens, latency, and cost information

AI applications introduce challenges that traditional monitoring was never designed to solve.
Understanding latency, token usage, prompt behavior, provider reliability, and operational costs requires a dedicated observability strategy.
By combining OpenTelemetry, structured logging, distributed tracing, token metrics, cost monitoring, dashboards, and intelligent alerting, we gain visibility into every stage of the AI lifecycle.
This visibility transforms AI operations from guesswork into engineering.
Observability tells us what happened, but modern AI systems often perform multiple coordinated tasks instead of a single request.
In the next article, we'll explore how to orchestrate specialized AI agents using Semantic Kernel and build workflows where multiple agents collaborate to solve complex problems.