Building AI Observability in ASP.NET Core with OpenTelemetry, Metrics, and Cost Tracking

Traditional monitoring isn't enough for AI applications. Learn how to build production-grade AI observability in ASP.NET Core using OpenTelemetry, Application Insights, Serilog, distributed tracing, token metrics, latency breakdowns, cost monitoring, dashboards, and intelligent alerting.

Traditional APIs expose metrics such as:

Response time
CPU usage
Memory consumption
Exceptions

Those metrics are still important for AI applications.

However, they are no longer enough.

Once AI features reach production, teams start asking very different questions:

Which model generated this answer?
Why did latency suddenly increase?
Which prompts consume the most tokens?
Why did OpenAI fail while Claude succeeded?
How much does each endpoint cost?
Which prompts consistently produce poor results?

Traditional monitoring cannot answer these questions.

Without observability, debugging AI systems becomes expensive guesswork.

In this article, we'll build a complete observability layer for AI applications in ASP.NET Core using OpenTelemetry, Application Insights, structured logging, distributed tracing, token monitoring, cost tracking, and dashboards.

The goal is not simply to know when something fails.

The goal is to understand why.

What We'll Build

Our architecture looks like this:

               Client

                  │

        ASP.NET Core API

                  │

          AI Service Layer

                  │

      OpenTelemetry Activity

                  │

       Prompt & Token Metrics

                  │

      Azure App Insights

                  │

       Grafana / Seq / Kibana

                  │

       Dashboards & Alerts

Every AI request becomes observable from end to end.

Why AI Needs Different Monitoring

Traditional monitoring answers questions like:

Did the API fail?

AI monitoring answers questions like:

Why did the model generate this answer?

Those are very different problems.

Traditional applications typically execute deterministic logic.

AI systems execute probabilistic logic.

Two identical requests can produce different outputs.

This introduces entirely new operational challenges.

Correlation IDs

One of the most important concepts in observability is correlation.

Every request should receive a unique identifier.

Example:

CorrelationId

↓

API

↓

RAG

↓

Embedding

↓

LLM

↓

Database

↓

Response

Instead of investigating five separate systems, you can trace the entire request lifecycle using a single identifier.

Why Correlation IDs Matter

Imagine a customer reports:

The AI response took 12 seconds.

Without correlation IDs:

Search Logs

↓

Guess

↓

Search More Logs

↓

Guess Again

With correlation IDs:

Find Request

↓

View Complete Trace

↓

Identify Bottleneck

Debugging becomes dramatically easier.
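
One way this could look in ASP.NET Core is a small piece of middleware. This is a minimal sketch, not a definitive implementation: the class name CorrelationIdMiddleware and the X-Correlation-ID header are assumptions chosen for illustration. It reuses an incoming ID when the caller sends one, creates one otherwise, echoes it on the response, and adds it to the logging scope so every log entry for the request carries it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

// Hypothetical middleware: attaches one correlation ID to each request.
public sealed class CorrelationIdMiddleware
{
    // Assumed header name; use whatever your clients and gateways agree on.
    public const string HeaderName = "X-Correlation-ID";

    private readonly RequestDelegate _next;
    private readonly ILogger<CorrelationIdMiddleware> _logger;

    public CorrelationIdMiddleware(RequestDelegate next, ILogger<CorrelationIdMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var correlationId = ResolveCorrelationId(context.Request.Headers[HeaderName].ToString());

        // Make the ID available to downstream code and to the caller.
        context.Items[HeaderName] = correlationId;
        context.Response.Headers[HeaderName] = correlationId;

        // Every log entry written inside this scope carries the CorrelationId property.
        using (_logger.BeginScope(new Dictionary<string, object> { ["CorrelationId"] = correlationId }))
        {
            await _next(context);
        }
    }

    // Reuse the caller's ID when present, otherwise generate a new 32-character one.
    public static string ResolveCorrelationId(string? incoming) =>
        string.IsNullOrWhiteSpace(incoming) ? Guid.NewGuid().ToString("N") : incoming.Trim();
}
```

The middleware would be registered early in the pipeline with app.UseMiddleware<CorrelationIdMiddleware>(). If distributed tracing is already enabled, the trace ID of the current Activity can play the same role.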

Structured Logging

Many applications still generate logs like:

Request Started...

Request Finished...

These logs are difficult to query.

Instead, use structured logging.

Example (latency in milliseconds, cost in US dollars):

{
  "Provider":"OpenAI",
  "Model":"gpt-4o",
  "Latency":1450,
  "PromptTokens":832,
  "CompletionTokens":251,
  "Cost":0.013,
  "RequestId":"abc123"
}

Now logs become searchable.

Questions become easy to answer:

Show all GPT-4o requests

↓

Latency > 3000ms

Structured logs turn debugging into data analysis.
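
A minimal sketch of how such an entry could be written with ILogger message templates. The AiCallLog record and AiCallLogger class are hypothetical names. The point is that each named placeholder becomes a separate, queryable property in structured sinks such as Serilog, Seq, or Application Insights.

```csharp
using Microsoft.Extensions.Logging;

// Hypothetical shape for one AI call; field names mirror the JSON example above.
public sealed record AiCallLog(
    string Provider,
    string Model,
    long LatencyMs,
    int PromptTokens,
    int CompletionTokens,
    decimal CostUsd,
    string RequestId)
{
    public int TotalTokens => PromptTokens + CompletionTokens;
}

public static class AiCallLogger
{
    public static void LogAiCall(ILogger logger, AiCallLog call)
    {
        // Placeholders are matched to arguments by position; their names
        // become the property names stored by the logging sink.
        logger.LogInformation(
            "AI call completed: {Provider} {Model} in {LatencyMs} ms, " +
            "{PromptTokens} prompt + {CompletionTokens} completion tokens, " +
            "cost {CostUsd} USD, request {RequestId}",
            call.Provider, call.Model, call.LatencyMs,
            call.PromptTokens, call.CompletionTokens, call.CostUsd, call.RequestId);
    }
}
```

Avoid string interpolation in the message itself. An interpolated string is flattened into plain text before the logger sees it, and the individual properties are lost.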

What Should Be Logged?

For every AI request, consider tracking:

Provider
Model
Request ID
Latency
Prompt Tokens
Completion Tokens
Total Tokens
Cost
Outcome (Success or Failure)
Retry Count

These attributes become essential as systems grow.

Prompt Logging

Prompt logging deserves special attention.

Many teams either:

Log Everything

Log Nothing

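Both approaches are problematic: logging everything exposes sensitive data, while logging nothing leaves no evidence when a response needs to be investigated.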

The Risks

Prompts may contain:

Personal information
Customer data
Internal business information
Financial information

Logging everything can create compliance issues.

Safe Logging Techniques

Truncation

Store only a portion of the prompt.

Example:

First 250 Characters

Masking

Replace sensitive values.

Example:

Customer Email

↓

********

Hashing

Store a hash instead of raw content.

Useful for:

Deduplication
Trend analysis
Privacy protection

This provides observability without exposing sensitive information.
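
The three techniques can be sketched with the .NET base class library alone. PromptSanitizer is a hypothetical helper, and the email pattern is deliberately simple. Real PII detection needs far more than one regular expression.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper that prepares a prompt for logging.
public static class PromptSanitizer
{
    // Simplified email pattern, for illustration only.
    private static readonly Regex EmailPattern =
        new(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", RegexOptions.Compiled);

    // Truncation: keep only the first maxLength characters.
    public static string Truncate(string prompt, int maxLength = 250) =>
        prompt.Length <= maxLength ? prompt : prompt[..maxLength];

    // Masking: replace every email address with a fixed placeholder.
    public static string MaskEmails(string prompt) =>
        EmailPattern.Replace(prompt, "********");

    // Hashing: SHA-256 of the UTF-8 bytes, as an uppercase hex string.
    public static string Hash(string prompt) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(prompt)));

    // Mask first, then truncate, so a cut-off email address cannot slip through.
    public static string ForLogging(string prompt, int maxLength = 250) =>
        Truncate(MaskEmails(prompt), maxLength);
}
```

Identical prompts produce identical hashes, which is what makes deduplication and trend analysis possible. Keep in mind that a plain hash of a short or predictable prompt can be guessed by hashing candidates, so a keyed hash (HMAC) may be the safer choice for sensitive workloads.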

Token Usage Dashboard

Hosted model providers typically bill per token, so most AI costs are directly tied to token usage.

Track:

Prompt Tokens
Completion Tokens
Cached Tokens
Daily Cost
Monthly Cost
Tokens Per User
Tokens Per Endpoint

Example dashboard:

Metric                Value
Daily Tokens          1.8M
Daily Cost            $42
Avg Tokens/Request    1,120
GPT-4o Requests       5,800

Without token visibility, cost optimization becomes impossible.
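
A sketch of how token counts could be published as metrics using System.Diagnostics.Metrics, which OpenTelemetry for .NET can collect. The class name AiTokenMetrics, the instrument names, and the tag names are all assumptions made for this example.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical wrapper around two token counters.
public sealed class AiTokenMetrics
{
    private readonly Counter<long> _promptTokens;
    private readonly Counter<long> _completionTokens;

    public AiTokenMetrics(Meter meter)
    {
        // Instrument names are illustrative, not an established convention.
        _promptTokens = meter.CreateCounter<long>(
            "ai.tokens.prompt", unit: "{token}", description: "Prompt tokens sent to the model");
        _completionTokens = meter.CreateCounter<long>(
            "ai.tokens.completion", unit: "{token}", description: "Completion tokens returned by the model");
    }

    public void Record(string provider, string model, string endpoint,
                       long promptTokens, long completionTokens)
    {
        // Tags let a dashboard slice totals per provider, model, and endpoint.
        var tags = new KeyValuePair<string, object?>[]
        {
            new("provider", provider),
            new("model", model),
            new("endpoint", endpoint),
        };

        _promptTokens.Add(promptTokens, tags);
        _completionTokens.Add(completionTokens, tags);
    }
}
```

The Meter's name has to be registered with the OpenTelemetry metrics pipeline (AddMeter) before anything is exported. Per-user breakdowns are usually better served from structured logs than from a user tag, because a tag with one value per user multiplies the number of metric series.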

Latency Breakdown

One of the biggest observability mistakes is measuring only total latency.

Example:

Total Latency

2073ms

Useful.

But incomplete.

Instead, measure each stage.

Embedding

320ms

↓

Vector Search

41ms

↓

LLM

1700ms

↓

Formatting

12ms

Now the bottleneck becomes obvious.

The model call is responsible for most of the delay: 1700ms of the 2073ms total, or roughly 82%.

Without breakdowns, teams often optimize the wrong component.
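
A minimal sketch of per-stage timing with Stopwatch. StageTimer is a hypothetical helper. It records how long each stage takes and reports which stage dominates the request.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: collects one duration per pipeline stage.
public sealed class StageTimer
{
    private readonly List<(string Stage, double Ms)> _stages = new();

    public IReadOnlyList<(string Stage, double Ms)> Stages => _stages;

    public double TotalMs => _stages.Sum(s => s.Ms);

    // Times an async stage; the duration is recorded even if the stage throws.
    public async Task<T> MeasureAsync<T>(string stage, Func<Task<T>> work)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return await work();
        }
        finally
        {
            _stages.Add((stage, stopwatch.Elapsed.TotalMilliseconds));
        }
    }

    // Records a duration that was measured elsewhere.
    public void Add(string stage, double ms) => _stages.Add((stage, ms));

    // Returns the slowest stage and its share of the total (0.0 to 1.0).
    // Throws InvalidOperationException if no stage has been recorded.
    public (string Stage, double Share) Slowest()
    {
        var slowest = _stages.MaxBy(s => s.Ms);
        return (slowest.Stage, slowest.Ms / TotalMs);
    }
}
```

Feeding in the numbers from the example above (320, 41, 1700, and 12 milliseconds) gives a total of 2073ms, with the LLM stage as the slowest at about 82% of the request.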

Distributed Tracing with OpenTelemetry

Modern AI applications span multiple systems.

A typical request might look like:

HTTP Request

↓

AI Gateway

↓

RAG Retrieval

↓

OpenAI

↓

Redis

↓

PostgreSQL

↓

Response

Distributed tracing allows us to visualize the entire execution path.

OpenTelemetry Activities

Create activities for:

Chat Requests
Embedding Generation
Vector Search
Prompt Construction
LLM Completion
Response Processing

Each activity contributes to a complete trace.

This makes complex AI workflows understandable.
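
In .NET, an OpenTelemetry span is represented by System.Diagnostics.Activity, and activities are created from an ActivitySource. The sketch below wraps each stage in its own activity. The source name MyApp.AI, the activity names, and the ai.* tag names are assumptions for illustration, and the actual retrieval and model calls are replaced by placeholders.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class AiTelemetry
{
    // Hypothetical source name; it must match the name registered with the tracing pipeline.
    public const string SourceName = "MyApp.AI";

    public static readonly ActivitySource Source = new(SourceName);

    // Runs one stage inside its own activity and marks the activity as failed on exceptions.
    public static async Task<T> TraceAsync<T>(string name, Func<Activity?, Task<T>> work)
    {
        // StartActivity returns null when no listener is sampling this source.
        using var activity = Source.StartActivity(name);
        try
        {
            return await work(activity);
        }
        catch (Exception ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            throw;
        }
    }
}

public sealed class ChatPipeline
{
    public Task<string> AnswerAsync(string question) =>
        AiTelemetry.TraceAsync("chat.request", async _ =>
        {
            // Child activities pick up "chat.request" as their parent automatically.
            var context = await AiTelemetry.TraceAsync("vector.search", async span =>
            {
                await Task.Yield(); // placeholder for the real retrieval call
                span?.SetTag("ai.search.results", 3);
                return "retrieved context";
            });

            return await AiTelemetry.TraceAsync("llm.completion", async span =>
            {
                span?.SetTag("ai.provider", "OpenAI");
                span?.SetTag("ai.model", "gpt-4o");
                await Task.Yield(); // placeholder for the real model call
                return $"answer to '{question}' using {context}";
            });
        });
}
```

For the activities to be exported, the source name has to be added to the OpenTelemetry tracing setup (AddSource). OpenTelemetry also publishes semantic conventions for generative AI attributes. Check the current specification before settling on tag names.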

Health Monitoring

Traditional health checks aren't sufficient.

AI systems require additional signals.

Monitor:

Provider Availability
API Quota Status
Rate Limits
Retry Counts
Completion Failures
Embedding Failures
Token Consumption

Health monitoring should answer:

Can the AI system actually serve requests?

Not merely:

Is the API running?
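
A sketch of a provider health check built on ASP.NET Core's IHealthCheck interface. The class name AiProviderHealthCheck and the probe delegate are hypothetical. The probe stands in for whatever cheap call your provider supports, such as listing models.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical health check: reports whether the AI provider can be reached in time.
public sealed class AiProviderHealthCheck : IHealthCheck
{
    private readonly Func<CancellationToken, Task<bool>> _probe;
    private readonly TimeSpan _timeout;

    // probe: returns true when the provider answers normally (assumed contract).
    public AiProviderHealthCheck(Func<CancellationToken, Task<bool>> probe, TimeSpan timeout)
    {
        _probe = probe;
        _timeout = timeout;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        // Ask the probe to give up after the timeout so a slow provider cannot stall the check.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(_timeout);

        try
        {
            return await _probe(cts.Token)
                ? HealthCheckResult.Healthy("AI provider reachable")
                : HealthCheckResult.Degraded("AI provider responded but reported a problem");
        }
        catch (Exception ex)
        {
            // Timeouts, network errors, and quota errors all surface here.
            return HealthCheckResult.Unhealthy("AI provider unreachable", ex);
        }
    }
}
```

The check would be registered with AddHealthChecks().AddCheck(...) and exposed through MapHealthChecks. Because each probe may itself consume quota, it is worth keeping the probe call as cheap as the provider allows.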

Cost Monitoring

Many teams discover their AI costs only after receiving a cloud bill.

By then it is too late.

Cost should be monitored continuously.

Track:

Today's Cost

Cost Per User

Cost Per Feature

Cost Per Provider

Cost Per Model

Example:

Feature       Daily Cost
Chat          $28
RAG           $9
Embeddings    $4
Summaries     $2

Now optimization decisions become data-driven.
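
Continuous cost tracking starts with turning token counts into money at request time. The sketch below assumes prices are configured per million tokens. ModelPrice, CostCalculator, and CostLedger are hypothetical names, and real prices must come from your provider's current price list.

```csharp
using System;
using System.Collections.Generic;

// Price per one million tokens, in US dollars (values are supplied by configuration).
public sealed record ModelPrice(decimal InputUsdPerMillion, decimal OutputUsdPerMillion);

public sealed class CostCalculator
{
    private readonly IReadOnlyDictionary<string, ModelPrice> _prices;

    public CostCalculator(IReadOnlyDictionary<string, ModelPrice> prices) => _prices = prices;

    // cost = (prompt tokens * input price + completion tokens * output price) / 1,000,000
    public decimal Estimate(string model, long promptTokens, long completionTokens)
    {
        if (!_prices.TryGetValue(model, out var price))
            throw new KeyNotFoundException($"No price configured for model '{model}'.");

        return (promptTokens * price.InputUsdPerMillion
              + completionTokens * price.OutputUsdPerMillion) / 1_000_000m;
    }
}

// Accumulates cost per feature. Not thread-safe; a real service would need synchronization.
public sealed class CostLedger
{
    private readonly Dictionary<string, decimal> _byFeature = new();

    public IReadOnlyDictionary<string, decimal> ByFeature => _byFeature;

    public void Add(string feature, decimal costUsd) =>
        _byFeature[feature] = _byFeature.GetValueOrDefault(feature) + costUsd;
}
```

With made-up prices of 2.00 USD per million input tokens and 8.00 USD per million output tokens, a request with 832 prompt tokens and 251 completion tokens is estimated at 0.003672 USD. Using decimal rather than double avoids rounding drift when many small amounts are summed.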

Provider Comparison Metrics

If your application supports multiple providers, compare them.

Track:

Latency
Success Rate
Cost
Token Efficiency
Error Rate

Example:

Provider    Avg Latency
OpenAI      1.7s
Claude      2.1s
Gemini      1.2s
Ollama      0.8s

These metrics help determine routing strategies.
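
As one illustration of a routing strategy, the sketch below picks the fastest provider among those that meet a minimum success rate. ProviderStats and ProviderRouter are hypothetical names, and the 99% default threshold is an arbitrary assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Aggregated metrics for one provider over some time window.
public sealed record ProviderStats(string Provider, double AvgLatencySeconds, double SuccessRate);

public static class ProviderRouter
{
    // Returns the fastest provider whose success rate meets the minimum,
    // or null when no provider qualifies.
    public static string? Choose(IEnumerable<ProviderStats> stats, double minSuccessRate = 0.99) =>
        stats.Where(s => s.SuccessRate >= minSuccessRate)
             .OrderBy(s => s.AvgLatencySeconds)
             .Select(s => s.Provider)
             .FirstOrDefault();
}
```

Latency alone would favor the fastest provider in the table above. Once a reliability floor is applied, a slower but more dependable provider may win instead. Cost and answer quality could be added as further filters in the same way.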

Alerting

Observability without alerts is incomplete.

Teams need proactive notifications.

Examples:

High Latency

GPT Latency > 5 Seconds

Token Spike

Daily Token Usage Doubles

Error Rate Increase

Error Rate > 10%

Embedding Failures

Embedding Service Unavailable

Local Model Offline

Ollama Unreachable

Good alerts identify issues before customers do.
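
In practice these rules usually live in the monitoring platform itself, but the thresholds are easier to reason about when written down as code. The sketch below encodes three of the examples above. AiWindowStats and AlertRules are hypothetical names, and which latency aggregate to use (average, 95th percentile, and so on) is left as a choice for the team.

```csharp
using System.Collections.Generic;

// Metrics for one evaluation window, gathered from logs or the metrics backend.
public sealed record AiWindowStats(
    double LatencySeconds,
    long TotalRequests,
    long FailedRequests,
    long TokensToday,
    long TokensYesterday);

public static class AlertRules
{
    // Returns the names of all alerts triggered by the given window.
    public static IReadOnlyList<string> Evaluate(AiWindowStats stats)
    {
        var alerts = new List<string>();

        // High Latency: latency above 5 seconds.
        if (stats.LatencySeconds > 5)
            alerts.Add("High Latency");

        // Token Spike: today's usage is at least double yesterday's.
        if (stats.TokensYesterday > 0 && stats.TokensToday >= 2 * stats.TokensYesterday)
            alerts.Add("Token Spike");

        // Error Rate Increase: more than 10% of requests failed.
        if (stats.TotalRequests > 0 &&
            (double)stats.FailedRequests / stats.TotalRequests > 0.10)
            alerts.Add("Error Rate Increase");

        return alerts;
    }
}
```

Guarding against empty windows (zero requests, no usage yesterday) keeps the rules from firing on division by zero or on the first day of data.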

Dashboard Design

Useful dashboards typically include:

AI Request Volume

Requests per minute.

Provider Comparison

Latency and success rates.

Token Usage

Daily and monthly consumption.

Cost Trends

Feature-level spending.

Trace Explorer

End-to-end request execution.

Failure Analysis

Retries and exceptions.

These views provide operational confidence.

Common Mistakes

These issues appear frequently in production environments.

Logging Entire Prompts

Creates privacy and compliance risks.

Ignoring Cost

Often leads to unpleasant surprises.

No Correlation IDs

Makes debugging significantly harder.

No Provider Metrics

Provider-specific issues become invisible.

No Distributed Tracing

Complex workflows become difficult to diagnose.

No Retry Visibility

Failures appear random.

No Latency Breakdown

Teams optimize the wrong components.

Repository Features

The reference implementation includes:

ASP.NET Core Web API
OpenTelemetry
Azure Application Insights
Serilog
Structured Logging
Correlation IDs
Token Metrics
Cost Monitoring
Distributed Tracing
Health Checks
Docker Support
Sample Grafana Dashboard

The goal is to provide complete visibility into AI workloads.

Conclusion

AI applications introduce challenges that traditional monitoring was never designed to solve.

Understanding latency, token usage, prompt behavior, provider reliability, and operational costs requires a dedicated observability strategy.

By combining OpenTelemetry, structured logging, distributed tracing, token metrics, cost monitoring, dashboards, and intelligent alerting, we gain visibility into every stage of the AI lifecycle.

This visibility transforms AI operations from guesswork into engineering.

Observability tells us what happened, but modern AI systems often perform multiple coordinated tasks instead of a single request.

In the next article, we'll explore how to orchestrate specialized AI agents using Semantic Kernel and build workflows where multiple agents collaborate to solve complex problems.

Jul 19, 2025

By Dheer Gupta
