Building AI Observability in ASP.NET Core with OpenTelemetry, Metrics, and Cost Tracking

Traditional monitoring isn't enough for AI applications. Learn how to build production-grade AI observability in ASP.NET Core using OpenTelemetry, Application Insights, Serilog, distributed tracing, token metrics, latency breakdowns, cost monitoring, dashboards, and intelligent alerting.

Traditional APIs expose metrics such as:

  • Response time
  • CPU usage
  • Memory consumption
  • Exceptions

Those metrics are still important for AI applications.

However, they are no longer enough.

Once AI features reach production, teams start asking very different questions:

  • Which model generated this answer?
  • Why did latency suddenly increase?
  • Which prompts consume the most tokens?
  • Why did OpenAI fail while Claude succeeded?
  • How much does each endpoint cost?
  • Which prompts consistently produce poor results?

Traditional monitoring cannot answer these questions.

Without observability, debugging AI systems becomes expensive guesswork.

In this article, we'll build a complete observability layer for AI applications in ASP.NET Core using OpenTelemetry, Application Insights, structured logging, distributed tracing, token monitoring, cost tracking, and dashboards.

The goal is not simply to know when something fails.

The goal is to understand why.


What We'll Build

Our architecture looks like this:

          Client

             │

     ASP.NET Core API

             │

     AI Service Layer

             │

  OpenTelemetry Activity

             │

  Prompt & Token Metrics

             │

Azure Application Insights

             │

  Grafana / Seq / Kibana

             │

    Dashboards & Alerts

Every AI request becomes observable from end to end.


Why AI Needs Different Monitoring

Traditional monitoring answers questions like:

Did the API fail?

AI monitoring answers questions like:

Why did the model generate this answer?

Those are very different problems.

Traditional applications typically execute deterministic logic.

AI systems execute probabilistic logic.

Two identical requests can produce different outputs.

This introduces entirely new operational challenges.


Correlation IDs

One of the most important concepts in observability is correlation.

Every request should receive a unique identifier at the entry point, and every downstream call and log line should carry it.

Example:

CorrelationId

↓

API

↓

RAG

↓

Embedding

↓

LLM

↓

Database

↓

Response

Instead of investigating five separate systems, you can trace the entire request lifecycle using a single identifier.
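
The propagation idea can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not the ASP.NET Core reference implementation; the names `correlation_id`, `handle_request`, `log`, and `events` are hypothetical. It assumes the ID is stored in ambient request context so that each stage can log it without receiving it as a parameter.

```python
import contextvars
import uuid
from typing import Optional

# Holds the correlation ID of the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

events = []  # stand-in for a real log sink


def log(stage: str) -> None:
    """Record a stage together with the current correlation ID."""
    events.append({"CorrelationId": correlation_id.get(), "Stage": stage})


def handle_request(incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID when one is supplied, otherwise mint a new one."""
    cid = incoming_id or uuid.uuid4().hex
    token = correlation_id.set(cid)
    try:
        for stage in ("API", "RAG", "Embedding", "LLM", "Database", "Response"):
            log(stage)  # each stage logs without being handed the ID explicitly
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request
    return cid


cid = handle_request()
# Filtering on one ID reconstructs the whole request lifecycle.
trace = [e["Stage"] for e in events if e["CorrelationId"] == cid]
print(cid, trace)
```

Reusing an incoming ID matters when the caller already has one: the trace then continues across service boundaries instead of restarting.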


Why Correlation IDs Matter

Imagine a customer reports:

The AI response took 12 seconds.

Without correlation IDs:

Search Logs

↓

Guess

↓

Search More Logs

↓

Guess Again

With correlation IDs:

Find Request

↓

View Complete Trace

↓

Identify Bottleneck

Debugging becomes dramatically easier.


Structured Logging

Many applications still generate logs like:

Request Started...
Request Finished...

These logs are difficult to query.

Instead, use structured logging.

Example:

{
  "Provider":"OpenAI",
  "Model":"gpt-4o",
  "Latency":1450,
  "PromptTokens":832,
  "CompletionTokens":251,
  "Cost":0.013,
  "RequestId":"abc123"
}

Now logs become searchable.

Questions become easy to answer:

Show all GPT-4o requests

↓

Latency > 3000ms

Structured logs turn debugging into data analysis.
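
To make the query above concrete, here is a small Python sketch that writes one JSON object per request and then filters on fields. It is only an illustration of the idea, not how Serilog or Application Insights store or query data. The helper names and the second and third sample records are invented for the example.

```python
import json


def log_ai_request(**fields) -> str:
    """Serialize one AI request as a single JSON log line."""
    return json.dumps(fields, sort_keys=True)


# The first record mirrors the example above; the others are made-up samples.
lines = [
    log_ai_request(Provider="OpenAI", Model="gpt-4o", Latency=1450,
                   PromptTokens=832, CompletionTokens=251, Cost=0.013,
                   RequestId="abc123"),
    log_ai_request(Provider="OpenAI", Model="gpt-4o", Latency=3400,
                   PromptTokens=2100, CompletionTokens=640, Cost=0.031,
                   RequestId="def456"),
    log_ai_request(Provider="Ollama", Model="llama3", Latency=4100,
                   PromptTokens=500, CompletionTokens=200, Cost=0.0,
                   RequestId="ghi789"),
]


def slow_requests(log_lines, model: str, min_latency_ms: int):
    """Return the request IDs for one model whose latency exceeds the threshold."""
    records = (json.loads(line) for line in log_lines)
    return [r["RequestId"] for r in records
            if r["Model"] == model and r["Latency"] > min_latency_ms]


slow = slow_requests(lines, "gpt-4o", 3000)
print(slow)
```

Because every field is a named value rather than part of a sentence, the filter is a field comparison instead of a text search.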


What Should Be Logged?

For every AI request, consider tracking:

  • Provider
  • Model
  • Request ID
  • Latency
  • Prompt Tokens
  • Completion Tokens
  • Total Tokens
  • Cost
  • Success
  • Failure
  • Retry Count

These attributes become essential as systems grow.
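
One way to keep these attributes consistent is to define them once as a record type. The following Python sketch is an assumption about how such a record could look; the class name and field names are hypothetical. Total tokens are derived rather than stored, and success and failure are folded into one boolean.

```python
from dataclasses import dataclass, asdict


@dataclass
class AiRequestRecord:
    provider: str
    model: str
    request_id: str
    latency_ms: int
    prompt_tokens: int
    completion_tokens: int
    cost: float
    success: bool
    retry_count: int = 0

    @property
    def total_tokens(self) -> int:
        # Derived so it can never disagree with its two parts.
        return self.prompt_tokens + self.completion_tokens

    def to_log(self) -> dict:
        """Flatten the record, including the derived total, for a log sink."""
        data = asdict(self)
        data["total_tokens"] = self.total_tokens
        return data


record = AiRequestRecord("OpenAI", "gpt-4o", "abc123", 1450, 832, 251, 0.013, True)
print(record.to_log())
```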


Prompt Logging

Prompt logging deserves special attention.

Many teams either:

Log Everything

or

Log Nothing

Both approaches are problematic.


The Risks

Prompts may contain:

  • Personal information
  • Customer data
  • Internal business information
  • Financial information

Logging everything can create compliance issues.


Safe Logging Techniques

Truncation

Store only a portion of the prompt.

Example:

First 250 Characters

Masking

Replace sensitive values.

Example:

Customer Email

↓

********

Hashing

Store a hash instead of raw content.

Useful for:

  • Deduplication
  • Trend analysis
  • Privacy protection

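This preserves observability while limiting exposure of sensitive information. Keep in mind that a plain hash of a short or predictable prompt can be matched by guessing, so hashes still deserve careful handling.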
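
The three techniques can be combined. Below is a minimal Python sketch under stated assumptions: only email addresses are masked, the regular expression is a simple approximation rather than a complete email grammar, and the function and field names are hypothetical. Real systems usually need to mask more than emails.

```python
import hashlib
import re

# Simplified email pattern, good enough for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def truncate(prompt: str, limit: int = 250) -> str:
    """Keep only the first `limit` characters."""
    return prompt[:limit]


def mask_emails(prompt: str) -> str:
    """Replace anything that looks like an email address."""
    return EMAIL.sub("********", prompt)


def fingerprint(prompt: str) -> str:
    """SHA-256 of the raw prompt, useful for deduplication and trends."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def safe_prompt_fields(prompt: str) -> dict:
    # Mask before truncating, so a cut-off address cannot slip through.
    return {
        "PromptPreview": truncate(mask_emails(prompt)),
        "PromptHash": fingerprint(prompt),
        "PromptLength": len(prompt),
    }


fields = safe_prompt_fields("Contact jane.doe@example.com about the invoice")
print(fields["PromptPreview"])
```

The order of operations is the design choice worth noting: truncating first could split an address in half and leave a fragment that the mask no longer recognizes.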


Token Usage Dashboard

For hosted LLM APIs, cost is driven mostly by token usage, because providers typically bill per input and output token.

Track:

  • Prompt Tokens
  • Completion Tokens
  • Cached Tokens
  • Daily Cost
  • Monthly Cost
  • Tokens Per User
  • Tokens Per Endpoint

Example dashboard:

Metric               Value
Daily Tokens         1.8M
Daily Cost           $42
Avg Tokens/Request   1,120
GPT-4o Requests      5,800

Without token visibility, cost optimization is guesswork.
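
The per-user and per-endpoint figures are simple aggregations over request records. Here is a small Python sketch; the sample records, field names, and the `tokens_by` helper are invented for illustration, and a real dashboard would compute the same sums in its query language.

```python
from collections import defaultdict

# Hypothetical usage records, one per AI request.
usage = [
    {"endpoint": "/chat", "user": "u1", "prompt": 832, "completion": 251},
    {"endpoint": "/chat", "user": "u2", "prompt": 400, "completion": 120},
    {"endpoint": "/summarize", "user": "u1", "prompt": 1500, "completion": 300},
]


def tokens_by(records, key: str) -> dict:
    """Sum prompt and completion tokens, grouped by the given field."""
    totals = defaultdict(int)
    for r in records:
        totals[r[key]] += r["prompt"] + r["completion"]
    return dict(totals)


per_endpoint = tokens_by(usage, "endpoint")
per_user = tokens_by(usage, "user")
avg_per_request = sum(per_endpoint.values()) / len(usage)
print(per_endpoint, per_user, round(avg_per_request))
```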


Latency Breakdown

One of the biggest observability mistakes is measuring only total latency.

Example:

Total Latency

2073ms

Useful, but incomplete.

Instead, measure each stage.

Embedding

320ms

↓

Vector Search

41ms

↓

LLM

1700ms

↓

Formatting

12ms

Now the bottleneck becomes obvious.

The model call accounts for 1700ms of the 2073ms total, roughly 82% of the delay.

Without breakdowns, teams often optimize the wrong component.
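
Per-stage timing needs very little code. The Python sketch below is one possible shape, with a hypothetical `StageTimer` class; it uses the stage durations from the example above to find the bottleneck and its share of total latency.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collects the duration of each named stage in milliseconds."""

    def __init__(self):
        self.stages_ms = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the stage raises, so failures are timed too.
            self.stages_ms[name] = (time.perf_counter() - start) * 1000.0

    def bottleneck(self):
        """Return the slowest stage and its fraction of total latency."""
        total = sum(self.stages_ms.values())
        name = max(self.stages_ms, key=self.stages_ms.get)
        return name, self.stages_ms[name] / total


# Durations taken from the example above instead of live measurements.
timer = StageTimer()
timer.stages_ms = {"Embedding": 320, "Vector Search": 41, "LLM": 1700, "Formatting": 12}
name, share = timer.bottleneck()
print(name, round(share * 100))
```

In live code each step would be wrapped, for example `with timer.stage("LLM"):` around the model call, and the collected durations attached to the request's log record.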


Distributed Tracing with OpenTelemetry

Modern AI applications span multiple systems.

A typical request might look like:

HTTP Request

↓

AI Gateway

↓

RAG Retrieval

↓

OpenAI

↓

Redis

↓

PostgreSQL

↓

Response

Distributed tracing allows us to visualize the entire execution path.


OpenTelemetry Activities

Create activities for:

  • Chat Requests
  • Embedding Generation
  • Vector Search
  • Prompt Construction
  • LLM Completion
  • Response Processing

Each activity contributes to a complete trace.

This makes complex AI workflows understandable.


Health Monitoring

Traditional health checks aren't sufficient.

AI systems require additional signals.

Monitor:

  • Provider Availability
  • API Quota Status
  • Rate Limits
  • Retry Counts
  • Completion Failures
  • Embedding Failures
  • Token Consumption

Health monitoring should answer:

Can the AI system actually serve requests?

Not merely:

Is the API running?
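
The difference between the two questions can be shown with a small Python sketch. The signal names, the 10% failure threshold, and the `ai_health` function are assumptions made for the example; in ASP.NET Core the same logic would normally live in a custom health check.

```python
def ai_health(signals: dict, max_failure_rate: float = 0.10) -> dict:
    """Decide whether the AI path can serve requests, and say why not."""
    problems = []
    if not signals["provider_available"]:
        problems.append("provider unavailable")
    if signals["quota_remaining"] <= 0:
        problems.append("API quota exhausted")
    if signals["rate_limited"]:
        problems.append("rate limited")
    if signals["completion_failure_rate"] > max_failure_rate:
        problems.append("completion failures above threshold")
    return {"status": "Unhealthy" if problems else "Healthy", "problems": problems}


# The process is up and the provider responds, but the quota is used up.
report = ai_health({
    "provider_available": True,
    "quota_remaining": 0,
    "rate_limited": False,
    "completion_failure_rate": 0.02,
})
print(report)
```

A plain liveness probe would report this service as running, while the AI-aware check reports that it cannot serve requests.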


Cost Monitoring

Many teams discover their AI costs only after receiving a cloud bill.

By then it is too late.

Cost should be monitored continuously.

Track:

  • Today's Cost
  • Cost Per User
  • Cost Per Feature
  • Cost Per Provider
  • Cost Per Model

Example:

Feature      Daily Cost
Chat         $28
RAG          $9
Embeddings   $4
Summaries    $2

Now optimization decisions become data-driven.
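
Continuous cost tracking comes down to pricing each request when it completes and summing by dimension. The Python sketch below assumes a hypothetical model name and placeholder prices per one million tokens; real prices differ by provider and change over time, so they belong in configuration.

```python
from collections import defaultdict

# Placeholder prices in USD per 1M tokens, NOT real provider pricing.
PRICES = {"model-a": {"prompt": 2.50, "completion": 10.00}}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Price one request from its token counts."""
    p = PRICES[model]
    return (prompt_tokens * p["prompt"]
            + completion_tokens * p["completion"]) / 1_000_000


def cost_by(records, key: str) -> dict:
    """Sum request cost grouped by feature, user, provider, or model."""
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += request_cost(r["model"], r["prompt"], r["completion"])
    return dict(totals)


# Hypothetical request records.
records = [
    {"feature": "Chat", "model": "model-a", "prompt": 832, "completion": 251},
    {"feature": "Chat", "model": "model-a", "prompt": 1000, "completion": 500},
    {"feature": "RAG", "model": "model-a", "prompt": 2000, "completion": 100},
]

per_feature = cost_by(records, "feature")
print({k: round(v, 5) for k, v in per_feature.items()})
```

Grouping by a different key, such as user or model, reuses the same function, which is why recording those attributes on every request pays off.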


Provider Comparison Metrics

If your application supports multiple providers, compare them.

Track:

  • Latency
  • Success Rate
  • Cost
  • Token Efficiency
  • Error Rate

Example:

Provider   Avg Latency
OpenAI     1.7s
Claude     2.1s
Gemini     1.2s
Ollama     0.8s

These metrics help determine routing strategies.
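
As an illustration of how such metrics could feed a routing decision, here is a Python sketch. The call records, the 95% success threshold, and the rule "fastest provider among the reliable ones" are all assumptions chosen for the example, not a recommended policy.

```python
from statistics import mean

# Hypothetical call outcomes per provider.
calls = [
    {"provider": "OpenAI", "latency_s": 1.6, "ok": True},
    {"provider": "OpenAI", "latency_s": 1.8, "ok": True},
    {"provider": "Gemini", "latency_s": 1.2, "ok": True},
    {"provider": "Gemini", "latency_s": 1.2, "ok": True},
    {"provider": "Ollama", "latency_s": 0.8, "ok": True},
    {"provider": "Ollama", "latency_s": 0.8, "ok": False},
]


def summarize(records) -> dict:
    """Average latency and success rate for each provider."""
    grouped = {}
    for c in records:
        grouped.setdefault(c["provider"], []).append(c)
    return {
        provider: {
            "avg_latency_s": mean([c["latency_s"] for c in cs]),
            "success_rate": sum(c["ok"] for c in cs) / len(cs),
        }
        for provider, cs in grouped.items()
    }


def pick_provider(summary: dict, min_success_rate: float = 0.95):
    """Fastest provider among those that meet the reliability bar."""
    eligible = {p: s for p, s in summary.items()
                if s["success_rate"] >= min_success_rate}
    if not eligible:
        return None
    return min(eligible, key=lambda p: eligible[p]["avg_latency_s"])


summary = summarize(calls)
print(pick_provider(summary))
```

In this sample data the fastest provider is excluded because half of its calls failed, which shows why latency alone is a poor routing signal.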


Alerting

Observability without alerts is incomplete.

Teams need proactive notifications.

Examples:

Alert                 Condition
High Latency          GPT Latency > 5 Seconds
Token Spike           Daily Token Usage Doubles
Error Rate Increase   Error Rate > 10%
Embedding Failures    Embedding Service Unavailable
Local Model Offline   Ollama Unreachable

Good alerts identify issues before customers do.
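
The example alerts translate directly into rules over current metrics. The Python sketch below uses hypothetical metric names and assumes a daily token baseline is already available; in practice these rules would usually be configured in the monitoring platform rather than in application code.

```python
def evaluate_alerts(m: dict) -> list:
    """Return the names of all alerts whose condition currently holds."""
    alerts = []
    if m["gpt_latency_s"] > 5:
        alerts.append("High Latency")
    if m["tokens_today"] >= 2 * m["tokens_daily_baseline"]:
        alerts.append("Token Spike")
    if m["error_rate"] > 0.10:
        alerts.append("Error Rate Increase")
    if not m["embedding_available"]:
        alerts.append("Embedding Failures")
    if not m["ollama_reachable"]:
        alerts.append("Local Model Offline")
    return alerts


# Hypothetical snapshot: latency is fine, but token usage has doubled
# and the local model is down.
snapshot = {
    "gpt_latency_s": 2.3,
    "tokens_today": 3_700_000,
    "tokens_daily_baseline": 1_800_000,
    "error_rate": 0.04,
    "embedding_available": True,
    "ollama_reachable": False,
}
fired = evaluate_alerts(snapshot)
print(fired)
```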


Dashboard Design

Useful dashboards typically include:

AI Request Volume

Requests per minute.


Provider Comparison

Latency and success rates.


Token Usage

Daily and monthly consumption.


Cost Trends

Feature-level spending.


Trace Explorer

End-to-end request execution.


Failure Analysis

Retries and exceptions.

These views provide operational confidence.


Common Mistakes

These issues appear frequently in production environments.


Logging Entire Prompts

Creates privacy and compliance risks.


Ignoring Cost

Often leads to unpleasant surprises.


No Correlation IDs

Makes debugging significantly harder.


No Provider Metrics

Provider-specific issues become invisible.


No Distributed Tracing

Complex workflows become difficult to diagnose.


No Retry Visibility

Failures appear random.


No Latency Breakdown

Teams optimize the wrong components.


Repository Features

The reference implementation includes:

  • ASP.NET Core Web API
  • OpenTelemetry
  • Azure Application Insights
  • Serilog
  • Structured Logging
  • Correlation IDs
  • Token Metrics
  • Cost Monitoring
  • Distributed Tracing
  • Health Checks
  • Docker Support
  • Sample Grafana Dashboard

The goal is to provide complete visibility into AI workloads.


Suggested Screenshots

Include the following visuals.

OpenTelemetry Trace

Complete request lifecycle.


Application Insights Dashboard

Request telemetry and failures.


Token Usage Dashboard

Prompt and completion token trends.


Request Timeline

Latency breakdown by component.


Provider Comparison Dashboard

Latency and success rate metrics.


Structured Logs

Provider, tokens, latency, and cost information.

These screenshots significantly increase the practical value of the article.


Conclusion

AI applications introduce challenges that traditional monitoring was never designed to solve.

Understanding latency, token usage, prompt behavior, provider reliability, and operational costs requires a dedicated observability strategy.

By combining OpenTelemetry, structured logging, distributed tracing, token metrics, cost monitoring, dashboards, and intelligent alerting, we gain visibility into every stage of the AI lifecycle.

This visibility transforms AI operations from guesswork into engineering.

Observability tells us what happened, but modern AI systems often perform multiple coordinated tasks instead of a single request.

In the next article, we'll explore how to orchestrate specialized AI agents using Semantic Kernel and build workflows where multiple agents collaborate to solve complex problems.

6 min read
Jul 19, 2025
By Dheer Gupta