Distributed Tracing with Jaeger

20 min · Lesson 5 of 7

Learning Objectives

  • Understand distributed tracing concepts
  • Deploy Jaeger for trace collection
  • Instrument applications with OpenTelemetry
  • Analyze traces to find performance bottlenecks

What is Distributed Tracing?

Tracing follows a single request as it flows through multiple services, showing where time is spent.

Request: GET /api/courses
├── API Gateway (5ms)
├── Auth Service (15ms)
├── Course Service (120ms)
│   ├── Database Query (80ms)  ← Bottleneck!
│   └── Cache Lookup (5ms)
└── Response (total: 145ms)
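
The example trace above can be modeled as a flat list of spans linked by parent IDs. The following is a minimal sketch in plain JavaScript; the field names (`spanId`, `parentSpanId`, `durationMs`) are illustrative assumptions, not Jaeger's actual data model. It finds the slowest span that has no children, which is usually where to start looking for a bottleneck.

```javascript
// Hypothetical span records mirroring the example trace above.
// Field names are illustrative, not Jaeger's storage schema.
const spans = [
  { spanId: 'a', parentSpanId: null, name: 'GET /api/courses', durationMs: 145 },
  { spanId: 'b', parentSpanId: 'a', name: 'API Gateway', durationMs: 5 },
  { spanId: 'c', parentSpanId: 'a', name: 'Auth Service', durationMs: 15 },
  { spanId: 'd', parentSpanId: 'a', name: 'Course Service', durationMs: 120 },
  { spanId: 'e', parentSpanId: 'd', name: 'Database Query', durationMs: 80 },
  { spanId: 'f', parentSpanId: 'd', name: 'Cache Lookup', durationMs: 5 },
];

// Return the slowest leaf span (a span that no other span names as its parent).
function slowestLeaf(spanList) {
  const parentIds = new Set(spanList.map((s) => s.parentSpanId));
  const leaves = spanList.filter((s) => !parentIds.has(s.spanId));
  return leaves.reduce((worst, s) => (s.durationMs > worst.durationMs ? s : worst));
}

console.log(slowestLeaf(spans).name); // Database Query
```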

Key Concepts

Term         Definition
Trace        End-to-end journey of a request
Span         A single operation within a trace
Context      Metadata propagated between services
Trace ID     Unique identifier for the entire request
Span ID      Unique identifier for each operation
Parent Span  The calling operation
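
To make "context" concrete: OpenTelemetry propagates context over HTTP using the W3C Trace Context `traceparent` header by default, which carries the trace ID, the parent span ID, and sampling flags. The sketch below parses such a header in plain JavaScript. The function name `parseTraceparent` is hypothetical, and the validation is simplified compared to the full specification.

```javascript
// Parse a W3C Trace Context `traceparent` header value of the form:
//   version-traceId-parentSpanId-flags
// Simplified sketch: returns null for anything that does not match.
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, parentSpanId, flags] = match;
  // All-zero trace or span IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
  return {
    version,
    traceId,
    parentSpanId,
    // The lowest flag bit marks the request as sampled.
    sampled: (parseInt(flags, 16) & 0x01) === 1,
  };
}

const ctx = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(ctx.traceId); // 4bf92f3577b34da6a3ce929d0e0e4736
console.log(ctx.sampled); // true
```

In practice the OpenTelemetry SDK injects and extracts this header for you; the sketch only shows what travels between services.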

Jaeger Architecture

┌──────────┐    ┌──────────────┐    ┌──────────┐
│  App +   │───▶│   Jaeger     │───▶│  Jaeger  │
│  Agent   │    │  Collector   │    │    UI    │
└──────────┘    └──────┬───────┘    └──────────┘
                       │
                ┌──────▼───────┐
                │   Storage    │
                │(Elasticsearch│
                │ or Cassandra)│
                └──────────────┘

Deploy Jaeger (All-in-One)

docker run -d \
  --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

Port   Protocol  Purpose
16686  HTTP      Jaeger UI
4317   gRPC      OTLP trace ingestion (OpenTelemetry)
4318   HTTP      OTLP trace ingestion (OpenTelemetry)

OpenTelemetry Instrumentation

Node.js Application

// tracing.js: initialize before app code so auto-instrumentation can patch modules
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  serviceName: 'nextgen-api',
  traceExporter: new OTLPTraceExporter({
    // OTLP over HTTP. The host name "jaeger" resolves only when the app shares a
    // Docker network with the container; from the host, use http://localhost:4318.
    url: 'http://jaeger:4318/v1/traces',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// Flush buffered spans before the process exits
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});

# Install dependencies
npm install @opentelemetry/sdk-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/auto-instrumentations-node

# Run with tracing
node --require ./tracing.js app.js

Custom Spans

const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('nextgen-api');

// `db` is assumed to be the application's existing database client
async function getCourses() {
  return tracer.startActiveSpan('getCourses', async (span) => {
    try {
      span.setAttribute('query.type', 'courses');

      const courses = await db.query('SELECT * FROM courses');
      span.setAttribute('result.count', courses.length);

      return courses;
    } catch (error) {
      // Attach the exception and mark the span as failed
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      // Always end the span; a span that is never ended is never exported
      span.end();
    }
  });
}
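
The try/catch/finally pattern above repeats for every custom span, so it can be factored into a small helper. The sketch below is one possible shape, not part of the OpenTelemetry API: `withSpan` and `createFakeTracer` are hypothetical names, and the fake tracer exists only so the helper can be tried without installing the SDK.

```javascript
// Numeric value of SpanStatusCode.ERROR in @opentelemetry/api.
const SPAN_STATUS_ERROR = 2;

// Hypothetical helper: run `fn` inside an active span and always end the span.
// `tracer` is any object with an OpenTelemetry-style startActiveSpan(name, callback).
function withSpan(tracer, name, fn) {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (error) {
      span.recordException(error);
      span.setStatus({ code: SPAN_STATUS_ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}

// Stand-in tracer for experimenting without the SDK; it keeps ended spans in memory.
function createFakeTracer() {
  const ended = [];
  return {
    ended,
    startActiveSpan(name, callback) {
      const span = {
        name,
        attributes: {},
        status: null,
        setAttribute(key, value) { this.attributes[key] = value; },
        recordException() {},
        setStatus(status) { this.status = status; },
        end() { ended.push(this); },
      };
      return callback(span);
    },
  };
}

// Usage with the fake tracer; with the real SDK, pass trace.getTracer('nextgen-api').
const fakeTracer = createFakeTracer();
withSpan(fakeTracer, 'getCourses', async (span) => {
  span.setAttribute('query.type', 'courses');
  return ['course-1', 'course-2'];
}).then((courses) => console.log(courses.length, fakeTracer.ended.length));
```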

Analyzing Traces

In Jaeger UI

  1. Select service from dropdown
  2. Choose operation (endpoint)
  3. Set time range
  4. Click "Find Traces"

What to Look For

Pattern             Indicates
Long spans          Slow operations (candidates for optimization)
Many child spans    Possible N+1 query problem, if the same query repeats
Gaps between spans  Network latency or uninstrumented work
Error spans         Failed operations
Fan-out pattern     Parallel calls (usually good)
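
The "many child spans" pattern can also be checked programmatically on exported trace data. The sketch below is a rough heuristic in plain JavaScript: it counts same-named children under each parent and reports groups above a cutoff. The span field names and the default `threshold` of 5 are assumptions chosen for illustration, not Jaeger settings.

```javascript
// Flag parents with many same-named child spans, a common sign of N+1 queries.
// `threshold` is an arbitrary illustrative cutoff.
function findRepeatedChildren(spanList, threshold = 5) {
  const groups = new Map();
  for (const { parentSpanId, name } of spanList) {
    if (parentSpanId == null) continue; // the root span has no parent
    const key = JSON.stringify([parentSpanId, name]);
    const group = groups.get(key) || { parentSpanId, name, count: 0 };
    group.count += 1;
    groups.set(key, group);
  }
  return [...groups.values()].filter((g) => g.count >= threshold);
}

// Hypothetical trace: one query for courses, then one query per course for its instructor.
const exampleTrace = [
  { spanId: 'r', parentSpanId: null, name: 'GET /api/courses' },
  { spanId: 'q0', parentSpanId: 'r', name: 'SELECT courses' },
  ...Array.from({ length: 6 }, (_, i) => ({
    spanId: `q${i + 1}`,
    parentSpanId: 'r',
    name: 'SELECT instructor',
  })),
];

const findings = findRepeatedChildren(exampleTrace);
console.log(findings[0].name, findings[0].count); // SELECT instructor 6
```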

Trace-Based Alerts

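# Prometheus alert based on trace-derived (span) metrics.
# Metric and label names depend on the component that generates span metrics.
- alert: HighLatencyTrace
  expr: histogram_quantile(0.95, sum by (le) (rate(traces_spanmetrics_latency_bucket{service="nextgen-api"}[5m]))) > 2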
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "P95 latency above 2s for nextgen-api"
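
To see what the alert expression computes, it helps to know that `histogram_quantile` estimates a percentile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below reproduces that idea in plain JavaScript. It is a simplification of Prometheus' behavior for classic histograms, it assumes at least one finite bucket, and the bucket values are made up for illustration.

```javascript
// buckets: cumulative counts sorted by upper bound `le` (seconds), ending with le = Infinity.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  if (buckets[i].le === Infinity) {
    // The rank falls in the overflow bucket: report the highest finite bound.
    return buckets[i - 1].le;
  }
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  // Assume observations are spread evenly within the bucket.
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Made-up latency histogram: 90 of 100 requests finished within 2 seconds.
const latencyBuckets = [
  { le: 0.5, count: 60 },
  { le: 1, count: 80 },
  { le: 2, count: 90 },
  { le: 5, count: 100 },
  { le: Infinity, count: 100 },
];

const p95 = histogramQuantile(0.95, latencyBuckets);
console.log(p95);     // 3.5
console.log(p95 > 2); // true, so the alert condition would hold
```

Because the result is interpolated, its accuracy depends on how finely the buckets are spaced around the threshold you alert on.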

Connecting the Three Pillars

┌─────────────────────────────────────────┐
│          Unified Observability          │
├─────────────┬─────────────┬─────────────┤
│   Metrics   │    Logs     │   Traces    │
│ (Prometheus)│   (Loki)    │  (Jaeger)   │
├─────────────┴─────────────┴─────────────┤
│                 Grafana                 │
│         (Single pane of glass)          │
└─────────────────────────────────────────┘

Link them together:

  • Trace → Logs: Use trace ID to find related log entries
  • Metrics → Traces: Click high-latency metric to see example traces
  • Logs → Traces: Extract trace ID from log to view full request path
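
These links work only if log entries carry the trace ID. The sketch below shows both directions on a handful of structured log lines in plain JavaScript. The log format and the `trace_id` field name are assumptions; use whatever field your logger actually writes.

```javascript
// Hypothetical JSON log lines; the `trace_id` field name is an assumption.
const logLines = [
  '{"level":"info","msg":"request started","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736"}',
  '{"level":"info","msg":"health check"}',
  '{"level":"error","msg":"db timeout","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736"}',
  'not json at all',
];

// Parse lines into objects, skipping anything that is not a JSON object.
function parseLines(lines) {
  const entries = [];
  for (const line of lines) {
    try {
      const entry = JSON.parse(line);
      if (entry && typeof entry === 'object') entries.push(entry);
    } catch {
      // Ignore lines that are not valid JSON
    }
  }
  return entries;
}

// Trace → Logs: all log entries that belong to one trace.
function logsForTrace(lines, traceId) {
  return parseLines(lines).filter((e) => e.trace_id === traceId);
}

// Logs → Traces: distinct trace IDs that appear on error-level entries.
function traceIdsForErrors(lines) {
  const ids = parseLines(lines)
    .filter((e) => e.level === 'error' && e.trace_id)
    .map((e) => e.trace_id);
  return [...new Set(ids)];
}

console.log(logsForTrace(logLines, '4bf92f3577b34da6a3ce929d0e0e4736').length); // 2
console.log(traceIdsForErrors(logLines)); // [ '4bf92f3577b34da6a3ce929d0e0e4736' ]
```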

Summary

You've learned:

  • Distributed tracing concepts (traces, spans, context)
  • Deploying Jaeger for trace collection
  • Instrumenting applications with OpenTelemetry
  • Analyzing traces to find bottlenecks
  • Connecting metrics, logs, and traces

Next Steps

Next, we'll explore monitoring Kubernetes clusters and containerized applications.