Distributed Tracing

Follow a Request's Journey Across Your Microservices

The Problem

A user reports: "The checkout is slow." In a monolith, you check one log file. With microservices, the request touches API Gateway → User Service → Inventory Service → Payment Service → Order Service → Notification Service. Which one is slow? Good luck finding out from six different log files.

Distributed tracing assigns a unique ID to each request and tracks it across all services.

// Logs WITHOUT tracing
[user-service]  Processing user 123
[order-service] Creating order
[payment-service] Processing payment
// Which request is which? No idea.

// Logs WITH tracing
[user-service]  [trace-id: abc123] Processing user 123
[order-service] [trace-id: abc123] Creating order
[payment-service] [trace-id: abc123] Processing payment
// All from the same request!

Key Concepts

Trace

The entire journey of a request. Has a unique trace ID.

Span

A single operation within a trace (e.g., one service call).

Context Propagation

Passing the trace context (the trace ID and the current span ID) from service to service, typically via HTTP headers.

Trace: abc123
├── Span: API Gateway (50ms)
│   └── Span: User Service (20ms)
│       └── Span: Database Query (5ms)
├── Span: Order Service (100ms)
│   ├── Span: Inventory Check (30ms)
│   └── Span: Payment Service (60ms)
└── Span: Notification Service (10ms)

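Total: 160ms (50 + 100 + 10 for the top-level spans, which run one after another; nested spans count toward their parent's time and may overlap with each other)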
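
The total of a trace depends on how its spans overlap in time, not only on their durations. The following is a minimal sketch of that arithmetic, not part of any tracing library: the SpanTiming record and the start offsets are hypothetical, and only the durations are taken from the tree above.

```java
import java.util.List;

public class TraceTimeline {

    /** A span reduced to a name, a start offset, and a duration (both in ms). */
    public record SpanTiming(String name, long startMs, long durationMs) {
        public long endMs() {
            return startMs + durationMs;
        }
    }

    /** End-to-end latency: from the earliest span start to the latest span end. */
    public static long totalDurationMs(List<SpanTiming> spans) {
        if (spans.isEmpty()) {
            return 0;
        }
        long earliestStart = Long.MAX_VALUE;
        long latestEnd = Long.MIN_VALUE;
        for (SpanTiming span : spans) {
            earliestStart = Math.min(earliestStart, span.startMs());
            latestEnd = Math.max(latestEnd, span.endMs());
        }
        return latestEnd - earliestStart;
    }

    public static void main(String[] args) {
        // Assumed start offsets: each top-level span starts when the previous one ends.
        List<SpanTiming> sequential = List.of(
            new SpanTiming("API Gateway", 0, 50),
            new SpanTiming("Order Service", 50, 100),
            new SpanTiming("Notification Service", 150, 10));
        System.out.println(totalDurationMs(sequential)); // 160

        // Same durations, but the notification overlaps with order processing.
        List<SpanTiming> overlapping = List.of(
            new SpanTiming("API Gateway", 0, 50),
            new SpanTiming("Order Service", 50, 100),
            new SpanTiming("Notification Service", 60, 10));
        System.out.println(totalDurationMs(overlapping)); // 150
    }
}
```

When spans run back to back, the total equals the sum of their durations. As soon as two spans overlap, the total drops below that sum, which is why a trace viewer draws spans on a timeline instead of just listing durations.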

Setup with Micrometer Tracing

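<!-- pom.xml (Spring Boot 3.x; versions are managed by the Spring Boot parent/BOM) -->
<!-- Actuator provides the tracing auto-configuration -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>

# application.yml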
management:
  tracing:
    sampling:
      probability: 1.0  # 100% of requests (use 0.1 for 10% in production)
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans

logging:
  pattern:
    level: "%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]"

Run Zipkin

# Using Docker
docker run -d -p 9411:9411 openzipkin/zipkin

# Or download and run
curl -sSL https://zipkin.io/quickstart.sh | bash -s
java -jar zipkin.jar

# Access UI at http://localhost:9411

Automatic Instrumentation

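With the dependencies added, Spring Boot traces most of the following automatically; some need an extra step:

  • Spring MVC controllers
  • RestTemplate / WebClient calls (when the client is built from the auto-configured RestTemplateBuilder or WebClient.Builder)
  • Feign clients (with the feign-micrometer dependency on the classpath)
  • Spring Data repositories (database calls typically need an additional instrumentation library)
  • Message queues (Kafka, RabbitMQ; observation may need to be enabled explicitly)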
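
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class OrderController {

    // Must be a bean built from the auto-configured RestTemplateBuilder to be instrumented
    @Autowired
    private RestTemplate restTemplate;

    @Autowired
    private OrderService orderService;

    @GetMapping("/orders/{id}")
    public Order getOrder(@PathVariable Long id) {
        // Load the order first so its user ID is available for the downstream call
        Order order = orderService.findById(id);

        // Trace context is propagated automatically in the request headers
        User user = restTemplate.getForObject(
            "http://user-service/users/{userId}",
            User.class,
            order.getUserId()
        );
        // Both services share the same trace ID!
        return order;
    }
}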
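
To make "propagated in header" concrete, here is a minimal sketch of the kind of value an instrumented client attaches to the outgoing request. It assumes the W3C Trace Context format (the traceparent header, version 00); the TraceParent class and its methods are hypothetical and are not Micrometer's or Brave's API. The sketch also skips some validation the specification requires, such as rejecting all-zero IDs.

```java
import java.security.SecureRandom;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceParent {

    // Version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags.
    private static final Pattern FORMAT =
        Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");
    private static final SecureRandom RANDOM = new SecureRandom();

    public record Context(String traceId, String spanId, boolean sampled) {

        /** Header value to send to the next service. */
        public String toHeader() {
            return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
        }

        /** Same trace ID, fresh span ID: what a service does before calling downstream. */
        public Context childSpan() {
            return new Context(traceId, randomHex(8), sampled);
        }
    }

    /** Starts a new trace when a request arrives without any trace context. */
    public static Context newTrace(boolean sampled) {
        return new Context(randomHex(16), randomHex(8), sampled);
    }

    /** Parses an incoming header value; empty if it is missing or malformed. */
    public static Optional<Context> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(header);
        if (!m.matches()) {
            return Optional.empty();
        }
        // The lowest bit of the flags byte is the "sampled" flag.
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 1) == 1;
        return Optional.of(new Context(m.group(1), m.group(2), sampled));
    }

    static String randomHex(int byteCount) {
        byte[] bytes = new byte[byteCount];
        RANDOM.nextBytes(bytes);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        Context incoming = parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
            .orElseGet(() -> newTrace(true));
        // The outgoing header keeps the trace ID and carries a new span ID.
        System.out.println(incoming.childSpan().toHeader());
    }
}
```

The trace ID stays constant across every hop while the span ID changes at each one. That is what lets the tracing backend stitch the spans from different services into one tree.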

Custom Spans

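import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    @Autowired
    private Tracer tracer;

    @Autowired
    private PaymentGateway paymentGateway;  // the application's gateway client (type name assumed)

    public PaymentResult processPayment(Payment payment) {
        // Create custom span for important operations
        Span span = tracer.nextSpan().name("process-payment").start();
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            // Add useful tags (tag values are strings)
            span.tag("payment.amount", payment.getAmount().toString());
            span.tag("payment.method", String.valueOf(payment.getMethod()));

            PaymentResult result = paymentGateway.process(payment);

            span.tag("payment.status", String.valueOf(result.getStatus()));
            return result;

        } catch (Exception e) {
            span.error(e);  // Record error in trace
            throw e;
        } finally {
            span.end();  // Always end the span, even on failure
        }
    }
}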

Using Annotation

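import io.micrometer.tracing.annotation.NewSpan;
import io.micrometer.tracing.annotation.SpanTag;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

// The annotations take effect only when Spring AOP is on the classpath and the
// Micrometer annotation aspects are registered; otherwise they are silently ignored.
@Service
public class InventoryService {

    @Autowired
    private InventoryRepository inventoryRepository;  // the application's repository (type name assumed)

    @NewSpan("check-inventory")
    public boolean checkAvailability(
        @SpanTag("product.id") Long productId,
        @SpanTag("quantity") int quantity) {

        return inventoryRepository.checkStock(productId, quantity);
    }
}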

Jaeger Alternative

<!-- Use OpenTelemetry for Jaeger -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>

# Run Jaeger
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

# application.yml
management:
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces

# Access UI at http://localhost:16686

Correlating Logs with Traces

<!-- Logback pattern in logback-spring.xml (kept on one line so each log entry stays on one line) -->
<pattern>%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} [traceId=%X{traceId}, spanId=%X{spanId}] - %msg%n</pattern>

// Now your logs include trace context:
2024-01-15 10:30:45 [http-nio-8080-exec-1] INFO  OrderService [traceId=abc123, spanId=def456] - Processing order 789
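
Once every line carries a trace ID, logs collected from several services can be regrouped per request. The following is a minimal sketch of that step, assuming the "[traceId=..., spanId=...]" segment from the pattern above. The LogsByTrace class and most of the sample lines are hypothetical; in practice a log aggregation tool usually does this filtering.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogsByTrace {

    // Matches the "[traceId=..., spanId=...]" segment of a log line.
    private static final Pattern TRACE_CONTEXT =
        Pattern.compile("\\[traceId=([^,\\]]*), spanId=([^\\]]*)\\]");

    /** Groups log lines by trace ID, keeping the order in which lines were seen. */
    public static Map<String, List<String>> groupByTraceId(List<String> lines) {
        Map<String, List<String>> byTrace = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = TRACE_CONTEXT.matcher(line);
            // Lines logged outside a request have an empty trace ID and are skipped.
            if (m.find() && !m.group(1).isEmpty()) {
                byTrace.computeIfAbsent(m.group(1), id -> new ArrayList<>()).add(line);
            }
        }
        return byTrace;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "10:30:45 INFO OrderService [traceId=abc123, spanId=def456] - Processing order 789",
            "10:30:45 INFO OrderService [traceId=zzz999, spanId=aaa111] - Processing order 790",
            "10:30:46 INFO PaymentService [traceId=abc123, spanId=ghi789] - Charging card");
        // Prints only the two lines that belong to trace abc123.
        groupByTraceId(lines).get("abc123").forEach(System.out::println);
    }
}
```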

Zipkin vs Jaeger

Feature  | Zipkin                                  | Jaeger
Setup    | Simpler, single jar                     | More components
UI       | Basic but functional                    | More features, DAG view
Storage  | Memory, MySQL, Cassandra, Elasticsearch | Memory, Cassandra, Elasticsearch, Kafka
Best For | Getting started quickly                 | Production at scale

Best Practices

  • Sample in production: Don't trace 100% of requests - use 1-10% sampling
  • Add meaningful tags: user.id, order.id, payment.status - things you'll search for
  • Name spans well: "db-query-users" not "span1"
  • Set up alerts: Alert when trace duration exceeds a threshold you define
  • Correlate with metrics: Link traces to dashboards
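
To illustrate what a sampling probability such as 0.1 means, here is a minimal sketch of a sampler that derives its decision from the trace ID. It is an illustrative assumption, not how Micrometer Tracing or Brave is documented to work: the ProbabilitySampler class is hypothetical. In a real setup the decision is normally made once where the trace starts and then passed along with the trace context.

```java
public class ProbabilitySampler {

    private static final long BUCKETS = 10_000;

    // Number of buckets (out of 10,000) whose traces are kept.
    private final long threshold;

    public ProbabilitySampler(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.threshold = Math.round(probability * BUCKETS);
    }

    /** The same trace ID always yields the same decision. */
    public boolean isSampled(long traceId) {
        // floorMod keeps the bucket in [0, BUCKETS) even for negative IDs.
        return Math.floorMod(traceId, BUCKETS) < threshold;
    }

    public static void main(String[] args) {
        ProbabilitySampler sampler = new ProbabilitySampler(0.1);
        int sampled = 0;
        for (long traceId = 0; traceId < 100_000; traceId++) {
            if (sampler.isSampled(traceId)) {
                sampled++;
            }
        }
        System.out.println(sampled + " of 100000 traces sampled"); // 10000 of 100000 traces sampled
    }
}
```

Whichever mechanism is used, a trace should be kept or dropped as a whole. A trace with spans missing from some services is harder to read than no trace at all.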
