The Problem
A user reports: "The checkout is slow." In a monolith, you check one log file. With microservices? The request touches API Gateway → User Service → Inventory Service → Payment Service → Order Service → Notification Service. Which one is slow? Good luck finding it in six different log files, with nothing to tell you which lines belong to that user's request.
Distributed tracing assigns a unique ID to each request and tracks it across all services.
// Logs WITHOUT tracing
[user-service] Processing user 123
[order-service] Creating order
[payment-service] Processing payment
// Which request is which? No idea.

// Logs WITH tracing
[user-service] [trace-id: abc123] Processing user 123
[order-service] [trace-id: abc123] Creating order
[payment-service] [trace-id: abc123] Processing payment
// All from the same request!
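The tagged log lines can be reproduced with a few lines of plain Java. This is only a sketch of the idea, not how Micrometer Tracing is implemented; the names `TraceLogDemo`, `newTraceId`, and `logLine` are hypothetical.

```java
import java.util.UUID;

final class TraceLogDemo {

    // Generate a 32-character lowercase hex ID (the length of a 128-bit trace ID).
    static String newTraceId() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    // Prefix a message with the service name and the trace ID.
    static String logLine(String service, String traceId, String message) {
        return "[" + service + "] [trace-id: " + traceId + "] " + message;
    }

    public static void main(String[] args) {
        // The ID is created once, when the request enters the system,
        // and reused by every service that handles the request.
        String traceId = newTraceId();
        System.out.println(logLine("user-service", traceId, "Processing user 123"));
        System.out.println(logLine("order-service", traceId, "Creating order"));
        System.out.println(logLine("payment-service", traceId, "Processing payment"));
    }
}
```

Searching the combined logs for that one ID then returns every line written for the request, whichever service wrote it.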
Key Concepts
Trace
The entire journey of a request. Has a unique trace ID.
Span
A single operation within a trace (e.g., one service call).
Context Propagation
Passing the trace ID (and the ID of the calling span) from service to service, typically via HTTP headers.
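To make context propagation concrete, here is a minimal sketch that writes a trace context into a header map and reads it back. It assumes the W3C Trace Context `traceparent` layout (`version-traceId-spanId-flags`); your services may be configured for a different format such as B3. The class and method names are hypothetical, and the flag handling is simplified to the two values `00` and `01`.

```java
import java.util.Map;

final class TraceContextPropagation {

    // Minimal trace context: which trace we are in and which span is making the call.
    record TraceContext(String traceId, String spanId, boolean sampled) {}

    static final String HEADER = "traceparent";

    // Caller side: write the context into the outgoing request headers.
    static void inject(TraceContext ctx, Map<String, String> headers) {
        String flags = ctx.sampled() ? "01" : "00";
        headers.put(HEADER, "00-" + ctx.traceId() + "-" + ctx.spanId() + "-" + flags);
    }

    // Callee side: read the context back, or return null if the header
    // is absent or does not have the expected shape.
    static TraceContext extract(Map<String, String> headers) {
        String value = headers.get(HEADER);
        if (value == null) {
            return null;
        }
        String[] parts = value.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            return null;
        }
        // Simplification: treat the flags as sampled only when they equal "01".
        return new TraceContext(parts[1], parts[2], "01".equals(parts[3]));
    }
}
```

The receiving service keeps the trace ID, creates its own span ID, and records the extracted span ID as the parent. That is how the spans end up linked into one tree.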
Trace: abc123
├── Span: API Gateway (50ms)
│   └── Span: User Service (20ms)
│       └── Span: Database Query (5ms)
├── Span: Order Service (100ms)
│   ├── Span: Inventory Check (30ms)
│   └── Span: Payment Service (60ms)
└── Span: Notification Service (10ms)
Total: 160ms (some parallel, some sequential)
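The tree can be modelled directly, which also answers the original question of which service is slow. The sketch below is illustrative only: `SpanTree` and its members are hypothetical names, the nesting follows the example trace, and "self time" is computed by assuming each span's children ran one after another.

```java
import java.util.ArrayList;
import java.util.List;

final class SpanTree {

    // One span: an operation name, its total duration, and the spans it started.
    record Span(String name, long durationMs, List<Span> children) {
        // Time spent in this span itself, assuming its children ran sequentially.
        long selfTimeMs() {
            long childTotal = 0;
            for (Span child : children) {
                childTotal += child.durationMs();
            }
            return durationMs - childTotal;
        }
    }

    // Flatten the tree depth-first so every span can be inspected.
    static List<Span> flatten(List<Span> spans) {
        List<Span> out = new ArrayList<>();
        for (Span span : spans) {
            out.add(span);
            out.addAll(flatten(span.children()));
        }
        return out;
    }

    // Find the span with the largest self time.
    static Span slowest(List<Span> roots) {
        Span best = null;
        for (Span span : flatten(roots)) {
            if (best == null || span.selfTimeMs() > best.selfTimeMs()) {
                best = span;
            }
        }
        return best;
    }

    // The example trace, with the durations from the tree.
    static List<Span> exampleTrace() {
        return List.of(
            new Span("API Gateway", 50, List.of(
                new Span("User Service", 20, List.of(
                    new Span("Database Query", 5, List.of()))))),
            new Span("Order Service", 100, List.of(
                new Span("Inventory Check", 30, List.of()),
                new Span("Payment Service", 60, List.of()))),
            new Span("Notification Service", 10, List.of()));
    }
}
```

For the example trace, `SpanTree.slowest(SpanTree.exampleTrace())` returns the Payment Service span, which has 60ms of self time.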
Setup with Micrometer Tracing
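<!-- pom.xml (Spring Boot 3.x) -->
<!-- Actuator provides the tracing auto-configuration -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>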
# application.yml
management:
  tracing:
    sampling:
      probability: 1.0 # 100% of requests (use 0.1 for 10% in production)
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans
logging:
  pattern:
    level: "%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]"
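The `probability` setting is easiest to understand as a coin flip made once per trace. The sketch below shows that idea in plain Java. It is an assumption-laden illustration with a hypothetical `ProbabilitySampler` class, and real tracers may use other strategies to reach the same rate.

```java
import java.util.Random;

final class ProbabilitySampler {

    private final double probability;
    private final Random random;

    ProbabilitySampler(double probability, Random random) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.probability = probability;
        this.random = random;
    }

    // Decide whether to record a new trace. nextDouble() returns a value in [0, 1),
    // so 1.0 always samples and 0.0 never does.
    boolean shouldSample() {
        return random.nextDouble() < probability;
    }
}
```

The decision is normally made by the first service that sees the request and then travels with the trace context, so a trace is recorded either by every service or by none.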
Run Zipkin
# Using Docker
docker run -d -p 9411:9411 openzipkin/zipkin

# Or download and run
curl -sSL https://zipkin.io/quickstart.sh | bash -s
java -jar zipkin.jar

# Access UI at http://localhost:9411
Automatic Instrumentation
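With the dependencies added, tracing works automatically (or, for some integrations and versions, after a small amount of extra configuration) for:
- Spring MVC controllers
- RestTemplate / WebClient calls, provided the client is built from the auto-configured RestTemplateBuilder or WebClient.Builder rather than created with `new`
- Feign clients
- Spring Data repositories
- Message queues (Kafka, RabbitMQ)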
@RestController
public class OrderController {

    @Autowired
    private RestTemplate restTemplate;

    @Autowired
    private OrderService orderService;

    @GetMapping("/orders/{id}")
    public Order getOrder(@PathVariable Long id) {
        // Load the order first so its user ID is available
        Order order = orderService.findById(id);

        // Trace ID automatically propagated in header
        User user = restTemplate.getForObject(
            "http://user-service/users/{userId}",
            User.class,
            order.getUserId()
        );

        // Both services share the same trace ID!
        return order;
    }
}
Custom Spans
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;

@Service
public class PaymentService {

    @Autowired
    private Tracer tracer;

    @Autowired
    private PaymentGateway paymentGateway;

    public PaymentResult processPayment(Payment payment) {
        // Create custom span for important operations
        Span span = tracer.nextSpan().name("process-payment").start();
        // Put the span in scope so nested calls and log lines pick it up
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            // Add useful tags
            span.tag("payment.amount", payment.getAmount().toString());
            span.tag("payment.method", payment.getMethod());

            PaymentResult result = paymentGateway.process(payment);
            span.tag("payment.status", result.getStatus());
            return result;
        } catch (Exception e) {
            span.error(e); // Record error in trace
            throw e;
        } finally {
            span.end(); // Always end the span, or it is never reported
        }
    }
}
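The try/catch/finally shape is the important part of a custom span: errors are recorded, and the span is ended on every path. A minimal stand-alone sketch of that contract follows. `RecordedSpan` and `inSpan` are hypothetical stand-ins, not Micrometer's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

final class SpanLifecycleSketch {

    // Hypothetical stand-in for a tracer's span; it records what a real span would report.
    static final class RecordedSpan {
        final String name;
        final Map<String, String> tags = new LinkedHashMap<>();
        Throwable error;
        boolean ended;

        RecordedSpan(String name) {
            this.name = name;
        }
    }

    // Run the work inside the span: record any failure, and always end the span.
    static <T> T inSpan(RecordedSpan span, Supplier<T> work) {
        try {
            return work.get();
        } catch (RuntimeException e) {
            span.error = e;    // plays the role of span.error(e)
            throw e;
        } finally {
            span.ended = true; // plays the role of span.end()
        }
    }
}
```

Wrapping the pattern in a helper like this is one way to avoid repeating the boilerplate in every service method. Whether that is worth it depends on how many custom spans you create.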
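Using Annotations
The annotations are applied through Spring AOP, so an AOP starter must be on the classpath, and depending on the Spring Boot version annotation support may need to be enabled explicitly.
@Service
public class InventoryService {

    @Autowired
    private InventoryRepository inventoryRepository;

    // @NewSpan and @SpanTag come from io.micrometer.tracing.annotation
    @NewSpan("check-inventory")
    public boolean checkAvailability(
            @SpanTag("product.id") Long productId,
            @SpanTag("quantity") int quantity) {
        return inventoryRepository.checkStock(productId, quantity);
    }
}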
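Annotation-driven spans work because the framework wraps the bean in a proxy that intercepts the call. The sketch below imitates that mechanism with a JDK dynamic proxy and a hypothetical `@Traced` annotation. It is not Micrometer's or Spring's implementation, and it only records span names in a list instead of reporting real spans.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class AnnotationSpanSketch {

    // Hypothetical annotation playing the role of @NewSpan.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Traced {
        String value();
    }

    interface Inventory {
        @Traced("check-inventory")
        boolean checkAvailability(long productId, int quantity);
    }

    // Names of the spans that were "started", in call order.
    static final List<String> startedSpans = new ArrayList<>();

    // Wrap the target so calls to @Traced methods are recorded before delegating.
    static Inventory traced(Inventory target) {
        return (Inventory) Proxy.newProxyInstance(
            Inventory.class.getClassLoader(),
            new Class<?>[] {Inventory.class},
            (proxy, method, args) -> {
                Traced traced = method.getAnnotation(Traced.class);
                if (traced != null) {
                    startedSpans.add(traced.value());
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the target's original exception
                }
            });
    }
}
```

One consequence of the proxy approach is that only calls arriving through the proxy are intercepted. A method that calls another annotated method on `this` bypasses it, so no span is created for the inner call.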
Jaeger Alternative
<!-- Use OpenTelemetry for Jaeger -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
# Run Jaeger
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

# application.yml
management:
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces

# Access UI at http://localhost:16686
Correlating Logs with Traces
<!-- Logback pattern in logback-spring.xml (keep it on one line so each log event stays on one line) -->
<pattern>%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} [traceId=%X{traceId}, spanId=%X{spanId}] - %msg%n</pattern>

Now your logs include trace context:
2024-01-15 10:30:45 [http-nio-8080-exec-1] INFO OrderService [traceId=abc123, spanId=def456] - Processing order 789
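Once every line carries `traceId=...`, pulling out one request's log lines is a simple filter. The sketch below assumes each log event is a single line containing `traceId=<id>` as in the pattern. `LogTraceFilter` and its methods are hypothetical helpers, and in practice a log aggregation tool would do this search for you.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogTraceFilter {

    // Matches the "traceId=<id>" fragment produced by the log pattern.
    private static final Pattern TRACE_ID = Pattern.compile("traceId=([0-9a-zA-Z]+)");

    // Return the trace ID embedded in a log line, or null if the line has none.
    static String traceIdOf(String line) {
        Matcher matcher = TRACE_ID.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    // Keep only the lines that belong to the given trace, preserving their order.
    static List<String> linesForTrace(List<String> lines, String traceId) {
        return lines.stream()
                .filter(line -> traceId.equals(traceIdOf(line)))
                .toList();
    }
}
```

The same trace ID can be pasted into the Zipkin or Jaeger search box to jump from a log line to the full trace.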
Zipkin vs Jaeger
| Feature | Zipkin | Jaeger |
|---|---|---|
| Setup | Simpler, single jar | More components |
| UI | Basic but functional | More features, DAG view |
| Storage | Memory, MySQL, Cassandra, ES | Memory, Cassandra, ES (Kafka as an ingestion buffer) |
| Best For | Getting started quickly | Production at scale |
Best Practices
- Sample in production: Don't trace 100% of requests - use 1-10% sampling
- Add meaningful tags: user.id, order.id, payment.status - things you'll search for
- Name spans well: "db-query-users" not "span1"
- Set up alerts: Alert when trace duration exceeds a latency threshold you define (for example, your checkout latency target)
- Correlate with metrics: Link traces to dashboards
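As a small illustration of the alerting practice, the check behind a "trace took too long" alert is a threshold comparison over finished traces. The sketch below uses a hypothetical `TraceSummary` record and made-up durations. In a real setup the rule would usually live in your tracing backend or metrics system, not in application code.

```java
import java.util.List;

final class SlowTraceAlert {

    // Hypothetical summary of a finished trace: its ID and end-to-end duration.
    record TraceSummary(String traceId, long durationMs) {}

    // Return the traces whose duration exceeds the threshold, preserving input order.
    static List<TraceSummary> overThreshold(List<TraceSummary> traces, long thresholdMs) {
        return traces.stream()
                .filter(trace -> trace.durationMs() > thresholdMs)
                .toList();
    }
}
```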