Using traces to observe how different parts of a program work together helps prevent problems when making changes.
In modern software development, where teams manage numerous projects, it’s crucial to have visibility into our systems and to track dependencies. Imagine our program as a big puzzle made up of many smaller pieces. When we troubleshoot or improve functionality, we need to understand how these pieces fit together.
To achieve observability and simplify the software maintenance process, we need to utilize tracing within our system.
Tracing in Context
In software development, tracing refers to the process of monitoring and recording the flow of data or events as they move through a system.
It involves instrumenting code to emit trace data, often alongside logs and metrics, that provides visibility into the behavior and performance of applications.
Tracing helps developers understand how requests or transactions propagate across different components or services, enabling them to diagnose issues, optimize performance, and ensure reliability. Tracing tools often capture information such as timestamps, service dependencies, and contextual metadata, allowing developers to trace the path of a request and identify bottlenecks or errors more effectively.
Overall, tracing is a valuable tool for maintaining code quality and reliability during the refactoring process.
Figure: Tracing a user profile update request
Main components of tracing
Tracer: The core component responsible for capturing and propagating tracing information throughout the system. It provides APIs for creating and managing traces, spans, and context propagation.
Span: Represents a unit of work or a logical operation within a system. Spans are used to track the execution of specific parts of a request as it traverses through various components.
Context Propagation: Mechanisms for passing tracing context between different components of a distributed system. This ensures that traces can be correlated across multiple services and tiers.
Correlation ID (unique ID): An identifier carried along with the propagated context. It uniquely identifies a request so that related spans and traces can be correlated as they pass through the components of a distributed system.
Instrumentation: The process of adding code to your application to generate tracing data. This involves instrumenting libraries, frameworks, and custom code to capture relevant information about request processing.
Data Store: The storage backend where trace data is stored for analysis and visualization. This could be a distributed database, a dedicated tracing platform, or an external storage service.
Visualization and Analysis Tools: Tools and dashboards for visualizing and analyzing trace data. These tools help developers and operators understand system behavior, identify performance bottlenecks, and troubleshoot issues.
These components work together to enable end-to-end tracing across distributed systems.
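To make the relationship between tracer, span, and correlation ID concrete, here is a minimal sketch in plain Java. It is not the API of any real tracing library: `SketchSpan`, `SketchTracer`, and their methods are hypothetical names chosen for illustration, and an in-memory list stands in for the data store.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified model of the components described above.
// Real tracing libraries expose richer APIs; these names are illustrative only.
final class SketchSpan {
    final String traceId;       // correlation ID shared by every span in one trace
    final String spanId;        // unique ID of this unit of work
    final String parentSpanId;  // null for the root span
    final String name;
    final Instant start;
    Instant end;                // set when the span is finished
    final Map<String, String> attributes = new LinkedHashMap<>();

    SketchSpan(String traceId, String parentSpanId, String name) {
        this.traceId = traceId;
        this.spanId = UUID.randomUUID().toString();
        this.parentSpanId = parentSpanId;
        this.name = name;
        this.start = Instant.now();
    }
}

final class SketchTracer {
    // Stands in for the data store: finished spans are collected here.
    final List<SketchSpan> finished = new ArrayList<>();

    // Starts a new trace with a fresh correlation ID.
    SketchSpan startRoot(String name) {
        return new SketchSpan(UUID.randomUUID().toString(), null, name);
    }

    // Starts a child span that inherits the parent's trace ID.
    SketchSpan startChild(SketchSpan parent, String name) {
        return new SketchSpan(parent.traceId, parent.spanId, name);
    }

    // Records the end time and hands the span to the "data store".
    void finish(SketchSpan span) {
        span.end = Instant.now();
        finished.add(span);
    }
}
```

A root span created with `startRoot` and a child created with `startChild` share the same `traceId`, which is what lets a visualization tool stitch them back into one request.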
How tracing works in more technical detail
The following steps illustrate how tracing works in a distributed system.
- The client application sends an HTTP GET request to /api/v1/users.
- The request is received by service A, which creates a new trace context with a correlation ID.
- The trace context, including the correlation ID, is propagated to service B.
- Service B records a new span for processing the request and propagates the trace context to the database.
- Instrumentation is applied to capture relevant information about the span, such as start time, end time, and metadata.
- Trace data is exported to a tracing collector, which stores it in a tracing backend for analysis and visualization.
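The propagation steps above can be sketched in a few lines of Java. The article does not say which header format carries the context, so this sketch assumes the W3C Trace Context `traceparent` header (`version-traceid-spanid-flags`); the class `TraceContextSketch` and its methods are hypothetical, and a plain map stands in for HTTP headers.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of trace context propagation between two services.
// The header layout follows the W3C Trace Context "traceparent" format;
// the class and method names are illustrative assumptions.
final class TraceContextSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    final String traceId; // 32 lowercase hex characters, shared by the whole trace
    final String spanId;  // 16 lowercase hex characters, unique per span

    TraceContextSketch(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    private static String randomHex(int byteCount) {
        byte[] buffer = new byte[byteCount];
        RANDOM.nextBytes(buffer);
        StringBuilder hex = new StringBuilder(byteCount * 2);
        for (byte b : buffer) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Service A: no incoming context, so start a new trace.
    static TraceContextSketch newRoot() {
        return new TraceContextSketch(randomHex(16), randomHex(8));
    }

    // Same trace ID, new span ID: a new unit of work inside the same trace.
    TraceContextSketch child() {
        return new TraceContextSketch(traceId, randomHex(8));
    }

    // Writes the context into the outgoing request headers ("01" marks it as sampled).
    void inject(Map<String, String> headers) {
        headers.put("traceparent", "00-" + traceId + "-" + spanId + "-01");
    }

    // Reads the context from incoming headers; falls back to a new trace
    // when the header is missing or malformed.
    static TraceContextSketch extract(Map<String, String> headers) {
        String value = headers.get("traceparent");
        if (value == null || !value.matches("00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}")) {
            return newRoot();
        }
        String[] parts = value.split("-");
        return new TraceContextSketch(parts[1], parts[2]);
    }

    public static void main(String[] args) {
        TraceContextSketch serviceA = newRoot();
        Map<String, String> headers = new HashMap<>();
        serviceA.inject(headers);                                // service A calls service B
        TraceContextSketch serviceB = extract(headers).child();  // service B continues the trace
        System.out.println(serviceA.traceId.equals(serviceB.traceId)); // prints true
    }
}
```

In practice an instrumentation library does the injecting and extracting for you; the point here is only that the trace ID travels with the request while each service adds its own span ID.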
We need to use instrumentation libraries to instrument code, initialize tracers, and create spans to represent work done for each request.
Trace context, including correlation IDs, is propagated between components to correlate spans. Automatic instrumentation and sampling reduce overhead, while exporters send trace data to tracing backends. Integration with logging and metrics systems provides a comprehensive view of system behavior.
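As a rough illustration of how sampling can reduce overhead, the sketch below keeps only a fixed fraction of traces and bases the decision on the trace ID, so every service that sees the same ID reaches the same decision. This is one possible approach, not the algorithm of any particular library; `RatioSamplerSketch` is a hypothetical name, and the trace ID is assumed to be a hex string of at least eight characters.

```java
// Hypothetical ratio-based sampler: keeps roughly `ratio` of all traces.
// The decision depends only on the trace ID, so it is consistent across services.
final class RatioSamplerSketch {
    private final long threshold;

    RatioSamplerSketch(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        // Scale the ratio to the range of a 32-bit unsigned prefix (2^32 values).
        this.threshold = (long) (ratio * 4294967296.0);
    }

    // Assumes traceIdHex holds at least 8 hexadecimal characters.
    boolean shouldSample(String traceIdHex) {
        long prefix = Long.parseLong(traceIdHex.substring(0, 8), 16);
        return prefix < threshold;
    }
}
```

With a ratio of 0.1, about one trace in ten is recorded when trace IDs are uniformly random, which trades completeness for lower overhead.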
Figure: Simplified workflow for tracing in a distributed system
How tracing enhances system visibility and performance
Fixing Problems: When something breaks, being able to see the whole picture helps us find and fix the problem faster.
Changing Stuff Safely: When we want to make changes, we can see how everything fits together. This makes it less likely that we’ll accidentally break something else.
Making Sure Things Work: Before we send our changes out into the real world, we can check to make sure everything still works as it should.
Making Things Faster: Seeing how everything connects helps us find ways to make our program run faster and smoother.
Keeping Everything Easy to Understand: By keeping track of how everything fits together, it’s easier for everyone on the team to understand how our program works.
Overall, being able to see how all the pieces of our program fit together helps us fix problems faster, make changes more safely, ensure everything works correctly, improve performance, and keep everything easy to understand for everyone on the team.
The most famous and widely used distributed tracing libraries
- OpenTelemetry is a popular open-source project for distributed tracing. It succeeded OpenTracing (now archived), merging it with OpenCensus, and lets developers instrument applications and gather trace data.
- Zipkin is an open-source distributed tracing system originally developed by Twitter. It provides instrumentation libraries for popular languages such as Java, Go, and JavaScript. Zipkin supports various backend storage systems, including Elasticsearch and Cassandra, and is widely used in the industry.
- Jaeger is an open-source distributed tracing system developed by Uber. It offers instrumentation libraries for popular languages such as Go, Java, Node, Python, and C++, along with features like adaptive sampling and integration with other observability tools. Jaeger is known for its scalability and performance in large-scale distributed systems.
These distributed tracing solutions are widely recognized and used by developers and organizations worldwide to monitor, debug, and optimize the performance of their distributed systems.
Detecting API usage with insights from Digma
Digma is an observability tool that provides feedback at runtime, while you’re developing a feature. You can set it up by installing the Digma IntelliJ IDEA plugin. It’s straightforward to use and helps you identify issues in the early stages of development.
Digma gathers observability data following the OTEL standard, analyzes the data, and then shares insights via the IDE plugin.
Currently, Digma fully supports Java and IntelliJ, along with related frameworks such as Spring, Spring Boot, Dropwizard, and Micronaut. If you’re interested in other languages, see the Digma documentation for more information.
Getting started with Digma
How to install the Digma plugin and use it
Let’s explore one example to learn how Digma can help us in the refactoring process
We have a service implemented in Spring Boot and Java that exposes multiple versions of an endpoint. We aim to deprecate the oldest version. Considering backward compatibility, we need to verify if the oldest version is still in use and inform its users accordingly.
- Make sure to install the Digma IDE plugin and enable it.
- Instrument the code using Automatic Instrumentation in the IDE. The plugin lets us add observability to each part of our code.
To explore the insights offered by the Digma IDE plugin, run the application locally. Digma is free to use locally.
Usage Analysis
We’ll leverage Digma’s Usage Analysis feature to identify which version of our endpoint is being used. In the event we discover usage of the old version, we’ll ensure our clients are notified before removal.
Figure: Digma plugin insights for Usage Analysis
In the Observability section, clicking a call shows more observability details, such as Endpoint Low/High Usage and Top Usage, which give a deeper understanding of how our system is used.
Observability
Top Usage
This feature shows how much a specific endpoint is used:
- Who is using the given endpoint
- The insight status, which can be Active or Evaluating
- The percentage of usage that each flow accounts for
- A link to the trace of the current flow, available by clicking the action part
Now that we can see that our endpoint is used, it’s essential to communicate with the teams that rely on our services to ensure a smooth transition.
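To give a feel for what a per-caller usage percentage means, here is a small sketch that computes each caller's share of the requests to one endpoint. It is not how Digma computes its insights, which the article does not describe; `CallerShareSketch`, the service names, and the `{caller, endpoint}` pair layout are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: share of calls per caller for a single endpoint.
final class CallerShareSketch {
    // Each call is a two-element array: {callerService, endpoint}.
    static Map<String, Double> shareByCaller(List<String[]> calls, String endpoint) {
        Map<String, Integer> counts = new TreeMap<>();
        int total = 0;
        for (String[] call : calls) {
            if (call[1].equals(endpoint)) {
                counts.merge(call[0], 1, Integer::sum);
                total++;
            }
        }
        // Convert raw counts into percentages of the endpoint's total traffic.
        Map<String, Double> shares = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            shares.put(entry.getKey(), 100.0 * entry.getValue() / total);
        }
        return shares;
    }
}
```

An empty result for the oldest endpoint version would suggest that no traced caller still depends on it, while a non-empty one tells us exactly which teams to contact.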
Endpoint Low/High Usage
This section tells us how much an endpoint is used, such as the number of requests sent to this endpoint within a certain timeframe.
Here, we can also verify whether a specific endpoint receives traffic, providing valuable insights into the usage of that endpoint and aiding in the refactoring process.
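The idea of counting requests per endpoint within a timeframe can be sketched as follows. This is an illustrative calculation over hypothetical recorded calls, not Digma's implementation; `RecordedCall` and `UsageWindowSketch` are made-up names, and the endpoint strings are examples.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical record of one traced request to an endpoint.
final class RecordedCall {
    final String endpoint; // for example "GET /api/v1/users"
    final Instant at;      // when the request was received

    RecordedCall(String endpoint, Instant at) {
        this.endpoint = endpoint;
        this.at = at;
    }
}

final class UsageWindowSketch {
    // Counts calls per endpoint whose timestamp lies in [now - window, now].
    static Map<String, Integer> countByEndpoint(List<RecordedCall> calls, Instant now, Duration window) {
        Instant cutoff = now.minus(window);
        Map<String, Integer> counts = new TreeMap<>();
        for (RecordedCall call : calls) {
            if (!call.at.isBefore(cutoff) && !call.at.isAfter(now)) {
                counts.merge(call.endpoint, 1, Integer::sum);
            }
        }
        return counts;
    }

    // An endpoint with no calls in the window is a candidate for removal.
    static boolean receivedTraffic(Map<String, Integer> counts, String endpoint) {
        return counts.getOrDefault(endpoint, 0) > 0;
    }
}
```

Choosing the window matters: an endpoint that is only called by a monthly batch job looks unused over 24 hours, so a longer observation period is safer before deprecating anything.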
Tracing
You can access traces for each endpoint by clicking “Trace” within the Observability section.
Conclusion: Using traces to avoid breaking changes
Tracing is a valuable tool in modern software development for observing how different parts of a program interact, ultimately helping to prevent issues when making changes. By providing visibility into system behavior and performance, tracing enables developers to diagnose problems, optimize performance, and ensure reliability.
Through the utilization of tracing libraries and tools like Digma, developers can gain valuable insights into their applications, facilitating effective maintenance, debugging, and optimization efforts.