Performance testing should not merely be a box ticked on a checklist.

A Guide to Performance Testing: From Results to CI Pipeline

R&D teams gain valuable insights into application behavior by running performance tests automatically in the CI pipeline and acting on the results. Integrating performance testing into CI pipelines provides actionable insights that help identify bottlenecks, optimize scalability, and improve team velocity: performance issues are found and resolved early, reducing disruptions and accelerating delivery cycles.

While the amount of data can be overwhelming, turning it into actionable insights doesn’t have to be.

In this guide, we’ll explore the process of performance testing and look at tools that can help us along the way.

Understanding your test results

When you first look at your performance test results, you’ll likely be confronted with a wall of numbers and graphs.

This can be quite daunting at first glance, but the trick is to break it down, look at the key elements, and, if not already done, process the data a bit further to make it more digestible.

For example, our test results could look like this:

----
{
  "summary": {
    "totalRequests": 10000,
    "duration": 300,
    "successRate": 99.2,
    "metrics": {
      "responseTime": {
        "mean": 245,
        "p95": 450,
        "p99": 850
      },
      "throughput": {
        "mean": 33.3,
        "peak": 45.2
      },
      "errors": {
        "count": 80,
        "rate": 0.8
      }
    }
  }
}
----


Interpreting our key metrics

Generally speaking, although this can vary from business to business, and even from application to application, the key metrics fall into four categories:

  • Response times
  • Throughput
  • Error rates
  • Resource utilization

Let’s take a look at them using the following dataset:

[source]
----
80 requests take 200ms
15 requests take 300ms
4 requests take 450ms
1 request takes 2500ms
----


Response times: beyond the averages

Response time is usually one of the first metrics we look at, but we need to interpret it carefully and take a proper look at the distribution.

Averages can be misleading

An average might not tell us the full story. In our dataset, the average suggests decent performance, yet there is a significant outlier affecting the user experience.

Percentile analysis

The sample dataset gives us the following numbers:

[source]
----
Average: ~245ms
Median (p50): 200ms (half of requests are faster)
p95: 300ms (95% of requests are faster)
p99: 450ms (99% of requests are faster)
Maximum: 2500ms (worst case)
----

The p95 (300ms) shows that most users have a reasonably consistent, fast experience, while the jump to p99 (450ms) indicates performance degradation for some requests. We also notice a stark difference between p99 (450ms) and the maximum (2500ms), which means we should investigate the root cause.
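
To make this concrete, here is a minimal Python sketch that derives these statistics from the sample distribution above. It assumes the raw response times are available as a plain list; in practice they would come from your load-testing tool’s raw output, and the exact average differs slightly from the rounded summary figures.

[source,python]
----
# Minimal sketch: derive the key latency statistics from raw response times.
from statistics import mean, median

# The sample distribution from above, expanded into individual samples.
response_times = [200] * 80 + [300] * 15 + [450] * 4 + [2500] * 1

def percentile(values, pct):
    """Return the nearest-rank percentile of the samples."""
    ordered = sorted(values)
    index = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[index]

print(f"Average: {mean(response_times):.0f}ms")     # ~248ms for this sample
print(f"Median (p50): {median(response_times)}ms")  # 200ms
print(f"p95: {percentile(response_times, 95)}ms")   # 300ms
print(f"p99: {percentile(response_times, 99)}ms")   # 450ms
print(f"Maximum: {max(response_times)}ms")          # 2500ms
----

Running this shows how a single 2500ms outlier barely moves the average while dominating the maximum, which is exactly why we look at percentiles rather than averages alone.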

Investigate whether there are any patterns in slow responses

Once we have our results, we can start looking for patterns. Some easy spot checks (see the sketch after this list) are:

  • The time of day
  • The called endpoint
  • The payload size
  • The user’s location
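
As a minimal sketch of such a spot check, we could group the slow responses and see where they cluster. The record structure below (timestamp, endpoint, duration) is an assumption for illustration; adapt it to whatever your load-testing tool actually exports.

[source,python]
----
# Illustrative sketch: group slow requests to spot patterns.
from collections import Counter
from datetime import datetime

requests = [
    {"timestamp": "2024-05-01T09:15:00", "endpoint": "/api/orders", "duration_ms": 2500},
    {"timestamp": "2024-05-01T09:20:00", "endpoint": "/api/orders", "duration_ms": 450},
    {"timestamp": "2024-05-01T14:05:00", "endpoint": "/api/search", "duration_ms": 210},
    # ... the rest of the raw results
]

SLOW_THRESHOLD_MS = 400  # anything above our p95 target counts as "slow"

slow = [r for r in requests if r["duration_ms"] > SLOW_THRESHOLD_MS]

by_endpoint = Counter(r["endpoint"] for r in slow)
by_hour = Counter(datetime.fromisoformat(r["timestamp"]).hour for r in slow)

print("Slow requests per endpoint:", by_endpoint.most_common())
print("Slow requests per hour of day:", by_hour.most_common())
----

Beyond response times, the remaining metric categories deserve the same scrutiny: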

* Throughput
** How many requests per second can we handle
** At how many requests do we start having drops or instability
** How well do our systems scale under user load

* Error rates
** Are there any error rates above 1 percent
** Grouping errors by type can help us identify patterns
** Look for correlations between errors and increased load

* Resource utilization
** We should verify whether our CPU usage indicates a bottleneck
** Look into your memory growth for indications of a memory leak

* What are our service level agreements, and do they align with our user expectations

Investigate your I/O wait times

We should look into read/write patterns and possible bottlenecks in our disk activity and, on the network side, into latency, bandwidth, and connection pool usage.
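
As one possible way to capture these resource and I/O figures alongside a test run, here is a small sketch using the third-party psutil library. This is an assumption on my part; a metrics agent such as a Prometheus node exporter or an APM agent serves the same purpose in a real setup.

[source,python]
----
# Sketch: sample CPU, memory, disk and network counters while a test runs.
# Requires the third-party psutil package (pip install psutil).
import psutil

def sample_resources(duration_s: int = 60, interval_s: int = 5):
    samples = []
    for _ in range(duration_s // interval_s):
        # cpu_percent(interval=...) blocks for one interval, pacing the loop.
        cpu = psutil.cpu_percent(interval=interval_s)
        disk = psutil.disk_io_counters()
        net = psutil.net_io_counters()
        samples.append({
            "cpu_percent": cpu,
            "memory_percent": psutil.virtual_memory().percent,
            "disk_read_bytes": disk.read_bytes,
            "disk_write_bytes": disk.write_bytes,
            "net_sent_bytes": net.bytes_sent,
            "net_recv_bytes": net.bytes_recv,
        })
    return samples

if __name__ == "__main__":
    for sample in sample_resources(duration_s=30, interval_s=5):
        print(sample)
----

In a CI setting you would typically rely on the monitoring stack discussed later (Prometheus, exporters, APM tools) rather than ad-hoc scripts, but the data you want to correlate is the same.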

Common performance bottlenecks

When looking at our results, certain patterns can indicate specific types of issues.

Database Bottlenecks

An often overlooked cause of application slowdowns is your database.

Symptoms and Indicators

* **Response Time Patterns**:
** Consistent delays at specific load thresholds
** Step-function increases in response times
** Sudden spikes during concurrent operations
** Plateaus in throughput despite increased load

Common Issues

* **Connection Management**:
** Pool exhaustion leading to request queuing
** Maximum concurrent connection limits
** Connection leak detection
** Connection acquisition timeouts

* **Query Performance**:
** N+1 query patterns showing linear scaling with data volume (illustrated in the sketch after this list)
** Inefficient query plans under load
** Index fragmentation impacts
** Buffer cache utilization issues
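
To illustrate the N+1 pattern named above, here is a hypothetical sketch; the table and column names are made up, and `db` stands in for a sqlite3-style connection with `row_factory = sqlite3.Row`. The first variant issues one extra query per order, so its cost grows linearly with the data volume, while the second fetches everything in a single round trip.

[source,python]
----
# Hypothetical example of an N+1 query pattern versus a single joined query.
# `db` is assumed to behave like a sqlite3 connection with sqlite3.Row rows.

def load_orders_n_plus_one(db):
    orders = db.execute("SELECT id, customer_id FROM orders").fetchall()
    for order in orders:
        # One extra query *per order*: 1 + N round trips in total.
        db.execute(
            "SELECT name FROM customers WHERE id = ?", (order["customer_id"],)
        )

def load_orders_single_query(db):
    # One round trip, letting the database do the join.
    return db.execute(
        "SELECT o.id, c.name "
        "FROM orders o JOIN customers c ON c.id = o.customer_id"
    ).fetchall()
----

With an ORM the N+1 pattern usually hides behind lazy loading, but the shape of the problem, and the linear scaling it causes under load, is the same.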

Monitoring Metrics

* **Key Indicators**:
** Active connections vs. pool size
** Query execution time trends
** Lock contention rates
** Cache hit ratios
** Transaction throughput

Memory

We all know the alleged Bill Gates quote `640K ought to be enough for anybody`, but we certainly shouldn’t ignore our memory consumption.

Observable Patterns

* **Response Time Degradation**:
** Progressive slowdown over time
** Periodic spikes during garbage collection
** Non-linear scaling with load increase

Memory Utilization

* **Heap Management**:
** Continuous heap growth without recovery
** Memory leak detection patterns
** Old generation saturation
** Direct memory buffer allocation

Performance Impacts

* **System Behavior**:
** Increased GC frequency
** Extended GC pause durations
** Memory fragmentation
** Swap space utilization

Network

Our testing can expose significant issues on the network level, which can be especially impactful if low latency is key or if we have a wide geographical distribution.

Latency Patterns

* **Response Characteristics**:
** Variable response times
** Increased network round-trip times
** Connection establishment delays
** Timeout frequency

Bandwidth Utilization

* **Resource Constraints**:
** Network interface saturation
** Packet loss rates
** TCP retransmission counts
** Buffer overflow events

Connection Management

* **System Limits**:
** TCP connection pool exhaustion
** Keep-alive connection management
** Connection timeout patterns
** Load balancer capacity limits

Setting performance baselines

It’s not just about Service Level Agreements (SLAs); there should also be Service Level Objectives (SLOs).

We first need to perform initial benchmarking, which entails:

* Run tests during normal circumstances
* Document typical response times
* Record resource utilization patterns
* Note the normal behavior of our dependencies

We can then use this information to determine our acceptance criteria (a sketch of turning these into automated checks follows the list below).

* What are the upper and lower bounds for our key metrics
* What are our business requirements
* Account for peak/business hours versus quiet hours
* Include special considerations if applicable (holiday sales for example)
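
One way to make such acceptance criteria executable is to keep them next to the test code and compare each run’s summary against them. The structure, metric names, and numbers below are purely illustrative.

[source,python]
----
# Illustrative sketch: encode acceptance criteria and compare a test run
# against them. Thresholds and metric names are example values only.
ACCEPTANCE_CRITERIA = {
    "response_time_p95_ms": {"max": 500},
    "response_time_p99_ms": {"max": 1000},
    "error_rate_percent": {"max": 1.0},
    "throughput_rps": {"min": 30.0},
}

def check_run(summary: dict) -> list[str]:
    """Return a list of human-readable violations for this run."""
    violations = []
    for metric, bounds in ACCEPTANCE_CRITERIA.items():
        value = summary[metric]
        if "max" in bounds and value > bounds["max"]:
            violations.append(f"{metric}={value} exceeds max {bounds['max']}")
        if "min" in bounds and value < bounds["min"]:
            violations.append(f"{metric}={value} below min {bounds['min']}")
    return violations

# Example using the numbers from the sample result earlier in this guide:
run = {
    "response_time_p95_ms": 450,
    "response_time_p99_ms": 850,
    "error_rate_percent": 0.8,
    "throughput_rps": 33.3,
}
print(check_run(run) or "All acceptance criteria met")
----

Later in this guide the same idea appears in the CI pipeline as a thresholds file consumed by an analysis step.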

Red flags and warning signs

Properly observing our systems can help us tackle issues early on, prevent a lot of challenges later, and avoid strained relationships further down the line.

Generally, I like to look for the following items:

Critical red flags

Response time anomalies

* **Sudden increases**:
** Unexpected latency spikes
** Response time deviation from the baseline
** Service-specific delays
** API endpoint slowdowns

Error rate monitoring

* **Critical Indicators**:
** HTTP 5xx error rate increases
** Failed transactions surge
** Connection failures
** Authentication/Authorization errors
** Database timeout frequencies

Resource consumption spikes

* **System Resources**:
** CPU utilization jumps
** Memory usage spikes
** Disk I/O saturation
** Network bandwidth surges
** Database connection pool exhaustion

Timeout patterns

* **Service Timeouts**:
** API request timeouts
** Database query timeouts
** Network connection timeouts
** Load balancer timeouts
** Cache access delays

Gradual system degradation

Performance deterioration

* **Response time trends**:
** Steadily increasing latency
** Growing processing times
** Slower database queries
** Extended queue processing times

* **Consumption patterns**:
** Memory leak indicators
** Disk space usage growth
** Database size expansion
** Log file accumulation
** Cache size increases

Throughput issues

* **Performance metrics**:
** Decreasing requests per second
** Reduced transaction rates
** Lower concurrent user capacity
** Diminishing queue processing rates

Garbage collection impact

* **JVM Health**:
** Increasing GC frequency
** Longer GC pause times
** Growing heap usage
** Old generation growth
** Reduced memory reclamation

Pattern Changes

Error pattern analysis

* **New error types**:
** Unexpected exception types
** Novel error messages
** Unfamiliar stack traces
** Changed error frequencies
** New dependency failures

Resource usage anomalies

* **Unexpected patterns**:
** Off-hours resource consumption
** Unusual scaling patterns
** Abnormal CPU/memory ratios
** Unexpected I/O patterns
** Network traffic anomalies

Performance spikes

* **Unexplained variations**:
** Random performance spikes
** Periodic latency increases
** Intermittent resource usage
** Sporadic error bursts
** Unexpected service delays

Throughput variations

* **Pattern changes**:
** Unusual traffic patterns
** Changed peak hours
** Unexpected quiet periods
** Modified usage distributions
** Altered concurrent user patterns

Monitoring and response strategies

Proactive monitoring

* **Implementation**:
** Real-time metrics tracking
** Automated alerting systems
** Pattern recognition algorithms
** Baseline deviation detection
** Trend analysis tools

Alert thresholds

* **Configuration**:
** Dynamic threshold adjustment
** Multiple severity levels
** Compound alert conditions
** Time-based thresholds
** Pattern-based triggers

Response procedures

* **Action Plans**:
** Immediate investigation triggers
** Escalation protocols
** Emergency response procedures
** Recovery playbooks
** Post-incident analysis

Documentation requirements

* **Record keeping**:
** Incident logs
** Pattern change documentation
** Resolution tracking
** Root cause analysis
** Preventive measure implementation

Taking action

Now that we have a better understanding of our results, we can start tackling the real challenge: prioritizing and resolving the issues.

Prioritizing performance issues

Not all performance issues have the same (immediate) impact, and we only have so much time, so it is important to prioritize them properly.

To help assess this, I recommend following these steps:

* Assess the impact
** How many users/transactions are impacted, and how critical are they?
** How significant is the degradation?
** What is the impact on the overall user experience?
** Could this lead to compliance violations or revenue loss?

* Perform an effort versus benefit analysis
** How much time will it take to resolve the issue?
** Does it address a recurring, or systemic problem?
** What are the long-term maintenance costs?
** Will there be performance gains?

* Consider the technical debt
** Is this a short- or long-term solution?
** Should this be tackled as part of a larger activity?
** What are the involved risks?
** Will it impact future development and/or scalability?

* Impact on the user experience
** Are there viable workarounds?
** Are critical paths impacted?
** Are there mitigation options such as caching?
** How noticeable is it for the end consumer?

Documenting

To get proper insights into the results and their evolution over time, it’s important that we properly document our procedures and results.

* Test scenario details
** Describe the test cases and their objectives
** Describe the basic premise, such as the load profile, client behavior, and the data set(s)
** Track assumptions and environmental constraints

* Environment configuration
** Note the hardware, software (including the patch levels!) and network configuration
** It’s important to be aware of differences between the used environment and production
** Keep track of external dependencies

* Known limitations
** Are there known risks, or things we are uncertain of?
** Document things that are key to properly interpreting the results
** Are there real-world conditions we cannot fully mimic?

* Action items and recommendations
** List the performance issues by priority
** Detail the needed steps to tackle each issue
** Provide an estimate of the involved benefits, effort and potential risks
** Collaborate on a timeline to resolve these issues

Communicating results to the stakeholders

It’s important to communicate the data effectively to our target audience, and thus to tailor it accordingly.

There’s a time and place for everything.

Overall, there are a couple of forms that are commonly used:

* Executive summaries
** Lead with your key findings, and the impact on business processes
** Correlate your metrics with business Key Performance Indicators
** Keep your key audience in mind when deciding how to convey information
** Provide a clear cost/benefit breakdown
** Focus on graphs and high-level numbers, not detailed breakdowns

* Technical deep-dives
** Include detailed analysis of bottlenecks to properly discuss the findings
** Refer to specific test scenarios, and the conditions under which issues happened
** Include detailed metrics and graphs
** Discuss technical solutions and their (dis)advantages

* Visual representations
** Clearly label everything
** Highlight critical thresholds and SLA/SLO breaches
** Show trends over time
** Include before and after comparisons
** Provide heat maps for response time distributions

* Risk assessments
** Identify (potential) performance risks
** Specify the impact on system stability
** Cover mitigation strategies
** Tackle both short-term and long-term risks

Creating performance improvement plans

(This is not the PIP you’re dreading!)

Once we have clear insight into where the challenges lie and their respective importance, we can start defining actions.

* Short-term fixes
** Quick wins that can be implemented “easily”
** Emergency mitigations for critical issues
** Configuration changes such as cache size
** Temporary workarounds while a long-term solution is developed

* Long-term optimization
** Scalability improvements
** Code refactoring/modernization
** Infrastructure upgrades
** Architectural enhancements
** Review caching strategies

* Resource allocation
** Budget
** Infrastructure resources
** External requirements (third-party services for example)
** Team engagement

* Timeline planning
** Phased implementation
** Realistic deadlines, with consideration for cross-team dependencies
** Account for testing, validation, and system integration testing
** Define a rollback plan

Observability Integration

As applications become more distributed and complex, traditional monitoring approaches often fall short in providing the necessary insights. 

Integrating modern observability platforms can greatly enhance our performance testing efforts and provide richer visibility into our systems.

Observability Platforms

Some major players we can leverage are:

  • **Prometheus**: Open-source monitoring and alerting system, ideal for containerized environments.
  • **Grafana**: Data visualization and dashboard platform, seamlessly integrates with Prometheus.
  • **Jaeger**: Open-source, end-to-end distributed tracing system.
  • **Elastic Stack (ELK)**: Powerful log aggregation, search, and analysis platform.
  • **Datadog**: Comprehensive observability platform with metrics, tracing, and log management.
  • **New Relic**: Fullstack observability solution with application performance monitoring (APM).

These platforms can provide deeper insights into our application’s behavior, infrastructure performance, and the end-to-end user experience.

Integrating OpenTelemetry

OpenTelemetry is an open-source observability framework that provides a vendor-neutral API for collecting and sending telemetry data. 

By using OpenTelemetry, we can instrument our application once and send data to various observability backends.

We can also leverage this to move observability earlier in our development lifecycle and inspect the impact of our changes using tools such as https://digma.ai[Digma].

Key Benefits of OpenTelemetry:

  • **Vendor Neutrality**: Avoid lock-in to a specific observability platform.
  • **Unified Data Model**: Consistent approach to metrics, traces, and logs.
  • **Language Support**: Instrumentation libraries for multiple programming languages.
  • **Automated Instrumentation**: Reduce manual instrumentation efforts.
  • **Distributed Tracing**: End-to-end visibility across services and transactions.

Integrating OpenTelemetry into Your Pipeline

1. **Instrument Your Application**: Use OpenTelemetry SDKs to instrument your application code, libraries, and frameworks (see the sketch after these steps).

2. **Configure Data Exporters**: Set up exporters to send telemetry data to your chosen observability platform(s).

3. **Integrate into CI/CD**: Incorporate OpenTelemetry data collection into your test execution workflows.

4. **Leverage Observability Dashboards**: Create custom dashboards to visualize performance metrics and traces.

5. **Set Up Alerting and Notifications**: Configure alerts based on performance thresholds and anomalies.

6. **Perform Root Cause Analysis**: Use distributed tracing to identify performance bottlenecks and errors.
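
As a minimal sketch of steps 1 and 2, here is what instrumenting a critical operation could look like with the OpenTelemetry Python SDK and an OTLP exporter. The service name, endpoint, and operation are assumptions; your language, collector address, and backend will differ.

[source,python]
----
# Minimal OpenTelemetry sketch: set up tracing and export spans via OTLP.
# Requires opentelemetry-sdk and opentelemetry-exporter-otlp; the endpoint
# below is an assumption and should point to your collector or backend.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "checkout-service"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

def process_order(order_id: str) -> None:
    # Each call produces a span that ends up in your observability backend.
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        # ... business logic under test ...

process_order("42")
----

The same instrumentation then feeds the dashboards, alerting, and root cause analysis described in the remaining steps.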

By integrating observability platforms and leveraging standards like OpenTelemetry, we can gain deeper insights into our application’s performance, enabling us to make more informed decisions and optimize our systems more effectively.

Integrating with CI

Now that we’ve seen what a performance plan can entail and how we can convey it to our stakeholders, let’s take a look at how we can make it part of our continuous integration (CI) process.

This way we can catch potential performance regression issues before they reach production.

Comprehensive test management for cloud resources

We need to properly manage our cloud resources and select which tools we want to use.

Setting Up Our Tests

To properly set up and manage our tests, a lot of tools can come to our aid.

Tool Selection

* **Performance Testing Tools**:
** Load generators
*** JMeter: Suitable for complex scenarios and distributed testing
*** K6: Modern, developer-centric approach with JavaScript
*** Gatling: Scala-based tool with excellent reporting
** Monitoring solutions
*** Prometheus: Time-series metrics with powerful querying
*** Grafana: Visualization dashboards for metrics
*** Custom exporters for specific metrics
** APM tools
*** New Relic: Full-stack observability
*** Datadog: Infrastructure and application monitoring
*** Dynatrace: Full-stack observability
** Log aggregation systems
*** ELK Stack: Elasticsearch, Logstash, Kibana
*** Splunk: Enterprise-grade log management
*** Loki: Lightweight log aggregation

Pipeline configuration

When we’re defining our pipeline on our CI server, be it Azure DevOps, Jenkins, etc., there are some things to keep in mind:

* Automated test triggers, such as:
** On a schedule (daily/weekly performance tests)
** Post-deployment verification
** Pre-release validation
** On specific branch merges
* Resource provisioning steps
** Invest in Infrastructure-as-Code deployment
** Environment configuration
** Data seeding
** Service dependencies setup
* Environment validation checks
** Connectivity verification
** Service health checks
** Resource availability
** Configuration validation
* Test execution workflows
** Warm-up period management
** Load profile execution
** Data collection
** Results analysis
* Cleanup procedures
** Resource termination
** Data cleanup
** Environment reset
** Cost optimization

A GitHub Actions workflow covering these steps could look like this:

----
name: Performance Tests

on:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight
  push:
    branches: [ main ]
  workflow_dispatch:

jobs:
  performance-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      
      - name: Setup Infrastructure
        run: |
          terraform init
          terraform apply -auto-approve
          
      - name: Validate Environment
        run: ./scripts/validate-env.sh
        
      - name: Run Performance Tests
        run: |
          k6 run load-tests/main.js \
            --out json=results.json \
            --out influxdb=http://localhost:8086/k6
            
      - name: Analyze Results
        run: |
          ./scripts/analyze-results.sh \
            --input results.json \
            --thresholds thresholds.yml
            
      - name: Cleanup Resources
        if: always()
        run: terraform destroy -auto-approve
----
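
The `Analyze Results` step above delegates to `./scripts/analyze-results.sh`, which isn’t shown here. As an illustration of what such a gate might do, here is a sketch only: it assumes the raw k6 output has already been reduced to the simple summary structure shown at the start of this guide (which is not k6’s native JSON format) and that the thresholds file uses illustrative keys.

[source,python]
----
# Sketch of a CI gate: compare a reduced results summary against thresholds
# and exit non-zero so the pipeline fails on a regression. File formats and
# field names are assumptions for illustration.
import json
import sys

import yaml  # third-party: PyYAML

def main(results_path: str, thresholds_path: str) -> int:
    with open(results_path) as fh:
        metrics = json.load(fh)["summary"]["metrics"]
    with open(thresholds_path) as fh:
        thresholds = yaml.safe_load(fh)

    failures = []
    if metrics["responseTime"]["p95"] > thresholds["response_time_p95_ms"]:
        failures.append("p95 response time above threshold")
    if metrics["errors"]["rate"] > thresholds["error_rate_percent"]:
        failures.append("error rate above threshold")

    for failure in failures:
        print(f"FAILED: {failure}")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main("results.json", "thresholds.yml"))
----

Returning a non-zero exit code is what actually makes the analysis step fail the workflow and stop a regression from progressing.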

Test data management

To have clear datasets and reproducible results, we need to properly consider how we manage our test data. Some considerations are:

* Test database provisioning
** Containerized databases
** Cloud-managed instances
** Local development databases
* Data seeding mechanisms
** Synthetic data generation (see the sketch after this list)
** Anonymized production data
** Fixed test datasets
* State reset procedures
** Database cleanup
** Cache invalidation
** Queue purging
** File system cleanup
* Backup and restore protocols
** Snapshot management
** Point-in-time recovery
** Data versioning
** Rollback procedures
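
For the synthetic data generation mentioned above, a small sketch using the third-party Faker library (one option among many; the table schema is made up) could seed a test database with reproducible data:

[source,python]
----
# Sketch: generate reproducible synthetic customers for a test database.
# Requires the Faker package; the SQLite database and schema are illustrative.
import sqlite3

from faker import Faker

fake = Faker()
Faker.seed(42)  # fixed seed so every test run gets the same dataset

conn = sqlite3.connect("perf-test.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, city TEXT)"
)
rows = [(fake.name(), fake.email(), fake.city()) for _ in range(10_000)]
conn.executemany("INSERT INTO customers (name, email, city) VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()
----

Pairing this with the state reset procedures above ensures every run starts from the same data.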

Environment preparation

As we’ve already touched upon, properly managing our environment is vital to be certain of the quality and consistency of our results.

* Infrastructure-as-Code templates
** Terraform configurations
** CloudFormation templates
** Pulumi scripts
** Ansible playbooks
* Configuration management
** Environment variables
** Config maps
** Secrets management
** Feature flags
* Security group setup
** Access control rules
** Network policies
** Service accounts
** Role bindings
* Network configuration
** VPC setup
** Subnet allocation
** Load balancer configuration
** Service mesh rules
* Service dependencies
** (External) service mocks to prevent undue influences
** Database replicas
** Cache instances

Test execution strategy

It’s important to have a thorough checklist to make sure we didn’t forget anything.

1. **Pre-Test Phase**
* Environment validation
* Warm-up period execution
* Baseline metrics collection
* Resource availability check

2. **Test Execution**
* Progressive load increase
* Metric collection
* Error logging
* Real-time monitoring

3. **Post-Test Analysis**
* Results aggregation
* Threshold comparison
* Report generation
* Notification dispatch

4. **Resource Management**
* Environment cleanup
* Cost calculation
* Resource termination
* State preservation

Defining our criteria

It’s important to clearly define the criteria we will be applying for our tests.

Threshold settings

* **Performance Metrics**:
** Response time limits
** Throughput requirements
** Error rate thresholds
** Resource utilization caps
** Concurrency limits

SLA compliance

* **Service Level Requirements**:
** Availability targets
** Response time guarantees
** Transaction throughput
** Recovery time objectives
** Error budget allocation

Acceptable deviation

* **Variance Tolerance**:
** Statistical significance levels
** Performance degradation limits
** Baseline comparison ranges
** Seasonal adjustment factors
** Environmental variations

Performance budgets

* **Resource Allocation**:
** CPU utilization targets
** Memory consumption limits
** Network bandwidth allocation
** Storage I/O thresholds
** Cost constraints

Automated reporting

Automated reporting should ideally be configured so that we continuously stay on top of all developments.

Metric collection

* **Data Gathering**:
** Real-time performance metrics
** Resource utilization statistics
** Error logs and traces
** Business KPIs
** Cost analytics

Trend analysis

* **Performance Patterns**:
** Historical comparison
** Regression detection
** Capacity forecasting
** Seasonal patterns
** Anomaly detection

Alert configuration

* **Notification System**:
** Threshold-based alerts
** Trend-based warnings
** Error rate monitoring
** Resource exhaustion alerts
** Cost overrun notifications

Dashboard creation

* **Visualization**:
** Real-time monitoring boards
** Historical trend displays
** Resource utilization panels
** Cost analysis views
** SLA compliance tracking

Managing test data

Data generation strategies

* **Test data creation**:
** Synthetic data generation
** Production data sampling
** Anonymization procedures
** Scale factor management
** Data distribution patterns

Data cleanup

* **Maintenance procedures**:
** Automated cleanup jobs
** Data retention policies
** Storage optimization
** Archive procedures
** Deletion verification

Version control

* **Data management**:
** Dataset versioning
** Schema version control
** Migration scripts
** Rollback procedures
** Change tracking

Sensitive data handling

* **Security measures**:
** Data masking
** Encryption protocols
** Access control
** Audit logging
** Compliance validation

Handling test failures

Failure analysis

* **Investigation process**:
** Root cause analysis
** Error pattern identification
** Performance regression detection
** Environment validation
** Configuration verification

Retry strategies

* **Recovery attempts**:
** Automatic retry policies
** Backoff algorithms
** Failure thresholds
** Alternative paths
** Fallback procedures

Notification system

* **Communication flow**:
** Alert routing rules
** Escalation paths
** Status updates
** Report distribution
** Stakeholder communication

Recovery procedure

* **Restoration steps**:
** Environment reset
** Data cleanup
** Configuration restore
** Service restart procedures
** Verification checks

Best practices

There are a couple of best practices I can recommend to keep your tests running smoothly:

1. **Version control**
* Keep test scripts, configurations, and infrastructure code in version control
* Use semantic versioning for test scenarios
* Tag test data snapshots

2. **Resource management**
* Implement automatic cleanup procedures
* Use resource tagging for cost allocation
* Set up budget alerts

3. **Documentation**
* Maintain detailed test scenarios
* Document environment requirements
* Keep your runbooks updated
* Track changes and decisions

4. **Security**
* Rotate test credentials regularly
* Encrypt sensitive test data
* Audit access to test environments
* Follow the least privilege principle

Procedures

I recommend performing the following tasks to enhance the repeatability of your tests.

Pre-Test Setup

– **Document resource baseline**: Capture a detailed inventory of cloud resources, configurations, and dependencies before each test. This provides a clear starting point for comparison and prevents later doubts.

– **Establish cleanup checkpoints**: Define specific checkpoints during the test lifecycle where resource cleanup/actualization should occur, such as before scaling up load, between test iterations, and at the end.

– **Define resource lifecycle policies**:  Establish clear rules for resource (de)provisioning and retention based on test requirements and cost optimization strategies.

– **Create rollback procedures**: Prepare step-by-step instructions for quickly reverting the environment to a known good state in case of issues during testing.

Post-Test Procedures

– **Verify complete resource termination**: Audit the environment to ensure all temporary or test-specific resources have been cleaned up.

– **Validate billing impact**: Verify whether the actual costs align with the expectations.

– **Archive test results and metrics**: Secure all the performance data, logs and other generated artifacts for future usage.

– **Update resource inventory**: Refresh the documentation with our findings, and any newly used resources during our testing.

Emergency Procedures

– **Define emergency cleanup protocols**
– **Maintain cleanup script repository**
– **Document manual intervention steps**
– **Establish escalation paths**

Test environment consistency

To have reliable and useful measurements, it’s important to have a consistent server, application, and load balancer setup.

So ideally we’d ensure that we account for the following:

– **Standardized Configurations**: We should use infrastructure-as-code (IaC) tools to provision test environments with well-defined, reproducible configurations.

– **Versioned Dependencies**: Track and manage the specific versions of your operating system(s), runtimes, libraries, and other dependencies used in the test environment.

– **Automated Provisioning**: Implement automated scripts or workflows to set up the test environment, reducing the potential for manual errors or configuration drift.

– **Immutable Infrastructure**: Treat the test environment as disposable and recreate it from a known good state for each test run, rather than relying on in-place modifications.

Version control

We should use a version control system like Git to manage all artifacts related to our performance testing and load balancing, including:

– **Test Scripts**:  Maintain test suites, load profiles, and supporting scripts.

– **Configuration Files**:  Track all infrastructure-as-code templates, parameter files, and environment definitions.

– **Cleanup Automation**:  Store resource cleanup scripts, workflows, and related documentation in the version control system.

– **Documentation**:  Capture procedures, best practices, and other relevant information in a version-controlled documentation repository.

Resource management

Effective resource management is crucial for reliable performance testing.

There are a couple of facets we need to keep in mind.

It’s important to decide whether we’ll self-host our load generators, use a managed service for it, use cloud instances, etc.

Load generator scaling

Load generators are core to our testing infrastructure; we need to manage them properly to ensure accuracy and reliability, and to keep costs under control.

There are a few considerations one has to make:

* What are our data residency requirements?
* Do we need multi-regional load?
* Are certain services only available in a specific region?

Also, when not self-hosting, there are a couple of things to keep in mind when contracting with a provider:

* Negotiate flexible scaling to align with our testing needs
* Make sure burst capacity is available
* Reserved capacity agreements can save you a lot of money if you have a predictable testing schedule
* Providers offering auto-scaling can enable a lot of flexibility

Cloud resource optimization

Effective cloud resource management is essential for testing infrastructure to maintain cost efficiency while ensuring performance. Below are three key approaches to optimize cloud spending:

Cost control

I always advise enabling billing controls, but there are also certain things we can do beforehand, contract-wise.

Reserved Instances

* **Optimal Use Case**: Testing environments with predictable, long-term workloads
* **Cost Reduction**: 30-60% savings compared to on-demand pricing
* **Commitment Period**: 1 or 3 year terms available
* **Risk Management**: Consider convertible reserved instances to maintain flexibility in resource allocation

Spot Instances

* **Optimal Use Case**: Non-production testing environments with flexible timing
* **Cost Reduction**: Up to 90% savings compared to on-demand pricing
* **Key Consideration**: Subject to termination with minimal notice
* **Recommended Application**: Suitable for preliminary testing phases and non-critical workloads

Enterprise Agreements

* **Structure**: Volume-based discount programs tied to committed annual spend
* **Duration**: Annual commitment required
* **Additional Benefits**:
** Enhanced support options
** Supplementary services
** Increased contract flexibility compared to standard offerings

Cleanup procedures

Proper cleanup procedures are key for managing our cloud resources both during and after our performance and load-testing activities.

This helps us prevent resource leakage and unnecessary costs while keeping our environment in a proper state.

Automated Cleanup

Since our performance testing ideally becomes part of our continuous integration and deployment, we should also automate resource management.

Test Environment Teardown

* **Post-Test Cleanup**:
** Terminate temporary load balancers and auto-scaling groups
** Remove test-specific security group rules
** Delete temporary subnets and routing configurations
** Clean up test data from storage volumes
** Release elastic IPs and network interfaces

Resource Tagging

* **Mandatory Tags**:
** Environment designation (e.g., perf-test, load-test)
** Expiration timestamp
** Test case identifier
** Owner/team contact
** Project reference

Monitoring and Verification

We should also monitor and verify our environment to make sure nothing peculiar is going on and to ascertain the reliability of our results.

We can, for example, integrate with Prometheus by leveraging recording rules:

----
groups:
  - name: PerformanceTests
    rules:
      - record: ci:test_duration_seconds
        expr: rate(test_duration_seconds_sum[5m])
      - record: ci:error_rate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) 
              / sum(rate(http_requests_total[5m]))
      - record: ci:response_time_p95
        expr: histogram_quantile(0.95, sum(rate(http_duration_seconds_bucket[5m])) 
              by (le))
----
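
These recorded series can then be consumed by the pipeline itself. As a sketch (assuming Prometheus is reachable from the CI runner at the URL below, and using an illustrative 500ms budget), a post-test step could query the p95 rule and fail the job when the budget is exceeded:

[source,python]
----
# Sketch: read a recorded metric back from Prometheus' HTTP API in CI.
# The Prometheus URL and the 0.5s budget are assumptions for illustration.
import sys

import requests  # third-party

PROMETHEUS_URL = "http://localhost:9090"
P95_BUDGET_SECONDS = 0.5

resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query",
    params={"query": "ci:response_time_p95"},
    timeout=10,
)
resp.raise_for_status()
results = resp.json()["data"]["result"]

if not results:
    print("No data for ci:response_time_p95 - did the test emit metrics?")
    sys.exit(1)

p95 = float(results[0]["value"][1])
print(f"p95 response time: {p95:.3f}s (budget {P95_BUDGET_SECONDS}s)")
sys.exit(0 if p95 <= P95_BUDGET_SECONDS else 1)
----

This keeps the pass/fail decision in the pipeline while the raw data stays available in Prometheus and Grafana for deeper analysis.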

Resource Auditing

* **Daily Checks**:
** Orphaned volumes and snapshots
** Idle load balancers
** Unused elastic IPs
** Stale DNS entries
** Dormant auto-scaling groups

Cost Control Measures

* **Implementation**:
** Set up automatic resource termination based on tags (see the sketch after this list)
** Configure budget alerts and thresholds
** Enable automatic snapshot cleanup after a defined retention period
** Implement maximum lifetime policies for test resources
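
As one concrete way to implement tag-based termination, here is a sketch using boto3 for AWS EC2. The tag names mirror the mandatory tags listed earlier, the region is an assumption, and the expiration tag is assumed to hold an ISO-8601 timestamp with a timezone offset; other resource types would need similar sweeps.

[source,python]
----
# Sketch: terminate EC2 instances from performance-test environments whose
# "expiration" tag lies in the past. Tag names/values and the region are
# assumptions for illustration.
from datetime import datetime, timezone

import boto3  # third-party AWS SDK

ec2 = boto3.client("ec2", region_name="eu-west-1")

response = ec2.describe_instances(
    Filters=[
        {"Name": "tag:environment", "Values": ["perf-test", "load-test"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)

expired = []
now = datetime.now(timezone.utc)
for reservation in response["Reservations"]:
    for instance in reservation["Instances"]:
        tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
        expires_at = tags.get("expiration")  # e.g. "2024-05-01T00:00:00+00:00"
        if expires_at and datetime.fromisoformat(expires_at) < now:
            expired.append(instance["InstanceId"])

if expired:
    ec2.terminate_instances(InstanceIds=expired)
    print(f"Terminated expired test instances: {expired}")
----

Scheduling such a sweep daily lines up nicely with the resource auditing checks above.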

Test result storage

* Data retention policies
* Storage optimization
* Access control
* Backup strategies

Maintenance

Not only our applications but also our (test) infrastructure, metrics, and monitoring systems require frequent review.

This is vital for ensuring reliable performance testing outcomes and accurate load balancing assessments.

Service Level Indicators (SLIs)

As our applications evolve over time, some aspects might become more or less critical for our clients.

Thus we need to perform regular reviews of our SLIs.

Regular Review

* **Validation Schedule**:
** Monthly review of metric relevance
** Quarterly calibration of thresholds
** Semi-annual alignment with business objectives
** Annual comprehensive revision

Metric Maintenance

* **Key Areas**:
** Latency measurements
** Error rate calculations
** Throughput metrics
** Resource utilization thresholds
** System availability calculations

Service Level Objectives (SLOs)

Clients, both internal and external, have evolving needs, especially in today’s swiftly moving market.

So we need to make sure we’re monitoring the key aspects.

Periodic Assessment

* **Review Components**:
** Target achievement rates
** Error budget consumption
** Historical performance trends
** Seasonal impact analysis
** Capacity planning adjustments

Adjustment Procedures

* **Update Protocol**:
** Document performance baselines
** Analyze deviation patterns
** Adjust thresholds based on data
** Update monitoring rules
** Revise alerting criteria

Service Level Agreements (SLAs)

During contract reviews/internal alignments, we need to make sure our SLAs still reflect the wants and needs (and potentially sales promises).

Contract Maintenance

* **Regular Activities**:
** Review compliance metrics
** Update performance guarantees
** Align with current capabilities
** Adjust penalty clauses
** Revise service credits

Documentation Updates

* **Key Elements**:
** Measurement methodologies
** Reporting procedures
** Escalation paths
** Recovery time objectives
** Incident response procedures

Test Suite Maintenance

As our applications evolve, and use cases shift over time, we need to make sure our test suites reflect this.

Script Updates

* **Regular Tasks**:
** Update test data sets
** Validate API endpoints
** Review authentication tokens
** Update load patterns
** Maintain test user credentials

Infrastructure Verification

* **Periodic Checks**:
** Load generator health
** Monitoring tool accuracy
** Logging system capacity
** Database cleanup procedures
** Network configuration validity

Reporting and Analysis

As we become more familiar with our landscape, we can extract more data from our reporting.

We need to make sure it’s always easily consumable.

Maintenance Documentation

* **Required Records**:
** Change history logs
** Performance trend analysis
** Capacity planning reports
** Cost optimization reviews
** Resource utilization patterns

Review Procedures

* **Schedule**:
** Weekly metric validation
** Monthly trend analysis
** Quarterly capacity review
** Annual infrastructure assessment

Conclusion

As you can see, integrating our performance tests into our CI pipelines can have quite an impact: it lets us catch regressions before they reach production and gives us the data we need to keep improving.

Additional resources

* https://docs.gradle.org/current/userguide/build_cache.html[Gradle build cache]
* https://dpeuniversity.gradle.com/[DPE university] – a good resource to learn more about Developer Productivity Engineering, and Gradle (build cache)
* https://digma.ai/[Digma] – integrate observability data within your IDE

This article was contributed to the Digma blog by Simon Verhoeven, a Senior Software Engineer and Java Consultant with a particular focus on Cloud quality and Maintainability.


Load & performance testing tools

* https://jmeter.apache.org/[JMeter]
* https://gatling.io/[Gatling]
* https://k6.io/[K6]
* https://loader.io[Loader.io]
* https://locust.io[Locust.io]

Glossary

APM (Application Performance Management):: Tools for monitoring application performance and health.
Automated Testing:: Running tests automatically through scripts rather than manually.
Baseline:: Reference measurements for comparing system performance over time.
Bottleneck:: Point in a system that limits overall performance or capacity.
Cache Hit Ratio:: Percentage of requests served from cache versus original data source.
CI (Continuous Integration):: Practice of automatically integrating code changes into shared repository.
Connection Pool:: Cache of database connections maintained for reuse.
Database Throughput:: Rate at which database can process transactions.
Error Rate:: Percentage of failed requests or operations.
Garbage Collection:: Automatic process of freeing unused memory.
Heap:: Memory portion used for dynamic program data allocation.
Infrastructure as Code:: Managing infrastructure through code instead of manual processes.
I/O Wait:: Time spent waiting for input/output operations.
KPI (Key Performance Indicator):: Quantifiable measure used to evaluate success.
Latency:: Time delay between action and response.
Load Balancer:: Device that distributes network traffic across servers.
Load Generator:: Tool creating synthetic traffic for testing.
Memory Leak:: Resource leak from incorrect memory management.
N+1 Query:: Performance issue where application makes additional queries for each record retrieved.
Performance Budget:: Quantitative limits on performance metrics.
Resource Utilization:: Proportion of available resources being used.
Response Time:: Total time taken to respond to a request.
SLA (Service Level Agreement):: Commitment between service provider and client.
SLI (Service Level Indicator):: Metric used to measure service performance.
SLO (Service Level Objective):: Target values for service metrics.
Spot Instance:: Cloud instances available at lower price but may terminate with short notice.
Throughput:: Amount of work processed in given time period.
Version Control:: System for managing changes to code and documents.











