Building Resilient Go Applications with Circuit Breakers: Advanced Fault Tolerance Patterns

31 min read 6331 words

Table of Contents

In today’s distributed systems landscape, failures are not just possible—they’re inevitable. Network partitions, service outages, resource exhaustion, and unexpected load spikes can cascade through interconnected services, transforming localized failures into system-wide catastrophes. For Go developers building distributed applications, implementing robust fault tolerance mechanisms isn’t optional—it’s essential for maintaining system reliability and user experience.

Circuit breakers stand as one of the most powerful patterns in the resilience engineering toolkit. Inspired by their electrical counterparts, software circuit breakers prevent cascading failures by detecting when remote dependencies are failing and temporarily halting further requests, giving overloaded systems time to recover. In Go’s microservices ecosystem, properly implemented circuit breakers can mean the difference between graceful degradation and complete system collapse.

This comprehensive guide explores advanced circuit breaker patterns and fault tolerance mechanisms in Go. We’ll progress from fundamental concepts to sophisticated implementations, covering state management, failure detection algorithms, recovery strategies, and integration patterns with real-world Go microservices. By the end, you’ll have the knowledge to build truly resilient Go applications capable of withstanding the unpredictable nature of distributed environments.


Understanding Circuit Breaker Patterns

Before diving into implementation details, it’s crucial to understand the core concepts behind circuit breakers and how they fit into a broader resilience strategy.

The Circuit Breaker State Machine

At its core, a circuit breaker is a state machine with three distinct states:

package circuitbreaker

// State represents the current state of the circuit breaker
type State int

const (
    // Closed means the circuit breaker is allowing requests through
    Closed State = iota
    
    // Open means the circuit breaker is preventing requests from going through
    Open
    
    // HalfOpen means the circuit breaker is allowing a limited number of test requests
    HalfOpen
)

// CircuitBreaker represents a basic circuit breaker
type CircuitBreaker struct {
    state           State
    failureCount    int
    failureThreshold int
    resetTimeout    time.Duration
    lastFailureTime time.Time
    mutex           sync.RWMutex
}

The circuit breaker operates as follows:

  1. Closed State: The default state where requests flow normally. The circuit breaker monitors for failures.
  2. Open State: When failures exceed a threshold, the circuit “trips” to the open state, fast-failing requests without attempting to execute them.
  3. Half-Open State: After a timeout period, the circuit transitions to half-open, allowing a limited number of test requests to check if the underlying system has recovered.

Failure Detection Strategies

Effective circuit breakers must accurately detect failures, which can be more nuanced than simple error returns:

package circuitbreaker

// FailureDetector determines if a response should be considered a failure
type FailureDetector interface {
    IsFailure(err error, response interface{}, duration time.Duration) bool
}

// StandardFailureDetector considers any error as a failure
type StandardFailureDetector struct{}

func (d *StandardFailureDetector) IsFailure(err error, _ interface{}, _ time.Duration) bool {
    return err != nil
}

// AdvancedFailureDetector considers errors, status codes, and timeouts
type AdvancedFailureDetector struct {
    TimeoutThreshold time.Duration
    ErrorCodes       map[int]bool // HTTP status codes considered failures
}

func (d *AdvancedFailureDetector) IsFailure(err error, response interface{}, duration time.Duration) bool {
    // Check for timeout
    if duration > d.TimeoutThreshold {
        return true
    }
    
    // Check for error
    if err != nil {
        return true
    }
    
    // Check for HTTP status code if response is an HTTP response
    if httpResp, ok := response.(*http.Response); ok {
        if d.ErrorCodes[httpResp.StatusCode] {
            return true
        }
    }
    
    return false
}

Circuit Breaker in the Resilience Stack

Circuit breakers don’t operate in isolation but form part of a comprehensive resilience strategy:

package resilience

// ResilienceStack represents a complete resilience strategy
type ResilienceStack struct {
    // Timeout prevents requests from hanging indefinitely
    Timeout time.Duration
    
    // Retry attempts to recover from transient failures
    RetryStrategy RetryStrategy
    
    // CircuitBreaker prevents overwhelming failing services
    CircuitBreaker *circuitbreaker.CircuitBreaker
    
    // Bulkhead limits concurrent requests
    Bulkhead *bulkhead.Bulkhead
    
    // Fallback provides alternative response when all else fails
    Fallback FallbackFunc
}

// Execute runs a function with the complete resilience stack
func (s *ResilienceStack) Execute(ctx context.Context, operation func(context.Context) (interface{}, error)) (interface{}, error) {
    // Apply timeout
    timeoutCtx, cancel := context.WithTimeout(ctx, s.Timeout)
    defer cancel()
    
    // Check circuit breaker
    if !s.CircuitBreaker.AllowRequest() {
        return s.Fallback(ctx, circuitbreaker.ErrCircuitOpen)
    }
    
    // Apply bulkhead
    if !s.Bulkhead.Acquire() {
        return s.Fallback(ctx, bulkhead.ErrBulkheadFull)
    }
    defer s.Bulkhead.Release()
    
    // Execute with retry
    result, err := s.RetryStrategy.Execute(timeoutCtx, operation)
    
    // Record result in circuit breaker
    s.CircuitBreaker.RecordResult(err == nil)
    
    // Apply fallback if needed
    if err != nil {
        return s.Fallback(ctx, err)
    }
    
    return result, nil
}

This layered approach provides defense in depth, with each mechanism addressing different failure modes.


Implementing Basic Circuit Breakers

Let’s start by implementing a simple but effective circuit breaker in Go.

Simple Counter-Based Circuit Breaker

The most straightforward implementation uses a counter to track failures:

package circuitbreaker

import (
    "errors"
    "sync"
    "time"
)

var ErrCircuitOpen = errors.New("circuit breaker is open")

// SimpleCircuitBreaker implements a basic counter-based circuit breaker
type SimpleCircuitBreaker struct {
    failureThreshold int
    resetTimeout     time.Duration
    
    failureCount     int
    lastFailureTime  time.Time
    state            State
    mutex            sync.RWMutex
}

// NewSimpleCircuitBreaker creates a new circuit breaker
func NewSimpleCircuitBreaker(failureThreshold int, resetTimeout time.Duration) *SimpleCircuitBreaker {
    return &SimpleCircuitBreaker{
        failureThreshold: failureThreshold,
        resetTimeout:     resetTimeout,
        state:            Closed,
    }
}

// Execute runs the given function with circuit breaker protection
func (cb *SimpleCircuitBreaker) Execute(fn func() error) error {
    if !cb.AllowRequest() {
        return ErrCircuitOpen
    }
    
    err := fn()
    cb.RecordResult(err == nil)
    return err
}

// AllowRequest checks if a request should be allowed through
func (cb *SimpleCircuitBreaker) AllowRequest() bool {
    cb.mutex.RLock()
    defer cb.mutex.RUnlock()
    
    switch cb.state {
    case Closed:
        return true
    case Open:
        // Check if reset timeout has elapsed
        if time.Since(cb.lastFailureTime) > cb.resetTimeout {
            // Transition to half-open state must be done with a write lock
            cb.mutex.RUnlock()
            cb.mutex.Lock()
            defer cb.mutex.Unlock()
            
            // Double-check state after acquiring write lock
            if cb.state == Open {
                cb.state = HalfOpen
            }
            return cb.state == HalfOpen
        }
        return false
    case HalfOpen:
        // In half-open state, allow only one request through
        cb.mutex.RUnlock()
        cb.mutex.Lock()
        defer cb.mutex.Unlock()
        
        // Only the first request should be allowed
        if cb.state == HalfOpen {
            return true
        }
        return false
    default:
        return false
    }
}

// RecordResult records the result of a request
func (cb *SimpleCircuitBreaker) RecordResult(success bool) {
    cb.mutex.Lock()
    defer cb.mutex.Unlock()
    
    if success {
        switch cb.state {
        case HalfOpen:
            // On success in half-open state, reset and close the circuit
            cb.failureCount = 0
            cb.state = Closed
        case Closed:
            // Reset failure count on success
            cb.failureCount = 0
        }
    } else {
        cb.lastFailureTime = time.Now()
        
        switch cb.state {
        case HalfOpen:
            // On failure in half-open state, reopen the circuit
            cb.state = Open
        case Closed:
            // Increment failure count and check threshold
            cb.failureCount++
            if cb.failureCount >= cb.failureThreshold {
                cb.state = Open
            }
        }
    }
}

// State returns the current state of the circuit breaker
func (cb *SimpleCircuitBreaker) State() State {
    cb.mutex.RLock()
    defer cb.mutex.RUnlock()
    return cb.state
}

Using the Simple Circuit Breaker

Here’s how to use our simple circuit breaker:

package main

import (
    "fmt"
    "net/http"
    "time"
    
    "example.com/circuitbreaker"
)

func main() {
    // Create a circuit breaker that trips after 5 failures and resets after 10 seconds
    cb := circuitbreaker.NewSimpleCircuitBreaker(5, 10*time.Second)
    
    // Create an HTTP client with the circuit breaker
    client := &http.Client{
        Timeout: 5 * time.Second,
    }
    
    // Function to make HTTP request with circuit breaker protection
    makeRequest := func(url string) (*http.Response, error) {
        var resp *http.Response
        
        err := cb.Execute(func() error {
            var err error
            resp, err = client.Get(url)
            return err
        })
        
        return resp, err
    }
    
    // Example usage
    for i := 0; i < 10; i++ {
        resp, err := makeRequest("https://api.example.com/endpoint")
        
        if err != nil {
            if errors.Is(err, circuitbreaker.ErrCircuitOpen) {
                fmt.Println("Circuit is open, skipping request")
                continue
            }
            fmt.Printf("Request failed: %v\n", err)
            continue
        }
        
        fmt.Printf("Request succeeded with status: %d\n", resp.StatusCode)
        resp.Body.Close()
    }
}

Sliding Window Circuit Breaker

A more sophisticated approach uses a sliding window to track failure rates:

package circuitbreaker

import (
    "errors"
    "sync"
    "time"
)

// Result represents the outcome of a request
type Result struct {
    Success bool
    Time    time.Time
}

// SlidingWindowCircuitBreaker implements a circuit breaker with a sliding window
type SlidingWindowCircuitBreaker struct {
    windowSize      time.Duration
    failureRateThreshold float64
    minimumRequests int
    resetTimeout    time.Duration
    
    results         []Result
    state           State
    lastStateChange time.Time
    mutex           sync.RWMutex
}

// NewSlidingWindowCircuitBreaker creates a new sliding window circuit breaker
func NewSlidingWindowCircuitBreaker(
    windowSize time.Duration,
    failureRateThreshold float64,
    minimumRequests int,
    resetTimeout time.Duration,
) *SlidingWindowCircuitBreaker {
    return &SlidingWindowCircuitBreaker{
        windowSize:          windowSize,
        failureRateThreshold: failureRateThreshold,
        minimumRequests:     minimumRequests,
        resetTimeout:        resetTimeout,
        state:               Closed,
        results:             make([]Result, 0, 100), // Pre-allocate some capacity
    }
}

// Execute runs the given function with circuit breaker protection
func (cb *SlidingWindowCircuitBreaker) Execute(fn func() error) error {
    if !cb.AllowRequest() {
        return ErrCircuitOpen
    }
    
    err := fn()
    cb.RecordResult(err == nil)
    return err
}

// AllowRequest checks if a request should be allowed through
func (cb *SlidingWindowCircuitBreaker) AllowRequest() bool {
    cb.mutex.RLock()
    defer cb.mutex.RUnlock()
    
    switch cb.state {
    case Closed:
        return true
    case Open:
        if time.Since(cb.lastStateChange) > cb.resetTimeout {
            cb.mutex.RUnlock()
            cb.mutex.Lock()
            defer cb.mutex.Unlock()
            
            if cb.state == Open {
                cb.state = HalfOpen
                cb.lastStateChange = time.Now()
            }
            return cb.state == HalfOpen
        }
        return false
    case HalfOpen:
        // In half-open state, allow limited requests
        return true
    default:
        return false
    }
}

// RecordResult records the result of a request
func (cb *SlidingWindowCircuitBreaker) RecordResult(success bool) {
    cb.mutex.Lock()
    defer cb.mutex.Unlock()
    
    now := time.Now()
    
    // Add the new result
    cb.results = append(cb.results, Result{
        Success: success,
        Time:    now,
    })
    
    // Remove results outside the window
    cutoff := now.Add(-cb.windowSize)
    newStart := 0
    for i, result := range cb.results {
        if result.Time.After(cutoff) {
            newStart = i
            break
        }
    }
    cb.results = cb.results[newStart:]
    
    // Calculate failure rate
    if len(cb.results) >= cb.minimumRequests {
        failures := 0
        for _, result := range cb.results {
            if !result.Success {
                failures++
            }
        }
        
        failureRate := float64(failures) / float64(len(cb.results))
        
        switch cb.state {
        case Closed:
            if failureRate >= cb.failureRateThreshold {
                cb.state = Open
                cb.lastStateChange = now
            }
        case HalfOpen:
            if !success {
                cb.state = Open
                cb.lastStateChange = now
            } else if len(cb.results) >= cb.minimumRequests && failureRate < cb.failureRateThreshold {
                cb.state = Closed
                cb.lastStateChange = now
            }
        }
    }
}

// State returns the current state of the circuit breaker
func (cb *SlidingWindowCircuitBreaker) State() State {
    cb.mutex.RLock()
    defer cb.mutex.RUnlock()
    return cb.state
}

Advanced Circuit Breaker Strategies

Now let’s explore more sophisticated circuit breaker implementations and strategies.

Adaptive Circuit Breaker

An adaptive circuit breaker adjusts its parameters based on observed behavior:

package circuitbreaker

import (
    "math"
    "sync"
    "time"
)

// AdaptiveCircuitBreaker dynamically adjusts its thresholds based on traffic patterns
type AdaptiveCircuitBreaker struct {
    baseFailureThreshold int
    minFailureThreshold  int
    maxFailureThreshold  int
    
    baseResetTimeout     time.Duration
    minResetTimeout      time.Duration
    maxResetTimeout      time.Duration
    
    currentFailureThreshold int
    currentResetTimeout     time.Duration
    
    consecutiveSuccesses int
    consecutiveFailures  int
    
    state                State
    lastStateChange      time.Time
    failureCount         int
    mutex                sync.RWMutex
}

// NewAdaptiveCircuitBreaker creates a new adaptive circuit breaker
func NewAdaptiveCircuitBreaker(
    baseFailureThreshold int,
    minFailureThreshold int,
    maxFailureThreshold int,
    baseResetTimeout time.Duration,
    minResetTimeout time.Duration,
    maxResetTimeout time.Duration,
) *AdaptiveCircuitBreaker {
    return &AdaptiveCircuitBreaker{
        baseFailureThreshold:    baseFailureThreshold,
        minFailureThreshold:     minFailureThreshold,
        maxFailureThreshold:     maxFailureThreshold,
        baseResetTimeout:        baseResetTimeout,
        minResetTimeout:         minResetTimeout,
        maxResetTimeout:         maxResetTimeout,
        currentFailureThreshold: baseFailureThreshold,
        currentResetTimeout:     baseResetTimeout,
        state:                   Closed,
    }
}

// AllowRequest checks if a request should be allowed through
func (cb *AdaptiveCircuitBreaker) AllowRequest() bool {
    cb.mutex.RLock()
    defer cb.mutex.RUnlock()
    
    switch cb.state {
    case Closed:
        return true
    case Open:
        if time.Since(cb.lastStateChange) > cb.currentResetTimeout {
            cb.mutex.RUnlock()
            cb.mutex.Lock()
            defer cb.mutex.Unlock()
            
            if cb.state == Open {
                cb.state = HalfOpen
                cb.lastStateChange = time.Now()
            }
            return cb.state == HalfOpen
        }
        return false
    case HalfOpen:
        return true
    default:
        return false
    }
}

// RecordResult records the result of a request and adjusts thresholds
func (cb *AdaptiveCircuitBreaker) RecordResult(success bool) {
    cb.mutex.Lock()
    defer cb.mutex.Unlock()
    
    now := time.Now()
    
    if success {
        cb.consecutiveSuccesses++
        cb.consecutiveFailures = 0
        
        // Adjust thresholds on consecutive successes
        if cb.consecutiveSuccesses >= 5 {
            // Increase failure threshold (more tolerant)
            cb.currentFailureThreshold = int(math.Min(
                float64(cb.maxFailureThreshold),
                float64(cb.currentFailureThreshold+1),
            ))
            
            // Decrease reset timeout (faster recovery)
            cb.currentResetTimeout = time.Duration(math.Max(
                float64(cb.minResetTimeout),
                float64(cb.currentResetTimeout)*0.9,
            ))
            
            cb.consecutiveSuccesses = 0
        }
        
        switch cb.state {
        case HalfOpen:
            // On success in half-open state, reset and close the circuit
            cb.failureCount = 0
            cb.state = Closed
            cb.lastStateChange = now
        case Closed:
            // Reset failure count on success
            cb.failureCount = 0
        }
    } else {
        cb.consecutiveFailures++
        cb.consecutiveSuccesses = 0
        
        // Adjust thresholds on consecutive failures
        if cb.consecutiveFailures >= 3 {
            // Decrease failure threshold (less tolerant)
            cb.currentFailureThreshold = int(math.Max(
                float64(cb.minFailureThreshold),
                float64(cb.currentFailureThreshold-1),
            ))
            
            // Increase reset timeout (slower recovery)
            cb.currentResetTimeout = time.Duration(math.Min(
                float64(cb.maxResetTimeout),
                float64(cb.currentResetTimeout)*1.1,
            ))
            
            cb.consecutiveFailures = 0
        }
        
        switch cb.state {
        case HalfOpen:
            // On failure in half-open state, reopen the circuit
            cb.state = Open
            cb.lastStateChange = now
        case Closed:
            // Increment failure count and check threshold
            cb.failureCount++
            if cb.failureCount >= cb.currentFailureThreshold {
                cb.state = Open
                cb.lastStateChange = now
            }
        }
    }
}

// Execute runs the given function with circuit breaker protection
func (cb *AdaptiveCircuitBreaker) Execute(fn func() error) error {
    if !cb.AllowRequest() {
        return ErrCircuitOpen
    }
    
    err := fn()
    cb.RecordResult(err == nil)
    return err
}

// State returns the current state of the circuit breaker
func (cb *AdaptiveCircuitBreaker) State() State {
    cb.mutex.RLock()
    defer cb.mutex.RUnlock()
    return cb.state
}

// CurrentThresholds returns the current adaptive thresholds
func (cb *AdaptiveCircuitBreaker) CurrentThresholds() (int, time.Duration) {
    cb.mutex.RLock()
    defer cb.mutex.RUnlock()
    return cb.currentFailureThreshold, cb.currentResetTimeout
}

Composite Circuit Breaker

For complex systems, we can create a composite circuit breaker that manages multiple downstream dependencies:

package circuitbreaker

import (
    "errors"
    "sync"
)

// CompositeCircuitBreaker manages multiple circuit breakers for different dependencies
type CompositeCircuitBreaker struct {
    breakers map[string]CircuitBreaker
    mutex    sync.RWMutex
}

// CircuitBreaker interface defines the methods a circuit breaker must implement
type CircuitBreaker interface {
    AllowRequest() bool
    RecordResult(success bool)
    Execute(fn func() error) error
    State() State
}

// NewCompositeCircuitBreaker creates a new composite circuit breaker
func NewCompositeCircuitBreaker() *CompositeCircuitBreaker {
    return &CompositeCircuitBreaker{
        breakers: make(map[string]CircuitBreaker),
    }
}

// AddBreaker adds a circuit breaker for a specific dependency
func (c *CompositeCircuitBreaker) AddBreaker(name string, breaker CircuitBreaker) {
    c.mutex.Lock()
    defer c.mutex.Unlock()
    c.breakers[name] = breaker
}

// Execute runs a function for a specific dependency with circuit breaker protection
func (c *CompositeCircuitBreaker) Execute(name string, fn func() error) error {
    c.mutex.RLock()
    breaker, exists := c.breakers[name]
    c.mutex.RUnlock()
    
    if !exists {
        return errors.New("circuit breaker not found for: " + name)
    }
    
    return breaker.Execute(fn)
}

// AllowRequest checks if a request to a specific dependency should be allowed
func (c *CompositeCircuitBreaker) AllowRequest(name string) bool {
    c.mutex.RLock()
    defer c.mutex.RUnlock()
    
    breaker, exists := c.breakers[name]
    if !exists {
        return false
    }
    
    return breaker.AllowRequest()
}

// RecordResult records the result of a request to a specific dependency
func (c *CompositeCircuitBreaker) RecordResult(name string, success bool) {
    c.mutex.RLock()
    breaker, exists := c.breakers[name]
    c.mutex.RUnlock()
    
    if exists {
        breaker.RecordResult(success)
    }
}

// State returns the state of a specific circuit breaker
func (c *CompositeCircuitBreaker) State(name string) (State, bool) {
    c.mutex.RLock()
    defer c.mutex.RUnlock()
    
    breaker, exists := c.breakers[name]
    if !exists {
        return Closed, false
    }
    
    return breaker.State(), true
}

// States returns the states of all circuit breakers
func (c *CompositeCircuitBreaker) States() map[string]State {
    c.mutex.RLock()
    defer c.mutex.RUnlock()
    
    states := make(map[string]State, len(c.breakers))
    for name, breaker := range c.breakers {
        states[name] = breaker.State()
    }
    
    return states
}

Bulkhead Pattern Implementation

The bulkhead pattern complements circuit breakers by limiting concurrent requests:

package bulkhead

import (
    "errors"
    "sync"
)

var ErrBulkheadFull = errors.New("bulkhead is full")

// Bulkhead limits the number of concurrent requests
type Bulkhead struct {
    maxConcurrent int
    current       int
    mutex         sync.Mutex
}

// NewBulkhead creates a new bulkhead
func NewBulkhead(maxConcurrent int) *Bulkhead {
    return &Bulkhead{
        maxConcurrent: maxConcurrent,
    }
}

// Acquire attempts to acquire a slot in the bulkhead
func (b *Bulkhead) Acquire() bool {
    b.mutex.Lock()
    defer b.mutex.Unlock()
    
    if b.current >= b.maxConcurrent {
        return false
    }
    
    b.current++
    return true
}

// Release releases a slot in the bulkhead
func (b *Bulkhead) Release() {
    b.mutex.Lock()
    defer b.mutex.Unlock()
    
    if b.current > 0 {
        b.current--
    }
}

// Execute runs a function with bulkhead protection
func (b *Bulkhead) Execute(fn func() error) error {
    if !b.Acquire() {
        return ErrBulkheadFull
    }
    defer b.Release()
    
    return fn()
}

// CurrentLoad returns the current number of active requests
func (b *Bulkhead) CurrentLoad() int {
    b.mutex.Lock()
    defer b.mutex.Unlock()
    return b.current
}

Integration with Go Microservices

Now let’s explore how to integrate circuit breakers with common Go microservice patterns.

HTTP Client Integration

Integrating circuit breakers with HTTP clients is a common use case:

package resilience

import (
    "context"
    "net/http"
    "time"
    
    "example.com/circuitbreaker"
)

// ResilienceClient wraps an HTTP client with resilience patterns
type ResilienceClient struct {
    client        *http.Client
    circuitBreaker circuitbreaker.CircuitBreaker
    bulkhead      *bulkhead.Bulkhead
    timeout       time.Duration
}

// NewResilienceClient creates a new resilient HTTP client
func NewResilienceClient(
    client *http.Client,
    cb circuitbreaker.CircuitBreaker,
    bh *bulkhead.Bulkhead,
    timeout time.Duration,
) *ResilienceClient {
    return &ResilienceClient{
        client:        client,
        circuitBreaker: cb,
        bulkhead:      bh,
        timeout:       timeout,
    }
}

// Do performs an HTTP request with resilience patterns
func (c *ResilienceClient) Do(req *http.Request) (*http.Response, error) {
    // Apply timeout
    ctx, cancel := context.WithTimeout(req.Context(), c.timeout)
    defer cancel()
    req = req.WithContext(ctx)
    
    // Check circuit breaker
    if !c.circuitBreaker.AllowRequest() {
        return nil, circuitbreaker.ErrCircuitOpen
    }
    
    // Apply bulkhead
    if !c.bulkhead.Acquire() {
        return nil, bulkhead.ErrBulkheadFull
    }
    defer c.bulkhead.Release()
    
    // Execute request
    var resp *http.Response
    var err error
    
    err = c.circuitBreaker.Execute(func() error {
        resp, err = c.client.Do(req)
        if err != nil {
            return err
        }
        
        // Consider certain status codes as failures
        if resp.StatusCode >= 500 {
            return &httpError{statusCode: resp.StatusCode}
        }
        
        return nil
    })
    
    return resp, err
}

// httpError represents an HTTP error with status code
type httpError struct {
    statusCode int
}

func (e *httpError) Error() string {
    return fmt.Sprintf("HTTP error: %d", e.statusCode)
}

gRPC Client Integration

Similarly, we can integrate circuit breakers with gRPC clients:

package resilience

import (
    "context"
    "time"
    "strings"
    "sync"
    
    "example.com/circuitbreaker"
    "google.golang.org/grpc"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// ResilienceInterceptor creates a gRPC client interceptor with resilience patterns
func ResilienceInterceptor(
    cb circuitbreaker.CircuitBreaker,
    bh *bulkhead.Bulkhead,
    timeout time.Duration,
) grpc.UnaryClientInterceptor {
    return func(
        ctx context.Context,
        method string,
        req, reply interface{},
        cc *grpc.ClientConn,
        invoker grpc.UnaryInvoker,
        opts ...grpc.CallOption,
    ) error {
        // Apply timeout
        timeoutCtx, cancel := context.WithTimeout(ctx, timeout)
        defer cancel()
        
        // Check circuit breaker
        if !cb.AllowRequest() {
            return status.Error(codes.Unavailable, "circuit breaker is open")
        }
        
        // Apply bulkhead
        if !bh.Acquire() {
            return status.Error(codes.ResourceExhausted, "bulkhead is full")
        }
        defer bh.Release()
        
        // Execute request with circuit breaker
        var err error
        err = cb.Execute(func() error {
            return invoker(timeoutCtx, method, req, reply, cc, opts...)
        })
        
        // Record result in circuit breaker based on error type
        if err != nil {
            // Only certain error types should trip the circuit breaker
            s, ok := status.FromError(err)
            if ok {
                // Only server errors should trip the circuit breaker
                switch s.Code() {
                case codes.Unavailable, codes.DeadlineExceeded, codes.ResourceExhausted, codes.Internal, codes.Unknown:
                    cb.RecordResult(false)
                default:
                    // Client errors should not trip the circuit breaker
                    cb.RecordResult(true)
                }
            } else {
                // Non-status errors (like context cancellation) should trip the circuit breaker
                cb.RecordResult(false)
            }
            return err
        }
        
        // Record success
        cb.RecordResult(true)
        return nil
    }
}

// ServiceMethod represents a gRPC service method
type ServiceMethod struct {
    Service string
    Method  string
}

// PerServiceCircuitBreaker manages circuit breakers per gRPC service
type PerServiceCircuitBreaker struct {
    breakers map[ServiceMethod]circuitbreaker.CircuitBreaker
    factory  func() circuitbreaker.CircuitBreaker
    mutex    sync.RWMutex
}

// NewPerServiceCircuitBreaker creates a new per-service circuit breaker
func NewPerServiceCircuitBreaker(factory func() circuitbreaker.CircuitBreaker) *PerServiceCircuitBreaker {
    return &PerServiceCircuitBreaker{
        breakers: make(map[ServiceMethod]circuitbreaker.CircuitBreaker),
        factory:  factory,
    }
}

// UnaryInterceptor creates a gRPC interceptor that uses per-service circuit breakers
func (p *PerServiceCircuitBreaker) UnaryInterceptor() grpc.UnaryClientInterceptor {
    return func(
        ctx context.Context,
        method string,
        req, reply interface{},
        cc *grpc.ClientConn,
        invoker grpc.UnaryInvoker,
        opts ...grpc.CallOption,
    ) error {
        // Parse service and method from full method name
        parts := strings.Split(method, "/")
        if len(parts) != 3 {
            return invoker(ctx, method, req, reply, cc, opts...)
        }
        
        serviceMethod := ServiceMethod{
            Service: parts[1],
            Method:  parts[2],
        }
        
        // Get or create circuit breaker for this service method
        p.mutex.RLock()
        cb, exists := p.breakers[serviceMethod]
        p.mutex.RUnlock()
        
        if !exists {
            p.mutex.Lock()
            cb, exists = p.breakers[serviceMethod]
            if !exists {
                cb = p.factory()
                p.breakers[serviceMethod] = cb
            }
            p.mutex.Unlock()
        }
        
        // Check circuit breaker
        if !cb.AllowRequest() {
            return status.Error(codes.Unavailable, "circuit breaker is open")
        }
        
        // Execute request with circuit breaker
        err := cb.Execute(func() error {
            return invoker(ctx, method, req, reply, cc, opts...)
        })
        
        return err
    }
}

Database Connection Circuit Breaker

Circuit breakers are also valuable for database connections:

package database

import (
    "context"
    "database/sql"
    "time"
    
    "example.com/circuitbreaker"
)

// ResilienceDB wraps a database with resilience patterns
type ResilienceDB struct {
    db            *sql.DB
    circuitBreaker circuitbreaker.CircuitBreaker
    timeout       time.Duration
}

// NewResilienceDB creates a new resilient database wrapper
func NewResilienceDB(
    db *sql.DB,
    cb circuitbreaker.CircuitBreaker,
    timeout time.Duration,
) *ResilienceDB {
    return &ResilienceDB{
        db:            db,
        circuitBreaker: cb,
        timeout:       timeout,
    }
}

// Query executes a query with resilience patterns
func (r *ResilienceDB) Query(query string, args ...interface{}) (*sql.Rows, error) {
    // Check circuit breaker
    if !r.circuitBreaker.AllowRequest() {
        return nil, circuitbreaker.ErrCircuitOpen
    }
    
    // Create context with timeout
    ctx, cancel := context.WithTimeout(context.Background(), r.timeout)
    defer cancel()
    
    // Execute query with circuit breaker
    var rows *sql.Rows
    var err error
    
    err = r.circuitBreaker.Execute(func() error {
        rows, err = r.db.QueryContext(ctx, query, args...)
        return err
    })
    
    return rows, err
}

// Exec executes a statement with resilience patterns
func (r *ResilienceDB) Exec(query string, args ...interface{}) (sql.Result, error) {
    // Check circuit breaker
    if !r.circuitBreaker.AllowRequest() {
        return nil, circuitbreaker.ErrCircuitOpen
    }
    
    // Create context with timeout
    ctx, cancel := context.WithTimeout(context.Background(), r.timeout)
    defer cancel()
    
    // Execute statement with circuit breaker
    var result sql.Result
    var err error
    
    err = r.circuitBreaker.Execute(func() error {
        result, err = r.db.ExecContext(ctx, query, args...)
        return err
    })
    
    return result, err
}

// Transaction executes a transaction with resilience patterns
func (r *ResilienceDB) Transaction(fn func(*sql.Tx) error) error {
    // Check circuit breaker
    if !r.circuitBreaker.AllowRequest() {
        return circuitbreaker.ErrCircuitOpen
    }
    
    // Create context with timeout
    ctx, cancel := context.WithTimeout(context.Background(), r.timeout)
    defer cancel()
    
    // Execute transaction with circuit breaker
    return r.circuitBreaker.Execute(func() error {
        tx, err := r.db.BeginTx(ctx, nil)
        if err != nil {
            return err
        }
        
        // If fn returns an error, the transaction will be rolled back
        if err := fn(tx); err != nil {
            tx.Rollback()
            return err
        }
        
        // Commit the transaction
        return tx.Commit()
    })
}

Monitoring and Observability

Implementing circuit breakers is only half the battle—you also need to monitor their behavior to ensure they’re working correctly and to tune their parameters.

Circuit Breaker Metrics

Collecting metrics from circuit breakers provides valuable insights:

package metrics

import (
    "sync"
    "time"
    
    "example.com/circuitbreaker"
)

// CircuitBreakerMetrics collects metrics about circuit breaker behavior
type CircuitBreakerMetrics struct {
    name                 string
    successCount         int64
    failureCount         int64
    rejectedCount        int64
    stateTransitions     map[circuitbreaker.State]int64
    lastStateTransition  time.Time
    currentState         circuitbreaker.State
    mutex                sync.RWMutex
}

// NewCircuitBreakerMetrics creates a new metrics collector
func NewCircuitBreakerMetrics(name string) *CircuitBreakerMetrics {
    return &CircuitBreakerMetrics{
        name:             name,
        stateTransitions: make(map[circuitbreaker.State]int64),
        currentState:     circuitbreaker.Closed,
        lastStateTransition: time.Now(),
    }
}

// RecordSuccess records a successful request
func (m *CircuitBreakerMetrics) RecordSuccess() {
    m.mutex.Lock()
    defer m.mutex.Unlock()
    m.successCount++
}

// RecordFailure records a failed request
func (m *CircuitBreakerMetrics) RecordFailure() {
    m.mutex.Lock()
    defer m.mutex.Unlock()
    m.failureCount++
}

// RecordRejected records a rejected request due to open circuit
func (m *CircuitBreakerMetrics) RecordRejected() {
    m.mutex.Lock()
    defer m.mutex.Unlock()
    m.rejectedCount++
}

// RecordStateTransition records a state transition
func (m *CircuitBreakerMetrics) RecordStateTransition(newState circuitbreaker.State) {
    m.mutex.Lock()
    defer m.mutex.Unlock()
    
    if m.currentState != newState {
        m.stateTransitions[newState]++
        m.lastStateTransition = time.Now()
        m.currentState = newState
    }
}

// GetMetrics returns the current metrics
func (m *CircuitBreakerMetrics) GetMetrics() map[string]interface{} {
    m.mutex.RLock()
    defer m.mutex.RUnlock()
    
    metrics := map[string]interface{}{
        "name":                 m.name,
        "success_count":        m.successCount,
        "failure_count":        m.failureCount,
        "rejected_count":       m.rejectedCount,
        "current_state":        m.currentState.String(),
        "last_state_transition": m.lastStateTransition,
        "state_transitions":    make(map[string]int64),
    }
    
    for state, count := range m.stateTransitions {
        metrics["state_transitions"].(map[string]int64)[state.String()] = count
    }
    
    return metrics
}

// CalculateErrorRate calculates the current error rate
func (m *CircuitBreakerMetrics) CalculateErrorRate() float64 {
    m.mutex.RLock()
    defer m.mutex.RUnlock()
    
    total := m.successCount + m.failureCount
    if total == 0 {
        return 0.0
    }
    
    return float64(m.failureCount) / float64(total)
}

// InstrumentedCircuitBreaker wraps a circuit breaker with metrics
type InstrumentedCircuitBreaker struct {
    circuitBreaker circuitbreaker.CircuitBreaker
    metrics        *CircuitBreakerMetrics
}

// NewInstrumentedCircuitBreaker creates a new instrumented circuit breaker
func NewInstrumentedCircuitBreaker(
    cb circuitbreaker.CircuitBreaker,
    metrics *CircuitBreakerMetrics,
) *InstrumentedCircuitBreaker {
    return &InstrumentedCircuitBreaker{
        circuitBreaker: cb,
        metrics:        metrics,
    }
}

// AllowRequest checks if a request should be allowed through
func (i *InstrumentedCircuitBreaker) AllowRequest() bool {
    allowed := i.circuitBreaker.AllowRequest()
    if !allowed {
        i.metrics.RecordRejected()
    }
    return allowed
}

// RecordResult records the result of a request
func (i *InstrumentedCircuitBreaker) RecordResult(success bool) {
    i.circuitBreaker.RecordResult(success)
    
    if success {
        i.metrics.RecordSuccess()
    } else {
        i.metrics.RecordFailure()
    }
    
    // Record state changes
    i.metrics.RecordStateTransition(i.circuitBreaker.State())
}

// Execute runs the given function with circuit breaker protection
func (i *InstrumentedCircuitBreaker) Execute(fn func() error) error {
    if !i.AllowRequest() {
        return circuitbreaker.ErrCircuitOpen
    }
    
    err := fn()
    i.RecordResult(err == nil)
    return err
}

// State returns the current state of the circuit breaker
func (i *InstrumentedCircuitBreaker) State() circuitbreaker.State {
    return i.circuitBreaker.State()
}

Prometheus Integration

Exposing circuit breaker metrics to Prometheus enables powerful monitoring and alerting:

package metrics

import (
    "net/http"
    
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// PrometheusCircuitBreakerCollector collects circuit breaker metrics for Prometheus
type PrometheusCircuitBreakerCollector struct {
    metrics map[string]*CircuitBreakerMetrics
    
    successCounter *prometheus.CounterVec
    failureCounter *prometheus.CounterVec
    rejectedCounter *prometheus.CounterVec
    stateGauge *prometheus.GaugeVec
    errorRateGauge *prometheus.GaugeVec
}

// NewPrometheusCircuitBreakerCollector creates a new Prometheus collector
func NewPrometheusCircuitBreakerCollector() *PrometheusCircuitBreakerCollector {
    collector := &PrometheusCircuitBreakerCollector{
        metrics: make(map[string]*CircuitBreakerMetrics),
        
        successCounter: prometheus.NewCounterVec(
            prometheus.CounterOpts{
                Name: "circuit_breaker_success_total",
                Help: "Total number of successful requests",
            },
            []string{"name"},
        ),
        
        failureCounter: prometheus.NewCounterVec(
            prometheus.CounterOpts{
                Name: "circuit_breaker_failure_total",
                Help: "Total number of failed requests",
            },
            []string{"name"},
        ),
        
        rejectedCounter: prometheus.NewCounterVec(
            prometheus.CounterOpts{
                Name: "circuit_breaker_rejected_total",
                Help: "Total number of rejected requests due to open circuit",
            },
            []string{"name"},
        ),
        
        stateGauge: prometheus.NewGaugeVec(
            prometheus.GaugeOpts{
                Name: "circuit_breaker_state",
                Help: "Current state of the circuit breaker (0=closed, 1=half-open, 2=open)",
            },
            []string{"name"},
        ),
        
        errorRateGauge: prometheus.NewGaugeVec(
            prometheus.GaugeOpts{
                Name: "circuit_breaker_error_rate",
                Help: "Current error rate of the circuit breaker",
            },
            []string{"name"},
        ),
    }
    
    // Register metrics with Prometheus
    prometheus.MustRegister(
        collector.successCounter,
        collector.failureCounter,
        collector.rejectedCounter,
        collector.stateGauge,
        collector.errorRateGauge,
    )
    
    return collector
}

// RegisterMetrics registers circuit breaker metrics
func (c *PrometheusCircuitBreakerCollector) RegisterMetrics(name string, metrics *CircuitBreakerMetrics) {
    c.metrics[name] = metrics
}

// UpdateMetrics updates Prometheus metrics from circuit breaker metrics
func (c *PrometheusCircuitBreakerCollector) UpdateMetrics() {
    for name, metrics := range c.metrics {
        m := metrics.GetMetrics()
        
        c.successCounter.WithLabelValues(name).Add(float64(m["success_count"].(int64)))
        c.failureCounter.WithLabelValues(name).Add(float64(m["failure_count"].(int64)))
        c.rejectedCounter.WithLabelValues(name).Add(float64(m["rejected_count"].(int64)))
        
        // Map state to numeric value
        var stateValue float64
        switch m["current_state"].(string) {
        case "Closed":
            stateValue = 0
        case "HalfOpen":
            stateValue = 1
        case "Open":
            stateValue = 2
        }
        c.stateGauge.WithLabelValues(name).Set(stateValue)
        
        // Set error rate
        c.errorRateGauge.WithLabelValues(name).Set(metrics.CalculateErrorRate())
    }
}

// SetupPrometheusHandler sets up an HTTP handler for Prometheus metrics
func SetupPrometheusHandler() {
    http.Handle("/metrics", promhttp.Handler())
}

Health Checks and Circuit Breaker Status

Exposing circuit breaker status through health checks helps with operational visibility:

package health

import (
    "encoding/json"
    "net/http"
    
    "example.com/circuitbreaker"
)

// CircuitBreakerHealth represents the health status of circuit breakers
type CircuitBreakerHealth struct {
    breakers map[string]circuitbreaker.CircuitBreaker
}

// NewCircuitBreakerHealth creates a new health check handler
func NewCircuitBreakerHealth() *CircuitBreakerHealth {
    return &CircuitBreakerHealth{
        breakers: make(map[string]circuitbreaker.CircuitBreaker),
    }
}

// RegisterCircuitBreaker registers a circuit breaker for health checks
func (h *CircuitBreakerHealth) RegisterCircuitBreaker(name string, cb circuitbreaker.CircuitBreaker) {
    h.breakers[name] = cb
}

// ServeHTTP implements the http.Handler interface
func (h *CircuitBreakerHealth) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    status := make(map[string]interface{})
    
    allHealthy := true
    for name, cb := range h.breakers {
        state := cb.State()
        
        status[name] = map[string]interface{}{
            "state": state.String(),
            "healthy": state == circuitbreaker.Closed,
        }
        
        if state != circuitbreaker.Closed {
            allHealthy = false
        }
    }
    
    w.Header().Set("Content-Type", "application/json")
    
    if !allHealthy {
        w.WriteHeader(http.StatusServiceUnavailable)
    }
    
    json.NewEncoder(w).Encode(map[string]interface{}{
        "status": status,
        "healthy": allHealthy,
    })
}

// SetupHealthCheckHandler sets up an HTTP handler for health checks
func SetupHealthCheckHandler(health *CircuitBreakerHealth) {
    http.Handle("/health/circuit-breakers", health)
}

Production Deployment and Best Practices

Deploying circuit breakers in production requires careful consideration of configuration, testing, and operational practices.

Configuration Best Practices

Circuit breakers should be configurable to adapt to different environments:

package config

import (
    "encoding/json"
    "os"
    "time"
    
    "example.com/circuitbreaker"
)

// CircuitBreakerConfig represents configuration for a circuit breaker
type CircuitBreakerConfig struct {
    FailureThreshold     int           `json:"failureThreshold"`
    ResetTimeout         string        `json:"resetTimeout"`
    FailureRateThreshold float64       `json:"failureRateThreshold"`
    MinimumRequests      int           `json:"minimumRequests"`
    WindowSize           string        `json:"windowSize"`
    Timeout              string        `json:"timeout"`
}

// ResilienceConfig represents configuration for resilience patterns
type ResilienceConfig struct {
    CircuitBreakers map[string]CircuitBreakerConfig `json:"circuitBreakers"`
    Bulkheads       map[string]int                  `json:"bulkheads"`
}

// LoadConfig loads configuration from a file
func LoadConfig(filename string) (*ResilienceConfig, error) {
    file, err := os.Open(filename)
    if err != nil {
        return nil, err
    }
    defer file.Close()
    
    config := &ResilienceConfig{}
    if err := json.NewDecoder(file).Decode(config); err != nil {
        return nil, err
    }
    
    return config, nil
}

// CreateCircuitBreaker creates a circuit breaker from configuration
func CreateCircuitBreaker(config CircuitBreakerConfig) (circuitbreaker.CircuitBreaker, error) {
    resetTimeout, err := time.ParseDuration(config.ResetTimeout)
    if err != nil {
        return nil, err
    }
    
    windowSize, err := time.ParseDuration(config.WindowSize)
    if err != nil {
        return nil, err
    }
    
    timeout, err := time.ParseDuration(config.Timeout)
    if err != nil {
        return nil, err
    }
    
    // Choose the appropriate circuit breaker implementation based on configuration
    if config.FailureRateThreshold > 0 {
        return circuitbreaker.NewSlidingWindowCircuitBreaker(
            windowSize,
            config.FailureRateThreshold,
            config.MinimumRequests,
            resetTimeout,
        ), nil
    }
    
    return circuitbreaker.NewSimpleCircuitBreaker(
        config.FailureThreshold,
        resetTimeout,
    ), nil
}

Example configuration file:

{
  "circuitBreakers": {
    "userService": {
      "failureThreshold": 5,
      "resetTimeout": "30s",
      "timeout": "2s"
    },
    "paymentService": {
      "failureRateThreshold": 0.25,
      "minimumRequests": 10,
      "windowSize": "1m",
      "resetTimeout": "1m",
      "timeout": "5s"
    },
    "inventoryService": {
      "failureThreshold": 3,
      "resetTimeout": "15s",
      "timeout": "1s"
    }
  },
  "bulkheads": {
    "userService": 50,
    "paymentService": 20,
    "inventoryService": 30
  }
}

Testing Circuit Breakers

Testing circuit breakers requires simulating failure scenarios:

package testing

import (
    "errors"
    "testing"
    "time"
    "math/rand"
    
    "example.com/circuitbreaker"
)

// TestSimpleCircuitBreaker tests the basic functionality of a circuit breaker
func TestSimpleCircuitBreaker(t *testing.T) {
    // Create a circuit breaker that trips after 3 failures and resets after 100ms
    cb := circuitbreaker.NewSimpleCircuitBreaker(3, 100*time.Millisecond)
    
    // Function that always fails
    failingFunc := func() error {
        return errors.New("simulated failure")
    }
    
    // Function that always succeeds
    successFunc := func() error {
        return nil
    }
    
    // Test initial state
    if cb.State() != circuitbreaker.Closed {
        t.Errorf("Initial state should be Closed, got %v", cb.State())
    }
    
    // Test that circuit opens after threshold failures
    for i := 0; i < 3; i++ {
        err := cb.Execute(failingFunc)
        if err == nil {
            t.Errorf("Expected error, got nil")
        }
    }
    
    // Circuit should now be open
    if cb.State() != circuitbreaker.Open {
        t.Errorf("State should be Open after failures, got %v", cb.State())
    }
    
    // Test that requests are rejected when circuit is open
    err := cb.Execute(successFunc)
    if err != circuitbreaker.ErrCircuitOpen {
        t.Errorf("Expected ErrCircuitOpen, got %v", err)
    }
    
    // Wait for reset timeout
    time.Sleep(150 * time.Millisecond)
    
    // Circuit should now be half-open
    if !cb.AllowRequest() {
        t.Errorf("Circuit should allow a request after reset timeout")
    }
    
    // Test that circuit closes after success in half-open state
    err = cb.Execute(successFunc)
    if err != nil {
        t.Errorf("Expected success, got %v", err)
    }
    
    // Circuit should now be closed
    if cb.State() != circuitbreaker.Closed {
        t.Errorf("State should be Closed after success, got %v", cb.State())
    }
}

// TestCircuitBreakerWithChaos tests circuit breaker behavior with chaos testing
func TestCircuitBreakerWithChaos(t *testing.T) {
    if testing.Short() {
        t.Skip("Skipping chaos test in short mode")
    }
    
    // Create a sliding window circuit breaker
    cb := circuitbreaker.NewSlidingWindowCircuitBreaker(
        1*time.Second,  // 1 second window
        0.5,            // 50% failure threshold
        10,             // Minimum 10 requests
        500*time.Millisecond, // 500ms reset timeout
    )
    
    // Create a function with variable failure rate
    var failureRate float64 = 0.0
    testFunc := func() error {
        if rand.Float64() < failureRate {
            return errors.New("simulated failure")
        }
        return nil
    }
    
    // Run test for 10 seconds
    start := time.Now()
    end := start.Add(10 * time.Second)
    
    // Track statistics
    requests := 0
    failures := 0
    rejections := 0
    
    for time.Now().Before(end) {
        // Adjust failure rate over time to simulate service degradation and recovery
        elapsed := time.Since(start).Seconds()
        switch {
        case elapsed < 2:
            failureRate = 0.1 // 10% failures initially
        case elapsed < 4:
            failureRate = 0.6 // 60% failures - should trip circuit
        case elapsed < 6:
            failureRate = 0.7 // 70% failures - circuit should stay open
        case elapsed < 8:
            failureRate = 0.2 // 20% failures - circuit should recover
        default:
            failureRate = 0.1 // 10% failures - circuit should stay closed
        }
        
        requests++
        err := cb.Execute(testFunc)
        
        if err != nil {
            if errors.Is(err, circuitbreaker.ErrCircuitOpen) {
                rejections++
            } else {
                failures++
            }
        }
        
        // Small delay between requests
        time.Sleep(10 * time.Millisecond)
    }
    
    // Log statistics
    t.Logf("Total requests: %d", requests)
    t.Logf("Failures: %d (%.2f%%)", failures, float64(failures)/float64(requests)*100)
    t.Logf("Rejections: %d (%.2f%%)", rejections, float64(rejections)/float64(requests)*100)
    
    // Verify circuit breaker protected the system during high failure rates
    if rejections == 0 {
        t.Errorf("Circuit breaker should have rejected some requests during high failure rates")
    }
}

Deployment Strategies

Deploying circuit breakers requires careful consideration of dependencies and failure modes:

package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
    
    "example.com/circuitbreaker"
    "example.com/config"
    "example.com/health"
    "example.com/metrics"
)

func main() {
    // Load configuration
    cfg, err := config.LoadConfig("resilience.json")
    if err != nil {
        log.Fatalf("Failed to load configuration: %v", err)
    }
    
    // Create circuit breakers
    circuitBreakers := make(map[string]circuitbreaker.CircuitBreaker)
    for name, cbConfig := range cfg.CircuitBreakers {
        cb, err := config.CreateCircuitBreaker(cbConfig)
        if err != nil {
            log.Fatalf("Failed to create circuit breaker %s: %v", name, err)
        }
        circuitBreakers[name] = cb
    }
    
    // Set up metrics
    promCollector := metrics.NewPrometheusCircuitBreakerCollector()
    
    // Instrument circuit breakers
    instrumentedBreakers := make(map[string]circuitbreaker.CircuitBreaker)
    for name, cb := range circuitBreakers {
        metrics := metrics.NewCircuitBreakerMetrics(name)
        promCollector.RegisterMetrics(name, metrics)
        instrumentedBreakers[name] = metrics.NewInstrumentedCircuitBreaker(cb, metrics)
    }
    
    // Set up health checks
    healthCheck := health.NewCircuitBreakerHealth()
    for name, cb := range instrumentedBreakers {
        healthCheck.RegisterCircuitBreaker(name, cb)
    }
    
    // Set up HTTP handlers
    metrics.SetupPrometheusHandler()
    health.SetupHealthCheckHandler(healthCheck)
    
    // Start metrics updater
    go func() {
        ticker := time.NewTicker(15 * time.Second)
        defer ticker.Stop()
        
        for range ticker.C {
            promCollector.UpdateMetrics()
        }
    }()
    
    // Start HTTP server
    server := &http.Server{
        Addr: ":8080",
    }
    
    // Graceful shutdown
    go func() {
        signals := make(chan os.Signal, 1)
        signal.Notify(signals, syscall.SIGINT, syscall.SIGTERM)
        <-signals
        
        log.Println("Shutting down...")
        
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
        
        if err := server.Shutdown(ctx); err != nil {
            log.Printf("HTTP server shutdown error: %v", err)
        }
    }()
    
    // Start server
    log.Println("Starting server on :8080")
    if err := server.ListenAndServe(); err != http.ErrServerClosed {
        log.Fatalf("HTTP server error: %v", err)
    }
}

Production Best Practices

Here are key best practices for using circuit breakers in production:

  1. Start Conservative: Begin with higher failure thresholds and shorter reset timeouts, then adjust based on observed behavior.

  2. Granular Circuit Breakers: Use separate circuit breakers for different dependencies and even different operations on the same dependency.

  3. Fallbacks: Always implement fallbacks for when the circuit is open:

// Example fallback strategy
func getUserWithFallback(userID string) (*User, error) {
    // Try primary data source with circuit breaker
    var user *User
    err := userServiceCB.Execute(func() error {
        var err error
        user, err = userService.GetUser(userID)
        return err
    })
    
    // If circuit is open or request failed, try fallback
    if err != nil {
        // Try cache
        cachedUser, cacheErr := userCache.Get(userID)
        if cacheErr == nil {
            return cachedUser, nil
        }
        
        // Try default
        return &User{
            ID:      userID,
            Name:    "Unknown User",
            IsGuest: true,
        }, nil
    }
    
    return user, nil
}
  1. Monitor and Alert: Set up alerts for circuit breaker state changes and high rejection rates:

// Example Prometheus alert rule
groups:
- name: CircuitBreakerAlerts
  rules:
  - alert: CircuitBreakerOpen
    expr: circuit_breaker_state{name=~".*"} > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Circuit breaker {{ $labels.name }} is open"
      description: "The circuit breaker {{ $labels.name }} has been open for 5 minutes."
  
  - alert: HighRejectionRate
    expr: rate(circuit_breaker_rejected_total{name=~".*"}[5m]) > 10
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High rejection rate for {{ $labels.name }}"
      description: "Circuit breaker {{ $labels.name }} is rejecting more than 10 requests per second."
  1. Tune Parameters: Adjust circuit breaker parameters based on observed behavior:
// Example adaptive parameter tuning
func tuneCircuitBreakerParameters(cb *AdaptiveCircuitBreaker, metrics *CircuitBreakerMetrics) {
    // Check metrics every minute
    ticker := time.NewTicker(1 * time.Minute)
    defer ticker.Stop()
    
    for range ticker.C {
        errorRate := metrics.CalculateErrorRate()
        rejectionRate := float64(metrics.rejectedCount) / float64(metrics.successCount + metrics.failureCount + metrics.rejectedCount)
        
        // If error rate is very low but rejection rate is high, make the circuit breaker more lenient
        if errorRate < 0.01 && rejectionRate > 0.1 {
            threshold, timeout := cb.CurrentThresholds()
            cb.SetThresholds(threshold+1, timeout*0.9)
            log.Printf("Tuned circuit breaker to be more lenient: threshold=%d, timeout=%v", threshold+1, timeout*0.9)
        }
        
        // If error rate is very high, make the circuit breaker more strict
        if errorRate > 0.3 {
            threshold, timeout := cb.CurrentThresholds()
            cb.SetThresholds(threshold-1, timeout*1.1)
            log.Printf("Tuned circuit breaker to be more strict: threshold=%d, timeout=%v", threshold-1, timeout*1.1)
        }
    }
}
  1. Graceful Degradation: Design systems to function with reduced capabilities when dependencies are unavailable:
// Example service with graceful degradation
type RecommendationService struct {
    productService      *ProductService
    productServiceCB    circuitbreaker.CircuitBreaker
    userService         *UserService
    userServiceCB       circuitbreaker.CircuitBreaker
    analyticsService    *AnalyticsService
    analyticsServiceCB  circuitbreaker.CircuitBreaker
}

// GetRecommendations returns product recommendations for a user
func (s *RecommendationService) GetRecommendations(userID string) ([]Product, error) {
    // Try to get personalized recommendations using all services
    if s.userServiceCB.AllowRequest() && s.analyticsServiceCB.AllowRequest() {
        var user *User
        var userHistory []PurchaseHistory
        
        // Get user data
        userErr := s.userServiceCB.Execute(func() error {
            var err error
            user, err = s.userService.GetUser(userID)
            return err
        })
        
        // Get user history
        historyErr := s.analyticsServiceCB.Execute(func() error {
            var err error
            userHistory, err = s.analyticsService.GetUserHistory(userID)
            return err
        })
        
        // If both succeeded, generate personalized recommendations
        if userErr == nil && historyErr == nil {
            return s.generatePersonalizedRecommendations(user, userHistory)
        }
    }
    
    // Fallback to category-based recommendations
    if s.productServiceCB.AllowRequest() {
        var categories []string
        
        // Try to get user's preferred categories if user service is available
        if s.userServiceCB.AllowRequest() {
            s.userServiceCB.Execute(func() error {
                var err error
                categories, err = s.userService.GetUserPreferredCategories(userID)
                return err
            })
        }
        
        // If we have categories, use them; otherwise use popular categories
        if len(categories) == 0 {
            categories = []string{"popular", "trending"}
        }
        
        // Get recommendations by category
        var products []Product
        err := s.productServiceCB.Execute(func() error {
            var err error
            products, err = s.productService.GetProductsByCategories(categories, 10)
            return err
        })
        
        if err == nil {
            return products, nil
        }
    }
    
    // Ultimate fallback: return hardcoded popular products
    return s.getHardcodedPopularProducts(), nil
}
  1. Avoid Cascading Circuit Breakers: Be careful with circuit breakers that depend on each other:
// Example of problematic cascading circuit breakers
func problematicDesign() {
    // Service A calls Service B which calls Service C
    serviceC := NewService("C")
    serviceCCircuitBreaker := circuitbreaker.NewSimpleCircuitBreaker(5, 30*time.Second)
    
    serviceB := NewService("B")
    serviceBCircuitBreaker := circuitbreaker.NewSimpleCircuitBreaker(5, 30*time.Second)
    
    // Problem: If C fails, B's circuit breaker will open, then A's circuit breaker will open
    // This creates a cascading effect where failures propagate upward
    
    // Better design: Use different thresholds and timeouts at different levels
    serviceCCircuitBreaker = circuitbreaker.NewSimpleCircuitBreaker(3, 10*time.Second)
    serviceBCircuitBreaker = circuitbreaker.NewSimpleCircuitBreaker(5, 30*time.Second)
    serviceACircuitBreaker := circuitbreaker.NewSimpleCircuitBreaker(7, 60*time.Second)
    
    // Even better: Implement fallbacks at each level
}
  1. Circuit Breaker Patterns by Dependency Type: Tailor circuit breaker configurations to the type of dependency:
Dependency TypeFailure ThresholdReset TimeoutNotes
Critical databaseHigher (5-10)Shorter (10-30s)Essential service, try more aggressively to reconnect
External APILower (3-5)Longer (30-60s)Less control, be more conservative
Cache serviceVery low (1-2)Very short (5-10s)Non-critical, fail fast and recover quickly
Background jobHigher (8-10)Medium (20-40s)Can tolerate more failures before breaking

What This Means

Circuit breakers are a critical component in the resilience toolkit for Go developers building distributed systems. By implementing these patterns, you can protect your applications from cascading failures, improve user experience during partial outages, and give failing dependencies time to recover.

In this guide, we’ve explored the fundamentals of circuit breaker patterns, implemented various circuit breaker strategies in Go, and examined how to integrate them with common microservice components. We’ve also covered advanced topics like monitoring, configuration, and production best practices.

Remember that circuit breakers are most effective when combined with other resilience patterns like timeouts, retries, bulkheads, and fallbacks. Together, these patterns form a comprehensive approach to building truly resilient Go applications that can withstand the unpredictable nature of distributed environments.

As you implement circuit breakers in your own applications, start with simple implementations and gradually add more sophisticated features as you observe their behavior in production. Monitor their performance, tune their parameters, and continuously refine your approach to achieve the optimal balance between reliability and resource utilization.

By mastering these advanced fault tolerance patterns, you’ll be well-equipped to build Go applications that not only survive in the face of failures but continue to provide value to users even when parts of the system are degraded.

Andrew
Andrew

Andrew is a visionary software engineer and DevOps expert with a proven track record of delivering cutting-edge solutions that drive innovation at Ataiva.com. As a leader on numerous high-profile projects, Andrew brings his exceptional technical expertise and collaborative leadership skills to the table, fostering a culture of agility and excellence within the team. With a passion for architecting scalable systems, automating workflows, and empowering teams, Andrew is a sought-after authority in the field of software development and DevOps.

Tags

Recent Posts