In today’s distributed systems landscape, failures are not just possible—they’re inevitable. Network partitions, service outages, resource exhaustion, and unexpected load spikes can cascade through interconnected services, transforming localized failures into system-wide catastrophes. For Go developers building distributed applications, implementing robust fault tolerance mechanisms isn’t optional—it’s essential for maintaining system reliability and user experience.
Circuit breakers stand as one of the most powerful patterns in the resilience engineering toolkit. Inspired by their electrical counterparts, software circuit breakers prevent cascading failures by detecting when remote dependencies are failing and temporarily halting further requests, giving overloaded systems time to recover. In Go’s microservices ecosystem, properly implemented circuit breakers can mean the difference between graceful degradation and complete system collapse.
This comprehensive guide explores advanced circuit breaker patterns and fault tolerance mechanisms in Go. We’ll progress from fundamental concepts to sophisticated implementations, covering state management, failure detection algorithms, recovery strategies, and integration patterns with real-world Go microservices. By the end, you’ll have the knowledge to build truly resilient Go applications capable of withstanding the unpredictable nature of distributed environments.
Understanding Circuit Breaker Patterns
Before diving into implementation details, it’s crucial to understand the core concepts behind circuit breakers and how they fit into a broader resilience strategy.
The Circuit Breaker State Machine
At its core, a circuit breaker is a state machine with three distinct states:
package circuitbreaker
// State represents the current state of the circuit breaker
type State int
const (
// Closed means the circuit breaker is allowing requests through
Closed State = iota
// Open means the circuit breaker is preventing requests from going through
Open
// HalfOpen means the circuit breaker is allowing a limited number of test requests
HalfOpen
)
// CircuitBreaker represents a basic circuit breaker
type CircuitBreaker struct {
state State
failureCount int
failureThreshold int
resetTimeout time.Duration
lastFailureTime time.Time
mutex sync.RWMutex
}
The circuit breaker operates as follows:
- Closed State: The default state where requests flow normally. The circuit breaker monitors for failures.
- Open State: When failures exceed a threshold, the circuit “trips” to the open state, fast-failing requests without attempting to execute them.
- Half-Open State: After a timeout period, the circuit transitions to half-open, allowing a limited number of test requests to check if the underlying system has recovered.
Failure Detection Strategies
Effective circuit breakers must accurately detect failures, which can be more nuanced than simple error returns:
package circuitbreaker
// FailureDetector determines if a response should be considered a failure
type FailureDetector interface {
IsFailure(err error, response interface{}, duration time.Duration) bool
}
// StandardFailureDetector considers any error as a failure
type StandardFailureDetector struct{}
func (d *StandardFailureDetector) IsFailure(err error, _ interface{}, _ time.Duration) bool {
return err != nil
}
// AdvancedFailureDetector considers errors, status codes, and timeouts
type AdvancedFailureDetector struct {
TimeoutThreshold time.Duration
ErrorCodes map[int]bool // HTTP status codes considered failures
}
func (d *AdvancedFailureDetector) IsFailure(err error, response interface{}, duration time.Duration) bool {
// Check for timeout
if duration > d.TimeoutThreshold {
return true
}
// Check for error
if err != nil {
return true
}
// Check for HTTP status code if response is an HTTP response
if httpResp, ok := response.(*http.Response); ok {
if d.ErrorCodes[httpResp.StatusCode] {
return true
}
}
return false
}
Circuit Breaker in the Resilience Stack
Circuit breakers don’t operate in isolation but form part of a comprehensive resilience strategy:
package resilience
// ResilienceStack represents a complete resilience strategy
type ResilienceStack struct {
// Timeout prevents requests from hanging indefinitely
Timeout time.Duration
// Retry attempts to recover from transient failures
RetryStrategy RetryStrategy
// CircuitBreaker prevents overwhelming failing services
CircuitBreaker *circuitbreaker.CircuitBreaker
// Bulkhead limits concurrent requests
Bulkhead *bulkhead.Bulkhead
// Fallback provides alternative response when all else fails
Fallback FallbackFunc
}
// Execute runs a function with the complete resilience stack
func (s *ResilienceStack) Execute(ctx context.Context, operation func(context.Context) (interface{}, error)) (interface{}, error) {
// Apply timeout
timeoutCtx, cancel := context.WithTimeout(ctx, s.Timeout)
defer cancel()
// Check circuit breaker
if !s.CircuitBreaker.AllowRequest() {
return s.Fallback(ctx, circuitbreaker.ErrCircuitOpen)
}
// Apply bulkhead
if !s.Bulkhead.Acquire() {
return s.Fallback(ctx, bulkhead.ErrBulkheadFull)
}
defer s.Bulkhead.Release()
// Execute with retry
result, err := s.RetryStrategy.Execute(timeoutCtx, operation)
// Record result in circuit breaker
s.CircuitBreaker.RecordResult(err == nil)
// Apply fallback if needed
if err != nil {
return s.Fallback(ctx, err)
}
return result, nil
}
This layered approach provides defense in depth, with each mechanism addressing different failure modes.
Implementing Basic Circuit Breakers
Let’s start by implementing a simple but effective circuit breaker in Go.
Simple Counter-Based Circuit Breaker
The most straightforward implementation uses a counter to track failures:
package circuitbreaker
import (
"errors"
"sync"
"time"
)
var ErrCircuitOpen = errors.New("circuit breaker is open")
// SimpleCircuitBreaker implements a basic counter-based circuit breaker
type SimpleCircuitBreaker struct {
failureThreshold int
resetTimeout time.Duration
failureCount int
lastFailureTime time.Time
state State
mutex sync.RWMutex
}
// NewSimpleCircuitBreaker creates a new circuit breaker
func NewSimpleCircuitBreaker(failureThreshold int, resetTimeout time.Duration) *SimpleCircuitBreaker {
return &SimpleCircuitBreaker{
failureThreshold: failureThreshold,
resetTimeout: resetTimeout,
state: Closed,
}
}
// Execute runs the given function with circuit breaker protection
func (cb *SimpleCircuitBreaker) Execute(fn func() error) error {
if !cb.AllowRequest() {
return ErrCircuitOpen
}
err := fn()
cb.RecordResult(err == nil)
return err
}
// AllowRequest checks if a request should be allowed through
func (cb *SimpleCircuitBreaker) AllowRequest() bool {
cb.mutex.RLock()
defer cb.mutex.RUnlock()
switch cb.state {
case Closed:
return true
case Open:
// Check if reset timeout has elapsed
if time.Since(cb.lastFailureTime) > cb.resetTimeout {
// Transition to half-open state must be done with a write lock
cb.mutex.RUnlock()
cb.mutex.Lock()
defer cb.mutex.Unlock()
// Double-check state after acquiring write lock
if cb.state == Open {
cb.state = HalfOpen
}
return cb.state == HalfOpen
}
return false
case HalfOpen:
// In half-open state, allow only one request through
cb.mutex.RUnlock()
cb.mutex.Lock()
defer cb.mutex.Unlock()
// Only the first request should be allowed
if cb.state == HalfOpen {
return true
}
return false
default:
return false
}
}
// RecordResult records the result of a request
func (cb *SimpleCircuitBreaker) RecordResult(success bool) {
cb.mutex.Lock()
defer cb.mutex.Unlock()
if success {
switch cb.state {
case HalfOpen:
// On success in half-open state, reset and close the circuit
cb.failureCount = 0
cb.state = Closed
case Closed:
// Reset failure count on success
cb.failureCount = 0
}
} else {
cb.lastFailureTime = time.Now()
switch cb.state {
case HalfOpen:
// On failure in half-open state, reopen the circuit
cb.state = Open
case Closed:
// Increment failure count and check threshold
cb.failureCount++
if cb.failureCount >= cb.failureThreshold {
cb.state = Open
}
}
}
}
// State returns the current state of the circuit breaker
func (cb *SimpleCircuitBreaker) State() State {
cb.mutex.RLock()
defer cb.mutex.RUnlock()
return cb.state
}
Using the Simple Circuit Breaker
Here’s how to use our simple circuit breaker:
package main
import (
"fmt"
"net/http"
"time"
"example.com/circuitbreaker"
)
func main() {
// Create a circuit breaker that trips after 5 failures and resets after 10 seconds
cb := circuitbreaker.NewSimpleCircuitBreaker(5, 10*time.Second)
// Create an HTTP client with the circuit breaker
client := &http.Client{
Timeout: 5 * time.Second,
}
// Function to make HTTP request with circuit breaker protection
makeRequest := func(url string) (*http.Response, error) {
var resp *http.Response
err := cb.Execute(func() error {
var err error
resp, err = client.Get(url)
return err
})
return resp, err
}
// Example usage
for i := 0; i < 10; i++ {
resp, err := makeRequest("https://api.example.com/endpoint")
if err != nil {
if errors.Is(err, circuitbreaker.ErrCircuitOpen) {
fmt.Println("Circuit is open, skipping request")
continue
}
fmt.Printf("Request failed: %v\n", err)
continue
}
fmt.Printf("Request succeeded with status: %d\n", resp.StatusCode)
resp.Body.Close()
}
}
Sliding Window Circuit Breaker
A more sophisticated approach uses a sliding window to track failure rates:
package circuitbreaker
import (
"errors"
"sync"
"time"
)
// Result represents the outcome of a request
type Result struct {
Success bool
Time time.Time
}
// SlidingWindowCircuitBreaker implements a circuit breaker with a sliding window
type SlidingWindowCircuitBreaker struct {
windowSize time.Duration
failureRateThreshold float64
minimumRequests int
resetTimeout time.Duration
results []Result
state State
lastStateChange time.Time
mutex sync.RWMutex
}
// NewSlidingWindowCircuitBreaker creates a new sliding window circuit breaker
func NewSlidingWindowCircuitBreaker(
windowSize time.Duration,
failureRateThreshold float64,
minimumRequests int,
resetTimeout time.Duration,
) *SlidingWindowCircuitBreaker {
return &SlidingWindowCircuitBreaker{
windowSize: windowSize,
failureRateThreshold: failureRateThreshold,
minimumRequests: minimumRequests,
resetTimeout: resetTimeout,
state: Closed,
results: make([]Result, 0, 100), // Pre-allocate some capacity
}
}
// Execute runs the given function with circuit breaker protection
func (cb *SlidingWindowCircuitBreaker) Execute(fn func() error) error {
if !cb.AllowRequest() {
return ErrCircuitOpen
}
err := fn()
cb.RecordResult(err == nil)
return err
}
// AllowRequest checks if a request should be allowed through
func (cb *SlidingWindowCircuitBreaker) AllowRequest() bool {
cb.mutex.RLock()
defer cb.mutex.RUnlock()
switch cb.state {
case Closed:
return true
case Open:
if time.Since(cb.lastStateChange) > cb.resetTimeout {
cb.mutex.RUnlock()
cb.mutex.Lock()
defer cb.mutex.Unlock()
if cb.state == Open {
cb.state = HalfOpen
cb.lastStateChange = time.Now()
}
return cb.state == HalfOpen
}
return false
case HalfOpen:
// In half-open state, allow limited requests
return true
default:
return false
}
}
// RecordResult records the result of a request
func (cb *SlidingWindowCircuitBreaker) RecordResult(success bool) {
cb.mutex.Lock()
defer cb.mutex.Unlock()
now := time.Now()
// Add the new result
cb.results = append(cb.results, Result{
Success: success,
Time: now,
})
// Remove results outside the window
cutoff := now.Add(-cb.windowSize)
newStart := 0
for i, result := range cb.results {
if result.Time.After(cutoff) {
newStart = i
break
}
}
cb.results = cb.results[newStart:]
// Calculate failure rate
if len(cb.results) >= cb.minimumRequests {
failures := 0
for _, result := range cb.results {
if !result.Success {
failures++
}
}
failureRate := float64(failures) / float64(len(cb.results))
switch cb.state {
case Closed:
if failureRate >= cb.failureRateThreshold {
cb.state = Open
cb.lastStateChange = now
}
case HalfOpen:
if !success {
cb.state = Open
cb.lastStateChange = now
} else if len(cb.results) >= cb.minimumRequests && failureRate < cb.failureRateThreshold {
cb.state = Closed
cb.lastStateChange = now
}
}
}
}
// State returns the current state of the circuit breaker
func (cb *SlidingWindowCircuitBreaker) State() State {
cb.mutex.RLock()
defer cb.mutex.RUnlock()
return cb.state
}
Advanced Circuit Breaker Strategies
Now let’s explore more sophisticated circuit breaker implementations and strategies.
Adaptive Circuit Breaker
An adaptive circuit breaker adjusts its parameters based on observed behavior:
package circuitbreaker
import (
"math"
"sync"
"time"
)
// AdaptiveCircuitBreaker dynamically adjusts its thresholds based on traffic patterns
type AdaptiveCircuitBreaker struct {
baseFailureThreshold int
minFailureThreshold int
maxFailureThreshold int
baseResetTimeout time.Duration
minResetTimeout time.Duration
maxResetTimeout time.Duration
currentFailureThreshold int
currentResetTimeout time.Duration
consecutiveSuccesses int
consecutiveFailures int
state State
lastStateChange time.Time
failureCount int
mutex sync.RWMutex
}
// NewAdaptiveCircuitBreaker creates a new adaptive circuit breaker
func NewAdaptiveCircuitBreaker(
baseFailureThreshold int,
minFailureThreshold int,
maxFailureThreshold int,
baseResetTimeout time.Duration,
minResetTimeout time.Duration,
maxResetTimeout time.Duration,
) *AdaptiveCircuitBreaker {
return &AdaptiveCircuitBreaker{
baseFailureThreshold: baseFailureThreshold,
minFailureThreshold: minFailureThreshold,
maxFailureThreshold: maxFailureThreshold,
baseResetTimeout: baseResetTimeout,
minResetTimeout: minResetTimeout,
maxResetTimeout: maxResetTimeout,
currentFailureThreshold: baseFailureThreshold,
currentResetTimeout: baseResetTimeout,
state: Closed,
}
}
// AllowRequest checks if a request should be allowed through
func (cb *AdaptiveCircuitBreaker) AllowRequest() bool {
cb.mutex.RLock()
defer cb.mutex.RUnlock()
switch cb.state {
case Closed:
return true
case Open:
if time.Since(cb.lastStateChange) > cb.currentResetTimeout {
cb.mutex.RUnlock()
cb.mutex.Lock()
defer cb.mutex.Unlock()
if cb.state == Open {
cb.state = HalfOpen
cb.lastStateChange = time.Now()
}
return cb.state == HalfOpen
}
return false
case HalfOpen:
return true
default:
return false
}
}
// RecordResult records the result of a request and adjusts thresholds
func (cb *AdaptiveCircuitBreaker) RecordResult(success bool) {
cb.mutex.Lock()
defer cb.mutex.Unlock()
now := time.Now()
if success {
cb.consecutiveSuccesses++
cb.consecutiveFailures = 0
// Adjust thresholds on consecutive successes
if cb.consecutiveSuccesses >= 5 {
// Increase failure threshold (more tolerant)
cb.currentFailureThreshold = int(math.Min(
float64(cb.maxFailureThreshold),
float64(cb.currentFailureThreshold+1),
))
// Decrease reset timeout (faster recovery)
cb.currentResetTimeout = time.Duration(math.Max(
float64(cb.minResetTimeout),
float64(cb.currentResetTimeout)*0.9,
))
cb.consecutiveSuccesses = 0
}
switch cb.state {
case HalfOpen:
// On success in half-open state, reset and close the circuit
cb.failureCount = 0
cb.state = Closed
cb.lastStateChange = now
case Closed:
// Reset failure count on success
cb.failureCount = 0
}
} else {
cb.consecutiveFailures++
cb.consecutiveSuccesses = 0
// Adjust thresholds on consecutive failures
if cb.consecutiveFailures >= 3 {
// Decrease failure threshold (less tolerant)
cb.currentFailureThreshold = int(math.Max(
float64(cb.minFailureThreshold),
float64(cb.currentFailureThreshold-1),
))
// Increase reset timeout (slower recovery)
cb.currentResetTimeout = time.Duration(math.Min(
float64(cb.maxResetTimeout),
float64(cb.currentResetTimeout)*1.1,
))
cb.consecutiveFailures = 0
}
switch cb.state {
case HalfOpen:
// On failure in half-open state, reopen the circuit
cb.state = Open
cb.lastStateChange = now
case Closed:
// Increment failure count and check threshold
cb.failureCount++
if cb.failureCount >= cb.currentFailureThreshold {
cb.state = Open
cb.lastStateChange = now
}
}
}
}
// Execute runs the given function with circuit breaker protection
func (cb *AdaptiveCircuitBreaker) Execute(fn func() error) error {
if !cb.AllowRequest() {
return ErrCircuitOpen
}
err := fn()
cb.RecordResult(err == nil)
return err
}
// State returns the current state of the circuit breaker
func (cb *AdaptiveCircuitBreaker) State() State {
cb.mutex.RLock()
defer cb.mutex.RUnlock()
return cb.state
}
// CurrentThresholds returns the current adaptive thresholds
func (cb *AdaptiveCircuitBreaker) CurrentThresholds() (int, time.Duration) {
cb.mutex.RLock()
defer cb.mutex.RUnlock()
return cb.currentFailureThreshold, cb.currentResetTimeout
}
Composite Circuit Breaker
For complex systems, we can create a composite circuit breaker that manages multiple downstream dependencies:
package circuitbreaker
import (
"errors"
"sync"
)
// CompositeCircuitBreaker manages multiple circuit breakers for different dependencies
type CompositeCircuitBreaker struct {
breakers map[string]CircuitBreaker
mutex sync.RWMutex
}
// CircuitBreaker interface defines the methods a circuit breaker must implement
type CircuitBreaker interface {
AllowRequest() bool
RecordResult(success bool)
Execute(fn func() error) error
State() State
}
// NewCompositeCircuitBreaker creates a new composite circuit breaker
func NewCompositeCircuitBreaker() *CompositeCircuitBreaker {
return &CompositeCircuitBreaker{
breakers: make(map[string]CircuitBreaker),
}
}
// AddBreaker adds a circuit breaker for a specific dependency
func (c *CompositeCircuitBreaker) AddBreaker(name string, breaker CircuitBreaker) {
c.mutex.Lock()
defer c.mutex.Unlock()
c.breakers[name] = breaker
}
// Execute runs a function for a specific dependency with circuit breaker protection
func (c *CompositeCircuitBreaker) Execute(name string, fn func() error) error {
c.mutex.RLock()
breaker, exists := c.breakers[name]
c.mutex.RUnlock()
if !exists {
return errors.New("circuit breaker not found for: " + name)
}
return breaker.Execute(fn)
}
// AllowRequest checks if a request to a specific dependency should be allowed
func (c *CompositeCircuitBreaker) AllowRequest(name string) bool {
c.mutex.RLock()
defer c.mutex.RUnlock()
breaker, exists := c.breakers[name]
if !exists {
return false
}
return breaker.AllowRequest()
}
// RecordResult records the result of a request to a specific dependency
func (c *CompositeCircuitBreaker) RecordResult(name string, success bool) {
c.mutex.RLock()
breaker, exists := c.breakers[name]
c.mutex.RUnlock()
if exists {
breaker.RecordResult(success)
}
}
// State returns the state of a specific circuit breaker
func (c *CompositeCircuitBreaker) State(name string) (State, bool) {
c.mutex.RLock()
defer c.mutex.RUnlock()
breaker, exists := c.breakers[name]
if !exists {
return Closed, false
}
return breaker.State(), true
}
// States returns the states of all circuit breakers
func (c *CompositeCircuitBreaker) States() map[string]State {
c.mutex.RLock()
defer c.mutex.RUnlock()
states := make(map[string]State, len(c.breakers))
for name, breaker := range c.breakers {
states[name] = breaker.State()
}
return states
}
Bulkhead Pattern Implementation
The bulkhead pattern complements circuit breakers by limiting concurrent requests:
package bulkhead
import (
"errors"
"sync"
)
var ErrBulkheadFull = errors.New("bulkhead is full")
// Bulkhead limits the number of concurrent requests
type Bulkhead struct {
maxConcurrent int
current int
mutex sync.Mutex
}
// NewBulkhead creates a new bulkhead
func NewBulkhead(maxConcurrent int) *Bulkhead {
return &Bulkhead{
maxConcurrent: maxConcurrent,
}
}
// Acquire attempts to acquire a slot in the bulkhead
func (b *Bulkhead) Acquire() bool {
b.mutex.Lock()
defer b.mutex.Unlock()
if b.current >= b.maxConcurrent {
return false
}
b.current++
return true
}
// Release releases a slot in the bulkhead
func (b *Bulkhead) Release() {
b.mutex.Lock()
defer b.mutex.Unlock()
if b.current > 0 {
b.current--
}
}
// Execute runs a function with bulkhead protection
func (b *Bulkhead) Execute(fn func() error) error {
if !b.Acquire() {
return ErrBulkheadFull
}
defer b.Release()
return fn()
}
// CurrentLoad returns the current number of active requests
func (b *Bulkhead) CurrentLoad() int {
b.mutex.Lock()
defer b.mutex.Unlock()
return b.current
}
Integration with Go Microservices
Now let’s explore how to integrate circuit breakers with common Go microservice patterns.
HTTP Client Integration
Integrating circuit breakers with HTTP clients is a common use case:
package resilience
import (
"context"
"net/http"
"time"
"example.com/circuitbreaker"
)
// ResilienceClient wraps an HTTP client with resilience patterns
type ResilienceClient struct {
client *http.Client
circuitBreaker circuitbreaker.CircuitBreaker
bulkhead *bulkhead.Bulkhead
timeout time.Duration
}
// NewResilienceClient creates a new resilient HTTP client
func NewResilienceClient(
client *http.Client,
cb circuitbreaker.CircuitBreaker,
bh *bulkhead.Bulkhead,
timeout time.Duration,
) *ResilienceClient {
return &ResilienceClient{
client: client,
circuitBreaker: cb,
bulkhead: bh,
timeout: timeout,
}
}
// Do performs an HTTP request with resilience patterns
func (c *ResilienceClient) Do(req *http.Request) (*http.Response, error) {
// Apply timeout
ctx, cancel := context.WithTimeout(req.Context(), c.timeout)
defer cancel()
req = req.WithContext(ctx)
// Check circuit breaker
if !c.circuitBreaker.AllowRequest() {
return nil, circuitbreaker.ErrCircuitOpen
}
// Apply bulkhead
if !c.bulkhead.Acquire() {
return nil, bulkhead.ErrBulkheadFull
}
defer c.bulkhead.Release()
// Execute request
var resp *http.Response
var err error
err = c.circuitBreaker.Execute(func() error {
resp, err = c.client.Do(req)
if err != nil {
return err
}
// Consider certain status codes as failures
if resp.StatusCode >= 500 {
return &httpError{statusCode: resp.StatusCode}
}
return nil
})
return resp, err
}
// httpError represents an HTTP error with status code
type httpError struct {
statusCode int
}
func (e *httpError) Error() string {
return fmt.Sprintf("HTTP error: %d", e.statusCode)
}
gRPC Client Integration
Similarly, we can integrate circuit breakers with gRPC clients:
package resilience
import (
"context"
"time"
"strings"
"sync"
"example.com/circuitbreaker"
"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
)
// ResilienceInterceptor creates a gRPC client interceptor with resilience patterns
func ResilienceInterceptor(
cb circuitbreaker.CircuitBreaker,
bh *bulkhead.Bulkhead,
timeout time.Duration,
) grpc.UnaryClientInterceptor {
return func(
ctx context.Context,
method string,
req, reply interface{},
cc *grpc.ClientConn,
invoker grpc.UnaryInvoker,
opts ...grpc.CallOption,
) error {
// Apply timeout
timeoutCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
// Check circuit breaker
if !cb.AllowRequest() {
return status.Error(codes.Unavailable, "circuit breaker is open")
}
// Apply bulkhead
if !bh.Acquire() {
return status.Error(codes.ResourceExhausted, "bulkhead is full")
}
defer bh.Release()
// Execute request with circuit breaker
var err error
err = cb.Execute(func() error {
return invoker(timeoutCtx, method, req, reply, cc, opts...)
})
// Record result in circuit breaker based on error type
if err != nil {
// Only certain error types should trip the circuit breaker
s, ok := status.FromError(err)
if ok {
// Only server errors should trip the circuit breaker
switch s.Code() {
case codes.Unavailable, codes.DeadlineExceeded, codes.ResourceExhausted, codes.Internal, codes.Unknown:
cb.RecordResult(false)
default:
// Client errors should not trip the circuit breaker
cb.RecordResult(true)
}
} else {
// Non-status errors (like context cancellation) should trip the circuit breaker
cb.RecordResult(false)
}
return err
}
// Record success
cb.RecordResult(true)
return nil
}
}
// ServiceMethod represents a gRPC service method
type ServiceMethod struct {
Service string
Method string
}
// PerServiceCircuitBreaker manages circuit breakers per gRPC service
type PerServiceCircuitBreaker struct {
breakers map[ServiceMethod]circuitbreaker.CircuitBreaker
factory func() circuitbreaker.CircuitBreaker
mutex sync.RWMutex
}
// NewPerServiceCircuitBreaker creates a new per-service circuit breaker
func NewPerServiceCircuitBreaker(factory func() circuitbreaker.CircuitBreaker) *PerServiceCircuitBreaker {
return &PerServiceCircuitBreaker{
breakers: make(map[ServiceMethod]circuitbreaker.CircuitBreaker),
factory: factory,
}
}
// UnaryInterceptor creates a gRPC interceptor that uses per-service circuit breakers
func (p *PerServiceCircuitBreaker) UnaryInterceptor() grpc.UnaryClientInterceptor {
return func(
ctx context.Context,
method string,
req, reply interface{},
cc *grpc.ClientConn,
invoker grpc.UnaryInvoker,
opts ...grpc.CallOption,
) error {
// Parse service and method from full method name
parts := strings.Split(method, "/")
if len(parts) != 3 {
return invoker(ctx, method, req, reply, cc, opts...)
}
serviceMethod := ServiceMethod{
Service: parts[1],
Method: parts[2],
}
// Get or create circuit breaker for this service method
p.mutex.RLock()
cb, exists := p.breakers[serviceMethod]
p.mutex.RUnlock()
if !exists {
p.mutex.Lock()
cb, exists = p.breakers[serviceMethod]
if !exists {
cb = p.factory()
p.breakers[serviceMethod] = cb
}
p.mutex.Unlock()
}
// Check circuit breaker
if !cb.AllowRequest() {
return status.Error(codes.Unavailable, "circuit breaker is open")
}
// Execute request with circuit breaker
err := cb.Execute(func() error {
return invoker(ctx, method, req, reply, cc, opts...)
})
return err
}
}
Database Connection Circuit Breaker
Circuit breakers are also valuable for database connections:
package database
import (
"context"
"database/sql"
"time"
"example.com/circuitbreaker"
)
// ResilienceDB wraps a database with resilience patterns
type ResilienceDB struct {
db *sql.DB
circuitBreaker circuitbreaker.CircuitBreaker
timeout time.Duration
}
// NewResilienceDB creates a new resilient database wrapper
func NewResilienceDB(
db *sql.DB,
cb circuitbreaker.CircuitBreaker,
timeout time.Duration,
) *ResilienceDB {
return &ResilienceDB{
db: db,
circuitBreaker: cb,
timeout: timeout,
}
}
// Query executes a query with resilience patterns
func (r *ResilienceDB) Query(query string, args ...interface{}) (*sql.Rows, error) {
// Check circuit breaker
if !r.circuitBreaker.AllowRequest() {
return nil, circuitbreaker.ErrCircuitOpen
}
// Create context with timeout
ctx, cancel := context.WithTimeout(context.Background(), r.timeout)
defer cancel()
// Execute query with circuit breaker
var rows *sql.Rows
var err error
err = r.circuitBreaker.Execute(func() error {
rows, err = r.db.QueryContext(ctx, query, args...)
return err
})
return rows, err
}
// Exec executes a statement with resilience patterns
func (r *ResilienceDB) Exec(query string, args ...interface{}) (sql.Result, error) {
// Check circuit breaker
if !r.circuitBreaker.AllowRequest() {
return nil, circuitbreaker.ErrCircuitOpen
}
// Create context with timeout
ctx, cancel := context.WithTimeout(context.Background(), r.timeout)
defer cancel()
// Execute statement with circuit breaker
var result sql.Result
var err error
err = r.circuitBreaker.Execute(func() error {
result, err = r.db.ExecContext(ctx, query, args...)
return err
})
return result, err
}
// Transaction executes a transaction with resilience patterns
func (r *ResilienceDB) Transaction(fn func(*sql.Tx) error) error {
// Check circuit breaker
if !r.circuitBreaker.AllowRequest() {
return circuitbreaker.ErrCircuitOpen
}
// Create context with timeout
ctx, cancel := context.WithTimeout(context.Background(), r.timeout)
defer cancel()
// Execute transaction with circuit breaker
return r.circuitBreaker.Execute(func() error {
tx, err := r.db.BeginTx(ctx, nil)
if err != nil {
return err
}
// If fn returns an error, the transaction will be rolled back
if err := fn(tx); err != nil {
tx.Rollback()
return err
}
// Commit the transaction
return tx.Commit()
})
}
Monitoring and Observability
Implementing circuit breakers is only half the battle—you also need to monitor their behavior to ensure they’re working correctly and to tune their parameters.
Circuit Breaker Metrics
Collecting metrics from circuit breakers provides valuable insights:
package metrics
import (
"sync"
"time"
"example.com/circuitbreaker"
)
// CircuitBreakerMetrics collects metrics about circuit breaker behavior
type CircuitBreakerMetrics struct {
name string
successCount int64
failureCount int64
rejectedCount int64
stateTransitions map[circuitbreaker.State]int64
lastStateTransition time.Time
currentState circuitbreaker.State
mutex sync.RWMutex
}
// NewCircuitBreakerMetrics creates a new metrics collector
func NewCircuitBreakerMetrics(name string) *CircuitBreakerMetrics {
return &CircuitBreakerMetrics{
name: name,
stateTransitions: make(map[circuitbreaker.State]int64),
currentState: circuitbreaker.Closed,
lastStateTransition: time.Now(),
}
}
// RecordSuccess records a successful request
func (m *CircuitBreakerMetrics) RecordSuccess() {
m.mutex.Lock()
defer m.mutex.Unlock()
m.successCount++
}
// RecordFailure records a failed request
func (m *CircuitBreakerMetrics) RecordFailure() {
m.mutex.Lock()
defer m.mutex.Unlock()
m.failureCount++
}
// RecordRejected records a rejected request due to open circuit
func (m *CircuitBreakerMetrics) RecordRejected() {
m.mutex.Lock()
defer m.mutex.Unlock()
m.rejectedCount++
}
// RecordStateTransition records a state transition
func (m *CircuitBreakerMetrics) RecordStateTransition(newState circuitbreaker.State) {
m.mutex.Lock()
defer m.mutex.Unlock()
if m.currentState != newState {
m.stateTransitions[newState]++
m.lastStateTransition = time.Now()
m.currentState = newState
}
}
// GetMetrics returns the current metrics
func (m *CircuitBreakerMetrics) GetMetrics() map[string]interface{} {
m.mutex.RLock()
defer m.mutex.RUnlock()
metrics := map[string]interface{}{
"name": m.name,
"success_count": m.successCount,
"failure_count": m.failureCount,
"rejected_count": m.rejectedCount,
"current_state": m.currentState.String(),
"last_state_transition": m.lastStateTransition,
"state_transitions": make(map[string]int64),
}
for state, count := range m.stateTransitions {
metrics["state_transitions"].(map[string]int64)[state.String()] = count
}
return metrics
}
// CalculateErrorRate calculates the current error rate
func (m *CircuitBreakerMetrics) CalculateErrorRate() float64 {
m.mutex.RLock()
defer m.mutex.RUnlock()
total := m.successCount + m.failureCount
if total == 0 {
return 0.0
}
return float64(m.failureCount) / float64(total)
}
// InstrumentedCircuitBreaker wraps a circuit breaker with metrics
type InstrumentedCircuitBreaker struct {
circuitBreaker circuitbreaker.CircuitBreaker
metrics *CircuitBreakerMetrics
}
// NewInstrumentedCircuitBreaker creates a new instrumented circuit breaker
func NewInstrumentedCircuitBreaker(
cb circuitbreaker.CircuitBreaker,
metrics *CircuitBreakerMetrics,
) *InstrumentedCircuitBreaker {
return &InstrumentedCircuitBreaker{
circuitBreaker: cb,
metrics: metrics,
}
}
// AllowRequest checks if a request should be allowed through
func (i *InstrumentedCircuitBreaker) AllowRequest() bool {
allowed := i.circuitBreaker.AllowRequest()
if !allowed {
i.metrics.RecordRejected()
}
return allowed
}
// RecordResult records the result of a request
func (i *InstrumentedCircuitBreaker) RecordResult(success bool) {
i.circuitBreaker.RecordResult(success)
if success {
i.metrics.RecordSuccess()
} else {
i.metrics.RecordFailure()
}
// Record state changes
i.metrics.RecordStateTransition(i.circuitBreaker.State())
}
// Execute runs the given function with circuit breaker protection
func (i *InstrumentedCircuitBreaker) Execute(fn func() error) error {
if !i.AllowRequest() {
return circuitbreaker.ErrCircuitOpen
}
err := fn()
i.RecordResult(err == nil)
return err
}
// State returns the current state of the circuit breaker
func (i *InstrumentedCircuitBreaker) State() circuitbreaker.State {
return i.circuitBreaker.State()
}
Prometheus Integration
Exposing circuit breaker metrics to Prometheus enables powerful monitoring and alerting:
package metrics
import (
"net/http"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
// PrometheusCircuitBreakerCollector collects circuit breaker metrics for Prometheus
type PrometheusCircuitBreakerCollector struct {
metrics map[string]*CircuitBreakerMetrics
successCounter *prometheus.CounterVec
failureCounter *prometheus.CounterVec
rejectedCounter *prometheus.CounterVec
stateGauge *prometheus.GaugeVec
errorRateGauge *prometheus.GaugeVec
}
// NewPrometheusCircuitBreakerCollector creates a new Prometheus collector
func NewPrometheusCircuitBreakerCollector() *PrometheusCircuitBreakerCollector {
collector := &PrometheusCircuitBreakerCollector{
metrics: make(map[string]*CircuitBreakerMetrics),
successCounter: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "circuit_breaker_success_total",
Help: "Total number of successful requests",
},
[]string{"name"},
),
failureCounter: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "circuit_breaker_failure_total",
Help: "Total number of failed requests",
},
[]string{"name"},
),
rejectedCounter: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "circuit_breaker_rejected_total",
Help: "Total number of rejected requests due to open circuit",
},
[]string{"name"},
),
stateGauge: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "circuit_breaker_state",
Help: "Current state of the circuit breaker (0=closed, 1=half-open, 2=open)",
},
[]string{"name"},
),
errorRateGauge: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "circuit_breaker_error_rate",
Help: "Current error rate of the circuit breaker",
},
[]string{"name"},
),
}
// Register metrics with Prometheus
prometheus.MustRegister(
collector.successCounter,
collector.failureCounter,
collector.rejectedCounter,
collector.stateGauge,
collector.errorRateGauge,
)
return collector
}
// RegisterMetrics registers circuit breaker metrics
func (c *PrometheusCircuitBreakerCollector) RegisterMetrics(name string, metrics *CircuitBreakerMetrics) {
c.metrics[name] = metrics
}
// UpdateMetrics updates Prometheus metrics from circuit breaker metrics
func (c *PrometheusCircuitBreakerCollector) UpdateMetrics() {
for name, metrics := range c.metrics {
m := metrics.GetMetrics()
c.successCounter.WithLabelValues(name).Add(float64(m["success_count"].(int64)))
c.failureCounter.WithLabelValues(name).Add(float64(m["failure_count"].(int64)))
c.rejectedCounter.WithLabelValues(name).Add(float64(m["rejected_count"].(int64)))
// Map state to numeric value
var stateValue float64
switch m["current_state"].(string) {
case "Closed":
stateValue = 0
case "HalfOpen":
stateValue = 1
case "Open":
stateValue = 2
}
c.stateGauge.WithLabelValues(name).Set(stateValue)
// Set error rate
c.errorRateGauge.WithLabelValues(name).Set(metrics.CalculateErrorRate())
}
}
// SetupPrometheusHandler sets up an HTTP handler for Prometheus metrics
func SetupPrometheusHandler() {
http.Handle("/metrics", promhttp.Handler())
}
Health Checks and Circuit Breaker Status
Exposing circuit breaker status through health checks helps with operational visibility:
package health
import (
"encoding/json"
"net/http"
"example.com/circuitbreaker"
)
// CircuitBreakerHealth represents the health status of circuit breakers
type CircuitBreakerHealth struct {
breakers map[string]circuitbreaker.CircuitBreaker
}
// NewCircuitBreakerHealth creates a new health check handler
func NewCircuitBreakerHealth() *CircuitBreakerHealth {
return &CircuitBreakerHealth{
breakers: make(map[string]circuitbreaker.CircuitBreaker),
}
}
// RegisterCircuitBreaker registers a circuit breaker for health checks
func (h *CircuitBreakerHealth) RegisterCircuitBreaker(name string, cb circuitbreaker.CircuitBreaker) {
h.breakers[name] = cb
}
// ServeHTTP implements the http.Handler interface
func (h *CircuitBreakerHealth) ServeHTTP(w http.ResponseWriter, r *http.Request) {
status := make(map[string]interface{})
allHealthy := true
for name, cb := range h.breakers {
state := cb.State()
status[name] = map[string]interface{}{
"state": state.String(),
"healthy": state == circuitbreaker.Closed,
}
if state != circuitbreaker.Closed {
allHealthy = false
}
}
w.Header().Set("Content-Type", "application/json")
if !allHealthy {
w.WriteHeader(http.StatusServiceUnavailable)
}
json.NewEncoder(w).Encode(map[string]interface{}{
"status": status,
"healthy": allHealthy,
})
}
// SetupHealthCheckHandler sets up an HTTP handler for health checks
func SetupHealthCheckHandler(health *CircuitBreakerHealth) {
http.Handle("/health/circuit-breakers", health)
}
Production Deployment and Best Practices
Deploying circuit breakers in production requires careful consideration of configuration, testing, and operational practices.
Configuration Best Practices
Circuit breakers should be configurable to adapt to different environments:
package config
import (
"encoding/json"
"os"
"time"
"example.com/circuitbreaker"
)
// CircuitBreakerConfig represents configuration for a circuit breaker
type CircuitBreakerConfig struct {
FailureThreshold int `json:"failureThreshold"`
ResetTimeout string `json:"resetTimeout"`
FailureRateThreshold float64 `json:"failureRateThreshold"`
MinimumRequests int `json:"minimumRequests"`
WindowSize string `json:"windowSize"`
Timeout string `json:"timeout"`
}
// ResilienceConfig represents configuration for resilience patterns
type ResilienceConfig struct {
CircuitBreakers map[string]CircuitBreakerConfig `json:"circuitBreakers"`
Bulkheads map[string]int `json:"bulkheads"`
}
// LoadConfig loads configuration from a file
func LoadConfig(filename string) (*ResilienceConfig, error) {
file, err := os.Open(filename)
if err != nil {
return nil, err
}
defer file.Close()
config := &ResilienceConfig{}
if err := json.NewDecoder(file).Decode(config); err != nil {
return nil, err
}
return config, nil
}
// CreateCircuitBreaker creates a circuit breaker from configuration
func CreateCircuitBreaker(config CircuitBreakerConfig) (circuitbreaker.CircuitBreaker, error) {
resetTimeout, err := time.ParseDuration(config.ResetTimeout)
if err != nil {
return nil, err
}
windowSize, err := time.ParseDuration(config.WindowSize)
if err != nil {
return nil, err
}
timeout, err := time.ParseDuration(config.Timeout)
if err != nil {
return nil, err
}
// Choose the appropriate circuit breaker implementation based on configuration
if config.FailureRateThreshold > 0 {
return circuitbreaker.NewSlidingWindowCircuitBreaker(
windowSize,
config.FailureRateThreshold,
config.MinimumRequests,
resetTimeout,
), nil
}
return circuitbreaker.NewSimpleCircuitBreaker(
config.FailureThreshold,
resetTimeout,
), nil
}
Example configuration file:
{
"circuitBreakers": {
"userService": {
"failureThreshold": 5,
"resetTimeout": "30s",
"timeout": "2s"
},
"paymentService": {
"failureRateThreshold": 0.25,
"minimumRequests": 10,
"windowSize": "1m",
"resetTimeout": "1m",
"timeout": "5s"
},
"inventoryService": {
"failureThreshold": 3,
"resetTimeout": "15s",
"timeout": "1s"
}
},
"bulkheads": {
"userService": 50,
"paymentService": 20,
"inventoryService": 30
}
}
Testing Circuit Breakers
Testing circuit breakers requires simulating failure scenarios:
package testing
import (
"errors"
"testing"
"time"
"math/rand"
"example.com/circuitbreaker"
)
// TestSimpleCircuitBreaker tests the basic functionality of a circuit breaker
func TestSimpleCircuitBreaker(t *testing.T) {
// Create a circuit breaker that trips after 3 failures and resets after 100ms
cb := circuitbreaker.NewSimpleCircuitBreaker(3, 100*time.Millisecond)
// Function that always fails
failingFunc := func() error {
return errors.New("simulated failure")
}
// Function that always succeeds
successFunc := func() error {
return nil
}
// Test initial state
if cb.State() != circuitbreaker.Closed {
t.Errorf("Initial state should be Closed, got %v", cb.State())
}
// Test that circuit opens after threshold failures
for i := 0; i < 3; i++ {
err := cb.Execute(failingFunc)
if err == nil {
t.Errorf("Expected error, got nil")
}
}
// Circuit should now be open
if cb.State() != circuitbreaker.Open {
t.Errorf("State should be Open after failures, got %v", cb.State())
}
// Test that requests are rejected when circuit is open
err := cb.Execute(successFunc)
if err != circuitbreaker.ErrCircuitOpen {
t.Errorf("Expected ErrCircuitOpen, got %v", err)
}
// Wait for reset timeout
time.Sleep(150 * time.Millisecond)
// Circuit should now be half-open
if !cb.AllowRequest() {
t.Errorf("Circuit should allow a request after reset timeout")
}
// Test that circuit closes after success in half-open state
err = cb.Execute(successFunc)
if err != nil {
t.Errorf("Expected success, got %v", err)
}
// Circuit should now be closed
if cb.State() != circuitbreaker.Closed {
t.Errorf("State should be Closed after success, got %v", cb.State())
}
}
// TestCircuitBreakerWithChaos tests circuit breaker behavior with chaos testing
func TestCircuitBreakerWithChaos(t *testing.T) {
if testing.Short() {
t.Skip("Skipping chaos test in short mode")
}
// Create a sliding window circuit breaker
cb := circuitbreaker.NewSlidingWindowCircuitBreaker(
1*time.Second, // 1 second window
0.5, // 50% failure threshold
10, // Minimum 10 requests
500*time.Millisecond, // 500ms reset timeout
)
// Create a function with variable failure rate
var failureRate float64 = 0.0
testFunc := func() error {
if rand.Float64() < failureRate {
return errors.New("simulated failure")
}
return nil
}
// Run test for 10 seconds
start := time.Now()
end := start.Add(10 * time.Second)
// Track statistics
requests := 0
failures := 0
rejections := 0
for time.Now().Before(end) {
// Adjust failure rate over time to simulate service degradation and recovery
elapsed := time.Since(start).Seconds()
switch {
case elapsed < 2:
failureRate = 0.1 // 10% failures initially
case elapsed < 4:
failureRate = 0.6 // 60% failures - should trip circuit
case elapsed < 6:
failureRate = 0.7 // 70% failures - circuit should stay open
case elapsed < 8:
failureRate = 0.2 // 20% failures - circuit should recover
default:
failureRate = 0.1 // 10% failures - circuit should stay closed
}
requests++
err := cb.Execute(testFunc)
if err != nil {
if errors.Is(err, circuitbreaker.ErrCircuitOpen) {
rejections++
} else {
failures++
}
}
// Small delay between requests
time.Sleep(10 * time.Millisecond)
}
// Log statistics
t.Logf("Total requests: %d", requests)
t.Logf("Failures: %d (%.2f%%)", failures, float64(failures)/float64(requests)*100)
t.Logf("Rejections: %d (%.2f%%)", rejections, float64(rejections)/float64(requests)*100)
// Verify circuit breaker protected the system during high failure rates
if rejections == 0 {
t.Errorf("Circuit breaker should have rejected some requests during high failure rates")
}
}
Deployment Strategies
Deploying circuit breakers requires careful consideration of dependencies and failure modes:
package main
import (
"context"
"log"
"net/http"
"os"
"os/signal"
"syscall"
"time"
"example.com/circuitbreaker"
"example.com/config"
"example.com/health"
"example.com/metrics"
)
func main() {
// Load configuration
cfg, err := config.LoadConfig("resilience.json")
if err != nil {
log.Fatalf("Failed to load configuration: %v", err)
}
// Create circuit breakers
circuitBreakers := make(map[string]circuitbreaker.CircuitBreaker)
for name, cbConfig := range cfg.CircuitBreakers {
cb, err := config.CreateCircuitBreaker(cbConfig)
if err != nil {
log.Fatalf("Failed to create circuit breaker %s: %v", name, err)
}
circuitBreakers[name] = cb
}
// Set up metrics
promCollector := metrics.NewPrometheusCircuitBreakerCollector()
// Instrument circuit breakers
instrumentedBreakers := make(map[string]circuitbreaker.CircuitBreaker)
for name, cb := range circuitBreakers {
metrics := metrics.NewCircuitBreakerMetrics(name)
promCollector.RegisterMetrics(name, metrics)
instrumentedBreakers[name] = metrics.NewInstrumentedCircuitBreaker(cb, metrics)
}
// Set up health checks
healthCheck := health.NewCircuitBreakerHealth()
for name, cb := range instrumentedBreakers {
healthCheck.RegisterCircuitBreaker(name, cb)
}
// Set up HTTP handlers
metrics.SetupPrometheusHandler()
health.SetupHealthCheckHandler(healthCheck)
// Start metrics updater
go func() {
ticker := time.NewTicker(15 * time.Second)
defer ticker.Stop()
for range ticker.C {
promCollector.UpdateMetrics()
}
}()
// Start HTTP server
server := &http.Server{
Addr: ":8080",
}
// Graceful shutdown
go func() {
signals := make(chan os.Signal, 1)
signal.Notify(signals, syscall.SIGINT, syscall.SIGTERM)
<-signals
log.Println("Shutting down...")
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := server.Shutdown(ctx); err != nil {
log.Printf("HTTP server shutdown error: %v", err)
}
}()
// Start server
log.Println("Starting server on :8080")
if err := server.ListenAndServe(); err != http.ErrServerClosed {
log.Fatalf("HTTP server error: %v", err)
}
}
Production Best Practices
Here are key best practices for using circuit breakers in production:
Start Conservative: Begin with higher failure thresholds and shorter reset timeouts, then adjust based on observed behavior.
Granular Circuit Breakers: Use separate circuit breakers for different dependencies and even different operations on the same dependency.
Fallbacks: Always implement fallbacks for when the circuit is open:
// Example fallback strategy
func getUserWithFallback(userID string) (*User, error) {
// Try primary data source with circuit breaker
var user *User
err := userServiceCB.Execute(func() error {
var err error
user, err = userService.GetUser(userID)
return err
})
// If circuit is open or request failed, try fallback
if err != nil {
// Try cache
cachedUser, cacheErr := userCache.Get(userID)
if cacheErr == nil {
return cachedUser, nil
}
// Try default
return &User{
ID: userID,
Name: "Unknown User",
IsGuest: true,
}, nil
}
return user, nil
}
Monitor and Alert: Set up alerts for circuit breaker state changes and high rejection rates:
// Example Prometheus alert rule
groups:
- name: CircuitBreakerAlerts
rules:
- alert: CircuitBreakerOpen
expr: circuit_breaker_state{name=~".*"} > 0
for: 5m
labels:
severity: warning
annotations:
summary: "Circuit breaker {{ $labels.name }} is open"
description: "The circuit breaker {{ $labels.name }} has been open for 5 minutes."
- alert: HighRejectionRate
expr: rate(circuit_breaker_rejected_total{name=~".*"}[5m]) > 10
for: 2m
labels:
severity: warning
annotations:
summary: "High rejection rate for {{ $labels.name }}"
description: "Circuit breaker {{ $labels.name }} is rejecting more than 10 requests per second."
- Tune Parameters: Adjust circuit breaker parameters based on observed behavior:
// Example adaptive parameter tuning
func tuneCircuitBreakerParameters(cb *AdaptiveCircuitBreaker, metrics *CircuitBreakerMetrics) {
// Check metrics every minute
ticker := time.NewTicker(1 * time.Minute)
defer ticker.Stop()
for range ticker.C {
errorRate := metrics.CalculateErrorRate()
rejectionRate := float64(metrics.rejectedCount) / float64(metrics.successCount + metrics.failureCount + metrics.rejectedCount)
// If error rate is very low but rejection rate is high, make the circuit breaker more lenient
if errorRate < 0.01 && rejectionRate > 0.1 {
threshold, timeout := cb.CurrentThresholds()
cb.SetThresholds(threshold+1, timeout*0.9)
log.Printf("Tuned circuit breaker to be more lenient: threshold=%d, timeout=%v", threshold+1, timeout*0.9)
}
// If error rate is very high, make the circuit breaker more strict
if errorRate > 0.3 {
threshold, timeout := cb.CurrentThresholds()
cb.SetThresholds(threshold-1, timeout*1.1)
log.Printf("Tuned circuit breaker to be more strict: threshold=%d, timeout=%v", threshold-1, timeout*1.1)
}
}
}
- Graceful Degradation: Design systems to function with reduced capabilities when dependencies are unavailable:
// Example service with graceful degradation
type RecommendationService struct {
productService *ProductService
productServiceCB circuitbreaker.CircuitBreaker
userService *UserService
userServiceCB circuitbreaker.CircuitBreaker
analyticsService *AnalyticsService
analyticsServiceCB circuitbreaker.CircuitBreaker
}
// GetRecommendations returns product recommendations for a user
func (s *RecommendationService) GetRecommendations(userID string) ([]Product, error) {
// Try to get personalized recommendations using all services
if s.userServiceCB.AllowRequest() && s.analyticsServiceCB.AllowRequest() {
var user *User
var userHistory []PurchaseHistory
// Get user data
userErr := s.userServiceCB.Execute(func() error {
var err error
user, err = s.userService.GetUser(userID)
return err
})
// Get user history
historyErr := s.analyticsServiceCB.Execute(func() error {
var err error
userHistory, err = s.analyticsService.GetUserHistory(userID)
return err
})
// If both succeeded, generate personalized recommendations
if userErr == nil && historyErr == nil {
return s.generatePersonalizedRecommendations(user, userHistory)
}
}
// Fallback to category-based recommendations
if s.productServiceCB.AllowRequest() {
var categories []string
// Try to get user's preferred categories if user service is available
if s.userServiceCB.AllowRequest() {
s.userServiceCB.Execute(func() error {
var err error
categories, err = s.userService.GetUserPreferredCategories(userID)
return err
})
}
// If we have categories, use them; otherwise use popular categories
if len(categories) == 0 {
categories = []string{"popular", "trending"}
}
// Get recommendations by category
var products []Product
err := s.productServiceCB.Execute(func() error {
var err error
products, err = s.productService.GetProductsByCategories(categories, 10)
return err
})
if err == nil {
return products, nil
}
}
// Ultimate fallback: return hardcoded popular products
return s.getHardcodedPopularProducts(), nil
}
- Avoid Cascading Circuit Breakers: Be careful with circuit breakers that depend on each other:
// Example of problematic cascading circuit breakers
func problematicDesign() {
// Service A calls Service B which calls Service C
serviceC := NewService("C")
serviceCCircuitBreaker := circuitbreaker.NewSimpleCircuitBreaker(5, 30*time.Second)
serviceB := NewService("B")
serviceBCircuitBreaker := circuitbreaker.NewSimpleCircuitBreaker(5, 30*time.Second)
// Problem: If C fails, B's circuit breaker will open, then A's circuit breaker will open
// This creates a cascading effect where failures propagate upward
// Better design: Use different thresholds and timeouts at different levels
serviceCCircuitBreaker = circuitbreaker.NewSimpleCircuitBreaker(3, 10*time.Second)
serviceBCircuitBreaker = circuitbreaker.NewSimpleCircuitBreaker(5, 30*time.Second)
serviceACircuitBreaker := circuitbreaker.NewSimpleCircuitBreaker(7, 60*time.Second)
// Even better: Implement fallbacks at each level
}
- Circuit Breaker Patterns by Dependency Type: Tailor circuit breaker configurations to the type of dependency:
Dependency Type | Failure Threshold | Reset Timeout | Notes |
---|---|---|---|
Critical database | Higher (5-10) | Shorter (10-30s) | Essential service, try more aggressively to reconnect |
External API | Lower (3-5) | Longer (30-60s) | Less control, be more conservative |
Cache service | Very low (1-2) | Very short (5-10s) | Non-critical, fail fast and recover quickly |
Background job | Higher (8-10) | Medium (20-40s) | Can tolerate more failures before breaking |
What This Means
Circuit breakers are a critical component in the resilience toolkit for Go developers building distributed systems. By implementing these patterns, you can protect your applications from cascading failures, improve user experience during partial outages, and give failing dependencies time to recover.
In this guide, we’ve explored the fundamentals of circuit breaker patterns, implemented various circuit breaker strategies in Go, and examined how to integrate them with common microservice components. We’ve also covered advanced topics like monitoring, configuration, and production best practices.
Remember that circuit breakers are most effective when combined with other resilience patterns like timeouts, retries, bulkheads, and fallbacks. Together, these patterns form a comprehensive approach to building truly resilient Go applications that can withstand the unpredictable nature of distributed environments.
As you implement circuit breakers in your own applications, start with simple implementations and gradually add more sophisticated features as you observe their behavior in production. Monitor their performance, tune their parameters, and continuously refine your approach to achieve the optimal balance between reliability and resource utilization.
By mastering these advanced fault tolerance patterns, you’ll be well-equipped to build Go applications that not only survive in the face of failures but continue to provide value to users even when parts of the system are degraded.