In the realm of Go development, writing functional code is just the beginning. As applications scale and user expectations for performance increase, the ability to identify and eliminate bottlenecks becomes a critical skill. Go’s rich ecosystem of profiling and optimization tools provides developers with powerful capabilities to analyze and enhance application performance, but mastering these tools requires both technical knowledge and a methodical approach.
This comprehensive guide explores advanced performance profiling and optimization techniques for Go applications. We’ll dive deep into Go’s profiling tools, analyze real-world performance bottlenecks, and implement proven optimization strategies that can dramatically improve your application’s efficiency. Whether you’re dealing with CPU-bound computations, memory-intensive operations, or concurrency challenges, these techniques will help you build Go applications that perform at their absolute best.
Understanding Go Performance Fundamentals
Before diving into specific profiling tools, it’s essential to understand the key factors that influence Go application performance and the metrics that matter most when optimizing.
Performance Metrics and Objectives
Performance optimization should always begin with clear objectives and metrics:
package main
import (
"fmt"
"time"
)
// PerformanceMetrics tracks key performance indicators
type PerformanceMetrics struct {
Latency time.Duration // Time to complete a single operation
Throughput int // Operations per second
MemoryUsage uint64 // Bytes allocated
CPUUsage float64 // CPU utilization percentage
GCPause time.Duration // Garbage collection pause time
ResponseTime time.Duration // Time to first byte (for servers)
ErrorRate float64 // Percentage of operations that fail
SaturationPoint int // Load at which performance degrades
}
// Example performance objectives for different application types
func definePerformanceObjectives() {
// Low-latency trading system
tradingSystemObjectives := PerformanceMetrics{
Latency: 100 * time.Microsecond, // 99th percentile
Throughput: 100000, // 100K trades per second
MemoryUsage: 1 * 1024 * 1024 * 1024, // 1GB max heap
CPUUsage: 80.0, // 80% max CPU utilization
GCPause: 1 * time.Millisecond, // 1ms max GC pause
ErrorRate: 0.0001, // 0.01% max error rate
SaturationPoint: 120000, // Handles 20% over target load
}
// Web API service
webAPIObjectives := PerformanceMetrics{
Latency: 50 * time.Millisecond, // 99th percentile
Throughput: 5000, // 5K requests per second
MemoryUsage: 2 * 1024 * 1024 * 1024, // 2GB max heap
ResponseTime: 20 * time.Millisecond, // 20ms time to first byte
ErrorRate: 0.001, // 0.1% max error rate
SaturationPoint: 7500, // Handles 50% over target load
}
// Batch processing system
batchProcessingObjectives := PerformanceMetrics{
Throughput: 10000, // 10K records per second
MemoryUsage: 8 * 1024 * 1024 * 1024, // 8GB max heap
CPUUsage: 95.0, // 95% max CPU utilization
ErrorRate: 0.0005, // 0.05% max error rate
}
fmt.Printf("Trading system 99th percentile latency target: %v\n",
tradingSystemObjectives.Latency)
fmt.Printf("Web API throughput target: %v requests/second\n",
webAPIObjectives.Throughput)
fmt.Printf("Batch processing memory usage target: %v bytes\n",
batchProcessingObjectives.MemoryUsage)
}
func main() {
definePerformanceObjectives()
}
Performance Bottleneck Categories
Understanding the different types of bottlenecks helps guide your profiling approach:
package main
import (
"fmt"
"time"
)
// BottleneckCategory identifies the type of performance limitation
type BottleneckCategory string
const (
CPUBound BottleneckCategory = "CPU-bound"
MemoryBound BottleneckCategory = "Memory-bound"
IOBound BottleneckCategory = "I/O-bound"
NetworkBound BottleneckCategory = "Network-bound"
LockContention BottleneckCategory = "Lock contention"
GCPressure BottleneckCategory = "GC pressure"
)
// BottleneckSignature helps identify the type of bottleneck
type BottleneckSignature struct {
Category BottleneckCategory
Symptoms []string
ProfilingApproach []string
CommonCauses []string
}
// Define bottleneck signatures to guide profiling
func bottleneckCatalog() map[BottleneckCategory]BottleneckSignature {
return map[BottleneckCategory]BottleneckSignature{
CPUBound: {
Category: CPUBound,
Symptoms: []string{
"High CPU utilization",
"Performance scales with CPU cores",
"Low wait time in profiling",
"Response time degrades under load",
},
ProfilingApproach: []string{
"CPU profiling with pprof",
"Execution tracing",
"Benchmark hot functions",
},
CommonCauses: []string{
"Inefficient algorithms",
"Excessive type conversions",
"String concatenation in loops",
"Reflection-heavy code",
},
},
MemoryBound: {
Category: MemoryBound,
Symptoms: []string{
"High memory usage",
"Frequent GC cycles",
"Performance degrades over time",
"Out of memory errors",
},
ProfilingApproach: []string{
"Memory profiling with pprof",
"Heap analysis",
"GC trace analysis",
},
CommonCauses: []string{
"Memory leaks",
"Large object allocations",
"Excessive allocations in hot paths",
"Inefficient data structures",
},
},
IOBound: {
Category: IOBound,
Symptoms: []string{
"Low CPU utilization",
"High wait time in profiling",
"Performance doesn't scale with CPU",
"Blocking on file operations",
},
ProfilingApproach: []string{
"Block profiling",
"Execution tracing",
"I/O specific benchmarks",
},
CommonCauses: []string{
"Synchronous file operations",
"Inefficient I/O patterns",
"Missing buffering",
"File system limitations",
},
},
NetworkBound: {
Category: NetworkBound,
Symptoms: []string{
"Low CPU utilization",
"High wait time in profiling",
"Latency spikes",
"Connection pool exhaustion",
},
ProfilingApproach: []string{
"Network monitoring",
"Connection tracking",
"Request/response timing",
},
CommonCauses: []string{
"Excessive network requests",
"Large payload sizes",
"Connection pool misconfiguration",
"Network latency",
},
},
LockContention: {
Category: LockContention,
Symptoms: []string{
"CPU not fully utilized despite load",
"Goroutines blocked waiting for locks",
"Performance degrades with concurrency",
"Mutex hot spots in profiles",
},
ProfilingApproach: []string{
"Mutex profiling",
"Goroutine analysis",
"Execution tracing",
},
CommonCauses: []string{
"Coarse-grained locking",
"Long critical sections",
"Unnecessary synchronization",
"Lock ordering issues",
},
},
GCPressure: {
Category: GCPressure,
Symptoms: []string{
"Regular latency spikes",
"High GC CPU utilization",
"Performance degrades with memory usage",
"Stop-the-world pauses",
},
ProfilingApproach: []string{
"GC trace analysis",
"Memory profiling",
"Allocation analysis",
},
CommonCauses: []string{
"High allocation rate",
"Large working set",
"Pointer-heavy data structures",
"Finalizers and weak references",
},
},
}
}
// DiagnoseBottleneck attempts to identify the type of bottleneck
func DiagnoseBottleneck(
cpuUtilization float64,
memoryGrowth bool,
ioWaitTime time.Duration,
networkLatency time.Duration,
goroutineBlockTime time.Duration,
gcPauseTime time.Duration,
) BottleneckCategory {
// Simplified diagnostic logic
if cpuUtilization > 80 && ioWaitTime < 10*time.Millisecond {
return CPUBound
} else if memoryGrowth && gcPauseTime > 100*time.Millisecond {
return GCPressure
} else if goroutineBlockTime > 100*time.Millisecond {
return LockContention
} else if ioWaitTime > 100*time.Millisecond {
return IOBound
} else if networkLatency > 100*time.Millisecond {
return NetworkBound
} else if memoryGrowth {
return MemoryBound
}
return CPUBound // Default if no clear signal
}
func main() {
// Example usage
catalog := bottleneckCatalog()
// Diagnose a sample application
bottleneckType := DiagnoseBottleneck(
90.0, // 90% CPU utilization
false, // No memory growth
5*time.Millisecond, // Low I/O wait
20*time.Millisecond, // Low network latency
1*time.Millisecond, // Low goroutine block time
5*time.Millisecond, // Low GC pause time
)
// Get guidance for the identified bottleneck
signature := catalog[bottleneckType]
fmt.Printf("Diagnosed bottleneck: %s\n", signature.Category)
fmt.Println("Recommended profiling approaches:")
for _, approach := range signature.ProfilingApproach {
fmt.Printf("- %s\n", approach)
}
fmt.Println("Common causes to investigate:")
for _, cause := range signature.CommonCauses {
fmt.Printf("- %s\n", cause)
}
}
Go’s Execution Model and Performance
Understanding Go’s execution model is crucial for effective performance optimization:
package main
import (
"fmt"
"runtime"
"time"
)
func demonstrateExecutionModel() {
// Show Go's concurrency model
fmt.Printf("CPU cores available: %d\n", runtime.NumCPU())
fmt.Printf("GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
// Demonstrate goroutine scheduling
runtime.GOMAXPROCS(2) // Limit execution to 2 logical processors for demonstration
// Create work that will keep CPU busy
go func() {
start := time.Now()
// CPU-bound work
for i := 0; i < 1_000_000_000; i++ {
_ = i * i
}
fmt.Printf("CPU-bound goroutine finished in %v\n", time.Since(start))
}()
go func() {
start := time.Now()
// I/O-bound work (simulated)
for i := 0; i < 10; i++ {
time.Sleep(10 * time.Millisecond) // Simulate I/O wait
}
fmt.Printf("I/O-bound goroutine finished in %v\n", time.Since(start))
}()
// Demonstrate goroutine creation overhead
start := time.Now()
for i := 0; i < 10_000; i++ {
go func() {
// Do minimal work
runtime.Gosched() // Yield to scheduler
}()
}
fmt.Printf("Created 10,000 goroutines in %v\n", time.Since(start))
// Allow time for goroutines to complete
time.Sleep(2 * time.Second)
// Show runtime statistics
var stats runtime.MemStats
runtime.ReadMemStats(&stats)
fmt.Printf("Number of goroutines: %d\n", runtime.NumGoroutine())
fmt.Printf("Number of GC cycles: %d\n", stats.NumGC)
// Reset GOMAXPROCS
runtime.GOMAXPROCS(runtime.NumCPU())
}
func main() {
demonstrateExecutionModel()
}
CPU Profiling and Analysis
CPU profiling is one of the most powerful techniques for identifying performance bottlenecks in Go applications. Let’s explore how to effectively use Go’s CPU profiling tools.
Setting Up CPU Profiling
Go provides multiple ways to enable CPU profiling:
package main
import (
"flag"
"fmt"
"log"
"os"
"runtime"
"runtime/pprof"
"runtime/trace"
)
// CPU-intensive function we want to profile
func computeIntensive() {
// Simulate CPU-intensive work
total := 0
for i := 0; i < 100_000_000; i++ {
total += i % 2
}
fmt.Printf("Computation result: %d\n", total)
}
// Method 1: Manual profiling with runtime/pprof
func manualCPUProfiling() {
// Create CPU profile file
f, err := os.Create("cpu_profile.prof")
if err != nil {
log.Fatal("could not create CPU profile: ", err)
}
defer f.Close()
// Start CPU profiling
if err := pprof.StartCPUProfile(f); err != nil {
log.Fatal("could not start CPU profile: ", err)
}
defer pprof.StopCPUProfile()
// Run the code we want to profile
fmt.Println("Running CPU-intensive task...")
computeIntensive()
fmt.Println("CPU profile written to cpu_profile.prof")
fmt.Println("Analyze with: go tool pprof cpu_profile.prof")
}
// Method 2: Using net/http/pprof for continuous profiling
func httpPprofDemo() {
// This would typically be added to your main HTTP server
fmt.Println("In a real application, you would import _ \"net/http/pprof\"")
fmt.Println("Then access profiles via http://localhost:8080/debug/pprof/")
fmt.Println("For CPU profile: http://localhost:8080/debug/pprof/profile")
fmt.Println("Download and analyze with: go tool pprof http://localhost:8080/debug/pprof/profile")
}
// Method 3: Using testing package for benchmark profiling
func benchmarkPprofDemo() {
fmt.Println("When running benchmarks, use:")
fmt.Println("go test -bench=. -cpuprofile=cpu.prof ./...")
fmt.Println("Then analyze with: go tool pprof cpu.prof")
}
// Method 4: Using runtime/trace for execution tracing
func traceDemo() {
// Create trace file
f, err := os.Create("trace.out")
if err != nil {
log.Fatal("could not create trace file: ", err)
}
defer f.Close()
// Start tracing; trace.Stop flushes the trace data to the file
if err := trace.Start(f); err != nil {
log.Fatal("could not start trace: ", err)
}
defer trace.Stop()
// Run the code we want to trace
computeIntensive()
fmt.Println("Trace written to trace.out")
fmt.Println("Analyze with: go tool trace trace.out")
}
func main() {
// Parse command line flags
cpuprofile := flag.String("cpuprofile", "", "write cpu profile to file")
memprofile := flag.String("memprofile", "", "write memory profile to file")
flag.Parse()
// CPU profiling via command line flag
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal("could not create CPU profile: ", err)
}
defer f.Close()
if err := pprof.StartCPUProfile(f); err != nil {
log.Fatal("could not start CPU profile: ", err)
}
defer pprof.StopCPUProfile()
}
// Run the demo functions
manualCPUProfiling()
httpPprofDemo()
benchmarkPprofDemo()
traceDemo()
// Memory profiling via command line flag
if *memprofile != "" {
f, err := os.Create(*memprofile)
if err != nil {
log.Fatal("could not create memory profile: ", err)
}
defer f.Close()
runtime.GC() // Get up-to-date statistics
if err := pprof.WriteHeapProfile(f); err != nil {
log.Fatal("could not write memory profile: ", err)
}
}
}
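As a concrete reference for Method 2, here is a minimal sketch of a server wired up with net/http/pprof; the port is an assumption, and the blank import is what actually registers the /debug/pprof/ handlers on the default mux:
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/ handlers on http.DefaultServeMux
)

func main() {
    // Endpoints include /debug/pprof/profile (CPU) and /debug/pprof/heap.
    log.Println("pprof available at http://localhost:8080/debug/pprof/")
    log.Fatal(http.ListenAndServe(":8080", nil))
}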
Analyzing CPU Profiles
Once you’ve collected a CPU profile, the next step is to analyze it effectively:
package main
import (
"fmt"
"math/rand"
"os"
"runtime/pprof"
"time"
)
// Inefficient string concatenation function
func inefficientStringConcat(n int) string {
result := ""
for i := 0; i < n; i++ {
result += "x"
}
return result
}
// Inefficient sorting algorithm (bubble sort)
func inefficientSort(items []int) {
n := len(items)
for i := 0; i < n; i++ {
for j := 0; j < n-1-i; j++ {
if items[j] > items[j+1] {
items[j], items[j+1] = items[j+1], items[j]
}
}
}
}
// Recursive function with exponential complexity
func fibonacci(n int) int {
if n <= 1 {
return n
}
return fibonacci(n-1) + fibonacci(n-2)
}
// Function with unnecessary allocations
func createLotsOfGarbage(iterations int) {
for i := 0; i < iterations; i++ {
_ = make([]byte, 1024)
}
}
// Main function with various performance issues
func runProfilingDemo() {
// Create CPU profile
f, err := os.Create("cpu_profile_analysis.prof")
if err != nil {
panic(err)
}
defer f.Close()
if err := pprof.StartCPUProfile(f); err != nil {
panic(err)
}
defer pprof.StopCPUProfile()
// Run inefficient string concatenation
inefficientStringConcat(10000)
// Run inefficient sorting
data := make([]int, 5000)
for i := range data {
data[i] = rand.Intn(10000)
}
inefficientSort(data)
// Run exponential algorithm
fibonacci(30)
// Run allocation-heavy code
createLotsOfGarbage(100000)
}
func main() {
// Seed random number generator
rand.Seed(time.Now().UnixNano())
// Run the demo
runProfilingDemo()
fmt.Println("Profile generated. Analyze with:")
fmt.Println("go tool pprof -http=:8080 cpu_profile_analysis.prof")
fmt.Println("Common pprof commands:")
fmt.Println(" top10 - Show top 10 functions by CPU usage")
fmt.Println(" list functionName - Show source code with CPU usage")
fmt.Println(" web - Generate SVG graph visualization")
fmt.Println(" traces - Show execution traces")
fmt.Println(" peek regex - Show functions matching regex")
}
Optimizing CPU-Bound Code
After identifying CPU bottlenecks, here are techniques to optimize them:
package main
import (
"fmt"
"math/rand"
"strings"
"time"
)
// BEFORE: Inefficient string concatenation
func inefficientStringConcat(n int) string {
result := ""
for i := 0; i < n; i++ {
result += "x"
}
return result
}
// AFTER: Optimized string concatenation
func efficientStringConcat(n int) string {
var builder strings.Builder
builder.Grow(n) // Pre-allocate capacity
for i := 0; i < n; i++ {
builder.WriteByte('x')
}
return builder.String()
}
// BEFORE: Inefficient sorting (bubble sort)
func inefficientSort(items []int) {
n := len(items)
for i := 0; i < n; i++ {
for j := 0; j < n-1-i; j++ {
if items[j] > items[j+1] {
items[j], items[j+1] = items[j+1], items[j]
}
}
}
}
// AFTER: Optimized sorting (quicksort)
func efficientSort(items []int) {
quicksort(items, 0, len(items)-1)
}
func quicksort(items []int, low, high int) {
if low < high {
pivot := partition(items, low, high)
quicksort(items, low, pivot-1)
quicksort(items, pivot+1, high)
}
}
func partition(items []int, low, high int) int {
pivot := items[high]
i := low - 1
for j := low; j < high; j++ {
if items[j] <= pivot {
i++
items[i], items[j] = items[j], items[i]
}
}
items[i+1], items[high] = items[high], items[i+1]
return i + 1
}
// BEFORE: Recursive fibonacci with exponential complexity
func inefficientFibonacci(n int) int {
if n <= 1 {
return n
}
return inefficientFibonacci(n-1) + inefficientFibonacci(n-2)
}
// AFTER: Iterative fibonacci with linear complexity
func efficientFibonacci(n int) int {
if n <= 1 {
return n
}
a, b := 0, 1
for i := 2; i <= n; i++ {
a, b = b, a+b
}
return b
}
// BEFORE: Function with unnecessary allocations
func inefficientProcessing(data []int) []int {
result := make([]int, 0)
for _, v := range data {
// Create a new slice for each operation
temp := make([]int, 1)
temp[0] = v * v
result = append(result, temp...)
}
return result
}
// AFTER: Function with optimized allocations
func efficientProcessing(data []int) []int {
// Pre-allocate the result slice
result := make([]int, len(data))
for i, v := range data {
// Direct assignment, no temporary allocations
result[i] = v * v
}
return result
}
// Benchmark helper
func benchmark(name string, iterations int, fn func()) {
start := time.Now()
for i := 0; i < iterations; i++ {
fn()
}
elapsed := time.Since(start)
fmt.Printf("%-30s %10d iterations in %s (%s per op)\n",
name, iterations, elapsed, elapsed/time.Duration(iterations))
}
func main() {
// Seed random number generator
rand.Seed(time.Now().UnixNano())
// Benchmark string concatenation
benchmark("Inefficient string concat", 100, func() {
inefficientStringConcat(10000)
})
benchmark("Efficient string concat", 100, func() {
efficientStringConcat(10000)
})
// Benchmark sorting
benchmark("Inefficient sorting", 10, func() {
data := make([]int, 5000)
for i := range data {
data[i] = rand.Intn(10000)
}
inefficientSort(data)
})
benchmark("Efficient sorting", 10, func() {
data := make([]int, 5000)
for i := range data {
data[i] = rand.Intn(10000)
}
efficientSort(data)
})
// Benchmark fibonacci
benchmark("Inefficient fibonacci", 5, func() {
inefficientFibonacci(30)
})
benchmark("Efficient fibonacci", 5, func() {
efficientFibonacci(30)
})
// Benchmark processing
data := make([]int, 100000)
for i := range data {
data[i] = rand.Intn(100)
}
benchmark("Inefficient processing", 10, func() {
inefficientProcessing(data)
})
benchmark("Efficient processing", 10, func() {
efficientProcessing(data)
})
}
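The hand-rolled benchmark helper above keeps the example self-contained, but in practice you would normally express such comparisons as standard testing benchmarks, which report per-operation time and allocations for free. A minimal sketch (file and function names are illustrative):
package concat_test

import (
    "strings"
    "testing"
)

// BenchmarkNaiveConcat measures += concatenation in a loop.
func BenchmarkNaiveConcat(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s := ""
        for j := 0; j < 1000; j++ {
            s += "x"
        }
        _ = s
    }
}

// BenchmarkBuilderConcat measures strings.Builder with preallocated capacity.
func BenchmarkBuilderConcat(b *testing.B) {
    for i := 0; i < b.N; i++ {
        var sb strings.Builder
        sb.Grow(1000)
        for j := 0; j < 1000; j++ {
            sb.WriteByte('x')
        }
        _ = sb.String()
    }
}
Run with go test -bench=. -benchmem to compare ns/op, B/op, and allocs/op directly.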
Memory Profiling and Optimization
Memory usage is often a critical factor in Go application performance, especially as it relates to garbage collection overhead.
Memory Profiling Techniques
Here’s how to effectively profile memory usage in Go applications:
package main
import (
"flag"
"fmt"
"log"
"os"
"runtime"
"runtime/pprof"
)
// Function that allocates memory
func allocateMemory() {
// Allocate a large slice that will stay in memory
data := make([]byte, 100*1024*1024) // 100 MB
// Do something with the data to prevent compiler optimizations
for i := range data {
data[i] = byte(i % 256)
}
// Keep a reference to prevent garbage collection
// In a real app, this might be stored in a global variable or cache
keepReference(data)
}
// Prevent compiler from optimizing away our allocations
var reference []byte
func keepReference(data []byte) {
reference = data
}
// Function that leaks memory
func leakMemory() {
// This function simulates a memory leak by storing data in a global slice
// without ever cleaning it up
for i := 0; i < 1000; i++ {
// Each iteration leaks 1MB
leak := make([]byte, 1024*1024)
for j := range leak {
leak[j] = byte(j % 256)
}
leakedData = append(leakedData, leak)
}
}
// Global variable to simulate a memory leak
var leakedData [][]byte
func main() {
// Parse command line flags
memprofile := flag.String("memprofile", "", "write memory profile to file")
leakMode := flag.Bool("leak", false, "demonstrate memory leak")
flag.Parse()
// Allocate memory
allocateMemory()
// Optionally demonstrate a memory leak
if *leakMode {
fmt.Println("Demonstrating memory leak...")
leakMemory()
}
// Force garbage collection to get accurate memory stats
runtime.GC()
// Print memory statistics
var m runtime.MemStats
runtime.ReadMemStats(&m)
fmt.Printf("Alloc = %v MiB", bToMb(m.Alloc))
fmt.Printf("\tTotalAlloc = %v MiB", bToMb(m.TotalAlloc))
fmt.Printf("\tSys = %v MiB", bToMb(m.Sys))
fmt.Printf("\tNumGC = %v\n", m.NumGC)
// Write memory profile if requested
if *memprofile != "" {
f, err := os.Create(*memprofile)
if err != nil {
log.Fatal("could not create memory profile: ", err)
}
defer f.Close()
// Write memory profile
if err := pprof.WriteHeapProfile(f); err != nil {
log.Fatal("could not write memory profile: ", err)
}
fmt.Printf("Memory profile written to %s\n", *memprofile)
fmt.Println("Analyze with: go tool pprof -http=:8080", *memprofile)
}
// Show how to use net/http/pprof for continuous memory profiling
fmt.Println("\nFor continuous memory profiling in a web server:")
fmt.Println("1. Import _ \"net/http/pprof\"")
fmt.Println("2. Access heap profile at: http://localhost:8080/debug/pprof/heap")
fmt.Println("3. Download and analyze with: go tool pprof http://localhost:8080/debug/pprof/heap")
// Show how to use testing package for benchmark memory profiling
fmt.Println("\nFor memory profiling during benchmarks:")
fmt.Println("go test -bench=. -memprofile=mem.prof ./...")
fmt.Println("Then analyze with: go tool pprof -http=:8080 mem.prof")
}
// Convert bytes to megabytes
func bToMb(b uint64) uint64 {
return b / 1024 / 1024
}
Detecting and Fixing Memory Leaks
Memory leaks in Go often manifest as growing heap usage over time. Here’s how to detect and fix them:
package main
import (
"fmt"
"net/http"
_ "net/http/pprof" // Import for side effects
"os"
"runtime"
"sync"
"time"
)
// Cache that doesn't clean up old entries (leak)
type LeakyCache struct {
data map[string][]byte
// Missing: expiration mechanism
}
func NewLeakyCache() *LeakyCache {
return &LeakyCache{
data: make(map[string][]byte),
}
}
func (c *LeakyCache) Set(key string, value []byte) {
c.data[key] = value
}
func (c *LeakyCache) Get(key string) []byte {
return c.data[key]
}
// Fixed cache with expiration
type FixedCache struct {
mu sync.Mutex // guards data; the cleanup goroutine runs concurrently
data map[string]cacheEntry
maxEntries int
}
type cacheEntry struct {
value []byte
expiration time.Time
}
func NewFixedCache(maxEntries int) *FixedCache {
cache := &FixedCache{
data: make(map[string]cacheEntry),
maxEntries: maxEntries,
}
// Start cleanup goroutine
go func() {
ticker := time.NewTicker(5 * time.Minute)
defer ticker.Stop()
for range ticker.C {
cache.cleanup()
}
}()
return cache
}
func (c *FixedCache) Set(key string, value []byte, ttl time.Duration) {
c.mu.Lock()
defer c.mu.Unlock()
// Enforce maximum entries
if len(c.data) >= c.maxEntries {
c.evictOldest()
}
c.data[key] = cacheEntry{
value: value,
expiration: time.Now().Add(ttl),
}
}
func (c *FixedCache) Get(key string) ([]byte, bool) {
c.mu.Lock()
defer c.mu.Unlock()
entry, found := c.data[key]
if !found {
return nil, false
}
// Check if expired
if time.Now().After(entry.expiration) {
delete(c.data, key)
return nil, false
}
return entry.value, true
}
func (c *FixedCache) cleanup() {
c.mu.Lock()
defer c.mu.Unlock()
now := time.Now()
for key, entry := range c.data {
if now.After(entry.expiration) {
delete(c.data, key)
}
}
}
// evictOldest removes the entry with the earliest expiration; callers must hold c.mu
func (c *FixedCache) evictOldest() {
var oldestKey string
var oldestTime time.Time
// Find the oldest entry
first := true
for key, entry := range c.data {
if first || entry.expiration.Before(oldestTime) {
oldestKey = key
oldestTime = entry.expiration
first = false
}
}
// Remove oldest entry
if oldestKey != "" {
delete(c.data, oldestKey)
}
}
// Simulate a memory leak
func simulateMemoryLeak() {
// Create a leaky cache
cache := NewLeakyCache()
// Start HTTP server for pprof
go func() {
fmt.Println("Starting pprof server on :8080")
http.ListenAndServe(":8080", nil)
}()
// Continuously add data to the cache without cleanup
for i := 0; ; i++ {
key := fmt.Sprintf("key-%d", i)
value := make([]byte, 1024*1024) // 1MB per entry
cache.Set(key, value)
// Print memory stats every 100 iterations
if i%100 == 0 {
var m runtime.MemStats
runtime.ReadMemStats(&m)
fmt.Printf("Iteration %d: Alloc = %v MiB, Sys = %v MiB\n",
i, bToMb(m.Alloc), bToMb(m.Sys))
time.Sleep(100 * time.Millisecond)
}
// Exit after some iterations in this example
if i >= 1000 {
break
}
}
}
// Demonstrate fixed cache
func demonstrateFixedCache() {
// Create a fixed cache with expiration
cache := NewFixedCache(100) // Max 100 entries
// Add data with expiration
for i := 0; i < 200; i++ {
key := fmt.Sprintf("key-%d", i)
value := make([]byte, 1024*1024) // 1MB per entry
cache.Set(key, value, 1*time.Minute)
// Print memory stats every 50 iterations
if i%50 == 0 {
var m runtime.MemStats
runtime.ReadMemStats(&m)
fmt.Printf("Iteration %d: Alloc = %v MiB, Sys = %v MiB, Entries: %d\n",
i, bToMb(m.Alloc), bToMb(m.Sys), len(cache.data))
time.Sleep(100 * time.Millisecond)
}
}
// Force GC to see the effect
runtime.GC()
var m runtime.MemStats
runtime.ReadMemStats(&m)
fmt.Printf("After GC: Alloc = %v MiB, Sys = %v MiB, Entries: %d\n",
bToMb(m.Alloc), bToMb(m.Sys), len(cache.data))
}
// Convert bytes to megabytes
func bToMb(b uint64) uint64 {
return b / 1024 / 1024
}
func main() {
if len(os.Args) > 1 && os.Args[1] == "leak" {
fmt.Println("Simulating memory leak...")
simulateMemoryLeak()
} else {
fmt.Println("Demonstrating fixed cache...")
demonstrateFixedCache()
}
fmt.Println("\nTo analyze memory usage:")
fmt.Println("1. Run with leak simulation: go run main.go leak")
fmt.Println("2. In another terminal: go tool pprof -http=:8081 http://localhost:8080/debug/pprof/heap")
}
Optimizing for Garbage Collection
Understanding and optimizing for Go’s garbage collector can significantly improve application performance:
package main
import (
"fmt"
"runtime"
"runtime/debug"
"time"
)
// Function that generates a lot of garbage
func generateGarbage() {
for i := 0; i < 10; i++ {
// Allocate 100MB
_ = make([]byte, 100*1024*1024)
}
}
// Function that demonstrates GC tuning
func demonstrateGCTuning() {
// Print initial GC settings
fmt.Println("Default GC settings:")
printGCStats()
// Run with default settings
fmt.Println("\nRunning with default GC settings...")
measureGCPause(func() {
generateGarbage()
})
// Tune GC to be less aggressive (higher memory usage, fewer GC cycles)
fmt.Println("\nSetting higher GC percentage (less frequent GC)...")
debug.SetGCPercent(500) // Default is 100
printGCStats()
// Run with tuned settings
fmt.Println("\nRunning with tuned GC settings...")
measureGCPause(func() {
generateGarbage()
})
// Reset GC to default
debug.SetGCPercent(100)
// Demonstrate manual GC control
fmt.Println("\nDemonstrating manual GC control...")
// Disable GC temporarily
fmt.Println("Disabling GC...")
debug.SetGCPercent(-1)
// Allocate memory without GC
fmt.Println("Allocating memory with GC disabled...")
for i := 0; i < 5; i++ {
_ = make([]byte, 100*1024*1024)
printMemStats()
time.Sleep(100 * time.Millisecond)
}
// Manually trigger GC
fmt.Println("\nManually triggering GC...")
runtime.GC()
printMemStats()
// Re-enable GC
fmt.Println("\nRe-enabling GC...")
debug.SetGCPercent(100)
printGCStats()
}
// Function to measure GC pause times
func measureGCPause(fn func()) {
// Get initial GC stats
var statsBefore runtime.MemStats
runtime.ReadMemStats(&statsBefore)
numGCBefore := statsBefore.NumGC
// Run the function
start := time.Now()
fn()
elapsed := time.Since(start)
// Force a GC to get accurate stats
runtime.GC()
// Get GC stats after
var statsAfter runtime.MemStats
runtime.ReadMemStats(&statsAfter)
// Calculate GC stats
numGC := statsAfter.NumGC - numGCBefore
totalPause := time.Duration(0)
// Calculate total pause time
// Note: PauseNs is a circular buffer of recent GC pause times
for i := numGCBefore; i < statsAfter.NumGC; i++ {
idx := i % uint32(len(statsAfter.PauseNs))
totalPause += time.Duration(statsAfter.PauseNs[idx])
}
// Print results
fmt.Printf("Execution time: %v\n", elapsed)
fmt.Printf("Number of GCs: %d\n", numGC)
if numGC > 0 {
fmt.Printf("Total GC pause: %v\n", totalPause)
fmt.Printf("Average GC pause: %v\n", totalPause/time.Duration(numGC))
fmt.Printf("GC pause percentage: %.2f%%\n",
float64(totalPause)/float64(elapsed)*100)
}
}
// Print current GC settings
func printGCStats() {
// SetGCPercent returns the previous setting, so read it and restore it
gcPercent := debug.SetGCPercent(-1)
debug.SetGCPercent(gcPercent)
fmt.Printf("GC Percentage: %d%%\n", gcPercent)
var m runtime.MemStats
runtime.ReadMemStats(&m)
fmt.Printf("Next GC target: %v MiB\n", bToMb(m.NextGC))
}
// Print memory statistics
func printMemStats() {
var m runtime.MemStats
runtime.ReadMemStats(&m)
fmt.Printf("Alloc: %v MiB, Sys: %v MiB, NumGC: %v\n",
bToMb(m.Alloc), bToMb(m.Sys), m.NumGC)
}
// Convert bytes to megabytes
func bToMb(b uint64) uint64 {
return b / 1024 / 1024
}
func main() {
// Set max memory to show GC behavior more clearly
debug.SetMemoryLimit(1024 * 1024 * 1024) // 1GB
// Demonstrate GC tuning
demonstrateGCTuning()
fmt.Println("\nGC Tuning Best Practices:")
fmt.Println("1. Monitor GC frequency and pause times in production")
fmt.Println("2. Use GOGC environment variable for system-wide tuning")
fmt.Println("3. Consider debug.SetGCPercent for application-specific tuning")
fmt.Println("4. For latency-sensitive applications, consider increasing GOGC")
fmt.Println("5. For memory-constrained environments, consider decreasing GOGC")
}
Advanced Profiling Techniques
Beyond basic CPU and memory profiling, Go offers several specialized profiling tools for specific performance issues.
Goroutine Profiling
Goroutine profiling helps identify concurrency issues and goroutine leaks:
package main
import (
"fmt"
"net/http"
_ "net/http/pprof"
"runtime"
"sync"
"time"
)
// Global WaitGroup that we'll intentionally never finish
var wg sync.WaitGroup
// Function that leaks goroutines
func leakGoroutines() {
fmt.Println("Starting to leak goroutines...")
// Leak goroutines by never calling wg.Done()
for i := 0; i < 10000; i++ {
wg.Add(1)
go func(id int) {
// This goroutine will never exit because we never call wg.Done()
fmt.Printf("Goroutine %d is leaked\n", id)
time.Sleep(1 * time.Hour) // Sleep for a long time
// wg.Done() is never called
}(i)
// Print stats every 1000 goroutines
if i > 0 && i%1000 == 0 {
fmt.Printf("Created %d goroutines, total: %d\n", i, runtime.NumGoroutine())
time.Sleep(100 * time.Millisecond)
}
}
}
// Function that demonstrates proper goroutine management
func properGoroutineManagement() {
fmt.Println("Demonstrating proper goroutine management...")
var localWg sync.WaitGroup
// Create goroutines with proper cleanup
for i := 0; i < 10000; i++ {
localWg.Add(1)
go func(id int) {
defer localWg.Done() // Ensure we always mark as done
// Do some work
time.Sleep(10 * time.Millisecond)
}(i)
// Print stats every 1000 goroutines
if i > 0 && i%1000 == 0 {
fmt.Printf("Created %d goroutines, total: %d\n", i, runtime.NumGoroutine())
}
}
// Wait for all goroutines to finish
fmt.Println("Waiting for all goroutines to finish...")
localWg.Wait()
fmt.Printf("All goroutines finished, remaining: %d\n", runtime.NumGoroutine())
}
// Function that demonstrates goroutine blocking
func demonstrateGoroutineBlocking() {
fmt.Println("Demonstrating goroutine blocking...")
// Create a channel without a buffer
ch := make(chan int)
// Start goroutines that will block on the channel
for i := 0; i < 100; i++ {
go func(id int) {
fmt.Printf("Goroutine %d waiting to send...\n", id)
ch <- id // This will block until someone receives
fmt.Printf("Goroutine %d sent value\n", id)
}(i)
}
// Let goroutines block for a while
time.Sleep(1 * time.Second)
fmt.Printf("After blocking: %d goroutines\n", runtime.NumGoroutine())
// Receive from some goroutines to unblock them
for i := 0; i < 10; i++ {
fmt.Printf("Received: %d\n", <-ch)
}
fmt.Printf("After receiving 10 values: %d goroutines\n", runtime.NumGoroutine())
}
func main() {
// Start pprof server
go func() {
fmt.Println("Starting pprof server on :8080")
http.ListenAndServe(":8080", nil)
}()
// Wait for pprof server to start
time.Sleep(100 * time.Millisecond)
// Print initial goroutine count
fmt.Printf("Initial goroutine count: %d\n", runtime.NumGoroutine())
// Demonstrate proper goroutine management
properGoroutineManagement()
// Demonstrate goroutine blocking
demonstrateGoroutineBlocking()
// Leak goroutines (comment out to avoid actual leaking)
// leakGoroutines()
fmt.Println("\nTo analyze goroutines:")
fmt.Println("1. View goroutine profile: go tool pprof http://localhost:8080/debug/pprof/goroutine")
fmt.Println("2. Get text listing: curl http://localhost:8080/debug/pprof/goroutine?debug=1")
fmt.Println("3. View full goroutine stack traces: curl http://localhost:8080/debug/pprof/goroutine?debug=2")
// Keep the program running to allow pprof access
fmt.Println("\nPress Ctrl+C to exit")
select {}
}
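Goroutine stacks can also be dumped programmatically, which is useful for logging a suspected leak without going through the HTTP endpoints; a minimal sketch:
package main

import (
    "os"
    "runtime/pprof"
)

func main() {
    // debug=1 groups identical stacks with counts; debug=2 prints every
    // goroutine's full stack, matching the ?debug=2 HTTP endpoint.
    pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
}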
Mutex Profiling
Mutex profiling helps identify lock contention issues:
package main
import (
"fmt"
"net/http"
_ "net/http/pprof"
"runtime"
"sync"
"time"
)
// Global mutex for demonstration
var globalMutex sync.Mutex
// Counter protected by mutex
var counter int
// Function with high mutex contention
func highContentionFunction() {
var wg sync.WaitGroup
// Enable mutex profiling
runtime.SetMutexProfileFraction(5) // on average, 1 in 5 contention events is sampled
fmt.Println("Running high contention scenario...")
// Create many goroutines that all try to access the same mutex
for i := 0; i < 100; i++ {
wg.Add(1)
go func() {
defer wg.Done()
// Each goroutine acquires and releases the mutex 1000 times
for j := 0; j < 1000; j++ {
globalMutex.Lock()
counter++
globalMutex.Unlock()
}
}()
}
wg.Wait()
fmt.Printf("High contention counter: %d\n", counter)
}
// Function with optimized locking strategy
func lowContentionFunction() {
var wg sync.WaitGroup
counter = 0
fmt.Println("Running low contention scenario...")
// Create many goroutines with local counters
for i := 0; i < 100; i++ {
wg.Add(1)
go func() {
defer wg.Done()
// Each goroutine maintains its own counter
localCounter := 0
for j := 0; j < 1000; j++ {
localCounter++
}
// Only lock once at the end to update the global counter
globalMutex.Lock()
counter += localCounter
globalMutex.Unlock()
}()
}
wg.Wait()
fmt.Printf("Low contention counter: %d\n", counter)
}
// Function demonstrating read-write mutex
func readWriteMutexDemo() {
var rwMutex sync.RWMutex
var data = make(map[int]int)
counter = 0
var wg sync.WaitGroup
fmt.Println("Running read-write mutex scenario...")
// Start writer goroutines (fewer)
for i := 0; i < 5; i++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
// Writers need exclusive access
for j := 0; j < 100; j++ {
rwMutex.Lock()
data[j] = id
rwMutex.Unlock()
// Simulate some processing time
time.Sleep(1 * time.Millisecond)
}
}(i)
}
// Start reader goroutines (many more)
for i := 0; i < 100; i++ {
wg.Add(1)
go func() {
defer wg.Done()
// Readers can access concurrently
for j := 0; j < 1000; j++ {
rwMutex.RLock()
_ = data[j%100]
rwMutex.RUnlock()
// Count reads
globalMutex.Lock()
counter++
globalMutex.Unlock()
}
}()
}
wg.Wait()
fmt.Printf("Read-write mutex: %d reads performed\n", counter)
}
func main() {
// Start pprof server
go func() {
fmt.Println("Starting pprof server on :8080")
http.ListenAndServe(":8080", nil)
}()
// Wait for pprof server to start
time.Sleep(100 * time.Millisecond)
// Run high contention scenario
highContentionFunction()
// Run low contention scenario
lowContentionFunction()
// Run read-write mutex scenario
readWriteMutexDemo()
fmt.Println("\nTo analyze mutex contention:")
fmt.Println("1. View mutex profile: go tool pprof http://localhost:8080/debug/pprof/mutex")
fmt.Println("2. Get text listing: curl http://localhost:8080/debug/pprof/mutex?debug=1")
// Keep the program running to allow pprof access
fmt.Println("\nPress Ctrl+C to exit")
select {}
}
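For a simple shared counter like the one above, an even lower-friction fix is to drop the mutex entirely in favor of sync/atomic; a minimal sketch assuming Go 1.19+ for the atomic.Int64 type:
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var counter atomic.Int64
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // No lock to contend on: each increment is a single atomic operation.
            for j := 0; j < 1000; j++ {
                counter.Add(1)
            }
        }()
    }
    wg.Wait()
    fmt.Println("atomic counter:", counter.Load())
}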
Block Profiling
Block profiling helps identify where goroutines spend time waiting:
package main
import (
"fmt"
"net/http"
_ "net/http/pprof"
"os"
"runtime"
"runtime/pprof"
"sync"
"time"
)
// Function demonstrating channel blocking
func channelBlockingDemo() {
fmt.Println("Running channel blocking demo...")
// Create a channel with small buffer
ch := make(chan int, 5)
var wg sync.WaitGroup
// Producer goroutines
for i := 0; i < 10; i++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
for j := 0; j < 100; j++ {
// This will block when the channel is full
ch <- id*1000 + j
time.Sleep(1 * time.Millisecond)
}
}(i)
}
// Consumer goroutine - intentionally slow
wg.Add(1)
go func() {
defer wg.Done()
for i := 0; i < 1000; i++ {
val := <-ch
fmt.Printf("Received: %d\n", val)
time.Sleep(5 * time.Millisecond) // Slower than producers
}
}()
wg.Wait()
}
// Function demonstrating I/O blocking
func ioBlockingDemo() {
fmt.Println("Running I/O blocking demo...")
var wg sync.WaitGroup
// Create temporary files
tempFiles := make([]*os.File, 5)
for i := range tempFiles {
file, err := os.CreateTemp("", "block-profile-demo")
if err != nil {
fmt.Printf("Error creating temp file: %v\n", err)
continue
}
defer os.Remove(file.Name())
defer file.Close()
tempFiles[i] = file
}
// Goroutines performing file I/O
for i := 0; i < 10; i++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
// Choose a file
fileIndex := id % len(tempFiles)
file := tempFiles[fileIndex]
// Write data to file (may block)
data := make([]byte, 1024*1024) // 1MB
for j := range data {
data[j] = byte(id)
}
for j := 0; j < 10; j++ {
_, err := file.Write(data)
if err != nil {
fmt.Printf("Error writing to file: %v\n", err)
}
// Sync to disk (will block)
file.Sync()
}
}(i)
}
wg.Wait()
}
// Function demonstrating mutex blocking
func mutexBlockingDemo() {
fmt.Println("Running mutex blocking demo...")
var mu sync.Mutex
var wg sync.WaitGroup
// Goroutine that holds the lock for a long time
wg.Add(1)
go func() {
defer wg.Done()
for i := 0; i < 5; i++ {
mu.Lock()
fmt.Println("Long operation has the lock")
time.Sleep(100 * time.Millisecond) // Hold lock for a long time
mu.Unlock()
// Give other goroutines a chance
time.Sleep(10 * time.Millisecond)
}
}()
// Goroutines that need the lock frequently
for i := 0; i < 10; i++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
for j := 0; j < 10; j++ {
mu.Lock() // Will block waiting for the long operation
fmt.Printf("Goroutine %d got the lock\n", id)
time.Sleep(1 * time.Millisecond) // Short operation
mu.Unlock()
// Do some work without the lock
time.Sleep(5 * time.Millisecond)
}
}(i)
}
wg.Wait()
}
func main() {
// Enable block profiling
runtime.SetBlockProfileRate(1) // Sample every blocking event
// Start pprof server
go func() {
fmt.Println("Starting pprof server on :8080")
http.ListenAndServe(":8080", nil)
}()
// Wait for pprof server to start
time.Sleep(100 * time.Millisecond)
// Run demos
channelBlockingDemo()
ioBlockingDemo()
mutexBlockingDemo()
// Save block profile
f, err := os.Create("block.prof")
if err != nil {
fmt.Printf("Error creating profile file: %v\n", err)
} else {
pprof.Lookup("block").WriteTo(f, 0)
f.Close()
fmt.Println("Block profile written to block.prof")
}
fmt.Println("\nTo analyze blocking:")
fmt.Println("1. View block profile: go tool pprof http://localhost:8080/debug/pprof/block")
fmt.Println("2. Analyze saved profile: go tool pprof -http=:8081 block.prof")
// Keep the program running to allow pprof access
fmt.Println("\nPress Ctrl+C to exit")
select {}
}
Trace Analysis
Go’s execution tracer provides detailed insights into runtime behavior:
package main
import (
"context"
"fmt"
"os"
"runtime/trace"
"sync"
"time"
)
// Function demonstrating various activities for tracing
func runTracedActivities() {
// Create a trace file
f, err := os.Create("trace.out")
if err != nil {
fmt.Fprintf(os.Stderr, "Failed to create trace file: %v\n", err)
return
}
defer f.Close()
// Start tracing
if err := trace.Start(f); err != nil {
fmt.Fprintf(os.Stderr, "Failed to start trace: %v\n", err)
return
}
defer trace.Stop()
// Create a context for tracing
ctx := context.Background()
// Trace a simple function
ctx, task := trace.NewTask(ctx, "main")
defer task.End()
// Trace a region within a function
trace.WithRegion(ctx, "initialization", func() {
fmt.Println("Initializing...")
time.Sleep(10 * time.Millisecond)
})
// Trace concurrent work
var wg sync.WaitGroup
// Start multiple goroutines with traced regions
for i := 0; i < 10; i++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
// Create a task for this goroutine
_, goroutineTask := trace.NewTask(ctx, fmt.Sprintf("goroutine-%d", id))
defer goroutineTask.End()
// Log an event
trace.Log(ctx, "goroutine-start", fmt.Sprintf("id=%d", id))
// Simulate different phases of work
trace.WithRegion(ctx, "computation", func() {
// CPU-bound work
sum := 0
for j := 0; j < 1000000; j++ {
sum += j
}
trace.Log(ctx, "computation-result", fmt.Sprintf("sum=%d", sum))
})
trace.WithRegion(ctx, "io-simulation", func() {
// Simulate I/O
time.Sleep(time.Duration(id+1) * 10 * time.Millisecond)
})
trace.Log(ctx, "goroutine-end", fmt.Sprintf("id=%d", id))
}(i)
}
// Demonstrate blocking on channel communication
trace.WithRegion(ctx, "channel-communication", func() {
ch := make(chan int)
// Sender
wg.Add(1)
go func() {
defer wg.Done()
senderCtx, task := trace.NewTask(ctx, "sender")
defer task.End()
trace.Log(senderCtx, "send-start", "preparing to send")
time.Sleep(20 * time.Millisecond) // Simulate preparation
trace.WithRegion(senderCtx, "send-operation", func() {
ch <- 42 // This will block until receiver is ready
})
trace.Log(senderCtx, "send-complete", "value sent")
}()
// Receiver (intentionally delayed)
wg.Add(1)
go func() {
defer wg.Done()
receiverCtx, task := trace.NewTask(ctx, "receiver")
defer task.End()
trace.Log(receiverCtx, "receive-delay", "waiting before receiving")
time.Sleep(100 * time.Millisecond) // Delay to demonstrate blocking
trace.WithRegion(receiverCtx, "receive-operation", func() {
val := <-ch
trace.Log(receiverCtx, "received-value", fmt.Sprintf("%d", val))
})
}()
})
// Wait for all goroutines to complete
wg.Wait()
}
func main() {
runTracedActivities()
fmt.Println("Trace complete. Analyze with:")
fmt.Println("go tool trace trace.out")
fmt.Println("\nTrace tool features:")
fmt.Println("1. Timeline view of goroutine execution")
fmt.Println("2. Synchronization events (channel operations, mutex locks)")
fmt.Println("3. System events (GC, scheduler)")
fmt.Println("4. User-defined regions and events")
}
Code-Level Optimization Strategies
Beyond profiling, there are numerous code-level optimizations that can significantly improve Go application performance.
Memory Allocation Optimization
Reducing memory allocations is one of the most effective ways to improve performance:
package main
import (
"fmt"
"runtime"
"strings"
"time"
)
// BEFORE: Creates a new slice for each call
func inefficientAppend(slice []int, value int) []int {
return append(slice, value)
}
// AFTER: Preallocates capacity to avoid reallocations
func efficientAppend(slice []int, values ...int) []int {
if cap(slice)-len(slice) < len(values) {
// Need to reallocate
newSlice := make([]int, len(slice), len(slice)+len(values)+100) // Extra capacity
copy(newSlice, slice)
slice = newSlice
}
return append(slice, values...)
}
// BEFORE: Creates many small allocations
func inefficientStringJoin(items []string) string {
result := ""
for _, item := range items {
result += item + ","
}
return result
}
// AFTER: Uses strings.Builder to minimize allocations
func efficientStringJoin(items []string) string {
var builder strings.Builder
builder.Grow(len(items) * 8) // Estimate size
for i, item := range items {
builder.WriteString(item)
if i < len(items)-1 {
builder.WriteByte(',')
}
}
return builder.String()
}
// BEFORE: Allocates a map for each call
func inefficientCounter(text string) map[rune]int {
counts := make(map[rune]int)
for _, char := range text {
counts[char]++
}
return counts
}
// AFTER: Reuses a map to avoid allocations
func efficientCounter(text string, counts map[rune]int) {
// Clear the map
for k := range counts {
delete(counts, k)
}
// Count characters
for _, char := range text {
counts[char]++
}
}
// Benchmark helper that reports elapsed time and total bytes allocated
func benchmarkAllocation(n int, f func()) {
// Warm up
for i := 0; i < 5; i++ {
f()
}
// Reset memory stats
runtime.GC()
var stats runtime.MemStats
runtime.ReadMemStats(&stats)
allocsBefore := stats.TotalAlloc
// Run benchmark
start := time.Now()
for i := 0; i < n; i++ {
f()
}
elapsed := time.Since(start)
// Get memory stats
runtime.ReadMemStats(&stats)
allocsAfter := stats.TotalAlloc
// Print results
fmt.Printf("Time: %v, Allocations: %v bytes\n", elapsed, allocsAfter-allocsBefore)
}
func main() {
// Benchmark slice append
fmt.Println("Slice append benchmark:")
benchmarkAllocation(10000, func() {
slice := make([]int, 0)
for i := 0; i < 1000; i++ {
slice = inefficientAppend(slice, i)
}
})
benchmarkAllocation(10000, func() {
slice := make([]int, 0, 1000) // Preallocate
for i := 0; i < 1000; i++ {
slice = append(slice, i) // Direct append is better
}
})
// Benchmark string join
fmt.Println("\nString join benchmark:")
items := make([]string, 1000)
for i := range items {
items[i] = fmt.Sprintf("item-%d", i)
}
benchmarkAllocation(100, func() {
_ = inefficientStringJoin(items)
})
benchmarkAllocation(100, func() {
_ = efficientStringJoin(items)
})
benchmarkAllocation(100, func() {
_ = strings.Join(items, ",") // Built-in is even better
})
// Benchmark counter
fmt.Println("\nCounter benchmark:")
text := strings.Repeat("Go is a great language! ", 100)
benchmarkAllocation(1000, func() {
_ = inefficientCounter(text)
})
benchmarkAllocation(1000, func() {
counts := make(map[rune]int)
efficientCounter(text, counts)
})
}
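One more allocation-reduction tool worth knowing is sync.Pool, which amortizes short-lived allocations across calls; a minimal sketch using a pooled bytes.Buffer:
package main

import (
    "bytes"
    "fmt"
    "sync"
)

// bufPool hands out reusable buffers so hot paths avoid allocating a fresh
// buffer (and the resulting GC pressure) on every call.
var bufPool = sync.Pool{
    New: func() any { return new(bytes.Buffer) },
}

func render(name string) string {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset() // pooled objects retain old contents; always reset
    defer bufPool.Put(buf)
    fmt.Fprintf(buf, "hello, %s", name)
    return buf.String()
}

func main() {
    fmt.Println(render("gopher"))
}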
Compiler Optimizations and Build Flags
Understanding Go’s compiler optimizations can help you write more efficient code:
package main
import (
"fmt"
"runtime"
)
// Function that may be inlined by the compiler
func add(a, b int) int {
return a + b
}
// Function that's too complex to inline
func complexFunction(a, b int) int {
result := 0
for i := 0; i < a; i++ {
if i%2 == 0 {
result += i * b
} else {
result -= i * b
}
if result > 1000 {
result = 1000
}
}
return result
}
// Function with bounds check elimination opportunity
func sumArray(arr []int) int {
sum := 0
for i := 0; i < len(arr); i++ {
sum += arr[i]
}
return sum
}
// Function that demonstrates escape analysis
func createOnStack() int {
x := 42 // Will be allocated on stack
return x
}
func createOnHeap() *int {
x := 42 // Will escape to heap
return &x
}
func main() {
// Print Go version and compiler info
fmt.Printf("Go version: %s\n", runtime.Version())
fmt.Printf("GOOS: %s, GOARCH: %s\n", runtime.GOOS, runtime.GOARCH)
fmt.Printf("GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
// Print build flags
fmt.Println("\nBuild flags:")
fmt.Println("To enable inlining and bounds check elimination:")
fmt.Println("go build -gcflags=\"-m\" main.go")
fmt.Println("\nTo disable optimizations (for debugging):")
fmt.Println("go build -gcflags=\"-N -l\" main.go")
fmt.Println("\nTo see assembly output:")
fmt.Println("go build -gcflags=\"-S\" main.go")
// Demonstrate compiler optimizations
fmt.Println("\nCompiler optimization examples:")
// Function inlining
result := 0
for i := 0; i < 1000000; i++ {
result = add(result, i) // Likely to be inlined
}
fmt.Printf("Inlined function result: %d\n", result)
// Complex function (not inlined)
result = complexFunction(100, 5)
fmt.Printf("Complex function result: %d\n", result)
// Bounds check elimination
arr := make([]int, 1000)
for i := range arr {
arr[i] = i
}
sum := sumArray(arr)
fmt.Printf("Array sum: %d\n", sum)
// Escape analysis
stackVal := createOnStack()
heapVal := createOnHeap()
fmt.Printf("Stack value: %d, Heap value: %d\n", stackVal, *heapVal)
// Print instructions for viewing escape analysis
fmt.Println("\nTo view escape analysis decisions:")
fmt.Println("go build -gcflags=\"-m -m\" main.go")
// Print instructions for benchmarking with different flags
fmt.Println("\nTo benchmark with different compiler flags:")
fmt.Println("go test -bench=. -benchmem -gcflags=\"-N -l\" ./...")
fmt.Println("go test -bench=. -benchmem ./...")
}
Production Performance Monitoring
Monitoring performance in production is essential for maintaining optimal application efficiency over time.
Continuous Profiling Setup
Setting up continuous profiling allows you to monitor performance in production:
package main
import (
"fmt"
"log"
"net/http"
_ "net/http/pprof" // Import for side effects
"os"
"runtime"
"runtime/pprof"
"time"
)
// Configuration for profiling
type ProfilingConfig struct {
EnableHTTPProfiling bool
CPUProfileInterval time.Duration
MemProfileInterval time.Duration
BlockProfileRate int
MutexProfileRate int
OutputDir string
}
// ProfileManager handles periodic profiling
type ProfileManager struct {
config ProfilingConfig
stopCh chan struct{}
}
// NewProfileManager creates a new profile manager
func NewProfileManager(config ProfilingConfig) *ProfileManager {
return &ProfileManager{
config: config,
stopCh: make(chan struct{}),
}
}
// Start begins the profiling routines
func (pm *ProfileManager) Start() {
// Create output directory if it doesn't exist
if pm.config.OutputDir != "" {
if err := os.MkdirAll(pm.config.OutputDir, 0755); err != nil {
log.Printf("Failed to create profile output directory: %v", err)
return
}
}
// Configure profiling rates
if pm.config.BlockProfileRate > 0 {
runtime.SetBlockProfileRate(pm.config.BlockProfileRate)
}
if pm.config.MutexProfileRate > 0 {
runtime.SetMutexProfileFraction(pm.config.MutexProfileRate)
}
// Start HTTP server for pprof if enabled
if pm.config.EnableHTTPProfiling {
go func() {
log.Println("Starting pprof HTTP server on :6060")
if err := http.ListenAndServe(":6060", nil); err != nil {
log.Printf("pprof HTTP server failed: %v", err)
}
}()
}
// Start periodic CPU profiling if enabled
if pm.config.CPUProfileInterval > 0 {
go pm.startPeriodicCPUProfiling()
}
// Start periodic memory profiling if enabled
if pm.config.MemProfileInterval > 0 {
go pm.startPeriodicMemProfiling()
}
}
// Stop stops all profiling routines
func (pm *ProfileManager) Stop() {
close(pm.stopCh)
}
// startPeriodicCPUProfiling captures CPU profiles at regular intervals
func (pm *ProfileManager) startPeriodicCPUProfiling() {
ticker := time.NewTicker(pm.config.CPUProfileInterval)
defer ticker.Stop()
for {
select {
case <-ticker.C:
pm.captureCPUProfile()
case <-pm.stopCh:
return
}
}
}
// startPeriodicMemProfiling captures memory profiles at regular intervals
func (pm *ProfileManager) startPeriodicMemProfiling() {
ticker := time.NewTicker(pm.config.MemProfileInterval)
defer ticker.Stop()
for {
select {
case <-ticker.C:
pm.captureMemProfile()
case <-pm.stopCh:
return
}
}
}
// captureCPUProfile captures a CPU profile
func (pm *ProfileManager) captureCPUProfile() {
timestamp := time.Now().Format("20060102-150405")
filename := fmt.Sprintf("%s/cpu-%s.prof", pm.config.OutputDir, timestamp)
f, err := os.Create(filename)
if err != nil {
log.Printf("Failed to create CPU profile file: %v", err)
return
}
defer f.Close()
log.Printf("Capturing CPU profile to %s", filename)
if err := pprof.StartCPUProfile(f); err != nil {
log.Printf("Failed to start CPU profile: %v", err)
return
}
// Profile for 30 seconds
time.Sleep(30 * time.Second)
pprof.StopCPUProfile()
log.Printf("CPU profile captured")
}
// captureMemProfile captures a memory profile
func (pm *ProfileManager) captureMemProfile() {
timestamp := time.Now().Format("20060102-150405")
filename := fmt.Sprintf("%s/mem-%s.prof", pm.config.OutputDir, timestamp)
f, err := os.Create(filename)
if err != nil {
log.Printf("Failed to create memory profile file: %v", err)
return
}
defer f.Close()
log.Printf("Capturing memory profile to %s", filename)
// Run GC before profiling to get accurate memory usage
runtime.GC()
if err := pprof.WriteHeapProfile(f); err != nil {
log.Printf("Failed to write memory profile: %v", err)
return
}
log.Printf("Memory profile captured")
}
// simulateLoad generates some CPU and memory load
func simulateLoad() {
// CPU load
go func() {
for {
for i := 0; i < 1000000; i++ {
_ = i * i
}
time.Sleep(100 * time.Millisecond)
}
}()
// Memory load
go func() {
var slices [][]byte
for {
// Allocate memory
slice := make([]byte, 1024*1024) // 1MB
for i := range slice {
slice[i] = byte(i % 256)
}
slices = append(slices, slice)
// Release some memory occasionally
if len(slices) > 10 {
slices = slices[5:]
}
time.Sleep(500 * time.Millisecond)
}
}()
}
func main() {
// Configure profiling
config := ProfilingConfig{
EnableHTTPProfiling: true,
CPUProfileInterval: 5 * time.Minute,
MemProfileInterval: 5 * time.Minute,
BlockProfileRate: 1,
MutexProfileRate: 1,
OutputDir: "./profiles",
}
// Create and start profile manager
pm := NewProfileManager(config)
pm.Start()
// Simulate application load
simulateLoad()
// Keep the application running
fmt.Println("Application running with continuous profiling...")
fmt.Println("Access pprof web interface at http://localhost:6060/debug/pprof/")
fmt.Println("Press Ctrl+C to exit")
// Wait indefinitely
select {}
}
Performance Metrics Collection
Collecting and analyzing performance metrics helps identify trends and issues:
package main
import (
"fmt"
"log"
"math/rand"
"net/http"
"runtime"
"sync"
"time"
)
// simulateAPIEndpoint simulates an API endpoint with variable response times
func simulateAPIEndpoint(endpoint string, minLatency, maxLatency time.Duration, errorRate float64) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
// Simulate processing time
processingTime := minLatency + time.Duration(rand.Float64()*float64(maxLatency-minLatency))
time.Sleep(processingTime)
// Simulate errors
if rand.Float64() < errorRate {
http.Error(w, "Internal Server Error", http.StatusInternalServerError)
return
}
// Successful response
fmt.Fprintf(w, "Response from %s\n", endpoint)
// Log performance metrics
elapsed := time.Since(start)
log.Printf("%s - %s - %v", endpoint, r.Method, elapsed)
}
}
// collectRuntimeMetrics periodically collects and logs runtime metrics
func collectRuntimeMetrics(interval time.Duration) {
ticker := time.NewTicker(interval)
defer ticker.Stop()
for range ticker.C {
var m runtime.MemStats
runtime.ReadMemStats(&m)
log.Printf("Goroutines: %d", runtime.NumGoroutine())
log.Printf("Memory: Alloc=%v MiB, Sys=%v MiB, NumGC=%v",
m.Alloc/1024/1024, m.Sys/1024/1024, m.NumGC)
}
}
// simulateLoad generates artificial load on the server
func simulateLoad(apiURL string, concurrency int, requestsPerSecond float64) {
// Calculate delay between requests
delay := time.Duration(float64(time.Second) / requestsPerSecond)
// Create a worker pool
var wg sync.WaitGroup
requestCh := make(chan struct{})
// Start workers
for i := 0; i < concurrency; i++ {
wg.Add(1)
go func() {
defer wg.Done()
client := &http.Client{
Timeout: 5 * time.Second,
}
for range requestCh {
resp, err := client.Get(apiURL)
if err != nil {
log.Printf("Request error: %v", err)
continue
}
resp.Body.Close()
}
}()
}
// Generate requests at the specified rate
ticker := time.NewTicker(delay)
defer ticker.Stop()
log.Printf("Generating load: %f requests/second with %d concurrent clients",
requestsPerSecond, concurrency)
for range ticker.C {
select {
case requestCh <- struct{}{}:
// Request sent to worker
default:
// All workers busy, skip this request
log.Println("Overloaded, skipping request")
}
}
// The ticker loop above runs until the process exits, so this demo never
// closes requestCh; a real load generator would stop via a done channel.
}
func main() {
// Seed random number generator
rand.Seed(time.Now().UnixNano())
// Start runtime metrics collection
go collectRuntimeMetrics(5 * time.Second)
// Set up API endpoints
http.HandleFunc("/api/fast", simulateAPIEndpoint("fast", 10*time.Millisecond, 50*time.Millisecond, 0.01))
http.HandleFunc("/api/medium", simulateAPIEndpoint("medium", 50*time.Millisecond, 200*time.Millisecond, 0.05))
http.HandleFunc("/api/slow", simulateAPIEndpoint("slow", 200*time.Millisecond, 1000*time.Millisecond, 0.10))
// Start load generation
go simulateLoad("http://localhost:8080/api/fast", 10, 50)
go simulateLoad("http://localhost:8080/api/medium", 5, 20)
go simulateLoad("http://localhost:8080/api/slow", 2, 5)
// Start HTTP server
log.Println("Starting server on :8080")
if err := http.ListenAndServe(":8080", nil); err != nil {
log.Fatalf("Server failed: %v", err)
}
}
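For lightweight metrics without external dependencies, the standard library's expvar package publishes variables as JSON at /debug/vars on the default mux, right alongside the pprof endpoints; a minimal sketch:
package main

import (
    "expvar"
    "fmt"
    "log"
    "net/http"
)

// Importing expvar registers the /debug/vars handler on the default mux;
// published variables appear there as JSON alongside memstats.
var requestsTotal = expvar.NewInt("requests_total")

func handler(w http.ResponseWriter, r *http.Request) {
    requestsTotal.Add(1)
    fmt.Fprintln(w, "ok")
}

func main() {
    http.HandleFunc("/", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}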
Takeaway Points
Performance profiling and optimization are essential skills for Go developers building applications that need to operate efficiently at scale. By mastering Go’s comprehensive suite of profiling tools and applying systematic optimization techniques, you can identify and eliminate bottlenecks before they impact your users.
The key takeaways from this guide include:
Start with clear performance objectives: Define specific, measurable performance goals before beginning optimization work.
Profile before optimizing: Use Go’s profiling tools to identify actual bottlenecks rather than optimizing based on assumptions.
Focus on the critical path: Optimize the parts of your code that have the greatest impact on overall performance.
Measure the impact: Quantify the effect of your optimizations through benchmarking and profiling.
Monitor in production: Set up continuous profiling and metrics collection to catch performance regressions early.
Remember that premature optimization can lead to more complex, harder-to-maintain code without meaningful performance benefits. The most effective approach is to write clean, idiomatic Go code first, then use profiling to guide targeted optimizations where they matter most.
By applying the techniques covered in this guide, from CPU and memory profiling to concurrency analysis and compiler optimizations, you'll be well-equipped to build Go applications that are not just functionally correct, but blazingly fast and resource-efficient.