Machine Learning with Rust: Performance and Safety for AI Applications

Andrew • Jun 15, 2025 • Rust , Machine Learning , AI , Programming

10 min read 2155 words

Machine learning has traditionally been dominated by languages like Python, which offer ease of use and a rich ecosystem of libraries. However, as models grow larger and performance requirements become more demanding, there’s increasing interest in alternatives that can provide better efficiency without sacrificing developer productivity. Rust, with its focus on performance, safety, and modern language features, is emerging as a compelling option for machine learning applications, particularly in production environments where speed and reliability are critical.

In this comprehensive guide, we’ll explore the landscape of machine learning in Rust, from low-level tensor operations to high-level frameworks and integrations with existing ecosystems. You’ll learn about the tools and libraries available, understand the advantages and challenges of using Rust for machine learning, and see practical examples of implementing and deploying ML models in Rust. By the end, you’ll have a solid foundation for incorporating Rust into your machine learning workflow, whether you’re building a simple model or a complex AI system.

Why Rust for Machine Learning?

Before diving into the technical details, let’s understand why Rust is gaining traction in the machine learning community:

Performance Without Compromise

Machine learning models, especially during inference, need to run efficiently. Rust delivers C/C++-level performance with additional safety guarantees:

use ndarray::{Array, Array2};

// Efficient matrix multiplication
fn matrix_multiply(a: &Array2<f32>, b: &Array2<f32>) -> Array2<f32> {
    let (m, n) = a.dim();
    let (_, p) = b.dim();
    
    let mut result = Array::zeros((m, p));
    
    for i in 0..m {
        for j in 0..p {
            let mut sum = 0.0;
            for k in 0..n {
                sum += a[[i, k]] * b[[k, j]];
            }
            result[[i, j]] = sum;
        }
    }
    
    result
}

Memory Safety for Complex Systems

Machine learning systems often involve complex data processing pipelines. Rust’s ownership model prevents common bugs:

struct DataPipeline {
    preprocessor: Box<dyn Preprocessor>,
    model: Box<dyn Model>,
    postprocessor: Box<dyn Postprocessor>,
}

impl DataPipeline {
    fn process(&self, input: &[f32]) -> Vec<f32> {
        // No use-after-free or double-free bugs possible
        // No data races when using multiple threads
        let preprocessed = self.preprocessor.preprocess(input);
        let predictions = self.model.predict(&preprocessed);
        self.postprocessor.postprocess(&predictions)
    }
}

Fearless Concurrency

Modern ML workloads benefit from parallelism. Rust makes concurrent programming safer:

use rayon::prelude::*;

fn batch_inference(model: &dyn Model, inputs: &[Vec<f32>]) -> Vec<Vec<f32>> {
    // Parallel processing of inputs
    inputs.par_iter()
        .map(|input| model.predict(input))
        .collect()
}

Interoperability with Existing Ecosystems

Rust can easily integrate with existing ML ecosystems like Python and C++:

use pyo3::prelude::*;
use pyo3::types::PyDict;

#[pyfunction]
fn predict(input: Vec<f32>) -> PyResult<Vec<f32>> {
    // Load Rust model
    let model = RustModel::load("model.bin")?;
    
    // Run inference in Rust
    let output = model.predict(&input);
    
    // Return result to Python
    Ok(output)
}

#[pymodule]
fn rust_ml(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(predict, m)?)?;
    Ok(())
}

Getting Started with Machine Learning in Rust

Let’s explore how to set up and start developing machine learning applications with Rust:

Setting Up the Development Environment

First, you’ll need to install Rust and some additional tools:

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install cargo-edit for easier dependency management
cargo install cargo-edit

Creating a New ML Project

Let’s create a simple machine learning project:

# Create a new project
cargo new --bin rust_ml
cd rust_ml

Edit the Cargo.toml file:

[package]
name = "rust_ml"
version = "0.1.0"
edition = "2021"

[dependencies]
ndarray = "0.15"
ndarray-rand = "0.14"
ndarray-stats = "0.5"
rand = "0.8"
linfa = "0.6"
linfa-linear = "0.6"
csv = "1.2"

Linear Regression Example

Now, let’s implement a simple linear regression model:

use linfa::prelude::*;
use linfa_linear::LinearRegression;
use ndarray::{Array, Array1, Array2};
use ndarray_rand::rand_distr::Uniform;
use ndarray_rand::RandomExt;
use rand::rngs::StdRng;
use rand::SeedableRng;

fn main() {
    // Generate synthetic data
    let mut rng = StdRng::seed_from_u64(42);
    let n_samples = 100;
    let n_features = 3;
    
    // Features: random values between 0 and 1
    let x = Array::random_using((n_samples, n_features), Uniform::new(0., 1.), &mut rng);
    
    // Coefficients
    let coefficients = array![2.0, -1.0, 0.5];
    
    // Target: linear combination of features with some noise
    let y = x.dot(&coefficients) + Array::random_using(n_samples, Uniform::new(-0.1, 0.1), &mut rng);
    
    // Split into training and testing sets
    let n_train = 80;
    let x_train = x.slice(s![0..n_train, ..]).to_owned();
    let y_train = y.slice(s![0..n_train]).to_owned();
    let x_test = x.slice(s![n_train.., ..]).to_owned();
    let y_test = y.slice(s![n_train..]).to_owned();
    
    // Create dataset
    let dataset = Dataset::new(x_train, y_train);
    
    // Train linear regression model
    let model = LinearRegression::default()
        .fit(&dataset)
        .expect("Failed to fit the model");
    
    // Make predictions
    let predictions = model.predict(&x_test);
    
    // Calculate mean squared error
    let mse = predictions
        .iter()
        .zip(y_test.iter())
        .map(|(pred, actual)| (pred - actual).powi(2))
        .sum::<f64>() / predictions.len() as f64;
    
    println!("Coefficients: {:?}", model.coefficients());
    println!("Intercept: {}", model.intercept());
    println!("Mean Squared Error: {}", mse);
}

This example demonstrates how to generate synthetic data, train a linear regression model, and evaluate its performance using the linfa crate.

Tensor Operations and Neural Networks

For more advanced machine learning, you’ll need efficient tensor operations and neural network capabilities:

Using ndarray for Tensor Operations

use ndarray::{array, Array, Array1, Array2};

fn main() {
    // Create a 2D array
    let a = array![[1., 2., 3.], [4., 5., 6.]];
    let b = array![[7., 8.], [9., 10.], [11., 12.]];
    
    // Matrix multiplication
    let c = a.dot(&b);
    println!("Matrix multiplication result:\n{:?}", c);
    
    // Element-wise operations
    let d = &a + 1.0;
    println!("Element-wise addition:\n{:?}", d);
    
    // Slicing
    let row = a.slice(s![0, ..]);
    println!("First row: {:?}", row);
    
    // Reshaping
    let reshaped = a.into_shape((3, 2)).unwrap();
    println!("Reshaped array:\n{:?}", reshaped);
    
    // Reduction operations
    let sum = a.sum();
    let mean = a.mean().unwrap();
    println!("Sum: {}, Mean: {}", sum, mean);
}

Building a Neural Network with burn

The burn crate provides a PyTorch-like API for building and training neural networks:

use burn::tensor::{Tensor, backend::Backend};
use burn::module::{Module, ModuleT};
use burn::nn::{Linear, LinearConfig, ReLU};
use burn::optim::{Adam, AdamConfig};
use burn::record::{Recorder, RecorderT};

// Define a simple neural network
#[derive(Module, Debug)]
struct MLP<B: Backend> {
    linear1: Linear<B>,
    linear2: Linear<B>,
    activation: ReLU,
}

impl<B: Backend> MLP<B> {
    fn new(in_features: usize, hidden_features: usize, out_features: usize) -> Self {
        Self {
            linear1: LinearConfig::new(in_features, hidden_features).init(),
            linear2: LinearConfig::new(hidden_features, out_features).init(),
            activation: ReLU::new(),
        }
    }
}

impl<B: Backend> ModuleT<Tensor<B, 2>> for MLP<B> {
    type Output = Tensor<B, 2>;
    
    fn forward(&self, input: Tensor<B, 2>) -> Self::Output {
        let x = self.linear1.forward(input);
        let x = self.activation.forward(x);
        self.linear2.forward(x)
    }
}

fn main() {
    // Create a model
    let model = MLP::<burn::backend::Cpu>::new(10, 50, 1);
    
    // Create some dummy data
    let x = Tensor::<burn::backend::Cpu, 2>::random([32, 10]);
    let y = Tensor::<burn::backend::Cpu, 2>::random([32, 1]);
    
    // Create an optimizer
    let mut optimizer = Adam::new(AdamConfig::default());
    
    // Training loop
    for epoch in 0..10 {
        // Forward pass
        let pred = model.forward(x.clone());
        
        // Compute loss
        let loss = pred.mse_loss(&y);
        
        // Backward pass and optimize
        let gradients = loss.backward();
        optimizer.step(&mut model, &gradients);
        
        println!("Epoch {}: Loss = {}", epoch, loss.into_scalar());
    }
}

Using tch-rs for PyTorch Integration

The tch-rs crate provides Rust bindings to the PyTorch C++ API:

use tch::{nn, nn::Module, nn::OptimizerConfig, Device, Tensor};

fn main() {
    // Check if CUDA is available
    let device = Device::cuda_if_available();
    println!("Using device: {:?}", device);
    
    // Define a simple neural network
    let vs = nn::VarStore::new(device);
    let net = nn::seq()
        .add(nn::linear(&vs.root(), 784, 128, Default::default()))
        .add_fn(|xs| xs.relu())
        .add(nn::linear(&vs.root(), 128, 10, Default::default()));
    
    // Create an optimizer
    let mut opt = nn::Adam::default().build(&vs, 1e-3).unwrap();
    
    // Create some dummy data
    let x = Tensor::rand(&[64, 784], (tch::Kind::Float, device));
    let y = Tensor::zeros(&[64], (tch::Kind::Int64, device));
    
    // Training loop
    for epoch in 1..11 {
        // Forward pass
        let prediction = net.forward(&x);
        
        // Compute loss
        let loss = prediction.cross_entropy_for_logits(&y);
        
        // Backward pass and optimize
        opt.zero_grad();
        loss.backward();
        opt.step();
        
        println!("Epoch: {}, Loss: {}", epoch, f64::from(&loss));
    }
    
    // Save the model
    vs.save("model.safetensors").unwrap();
}

Working with Data

Efficient data processing is crucial for machine learning:

Loading and Preprocessing Data

use csv::ReaderBuilder;
use ndarray::{Array, Array1, Array2};
use std::error::Error;
use std::fs::File;

fn load_csv(path: &str) -> Result<(Array2<f64>, Array1<f64>), Box<dyn Error>> {
    let file = File::open(path)?;
    let mut reader = ReaderBuilder::new().has_headers(true).from_reader(file);
    
    let headers = reader.headers()?.clone();
    let feature_count = headers.len() - 1; // Assuming last column is target
    
    let mut features = Vec::new();
    let mut targets = Vec::new();
    
    for result in reader.records() {
        let record = result?;
        
        // Parse features
        let mut row_features = Vec::with_capacity(feature_count);
        for i in 0..feature_count {
            let value = record[i].parse::<f64>()?;
            row_features.push(value);
        }
        features.push(row_features);
        
        // Parse target
        let target = record[feature_count].parse::<f64>()?;
        targets.push(target);
    }
    
    // Convert to ndarray
    let n_samples = features.len();
    let x = Array::from_shape_vec((n_samples, feature_count), features.into_iter().flatten().collect())?;
    let y = Array::from_shape_vec(n_samples, targets)?;
    
    Ok((x, y))
}

fn normalize(x: &mut Array2<f64>) {
    for col in 0..x.ncols() {
        let col_view = x.column(col);
        let mean = col_view.mean().unwrap();
        let std_dev = col_view.std(0.0);
        
        for mut row in x.columns_mut() {
            row[col] = (row[col] - mean) / std_dev;
        }
    }
}

Model Deployment and Inference

Once you’ve trained a model, you’ll want to deploy it for inference:

Serializing and Deserializing Models

use serde::{Deserialize, Serialize};
use std::fs::File;
use std::io::{BufReader, BufWriter};

#[derive(Serialize, Deserialize)]
struct LinearModel {
    weights: Vec<f64>,
    bias: f64,
}

impl LinearModel {
    fn new(weights: Vec<f64>, bias: f64) -> Self {
        Self { weights, bias }
    }
    
    fn predict(&self, features: &[f64]) -> f64 {
        assert_eq!(features.len(), self.weights.len());
        
        let mut result = self.bias;
        for (w, x) in self.weights.iter().zip(features.iter()) {
            result += w * x;
        }
        
        result
    }
    
    fn save(&self, path: &str) -> Result<(), Box<dyn std::error::Error>> {
        let file = File::create(path)?;
        let writer = BufWriter::new(file);
        serde_json::to_writer(writer, self)?;
        Ok(())
    }
    
    fn load(path: &str) -> Result<Self, Box<dyn std::error::Error>> {
        let file = File::open(path)?;
        let reader = BufReader::new(file);
        let model = serde_json::from_reader(reader)?;
        Ok(model)
    }
}

Creating a REST API for Model Serving

use actix_web::{web, App, HttpResponse, HttpServer, Responder};
use serde::{Deserialize, Serialize};
use std::sync::Mutex;

// Model struct
struct LinearModel {
    weights: Vec<f64>,
    bias: f64,
}

impl LinearModel {
    fn predict(&self, features: &[f64]) -> f64 {
        let mut result = self.bias;
        for (w, x) in self.weights.iter().zip(features.iter()) {
            result += w * x;
        }
        result
    }
}

// Request and response structs
#[derive(Deserialize)]
struct PredictRequest {
    features: Vec<f64>,
}

#[derive(Serialize)]
struct PredictResponse {
    prediction: f64,
}

// App state
struct AppState {
    model: Mutex<LinearModel>,
}

// Handler function
async fn predict(
    data: web::Data<AppState>,
    request: web::Json<PredictRequest>,
) -> impl Responder {
    let model = data.model.lock().unwrap();
    let prediction = model.predict(&request.features);
    
    HttpResponse::Ok().json(PredictResponse { prediction })
}

Best Practices for Machine Learning in Rust

Based on experience from real machine learning projects in Rust, here are some best practices:

1. Profile Early and Often

Performance is critical in machine learning. Use Rust’s profiling tools to identify bottlenecks:

use std::time::Instant;

fn benchmark<F, R>(name: &str, f: F) -> R
where
    F: FnOnce() -> R,
{
    let start = Instant::now();
    let result = f();
    let duration = start.elapsed();
    println!("{}: {:?}", name, duration);
    result
}

fn main() {
    // Benchmark different implementations
    let a = benchmark("Implementation A", || {
        // Implementation A
        // ...
        42
    });
    
    let b = benchmark("Implementation B", || {
        // Implementation B
        // ...
        42
    });
    
    println!("Results: {} vs {}", a, b);
}

2. Use SIMD for Performance-Critical Code

Single Instruction, Multiple Data (SIMD) operations can significantly speed up numerical computations:

use std::arch::x86_64::*;

// SIMD vector addition for f32 arrays
unsafe fn add_f32_simd(a: &[f32], b: &[f32], result: &mut [f32]) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), result.len());
    
    let n = a.len();
    let mut i = 0;
    
    // Process 8 elements at a time using AVX
    while i + 8 <= n {
        let a_vec = _mm256_loadu_ps(&a[i]);
        let b_vec = _mm256_loadu_ps(&b[i]);
        let sum = _mm256_add_ps(a_vec, b_vec);
        _mm256_storeu_ps(&mut result[i], sum);
        i += 8;
    }
    
    // Process remaining elements
    for j in i..n {
        result[j] = a[j] + b[j];
    }
}

3. Minimize Allocations in Critical Paths

Allocations can cause performance issues in tight loops:

// Bad: Allocating in a tight loop
fn process_batches_bad(data: &[f32], batch_size: usize) -> Vec<f32> {
    let mut results = Vec::new();
    
    for batch in data.chunks(batch_size) {
        let processed = process_batch(batch); // Returns a new Vec
        results.extend_from_slice(&processed);
    }
    
    results
}

// Good: Reusing allocations
fn process_batches_good(data: &[f32], batch_size: usize) -> Vec<f32> {
    let mut results = Vec::with_capacity(data.len());
    let mut batch_buffer = Vec::with_capacity(batch_size);
    
    for batch in data.chunks(batch_size) {
        batch_buffer.clear();
        process_batch_into(&mut batch_buffer, batch);
        results.extend_from_slice(&batch_buffer);
    }
    
    results
}

4. Use Appropriate Data Structures

Choose the right data structures for your specific use case:

// Dense representation for dense data
use ndarray::Array2;

// Sparse representation for sparse data
use sprs::{CsMat, TriMat};

fn main() {
    // Dense matrix
    let dense = Array2::<f64>::zeros((1000, 1000));
    
    // Sparse matrix (99% zeros)
    let mut triplet = TriMat::new((1000, 1000));
    for i in 0..1000 {
        triplet.add_triplet(i, i, 1.0);
    }
    let sparse = triplet.to_csr();
    
    // The sparse representation uses much less memory
    println!("Dense size: {} elements", dense.len());
    println!("Sparse size: {} non-zero elements", sparse.nnz());
}

Conclusion

Rust is emerging as a powerful language for machine learning applications, offering a unique combination of performance, safety, and modern language features. Its ownership system prevents many common bugs that plague machine learning code, while its performance characteristics make it suitable for even the most demanding ML workloads.

The key takeaways from this exploration of machine learning in Rust are:

Performance comparable to C/C++ without sacrificing safety
Memory safety through the ownership system, preventing common bugs
Concurrency made safer through Rust’s type system
Interoperability with existing ML ecosystems like Python and C++
Growing ecosystem of machine learning libraries and tools

Whether you’re building a simple model or a complex AI system, Rust provides the tools and abstractions you need to create fast, reliable machine learning applications. As the ecosystem continues to mature, Rust is poised to become an increasingly popular choice for machine learning practitioners seeking both performance and safety.

Tags: Rust Machine Learning AI Programming

Andrew

Andrew is a visionary software engineer and DevOps expert with a proven track record of delivering cutting-edge solutions that drive innovation at Ataiva.com. As a leader on numerous high-profile projects, Andrew brings his exceptional technical expertise and collaborative leadership skills to the table, fostering a culture of agility and excellence within the team. With a passion for architecting scalable systems, automating workflows, and empowering teams, Andrew is a sought-after authority in the field of software development and DevOps.