Memory Management & Safety

This guide covers how Ruby's garbage collector interacts with Rust objects and how to prevent memory leaks and segmentation faults in your extensions.

warning

Improper memory management is the leading cause of crashes and security vulnerabilities in native extensions. Rust's safety guarantees help prevent many common issues, but you still need to carefully manage the boundary between Ruby and Rust memory.

Ruby's Garbage Collection System

Ruby uses a mark-and-sweep garbage collector to manage memory. Understanding how it works is essential for writing safe extensions:

Marking Phase: Ruby traverses all visible objects, marking them as "in use"
Sweeping Phase: Objects that weren't marked are considered garbage and freed

note

When you create Rust objects that reference Ruby objects, you need to tell Ruby's GC about these references to prevent premature garbage collection. The TypedData trait and mark method provide the mechanism to do this.

TypedData and DataTypeFunctions

Magnus provides a TypedData trait and DataTypeFunctions trait for managing Ruby objects that wrap Rust structs. This is the recommended way to handle complex objects in Rust.

Basic TypedData Example

Here's how to define a simple Ruby object that wraps a Rust struct:

use magnus::{prelude::*, Error, Ruby, TypedData, DataTypeFunctions, function, method, Value};

use std::cell::RefCell;

// Define your Rust struct with interior mutability
#[derive(TypedData)]
#[magnus(class = "MyExtension::Counter", free_immediately)]
struct Counter(RefCell<i64>);

// Implement required functions
impl DataTypeFunctions for Counter {}

// Implement methods for your struct
impl Counter {
    fn new(initial_value: i64) -> Self {
        Counter(RefCell::new(initial_value))
    }

    fn increment(&self, amount: i64) -> i64 {
        let mut count = self.0.borrow_mut();
        *count += amount;
        *count
    }

    fn value(&self) -> i64 {
        *self.0.borrow()
    }
}

// Register with Ruby
fn init(ruby: &Ruby) -> Result<(), Error> {
    let class = ruby.define_class("Counter", ruby.class_object())?;

    class.define_singleton_method("new", function!(Counter::new, 1))?;

    class.define_method("increment", method!(Counter::increment, 1))?;
    class.define_method("value", method!(Counter::value, 0))?;

    Ok(())
}

Implementing GC Marking

When your Rust struct holds references to Ruby objects, you need to implement the mark method to tell Ruby's GC about those references. Here's a simple example:

use magnus::{
    prelude::*, Error, Ruby, Value, TypedData, DataTypeFunctions,
    gc::Marker, typed_data::Obj, function, method
};

use std::cell::RefCell;

// Internal data structure
struct PersonData {
    // Reference to a Ruby string (their name)
    name: Value,
    // Reference to Ruby array (their hobbies)
    hobbies: Value,
    // Optional reference to another Person (their friend)
    friend: Option<Obj<Person>>,
}

// A struct that holds references to Ruby objects
#[derive(TypedData)]
#[magnus(class = "MyExtension::Person", free_immediately, mark)]
struct Person(RefCell<PersonData>);

// SAFETY: Person is only accessed from Ruby threads with the GVL
unsafe impl Send for Person {}

// Implement DataTypeFunctions with mark method
impl DataTypeFunctions for Person {
    // This is called during GC mark phase
    fn mark(&self, marker: &Marker) {
        let data = self.0.borrow();
        // Mark the Ruby objects we reference
        marker.mark(data.name);
        marker.mark(data.hobbies);

        // If we have a friend, mark them too
        if let Some(ref friend) = data.friend {
            marker.mark(*friend);
        }
    }
}

impl Person {
    fn new(name: Value, hobbies: Value) -> Self {
        Self(RefCell::new(PersonData {
            name,
            hobbies,
            friend: None,
        }))
    }

    fn add_friend(&self, friend: Obj<Person>) {
        self.0.borrow_mut().friend = Some(friend);
    }

    fn name(&self) -> Value {
        self.0.borrow().name
    }

    fn hobbies(&self) -> Value {
        self.0.borrow().hobbies
    }

    fn friend(&self) -> Option<Obj<Person>> {
        self.0.borrow().friend
    }
}

// Register with Ruby
fn init(ruby: &Ruby) -> Result<(), Error> {
    let class = ruby.define_class("Person", ruby.class_object())?;

    class.define_singleton_method("new", function!(Person::new, 2))?;

    class.define_method("name", method!(Person::name, 0))?;
    class.define_method("hobbies", method!(Person::hobbies, 0))?;
    class.define_method("friend", method!(Person::friend, 0))?;
    class.define_method("add_friend", method!(Person::add_friend, 1))?;

    Ok(())
}

In this example:

The Person struct holds references to Ruby objects (name and hobbies) and another wrapped Rust object (friend)
We implement the mark method to tell Ruby's GC about all these references
During garbage collection, Ruby will know not to collect these objects as long as the Person is alive

A Real-World Example: Error Handler

Here's an example of a real-world error handling pattern:

use magnus::{
    prelude::*, method, Error, Ruby, TypedData, DataTypeFunctions,
    typed_data::Obj, Symbol
};

// Different error types that might occur
#[derive(Debug, Clone)]
pub enum ErrorKind {
    StackOverflow,
    MemoryOutOfBounds,
    InvalidOperation(String),
    Unknown,
}

// A struct representing an error with optional backtrace
#[derive(TypedData)]
#[magnus(class = "MyExtension::Error", size, free_immediately)]
pub struct ErrorInfo {
    kind: ErrorKind,
    message: String,
    backtrace: Option<String>,
}

// No references to Ruby objects, so mark is empty
impl DataTypeFunctions for ErrorInfo {}

impl ErrorInfo {
    pub fn new(kind: ErrorKind, message: String, backtrace: Option<String>) -> Self {
        Self {
            kind,
            message,
            backtrace,
        }
    }

    // Return a text description of the error
    pub fn message(&self) -> String {
        self.message.clone()
    }

    // Return the backtrace if available
    pub fn backtrace_message(&self) -> Option<String> {
        self.backtrace.clone()
    }

    // Return the error code as a Ruby symbol
    pub fn code(&self) -> Result<Option<Symbol>, Error> {
        match &self.kind {
            ErrorKind::StackOverflow => Ok(Some(Symbol::new("STACK_OVERFLOW"))),
            ErrorKind::MemoryOutOfBounds => Ok(Some(Symbol::new("MEMORY_OUT_OF_BOUNDS"))),
            ErrorKind::InvalidOperation(_) => Ok(Some(Symbol::new("INVALID_OPERATION"))),
            ErrorKind::Unknown => Ok(Some(Symbol::new("UNKNOWN"))),
        }
    }

    // Custom inspect method
    pub fn inspect(rb_self: Obj<Self>) -> Result<String, Error> {
        Ok(format!(
            "#<MyExtension::Error:0x{:016x} @error_code={}>",
            &rb_self as *const _ as usize,
            rb_self.code()?.map_or("nil".to_string(), |s| s.to_string())
        ))
    }
}

// Register with Ruby
fn init(ruby: &Ruby) -> Result<(), Error> {
    let module = ruby.define_module("MyExtension")?;
    let class = module.define_class("Error", ruby.class_object())?;

    class.define_method("message", method!(ErrorInfo::message, 0))?;
    class.define_method("backtrace_message", method!(ErrorInfo::backtrace_message, 0))?;
    class.define_method("code", method!(ErrorInfo::code, 0))?;
    class.define_method("inspect", method!(ErrorInfo::inspect, 0))?;

    Ok(())
}

This example shows:

A Rust struct that wraps error information with an optional backtrace
Methods that convert Rust values to Ruby-friendly types
A simple implementation of DataTypeFunctions (since there are no Ruby object references to mark)

More Complex Example: Memory References

Let's look at a more complex scenario involving memory management:

use magnus::{
    prelude::*, gc::Marker, Error, Ruby, TypedData, DataTypeFunctions,
    typed_data::Obj, value::Opaque, RString, Value
};

// Placeholder types for the example
struct StoreContext(Value);
struct WasmMemory;

impl StoreContext {
    fn mark(&self, marker: &Marker) {
        marker.mark(self.0);
    }
}

// Represents WebAssembly memory
#[derive(TypedData)]
#[magnus(class = "Wasmtime::Memory", free_immediately, mark)]
struct Memory {
    // Reference to a store (context) object
    store: StoreContext,
    // The actual WebAssembly memory
    memory: WasmMemory,
}

// SAFETY: Memory is only accessed from Ruby threads with the GVL
unsafe impl Send for Memory {}

impl DataTypeFunctions for Memory {
    fn mark(&self, marker: &Marker) {
        // Mark the store so it stays alive
        self.store.mark(marker);
    }
}

impl Memory {
    fn size(&self) -> Result<u64, Error> {
        Ok(65536) // Example: 64KB
    }
    
    fn data_size(&self) -> Result<usize, Error> {
        Ok(65536) // Example: 64KB
    }
    
    fn data(&self) -> Result<Vec<u8>, Error> {
        Ok(vec![0; 65536]) // Example: zero-filled memory
    }
}

// A guard that ensures memory access is safe
struct MemoryGuard {
    // Reference to Memory object
    memory: Opaque<Obj<Memory>>,
    // Size when created, to detect resizing
    original_size: u64,
}

impl MemoryGuard {
    fn new(memory: Obj<Memory>) -> Result<Self, Error> {
        let original_size = memory.size()?;

        Ok(Self {
            memory: memory.into(),
            original_size,
        })
    }

    fn get(&self) -> Result<&Memory, Error> {
        let ruby = unsafe { Ruby::get_unchecked() };
        let mem = ruby.get_inner_ref(&self.memory);

        // Check that memory size hasn't changed
        if mem.size()? != self.original_size {
            return Err(Error::new(
                magnus::exception::runtime_error(),
                "memory was resized, reference is no longer valid"
            ));
        }

        Ok(mem)
    }

    fn mark(&self, marker: &Marker) {
        marker.mark(self.memory);
    }
}

// A slice of WebAssembly memory
#[derive(TypedData)]
#[magnus(class = "Wasmtime::MemorySlice", free_immediately, mark)]
struct MemorySlice {
    guard: MemoryGuard,
    offset: usize,
    size: usize,
}

// SAFETY: MemorySlice is only accessed from Ruby threads with the GVL
unsafe impl Send for MemorySlice {}

impl DataTypeFunctions for MemorySlice {
    fn mark(&self, marker: &Marker) {
        // Mark the memory guard, which marks the memory object
        self.guard.mark(marker);
    }
}

impl MemorySlice {
    fn new(memory: Obj<Memory>, offset: usize, size: usize) -> Result<Self, Error> {
        let guard = MemoryGuard::new(memory)?;

        // Validate the slice is in bounds
        let mem = guard.get()?;
        if offset + size > mem.data_size()? {
            return Err(Error::new(
                magnus::exception::range_error(),
                "memory slice out of bounds"
            ));
        }

        Ok(Self {
            guard,
            offset,
            size,
        })
    }

    // Read the slice as a Ruby string (efficiently, without copying)
    fn to_str(&self) -> Result<RString, Error> {
        let ruby = unsafe { Ruby::get_unchecked() };
        let mem = self.guard.get()?;
        let data = mem.data()?;

        // Extract the relevant slice
        let slice = &data[self.offset..self.offset + self.size];

        // Create a Ruby string directly from the slice (zero-copy)
        Ok(ruby.str_from_slice(slice))
    }

    // Read the slice as a UTF-8 string (with validation)
    fn to_utf8_str(&self) -> Result<RString, Error> {
        let ruby = unsafe { Ruby::get_unchecked() };
        let mem = self.guard.get()?;
        let data = mem.data()?;

        // Extract the relevant slice
        let slice = &data[self.offset..self.offset + self.size];

        // Validate UTF-8 and create a Ruby string
        match std::str::from_utf8(slice) {
            Ok(s) => Ok(RString::new(s)),
            Err(e) => Err(Error::new(
                magnus::exception::encoding_error(),
                format!("invalid UTF-8: {}", e)
            ))
        }
    }
}

This more advanced example demonstrates:

Guarded Resource Access: The MemoryGuard ensures memory operations are safe by checking for resizing
Proper GC Integration: Both structs implement marking to ensure referenced objects aren't collected
Efficient String Creation: Using str_from_slice to create strings directly from memory without extra copying
Error Handling: All operations that might fail return meaningful errors
Resource Validation: The code validates bounds before accessing memory

Common Memory Management Pitfalls

warning

These pitfalls can lead to crashes, memory leaks, or undefined behavior in your Ruby extensions. Understanding and avoiding them is crucial for writing reliable code.

1. Forgetting to Mark References

If your Rust struct holds Ruby objects but doesn't implement marking, those objects might be collected while still in use:

use magnus::{Error, RString};
use std::cell::RefCell;

// Example: Store Ruby data as Rust types instead of Value
#[magnus::wrap(class = "DataHolder")]
struct DataHolder {
    // Store extracted data, not Ruby Values
    strings: RefCell<Vec<String>>,
    data: RefCell<std::collections::HashMap<String, String>>,
}

impl DataHolder {
    fn new() -> Self {
        DataHolder {
            strings: RefCell::new(Vec::new()),
            data: RefCell::new(std::collections::HashMap::new()),
        }
    }
    
    fn add_string(&self, ruby_str: RString) -> Result<(), Error> {
        self.strings.borrow_mut().push(ruby_str.to_string()?);
        Ok(())
    }
    
    fn add_data(&self, key: String, value: String) -> Result<(), Error> {
        self.data.borrow_mut().insert(key, value);
        Ok(())
    }
}

2. Creating Cyclic References

Cyclic references (A references B, which references A) can lead to memory leaks. Consider using weak references or redesigning your object graph.

3. Inefficient String Creation

note

String handling is often a performance bottleneck in Ruby extensions. Using the right APIs can significantly improve performance.

Creating strings inefficiently can significantly impact performance:

use magnus::{Error, RString, Ruby};

// BAD: Creates unnecessary temporary Rust String
fn inefficient_string(data: &[u8]) -> Result<RString, Error> {
    let temp_string = String::from_utf8(data.to_vec()).map_err(|e| Error::new(magnus::exception::type_error(), e.to_string()))?; // Unnecessary allocation
    Ok(RString::new(&temp_string))  // Another copy
}

// GOOD: Direct creation from slice
fn efficient_string(ruby: &Ruby, data: &[u8]) -> RString {
    ruby.str_from_slice(data)  // No extra copies
}

// ALSO GOOD: Creating from string slice when UTF-8 is confirmed
fn from_str(_ruby: &Ruby, s: &str) -> RString {
    RString::new(s)
}

// ALSO GOOD: Creating binary string with capacity then filling
fn build_string(_ruby: &Ruby, size: usize) -> RString {
    // In a real implementation, you would fill the string here
    RString::with_capacity(size)
}

4. Not Handling Exceptions Properly

Ruby exceptions can disrupt the normal flow of your code. Ensure resources are cleaned up even when exceptions occur.

RefCell and Interior Mutability

When creating Ruby objects with Rust, you'll often need to use interior mutability patterns. The most common approach is using RefCell to allow your Ruby objects to be mutated even when users hold immutable references to them.

Understanding RefCell and Borrowing

Rust's RefCell allows mutable access to data through shared references, but enforces Rust's borrowing rules at runtime. This is perfect for Ruby extension objects, where Ruby owns the object and we interact with it via method calls.

A common pattern is to wrap your Rust struct in a RefCell:

use std::cell::RefCell;
use magnus::{prelude::*, Error, Ruby};

struct Counter {
    count: i64,
}

#[magnus::wrap(class = "MyExtension::Counter")]
struct MutCounter(RefCell<Counter>);

impl MutCounter {
    fn new(initial: i64) -> Self {
        Self(RefCell::new(Counter { count: initial }))
    }

    fn count(&self) -> i64 {
        self.0.borrow().count
    }

    fn increment(&self) -> i64 {
        let mut counter = self.0.borrow_mut();
        counter.count += 1;
        counter.count
    }
}

The BorrowMutError Problem

A common mistake when using RefCell is trying to borrow mutably when you already have an active immutable borrow. This leads to a BorrowMutError panic:

use magnus::{Error, Ruby};
use std::cell::RefCell;
use std::rc::Rc;

struct CounterInner {
    count: i64,
}

struct Counter(Rc<RefCell<CounterInner>>);

// BAD - will panic with "already borrowed: BorrowMutError"
impl Counter {
    fn buggy_add(&self, val: i64) -> Result<i64, Error> {
        let ruby = unsafe { Ruby::get_unchecked() };
        // First borrow is still active when we try to borrow_mut below
        if let Some(sum) = self.0.borrow().count.checked_add(val) {
            self.0.borrow_mut().count = sum; // ERROR - already borrowed above
            Ok(sum)
        } else {
            Err(Error::new(
                ruby.exception_range_error(),
                "result too large"
            ))
        }
    }
}

The problem is that the borrow() in the if condition is still active when we try to use borrow_mut() in the body. Rust's borrow checker would catch this at compile time for normal references, but RefCell defers this check to runtime, resulting in a panic.

The Solution: Complete Borrows Before Mutating

The solution is to complete all immutable borrows before starting mutable ones:

use magnus::{Error, Ruby};
use std::cell::RefCell;
use std::rc::Rc;

struct CounterInner {
    count: i64,
}

struct Counter(Rc<RefCell<CounterInner>>);

// GOOD - copy the value first to complete the borrow
impl Counter {
    fn safe_add(&self, val: i64) -> Result<i64, Error> {
        let ruby = unsafe { magnus::Ruby::get_unchecked() };
        // Get the current count, completing this borrow
        let current_count = self.0.borrow().count;

        // Now we can safely borrow mutably
        if let Some(sum) = current_count.checked_add(val) {
            self.0.borrow_mut().count = sum; // Safe now
            Ok(sum)
        } else {
            Err(Error::new(
                ruby.exception_range_error(),
                "result too large"
            ))
        }
    }
}

By copying count to a local variable, we complete the immutable borrow before starting the mutable one, avoiding the runtime panic.

Complex Example with Multiple Operations

When working with more complex data structures:

use std::cell::RefCell;
use magnus::Error;

struct Game {
    players: Vec<String>,
    current_player: usize,
    score: i64,
}

#[magnus::wrap(class = "MyGame")]
struct MutGame(RefCell<Game>);

impl MutGame {
    fn new() -> Self {
        Self(RefCell::new(Game {
            players: Vec::new(),
            current_player: 0,
            score: 0,
        }))
    }

    // INCORRECT: Multiple borrows that will cause issues
    fn buggy_next_player_scores(&self, points: i64) -> Result<String, Error> {
        let game = self.0.borrow();
        if game.players.is_empty() {
            return Err(Error::new(
                magnus::exception::runtime_error(),
                "No players in game"
            ));
        }

        // This would panic - we're still borrowing game
        let mut game_mut = self.0.borrow_mut();
        game_mut.score += points;
        let player = game_mut.current_player;
        game_mut.current_player = (player + 1) % game_mut.players.len();

        Ok(format!("{} scored {} points! New total: {}",
            game_mut.players[player], points, game_mut.score))
    }

    // CORRECT: Copy all needed data before releasing the borrow
    fn safe_next_player_scores(&self, points: i64) -> Result<String, Error> {
        // Read all the data we need first
        let player_name: String;
        let new_player_index: usize;
        let new_score: i64;

        {
            // Create a block scope to ensure the borrow is dropped
            let game = self.0.borrow();
            if game.players.is_empty() {
                return Err(Error::new(
                    magnus::exception::runtime_error(),
                    "No players in game"
                ));
            }

            player_name = game.players[game.current_player].clone();
            new_player_index = (game.current_player + 1) % game.players.len();
            new_score = game.score + points;
        } // borrow is dropped here

        // Now we can borrow mutably
        let mut game = self.0.borrow_mut();
        game.score = new_score;
        game.current_player = new_player_index;

        Ok(format!("{} scored {} points! New total: {}",
            player_name, points, new_score))
    }
}

Using Temporary Variables Instead of Block Scopes

If you prefer, you can use temporary variables instead of block scopes:

use magnus::{Error, Ruby};
use std::cell::RefCell;
use std::rc::Rc;

struct GameState {
    players: Vec<String>,
    score: i64,
    current_player: usize,
}

struct Game(Rc<RefCell<GameState>>);

impl Game {
    fn add_player(&self, player: String) -> Result<usize, Error> {
        // Get the current number of players first
        let player_count = self.0.borrow().players.len();

        // Now we can mutate
        let mut game = self.0.borrow_mut();
        game.players.push(player);

        Ok(player_count + 1) // Return new count
    }
}

RefCell Best Practices

Complete All Borrows: Always complete immutable borrows before starting mutable borrows.
Use Block Scopes or Variables: Either use block scopes to limit borrow lifetimes or copy needed values to local variables.
Minimize Borrow Scope: Keep the scope of borrows as small as possible.
Clone When Necessary: If you need to keep references to data while mutating other parts, clone the data you need to keep.
Consider Data Design: Structure your data to minimize the need for complex borrowing patterns.
Error When Conflicting: If you can't resolve a borrowing conflict cleanly, make the operation an error rather than trying to force it.

Best Practices

Use TypedData and DataTypeFunctions: They provide a safe framework for memory management
Always Implement Mark Methods: Mark all Ruby objects your struct references
Validate Assumptions: Check that resources are valid before using them
Use Zero-Copy APIs: Leverage APIs like str_from_slice to avoid unnecessary copying
Use Guards for Changing Data: Validate assumptions before accessing data that might change
Test Thoroughly with GC Stress: Run tests with GC.stress = true to expose memory issues
Handle RefCell Borrowing Carefully: Complete all immutable borrows before starting mutable ones to avoid runtime panics

By following these practices, you can write Ruby extensions in Rust that are both memory-safe and efficient.

Ruby's Garbage Collection System​

TypedData and DataTypeFunctions​

Basic TypedData Example​

Implementing GC Marking​

A Real-World Example: Error Handler​

More Complex Example: Memory References​

Common Memory Management Pitfalls​

1. Forgetting to Mark References​

2. Creating Cyclic References​

3. Inefficient String Creation​

4. Not Handling Exceptions Properly​

RefCell and Interior Mutability​

Understanding RefCell and Borrowing​

The BorrowMutError Problem​

The Solution: Complete Borrows Before Mutating​

Complex Example with Multiple Operations​

Using Temporary Variables Instead of Block Scopes​

RefCell Best Practices​

Best Practices​