Development Approaches

When building Ruby extensions with Rust and rb-sys, you have two main approaches to choose from:

  1. Direct rb-sys usage: Working directly with Ruby's C API through the rb-sys bindings
  2. Higher-level wrappers: Using libraries like Magnus that build on top of rb-sys

This chapter will help you understand when to use each approach and how to mix them when needed.

Direct rb-sys Usage

The rb-sys crate provides low-level bindings to Ruby's C API. This approach gives you complete control over how your Rust code interacts with Ruby.

When to Use Direct rb-sys

  • When you need maximum control over Ruby VM interaction
  • For specialized extensions that need access to low-level Ruby internals
  • When performance is critical and profiling shows that even a thin wrapper's overhead matters
  • When implementing functionality not yet covered by higher-level wrappers

Example: Simple Extension with Direct rb-sys

Here's a simple example of a Ruby extension using direct rb-sys:

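use rb_sys::{
    rb_define_module, rb_define_module_function, rb_string_value_cstr,
    rb_utf8_str_new_cstr, VALUE,
};
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Helper macro for creating NUL-terminated C strings from string literals
macro_rules! cstr {
    ($s:expr) => {
        concat!($s, "\0").as_ptr() as *const c_char
    };
}

// Reverse a string. Ruby passes the receiver (the module) first, then the argument.
unsafe extern "C" fn reverse(_: VALUE, s: VALUE) -> VALUE {
    let mut s_copy = s;
    // Raises TypeError if `s` is not a String, ArgumentError if it contains NUL bytes
    let c_str = rb_string_value_cstr(&mut s_copy);
    let rust_str = match CStr::from_ptr(c_str).to_str() {
        Ok(s) => s,
        // Not valid UTF-8: return an empty string instead of raising
        Err(_) => return rb_utf8_str_new_cstr(c"".as_ptr()),
    };
    let reversed = rust_str.chars().rev().collect::<String>();

    // Cannot contain NUL bytes, since rb_string_value_cstr already rejected them
    let c_string = match CString::new(reversed) {
        Ok(s) => s,
        Err(_) => return rb_utf8_str_new_cstr(c"".as_ptr()),
    };
    // rb_utf8_str_new_cstr tags the result as UTF-8; rb_str_new_cstr would
    // produce a binary (ASCII-8BIT) string
    rb_utf8_str_new_cstr(c_string.as_ptr())
}

// Module initialization function; Ruby calls Init_<name> when the extension is required
#[no_mangle]
pub extern "C" fn Init_string_utils() {
    unsafe {
        let module = rb_define_module(cstr!("StringUtils"));

        // rb_define_module_function takes a type-erased function pointer, so the
        // two-argument signature is transmuted; argc = 1 (self is not counted)
        rb_define_module_function(
            module,
            cstr!("reverse"),
            Some(std::mem::transmute::<unsafe extern "C" fn(VALUE, VALUE) -> VALUE, _>(reverse)),
            1,
        );
    }
}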
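The cstr! helper exists because Ruby's C API expects NUL-terminated strings: concat! appends the terminator at compile time, and since the result is a 'static string literal, the pointer stays valid for as long as Ruby needs it. The standalone sketch below (plain Rust, no Ruby required; read_c_string is a hypothetical helper for checking the result) confirms that behaviour with std::ffi::CStr. On Rust 1.77 and later, a c"..." literal, already used for the empty-string fallback above, is an equivalent alternative.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Same helper as in the extension: append a NUL byte at compile time and cast
// the literal's pointer to a C pointer. concat! only accepts literals, so this
// cannot be used with strings built at runtime.
macro_rules! cstr {
    ($s:expr) => {
        concat!($s, "\0").as_ptr() as *const c_char
    };
}

/// Reads a C string back into an owned Rust String, stopping at the first NUL.
///
/// # Safety
/// `ptr` must point to a valid, NUL-terminated, UTF-8 byte sequence.
unsafe fn read_c_string(ptr: *const c_char) -> String {
    unsafe { CStr::from_ptr(ptr) }
        .to_str()
        .expect("valid UTF-8")
        .to_owned()
}
```

With the literal syntax, the call in Init_string_utils could equally be written as rb_define_module(c"StringUtils".as_ptr()).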

Using rb_thread_call_without_gvl for Performance

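When performing computationally intensive work, release Ruby's Global VM Lock (GVL) so that other Ruby threads can run in the meantime. rb_thread_call_without_gvl runs a callback with the GVL released and reacquires it before returning. The helper below wraps it and reports failures with Magnus's Error type. Because it passes no unblock function, Ruby cannot interrupt the callback (for example, Thread#kill has to wait until it finishes), so it is best suited to bounded computations: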

Thread-Safe Operations without the GVL
use magnus::{Error, Ruby};
use rb_sys::rb_thread_call_without_gvl;
use std::{ffi::c_void, panic::{self, AssertUnwindSafe}, ptr::null_mut};

/// Execute a function without holding the Global VM Lock (GVL).
/// This allows other Ruby threads to run while performing CPU-intensive tasks.
///
/// # Safety
///
/// The passed function must not interact with the Ruby VM or Ruby objects
/// as it runs without the GVL, which is required for safe Ruby operations.
///
/// # Returns
///
/// Returns the result of the function or a magnus::Error if the function panics.
pub fn nogvl<F, R>(func: F) -> Result<R, Error>
where
    F: FnOnce() -> R,
    R: Send + 'static,
{
    struct CallbackData<F, R> {
        func: Option<F>,
        result: Option<Result<R, String>>, // Store either the result or a panic message
    }

    extern "C" fn call_without_gvl<F, R>(data: *mut c_void) -> *mut c_void
    where
        F: FnOnce() -> R,
        R: Send + 'static,
    {
        // Safety: We know this pointer is valid because we just created it below
        let data = unsafe { &mut *(data as *mut CallbackData<F, R>) };

        // Use take() to move out of the Option, ensuring we don't try to run the function twice
        if let Some(func) = data.func.take() {
            // Use panic::catch_unwind to prevent Ruby process termination if the Rust code panics
            match panic::catch_unwind(AssertUnwindSafe(func)) {
                Ok(result) => data.result = Some(Ok(result)),
                Err(panic_info) => {
                    // Convert panic info to a string message
                    let panic_msg = if let Some(s) = panic_info.downcast_ref::<&'static str>() {
                        s.to_string()
                    } else if let Some(s) = panic_info.downcast_ref::<String>() {
                        s.clone()
                    } else {
                        "Unknown panic occurred in Rust code".to_string()
                    };

                    data.result = Some(Err(panic_msg));
                }
            }
        }

        null_mut()
    }

    // Create a data structure to pass the function and receive the result
    let mut data = CallbackData {
        func: Some(func),
        result: None,
    };

    unsafe {
        // Release the GVL and call our function
        rb_thread_call_without_gvl(
            Some(call_without_gvl::<F, R>),
            &mut data as *mut _ as *mut c_void,
            None, // No unblock function
            null_mut(),
        );
    }

    // Extract the result or create an error if the function failed
    match data.result {
        Some(Ok(result)) => Ok(result),
        Some(Err(panic_msg)) => {
            // Convert the panic message to a Ruby RuntimeError
            let ruby = unsafe { Ruby::get_unchecked() };
            Err(Error::new(
                ruby.exception_runtime_error(),
                format!("Rust panic in nogvl: {}", panic_msg),
            ))
        }
        None => {
            // This should never happen if the callback runs, but handle it anyway
            let ruby = unsafe { Ruby::get_unchecked() };
            Err(Error::new(
                ruby.exception_runtime_error(),
                "nogvl function was not executed",
            ))
        }
    }
}
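The pattern that makes nogvl work is a trampoline: a generic extern "C" function that receives an opaque pointer, casts it back to the concrete closure type, and runs the closure under catch_unwind. The sketch below isolates that pattern in plain Rust so it runs without Ruby. call_with_callback and run_via_callback are hypothetical names; call_with_callback only mimics the callback-plus-data-pointer shape of rb_thread_call_without_gvl and, unlike it, releases no lock.

```rust
use std::ffi::c_void;
use std::panic::{self, AssertUnwindSafe};
use std::ptr::null_mut;

/// Hypothetical stand-in for a C API taking a callback plus an opaque data
/// pointer. It simply invokes the callback on the current thread.
unsafe fn call_with_callback(
    func: unsafe extern "C" fn(*mut c_void) -> *mut c_void,
    data: *mut c_void,
) -> *mut c_void {
    unsafe { func(data) }
}

/// The closure to run and a slot for its result (or a panic message).
struct CallbackData<F, R> {
    func: Option<F>,
    result: Option<Result<R, String>>,
}

/// Generic trampoline: each (F, R) pair gets its own monomorphized
/// extern "C" function that knows how to cast the void pointer back.
unsafe extern "C" fn trampoline<F, R>(data: *mut c_void) -> *mut c_void
where
    F: FnOnce() -> R,
{
    let data = unsafe { &mut *(data as *mut CallbackData<F, R>) };
    if let Some(func) = data.func.take() {
        // catch_unwind keeps a panic from unwinding across the extern "C" boundary
        data.result = Some(
            panic::catch_unwind(AssertUnwindSafe(func))
                .map_err(|_| "closure panicked".to_string()),
        );
    }
    null_mut()
}

/// Runs `func` through the callback-style API and returns its result.
fn run_via_callback<F, R>(func: F) -> Result<R, String>
where
    F: FnOnce() -> R,
{
    let mut data = CallbackData { func: Some(func), result: None };
    unsafe {
        call_with_callback(trampoline::<F, R>, &mut data as *mut _ as *mut c_void);
    }
    data.result
        .unwrap_or_else(|| Err("callback was not executed".to_string()))
}
```

Because the data lives on the caller's stack and the callback runs synchronously, no allocation or 'static bound on the closure is needed; the same reasoning applies to nogvl.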

How Direct rb-sys Works

When using rb-sys directly:

  1. You define C-compatible functions with the extern "C" calling convention
  2. You manually convert between Ruby's VALUE type and Rust types
  3. You're responsible for type checking and memory safety, including keeping any Ruby objects you store outside the stack visible to the garbage collector
  4. You must use the #[no_mangle] attribute on the initialization function so Ruby can find it
  5. All interactions with Ruby data happen through raw VALUE handles and unsafe code
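To make "manually convert between VALUE and Rust types" concrete: a VALUE is a machine word that is either a pointer to a heap object or an immediate value with tag bits. The model below reimplements CRuby's fixnum tagging (the scheme behind INT2FIX, FIXNUM_P and FIX2LONG) in plain Rust, assuming a pointer-sized VALUE as on 64-bit builds. It is only an illustration of what the word contains; real extensions should use the conversion helpers that rb-sys provides.

```rust
// Illustrative model only. CRuby stores small integers ("fixnums") inside the
// VALUE word itself: shift left one bit and set the low bit (RUBY_FIXNUM_FLAG).
// Heap objects are aligned pointers, so their low bit is always 0.

/// Assumption: VALUE is a pointer-sized unsigned integer (as on 64-bit CRuby).
type Value = usize;

const FIXNUM_FLAG: Value = 0x01;

/// Mirrors INT2FIX. Inputs outside the fixnum range (one bit narrower than
/// isize) would lose their top bit here; CRuby allocates a Bignum instead.
fn int_to_fix(n: isize) -> Value {
    ((n as Value) << 1) | FIXNUM_FLAG
}

/// Mirrors FIXNUM_P: the low bit marks an immediate integer.
fn is_fixnum(v: Value) -> bool {
    (v & FIXNUM_FLAG) != 0
}

/// Mirrors FIX2LONG: an arithmetic right shift restores the value and its sign.
fn fix_to_int(v: Value) -> Option<isize> {
    if is_fixnum(v) {
        Some((v as isize) >> 1)
    } else {
        None
    }
}
```

For example, the Ruby integer 5 is stored as the word 11 (binary 1011), which is why reading a VALUE as if it were a plain integer gives the wrong answer.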

Higher-level Wrappers (Magnus)

Magnus provides a more ergonomic, Rust-like API on top of rb-sys. It handles many of the unsafe aspects of Ruby integration for you.

When to Use Magnus

  • For most standard Ruby extensions where ease of development is important
  • When you want to avoid writing unsafe code
  • When you want idiomatic Rust error handling
  • For extensions with complex type conversions
  • When working with Ruby classes and objects in an object-oriented way

Example: Simple Extension with Magnus

Let's look at a simple example using Magnus, based on real-world usage patterns:

use magnus::{function, prelude::*, Error, Ruby};

fn hello(subject: String) -> String {
    format!("Hello from Rust, {subject}!")
}

#[magnus::init]
fn init(ruby: &Ruby) -> Result<(), Error> {
    let module = ruby.define_module("StringUtils")?;
    module.define_singleton_method("hello", function!(hello, 1))?;
    Ok(())
}

For a more complex example, here is the module setup of a real-world project, lz4-flex-rb, with the compression functions stubbed out:

use magnus::{function, prelude::*, Error, RString, Ruby};

// Placeholder functions for the example
fn compress(input: RString) -> Result<RString, Error> {
    // Compression implementation would go here
    Ok(input)
}

fn decompress(input: RString) -> Result<RString, Error> {
    // Decompression implementation would go here
    Ok(input)
}

fn compress_varint(input: RString) -> Result<RString, Error> {
    // VarInt compression implementation would go here
    Ok(input)
}

fn decompress_varint(input: RString) -> Result<RString, Error> {
    // VarInt decompression implementation would go here
    Ok(input)
}

#[magnus::init]
fn init(ruby: &Ruby) -> Result<(), Error> {
    let module = ruby.define_module("Lz4Flex")?;

    // Define error classes: Lz4Flex::Error < StandardError, plus two subclasses
    let base_error = module.define_error("Error", ruby.exception_standard_error())?;
    module.define_error("EncodeError", base_error)?;
    module.define_error("DecodeError", base_error)?;

    // Define methods
    module.define_singleton_method("compress", function!(compress, 1))?;
    module.define_singleton_method("decompress", function!(decompress, 1))?;

    // Define aliases
    module.singleton_class()?.define_alias("deflate", "compress")?;
    module.singleton_class()?.define_alias("inflate", "decompress")?;

    // Define nested module
    let varint_module = module.define_module("VarInt")?;
    varint_module.define_singleton_method("compress", function!(compress_varint, 1))?;
    varint_module.define_singleton_method("decompress", function!(decompress_varint, 1))?;

    Ok(())
}

How Magnus Works

Magnus builds on top of rb-sys and provides:

  1. Automatic type conversions between Ruby and Rust
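  2. Rust-like error handling: Ruby exceptions arrive as Err(magnus::Error), and returning Err from a wrapped function raises in Ruby
  3. Safe wrapper types (such as RString, RArray, and RModule) in place of raw VALUE handles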
  4. More ergonomic APIs for defining modules, classes, and methods
  5. A more familiar development experience for Rust programmers
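As a rough mental model of the conversions listed above, a wrapper turns a typed Rust function into an untyped entry point that checks arity, converts arguments, and reports failures as errors rather than undefined behaviour. The toy below is hypothetical throughout: DynValue, CallError, FromDyn and wrap1 are not Magnus APIs; they only sketch the kind of work function!(hello, 1) does for you.

```rust
/// Stand-in for a dynamically typed Ruby value.
#[derive(Debug, Clone, PartialEq)]
enum DynValue {
    Int(i64),
    Str(String),
}

/// Stand-in for the exception a wrapper would raise in Ruby.
#[derive(Debug, PartialEq)]
enum CallError {
    /// Ruby would raise ArgumentError.
    Arity { given: usize, expected: usize },
    /// Ruby would raise TypeError.
    Type(String),
}

/// Checked conversion from a dynamic value (similar in spirit to Magnus's TryConvert).
trait FromDyn: Sized {
    fn from_dyn(v: &DynValue) -> Result<Self, CallError>;
}

impl FromDyn for String {
    fn from_dyn(v: &DynValue) -> Result<Self, CallError> {
        match v {
            DynValue::Str(s) => Ok(s.clone()),
            other => Err(CallError::Type(format!("expected String, got {other:?}"))),
        }
    }
}

/// Turns a typed one-argument function into an untyped, arity-checked entry point.
fn wrap1<A: FromDyn>(f: fn(A) -> String) -> impl Fn(&[DynValue]) -> Result<DynValue, CallError> {
    move |args: &[DynValue]| {
        if args.len() != 1 {
            return Err(CallError::Arity { given: args.len(), expected: 1 });
        }
        let arg = A::from_dyn(&args[0])?;
        Ok(DynValue::Str(f(arg)))
    }
}

/// The same function as in the Magnus example.
fn hello(subject: String) -> String {
    format!("Hello from Rust, {subject}!")
}
```

The typed function stays free of unsafe code and type checks; all of that lives in one generic adapter, which is the design choice that lets Magnus users avoid writing unsafe glue.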

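When to Choose Each Approach

The trade-offs of direct rb-sys usage are summarized below; Magnus largely inverts them, giving up some low-level control in exchange for safety and ergonomics.
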
```
Direct rb-sys:
✅ Maximum performance
✅ Low-level Ruby VM control
✅ Fine-grained GVL management
✅ Version-specific behavior

❌ Lots of unsafe code
❌ Manual memory management
❌ More verbose type conversions
❌ Steeper learning curve
```

Mixing Approaches

You can also mix the two approaches. With its rb-sys feature enabled, Magnus exposes the raw VALUE behind its types, so you can pass Magnus objects to rb-sys functions where needed:

use magnus::{function, prelude::*, rb_sys::AsRawValue, Error, Ruby};
use std::os::raw::c_char;

fn high_level() -> String {
    "High level".to_string()
}

// Helper macro for C strings
macro_rules! cstr {
    ($s:expr) => {
        concat!($s, "\0").as_ptr() as *const c_char
    };
}

// Direct rb-sys implementation; with argc = 0 it receives only `self`
unsafe extern "C" fn low_level(_self: rb_sys::VALUE) -> rb_sys::VALUE {
    rb_sys::rb_utf8_str_new_cstr(c"Low level".as_ptr())
}

#[magnus::init]
fn init(ruby: &Ruby) -> Result<(), Error> {
    let module = ruby.define_module("MixedExample")?;

    // Use Magnus for most things
    module.define_singleton_method("high_level", function!(high_level, 0))?;

    // Use rb-sys directly for special cases. as_raw() (from AsRawValue)
    // returns the underlying rb_sys::VALUE without a transmute.
    unsafe {
        rb_sys::rb_define_module_function(
            module.as_raw(),
            cstr!("low_level"),
            Some(std::mem::transmute::<unsafe extern "C" fn(rb_sys::VALUE) -> rb_sys::VALUE, _>(low_level)),
            0,
        );
    }

    Ok(())
}

Enabling rb-sys Feature in Magnus

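To use Magnus's rb-sys interop (the magnus::rb_sys module, including AsRawValue), enable its rb-sys feature. Code that calls rb_sys functions directly, as the examples above do, also needs rb-sys as a dependency:

# Cargo.toml
[dependencies]
magnus = { version = "0.7", features = ["rb-sys"] }
rb-sys = "0.9"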

Common Mixing Patterns

  1. Use Magnus for most functionality, rb-sys for specific optimizations:

    • Define your public API using Magnus for safety and ease
    • Drop down to rb-sys in critical performance paths, especially when using nogvl
  2. Use rb-sys for core functionality, Magnus for complex conversions:

    • Build core functionality with rb-sys for maximum control
    • Use Magnus for handling complex Ruby objects or collections
  3. Start with Magnus, optimize with rb-sys over time:

    • Begin development with Magnus for rapid progress
    • Profile your code and replace hot paths with direct rb-sys

Real-World Examples

Let's look at how real projects decide between these approaches:

Blake3-Ruby (Direct rb-sys)

Blake3-Ruby is a cryptographic hashing library that uses direct rb-sys to achieve maximum performance:

// Based on blake3-ruby - simplified example showing direct rb-sys usage
use rb_sys::{
    rb_define_module, rb_define_module_function, rb_str_new, rb_string_value,
    RSTRING_LEN, RSTRING_PTR, VALUE,
};
use std::os::raw::{c_char, c_long};

// Helper macro for creating C strings
macro_rules! cstr {
    ($s:expr) => {
        concat!($s, "\0").as_ptr() as *const c_char
    };
}

#[no_mangle]
pub extern "C" fn Init_digest_ext() {
    unsafe {
        // Create module hierarchy
        let digest_module = rb_define_module(cstr!("Digest"));

        // Define methods directly using rb-sys for maximum performance
        rb_define_module_function(
            digest_module,
            cstr!("simple_hash"),
            Some(std::mem::transmute::<unsafe extern "C" fn(VALUE, VALUE) -> VALUE, _>(rb_simple_hash)),
            1,
        );
    }
}

unsafe extern "C" fn rb_simple_hash(_klass: VALUE, mut string: VALUE) -> VALUE {
    // Make sure we have a String (raises TypeError otherwise) before reading its bytes
    let string = rb_string_value(&mut string);

    // Borrow the string's bytes without copying
    let data_ptr = RSTRING_PTR(string) as *const u8;
    let data_len = RSTRING_LEN(string) as usize;
    let data_slice = std::slice::from_raw_parts(data_ptr, data_len);

    // Simple hash calculation (just for demonstration)
    let mut hash: u32 = 0;
    for &byte in data_slice {
        hash = hash.wrapping_mul(31).wrapping_add(byte as u32);
    }

    // Convert hash to string
    let hash_str = format!("{:08x}", hash);
    let hash_bytes = hash_str.as_bytes();

    // Return result as Ruby string (c_long is 32-bit on Windows, so avoid i64)
    rb_str_new(hash_bytes.as_ptr() as *const c_char, hash_bytes.len() as c_long)
}
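One practical benefit of direct rb-sys code is that the pure computation can be pulled out of the unsafe FFI function and unit-tested with cargo test, without loading Ruby. A sketch of that split, where simple_hash is a hypothetical name for the loop in rb_simple_hash above:

```rust
/// The hash loop from rb_simple_hash as a plain Rust function with no FFI:
/// h = h * 31 + byte using wrapping u32 arithmetic, formatted as 8 hex digits.
fn simple_hash(data: &[u8]) -> String {
    let hash = data
        .iter()
        .fold(0u32, |h, &byte| h.wrapping_mul(31).wrapping_add(u32::from(byte)));
    format!("{:08x}", hash)
}
```

rb_simple_hash would then only extract data_slice, call simple_hash(data_slice), and build the Ruby string, keeping the unsafe surface to a few lines.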

LZ4-Flex-RB (Mixed Approach)

The LZ4-Flex-RB gem demonstrates a more sophisticated approach mixing Magnus with direct rb-sys calls:

// Based on lz4-flex-rb
use magnus::{function, prelude::*, rb_sys::AsRawValue, Error, RString, Ruby};
use rb_sys::{rb_str_locktmp, rb_str_unlocktmp, RSTRING_LEN, RSTRING_PTR};

#[magnus::init]
fn init(ruby: &Ruby) -> Result<(), Error> {
    let module = ruby.define_module("Lz4Flex")?;

    // High-level API using Magnus
    module.define_singleton_method("compress", function!(compress, 1))?;
    module.define_singleton_method("decompress", function!(decompress, 1))?;

    Ok(())
}

// Functions that mix high-level Magnus with low-level rb-sys.
// Magnus passes &Ruby automatically when it is the first parameter.
fn compress(ruby: &Ruby, input: RString) -> Result<RString, Error> {
    let input_locked = LockedRString::new(input);
    let bufsize = lz4_flex::block::get_maximum_output_size(input_locked.as_slice().len());

    // Create output buffer
    let mut output_vec = vec![0u8; bufsize];

    // Compress the data
    let outsize = lz4_flex::block::compress_into(input_locked.as_slice(), &mut output_vec)
        .map_err(|e| Error::new(ruby.exception_standard_error(), e.to_string()))?;

    // Resize to actual output size
    output_vec.truncate(outsize);

    // Convert to a Ruby string
    Ok(ruby.str_from_slice(&output_vec))
}

fn decompress(ruby: &Ruby, input: RString) -> Result<RString, Error> {
    let input_locked = LockedRString::new(input);

    // The raw block format does not record the uncompressed size, so this guesses 20x.
    // Data that compressed better than that fails with an error; storing the size
    // (e.g. lz4_flex::block::compress_prepend_size / decompress_size_prepended) avoids guessing.
    let max_size = input_locked.as_slice().len() * 20;
    let decompressed = lz4_flex::block::decompress(input_locked.as_slice(), max_size)
        .map_err(|e| Error::new(ruby.exception_standard_error(), e.to_string()))?;

    // Convert to a Ruby string
    Ok(ruby.str_from_slice(&decompressed))
}

// Helper for locked RString (uses rb-sys directly). rb_str_locktmp makes Ruby
// raise on any attempt to modify (and thus reallocate) the string while we
// hold a slice into its buffer.
struct LockedRString(RString);

impl LockedRString {
    fn new(string: RString) -> Self {
        unsafe { rb_str_locktmp(string.as_raw()) };
        Self(string)
    }

    fn as_slice(&self) -> &[u8] {
        unsafe {
            let raw = self.0.as_raw();
            let ptr = RSTRING_PTR(raw) as *const u8;
            let len = RSTRING_LEN(raw) as usize;
            std::slice::from_raw_parts(ptr, len)
        }
    }
}

impl Drop for LockedRString {
    fn drop(&mut self) {
        // Always paired with the lock taken in new(), even on early `?` returns
        unsafe { rb_str_unlocktmp(self.0.as_raw()) };
    }
}
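LockedRString relies on Rust's Drop to pair every rb_str_locktmp with an rb_str_unlocktmp, including on the early returns that `?` produces in compress and decompress. The plain-Rust sketch below shows the same RAII guard shape so the behaviour can be checked without Ruby; FakeString, Locked, step and process are hypothetical names, and the lock is just a counter standing in for the two rb-sys calls.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a Ruby string: only counts active temporary locks.
struct FakeString {
    locks: Cell<u32>,
}

impl FakeString {
    fn lock(&self) {
        self.locks.set(self.locks.get() + 1);
    }
    fn unlock(&self) {
        self.locks.set(self.locks.get() - 1);
    }
}

/// RAII guard: locks in `new`, unlocks in `Drop`, like LockedRString.
struct Locked<'a>(&'a FakeString);

impl<'a> Locked<'a> {
    fn new(s: &'a FakeString) -> Self {
        s.lock();
        Locked(s)
    }
}

impl Drop for Locked<'_> {
    fn drop(&mut self) {
        self.0.unlock();
    }
}

/// Hypothetical fallible step standing in for compress_into / decompress.
fn step(fail: bool) -> Result<(), String> {
    if fail { Err("simulated codec error".to_string()) } else { Ok(()) }
}

/// Returns the lock count observed while the guard is alive.
fn process(s: &FakeString, fail: bool) -> Result<u32, String> {
    let _guard = Locked::new(s);
    let held = s.locks.get();
    step(fail)?; // the early return still runs Locked::drop
    Ok(held)
}
```

Note that the guard is bound to `_guard` rather than `_`: a `let _ = ...` binding drops the value immediately, which would unlock the string before it is used.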