Open Table of contents

Conclusion

The Problem with General Purpose Allocators

In Part 1, we examined how the OS manages memory. Now we will look at the tools we use to access it, new in C++ and malloc in C.

These functions are general-purpose. They must handle any allocation size and any allocation pattern. To do this, they incur significant overhead.

When you call new, several things happen.

Locking The allocator must acquire a lock to be thread-safe. This causes contention in multithreaded programs.
Searching It must search its data structures (like free lists) to find a suitable block. This search is not free.
Metadata It writes a header next to the allocation to track its size. This wastes memory and pollutes the cache.
Unlocking It releases the lock.

Every single allocation pays this tax. Deallocation is just as complex, involving more locking and manipulation of the free lists. This overhead is the price of flexibility.

For high-performance applications like game engines or servers, this per-allocation cost is too high. We need a specialized solution, the Arena.

The Arena Allocator

A Memory Arena (also known as a linear allocator) uses a simple strategy.

Allocate a large block of memory upfront. To allocate an object, increment a pointer. To free memory, reset the entire arena.

This approach offers three main benefits.

Performance Allocation involves only a pointer increment. There are no locks and no searches. It is one of the fastest ways to allocate memory.
Cache Efficiency Objects are packed contiguously in memory. There is no metadata header between them. This improves cache performance.
Simplified Lifetimes All objects in the arena share the same lifetime. You do not free them individually. You reset the arena when the task is done. This eliminates memory leaks and double-free bugs for that data.

Typical Heap (new/malloc) Arena Allocator

Implementation

We’ll build an arena using the virtual memory techniques from Part 1. We will reserve a large address space but only commit physical memory as needed.

struct Arena {
    char* base_ptr;          // Start of the reservation
    size_t reserved_size;    // Total size (e.g., 1GB)
    size_t committed_size;   // Currently committed memory
    size_t current_offset;   // Bump pointer
};

Step 1 - Creation

The CreateArena function reserves the virtual address space. It does not commit any physical memory yet.

// A simplified, cross-platform implementation
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#include <unistd.h>
#endif

// Dynamically get the OS page size once.
const size_t PAGE_SIZE = []() {
#ifdef _WIN32
    SYSTEM_INFO sysInfo;
    GetSystemInfo(&sysInfo);
    return sysInfo.dwPageSize;
#else
    return (size_t)sysconf(_SC_PAGESIZE);
#endif
}();

Arena* CreateArena(size_t reserve_size) {
    Arena* arena = (Arena*)malloc(sizeof(Arena));
    if (!arena) return NULL;

    // Align reservation up to the nearest page size
    reserve_size = (reserve_size + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;

#ifdef _WIN32
    void* block = VirtualAlloc(NULL, reserve_size, MEM_RESERVE, PAGE_NOACCESS);
    if (!block) {
        free(arena);
        return NULL;
    }
#else
    void* block = mmap(NULL, reserve_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (block == MAP_FAILED) {
        free(arena);
        return NULL;
    }
#endif

    arena->base_ptr = (char*)block;
    arena->reserved_size = reserve_size;
    arena->committed_size = 0;
    arena->current_offset = 0;

    return arena;
}

Step 2 - Allocation

The ArenaAlloc function checks if there is enough committed memory. If not, it commits more. Then it increments the offset.

The Allocation Process

void* ArenaAlloc(Arena* arena, size_t size) {
    if (!arena || size == 0) return nullptr;
    
    size_t new_offset = arena->current_offset + size;

    if (new_offset > arena->reserved_size) {
        return NULL; // Out of reserved space
    }

    if (new_offset > arena->committed_size) {
        // Align the required commit size up to the nearest page
        size_t new_commit_target = (new_offset + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
        // Clamp to the reservation limit
        if (new_commit_target > arena->reserved_size) {
            new_commit_target = arena->reserved_size;
        }

        size_t size_to_commit = new_commit_target - arena->committed_size;
        void* commit_start_addr = arena->base_ptr + arena->committed_size;

#ifdef _WIN32
        if (!VirtualAlloc(commit_start_addr, size_to_commit, MEM_COMMIT, PAGE_READWRITE)) {
            return NULL;
        }
#else
        if (mprotect(commit_start_addr, size_to_commit, PROT_READ | PROT_WRITE) != 0) {
            return NULL;
        }
#endif
        arena->committed_size = new_commit_target;
    }

    void* memory = arena->base_ptr + arena->current_offset;
    arena->current_offset = new_offset;

    return memory;
}

Step 3 - Reset

To free all objects in the arena, we simply reset the offset to zero.

void ArenaReset(Arena* arena) {
    arena->current_offset = 0;
}

To release the memory back to the OS completely.

void ArenaRelease(Arena* arena) {
#ifdef _WIN32
    VirtualFree(arena->base_ptr, 0, MEM_RELEASE);
#else
    munmap(arena->base_ptr, arena->reserved_size);
#endif
    free(arena);
}

Example - The Game Loop

Arenas are ideal for per-frame allocations in a game loop.

Consider a typical game frame. You need to allocate memory for particle effects, UI elements, temporary physics data, and AI command lists. All of this memory is needed for just 16 milliseconds, then it’s all garbage. This is the perfect use case for an arena.

// At the start of the game
Arena* frame_arena = CreateArena(1024 * 1024 * 64); // 64MB frame arena

// --- Main Game Loop ---
while (game_is_running) {
    // --- Start of Frame ---
    ArenaReset(frame_arena);

    // Allocate memory for various systems using our lightning-fast allocator
    Particle* particles = (Particle*)ArenaAlloc(frame_arena, sizeof(Particle) * 1000);
    UICommand* ui_cmds = (UICommand*)ArenaAlloc(frame_arena, sizeof(UICommand) * 50);
    TempCollisionData* coll_data = (TempCollisionData*)ArenaAlloc(frame_arena, sizeof(TempCollisionData) * 200);

    // ... do all game logic, physics, and rendering ...
    // All the pointers above are used here.

    // --- End of Frame ---
    // No need to call free() or delete on anything.
    // The ArenaReset at the top of the loop already handled it.
}

// At the end of the game
ArenaRelease(frame_arena);

We just serviced thousands of potential allocations and deallocations with two trivial function calls per frame.

Scoped Arenas for Nested Lifetimes

The frame arena is a powerful pattern, but what happens when a function needs its own temporary memory within a frame?

Imagine a function ProcessPhysicsCollisions() that needs to generate a temporary list of potential collision pairs. It could allocate from the main frame_arena, but then it permanently consumes that space for the rest of the frame, even though the data is only needed for a few milliseconds.

This is where the concept of a Scoped Arena (or scratch arena) comes in. Instead of creating a whole new virtual memory arena, we create a temporary, lightweight allocator that “borrows” memory from a parent arena. The mechanism is beautifully simple: we just save the state of the parent arena’s pointer and restore it when we’re done.

We can wrap this logic in a clean C++ RAII object:

// A temporary "save point" for a parent arena
struct ScopedArena {
    Arena* parent_arena;
    size_t original_offset;

    // Constructor saves the parent's current state
    ScopedArena(Arena* parent) {
        parent_arena = parent;
        original_offset = parent->current_offset;
    }

    // Destructor restores the parent's state, effectively "freeing"
    // all memory allocated during this scope's lifetime.
    ~ScopedArena() {
        parent_arena->current_offset = original_offset;
    }

    // We can even add an Alloc method for convenience
    void* Alloc(size_t size) {
        return ArenaAlloc(parent_arena, size);
    }
};

Now, our ProcessPhysicsCollisions function can have its own clean, isolated memory space without any long-term impact on the frame arena.

void ProcessPhysicsCollisions(Arena* frame_arena) {
    // All memory allocated inside this block will be automatically
    // reclaimed when 'scope' goes out of scope.
    ScopedArena scope(frame_arena);

    // This allocation is temporary
    PotentialPair* pairs = (PotentialPair*)scope.Alloc(sizeof(PotentialPair) * 1000);

    // ... do complex work with the 'pairs' list ...

} // <-- ~ScopedArena() is called here, 'frame_arena->current_offset' is reset!

Scoped Arena

The Performance Showdown: By the Numbers

Let’s benchmark this. We’ll perform a stress test: allocate 1 million small, randomly-sized objects, then free them all. This is a nightmare scenario for malloc due to its overhead and search complexity.

The Results:

Test	Allocation Time	Deallocation Time
`malloc`/`free`	~8.77 ms	~15.07 ms
Arena Allocator	~0.84 ms	~0.042 µs (42 ns)

Ran on a M4 Pro 12-Core Macbook

The arena is dramatically faster. Allocations were 10.44x faster because they’re just pointer bumps. Deallocation was 358,000x faster because it’s a single integer assignment, so fast we measure it in nanoseconds.

Embracing the Arena’s “Limitations”

At first glance, the arena’s “all or nothing” deallocation seems like a major limitation. It’s natural to think they’re only useful for niche cases or trivial, throwaway data. But this perspective misses the bigger picture. In high-performance software like game engines, the vast majority of allocations are not random, they are grouped by a well-defined lifetime.

Think about it:

All the data needed to render a single frame.
All the assets and state loaded for a specific game level.
All the temporary data generated to process a physics tick.

These are not individual, unrelated allocations. They are groups of objects that are born together and die together. This is the exact pattern the arena is designed to master. The “limitation” of no individual free becomes a feature, forcing you to think about memory in terms of lifetimes, which leads to simpler, more robust code. It structurally eliminates entire classes of bugs like use-after-free and memory leaks from forgotten delete calls.

But what about truly temporary, scratch memory?

This is where the ScopedArena pattern shines. It’s the standard solution for handling nested lifetimes. If your main frame_arena lives for 16ms, but a function within that frame needs memory for only 50 microseconds, you create a ScopedArena. This gives you a “scratch pad” that borrows from the main arena and automatically cleans itself up when the function ends, without permanently consuming space from the frame’s budget. It’s the perfect way to handle temporary data without compromising the efficiency of the parent arena.

Conclusion

The arena allocator is not a drop-in replacement for every call to new. It’s a foundational tool that requires a shift in thinking, from managing individual objects to managing object lifetimes. Once you make that shift, you’ll find that arenas can handle almost all of the allocations in a typical game loop, making your code faster, safer, and easier to reason about.

And for that remaining, long-lived objects, that truly do need individual deallocation? As we’ll see in Part 3, the arena still serves as the perfect, high-performance foundation upon which we can build even more powerful, specialized allocators.

Memory Magic Part 2: The Arena Allocator

Table of contents