File Cache#

Join provides memory-mapped file caching for efficient repeated access to file contents. It uses mmap() to map files directly into memory and automatically tracks file modifications, making it ideal for serving static content or frequently accessed files.

Cache is designed to be:

  • zero-copy — direct memory access via mmap()
  • automatic — tracks file modifications and invalidates stale entries
  • thread-safe — concurrent access protected by mutex
  • efficient — avoids redundant file I/O

Key features#

FeatureDescription
Memory mappingUses mmap() for zero-copy access
Automatic invalidationDetects file changes by modification time
Thread-safe operationsProtected by internal mutex
Lazy loadingFiles mapped only when requested
Resource managementAutomatic cleanup on destruction

Basic usage#

Creating a cache#

#include <join/cache.hpp>

Cache cache;

The cache is thread-safe and can be shared across multiple threads.

Getting cached file#

Cache cache;

struct stat sbuf;
void* data = cache.get("/path/to/file.txt", sbuf);

if (data != nullptr)
{
    // File successfully cached
    size_t fileSize = sbuf.st_size;

    // Access file contents directly
    const char* content = static_cast<const char*>(data);
    processData(content, fileSize);
}

Basic lifecycle#

Cache cache;

// First access - file is read and mapped
struct stat sbuf;
void* data1 = cache.get("file.txt", sbuf);

// Second access - returned from cache (no I/O)
void* data2 = cache.get("file.txt", sbuf);
// data1 == data2

// File modified externally...

// Next access - detects change, remaps file
void* data3 = cache.get("file.txt", sbuf);
// data3 != data1 (new mapping)

Cache operations#

Get or create entry#

Cache cache;

struct stat sbuf;
void* ptr = cache.get(filename, sbuf);

if (ptr == nullptr)
{
    // File doesn't exist, is a directory, or mapping failed
    handleError();
}
else
{
    // Use cached data
    processFile(ptr, sbuf.st_size);
}

The get() method:

  1. Checks if file exists and is not a directory
  2. Looks for existing cache entry
  3. Validates entry against file modification time
  4. Returns cached data if valid
  5. Remaps file if changed or not cached
  6. Returns pointer to mapped memory

Remove entry#

Cache cache;

// Remove specific file from cache
cache.remove("/path/to/file.txt");

// File is unmapped and entry is deleted
// Next get() will remap the file

Clear all entries#

Cache cache;

// Clear entire cache
cache.clear();

// All files are unmapped
// Cache is empty

Query cache size#

Cache cache;

size_t count = cache.size();
// Returns number of cached files

Modification tracking#

The cache automatically detects file changes using modification timestamps.

How it works#

Cache cache;

// Initial load
struct stat sbuf1;
void* data1 = cache.get("file.txt", sbuf1);

// File modified externally (via editor, build system, etc.)
system("echo 'new content' > file.txt");

// Next access detects change
struct stat sbuf2;
void* data2 = cache.get("file.txt", sbuf2);

// Cache automatically:
// 1. Detected modification time changed
// 2. Unmapped old version
// 3. Mapped new version
// 4. Returned pointer to new data

Validation criteria#

// Entry is valid if BOTH match:
if (cached.modifTime.tv_sec  == stat.st_ctim.tv_sec &&
    cached.modifTime.tv_nsec == stat.st_ctim.tv_nsec)
{
    // Use cached version
}
else
{
    // Remap file
}

Thread safety#

All cache operations are thread-safe.

Concurrent access#

Cache cache;  // Shared across threads

// Thread 1
std::thread t1([&cache]() {
    struct stat sbuf;
    void* data = cache.get("file1.txt", sbuf);
    process(data, sbuf.st_size);
});

// Thread 2
std::thread t2([&cache]() {
    struct stat sbuf;
    void* data = cache.get("file2.txt", sbuf);
    process(data, sbuf.st_size);
});

// Thread 3
std::thread t3([&cache]() {
    cache.remove("file3.txt");
});

t1.join();
t2.join();
t3.join();

Thread safety guarantees#

  • All operations are atomic with respect to the cache state
  • Internal mutex protects the entries map
  • Multiple threads can safely call any cache method
  • However, the returned pointers are NOT protected - see safety considerations below

Use cases#

Web server static files#

class HttpServer
{
    Cache fileCache;

public:
    void serveStaticFile(const std::string& path, HttpResponse& response)
    {
        struct stat sbuf;
        void* data = fileCache.get(path, sbuf);
        if (data == nullptr)
        {
            response.status(404);
            return;
        }

        response.status(200);
        response.header("Content-Length", std::to_string(sbuf.st_size));
        response.write(data, sbuf.st_size);
    }
};

Performance characteristics#

Memory mapping benefits#

OperationTraditional I/OMemory-mapped
First accessread() + copymmap()
Subsequent accessread() + copypointer deref
Memory overhead2x file size1x file size
Page faultsN/AOn-demand

Cache efficiency#

// Traditional approach (repeated I/O)
for (int i = 0; i < 1000; ++i)
{
    std::ifstream file("data.txt");
    std::string content((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
    process(content);
}
// 1000 file opens, 1000 reads

// Cached approach (single mmap)
Cache cache;
for (int i = 0; i < 1000; ++i)
{
    struct stat sbuf;
    void* data = cache.get("data.txt", sbuf);
    process(data, sbuf.st_size);
}
// 1 file open, 1 mmap, 999 cache hits

Memory characteristics#

  • Files are mapped MAP_PRIVATE with PROT_READ
  • Kernel manages page-in/page-out automatically
  • Multiple processes can share the same physical pages
  • Unused pages can be evicted by kernel when memory is tight

Safety considerations#

Pointer validity#

Cache cache;

struct stat sbuf;
void* ptr1 = cache.get("file.txt", sbuf);

// Pointer is valid...

// File modified externally
modifyFile("file.txt");

// Next get invalidates ptr1!
void* ptr2 = cache.get("file.txt", sbuf);

// ptr1 is now DANGLING - do not use!
// ptr2 points to new mapping

Important: Once a file is remapped, previous pointers become invalid.

Safe usage pattern#

Cache cache;

void processFile(const std::string& path)
{
    struct stat sbuf;
    void* data = cache.get(path, sbuf);

    if (data != nullptr)
    {
        // Use data immediately
        process(data, sbuf.st_size);
        // Don't store pointer for later use
    }
}

Best practices#

Use for read-only files#

// Good: Read-only access to static content
Cache cache;
void* data = cache.get("/var/www/index.html", sbuf);

// Bad: Don't use for files you're actively writing to
// (cache won't see changes until next get())

Don’t store pointers#

// Bad: Storing pointer
class BadExample
{
    void* cachedData;

    void init(Cache& cache)
    {
        struct stat sbuf;
        cachedData = cache.get("file.txt", sbuf);  // Dangerous!
    }
};

// Good: Get pointer when needed
class GoodExample
{
    Cache& cache;
    std::string filename;

    void process()
    {
        struct stat sbuf;
        void* data = cache.get(filename, sbuf);
        if (data) use(data, sbuf.st_size);
    }
};

Check for null#

// Always check return value
struct stat sbuf;
void* data = cache.get(path, sbuf);

if (data == nullptr)
{
    // Handle error: file missing, permission denied, etc.
    return;
}

// Safe to use data here

Share cache instance#

// Good: Single shared cache
class Application
{
    Cache sharedCache;

    void handleRequest1()
    {
        useCache(sharedCache);
    }

    void handleRequest2()
    {
        useCache(sharedCache);
    }
};

Limitations#

File type restrictions#

// Won't cache directories
void* data = cache.get("/path/to/dir", sbuf);  // Returns nullptr

// Only regular files are cached
S_ISDIR(sbuf.st_mode) == false

Memory considerations#

  • Large files consume address space
  • 32-bit systems limited to ~2-3 GB addressable memory
  • On 64-bit systems, address space is not a concern

No write support#

  • Cache is read-only (PROT_READ)
  • Modifications require reopening file for writing
  • Cache should be used for static or infrequently changed files

Summary#

FeatureSupported
Memory-mapped file access
Automatic invalidation
Thread-safe operations
Modification tracking
Zero-copy access
Multiple file caching
Lazy loading
Automatic cleanup
Write support