2024-07-26
#caching
#redis
#backend
#performance
#system-design

Caching & Redis: Theory Digest

Deep dive into Caching principles, CPU vs RAM latency, Cache strategies, and Redis internals.

The Essence of Caching

The Problem: Latency Gap

The primary bottleneck in program execution is often memory access, not processing power.

  • CPU: Executes operations in nanoseconds (< 1ns).
  • RAM: Access takes ~100ns (100x slower).
  • Disk/Network: Access takes milliseconds (1,000,000x slower).

The CPU spends significant time waiting for data. Solution: add a small, ultra-fast memory layer close to the processor (L1/L2/L3 cache) or close to the application (an in-RAM cache in front of the DB).

Definition

Caching reduces access latency by storing frequently used data in a faster storage medium.

  • Cache Hit: Data is found in the cache (Fast).
  • Cache Miss: Data is not found, must be fetched from the slow source (Slow).
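
A minimal cache-aside read sketch with phpredis; loadUserFromDb() is a hypothetical query against the slow source, and the key/TTL are illustrative:

php
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

function getUser(Redis $redis, int $id): array
{
    $key = "user:{$id}";

    $cached = $redis->get($key);                 // Cache Hit: served straight from RAM
    if ($cached !== false) {
        return json_decode($cached, true);
    }

    $user = loadUserFromDb($id);                 // Cache Miss: hypothetical query to the slow source
    $redis->set($key, json_encode($user), 300);  // keep it hot for 5 minutes

    return $user;
}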

Write Strategies

When data changes, how do we update the cache and the source of truth?

  1. Write Through

    • Data is written to both the Cache and the Source (DB) simultaneously.
    • Pros: High data consistency.
    • Cons: Slower writes (wait for both).
  2. Write Back (Write Behind)

    • Data is written only to the Cache initially. It is synced to the Source later (asynchronously).
    • Pros: Extremely fast writes.
    • Cons: Risk of data loss if power fails before sync.
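
A minimal PHP sketch contrasting the two strategies, assuming a hypothetical PDO handle, users table, and queue key:

php
// Write Through: update cache and DB in the same request (consistent, but the write waits for both)
function writeThrough(Redis $redis, PDO $db, int $id, string $name): void
{
    $db->prepare('UPDATE users SET name = ? WHERE id = ?')->execute([$name, $id]);
    $redis->set("user:{$id}:name", $name);
}

// Write Back: touch only the cache now; a background worker drains the queue into the DB later
// (fast writes, but updates still queued in RAM are lost if the server dies before the sync)
function writeBack(Redis $redis, int $id, string $name): void
{
    $redis->set("user:{$id}:name", $name);
    $redis->rPush('queue:dirty-users', (string) $id);
}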

Cache States & Warming

Cold Cache

  • The cache is empty or contains irrelevant data.
  • User requests result in Cache Misses (slow performance).

Hot Cache

  • The cache contains relevant, frequently accessed data.
  • User requests result in Cache Hits (fast performance).

Cache Warming

  • The process of pre-populating the cache with data before users request it.
  • Goal: Ensure users always hit a "Hot Cache".
  • Methods:
    • Internal: The app loads data on startup.
    • External: Scripts/Crawlers simulate user traffic to populate the cache.
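
A sketch of the internal approach, assuming a hypothetical fetchTopProducts() query against the DB:

php
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Run on deploy / via cron so the first real visitors already hit a hot cache
foreach (fetchTopProducts(100) as $product) {                             // hypothetical DB query
    $redis->set("product:{$product['id']}", json_encode($product), 3600);
}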

Redis (Remote Dictionary Server)

Why Redis?

Relational databases backed by HDD/SSD are too slow for high-load scenarios (100k+ RPS). We need a storage system that lives entirely in RAM.

Redis vs Memcached

Memcached

  • Pros: Extremely fast, simple multi-threaded architecture.
  • Cons: Volatile. If the server restarts, all data is lost. Limited data types (strings only).
  • Use Case: Simple session caching, temporary page fragments.

Redis

  • Pros:
    • Persistence: Can save data to disk (RDB snapshots, AOF logs).
    • Data Structures: Supports Lists, Sets, Hashes, etc.
    • Replication: Master-Slave support out of the box.
  • Use Case: Caching, Message Broker, Leaderboards, Session Store, Real-time analytics.

Redis Internals

Redis is a NoSQL In-Memory Key-Value Store.

  • Single-threaded event loop (no locking issues, but heavy commands block everyone).
  • Atomic operations.

Data Types

  1. String: Basic text or binary data (Images, JSON). Max 512MB.
    • Ops: SET, GET, INCR.
  2. List: Linked lists (efficient head/tail operations).
    • Ops: LPUSH, RPOP. Good for Queues.
  3. Set: Unordered collection of unique strings.
    • Ops: SADD, SINTER (Intersection).
  4. Sorted Set (ZSet): Sets with a "score" for sorting.
    • Ops: ZADD, ZRANGE. Good for Leaderboards.
  5. Hash: Maps string fields to string values (Objects).
    • Ops: HSET, HGET.
  6. Pub/Sub: Messaging mechanism rather than a stored data type (messages are fire-and-forget, not persisted).
    • Ops: PUBLISH, SUBSCRIBE.
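
A minimal Pub/Sub sketch with phpredis; channel name and payload are illustrative, and publisher/subscriber would normally run as separate processes:

php
// Publisher (e.g., inside a web request)
$pub = new Redis();
$pub->connect('127.0.0.1', 6379);
$pub->publish('orders', json_encode(['id' => 42, 'status' => 'paid']));

// Subscriber (blocking loop; runs as a long-lived worker process)
$sub = new Redis();
$sub->connect('127.0.0.1', 6379);
$sub->subscribe(['orders'], function ($redis, $channel, $message) {
    echo "[{$channel}] {$message}\n";
});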

Cheat Sheet: Common Commands

Strings & TTL

php
// Basics
$redis->set('currency:USD', 100);
$redis->get('currency:USD'); // 100

// Existence & Atomic
$redis->setnx('lock:user:1', 'locked'); // Set ONLY if not exists (Distributed Lock)
$redis->mset(['key1' => 'val1', 'key2' => 'val2']); // Batch set

// Expiration (TTL)
$redis->set('otp:123', '5555', 60); // Store for 60 seconds
$redis->expire('otp:123', 30); // Update TTL
$redis->ttl('otp:123'); // Check remaining time
$redis->persist('otp:123'); // Remove expiration (make permanent)
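
The bare setnx lock above never expires, so a crashed owner leaves it stuck. phpredis can set NX and EX atomically via the options-array form of SET; key and token here are illustrative:

php
$token = bin2hex(random_bytes(16));

if ($redis->set('lock:user:1', $token, ['nx', 'ex' => 10])) { // acquire, auto-expires in 10s
    // ... critical section ...

    // Release only if we still own it (for strict safety this check-and-delete
    // should be a single Lua script, since GET + DEL is not atomic)
    if ($redis->get('lock:user:1') === $token) {
        $redis->del('lock:user:1');
    }
}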

Counters (Atomic)

php
$redis->incr('page:views'); // +1
$redis->incrBy('page:views', 10); // +10
$redis->decr('stock:items'); // -1

Lists (Queues)

php
$redis->lPush('queue:emails', 'user@example.com'); // Add to Left
$redis->rPop('queue:emails'); // Remove from Right (FIFO)

$redis->lRange('queue:emails', 0, -1); // Get all items
$redis->lLen('queue:emails'); // Get length
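
Sorted Sets (Leaderboards) & Hashes (Objects)

The data-types list above also covers Sorted Sets and Hashes; a sketch of the matching phpredis calls (key and field names are illustrative):

php
// Sorted Set: members ranked by score
$redis->zAdd('leaderboard', 1500, 'alice');
$redis->zAdd('leaderboard', 2300, 'bob');
$redis->zRevRange('leaderboard', 0, 9, true); // Top 10, highest score first, with scores

// Hash: object-like field/value storage under one key
$redis->hSet('user:1', 'name', 'Alice');
$redis->hGet('user:1', 'name');  // 'Alice'
$redis->hGetAll('user:1');       // all fields as an array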

Scaling Redis

1. Persistence

  • RDB (Snapshot): Saves DB state to disk every N minutes. Compact, faster restore.
  • AOF (Append Only File): Logs every write command. Slower, but higher durability (less data loss).
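
Persistence is normally set in redis.conf, but it can also be inspected or switched at runtime; a sketch using phpredis CONFIG/BGSAVE calls:

php
$redis->config('GET', 'appendonly');             // returns ['appendonly' => 'yes'|'no']
$redis->config('SET', 'appendonly', 'yes');      // switch AOF on
$redis->config('SET', 'save', '900 1 300 10');   // RDB: snapshot after 900s/1 change or 300s/10 changes
$redis->bgsave();                                // force an RDB snapshot in the background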

2. Replication (Master-Slave)

  • Master: Handles Writes.
  • Slaves: Replicate Master, handle Reads.
  • Improves Read scalability and Redundancy.
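
A minimal read/write split sketch with two connections; hosts and the key are illustrative:

php
$master = new Redis();
$master->connect('10.0.0.1', 6379);   // primary: accepts writes

$replica = new Redis();
$replica->connect('10.0.0.2', 6379);  // read-only replica

$master->set('feature:dark_mode', 'on');  // writes go to the master...
$replica->get('feature:dark_mode');       // ...reads are served by the replica (may lag slightly)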

3. Sharding (Partitioning)

Distributing data across multiple Redis instances.

  • Method: Hash the key (e.g., CRC32(key) % N_SERVERS).
  • Benefit: Horizontal scaling of RAM and Write throughput.
  • Trade-off: Cannot perform multi-key operations (transactions) across different shards.
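
A hedged sketch of client-side sharding that follows the CRC32(key) % N_SERVERS idea above; shard hosts are illustrative:

php
$shards = [];
foreach (['10.0.0.1', '10.0.0.2', '10.0.0.3'] as $host) {  // illustrative shard hosts
    $shard = new Redis();
    $shard->connect($host, 6379);
    $shards[] = $shard;
}

// CRC32(key) % N_SERVERS: the same key always lands on the same shard
function shardFor(string $key, array $shards): Redis
{
    return $shards[crc32($key) % count($shards)];
}

shardFor('user:42', $shards)->set('user:42', '{"name":"Alice"}');
shardFor('user:42', $shards)->get('user:42');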

4. Redis Cluster

Native distributed implementation that handles sharding and replication automatically.
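
With phpredis, a cluster is addressed through the RedisCluster class, which routes each key to the owning node by hash slot; the seed addresses below are illustrative:

php
// Seed nodes only; the client discovers the full topology automatically
$cluster = new RedisCluster(null, ['10.0.0.1:7000', '10.0.0.2:7001', '10.0.0.3:7002']);

$cluster->set('user:42:name', 'Alice');  // transparently sent to the shard owning this slot
$cluster->get('user:42:name');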

