System Design Complete Study Guide
Everything you need to ace system design interviews, combined from two essential resources
1 Scaling Fundamentals
Single Server to Distributed System
Every large-scale system starts with a single server handling web, database, and cache. As traffic grows, you progressively separate concerns and add redundancy.
Vertical vs Horizontal Scaling
Vertical Scaling (Scale Up)
Add more CPU, RAM, or storage to a single machine. Simple but has hard limits — you can't add infinite resources to one server. Also creates a single point of failure.
Horizontal Scaling (Scale Out)
Add more servers to the pool. More complex to implement but virtually unlimited. This is the approach large-scale systems use. Requires load balancing and stateless design.
Key Components
Load Balancer
Distributes incoming traffic across multiple servers using a public IP. Servers behind it use private IPs only. If one server goes down, traffic reroutes to healthy ones. Eliminates SPOF at the web tier.
Database Replication (Master-Slave)
Master handles all writes; slaves handle reads. Since most workloads are read-heavy (often 10:1 ratio), this scales reads efficiently. If a slave dies, reads go to other slaves. If master dies, a slave gets promoted.
CDN (Content Delivery Network)
Geographically distributed cache for static assets (images, CSS, JS, videos). Users fetch from the nearest edge server. Key considerations: TTL (time-to-live), cache invalidation strategies, cost (don't cache infrequently accessed content), and fallback to origin server.
Stateless Web Tier
Move session data out of individual servers into a shared data store (Redis, Memcached, or NoSQL). Any server can handle any request. This makes horizontal scaling trivial — just add/remove servers behind the load balancer.
Data Centers & GeoDNS
Multiple data centers in different regions. GeoDNS routes users to the nearest one. Challenges include data synchronization across centers, testing across different regions, and automated failover.
Database Sharding
Split data across multiple databases using a partition key (e.g., user_id % num_shards). Challenges include resharding when data grows unevenly, the celebrity/hotspot problem (one shard gets disproportionate traffic), and cross-shard joins (solved via denormalization).
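The partition-key idea can be sketched in a few lines of Python (the shard count and key are illustrative):

```python
# Minimal sketch of hash-based shard routing with a hypothetical shard count.
NUM_SHARDS = 4

def shard_for(user_id: int) -> int:
    """Map a user_id to a shard using simple modular hashing."""
    return user_id % NUM_SHARDS

print(shard_for(12345))  # → 1  (user 12345 always routes to shard 1)
```

Note that changing NUM_SHARDS remaps almost every key, which is exactly why resharding is painful and why consistent hashing (covered later) exists.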
2 Back-of-Envelope Estimation
Powers of 2 — Data Volume Quick Reference
| Unit | Approx | Bytes |
|---|---|---|
| 1 KB | Thousand | 10³ |
| 1 MB | Million | 10⁶ |
| 1 GB | Billion | 10⁹ |
| 1 TB | Trillion | 10¹² |
| 1 PB | Quadrillion | 10¹⁵ |
Latency Numbers Every Engineer Should Know
| Operation | Latency |
|---|---|
| L1 cache reference | ~1 ns |
| L2 cache reference | ~4 ns |
| Main memory reference | ~100 ns |
| SSD random read | ~150 μs |
| HDD seek | ~10 ms |
| Send packet CA → Netherlands → CA | ~150 ms |
Availability SLAs (The Nines)
| Availability | Downtime/Year |
|---|---|
| 99% (two 9s) | 3.65 days |
| 99.9% (three 9s) | 8.77 hours |
| 99.99% (four 9s) | 52.6 minutes |
| 99.999% (five 9s) | 5.26 minutes |
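The downtime figures above fall out of simple arithmetic, which is worth being able to do live in an interview:

```python
def downtime_per_year(availability_pct: float) -> float:
    """Return allowed downtime in minutes per year for a given availability SLA."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return (1 - availability_pct / 100) * minutes_per_year

print(round(downtime_per_year(99.99), 1))   # → 52.6 (minutes, i.e. four nines)
```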
3 The 4-Step Interview Framework
| Step | Time | What To Do |
|---|---|---|
| 1. Understand & Scope | 3-10 min | Ask clarifying questions. Define features, users, scale, constraints. Never jump into design without scoping. |
| 2. High-Level Design | 10-15 min | Draw the architecture diagram. Identify core components: APIs, servers, databases, caches, queues. Get buy-in from interviewer. |
| 3. Deep Dive | 10-25 min | Pick 2-3 components to discuss in depth. Talk about trade-offs, edge cases, bottlenecks, failure modes, and scaling. |
| 4. Wrap Up | 3-5 min | Summarize design. Discuss potential improvements, error handling, monitoring, and operational concerns. |
4 CAP Theorem
In a distributed system, you can only guarantee two of three properties when a network partition occurs:
C — Consistency
All nodes see the same data at the same time. Every read returns the most recent write.
A — Availability
Every request gets a response (not error), even if it might be stale data.
P — Partition Tolerance
System continues operating despite network partitions between nodes. In real distributed systems, partitions will happen, so you must always have P.
5 Consistent Hashing
The Rehashing Problem
Simple modular hashing (hash(key) % N, where N is the number of servers) breaks when you add or remove servers — almost all keys get remapped. Consistent hashing solves this: on average only K/N keys move (K = total keys, N = servers).
How It Works
Imagine a circular hash space (0 to 2³² − 1). Both servers and keys are placed on this ring using a hash function. Each key is assigned to the first server encountered going clockwise. When a server is added or removed, only the keys in its immediate range need redistribution.
Virtual Nodes
Each physical server gets multiple positions (virtual nodes) on the ring. This ensures more even distribution — without virtual nodes, servers can get very uneven loads. More virtual nodes = better balance but more memory for the lookup table.
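The ring with virtual nodes can be sketched compactly (hash function and virtual-node count are illustrative choices):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, server) pairs

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server: str):
        # Place the server at multiple positions (virtual nodes) on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove_server(self, server: str):
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def get_server(self, key: str) -> str:
        # Walk clockwise: first server position at or after the key's hash.
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]
```

Removing a server only remaps the keys that were mapped to its positions; every other key keeps its assignment, which is the whole point of the technique.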
6 Key-Value Store Design
Core Components
- Data Partition: Consistent hashing distributes keys across nodes
- Replication: N replicas across distinct data centers for fault tolerance
- Consistency: Quorum-based — W (write quorum) + R (read quorum) > N guarantees strong consistency
- Conflict Resolution: Vector clocks track version history; client resolves conflicts
- Failure Detection: Gossip protocol (decentralized heartbeat propagation)
Quorum Configuration Trade-offs
| Config | Optimizes For | Guarantee |
|---|---|---|
| W=1, R=N | Fast writes | Strong consistency |
| W=N, R=1 | Fast reads | Strong consistency |
| W=2, R=2, N=3 | Balanced | Strong consistency (W+R > N) |
| W=1, R=1 | Maximum speed | Eventual consistency |
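The quorum rule in the table reduces to a one-line check — W + R > N means every read quorum overlaps every write quorum, so a read always sees at least one up-to-date replica:

```python
def consistency_level(w: int, r: int, n: int) -> str:
    """W + R > N forces read and write quorums to overlap → strong consistency."""
    return "strong" if w + r > n else "eventual"

print(consistency_level(2, 2, 3))  # → strong
print(consistency_level(1, 1, 3))  # → eventual
```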
Failure Handling Mechanisms
- Sloppy Quorum + Hinted Handoff: During temp failures, nearby healthy nodes accept writes temporarily and hand data back when the failed node returns.
- Anti-Entropy + Merkle Trees: For permanent failures. Merkle trees let nodes efficiently compare data by hashing subtrees — only sync branches that differ.
- Write Path: Commit log → Memory cache → SSTable flush to disk
- Read Path: Memory cache → Bloom filter check → SSTable lookup
7 Communication Protocols
API Architectural Styles Comparison
| Style | Format | Best For | Key Trait |
|---|---|---|---|
| REST | JSON/XML | Web APIs, CRUD | Resource-oriented, HTTP methods |
| GraphQL | JSON | Complex queries, mobile | Client specifies exact data needed |
| gRPC | Protocol Buffers | Microservices | Binary format, roughly 5x faster than JSON |
| WebSocket | Any | Real-time (chat, games) | Bidirectional, persistent connection |
| SOAP | XML only | Enterprise/legacy | Strict contracts, WS-Security |
| Webhook | JSON | Event notifications | Push-based, eliminates polling |
HTTP Evolution
| Version | Year | Key Improvement |
|---|---|---|
| HTTP/1.0 | 1996 | New TCP connection per request |
| HTTP/1.1 | 1997 | Persistent connections (Keep-Alive), pipelining |
| HTTP/2.0 | 2015 | Multiplexing on single TCP, server push, binary framing |
| HTTP/3.0 | 2022 | QUIC (UDP-based), no head-of-line blocking |
8 API Design & API Gateway
API Design Best Practices
- Use nouns for resources: GET /carts/123, not GET /queryCarts/123
- Use plurals: GET /carts/123
- Versioning: GET /v1/carts/123
- Pagination: GET /carts?pageSize=20&pageToken=abc
- Filtering: GET /items?filter=color:red
- Sorting: GET /items?sort_by=time
- Idempotency: Use request IDs to prevent duplicate operations
- Resource cross-references: GET /carts/123/items/321
API Gateway Functions
An API gateway sits between clients and microservices, handling: parameter validation, auth, rate limiting, service discovery, dynamic routing, protocol conversion, error handling/circuit breaking, logging (ELK stack), and caching (Redis).
5 API Performance Tricks
- Pagination: Break large results into pages
- Async Logging: Buffer logs, flush periodically to reduce I/O
- Caching: Cache frequent queries (Redis/Memcached)
- Payload Compression: gzip reduces data size significantly
- Connection Pooling: Reuse database connections
HTTP Status Codes Cheat Sheet
| Range | Meaning | Common Codes |
|---|---|---|
| 2xx | Success | 200 OK, 201 Created, 204 No Content |
| 3xx | Redirect | 301 Permanent, 302 Temporary |
| 4xx | Client Error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 429 Too Many Requests |
| 5xx | Server Error | 500 Internal, 502 Bad Gateway, 503 Unavailable, 504 Timeout |
9 Load Balancing
Algorithms Comparison
| Type | Algorithm | How It Works | Best For |
|---|---|---|---|
| Static | Round Robin | Sequential rotation | Stateless, equal servers |
| Static | Sticky Round Robin | Same client → same server | Session affinity needed |
| Static | Weighted Round Robin | More traffic to stronger servers | Heterogeneous hardware |
| Static | IP/URL Hash | Hash determines server | Consistent routing |
| Dynamic | Least Connections | Route to least busy | Varying request costs |
| Dynamic | Least Response Time | Route to fastest responder | Latency-sensitive apps |
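A dynamic algorithm such as Least Connections is short to sketch (server names are illustrative; a real balancer would also handle health checks):

```python
class LeastConnectionsBalancer:
    """Dynamic load balancing: route each request to the least-busy server."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # open connections per server

    def acquire(self) -> str:
        server = min(self.active, key=self.active.get)  # least busy wins
        self.active[server] += 1
        return server

    def release(self, server: str):
        self.active[server] -= 1  # request finished

lb = LeastConnectionsBalancer(["web-1", "web-2"])
print(lb.acquire())  # → web-1
print(lb.acquire())  # → web-2
```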
Forward Proxy vs Reverse Proxy
Forward Proxy
Sits between users and internet. Protects clients: bypass restrictions, block content, hide identity.
Reverse Proxy (e.g., Nginx)
Sits between internet and servers. Protects servers: load balancing, DDoS protection, SSL termination, static caching.
10 Caching Strategies
Five Caching Patterns
| Pattern | How It Works | Pros | Cons |
|---|---|---|---|
| Cache-Aside | App checks cache; on miss, reads DB, writes to cache | Simple, resilient to cache failure | Cache miss = 3 trips |
| Read-Through | Cache auto-loads from DB on miss | App logic simplified | First request always slow |
| Write-Around | Writes go to DB, cache updated on read | Avoids caching unread data | Cache miss on recent writes |
| Write-Back | Writes go to cache first; async flush to DB | Very fast writes | Data loss risk if cache fails |
| Write-Through | Writes to cache AND DB synchronously | Strong consistency | Higher write latency |
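Cache-Aside, the most common pattern, fits in a few lines (the dicts stand in for Redis and the database):

```python
cache = {}                              # stands in for Redis
db = {"user:1": {"name": "Ada"}}        # stands in for the database

def get_user(key: str):
    """Cache-aside: check the cache first; on a miss, read the DB and populate."""
    if key in cache:
        return cache[key]               # cache hit
    value = db.get(key)                 # cache miss → read from DB
    if value is not None:
        cache[key] = value              # write the result back to the cache
    return value
```

The "3 trips" cost in the table is visible here: a miss touches the cache, then the DB, then the cache again.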
Where Caching Happens (8 Layers)
- Client/browser cache
- CDN (static content)
- Load balancer cache
- API Gateway cache
- Application-level cache (CPU/RAM/disk)
- Distributed cache (Redis, Memcached)
- Full-text search (Elasticsearch)
- Database-level (buffer pool, WAL, materialized views)
Cache Eviction Policies
LRU (Least Recently Used) is the most common. Also: LFU (Least Frequently Used), FIFO (First In First Out). Choose based on your access patterns.
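An LRU cache is a classic interview warm-up; Python's OrderedDict makes the sketch short:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction: at capacity, the least recently used key is dropped."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
```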
Redis — Why It's Fast
- RAM-based storage (1000x faster than disk)
- I/O multiplexing with single-threaded execution (no lock overhead)
- Efficient data structures: SDS, SkipList, ZipList, HashTable, IntSet
Redis Use Cases by Data Type
| Type | Use Cases |
|---|---|
| String | Session, cache, distributed lock, counter, rate limiter |
| Hash | Shopping cart, user profiles |
| List | Message queue, activity feed |
| Set | Tags, unique visitors |
| Sorted Set | Leaderboards, rankings |
| Bitmap | User retention tracking, feature flags |
11 Databases & Storage
SQL vs NoSQL Decision Guide
| Factor | SQL (Relational) | NoSQL |
|---|---|---|
| Data Model | Structured, schema-enforced | Flexible schema, document/KV/graph |
| Scaling | Vertical (mostly) | Horizontal (built-in) |
| Consistency | ACID guaranteed | Eventual consistency (usually) |
| Best For | Transactions, joins, complex queries | Scale, flexibility, speed |
| Examples | MySQL, PostgreSQL, Oracle | MongoDB, Cassandra, DynamoDB, Redis |
8 Data Structures Powering Databases
| Structure | Type | Used By |
|---|---|---|
| Skip List | In-memory index | Redis |
| Hash Index | In-memory | General key-value lookup |
| SSTable | On-disk, immutable | Component of LSM trees |
| LSM Tree | SkipList + SSTable | Cassandra, RocksDB (high write throughput) |
| B-tree / B+ tree | Disk-based index | MySQL, PostgreSQL (balanced read/write) |
| Inverted Index | Document search | Elasticsearch, Lucene |
| Suffix Tree | String pattern matching | Text search engines |
| R-tree | Multi-dimensional | PostGIS, geospatial queries |
ACID Properties
- Atomicity: Transaction either fully completes or fully rolls back
- Consistency: Database moves from one valid state to another
- Isolation: Concurrent transactions don't interfere with each other
- Durability: Once committed, data survives crashes
Cloud Database Cheat Sheet
| Type | AWS | Azure | GCP |
|---|---|---|---|
| Relational | RDS | SQL Database | Cloud SQL |
| Key-Value | DynamoDB | Cosmos DB | BigTable |
| Document | DocumentDB | Cosmos DB | Firestore |
| In-Memory | ElastiCache | Cache for Redis | Memorystore |
| Object/Blob | S3 | Blob Storage | Cloud Storage |
| Analytics | Redshift | Synapse | BigQuery |
| Graph | Neptune | Cosmos DB | Neo4j (partner) |
12 Message Queues & Kafka
Why Message Queues
Decouple producers from consumers. The producer publishes messages to a queue; consumers process them independently. This enables async processing, absorbs traffic spikes, and lets components fail independently.
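The producer/consumer decoupling can be demonstrated with a plain in-process queue (queue.Queue stands in for Kafka or RabbitMQ; the sentinel-based shutdown is an illustrative convention):

```python
import queue
import threading

q = queue.Queue()          # stands in for Kafka / RabbitMQ
processed = []

def consumer():
    while True:
        msg = q.get()
        if msg is None:            # sentinel: stop consuming
            break
        processed.append(msg.upper())   # pretend this is real work
        q.task_done()

t = threading.Thread(target=consumer)
t.start()
for m in ("order-1", "order-2"):
    q.put(m)                       # producer publishes and moves on immediately
q.put(None)
t.join()
print(processed)  # → ['ORDER-1', 'ORDER-2']
```

The producer never waits on the work itself, which is what lets the system absorb traffic spikes: the queue grows, the consumers drain it at their own pace.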
Kafka Performance Secrets
- Sequential I/O: Kafka writes to disk sequentially (not random access), which is nearly as fast as memory
- Zero-Copy: Data goes directly from disk to network socket without passing through the application layer
13 Design: Rate Limiter
5 Rate Limiting Algorithms
| Algorithm | How It Works | Pros | Cons |
|---|---|---|---|
| Token Bucket | Bucket fills with tokens at fixed rate; each request takes a token | Allows bursts, memory efficient | Tuning bucket size and refill rate |
| Leaking Bucket | Requests queue in FIFO bucket; processed at fixed rate | Smooths output rate | Burst of old requests can fill queue |
| Fixed Window | Count requests in fixed time windows | Simple, memory efficient | Burst at window edges (2x limit) |
| Sliding Window Log | Track timestamps of each request; count in sliding window | Very accurate | High memory (stores all timestamps) |
| Sliding Window Counter | Hybrid: weighted count from current + previous window | Smooth, memory efficient | Approximate (works for 99.97% cases) |
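A single-process Token Bucket is the easiest of the five to sketch (parameters are illustrative; a distributed version would keep this state in Redis):

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at a fixed rate; each request spends one."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity         # max burst size
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A burst of up to `capacity` requests is admitted immediately; sustained traffic is limited to `refill_rate` requests per second.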
Architecture
Rate limiter middleware sits between client and API servers. Uses Redis for distributed counting (fast, supports INCR and EXPIRE). Returns HTTP 429 with headers: X-Ratelimit-Remaining, X-Ratelimit-Limit, X-Ratelimit-Retry-After.
14 Design: Unique ID Generator
Approaches Compared
| Approach | Pros | Cons | Verdict |
|---|---|---|---|
| Multi-master Auto-increment | Simple, numeric | Doesn't scale across DCs, not time-sortable | Limited use |
| UUID | No coordination, scalable | 128-bit, not sortable, not numeric | Good for distributed |
| Ticket Server | Easy, numeric | SPOF, scaling challenges | Small scale only |
| Snowflake (Twitter) | 64-bit, time-sortable, scalable | Clock sync needed | Recommended |
Twitter Snowflake ID Structure (64 bits)
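The classic layout is 1 sign bit, 41 bits of millisecond timestamp, 5 bits of datacenter ID, 5 bits of machine ID, and 12 bits of sequence number. A minimal bit-packing sketch (the epoch constant is Twitter's published custom epoch; a real generator would also track the clock and sequence counter):

```python
# Snowflake layout: 1 sign bit | 41-bit timestamp (ms) |
# 5-bit datacenter ID | 5-bit machine ID | 12-bit sequence number
EPOCH = 1288834974657  # Twitter's custom epoch (Nov 2010), in milliseconds

def snowflake(timestamp_ms: int, datacenter: int, machine: int, seq: int) -> int:
    """Pack the four fields into one 64-bit, time-sortable integer."""
    return ((timestamp_ms - EPOCH) << 22) | (datacenter << 17) | (machine << 12) | seq
```

Because the timestamp occupies the high bits, IDs sort by creation time regardless of which machine generated them.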
15 Design: URL Shortener
Key Decisions
- Hash length: 7 characters using base-62 gives 62⁷ ≈ 3.5 trillion combinations
- Redirect type: 301 (permanent, browser caches) vs 302 (temporary, better for analytics)
- Generation approach: Base-62 conversion of a unique ID from a Snowflake-like generator
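Base-62 conversion is standard repeated division, using the digits 0-9, a-z, A-Z:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base62(n: int) -> str:
    """Convert a numeric ID (e.g. from a Snowflake generator) to a short string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

print(to_base62(11157))  # → "2TX", so the short URL is e.g. /2TX
```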
16 Design: Web Crawler
Architecture
Key Design Decisions
- BFS over DFS: BFS is standard for web crawlers; DFS can go too deep
- Politeness: Queue URLs by hostname; process same host sequentially with delays
- Deduplication: 29% of web pages are duplicates — detect via content hash
- URL Seen: Bloom filter + hash table to avoid revisiting URLs
- Scale: 1B pages/month = ~400 QPS, peak ~800. 5-year storage: ~30 PB
17 Design: Notification System
Multi-Channel Architecture
Key Design Points
- Reliability: Persist to log DB before sending; retry with exponential backoff
- Deduplication: Use event IDs to prevent duplicate sends
- User Settings: Respect opt-in/opt-out per channel per user
- Rate Limiting: Cap notifications per user to prevent spam
- Templates: Preformatted with customizable parameters
- Analytics: Track open rates, click rates, delivery success
18 Design: News Feed System
Fanout Models — The Critical Decision
| Model | How | Pro | Con |
|---|---|---|---|
| Fan-out on Write (Push) | Pre-compute feed when post is created | Instant reads | Celebrity problem: millions of writes |
| Fan-out on Read (Pull) | Build feed on-demand | No wasted work for inactive users | Slow reads |
| Hybrid (Recommended) | Push for normal users, pull for celebrities | Best of both worlds | More complex |
5-Layer Cache Architecture
- News Feed Cache: Post IDs only (not full objects) per user
- Content Cache: Post data; separate hot cache for viral posts
- Social Graph Cache: Friend/follower relationships
- Action Cache: Likes, comments, shares
- Counter Cache: Like counts, comment counts
19 Design: Chat System
Architecture Overview
- Protocol: WebSocket for real-time bidirectional messaging
- Stateless services for login, signup, profile (behind load balancer)
- Stateful chat service with persistent WebSocket connections
- Message queue (Kafka) for reliability and decoupling
- NoSQL database (Cassandra/MongoDB) for chat storage
- Presence service tracks online/offline via Redis heartbeats
Key Decisions
- Message ordering via IDs + timestamps with per-conversation sequence numbers
- Group messaging uses fan-out write with separate tables per group
- Read receipts tracked by dedicated status service
- Media attachments served via CDN
20 Design: Search Autocomplete
Core Data Structure: Trie (Prefix Tree)
Each node stores a character and a frequency count. To find suggestions, traverse to the prefix node, then find the top-K most frequent completions. Complexity: O(p + n), where p is the prefix length (walking down to the prefix node) and n is the number of nodes in the subtree below it.
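A minimal trie with frequency-ranked suggestions (sample queries and counts are illustrative; production systems precompute and cache the top-K at each node instead of searching the subtree per query):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.freq = 0          # how often this complete word was queried

class Autocomplete:
    """Prefix trie with frequency counts for top-k suggestions (sketch)."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, word: str, freq: int = 1):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.freq += freq

    def suggest(self, prefix: str, k: int = 3):
        node = self.root
        for ch in prefix:                  # O(p): walk down to the prefix node
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        def dfs(n, word):                  # O(n): explore the prefix's subtree
            if n.freq:
                results.append((n.freq, word))
            for ch, child in n.children.items():
                dfs(child, word + ch)
        dfs(node, prefix)
        return [w for _, w in sorted(results, reverse=True)[:k]]
```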
Key Decisions
- Batch updates: Trie updated during off-peak hours, not real-time
- In-memory Trie with serialization for persistence
- Distributed Trie: Split via consistent hashing on prefix ranges
- Cache layer for frequently accessed prefixes
- Analytics pipeline tracks query frequency for ranking
21 Design: YouTube
Video Upload Pipeline
Video Streaming
- Adaptive bitrate streaming using MPEG-DASH or HLS
- Client selects bitrate based on bandwidth and device
- Popular videos cached at CDN edge servers worldwide
- Metadata in SQL; view stats in NoSQL; search via dedicated index
22 Design: Google Drive
Core Architecture
- Block-level storage: Files split into blocks; only changed blocks synced (delta sync)
- Metadata service: File info, permissions, timestamps
- Notification service: Real-time alerts of remote changes
- Versioning: Full history with rollback capability
- Merkle tree: Efficiently detect which blocks changed
Key Challenges & Solutions
| Challenge | Solution |
|---|---|
| Large files | Block-level storage + parallel upload/download |
| Offline support | Local queue; replay on reconnect |
| Concurrent edits | Version control + conflict resolution |
| Real-time sync | Notification service + change propagation |
| Storage efficiency | Block-level deduplication across users |
| Bandwidth | Delta sync (only transfer diffs) |
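Block-level delta sync reduces to comparing per-block hashes (the 4-byte block size is for illustration only; real systems use megabyte-scale blocks):

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; production systems use e.g. 4 MB blocks

def block_hashes(data: bytes):
    """Split a file into fixed-size blocks and hash each one."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(old: bytes, new: bytes):
    """Delta sync: only blocks whose hashes differ need to be uploaded."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or old_h[i] != h]

print(changed_blocks(b"aaaabbbbcccc", b"aaaaXXXXcccc"))  # → [1]
```

Only block 1 changed, so only block 1 is transferred; the same hashes also enable cross-user deduplication, since identical blocks hash identically.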
23 Microservices Architecture
Architecture Components
9 Microservice Best Practices
- Separate data store per service (no shared databases)
- Similar code maturity across services
- Separate build for each service
- Single responsibility per service
- Deploy into containers (Docker)
- Design stateless services
- Domain-driven design for service boundaries
- Micro frontends for UI decomposition
- Orchestrate with Kubernetes
Communication Patterns
| Pattern | When | Example |
|---|---|---|
| REST / gRPC | Synchronous request-response | User service → Auth service |
| Message Queue | Async processing, decoupling | Order → Payment service |
| Pub/Sub | Event broadcasting | Order created → notify all |
| Event Sourcing | Audit trail, replay | Financial transactions |
24 Docker & Kubernetes
Docker (Container Level)
- Packages app + dependencies into containers
- Single OS host
- Dockerfile defines build instructions
- Images stored in registries
- Lightweight, fast startup
Kubernetes (Cluster Level)
- Orchestrates containers across multiple hosts
- Auto-scaling, self-healing, rolling updates
- Service discovery & load balancing built-in
- Declarative configuration (desired state)
Kubernetes Architecture
25 CI/CD Pipelines
Deployment Strategies
- Blue-Green: Two identical environments; switch traffic between them
- Canary: Route small % of traffic to new version; gradually increase
- Rolling: Update instances one by one
Netflix Tech Stack Example
| Phase | Tools |
|---|---|
| Planning | JIRA, Confluence |
| Coding | Java, Python, Scala, JS, Kotlin |
| Build | Gradle |
| Packaging | Amazon Machine Image (AMI) |
| Testing | Chaos engineering tools |
| Deployment | Spinnaker (canary rollout) |
| Monitoring | Atlas, Kayenta |
| Incidents | PagerDuty, Dispatch |
26 Security & Authentication
Authentication Methods
| Method | How It Works | Best For |
|---|---|---|
| Session-Based | Server stores session state; client holds cookie | Traditional web apps |
| JWT | Stateless token (Header.Payload.Signature) | APIs, microservices |
| OAuth 2.0 | Third-party authorization (access + refresh tokens) | Login with Google/GitHub |
Password Storage Best Practices
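The core rules are: never store plaintext, use a unique random salt per user, use a deliberately slow hash (bcrypt, scrypt, Argon2, or PBKDF2), and compare digests in constant time. A standard-library sketch (PBKDF2 stands in for the others here; the iteration count is an illustrative choice):

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes = None):
    """Salted, deliberately slow hash. Returns (salt, digest) to store."""
    salt = salt or os.urandom(16)   # unique random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison
```

The salt defeats precomputed rainbow tables, and the high iteration count makes brute-forcing a leaked database expensive.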
27 Design Patterns
Architecture Patterns
| Pattern | Flow | Best For |
|---|---|---|
| MVC | View ↔ Controller ↔ Model | Web apps |
| MVP | View ↔ Presenter ↔ Model | More testable than MVC |
| MVVM | View ↔ ViewModel (data binding) ↔ Model | Reactive UIs |
| VIPER | View-Interactor-Presenter-Entity-Router | iOS apps |
18 Gang of Four Patterns
Creational
- Singleton: One instance only
- Factory: Creates objects without specifying class
- Builder: Step-by-step construction
- Prototype: Clone existing objects
Structural
- Adapter: Bridge incompatible interfaces
- Decorator: Add behavior dynamically
- Facade: Simple interface to complex subsystem
- Proxy: Control access to an object
- Composite: Tree structures
- Bridge: Separate abstraction from implementation
Behavioral
- Observer: Notify subscribers of changes
- Strategy: Swap algorithms at runtime
- Chain of Responsibility: Pass request through handler chain
- Command: Encapsulate request as object
- Iterator: Sequential access to collection
- Mediator: Centralize communication
- Memento: Capture and restore state (undo)
- Visitor: Add operations without changing classes
28 Cloud Services
Service Models
| Model | You Manage | Provider Manages | Example |
|---|---|---|---|
| IaaS | OS, runtime, app, data | Servers, storage, networking | AWS EC2, Azure VMs |
| PaaS | App code and data | Everything else | Heroku, App Engine |
| SaaS | Just use it | Everything | Gmail, Salesforce |
Cloud-Native Principles
- Microservices-based design with independent deployment
- Container orchestration (Kubernetes)
- Service mesh for inter-service communication
- Serverless for event-driven workloads
- Resilience patterns: circuit breaker, retry with backoff, timeout
29 Payment Systems
Credit Card Transaction Flow
Two-Phase Processing
- Authorization: Real-time approval when card is swiped (funds held)
- Capture & Settlement: Batch processing — merchant captures, network clears, funds transfer
System Design Interview Checklist
| Always Discuss | Details |
|---|---|
| Scale | DAU, QPS, peak traffic, data volume, growth rate |
| Storage | SQL vs NoSQL, schema design, sharding strategy |
| Caching | What to cache, eviction policy, strategy pattern |
| Networking | Load balancing, CDN, DNS, API gateway |
| Reliability | Replication, failover, retry with backoff, circuit breaker |
| Consistency | Strong vs eventual, CAP trade-offs, conflict resolution |
| Monitoring | Metrics, logging, alerting, dashboards |
Common Building Blocks
| Component | Purpose | Technologies |
|---|---|---|
| Load Balancer | Distribute traffic | Nginx, HAProxy, AWS ALB |
| Cache | Speed up reads | Redis, Memcached |
| Message Queue | Async processing | Kafka, RabbitMQ, SQS |
| CDN | Static content | CloudFront, Cloudflare, Akamai |
| Search | Full-text search | Elasticsearch, Solr |
| Object Storage | Files, images, videos | S3, GCS, Azure Blob |
| API Gateway | Entry point, routing | Kong, AWS API Gateway |
| Service Discovery | Find services | Consul, etcd, ZooKeeper |
Study guide generated April 2026