System Design Complete Study Guide

Everything you need to ace system design interviews, combined from two essential resources

System Design Interview - Alex Xu
System Design 101 - ByteByteGo
Part I — Foundations & Principles

1 Scaling Fundamentals

Single Server to Distributed System

Every large-scale system starts with a single server handling web, database, and cache. As traffic grows, you progressively separate concerns and add redundancy.

User → DNS → Web Server (single box: web + DB + cache)
  ↓ scale up
User → DNS → Web Server ↔ Database (separated)
  ↓ scale out
User → DNS → Load Balancer → [Server 1, Server 2, ...] ↔ [Master DB ↔ Slave DBs]
  ↓ add layers
User → CDN (static) + DNS → LB → Stateless Servers → Cache → DB (sharded) + Message Queue

Vertical vs Horizontal Scaling

Vertical Scaling (Scale Up)

Add more CPU, RAM, or storage to a single machine. Simple but has hard limits — you can't add infinite resources to one server. Also creates a single point of failure.


Horizontal Scaling (Scale Out)

Add more servers to the pool. More complex to implement but virtually unlimited. This is the approach large-scale systems use. Requires load balancing and stateless design.


Key Components

Load Balancer

Distributes incoming traffic across multiple servers using a public IP. Servers behind it use private IPs only. If one server goes down, traffic reroutes to healthy ones. Eliminates SPOF at the web tier.

Database Replication (Master-Slave)

Master handles all writes; slaves handle reads. Since most workloads are read-heavy (often 10:1 ratio), this scales reads efficiently. If a slave dies, reads go to other slaves. If master dies, a slave gets promoted.

CDN (Content Delivery Network)

Geographically distributed cache for static assets (images, CSS, JS, videos). Users fetch from the nearest edge server. Key considerations: TTL (time-to-live), cache invalidation strategies, cost (don't cache infrequently accessed content), and fallback to origin server.

Stateless Web Tier

Move session data out of individual servers into a shared data store (Redis, Memcached, or NoSQL). Any server can handle any request. This makes horizontal scaling trivial — just add/remove servers behind the load balancer.

Data Centers & GeoDNS

Multiple data centers in different regions. GeoDNS routes users to the nearest one. Challenges include data synchronization across centers, testing across different regions, and automated failover.

Database Sharding

Split data across multiple databases using a partition key (e.g., user_id % num_shards). Challenges include resharding when data grows unevenly, the celebrity/hotspot problem (one shard gets disproportionate traffic), and cross-shard joins (solved via denormalization).
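A hash-based shard router can be sketched in a few lines (Python; the shard count and user ID are illustrative):

```python
# Sketch: route a record to a shard by partition key (user_id % num_shards).
NUM_SHARDS = 4

def shard_for(user_id: int) -> int:
    """Pick a shard deterministically from the partition key."""
    return user_id % NUM_SHARDS

# The same user always lands on the same shard, so lookups need no directory.
shard = shard_for(12345)
```

The weakness this sketch shares with any modular scheme is resharding: changing NUM_SHARDS remaps nearly every key, which is exactly the problem consistent hashing (Section 5) addresses.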

2 Back-of-Envelope Estimation

Powers of 2 — Data Volume Quick Reference

Unit | Approx | Bytes
1 KB | Thousand | 10^3
1 MB | Million | 10^6
1 GB | Billion | 10^9
1 TB | Trillion | 10^12
1 PB | Quadrillion | 10^15

Latency Numbers Every Engineer Should Know

Operation | Latency
L1 cache reference | ~1 ns
L2 cache reference | ~4 ns
Main memory reference | ~100 ns
SSD random read | ~150 μs
HDD seek | ~10 ms
Send packet CA → Netherlands → CA | ~150 ms
Memory is fast but limited; disk is slow but cheap. Design around this: cache hot data in memory, store cold data on disk.

Availability SLAs (The Nines)

Availability | Downtime/Year
99% (two 9s) | 3.65 days
99.9% (three 9s) | 8.77 hours
99.99% (four 9s) | 52.6 minutes
99.999% (five 9s) | 5.26 minutes
QPS Estimation Formula: Daily Active Users × avg queries per user / 86,400 seconds = QPS. Peak QPS is typically 2× to 5× average QPS.
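The QPS formula can be wrapped in a tiny helper; the 10M-DAU example numbers below are illustrative:

```python
def estimate_qps(dau: int, queries_per_user: float, peak_factor: float = 2.0):
    """Back-of-envelope QPS: DAU x queries/user spread over 86,400 seconds.
    peak_factor of 2-5x average is the usual rule of thumb."""
    avg = dau * queries_per_user / 86_400
    return avg, avg * peak_factor

# e.g. 10M daily active users, 10 queries per user per day
avg_qps, peak_qps = estimate_qps(10_000_000, 10)   # ~1,157 avg, ~2,315 peak
```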

3 The 4-Step Interview Framework

Step | Time | What To Do
1. Understand & Scope | 3-10 min | Ask clarifying questions. Define features, users, scale, constraints. Never jump into design without scoping.
2. High-Level Design | 10-15 min | Draw the architecture diagram. Identify core components: APIs, servers, databases, caches, queues. Get buy-in from interviewer.
3. Deep Dive | 10-25 min | Pick 2-3 components to discuss in depth. Talk about trade-offs, edge cases, bottlenecks, failure modes, and scaling.
4. Wrap Up | 3-5 min | Summarize design. Discuss potential improvements, error handling, monitoring, and operational concerns.
Common Mistakes: Jumping into details without scoping, over-engineering the solution, ignoring trade-offs, not asking clarifying questions, and not considering failure scenarios.

4 CAP Theorem

In a distributed system, you can only guarantee two of three properties when a network partition occurs:

C — Consistency

All nodes see the same data at the same time. Every read returns the most recent write.

A — Availability

Every request gets a response (not error), even if it might be stale data.

P — Partition Tolerance

System continues operating despite network partitions between nodes. In real distributed systems, partitions will happen, so you must always have P.

Practical implication: Since P is non-negotiable, your real choice is CP (sacrifice availability during partitions — banks, financial systems) or AP (sacrifice consistency, serve stale data — social media, caching systems). Examples: CP = ZooKeeper, BigTable. AP = Cassandra, CouchDB, DynamoDB.

5 Consistent Hashing

The Rehashing Problem

Simple modular hashing (key % N servers) breaks when you add or remove servers — almost all keys get remapped. Consistent hashing solves this by remapping only K/N keys on average (K = total keys, N = servers).

How It Works

Imagine a circular hash space (0 to 2^32 − 1). Both servers and keys are placed on this ring using a hash function. Each key is assigned to the first server encountered going clockwise. When a server is added or removed, only the keys in its immediate range need redistribution.

Virtual Nodes

Each physical server gets multiple positions (virtual nodes) on the ring. This ensures more even distribution — without virtual nodes, servers can get very uneven loads. More virtual nodes = better balance but more memory for the lookup table.
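A minimal ring with virtual nodes, assuming MD5 as the hash function and illustrative node names:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent-hash ring sketch with virtual nodes. The replica count
    and node names are illustrative, not production-tuned."""

    def __init__(self, replicas: int = 100):
        self.replicas = replicas
        self.keys = []   # sorted hash positions on the ring
        self.ring = []   # parallel list of (hash, node) entries

    def _hash(self, value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        # each physical node gets `replicas` positions on the ring
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            idx = bisect.bisect(self.keys, h)
            self.keys.insert(idx, h)
            self.ring.insert(idx, (h, node))

    def remove_node(self, node: str):
        for h, _ in [(h, n) for h, n in self.ring if n == node]:
            idx = self.keys.index(h)
            del self.keys[idx]
            del self.ring[idx]

    def get_node(self, key: str) -> str:
        # first server clockwise from the key's position (wraps around)
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]

ring = ConsistentHashRing()
for n in ("cache-a", "cache-b", "cache-c"):
    ring.add_node(n)
owner = ring.get_node("user:42")
```

Removing a node only remaps the keys that node owned; every other key keeps its assignment, which is the property modular hashing lacks.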

Used by: Amazon DynamoDB, Apache Cassandra, Discord, Akamai CDN, Google Maglev load balancer.

6 Key-Value Store Design

Core Components

A Dynamo-style key-value store combines: data partitioning via consistent hashing, replication of each key to N nodes, quorum consensus for reads and writes, vector clocks for versioning and conflict detection, and a storage engine built from a commit log, memtable, and SSTables.

Quorum Configuration Trade-offs

Config | Optimizes For | Guarantee
W=1, R=N | Fast writes | Strong consistency
W=N, R=1 | Fast reads | Strong consistency
W=2, R=2, N=3 | Balanced | Strong consistency (W+R > N)
W=1, R=1 | Maximum speed | Eventual consistency
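The W+R > N rule behind the table is easy to check mechanically:

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """W + R > N means every read quorum overlaps every write quorum,
    so at least one replica in any read holds the latest write."""
    return w + r > n

# the table's configurations, with N = 3 replicas:
fast_writes = is_strongly_consistent(n=3, w=1, r=3)   # W=1, R=N
fast_reads  = is_strongly_consistent(n=3, w=3, r=1)   # W=N, R=1
balanced    = is_strongly_consistent(n=3, w=2, r=2)   # W=2, R=2
max_speed   = is_strongly_consistent(n=3, w=1, r=1)   # W=1, R=1
```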

Failure Handling Mechanisms

  • Gossip protocol: decentralized failure detection; nodes exchange heartbeats and membership lists
  • Sloppy quorum + hinted handoff: during temporary failures, write to substitute nodes and replay once the original node recovers
  • Anti-entropy with Merkle trees: detect and repair replica divergence after permanent failures while comparing minimal data

Part II — Core Infrastructure Components

7 Communication Protocols

API Architectural Styles Comparison

Style | Format | Best For | Key Trait
REST | JSON/XML | Web APIs, CRUD | Resource-oriented, HTTP methods
GraphQL | JSON | Complex queries, mobile | Client specifies exact data needed
gRPC | Protocol Buffers | Microservices | Binary format, often cited as ~5x faster than JSON
WebSocket | Any | Real-time (chat, games) | Bidirectional, persistent connection
SOAP | XML only | Enterprise/legacy | Strict contracts, WS-Security
Webhook | JSON | Event notifications | Push-based, eliminates polling

HTTP Evolution

Version | Year | Key Improvement
HTTP/1.0 | 1996 | New TCP connection per request
HTTP/1.1 | 1997 | Persistent connections (Keep-Alive), pipelining
HTTP/2.0 | 2015 | Multiplexing on a single TCP connection, server push, binary framing
HTTP/3.0 | 2022 | QUIC (UDP-based), no head-of-line blocking
Head-of-line blocking: In HTTP/1.x, if one request stalls, all queued requests behind it wait. HTTP/2 solves this at the application layer; HTTP/3 solves it at the transport layer using QUIC.

8 API Design & API Gateway

API Design Best Practices

API Gateway Functions

An API gateway sits between clients and microservices, handling: parameter validation, auth, rate limiting, service discovery, dynamic routing, protocol conversion, error handling/circuit breaking, logging (ELK stack), and caching (Redis).

5 API Performance Tricks

  1. Pagination: Break large results into pages
  2. Async Logging: Buffer logs, flush periodically to reduce I/O
  3. Caching: Cache frequent queries (Redis/Memcached)
  4. Payload Compression: gzip reduces data size significantly
  5. Connection Pooling: Reuse database connections

HTTP Status Codes Cheat Sheet

Range | Meaning | Common Codes
2xx | Success | 200 OK, 201 Created, 204 No Content
3xx | Redirect | 301 Permanent, 302 Temporary
4xx | Client Error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 429 Too Many Requests
5xx | Server Error | 500 Internal, 502 Bad Gateway, 503 Unavailable, 504 Timeout

9 Load Balancing

Algorithms Comparison

Type | Algorithm | How It Works | Best For
Static | Round Robin | Sequential rotation | Stateless, equal servers
Static | Sticky Round Robin | Same client → same server | Session affinity needed
Static | Weighted Round Robin | More traffic to stronger servers | Heterogeneous hardware
Static | IP/URL Hash | Hash determines server | Consistent routing
Dynamic | Least Connections | Route to least busy | Varying request costs
Dynamic | Least Response Time | Route to fastest responder | Latency-sensitive apps

Forward Proxy vs Reverse Proxy

Forward Proxy

Sits between users and internet. Protects clients: bypass restrictions, block content, hide identity.

Reverse Proxy (e.g., Nginx)

Sits between internet and servers. Protects servers: load balancing, DDoS protection, SSL termination, static caching.

10 Caching Strategies

Five Caching Patterns

Pattern | How It Works | Pros | Cons
Cache-Aside | App checks cache; on miss, reads DB, writes to cache | Simple, resilient to cache failure | Cache miss = 3 trips
Read-Through | Cache auto-loads from DB on miss | App logic simplified | First request always slow
Write-Around | Writes go to DB, cache updated on read | Avoids caching unread data | Cache miss on recent writes
Write-Back | Writes go to cache first; async flush to DB | Very fast writes | Data loss risk if cache fails
Write-Through | Writes to cache AND DB synchronously | Strong consistency | Higher write latency
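Cache-aside, the most common of these patterns, can be sketched with plain dicts standing in for the database and Redis:

```python
# Cache-aside sketch: the application owns both lookups.
# `db` and `cache` are dicts standing in for a real DB and Redis.
db = {"user:1": {"name": "Ada"}}
cache = {}

def get_user(key: str):
    if key in cache:              # 1. check the cache first
        return cache[key]
    value = db.get(key)           # 2. on miss, read the database
    if value is not None:
        cache[key] = value        # 3. populate the cache for next time
    return value

first = get_user("user:1")    # miss: DB read, then cached
second = get_user("user:1")   # hit: served from cache
```

The "3 trips" cost in the table is visible here: a miss touches the cache, the database, and the cache again.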

Where Caching Happens (8 Layers)

  1. Client/browser cache
  2. CDN (static content)
  3. Load balancer cache
  4. API Gateway cache
  5. Application-level cache (CPU/RAM/disk)
  6. Distributed cache (Redis, Memcached)
  7. Full-text search (Elasticsearch)
  8. Database-level (buffer pool, WAL, materialized views)

Cache Eviction Policies

LRU (Least Recently Used) is the most common. Also: LFU (Least Frequently Used), FIFO (First In First Out). Choose based on your access patterns.
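An LRU eviction sketch using Python's OrderedDict (capacity and keys are illustrative):

```python
from collections import OrderedDict

class LRUCache:
    """LRU sketch: recently used keys move to the end of the OrderedDict;
    the front entry is the least recently used and gets evicted first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)            # mark as recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)     # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" becomes most recently used
cache.put("c", 3)     # capacity exceeded: evicts "b", not "a"
```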

Watch out for: cache avalanche (many keys expire simultaneously), cache penetration (queries for non-existent data bypass cache), and thundering herd (many cache misses at once).

Redis — Why It's Fast

Redis keeps all data in RAM, executes commands on a single-threaded event loop (no lock contention), serves many connections via I/O multiplexing, and uses efficient low-level data structures (skip lists, hash tables, ziplists).

Redis Use Cases by Data Type

Type | Use Cases
String | Session, cache, distributed lock, counter, rate limiter
Hash | Shopping cart, user profiles
List | Message queue, activity feed
Set | Tags, unique visitors
Sorted Set | Leaderboards, rankings
Bitmap | User retention tracking, feature flags

11 Databases & Storage

SQL vs NoSQL Decision Guide

Factor | SQL (Relational) | NoSQL
Data Model | Structured, schema-enforced | Flexible schema: document/KV/graph
Scaling | Vertical (mostly) | Horizontal (built-in)
Consistency | ACID guaranteed | Eventual consistency (usually)
Best For | Transactions, joins, complex queries | Scale, flexibility, speed
Examples | MySQL, PostgreSQL, Oracle | MongoDB, Cassandra, DynamoDB, Redis

8 Data Structures Powering Databases

Structure | Type | Used By
Skip List | In-memory index | Redis
Hash Index | In-memory | General key-value lookup
SSTable | On-disk, immutable | Component of LSM trees
LSM Tree | SkipList + SSTable | Cassandra, RocksDB (high write throughput)
B-tree / B+ tree | Disk-based index | MySQL, PostgreSQL (balanced read/write)
Inverted Index | Document search | Elasticsearch, Lucene
Suffix Tree | String pattern matching | Text search engines
R-tree | Multi-dimensional | PostGIS, geospatial queries

ACID Properties

  • Atomicity: a transaction either fully commits or fully rolls back
  • Consistency: a transaction moves the database from one valid state to another
  • Isolation: concurrent transactions do not see each other's intermediate state
  • Durability: once committed, data survives crashes

Cloud Database Cheat Sheet

Type | AWS | Azure | Google
Relational | RDS | SQL Database | Cloud SQL
Key-Value | DynamoDB | Cosmos DB | Bigtable
Document | DocumentDB | Cosmos DB | Firestore
In-Memory | ElastiCache | Cache for Redis | Memorystore
Object/Blob | S3 | Blob Storage | Cloud Storage
Analytics | Redshift | Synapse | BigQuery
Graph | Neptune | Cosmos DB | Neo4j (partner)

12 Message Queues & Kafka

Why Message Queues

Decouple producers from consumers. The producer publishes messages to a queue; consumers process them independently. This enables async processing, absorbs traffic spikes, and lets components fail independently.

Producer → Message Queue → Consumer(s)

Benefits:
  • Decoupling: Producer doesn't need to know about consumers
  • Buffering: Queue absorbs traffic spikes
  • Reliability: Messages persist until consumed
  • Scaling: Add consumers independently
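The decoupling can be demonstrated with the standard library's thread-safe queue, an in-process stand-in for a real broker like Kafka or RabbitMQ:

```python
import queue
import threading

# The producer only knows about the queue, never about the consumer.
q = queue.Queue()
processed = []

def consumer():
    while True:
        msg = q.get()
        if msg is None:                     # sentinel: shut down cleanly
            break
        processed.append(msg.upper())       # stand-in for real work
        q.task_done()

worker = threading.Thread(target=consumer)
worker.start()

for msg in ("order-1", "order-2", "order-3"):
    q.put(msg)                              # producer returns immediately
q.put(None)
worker.join()
```

Adding more consumer threads reading from the same queue is the in-process analogue of scaling consumers independently.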

Kafka Performance Secrets

  1. Sequential I/O: Kafka writes to disk sequentially (not random access), which is nearly as fast as memory
  2. Zero-Copy: Data goes directly from disk to network socket without passing through the application layer
Part III — System Design Deep Dives

13 Design: Rate Limiter

5 Rate Limiting Algorithms

Algorithm | How It Works | Pros | Cons
Token Bucket | Bucket fills with tokens at a fixed rate; each request takes a token | Allows bursts, memory efficient | Tuning bucket size and refill rate
Leaking Bucket | Requests queue in a FIFO bucket; processed at a fixed rate | Smooths output rate | Burst of old requests can fill the queue
Fixed Window | Count requests in fixed time windows | Simple, memory efficient | Burst at window edges (2x limit)
Sliding Window Log | Track timestamps of each request; count in a sliding window | Very accurate | High memory (stores all timestamps)
Sliding Window Counter | Hybrid: weighted count from current + previous windows | Smooth, memory efficient | Approximate (works for ~99.97% of cases)
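A token-bucket sketch (capacity and refill rate are illustrative; the injectable clock is an assumption added for testability, not part of the canonical algorithm):

```python
import time

class TokenBucket:
    """Token bucket sketch: tokens refill at `rate` per second up to
    `capacity`; a request is allowed only if a whole token is available."""

    def __init__(self, capacity: int, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)   # start full: allows an initial burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # lazily refill based on elapsed time instead of a background timer
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, rate=1.0)    # burst of 3, refill 1 token/s
burst = [bucket.allow() for _ in range(5)]    # first 3 pass, rest are dropped
```

In a distributed deployment the counter state would live in Redis with a Lua script making the read-refill-decrement step atomic, as the Architecture section below describes.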

Architecture

Rate limiter middleware sits between client and API servers. Uses Redis for distributed counting (fast, supports INCR and EXPIRE). Returns HTTP 429 with headers: X-Ratelimit-Remaining, X-Ratelimit-Limit, X-Ratelimit-Retry-After.

Race conditions solved using Lua scripts in Redis (atomic operations). Synchronization across instances handled by centralized Redis.

14 Design: Unique ID Generator

Approaches Compared

Approach | Pros | Cons | Verdict
Multi-master Auto-increment | Simple, numeric | Doesn't scale across DCs, not time-sortable | Limited use
UUID | No coordination, scalable | 128-bit, not sortable, not numeric | Good for distributed systems
Ticket Server | Easy, numeric | SPOF, scaling challenges | Small scale only
Snowflake (Twitter) | 64-bit, time-sortable, scalable | Clock synchronization needed | Recommended

Twitter Snowflake ID Structure (64 bits)

| 1 bit sign | 41 bits timestamp | 5 bits datacenter | 5 bits machine | 12 bits sequence |

  • Sign: always 0 (reserved)
  • Timestamp: milliseconds since epoch → supports 69 years
  • Datacenter ID: 2^5 = 32 datacenters
  • Machine ID: 2^5 = 32 machines per DC
  • Sequence: 2^12 = 4,096 IDs per millisecond per machine
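The 64-bit layout above can be sketched as follows (the custom epoch value is an arbitrary assumption; production systems pick their own):

```python
import threading
import time

class Snowflake:
    """Snowflake-style ID sketch: 41-bit ms timestamp | 5-bit datacenter |
    5-bit machine | 12-bit sequence. EPOCH is an illustrative custom epoch."""

    EPOCH = 1_600_000_000_000   # ms; assumption, not Twitter's real epoch

    def __init__(self, datacenter: int, machine: int):
        assert 0 <= datacenter < 32 and 0 <= machine < 32
        self.datacenter = datacenter
        self.machine = machine
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            ms = int(time.time() * 1000)
            if ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF   # 12-bit wrap
                if self.sequence == 0:
                    while ms <= self.last_ms:   # sequence exhausted:
                        ms = int(time.time() * 1000)  # spin to next ms
            else:
                self.sequence = 0
            self.last_ms = ms
            return ((ms - self.EPOCH) << 22) | (self.datacenter << 17) \
                   | (self.machine << 12) | self.sequence

gen = Snowflake(datacenter=1, machine=7)
a, b = gen.next_id(), gen.next_id()    # strictly increasing on one machine
```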

15 Design: URL Shortener

Key Decisions

URL Shortening:
  1. POST /api/v1/data/shorten with longURL
  2. Check if longURL exists in DB → return existing shortURL
  3. Generate unique ID → convert to base-62 → 7-char shortURL
  4. Save (shortURL, longURL) mapping to DB

URL Redirect:
  1. GET /{shortURL}
  2. Check cache → if miss, check DB → cache it
  3. Return 301/302 redirect to longURL
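The base-62 conversion step can be sketched directly (the example ID is illustrative; 62^7 ≈ 3.5 trillion distinct 7-character URLs):

```python
import string

# 0-9, a-z, A-Z: 62 symbols total
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def to_base62(n: int) -> str:
    """Encode a numeric unique ID as a base-62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def from_base62(s: str) -> int:
    """Decode a short URL string back to its numeric ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

short = to_base62(2009215674938)   # → "zn9edcu", a 7-char short URL
```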

16 Design: Web Crawler

Architecture

Seed URLs → URL Frontier → HTML Downloader → Content Parser → Content Seen? (dedup) → Link Extractor → URL Filter → URL Seen? (bloom filter) → new URLs fed back into the URL Frontier

Key Design Decisions

  • Politeness: download from one host at a time, with delays between requests
  • BFS traversal with a URL frontier that prioritizes by importance and freshness
  • Dedup pages by content hash; dedup URLs with a bloom filter
  • Avoid spider traps (URL length limits, manual blacklists)

17 Design: Notification System

Multi-Channel Architecture

Service 1 / Service 2 / Service 3 → Notification Servers →
  → iOS Queue → Workers → APNS
  → Android Queue → Workers → FCM
  → SMS Queue → Workers → Twilio
  → Email Queue → Workers → SendGrid

Key Design Points

  • A message queue per channel decouples notification servers from third-party providers
  • Retry failed deliveries; rate-limit notifications per user
  • Deduplicate to avoid double sends; respect user opt-out settings
  • Track delivery events for monitoring and analytics

18 Design: News Feed System

Fanout Models — The Critical Decision

Model | How | Pro | Con
Fan-out on Write (Push) | Pre-compute feed when post is created | Instant reads | Celebrity problem: millions of writes
Fan-out on Read (Pull) | Build feed on-demand | No wasted work for inactive users | Slow reads
Hybrid (Recommended) | Push for normal users, pull for celebrities | Best of both worlds | More complex

5-Layer Cache Architecture

  1. News Feed Cache: Post IDs only (not full objects) per user
  2. Content Cache: Post data; separate hot cache for viral posts
  3. Social Graph Cache: Friend/follower relationships
  4. Action Cache: Likes, comments, shares
  5. Counter Cache: Like counts, comment counts

19 Design: Chat System

Architecture Overview

Stateless HTTP services (login, signup, profile) sit behind a load balancer; stateful chat servers hold persistent connections to clients; a service-discovery layer (e.g., ZooKeeper) assigns each client the best chat server; message sync queues move messages between chat servers; a key-value store holds chat history.

Key Decisions

  • WebSocket for bidirectional real-time messaging, with long polling as a fallback
  • Key-value store for the huge, append-heavy message history
  • Per-channel sortable message IDs to preserve ordering
  • Online presence via heartbeats, with a timeout before marking a user offline

20 Design: Search Autocomplete

Core Data Structure: Trie (Prefix Tree)

Each node stores a character and a frequency count. To find suggestions, traverse to the prefix node, then find the top-k most frequent completions beneath it. Complexity: O(p) to reach the prefix node plus O(n) to collect completions, where p is the prefix length and n is the size of the subtree.

Key Decisions

  • Build frequency data offline from sampled query logs; rebuild the trie periodically (e.g., weekly)
  • Cache top-k completions at each node to avoid subtree walks at query time
  • Limit stored prefix length; shard the trie across servers
  • Client side: debounce keystrokes and cache suggestions in the browser

21 Design: YouTube

Video Upload Pipeline

Upload → Virus Scan → Metadata Extraction → Original Storage (Blob)
  ↓
Transcoding Service (async via message queue)
  ↓
Multiple Formats & Bitrates (H.264, VP9, AV1 × 240p-4K)
  ↓
CDN Distribution

Video Streaming

Videos are streamed from the nearest CDN edge using adaptive-bitrate protocols such as MPEG-DASH or HLS, so the player can switch quality as the user's bandwidth changes.

Transcoding is the bottleneck — always use message queues for async processing. A single video may need 30+ transcoded versions.

22 Design: Google Drive

Core Architecture

Clients talk to block servers that split files into blocks, compress and encrypt them, and upload them to cloud storage (e.g., S3). A metadata database tracks files, versions, and block mappings, and a notification service (long polling) tells other devices when files change.

Key Challenges & Solutions

Challenge | Solution
Large files | Block-level storage + parallel upload/download
Offline support | Local queue; replay on reconnect
Concurrent edits | Version control + conflict resolution
Real-time sync | Notification service + change propagation
Storage efficiency | Block-level deduplication across users
Bandwidth | Delta sync (only transfer diffs)
Part IV — Platform, DevOps & Patterns

23 Microservices Architecture

Architecture Components

Client → Load Balancer → CDN (static)
  ↓
API Gateway → Auth/Identity Provider
  ↓
Service Discovery → [Service A (DB A), Service B (DB B), ...]
  ↓
Message Broker (Kafka)
  ↓
Monitoring & Logging

9 Microservice Best Practices

  1. Separate data store per service (no shared databases)
  2. Similar code maturity across services
  3. Separate build for each service
  4. Single responsibility per service
  5. Deploy into containers (Docker)
  6. Design stateless services
  7. Domain-driven design for service boundaries
  8. Micro frontends for UI decomposition
  9. Orchestrate with Kubernetes

Communication Patterns

Pattern | When | Example
REST / gRPC | Synchronous request-response | User service → Auth service
Message Queue | Async processing, decoupling | Order → Payment service
Pub/Sub | Event broadcasting | Order created → notify all subscribers
Event Sourcing | Audit trail, replay | Financial transactions

24 Docker & Kubernetes

Docker (Container Level)

  • Packages app + dependencies into containers
  • Single OS host
  • Dockerfile defines build instructions
  • Images stored in registries
  • Lightweight, fast startup

Kubernetes (Cluster Level)

  • Orchestrates containers across multiple hosts
  • Auto-scaling, self-healing, rolling updates
  • Service discovery & load balancing built-in
  • Declarative configuration (desired state)

Kubernetes Architecture

Control Plane:
  • API Server (hub for all communication)
  • Scheduler (assigns pods to nodes)
  • Controller Manager (Node, Job, Endpoint controllers)
  • etcd (key-value backing store)

Worker Nodes:
  • kubelet (ensures containers run in pods)
  • kube-proxy (network routing to the correct containers)
  • Pods (smallest unit; a group of containers with a shared IP)

25 CI/CD Pipelines

Developer → Commit → Push → CI Server detects change →
Build → Unit Tests → Integration Tests → Code Quality Gates → Artifact Creation →
Deploy to Staging → Staging Tests → Approval → Deploy to Production (canary/blue-green)

Deployment Strategies

  • Big bang: replace everything at once (simple, risky)
  • Rolling: update servers batch by batch
  • Blue-green: run two identical environments; switch traffic once the new one is verified
  • Canary: route a small slice of traffic to the new version first, then ramp up

Netflix Tech Stack Example

Phase | Tools
Planning | JIRA, Confluence
Coding | Java, Python, Scala, JS, Kotlin
Build | Gradle
Packaging | Amazon Machine Image (AMI)
Testing | Chaos engineering tools
Deployment | Spinnaker (canary rollout)
Monitoring | Atlas, Kayenta
Incidents | PagerDuty, Dispatch

26 Security & Authentication

Authentication Methods

Method | How It Works | Best For
Session-Based | Server stores session state; client holds a cookie | Traditional web apps
JWT | Stateless token (Header.Payload.Signature) | APIs, microservices
OAuth 2.0 | Third-party authorization (access + refresh tokens) | Login with Google/GitHub

Password Storage Best Practices

Registration: password → generate random salt → hash(password + salt) → store {hash, salt}
Validation: input password → retrieve salt → hash(input + salt) = H1 → compare H1 with stored hash → match = valid
Salting defeats rainbow table attacks. Each user gets a unique random salt, so identical passwords produce different hashes.
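A sketch of this flow with Python's standard library, using PBKDF2-HMAC-SHA256 as the hash (the algorithm and iteration count are illustrative choices; the principle is the per-user random salt):

```python
import hashlib
import hmac
import os

def register(password: str) -> dict:
    """Hash with a fresh random salt per user; store both salt and hash."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return {"salt": salt, "hash": digest}

def validate(password: str, record: dict) -> bool:
    """Re-hash the input with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                 record["salt"], 100_000)
    return hmac.compare_digest(digest, record["hash"])

rec1 = register("hunter2")
rec2 = register("hunter2")   # same password, different salt → different hash
```

The rainbow-table defense is visible in the last line: two users with identical passwords store completely different hashes.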

27 Design Patterns

Architecture Patterns

Pattern | Flow | Best For
MVC | View ↔ Controller ↔ Model | Web apps
MVP | View ↔ Presenter ↔ Model | More testable than MVC
MVVM | View ↔ ViewModel (data binding) ↔ Model | Reactive UIs
VIPER | View-Interactor-Presenter-Entity-Router | iOS apps

18 Gang of Four Patterns

Creational

  • Singleton: One instance only
  • Factory: Creates objects without specifying class
  • Builder: Step-by-step construction
  • Prototype: Clone existing objects

Structural

  • Adapter: Bridge incompatible interfaces
  • Decorator: Add behavior dynamically
  • Facade: Simple interface to complex subsystem
  • Proxy: Control access to an object
  • Composite: Tree structures
  • Bridge: Separate abstraction from implementation

Behavioral

  • Strategy: Swap algorithms at runtime
  • Observer: Notify subscribers of state changes
  • Command: Encapsulate requests as objects
  • State: Change behavior with internal state
  • Template Method: Define an algorithm skeleton; subclasses fill in steps
  • Iterator: Sequential access without exposing internals
  • Mediator: Centralize communication between objects
  • Visitor: Add operations without changing the classes they operate on

28 Cloud Services

Service Models

Model | You Manage | Provider Manages | Example
IaaS | OS, runtime, app, data | Servers, storage, networking | AWS EC2, Azure VMs
PaaS | App code and data | Everything else | Heroku, App Engine
SaaS | Just use it | Everything | Gmail, Salesforce

Cloud-Native Principles

Cloud-native systems are built from microservices, packaged in containers, orchestrated dynamically (e.g., Kubernetes), and delivered through automated CI/CD with a DevOps culture.

29 Payment Systems

Credit Card Transaction Flow

Cardholder pays $100 → Merchant → Acquirer (merchant's bank) → Card Network (Visa/MC) → Issuer (cardholder's bank)

Fee Breakdown ($100 transaction):
  • Interchange fee (to Issuer): ~$1.75
  • Network fee (to Visa/MC): ~$0.10
  • Acquirer markup: ~$0.25
  • Total merchant discount: ~$2.10
  • Merchant receives: ~$97.90

Two-Phase Processing

Card payments run in two phases: authorization (the issuer verifies the card and places a hold on the funds) and capture (the merchant later settles the held amount, e.g., after shipping the order).

Idempotency is critical. Every request must include a unique ID so retries don't cause double charges.
Quick Reference Cheat Sheet

System Design Interview Checklist

Always Discuss | Details
Scale | DAU, QPS, peak traffic, data volume, growth rate
Storage | SQL vs NoSQL, schema design, sharding strategy
Caching | What to cache, eviction policy, strategy pattern
Networking | Load balancing, CDN, DNS, API gateway
Reliability | Replication, failover, retry with backoff, circuit breaker
Consistency | Strong vs eventual, CAP trade-offs, conflict resolution
Monitoring | Metrics, logging, alerting, dashboards

Common Building Blocks

Component | Purpose | Technologies
Load Balancer | Distribute traffic | Nginx, HAProxy, AWS ALB
Cache | Speed up reads | Redis, Memcached
Message Queue | Async processing | Kafka, RabbitMQ, SQS
CDN | Static content | CloudFront, Cloudflare, Akamai
Search | Full-text search | Elasticsearch, Solr
Object Storage | Files, images, videos | S3, GCS, Azure Blob
API Gateway | Entry point, routing | Kong, AWS API Gateway
Service Discovery | Find services | Consul, etcd, ZooKeeper
Combined from "System Design Interview" by Alex Xu and "System Design 101" by ByteByteGo
Study guide generated April 2026