System Design Complete Study Guide
Everything you need to ace system design interviews, combined from two essential resources
1 Scaling Fundamentals
Single Server to Distributed System
Every large-scale system starts with a single server handling web, database, and cache. As traffic grows, you progressively separate concerns and add redundancy.
Vertical vs Horizontal Scaling
Vertical Scaling (Scale Up)
Add more CPU, RAM, or storage to a single machine. Simple but has hard limits — you can't add infinite resources to one server. Also creates a single point of failure.
Horizontal Scaling (Scale Out)
Add more servers to the pool. More complex to implement but virtually unlimited. This is the approach large-scale systems use. Requires load balancing and stateless design.
Key Components
Load Balancer
Distributes incoming traffic across multiple servers using a public IP. Servers behind it use private IPs only. If one server goes down, traffic reroutes to healthy ones. Eliminates SPOF at the web tier.
Database Replication (Master-Slave)
Master handles all writes; slaves handle reads. Since most workloads are read-heavy (often 10:1 ratio), this scales reads efficiently. If a slave dies, reads go to other slaves. If master dies, a slave gets promoted.
CDN (Content Delivery Network)
Geographically distributed cache for static assets (images, CSS, JS, videos). Users fetch from the nearest edge server. Key considerations: TTL (time-to-live), cache invalidation strategies, cost (don't cache infrequently accessed content), and fallback to origin server.
Stateless Web Tier
Move session data out of individual servers into a shared data store (Redis, Memcached, or NoSQL). Any server can handle any request. This makes horizontal scaling trivial — just add/remove servers behind the load balancer.
Data Centers & GeoDNS
Multiple data centers in different regions. GeoDNS routes users to the nearest one. Challenges include data synchronization across centers, testing across different regions, and automated failover.
Database Sharding
Split data across multiple databases using a partition key (e.g., user_id % num_shards). Challenges include resharding when data grows unevenly, the celebrity/hotspot problem (one shard gets disproportionate traffic), and cross-shard joins (solved via denormalization).
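The partition-key idea can be sketched in a few lines of Python (the shard count and key are illustrative):

```python
# Minimal sketch of hash-based shard routing with a hypothetical shard count.
NUM_SHARDS = 4

def shard_for(user_id: int) -> int:
    """Map a user_id to a shard using simple modular hashing."""
    return user_id % NUM_SHARDS

print(shard_for(12345))  # → 1  (user 12345 always routes to shard 1)
```

Note that changing NUM_SHARDS remaps almost every key, which is exactly why resharding is painful and why consistent hashing (covered later) exists.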
2 Back-of-Envelope Estimation
Powers of 2 — Data Volume Quick Reference
| Unit | Approx | Bytes |
|---|---|---|
| 1 KB | Thousand | 10³ |
| 1 MB | Million | 10⁶ |
| 1 GB | Billion | 10⁹ |
| 1 TB | Trillion | 10¹² |
| 1 PB | Quadrillion | 10¹⁵ |
Latency Numbers Every Engineer Should Know
| Operation | Latency |
|---|---|
| L1 cache reference | ~1 ns |
| L2 cache reference | ~4 ns |
| Main memory reference | ~100 ns |
| SSD random read | ~150 μs |
| HDD seek | ~10 ms |
| Send packet CA → Netherlands → CA | ~150 ms |
Availability SLAs (The Nines)
| Availability | Downtime/Year |
|---|---|
| 99% (two 9s) | 3.65 days |
| 99.9% (three 9s) | 8.77 hours |
| 99.99% (four 9s) | 52.6 minutes |
| 99.999% (five 9s) | 5.26 minutes |
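The downtime figures above fall out of simple arithmetic, which is worth being able to do live in an interview:

```python
def downtime_per_year(availability_pct: float) -> float:
    """Return allowed downtime in minutes per year for a given availability SLA."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return (1 - availability_pct / 100) * minutes_per_year

print(round(downtime_per_year(99.99), 1))   # → 52.6 (minutes, i.e. four nines)
```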
3 The 4-Step Interview Framework
| Step | Time | What To Do |
|---|---|---|
| 1. Understand & Scope | 3-10 min | Ask clarifying questions. Define features, users, scale, constraints. Never jump into design without scoping. |
| 2. High-Level Design | 10-15 min | Draw the architecture diagram. Identify core components: APIs, servers, databases, caches, queues. Get buy-in from interviewer. |
| 3. Deep Dive | 10-25 min | Pick 2-3 components to discuss in depth. Talk about trade-offs, edge cases, bottlenecks, failure modes, and scaling. |
| 4. Wrap Up | 3-5 min | Summarize design. Discuss potential improvements, error handling, monitoring, and operational concerns. |
4 CAP Theorem
In a distributed system, you can only guarantee two of three properties when a network partition occurs:
C — Consistency
All nodes see the same data at the same time. Every read returns the most recent write.
A — Availability
Every request gets a response (not error), even if it might be stale data.
P — Partition Tolerance
System continues operating despite network partitions between nodes. In real distributed systems, partitions will happen, so you must always have P.
5 Consistent Hashing
The Rehashing Problem
Simple modular hashing (hash(key) % N, where N is the number of servers) breaks when you add or remove servers — almost all keys get remapped. Consistent hashing solves this: on average only K/N keys move (K = total keys, N = servers).
How It Works
Imagine a circular hash space (0 to 2³² − 1). Both servers and keys are placed on this ring using a hash function. Each key is assigned to the first server encountered going clockwise. When a server is added or removed, only the keys in its immediate range need redistribution.
Virtual Nodes
Each physical server gets multiple positions (virtual nodes) on the ring. This ensures more even distribution — without virtual nodes, servers can get very uneven loads. More virtual nodes = better balance but more memory for the lookup table.
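The ring with virtual nodes can be sketched compactly (hash function and virtual-node count are illustrative choices):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, server) pairs

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server: str):
        # Place the server at multiple positions (virtual nodes) on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove_server(self, server: str):
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def get_server(self, key: str) -> str:
        # Walk clockwise: first server position at or after the key's hash.
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]
```

Removing a server only remaps the keys that were mapped to its positions; every other key keeps its assignment, which is the whole point of the technique.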
6 Key-Value Store Design
Core Components
- Data Partition: Consistent hashing distributes keys across nodes
- Replication: N replicas across distinct data centers for fault tolerance
- Consistency: Quorum-based — W (write quorum) + R (read quorum) > N guarantees strong consistency
- Conflict Resolution: Vector clocks track version history; client resolves conflicts
- Failure Detection: Gossip protocol (decentralized heartbeat propagation)
Quorum Configuration Trade-offs
| Config | Optimizes For | Guarantee |
|---|---|---|
| W=1, R=N | Fast writes | Strong consistency |
| W=N, R=1 | Fast reads | Strong consistency |
| W=2, R=2, N=3 | Balanced | Strong consistency (W+R > N) |
| W=1, R=1 | Maximum speed | Eventual consistency |
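The quorum rule in the table reduces to a one-line check — W + R > N means every read quorum overlaps every write quorum, so a read always sees at least one up-to-date replica:

```python
def consistency_level(w: int, r: int, n: int) -> str:
    """W + R > N forces read and write quorums to overlap → strong consistency."""
    return "strong" if w + r > n else "eventual"

print(consistency_level(2, 2, 3))  # → strong
print(consistency_level(1, 1, 3))  # → eventual
```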
Failure Handling Mechanisms
- Sloppy Quorum + Hinted Handoff: During temp failures, nearby healthy nodes accept writes temporarily and hand data back when the failed node returns.
- Anti-Entropy + Merkle Trees: For permanent failures. Merkle trees let nodes efficiently compare data by hashing subtrees — only sync branches that differ.
- Write Path: Commit log → Memory cache → SSTable flush to disk
- Read Path: Memory cache → Bloom filter check → SSTable lookup
7 Communication Protocols
API Architectural Styles Comparison
| Style | Format | Best For | Key Trait |
|---|---|---|---|
| REST | JSON/XML | Web APIs, CRUD | Resource-oriented, HTTP methods |
| GraphQL | JSON | Complex queries, mobile | Client specifies exact data needed |
| gRPC | Protocol Buffers | Microservices | Binary format, roughly 5x faster than JSON |
| WebSocket | Any | Real-time (chat, games) | Bidirectional, persistent connection |
| SOAP | XML only | Enterprise/legacy | Strict contracts, WS-Security |
| Webhook | JSON | Event notifications | Push-based, eliminates polling |
HTTP Evolution
| Version | Year | Key Improvement |
|---|---|---|
| HTTP/1.0 | 1996 | New TCP connection per request |
| HTTP/1.1 | 1997 | Persistent connections (Keep-Alive), pipelining |
| HTTP/2.0 | 2015 | Multiplexing on single TCP, server push, binary framing |
| HTTP/3.0 | 2022 | QUIC (UDP-based), no head-of-line blocking |
8 API Design & API Gateway
API Design Best Practices
- Use nouns for resources: GET /carts/123, not GET /queryCarts/123
- Use plurals: GET /carts/123
- Versioning: GET /v1/carts/123
- Pagination: GET /carts?pageSize=20&pageToken=abc
- Filtering: GET /items?filter=color:red
- Sorting: GET /items?sort_by=time
- Idempotency: Use request IDs to prevent duplicate operations
- Resource cross-references: GET /carts/123/items/321
API Gateway Functions
An API gateway sits between clients and microservices, handling: parameter validation, auth, rate limiting, service discovery, dynamic routing, protocol conversion, error handling/circuit breaking, logging (ELK stack), and caching (Redis).
5 API Performance Tricks
- Pagination: Break large results into pages
- Async Logging: Buffer logs, flush periodically to reduce I/O
- Caching: Cache frequent queries (Redis/Memcached)
- Payload Compression: gzip reduces data size significantly
- Connection Pooling: Reuse database connections
HTTP Status Codes Cheat Sheet
| Range | Meaning | Common Codes |
|---|---|---|
| 2xx | Success | 200 OK, 201 Created, 204 No Content |
| 3xx | Redirect | 301 Permanent, 302 Temporary |
| 4xx | Client Error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 429 Too Many Requests |
| 5xx | Server Error | 500 Internal, 502 Bad Gateway, 503 Unavailable, 504 Timeout |
9 Load Balancing
Algorithms Comparison
| Type | Algorithm | How It Works | Best For |
|---|---|---|---|
| Static | Round Robin | Sequential rotation | Stateless, equal servers |
| Static | Sticky Round Robin | Same client → same server | Session affinity needed |
| Static | Weighted Round Robin | More traffic to stronger servers | Heterogeneous hardware |
| Static | IP/URL Hash | Hash determines server | Consistent routing |
| Dynamic | Least Connections | Route to least busy | Varying request costs |
| Dynamic | Least Response Time | Route to fastest responder | Latency-sensitive apps |
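A dynamic algorithm such as Least Connections is short to sketch (server names are illustrative; a real balancer would also handle health checks):

```python
class LeastConnectionsBalancer:
    """Dynamic load balancing: route each request to the least-busy server."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # open connections per server

    def acquire(self) -> str:
        server = min(self.active, key=self.active.get)  # least busy wins
        self.active[server] += 1
        return server

    def release(self, server: str):
        self.active[server] -= 1  # request finished

lb = LeastConnectionsBalancer(["web-1", "web-2"])
print(lb.acquire())  # → web-1
print(lb.acquire())  # → web-2
```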
Forward Proxy vs Reverse Proxy
Forward Proxy
Sits between users and internet. Protects clients: bypass restrictions, block content, hide identity.
Reverse Proxy (e.g., Nginx)
Sits between internet and servers. Protects servers: load balancing, DDoS protection, SSL termination, static caching.
10 Caching Strategies
Five Caching Patterns
| Pattern | How It Works | Pros | Cons |
|---|---|---|---|
| Cache-Aside | App checks cache; on miss, reads DB, writes to cache | Simple, resilient to cache failure | Cache miss = 3 trips |
| Read-Through | Cache auto-loads from DB on miss | App logic simplified | First request always slow |
| Write-Around | Writes go to DB, cache updated on read | Avoids caching unread data | Cache miss on recent writes |
| Write-Back | Writes go to cache first; async flush to DB | Very fast writes | Data loss risk if cache fails |
| Write-Through | Writes to cache AND DB synchronously | Strong consistency | Higher write latency |
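Cache-Aside, the most common pattern, fits in a few lines (the dicts stand in for Redis and the database):

```python
cache = {}                              # stands in for Redis
db = {"user:1": {"name": "Ada"}}        # stands in for the database

def get_user(key: str):
    """Cache-aside: check the cache first; on a miss, read the DB and populate."""
    if key in cache:
        return cache[key]               # cache hit
    value = db.get(key)                 # cache miss → read from DB
    if value is not None:
        cache[key] = value              # write the result back to the cache
    return value
```

The "3 trips" cost in the table is visible here: a miss touches the cache, then the DB, then the cache again.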
Where Caching Happens (8 Layers)
- Client/browser cache
- CDN (static content)
- Load balancer cache
- API Gateway cache
- Application-level cache (CPU/RAM/disk)
- Distributed cache (Redis, Memcached)
- Full-text search (Elasticsearch)
- Database-level (buffer pool, WAL, materialized views)
Cache Eviction Policies
LRU (Least Recently Used) is the most common. Also: LFU (Least Frequently Used), FIFO (First In First Out). Choose based on your access patterns.
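An LRU cache is a classic interview warm-up; Python's OrderedDict makes the sketch short:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction: at capacity, the least recently used key is dropped."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
```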
Redis — Why It's Fast
- RAM-based storage (1000x faster than disk)
- I/O multiplexing with single-threaded execution (no lock overhead)
- Efficient data structures: SDS, SkipList, ZipList, HashTable, IntSet
Redis Use Cases by Data Type
| Type | Use Cases |
|---|---|
| String | Session, cache, distributed lock, counter, rate limiter |
| Hash | Shopping cart, user profiles |
| List | Message queue, activity feed |
| Set | Tags, unique visitors |
| Sorted Set | Leaderboards, rankings |
| Bitmap | User retention tracking, feature flags |
11 Databases & Storage
SQL vs NoSQL Decision Guide
| Factor | SQL (Relational) | NoSQL |
|---|---|---|
| Data Model | Structured, schema-enforced | Flexible schema, document/KV/graph |
| Scaling | Vertical (mostly) | Horizontal (built-in) |
| Consistency | ACID guaranteed | Eventual consistency (usually) |
| Best For | Transactions, joins, complex queries | Scale, flexibility, speed |
| Examples | MySQL, PostgreSQL, Oracle | MongoDB, Cassandra, DynamoDB, Redis |
8 Data Structures Powering Databases
| Structure | Type | Used By |
|---|---|---|
| Skip List | In-memory index | Redis |
| Hash Index | In-memory | General key-value lookup |
| SSTable | On-disk, immutable | Component of LSM trees |
| LSM Tree | SkipList + SSTable | Cassandra, RocksDB (high write throughput) |
| B-tree / B+ tree | Disk-based index | MySQL, PostgreSQL (balanced read/write) |
| Inverted Index | Document search | Elasticsearch, Lucene |
| Suffix Tree | String pattern matching | Text search engines |
| R-tree | Multi-dimensional | PostGIS, geospatial queries |
ACID Properties
- Atomicity: Transaction either fully completes or fully rolls back
- Consistency: Database moves from one valid state to another
- Isolation: Concurrent transactions don't interfere with each other
- Durability: Once committed, data survives crashes
Cloud Database Cheat Sheet
| Type | AWS | Azure | GCP |
|---|---|---|---|
| Relational | RDS | SQL Database | Cloud SQL |
| Key-Value | DynamoDB | Cosmos DB | BigTable |
| Document | DocumentDB | Cosmos DB | Firestore |
| In-Memory | ElastiCache | Cache for Redis | Memorystore |
| Object/Blob | S3 | Blob Storage | Cloud Storage |
| Analytics | Redshift | Synapse | BigQuery |
| Graph | Neptune | Cosmos DB | Neo4j (partner) |
12 Message Queues & Kafka
Why Message Queues
Decouple producers from consumers. The producer publishes messages to a queue; consumers process them independently. This enables async processing, absorbs traffic spikes, and lets components fail independently.
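The producer/consumer decoupling can be demonstrated with a plain in-process queue (queue.Queue stands in for Kafka or RabbitMQ; the sentinel-based shutdown is an illustrative convention):

```python
import queue
import threading

q = queue.Queue()          # stands in for Kafka / RabbitMQ
processed = []

def consumer():
    while True:
        msg = q.get()
        if msg is None:            # sentinel: stop consuming
            break
        processed.append(msg.upper())   # pretend this is real work
        q.task_done()

t = threading.Thread(target=consumer)
t.start()
for m in ("order-1", "order-2"):
    q.put(m)                       # producer publishes and moves on immediately
q.put(None)
t.join()
print(processed)  # → ['ORDER-1', 'ORDER-2']
```

The producer never waits on the work itself, which is what lets the system absorb traffic spikes: the queue grows, the consumers drain it at their own pace.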
Kafka Performance Secrets
- Sequential I/O: Kafka writes to disk sequentially (not random access), which is nearly as fast as memory
- Zero-Copy: Data goes directly from disk to network socket without passing through the application layer
13 Design: Rate Limiter
5 Rate Limiting Algorithms
| Algorithm | How It Works | Pros | Cons |
|---|---|---|---|
| Token Bucket | Bucket fills with tokens at fixed rate; each request takes a token | Allows bursts, memory efficient | Tuning bucket size and refill rate |
| Leaking Bucket | Requests queue in FIFO bucket; processed at fixed rate | Smooths output rate | Burst of old requests can fill queue |
| Fixed Window | Count requests in fixed time windows | Simple, memory efficient | Burst at window edges (2x limit) |
| Sliding Window Log | Track timestamps of each request; count in sliding window | Very accurate | High memory (stores all timestamps) |
| Sliding Window Counter | Hybrid: weighted count from current + previous window | Smooth, memory efficient | Approximate (works for 99.97% cases) |
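A single-process Token Bucket is the easiest of the five to sketch (parameters are illustrative; a distributed version would keep this state in Redis):

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at a fixed rate; each request spends one."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity         # max burst size
        self.refill_rate = refill_rate   # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A burst of up to `capacity` requests is admitted immediately; sustained traffic is limited to `refill_rate` requests per second.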
Architecture
Rate limiter middleware sits between client and API servers. Uses Redis for distributed counting (fast, supports INCR and EXPIRE). Returns HTTP 429 with headers: X-Ratelimit-Remaining, X-Ratelimit-Limit, X-Ratelimit-Retry-After.
14 Design: Unique ID Generator
Approaches Compared
| Approach | Pros | Cons | Verdict |
|---|---|---|---|
| Multi-master Auto-increment | Simple, numeric | Doesn't scale across DCs, not time-sortable | Limited use |
| UUID | No coordination, scalable | 128-bit, not sortable, not numeric | Good for distributed |
| Ticket Server | Easy, numeric | SPOF, scaling challenges | Small scale only |
| Snowflake (Twitter) | 64-bit, time-sortable, scalable | Clock sync needed | Recommended |
Twitter Snowflake ID Structure (64 bits)
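The classic layout is 1 sign bit, 41 bits of millisecond timestamp, 5 bits of datacenter ID, 5 bits of machine ID, and 12 bits of sequence number. A minimal bit-packing sketch (the epoch constant is Twitter's published custom epoch; a real generator would also track the clock and sequence counter):

```python
# Snowflake layout: 1 sign bit | 41-bit timestamp (ms) |
# 5-bit datacenter ID | 5-bit machine ID | 12-bit sequence number
EPOCH = 1288834974657  # Twitter's custom epoch (Nov 2010), in milliseconds

def snowflake(timestamp_ms: int, datacenter: int, machine: int, seq: int) -> int:
    """Pack the four fields into one 64-bit, time-sortable integer."""
    return ((timestamp_ms - EPOCH) << 22) | (datacenter << 17) | (machine << 12) | seq
```

Because the timestamp occupies the high bits, IDs sort by creation time regardless of which machine generated them.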
15 Design: URL Shortener
Key Decisions
- Hash length: 7 characters using base-62 gives 62⁷ ≈ 3.5 trillion combinations
- Redirect type: 301 (permanent, browser caches) vs 302 (temporary, better for analytics)
- Generation approach: Base-62 conversion of a unique ID from a Snowflake-like generator
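Base-62 conversion is standard repeated division, using the digits 0-9, a-z, A-Z:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base62(n: int) -> str:
    """Convert a numeric ID (e.g. from a Snowflake generator) to a short string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

print(to_base62(11157))  # → "2TX", so the short URL is e.g. /2TX
```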
16 Design: Web Crawler
Architecture
Key Design Decisions
- BFS over DFS: BFS is standard for web crawlers; DFS can go too deep
- Politeness: Queue URLs by hostname; process same host sequentially with delays
- Deduplication: 29% of web pages are duplicates — detect via content hash
- URL Seen: Bloom filter + hash table to avoid revisiting URLs
- Scale: 1B pages/month = ~400 QPS, peak ~800. 5-year storage: ~30 PB
17 Design: Notification System
Multi-Channel Architecture
Key Design Points
- Reliability: Persist to log DB before sending; retry with exponential backoff
- Deduplication: Use event IDs to prevent duplicate sends
- User Settings: Respect opt-in/opt-out per channel per user
- Rate Limiting: Cap notifications per user to prevent spam
- Templates: Preformatted with customizable parameters
- Analytics: Track open rates, click rates, delivery success
18 Design: News Feed System
Fanout Models — The Critical Decision
| Model | How | Pro | Con |
|---|---|---|---|
| Fan-out on Write (Push) | Pre-compute feed when post is created | Instant reads | Celebrity problem: millions of writes |
| Fan-out on Read (Pull) | Build feed on-demand | No wasted work for inactive users | Slow reads |
| Hybrid (Recommended) | Push for normal users, pull for celebrities | Best of both worlds | More complex |
5-Layer Cache Architecture
- News Feed Cache: Post IDs only (not full objects) per user
- Content Cache: Post data; separate hot cache for viral posts
- Social Graph Cache: Friend/follower relationships
- Action Cache: Likes, comments, shares
- Counter Cache: Like counts, comment counts
19 Design: Chat System
Architecture Overview
- Protocol: WebSocket for real-time bidirectional messaging
- Stateless services for login, signup, profile (behind load balancer)
- Stateful chat service with persistent WebSocket connections
- Message queue (Kafka) for reliability and decoupling
- NoSQL database (Cassandra/MongoDB) for chat storage
- Presence service tracks online/offline via Redis heartbeats
Key Decisions
- Message ordering via IDs + timestamps with per-conversation sequence numbers
- Group messaging uses fan-out write with separate tables per group
- Read receipts tracked by dedicated status service
- Media attachments served via CDN
20 Design: Search Autocomplete
Core Data Structure: Trie (Prefix Tree)
Each node stores a character and a frequency count. To find suggestions, traverse to the prefix node, then find the top-K most frequent completions. Complexity: O(p + n), where p is the prefix length (walking down to the prefix node) and n is the number of nodes in the subtree below it.
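A minimal trie with frequency-ranked suggestions (sample queries and counts are illustrative; production systems precompute and cache the top-K at each node instead of searching the subtree per query):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.freq = 0          # how often this complete word was queried

class Autocomplete:
    """Prefix trie with frequency counts for top-k suggestions (sketch)."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, word: str, freq: int = 1):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.freq += freq

    def suggest(self, prefix: str, k: int = 3):
        node = self.root
        for ch in prefix:                  # O(p): walk down to the prefix node
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        def dfs(n, word):                  # O(n): explore the prefix's subtree
            if n.freq:
                results.append((n.freq, word))
            for ch, child in n.children.items():
                dfs(child, word + ch)
        dfs(node, prefix)
        return [w for _, w in sorted(results, reverse=True)[:k]]
```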
Key Decisions
- Batch updates: Trie updated during off-peak hours, not real-time
- In-memory Trie with serialization for persistence
- Distributed Trie: Split via consistent hashing on prefix ranges
- Cache layer for frequently accessed prefixes
- Analytics pipeline tracks query frequency for ranking
21 Design: YouTube
Video Upload Pipeline
Video Streaming
- Adaptive bitrate streaming using MPEG-DASH or HLS
- Client selects bitrate based on bandwidth and device
- Popular videos cached at CDN edge servers worldwide
- Metadata in SQL; view stats in NoSQL; search via dedicated index
22 Design: Google Drive
Core Architecture
- Block-level storage: Files split into blocks; only changed blocks synced (delta sync)
- Metadata service: File info, permissions, timestamps
- Notification service: Real-time alerts of remote changes
- Versioning: Full history with rollback capability
- Merkle tree: Efficiently detect which blocks changed
Key Challenges & Solutions
| Challenge | Solution |
|---|---|
| Large files | Block-level storage + parallel upload/download |
| Offline support | Local queue; replay on reconnect |
| Concurrent edits | Version control + conflict resolution |
| Real-time sync | Notification service + change propagation |
| Storage efficiency | Block-level deduplication across users |
| Bandwidth | Delta sync (only transfer diffs) |
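Block-level delta sync reduces to comparing per-block hashes (the 4-byte block size is for illustration only; real systems use megabyte-scale blocks):

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; production systems use e.g. 4 MB blocks

def block_hashes(data: bytes):
    """Split a file into fixed-size blocks and hash each one."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(old: bytes, new: bytes):
    """Delta sync: only blocks whose hashes differ need to be uploaded."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or old_h[i] != h]

print(changed_blocks(b"aaaabbbbcccc", b"aaaaXXXXcccc"))  # → [1]
```

Only block 1 changed, so only block 1 is transferred; the same hashes also enable cross-user deduplication, since identical blocks hash identically.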
23 Microservices Architecture
Architecture Components
9 Microservice Best Practices
- Separate data store per service (no shared databases)
- Similar code maturity across services
- Separate build for each service
- Single responsibility per service
- Deploy into containers (Docker)
- Design stateless services
- Domain-driven design for service boundaries
- Micro frontends for UI decomposition
- Orchestrate with Kubernetes
Communication Patterns
| Pattern | When | Example |
|---|---|---|
| REST / gRPC | Synchronous request-response | User service → Auth service |
| Message Queue | Async processing, decoupling | Order → Payment service |
| Pub/Sub | Event broadcasting | Order created → notify all |
| Event Sourcing | Audit trail, replay | Financial transactions |
24 Docker & Kubernetes
Docker (Container Level)
- Packages app + dependencies into containers
- Single OS host
- Dockerfile defines build instructions
- Images stored in registries
- Lightweight, fast startup
Kubernetes (Cluster Level)
- Orchestrates containers across multiple hosts
- Auto-scaling, self-healing, rolling updates
- Service discovery & load balancing built-in
- Declarative configuration (desired state)
Kubernetes Architecture
25 CI/CD Pipelines
Deployment Strategies
- Blue-Green: Two identical environments; switch traffic between them
- Canary: Route small % of traffic to new version; gradually increase
- Rolling: Update instances one by one
Netflix Tech Stack Example
| Phase | Tools |
|---|---|
| Planning | JIRA, Confluence |
| Coding | Java, Python, Scala, JS, Kotlin |
| Build | Gradle |
| Packaging | Amazon Machine Image (AMI) |
| Testing | Chaos engineering tools |
| Deployment | Spinnaker (canary rollout) |
| Monitoring | Atlas, Kayenta |
| Incidents | PagerDuty, Dispatch |
26 Security & Authentication
Authentication Methods
| Method | How It Works | Best For |
|---|---|---|
| Session-Based | Server stores session state; client holds cookie | Traditional web apps |
| JWT | Stateless token (Header.Payload.Signature) | APIs, microservices |
| OAuth 2.0 | Third-party authorization (access + refresh tokens) | Login with Google/GitHub |
Password Storage Best Practices
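The core rules are: never store plaintext, use a unique random salt per user, use a deliberately slow hash (bcrypt, scrypt, Argon2, or PBKDF2), and compare digests in constant time. A standard-library sketch (PBKDF2 stands in for the others here; the iteration count is an illustrative choice):

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes = None):
    """Salted, deliberately slow hash. Returns (salt, digest) to store."""
    salt = salt or os.urandom(16)   # unique random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison
```

The salt defeats precomputed rainbow tables, and the high iteration count makes brute-forcing a leaked database expensive.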
27 Design Patterns
Architecture Patterns
| Pattern | Flow | Best For |
|---|---|---|
| MVC | View ↔ Controller ↔ Model | Web apps |
| MVP | View ↔ Presenter ↔ Model | More testable than MVC |
| MVVM | View ↔ ViewModel (data binding) ↔ Model | Reactive UIs |
| VIPER | View-Interactor-Presenter-Entity-Router | iOS apps |
18 Gang of Four Patterns
Creational
- Singleton: One instance only
- Factory: Creates objects without specifying class
- Builder: Step-by-step construction
- Prototype: Clone existing objects
Structural
- Adapter: Bridge incompatible interfaces
- Decorator: Add behavior dynamically
- Facade: Simple interface to complex subsystem
- Proxy: Control access to an object
- Composite: Tree structures
- Bridge: Separate abstraction from implementation
Behavioral
- Observer: Notify subscribers of changes
- Strategy: Swap algorithms at runtime
- Chain of Responsibility: Pass request through handler chain
- Command: Encapsulate request as object
- Iterator: Sequential access to collection
- Mediator: Centralize communication
- Memento: Capture and restore state (undo)
- Visitor: Add operations without changing classes
28 Cloud Services
Service Models
| Model | You Manage | Provider Manages | Example |
|---|---|---|---|
| IaaS | OS, runtime, app, data | Servers, storage, networking | AWS EC2, Azure VMs |
| PaaS | App code and data | Everything else | Heroku, App Engine |
| SaaS | Just use it | Everything | Gmail, Salesforce |
Cloud-Native Principles
- Microservices-based design with independent deployment
- Container orchestration (Kubernetes)
- Service mesh for inter-service communication
- Serverless for event-driven workloads
- Resilience patterns: circuit breaker, retry with backoff, timeout
29 Payment Systems
Credit Card Transaction Flow
Two-Phase Processing
- Authorization: Real-time approval when card is swiped (funds held)
- Capture & Settlement: Batch processing — merchant captures, network clears, funds transfer
System Design Interview Checklist
| Always Discuss | Details |
|---|---|
| Scale | DAU, QPS, peak traffic, data volume, growth rate |
| Storage | SQL vs NoSQL, schema design, sharding strategy |
| Caching | What to cache, eviction policy, strategy pattern |
| Networking | Load balancing, CDN, DNS, API gateway |
| Reliability | Replication, failover, retry with backoff, circuit breaker |
| Consistency | Strong vs eventual, CAP trade-offs, conflict resolution |
| Monitoring | Metrics, logging, alerting, dashboards |
Common Building Blocks
| Component | Purpose | Technologies |
|---|---|---|
| Load Balancer | Distribute traffic | Nginx, HAProxy, AWS ALB |
| Cache | Speed up reads | Redis, Memcached |
| Message Queue | Async processing | Kafka, RabbitMQ, SQS |
| CDN | Static content | CloudFront, Cloudflare, Akamai |
| Search | Full-text search | Elasticsearch, Solr |
| Object Storage | Files, images, videos | S3, GCS, Azure Blob |
| API Gateway | Entry point, routing | Kong, AWS API Gateway |
| Service Discovery | Find services | Consul, etcd, ZooKeeper |
Study guide generated April 2026