# System Design
# Framework for the interview
- Clarify the problem and define the scope
- Propose a high-level design and get buy-in
- Come up with a blueprint and ask for feedback
- Draw diagrams
- Do back-of-the-envelope calculations to validate the blueprint
- Deep dive into the design
- Wrap up
- Traffic bottleneck
- Security hole
- Single point of failure
# Design a rate limiter
Benefits of a rate limiter
- Prevents DoS (denial of service) attacks.
- Reduces cost from unnecessary calls.
- Prevents servers from being overloaded.
# Clarification
- What type of rate limiter is this, client side or server side? Server side.
- What property does the rate limiter throttle on, IP/User/etc? It should be a general rate limiter
- Is the rate limiter a service or embedded into code? Up to you.
Summary
This is a rate limiter which
- Limits excessive requests
- Introduces little latency to HTTP traffic
- Works across distributed servers
- Does not impact the service if it fails
# Algorithms
- Token bucket
- Tokens are refilled at certain intervals, each request consumes a token.
- It takes two params: bucket size and refill rate.
- Bursts are allowed as long as there are tokens in the bucket (see the sketch after this list).
- Leaking bucket
- Like the token bucket, but requests go into a FIFO queue and are processed at a fixed rate.
- Bursts are accepted into the queue (until it is full) but are still processed at the fixed rate.
- Fixed window counter
- At a certain interval, e.g. every 1 min, count how many requests are sent and drop any requests that exceed the limit. Reset the count when the next window starts.
- Traffic can exceed the limit if a burst of requests arrives at the end of one window and the beginning of the next.
- Sliding window log
- Keep a log of request timestamps. If the number of timestamps within the sliding window exceeds the limit, reject the request but still record its timestamp.
- It consumes a lot of memory because all the timestamps are recorded.
- Sliding window counter
- A hybrid of the fixed window counter and the sliding window log: approximate the request count in the sliding window by weighting the previous window's counter by its overlap with the current window.
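As a concrete illustration of the first algorithm, here is a minimal single-process token bucket sketch (class and parameter names are illustrative, not from any particular library):

```python
import time


class TokenBucket:
    """Minimal token bucket: at most `capacity` tokens, refilled at `refill_rate` tokens/sec."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False  # bucket empty: throttle this request


# Example: bursts of up to 5 requests, refilled at 10 tokens per second.
limiter = TokenBucket(capacity=5, refill_rate=10)
print(limiter.allow())  # True while tokens remain
```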
# Deep Dive
- How do we guarantee consistency and avoid race conditions?
- One option is to have all rate limiter instances read and write to the same Redis service.
- Architecture
- There should be a Redis service for storing the request counters.
- There should be a configuration service to set up the rate limiting rules. The rules can be stored in a database, and the rate limiter periodically reads the database and updates the rules in memory.
- Before the rate limiter, there should also be a request user identification service to distinguish different users.
- There are two options when limit is reached, either drop the request or store the request in a message queue to handle later.
- To make sure different servers share the same information, we can use a distributed cache system such as Redis (a sketch follows below).
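To make the shared-Redis idea concrete, here is a minimal fixed window counter using the redis-py client; the key naming scheme, limit, and connection settings are assumptions for illustration:

```python
import time

import redis  # assumes the redis-py client is available

r = redis.Redis(host="localhost", port=6379)


def is_allowed(user_id, limit=100, window_seconds=60):
    """Fixed window counter shared by all rate limiter instances via Redis."""
    window = int(time.time()) // window_seconds
    key = f"rate:{user_id}:{window}"  # one counter per user per window
    count = r.incr(key)               # atomic increment, shared across servers
    if count == 1:
        r.expire(key, window_seconds) # old windows expire automatically
    return count <= limit


# Example: drop the request (or push it to a message queue) when this returns False.
print(is_allowed("user-42"))
```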
# Consistent hashing
When you have multiple cache servers, each cache server stores certain keys. Find a solution such that the keys are consistently mapped to servers.
By definition, consistent hashing means that when a hash table with k keys is remapped from N slots to N+1 slots, only about k/N keys have to be remapped.
# Algorithm
To achieve consistent hashing, create a ring and distribute the servers evenly on the ring. A key is also hashed onto the ring, and the first server encountered in the clockwise direction is the server that stores this key.
A problem with this approach is that keys may be unevenly distributed, so one server may end up holding far more keys than others. An improved version addresses this with virtual nodes: each server is split into N points spread around the ring.
When inserting a new server, insert it into the largest gap between two existing servers (see the sketch below).
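A minimal sketch of a hash ring with virtual nodes, assuming MD5 as the position hash (any uniform hash works; names are illustrative):

```python
import bisect
import hashlib


def _position(key):
    # Map a string to a position on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent hash ring; `vnodes` virtual nodes per server smooth out the key distribution."""

    def __init__(self, servers, vnodes=100):
        self._ring = sorted((_position(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self._positions = [pos for pos, _ in self._ring]

    def get_server(self, key):
        # First server clockwise from the key's position (wrap around past the end).
        idx = bisect.bisect(self._positions, _position(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_server("user:42"))  # the same key always maps to the same server
```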
# Key-Value store
# Single machine
Storing a k-v store on a single machine is easy: it's just a hash map. We can even keep frequently accessed data in memory and the rest on disk.
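A minimal single-machine sketch, assuming Python's standard `shelve` module for the on-disk part and a plain dict as the in-memory cache (no eviction policy, for brevity):

```python
import shelve


class SimpleKVStore:
    """Hot keys live in an in-memory dict; everything is persisted to disk via shelve."""

    def __init__(self, path="kv.db"):
        self._disk = shelve.open(path)
        self._cache = {}

    def put(self, key, value):
        self._cache[key] = value
        self._disk[key] = value  # write-through to disk

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._disk.get(key)
        if value is not None:
            self._cache[key] = value  # promote to the in-memory cache
        return value


store = SimpleKVStore()
store.put("user:42", "Alice")
print(store.get("user:42"))
```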
# Distributed system
CAP Theorem
- Consistency means every node in the distributed system sees the same data.
- Availability means that even if some nodes are down, the system still responds.
- Partition tolerance means that even if connections between nodes are broken, the system still works.
We could use consistent hashing with virtual nodes and make sure every piece of data is written to the next N replicas on the ring.
We can also tune parameters such as the read and write quorums.
- N is the number of replicas to store the data.
- R is the read quorum. A read must wait for responses from at least R replicas.
- W is the write quorum. A write must wait for at least W replicas to ACK the write.
When R = 1, the system is in fast-read mode; when W = 1, it is in fast-write mode. R + W > N guarantees strong consistency because the read and write replica sets must overlap; R + W <= N means consistency is not guaranteed (see the small example below).
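A tiny helper to make the quorum arithmetic concrete (a sketch; the function name is made up):

```python
def quorum_mode(n, r, w):
    """Classify an (N, R, W) configuration following the rules above."""
    if r + w > n:
        return "strongly consistent: read and write replica sets always overlap"
    return "consistency not guaranteed: reads may return stale data"


# With N = 3 replicas: R = 2, W = 2 overlaps on at least one replica.
print(quorum_mode(3, 2, 2))  # strongly consistent
print(quorum_mode(3, 1, 1))  # fast read and fast write, but possibly stale
```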
Consistency Models
- Strong consistency means a read never returns stale data.
- Weak consistency means a read may return stale data.
- Eventual consistency is a special form of weak consistency: given enough time, all replicas converge and no data stays stale.
Conflict Handling
What if one node has been written data A and another node has been written data B? We can use data versioning (i.e. give each server a priority list) to resolve conflicts. Each write is accompanied by a [server, version] pair, such as [Server A, version 1], meaning this is the first time Server A wrote the data. Then create a rule to resolve conflicts when two versions cannot be ordered (see the sketch below).
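A sketch of the [server, version] idea using version vectors; a conflict exists only when neither version descends from the other (the dict representation and names are assumptions for illustration):

```python
def descends_from(a, b):
    """True if version a already includes every write recorded in version b (no conflict)."""
    return all(a.get(server, 0) >= version for server, version in b.items())


def merge_versions(a, b):
    """Merge two version vectors by taking the max version seen per server."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in set(a) | set(b)}


v1 = {"ServerA": 1}                 # [Server A, version 1]
v2 = {"ServerA": 1, "ServerB": 1}   # Server B wrote on top of Server A's version
print(descends_from(v2, v1))        # True: v2 supersedes v1, no conflict
print(descends_from(v1, v2))        # False: if both directions were False, a rule
                                    # (e.g. server priority) would pick the winner
print(merge_versions(v1, v2))       # {'ServerA': 1, 'ServerB': 1}
```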
# Google's load balancer
This design is based on Google's Maglev load balancer.
# Requirements
- Connect network packets to their corresponding services and distribute them evenly across the service endpoints.
- Provide large processing capability even with small packets.
- Use consistent hashing and connection tracking so that connections can be recovered upon failures (a sketch of the Maglev hashing table follows below).
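A simplified sketch of Maglev-style lookup-table population (not the production implementation; hash choices and names are illustrative). Each backend derives a preference order over table slots from an (offset, skip) pair, and backends take turns claiming slots until the table is full:

```python
import hashlib


def _h(name, salt):
    # Stand-in hash; Maglev uses two different hash functions for offset and skip.
    return int(hashlib.md5(f"{salt}:{name}".encode()).hexdigest(), 16)


def build_maglev_table(backends, table_size=65537):
    """Populate the lookup table; table_size should be prime so every permutation covers all slots."""
    offsets = [_h(b, "offset") % table_size for b in backends]
    skips = [_h(b, "skip") % (table_size - 1) + 1 for b in backends]
    table = [-1] * table_size          # slot -> backend index
    next_pref = [0] * len(backends)    # how far each backend has walked its preference list
    filled = 0
    while True:
        for i in range(len(backends)):
            # Find backend i's next preferred slot that is still empty.
            slot = (offsets[i] + next_pref[i] * skips[i]) % table_size
            while table[slot] >= 0:
                next_pref[i] += 1
                slot = (offsets[i] + next_pref[i] * skips[i]) % table_size
            table[slot] = i
            next_pref[i] += 1
            filled += 1
            if filled == table_size:
                return table


# A packet's 5-tuple hash modulo table_size picks a backend; when one backend is
# removed and the table is rebuilt, only a small fraction of slots change owner.
table = build_maglev_table(["backend-1", "backend-2", "backend-3"])
```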