# System Design

# Framework for the interview

  1. Clarify the problem and define the scope
  2. Propose a high-level design and get buy-in
    1. Come up with a blueprint and ask for feedback
    2. Draw diagrams
    3. Do back-of-the-envelope calculations to validate the blueprint
  3. Deep dive into the design
  4. Wrap up by discussing potential issues
    1. Traffic bottlenecks
    2. Security holes
    3. Single points of failure

# Design a rate limiter

Benefits of a rate limiter

  1. Prevents DoS (denial of service) attacks.
  2. Reduces cost by cutting unnecessary calls.
  3. Prevents servers from being overloaded.

# Clarification

  1. What type of rate limiter is this, client side or server side? Server side.
  2. What property does the rate limiter throttle on, IP/user/etc.? It should be a general rate limiter.
  3. Is the rate limiter a standalone service or embedded into application code? Up to you.

Summary

This is a rate limiter that

  1. Limits excessive requests
  2. Introduces little latency to HTTP traffic
  3. Works across distributed servers
  4. Does not take the service down if the limiter itself fails

# Algorithms

  1. Token bucket (see the sketch after this list)
    • Tokens are refilled at a fixed rate; each request consumes one token.
    • It takes two parameters: bucket size and refill rate.
    • Bursts are allowed as long as there are tokens in the bucket.
  2. Leaky bucket
    • Requests join a FIFO queue and are processed at a fixed outflow rate; when the queue is full, new requests are dropped.
    • Bursts are absorbed by the queue but still processed at the fixed rate.
  3. Fixed window counter
    • Divide time into fixed windows, e.g. 1 minute each. Count requests per window and drop any request beyond the limit. Reset the count when the next window starts.
    • A burst of up to twice the limit can get through if requests cluster at the end of one window and the beginning of the next.
  4. Sliding window log
    • Log the timestamp of every request. When a request arrives, discard timestamps that fall outside the current window; if the count in the window exceeds the limit, reject the request (its timestamp is still logged).
    • It consumes a lot of memory because all timestamps are recorded.
  5. Sliding window counter
    • A hybrid of the fixed window counter and the sliding window log: estimate the rolling count as the current window's count plus the previous window's count weighted by how much of the previous window overlaps the sliding window.
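A minimal token bucket sketch in Python, to make the first algorithm concrete. The class and parameter names are illustrative, not from any particular library:

```python
import time

class TokenBucket:
    """Token bucket: at most `capacity` tokens, refilled at `refill_rate` tokens/sec."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)      # start full so bursts are allowed immediately
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        """Return True if the request may proceed, consuming one token."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Usage: allow bursts of up to 10 requests, sustained rate of 5 requests/sec.
limiter = TokenBucket(capacity=10, refill_rate=5)
if limiter.allow():
    pass  # handle the request
else:
    pass  # return HTTP 429 Too Many Requests
```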

# Deep Dive

  1. How do we guarantee consistency and avoid race conditions?
    • One option is for all rate limiter instances to read and write counters in the same Redis service, using atomic operations (see the sketch after this list).
  2. Architecture
    • A Redis service stores the per-client request counters.
    • A configuration service sets up the rate limiting rules. The rules are stored in a database, and the rate limiter periodically reads the database and updates the rules in memory.
    • In front of the rate limiter, a request identification service distinguishes different users.
    • When the limit is reached there are two options: drop the request, or put it on a message queue to handle later.
    • To make sure different servers share the same state, we can use a distributed cache such as Redis.
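A minimal sketch of the shared-Redis idea, assuming the `redis-py` client and a fixed window counter; the key format, window length, and limit are illustrative:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

WINDOW_SECONDS = 60
LIMIT = 100  # max requests per client per window

def allow(client_id: str) -> bool:
    """Fixed window counter in Redis. INCR is atomic, so concurrent
    rate limiter instances sharing this Redis never race on the count."""
    key = f"rate:{client_id}"
    pipe = r.pipeline()
    pipe.incr(key)                    # atomically bump this client's counter
    pipe.expire(key, WINDOW_SECONDS)  # (re)arm the window's TTL
    count, _ = pipe.execute()
    return count <= LIMIT
```

Note that re-arming the TTL on every request stretches the window slightly; a strict fixed window would set the TTL only on the first increment, e.g. inside a Lua script.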


# Consistent hashing

When you have multiple cache servers, each server stores a subset of the keys. Find a scheme that maps keys to servers consistently, even as servers are added or removed.

By definition, consistent hashing means that when a hash table with k keys is remapped from N slots to N+1 slots, only about k/N keys have to be remapped on average.

# Algorithm

To achieve consistent hashing, place the servers evenly around a hash ring. A key is hashed onto a position on the ring, and the first server encountered moving clockwise from that position stores the key.

A problem with this basic approach is that keys may be distributed unevenly, leaving some servers with far more keys than others. An improved version addresses this with virtual nodes: each server is mapped to N positions spread around the ring, as in the sketch below.

Without virtual nodes, one heuristic when adding a new server is to insert it into the largest gap between two existing servers.
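A minimal Python sketch of a hash ring with virtual nodes; the server names, virtual-node count, and the choice of MD5 are all illustrative:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Each physical server appears on the ring as `vnodes` virtual nodes."""

    def __init__(self, servers: list[str], vnodes: int = 100):
        self.ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )

    def get_server(self, key: str) -> str:
        """The first virtual node clockwise from the key's position owns the key."""
        h = _hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)  # wrap around the ring
        return self.ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_server("user:42"))  # one of the three servers, stable across calls
```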

# Key-Value store

# Single machine

Storing a k-v store on a single machine is easy: it's just a hash map. We can even keep frequently accessed data in memory and the rest on disk.
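A minimal sketch of the memory-plus-disk idea, assuming Python's `shelve` module as the disk layer; a real design would bound the in-memory cache (e.g. with an LRU policy):

```python
import shelve

class KVStore:
    """Hot keys live in an in-memory dict; all keys are persisted on disk."""

    def __init__(self, path: str = "kv.db"):
        self.hot: dict[str, str] = {}   # in-memory cache for frequent keys
        self.disk = shelve.open(path)   # on-disk hash map

    def put(self, key: str, value: str) -> None:
        self.hot[key] = value
        self.disk[key] = value

    def get(self, key: str) -> str | None:
        if key in self.hot:             # fast path: memory
            return self.hot[key]
        value = self.disk.get(key)      # slow path: disk
        if value is not None:
            self.hot[key] = value       # promote to memory
        return value
```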

# Distributed system

CAP Theorem

  1. Consistency means every node in the distributed system sees the same data at the same time.
  2. Availability means the system keeps working even when some nodes are down.
  3. Partition tolerance means the system keeps working even when the connection between nodes is broken.

We can use consistent hashing with virtual nodes, and write each piece of data to the next N replicas on the ring.

We can also design parameters such as read and write quorum.

  1. N is the number of replicas that store the data.
  2. R is the read quorum: a read must wait for responses from at least R replicas.
  3. W is the write quorum: a write must wait until at least W replicas ACK it.

When R = 1, the system is tuned for fast reads; when W = 1, for fast writes. R + W > N guarantees strong consistency, because every read quorum must overlap every write quorum in at least one replica. R + W ≤ N means consistency is not guaranteed. For example, N = 3, R = 2, W = 2 stays consistent while tolerating one unavailable replica.
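A quick way to convince yourself of the overlap argument; this snippet (illustrative, not from any library) enumerates every possible write set and read set for N = 3, W = 2, R = 2 and checks that they always intersect:

```python
from itertools import combinations

N, W, R = 3, 2, 2
replicas = set(range(N))

# Every write lands on some W replicas; every read contacts some R replicas.
# R + W > N forces at least one replica to appear in both sets,
# so the read always sees the latest acknowledged write.
for write_set in combinations(replicas, W):
    for read_set in combinations(replicas, R):
        assert set(write_set) & set(read_set), "quorums failed to overlap"
print("every read quorum overlaps every write quorum")
```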

Consistency Models

  1. Strong consistency means a read never returns stale data.
  2. Weak consistency means a read may return stale data.
  3. Eventual consistency is a special form of weak consistency: given enough time, every replica converges and no data stays stale.

Conflict Handling

What if one node has been written data A and another node has been written data B? We can use data versioning to resolve conflicts. Each write is accompanied by a [server, version] pair, such as [Server A, version 1], meaning this is Server A's first write of the data. A deterministic rule (e.g. a server priority list) then resolves conflicts between concurrent versions.
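A minimal sketch of the versioning idea, assuming vector clocks, which is what accumulating [server, version] pairs amounts to; the function names are illustrative:

```python
def bump(clock: dict[str, int], server: str) -> dict[str, int]:
    """Record a write by `server`: increment its entry in the vector clock."""
    clock = dict(clock)
    clock[server] = clock.get(server, 0) + 1
    return clock

def descends(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if version `a` has seen every write contained in version `b`."""
    return all(a.get(s, 0) >= v for s, v in b.items())

v1 = bump({}, "A")    # {A: 1}        written on Server A
v2 = bump(v1, "B")    # {A: 1, B: 1}  updated on Server B
v3 = bump(v1, "C")    # {A: 1, C: 1}  concurrent update on Server C

print(descends(v2, v1))                    # True: v2 supersedes v1, keep v2
print(descends(v2, v3), descends(v3, v2))  # False, False: conflict, apply the rule
```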

# Google's load balancer

These notes are based on Maglev, Google's software network load balancer.

# Requirements

  1. Route each network packet to its corresponding service and distribute packets evenly across that service's endpoints.
  2. Provide large processing capacity even with small packets.
  3. Use consistent hashing and connection tracking so that existing connections keep mapping to the same endpoint upon failures.