How would you design a URL shortening system like TinyURL?

Medium
2 years ago

Let's design a system for URL shortening, like TinyURL. Assume a large-scale service. Please consider the following aspects in your design:

  1. Functional Requirements:

    • The system should generate shorter URLs (a.k.a. "short links") given longer URLs.
    • Users should be able to navigate to the original URL by entering the short link in their browser.
    • The system should be highly available.
  2. Non-Functional Requirements:

    • The system should have low latency for both URL shortening and redirection.
    • Short links should be relatively short (e.g., 6-8 characters).
    • The system should be able to handle a large number of URL shortening and redirection requests.
  3. Capacity Estimation:

    • Assume 100 million URLs are shortened per day.
    • Assume 1 billion URL redirections per day.
    • Estimate storage requirements for 5 years.
  4. Design Considerations:

    • How would you generate unique short links?
    • What data structures and databases would you use to store the URL mappings?
    • How would you handle collisions when generating short links?
    • How would you optimize for low latency redirection?
    • How would you scale the system to handle the expected traffic?

Walk me through your design, explaining your choices and the trade-offs you considered. Provide diagrams where necessary to illustrate the architecture.

Sample Answer

URL Shortening System Design

1. Requirements

Functional Requirements:

  • Generate shorter URLs (short links) given longer URLs.
  • Users navigate to the original URL using the short link.
  • High availability.

Non-Functional Requirements:

  • Low latency for shortening and redirection.
  • Short links should be short (6-8 characters).
  • Handle a large number of requests.

Capacity Estimation:

  • 100 million URLs shortened per day.
  • 1 billion URL redirections per day.
  • Storage for 5 years.

2. High-Level Design

The system consists of:

  1. URL Shortening Service: Accepts long URLs and generates short URLs.
  2. URL Redirection Service: Resolves short URLs to their original URLs.
  3. Data Store: Stores the mappings between short and long URLs.
[Diagram of the high-level architecture: Client <-> URL Shortening Service <-> Data Store, Client <-> URL Redirection Service <-> Data Store]

3. Data Model

Using a relational database.

URLs Table:

FieldTypeDescription
idBIGINTPrimary key, auto-increment
short_urlVARCHAR(8)Shortened URL identifier
original_urlVARCHAR(2048)Original URL
created_atTIMESTAMPTimestamp of URL shortening
expires_atTIMESTAMPOptional expiration timestamp

Indexes:

  • Index on short_url for fast lookups.
  • Index on created_at for archival purposes.

4. Endpoints

URL Shortening Endpoint:

  • Endpoint: /shorten
  • Method: POST
  • Request:
    {
      "long_url": "https://www.example.com/very/long/url"
    }
    
  • Response:
    {
      "short_url": "http://short.url/xyz123"
    }
    

URL Redirection Endpoint:

  • Endpoint: /xyz123 (where xyz123 is the short URL)
  • Method: GET
  • Response: HTTP 302 Redirect to the original URL.

5. Design Considerations

Generating Unique Short Links:

  1. Base 62 Encoding: Use digits (0-9), lowercase letters (a-z), and uppercase letters (A-Z) for a total of 62 characters. A 6-character short link can represent 62^6 possible URLs.

  2. ID Generation: Use an auto-incrementing ID from the database. Convert the ID to base 62.

    Example: ID = 12345 -> short_url = "dnh" (base 62 encoded)

  3. Collision Handling:

    • Before inserting a new short URL, check if it already exists in the database. If it does, generate a new ID and try again. To mitigate repeated collisions, implement a retry mechanism with a limit.

Data Structures and Databases:

  • Database: Relational database (e.g., MySQL, PostgreSQL) for durability and consistency.
  • Cache: Use a caching layer (e.g., Redis, Memcached) in front of the database to reduce latency for frequently accessed URLs.

Optimizing for Low Latency Redirection:

  1. Caching:
    • Cache short-to-long URL mappings in a distributed cache.
    • Use a CDN (Content Delivery Network) to cache redirection responses closer to the users.
  2. Database Optimization:
    • Index the short_url column in the database.
    • Use database connection pooling.
  3. Load Balancing:
    • Distribute traffic across multiple redirection servers using a load balancer.

Scaling the System:

  1. Horizontal Scaling: Add more servers to the URL Shortening and Redirection services.
  2. Database Sharding: Shard the database based on the id or short_url to distribute the load.
  3. Replication: Use database replication for read scalability and fault tolerance.

6. Trade-offs

ComponentApproachProsConsAlternativesProsCons
Short Link GenerationBase62 Encoding of IDSimple, easy to implement, guaranteed uniqueness (with collision handling)Requires database access for ID generationUUID GenerationNo database access for short link generationHigher chance of collisions, longer short URLs
Data StoreRelational Database (MySQL)Strong consistency, supports complex queries, mature technologyCan be a bottleneck for high read/write workloads, requires schema managementNoSQL Database (e.g., Cassandra)High scalability, better suited for write-heavy workloadsEventual consistency, more complex data modeling
CachingRedisFast, in-memory data store, supports various data structuresData loss in case of failure, requires careful cache invalidation strategyMemcachedSimpler than RedisFewer features than Redis
RedirectionHTTP 302 RedirectsSimple, standard, supported by all browsersAdds an extra hop, slightly higher latencyHTTP 301 RedirectsCached by browsers, lower latency for subsequent requestsNot suitable if the mapping changes frequently

7. Other Approaches

Approach 1: Using UUIDs

Instead of encoding an auto-incrementing ID, we can use UUIDs (Universally Unique Identifiers). We can either use the UUID directly (which is very long) or hash the UUID and take the first 6-8 characters.

Pros:

  • No need for a database to generate IDs.
  • Can generate short links independently.

Cons:

  • Higher chance of collisions (though still very low).
  • Longer short URLs if using the full UUID.

Approach 2: Bloom Filters

Use Bloom filters to check if a short URL exists before querying the database. This can reduce the number of database lookups.

Pros:

  • Reduces database load.

Cons:

  • Bloom filters can have false positives (but no false negatives).
  • Requires extra memory to store the Bloom filter.

8. Edge Cases

  1. High Collision Rate: Implement a retry mechanism with exponential backoff when generating short URLs. If the collision rate is consistently high, consider increasing the length of the short URLs.
  2. Malicious URLs: Implement a URL filtering system to prevent shortening of malicious URLs. Use a third-party service or maintain a blacklist of known malicious domains.
  3. URL Expiration: Allow users to set an expiration date for short URLs. Implement a background process to delete expired URLs from the database.
  4. Spam/Abuse: Implement rate limiting to prevent abuse of the URL shortening service. Monitor usage patterns and block suspicious activity.
  5. Database Downtime: Ensure high availability of the database using replication and failover mechanisms. Use caching to serve requests even during database downtime.

9. Future Considerations

  1. Custom Short URLs: Allow users to create custom short URLs (e.g., short.url/mybrand).
  2. Analytics: Track the number of clicks on each short URL, geographic location of users, and other relevant metrics.
  3. API Usage: Provide an API for developers to programmatically shorten URLs.
  4. Mobile Apps: Develop mobile apps for URL shortening and redirection.
  5. Link Preview Generation: Generate link previews with titles, descriptions, and images when sharing short URLs on social media.
  6. Support for different encoding schemes: Explore other encoding schemes such as base36 or variable length encoding to potentially shorten the URL further.