Let's design a system for URL shortening, like TinyURL. Assume a large-scale service. Please consider the following aspects in your design:
Functional Requirements:
Non-Functional Requirements:
Capacity Estimation:
Design Considerations:
Walk me through your design, explaining your choices and the trade-offs you considered. Provide diagrams where necessary to illustrate the architecture.
The system consists of:
[Diagram of the high-level architecture: Client <-> URL Shortening Service <-> Data Store, Client <-> URL Redirection Service <-> Data Store]
Using a relational database.
URLs Table:
Field | Type | Description |
---|---|---|
id | BIGINT | Primary key, auto-increment |
short_url | VARCHAR(8) | Shortened URL identifier |
original_url | VARCHAR(2048) | Original URL |
created_at | TIMESTAMP | Timestamp of URL shortening |
expires_at | TIMESTAMP | Optional expiration timestamp |
Indexes:
short_url
for fast lookups.created_at
for archival purposes./shorten
{
"long_url": "https://www.example.com/very/long/url"
}
{
"short_url": "http://short.url/xyz123"
}
/xyz123
(where xyz123
is the short URL)Base 62 Encoding: Use digits (0-9), lowercase letters (a-z), and uppercase letters (A-Z) for a total of 62 characters. A 6-character short link can represent 62^6 possible URLs.
ID Generation: Use an auto-incrementing ID from the database. Convert the ID to base 62.
Example: ID = 12345 -> short_url = "dnh" (base 62 encoded)
Collision Handling:
short_url
column in the database.id
or short_url
to distribute the load.Component | Approach | Pros | Cons | Alternatives | Pros | Cons |
---|---|---|---|---|---|---|
Short Link Generation | Base62 Encoding of ID | Simple, easy to implement, guaranteed uniqueness (with collision handling) | Requires database access for ID generation | UUID Generation | No database access for short link generation | Higher chance of collisions, longer short URLs |
Data Store | Relational Database (MySQL) | Strong consistency, supports complex queries, mature technology | Can be a bottleneck for high read/write workloads, requires schema management | NoSQL Database (e.g., Cassandra) | High scalability, better suited for write-heavy workloads | Eventual consistency, more complex data modeling |
Caching | Redis | Fast, in-memory data store, supports various data structures | Data loss in case of failure, requires careful cache invalidation strategy | Memcached | Simpler than Redis | Fewer features than Redis |
Redirection | HTTP 302 Redirects | Simple, standard, supported by all browsers | Adds an extra hop, slightly higher latency | HTTP 301 Redirects | Cached by browsers, lower latency for subsequent requests | Not suitable if the mapping changes frequently |
Instead of encoding an auto-incrementing ID, we can use UUIDs (Universally Unique Identifiers). We can either use the UUID directly (which is very long) or hash the UUID and take the first 6-8 characters.
Pros:
Cons:
Use Bloom filters to check if a short URL exists before querying the database. This can reduce the number of database lookups.
Pros:
Cons:
short.url/mybrand
).