Design a URL Shortener

A URL shortener takes a long URL and converts it into a short, unique identifier that redirects to the original link. Services like Bitly and TinyURL serve such redirects at very large scale. On the surface, the problem seems trivial, but designing a system that scales to millions of users, responds with minimal latency, and remains highly available is a classic system design interview question. It tests your understanding of distributed systems, database design, and caching.

This article walks through the complete design process, from requirements to a production-ready architecture.

Step 1: Requirements Clarification

Before drawing any architecture diagrams, clarify what the system should do and under what constraints.

Functional Requirements

  • Shorten a long URL – Users provide a long URL and receive a short, unique identifier.
  • Redirect to the original URL – When anyone accesses the short URL, they are redirected to the original long URL.
  • Track click analytics (optional) – Count clicks, capture referrers, and record geographic data.
  • Custom alias (optional) – Allow users to specify their own short identifier.
  • Expiration (optional) – Links may expire after a configurable time.

Non-Functional Requirements

  • High availability – Links must resolve. If the service is down, every shortened link is broken.
  • Low latency redirects – Redirection should feel instant. Target under 50 milliseconds.
  • Scalability – The system must store billions of URLs and scale horizontally so that redirect throughput can grow well beyond the average load.
  • Fault tolerance – Failures should not break existing links.
  • Unpredictable traffic – A single shortened URL can go viral and receive a sudden spike in traffic.

These constraints shape every decision that follows.

Step 2: Capacity Estimation

Back-of-the-envelope calculations help ground the design in realistic numbers.

Assume the system handles 100 million new URLs per month. This translates to roughly 40 writes per second on average, with peaks potentially much higher.

Redirects are the heavy load. If the average shortened URL is accessed ten times per month, redirect throughput is 1 billion reads per month, or approximately 400 redirects per second. That is a 10:1 read-to-write ratio under these assumptions; in practice, the ratio can exceed 100:1 for popular links.

Storage: Each URL mapping might require 500 bytes (short URL, long URL, metadata). For 100 million new URLs per month, that is 50 GB of new data per month, or 600 GB per year. A five-year retention would require about 3 TB, easily handled by modern databases.

Cache estimate: Following the 80-20 rule, 20% of URLs generate 80% of traffic. Caching these hot URLs dramatically reduces database load.
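
The arithmetic above can be reproduced in a few lines. This is a rough sketch using the figures stated in this step; the 30-day month and the choice to size the cache as 20% of one day's redirects are assumptions made for illustration, not numbers from a real deployment.

```python
# Back-of-the-envelope numbers from Step 2.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month

new_urls_per_month = 100_000_000
reads_per_url_per_month = 10
bytes_per_mapping = 500

writes_per_second = new_urls_per_month / SECONDS_PER_MONTH
reads_per_month = new_urls_per_month * reads_per_url_per_month
reads_per_second = reads_per_month / SECONDS_PER_MONTH

storage_per_month_gb = new_urls_per_month * bytes_per_mapping / 1e9
storage_five_years_tb = storage_per_month_gb * 12 * 5 / 1000

# Assumption: cache enough entries to cover 20% of one day's redirects.
reads_per_day = reads_per_month / 30
hot_cache_gb = 0.2 * reads_per_day * bytes_per_mapping / 1e9

print(f"writes per second:  {writes_per_second:.0f}")
print(f"reads per second:   {reads_per_second:.0f}")
print(f"storage per month:  {storage_per_month_gb:.0f} GB")
print(f"storage, 5 years:   {storage_five_years_tb:.1f} TB")
print(f"hot cache estimate: {hot_cache_gb:.1f} GB")
```

Under these assumptions the hot set is only a few gigabytes, which fits comfortably in memory on a single cache node.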

Step 3: API Design

A clean REST API keeps the system simple and interoperable.

Shorten a URL

POST /api/v1/shorten
Content-Type: application/json

{
  "long_url": "https://www.example.com/very/long/path/to/resource",
  "custom_alias": "my-link",
  "expiration_date": "2027-06-28"
}

The custom_alias and expiration_date fields are optional. The response below shows a generated identifier, as returned when no alias is supplied.

Response:
{
  "short_url": "https://short.ly/abc123",
  "long_url": "https://www.example.com/very/long/path/to/resource",
  "created_at": "2026-06-28T10:30:00Z"
}

The API returns the shortened URL. Idempotency can be achieved by hashing the long URL and returning an existing mapping if one exists.
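
One way to picture the idempotency idea is a lookup keyed by a hash of the long URL. The sketch below is illustrative only: the two dictionaries stand in for database tables, and the names `shorten`, `url_fingerprint`, and `candidate_code` are hypothetical.

```python
import hashlib

# Hypothetical in-memory stand-ins for database tables.
mappings_by_code = {}   # short code -> long URL
code_by_url_hash = {}   # SHA-256 of long URL -> short code


def url_fingerprint(long_url: str) -> str:
    """Stable key for a long URL (SHA-256 hex digest)."""
    return hashlib.sha256(long_url.encode("utf-8")).hexdigest()


def shorten(long_url: str, candidate_code: str) -> str:
    """Return the existing code for long_url, or register candidate_code."""
    key = url_fingerprint(long_url)
    existing = code_by_url_hash.get(key)
    if existing is not None:
        return existing  # repeated request: same answer, no new row
    mappings_by_code[candidate_code] = long_url
    code_by_url_hash[key] = candidate_code
    return candidate_code
```

Calling `shorten` twice with the same long URL returns the first code both times, so a retried request does not create a duplicate mapping.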

Redirect

GET https://short.ly/abc123

Response:
HTTP/1.1 301 Moved Permanently
Location: https://www.example.com/very/long/path/to/resource

A 301 redirect is permanent and cacheable by browsers, so repeat visits may never reach the server. A 302 (Found) temporary redirect is not cached by default, so the browser hits the server each time, which enables click tracking.

Step 4: High-Level Architecture

The core components form a pipeline from client to data store.

  • Client – Browser or mobile app making requests.
  • Load Balancer – Distributes incoming traffic across application servers.
  • Application Servers – Handle shorten and redirect logic. Stateless and horizontally scalable.
  • Cache (Redis) – Stores hot URL mappings in memory for sub-millisecond lookups.
  • Database – Persistent store for all URL mappings.
  • Analytics Pipeline (optional) – Captures click events for reporting.

For a redirect, the flow is simple: the client requests a short URL, the load balancer routes it to an application server, the server checks the cache, and if found, returns the redirect. If the cache misses, the server queries the database, populates the cache, and returns the redirect.
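
The redirect flow just described might look like the following sketch. Plain dictionaries stand in for Redis and the database, and `resolve` and `handle_redirect` are hypothetical names; a real handler would sit behind a web framework.

```python
from typing import Optional

# Hypothetical stand-ins: a dict for the cache and a dict for the database.
cache: dict[str, str] = {}
database: dict[str, str] = {
    "abc123": "https://www.example.com/very/long/path/to/resource",
}


def resolve(short_code: str) -> Optional[str]:
    """Check the cache first, fall back to the database, then fill the cache."""
    long_url = cache.get(short_code)
    if long_url is not None:
        return long_url                  # cache hit
    long_url = database.get(short_code)  # cache miss: query the database
    if long_url is not None:
        cache[short_code] = long_url     # populate the cache for next time
    return long_url


def handle_redirect(short_code: str, status: int = 301) -> tuple[int, dict[str, str]]:
    """Return (status code, headers) for a redirect request."""
    long_url = resolve(short_code)
    if long_url is None:
        return 404, {}
    return status, {"Location": long_url}
```

Passing `status=302` instead of the default gives the analytics-friendly variant discussed in Step 3.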

Step 5: URL Generation Strategy

Generating unique, short identifiers is a core challenge. Several approaches exist.

Hashing

Take the long URL, apply a hash function like MD5 or SHA-256, and truncate the output. The same long URL then always produces the same short code. However, full digests are far too long to use directly (32 hex characters for MD5, 64 for SHA-256), and truncating them to a handful of characters raises the collision risk.
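
A minimal sketch of hash-and-truncate, assuming SHA-256 and a 7-character code. The function name `hash_code`, the alphabet order, and the decision to use the first 8 digest bytes are all illustrative choices.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 chars


def hash_code(long_url: str, length: int = 7) -> str:
    """Derive a fixed-length code from the SHA-256 digest of the URL."""
    digest = hashlib.sha256(long_url.encode("utf-8")).digest()
    n = int.from_bytes(digest[:8], "big")  # truncate: keep the first 64 bits
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(chars)
```

The output is deterministic, but two different URLs can still map to the same code, so the collision handling discussed later in this step is still required.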

Base62 Encoding

Encode a unique integer or counter into Base62 (characters a-z, A-Z, 0-9). This produces short, readable identifiers. A 7-character Base62 value supports over 3.5 trillion unique URLs. This requires a distributed counter or ID generation service.
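
Base62 encoding of a counter value can be sketched as follows. The alphabet order (digits, then lowercase, then uppercase) is an arbitrary assumption; any fixed ordering of the 62 characters works as long as encoding and decoding agree.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Convert a non-negative integer ID into a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(code: str) -> int:
    """Convert a Base62 string back into the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode(62 ** 7 - 1))  # largest ID that still fits in 7 characters
print(62 ** 7)              # 3521614606208 distinct 7-character codes
```

Because every integer maps to exactly one string, two distinct counter values can never produce the same short code.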

Random String Generation

Generate a random alphanumeric string of a fixed length. Simple to implement, but collisions require database checks. With a sufficiently long string, collision probability is negligible.

Collision Handling

With hashing or random generation, collisions are inevitable at scale (counter-based Base62 codes avoid them by construction). Strategies include:

  • Retry on collision – Generate a new identifier and check again.
  • Pre-generate – Maintain a pool of unused identifiers.
  • Append timestamp or sequence – Reduce collision probability.
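
Random generation combined with retry-on-collision could be sketched like this. The `taken` set stands in for the database's existing keys, and `allocate_code` and `random_code` are hypothetical names. In a real system the existence check and the insert would need to be a single atomic step, for example by relying on the primary key's uniqueness constraint.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters


def random_code(length: int = 7) -> str:
    """Generate a random alphanumeric code."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def allocate_code(taken: set, length: int = 7, max_attempts: int = 5,
                  generate=random_code) -> str:
    """Draw codes until one is unused, giving up after max_attempts."""
    for _ in range(max_attempts):
        code = generate(length)
        if code not in taken:   # stand-in for a database uniqueness check
            taken.add(code)
            return code
    raise RuntimeError("no free code found; consider a longer code length")
```

Bounding the number of attempts keeps a nearly full key space from turning into an unbounded loop.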

Step 6: Database Design

The primary schema is straightforward.

url_mappings:
short_url VARCHAR(10) PRIMARY KEY
long_url TEXT NOT NULL
user_id BIGINT (optional)
created_at TIMESTAMP
expires_at TIMESTAMP (optional)

Indexing: The primary key on short_url ensures fast lookups. An additional index on user_id supports listing a user's shortened URLs.
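
To make the schema concrete, here is a small sketch that creates it in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is used only because it needs no setup; the index name and the sample row are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE url_mappings (
    short_url  VARCHAR(10) PRIMARY KEY,
    long_url   TEXT NOT NULL,
    user_id    BIGINT,                               -- optional
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP                             -- optional
);
CREATE INDEX idx_url_mappings_user_id ON url_mappings (user_id);
""")

insert_sql = "INSERT INTO url_mappings (short_url, long_url, user_id) VALUES (?, ?, ?)"
conn.execute(insert_sql, ("abc123", "https://www.example.com/very/long/path/to/resource", 42))

# Redirect lookup: a single primary-key read.
row = conn.execute(
    "SELECT long_url FROM url_mappings WHERE short_url = ?", ("abc123",)
).fetchone()
print(row[0])
```

The primary key also enforces uniqueness, so inserting a second row with the same short_url raises an integrity error instead of silently overwriting the mapping.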

SQL vs NoSQL: A relational database like PostgreSQL works well for this workload. The schema is structured, and ACID transactions help with consistency. For extreme scale, NoSQL stores like DynamoDB offer automatic partitioning, but the read-heavy, key-value nature of URL lookups suits almost any database.

Step 7: Caching Strategy

Caching is critical for redirect latency. Without it, every redirect hits the database.

Cache-aside pattern: The application checks Redis first. On a cache miss, it queries the database and writes the mapping to Redis with a TTL.

What to cache: Popular URLs. The 80-20 rule applies strongly to shortened URLs. A small percentage of links generate the vast majority of redirects.

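TTL strategy: Set a reasonable TTL (e.g., 24 hours) so cold entries age out, and cap it at the link's expires_at so an expired link is not served from the cache. The server-side cache does not interfere with click tracking, because every redirect still reaches an application server; only browser caching of 301 responses hides clicks.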

Eviction policy: When Redis is configured with a maxmemory limit and an LRU policy such as allkeys-lru, it evicts (approximately) the least recently used keys once memory fills.
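
The interaction between TTL and LRU eviction can be illustrated with a small in-process cache. This is not how Redis is implemented; it is a simplified model, and the class name `TTLLRUCache` and its injectable `clock` parameter are assumptions made so the behaviour is easy to test.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Tiny cache with per-entry expiry and least-recently-used eviction."""

    def __init__(self, capacity: int, ttl_seconds: float, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = OrderedDict()  # key -> (value, expires_at), oldest first

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]        # expired: treat as a miss
            return None
        self._items.move_to_end(key)    # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
```

A read refreshes an entry's recency but not its expiry, so a hot link stays in memory until its TTL lapses and is then reloaded from the database.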

Step 8: Scalability Design

Each component scales differently.

  • Application servers – Stateless. Scale horizontally by adding instances behind the load balancer.
  • Database – Start with a single instance. Add read replicas as read traffic grows. When write throughput becomes the bottleneck, introduce sharding by short_url hash. A consistent hashing ring distributes data across shards.
  • Cache – Redis Cluster shards data across multiple nodes. Each node handles a subset of keys.

With the cache absorbing most reads, a sharded database with read replicas can scale out to redirect rates far above the roughly 400 per second estimated in Step 2.
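
A consistent hashing ring for picking a shard by short_url might be sketched as below. The shard names, the use of SHA-256 for ring positions, and the 100 virtual nodes per shard are illustrative assumptions.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a string to a point on the ring (first 64 bits of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, shards, vnodes: int = 100):
        # Each shard is placed on the ring many times to even out the load.
        self._ring = sorted(
            (_position(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, short_url: str) -> str:
        """Walk clockwise from the key's position to the next shard point."""
        idx = bisect.bisect_right(self._points, _position(short_url))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["shard-0", "shard-1", "shard-2"])
print(ring.shard_for("abc123"))
```

The appeal of this scheme is that adding a shard relocates only the keys that now fall to the new shard, instead of reshuffling nearly everything as `hash(key) % shard_count` would.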

Step 9: Handling High Traffic

Viral links can bring a single short URL extreme traffic. Mitigation strategies include:

  • Load balancing – Distribute requests evenly across application servers.
  • Rate limiting – Prevent abuse of the shortening API at the gateway level.
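  • CDN – If the redirect response is cacheable (for example, a 301, or a 302 with explicit Cache-Control headers), a CDN can serve it from the edge, reducing origin load. Redirects answered at the edge never reach the application servers, so server-side click counts will miss them.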
  • Hot key problem – If a single short URL generates massive traffic, the cache node or database shard that owns that key may become a hotspot. Replicate that key across multiple cache nodes or cache it in application memory.
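
The rate limiting bullet does not name an algorithm. A token bucket is one common choice, used here as an assumption. The sketch below keeps one bucket in memory; a gateway would typically keep one per client or API key, and the class and parameter names are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start full
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill according to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                    # caller should respond with HTTP 429
```

A request that is refused costs almost nothing to reject, which is what protects the shortening API and the database behind it during abuse.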

Step 10: Analytics Pipeline (Optional)

For click analytics, capture events without slowing redirects.

  1. Fire and forget – The application server sends a click event to a message queue (Kafka or SQS) asynchronously.
  2. Stream processing – A stream processor like Kafka Streams or Flink aggregates events into time windows.
  3. Batch processing – Periodically write aggregated data to an analytics database (ClickHouse, Redshift) for dashboards.

This decouples analytics from the critical redirect path, ensuring low latency.
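
The fire-and-forget step can be modelled in-process with a bounded queue and a background consumer. Here `queue.Queue` stands in for Kafka or SQS and a `Counter` stands in for the aggregation stage; both are simplifications, and the function names are hypothetical.

```python
import queue
import threading
from collections import Counter

events = queue.Queue(maxsize=10_000)  # stand-in for the message queue
click_counts = Counter()              # stand-in for aggregated analytics


def record_click(short_code: str) -> bool:
    """Called on the redirect path: enqueue without ever blocking."""
    try:
        events.put_nowait(short_code)
        return True
    except queue.Full:
        return False  # drop the event rather than slow the redirect


def consume() -> None:
    """Background worker: drain the queue and aggregate clicks."""
    while True:
        code = events.get()
        if code is not None:        # None is the shutdown signal
            click_counts[code] += 1
        events.task_done()
        if code is None:
            break


worker = threading.Thread(target=consume, daemon=True)
worker.start()
```

Dropping events when the queue is full trades a small amount of analytics accuracy for a redirect path whose latency never depends on the analytics system.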

Step 11: Trade-offs

Every design decision involves trade-offs.

  • 301 vs 302 redirects – 301 is cacheable and reduces server load, but cached redirects never reach the server, so per-click analytics are lost. 302 enables analytics but increases server load.
  • Short URL length – Shorter URLs are more user-friendly but reduce the key space and increase collision probability.
  • Database vs cache consistency – With a cache TTL, a redirect may briefly point to a stale destination, or still resolve a deleted link, after a database update. Acceptable for most use cases.
  • Expiration – Supporting expiration adds complexity to cache invalidation and storage reclamation.

Step 12: Failure Scenarios

Plan for common failure modes.

  • Database failure – Read replicas keep serving redirects if the primary fails. Promote a replica to primary to restore writes.
  • Cache failure – Application falls back to the database directly. Latency increases but redirects still work.
  • Traffic spikes – Auto-scaling groups add application servers. Rate limiting protects the database.
  • Retry strategy – Transient failures are retried with exponential backoff. Idempotent design prevents duplicate URL creation.

Graceful degradation keeps the system functional even when components fail.
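
Retrying with exponential backoff could look like the sketch below. The randomized delay (jitter) is an addition beyond what the text describes, included because it keeps many clients from retrying in lockstep. The function name `retry`, its defaults, and the choice of which exceptions count as transient are assumptions.

```python
import random
import time


def retry(operation, attempts: int = 4, base_delay: float = 0.1,
          max_delay: float = 2.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise                                 # out of attempts
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))         # jittered delay
```

Retries are only safe when the operation is idempotent, which is why the shorten endpoint returns an existing mapping rather than creating a second one.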

Step 13: Final Architecture Summary

The complete system consists of:

  • Load balancer distributing traffic to stateless application servers.
  • Application servers handling shorten and redirect requests.
  • Redis cache storing hot URL mappings for low-latency redirects.
  • Primary database with read replicas for persistent storage.
  • Optional analytics pipeline with a message queue, a stream processor, and an analytics database.

Data flow: Client request flows through the load balancer to an application server. The server checks the cache, falls back to the database on a miss, and returns the redirect. Analytics events fire asynchronously to the message queue.

The system scales horizontally at every tier. Caching absorbs the majority of read traffic. Database sharding handles write growth.

Interview Tips

  • Start simple – Propose a basic design with an application server, database, and cache. Add complexity as the interviewer probes.
  • Clarify requirements first – Confirm whether analytics, custom aliases, and expiration are needed.
  • Always mention caching – Redis or Memcached is essential for redirect latency.
  • Discuss bottlenecks – Identify the database as the first bottleneck and explain how sharding addresses it.
  • Highlight trade-offs – For example, 301 vs 302 redirects and the impact on analytics.

Learning Outcome

After completing this design walkthrough, you should be able to:

  • Apply a structured system design process from requirements to final architecture.
  • Reason about database selection, caching strategies, and scaling techniques.
  • Handle traffic spikes and failure scenarios in a real-world distributed system.
  • Confidently answer the URL shortener question in system design interviews.
  • Recognize the trade-offs inherent in every architecture decision.