BaseToolbox

UUIDs Are Not As Random As You Think

There's a particular comfort in UUIDs. Generate one, use it as a database primary key, never think about collisions again. The math says collisions are effectively impossible, so we don't worry about it.

Except the math depends on things we rarely verify.

Quick History: Why UUIDs Exist

Before UUIDs, generating globally unique identifiers required coordination. You'd need a central authority handing out numbers, or clever schemes with prefixes and suffixes. This works fine in a single system but falls apart in distributed environments.

In 1997, the Open Software Foundation (now The Open Group) standardized the UUID format: 128 bits, typically displayed as 32 hexadecimal digits split into five groups by hyphens (36 characters in total). The genius was that different generation methods could produce non-colliding identifiers without any coordination.

That's where "versions" come in.

UUID Versions: Not All Randomness Is Created Equal

UUID v1: Time + MAC address

Version 1 combines a 60-bit timestamp (counted in 100-nanosecond intervals since 15 October 1582, the start of the Gregorian calendar) with your network card's MAC address. This gives uniqueness without relying on randomness: if your clock never goes backward and your MAC address is unique, collisions are impossible.

The downside: your MAC address is now embedded in every ID you generate. In some contexts, this is a privacy concern. Also, if multiple processes on the same machine generate v1 UUIDs at the same moment, they share the MAC address and may read the same timestamp, so uniqueness then rests on the small 14-bit clock sequence field.
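To make the privacy point concrete, here is a minimal sketch that reads the timestamp and node back out of a v1 UUID using Python's standard `uuid` module. The helper name `decode_v1` and the node value `0x001122334455` are made up for illustration; a real v1 generator would normally use the machine's own MAC address.

```python
import datetime
import uuid

# Number of 100-nanosecond intervals between 1582-10-15 and 1970-01-01.
GREGORIAN_TO_UNIX_OFFSET = 0x01B21DD213814000


def decode_v1(u: uuid.UUID):
    """Return (creation time in UTC, node as a MAC-style string) for a v1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time is the 60-bit timestamp in 100 ns ticks since 1582-10-15.
    unix_seconds = (u.time - GREGORIAN_TO_UNIX_OFFSET) / 10_000_000
    created = datetime.datetime.fromtimestamp(unix_seconds, tz=datetime.timezone.utc)
    # u.node is the 48-bit hardware address; format it one byte at a time.
    mac = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))
    return created, mac


# Assumption: a made-up node value stands in for a real MAC address.
sample = uuid.uuid1(node=0x001122334455)
created, mac = decode_v1(sample)
print(created.isoformat(), mac)
```

Anyone holding the ID can run the same decoding, which is why v1 leaks both when and where an identifier was created.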

UUID v4: Pure Random

Version 4 is what most people mean when they say "UUID." 122 bits of random data (6 bits are reserved for version and variant markers). No timestamp, no hardware identifier, just randomness.

The collision probability is astronomically low—something like 1 in 2^122 for any pair. But here's the catch: this assumes your random number generator is actually random.

On a virtual machine with limited entropy, or early in system boot before the entropy pool fills, your "random" UUIDs might not be random at all. Most production systems are fine. Most.
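As a rough illustration of where the 122 random bits and 6 fixed bits live, the sketch below builds a v4 UUID by hand from the operating system's CSPRNG. The function name `uuid4_from_csprng` is hypothetical; in practice `uuid.uuid4()` already does the equivalent, and this version only exists to make the bit layout visible.

```python
import secrets
import uuid


def uuid4_from_csprng() -> uuid.UUID:
    """Build a version 4 UUID from 16 bytes of OS-provided randomness."""
    raw = bytearray(secrets.token_bytes(16))
    # Byte 6, high nibble: the 4-bit version field, set to 0100 (version 4).
    raw[6] = (raw[6] & 0x0F) | 0x40
    # Byte 8, top two bits: the variant field, set to 10.
    raw[8] = (raw[8] & 0x3F) | 0x80
    # The remaining 122 bits are left exactly as the CSPRNG produced them.
    return uuid.UUID(bytes=bytes(raw))


example = uuid4_from_csprng()
print(example, example.version)
```

The quality of the result is only as good as the source behind `secrets.token_bytes`, which is the dependency the paragraph above warns about.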

UUID v7: The New Standard

Version 7 is one of the newest versions (standardized in RFC 9562 in 2024) and, for many use cases, the best choice. It puts a 48-bit Unix timestamp, in milliseconds, in the leading bits, followed by random data. This gives you:

Natural sorting by creation time (great for databases)
Practical uniqueness from the random bits that follow the timestamp
Privacy-friendly (no MAC address)

If you're choosing a UUID version today for new work, v7 is usually the right answer.
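The layout is simple enough to sketch by hand. The function below is an illustrative approximation of a v7 generator, not a reference implementation: the name `uuid7_sketch` is made up, it fills every non-timestamp field with fresh randomness, and it makes no attempt to keep IDs ordered within the same millisecond. Where your platform ships a native v7 generator, prefer that.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a v7-style UUID: 48-bit millisecond timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    # Bits 127..80: Unix time in milliseconds.
    value = (unix_ms & ((1 << 48) - 1)) << 80
    # Bits 79..0: start with 80 random bits, then overwrite the fixed fields.
    value |= int.from_bytes(os.urandom(10), "big")
    # Bits 79..76: version field, set to 0111 (version 7).
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Bits 63..62: variant field, set to 10. This leaves 74 random bits.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier < later)  # the timestamp prefix decides the ordering
```

Because the timestamp occupies the most significant bits, comparing two IDs from different milliseconds compares their creation times, whether they are stored as integers, bytes, or canonical strings.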

The Collision Question

"But the probability is so low it's basically zero!"

True, but let's be precise. For UUID v4:

Generating 1 billion UUIDs gives you a collision probability of about 9.4 × 10^-20, or roughly 1 in 10^19
You'd need to generate about 2.71 quintillion UUIDs to have a 50% chance of at least one collision
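These figures come from the standard birthday approximation over 2^122 equally likely values. A short sketch of the arithmetic, with function names chosen here for illustration:

```python
import math

SPACE = 2 ** 122  # number of distinct v4 UUIDs (122 random bits)


def collision_probability(n: int, space: int = SPACE) -> float:
    """Approximate chance of at least one collision among n random IDs."""
    # p ~= 1 - exp(-n(n-1) / (2 * space)); expm1 keeps precision for tiny p.
    return -math.expm1(-(n * (n - 1)) / (2 * space))


def ids_for_probability(p: float, space: int = SPACE) -> float:
    """Approximate number of IDs needed to reach collision probability p."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


p_billion = collision_probability(10 ** 9)
n_half = ids_for_probability(0.5)
print(f"{p_billion:.2e}")  # on the order of 1e-19
print(f"{n_half:.3e}")     # about 2.71e18
```

The approximation assumes every one of the 2^122 values is equally likely, which is exactly the assumption the rest of this section questions.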

These are comforting numbers. But they assume:

Your random number generator is cryptographically sound
You're not accidentally reusing seeds
Your system clock hasn't jumped backward (for the time-based versions, v1 and v7)
You haven't hard-coded a "test UUID" somewhere that made it to production

I've seen collision bugs from all four causes.
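The seed-reuse failure is easy to reproduce. The sketch below deliberately builds v4-shaped UUIDs from Python's seedable `random.Random` instead of the OS entropy source; the helper `uuid4_from` and the seed value are invented for the demonstration and should not be copied into real code.

```python
import random
import uuid


def uuid4_from(rng: random.Random) -> uuid.UUID:
    """Build a v4-shaped UUID from a seedable, non-cryptographic PRNG."""
    # version=4 makes the constructor set the version and variant bits.
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two "workers" that accidentally start from the same seed, for example
# because both were forked after the generator was initialised.
worker_a = random.Random(1234)
worker_b = random.Random(1234)

ids_a = [uuid4_from(worker_a) for _ in range(3)]
ids_b = [uuid4_from(worker_b) for _ in range(3)]

print(ids_a == ids_b)  # True: every ID collides
```

Each ID looks like a perfectly valid v4 UUID, so nothing in the format itself reveals the problem; only the duplicate-key error does.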

Database Indexing: The Hidden Cost

Here's something rarely mentioned in UUID tutorials: random UUIDs are terrible for database indexes.

B-tree indexes (used by most databases) work best when new values are inserted near each other. Sequential integers? Perfect. Random UUIDs? Worst case. Every insert potentially goes to a different part of the index, causing fragmentation and degraded performance.

UUID v7 partially solves this by being time-ordered—recent UUIDs are numerically close to each other, so inserts cluster together.

If you're using UUIDs as primary keys in a high-write database, the version matters a lot.
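One way to see the difference without a database is to count how often a new key lands at the right-hand edge of a sorted list, which is a crude stand-in for appending to the last page of a B-tree. This is a toy model under stated assumptions: `time_ordered_id` is a hypothetical v7-style generator, the timestamps are synthetic, and a Python list is not a real index.

```python
import bisect
import os
import uuid


def time_ordered_id(unix_ms: int) -> uuid.UUID:
    """v7-style ID: 48-bit millisecond timestamp followed by random bits."""
    value = (unix_ms << 80) | int.from_bytes(os.urandom(10), "big")
    value = (value & ~(0xF << 76)) | (0x7 << 76)  # version 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)  # variant 10
    return uuid.UUID(int=value)


def append_fraction(keys) -> float:
    """Fraction of inserts that land at the right edge of a sorted index."""
    index, appends = [], 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        appends += pos == len(index)
        index.insert(pos, key)
    return appends / len(keys)


random_ids = [uuid.uuid4() for _ in range(1000)]
# Synthetic timestamps, one millisecond apart.
ordered_ids = [time_ordered_id(1_700_000_000_000 + i) for i in range(1000)]

print(f"v4 appends:       {append_fraction(random_ids):.1%}")
print(f"v7-style appends: {append_fraction(ordered_ids):.1%}")
```

With one ID per millisecond the time-ordered keys always append, while random keys almost never do. Real workloads that create many IDs in the same millisecond sit somewhere in between, which is why the text says "partially".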

When Not to Use UUIDs

UUIDs aren't always the right choice:

User-facing identifiers. A 36-character string is ugly and hard to remember. Short IDs (like YouTube video IDs) are better for URLs and sharing.

Storage-constrained systems. 128 bits per ID adds up when you have billions of records. Some teams use 64-bit IDs instead.

Sequential ordering is required. If you need guaranteed ordering, use sequential integers or timestamps, not random UUIDs.
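To put the storage point in numbers, the sketch below compares the canonical text form of a UUID with its raw 16-byte form. The record count is an arbitrary illustrative figure, and real databases add their own per-row and per-index overhead on top.

```python
import uuid

u = uuid.uuid4()

text_form = str(u)     # canonical form: 32 hex digits plus 4 hyphens
binary_form = u.bytes  # the same 128 bits as raw bytes

records = 1_000_000_000  # illustrative table size
saved_bytes = (len(text_form) - len(binary_form)) * records

print(len(text_form), len(binary_form))  # 36 16
print(f"{saved_bytes / 1e9:.0f} GB saved by storing binary instead of text")
```

Storing the binary form (or a native UUID column type where the database offers one) recovers more than half of the text size, but each ID is still twice as wide as a 64-bit integer.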

The Takeaway

UUIDs are a solved problem, mostly. For everyday use—generating an ID for a database record, creating an API request correlation ID, naming a temporary file—grab a v4 or v7 UUID and move on.

Just remember: "effectively impossible" still has edge cases. Trust but verify your randomness sources, pick the right version for your use case, and don't store UUIDs in indexed database columns unless you understand the trade-offs.

