Let’s talk about one of the fundamental things that you need to know when talking about data access.
Always, always, always know your units.
Binary vs Decimal
Computers have a natural affinity for binary units, or multiples of 1024.
Historically, people (like us!) who work with computers have borrowed the SI prefixes for multiples of 1000 (K-, M-, G-, etc.) and abused them to mean the closest multiple of 1024. This tradition continues right up to the present day: “64 KB” in a computing context is quite naturally interpreted as 64 * 1024 = 65536 bytes.
However, in many contexts, including data storage and bandwidth, decimal units may be used. This is not the manufacturers trying to “trick” you. It’s the normal and unavoidable friction that results from the proper use of SI units as decimal values (in physics, etc.) running up against the computer industry’s abuse of them as binary values.
You need to be particularly careful when reading. If you accidentally read a decimal unit as a binary unit, you’ll be in for a surprise when you’ve got less than you expected. This only gets worse as the sizes increase.
SI unit | Pronounced | Meaning | … But Sometimes | Difference |
---|---|---|---|---|
k (K) | kilo | 1,000 | 1,024 | 2.4% |
M | mega | 1,000,000 | 1,048,576 | 4.86% |
G | giga | 1,000,000,000 | 1,073,741,824 | 7.37% |
T | tera | 1,000,000,000,000 | 1,099,511,627,776 | 9.95% |
P | peta | 1,000,000,000,000,000 | 1,125,899,906,842,624 | 12.6% |
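If you want to check the Difference column yourself, here’s a quick Python sketch (my own illustration, not from any standard) that compares 1024ⁿ with 1000ⁿ for each prefix. It reports P as 12.59%, which the table rounds to 12.6%.

```python
# Compare the decimal (SI) and binary readings of each prefix.
PREFIXES = ["k", "M", "G", "T", "P"]

def prefix_gap(n: int) -> float:
    """Percentage by which 1024**n exceeds 1000**n."""
    return (1024**n / 1000**n - 1) * 100

for n, prefix in enumerate(PREFIXES, start=1):
    # The gap grows with every step up the prefix ladder.
    print(f"{prefix}: {1000**n:,} vs {1024**n:,} -> {prefix_gap(n):.2f}%")
```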
Since 1999, the proper, IEC-standardized way to write a factor of 1024 has been with the Ki- prefix and its siblings Mi-, Gi-, and so on. These are deliberately unambiguous and clearly denote that you want a multiple of 1024 rather than a multiple of 1000.
Binary unit | Unambiguous Meaning |
---|---|
Ki | 1024¹ = 1,024 |
Mi | 1024² = 1,048,576 |
Gi | 1024³ = 1,073,741,824 |
Ti | 1024⁴ = 1,099,511,627,776 |
Pi | 1024⁵ = 1,125,899,906,842,624 |
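Here’s a minimal Python sketch of that unambiguity in action. The `parse_size` helper is hypothetical (not from any particular library). It reads the plain SI prefixes strictly as decimal and only treats a value as binary when the `i` is present, which is exactly why “64 KB” comes out as 64,000 here even though plenty of programmers would read it as 65,536. It also accepts `K` as a sloppy alias for `k`, since that’s how people actually write it.

```python
import re

# Each prefix letter maps to a power n: the SI form means 1000**n,
# the IEC binary form (letter followed by "i") means 1024**n.
EXPONENTS = {"k": 1, "K": 1, "M": 2, "G": 3, "T": 4, "P": 5}

SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:([kKMGTP])(i?))?B\s*$")

def parse_size(text: str) -> int:
    """Convert strings such as '64 KiB' or '2 TB' into a byte count."""
    match = SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, prefix, binary = match.groups()
    if prefix is None:
        return round(float(number))  # plain bytes, no prefix at all
    base = 1024 if binary else 1000  # the "i" is what makes it binary
    return round(float(number) * base ** EXPONENTS[prefix])

for s in ["64 KB", "64 KiB", "2 TB", "8 GiB"]:
    print(s, "->", parse_size(s))
```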
NIST has an excellent page with more details. Go read it! It’s quick and to the point.
Unfortunately, the original SI prefixes will remain ambiguous until everybody stops abusing them to mean different things. It’ll be a tough habit for us to break, though – especially when it’s become so culturally ingrained. Still, you and I can do our part.
Resolving ambiguity when writing
I can’t emphasize this enough: If you’re writing and you need to use a value that uses a binary multiple, YOU SHOULD USE THE BINARY PREFIXES. It’s not that hard: write “64 KiB” instead of “64 KB”, and boom, you’re done. All you need to do is practice it a bit and it’ll become second nature.
Ah, but what if you’re writing and you need to use decimal units? Unfortunately, if I really want to communicate that something can transfer 9 million bytes per second, that’s a bit harder to write. I can’t just say “9 MB/s” because you may assume I meant binary units but was just too dumb to write “9 MiB/s”.
In those cases, I’ve found that the simplest way to disambiguate is to write both the decimal and binary values: “The speed is 9 MB/s = 8.58 MiB/s”.
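Writing both values is easy to automate. Here’s a minimal Python sketch, assuming a hypothetical `dual_units` helper that takes a raw count (bytes or bits), a prefix letter, and a unit suffix, and prints the decimal value followed by its binary equivalent rounded to two decimals.

```python
# Prefix letter -> power n; decimal uses 1000**n, binary uses 1024**n.
POWERS = {"k": 1, "M": 2, "G": 3, "T": 4, "P": 5}

def dual_units(count: float, prefix: str, unit: str = "B") -> str:
    """Format a raw count as 'X <prefix><unit> = Y <prefix>i<unit>'."""
    n = POWERS[prefix]
    decimal_value = count / 1000**n
    binary_value = count / 1024**n
    # The IEC binary prefix for kilo is written with an uppercase K ("Ki").
    binary_prefix = ("K" if prefix == "k" else prefix) + "i"
    return (f"{decimal_value:g} {prefix}{unit} = "
            f"{binary_value:.2f} {binary_prefix}{unit}")

print(dual_units(9_000_000, "M", "B/s"))  # 9 MB/s = 8.58 MiB/s
```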
Resolving ambiguity when speaking
The IEC recommendation is to use special pronunciations with “-bi-” in the middle, as in: “kibibyte”, “gibibyte”, “mebibyte”, etc.
Frankly, this suggestion is crap.
Nobody I know actually does this, and you’re likely to confuse people and sound like a dork if you try. Go ahead, try it — say it out loud. My favorite ludicrous example is “gibibyte”. Tongue twister, isn’t it? Say it three times fast and you’re well on your way to rubber baby buggy bumpers.
Instead, I prefer to add an explicit qualifier and say “binary megabyte” or “decimal megabyte”. This is perfectly clear to every computer programmer I’ve ever spoken to, even those who aren’t aware of the binary/decimal confusion problem. It works beautifully.
In spoken contexts (and only then) I’ll happily shorten this to “megabyte” if I feel the context is clear — just as you might just say “John” if there’s only one in the room, but “John Smith” or “John Thompson” if there is more than one John nearby.
Resolving ambiguity when reading or listening
This one ultimately depends on context.
Any discussion of computer memory or RAM will typically use binary units. If someone says “allocate 64K”, it’s probably safe to assume they mean 65536 bytes.
Discussions of disk capacities or data bandwidths are normally written with decimal units. I like to imagine that this is because cramming bits onto a disk or through a wire is a task laden with physics, so the units are naturally the standard SI units.
In the middle, you’ll find a lot of confusion when you talk about transferring from disk (or network) to RAM, or vice versa. If a computer programmer is talking about how fast they can fill memory, they are probably thinking in binary units. If a filesystem or network person is talking about how fast an interface or device performs, they are probably thinking in decimal units. Be careful to use the right one!
What | Interpretation | Rewritten Without Ambiguity |
---|---|---|
1 Gb/s Ethernet | decimal | 1 Gb/s (953.67 Mib/s) Ethernet |
8 GB RAM | binary | 8 GiB RAM |
2 TB hard drive | decimal | 2 TB (1.82 TiB) hard drive |
200 MB file size | binary | 200 MiB file size |
1.5 Mb/s cable modem | decimal | 1.5 Mb/s (1.43 Mib/s) cable modem |
6.9 MB/s 5x DVD speed | decimal | 6.9 MB/s (6.58 MiB/s) 5x DVD speed |
16.6 MB/s 12x DVD speed | decimal | 16.6 MB/s (15.83 MiB/s) 12x DVD speed |
9 MB/s 2x Blu-ray speed | decimal | 9 MB/s (8.58 MiB/s) 2x Blu-ray speed |
9.4 GB DVD capacity | decimal | 9.4 GB (8.75 GiB) DVD capacity |
50 GB Blu-ray capacity | decimal | 50 GB (46.57 GiB) Blu-ray capacity |
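The parenthetical values in the decimal rows are simple to recompute, and doing so is a good habit before you publish numbers like these. This Python sketch uses a hypothetical `convert` helper that takes a decimal-prefixed value and re-expresses it in a chosen binary prefix, which may differ from the original one (as in the Ethernet row, where Gb/s becomes Mib/s).

```python
# Prefix letter -> power n; decimal uses 1000**n, binary uses 1024**n.
POWERS = {"M": 2, "G": 3, "T": 4, "P": 5}

def convert(value: float, from_prefix: str, to_prefix: str) -> float:
    """Re-express a decimal-prefixed value in a binary prefix."""
    raw = value * 1000 ** POWERS[from_prefix]  # raw count of bits or bytes
    return raw / 1024 ** POWERS[to_prefix]

# (description, value, decimal prefix, binary prefix, unit)
ROWS = [
    ("Ethernet", 1, "G", "M", "b/s"),
    ("cable modem", 1.5, "M", "M", "b/s"),
    ("2x Blu-ray speed", 9, "M", "M", "B/s"),
    ("hard drive", 2, "T", "T", "B"),
    ("DVD capacity", 9.4, "G", "G", "B"),
    ("Blu-ray capacity", 50, "G", "G", "B"),
]

for what, value, dec, binp, unit in ROWS:
    converted = convert(value, dec, binp)
    print(f"{value:g} {dec}{unit} ({converted:.2f} {binp}i{unit}) {what}")
```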
What you can do
Start using binary units when writing, and, if necessary, qualify with “binary” or “decimal” when speaking. It’s easy, it’s safe, and it puts you into the cool club of kids who speak without ambiguity.
If you’re reading or talking to someone who doesn’t use the binary units, make sure you know the context.
Watch out for conversion errors when translating between decimal and binary units.
And, of course, suggest that the other person pick up the habit of using the binary units too. :-)