
How to Represent Text: The ASCII Code

Think about something strange: computers don't know what a letter is. They really don't!

You're reading this right now, looking at lots of letters and words on your screen, but you already know the computer only stores numbers. So the question is inevitable: how on earth is this possible?

How a Code Is Born

In the subsection What the Unary System Teaches Us, I gave you a challenge. We agreed that a horizontal line would stand for two units instead of one, while a vertical line would still be worth one. Then I drew some lines on paper, and you had to guess how many times I looked at my cat while writing that section.

Look at the drawing again
Figure: Times I looked at my cat. Richard Dias Alves @ Ultimate Rust, 2025. License CC BY-NC-SA 4.0.
Answer

The answer was 7 times.

We're going to do the same thing now. I'll write a word using only numbers, and you have to try to tell me what word it is.

Ready?

Guess the word below:

7899 256

Answer

The word is: Ok.


You're probably asking yourself "but how was I supposed to know?". I have to apologize to you in advance, but the truth is there was no way. There was no way for you to know that the number meant that word with only the information you had.

When I drew the lines, I told you the horizontal line would stand for two units, and you only knew how to interpret the drawing because we agreed on the meaning of each symbol. The drawing I made, as silly as it was (it was just lines), is a code.

A code is nothing more than a system in which symbols have a previously agreed-upon meaning.

  • A zip code is a code: each set of numbers is associated with a specific location.
  • Morse code is a code: each sequence of beeps is associated with a specific letter.
  • A word is a code: each set of letters is associated with a specific meaning.

If you stop and think about it, a lot of things in the world run on codes. But it only becomes a code when there's a meaning behind it. I hadn't told you that the number 7899 represented the letter O and that the number 256 represented the letter k.


Knowing that, now we can actually create a code. Let's agree that each letter has a number associated with it, following this table:

Number  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26
Letter  A  B  C  D  E  F  G  H  I  J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z

Guess the word below:

21 12 20 9 13 1 20 5

Answer

The word is: ULTIMATE.

See? Much easier now. That's how a code is born, and it's exactly how the ASCII code was born too.
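
If you're curious what this agreement looks like when a program follows it, here is a minimal sketch in Rust. The names ALPHABET and decode are my own inventions for this illustration, and you don't need to understand every line yet. The point is only that the table is the whole secret: a number inside the table turns into a letter, and a number outside it (like 7899) means nothing.

```rust
// Our homemade table: 1 = A, 2 = B, ..., 26 = Z.
const ALPHABET: &str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/// Looks each number up in the table.
/// Returns None if any number has no entry in the table.
fn decode(numbers: &[usize]) -> Option<String> {
    numbers
        .iter()
        .map(|&n| {
            // The table starts at 1, so number n is the letter at index n - 1.
            let index = n.checked_sub(1)?;
            ALPHABET.chars().nth(index)
        })
        .collect()
}

fn main() {
    // The word from the challenge above.
    println!("{:?}", decode(&[21, 12, 20, 9, 13, 1, 20, 5])); // Some("ULTIMATE")

    // The first "word" had no agreed meaning, so the table can't help.
    println!("{:?}", decode(&[7899, 256])); // None
}
```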

But did you notice what just happened there? You read numbers and managed to interpret them as letters. That's phenomenal! Think about it: when you started this section, the question was how on earth the computer could display letters if all that exists inside it are numbers (more precisely, a bunch of 0s and 1s). Now the pieces are falling into place. There's probably an idea forming in your head that specific numbers might mean things like letters, signs, or colors. That's exactly it, and you'll understand more as the section goes on.

To illustrate better and recap what we've seen so far, let's do a quick review:

  • With several bits (0s and 1s) we can form a specific number. For example, the binary number 1001 is the same thing as the decimal number 9.
  • We can create a code between us where each number represents some specific symbol, previously agreed upon between us.

Now, let's try to do the same thing we did above, but with binary numbers. You can still use the same table from before, where each number from 1 to 26 corresponds to a letter in alphabetical order, as our code.

Guess the words below:

Hint

Use what you learned in the subsection And in Binary? to convert binary numbers to decimal.

  1. 10 1001 10100
Answer

The word is: BIT.

  2. 10010 10101 10011 10100
Answer

The word is: RUST.


What you just did is exactly what a computer does every time it needs to interpret text. It converts bits to letters and then prints them on the screen.
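
The two steps you just performed by hand (bits to number, then number to letter) can be sketched in Rust like this. It still uses our homemade 1-to-26 table, not ASCII, and the names ALPHABET and decode_binary are hypothetical, chosen just for this example.

```rust
// Our homemade table: 1 = A, 2 = B, ..., 26 = Z.
const ALPHABET: &str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/// Reads space-separated binary numbers and looks each one up in the table.
/// Returns None if a piece isn't valid binary or isn't in the table.
fn decode_binary(message: &str) -> Option<String> {
    message
        .split_whitespace()
        .map(|bits| {
            // Step 1: bits -> number (base 2).
            let number = usize::from_str_radix(bits, 2).ok()?;
            // Step 2: number -> letter (the table starts at 1).
            ALPHABET.chars().nth(number.checked_sub(1)?)
        })
        .collect()
}

fn main() {
    println!("{:?}", decode_binary("10 1001 10100")); // Some("BIT")
    println!("{:?}", decode_binary("10010 10101 10011 10100")); // Some("RUST")
}
```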

But pay close attention to the problem hidden here: our table was an agreement between us, between you and me. Just as a third person seeing those horizontal lines wouldn't know how to translate them into a quantity, if that same third person saw the number 18, they wouldn't know it means R. And what's more, that third person could simply have agreed with others that 18 means the letter X. Or the symbol @. Or a sheep. The code only works because the two of us are using the same table. Change the table, and you change the meaning of everything.

Now go back a few decades, and think with me about the problem that was emerging at the time: computers from different manufacturers, in different countries, used by different people, needing to exchange text with each other. This was really starting to happen, and it was CHAOS! One computer could understand 18 as X, another could understand 18 as $, and each displayed it however it understood, without throwing any errors - after all, 18 made sense to it; it just meant something else.

This isn't hypothetical. Before a standard existed, that's literally what happened. In the 1950s and 1960s, every computer manufacturer created its own code table. IBM used one, DEC another, Honeywell another, and so it went for every other manufacturer... Someone needed to sit down and decide, once and for all, which number represents which letter. And everyone needed to agree.

The solution couldn't come from a single manufacturer deciding on its own. Think with me: if IBM simply announced "from now on, everyone uses our table", DEC and Honeywell would have no obligation whatsoever to follow. Why would they? They were competitors. No one was going to give up their own table to adopt their rival's.

It needed to be something different: a neutral agreement. A table created by a neutral organization, one that didn't belong to any of the companies and had no commercial interest in any of the tables that already existed. Something like what happened when countries needed to define international time zones, or when an entire industry needs to define the standard size of a screw. Nobody "wins", everybody gives a little, and the result is a standard everyone agrees to follow.

It was exactly that kind of organization that solved this problem. In 1961, the ASA (American Standards Association) formed a committee with representatives from computer manufacturers, telecommunications companies, and government agencies. For two years, they negotiated every detail: which symbols to include, which number to assign to which letter, and in what order to organize everything.

In 1963, the result was published. Its name is ASCII.

The Table the World Adopted

ASCII stands for American Standard Code for Information Interchange. ASCII is nothing more than a set of symbols with numbers associated with them, just like we did above. The difference is that now the whole world agrees on the same table.

But before you see the table, a question: how many different symbols do you think ASCII defines?

Our table had 26 numbers, one for each letter of the alphabet, and that was already enough to write ULTIMATE, BIT, and RUST. But notice what it can't do: there's no way to write Ultimate instead of ULTIMATE. No way to tell R apart from r. Our table doesn't distinguish uppercase from lowercase, but if it did, how many more symbols would it need? Another 26, right? Again, one for each letter, but now in lowercase. So we'd already have 52 entries in total (26 uppercase letters + 26 lowercase letters).

And what if we wanted to represent numbers too? If I want to write "I have 1 dollar", I need a way to represent the digit 1 as output, which is different from what I use as input to get a symbol back. That's 10 more entries in our table, then, one for each digit from 0 to 9. Now we're at 62 entries.

And punctuation? ., ,, !, ?, :, ;, ', -, (, ), [, ]... Each one of those is a symbol that needs an entry. That's a few dozen more.

And the space between words? That's a symbol too - invisible, but present. Without it, the words would become thewords.

When the ASA committee listed everything needed to fully cover written English (uppercase letters, lowercase letters, digits, punctuation, the space, and a few special symbols like @, #, $, and &), the table arrived at a total of 95 printable symbols. The lowercase letters were the last to join: the 1963 edition left their positions unassigned, and the 1967 revision filled them in.

But ASCII's final table doesn't have 95 entries; it has 128.

The remaining 33 entries are something you probably wouldn't expect to find in a text table: they're non-printable symbols - symbols that aren't graphical. They're instructions for the machine itself. And to understand why they exist, we need to remember that ASCII wasn't created only for computers. It was also created for telecommunications equipment, teletypes, printers... That equipment needed to receive not only "which character" to print, but also "skip to the next line", "move the cursor back to the start", "stop the transmission". Those instructions needed representation in the table too.

Those entries are called control characters, and they occupy the first 32 positions of the ASCII table (positions 0 through 31), plus the final position (127). You use them every day without knowing it. For example, when you hit Enter in a text or message editor, what typically gets recorded is character number 10 in the table, called line feed (some systems also record character 13, carriage return, right before it). The line feed is just one of the 33 control characters; the other 32 are listed in the table below.
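
You don't have to take these counts on faith. Here is a small sketch that walks through all 128 entries and counts them, using checks that Rust's standard library already provides for bytes (is_ascii_control and is_ascii_graphic). The function names count_control and count_printable are my own, made up for this example. Note that the standard library doesn't treat the space as "graphic", so the sketch adds it by hand to match the way this section counts.

```rust
/// Counts the control characters among the 128 ASCII entries.
fn count_control() -> usize {
    (0u8..128).filter(|b| b.is_ascii_control()).count()
}

/// Counts the printable symbols: everything with a visible drawing, plus the space.
fn count_printable() -> usize {
    (0u8..128)
        .filter(|b| b.is_ascii_graphic() || *b == b' ')
        .count()
}

fn main() {
    println!("control characters: {}", count_control()); // 33
    println!("printable symbols:  {}", count_printable()); // 95

    // The line feed really is entry number 10.
    println!("line feed is entry: {}", '\n' as u32); // 10
}
```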

Figure: Model 33 Teletype (1963), one of the telecommunications devices ASCII was created for. Photo: Rama & Musée Bolo, CC BY-SA 2.0 FR, via Wikimedia Commons.

Here's the table they defined:

See the full ASCII table

Control characters (0-31 and 127)

Number  Symbol  Meaning
0       NUL     Null
1       SOH     Start of heading
2       STX     Start of text
3       ETX     End of text
4       EOT     End of transmission
5       ENQ     Enquiry
6       ACK     Acknowledge
7       BEL     Bell
8       BS      Backspace
9       HT      Horizontal tab
10      LF      Line feed
11      VT      Vertical tab
12      FF      Form feed
13      CR      Carriage return
14      SO      Shift out
15      SI      Shift in
16      DLE     Data link escape
17      DC1     Device control 1
18      DC2     Device control 2
19      DC3     Device control 3
20      DC4     Device control 4
21      NAK     Negative acknowledge
22      SYN     Synchronization
23      ETB     End of block
24      CAN     Cancel
25      EM      End of medium
26      SUB     Substitute
27      ESC     Escape
28      FS      File separator
29      GS      Group separator
30      RS      Record separator
31      US      Unit separator
127     DEL     Delete

Graphic symbols (32-126)

Number  32       33   34   35   36   37   38   39   40   41   42   43   44   45   46   47
Symbol  (space)  !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /

Number  48   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63
Symbol  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?

Number  64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79
Symbol  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O

Number  80   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95
Symbol  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _

Number  96   97   98   99   100  101  102  103  104  105  106  107  108  109  110  111
Symbol  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o

Number  112  113  114  115  116  117  118  119  120  121  122  123  124  125  126
Symbol  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~

Looking at the table, two questions might come up:

  1. Why does it start at 0 and not 1? The answer lies in what you've already learned about binary numbers.

As you can probably guess, each of the table's input numbers is read as bits by the computer, and to represent the 128 possible entries, 7 bits are enough. With 7 bits you can represent 128 different values, and those values are 0, 1, 2, 3, ... up to 127. In binary, that goes from 0000000 to 1111111. Zero is a legitimate value, just as representable as any other. If the table started at 1 and went up to 128, you'd be wasting one of the 128 possible bit combinations (the combination 0000000 wouldn't represent anything), and you'd still need one extra bit to represent the number 128 (10000000). And waste was something the committee couldn't afford, considering each extra bit was very expensive at the time.

So the table starts at 0 not just because, but because 0 is the first number that 7 bits can represent. The table uses all 128 possible combinations, from 0000000 to 1111111, without wasting a single one.
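
The arithmetic behind that is easy to check. The sketch below is only an illustration, and the names combinations, as_seven_bits, and bits_needed are hypothetical. It shows that 7 bits give exactly 128 combinations, that 0 and 127 are the two ends of the range, and that 128 would already need an eighth bit.

```rust
/// How many different values a given number of bits can represent.
fn combinations(bits: u32) -> u32 {
    2u32.pow(bits)
}

/// Writes a number in binary, padded with zeros to at least 7 digits.
fn as_seven_bits(n: u8) -> String {
    format!("{:07b}", n)
}

/// The smallest number of bits able to represent n (at least 1, even for 0).
fn bits_needed(n: u32) -> u32 {
    (32 - n.leading_zeros()).max(1)
}

fn main() {
    println!("{}", combinations(7)); // 128
    println!("{}", as_seven_bits(0)); // 0000000
    println!("{}", as_seven_bits(127)); // 1111111

    // A table numbered 1 to 128 would need one more bit for its last entry.
    println!("{}", bits_needed(127)); // 7
    println!("{}", bits_needed(128)); // 8
}
```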

This also closes the loop on the subsection The Logic of Positions, where I said it was extremely important to remember 0 as the first place, not 1.

  2. And why 128 total?

Well, fewer than 128 wasn't an option. With 6 bits there are only 64 combinations (2⁶ = 64), and the 95 printable symbols alone already exceed that, so 7 bits were needed no matter what. And since 7 bits were going to be used anyway, it made sense to take advantage of all 128 combinations.

But why not use more than 7 bits, then? That way we'd have more representable symbols, wouldn't we? If we used 8 bits we'd have another 128 combinations, totaling 256, since 2⁸ = 256. The answer goes back to the fact that, at the time, every extra bit was very expensive, and the 128 symbols they had already settled on were perfectly capable of representing everything written English needed, with room to spare for the control characters.

Two Details Worth Your Attention

Looking at the table, you probably also noticed two more curious things.

The first is that the digit 0 is not at position 0 of the table; it's at position 48. Why? This might seem strange or confusing, but it makes sense once you understand the distinction between position and symbol. The confusion can happen because the symbol 0 and position 0 of the table seem like the same thing, but they're not.

Think of it this way: the letter A isn't at position 1 just because it's the first letter of the alphabet, like we did in the code the two of us created. It's at position 65, because that's the position the committee chose for it. The A is a symbol, a visual drawing, and that symbol was given the number 65.

The 0 is exactly the same thing: a drawing. The visual symbol that represents zero. That drawing was given the number 48 in the table, just as the A was given 65 and the B was given 66.

Position 0 of the table belongs to the control character called null - an invisible instruction used in the programming world to mark the end of a piece of text, with no visual relationship whatsoever to the symbol 0. The null lives at position zero, but the symbol 0 you type on your keyboard lives at another position (48). The null was placed at the first position intentionally, not by chance.
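
Here is a small sketch of the difference between a symbol and its position. The helper names position_of and digit_value are hypothetical, invented for this example. In Rust, writing a character between single quotes means "the symbol", and converting it to a number reveals its position in the table.

```rust
/// The position a symbol occupies in the table.
fn position_of(symbol: char) -> u32 {
    symbol as u32
}

/// The quantity a digit symbol stands for, if it is a digit at all.
fn digit_value(symbol: char) -> Option<u32> {
    symbol.to_digit(10)
}

fn main() {
    println!("{}", position_of('0')); // 48 (the drawing of zero)
    println!("{}", position_of('\0')); // 0  (the null control character)
    println!("{}", position_of('A')); // 65

    // The symbol '7' lives at position 55, but it stands for the quantity 7.
    println!("{}", position_of('7')); // 55
    println!("{:?}", digit_value('7')); // Some(7)
    println!("{:?}", digit_value('A')); // None
}
```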

But there's a second curious thing, and this one is even more interesting.

The uppercase A is at position 65, and the lowercase a is at position 97. The difference is exactly 32. In the table, going from an uppercase letter to its lowercase counterpart always means adding 32, and going from lowercase to uppercase always means subtracting 32.

Check it with a few other letters:

  • The B is at position 66; adding 32 (66 + 32 = 98), the b really is at position 98.
  • The C is at position 67; adding 32, we find the c at position 99.

This holds for all 26 letters without exception.

This wasn't a coincidence or an accident. It was a conscious choice by the committee, and the reason is so elegant it's almost beautiful: a difference of exactly 32 means that, in binary, the only thing that changes between an uppercase letter and its lowercase counterpart is a single bit (the sixth from the right). See:

Figure: ASCII to binary. Richard Dias Alves @ Ultimate Rust, 2025. License CC BY-NC-SA 4.0.
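
If you'd like to see the same comparison as text, here is a minimal sketch that writes a few letters in 7-bit binary and counts how many bits differ between each uppercase letter and its lowercase counterpart. The names as_bits and differing_bits are hypothetical, made up for this illustration.

```rust
/// Writes a symbol's table position as 7 binary digits.
fn as_bits(symbol: char) -> String {
    format!("{:07b}", symbol as u32)
}

/// Counts how many bits are different between two symbols.
fn differing_bits(a: char, b: char) -> u32 {
    ((a as u32) ^ (b as u32)).count_ones()
}

fn main() {
    println!("A = {}", as_bits('A')); // A = 1000001
    println!("a = {}", as_bits('a')); // a = 1100001
    println!("Z = {}", as_bits('Z')); // Z = 1011010
    println!("z = {}", as_bits('z')); // z = 1111010

    // Only the sixth bit from the right changes, for every letter.
    for (upper, lower) in ('A'..='Z').zip('a'..='z') {
        assert_eq!(differing_bits(upper, lower), 1);
    }
}
```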

This is useful because converting an uppercase letter to lowercase, or vice versa, becomes one of the simplest operations a computer can do: flipping a single bit on or off. The designers of ASCII didn't put the letters in those positions by accident. They chose the positions with exactly this in mind.
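
A minimal sketch of that bit flip, assuming the hypothetical names CASE_BIT and flip_case. It only illustrates the idea. In real Rust code you would normally reach for the standard library's to_ascii_uppercase and to_ascii_lowercase instead of flipping the bit yourself.

```rust
// The sixth bit from the right, which is worth 32.
const CASE_BIT: u8 = 0b010_0000;

/// Flips the case of an ASCII letter by flipping a single bit.
/// Anything that isn't a letter is returned untouched.
fn flip_case(symbol: u8) -> u8 {
    if symbol.is_ascii_alphabetic() {
        symbol ^ CASE_BIT
    } else {
        symbol
    }
}

fn main() {
    println!("{}", flip_case(b'A') as char); // a
    println!("{}", flip_case(b'r') as char); // R
    println!("{}", flip_case(b'1') as char); // 1 (not a letter, so unchanged)
}
```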

Note

Okay, I need to pause the technical tone here just to say: this is beautiful. There are things that deserve a moment of pure appreciation - and this is one of them. A committee of engineers in the 60s, without knowing exactly what kind of future they were building, made a choice so well thought out that it still resonates today in every computer on the planet. That's not just engineering, it's almost poetry.

WARNING

Don't confuse "nth position" with "position n".

To put it more plainly: terms like "position 4" or "position 10", for example, are different from "fourth position" and "tenth position". Remember that positions start from 0, so the first position is position 0, the second position is position 1, and so on...
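
Programming languages follow the same convention, which is why this warning matters in practice. In the sketch below, symbol_at is a hypothetical helper written for this example. It asks for the symbol at a given position of a word, counting from 0. The last line ties back to the uppercase and lowercase trick: the sixth bit from the right is the bit at position 5, and it is worth 32.

```rust
/// Returns the symbol at the given position, counting from 0.
fn symbol_at(word: &str, position: usize) -> Option<char> {
    word.chars().nth(position)
}

fn main() {
    let word = "RUST";

    println!("{:?}", symbol_at(word, 0)); // Some('R'): position 0, the first position
    println!("{:?}", symbol_at(word, 3)); // Some('T'): position 3, the fourth position
    println!("{:?}", symbol_at(word, 4)); // None: a four-letter word has no position 4

    // Position 5 is the sixth bit from the right.
    println!("{}", 1u8 << 5); // 32
}
```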

The Limitation of ASCII

ASCII solved a huge problem for its time. Any computer that followed the standard could exchange text with any other. A message written on an IBM machine could now arrive readable on a DEC machine. The chaos was over.

But... like everything else we've seen so far, ASCII isn't perfect either and has its limitations. Notice a word you've already read a few times in this section, maybe without paying attention to it: American. The acronym ASCII stands for American Standard Code for Information Interchange. And that's not just a detail.

ASCII was designed for the English language, at a moment when computers were used almost exclusively by Americans, in academic and military contexts. For them, 128 symbols were more than enough. But as computers started reaching the rest of the world, that limit started to hurt.

English gets by without accents, and that's reflected in ASCII. Among the 95 printable symbols, there's no Spanish ñ, no German ü, no French é, no Portuguese ç, no Norwegian ø. Not a single Arabic, Chinese, Japanese, Hindi, Greek, or Russian character. With only 128 positions, 33 of which were already taken by control characters, there simply wasn't room to include all of those. This wasn't a mistake, and it wasn't a deliberate exclusion either: the table was simply enough for the needs and context of that era. But as more countries joined the international communication network, they needed their characters represented too.

Fun fact

To this day, when a computer receives an instruction to display some character it can't find in its table, it doesn't simply crash or throw an error. It displays something like ? in its place and moves on. It's the way systems found to say "I received something I don't know what it is".

In practice, this means the word "codificação", typed on a Brazilian computer and displayed on a computer that only knows pure ASCII, arrives as codifica??o. The ç and the ã turn into ?. It's not a defect; it's just doing what it was programmed to do. Those symbols simply don't exist in its vocabulary.
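
To make that concrete, here is a sketch of what a system that only knows pure ASCII might do. The function name to_pure_ascii is hypothetical, and real systems vary in how they handle unknown symbols, so treat this as one possible behavior and not as the way any particular program works.

```rust
/// Keeps every symbol that exists in the ASCII table
/// and swaps everything else for a question mark.
fn to_pure_ascii(text: &str) -> String {
    text.chars()
        .map(|symbol| if symbol.is_ascii() { symbol } else { '?' })
        .collect()
}

fn main() {
    println!("{}", to_pure_ascii("codificação")); // codifica??o
    println!("{}", to_pure_ascii("Ultimate Rust")); // Ultimate Rust
}
```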

Programmers from non-Anglophone countries (the ones where English isn't the native language) lived with this problem for decades - adapting, improvising, creating partial solutions that solved the problem for one language but broke it for another. It was the same chaos as before ASCII, except this time on a global scale. And the solution that emerged followed exactly the same logic that created ASCII: an agreement. But this time, an agreement on a scale the 1961 ASA committee probably never would have imagined possible. Not between a few American entities, but an agreement with the entire world. But how do you do that?

That project exists and is still in use today in every modern computer, including the one you're using right now, whether it's a laptop, phone, or tablet. It's the reason you can read everything on your screen. It's called Unicode.

It's one of the most ambitious collaborative projects computing has ever produced. And it's what we'll see next.

Before moving on

  1. Can you explain how computers, which only understand numbers, can suddenly display all kinds of different symbols, like letters, punctuation marks, etc...?

  2. Can you explain what character encoding is and why it's necessary?

  3. Can you say why ASCII has 128 characters, no more, no less?

    • And why, despite having solved a huge problem for its time, it wasn't enough for the whole world?

If you can answer all of those questions, you're ready for the next section.