Computers operate using numbers. There therefore needs to be a way for a computer to represent letters (and other "characters") as numbers, so that they can be stored in computer memory, manipulated by a computer, or sent via communication links. Early communications systems used a range of methods. Some used a compact representation with a variable number of bits per character (e.g., Morse code). Others used a fixed number of bits to represent each character (e.g., Baudot codes, ITA-2, and ITA-5).
With the introduction of 8-bit computers, a standard format was specified, known as "ASCII" (American Standard Code for Information Interchange). ASCII was initially developed for tasks such as sending documents to printers, and many of the control commands (those with the lowest values) make sense in this context.
Because ASCII is a 7-bit code, it can represent 128 different characters. When an ASCII character is stored in a byte, the most significant bit is always zero (forming an 8-bit ASCII value). Unique codes are assigned to each letter (a-z and A-Z), the numeric digits (0-9), punctuation marks (,./?), etc. Some characters are not printing characters (see the first two columns of the following table). These represent control functions, such as ejecting a sheet of paper from a printer (FF, Form Feed), ringing a bell (BEL), marking the start of a message header (SOH), or DEL, originally used to erase a character on paper tape.
Sometimes the most significant bit is used to indicate that the byte is not an ASCII character but a graphics symbol; however, this use is not defined by ASCII.
Note: There is also a difference between the 8-bit binary representation of the number zero (00000000) and the corresponding ASCII digit '0' (00110000).
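A short C sketch (the variable names are only illustrative) shows this difference, and the common trick of adding '0' to a small number to obtain the matching ASCII digit:

```c
#include <stdio.h>

int main(void)
{
    int  zero  = 0;      /* the number zero: all bits are 0 (00000000)    */
    char digit = '0';    /* the ASCII digit '0': code 0x30 (00110000)     */

    printf("The number zero has value 0x%02X\n", zero);
    printf("The ASCII digit '0' has value 0x%02X\n", digit);

    /* A small number (0-9) is converted to its ASCII digit by adding '0' */
    int  n = 7;
    char c = (char)(n + '0');    /* 7 + 0x30 = 0x37, the character '7'    */
    printf("The number %d prints as the character '%c'\n", n, c);
    return 0;
}
```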
Each ASCII character can be identified according to its position within the ASCII Table.
                    MS 3 bits
    LS 4 bits   0    1    2    3    4    5    6    7
        0      NUL  DLE  SP   0    @    P    `    p
        1      SOH  DC1  !    1    A    Q    a    q
        2      STX  DC2  "    2    B    R    b    r
        3      ETX  DC3  #    3    C    S    c    s
        4      EOT  DC4  $    4    D    T    d    t
        5      ENQ  NAK  %    5    E    U    e    u
        6      ACK  SYN  &    6    F    V    f    v
        7      BEL  ETB  '    7    G    W    g    w
        8      BS   CAN  (    8    H    X    h    x
        9      HT   EM   )    9    I    Y    i    y
        A      LF   SUB  *    :    J    Z    j    z
        B      VT   ESC  +    ;    K    [    k    {
        C      FF   FS   ,    <    L    \    l    |
        D      CR   GS   -    =    M    ]    m    }
        E      SO   RS   .    >    N    ^    n    ~
        F      SI   US   /    ?    O    _    o    DEL
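To illustrate how the table is read, the short C sketch below (written for this page, not part of the ASCII standard) builds a character code from its column (MS 3 bits) and row (LS 4 bits):

```c
#include <stdio.h>

int main(void)
{
    /* A character's code is formed from its table position:
       code = (MS 3 bits << 4) | (LS 4 bits).
       'G' appears in column 4, row 7: (4 << 4) | 7 = 0x47.             */
    unsigned char code = (4 << 4) | 7;
    printf("Code 0x%02X is the character '%c'\n", code, code);

    /* Going the other way, a character splits into its table position. */
    unsigned char c = 'G';
    printf("'%c' is in column %d, row %X\n", c, c >> 4, c & 0x0F);
    return 0;
}
```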
Notes:
Numbers can be represented by a binary value. However, it is also necessary to know the format used. This could be as simple as encoding a value from 0 to 255 as a single byte, or using a group of bytes to represent a signed or unsigned number. It could also be more complex, where the encoding uses a scientific (floating-point) format, or some other representation.
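As a rough illustration (the byte value 0xF4 is chosen arbitrarily), the following C fragment shows how the same bits change meaning with the agreed format:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    /* The same 8-bit pattern means different things in different formats:
       0xF4 is 244 as an unsigned byte, but -12 when the byte is read as
       a two's-complement signed value.                                   */
    uint8_t raw = 0xF4;
    int8_t  as_signed;
    memcpy(&as_signed, &raw, 1);      /* reinterpret the same bits         */

    printf("0xF4 as an unsigned byte: %u\n", (unsigned)raw);
    printf("0xF4 as a signed byte:    %d\n", (int)as_signed);

    /* Larger numbers need a group of bytes and an agreed format,
       e.g. a 32-bit integer or a floating-point representation.          */
    printf("A 32-bit int uses %zu bytes, a float uses %zu bytes\n",
           sizeof(int32_t), sizeof(float));
    return 0;
}
```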
As an example of the use of ASCII, consider the problem of printing the result of a numerical calculation on a terminal, or sending it over a communications line.
Suppose the number to be printed is (in binary) 01101100;
The first step is to convert this into decimal;
The answer is 108;
The digits 1, 0 and 8 may each be represented by the hex values 0x01, 0x00, and 0x08;
Each of these digits must be converted to an ASCII character code;
The corresponding codes representing the numeric digits are (in hex) 0x31, 0x30 and 0x38;
As a set of bytes, in binary these are: 00110001, 00110000 and 00111000.
These three values are sent as a sequence of bytes.
The receiver needs to recognise the character set and can then print these codes as a sequence of characters: "108".
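The same steps can be sketched in C (the variable names and buffer size are only illustrative):

```c
#include <stdio.h>

int main(void)
{
    unsigned char value = 0x6C;       /* the binary value 01101100 = 108 */
    char text[4];                     /* room for three digits + NUL     */

    /* Split the value into decimal digits and convert each to an ASCII
       character by adding '0' (0x30), most significant digit first.    */
    text[0] = (char)('0' + value / 100);        /* '1' = 0x31 */
    text[1] = (char)('0' + (value / 10) % 10);  /* '0' = 0x30 */
    text[2] = (char)('0' + value % 10);         /* '8' = 0x38 */
    text[3] = '\0';

    /* These three bytes (0x31, 0x30, 0x38) can now be printed or sent.  */
    printf("%s\n", text);             /* prints: 108 */
    return 0;
}
```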
The ASCII Character Set is often used, but is based on the English alphabet - which is not so good if the need is to communicate in a different alphabet, or to communicate graphics, emoji, etc.
Unicode is defined by the Unicode Consortium, founded in 1988. It is now commonly used to represent characters. Unlike ASCII, Unicode provides a way to support speakers of many languages. Each character is assigned a value (its code point), which is used to represent the character in computer memory, in storage systems, and in specifications. There are 1,114,112 (17 × 2^16) code points; as of Unicode 16.0 (2024), about 155,000 have been assigned to characters.
For example, an upper case "A" is represented by decimal 65 (0x41), expressed in Unicode as U+0041, and a Black Heart is represented by decimal 128420, expressed as U+1F5A4.
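Since a code point is simply a number, the "U+" notation is just that number written in hexadecimal; a small C sketch makes this concrete:

```c
#include <stdio.h>

int main(void)
{
    /* Code points are numbers; "U+" writes them in hexadecimal. */
    unsigned long capital_a   = 65;       /* 'A'                 */
    unsigned long black_heart = 128420;   /* Black Heart         */

    printf("Decimal %lu is U+%04lX\n", capital_a, capital_a);
    printf("Decimal %lu is U+%04lX\n", black_heart, black_heart);
    return 0;
}
```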
If text needs to be represented, it is usually stored as a string. This is a series of ASCII or Unicode characters, each of which is stored as a value. Using ASCII, each character is stored in one byte. Formatting characters such as space, carriage return and line feed may be included in the string.
Some method is needed for indicating the length of the string, or where the end of the string is. There are two main methods: (1) store the length of the string (for example, a count of the characters, held before the characters themselves); or (2) mark the end of the string with a special terminating character (in C, this is the NUL character, which has the value zero).
The programmer must, of course, know the convention that is being used. There is nothing to distinguish bits that represent a number from bits that represent characters; the receiver has to know how the bits are supposed to be interpreted before it can do anything with them.
The second way is most commonly used in C programs (see also the "pig" and "dog" examples). Note that a more sophisticated method of storing text (say, in a word-processing program), where you want to store details such as the font or the size of the characters, needs other information as well; but the actual information about the text could still be stored as ASCII (or Unicode) characters.
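The following C sketch contrasts the two conventions (the one-byte count used for the first method is just one possible layout, and the "pig" and "dog" values are purely illustrative):

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Method 1: store the length with the characters.
       Here the first byte holds the count (one possible layout).      */
    unsigned char counted[] = { 3, 'd', 'o', 'g' };
    printf("Counted string: %.*s (length %d)\n",
           counted[0], (const char *)&counted[1], counted[0]);

    /* Method 2: mark the end with a special character.
       C uses the NUL character (value 0) as the terminator.           */
    char terminated[] = "pig";        /* stored as 'p','i','g',0        */
    printf("Terminated string: %s (length %zu)\n",
           terminated, strlen(terminated));
    return 0;
}
```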
The input to a computer program is usually a set of strings. A high-level language like C not only has many functions for handling such strings (e.g., strcat(), strcpy(), strlen()); a C compiler itself uses functions like these to read in the program, which is presented to it as a series of characters. Some microprocessors and computers have special instructions to handle strings of characters efficiently.
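A brief C example of these library functions (the buffer size and text are arbitrary):

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buffer[32];                  /* space for the combined string  */

    strcpy(buffer, "108");            /* copy one string into another   */
    strcat(buffer, " apples");        /* append a second string         */

    printf("\"%s\" has %zu characters\n", buffer, strlen(buffer));
    return 0;
}
```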
When Unicode is used, more than one byte may be needed to represent a single character.
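For example, in the widely used UTF-8 encoding (shown here only as an illustration; Unicode also defines other encodings such as UTF-16 and UTF-32), "A" still needs one byte, but the Black Heart needs four:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Assumption: the strings below are encoded with UTF-8.            */
    const char letter_a[] = "A";                 /* U+0041  -> 1 byte    */
    const char heart[]    = "\xF0\x9F\x96\xA4";  /* U+1F5A4 -> 4 bytes   */

    printf("'A' occupies %zu byte(s) in UTF-8\n", strlen(letter_a));
    printf("Black Heart (U+1F5A4) occupies %zu byte(s) in UTF-8\n",
           strlen(heart));
    return 0;
}
```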
See also:
Prof. Gorry Fairhurst, School of Engineering, University of Aberdeen, Scotland. (2025)