Introduction to the binary system

 

This chapter is a very brief introduction to binary arithmetic – the fundamental aspect of digital electronics and the foundation stone of modern computing systems. Feel free to proceed to the next chapter if you are already confident with basic arithmetic operations in the binary world.

 

Bits, bytes, and friends

 

In the beginning, there was a bit – the minimal representation of information, which could only tell “yes” or “no”, “true” or “false”, “on” or “off”. A bit, as a means of representing information, may be considered one of the most significant inventions, on a par with the discovery of penicillin. It is hard to imagine the modern world without bits, as almost everything we see around us works, or simply exists, by virtue of bits. However, how much information can one represent with a single bit? Well, just “true” or “false”. Grouping bits into sequences is what makes the huge difference. For example, this book was initially a sequence of bits too, albeit a rather complex one to consider.

Let us take a look at something simpler – the Morse code representation of the “SOS” signal, which is “. . . _ _ _ . . .”. Now, let us convert it to a sequence of bits, where a signal is represented by ‘1’ and the spaces between signals are represented by ‘0’, with ‘1’ and ‘11’ being short and long signals respectively. We get ‘10101011011011010101’. Such encoding is called “binary” (the prefix “bi” means two). Interestingly enough, the “SOS” signal in this form takes precisely 20 bits – the width of the Intel 8086 processor’s address bus. Such an arrangement, however, is far from ideal – just imagine the mess we would have if we had to come up with a specific binary representation for everything. Instead, some unification was needed, and that is why bits were grouped into bytes.
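Just to illustrate the idea, below is a minimal C sketch of such an encoder; the encode_morse function and its dot/dash input format are invented here purely for illustration and are not part of any standard:

#include <stdio.h>

/* Encode a dot/dash sequence using the scheme described above:
 * '.' becomes "1", '-' becomes "11", and a single '0' separates
 * adjacent signals. */
static void encode_morse(const char *morse)
{
    for (const char *p = morse; *p != '\0'; p++) {
        printf(*p == '.' ? "1" : "11");
        if (p[1] != '\0')
            printf("0");              /* separator between signals */
    }
    printf("\n");
}

int main(void)
{
    encode_morse("...---...");        /* prints 10101011011011010101 */
    return 0;
}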

 

It is a widespread misconception that a byte contains precisely 8 bits. In fact, 8 bits are called an octet[1], while a byte may be of any reasonable size. Here is how Werner Buchholz explains it in “Planning a Computer System”:

 

“Byte denotes a group of bits used to encode a character or the number of bits transmitted in parallel to and from input-output units. A term other than character is used here because a given character may be represented in different applications by more than one code, and different codes may use different numbers of bits (i.e., different byte sizes).” (Buchholz, 1962)

As we can see, a byte is rather a minimum addressable unit (although, in the case of “Project Stretch”, such a claim would be incorrect), which is exactly 8 bits on x86 systems. Although there are specific instructions operating on nibbles[2] or even on individual bits, the minimum addressable unit is still 8 bits. The ASCII character code, where each byte corresponds to a single symbol or control code, is an excellent example of how useful such an organization of bits may be. However, no matter how good the idea is, it needs a better means of representation than raw bits. You would agree that denoting the ASCII code for the letter ‘A’ as 01000001b is not particularly convenient. Therefore, any value that occupies more than a single bit is usually represented in hexadecimal[3] or decimal notation. Hexadecimal notation is much preferred, however, as it allows easy identification of individual bytes.

 

Hexadecimal notation represents each group of four bits as a single character, using digits from 0 to 9 and letters from ‘a’ to ‘f’, where 0 is, well, 0 (binary 0000b) and ‘f’ is 15 (binary 1111b). Numbers in hexadecimal notation are either preceded by ‘0x’ or followed by ‘h’, as in 0x1234 or 1234h. An important detail to remember when using the ‘h’ notation is that numeric values may not start with a letter – the compiler would not know that you intended to write a hexadecimal number in such a case. To solve this problem, prepend ‘0’ to the number, as in 0CAFEh.
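To see the notations side by side, here is a small C sketch; note that C itself understands only the ‘0x’ prefix – the ‘h’ suffix belongs to assembly language syntax:

#include <stdio.h>

int main(void)
{
    unsigned char a = 0x41;                  /* ASCII code of 'A', 01000001b */

    printf("decimal: %d\n", a);              /* prints 65 */
    printf("hex:     0x%X\n", (unsigned)a);  /* prints 0x41 */
    printf("char:    %c\n", a);              /* prints A */
    return 0;
}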

 

Decimal |  Binary  | Hexadecimal      Decimal |  Binary  | Hexadecimal
   0    |   0000   |    0x0              8    |   1000   |    0x8
   1    |   0001   |    0x1              9    |   1001   |    0x9
   2    |   0010   |    0x2             10    |   1010   |    0xA
   3    |   0011   |    0x3             11    |   1011   |    0xB
   4    |   0100   |    0x4             12    |   1100   |    0xC
   5    |   0101   |    0x5             13    |   1101   |    0xD
   6    |   0110   |    0x6             14    |   1110   |    0xE
   7    |   0111   |    0x7             15    |   1111   |    0xF

Table 1.    The correlation between decimal, binary and hexadecimal notations.

 

Let us use the “SOS” sequence mentioned above to demonstrate the conversion between binary and hexadecimal notations:

 

10101011011011010101b → 1010 1011 0110 1101 0101 → 0xAB6D5
                          A    B    6    D    5
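The same conversion is easy to verify programmatically. The following minimal C sketch simply parses the binary string with the standard strtoul function and prints the result in hexadecimal:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Parse the binary form of the "SOS" sequence (base 2). */
    unsigned long sos = strtoul("10101011011011010101", NULL, 2);

    printf("0x%lX\n", sos);    /* prints 0xAB6D5 */
    return 0;
}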

Eight bits, however, are only capable of representing values in the range of 10000000b to 01111111b for signed numbers and 00000000b to 11111111b for unsigned ones, which is -128 to 127 or 0 to 255 respectively. Following the bits-to-bytes grouping approach, we have to start grouping bytes if we need larger ranges of values (and we do). A group of two bytes on Intel/AMD-based systems is called a word. Furthermore, a group of two words (4 bytes) forms a double word or dword, and a group of two double words (8 bytes) forms a quadword or qword. As each byte added to a group multiplies the number of representable values by 2^8 = 256, it is possible to host very large numbers with virtually any number of bits; the problem would only be in the processing thereof, and we will soon see why.
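As a quick reference, the following C sketch shows these storage units using the fixed-width types from <stdint.h>; on x86, uint8_t, uint16_t, uint32_t and uint64_t correspond to byte, word, dword and qword respectively:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Each added byte multiplies the number of representable values by 2^8. */
    printf("byte:  0 .. %u\n",   (unsigned)UINT8_MAX);             /* 255 */
    printf("word:  0 .. %u\n",   (unsigned)UINT16_MAX);            /* 65535 */
    printf("dword: 0 .. %u\n",   (unsigned)UINT32_MAX);            /* 4294967295 */
    printf("qword: 0 .. %llu\n", (unsigned long long)UINT64_MAX);  /* 18446744073709551615 */

    printf("signed byte: %d .. %d\n", INT8_MIN, INT8_MAX);         /* -128 .. 127 */
    return 0;
}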

 

MSB & LSB

 

Another problem yet to be solved is the order of bits in a byte and the order of bytes in larger sequences. Everything is simple in the case of bytes (regardless of byte size): the binary representation of a byte value is written from left to right, starting with the most significant bit and ending with the least significant bit. Let us take the value 0x5D as an example:

 

Bit rank    | 2^7 | 2^6 | 2^5 | 2^4 | 2^3 | 2^2 | 2^1 | 2^0
Binary      |  0  |  1  |  0  |  1  |  1  |  1  |  0  |  1
Hexadecimal |          5            |           D

Table 2.    Order of bits in a byte.

 

As we may see, the most significant bit of 0x5D, the one that stands for 2^7, is 0, and the least significant bit, the one representing 2^0, is 1.
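Extracting these two bits programmatically is a matter of simple shifting and masking, as the following small C sketch shows:

#include <stdio.h>

int main(void)
{
    unsigned char value = 0x5D;       /* 01011101b */

    unsigned msb = (value >> 7) & 1;  /* the bit of rank 2^7 */
    unsigned lsb = value & 1;         /* the bit of rank 2^0 */

    printf("MSB = %u, LSB = %u\n", msb, lsb);  /* prints MSB = 0, LSB = 1 */
    return 0;
}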

 

We have seen how easy the conversion between binary and hexadecimal notations is; now let us use Table 2 as an illustration of binary to decimal conversion. In binary notation, each bit represents a value of 2^(bit position)[5], and the final value of a byte is the sum of the values represented by the bits that are set to ‘1’. Taking the binary representation of the byte shown in Table 2, we get the following equation:

 

2^0 + 2^2 + 2^3 + 2^4 + 2^6 = 1 + 4 + 8 + 16 + 64 = 93 (decimal) = 0x5D (hexadecimal)
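The following C sketch performs the same summation with shifts and prints every term of the equation above:

#include <stdio.h>

int main(void)
{
    unsigned value = 0x5D;    /* 01011101b */
    unsigned sum = 0;

    /* Add 2^position for every bit that is set to '1'. */
    for (unsigned pos = 0; pos < 8; pos++) {
        if ((value >> pos) & 1) {
            printf("+ 2^%u (%u)\n", pos, 1u << pos);
            sum += 1u << pos;
        }
    }
    printf("= %u\n", sum);    /* prints "= 93" */
    return 0;
}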

 

The same rule applies to values that occupy more than one byte. Take, for example, the decimal value 953 (hexadecimal 0x3B9) – this value needs two bytes (8 bits each), and its binary form is 0000001110111001b, in which case the equation would be:

 

2^0 + 2^3 + 2^4 + 2^5 + 2^7 + 2^8 + 2^9 = 1 + 8 + 16 + 32 + 128 + 256 + 512 = 953 (decimal) = 0x3B9 (hexadecimal)

 

While everything is clear with the order of bits in a byte, the order of bytes in a word (dword, qword), while always written from left to right with the most significant byte (the one containing the most significant bit) on the left and the least significant byte on the right, may differ when stored in memory depending on the underlying architecture. There are two types of architectures:

1. Little-endian – the least significant byte is stored first, meaning the order of bytes in memory is the opposite of that of a written hexadecimal number;

2. Big-endian – bytes are stored in memory in the same order as when written in hexadecimal notation.

Table 3 shows the difference between the two.

 

Platform type | Little-endian       | Big-endian
Written form  | 0x3B9               | 0x3B9
Addresses     | addr   | addr + 1   | addr   | addr + 1
Bytes         | 0xB9   | 0x03       | 0x03   | 0xB9

Table 3.    The difference between Little-endian and Big-endian byte orders.

 

As we are going to work on either Intel or AMD processors, we may, at least temporarily, forget about Big-endian platforms.
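Nevertheless, the little-endian byte order is easy to observe in practice. The following C sketch reinterprets the value from Table 3 as individual bytes:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t value = 0x3B9;    /* 0x03B9 stored in two bytes */
    unsigned char *bytes = (unsigned char *)&value;

    /* On a little-endian x86 machine this prints "b9 03", since the
     * least significant byte resides at the lower address. */
    printf("%02x %02x\n", bytes[0], bytes[1]);
    return 0;
}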