Lesson 4. Fixed and Floating Point Binary

Lesson Objective

Represent fractional numbers in fixed point form in binary in a given number of bits.
Represent fractional numbers in floating point form in binary in a given number of bits.
Know and be able to explain why both fixed point and floating point representation of decimal numbers may be inaccurate.
Compare the advantages and disadvantages of fixed point and floating point forms in terms of range, precision and speed of calculation.

Lesson Notes

Fixed Decimal Point Numbers

Using bits to the right of the units column (after a notional point) introduces fractional values.

Fractional values are negative powers of 2.

A fixed-point binary value uses a specified number of bits where the placement of the binary point is fixed.

For example, in an 8-bit fixed-point binary value, the binary point could be set between the fourth and fifth bits. In the two's complement example below, the leftmost bit has a place value of -8.

-2³	2²	2¹	2⁰		2⁻¹	2⁻²	2⁻³	2⁻⁴
-8	4	2	1		1/2	1/4	1/8	1/16
0	0	0	1	•	1	0	0	1	= 1.5625₁₀
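The table above can be checked with a short Python sketch. The function name fixed_point_value and its parameters are illustrative assumptions, not part of the lesson; it assumes a two's complement bit string and a known number of fractional bits.

```python
def fixed_point_value(bits: str, fraction_bits: int) -> float:
    """Denary value of a two's complement fixed-point bit string."""
    # Read the whole pattern as an unsigned integer first.
    raw = int(bits, 2)
    # In two's complement the leftmost bit carries a negative place value.
    if bits[0] == "1":
        raw -= 1 << len(bits)
    # Placing the binary point fraction_bits from the right divides by 2**fraction_bits.
    return raw / (1 << fraction_bits)


print(fixed_point_value("00011001", 4))  # 1.5625
print(fixed_point_value("10011001", 4))  # -6.4375
```

The second call shows the effect of the -8 column: -8 + 1 + 1/2 + 1/16 = -6.4375.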

Floating Point Binary

A real number in floating point binary has three parts:

The Sign: whether the number is positive or negative
Mantissa: the part of a floating-point number that holds the significant digits of that number (the value)
Exponent: the power of 2 that the mantissa is multiplied by (how far the binary point needs to be shifted)

Mantissa						Exponent
-1		1/2	1/4	1/8	1/16	-4	2	1
1	•	1	0	0	0	0	0	1

Sign bit: the first bit of the mantissa, with a place value of -1
Mantissa represented as a Two's Complement Number
Exponent represented as a Two's Complement Number

Standard Form?

Standard form, also known as scientific notation, is a way of writing very large or very small numbers that makes them easier to read and write. It is based on the idea of using powers of 10 to represent the number. Computers, however, work with binary values, so floating point representation multiplies by powers of two instead of powers of ten.

5,000,000 can be written as 5 x 10⁶

Mantissa = 5
Exponent = 6
Base = 10

Floating Point - Positive Exponent

By using floating point binary we can represent a far greater range of numbers, both very large and very small, in the same number of bits.

Mantissa						Exponent
-1		1/2	1/4	1/8	1/16	-4	2	1
0	•	1	0	1	1	0	1	0

To store the number in normalised (standard) form, the first two bits of the mantissa have to be opposite: 0.1 for a positive number and 1.0 for a negative number.

The mantissa and exponent are stored together as one bit pattern. The mantissa holds the digits of the number, and the exponent gives the position of the binary point.

Example:

The binary point always starts in the same position, after the first bit of the mantissa.
The exponent in the example is +2 (010).
This means the binary point moves 2 places to the right.
The base 2 place values of the mantissa bits change to match.

Mantissa						Exponent
-4	2	1		1/2	1/4	-4	2	1
0	1	0	•	1	1	0	1	0

2 + 0.5 + 0.25 = 2.75
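The same result can be reached arithmetically as mantissa × 2^exponent. The sketch below is a minimal illustration, assuming both parts are two's complement bit strings and that the binary point follows the first bit of the mantissa; the names twos_complement and float_value are hypothetical helpers, not part of the lesson.

```python
def twos_complement(bits: str) -> int:
    """Signed integer value of a two's complement bit string."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


def float_value(mantissa: str, exponent: str) -> float:
    """Denary value of a floating point number given as two bit strings."""
    # The binary point sits after the first mantissa bit, so every bit
    # after it is fractional: divide by 2**(number of fractional bits).
    m = twos_complement(mantissa) / (1 << (len(mantissa) - 1))
    e = twos_complement(exponent)
    # The exponent scales the mantissa by a power of two.
    return m * 2.0 ** e


print(float_value("01011", "010"))  # 2.75
```

Here the mantissa 0.1011 is 0.6875 and the exponent 010 is +2, so the value is 0.6875 × 4 = 2.75.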

Floating Point - Negative Exponent

In the example below the exponent is -2 (110 in two's complement: -4 + 2).

Mantissa						Exponent
-1		1/2	1/4	1/8	1/16	-4	2	1
0	•	1	0	1	1	1	1	0

If the exponent is a negative number, the binary point moves to the left.

The mantissa is extended with extra bits on the left to accommodate the new position of the binary point. For a positive mantissa these extra bits are 0s.

Example:

Mantissa								Exponent
-1		1/2	1/4	1/8	1/16	1/32	1/64	-4	2	1
0	•	0	0	1	0	1	1	1	1	0

1/8(0.125) + 1/32(0.03125) + 1/64(0.015625) = 0.171875
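One way to see how the place values change is to add the exponent to the power of each mantissa bit and sum the results. The sketch below is illustrative only: shifted_mantissa_value is an assumed name, the exponent is passed as an ordinary integer, and Fraction is used so the sum stays exact.

```python
from fractions import Fraction


def shifted_mantissa_value(mantissa: str, exponent: int) -> Fraction:
    """Sum the place values of a two's complement mantissa after the shift."""
    total = Fraction(0)
    for position, bit in enumerate(mantissa):
        if bit == "1":
            # Bit `position` starts with place value 2**-position;
            # the exponent moves every place value by the same amount.
            weight = Fraction(2) ** (exponent - position)
            # The first bit is the sign bit, so its place value is negative.
            total += -weight if position == 0 else weight
    return total


value = shifted_mantissa_value("01011", -2)
print(value)         # 11/64
print(float(value))  # 0.171875
```

The three set bits end up with place values 1/8, 1/32 and 1/64, matching the sum in the example above.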

Floating Point - Negative Mantissa and Negative Exponent

In the example below the mantissa is -0.8125 and the exponent is -2.

When both the exponent and the mantissa are negative, any new binary digit added on the left of the mantissa must be a 1.

Mantissa						Exponent
-1		1/2	1/4	1/8	1/16	-4	2	1
1	•	0	0	1	1	1	1	0

The shifted mantissa in the example below shows the new place digits as 1s.

The same principle applies to positive mantissas: if the mantissa starts with 0, the new place digits will be 0s.

Example:

Mantissa								Exponent
-1		1/2	1/4	1/8	1/16	1/32	1/64	-4	2	1
1	•	1	1	0	0	1	1	1	1	0

-1 + 1/2 + 1/4 + 1/32 + 1/64 = -0.203125
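Filling the new places with copies of the first bit works for both positive and negative mantissas. A minimal sketch of that rule, assuming the mantissa is a bit string whose point follows the first bit (shift_point_left is a hypothetical name):

```python
def shift_point_left(mantissa: str, places: int) -> str:
    """Move the binary point `places` to the left, keeping the value's sign."""
    sign_bit = mantissa[0]
    # New digits are copies of the sign bit: 0s for positive, 1s for negative.
    extended = sign_bit * places + mantissa
    # The point still sits after the first bit of the extended pattern.
    return extended[0] + "." + extended[1:]


print(shift_point_left("01011", 2))  # 0.001011
print(shift_point_left("10011", 2))  # 1.110011
```

Both outputs match the shifted mantissas in the two negative-exponent examples.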

Floating Point - Negative Mantissa and Positive Exponent

The example exponent is now +4 and the mantissa is -0.875 (-1 + 1/8).

This means the binary point will need to move 4 places to the right.

Mantissa					Exponent
-1		1/2	1/4	1/8	-8	4	2	1
1	•	0	0	1	0	1	0	0

The shifted mantissa in the example below shows that 2 extra digits (0s) have been added on the right.

Digits to the right of the binary point are fractional values.

Digits to the left of the point take the standard base 2 weighting, with the leftmost bit negative.

Example:

Mantissa							Exponent
-16	8	4	2	1		1/2	-8	4	2	1
1	0	0	1	0	•	0	0	1	0	0

-16 + 2 = -14
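Moving the point to the right can be sketched in the same way, this time padding with 0s on the right. The helper names shift_point_right and twos_complement_value are assumptions for illustration; the padding shows one fractional digit to match the layout of the worked example.

```python
def shift_point_right(mantissa: str, places: int) -> str:
    """Move the binary point `places` to the right of its starting position.

    The point starts after the first (sign) bit of the mantissa.
    """
    # Pad with 0s on the right so every place the point crosses has a digit.
    padded = mantissa.ljust(places + 1, "0")
    whole, fraction = padded[: places + 1], padded[places + 1 :]
    # Show at least one fractional digit, as the worked example does.
    return whole + "." + (fraction or "0")


def twos_complement_value(point_string: str) -> float:
    """Denary value of a two's complement string containing a binary point."""
    whole, fraction = point_string.split(".")
    digits = whole + fraction
    raw = int(digits, 2)
    if digits[0] == "1":
        raw -= 1 << len(digits)
    return raw / (1 << len(fraction))


shifted = shift_point_right("1001", 4)
print(shifted)                         # 10010.0
print(twos_complement_value(shifted))  # -14.0
```

Padding on the right with 0s never changes the value, which is why it is safe for both positive and negative mantissas.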

Fixed vs Floating

Fixed and floating point each have their own advantages and disadvantages in terms of range, precision and the speed of calculation.

Floating point allows a far greater range of numbers using the same number of bits. Very large numbers and very small fractional numbers can be represented. The more bits given to the mantissa, the greater the precision, and the more bits given to the exponent, the greater the range.

Fixed-point numbers have a limited range, which is determined by the number of bits used to represent them. Fixed-point numbers have a fixed precision, which means that the same number of fractional bits is always stored, regardless of the value of the number.

Fixed point binary is a simpler system and is faster to process compared to floating point.

Advantages of Fixed Point:

Faster calculations
Less hardware required
Easier to debug

Advantages of Floating Point:

Wide range
Variable precision
Can represent a wider variety of values

The best choice of representation will depend on the specific application. If speed and efficiency are critical, then fixed-point may be the better choice. If range or precision is critical, then floating-point may be the better choice.
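The difference in range can be illustrated by listing every value that 8 bits can hold in each form. The split used here is an assumption chosen for the comparison: fixed point with 4 bits either side of the point, and floating point with a 4-bit mantissa and a 4-bit exponent, all in two's complement. The helper signed is a hypothetical name.

```python
def signed(n: int, width: int) -> int:
    """Interpret an unsigned integer as a two's complement value of `width` bits."""
    return n - (1 << width) if n >= 1 << (width - 1) else n


# 8-bit fixed point: 4 bits before the point, 4 bits after it.
fixed = sorted({signed(n, 8) / 16 for n in range(256)})

# 8-bit floating point: 4-bit mantissa (point after the first bit), 4-bit exponent.
floating = sorted(
    {signed(m, 4) / 8 * 2.0 ** signed(e, 4) for m in range(16) for e in range(16)}
)

for name, values in (("fixed", fixed), ("floating", floating)):
    smallest_positive = min(v for v in values if v > 0)
    print(name, min(values), max(values), smallest_positive)
```

Under these assumptions the floating point form reaches from -128 to 112, against -8 to 7.9375 for fixed point, but its values are spaced unevenly: the gap between neighbouring values grows as the numbers get larger.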

Normalisation

There are two main reasons why we need to normalise floating point binary numbers:

To ensure maximum precision. When a floating point number is normalised, the mantissa (the part of the number that holds the significant digits) has no redundant leading bits. Every bit of the mantissa is then used for significant digits, which gives the greatest possible precision for a given number of bits.
To ensure uniqueness. When a floating point number is normalised, each unique number has only one possible bit pattern to represent it. This is important for ensuring that floating point operations are performed correctly.

Normalisation is the process of moving the binary point of a floating point number to provide the maximum level of precision for a given number of bits.

For a positive binary number, this involves removing any leading zeros (0s) after the sign bit and adjusting the exponent to match.
For a negative binary number, this involves removing any leading ones (1s) after the sign bit and adjusting the exponent to match.
This means that a normalised floating point number must always start with either 0.1 (positive) or 1.0 (negative).
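The rule above can be sketched as a loop that drops one redundant leading bit at a time. This is a minimal illustration, not a full implementation: normalise is an assumed name, the exponent is held as an ordinary integer, and no check is made that the new exponent still fits in its bit field.

```python
def normalise(mantissa: str, exponent: int) -> tuple[str, int]:
    """Normalise a two's complement mantissa whose point follows the first bit."""
    # Zero has no 1 bits, so it can never start 0.1 or 1.0; return it unchanged.
    if "1" not in mantissa:
        return mantissa, exponent
    # While the first two bits match, the second bit is a redundant copy of the sign.
    while mantissa[0] == mantissa[1]:
        # Drop the redundant bit, pad with a 0 on the right to keep the width,
        # and lower the exponent by 1 so the overall value is unchanged.
        mantissa = mantissa[0] + mantissa[2:] + "0"
        exponent -= 1
    return mantissa, exponent


print(normalise("00101", 3))  # ('01010', 2)
print(normalise("11011", 0))  # ('10110', -1)
```

In the first call 0.0101 × 2³ and 0.1010 × 2² are both 2.5; in the second, 1.1011 × 2⁰ and 1.0110 × 2⁻¹ are both -0.3125.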

mrahmedcomputing
