5.2: Floating-Point Data Type


General Discussion

The floating-point data type is a family of data types that act alike and differ only in the size of their domains (the allowable values). The floating-point family of data types represents numbers with fractional parts. A value is technically stored as two integer values that are always treated together: a mantissa (the significant digits) and an exponent (the scale). The floating-point family has the same general attributes and behaves similarly in most programming languages. Floating-point types can always store negative or positive values, so they are always signed, unlike the integer data type, which can be unsigned. The domain for floating-point data types varies because they can represent very large numbers or very small numbers. Rather than talk about the actual values, we usually talk about the precision: the more bytes of storage, the larger the mantissa and exponent, and thus the more precision.
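To make the mantissa-and-exponent idea concrete, here is a minimal sketch that splits a double into those two parts with the standard library function std::frexp and rebuilds it with std::ldexp. The names Parts, decompose, and recompose are illustrative and not part of the text. Note that std::frexp reports the mantissa as a fraction and the exponent as a power of 2; the hardware keeps the same information as integer bit fields.

```cpp
#include <cmath>

// Hypothetical helper type: the two parts of a floating-point value.
struct Parts {
    double mantissa;  // fraction with magnitude in [0.5, 1), or 0 for zero
    int exponent;     // power of 2 that scales the mantissa
};

// Split value so that value == mantissa * 2^exponent.
Parts decompose(double value) {
    Parts p;
    p.mantissa = std::frexp(value, &p.exponent);
    return p;
}

// Rebuild the original value from its two parts.
double recompose(const Parts& p) {
    return std::ldexp(p.mantissa, p.exponent);
}
```

For example, decompose(8.0) gives a mantissa of 0.5 and an exponent of 4, because 0.5 times 2 to the 4th power is 8.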

The most commonly used floating-point data type in C++ is the double. By default, C++ compilers treat floating-point constants (such as 3.14) as the double data type for use in calculations. The double data type will store just about any number most beginning programmers will ever encounter.
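The default type of a floating-point constant can be checked at compile time. The sketch below is one way to do that, using decltype and std::is_same from the standard header <type_traits>; the function names are made up for this illustration.

```cpp
#include <type_traits>

// True when a constant written with no suffix has type double.
constexpr bool plain_constant_is_double() {
    return std::is_same<decltype(3.14), double>::value;
}

// True when the f suffix produces a float constant.
constexpr bool f_suffix_is_float() {
    return std::is_same<decltype(3.14f), float>::value;
}

// True when the L suffix produces a long double constant.
constexpr bool l_suffix_is_long_double() {
    return std::is_same<decltype(3.14L), long double>::value;
}
```

All three functions return true, which shows that a programmer has to add a suffix to get anything other than a double constant.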

C++ Reserved Word	double
Represent	Numbers with fractional parts
Size	Usually 8 bytes
Storage	Two parts (always treated together): a mantissa and an exponent
Normal Signage	Signed (negative and positive values)
Domain (Values Allowed)	Approximately ±1.7E-308 to ±1.7E308
C++ syntax rule	the presence of a decimal point means it's floating-point data
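The syntax rule in the last row of the table has a visible effect on calculations. As a small sketch (the function names are illustrative only), compare the same division written with and without a decimal point:

```cpp
// No decimal point: 7 and 2 are both integers, so the division
// drops the fractional part before the result becomes a double.
double without_decimal_point() {
    return 7 / 2;
}

// The decimal point makes 7.0 a double constant, so the division
// is carried out in floating-point and keeps the fractional part.
double with_decimal_point() {
    return 7.0 / 2;
}
```

The first function returns 3.0 and the second returns 3.5.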

Within C++ there are various reserved words that can be used to establish the size in bytes of a floating-point data item. More bytes mean more precision:

C++ Reserved Word	Size
float	4 bytes
double	8 bytes
long double	8 to 16 bytes (varies by compiler and machine)
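The sizes in the table can be checked on any particular machine with the sizeof operator. The sketch below also shows what "more precision" means in practice by passing a value through the smaller float type. The names FloatSizes, float_sizes, and through_float are assumptions made for this example, and the precision behavior described afterward assumes the common IEEE 754 formats.

```cpp
#include <cstddef>

// Hypothetical record of the storage size of each floating-point type.
struct FloatSizes {
    std::size_t float_bytes;
    std::size_t double_bytes;
    std::size_t long_double_bytes;
};

// Report the sizes used by the current compiler and machine.
FloatSizes float_sizes() {
    return {sizeof(float), sizeof(double), sizeof(long double)};
}

// Store a value in a float, then widen it back to a double,
// so any digits the float could not hold are lost.
double through_float(double value) {
    return static_cast<double>(static_cast<float>(value));
}
```

On a machine with a 4-byte IEEE 754 float, through_float(16777217.0) does not return 16777217.0, because a float does not have enough mantissa bits to hold that value exactly, while a double holds it without loss.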

The domain of each of the above data type options varies with the compiler being used and the computer. The domains vary because the byte size allocated to the data varies with the compiler and computer. This effect is known as being machine dependent.
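Because the domains are machine dependent, a program can ask the compiler for the actual limits instead of assuming them. The following sketch uses std::numeric_limits from the standard header <limits>; the function names are invented for this illustration.

```cpp
#include <limits>

// Number of decimal digits a float can hold without loss.
int float_digits() {
    return std::numeric_limits<float>::digits10;
}

// Number of decimal digits a double can hold without loss.
int double_digits() {
    return std::numeric_limits<double>::digits10;
}

// Largest finite value a double can represent on this machine.
double largest_double() {
    return std::numeric_limits<double>::max();
}
```

On machines that use the common IEEE 754 formats, these typically report 6 digits for float, 15 digits for double, and a largest double of roughly 1.8E308, but other machines may give different answers.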

These variations within the floating-point family of data types can be an annoyance for a beginning C++ programmer. At this stage, it is more important to understand the general attributes of the floating-point family, which apply to most programming languages.

Definitions

Double: The most commonly used floating-point data type in C++.
Precision: The effect that a larger or smaller storage area in bytes has on the domain of floating-point values.
Mantissa Exponent: The two integer parts of a floating-point value.