16.1: Numeric data types
By Carey A. Smith
Read the MATLAB help for Floating-Point Numbers:
https://www.mathworks.com/help/matlab/matlab_prog/floating-point-numbers.html
Read the help for both double- and single-precision Floating-Point Numbers at this link.
This information includes the following statements:
- Because the default numeric type for MATLAB is double, you can create a double with a simple assignment statement:
- x = 25.783;
- Because MATLAB stores numeric data as a double by default, you need to use the single conversion function to create a single-precision number:
- x = single(25.783);
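As a quick check (a small sketch added here, not part of the linked help page), the class and whos functions confirm how each value is stored:
x_d = 25.783;          % default numeric type: double (8 bytes)
x_s = single(25.783);  % explicit conversion: single (4 bytes)
class(x_d)             % 'double'
class(x_s)             % 'single'
whos x_d x_s           % lists the bytes used by each variable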
Also read "Largest and Smallest Values for Floating-Point Classes"
For most calculations, our computers have sufficient memory and speed to use double-precision floating-point numbers.
Other numeric data types are described at the following link. In addition to single- and double-precision floating-point numbers, MATLAB supports 1-, 2-, 4-, and 8-byte signed and unsigned integer types for saving data memory. These data types are primarily for interfacing with other applications; they are not commonly used in MATLAB calculations.
https://www.mathworks.com/help/matlab/numeric-types.html
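As a brief illustration of the "Largest and Smallest Values" topic (a sketch; the displayed precision depends on the format setting), these built-in functions report the limits of each class:
realmax('double')  % about 1.7977e+308
realmax('single')  % about 3.4028e+38
realmin('double')  % about 2.2251e-308 (smallest positive normalized double)
realmin('single')  % about 1.1755e-38  (smallest positive normalized single)
intmax('int32')    % 2147483647
intmin('int32')    % -2147483648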
The code shown here sums many terms of a decreasing sequence using both single and double precision. There are some differences between these sums, because single precision is less precise, and because once the terms become smaller than the last bit of the single-precision sum, they no longer change the running sum. The relative error between the sums is a little more than 1%.
clear all;close all;clc;format compact
format long
%% Sum a lot of terms of a sequence using double precision floating-point:
a_double = double(1:100:100e6);
a_double_len = length(a_double)
seq_double = 1./a_double;
a_double_sum = double(0);
for k = 1:a_double_len
a_double_sum = a_double_sum + seq_double(k);
end
disp(a_double_sum)
% 1.143763955258335
%% Sum a lot of terms of a sequence using single precision floating-point:
a_single = single(1:100:100e6);
a_single_len = length(a_single)
seq_single = single(1./a_single);
a_single_sum = single(0);
for k = 1:a_single_len
a_single_sum = a_single_sum + seq_single(k);
end
disp(a_single_sum)
% 1.1286
a_single_sum / a_double_sum
% 0.98676
%% Note, the sum function appears to use double precision internally,
% so that function is not used for this demonstration.
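To see why the single-precision sum stops changing, compare the size of the later terms with the spacing between adjacent single-precision numbers near the sum, given by eps. (A small sketch added for illustration.)
% Spacing between adjacent single-precision values near the sum (~1.13):
eps(single(1.13))      % about 1.19e-07
% The last terms of the sequence are about 1/1e8 = 1e-8, which is less than
% half of that spacing, so adding them leaves the single-precision sum unchanged:
s = single(1.13);
s + single(1e-8) == s  % logical 1 (true)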
In this assignment, you will compare the time it takes to run many iterations of for loops using 32-bit integers and 64-bit integers.
Start a MATLAB script with the following code.
clear all; close all; clc; format compact
% Store the maximum value of each type of integer
int8max = intmax('int8') % 127
int16max = intmax('int16') % 32767
int32max = intmax('int32') % 2147483647
int64max = intmax('int64') % 9223372036854775807
%% int32 loop
tic
result32 = int32(10);
for m = 1:4e4
for n = 2:2:127
ii = int32(n);
result32 = int32(result32*(ii+1));
result32 = int32(result32/ii);
end
end
int32time = toc % This reports the time to complete these loops
Write the time it took in a comment.
Then create a second, similar loop for 64-bit integer code. Copy the 32-bit integer code and replace the following in each place it appears:
int32 by int64
result32 by result64
Write the time it took in a comment.
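For reference, here is a sketch of the 64-bit loop that these instructions describe (the variable int64time mirrors int32time above; actual timings vary by machine):
%% int64 loop
tic
result64 = int64(10);
for m = 1:4e4
for n = 2:2:127
ii = int64(n);
result64 = int64(result64*(ii+1));
result64 = int64(result64/ii);
end
end
int64time = toc % This reports the time to complete these loops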
- Answer
% On one PC, the 32-bit loop took about 0.1 seconds and the 64-bit loop took about 10 seconds.
% This shows that 32-bit integer computations are much faster than 64-bit integer computations.
In this assignment, you will compare the time it takes to run many iterations of for loops using 32-bit and 64-bit floating-point values.
Start a MATLAB script with the following code:
clear all; close all; clc; format compact; format long
% Store the maximum value of each type of floating-point number
float32max = realmax('single') % 3.4028e+38
float64max = realmax('double') % 1.7977e+308
%% Single loop
tic
for m = 1:10e6
result32 = single(10);
for n = 2:2:127
ii = single(n);
temp1 = single(result32*(ii+1));
temp2 = single(temp1/ii);
result32 = temp2;
end
end
result32 % echo the answer
singletime = toc
Then create a second, similar loop for 64-bit floating-point code. Copy the 32-bit code and replace the following in each place it appears:
single by double
result32 by result64
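For reference, here is a sketch of the 64-bit (double-precision) loop that these instructions describe (the variable doubletime mirrors singletime above):
%% Double loop
tic
for m = 1:10e6
result64 = double(10);
for n = 2:2:127
ii = double(n);
temp1 = double(result64*(ii+1));
temp2 = double(temp1/ii);
result64 = temp2;
end
end
result64 % echo the answer
doubletime = toc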
- Answer
% On one PC, the two loops took about the same amount of time, so there is not a large computational speed advantage to 32-bit floating-point MATLAB code.
% This implies that MATLAB likely performs much of these computations in 64-bit floating-point, then converts the results to 32-bit.
% However, in C and assembly language, the single-precision code is run using single-precision operations, so it is much faster than double-precision code.