# 14.3: Numeric data types

• Carey Smith
• Oxnard College

By Carey A. Smith

Read the MATLAB help for Floating-Point Numbers:

https://www.mathworks.com/help/matlab/matlab_prog/floating-point-numbers.html

On that page, read the sections on both double-precision and single-precision floating-point numbers.

This information includes the following statements:

• Because the default numeric type for MATLAB is double, you can create a double with a simple assignment statement:
• x = 25.783;
• Because MATLAB stores numeric data as a double by default, you need to use the single conversion function to create a single-precision number:
• x = single(25.783);
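The two assignments above can be checked directly. The following is a minimal sketch; the variable names `x_double`, `x_single`, `info_double`, and `info_single` are chosen here for illustration and do not come from the MATLAB help page.

```matlab
x_double = 25.783;            % double by default
x_single = single(25.783);    % explicit conversion to single

class(x_double)               % displays 'double'
class(x_single)               % displays 'single'

% whos reports the storage used by each variable
info_double = whos('x_double');
info_single = whos('x_single');
[info_double.bytes, info_single.bytes]   % 8 bytes versus 4 bytes

% Arithmetic that mixes a double with a single returns a single
class(x_double + x_single)    % displays 'single'
```

A single therefore takes half the memory of a double, at the cost of fewer significant digits.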

Also read the section "Largest and Smallest Values for Floating-Point Classes" on the same page.
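As a quick way to see those limits, the sketch below queries them with the built-in functions `realmin`, `realmax`, and `eps`. The variable names are illustrative only.

```matlab
% Smallest normalized value, largest finite value, and spacing at 1.0
limits_double = [realmin('double'), realmax('double'), eps('double')]
limits_single = [realmin('single'), realmax('single'), eps('single')]

% Going past the largest finite value overflows to Inf
overflow_single = realmax('single') * 2   % Inf
```

Here `eps` gives the gap between 1 and the next larger representable number, which is a convenient measure of the precision of each type.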

For most calculations, our computers have sufficient memory and speed to use double-precision floating point numbers.

Other numeric data types are described at the following link. These include 32-bit and 64-bit integers. MATLAB also supports 1- and 2-byte signed and unsigned integer types, which save memory when storing data. These data types are primarily for interfacing with other applications. They are not commonly used in MATLAB calculations.

https://www.mathworks.com/help/matlab/numeric-types.html
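One reason integer types need care in calculations is that MATLAB integer arithmetic saturates and rounds rather than wrapping or truncating. The sketch below illustrates this with arbitrarily chosen values; the variable names are illustrative.

```matlab
a = int8(127);             % largest int8 value
a_plus_one = a + 1         % 127: the result saturates at intmax('int8')
b = uint8(0) - 1           % 0: unsigned types saturate at zero
q = int32(7) / int32(2)    % 4: integer division rounds to the nearest integer

info = whos('a');
info.bytes                 % 1 byte for an int8 scalar
```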


##### Example $$\PageIndex{1}$$ double vs. single floating point sum of fractions

The code shown here sums many terms of a decreasing sequence using both single- and double-precision floating-point numbers. There are some differences between these sums, for two reasons. First, single precision is less precise. Second, once a term is smaller than about half the spacing between adjacent single-precision values near the running sum, adding it no longer changes the running sum.

The relative error between the sums is a little more than 1%.

clear all; close all; clc; format compact; format long

%% Sum a lot of terms of a sequence using double precision floating-point:
a_double = double(1:100:100e6);
a_double_len = length(a_double)
seq_double = 1./a_double;
a_double_sum = double(0);
for k = 1:a_double_len
    a_double_sum = a_double_sum + seq_double(k);
end
disp(a_double_sum) % 1.143763955258335

%% Sum a lot of terms of a sequence using single precision floating-point:
a_single = single(1:100:100e6);
a_single_len = length(a_single)
seq_single = single(1./a_single);
a_single_sum = single(0);
for k = 1:a_single_len
    a_single_sum = a_single_sum + seq_single(k);
end
disp(a_single_sum) % 1.1286

a_single_sum / a_double_sum % 0.98676

%% Note, the sum function appears to use double precision internally,
%  so that function is not used for this demonstration.
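The point at which small terms stop changing a single-precision running sum can be seen in isolation. The sketch below uses a value close to the final single-precision sum from this example; the variable names and the choice of a quarter of the spacing are illustrative assumptions.

```matlab
s = single(1.1286);          % roughly the final single-precision running sum
spacing = eps(s);            % gap between s and the next larger single value
small_term = spacing / 4;    % a term well below half of that gap

unchanged = (s + small_term == s)   % true: the small term is rounded away
changed   = (s + spacing ~= s)      % true: a term of one full gap does register
```

In double precision the gap near 1.1 is about 2.2e-16, so far smaller terms still contribute to the double-precision sum.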


##### Exercise $$\PageIndex{1}$$ 32-bit vs. 64-bit integer processing speed

In this assignment, you will compare the time it takes to run many iterations of for loops using 32-bit integers and 64-bit integers.

Start a MATLAB script with the following code.

clear all; close all; clc; format compact
% Store the maximum value of each type of integer
int8max = intmax('int8') % 127
int16max = intmax('int16') % 32767
int32max = intmax('int32') % 2147483647
int64max = intmax('int64') % 9223372036854775807

%% int32 loop
tic
result32 = int32(10);
for m = 1:4e4
    for n = 2:2:127
        ii = int32(n);
        result32 = int32(result32*(ii+1));
        result32 = int32(result32/ii);
    end
end
int32time = toc % This reports the time to complete these loops

Write the time it took in a comment.

Then create a 2nd, similar loop for 64-bit integer code. Copy the 32-bit integer code and replace the following in each place it appears:

int32 by int64

result32 by result64

Write the time it took in a comment.

% On one PC, the 32-bit loop took about 0.1 seconds and the 64-bit loop took about 10 seconds.

% On that PC, 32-bit integer computations were much faster than 64-bit integer computations. Timings depend on the hardware and the MATLAB version.
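A single tic/toc measurement can vary from run to run. One possible way to get a steadier number, sketched here under the assumption that a few repeats are affordable, is to time the loop several times and keep the smallest result. The names `num_runs`, `run_times`, `acc`, and `best_time`, and the small loop body, are hypothetical stand-ins and not part of the exercise.

```matlab
num_runs = 5;                        % hypothetical number of repeats
run_times = zeros(1, num_runs);

for r = 1:num_runs
    t_start = tic;                   % tic returns a handle for this timer
    acc = int32(0);
    for n = 1:1e5                    % stand-in for the loop being timed
        acc = acc + int32(1);
    end
    run_times(r) = toc(t_start);     % elapsed seconds for this repeat
end

best_time = min(run_times)           % least-disturbed measurement
```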


##### Exercise $$\PageIndex{2}$$ 32-bit vs. 64-bit floating-point processing speed

In this assignment, you will compare the time it takes to run many iterations of for loops using 32-bit floating-point and 64-bit floating-point values.

Start a MATLAB script with the following code:

clear all; close all; clc; format compact; format long
% Store the maximum value of each type of floating-point number
float32max = realmax('single') % 3.4028e+38
float64max = realmax('double') % 1.7977e+308

%% Single loop
tic
for m = 1:10e6
    result32 = single(10);
    for n = 2:2:127
        ii = single(n);
        temp1 = single(result32*(ii+1));
        temp2 = single(temp1/ii);
        result32 = temp2;
    end
end
singletime = toc

Then create a 2nd, similar loop for 64-bit floating-point code. Copy the 32-bit code and replace the following in each place it appears:

single by double

result32 by result64