Engineering LibreTexts

14.3: Numeric data types


    By Carey A. Smith

    Read the MATLAB help for Floating-Point Numbers:

    https://www.mathworks.com/help/matlab/matlab_prog/floating-point-numbers.html

    That page covers both double- and single-precision floating-point numbers.

    This information includes the following statements:

    • Because the default numeric type for MATLAB is double, you can create a double with a simple assignment statement:
          x = 25.783;
    • Because MATLAB stores numeric data as a double by default, you need to use the single conversion function to create a single-precision number:
          x = single(25.783);
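The two statements above can be checked with the class function. The following is a minimal sketch; the variable names x_double, x_single, rounding_error, and y are illustrative and do not come from the MATLAB help page.

```matlab
% Create the same literal as a double (the default) and as a single.
x_double = 25.783;
x_single = single(25.783);

disp(class(x_double))   % double
disp(class(x_single))   % single

% A single keeps only about 7 significant decimal digits, so converting
% it back to double does not recover the original double value exactly.
rounding_error = abs(double(x_single) - x_double);
fprintf('Rounding error from single storage: %g\n', rounding_error);

% Arithmetic that mixes single and double produces a single result.
y = x_single + x_double;
disp(class(y))          % single
```

The last two lines show why a single value can quietly lower the precision of a whole calculation: one single operand is enough to make the result single.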

    Also read "Largest and Smallest Values for Floating-Point Classes"
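The limits described in that section can also be queried directly in MATLAB. The sketch below uses the built-in functions realmax, realmin, and eps; the variable names are illustrative assumptions.

```matlab
% Largest finite value of each floating-point class.
max_single = realmax('single');   % about 3.4028e+38
max_double = realmax('double');   % about 1.7977e+308

% Smallest positive normalized value of each class.
min_single = realmin('single');   % about 1.1755e-38
min_double = realmin('double');   % about 2.2251e-308

% Spacing between 1 and the next larger representable number.
eps_single = eps('single');       % 2^-23, about 1.1921e-07
eps_double = eps('double');       % 2^-52, about 2.2204e-16

fprintf('single: max %g, min %g, eps %g\n', max_single, min_single, eps_single);
fprintf('double: max %g, min %g, eps %g\n', max_double, min_double, eps_double);

% Going past the largest finite value overflows to Inf.
overflowed = max_single * 2;
disp(overflowed)                  % Inf
```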

    For most calculations, our computers have sufficient memory and speed to use double-precision floating point numbers.

    Other numeric data types are described at the following link. These include 32-bit and 64-bit integers. MATLAB also supports 1- and 2-byte (8- and 16-bit) signed and unsigned integer types, which save memory. These data types are primarily for interfacing with other applications. They are not commonly used in MATLAB calculations.

    https://www.mathworks.com/help/matlab/numeric-types.html
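Integer types in MATLAB behave differently from integers in many other languages: results that fall outside the range saturate at the limit instead of wrapping around, and fractional results are rounded to the nearest integer. A minimal sketch of both behaviors, with illustrative variable names:

```matlab
% Out-of-range results saturate at the limits of the type.
a = int8(127) + int8(1);        % stays at intmax('int8')
b = uint8(200) + uint8(100);    % stays at intmax('uint8')
c = uint8(-5);                  % negative input saturates at 0

% Conversion and division round to the nearest integer.
d = int8(3.7);                  % rounds to 4
e = int32(7) / int32(2);        % 3.5 rounds to 4

disp(a)   % 127
disp(b)   % 255
disp(c)   % 0
disp(d)   % 4
disp(e)   % 4
```

This rounding is why integer loops such as repeated multiply-and-divide steps can drift away from the exact mathematical result.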


    Example 1: double vs. single floating-point sum of fractions

    The code shown here sums many terms of a decreasing sequence using both single- and double-precision floating-point numbers. The two sums differ because single precision carries fewer significant digits, and because once the terms become smaller than the last bit of the single-precision running sum, adding them no longer changes it.

    The relative error between the sums is a little more than 1% (about 1.3%).

    clear all;close all;clc;format compact
    format long

    %% Sum a lot of terms of a sequence using double precision floating-point: 
    a_double   = double(1:100:100e6);
    a_double_len = length(a_double)
    seq_double = 1./a_double;
    a_double_sum = double(0);
    for k = 1:a_double_len
      a_double_sum = a_double_sum + seq_double(k);
    end
    disp(a_double_sum)
    % 1.143763955258335

    %% Sum a lot of terms of a sequence using single precision floating-point: 
    a_single   = single(1:100:100e6);
    a_single_len = length(a_single)
    seq_single = single(1./a_single);
    a_single_sum = single(0);
    for k = 1:a_single_len
      a_single_sum = a_single_sum + seq_single(k);
    end
    disp(a_single_sum)
    % 1.1286

    a_single_sum / a_double_sum
    % 0.98676

    %% Note, the sum function appears to use double precision internally,
    %  so that function is not used for this demonstration.
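The stalling effect described in this example can be reproduced in isolation. The sketch below is not part of the original script; the names big, tiny, threshold, and first_lost_a are illustrative. It adds a term that is smaller than half the spacing between adjacent single-precision numbers near 1, then estimates where the terms of the sequence above fall below that limit.

```matlab
big  = single(1);
tiny = single(1e-8);    % smaller than half of eps(single(1))

% In single precision the small term is rounded away entirely.
stalled_single = (big + tiny == big);
% In double precision the same addition still changes the sum.
stalled_double = (1 + 1e-8 == 1);

disp(stalled_single)    % 1 (true)
disp(stalled_double)    % 0 (false)

% The single-precision running sum in the example is close to 1.1.
% Terms below half the spacing at that value can no longer change it.
threshold = eps(single(1.1)) / 2;        % 2^-24, about 5.96e-08
first_lost_a = 1 / double(threshold);    % 2^24 = 16777216
fprintf('Terms 1/a with a above about %d are lost in single precision.\n', ...
        first_lost_a);
```

Because the sequence runs up to 100e6, a large share of its terms lie beyond this point, which is consistent with the single-precision sum coming out smaller than the double-precision sum.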


    Exercise 1: 32-bit vs. 64-bit integer processing speed

    In this assignment, you will compare the time it takes to run many iterations of for loops using 32-bit integers and 64-bit integers.

    Start a MATLAB script with the following code.

    clear all; close all; clc; format compact
    % Store the maximum value of each type of integer
    int8max = intmax('int8') % 127
    int16max = intmax('int16') % 32767
    int32max = intmax('int32') % 2147483647
    int64max = intmax('int64') % 9223372036854775807

    %% int32 loop
    tic
    result32 = int32(10);
    for m = 1:4e4
        for n = 2:2:127
            ii = int32(n);
            result32 = int32(result32*(ii+1));
            result32 = int32(result32/ii);
        end
    end
    int32time = toc % This reports the time to complete these loops

    Write the time it took in a comment.

    Then create a 2nd, similar loop for 64-bit integer code. Copy the 32-bit integer code and replace the following in each place it appears:

    int32 by int64

    result32 by result64

    Write the time it took in a comment.

    Answer

    % On one PC, the 32-bit loop took about 0.1 seconds and the 64-bit loop took about 10 seconds.

    % This suggests that, in MATLAB on that PC, 32-bit integer computations are much faster than 64-bit integer computations.
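A single tic/toc measurement can vary from run to run, so it can help to repeat the measurement and keep the smallest time. The following is a minimal sketch of that idea using only the inner loop from the exercise; num_trials, trial_times, and best_time are hypothetical names, and the trial count of 5 is an arbitrary assumption.

```matlab
num_trials  = 5;                     % assumed number of repetitions
trial_times = zeros(1, num_trials);  % one elapsed time per trial

for t = 1:num_trials
    tic
    result32 = int32(10);
    for n = 2:2:127
        ii = int32(n);
        result32 = result32*(ii+1);  % int32 multiply (saturating)
        result32 = result32/ii;      % int32 divide (rounds to nearest)
    end
    trial_times(t) = toc;            % elapsed seconds for this trial
end

% The minimum is usually the least disturbed by other activity on the PC.
best_time = min(trial_times);
fprintf('Best of %d trials: %g seconds\n', num_trials, best_time);
```

The same wrapper can be placed around the 64-bit loop so that both timings are collected the same way.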


    Exercise 2: 32-bit vs. 64-bit floating-point processing speed

    In this assignment, you will compare the time it takes to run many iterations of for loops using 32-bit floating-point and 64-bit floating-point values.

    Start a MATLAB script with the following code:

    clear all; close all; clc; format compact; format long
    % Store the maximum value of each type of floating-point number
    float32max = realmax('single') % 3.4028e+38
    float64max = realmax('double') % 1.7977e+308

    %% Single loop
    tic
    for m = 1:10e6
        result32 = single(10);
        for n = 2:2:127
            ii = single(n);
            temp1 = single(result32*(ii+1));
            temp2 = single(temp1/ii);
            result32 = temp2;
        end
    end
    result32 % echo the answer
    singletime = toc

    Then create a 2nd, similar loop for 64-bit floating-point code. Copy the 32-bit code and replace the following in each place it appears:

    single by double

    result32 by result64

    Answer

    % On one PC, the two loops took about the same amount of time, so there is no large computational speed advantage to 32-bit floating-point MATLAB code in this test.

    % This may mean that MATLAB performs much of this computation in 64-bit floating point and then converts the results to 32-bit, or that loop overhead outweighs the cost of the arithmetic itself.

    % In C and assembly language, however, single-precision code runs with single-precision operations, so it can be faster than double-precision code, particularly when the data set is large.
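One difference between the two types that does not depend on timing is memory use: a single array occupies half the bytes of a double array of the same size. The sketch below checks this with whos and also times one vectorized operation on each array. The array length and the expression being timed are arbitrary assumptions, and the measured times will depend on the PC and the MATLAB version.

```matlab
n = 1e6;                              % assumed array length
x_single = rand(1, n, 'single');
x_double = rand(1, n, 'double');

% Compare the storage used by each array.
info_single = whos('x_single');
info_double = whos('x_double');
fprintf('single: %d bytes, double: %d bytes\n', ...
        info_single.bytes, info_double.bytes);

% Time the same vectorized expression on each array.
tic
y_single = x_single .* x_single + x_single;
t_single = toc;

tic
y_double = x_double .* x_double + x_double;
t_double = toc;

fprintf('single: %g s, double: %g s\n', t_single, t_double);
```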



    This page titled 14.3: Numeric data types is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Carey Smith.