
Chapter 10: Fitting, Regressions, and Confidence Bands


    Here you can find a series of pre-recorded lecture videos that cover this content: https://youtube.com/playlist?list=PLSKNWIzCsmjBxk5lvZ_LaUX2hBGSEaxpV&si=nOGeAaWzNwuNlz5X

As we touched on at the beginning of this course and have emphasized throughout, visualizing and plotting data and analyzing trends are central to finding statistically significant results.

Often this may involve fitting data, which we have only briefly touched upon up to this point, with the exception of our \(\chi^2\) goodness of fit test, which can indeed be utilized to analyze distribution shapes and fits.

One of the most common ways to analyze fits and curves is regression analysis, and one of the most popular techniques is regression analysis via the method of least squares.

    Method of Least Squares

In the method of least squares we will be working with two variables: x, which will be considered the independent or input variable, and y, the dependent or response variable. We will assume that the total uncertainty in y is much greater than that in x, which is a fairly reasonable assumption under most experimental conditions. We will start by analyzing the regression curve of y on x as a linear regression curve

    \begin{equation}
    y = a +bx
    \end{equation}

where a and b are fitting constants. The error \(e_i\) in predicting a y value will then be given by

    \begin{equation}
    e_i = y_i - y(x_i)
    \end{equation}

We will thereby select a and b so as to minimize the total error. Note that minimizing \(\sum_{i=1}^n e_i\) itself would not work, since large positive and negative errors can cancel; instead we minimize the sum of the squared errors.

So the principle of least squares gives us the quantity to minimize,

\begin{equation}
\sum_{i=1}^n e_i^2 = \sum_{i=1}^n [y_{i} - y(x_{i})]^{2}
\end{equation}

Before we can proceed with this process it will be extremely convenient to define some key quantities, specifically

    \begin{equation}
    S_{xx} = \sum_{i=1}^n (x_i-\overline{x})^2
    \end{equation}

    and similarly

    \begin{equation}
    S_{yy} = \sum_{i=1}^n (y_i-\overline{y})^2
    \end{equation}

    and

    \begin{equation}
    S_{xy} = \sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})
    \end{equation}
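As a minimal sketch, these sums can be computed directly in R; the x and y vectors below are hypothetical example data, not values taken from this chapter's figures.

x <- c(1, 2, 3, 4, 5)             # hypothetical independent variable
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)  # hypothetical response variable

Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))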

In terms of these quantities, the error sum of squares that we seek to minimize can be calculated as

    \begin{equation}
    SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}}
    \end{equation}
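Although the estimates themselves are not written out above, minimizing the sum of squared errors with respect to a and b (setting both partial derivatives to zero) gives the standard closed-form least-squares results

\begin{equation}
b = \frac{S_{xy}}{S_{xx}}, \qquad a = \overline{y} - b\,\overline{x}
\end{equation}

and substituting these back into the sum of squared errors reproduces the SSE expression above.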

This calculation will give us the fitting parameters, but we can also obtain estimates of the confidence intervals of those fitting parameters.
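For example, a standard result (stated here as a supplement; it is not derived in this chapter) is that with the residual variance estimated by \(s^2 = SSE/(n-2)\), a confidence interval for the slope is

\begin{equation}
b \pm t_{\alpha/2,\,n-2}\,\frac{s}{\sqrt{S_{xx}}}, \qquad s^2 = \frac{SSE}{n-2}
\end{equation}

where \(t_{\alpha/2,\,n-2}\) is the appropriate critical value of the t distribution; an analogous interval exists for the intercept a.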

Let's look at some plots of raw data, and then we can perform some linear fits.

Figure \(\PageIndex{1}\): R Raw Data Plots.

Here we can see some examples of raw data, and we can now attempt to fit the data using functions that we believe will best describe the values. We set the function and then allow the program to run through the above calculations to determine the fitting parameters of the linear function.

We can see the fits below, along with the 95% confidence interval for these fits.
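As a concrete sketch of how such a fit is produced, R's built-in lm() carries out the least-squares calculation above, and predict() with interval = "confidence" returns the 95% confidence band; the data frame below reuses the hypothetical example vectors from earlier.

dat <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1))

fit <- lm(y ~ x, data = dat)   # least-squares fit of y = a + b*x
summary(fit)                   # a, b, their standard errors, and r^2

# 95% confidence band evaluated on a fine grid of x values
newx <- data.frame(x = seq(min(dat$x), max(dat$x), length.out = 100))
band <- predict(fit, newdata = newx, interval = "confidence", level = 0.95)

plot(dat$x, dat$y, xlab = "x", ylab = "y")
lines(newx$x, band[, "fit"])           # fitted line
lines(newx$x, band[, "lwr"], lty = 2)  # lower 95% confidence limit
lines(newx$x, band[, "upr"], lty = 2)  # upper 95% confidence limit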

Figure \(\PageIndex{2}\): R Linear Fit.

Figure \(\PageIndex{3}\): R 95% Confidence Interval Fit.

Non-Linear Fitting

We can extend these tools to also perform non-linear fits, which can range from exponential expressions and logarithmic expressions to polynomials and beyond.

Here we can fit the previous data set with non-linear functions as well; a sketch of how such a fit is set up follows.
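As a hedged sketch, one way to set up a non-linear fit in R is with the built-in nls(), shown here for a hypothetical exponential model \(y = A e^{kx}\) with rough starting guesses; polynomial fits can instead reuse lm().

dat <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1))  # hypothetical data

# exponential model y = A * exp(k * x); start values are rough guesses
fit_exp <- nls(y ~ A * exp(k * x), data = dat,
               start = list(A = 2, k = 0.4))
summary(fit_exp)   # estimates and standard errors for A and k

# a quadratic polynomial fit via ordinary least squares
fit_poly <- lm(y ~ poly(x, 2, raw = TRUE), data = dat)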


    Correlation

So far we have assumed that the total uncertainty in x is extremely small and close to 0. While this holds for many real-life scenarios, there are also many scenarios where it does not.

In such cases the purpose of plotting x vs. y, in the most general sense, is to observe some type of correlation between the variables. One can quantitatively calculate this via the sample correlation coefficient r, which is defined as

    \begin{equation}
    r= \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}
    \end{equation}

Here if r = 1 we have perfect positive correlation, if r = −1 we have perfect negative correlation, and if r = 0 we have no correlation, i.e. there is no linear association. Note that r is always bounded between −1 and 1.

We can also calculate our \(r^2\) value, which varies from 0 to 1 and is defined as

    \begin{equation}
    r^2 = \frac{S_{xy}^2}{S_{xx}S_{yy}}
    \end{equation}

We can tabulate both the r and \(r^2\) values for our previous fits to show the correlations.

Looking at our previous fits, we can run regression analysis to investigate the correlations as well; a short sketch of computing these quantities follows.
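As a minimal sketch, both quantities can be computed with R's built-in cor(), again using the hypothetical example vectors from earlier.

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

r  <- cor(x, y)  # equals Sxy / sqrt(Sxx * Syy)
r2 <- r^2        # also reported as "Multiple R-squared" by summary(lm(y ~ x))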

    Critical Consideration When Fitting

Regardless of the statistical tools and measures that you are using when fitting data, be sure that you do not lose the physical intuition of the system that you are investigating. There are very few physical systems that can be represented by a 12th-order polynomial function, despite the fact that that particular function may yield the strongest correlation coefficient. Instead, a more physically intuitive function that yields a lower correlation coefficient may be more appropriate, and one may better spend the time analyzing why the experimental system deviated from the theoretical function/representation.


    Chapter 10: Fitting, Regressions, and Confidence Bands is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by LibreTexts.
