12.4: Summary

    Linear Regression and Model Building

    Linear regression is a fundamental supervised learning algorithm used for modeling the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data.

    Linear Regression

    At its core, linear regression assumes a linear relationship between the input variables (features) and the single output variable (target). The goal is to find the best-fitting straight line (or hyperplane in higher dimensions) that minimizes the sum of the squared differences between the observed and predicted values.

    The general form of a linear regression equation is:

    \[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \epsilon \]

    Where:

    • \(Y\) is the dependent variable (the target we want to predict).
    • \(X_1, X_2, \cdots, X_n\) are the independent variables (features).
    • \(\beta_0\) is the y-intercept (the value of \(Y\) when all \(X\)'s are zero).
    • \(\beta_1, \beta_2, \cdots, \beta_n\) are the coefficients (slopes) for each independent variable, representing the change in \(Y\) for a one-unit change in the corresponding \(X\), holding the other \(X\)'s constant.
    • \(\epsilon\) is the error term, representing the irreducible error in the model.

    The objective of training a linear regression model is to estimate the coefficients (the \(\beta\) values) that best fit the data. This is typically done using the Ordinary Least Squares (OLS) method, which minimizes the sum of the squared residuals (the differences between the actual and predicted \(Y\) values).
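    To make the OLS idea concrete, here is a minimal NumPy sketch of the closed-form solution \(\hat{\beta} = (X^T X)^{-1} X^T y\). The toy data, seed, and variable names are illustrative assumptions, not part of the original text.

```python
import numpy as np

# Toy data: y depends (noisily) on a single feature x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=50)

# Design matrix with a leading column of ones, so that beta_0
# (the intercept) is estimated along with the slope beta_1.
X = np.column_stack([np.ones_like(x), x])

# OLS closed form: beta_hat = (X^T X)^{-1} X^T y.
# np.linalg.lstsq solves the same least-squares problem more
# stably than forming the matrix inverse explicitly.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [3.0, 2.0]
```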

    Model Building Process

    Building a robust predictive model, including a linear regression model, generally follows these steps (a scikit-learn sketch of the full pipeline appears after the list):

    1. Data Collection and Preparation:
    • Gather Data: Obtain relevant datasets.
    • Feature Engineering: Create new features or transform existing ones to improve model performance.
    • Handle Missing Values: Impute or remove missing data points.
    • Outlier Detection and Treatment: Identify and manage extreme values that can disproportionately affect the model.
    • Feature Scaling: Standardize or normalize features. This matters most for algorithms that are sensitive to feature scales; it is less critical for basic linear regression but good practice for many ML models.
    2. Data Splitting:
    • Divide your dataset into training and testing sets (e.g., 70-80% for training, 20-30% for testing). The training set is used to train the model, and the test set is used to evaluate its performance on unseen data.
    3. Model Training:
    • Select a linear regression implementation (e.g., from Scikit-learn or Statsmodels).
    • Fit the model to the training data, allowing it to learn the coefficients.
    4. Model Evaluation:
      • Assess the model's performance on the test set using various metrics. Common metrics for regression include:
        • Mean Absolute Error (MAE): Average of the absolute differences between predictions and actual values.
        • Mean Squared Error (MSE): Average of the squared differences. Penalizes larger errors more.
        • Root Mean Squared Error (RMSE): Square root of MSE, in the same units as the target variable.
        • R-squared (\(R^2\)): Represents the proportion of the variance in the dependent variable that is predictable from the independent variables. A higher \(R^2\) indicates a better fit.
    5. Prediction:
    • Once satisfied with the model's performance, use it to make predictions on new, unseen data.
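    Below is a minimal end-to-end sketch of steps 2 through 5 using scikit-learn. The synthetic data, the 80/20 split, and the random seeds are illustrative assumptions, not prescribed by the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic data standing in for a prepared dataset (step 1).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.5, size=200)

# Step 2: hold out 20% of the data as an unseen test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 3: fit the model on the training set; it learns the coefficients.
model = LinearRegression().fit(X_train, y_train)

# Step 4: evaluate on the test set with the metrics listed above.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("MAE: ", mean_absolute_error(y_test, y_pred))
print("MSE: ", mse)
print("RMSE:", np.sqrt(mse))
print("R^2: ", r2_score(y_test, y_pred))

# Step 5: predict for a new, unseen observation.
print(model.predict([[0.1, -0.2, 0.3]]))
```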

    Python Implementations

    Python offers several powerful libraries for performing linear regression, each with its own strengths:

    • NumPy: Provides basic array operations and can be used for manual implementation of linear regression, especially for simple cases or understanding the underlying math.
    • SciPy: Offers scientific computing tools, including scipy.stats.linregress for simple linear regression (one independent variable).
    • Scikit-learn: The go-to library for machine learning. Provides a robust LinearRegression class for simple and multiple linear regression, focusing on prediction performance.
    • Statsmodels: A library for statistical modeling and econometric analysis. It provides more detailed statistical output, including p-values, confidence intervals, and various statistical tests, making it suitable for statistical inference and for understanding the significance of individual variables (a short sketch follows this list).
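    For comparison with the scikit-learn pipeline above, here is a small sketch of the Statsmodels route, which surfaces the inferential output (standard errors, p-values, confidence intervals) mentioned in the list; the synthetic data is an illustrative assumption.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 0.5 + X @ np.array([1.2, -0.7]) + rng.normal(0, 0.3, size=100)

# Statsmodels does not add an intercept automatically;
# add_constant prepends the column of ones for beta_0.
X_const = sm.add_constant(X)
results = sm.OLS(y, X_const).fit()

# Full statistical table: coefficients, standard errors, t-statistics,
# p-values, confidence intervals, R-squared, F-statistic, and more.
print(results.summary())
```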

    This page titled 12.4: Summary is shared under a CC BY-SA 4.0 license.