12.1: Functions for Model Building and Regression
Regression and Model Building
Python offers several powerful libraries for regression and model building, each with its own set of functions and methodologies. Here's a breakdown of the most commonly used ones:
Regression
NumPy
While not a dedicated regression library, NumPy is fundamental to numerical computing in Python and is often used for foundational regression calculations or for implementing algorithms from scratch. A short sketch follows the list below.
Relevant Functions:
- numpy.polynomial.Polynomial.fit(x, y, deg): (Replaces the legacy numpy.polyfit) Fits a least-squares polynomial of degree deg to the points (x, y). Returns a Polynomial object rather than raw coefficients; call .convert().coef to obtain the coefficients in the original variable.
- numpy.polyval(p, x): Evaluates a polynomial with coefficient array p (highest power first) at specific x values.
- numpy.linalg.lstsq(a, b): Solves the equation ax = b by computing a vector x that minimizes the Euclidean 2-norm ||b - ax||^2. This is the core of ordinary least squares.
- numpy.corrcoef(x, y): Returns the Pearson product-moment correlation coefficients.
- numpy.mean(), numpy.std(): For calculating means and standard deviations, useful in manual calculations of regression coefficients.
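A minimal sketch of ordinary least squares with these functions, on synthetic data (the data, coefficients, and variable names here are illustrative, not from the original text):

```python
import numpy as np

# Synthetic straight-line data with noise: y ≈ 2x + 1
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Degree-1 polynomial fit; convert() recovers coefficients in the original variable
p = np.polynomial.Polynomial.fit(x, y, deg=1)
print(p.convert().coef)                      # [intercept, slope]

# The same fit via lstsq with an explicit design matrix [1, x]
A = np.column_stack([np.ones_like(x), x])
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(coef)                                  # [intercept, slope]

# Pearson correlation between x and y
print(np.corrcoef(x, y)[0, 1])
```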
SciPy
SciPy builds on NumPy and provides more advanced scientific computing capabilities, including optimization, signal processing, and statistics. A brief sketch follows the list below.
Relevant Functions:
- scipy.stats.linregress(x, y): Calculates a linear least-squares regression for two sets of measurements. Returns the slope, intercept, correlation coefficient (rvalue), p-value, and standard error of the slope. It's great for simple linear regression.
- scipy.optimize.curve_fit(f, xdata, ydata): Uses non-linear least squares to fit a function f to data. Useful for more complex, non-linear regression models where you define the function form.
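A brief sketch of both functions (the synthetic data, the exponential model, and the starting values p0 are illustrative):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)

# Simple linear regression on noisy straight-line data
y_lin = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)
res = stats.linregress(x, y_lin)
print(res.slope, res.intercept, res.rvalue**2)   # slope, intercept, R²

# Non-linear least squares: fit an exponential decay y = a*exp(-b*x) + c
def model(x, a, b, c):
    return a * np.exp(-b * x) + c

y_exp = 3.0 * np.exp(-0.7 * x) + 0.5 + rng.normal(scale=0.05, size=x.size)
params, cov = optimize.curve_fit(model, x, y_exp, p0=(1.0, 1.0, 0.0))
print(params)                                    # estimated (a, b, c)
```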
Model Building
Scikit-learn (sklearn)
Scikit-learn is the go-to library for model building and machine learning in Python, providing a vast array of algorithms for supervised and unsupervised learning, including many regression models.
Commonly Used Regression Models/Functions (a combined sketch follows this list):
- sklearn.linear_model.LinearRegression:
- LinearRegression(): Initializes a linear regression model.
- .fit(X, y): Fits the linear model to the training data.
- .predict(X_new): Predicts target values for new data.
- .coef_: Returns the estimated coefficients for the features.
- .intercept_: Returns the independent term in the linear model.
- .score(X, y): Returns the coefficient of determination (R²) of the prediction.
- sklearn.linear_model.Ridge:
- Ridge(alpha=1.0): Initializes a Ridge regression model (L2 regularization). alpha controls the strength of regularization.
- .fit(X, y): Fits the model.
- .predict(X_new): Predicts.
- .coef_, .intercept_, .score(X, y): Similar to LinearRegression.
- sklearn.linear_model.Lasso:
- Lasso(alpha=1.0): Initializes a Lasso regression model (L1 regularization). alpha controls the strength of regularization.
- .fit(X, y): Fits the model.
- .predict(X_new): Predicts.
- .coef_, .intercept_, .score(X, y): Similar to LinearRegression.
- sklearn.linear_model.ElasticNet:
- ElasticNet(alpha=1.0, l1_ratio=0.5): Initializes an Elastic Net regression model (combination of L1 and L2 regularization). l1_ratio controls the mix between L1 and L2.
- .fit(X, y): Fits the model.
- .predict(X_new): Predicts.
- sklearn.linear_model.BayesianRidge:
- BayesianRidge(): Initializes a Bayesian Ridge regression model.
- sklearn.linear_model.SGDRegressor:
- SGDRegressor(): Fits a regularized linear regression model using stochastic gradient descent; well suited to large datasets.
- sklearn.ensemble.RandomForestRegressor:
- RandomForestRegressor(n_estimators=100): Initializes a Random Forest Regressor.
- sklearn.ensemble.GradientBoostingRegressor:
- GradientBoostingRegressor(): Initializes a Gradient Boosting Regressor.
- sklearn.tree.DecisionTreeRegressor:
- DecisionTreeRegressor(): Initializes a Decision Tree Regressor.
- sklearn.svm.SVR:
- SVR(): Initializes a Support Vector Regressor.
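All of these estimators share the same interface: construct, then .fit(X, y), then .predict / .score. A minimal sketch on synthetic data (the feature matrix, coefficients, and chosen models are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with two features: y = 3*x0 - 1.5*x1 + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=100)

# The same fit/score pattern works across models
for model in (LinearRegression(),
              Ridge(alpha=1.0),
              Lasso(alpha=0.1),
              RandomForestRegressor(n_estimators=100, random_state=0)):
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))   # in-sample R²

# Inspecting a fitted linear model and predicting for new data
lin = LinearRegression().fit(X, y)
print(lin.coef_, lin.intercept_)
print(lin.predict(np.array([[5.0, 2.0]])))
```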
Helper Functions for Model Building and Evaluation (a worked sketch follows this list):
- sklearn.model_selection.train_test_split(X, y, test_size=0.2, random_state=42): Splits data into training and testing sets.
- sklearn.preprocessing.PolynomialFeatures(degree=2): Generates polynomial features for polynomial regression.
- sklearn.pipeline.make_pipeline(*steps): Creates a pipeline for chaining multiple processing steps and models.
- sklearn.metrics.mean_squared_error(y_true, y_pred): Calculates Mean Squared Error.
- sklearn.metrics.r2_score(y_true, y_pred): Calculates the R² score.
- sklearn.metrics.mean_absolute_error(y_true, y_pred): Calculates Mean Absolute Error.
- sklearn.model_selection.GridSearchCV: For hyperparameter tuning.
- sklearn.model_selection.KFold, sklearn.model_selection.StratifiedKFold: For cross-validation (StratifiedKFold preserves class proportions and is mainly used with classification targets).
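A sketch that ties these helpers together in a polynomial-regression workflow (the synthetic cubic data, the degree/alpha grid, and the choice of Ridge as the estimator are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV, KFold
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Synthetic cubic data with noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=0.5, size=200)

# Hold out a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Polynomial regression as a pipeline: feature expansion + regularized linear model
pipe = make_pipeline(PolynomialFeatures(degree=2), Ridge())

# Cross-validated grid search over the polynomial degree and regularization strength
param_grid = {"polynomialfeatures__degree": [1, 2, 3],
              "ridge__alpha": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid,
                      cv=KFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X_train, y_train)

# Evaluate the best model on the held-out test set
y_pred = search.predict(X_test)
print(search.best_params_)
print(mean_squared_error(y_test, y_pred),
      mean_absolute_error(y_test, y_pred),
      r2_score(y_test, y_pred))
```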


