6: Decision-Making Using Machine Learning Basics

  • 6.0: Introduction
    This page discusses the transformative role of machine learning in applications like facial recognition, medical image analysis, and predictive analytics. It highlights the shift from traditional identity verification methods to unique facial features, and covers key algorithms in classification, clustering, and regression that empower machines to analyze data and make informed decisions across various fields.
  • 6.1: What Is Machine Learning?
    This page discusses key concepts in machine learning (ML), contrasting supervised and unsupervised learning, and emphasizes the training and testing process of models. It covers accuracy metrics used for model evaluation, such as precision and recall, and addresses challenges like overfitting and underfitting. The importance of model selection to optimize predictions is highlighted, along with examples illustrating these concepts.
  • 6.2: Classification Using Machine Learning
    This page covers key learning objectives in classification problems, focusing on logistic regression and clustering techniques. It explains logistic regression for binary classification, emphasizing parameter optimization and its application in predictive modeling, illustrated with a graduation probability example. The k-means clustering algorithm is detailed, showcasing its iterative process and limitations, alongside DBSCAN for density-based clustering.
  • 6.3: Machine Learning in Regression Analysis
    This page covers multiple regression techniques, focusing on both multiple linear and logistic regression. It outlines the assumptions necessary for multiple linear regression and demonstrates the analysis process, including parameter estimation and significance evaluation, using Python implementations with NCAA basketball statistics. Additionally, it discusses bootstrapping to assess the variability of regression estimates. The text illustrates model building and reports the resulting R-squared score.
  • 6.4: Decision Trees
    This page outlines the fundamentals of decision tree classification, focusing on entropy as a measure of uncertainty in decision-making. It details the construction process of decision trees, emphasizing feature selection through entropy and other criteria like the Gini index to maximize information gain. The importance of testing accuracy and employing pruning methods to avoid overfitting is discussed.
  • 6.5: Other Machine Learning Techniques
    This page explores advanced machine learning techniques, focusing on random forests and naïve Bayes classifiers. It details random forests' use of decision trees and bootstrapping for improved accuracy, alongside their application in tasks like facial recognition and temperature prediction. The page also covers the multinomial naïve Bayes classifier for categorizing news articles using Bayes' Theorem, and the Gaussian variant for predicting healthcare outcomes.
  • 6.6: Key Terms
    This page defines key machine learning terms such as accuracy, bias, classification, and regression techniques. It discusses methodologies for model improvement, including pruning and ensemble techniques like random forests. Additionally, it covers data processes like cleaning and mining, and makes distinctions between labeled and unlabeled data. Clustering algorithms, error metrics, and learning paradigms such as supervised and unsupervised learning are also explained.
  • 6.7: Group Project
    This page summarizes three data analysis and classification projects: Project A compares k-means and DBSCAN clustering techniques on two datasets; Project B develops a decision tree classifier to predict college completion based on GPA and in-state status, experimenting with training/testing ratios and pruning; Project C uses Gaussian naïve Bayes to predict outcomes in liver disease patients.
  • 6.8: Chapter Review
    This page discusses segmenting customers by their purchasing behavior using a dataset and highlights the suitability of k-means clustering as a machine learning technique. It focuses on grouping data points into clusters based on similarities, aiming to enhance personalized marketing strategies.
  • 6.9: Critical Thinking
    This page examines the importance of the training and testing data ratio on model performance, emphasizing the risks of underfitting or overfitting. It highlights the significance of the testing set for detecting these issues. Additionally, it discusses the challenges of applying multiple linear regression in university admissions due to the correlation between SAT and ACT scores and measurement scale differences. Lastly, there is a prompt for classifying a news article using specific keywords.
  • 6.10: Quantitative Problems
    This page covers statistical analyses of 100 m sprint world record times from 1912 to 2002, advocating for linear regression to forecast future times. It also presents a discrete logistic regression model assessing the impact of weather on rain probability. Exercises are included for computing information, entropy, and probabilities related to coin flips and dice rolls.
  • 6.11: References
    This page discusses how machine learning algorithms can unintentionally inherit biases from their training data, leading to prejudice in AI technologies. It stresses the need to address these biases to prevent discrimination and promote fairness in automated systems.


This page titled 6: Decision-Making Using Machine Learning Basics is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by OpenStax via source content that was edited to the style and standards of the LibreTexts platform.
