Engineering LibreTexts

6.6: Key Terms

    accuracy
    in machine learning generally, a measure of the correctness of a model's predictions; for classification, the proportion of all predictions that are correct
    bias
    error introduced by overly simplistic or overly rigid models that do not capture important features of the data
    big data
    extremely large and complex datasets that require special methods to handle
    binary (binomial) classification
    classification of data into one of two categories
    bootstrap aggregating (bagging)
    resampling with replacement from the same training data multiple times to create a number of models (for example, decision trees) that all contribute to the overall model
    bootstrapping
    resampling the data with replacement many times in order to generate a distribution of an estimate, which is then used to determine a confidence interval for parameters in a model
    centroid
    geometric center of a subset of points
    cluster
    a subset of a dataset consisting of points that have similar characteristics or are near one another
    confusion matrix
    table of values indicating how data was classified correctly or incorrectly by a given model. The entry in row i and column j gives the number of times (or percentage) that data with true label i was classified by the model as label j
    data cleaning
    process of identifying and correcting errors, typos, inconsistencies, missing data, and other anomalies in a dataset
    data mining
    process of discovering patterns, trends, and insights from large datasets
    DBSCAN algorithm
    common density-based clustering algorithm (Density-Based Spatial Clustering of Applications with Noise)
    decision tree
    classifier algorithm that builds a hierarchical structure where each internal node represents a decision based on a feature of the data, and each leaf node represents a final decision, label, or prediction
    density-based clustering algorithm
    clustering algorithm that builds clusters of relatively dense subsets
    depth
    number of levels of a decision tree, or equivalently, the length of the longest branch of the tree
    depth-limiting pruning
    pre-pruning method that restricts the total depth (number of levels) of a decision tree
    entropy
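    measure of the average amount of information or uncertainty in a random variable; for class proportions \( p_1, \ldots, p_k \), \( H = -\sum_{i=1}^{k} p_i \log_2 p_i \)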
    error-based (reduced-error) pruning
    pruning method that removes branches that do not significantly improve the overall accuracy of the decision tree
    F1 score
    harmonic mean of precision \(p\) and recall \(r\): \( F_1 = \frac{2pr}{p+r} = \frac{2TP}{2TP+FP+FN} \)
    facial recognition
    application of machine learning that involves categorizing or labeling images of faces based on the identities of individuals depicted in those images
    Gaussian naïve Bayes
    classification algorithm that is useful when variables are assumed to come from normal distributions
    heatmap
    shading or coloring of a table to show contrasts in low versus high values
    information gain
    reduction in entropy achieved by splitting a parent node into child nodes in a decision tree: the entropy of the parent minus the weighted average entropy of the children
    information theory
    framework for measuring and managing the uniqueness of data, or the degree of surprise or uncertainty associated with an event or message
    k-means clustering algorithm
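    clustering algorithm that iteratively locates the centroids of k clusters by assigning each point to its nearest centroid and then moving each centroid to the mean of its assigned points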
    labeled data
    data that has classification labels
    leaf-limiting pruning
    pre-pruning method that restricts the total number of leaf nodes of a decision tree
    likelihood
    probability of the observed data under a model with given parameters; it measures how well a classifier fits the data and is maximized when setting up (fitting) logistic regression models
    logistic regression
    modeling method that fits data to a logistic (sigmoid) function and typically performs binary classification
    logit function
    function of the form \( \ln\left(\frac{p}{1-p}\right) \) used to compute log-odds and transform data when performing logistic regression
    machine learning (ML) model
    any algorithm that trains on data to determine or adjust parameters of a model for use in classification, clustering, decision making, prediction, or pattern recognition
    mean absolute error (MAE)
    measure of error: \( \text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \)
    mean absolute percentage error (MAPE)
    measure of relative error: \( \text{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \)
    mean squared error (MSE)
    measure of error: \( \text{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \)
    minimum description length (MDL) pruning
    post-pruning method that seeks to find the least complex form of a decision tree that meets an acceptable measure of accuracy
    multiclass (multinomial) classification
    classification of data into more than two categories
    multiple regression
    regression techniques that use more than one input variable
    naïve Bayes classification
    also known as multinomial naïve Bayes classification, a classification algorithm that makes use of prior probabilities and Bayes’ Theorem to predict the class or label of new data
    odds
    probability of an event \(E\) occurring divided by the probability of \(E\) not occurring
    one-hot encoding
    replacing categorical/text values in a dataset with vectors in which one entry is 1 and all other entries are 0; each category's vector has its 1 in a distinct position
    overfitting
    modeling using a method that yields high variance; the model captures too much of the noise and so may perform well on training data but very poorly on testing data
    precision
    ratio of true positive predictions to the total number of positive predictions: \( p = \frac{TP}{TP+FP} \)
    prior probability
    estimate of a probability, which may be updated or corrected based on Bayes’ Theorem
    pruning
    reducing the size of a decision tree by removing branches that split the data too finely
    random forest
    classifier algorithm that uses multiple decision trees and bootstrap aggregating
    recall
    ratio of true positive predictions to the total number of actual positives: \( r = \frac{TP}{TP+FN} \)
    regression tree
    type of decision tree in which the decisions are based on numerical comparisons of continuous data
    root mean squared error (RMSE)
    measure of error: \( \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \)
    sigmoid function
    function useful in logistic regression: \( \sigma(x) = \frac{1}{1+e^{-x}} \)
    silhouette score
    a measure of how well-separated the clusters are when using a clustering algorithm
    supervised learning
    machine learning methods that train on labeled data
    testing set (or data)
    portion of the dataset that is set aside and used after the training of the algorithm to test for accuracy of the model
    training set (or data)
    portion of the dataset that is used to train a machine learning algorithm
    underfitting
    modeling using a method that yields high bias; the model does not capture important features of the data
    unlabeled data
    data that has not been classified or for which classification data is not known yet
    unsupervised learning
    machine learning methods that do not require data to be labeled in order to learn; often, unsupervised learning is a first step in discovering meaningful clusters that will be used to define labels
    variance
    error due to an overly sensitive model that reacts to small changes in the data
    weak learners
    individual models that are trained on parts of the dataset and then combined in a bootstrap aggregating method such as random forest
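The confusion matrix, accuracy, precision, recall, and F1 score entries above can all be computed from paired lists of true and predicted labels. The following is a minimal Python sketch, not code from the original text; the helper names `confusion_matrix` and `binary_scores` and the toy label lists are hypothetical, and the positive class is assumed to be labeled 1.

```python
def confusion_matrix(actual, predicted, labels):
    """Row i, column j counts items whose true label is labels[i]
    and whose predicted label is labels[j]."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def binary_scores(actual, predicted, positive=1):
    """Accuracy, precision, recall, and F1 for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    # Guard against empty denominators (no positive predictions or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return accuracy, precision, recall, f1


actual    = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 1, 1, 0]
print(confusion_matrix(actual, predicted, labels=[0, 1]))  # [[2, 2], [1, 3]]
print(binary_scores(actual, predicted))  # (0.625, 0.6, 0.75, 0.6666666666666666)
```

Here TP = 3, FP = 2, and FN = 1, so the two forms of the F1 formula agree: \( 2(0.6)(0.75)/(0.6+0.75) = 6/9 \).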
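The MAE, MAPE, MSE, and RMSE entries above differ only in how each residual \( y_i - \hat{y}_i \) is treated before averaging. A minimal Python sketch, assuming plain lists of numbers; `regression_errors` and the sample values are hypothetical illustrations.

```python
import math


def regression_errors(y, y_hat):
    """Return MAE, MAPE, MSE, and RMSE for true values y and predictions y_hat."""
    n = len(y)
    residuals = [yi - fi for yi, fi in zip(y, y_hat)]
    mae = sum(abs(r) for r in residuals) / n
    # MAPE divides by each true value, so it is undefined if any y_i is 0.
    mape = sum(abs(r / yi) for r, yi in zip(residuals, y)) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)  # RMSE is the square root of MSE
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse}


errors = regression_errors([10, 20, 30, 40], [12, 18, 33, 40])
print(errors)  # MAE 1.75, MAPE about 0.1, MSE 4.25, RMSE about 2.06
```

Because MSE squares the residuals, the single error of 3 contributes more than half of the total, while MAE weights all residuals linearly.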
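The entropy and information gain entries above can be made concrete for a single decision-tree split. This is a minimal Python sketch using base-2 logarithms (entropy in bits); the function names and the yes/no label lists are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the class proportions in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4
print(entropy(parent))                          # 1.0
print(information_gain(parent, [left, right]))  # about 0.278
```

A 50/50 parent has the maximum entropy of 1 bit; each 80/20 child has about 0.722 bits, so this split gains roughly 0.278 bits.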
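The odds, logit function, and sigmoid function entries above are closely linked: the logit is the natural log of the odds, and the sigmoid is its inverse. A minimal Python sketch; the function names are hypothetical, and the sigmoid is written in a form that avoids overflow for large negative inputs (a design choice, not something the glossary specifies).

```python
import math


def odds(p):
    """Probability of an event divided by the probability that it does not occur."""
    return p / (1 - p)


def logit(p):
    """Log-odds ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(odds(p))


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)), arranged to avoid overflow for large |x|."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)


p = 0.8
print(odds(p))            # about 4.0
print(logit(p))           # about 1.386, that is, ln 4
print(sigmoid(logit(p)))  # about 0.8: sigmoid undoes logit
print(sigmoid(0))         # 0.5
```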
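To connect the likelihood and logistic regression entries above, the following minimal Python sketch fits a one-feature model \( \sigma(wx + b) \) by gradient ascent on the log-likelihood. Plain gradient ascent, the learning rate, the step count, and the toy data are all assumptions for illustration; library implementations typically use other optimizers.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def log_likelihood(w, b, xs, ys):
    """Log of the probability that the model sigmoid(w*x + b) assigns to the 0/1 labels ys."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += math.log(p) if y == 1 else math.log(1 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.05, steps=5000):
    """Fit w and b by gradient ascent on the log-likelihood (one input feature)."""
    w = b = 0.0
    for _ in range(steps):
        # For this model the gradient is sum((y - p) * x) for w and sum(y - p) for b.
        residuals = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        w += learning_rate * sum(r * x for r, x in zip(residuals, xs))
        b += learning_rate * sum(residuals)
    return w, b


xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 1, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(log_likelihood(0.0, 0.0, xs, ys), log_likelihood(w, b, xs, ys))
print(sigmoid(w * 0.5 + b), sigmoid(w * 4.5 + b))  # low, then high probability of class 1
```

The classes overlap in this toy data (1.5 is labeled 1, 3.0 is labeled 0), so a finite maximum-likelihood solution exists; with perfectly separable data the weights would keep growing.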
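The one-hot encoding entry above can be illustrated with a short Python sketch. The helper `one_hot_encode` is hypothetical; sorting the categories is an assumed choice that gives a fixed, reproducible column order.

```python
def one_hot_encode(values):
    """Map each category to a vector with a single 1 in that category's position."""
    categories = sorted(set(values))  # fixed, reproducible column order
    position = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[position[v]] = 1
        vectors.append(vec)
    return categories, vectors


cats, vecs = one_hot_encode(["red", "green", "blue", "green"])
print(cats)  # ['blue', 'green', 'red']
print(vecs)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```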
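The k-means clustering algorithm and centroid entries above describe an alternation between assigning points and recomputing centroids. A minimal Python sketch of that loop, assuming points are coordinate tuples and that the initial centroids are k randomly chosen data points (one common initialization among several); `k_means` and the sample points are hypothetical.

```python
import math
import random


def k_means(points, k, max_iterations=100, seed=0):
    """Plain k-means on tuples of coordinates; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k of the data points
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved, so the algorithm has converged
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))  # about [(1.17, 1.0), (8.5, 8.5)]
```

`math.dist` requires Python 3.8 or later. Different seeds can lead to different local optima on less cleanly separated data, which is why k-means is often run several times.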
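The silhouette score entry above is usually computed with the standard silhouette coefficient \( s = (b - a)/\max(a, b) \), averaged over all points. A minimal Python sketch under that definition; scoring a point that is alone in its cluster as 0 is an assumed convention, and the function name and data are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters).

    For each point, a = mean distance to the other members of its own cluster and
    b = lowest mean distance to the members of any other cluster; s = (b - a) / max(a, b).
    """
    labels = list(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)  # convention used here: a point alone in its cluster scores 0
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != own
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


pts = [(0,), (1,), (10,), (11,)]
good = silhouette_score(pts, [0, 0, 1, 1])  # close to 1: well-separated clusters
bad = silhouette_score(pts, [0, 1, 0, 1])   # negative: points sit nearer the other cluster
print(good, bad)
```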
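The bootstrapping entry above can be sketched as a percentile bootstrap confidence interval: resample with replacement, recompute the statistic each time, and read off the middle 95% of the resulting distribution. This Python sketch is one simple variant among several; `bootstrap_ci`, its defaults, and the data are hypothetical.

```python
import random
import statistics


def bootstrap_ci(data, statistic=statistics.mean, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for statistic(data)."""
    rng = random.Random(seed)
    # Each resample draws len(data) values from data with replacement.
    estimates = sorted(
        statistic(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    alpha = (1 - confidence) / 2
    lower = estimates[round(alpha * (n_resamples - 1))]
    upper = estimates[round((1 - alpha) * (n_resamples - 1))]
    return lower, upper


data = [2.1, 2.5, 2.8, 3.0, 3.3, 3.6, 4.0, 4.2]
low, high = bootstrap_ci(data)
print(statistics.mean(data), (low, high))
```

Passing a different `statistic` (for example `statistics.median`) gives an interval for that quantity without any change to the resampling loop.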
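The bootstrap aggregating (bagging) and weak learners entries above can be illustrated with one-feature decision stumps: each stump is trained on its own resample of the training data, and predictions are combined by majority vote. This Python sketch shows bagging only; a random forest would additionally grow full decision trees and consider a random subset of features at each split, which is omitted here. All function names and data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Weak learner: the threshold t minimizing errors of the rule 'predict 1 if x > t'."""
    best_errors, best_t = None, None
    for t in sorted(set(xs)):
        errors = sum(int(x > t) != y for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_errors, best_t = errors, t
    return best_t


def fit_bagged_stumps(xs, ys, n_models=25, seed=0):
    """Bootstrap aggregating: fit each stump on a resample (with replacement) of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def predict_bagged(thresholds, x):
    """Combine the weak learners by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
stumps = fit_bagged_stumps(xs, ys)
print([predict_bagged(stumps, x) for x in (1, 2, 7, 8)])
```

Using an odd number of models avoids tied votes in this two-class setting.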
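The Gaussian naïve Bayes entry above can be sketched by estimating, for each class, a prior probability and a normal distribution per feature, then choosing the class with the largest posterior. Features are treated as independent given the class (the "naïve" assumption), and log probabilities are summed to avoid underflow. In this Python sketch the function names, the small variance constant, and the data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """For each class, store its prior probability and per-feature means and variances."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # The small constant keeps a variance from being exactly zero.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Pick the class with the largest log prior plus summed log normal densities."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (1.1, 2.1)))  # a
print(predict_gaussian_nb(model, (5.1, 5.9)))  # b
```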

    This page titled 6.6: Key Terms is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by OpenStax via source content that was edited to the style and standards of the LibreTexts platform.