11.3: Appendix D: Review of Python Functions
This appendix provides a summary of Python functions used in this textbook. The intent is to provide students with a cross-reference of Python commands that includes a description of the Python functions, general syntax for usage, and a link to the section where the function is first used in the text.
Please note this is a very high-level description of these functions. Many functions require specific libraries to be installed. For more details on Python functions, syntax, and usage, please refer to the Python documentation posted online.
| Python Function | Description | Syntax | First Reference |
|---|---|---|---|
| What Are Data and Data Science? | | | |
| `print()` | Prints a specified message or specified values to the screen or other output device | `print("text")` or `print(x, y)` | Python Basics for Data Science |
| `pd.read_csv()` | Loads data from a CSV (comma-separated values) file and stores it in a DataFrame | `pd.read_csv(path_to_csv_datafile)` | Python Basics for Data Science |
| `DataFrame.describe()` | Returns a table with basic statistics for a dataset, including min, max, mean, count, and quartiles | `DataFrame.describe()`, where `DataFrame` is the name of the DataFrame | Python Basics for Data Science |
| `DataFrame.iloc[]` | Allows access to data in a DataFrame using integer-based row and column indexes | `DataFrame.iloc[row, column]`, where `DataFrame` is the name of the DataFrame | Python Basics for Data Science |
| `DataFrame.loc[]` | Accesses a group of rows and columns by labels or a Boolean array | `DataFrame.loc[criteria]`, where `DataFrame` is the name of the DataFrame | Python Basics for Data Science |
| `plt.scatter()` | Generates a scatterplot for (x, y) data | `plt.scatter(x_data, y_data)` | Python Basics for Data Science |
| `plt.title()` | Specifies a title for a chart | `plt.title("Title")` | Python Basics for Data Science |
| `plt.xlabel()` | Specifies a label for the x-axis | `plt.xlabel("x-axis label")` | Python Basics for Data Science |
| `plt.ylabel()` | Specifies a label for the y-axis | `plt.ylabel("y-axis label")` | Python Basics for Data Science |
| `plt.xlim()` | Specifies limits to use for x-axis numbering | `plt.xlim(lower, upper)` | Python Basics for Data Science |
| `plt.ylim()` | Specifies limits to use for y-axis numbering | `plt.ylim(lower, upper)` | Python Basics for Data Science |
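
The pandas entries above can be tried together in a short sketch. The CSV content and column names below are made up for illustration, and the data is read from an in-memory string rather than a file path so that the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content; pd.read_csv() is normally given a path to a file.
csv_text = """name,height_cm,weight_kg
Ana,160,55
Ben,175,72
Cara,168,60
Dev,182,85
"""

df = pd.read_csv(io.StringIO(csv_text))

# Basic statistics (count, mean, min, quartiles, max) for the numeric columns.
summary = df.describe()

# Integer-based access: row 0, column 1.
first_height = df.iloc[0, 1]

# Label-based access: a Boolean criterion for rows plus a column label.
tall_names = df.loc[df["height_cm"] > 170, "name"]

print(summary)
print(first_height)
print(tall_names.tolist())
```

Note that `iloc` counts positions starting at 0, while `loc` works with labels and Boolean conditions.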

| Python Function | Description | Syntax | First Reference |
|---|---|---|---|
| Collecting and Preparing Data | | | |
| `pd.read_html()` | Reads the HTML tables on a web page and returns them as a list of DataFrames | `pd.read_html(URL)` | Web Scraping and Social Media Data Collection |
| `pd.to_numeric()` | Converts strings or other data types to numeric values | `pd.to_numeric(column_name)` | Web Scraping and Social Media Data Collection |
| `len()` | Returns the length of an object | `len(object)` | Web Scraping and Social Media Data Collection |
| `re.findall()` | Returns all non-overlapping matches of a specified pattern in a string | `re.findall(pattern, string)` | Web Scraping and Social Media Data Collection |
| `re.search()` | Checks if a specified pattern appears in a string | `re.search(pattern, string)` | Web Scraping and Social Media Data Collection |
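
As a minimal sketch of the text-processing entries above, the following example applies `re.findall()`, `re.search()`, `len()`, and `pd.to_numeric()` to made-up strings. The sample text and patterns are assumptions chosen only for illustration.

```python
import re

import pandas as pd

# Hypothetical text, standing in for content collected from a web page.
text = "Order 12 shipped on 2024-03-05; order 7 is pending."

# All non-overlapping runs of digits, returned as a list of strings.
numbers = re.findall(r"\d+", text)

# First location where a date-like pattern appears (None if there is no match).
date_match = re.search(r"\d{4}-\d{2}-\d{2}", text)

# Strings that look like numbers, converted to a numeric Series.
prices = pd.to_numeric(pd.Series(["3", "4.5", "10"]))

print(numbers, len(numbers))
print(date_match.group())
print(prices.sum())
```

`re.search()` returns a match object rather than `True` or `False`, so the matched text is retrieved with `.group()`.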

| Python Function | Description | Syntax | First Reference |
|---|---|---|---|
| Descriptive Statistics: Statistical Measurements and Probability Distributions | | | |
| `binom.pmf()` | Calculates the probability mass function (PMF) for a binomial distribution. It gives the probability of having exactly x successes in n trials with success probability p. | `binom.pmf(x, n, p)`, where `x` is the number of successes in the experiment, `n` is the number of trials in the experiment, and `p` is the probability of success | Discrete and Continuous Probability Distributions |
| `round()` | Rounds a numeric result to a specified level of precision | `round(number, digits)` | Discrete and Continuous Probability Distributions |
| `poisson.pmf()` | Calculates probabilities associated with the Poisson distribution | `poisson.pmf(x, mu)`, where `x` is the number of events of interest and `mu` is the mean of the Poisson distribution | Discrete and Continuous Probability Distributions |
| `norm.cdf()` | Calculates probabilities associated with the normal distribution (returns the area under the normal probability density function to the left of a specified measurement) | `norm.cdf(x, mu, std)`, where `x` is the measurement of interest, `mu` is the mean of the normal distribution, and `std` is the standard deviation of the normal distribution | Discrete and Continuous Probability Distributions |
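
The distribution functions above come from `scipy.stats`. The sketch below evaluates each one for a made-up scenario; the numeric inputs are assumptions chosen so that the results are easy to check by hand.

```python
from scipy.stats import binom, norm, poisson

# P(exactly 3 successes in 10 trials with success probability 0.5)
p_binom = round(binom.pmf(3, 10, 0.5), 4)

# P(exactly 2 events when the mean number of events is 3)
p_poisson = round(poisson.pmf(2, 3), 4)

# P(X < 115) for a normal distribution with mean 100 and standard deviation 15
p_norm = round(norm.cdf(115, 100, 15), 4)

print(p_binom, p_poisson, p_norm)
```

Because `norm.cdf()` returns the area to the left of a value, the area to the right is `1 - norm.cdf(x, mu, std)`.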

| Python Function | Description | Syntax | First Reference |
|---|---|---|---|
| Inferential Statistics and Regression Analysis | | | |
| `t.ppf()` | Generates the value of the t-distribution corresponding to a specified area under the t-distribution curve and specified degrees of freedom | `t.ppf(area_to_left, degrees_of_freedom)` | Statistical Inference and Confidence Intervals |
| `bootstrap()` | Performs the bootstrap process to generate a confidence interval | `bootstrap(data, statistic, confidence_level=level, n_resamples=number_resamples)` | Statistical Inference and Confidence Intervals |
| `norm.interval()` | Calculates a confidence interval for the mean when the population standard deviation is known, given the sample mean, population standard deviation, and sample size (uses the normal distribution). Note: standard error is the standard deviation divided by the square root of the sample size. | `norm.interval(conf_level, sample_mean, standard_error)` | Statistical Inference and Confidence Intervals |
| `t.interval()` | Calculates a confidence interval for the mean when the population standard deviation is unknown, given the sample mean, sample standard deviation, and sample size (uses the t-distribution). Note: standard error is the standard deviation divided by the square root of the sample size. | `t.interval(conf_level, degrees_freedom, sample_mean, standard_error)` | Statistical Inference and Confidence Intervals |
| `proportion_confint()` | Calculates a confidence interval for a proportion (uses the normal distribution) | `proportion_confint(success, sample_size, alpha)` | Statistical Inference and Confidence Intervals |
| `ttest_1samp()` | Returns the value of the test statistic and the two-tailed p-value for a one-sample hypothesis test using the t-distribution | `ttest_1samp(data_array, null_hypothesis_mean)` | Hypothesis Testing |
| `ttest_ind_from_stats()` | Returns the value of the test statistic and the two-tailed p-value for a two-sample hypothesis test using the t-distribution | `ttest_ind_from_stats(sample_mean1, sample_standard_deviation1, sample_size1, sample_mean2, sample_standard_deviation2, sample_size2)` | Hypothesis Testing |
| `np.array()` | Creates a numerical array from a list-like object | `np.array(object)` | Correlation and Linear Regression Analysis |
| `pearsonr()` | Calculates the value of the Pearson correlation coefficient r | `pearsonr(x_data, y_data)` | Correlation and Linear Regression Analysis |
| `linregress()` | Generates a linear regression model and provides slope, y-intercept, and other regression-related output | `linregress(x_data, y_data)` | Correlation and Linear Regression Analysis |
| `f_oneway()` | Returns both the F test statistic and the p-value for the one-way ANOVA hypothesis test | `f_oneway(Array1, Array2, Array3, …)` | Analysis of Variance (ANOVA) |
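
Several of the inference and regression entries above can be combined in one short sketch using `scipy.stats`. The sample values, the null-hypothesis mean of 10.0, and the (x, y) data are all made up for illustration.

```python
import numpy as np
from scipy.stats import linregress, pearsonr, t, ttest_1samp

# Hypothetical sample of 10 measurements.
sample = np.array([9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.1, 9.5])
n = len(sample)
sample_mean = sample.mean()
# Standard error: sample standard deviation divided by the square root of n.
standard_error = sample.std(ddof=1) / np.sqrt(n)

# Critical t-value with area 0.975 to its left and n - 1 degrees of freedom.
critical_t = t.ppf(0.975, n - 1)

# 95% confidence interval for the mean (population standard deviation unknown).
lower, upper = t.interval(0.95, n - 1, sample_mean, standard_error)

# One-sample t-test of the null hypothesis that the population mean is 10.0.
test_result = ttest_1samp(sample, 10.0)

# Hypothetical paired data for correlation and simple linear regression.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
r, r_pvalue = pearsonr(x, y)
fit = linregress(x, y)

print(critical_t, (lower, upper))
print(test_result.statistic, test_result.pvalue)
print(r, fit.slope, fit.intercept)
```

The interval returned by `t.interval()` is the sample mean plus or minus `critical_t * standard_error`, which ties the first two calculations together.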

| Python Function | Description | Syntax | First Reference |
|---|---|---|---|
| Time Series and Forecasting | | | |
| `plot()` | Generates a time series plot | `plot(dataframe)` | Introduction to Time Series Analysis |
| `rolling()` | Provides rolling window calculations | `rolling(window=window)` | Time Series Forecasting Methods |
| `mean()` | Computes the average of a dataset | `mean(dataset)` | Time Series Forecasting Methods |
| `diff()` | Computes the first-order difference of data in a window | `diff(dataframe)` | Time Series Forecasting Methods |
| `plot_acf()` | Plots the ACF (autocorrelation function) for a time series, up to lag L | `plot_acf(time_series_data, lags=L)` | Time Series Forecasting Methods |
| `STL()` | Decomposes a time series with known period P into its components | `STL(time_series_data, period=P)` | Time Series Forecasting Methods |
| `ewm()` | Performs exponential moving average (EMA) smoothing | `ewm(dataframe)` | Time Series Forecasting Methods |
| `adfuller()` | Performs the Augmented Dickey-Fuller (ADF) test, which is a statistical test for checking the stationarity of a time series | `adfuller(time_series_data)` | Time Series Forecasting Methods |
| `ARIMA()` | Fits an ARIMA(p, d, q) (AutoRegressive Integrated Moving Average) model to time series data | `ARIMA(time_series_data, order=(p, d, q))` | Time Series Forecasting Methods |
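
The following is a minimal sketch of the rolling-window, differencing, and smoothing entries above, written with the pandas method form (`series.rolling(...)`, `series.diff()`, `series.ewm(...)`). The sales figures, window size, and smoothing factor are assumptions chosen for illustration. The statsmodels functions in the table (`plot_acf()`, `STL()`, `adfuller()`, `ARIMA()`) are not exercised here.

```python
import pandas as pd

# Hypothetical monthly sales figures.
sales = pd.Series(
    [10.0, 12.0, 14.0, 13.0, 15.0, 18.0],
    index=pd.date_range("2024-01-01", periods=6, freq="MS"),
)

# Simple moving average over a 3-period window (first two values are NaN).
moving_avg = sales.rolling(window=3).mean()

# First-order difference: each value minus the previous value.
first_diff = sales.diff()

# Exponential moving average with smoothing factor alpha = 0.5.
ema = sales.ewm(alpha=0.5, adjust=False).mean()

print(moving_avg)
print(first_diff)
print(ema)
```

With `adjust=False`, each smoothed value is `alpha * current + (1 - alpha) * previous_smoothed`, which matches the usual recursive definition of an EMA.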

| Python Function | Description | Syntax | First Reference |
|---|---|---|---|
| Decision-Making Using Machine Learning Basics | | | |
| `LogisticRegression()` | Creates a logistic regression model | `LogisticRegression()` | Classification Using Machine Learning |
| `model.fit()` | Trains a machine learning model on a given dataset | `model.fit(feature_matrix, target_vector)` | Classification Using Machine Learning |
| `KMeans()` | Sets up a k-means clustering model (use `model.fit()` to fit the model to a dataset) | `KMeans(n_clusters=k)` | Classification Using Machine Learning |
| `DBSCAN()` | Sets up a DBSCAN (Density-Based Spatial Clustering of Applications with Noise) model (use `model.fit()` to fit the model to a dataset) | `DBSCAN(options)` | Classification Using Machine Learning |
| `confusion_matrix()` | Computes a confusion matrix, which summarizes the performance of a classification model by comparing actual and predicted values | `confusion_matrix(target_values, predicted_values)` | Classification Using Machine Learning |
| `LinearRegression()` | Fits a linear regression model to data | `LinearRegression().fit(feature_matrix, target_vector)` | Machine Learning in Regression Analysis |
| `predict()` | Used on trained machine learning models to generate predictions for new data points | `model.predict(feature_matrix)` | Machine Learning in Regression Analysis |
| `DecisionTreeClassifier()` | Sets up a decision tree model (use `model.fit()` to fit the model to a dataset) | `DecisionTreeClassifier(options)` | Decision Trees |
| `ens.RandomForestRegressor()` | Sets up a random forest model (use `model.fit()` to fit the model to a dataset) | `ens.RandomForestRegressor(options)` | Other Machine Learning Techniques |
| `GaussianNB()` | Sets up a Naïve Bayes classification model (use `model.fit()` to fit the model to a dataset) | `GaussianNB()` | Other Machine Learning Techniques |
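
Most of the scikit-learn entries above follow the same pattern: create a model object, call `fit()`, then call `predict()` or inspect the fitted model. The sketch below shows that pattern for `LogisticRegression()` and `KMeans()` on a tiny made-up dataset; the data values, `n_init`, and `random_state` settings are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Hypothetical one-feature dataset: small values are class 0, large values are class 1.
X = np.array([[1.0], [2.0], [3.0], [4.0], [10.0], [11.0], [12.0], [13.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Supervised learning: fit a classifier, predict, and compare with the true labels.
model = LogisticRegression()
model.fit(X, y)
predicted = model.predict(X)
cm = confusion_matrix(y, predicted)

# Unsupervised learning: group the same points into k = 2 clusters (labels not used).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)
cluster_labels = kmeans.labels_

print(cm)
print(cluster_labels)
```

Rows of the confusion matrix correspond to actual classes and columns to predicted classes, so correct predictions are counted on the diagonal.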

| Python Function | Description | Syntax | First Reference |
|---|---|---|---|
| Deep Learning and Artificial Intelligence (AI) Basics | | | |
| `Perceptron()` | Sets up a perceptron model (use `model.fit()` to fit the model to a dataset) | `Perceptron()` | Introduction to Neural Networks |
| `train_test_split()` | Splits a dataset randomly into train and test subsets, using a proportion P of the data for the test set | `train_test_split(input_data_arrays, target_data, test_size=P)` | Introduction to Neural Networks |
| `StandardScaler()` | Used to standardize features by removing the mean and scaling to unit variance | `StandardScaler()` | Introduction to Neural Networks |
| `accuracy_score()` | Calculates the accuracy of a classification model as the ratio of the number of correct predictions to the total number of predictions | `accuracy_score(y_true, y_predicted)` | Introduction to Neural Networks |
| `scaler.fit_transform()` | Fits a scaler to the data and then transforms the data according to the fitted scaler | `scaler.fit_transform(array)` | Introduction to Neural Networks |
| `scaler.transform()` | Applies a previously fitted scaler to new data | `scaler.transform(array)` | Introduction to Neural Networks |
| `tf.keras.Sequential()` | Creates a linear stack of layers for building a neural network model | `tf.keras.Sequential(layers, additional options)` | Backpropagation |
| `model.compile()` | Used to configure the learning process of a neural network model before training | `model.compile(optimizer=optimizer, loss=loss, metrics=metrics)` | Backpropagation |
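
The scikit-learn entries above are typically used in sequence: split the data, fit a scaler on the training set, apply it to the test set, train the model, then score it. The sketch below follows that sequence with a perceptron on synthetic data. The two clusters, the 20% test size, and the random seeds are assumptions for illustration, and the Keras functions in the table are not exercised here.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data: two well-separated clusters of 50 points each.
rng = np.random.default_rng(0)
class0 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
class1 = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)

# Hold out 20% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training data only, then reuse it on the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the perceptron and measure accuracy on the held-out data.
model = Perceptron(random_state=0)
model.fit(X_train_scaled, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))

print(accuracy)
```

Calling `fit_transform()` on the training set and only `transform()` on the test set keeps information from the test data out of the scaling step.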

| Python Function | Description | Syntax | First Reference |
|---|---|---|---|
| Visualizing Data | | | |
| `boxplot()` | Creates a box-and-whisker plot | `plt.boxplot(array)` | Encoding Univariate Data |
| `hist()` | Creates a histogram | `plt.hist(array)` | Encoding Univariate Data |
| `plot()` | Creates 2D line plots such as a time series graph | `plt.plot(x_data, y_data)` | Graphing Probability Distributions |
| `bar()` | Creates a bar chart | `plt.bar(x_array, heights)` | Graphing Probability Distributions |
| `imshow()` | Displays an image on a 2D regular raster, such as a heatmap | `plt.imshow(array)` | Geospatial and Heatmap Data Visualization Using Python |
| `heatmap()` | Creates a heatmap visualization | `sns.heatmap(array)` | Geospatial and Heatmap Data Visualization Using Python |
| `colorbar()` | Adds a colorbar (a legend for the colormap) to a figure | `plt.colorbar()` | Multivariate and Network Data Visualization Using Python |
| `corr()` | Calculates the pairwise correlations of columns in a DataFrame | `dataframe.corr()` | Multivariate and Network Data Visualization Using Python |
| `add_subplot()` | Adds a subplot to a figure stored in `fig` | `fig.add_subplot(position)` | Multivariate and Network Data Visualization Using Python |
| `ax.scatter()` | Creates a scatterplot | `ax.scatter(x_data, y_data)` | Multivariate and Network Data Visualization Using Python |
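
As a rough sketch of how several of the plotting entries above fit together, the example below draws a scatterplot and a correlation heatmap side by side. The DataFrame values, the `coolwarm` colormap, and the figure size are assumptions for illustration. The non-interactive `Agg` backend is selected so that the code runs without a display; in a notebook that line can be dropped and `plt.show()` used instead.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical dataset with three numeric columns.
df = pd.DataFrame(
    {"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10], "z": [5, 3, 4, 1, 2]}
)

# Pairwise correlations between columns.
corr_matrix = df.corr()

fig = plt.figure(figsize=(8, 4))

# Left panel: scatterplot of y against x.
ax1 = fig.add_subplot(1, 2, 1)
ax1.scatter(df["x"], df["y"])
ax1.set_xlabel("x")
ax1.set_ylabel("y")

# Right panel: correlation matrix drawn as a heatmap, with a colorbar.
ax2 = fig.add_subplot(1, 2, 2)
image = ax2.imshow(corr_matrix.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar(image, ax=ax2)

print(corr_matrix)
```

The position argument `(1, 2, 1)` means a grid of 1 row and 2 columns, with the subplot placed in the first cell.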

| Python Function | Description | Syntax | First Reference |
|---|---|---|---|
| Reporting Results | | | |
| `plot_tree()` | Creates a visualization of a decision tree | `plot_tree(estimator, feature_names=feature_names)` | Validating Your Model |
| `DataFrame.info()` | Provides a concise summary of a DataFrame's structure and content | `DataFrame.info()` | Validating Your Model |
| `DataFrame.drop()` | Removes rows or columns from a DataFrame | `DataFrame.drop(labels, axis=rows_columns)` | Validating Your Model |
| `score()` | Evaluates the performance of a trained model on a given dataset | `model.score(feature_matrix, true_labels)` | Validating Your Model |
| `dt.get_depth()` | Retrieves the depth of the decision tree `dt` | `dt.get_depth()` | Validating Your Model |
| `cross_val_score()` | Evaluates a model's performance using cross-validation | `cross_val_score(estimator, feature_matrix, target_variable)` | Validating Your Model |
| `GridSearchCV()` | Searches for the best parameters for a specified estimator, with k-fold cross-validation | `GridSearchCV(estimator, parameters, cv=k)` | Validating Your Model |
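
The validation entries above can be sketched with a small decision tree. This example uses the iris dataset bundled with scikit-learn, which is an assumption made for convenience and not necessarily the dataset used in the text; the depth limit, the parameter grid, and the 5-fold setting are also illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example data: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a decision tree and evaluate it on the held-out data.
dt = DecisionTreeClassifier(max_depth=3, random_state=0)
dt.fit(X_train, y_train)
test_accuracy = dt.score(X_test, y_test)
depth = dt.get_depth()

# 5-fold cross-validation: one accuracy score per fold.
cv_scores = cross_val_score(dt, X, y, cv=5)

# Grid search over tree depth, again with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [1, 2, 3, 4]},
    cv=5,
)
search.fit(X, y)

print(test_accuracy, depth)
print(cv_scores.mean())
print(search.best_params_)
```

After fitting, `search.best_params_` holds the parameter combination with the highest mean cross-validated score, and `search.best_estimator_` is a model refit with those parameters.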


