Engineering LibreTexts

16: Regular Expressions

    • 16.1: Regular Expressions
      This page explains the use of Python's regular expressions library for pattern searching and extraction in strings. It contrasts regular expressions with basic string methods, highlighting their power and precision through special characters. The text introduces basic concepts of regular expressions, emphasizing their complexity and the importance of practice, accompanied by examples that illustrate the use of the search() function for sophisticated matching.
    • 16.2: Character matching in regular expressions
      This page explains special characters in regular expressions, focusing on the period as a wildcard that matches any character. Examples such as "F..m:" and "^From:.+@" show how to search a text file for lines that start with "From:" and contain an at-sign. It also covers the asterisk (*) and plus (+), which repeat the preceding character zero-or-more and one-or-more times respectively, and notes their "greedy" matching behavior.
    • 16.3: Extracting data using regular expressions
      This page explains how to extract email addresses in Python using the `findall()` method from the `re` module with regex patterns. It includes examples and demonstrates reading lines from a file to extract valid emails, focusing on refining regex to eliminate invalid surrounding characters for cleaner outputs. Adjustments to the patterns lead to improved accuracy in email extractions.
    • 16.4: Combining searching and extracting
      This page explains how to use regular expressions to extract numerical data from specific text formats, such as lines beginning with "X-" or "Details: rev=". It covers constructing regex patterns, utilizing parentheses for capturing groups, and provides examples for extracting floating-point numbers and revision integers. The text also offers guidance on simplifying code for extracting hours from email timestamps using regex, emphasizing cleaner and more efficient programming techniques.
    • 16.5: Escape Character
      This page explains how to match characters that normally have a special meaning in regular expressions, such as the markers for the beginning and end of a line or the wildcard. Prefixing such a character, for example a dollar sign or caret, with a backslash makes it match literally. An example matches monetary amounts, and the page notes that characters inside square brackets are not treated as special.
    • 16.6: Bonus section for Unix / Linux users
      This page discusses the history and functionality of regular expression searching in Unix and various programming languages, highlighting the command-line tool grep. It exemplifies grep's use in finding specific lines in files and notes differences between grep and Python's regex capabilities, particularly regarding the handling of non-blank characters.
    • 16.7: Debugging
      This page explains how Python's built-in documentation aids users in recalling method names through the interactive help system and the dir() command. It highlights the use of the 're' module to explore methods and access brief documentation, particularly for functions like re.search. Although the documentation is not exhaustive, it serves as a valuable resource when internet access is lacking.
    • 16.E: Regular Expressions (Exercises)
      This page outlines two programming exercises: the first simulates the Unix 'grep' command by prompting for a regular expression to count matching lines in a file (mbox.txt). The second exercise requires writing a program to find lines formatted as "New Revision: [number]", extract the numbers with regular expressions, and compute their average from files such as mbox.txt and mbox-short.txt.
    • 16.G: Regular Expressions (Glossary)
      This page defines key programming concepts including "brittle code," which is sensitive to input changes; "greedy matching," where the + and * characters expand to match the largest possible string; "grep," a Unix command for searching text with regex; "regular expression," a language for complex search patterns; and "wild card," a character (often a period in regex) that matches any character.
    • 16.S: Regular Expressions (Summary)
      This page introduces regular expressions (regex) for defining string search patterns using special characters. It covers key elements like line boundaries (^ and $), wildcards (.), whitespace (\s), and quantifiers (* and +). It explains character sets for defined ranges and inversions, parentheses for extracting matches, and word boundaries (\b and \B).
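
    The techniques summarized in the sections above can be sketched in a few lines of Python. This is only a minimal illustration, not code taken from the chapter: the sample lines, the address alice@example.com, and the variable names are assumptions made up for the example, and the patterns are one plausible way to write each search.

```python
import re

# Hypothetical sample lines, loosely shaped like the mail-log text the
# chapter describes (these are made up, not taken from mbox.txt).
lines = [
    "From: alice@example.com Sat Jan  5 09:14:16 2008",
    "X-DSPAM-Confidence: 0.8475",
    "Details: rev=39772",
    "We just received $10.00 for cookies.",
]

# search(): keep lines that start with "From:" and contain an at-sign.
from_lines = [ln for ln in lines if re.search(r"^From:.+@", ln)]

# findall(): pull out email-like substrings, requiring a letter or digit
# at the start and a letter at the end to trim surrounding punctuation.
emails = re.findall(r"[a-zA-Z0-9]\S*@\S*[a-zA-Z]", lines[0])

# Parentheses: match the whole prefix but return only the captured number.
confidence = re.findall(r"^X-DSPAM-Confidence: ([0-9.]+)", lines[1])
revision = re.findall(r"^Details:.*rev=([0-9]+)", lines[2])

# Backslash escape: "\$" matches a literal dollar sign, not end-of-line.
amounts = re.findall(r"\$[0-9.]+", lines[3])

print(from_lines, emails, confidence, revision, amounts)
```

    Raw strings (the r"..." prefix) are used here so that backslashes reach the re module unchanged; that is a stylistic choice for this sketch rather than something the summaries above prescribe.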


    This page titled 16: Regular Expressions is shared under a CC BY 4.0 license and was authored, remixed, and/or curated via source content that was edited to the style and standards of the LibreTexts platform.