
17.1: Regular Expressions in Pharo

  • Page ID
    43484
    \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    Regular expressions are widely used in scripting languages such as Perl, Python and Ruby. They are useful to identify strings that match a certain pattern, to check that input conforms to an expected format, and to rewrite strings to new formats. Pharo also supports regular expressions, thanks to the Regex package contributed by Vassili Bykov. Regex is installed by default in Pharo. If you are using an older image that does not include the Regex package, you can install it yourself from SqueakSource (http://www.squeaksource.com/Regex.html).

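    A regular expression[1] is a pattern that matches a set of strings. For example, in the regular expression 'h.*o' the dot matches any single character and the star repeats it zero or more times. The pattern therefore matches the strings 'ho', 'hiho' and 'hello', but not 'hi' or 'yo'. We can see this in Pharo with matchesRegex:, which answers true only if the entire string matches the pattern: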

    'ho' matchesRegex: 'h.*o'
    >>> true
    
    'hiho' matchesRegex: 'h.*o'
    >>> true
    
    'hello' matchesRegex: 'h.*o'
    >>> true
    
    'hi' matchesRegex: 'h.*o'
    >>> false
    
    'yo' matchesRegex: 'h.*o'
    >>> false
    

    In this chapter we will start with a small tutorial example in which we will develop a couple of classes to generate a very simple site map for a web site. We will use regular expressions

    1. to identify HTML files,
    2. to strip the full path name of a file down to just the file name,
    3. to extract the title of each web page for the site map, and
    4. to generate a relative path from the root directory of the web site to the HTML files it contains.
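    As a rough preview of these four tasks, here is a minimal playground sketch. It uses only messages that the Regex package adds to String and RxMatcher: matchesRegex:, copyWithRegex:matchesReplacedWith:, asRegexIgnoringCase, search: and subexpression:. The site root, the file path and the page contents are hypothetical values chosen for illustration. The tutorial develops these steps inside classes, and its code may differ from this flat script.

    | root fullName isHtml fileName html titleMatcher title relativePath |
    "Hypothetical inputs: substitute your own site root and file."
    root := '/Users/oscar/site/'.
    fullName := '/Users/oscar/site/docs/index.html'.

    "1. Identify HTML files. matchesRegex: must match the whole string; \. is a literal dot."
    isHtml := fullName matchesRegex: '.*\.html'.

    "2. Strip the path. The greedy .*/ runs up to the last slash, and that match is replaced with nothing."
    fileName := fullName copyWithRegex: '.*/' matchesReplacedWith: ''.

    "3. Extract the title. subexpression: 1 is the whole match and 2 is the first parenthesized group."
    html := '<html><head><TITLE>My Home Page</TITLE></head></html>'.
    titleMatcher := '<title>(.*)</title>' asRegexIgnoringCase.
    title := (titleMatcher search: html)
        ifTrue: [titleMatcher subexpression: 2]
        ifFalse: ['(untitled)'].

    "4. Relative path. Remove the root prefix. This hypothetical root contains no regex
     metacharacters, so it can be used as a pattern unchanged."
    relativePath := fullName copyWithRegex: root matchesReplacedWith: ''.

    To see the results, select any of the four result variables and choose Print it. If a real site root contained characters such as '.', they would need to be escaped before using the root as a pattern.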

    After we complete the tutorial example, we will provide a more complete description of the Regex package, based largely on Vassili Bykov’s documentation provided in the package. (The original documentation can be found on the class side of RxParser.)


    1. http://en.Wikipedia.org/wiki/Regular_expression


    This page titled 17.1: Regular Expressions in Pharo is shared under a CC BY-SA 3.0 license and was authored, remixed, and/or curated by via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.