Skip to main content
Engineering LibreTexts

5.1: Regular Expressions in Pharo

  • Page ID
    44009
  • \( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

    Regular expressions are widely used in many scripting languages such as Perl, Python and Ruby. They are useful to identify strings that match a certain pattern, to check that input conforms to an expected format, and to rewrite strings to new formats. Pharo also supports regular expressions due to the Regex package contributed by Vassili Bykov. Regex is installed by default in Pharo.

    A regular expression1 is a template that matches a set of strings. For example, the regular expression 'h.*o' will match the strings 'ho', 'hiho' and ' hello', but it will not match 'hi' or 'yo'. We can see this in Pharo as follows:

    'ho' matchesRegex: 'h.*o'        → true
    'hiho' matchesRegex: 'h.*o'      → true
    'hello' matchesRegex: 'h.*o'     → true
    'hi' matchesRegex: 'h.*o'        → false
    'yo' matchesRegex: 'h.*o'        → false
    

    In this chapter we will start with a small tutorial example in which we will develop a couple of classes to generate a very simple site map for a web site. We will use regular expressions (i) to identify HTML files, (ii) to strip the full path name of a file down to just the file name, (iii) to extract the title of each web page for the site map, and (iv) to generate a relative path from the root directory of the web site to the HTML files it contains. After we complete the tutorial example, we will provide a more complete description of the Regex package, based largely on Vassili Bykov’s documentation provided in the package.2


    1. http://en.Wikipedia.org/wiki/Regular_expression
    2. The original documentation can be found on the class side of RxParser.

    This page titled 5.1: Regular Expressions in Pharo is shared under a CC BY-SA 3.0 license and was authored, remixed, and/or curated by Alexandre Bergel, Damien Cassou, Stéphane Ducasse, Jannik Laval (Square Bracket Associates) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.