Regular Expressions

From Okapi Framework

Regular expressions provide a concise and flexible way to match strings of text, such as particular characters, words, or patterns of characters.

For example, the regular expression "\scar" matches every occurrence of the string "car" that is preceded by a white-space character, such as a space, a line-feed, or a tab. In the string "In this cartoon, the car runs on bicarbonate", it matches " car" at the start of "cartoon" and the stand-alone " car" (the white-space character is part of each match), but not the "car" inside "bicarbonate".
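The behaviour described above can be checked with a few lines of Java. This is only a minimal sketch using java.util.regex; the class name WhitespaceCarSketch and the helper findAll are hypothetical names chosen for illustration, not part of Okapi.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WhitespaceCarSketch {

    // Returns one "start:text" entry for every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "In this cartoon, the car runs on bicarbonate";
        // In Java source the backslash of \s must itself be escaped.
        for (String hit : findAll("\\scar", text)) {
            System.out.println(hit);
        }
    }
}
```

Each reported match starts at the white-space character, because \s consumes it as part of the match.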

Regular expressions can perform very complex searches, using classes of characters, groupings, back-referencing, zero-width assertions and many different types of conditions and options.
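As a rough illustration of three of these features (a back-reference to a group, a character class, and a zero-width look-ahead), here is a small Java sketch. The class name RegexFeaturesSketch, the helper firstMatch, and the sample strings are assumptions made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexFeaturesSketch {

    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Grouping and back-reference: (\w+) captures a word, \1 requires the same word again.
        System.out.println(firstMatch("\\b(\\w+) \\1\\b", "this is is a test"));
        // Character class: [aeiou]{2} matches two consecutive lower-case vowels.
        System.out.println(firstMatch("[aeiou]{2}", "regular expression"));
        // Zero-width assertion: (?= EUR) checks what follows the digits without consuming it.
        System.out.println(firstMatch("\\d+(?= EUR)", "costs 25 EUR or 30 USD"));
    }
}
```

The look-ahead is what keeps " EUR" out of the returned text: it constrains where a match may occur but contributes no characters to it.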

Java Regular Expressions

For details on regular expressions in Java, see http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html.

Examples

Each example gives an expression, the options in effect, and the sample text it is applied to. All the examples assume no options are set, unless stated otherwise.

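Expression: tag1|tag2
   Options: None
      Text: Before <tag1> and <tag2> after
   Matches: "tag1", "tag2"
Expression: tag\b
   Options: None
      Text: Before tag tagtag after
   Matches: the stand-alone "tag" and the second "tag" of "tagtag" (the first "tag" of "tagtag" is not followed by a word boundary)
Expression: <.*>
   Options: None
      Text: Before <tag1> and <tag2> after
   Matches: "<tag1> and <tag2>" (the quantifier is greedy)
Expression: <.*?>
   Options: None
      Text: Before <tag1> and <tag2> after
   Matches: "<tag1>", "<tag2>" (the quantifier is reluctant)
Expression: colou?r
   Options: None
      Text: Color, colour, color
   Matches: "colour", "color" (not "Color", because matching is case-sensitive by default)
Expression: (C|c)olou?r
   Options: None
      Text: Color, colour, color
   Matches: "Color", "colour", "color"
Expression: %(([-0+ #]?)[-0+ #]?)((\d\$)?)(([\d\*]*)(\.[\d\*]*)?)[dioxXucsfeEgGpn]
   Options: Ignore case: on
      Text: %d files not found, including %s (%3.2d%% done)
   Matches: "%d", "%s", "%3.2d", and also "% d" (the second "%" of "%%" followed by " d")
      Text: %1$d files not found, including %2$s (%3$*.*d%% done)
   Matches: "%1$d", "%2$s", "%3$*.*d", and also "% d" (the second "%" of "%%" followed by " d")
Expression: </?([A-Z0-9a-z]*)\b[^>]*>
   Options: Ignore case: on
      Text: Text in <b>bold</b> <a href='link.html'>Link</a> <img href="im.png"/>
   Matches: "<b>", "</b>", "<a href='link.html'>", "</a>", "<img href="im.png"/>"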
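The examples above can be reproduced with java.util.regex. The following is a minimal sketch, not Okapi code: the class name OkapiRegexExamplesSketch, the helper findAll, and the constants PRINTF and HTML_TAG are hypothetical names introduced here, and "Ignore case: on" is assumed to correspond to Pattern.CASE_INSENSITIVE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OkapiRegexExamplesSketch {

    // The two longer expressions from the list, written as Java string literals.
    static final String PRINTF =
        "%(([-0+ #]?)[-0+ #]?)((\\d\\$)?)(([\\d\\*]*)(\\.[\\d\\*]*)?)[dioxXucsfeEgGpn]";
    static final String HTML_TAG = "</?([A-Z0-9a-z]*)\\b[^>]*>";

    // Returns the text of every match of regex in input; flags is 0 for "Options: None".
    static List<String> findAll(String regex, int flags, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String tags = "Before <tag1> and <tag2> after";
        System.out.println(findAll("<.*>", 0, tags));   // greedy quantifier
        System.out.println(findAll("<.*?>", 0, tags));  // reluctant quantifier
        System.out.println(findAll("colou?r", 0, "Color, colour, color"));
        System.out.println(findAll(PRINTF, Pattern.CASE_INSENSITIVE,
            "%d files not found, including %s (%3.2d%% done)"));
        System.out.println(findAll(HTML_TAG, Pattern.CASE_INSENSITIVE,
            "Text in <b>bold</b> <a href='link.html'>Link</a> <img href=\"im.png\"/>"));
    }
}
```

Running the printf-style expression this way also shows that it matches "% d" inside "%% done", since the space counts as a flag character and the "d" of "done" as a conversion character.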

SRX Regular Expressions

SRX (Segmentation Rules eXchange), the standard format for defining segmentation rules, also uses regular expressions.

See the "SRX and Java" page for details on limitations.