Regular Expressions

Last updated on September 27, 2023

Regular Expressions (RegEx) are specially written strings that are used to search text. Regular Expressions have a formal syntax that defines how the search is performed. They provide flexibility where a simple string match would not suffice.

Supported Operators

The following table contains some of the common operators supported by Regular Expressions in Dispatcher Phoenix. Please note that this is not a complete list of all operators.

Match Modifiers

These characters affect how the search is performed.

\ General escape character

. (period) Match any character

* Match 0 or more of the previous character

? Match 0 or 1 of the previous character

[] Defines a set of characters to match, e.g. [0-9] matches the digits 0 to 9; [a-z] matches lower case letters a to z; [AEIOU] matches upper case vowels

[^] Defines a set that will not match, e.g. [^0-9] matches any character except the digits 0 to 9

^ Match the start of a line

$ Match the end of a line
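
The behavior of these operators can be checked quickly outside of Dispatcher Phoenix. The following is a minimal sketch that uses Python's re module, which is an assumption made for illustration only; Dispatcher Phoenix uses the GLib syntax, but these basic operators behave the same way in both. The sample patterns and texts are hypothetical.

```python
import re

# Each entry is (pattern, text, whether the pattern is expected to be found).
samples = [
    (r"a.c", "abc", True),        # . matches any single character
    (r"ab*c", "ac", True),        # * allows zero occurrences of "b"
    (r"ab?c", "abbc", False),     # ? allows at most one "b"
    (r"[0-9]", "x7", True),       # set: any digit 0 to 9
    (r"[^0-9]", "123", False),    # negated set: needs a non-digit character
    (r"^test", "a test", False),  # ^ anchors the match to the start of the line
    (r"test$", "a test", True),   # $ anchors the match to the end of the line
]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


results = [matches(pattern, text) for pattern, text, _ in samples]
```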

Character Classes

These characters will match against single characters, words or special characters.

\d Match any decimal digit (short form of [0-9])

\D Do not match any decimal digit ([^0-9])

\s Match any whitespace character (tab, newline, formfeed, carriage return, or space)

\S Do not match any whitespace character

\w Match any “word” character ([a-zA-Z0-9_])

\W Do not match any “word” character ([^a-zA-Z0-9_])

\b Match a word boundary

\B Do not match a word boundary

\cx Match the control-x character where x is any character, e.g. \cs matches the control-s character

\e Match the escape character (hex 1B)

\f Match the formfeed character (hex 0C)

\n Match the newline character (hex 0A)

\r Match the carriage return character (hex 0D)

\t Match the tab character (hex 09)

\ddd Match the character with the octal code ddd

\xhh Match the character with the hex code hh
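
As an illustration of the character classes above, the following sketch applies a few of them to a hypothetical line of text. It uses Python's re module as a stand-in (an assumption; the engine used by Dispatcher Phoenix is GLib), so treat it as a way to experiment rather than as a description of the product's behavior.

```python
import re

# Hypothetical sample text containing a tab and a trailing newline.
text = "Order 42\tshipped to Bay_7\n"

digits = re.findall(r"\d+", text)              # runs of decimal digits
words = re.findall(r"\w+", text)               # runs of "word" characters
whitespace = re.findall(r"\s", text)           # individual whitespace characters
whole_word_to = re.findall(r"\bto\b", text)    # "to" delimited by word boundaries
has_tab = re.search(r"\x09", text) is not None  # hex code 09 is the tab character
```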

For more information on regular expressions, please visit:

http://en.wikipedia.org/wiki/Regular_expression

For a tutorial to learn more about how to use regular expressions please visit:

http://www.regular-expressions.info/tutorial.html

The syntax supported by Dispatcher Phoenix is defined at the following link: http://developer.gnome.org/glib/stable/glib-regex-syntax.html

Using Special Characters In Regular Expressions

If you want to use the following special characters as a literal in a regular expression, you must use a backslash (\) to suppress their special meaning:

[ opening square bracket

\ backslash

^ caret

$ dollar sign

. period

| vertical bar or pipe symbol

? question mark

* asterisk

+ plus sign

( opening round bracket

) closing round bracket

For example, if you want to match 1+1=2, the correct regular expression is: 1\+1=2; otherwise the plus sign will have a special meaning.
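
The difference between the escaped and unescaped forms can be seen in the following sketch. It uses Python's re module for illustration (an assumption; it is not part of Dispatcher Phoenix), and the sample text is hypothetical.

```python
import re

text = "We know that 1+1=2 and that 11=2 is wrong."

# Unescaped: "+" means "one or more of the previous character",
# so the pattern looks for one or more 1s followed by "1=2".
unescaped = re.findall(r"1+1=2", text)

# Escaped: the backslash suppresses the special meaning of "+".
escaped = re.findall(r"1\+1=2", text)

# re.escape() adds the backslashes for you when the whole string is literal.
auto_escaped = re.escape("1+1=2")
```

Here the unescaped pattern finds "11=2" instead of the intended "1+1=2", which is why the backslash is required.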

The following examples of using regular expressions in Dispatcher Phoenix assume that the following text file is the file searched.

Content Search

  • To search for the string “test”, enter “test” into the search text field. Depending on the match case option, the search results would be as follows:

Content Search

  • To search for the whole word “test”, delimit it with the word boundary operator \b. A search using the string “\btest\b” will return the following results:

Content Search

  • To locate multiple strings that contain a numeric value, use the “\d” operator. Using the search string “Test #\d” will produce the following result:

Content Search
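
The three searches described above can be reproduced with the sketch below. The sample text is hypothetical (it is not the text file used in the screenshots), and Python's re module stands in for the Dispatcher Phoenix search engine, with re.IGNORECASE playing the role of a disabled match case option.

```python
import re

# Hypothetical stand-in for the searched text file.
sample = "Test #1 passed. This is a test. Retesting the contest. Test #2 failed."

# Plain string search, with and without case matching.
ignore_case = re.findall(r"test", sample, flags=re.IGNORECASE)
match_case = re.findall(r"test", sample)

# Whole-word search using the word boundary operator.
whole_word = re.findall(r"\btest\b", sample)

# "Test #" followed by a single decimal digit.
numbered = re.findall(r"Test #\d", sample)
```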

The Parser Node can search for file names using regular expressions. Unlike the other parsing nodes, which operate on the contents of files, the Parser Node operates on file names.

For example, with the following list of files:

testfile1.txt

testfile2.xls

testfile3.docx

testfile4.doc

testfile5.psd

testfile6.pdf

testfile7.jpg

testfile8.tiff

Using a search string of “testfile\d” would match all of the files in the preceding list.
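
The following sketch applies the same search string to the list of files above, again using Python's re module as an assumed stand-in for the node's own matching. The second pattern is a hypothetical refinement showing how the search could be narrowed to particular extensions.

```python
import re

file_names = [
    "testfile1.txt", "testfile2.xls", "testfile3.docx", "testfile4.doc",
    "testfile5.psd", "testfile6.pdf", "testfile7.jpg", "testfile8.tiff",
]

# "testfile" followed by one decimal digit matches every name in the list.
all_matches = [name for name in file_names if re.search(r"testfile\d", name)]

# Hypothetical refinement: only names ending in .doc or .docx.
word_documents = [
    name for name in file_names if re.search(r"testfile\d\.docx?$", name)
]
```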

Advanced Functionality with Metadata Referencing

The parsing nodes (Insert, Parse and Distribute, Parse and Insert, and Parse and Replace) also provide advanced functionality when using regular expressions to give you more control over your search and insert, replace, or distribute operations.

With regular expressions and Dispatcher Phoenix’s metadata referencing feature, you can specify subgroups to extract information from subsections of the matched text; this information can then be used in substitution or insertion operations or to store those values as metadata for future use.

Subgroups are represented by a set of parentheses wrapped around a subsection of the regular expression. Subgroups are given a unique number (starting at number 1 and going from left to right) and can be referenced by that number; in addition, groups can be given “friendly names” (enclosed in angle brackets “< >”) so that they can be referenced by a name instead.

For example:

Content Search

To reference subgroups, use the class ‘parser’ followed by a colon (:) and the subgroup number or friendly name. In the example above, to reference the first subgroup, you would use {parser:1}; to reference the second subgroup, you would use either {parser:2} or {parser:Value}, depending on which regular expression you used.

To use this subgroup information in a replace or insert operation, you would use one of the following:

\1 - Where “1” is the numeric value of the subgroup

\g<1> - Where “1” is the numeric value of the subgroup

\g<Value> - Where “Value” is the subgroup’s friendly name
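
To make the numbering and naming of subgroups concrete, here is a minimal sketch using Python's re module. The pattern and text are hypothetical and are not the ones from the screenshot. Note one assumed difference in syntax: Python writes a named group as (?P<Value>...), while the GLib syntax used by Dispatcher Phoenix also accepts (?<Value>...). The \1, \g<1>, and \g<Value> replacement references work the same way in this sketch.

```python
import re

# Group 1 is unnamed; group 2 has the friendly name "Value".
pattern = re.compile(r"(\w+): (?P<Value>\d+)")
text = "Invoice: 12345"

match = pattern.search(text)
first = match.group(1)                  # first subgroup, by number
second_by_number = match.group(2)       # second subgroup, by number
second_by_name = match.group("Value")   # second subgroup, by friendly name

# The same replacement written with each reference style.
swapped_numeric = pattern.sub(r"\2 (\1)", text)
swapped_g_number = pattern.sub(r"\g<2> (\g<1>)", text)
swapped_g_name = pattern.sub(r"\g<Value> (\g<1>)", text)
```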

Specifying Page Level Metadata

The metadata for a specific page can be specified by adding a page number between two square brackets ([ ]). This will return the text of the first value for the subgroup found on the specified page. For example:

{bar1:Address[5]} would return the value of bar1:Address for page 5.

To specify document-level metadata, add a 0 between two square brackets or leave it blank. For example:

{bar1:Address[0]}

{bar1:Address}

If you are processing in a page-per-page manner (e.g., applying a Bates stamp or annotation), you can use ‘current’ between two square brackets to retrieve the data from the page being processed. For example:

{annotate:text[current]}

Specifying Metadata Occurrence Number

  • You can also specify an occurrence number using another pair of square brackets following the page-level brackets. For example:

    {value:bar[3][2]} would return the second occurrence of “value:bar” on page 3.

    {parser:Value[0][1]} would return the first document-level occurrence of “parser:Value”.

  • To specify the first value found, use: []

    For example, {bar1:zone.Address[]} would return the first barcode found in the “Address” zone, regardless of page.

  • To indicate that multiple values should be returned as a joined string, use: |

    For example, {bar1:zone.part number[]|\-} would return all values of bar1:zone.part number, separated by “-”.
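
The parts of a metadata reference (class, name, page, occurrence, and join separator) can be picked apart as shown in the sketch below. This is purely illustrative: the pattern and the parse_reference function are hypothetical, written from the examples above, and are not how Dispatcher Phoenix itself resolves references.

```python
import re

# Hypothetical pattern for {class:name[page][occurrence]|separator}.
REFERENCE = re.compile(
    r"\{(?P<group>[^:{}]+):(?P<name>[^\[\]{}|]+)"  # class and variable name
    r"(?:\[(?P<page>[^\]]*)\])?"                   # optional page: number, 'current', or empty
    r"(?:\[(?P<occurrence>\d*)\])?"                # optional occurrence number
    r"(?:\|(?P<separator>[^}]*))?"                 # optional join separator
    r"\}"
)


def parse_reference(reference):
    """Split a metadata reference into its parts; absent parts are None."""
    match = REFERENCE.fullmatch(reference)
    if match is None:
        raise ValueError(f"not a metadata reference: {reference!r}")
    return match.groupdict()


page_five = parse_reference("{bar1:Address[5]}")
first_in_document = parse_reference("{parser:Value[0][1]}")
joined = parse_reference(r"{bar1:zone.part number[]|\-}")
```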

System-Defined Variables

DATE TYPES

Variable: %a (Abbreviated weekday name)
Syntax: {date:%a}

Variable: %A (Full weekday name)
Syntax: {date:%A}

Variable: %b (Abbreviated month name)
Syntax: {date:%b}

Variable: %B (Full month name)
Syntax: {date:%B}

Variable: %d (Day of the month (01-31))
Syntax: {date:%d}

Variable: %H (Hour in 24h format (00-23))
Syntax: {date:%H}

Variable: %I (Hour in 12h format (01-12))
Syntax: {date:%I}

Variable: %j (Day of the year (001-366))
Syntax: {date:%j}

Variable: %m (Month as a decimal number (01-12))
Syntax: {date:%m}

Variable: %M (Minute (00-59))
Syntax: {date:%M}

Variable: %p (AM or PM designation)
Syntax: {date:%p}

Variable: %S (Second (00-61))
Syntax: {date:%S}

Variable: %U (Week number of the year, with Sunday as the first day of the week (00-53))
Syntax: {date:%U}

Variable: %w (Weekday as a decimal number, with Sunday as 0 (0-6))
Syntax: {date:%w}

Variable: %y (Year, last two digits (00-99))
Syntax: {date:%y}

Variable: %Y (Year)
Syntax: {date:%Y}

Variable: %Z (Time zone name or abbreviation)
Syntax: {date:%Z}

Variable: %% (A literal % character)
Syntax: {date:%%}
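
The date codes listed above appear to follow the standard C strftime conventions. Assuming that is the case, the sketch below shows what several of them produce for a fixed, hypothetical date and time, using Python's datetime.strftime. The expand_date helper is an illustrative name only and is not part of Dispatcher Phoenix.

```python
from datetime import datetime

# Fixed, hypothetical moment: Wednesday, September 27, 2023, 14:05:09.
moment = datetime(2023, 9, 27, 14, 5, 9)


def expand_date(codes, when):
    """Expand strftime-style codes (e.g. "%Y-%m-%d") for the given moment."""
    return when.strftime(codes)


stamp = expand_date("%Y-%m-%d_%H%M%S", moment)  # year, month, day, 24h time
twelve_hour = expand_date("%I", moment)         # hour in 12h format
day_of_year = expand_date("%j", moment)         # day of the year
week_number = expand_date("%U", moment)         # week number, Sunday first
weekday_number = expand_date("%w", moment)      # weekday, Sunday is 0
literal_percent = expand_date("100%%", moment)  # %% yields a literal %
```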

MFP Panel

Variable: jcf_id (The file name (MFP’s MAC address + time/stamp))
Syntax: {best:jcf_id}

Variable: job_id (The MFP’s job ID)
Syntax: {best:job_id}

Variable: mac (The MFP’s MAC address)
Syntax: {best:mac}

Variable: mfp_address (The MFP’s IP address)
Syntax: {best:mfp_address}

Variable: num_files (The number of files that were sent as part of the job)
Syntax: {best:num_files}

Variable: pages (The number of pages scanned)
Syntax: {best:pages}

Variable: product_id (The SNMP ID for the device)
Syntax: {best:product_id}

Variable: product_name (The manufacturer and model name of the device)
Syntax: {best:product_name}

Variable: user_id (The numeric ID of the logged-in user)
Syntax: {best:user_id}

Variable: user_name (The MFP’s user name)
Syntax: {best:user_name}

EMAIL

Variable: body (Content of the email)
Syntax: {email:body}

Variable: cc (The email address of the carbon copied recipient)
Syntax: {email:cc}

Variable: date (The date and time that the message was received)
Syntax: {email:date}

Variable: from (The email address of the email’s author)
Syntax: {email:from}

Variable: in-reply-to (The message ID of the message that the email is a reply to)
Syntax: {email:in-reply-to}

Variable: message-id (The unique identifier of the message)
Syntax: {email:message-id}

Variable: received (The tracking information generated by mail servers, including the date and time that the message was received)
Syntax: {email:received}

Variable: references (The message ID of the message that the email is a reply to and the message ID of the message that the previous reply was a reply to, etc.)
Syntax: {email:references}

Variable: sender (The IP address of the sender)
Syntax: {email:sender}

Variable: subject (The message’s subject line)
Syntax: {email:subject}

Variable: to (The email address(es) of the message’s recipient(s))
Syntax: {email:to}

FILE LEVEL

Variable: fullname (File name and extension)
Syntax: {file:fullname}

Variable: name (The file name (up to the last period and not including the file extension))
Syntax: {file:name}

Variable: ext (The file extension (file name starting from the last period and going to the end))
Syntax: {file:ext}

Variable: size (The size of the file (in bytes))
Syntax: {file:size}
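
The split between name and ext at the last period can be illustrated with the sketch below, which uses Python's os.path functions. The file_variables function and the sample path are hypothetical, and whether {file:ext} includes the leading period is an assumption based on the wording “starting from the last period”.

```python
import os


def file_variables(path):
    """Derive fullname, name, and ext from a path, splitting at the last period."""
    fullname = os.path.basename(path)
    name, ext = os.path.splitext(fullname)  # ext keeps its leading period
    return {"fullname": fullname, "name": name, "ext": ext}


variables = file_variables("scans/report.final.pdf")
```

With a file name that contains more than one period, only the last period separates the name from the extension.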

FILE SYSTEM

Variable: DesktopDirectory (i.e., C:\Users\username\Desktop)
Syntax: {fs:DesktopDirectory}

Variable: Personal (i.e., C:\Users\username\Documents)
Syntax: {fs:Personal}

Variable: ProgramFiles (i.e., C:\Program Files)
Syntax: {fs:ProgramFiles}

Variable: LocalApplicationData (i.e., C:\Users\username\AppData\Local)
Syntax: {fs:LocalApplicationData}

Variable: ApplicationData (i.e., C:\Users\username\AppData\Roaming)
Syntax: {fs:ApplicationData}

Variable: CommonApplicationData (i.e., C:\ProgramData)
Syntax: {fs:CommonApplicationData}

Variable: CommonProgramFiles (i.e., C:\Program Files\Common Files)
Syntax: {fs:CommonProgramFiles}

Variable: System (i.e., C:\Windows\System32)
Syntax: {fs:System}

Variable: MyPictures (i.e., C:\Users\username\MyPictures)
Syntax: {fs:MyPictures}

Variable: MyMusic (i.e., C:\Users\username\MyMusic)
Syntax: {fs:MyMusic}

Variable: Favorites (i.e., C:\Users\username\Favorites)
Syntax: {fs:Favorites}

Variable: History (i.e., C:\Users\username\AppData\Local\Microsoft\Windows\History)
Syntax: {fs:History}

Variable: Programs (i.e., C:\Users\username\AppData\Roaming\Microsoft\Windows\Start Menu\Programs)
Syntax: {fs:Programs}

Variable: Recent (i.e., C:\Users\username\AppData\Local\Microsoft\Windows\Recent)
Syntax: {fs:Recent}

LPR

Variable: host (Machine name)
Syntax: {lpr:host}

Variable: jobname (Original file name)
Syntax: {lpr:jobname}

Variable: jobnumber (Unique number of print job)
Syntax: {lpr:jobnumber}

Variable: print (File name as submitted during protocol)
Syntax: {lpr:print}

Variable: queue (Queue name specified)
Syntax: {lpr:queue}

Variable: source (Name of source file)
Syntax: {lpr:source}

Variable: user (Level of logged-in user)
Syntax: {lpr:user}

SMTP

Variable: from (The sender’s mailbox (not necessarily the sender’s email address). The mailbox address is wrapped in angle brackets (e.g., <user@example.com>))
Syntax: {smtp:from}

Variable: rcpt (The destination mailbox, one per recipient at the time of receipt. This will be set for all of the To, CC, and BCC mailboxes.)
Syntax: {smtp:rcpt}

USER

Variable: domain (The Windows domain name)
Syntax: {user:domain}

Variable: name (Windows logged-in user name)
Syntax: {user:name}

Metadata Groups

The following metadata groups are created as page-level metadata:

Annotate

Variable: {annotate:variable}

Syntax: {annotate:variable[<page>]}

Example: {annotate:date[5]} - Finds the ‘date’ metadata on the 5th page

Advanced Bates Stamp

Variable: {bates:variable}

Syntax: {bates:variable[<page>]}

Example: {bates:counter[3]} - Finds the ‘counter’ metadata on the 3rd page

Advanced OCR

Variable: {ocr:zone.variable}

Syntax: {ocr:zone.variable[<page>]}

Example: {ocr:name[]} - Finds the ‘name’ zone in the document

Barcode Processing (Standard)

Variable: {bar1:zone.variable}

Syntax: {bar1:zone.variable[]}

Example: {bar1:zone.128[2]} - Finds the ‘128’ zone on the 2nd page

Barcode Processing (2D)

Variable: {bar2:zone.variable}

Syntax: {bar2:zone.variable[]}

Example: {bar2:zone.128[2]} - Finds the ‘128’ zone on the 2nd page