A regular expression is a string of characters that defines a pattern. In Perl, =~ and !~ are used as the pattern binding operators.
Perl provides the following regular expression operators:
m// - Match Regular Expression
s/// - Substitute Regular Expression
tr/// - Transliterate Regular Expression
Here, the forward slashes act as delimiters for the regular expression. You can use different delimiters according to your choice.
Match Regular Expression
The match operator, m//, is used to match a string against a regular expression. For example, it can be used to match the character sequence "text" against the scalar $data.
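A minimal sketch of such a match, assuming a hypothetical value for $data; the variable names $found and $absent are illustrative only. It uses both binding operators, =~ and !~.

```perl
use strict;
use warnings;

# Hypothetical sample data; any string containing "text" would do.
my $data = "This is some sample text.";

# =~ is true when the pattern matches somewhere in $data.
my $found = ($data =~ m/text/) ? 1 : 0;

# !~ is the negated form: true when the pattern does NOT match.
my $absent = ($data !~ m/missing/) ? 1 : 0;

print "Found 'text': $found\n";
print "No 'missing': $absent\n";
```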
You can use any pair of matching characters as delimiters for an expression. For example, m{}, m(), and m<> are all valid forms of the match operator, so a match written with slashes can equally be written with braces, parentheses, or angle brackets.
Note that the m prefix can be omitted from m// if the delimiters are forward slashes; with any other delimiter, the m prefix is required.
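The delimiter rules can be sketched as follows. The sample string is an assumption chosen to show why alternative delimiters are handy: a pattern that itself contains slashes needs no escaping inside m{}.

```perl
use strict;
use warnings;

# Hypothetical sample string containing forward slashes.
my $data = "path: /usr/local/bin";

# The same pattern written with different delimiters.
my @results = (
    scalar($data =~ m/local/),   # explicit m with slashes
    scalar($data =~ /local/),    # m omitted: allowed only with slashes
    scalar($data =~ m{local}),   # braces
    scalar($data =~ m(local)),   # parentheses
    scalar($data =~ m<local>),   # angle brackets
);
my @matched = grep { $_ } @results;

# With braces as delimiters, slashes in the pattern need no backslashes.
my $has_path = ($data =~ m{/usr/local}) ? 1 : 0;

print scalar(@matched), " of ", scalar(@results), " forms matched\n";
print "Path found: $has_path\n";
```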
Note that the entire match expression returns true in a scalar
context if the expression matches. In a list context, the match
returns the contents of any grouped expressions. For example, when
extracting the hours, minutes, and seconds from a time string, we
can use
my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);
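To make the scalar/list distinction concrete, here is a small sketch. The time strings and the names $ok and @none are assumptions for illustration. It also shows that a failed match in list context returns an empty list.

```perl
use strict;
use warnings;

my $time = "12:31:02";   # hypothetical input

# Scalar (boolean) context: only success or failure is reported.
my $ok = ($time =~ m/(\d+):(\d+):(\d+)/) ? "yes" : "no";

# List context: the captured groups are returned in order.
my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);

# A failed match in list context yields an empty list.
my @none = ("not a time" =~ m/(\d+):(\d+):(\d+)/);

print "Matched: $ok\n";
print "Hours: $hours, Minutes: $minutes, Seconds: $seconds\n";
print "Groups from failed match: ", scalar(@none), "\n";
```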
Match Operator Modifiers
The /g modifier is used for global matching, that is, finding every match rather than only the first. The /i modifier makes the match case-insensitive.
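A short sketch of both modifiers on an assumed sample string. In list context, a /g match without groups returns every matched substring, which makes the effect of /i easy to see.

```perl
use strict;
use warnings;

my $text = "Perl perl PERL";   # hypothetical sample text

# /g in list context returns all matches; the match is case-sensitive.
my @exact = ($text =~ /perl/g);

# Adding /i ignores case, so all three spellings match.
my @any_case = ($text =~ /perl/gi);

print "Case-sensitive matches: ", scalar(@exact), "\n";
print "Case-insensitive matches: ", scalar(@any_case), " (@any_case)\n";
```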
Matching Only Once
The m?PATTERN? operator is identical to the m// operator except that it matches only once between calls to reset(). The bare ?PATTERN? form, without the m prefix, was removed in Perl 5.22. For example, you can use this operator to get the first and last matching elements within a list.
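A sketch of the first/last idea, assuming a hypothetical word list. It uses the m?PATTERN? spelling, which is the form accepted by current Perl versions: the match-once operator captures only the first hit, while an ordinary match keeps overwriting $last.

```perl
use strict;
use warnings;

# Hypothetical list; several entries contain "foo".
my @list = qw(food foosball subeo footnote terfoot canic footbridge);

my ($first, $last);
foreach my $word (@list) {
    # m?...? succeeds only once (until reset() is called).
    $first = $1 if $word =~ m?(foo.*)?;

    # An ordinary match succeeds every time, so the final hit wins.
    $last = $1 if $word =~ /(foo.*)/;
}

print "First: $first, Last: $last\n";   # First: food, Last: footbridge
```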
Among the regular expression variables, $+ holds the text of the last grouping match, $& holds the entire matched string, $` holds everything before the matched string, and $' holds everything after the matched string.
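The sketch below illustrates these variables on an assumed sample string. The values are copied into ordinary lexical variables right after the match, since the special variables are overwritten by the next successful match.

```perl
use strict;
use warnings;

my $string = "The food is in the salad bar";   # hypothetical input

my ($before, $matched, $after, $last_group);
if ($string =~ m/(f)(oo)/) {
    # Copy the special variables immediately after the match.
    ($before, $matched, $after, $last_group) = ($`, $&, $', $+);
}

print "Before: $before\n";          # Before: The
print "Matched: $matched\n";        # Matched: foo
print "After: $after\n";            # After: d is in the salad bar
print "Last group: $last_group\n";  # Last group: oo
```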
Substitute Regular Expression
The substitution operator, s/PATTERN/REPLACEMENT/, replaces the text matched by a pattern. With the /g modifier it replaces every occurrence rather than only the first, for example all occurrences of the pattern test with the string final.
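A minimal sketch of that replacement, with an assumed input string. It contrasts a single substitution with a global one, and uses the fact that s///g returns the number of substitutions made.

```perl
use strict;
use warnings;

my $string = "test one, test two, test three";   # hypothetical input

# Without /g only the first occurrence is replaced.
my $once = $string;
$once =~ s/test/final/;

# With /g every occurrence is replaced; the return value is the count.
my $all   = $string;
my $count = ($all =~ s/test/final/g);

print "$once\n";    # final one, test two, test three
print "$all\n";     # final one, final two, final three
print "Replacements: $count\n";
```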
Transliterate Regular Expression
Transliteration is similar to substitution, but not identical: it does not use regular expressions for its search or replacement values. It can be written as tr/old_char_list/new_char_list/cds or, equivalently, as y/old_char_list/new_char_list/cds. It replaces all occurrences of the characters in old_char_list with the corresponding characters in new_char_list.
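As a sketch, the following replaces every "a" with "o" in an assumed sample sentence, and shows that y/// behaves the same way as tr///. The return value of tr/// is the number of characters it matched.

```perl
use strict;
use warnings;

my $string = 'The cat sat on the mat';   # hypothetical input

# tr/// works character by character; no regular expression is involved.
my $swapped = $string;
my $count   = ($swapped =~ tr/a/o/);

# y/// is a synonym for tr///.
my $same = $string;
$same =~ y/a/o/;

print "$swapped\n";           # The cot sot on the mot
print "Changed: $count\n";    # Changed: 3
```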
We can also specify ranges of characters, either by letter or by numerical value. For example, we may use $string =~ tr/a-z/A-Z/; to convert the string to upper case.
The /d modifier deletes the characters in old_char_list that do not have a corresponding entry in new_char_list.
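Two small sketches of /d on assumed inputs: one with an empty replacement list, so every listed character is deleted, and one where only the unpaired character is deleted.

```perl
use strict;
use warnings;

# Empty new_char_list: every lowercase letter is deleted.
my $stripped = 'the cat sat on the mat.';
$stripped =~ tr/a-z//d;

# "a" and "b" have replacements; "c" has none, so it is deleted.
my $partial = 'aabbcc';
$partial =~ tr/abc/AB/d;

print "[$stripped]\n";   # only the five spaces and the period remain
print "$partial\n";      # AABB
```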
The last modifier, /s, squeezes each run of identical replaced characters into a single character.
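A sketch of /s on assumed inputs. When new_char_list is empty and /d is not given, the characters are replaced by themselves, so /s simply collapses repeated runs.

```perl
use strict;
use warnings;

# Runs of the same letter are squeezed to one character.
my $word = 'bookkeeper';
$word =~ tr/a-z//s;

# A common use: collapse repeated spaces into a single space.
my $spaced = 'hello    world';
$spaced =~ tr/ //s;

print "$word\n";     # bokeper
print "$spaced\n";   # hello world
```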
The ^ metacharacter matches the beginning of the string, and the $ metacharacter matches the end of the string (or just before a trailing newline).
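Some brief examples of the two anchors, sketched on an assumed sample string. The hash %result and its keys are illustrative names only.

```perl
use strict;
use warnings;

my $string = "the cat sat";   # hypothetical input

my %result = (
    starts_with_the => ($string =~ /^the/          ? 1 : 0),  # "the" is at the start
    starts_with_cat => ($string =~ /^cat/          ? 1 : 0),  # "cat" is not
    ends_with_sat   => ($string =~ /sat$/          ? 1 : 0),  # "sat" is at the end
    whole_string    => ($string =~ /^the cat sat$/ ? 1 : 0),  # both anchors together
    before_newline  => ("sat\n" =~ /sat$/          ? 1 : 0),  # $ also matches before a final newline
);

for my $name (sort keys %result) {
    print "$name: $result{$name}\n";
}
```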
The | character works much like the logical OR in Perl: it specifies alternative matches within a regular expression or group. For example, to match "cat" or "dog" in an expression, you might use
if ($string =~ /cat|dog/)
You can group individual elements of an expression together in order to support complex matches. Searching for two people’s names could be achieved with two separate tests, like
if (($string =~ /Martin Brown/) || ($string =~ /Sharon Brown/))
but it could also be written as
if ($string =~ /(Martin|Sharon) Brown/)
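The grouped alternation can be tried out as below. The list of names, including the non-matching "Gordon Brown", is an assumption for illustration. As a side effect of grouping, $1 records which alternative matched.

```perl
use strict;
use warnings;

# Hypothetical names; only the first two should match.
my @people = ("Martin Brown", "Sharon Brown", "Gordon Brown");

my @first_names;
for my $string (@people) {
    if ($string =~ /(Martin|Sharon) Brown/) {
        # $1 holds whichever alternative matched inside the group.
        push @first_names, $1;
        print "$string matches (first name: $1)\n";
    }
    else {
        print "$string does not match\n";
    }
}
```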
Grouping Matching
From a regular-expression point of view, there is no difference between
$string =~ /(\S+)\s+(\S+)/;
and
$string =~ /\S+\s+\S+/;
except, perhaps, that the former is slightly clearer. However, the benefit of grouping is that it allows us to extract a sequence from a regular expression. Groupings are returned as a list in the order in which they appear in the original. For example, grouping lets us pull the hours, minutes, and seconds out of a time string.
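The difference shows up in what a list-context match returns, sketched here with an assumed two-word string: with groups, the captured pieces come back in order; without groups, a successful match returns only the list (1).

```perl
use strict;
use warnings;

my $string = "Martin Brown";   # hypothetical input

# With groups: the captured words are returned in order of appearance.
my @grouped = ($string =~ /(\S+)\s+(\S+)/);

# Without groups: a successful match returns just the list (1).
my @ungrouped = ($string =~ /\S+\s+\S+/);

print "Grouped: @grouped\n";       # Grouped: Martin Brown
print "Ungrouped: @ungrouped\n";   # Ungrouped: 1
```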
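Groups captured by a pattern are also available as $1, $2, and so on, including in the replacement text of a substitution. Rearranging the fields of a date such as 03/26/1999 in this way produces the following result:
1999/03/26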
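A minimal sketch of a substitution that yields the result above. The program that produced it is not shown in this text, so the input date 03/26/1999 and the choice of # as delimiter are assumptions. The captured groups are reused in the replacement as $1, $2, and $3.

```perl
use strict;
use warnings;

my $date = '03/26/1999';   # assumed input in month/day/year order

# Using # as the delimiter avoids escaping the slashes in the pattern.
# $1 = month, $2 = day, $3 = year; the replacement reorders them.
$date =~ s#(\d+)/(\d+)/(\d+)#$3/$1/$2#;

print "$date\n";   # 1999/03/26
```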
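The \G assertion lets a search continue from the point where the previous match ended. Extracting a time and then a date in successive steps from a string such as "The time is: 12:31:02 on 4/12/00" produces the following result:
Time: 12:31:02, Date: 4/12/00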
The \G assertion is actually just the metasymbol equivalent of the
pos function, so between regular expression calls you can continue
to use pos, and even modify the value of pos (and therefore \G) by
using pos as an lvalue subroutine.
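The relationship between \G and pos can be sketched as follows. The input string is an assumption, chosen so that the output is the Time/Date line shown earlier, and $mark and $again are illustrative names. The /c modifier keeps pos unchanged if a match fails.

```perl
use strict;
use warnings;

my $string = "The time is: 12:31:02 on 4/12/00";   # assumed input

# A scalar /g match leaves pos just after the matched ": ".
$string =~ /:\s+/g;
my $mark = pos($string);   # remember where \G will anchor

my ($time, $date);
# \G anchors each match at the current pos; /g advances pos on success.
$time = $1 if $string =~ /\G(\d+:\d+:\d+)/gc;
$string =~ /\G\s+on\s+/gc;
$date = $1 if $string =~ m{\G(\d+/\d+/\d+)}gc;

print "Time: $time, Date: $date\n";

# pos is an lvalue: assigning to it moves \G back to the saved position.
pos($string) = $mark;
my $again = ($string =~ /\G(\d+)/gc) ? $1 : undef;
print "After rewinding, \\G matches: $again\n";
```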