Regular Expressions in Perl - BunksAllowed

A regular expression is a string of characters that defines a pattern. In Perl, =~ and !~ are the pattern binding operators: =~ tests whether a string matches a pattern, and !~ tests whether it does not.

Perl provides the following regular expression operators.

  • m// : Match Regular Expression
  • s/// : Substitute Regular Expression
  • tr/// : Transliterate Regular Expression

Here, the forward slashes act as delimiters for the regular expression. You can use different delimiters according to your choice.

Match Regular Expression

The match operator, m// , is used to match a string to a regular expression. The following example shows how to match a character sequence "text" against the scalar $data .

#!/usr/bin/perl
$data = "This is a sample text document";
if ($data =~ /text/) {
    print "Matching found\n";
} else {
    print "Matching not found\n";
}

You can use any pair of matching bracketing characters as delimiters for an expression. For example, m{}, m(), m[], and m<> are all valid. The above example can therefore be rewritten as follows:


#!/usr/bin/perl
$data = "This is a sample text document";
if ($data =~ m[text]) {
    print "Matching found\n";
} else {
    print "Matching not found\n";
}


#!/usr/bin/perl
$data = "This is a sample text document";
if ($data =~ m{text}) {
    print "Matching found\n";
} else {
    print "Matching not found\n";
}

Note that m can be omitted from m// if the delimiters are forward slashes; with any other delimiter you must use the m prefix.

Note that the entire match expression returns true in a scalar context if the expression matches. In a list context, the match returns the contents of any grouped expressions. For example, when extracting the hours, minutes, and seconds from a time string, we can use my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);
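The difference between the two contexts can be sketched as follows. This is a minimal illustration; the variable names ($matched, @none) and the sample strings are assumptions chosen for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $time = "12:05:30";

# Scalar (boolean) context: the match yields a true or false value.
my $matched = ($time =~ m/(\d+):(\d+):(\d+)/) ? 1 : 0;

# List context: the match yields the contents of the groups.
my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);

# A failed match in list context yields an empty list.
my @none = ("no time here" =~ m/(\d+):(\d+):(\d+)/);

print "Matched: $matched\n";
print "Hours: $hours, Minutes: $minutes, Seconds: $seconds\n";
print "Groups from failed match: ", scalar(@none), "\n";
```

Because a failed match returns an empty list, it is worth checking the result before using the extracted values.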

Match Operator Modifiers

The modifier /g is used for global matching. The modifier /i can be used for case insensitive matching.
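A short sketch of both modifiers, alone and combined. The sample text and variable names are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Perl is fun. perl is powerful. PERL is everywhere.";

# /i ignores case, so the pattern matches the leading "Perl".
my $found = ($text =~ /perl/i) ? "yes" : "no";

# /g in list context returns every match; with /i it finds all three spellings.
my @all = ($text =~ /perl/gi);

# Without /i only the exact lowercase spelling matches.
my @exact = ($text =~ /perl/g);

print "Found: $found\n";
print "All matches: @all\n";
print "Exact matches: @exact\n";
```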

Matching Only Once

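The m?PATTERN? operator is identical to the m// operator except that it matches only once between calls to reset. Older versions of Perl also accepted the bare ?PATTERN? form, but since Perl 5.22 the leading m is required. For example, you can use this to get the first and last matching elements within a list. Let us try the following sample code.


#!/usr/bin/perl
@list = qw/food foosball subeo footnote terfoot canic footbridge/;
foreach (@list) {
    $first = $1 if m?(foo.*)?;   # matches only once, so $first keeps the first hit
    $last = $1 if /(foo.*)/;     # matches every time, so $last ends up as the last hit
}
print "First: $first, Last: $last\n";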

After a successful match, Perl sets several special variables:

  • $+ represents the last grouping match,
  • $& represents the entire matched string,
  • $` represents everything before the matched string and
  • $' represents everything after the matched string.

Let us try the following code to understand this in detail.


#!/usr/bin/perl
$string = "Hi, how are you?";
$string =~ m/ho/;
print "Before: $`\n";
print "Matched: $&\n";
print "After: $'\n";
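The match variables are overwritten by the next successful match, so it is usually safer to copy them right away. The sketch below does that and also reads $+, which holds the last matched group; the pattern /(h)(o)w/ and the variable names are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Hi, how are you?";
my ($before, $matched, $after, $last_group);

# Copy the special variables immediately after a successful match.
if ($string =~ /(h)(o)w/) {
    ($before, $matched, $after, $last_group) = ($`, $&, $', $+);
}

print "Before: '$before'\n";
print "Matched: '$matched'\n";
print "After: '$after'\n";
print "Last group: '$last_group'\n";
```

The pattern is case sensitive, so the capital H in "Hi" is skipped and the match starts at "how".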







Substitute Regular Expression

Let us try a program that replaces a matching string. In the following example, we replace the first occurrence of the pattern test with the string final. To replace every occurrence, add the /g modifier.


#!/usr/bin/perl
$string = "This is a test code.";
$string =~ s/test/final/;
print "$string\n";
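A small sketch contrasting s/// with and without the /g modifier. It also uses the return value of s///, which is the number of substitutions made. The sample string and variable names are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $first_only = "test one, test two, test three";
my $every      = $first_only;

# Without /g only the first occurrence is replaced.
my $count_first = ($first_only =~ s/test/final/);

# With /g every occurrence is replaced.
my $count_all = ($every =~ s/test/final/g);

print "$first_only ($count_first replacement)\n";
print "$every ($count_all replacements)\n";
```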

Transliterate Regular Expression

Transliteration is similar to substitution, but not identical: it does not use regular expressions to search for or produce replacement values. It works character by character and can be written as tr/old_char_list/new_char_list/cds or y/old_char_list/new_char_list/cds , where c, d, and s are optional modifiers.

It replaces all occurrences of the characters in old_char_list with the corresponding characters in new_char_list . Let us try the following code.




#!/usr/bin/perl
$string = 'This is a test code.';
$string =~ tr/a/o/;
print "$string\n";

We can also specify ranges of characters either by letter or numerical value. For example, we may use $string =~ tr/a-z/A-Z/; to convert the string to upper case.
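The range form can be sketched as below. The example also relies on two standard behaviours of tr: it returns the number of characters it matched, and with an empty replacement list and no modifiers it leaves the string unchanged. The variable names are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'This is a test code.';

# Transliterate a copy so the original string stays untouched.
(my $upper = $string) =~ tr/a-z/A-Z/;

# Empty replacement list, no modifiers: tr only counts the matching characters.
my $lowercase_count = ($string =~ tr/a-z//);

print "$upper\n";
print "Lowercase letters: $lowercase_count\n";
```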

The modifier /d can be used to delete the characters matching old_char_list that do not have a corresponding entry in new_char_list . Let us try the following example




#!/usr/bin/perl
$string = 'This is a test code.';
$string =~ tr/a-z/b/d;
print "$string\n";

The last modifier, /s, squeezes each run of the same replaced character into a single character.




#!/usr/bin/perl
$string = 'book';
$string =~ tr/a-z/a-z/s;
print "$string\n";

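The ^ metacharacter matches the beginning of the string and the $ metasymbol matches the end of the string. With the /m modifier they also match at the start and end of each line, while \A always matches only at the very beginning of the string. Here are some brief examples.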



#!/usr/bin/perl
$string = "Cats go Catatonic\nWhen given Catnip";
($start) = ($string =~ /\A(.*?) /);
@lines = $string =~ /^(.*?) /gm;
print "First word: $start\n","Line starts: @lines\n";

The | character is just like the standard or bitwise OR within Perl. It specifies alternate matches within a regular expression or group. For example, to match "cat" or "dog" in an expression, you might use if ($string =~ /cat|dog/)

You can group individual elements of an expression together in order to support complex matches. Searching for two people’s names could be achieved with two separate tests, like if (($string =~ /Martin Brown/) || ($string =~ /Sharon Brown/)) . It could be written as if ($string =~ /(Martin|Sharon) Brown/)
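The sketch below shows why the parentheses matter: without them the alternation splits the whole pattern in two. The extra names ("Gordon Brown", "Martin Smith") and the variable names are assumptions added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @names = ("Martin Brown", "Sharon Brown", "Gordon Brown");
my @hits;

foreach my $name (@names) {
    # The group limits the alternation to the first name and captures it in $1.
    if ($name =~ /^(Martin|Sharon) Brown$/) {
        push @hits, $1;
    }
}

# Without the group, the pattern means "Martin" OR "Sharon Brown".
my $loose = ("Martin Smith" =~ /Martin|Sharon Brown/)   ? 1 : 0;
my $tight = ("Martin Smith" =~ /(Martin|Sharon) Brown/) ? 1 : 0;

print "Matched first names: @hits\n";
print "Ungrouped pattern matches 'Martin Smith': $loose\n";
print "Grouped pattern matches 'Martin Smith': $tight\n";
```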

Grouping Matching

From a regular-expression point of view, there is no difference between $string =~ /(\S+)\s+(\S+)/; and $string =~ /\S+\s+\S+/; except, perhaps, that the former is slightly clearer. However, the benefit of grouping is that it allows us to extract a sequence from a regular expression. Groupings are returned as a list in the order in which they appear in the original. For example, in the following fragment we have pulled out the hours, minutes, and seconds from a string.



#!/usr/bin/perl
$time = "12:05:30";
$time =~ m/(\d+):(\d+):(\d+)/;
my ($hours, $minutes, $seconds) = ($1, $2, $3);



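Groups can also be referenced in the replacement part of a substitution. The following example uses them to reorder a date.

#!/usr/bin/perl
$date = '03/26/1999';
$date =~ s#(\d+)/(\d+)/(\d+)#$3/$1/$2#;
print "$date\n";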

When the above program is executed, it produces the following result: 1999/03/26

The \G Assertion

The \G assertion allows you to continue searching from the point where the last match occurred. For example, in the following code, we use \G to search to the correct position and then extract some information, without having to create a more complex, single regular expression:


#!/usr/bin/perl
$string = "The time is: 12:31:02 on 4/12/00";
$string =~ /:\s+/g;
($time) = ($string =~ /\G(\d+:\d+:\d+)/);
$string =~ /.+\s+/g;
($date) = ($string =~ m{\G(\d+/\d+/\d+)});
print "Time: $time, Date: $date\n";

When the above program is executed, it produces the following result:

Time: 12:31:02, Date: 4/12/00

The \G assertion is actually just the metasymbol equivalent of the pos function, so between regular expression calls you can continue to use pos, and even modify the value of pos (and therefore \G) by using pos as an lvalue subroutine.
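A minimal sketch of reading and assigning pos. Using index to find the date is an assumption made for this illustration; any way of computing the offset would do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "The time is: 12:31:02 on 4/12/00";

# A scalar-context /g match records where it ended; pos reports that offset.
$string =~ /:\s+/g;
my $after_colon = pos($string);

# \G anchors the next match at that offset.
my ($time) = ($string =~ /\G(\d+:\d+:\d+)/);

# Assigning to pos moves the point where \G will anchor.
pos($string) = index($string, "4/");
my ($date) = ($string =~ m{\G(\d+/\d+/\d+)});

print "Offset after colon: $after_colon\n";
print "Time: $time, Date: $date\n";
```

Setting pos directly replaces the second "skip ahead" match from the earlier example, at the cost of having to compute the offset yourself.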


Happy Exploring!
