Nov 21, 2009

Perl Regular Expressions (Regex) - Learn how to use regular expressions

Galahad
I've searched the Tutorials section but haven't found a RegEx tutorial, so I thought I'd add one, since regular expressions are very useful to know if you're a programmer. If I overlooked a tutorial explaining RegEx, my bad, just erase this one.

OK, first off: regular expressions are a great feature of Perl, but they can also be found in other languages and environments, such as Linux shells or PHP. I'm guessing most people would be interested in using regular expressions in PHP.

Let's start with a basic matching regular expression:

CODE
my $var = "This is some text here that we need to search.";
if ($var =~ m/th/) {
  print "I found a th\n";
}

In the example above, the regular expression will locate the 'th' in 'that', and not the one in 'This', because matching is case sensitive. If we wanted case-insensitive matching, we would write this:
CODE
my $var = "This is some text here that we need to search.";
if ($var =~ m/th/i) {
  print "I found a th\n";
}

And now the match will be found at 'This'. Regular expressions always try to find a match as early in the text as possible and, where part of the pattern can repeat, to make it as long as possible.
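
Both rules can be checked with a small sketch. It relies on Perl's built-in @- array, whose first element holds the offset where the last successful match started; the variable names and the "wheee!" sample string are just made up for illustration.

```perl
use strict;
use warnings;

my $var = "This is some text here that we need to search.";

# Case sensitive: the first lowercase "th" is the one in "that".
my $pos_cs = ($var =~ m/th/) ? $-[0] : -1;

# Case insensitive: the match is found sooner, in "This".
my $pos_ci = ($var =~ m/th/i) ? $-[0] : -1;

# As long as possible: e+ takes every e it can, not just the first one.
my ($run) = "wheee!" =~ m/(e+)/;

print "case sensitive match starts at offset $pos_cs\n";
print "case insensitive match starts at offset $pos_ci\n";
print "e+ matched: $run\n";
```

The offsets are counted from 0, so 'This' sits at offset 0 and 'that' at offset 23.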

Summary of regular expression matching
CODE
m/search_text/ - Find search_text
m/^search_text/ - Match search_text but only at the beginning of the line. Operator ^ does this.
m/search_text$/ - Match search_text but only at the end of the line. Operator $ does this.
m/^search_text$/ - Match search_text, but only if it's the entire text
m/search_text/i - Match search_text, but case insensitive
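
As a rough illustration of the summary above, this sketch runs the same four patterns against a few sample strings and records which ones hit. The sample strings and the %result hash are invented for the example.

```perl
use strict;
use warnings;

my @lines = ("Hello, world", " Hello, world", "world, Hello", "Hello");
my %result;

for my $line (@lines) {
    my @hits;
    push @hits, "anywhere" if $line =~ m/Hello/;    # no anchor
    push @hits, "start"    if $line =~ m/^Hello/;   # must begin the text
    push @hits, "end"      if $line =~ m/Hello$/;   # must end the text
    push @hits, "entire"   if $line =~ m/^Hello$/;  # must be the whole text
    $result{$line} = join(",", @hits);
    printf "%-15s => %s\n", "[$line]", $result{$line};
}
```

Only the last string, which is exactly "Hello", satisfies all four patterns.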


Of course, if regular expressions were this easy, there wouldn't be a need for a tutorial. Regular expressions are quite powerful, and can match anything in the text. For example, these wildcards can be used in regular expressions to find anything:
CODE
. - Match any character (except a newline)
\w - Match word character (alphanumeric characters and "_")
\W - Match non-word character
\s - Match whitespace character
\S - Match non-whitespace character
\d - Match digit character
\D - Match non-digit character
\t - Match tab
\n - Match newline
\r - Match return
\f - Match formfeed
\a - Match alarm character (bell, beep, and others)
\e - Match escape
\045 - Octal character match; in this case, it's 45 octal (the % character)
\x{6fa} - Hexadecimal character match; in this case, it's 6FA hex. Without the braces, \x reads only two hex digits, so \x6fa would mean 6F hex followed by a literal a
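
Here is a short sketch of a few of these wildcards at work. The sample text and the variable names are made up for the example; the patterns themselves come straight from the list above.

```perl
use strict;
use warnings;

my $text = "Order 66 shipped\tto Bob_7 for 50%";

my ($number) = $text =~ m/(\d+)/;          # first run of digits
my ($word)   = $text =~ m/(\w+_\w+)/;      # word characters around a "_"
my $has_tab  = ($text =~ m/\t/)   ? 1 : 0; # is there a tab?
my $percent  = ($text =~ m/\045/) ? 1 : 0; # octal 45 is the % character
my $has_o    = ("foo" =~ m/\x6f/) ? 1 : 0; # hex 6F is the letter o

print "number: $number\n";
print "word: $word\n";
print "tab: $has_tab, percent: $percent, o: $has_o\n";
```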

Also, combined with these wildcards, you can use repetition operators:
CODE
* - Match 0 or more times
+ - Match 1 or more times
? - Match 0 or 1 times
{n} - Match exactly n times
{n,} - Match at least n times
{n,m} - Match at least n, but not more than m times
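
To get a feel for the repetition operators, the sketch below tries a handful of invented strings against patterns that use {n,m}, ?, * and +. The strings and variable names are only for illustration.

```perl
use strict;
use warnings;

# {2,4}: between two and four digits, and nothing else.
my %matches;
for my $s ("7", "42", "2009", "12345") {
    $matches{$s} = ($s =~ m/^\d{2,4}$/) ? "yes" : "no";
    printf "%-6s matches ^\\d{2,4}\$: %s\n", $s, $matches{$s};
}

# ?: the u may appear once or not at all.
my $optional = ("color" =~ m/^colou?r$/ && "colour" =~ m/^colou?r$/) ? 1 : 0;

# * accepts zero b's, + insists on at least one.
my $star = ("ac" =~ m/^ab*c$/) ? 1 : 0;
my $plus = ("ac" =~ m/^ab+c$/) ? 1 : 0;

print "optional u: $optional, star: $star, plus: $plus\n";
```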


Ok, I'll add a few examples for this so far:
CODE
$var =~ m/\+\d{1,3}\ \(\d{1,3}\)\ \d{3,4}-\d{3,4}/; # This example will match a telephone number in the following format +381 (21) 123-456 or +381 (21) 1234-567
$var =~ m/^Hello/; # This example will match "Hello, world", but not " Hello, world" or "hello, world", because search is case sensitive, and requires the line to begin with Hello
$var =~ m/galahad/i; # This line would match wherever a Galahad or galahad is found in text; search is case insensitive

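Also, note how I escaped the spaces, the +, and the parentheses. The + and the parentheses have a special meaning in regular expressions, and escaping them makes the regular expression treat them as plain text. Escaping a space isn't strictly necessary (it only matters with the x modifier), but it does no harm. You escape a character with a backslash (\). Slashes also have to be escaped, because they mark where the regular expression begins and ends.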
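
If all that backslashing gets tiring, Perl offers two shortcuts, sketched here: you can pick a different delimiter so slashes become ordinary characters, and you can wrap an interpolated variable in \Q...\E so everything in it is treated as plain text. The path and the sample sentence are invented for the example.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# With the default / delimiter, every slash in the pattern needs a backslash.
my $escaped = ($path =~ m/^\/usr\/local\//) ? 1 : 0;

# With braces as the delimiter, slashes are ordinary characters.
my $braces = ($path =~ m{^/usr/local/}) ? 1 : 0;

# \Q...\E escapes every special character in the interpolated string.
my $literal  = "1+1 (approx.)";
my $sentence = "Is 1+1 (approx.) two?";
my $quoted   = ($sentence =~ m/\Q$literal\E/) ? 1 : 0;

# Without \Q, the + and the parentheses keep their special meaning,
# so the same text no longer matches itself.
my $raw = ($sentence =~ m/$literal/) ? 1 : 0;

print "escaped: $escaped, braces: $braces, quoted: $quoted, raw: $raw\n";
```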

OK, we're halfway there. Now we go on to character groups and character classes.
What character groups do is allow alternative phrases to be used. In the next example, it would be a match if we had a Susan, Marie, or Jennifer in the text:
CODE
$var =~ m/(Susan|Marie|Jennifer)/;

Character groups also allow retrieval of the matched text when used in selections, placing it in the scalars $1, $2, and so on. But I will cover that a bit later.
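
As a quick preview, here is a sketch that uses the group from the example above to find out which of the three names actually turned up. The sample lines and the @found array are made up for illustration.

```perl
use strict;
use warnings;

my @found;
for my $line ("Dear Marie,", "Hi Jennifer!", "Hello Bob") {
    # The group tries each alternative in turn; $1 holds the one that matched.
    if ($line =~ m/(Susan|Marie|Jennifer)/) {
        push @found, $1;
    }
}

print "Names found: @found\n";
```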

Character classes allow for character ranges. For example, this short line would match if we have names starting with A through N:
CODE
$var =~ m/^[A-N]/;

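A character class matches one character, and one character only, and each range inside it runs between two single characters. The following will NOT work, because it does not mean "Ab through Ne": Perl reads it as A, the range b-N, and e, and rejects b-N as an invalid range since b comes after N in ASCII order: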
CODE
$var =~ m/^[Ab-Ne]/;
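
For comparison, here is a sketch of a class that does work, with two ranges side by side, followed by a check that Perl really refuses the pattern above. The pattern is built from a string and compiled inside eval so the script survives the failure; the names in @names are just sample data.

```perl
use strict;
use warnings;

# Several ranges can share one class: first letter A through N, either case.
my @names = ("Galahad", "optiplex", "marie", "Zed");
my @early = grep { m/^[A-Na-n]/ } @names;
print "Names starting with A-N: @early\n";

# Compile the broken class at run time so the error can be caught.
my $pattern  = '^[Ab-Ne]';
my $compiled = eval { qr/$pattern/ };
print "Pattern $pattern ", (defined $compiled ? "compiled" : "was rejected"), "\n";
```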


In the experience of others, character classes can be a bit quirky, so when in doubt, remember that character groups will often give you what you need. And now off to:

Selections AKA Parsing

OK, we established that regular expressions are a mighty thing, but so far they don't do anything spectacular. I mentioned character groups a bit earlier, and that they can be used to retrieve selections. Here's how, and this is where regular expressions excel and get very useful.

Say we have a phone number +381 (21) 123-456. Country code is 381, and area code is 21. And let's say we need all these in separate variables. Here's what we would do:
CODE
my $phone = "+381 (21) 123-456";
my ($country, $area, $num);
if ($phone =~ /\+(.+)\ \((.+)\)\ (.+)/) { # $1, $2 and $3 are only reliable after a successful match
  $country = $1; # $country will contain 381
  $area = $2; # $area will contain 21
  $num = $3; # $num will contain 123-456
}
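
A variation you may find handy: in list context, a match returns the captured groups directly, so the three variables can be filled in one statement. This is only a sketch; the stricter \d-based pattern is my own choice here, reusing the phone format from the earlier examples.

```perl
use strict;
use warnings;

my $phone = "+381 (21) 123-456";

# In list context the match returns ($1, $2, $3), or an empty list on failure.
my ($country, $area, $num) =
    $phone =~ m/^\+(\d{1,3}) \((\d{1,3})\) (\d{3,4}-\d{3,4})$/;

if (defined $country) {
    print "country=$country area=$area number=$num\n";
} else {
    print "Not a phone number in the expected format\n";
}

# A string that does not fit the format yields no captures at all.
my @none = "not a phone number" =~ m/^\+(\d{1,3}) \((\d{1,3})\) (\d{3,4}-\d{3,4})$/;
print "captures from bad input: ", scalar(@none), "\n";
```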

Pretty powerful, huh? This is probably the best thing about regular expressions.

And one more thing you can do with regular expressions is...

Substitutions:

These are quite simple to master:
CODE
my $var = "Trap17 sucks";
$var =~ s/sucks/rules/; # $var now contains "Trap17 rules"


Other things to note:
- If you want to make your search case insensitive, just add an i at the end of the regular expression, e.g. m/match/i
- If you want to change all instances of a word, add a g at the end of the regular expression, e.g. s/to_replace/replacer/g
- You can combine i and g, and have s/to_replace/replacer/gi, or s/blank//gi; the last one replaces all occurrences of blank with nothing ("")
- =~ means matches
- !~ means does not match
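
The notes above can be tried out with a sketch like this one. It copies the text before each substitution so the original stays untouched, and it also shows that s/// returns how many replacements it made. The sample text and variable names are invented for the example.

```perl
use strict;
use warnings;

my $text = "blank lines, blank pages, BLANK stares";

(my $first_only = $text) =~ s/blank/empty/;    # first lowercase "blank" only
(my $all_lower  = $text) =~ s/blank/empty/g;   # every lowercase "blank"
(my $all_any    = $text) =~ s/blank/empty/gi;  # every "blank", any case

# s/// returns the number of replacements it made.
my $copy  = $text;
my $count = ($copy =~ s/blank//gi);

# !~ is true when the pattern does NOT match.
my $no_digits = ($text !~ m/\d/) ? 1 : 0;

print "$first_only\n$all_lower\n$all_any\n";
print "removed $count occurrences; no digits: $no_digits\n";
```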

And voila, you now have sufficient knowledge to make rather powerful regular expressions, and to incorporate them in your PHP scripts, Perl scripts, or wherever. I hope you found this tutorial useful. Also, don't hesitate to experiment with regular expressions, because that's the best way to learn something. And of course, don't hesitate to ask questions if any of this was unclear.

 

 

 


optiplex
awesome tutorial! I love regex!
Best thing to use when you fetch remote content :D

But in perl it is a little harder than php, in my opinion, but this tutorial
makes it so clear and easy!

Thanks!

Galahad
You're very welcome...

Perl can be scary by itself, I know I was frustrated with how it works... It's completely different from conventional programming languages, but then, it's the same, if you catch my drift... And regular expressions can be particularly scary and frustrating. I still haven't got the full hang of it, but every day I get to know Perl a little better. It's particularly useful to know Perl because here at Trap17 we have full CGI support, so we can make Perl scripts that do complex tasks. I plan on a series of tutorials for Perl, from how to make a CGI script and connect to a MySQL database, to some other stuff. Just now, I'm working on a primitive junk mail filter, and thanks to RegEx, it's a breeze. It's not a smart filter, it doesn't have the ability to learn, but with a few good rules, whitelists and blacklists, it gets the job done much better and quicker than, for example, SpamAssassin. I suppose I could put it here, and make a tutorial out of it... There's an idea :) I always like to help beginners get the hang of something new, and help them avoid stuff that made me cry with frustration and anger :)

 

 

 

