> Need Help About Regular Expressions
Erdemir
post Aug 16 2008, 04:50 PM
Post #1


Super Member
*********

Group: [HOSTED]
Posts: 218
Joined: 12-May 08
From: Istanbul, Turkey
Member No.: 62,045



Hi,
On my website, I don't want to allow links to other sites, except for a few approved ones.
CODE
// This is the text submitted by the guest
$variable='Some texts <a href="http://www.google.com/">Google</a><a href="http://www.microsoft.com/">Microsoft</a> ';

// The following line replaces every <a> tag with the text "no links allowed".
$variable = preg_replace("!<a[^>]*(http|www|mailto)(.*)</a>!siU", "no links allowed", $variable);
echo($variable);


I want to allow only a few sites: google.com, trap17.com, yahoo.com, etc.
I want to disallow microsoft.com, hotmail.com and any other sites.
What are your suggestions?
What regular expression should I use? Or do you have other ideas?

This post has been edited by Erdemir: Aug 16 2008, 05:07 PM
jlhaslip
post Aug 16 2008, 06:32 PM
Post #2


A computer once beat me at chess, but it was no match for me at kick boxing.

Group: [MODERATOR]
Posts: 4,300
Joined: 24-July 05
From: Linix, DOS and Windows…the good, the bad and the ugly
Member No.: 9,787
Spam Patrol
myCENT:46.50



If you only have a short list of acceptable links, maybe a switch/case structure based on an array of suitable values would be easier?

Regular expressions can be very resource-intensive on the server.
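
A minimal sketch of that array-based idea, without a regular expression for the domain check. It assumes the link's URL has already been extracted; the helper name `is_allowed_host()` is hypothetical and not from this thread. It relies on PHP's `parse_url()` to pull out the host and then compares it against the list.

```php
<?php
// Hypothetical helper (not from the thread): returns true when the URL's host
// is one of the allowed domains, or a subdomain of one (e.g. www.google.com).
function is_allowed_host($url, array $allowed)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // relative, mailto: or malformed URL: treat as not allowed
    }
    $host = strtolower($host);
    foreach ($allowed as $domain) {
        // Exact match, or the host ends with ".<domain>"
        if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

$allowed = array('google.com', 'trap17.com', 'yahoo.com');
var_dump(is_allowed_host('http://www.google.com/', $allowed));    // bool(true)
var_dump(is_allowed_host('http://www.microsoft.com/', $allowed)); // bool(false)
```

Comparing whole host names rather than searching for a substring means a host such as evilgoogle.com is not mistaken for google.com.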
Erdemir
post Aug 16 2008, 06:55 PM
Post #3





QUOTE(jlhaslip @ Aug 16 2008, 09:32 PM) *
If you only have a short list of acceptable links, maybe a switch/case structure based on an array of suitable values would be easier?

Regular expressions can be very resource-intensive on the server.

OK, here is my array of allowed links:
CODE
$allowedlinks = array ("google.com", "trap17.com", "yahoo.com", "dmoz.org");

Now, how can we integrate switch/case with this, with or without preg_replace?

Thanks...

This post has been edited by Erdemir: Aug 16 2008, 06:55 PM
galexcd
post Aug 16 2008, 09:32 PM
Post #4


Define:EVIL PROGRAMMER (ē'vəl prō'grăm'ər)- n. An organism that converts caffeine into evil software.
***********

Group: [HOSTED]
Posts: 1,189
Joined: 25-September 05
From: Los Angeles, California
Member No.: 12,251
myCENT:39.85



I know I said I wasn't good at regular expressions, but I've been poking around with your problem for the past hour and found something that may work:

CODE
<a href=["']?((?!((http://(www\.)?wikipedia\.org)|(http://(www\.)?google\.com)|(http://(www\.)?trap17\.com))).)+["']?>(.*?)</a>


This should match all links that aren't wikipedia.org, google.com and trap17.com.
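
For use in PHP, a pattern like this needs delimiters and has to be quoted as a string. The following is a rough sketch of one way to do that with `preg_replace()`, using the sample text from the first post. One change here is an assumption of mine rather than part of the post: `[^>]` stands in for the `.` inside the lookahead group, so a match cannot run past the end of the opening tag and swallow a following link.

```php
<?php
// Pattern adapted from the post above. "~" is the delimiter (so "/" needs no
// escaping) and the "i" flag makes the match case-insensitive.
// Assumption: [^>] replaces the original "." so a match stays inside one tag.
$pattern = '~<a href=["\']?((?!(http://(www\.)?wikipedia\.org|http://(www\.)?google\.com|http://(www\.)?trap17\.com))[^>])+["\']?>(.*?)</a>~i';

$variable = 'Some texts <a href="http://www.google.com/">Google</a>'
          . '<a href="http://www.microsoft.com/">Microsoft</a> ';

// Every link whose URL does not start with an allowed address is replaced.
$filtered = preg_replace($pattern, 'no links allowed', $variable);
echo $filtered;
```

With this input the Google link is kept and the Microsoft link is replaced. Note that the pattern only checks how the URL starts, so an address such as http://www.google.com.example.org would still pass.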
Erdemir
post Aug 16 2008, 09:59 PM
Post #5





QUOTE(galexcd @ Aug 17 2008, 12:32 AM) *
I know I said I wasn't good at regular expressions, but I've been poking around with your problem for the past hour and found something that may work:

CODE
<a href=["']?((?!((http://(www\.)?wikipedia\.org)|(http://(www\.)?google\.com)|(http://(www\.)?trap17\.com))).)+["']?>(.*?)</a>


This should match all links that aren't wikipedia.org, google.com and trap17.com.

Sorry, I couldn't use that directly in PHP, but your code was very helpful. I will try to use it in preg_replace(). Thanks.
Any more suggestions?
jlhaslip
post Aug 16 2008, 10:44 PM
Post #6





A small note found on the php.net manual pages for the preg_match function:
QUOTE
Tip

Do not use preg_match() if you only want to check if one string is contained in another string. Use strpos() or strstr() instead as they will be faster.
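
As a small illustration of that tip, here is a sketch of a plain substring check with `strpos()`. The sample text and variable names are made up for the example.

```php
<?php
$text = 'Visit http://www.google.com/ today';

// strpos() returns the offset of the first match, or false when there is none.
// An offset of 0 is a valid hit, so compare with !== false instead of relying
// on truthiness.
$has_google    = strpos($text, 'google.com') !== false;
$has_microsoft = strpos($text, 'microsoft.com') !== false;

var_dump($has_google);    // bool(true)
var_dump($has_microsoft); // bool(false)
```

This only answers "does the text contain this string"; it does not locate or replace whole anchor tags.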
galexcd
post Aug 16 2008, 10:53 PM
Post #7





QUOTE(jlhaslip @ Aug 16 2008, 03:44 PM) *
A small note found on the php.net manual pages for the preg_match function:


Interesting find... However, I believe he wants to replace the bad links with "No Links" or something similar. As you said previously, the regexp engine is pretty resource-heavy on the server and takes a bit longer to process, so perhaps a non-regex method would be best. I might see if I can whip one up for you, but until then I think we still need to hear from one of our regular expression experts... *cough* rvalkass *cough*
truefusion
post Aug 17 2008, 12:13 AM
Post #8


Ephesians 6:10-17

Group: [MODERATOR]
Posts: 1,978
Joined: 22-June 05
From: somewhere... Where am i?
Member No.: 8,528
myCENT:18.27



The following should get the results you desire:
CODE
<?php

$allowedlinks = array("google.com", "trap17.com", "yahoo.com", "dmoz.org");

function site_filter($matches){
    global $allowedlinks;

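    // Escape the dots in each domain, then require the whole URL to be
    // http(s)://[subdomain.]domain[/] so that "evilgoogle.com" is not accepted.
    $domains = implode("|", array_map("preg_quote", $allowedlinks));
    if (!preg_match("/^https?:\/\/(?:[a-z0-9-]+\.)*(?:".$domains.")\/?$/i", $matches[1])){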
        return "Link not allowed.";
    } else {
        return $matches[0];
    }
}

$variable='Some texts <a href="http://www.google.com/">Google</a><a href="http://www.microsoft.com/">Microsoft</a> ';

$variable = preg_replace_callback("/<a href=[\"']([^<]+)[\"']>[^<]+<\/a>/", "site_filter", $variable);

echo $variable;

?>

It would have been easier to work with if each anchor element were on its own line.
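
A possible variation on the same callback approach, sketched under a few assumptions: the function name `filter_links()` and its parameters are hypothetical, the list is passed in through a closure instead of `global`, and the host is compared with `parse_url()` rather than a second regular expression. Unlike the code above, it also accepts allowed URLs that carry a path.

```php
<?php
// Hypothetical wrapper (not from the thread): keeps anchors whose href host is
// in $allowed (or is a subdomain of an entry) and replaces all other anchors.
function filter_links($html, array $allowed, $replacement = 'Link not allowed.')
{
    $callback = function ($m) use ($allowed, $replacement) {
        $host = parse_url($m[1], PHP_URL_HOST);
        $host = is_string($host) ? strtolower($host) : '';
        foreach ($allowed as $domain) {
            if ($host === $domain || substr($host, -strlen($domain) - 1) === '.' . $domain) {
                return $m[0]; // allowed: keep the anchor unchanged
            }
        }
        return $replacement;
    };
    // Group 1 captures the href value; [^<]* is the link text.
    // Only simple anchors of the form <a href="...">text</a> are handled.
    return preg_replace_callback('/<a href=["\']([^"\']+)["\']>[^<]*<\/a>/i', $callback, $html);
}

$variable = 'Some texts <a href="http://www.google.com/">Google</a>'
          . '<a href="http://www.microsoft.com/">Microsoft</a> ';
echo filter_links($variable, array('google.com', 'trap17.com', 'yahoo.com', 'dmoz.org'));
```

Anchors with extra attributes or nested tags would slip past this pattern, so an HTML parser may be a better fit if the input is not this regular.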
Erdemir
post Aug 17 2008, 07:01 PM
Post #9





QUOTE(truefusion @ Aug 17 2008, 03:13 AM)