On 6/20/07, Raimond Spekking <raimond.spekking(a)gmail.com> wrote:
> Hmmm, it seems, that preg_quote() is doing too much:
> with preg_quote which does not work:
> "/http:\/\/+[a-z0-9_\-.]*(wiener-gasometer\\\.at\/index\\\.html\|dispatch\\\.opac\\\.d-nb\.de\|wikipedia\\\.org)/Si"
> instead of
> with str_replace which works already in SpamBlacklist:
> "/http:\/\/+[a-z0-9_\-.]*(wiener-gasometer\.at\/index\.html|dispatch\.opac\.d-nb.de|wikipedia\.org)/Si"
> But I am no regex expert, maybe I missed a parameter/point :-(

Just glancing at the code and your results, you probably want to
preg_quote() the individual URLs before you concatenate them with
'|'; otherwise the '|' separators get escaped too, as in your first
pattern. Make sure to use preg_quote( $url, '/' ) so it escapes the
delimiter '/' too. Incidentally, you may want to use a delimiter
other than '/' for URLs, just for prettiness.
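
To illustrate the difference, here is a minimal sketch, assuming the
blacklist entries are plain literal URL fragments. The $urls array and
the test URL are made up for the example; '!' is used as the delimiter,
as suggested above.

```php
<?php
// Hypothetical sample entries (literal URL fragments, not regexes).
$urls = array( 'wiener-gasometer.at/index.html', 'wikipedia.org' );

// Quoting the already-joined string escapes the '|' separators too,
// so the alternation collapses into one long literal.
$joined = preg_quote( implode( '|', $urls ), '!' );

// Quoting each entry first keeps '|' as the alternation operator.
$perUrl = implode( '|', array_map( function ( $url ) {
    return preg_quote( $url, '!' );
}, $urls ) );

$bad  = '!http://+[-a-z0-9_.]*(' . $joined . ')!Si';
$good = '!http://+[-a-z0-9_.]*(' . $perUrl . ')!Si';

$test = 'http://www.wikipedia.org/wiki/Foo';
var_dump( preg_match( $bad, $test ) );  // int(0): '\|' is a literal pipe
var_dump( preg_match( $good, $test ) ); // int(1)
```
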
So I'd change it to something like:

  $regexes = '';
- $regexStart = '/http:\/\/+[a-z0-9_\-.]*(';
- $regexEnd = ')/Si';
+ $regexStart = '!http://+[-a-z0-9_.]*(';
+ $regexEnd = ')!Si';
  $regexMax = 4096;
  $build = false;
  foreach( $lines as $line ) {
      // FIXME: not very robust size check, but should work. :)
      if( $build === false ) {
-         $build = $line;
+         $build = preg_quote($line, '!');
      } elseif( strlen( $build ) + strlen( $line ) > $regexMax ) {
-         $regexes .= $regexStart .
-             str_replace( '/', '\/', preg_replace('|\\\*/|', '/', $build) ) .
-             $regexEnd;
+         $regexes .= $regexStart . $build . $regexEnd;
-         $build = $line;
+         $build = preg_quote($line, '!');
      } else {
-         $build .= '|' . $line;
+         $build .= '|' . preg_quote($line, '!');
      }
  }
  if( $build !== false ) {
-     $regexes .= $regexStart .
-         str_replace( '/', '\/', preg_replace('|\\\*/|', '/', $build) ) .
-         $regexEnd;
+     $regexes .= $regexStart . $build . $regexEnd;
  }

Although I haven't tested that exact code.
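
For something you can actually run, here is a rough standalone sketch
of the same loop. The function name buildBlacklistRegexes() is made up
for the example, and it departs from the snippet above in two ways that
are my assumptions: it collects the chunks in an array rather than
appending them to a string, and it measures the quoted length in the
size check. It also assumes every line is a literal URL fragment.

```php
<?php
// Hypothetical helper: builds one or more regexes from literal URL fragments.
function buildBlacklistRegexes( array $lines, $regexMax = 4096 ) {
    $regexStart = '!http://+[-a-z0-9_.]*(';
    $regexEnd = ')!Si';
    $regexes = array();
    $build = false;
    foreach ( $lines as $line ) {
        // Quote each entry on its own so '|' stays an alternation operator.
        $quoted = preg_quote( $line, '!' );
        if ( $build === false ) {
            $build = $quoted;
        } elseif ( strlen( $build ) + strlen( $quoted ) > $regexMax ) {
            // Current chunk is full: emit it and start a new one.
            $regexes[] = $regexStart . $build . $regexEnd;
            $build = $quoted;
        } else {
            $build .= '|' . $quoted;
        }
    }
    if ( $build !== false ) {
        $regexes[] = $regexStart . $build . $regexEnd;
    }
    return $regexes;
}

// Sample entries taken from the patterns quoted at the top.
$sample = array(
    'wiener-gasometer.at/index.html',
    'dispatch.opac.d-nb.de',
    'wikipedia.org',
);
$regexes = buildBlacklistRegexes( $sample );
foreach ( $regexes as $regex ) {
    var_dump( preg_match( $regex, 'http://dispatch.opac.d-nb.de/' ) );
}
```

With the default limit all three entries end up in a single regex; a
small $regexMax forces one regex per chunk.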