[PLUG] Regular expressions

Sean, Sharon and Kyle Harbour sharbours at yahoo.com
Wed May 22 16:26:19 UTC 2002


Actually, I'm not sure. I've been testing with Visual REGEXP 2.2 on
my Mandrake 8.2 workstation. The behavior I see there seems to be
exactly what I get from the production Mandrake 8.1 servers using
this statement in the expression list on squidGuard 1.2. What I've
been doing is taking the expression list straight from Eric's
/squidGuard/blacklists/adult/expressions and trying to figure out how
and why it works. You probably want to grab a copy directly from
Eric's site, but here is a sanitized version:

(one|two|three)
(^|[-.\?+=/_0-9])(four|five|six)?(seven|eight|nine?|ten|el?ven)s?(twelve|thirteen|fourteen)?([-.\?+=/_0-9]|$)

Sample input:
http://groups.google.com/groups?q=one+two+three&ie=UTF8&oe=UTF8&hl=en&btnG=Google+Search
http://groups.google.com/groups?hl=en&lr=&ie=UTF8&oe=UTF8&q=nineteen+ten+eleven&btnG=Google+Search
http://groups.google.com/groups?hl=en&lr=&ie=UTF8&oe=UTF8&q=one+ten+twelve&btnG=Google+Search
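
A minimal sketch for reproducing this test outside Visual REGEXP,
assuming Python's re module behaves closely enough to the regex
library squidGuard uses (not verified here). The names EXPRESSIONS,
SAMPLES and matching_lines are hypothetical and exist only for this
illustration:

```python
import re

# The two sanitized expressions, one entry per line of the list above.
EXPRESSIONS = [
    r"(one|two|three)",
    r"(^|[-.\?+=/_0-9])(four|five|six)?(seven|eight|nine?|ten|el?ven)s?"
    r"(twelve|thirteen|fourteen)?([-.\?+=/_0-9]|$)",
]

# The three sample URLs, unchanged.
SAMPLES = [
    "http://groups.google.com/groups?q=one+two+three&ie=UTF8&oe=UTF8&hl=en&btnG=Google+Search",
    "http://groups.google.com/groups?hl=en&lr=&ie=UTF8&oe=UTF8&q=nineteen+ten+eleven&btnG=Google+Search",
    "http://groups.google.com/groups?hl=en&lr=&ie=UTF8&oe=UTF8&q=one+ten+twelve&btnG=Google+Search",
]


def matching_lines(url):
    """Return the 1-based numbers of the expression lines that match url."""
    # Assumption: an unanchored, case-sensitive search of the whole URL.
    return [
        number
        for number, pattern in enumerate(EXPRESSIONS, start=1)
        if re.search(pattern, url)
    ]


for url in SAMPLES:
    print(matching_lines(url), url)
# First URL  -> [1]     (only "one|two|three" hits)
# Second URL -> [2]     ("+ten+" hits the second line)
# Third URL  -> [1, 2]  ("one" hits line 1, "+ten+" hits line 2)
```

Under these assumptions the second line only fires on "ten" standing
alone between two "+" delimiters; "nineteen" and "eleven" do not take
part in the match.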

Observed behavior:
- A URL with any single pattern matching the first line will always
  match.
- A URL with any two matches from seven through el?ven on the second
  line will always match.
- Four through six don't seem to match anything.
- I have not verified that twelve through fourteen match anything
  either.
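
One way to see which groups of the second line take part in a match
is to print the captured groups. This is a sketch only, again assuming
Python's re module is a fair stand-in for squidGuard's engine; the
helper name explain and the query fragments are made up for
illustration:

```python
import re

# Second line of the sanitized expression list.
SECOND_LINE = re.compile(
    r"(^|[-.\?+=/_0-9])(four|five|six)?(seven|eight|nine?|ten|el?ven)s?"
    r"(twelve|thirteen|fourteen)?([-.\?+=/_0-9]|$)"
)


def explain(text):
    """Return (prefix, core, suffix) of the first match, or None."""
    match = SECOND_LINE.search(text)
    if match is None:
        return None
    # Group 2 is four|five|six, group 3 is the mandatory middle word,
    # group 4 is twelve|thirteen|fourteen.
    return (match.group(2), match.group(3), match.group(4))


print(explain("q=ten+"))             # (None, 'ten', None)
print(explain("q=four+"))            # None: the middle group is mandatory
print(explain("q=fourten+"))         # ('four', 'ten', None)
print(explain("q=tentwelve+"))       # (None, 'ten', 'twelve')
print(explain("q=four+ten+twelve"))  # (None, 'ten', None)
print(explain("q=nineteen+"))        # None: no delimiter after "nine"
print(explain("q=eleven+"))          # None: el?ven is "even" or "elven"
print(explain("q=even+"))            # (None, 'even', None)
```

If these semantics carry over, four through six and twelve through
fourteen can only match when glued directly onto one of the middle
words, with no delimiter in between. That would be consistent with
four through six appearing to match nothing on their own.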

Sean Harbour

--- Paul Heinlein <heinlein at attbi.com> wrote:

> What regex engine did you use for testing? Perl? sed? [e]grep? In
> Perl, that regex would match any single string or line in a file;
> same with egrep and sed, though the latter wouldn't be able to
> comprehend it unless you escaped the metacharacters.
>
> In every case, it should match the beginning of the line/string.
> Since the second group is entirely optional -- and since you don't
> rule out duplicate strings by anchoring the end of the string --
> there are no disqualifying strings.
>
> Can you post the file/stdin that you passed to the regex engine and
> the exact invocation of the utility you used to match it?



