Thread: [Pas-dev] perldoc perlre
From: Mental <me...@ne...> - 2002-05-20 16:53:02
Hey, I suck at this stuff. I've condensed this down from 4 nested ifs to
one. I'd like to go further, but
1. I'm not sure how
2. I'm not sure anybody'd be able to read it.

    # find the script, embed, applet and object tags anywhere in a
    # query parameter and defang them. No original content is lost,
    # so you can always decode_entities to restore the data.
    if ($x =~ /(<\s*script[\s+?.*?>|>].*?\<\s*?\/script\s*?>)|
               (<\s*embed[\s+?.*?>|>].*?\<\s*?\/embed\s*?>)|
               (<\s*applet[\s+?.*?>|>].*?\<\s*?\/applet\s*?>)|
               (<\s*object[\s+?.*?>|>].*?\<\s*?\/object\s*?>)/imx ){
        $x = HTML::Entities::encode_entities($x);
        $q->param($p,$x);
    }
    }

--
Mental (Me...@Ne...)
From: Kyle R . B. <mo...@vo...> - 2002-05-20 17:08:44
> Hey, I suck at this stuff. I've condensed this down from 4 nested ifs to
> one. I'd like to go further, but
> 1. I'm not sure how
> 2. I'm not sure anybody'd be able to read it.
>
>     # find the script, embed, applet and object tags anywhere in a
>     # query parameter and defang them. No original content is lost,
>     # so you can always decode_entities to restore the data.
>     if ($x =~ /(<\s*script[\s+?.*?>|>].*?\<\s*?\/script\s*?>)|
>                (<\s*embed[\s+?.*?>|>].*?\<\s*?\/embed\s*?>)|
>                (<\s*applet[\s+?.*?>|>].*?\<\s*?\/applet\s*?>)|
>                (<\s*object[\s+?.*?>|>].*?\<\s*?\/object\s*?>)/imx ){
>         $x = HTML::Entities::encode_entities($x);
>         $q->param($p,$x);
>     }
>     }

A few quick questions, is HTML::Entities a core module or something
you have to get from CPAN? If it's not core, please add it to the
INSTALL instructions.

Also, do we need to transliterate all of the possible html entities?
Can we just do '<', and '>' and get the desired effect?

Can we just encode the dangerous entities regardless of what else may be in
the string (i.e. don't bother looking for SCRIPT/CODE/etc., just do the
scrubbing)?

Perhaps something like: (warning, this is pseudo code)

    sub scrubParams {
        my($self) = @_;
        my %map = ( '<' => '&lt;', '>' => '&gt;' );
        foreach my $paramName ( $self->query()->param() ) {
            my @newValues = ();
            foreach my $param ( $self->query()->param($paramName) ) {
                foreach my $entity ( keys %map ) {
                    $param =~ s/\Q$entity\E/$map{$entity}/g;
                }
                # push once per value, after every mapping has been applied
                push @newValues, $param;
            }
            $self->query()->param($paramName,@newValues);
        }
        return 1;
    }

That's a nice standard thing to do -- then if people actually _want_ them
unscrubbed, they can reverse the process...we are going on the assumption
that angle brackets (gt/lt) won't be in query parameters very often. If
they are, developers have to handle it explicitly -- kind of like tainted
variables. If you really want to have it that way, the onus is on you to
untaint the data as you see fit.

To reverse, something like:

    sub unScrubParams {
        my($self) = @_;
        my %map = ( '&lt;' => '<', '&gt;' => '>' );
        foreach my $paramName ( $self->query()->param() ) {
            my @newValues = ();
            foreach my $param ( $self->query()->param($paramName) ) {
                foreach my $entity ( keys %map ) {
                    $param =~ s/\Q$entity\E/$map{$entity}/g;
                }
                # push once per value, after every mapping has been applied
                push @newValues, $param;
            }
            $self->query()->param($paramName,@newValues);
        }
        return 1;
    }

Does that fit with what you're trying to implement?

Kyle

--
------------------------------------------------------------------------------
Wisdom and Compassion are inseparable.
        -- Christmas Humphreys
mo...@vo...                            http://www.voicenet.com/~mortis
------------------------------------------------------------------------------
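A minimal, standalone sketch of the scrub/unscrub round trip discussed above, without CGI or HTML::Entities. The helper names `scrub_value` and `unscrub_value` are hypothetical and not part of PAS. One assumption goes beyond the thread: '&' is escaped as well, because otherwise a literal "&lt;" typed by a user would come back as '<' after unscrubbing.

```perl
use strict;
use warnings;

# Hypothetical helpers; the names are illustrative and not part of PAS.
my %ESCAPE   = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' );
my %UNESCAPE = reverse %ESCAPE;

# Escape '&' along with the angle brackets so that a literal "&lt;" in the
# input survives the round trip instead of being decoded into '<'.
sub scrub_value {
    my ($value) = @_;
    $value =~ s/([&<>])/$ESCAPE{$1}/g;
    return $value;
}

# Decode in a single pass so "&amp;lt;" becomes "&lt;" and is not decoded twice.
sub unscrub_value {
    my ($value) = @_;
    $value =~ s/(&amp;|&lt;|&gt;)/$UNESCAPE{$1}/g;
    return $value;
}

my $raw      = '<script>alert("x")</script> &lt;kept&gt;';
my $scrubbed = scrub_value($raw);
my $restored = unscrub_value($scrubbed);

print "$scrubbed\n";
print $restored eq $raw ? "round trip ok\n" : "round trip FAILED\n";
```

Doing each direction in one substitution keeps the two functions exact inverses of each other; chaining separate substitutions per character would make the result depend on the order in which they run.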
From: Mental <me...@ne...> - 2002-05-20 17:29:08
On Mon, 2002-05-20 at 13:08, Kyle R . Burton wrote:
> A few quick questions, is HTML::Entities a core module or something
> you have to get from CPAN? If it's not core, please add it to the
> INSTALL instructions.
>

Package: libwww-perl
Priority: optional
Section: interpreters
Installed-Size: 884
Maintainer: Michael Alan Dorman <md...@de...>
Architecture: all
Version: 5.64-1
Depends: perl (>= 5.6.0-16), libnet-perl (>= 1:1.09), libdigest-md5-perl, libmime-base64-perl (>= 2.1), liburi-perl (>= 1.10), libhtml-parser-perl (>= 2.20), libhtml-tree-perl (>= 3.11)
Recommends: libdigest-md5-perl, libmailtools-perl, libhtml-format-perl
Suggests: libcrypt-ssleay-perl | libio-socket-ssl-perl
Filename: pool/main/libw/libwww-perl/libwww-perl_5.64-1_all.deb
Size: 293062
MD5sum: 6a3f908c2fd7654201319d2d4b83e028
Description: WWW client/server library for Perl
 Libwww-perl is a collection of Perl modules which provides a simple and
 consistent programming interface (API) to the World-Wide Web. The main
 focus of the library is to provide classes and functions that allow you
 to write WWW clients, thus libwww-perl said to be a WWW client library.
 The library also contain modules that are of more general use, as well
 as a simple HTTP/1.1-compatible server implementation.
 .
 The URI modules have been split off into liburi-perl; install that
 package if you need them.

I'll add libwww and lwp user agent to the INSTALL.
Or is lwp stuff part of libwww? I forget.

> Also, do we need to transliterate all of the possible html entities?
> Can we just do '<', and '>' and get the desired effect?
>
> Can we just encode the dangerous entities regardless of what else may be in
> the string (i.e. don't bother looking for SCRIPT/CODE/etc., just do the
> scrubbing)?
>

I was trying to make this as non-invasive as possible. What if you're on
a discussion board and want to embed a url in your message? Or an IMG
SRC tag? Neither of those are malicious in and of themselves.

Ultimately what I wanted to do was make this so it was something you
could subclass and add your own rules to. If you're running a kids site,
you could subclass it and immediately clean up (censor) any submitted
data with the 7 dirty words.

I was going to add a parameter to pas.conf that contained the name of
the package to use to run the data through eventually. I just need to
get around the tricky bit of needing to use that package in the
Request.pm :/

Does this make sense?

--
Mental (Me...@Ne...)
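One possible way around the "use that package in Request.pm" problem is to load the class at runtime from its name, sketched below. Everything here is an assumption made for illustration: the class names `My::Scrubber` and `My::KidsScrubber`, the `scrub` method, and the `load_scrubber` helper do not come from PAS, and the class name is hard-coded where a real version would read it from pas.conf.

```perl
use strict;
use warnings;

# Hypothetical base class: defangs angle brackets and nothing else.
package My::Scrubber;
sub new { my ($class) = @_; return bless {}, $class }
sub scrub {
    my ($self, $value) = @_;
    $value =~ s/</&lt;/g;
    $value =~ s/>/&gt;/g;
    return $value;
}

# Hypothetical subclass adding a site-specific rule on top of the default.
package My::KidsScrubber;
our @ISA = ('My::Scrubber');
sub scrub {
    my ($self, $value) = @_;
    $value = $self->SUPER::scrub($value);
    $value =~ s/\bdarn\b/****/gi;    # placeholder word list
    return $value;
}

package main;

# Turn a class name (as it might appear in a config file) into an object.
sub load_scrubber {
    my ($class) = @_;
    # Refuse anything that is not a plain package name before the string eval.
    die "bad class name: $class\n" unless $class =~ /\A\w+(?:::\w+)*\z/;
    unless ( $class->can('scrub') ) {
        # Not defined in this process yet, so try to load it from @INC.
        eval "require $class; 1" or die "cannot load $class: $@";
    }
    return $class->new();
}

my $scrubber = load_scrubber('My::KidsScrubber');   # name would come from pas.conf
print $scrubber->scrub('<b>darn</b>'), "\n";
```

Because the name is resolved at runtime, the calling module needs no compile-time `use` line for the site-specific package; it only has to be findable in `@INC`.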
From: Kyle R . B. <mo...@vo...> - 2002-05-20 17:57:09
> > A few quick questions, is HTML::Entities a core module or something
> > you have to get from CPAN? If it's not core, please add it to the
> > INSTALL instructions.
> Package: libwww-perl
>
> I'll add libwww and lwp user agent to the INSTALL.
> Or is lwp stuff part of libwww? I forget.

I think the lwp stuff is part of libwww. Just adding that should be fine.
Thanks.

> I was trying to make this as non-invasive as possible. What if you're on
> a discussion board and want to embed a url in your message? Or an IMG
> SRC tag? Neither of those are malicious in and of themselves.

Good point.

> Ultimately what I wanted to do was make this so it was something you
> could subclass and add your own rules to. If you're running a kids site,
> you could subclass it and immediately clean up (censor) any submitted
> data with the 7 dirty words.
>
> I was going to add a parameter to pas.conf that contained the name of
> the package to use to run the data through eventually. I just need to
> get around the tricky bit of needing to use that package in the
> Request.pm :/
>
> Does this make sense?

Yes, it does. Could we just change it to:

    foreach ... {
        $param =~ s/<(\s*SCRIPT\b)/&lt;$1/gi;
        $param =~ s/<(\s*CODE\b)/&lt;$1/gi;
        $param =~ s/<(\s*APPLET\b)/&lt;$1/gi;
    }

and just forget the trailing '>'? Or is that necessary? How to escape
it without potentially manipulating data we're not supposed to is a
delicate issue.

Kyle
From: Justin B. <ju...@le...> - 2002-05-20 18:09:02
>>Ultimately what I wanted to do was make this so it was something you
>>could subclass and add your own rules to. If you're running a kids site,
>>you could subclass it and immediately clean up (censor) any submitted
>>data with the 7 dirty words.
>>
>>I was going to add a parameter to pas.conf that contained the name of
>>the package to use to run the data through eventually. I just need to
>>get around the tricky bit of needing to use that package in the
>>Request.pm :/
>>
>>Does this make sense?
>
>
> Yes, it does. Could we just change it to:
>
>     foreach ... {
>         $param =~ s/<(\s*SCRIPT\b)/&lt;$1/gi;
>         $param =~ s/<(\s*CODE\b)/&lt;$1/gi;
>         $param =~ s/<(\s*APPLET\b)/&lt;$1/gi;
>     }
>
> and just forget the trailing '>'? Or is that necessary? How to escape
> it without potentially manipulating data we're not supposed to is a
> delicate issue.

not getting the trailing '>' may lead to a problem with the display of the page.
also don't forget to get the '</SCRIPT>', etc. out of the code.

hmm.. not translating all '>' or '<' can lead to some interesting pages. i
could start posting in '</td>' or '</input>' or '</form>'. they're not vicious
but, going back to the discussion board example, you could feasibly break a
thread in a discussion board by passing some simple HTML.

justin
From: Kyle R . B. <mo...@vo...> - 2002-05-20 18:19:28
> >     foreach ... {
> >         $param =~ s/<(\s*SCRIPT\b)/&lt;$1/gi;
> >         $param =~ s/<(\s*CODE\b)/&lt;$1/gi;
> >         $param =~ s/<(\s*APPLET\b)/&lt;$1/gi;
> >     }
> >
> > and just forget the trailing '>'? Or is that necessary? How to escape
> > it without potentially manipulating data we're not supposed to is a
> > delicate issue.
>
> not getting the trailing '>' may lead to a problem with the display of the page.
> also don't forget to get the '</SCRIPT>', etc. out of the code.
>
> hmm.. not translating all '>' or '<' can lead to some interesting pages. i
> could start posting in '</td>' or '</input>' or '</form>'. they're not vicious
> but, going back to the discussion board example, you could feasibly break a
> thread in a discussion board by passing some simple HTML.

Yes, that's right. For discussion boards, we could have a re-encode function
that re-enabled a lot of the 'safe' stuff.

To go backwards for 'safe' tags, we could just have:

    foreach ... {
        foreach my $tag ( qw( a img p br pre code font ) ) {
            # \b keeps 'a' from matching 'applet'; .*? also allows a bare tag
            $param =~ s#&lt;($tag\b.*?)&gt;#<$1>#gi;
            $param =~ s#&lt;(/$tag\s*)&gt;#<$1>#gi;
        }
        ...
    }

Gak, you always hear people say that you can't swat HTML with regexes...and
they say that for good reason. The proper approach is to use a fully blown
HTML parser...but that's really overly heavy-weight for this application...

Discussion boards might want to turn off the blanket scrubber, and use their
own, _or_ un-scrub the stuff they want to allow...I don't know...if we write
one that scrubs appropriately for the discussion board example, maybe we
should just make that the default...

k
From: Kyle R . B. <mo...@vo...> - 2002-05-20 18:24:10
> Hey, I suck at this stuff. I've condensed this down from 4 nested ifs to
> one. I'd like to go further, but
> 1. I'm not sure how
> 2. I'm not sure anybody'd be able to read it.
>
>     # find the script, embed, applet and object tags anywhere in a
>     # query parameter and defang them. No original content is lost,
>     # so you can always decode_entities to restore the data.
>     if ($x =~ /(<\s*script[\s+?.*?>|>].*?\<\s*?\/script\s*?>)|
>                (<\s*embed[\s+?.*?>|>].*?\<\s*?\/embed\s*?>)|
>                (<\s*applet[\s+?.*?>|>].*?\<\s*?\/applet\s*?>)|
>                (<\s*object[\s+?.*?>|>].*?\<\s*?\/object\s*?>)/imx ){
>         $x = HTML::Entities::encode_entities($x);
>         $q->param($p,$x);
>     }
>     }

How about something like:

    foreach my $paramName ( $q->param() ) {
        my @values = ();
        foreach my $value ( $q->param($paramName) ) {
            foreach my $badTag ( qw( script embed applet object ) ) {
                # [^>]* also matches a bare tag with no attributes
                $value =~ s|<\s*($badTag\b[^>]*)>(.*?)<(\s*/$badTag\s*)>|&lt;$1&gt;$2&lt;$3&gt;|igms;
            }
            push @values, $value;
        }
        $q->param($paramName,@values);
    }

Is that any more readable? That only hits the tags we're interested in
defanging...

Don't forget that you can have multiple values for a CGI parameter.
From: Justin B. <ju...@le...> - 2002-05-20 18:29:31
Kyle R . Burton wrote:
>>Hey, I suck at this stuff. I've condensed this down from 4 nested ifs to
>>one. I'd like to go further, but
>>1. I'm not sure how
>>2. I'm not sure anybody'd be able to read it.
>>
>>    # find the script, embed, applet and object tags anywhere in a
>>    # query parameter and defang them. No original content is lost,
>>    # so you can always decode_entities to restore the data.
>>    if ($x =~ /(<\s*script[\s+?.*?>|>].*?\<\s*?\/script\s*?>)|
>>               (<\s*embed[\s+?.*?>|>].*?\<\s*?\/embed\s*?>)|
>>               (<\s*applet[\s+?.*?>|>].*?\<\s*?\/applet\s*?>)|
>>               (<\s*object[\s+?.*?>|>].*?\<\s*?\/object\s*?>)/imx ){
>>        $x = HTML::Entities::encode_entities($x);
>>        $q->param($p,$x);
>>    }
>>    }
>
>
> How about something like:
>
>     foreach my $paramName ( $q->param() ) {
>         my @values = ();
>         foreach my $value ( $q->param($paramName) ) {
>             foreach my $badTag ( qw( script embed applet object ) ) {
>                 $value =~ s|<\s*($badTag\b[^>]*)>(.*?)<(\s*/$badTag\s*)>|&lt;$1&gt;$2&lt;$3&gt;|igms;
>             }
>             push @values, $value;
>         }
>         $q->param($paramName,@values);
>     }
>
> Is that any more readable? That only hits the tags we're interested in
> defanging...
>
> Don't forget that you can have multiple values for a CGI parameter.

how about this: make badTag configurable inside pas.conf. whatever the
variable will be named, it returns an array. so if someone wants it for all
HTML tags, they just set the variable to '.*'. have that example & the
'script embed applet object' example.

justin
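A rough sketch of the configurable tag list suggested above. The names `build_tag_regex`, `defang` and `@bad_tags` are hypothetical, and the list is hard-coded where a real version would read it from pas.conf (how pas.conf would expose it is not specified in this thread). This sketch also differs from the loop quoted above in that it defangs opening and closing tags independently rather than as matched pairs.

```perl
use strict;
use warnings;

# Build one defanging regex from a list of tag names.
# Names are treated literally (quotemeta), not as regex fragments.
sub build_tag_regex {
    my (@tags) = @_;
    my $alternation = join '|', map { quotemeta } @tags;
    return qr{<\s*(/?\s*(?:$alternation)\b[^>]*)>}is;
}

# Replace the angle brackets of every matching tag with entities,
# leaving the tag text itself untouched so it can be restored later.
sub defang {
    my ($value, $regex) = @_;
    $value =~ s/$regex/&lt;$1&gt;/g;
    return $value;
}

my @bad_tags = qw( script embed applet object );   # stand-in for the pas.conf value
my $regex    = build_tag_regex(@bad_tags);

print defang('<b>hi</b><script src="x.js">evil()</script>', $regex), "\n";
```

Since the names go through `quotemeta`, a setting of '.*' would not mean "all tags" here; supporting that would need either a special case or a decision to trust the configured values as regex fragments.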