filterproxy-devel Mailing List for FilterProxy
Brought to you by: mcelrath
Messages per month:

Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
2001 |  -  |  -  |  2  |  1  |  1  |  2  |  2  | 19  |  1  |  5  |  2  |  -
2002 |  9  |  -  |  3  |  5  | 15  |  1  |  4  |  3  |  1  |  1  |  -  |  -
2003 |  -  |  1  |  -  |  1  |  -  |  -  |  -  |  -  |  -  |  2  |  -  |  -
2006 |  1  |  1  |  3  |  -  |  -  |  -  |  -  |  -  |  -  |  2  |  1  |  -
2007 |  -  |  1  |  1  |  1  |  1  |  -  |  -  |  -  |  -  |  -  |  -  |  -
2008 |  -  |  1  |  -  |  -  |  -  |  -  |  -  |  -  |  -  |  -  |  -  |  -
2009 |  -  |  -  |  -  |  -  |  -  |  -  |  -  |  -  |  -  |  -  |  -  |  1
From: get-films<ad...@ge...> - 2008-02-02 14:40:00
[HTML-only advertisement: a banner image (http://img148.imageshack.us/img148/4692/banner2ci6.gif, alt "Get-Films.Com") linking to http://www.get-films.com.]
From: <toi...@ya...> - 2007-05-30 10:37:08
--------------------
A message has arrived.
- [name] - 29 years old - Looking for a sex friend - No payment wanted
"Can we meet soon? Please contact me directly."
http://www.10wn.com/h/?hiroba2
--------------------
From: <rtb...@ya...> - 2007-04-10 21:40:41
<Sender> Lwh...@gm...
Unsubscribe contact address: Lwh...@gm...
*Please be sure to read this. This newsletter is delivered mainly to
Yahoo! mail addresses. When you contact us to unsubscribe, please reply
from your Yahoo! mail address as the sender; otherwise the unsubscription
cannot be processed correctly.

[Today's verse] "Spring orchid -- have you come into full bloom?"

The much-talked-about site featured in numerous women's magazines has at
last begun recruiting male members in earnest!
http://black-6969.org.uk/pc/?ps4in1 (*verified that you can actually meet)

A first announcement for you, the reader of this mail: completely free
and full of amateurs! You can even register with a free mail address, so
it's safe! The girl next door and the married woman down the street have
all signed up casually! By the way, I met someone myself (lol).
http://black-6969.org.uk/pc/?ps4in1 (*verified that you can actually meet)

Spring is only just getting started!! You should come into full bloom too!!
From: <ekq...@ya...> - 2007-03-03 06:07:34
<Sender> arz...@gm...
Unsubscribe contact address: arz...@gm...
*Please be sure to read this. This newsletter is delivered mainly to
Yahoo! mail addresses. When you contact us to unsubscribe, please reply
from your Yahoo! mail address as the sender; otherwise the unsubscription
cannot be processed correctly.

=====[PR] Meeting someone every day? http://www.3313ac.com/ [PR]=====

Latest campaign news for March 2007!! Newly updated March 2!! A big
update of free dating-introduction sites!!

=====[PR] Big busts are best after all! http://www.1132ac.net/ [PR]=====

* March 2: latest news -- we have launched new sites, collected with a
focus on married women, sex friends, and being completely free.
[Kyoko] 31: "Rather than a sprint, I like long sessions, like a marathon."
[Mina] 34: "Honestly, what I want is physical, so I'm sorry if that's no
good for you." A direct hit, if I say so myself.
Hall of Fame of Sex Friends http://www.3313ac.net/
SEX & Friends http://www.3313ac.net/sex/

* February 28: thanks to the huge response, the big established
neighborhood-dating sites have started asking to be introduced in this
newsletter. 1: famous sites you have heard of at least once. 2: the
reassurance that only a major operator gives. 3: married women, office
ladies, college students -- and you can choose the region, so pick
"the neighborhood" for convenience or "the next prefecture over" for
safety, as you like. Free dating sites, carefully selected and
introduced in bulk!! As a token of gratitude to our readers...
-> Meeting someone every day? http://www.3313ac.com/

=====[PR] The royal road of free dating http://www.1234av.com/ [PR]=====

Completely free, so you are absolutely sure to meet someone!
Hall of Fame of Sex Friends http://www.3313ac.net/
Participating women: Aya (25), Mayumi (25), Shiori (29).
Big busts are best after all! http://www.1132ac.net/
Participating women: Shiho (25), Shoko (24), Yuri (23).

=====[PR] Popular from today http://www.8848ac.net/ [PR]=====

Completely free, so you are absolutely sure to meet someone!
The royal road of free dating http://www.1234av.com/
Participating women: Satoko (23), Yui (21), Tomomi (24).
Popular from today http://www.8848ac.net/
Participating women: Ayumi (32), Akemi (29), Aoi (20).
From: <mmk...@ya...> - 2007-02-21 13:37:59
Nothing will begin if you are afraid... This community is run targeting
women who are interested in virgin men, so exchanges go smoothly...
there is nothing to be afraid of. http://friend-pop.com/1070

Here we introduce part of the story of a man who used to be a virgin:
"At first I was suspicious, and since I was still a virgin at my age I
thought I would never have sex, but... I never imagined I could lose it
for free... and what's more... (rest omitted)"
If you want to read the rest, go here: http://friend-pop.com/1070

Of course, most couples keep seeing each other even after the man's
first time. From married women to single women... please search our
broad range for the woman you are hoping for.
http://friend-pop.com/1070

Registration in this community is completely free.
To refuse delivery: re...@ok...
From: <ET...@HO...> - 2006-10-26 00:09:37
It's getting chilly, isn't it? The season for craving the warmth of
another. The definitive friend-finding special! http://o-oooo-o.com/ccc/
From: <FB...@HO...> - 2006-10-14 22:51:12
Hello. http://livein21th.com/ze This is an announcement from
Ikuze-Kuruze. We deliver information that gives you an opening to widen
your community.
From: <ao...@ho...> - 2006-03-23 19:11:48
Studious gals are on the increase! http://j-movie.org/fresh/
Inquiries: xi...@wa...
From: <sa...@ya...> - 2006-03-14 08:15:25
Are the office ladies at your company all right? Please check.
[Latest issue of newWEBnews] http://fragrantofsocks.com/kokuchi/
Inquiries: da...@co...
From: <sa...@ya...> - 2006-03-13 15:31:07
"Reverse compensated dating" has quietly become a boom lately -- what is
the truth behind it? http://staywithyou.be/ccc/
Inquiries: faf...@12...
From: <in...@xg...> - 2006-02-08 19:51:53
Who is this?? I received a mail from you, but there was no message text.
Are you a man? My name is Himeri. I wonder who you could be. It's
bugging me a little, so I'll wait for a reply.
http://xgcom.info/lvc/ 15
If you don't remember: in...@ok...
From: Gisle A. <gi...@Ac...> - 2003-10-15 19:25:54
Bob McElrath <bob...@mc...> writes:

> 'http_connection' should be set by HTTP::Headers, or is there a better
> way to get this header in Methods.pm? (The previous line checked for
> 'http_te' so I just followed suit...)

There is no HTTP::Headers involved at the Net::HTTP level.

> Nonetheless I think the response I am seeing (after proxy munging) is
> valid so shouldn't generate an error.

I agree with that, and I will apply a patch that works and comes with a
test that demonstrates that it works.

Regards,
Gisle
From: Bob M. <bob...@mc...> - 2003-10-14 20:28:29
Gisle Aas [gi...@Ac...] wrote:
> Bob McElrath <bob...@mc...> writes:
>
> > Gisle Aas [gi...@Ac...] wrote:
> > > Bob McElrath <bob...@mc...> writes:
> > >
> > > > This version doesn't cause a warning if the "Connection" header
> > > > isn't present.
> > >
> > > Perhaps this warning shows because nothing sets 'http_connection'.
> >
> > Of course. The warning was a bug in my original patch. The patch is
> > trying to fix the transfer-encoding bug indicated below.
>
> In the source I have there is nothing at all that sets or references
> the 'http_connection' that your patch tests for. What version of LWP
> are you patching against?

That is why I checked if it was defined first. The spec says that not
having a Transfer-Encoding of 'chunked' last is o.k. if there is also a
Connection: close. 'http_connection' should be set by HTTP::Headers, or
is there a better way to get this header in Methods.pm? (The previous
line checked for 'http_te' so I just followed suit...)

> Without the patch I get:
>
> [gisle@eik lwp5]$ lwp-request -USe http://larve.net/people/hugo/2000/07/ml-mutt | head -30
> GET http://larve.net/people/hugo/2000/07/ml-mutt
> User-Agent: lwp-request/2.01
>
> GET http://larve.net/people/hugo/2000/07/ml-mutt --> 200 OK
> Cache-Control: max-age=21600
> Connection: close
> Date: Tue, 14 Oct 2003 16:09:12 GMT
> ETag: "8buij7:ub1p6un0"
> Server: Jigsaw/2.2-20010823 jre/1.2.2_009 javacomp/1.2.15
> ...
>
> The TCP dump shows this conversation:
>
> GET /people/hugo/2000/07/ml-mutt HTTP/1.1
> TE: deflate,gzip;q=0.3
> Connection: TE, close
> Host: larve.net
> User-Agent: lwp-request/2.01
>
> HTTP/1.1 200 OK
> Cache-Control: max-age=21600
> Connection: close
> Date: Tue, 14 Oct 2003 16:09:12 GMT
> Transfer-Encoding: deflate,chunked
> Opt: "http://www.w3.org/2000/P3Pv1";ns=11
> Content-Location: http://larve.net/people/hugo/2000/07/ml-mutt.html
> Content-Type: text/html;charset=us-ascii
> Etag: "8buij7:ub1p6un0"
> Expires: Tue, 14 Oct 2003 22:09:12 GMT
> Last-Modified: Thu, 16 Jan 2003 04:11:55 GMT
> Server: Jigsaw/2.2-20010823 jre/1.2.2_009 javacomp/1.2.15
> 11-PolicyRef: /2000/08/p3p-policyref
>
> 400
> [binary data]
>
> Are you perhaps talking via a proxy server of some kind?

Ah... there is a 'transparent' caching proxy server for the campus. Not
too transparent, eh? From other hosts not behind the UW proxy I see the
same behavior you see. LWP 5.65 - 5.69 on debian and redhat.

Nonetheless I think the response I am seeing (after proxy munging) is
valid, so it shouldn't generate an error.

Cheers, Bob McElrath [Univ. of California at Davis, Department of Physics]

"Knowledge will forever govern ignorance, and a people who mean to be
their own governors, must arm themselves with the power knowledge gives.
A popular government without popular information or the means of
acquiring it, is but a prologue to a farce or a tragedy or perhaps
both." -- James Madison
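[A minimal Perl sketch of the check being debated in this thread -- not
the actual LWP patch. It assumes a plain hash of response headers keyed
by lowercased, underscored header name; the helper name is made up.]

    # Accept a response whose Transfer-Encoding does not end in 'chunked'
    # when the server signals the end of the body with Connection: close.
    sub transfer_encoding_ok {
        my ($headers) = @_;
        my $te = defined $headers->{'transfer_encoding'}
               ? lc $headers->{'transfer_encoding'} : '';
        my @codings = split /\s*,\s*/, $te;
        return 1 if !@codings || $codings[-1] eq 'chunked';  # normal case
        my $conn = defined $headers->{'connection'}
                 ? lc $headers->{'connection'} : '';
        return 1 if $conn =~ /\bclose\b/;  # close marks end of body
        return 0;                          # otherwise a protocol error
    }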
From: Bob M. <mce...@dr...> - 2003-04-23 18:22:00
dar santos [ju...@ho...] wrote:
> Yes. Got it running. It seems so weird that I'd rather not spend my
> time tracing the problem. I reinstalled woody and used a different http
> source to update apt-get, and there it was. Maybe the initial source I
> got had some un-updated packages. Well, anyway. Congrats. It's
> impressive and fast, knowing that it's based on perl. Its performance
> is comparable to any C-based application. Great work. Now I can start
> on what I had initially planned: porting it to win32.

I'm glad you like it. :)

> Anyway, I noticed the ImageComp module (based on imagemagick convert, I
> presume). If I'm not mistaken, it is not activated by default and the
> user has to be the one to make it function.

That module was contributed without a config page, and I have never
really used it, so it is kind of decaying... I wrote a config page for
it though and put it in CVS. It will be in the next release, when I get
around to it...

> I don't know if you would be interested -- correct me if I'm wrong --
> but the whole idea of http compression (text/html, xml, etc.), like the
> implementation of mod-gzip, would be less significant to those using
> dialup. Most modems have hardware compression, and there is also
> software-based compression (Stac, LZS, MPPC). To compress
> already-compressed (gzip-encoded) content would be useless, if not add
> overhead to the browsing process. I came across some data where,
> instead of gzip compression, html and other content is parsed or
> rewritten (I don't know if that is the right term). I tested it several
> times and it's really impressive: the size reduction is up to 40-50%,
> and the output is still plain html (not compressed). If you are
> interested in this, I would gladly look up the data again and send it.

On the contrary, the speedup over a modem is simply astounding. I don't
fully understand why, but it is visibly faster (by my measurements, 5
times faster or more). You have to have FilterProxy running on a
fast-connected server so that it can feed compressed stuff over the
modem. Just try it. ;)

I think gzip is a more efficient algorithm for compressing text than any
used by a modem (typical compression ratios for HTML are 5x to 10x). Not
only that, but modems suffer from latency. It can take 300ms to fetch a
0 byte file from a server. Image-heavy pages are the pits over a modem.
By removing ads, FilterProxy typically reduces the number of connections
your browser needs to make to render the page, thereby speeding it up
significantly.

It is possible to also parse HTML and rewrite it to be smaller. The
typical HTML file contains a lot of whitespace, comments, etc. that can
be removed without changing the appearance of the page. However, parsing
HTML is extremely CPU intensive. In my tests it would take several
seconds to do this on a modern CPU. There are other tools out there that
do this (even perl modules). If you are interested in pursuing this, I
would definitely accept such a module, but I think it would be slow. The
slowness of parsing HTML is why I chose a regex-based method to strip
ads. If I used a full HTML parser (like the perl module HTML::Parser) it
would be extremely slow.

> Thanks very much, and I'll write as soon as I can manage to make your
> program run on win32 -- or win64, that is.

Great! ;)

Cheers, Bob McElrath [Univ. of Wisconsin at Madison, Department of Physics]

"You measure democracy by the freedom it gives its dissidents, not the
freedom it gives its assimilated conformists." -- Abbie Hoffman
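[A sketch of the whitespace/comment stripping described above --
a hypothetical helper, not a FilterProxy module. A real implementation
would have to protect <pre>, <textarea>, and <script> blocks, which this
sketch does not.]

    sub shrink_html {
        my ($html) = @_;
        $html =~ s/<!--.*?-->//gs;   # drop HTML comments
        $html =~ s/[ \t]+/ /g;       # collapse runs of spaces and tabs
        $html =~ s/\n{2,}/\n/g;      # collapse blank lines
        return $html;
    }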
From: Bob M. <bob...@mc...> - 2003-02-07 15:50:19
R.D. Hammond [mu...@da...] wrote:
> I'm a little confused by your reply. Do you advocate a newer-than-0.3
> version of FP? Or an older version of HTML::Mason?
>
> A cvs of what? From where? Debian packages aren't much good outside of
> debian afaik. I'm on NetBSD 1.6.1 for m68k and i386.
>
> http://filterproxy.sourceforge.net/ should be updated if 0.3 isn't
> current.

I just checked my FP dir and found a pile of changes over the past year
that I just committed. (Look in ChangeLog.) You may want to grab the
latest CVS and let me know if you have trouble. (Also try the
'configure' script I wrote.)

I think I may have time to make a release this weekend or next. The
reason I haven't made a release in a long time is that there have been
complaints that upgrading is a bitch due to merging my conf file with
the user's on upgrade. I want to fix this -- any ideas? Anyone want to
work on it? ;)

Cheers, Bob McElrath [Univ. of Wisconsin at Madison, Department of Physics]

"You measure democracy by the freedom it gives its dissidents, not the
freedom it gives its assimilated conformists." -- Abbie Hoffman
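[One possible answer to the conf-merging question above, sketched in
Perl: treat the shipped config as defaults and overlay the user's saved
values, so new options appear after an upgrade without clobbering user
settings. The helper is hypothetical, not existing FilterProxy code.]

    sub merge_config {
        my ($defaults, $user) = @_;
        my %merged = %$defaults;             # start from the new defaults
        for my $key (keys %$user) {
            $merged{$key} = $user->{$key};   # user settings always win
        }
        return \%merged;
    }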
From: Bob M. <mce...@dr...> - 2002-10-21 23:08:16
Piotr Duszyński [do...@wp...] wrote:
> Hello!
>
> I've tried to use your FilterProxy progs, but there's a new HTML-Mason
> module in CPAN and there's some message:
>
> Can't locate object method "new" via package "HTML::Mason::Parser"
> (perhaps you forgot to load "HTML::Mason::Parser"?) at ./FilterProxy.pl
> line 203.
>
> Can you fix this problem?

Fixed already, please grab FilterProxy from CVS or from the Debian
package page:

http://ftp.debian.org/debian/pool/main/f/filterproxy/filterproxy_0.30-4.tar.gz

until I get off my butt and release a new version.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics

"The surest way to corrupt a youth is to instruct him to hold in higher
esteem those who think alike than those who think differently."
-- Nietzsche
From: Bob M. <mce...@dr...> - 2002-09-27 03:16:44
Glenn [gl...@so...] wrote:
> Hello, I was wondering if perhaps you have a few minutes to help me get
> FilterProxy working. If not, don't worry, I'm sure you're busy!
>
> After downloading FilterProxy, I installed the necessary packages
> (libxml, libxslt) from my Mandrake CDs, then started up the CPAN shell
> to install the Perl modules. After some hair pulling, I *think* I have
> everything installed correctly, although HTML::Mason has been giving me
> fits. Apparently it has some dependencies that it doesn't report, but
> at this point, I can run "test HTML::Mason" from the CPAN shell, and it
> reports "OK" at the end. However, when I try running FilterProxy.pl,
> this is what I get:

If you're feeling adventurous, grab the 'configure' script out of CVS.
It should install any necessary dependencies for you when run.

> [glenn@rainwalker FilterProxy-0.30]$ ./FilterProxy.pl
> Loaded module: Compress
> Loaded module: DeAnim
> Loaded module: Header
> Module ImageComp not loaded because:
> Can't locate Image/Magick.pm in @INC (@INC contains: .
> /usr/lib/perl5/5.6.1/i386-linux /usr/lib/perl5/5.6.1
> /usr/lib/perl5/site_perl/5.6.1/i386-linux /usr/lib/perl5/site_perl/5.6.1
> /usr/lib/perl5/site_perl /home/glenn/FilterProxy-0.30/FilterProxy) at
> /home/glenn/FilterProxy-0.30/FilterProxy/ImageComp.pm line 9.
> BEGIN failed--compilation aborted at
> /home/glenn/FilterProxy-0.30/FilterProxy/ImageComp.pm line 9.
> Compilation failed in require at (eval 30) line 1.
> BEGIN failed--compilation aborted at (eval 30) line 1.
> Loaded module: Rewrite
> Loaded module: Skeleton
> Loaded module: Source
> Module XSLT not loaded because:
> Can't locate XML/LibXSLT.pm in @INC (@INC contains: .
> /usr/lib/perl5/5.6.1/i386-linux /usr/lib/perl5/5.6.1
> /usr/lib/perl5/site_perl/5.6.1/i386-linux /usr/lib/perl5/site_perl/5.6.1
> /usr/lib/perl5/site_perl /home/glenn/FilterProxy-0.30/FilterProxy) at
> /home/glenn/FilterProxy-0.30/FilterProxy/XSLT.pm line 27.
> BEGIN failed--compilation aborted at
> /home/glenn/FilterProxy-0.30/FilterProxy/XSLT.pm line 27.
> Compilation failed in require at (eval 34) line 1.
> Can't locate object method "new" via package "HTML::Mason::Parser"
> (perhaps you forgot to load "HTML::Mason::Parser"?) at ./FilterProxy.pl
> line 203.
> BEGIN failed--compilation aborted at (eval 34) line 1.
> [glenn@rainwalker FilterProxy-0.30]$
>
> I didn't install the ImageMagick module on purpose (plus it fails when
> I try), and the XML::LibXML and XML::LibXSLT modules also fail when I
> tried to install them.

The above errors relating to ImageMagick and XML... are not fatal, and
FilterProxy will still run. (I made it not print those messages unless
debug is on for the next version -- everyone thinks something is
horribly wrong when they see them.)

> Do you have any advice for me to try? I'm pretty much out of ideas,
> which is unfortunate, because I'd really like to give FilterProxy a
> try. Anyway, thanks for your time-

I have made the fixes to HTML::Mason (they changed some interfaces out
from under me). You can either get the code out of CVS, or grab the
debian tarball:

http://ftp.debian.org/debian/pool/main/f/filterproxy/filterproxy_0.30-4.tar.gz

which has this fix. I'll try to release a new version soon.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics

"The purpose of separation of church and state is to keep forever from
these shores the ceaseless strife that has soaked the soil of Europe in
blood for centuries." -- James Madison
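[The "Module ... not loaded because:" lines above come from loading each
filter module inside an eval, so a missing dependency is non-fatal. A
minimal sketch of that pattern; the module list and $DEBUG flag are
illustrative, not FilterProxy's exact code.]

    my $DEBUG = 1;
    for my $mod (qw(Compress DeAnim Header ImageComp Rewrite Skeleton Source XSLT)) {
        eval "require FilterProxy::$mod";     # failure sets $@, doesn't die
        if ($@) {
            warn "Module $mod not loaded because:\n$@" if $DEBUG;
        } else {
            print "Loaded module: $mod\n";
        }
    }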
From: Bob M. <mce...@dr...> - 2002-08-27 16:29:42
Pilaszy Istvan [pi...@hs...] wrote:
> Hi!
>
> I'm using FilterProxy 0.29, and I'm very satisfied with it. I think it
> has very good source code, and it is very well developed. I use it with
> wwwoffle, and I have some suggestions, which you might apply to
> filterproxy if you like them.

Sure! BTW the latest version is 0.30, released in January. 0.31 is
coming Real Soon Now...

BTW I just looked at the wwwoffle page... it does some interesting
things. It would be interesting to incorporate wwwoffle's functionality
into FilterProxy. ;)

> 1.
> I want to redirect some URLs to others, and I want to use some aliases
> too. (wwwoffle can do redirection, but only with fixed strings; I can't
> use regexps.)

I have thought about this too... from my TODO file:

    Mapper module: works at orders 10,-10 and re-writes URLs. For
    instance: Get "printer-friendly" versions of articles at various
    news sites. Block requests to known advertisers' domains that may
    have slipped through Rewrite/BlockBanner. (variables useful here)
    Example: http://www.byte.com/column/(\w+) ->
             http://www.byte.com/printableArticle?doc_id=$1

> For example:
> I want to redirect http://i/ to http://info.sch.bme.hu/
> I want to alias http://*/robots.txt to http://127.0.0.1:8888/robots.txt.

Are you using a robots.txt file with FilterProxy? Why? Have spiders
found your FilterProxy? A robots.txt file might be something good to
include with the FilterProxy distribution, for those that don't run with
'localhostonly'.

> redirect: the proxy responds 302 with the right location, and then
> netscape tries to get the new url.
> alias: no redirection; netscape sees it as if http://*/robots.txt
> existed, and does not know that it receives a local file.

I think it is best to always use redirection, so that netscape always
sees the actual location of the file. http://*/ is not a valid URL as
specified in the RFC, and might get you into trouble.

> I made a solution for redirect, but it needed some modification to
> FilterProxy.pl.
>
> I introduced $CONFIG->{order} = [ 0 ]; and I wrote a new module: Alias.
>
> Alias::filter compares the requested URL to some strings and rewrites
> the URL if needed; if it modified the URL, it returns(!) a new
> HTTP::Response object. The handle_filtering method realizes that it
> gave out an HTTP::Request object and received an HTTP::Response object
> back, and returns this object. The caller compares this to the original
> $req object, ...
> This call is before
>     $res = $agent->request($req, \&data_handler);
>
> I made here a branch:
>
>     # Send the request.
>     my $res;
>     my($req_new) = &handle_filtering($req, 0);
>     if($req_new == $req) {
>         $res = $agent->request($req, \&data_handler);
>     } else {
>         $res = $req_new;
>     }

You've got the right idea, but this will fail if a filter modified $req
(the Header module, for instance).

> I made this ugly hack because I could not solve in any other way how to
> replace an HTTP::Request object with an HTTP::Response object in a
> module. (With pointers to $req, it would be simpler.)

I think redirects should be done at the -10 order, and a way to check
is:

    if($req_new =~ /HTTP::Request/) {
        # ...do request stuff... (FilterProxy.pl 0.30 lines 636..696)
    } elsif($req_new =~ /HTTP::Response/) {
        # our request was changed to a response. Send it straight to
        # the client
        # ...execute the code starting after line 696
    }

> (Really I made 2 modules, because I could not implement the s///
> operation properly, and the first module was not able to turn
> http://*/robots.txt into http://127.0.0.1:8888/robots.txt.
> The second module has one parameter and executes it as a perl program,
> and this perl program can substitute the strings.)
>
> If you would like to see the modules and the .HTML files (for
> configuration), I will post them.

Yes, please.

> May $CONFIG->{order} = [ 0 ] be used for an internal response (i.e. no
> outer proxy or server will be asked to give the reply)?

I don't think I've used order 0 for anything yet... But it seems to me
that redirects should occur first thing (order -10), and filter orders
-9..-1 will never see the request, since it got redirected anyway. Any
modifications that filters -9..-1 would make will be lost anyway once it
gets redirected. Filter orders 1..10 should see the redirect response
generated by your module.

> 2. Rewrite.pm
> I needed the 's///' regex for rewrite, and I realized that it is very
> easy to implement: I choose the area with the matchers etc., and then I
> apply the regex to this area. I introduced a new keyword 'apply_regex'
> alongside 'as'. Here are the modifications:
>
> (for filterproxy 0.29)
> ...
>     } elsif(($operation eq "rewrite") and ($keyword eq "as" ||
>             $keyword eq "apply_regex")) {
>         last;
>     } else {
> ...
>
>     } elsif($operation eq 'rewrite') {
>         if($filter =~ /\G\s*(.*)$/g) {
>             my($replacement) = $1;
>             if($FilterProxy::CONFIG->{debug}) {
>                 logger(DEBUG, "    Rewrite rewriting by rule $key: '",
>                        substr($$content_ref, $start, $end-$start),
>                        "' ".$keyword." '",   # !!!
>                        $replacement, "'\n");
>             }
>             $nsuccess++;
>             if($keyword eq "apply_regex") {   # !!!
>                 my($what) = substr($$content_ref, $start, $end-$start);
>                 eval "\$what =~ $replacement";
>                 substr($$content_ref, $start, $end-$start) = $what;
>                 pos($$content_ref) = $start + length($what);
>             } else {
>                 substr($$content_ref, $start, $end-$start) = $replacement;
>                 pos($$content_ref) = $start + length($replacement);
>             }
>         } else {
> ...
>
> It's insecure (because of the eval), I know, but I think it is very
> useful for making tricky modifications, and with some checking it can
> be made secure.

This looks good to me. Do you think you could generate a diff for the
above changes?

    diff FilterProxy-0.29/Rewrite.pm FilterProxy/Rewrite.pm

I think this could also be adapted to implement a feature requested on
the sourceforge site: to have the submatch variables $1, $2, ... work
inside rules. Could you send me some example rules you wrote using
apply_regex? I wonder if it wouldn't be better to add this functionality
to the existing 'regex' finder:

    rewrite tag <font size=-1> regex s/-\d+/0/

> My goal is to make it possible for every image to have a link, and when
> I click on this link, it disables loading that image forever. And when
> I have a long list of images to be disabled, I will be able to write
> general rules about which images to disable.

What about images that are already a link? I wonder if you could use
some javascript magic like onMouseOver and onClick to send a message to
FilterProxy, in a way that wouldn't interfere with the web page's normal
operation. What do you have against images? ;)

Your ideas look good. If you want to generate diffs against 0.30, we can
further refine them and I will include them in the next release.
Also you may want to join the filterproxy-devel list.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics

"No nation could preserve its freedom in the midst of continual
warfare." -- James Madison, April 20, 1795
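[For comparison, a sketch of how the string eval in 'apply_regex' could
be avoided: parse the s/pattern/replacement/flags rule once, then apply
it with ordinary operations. A hypothetical helper, not the patch above;
the replacement text is treated literally here, so submatch variables
like $1 would need extra handling.]

    sub apply_rewrite_rule {
        my ($text, $rule) = @_;              # $rule like 's/-\d+/0/g'
        my ($pat, $rep, $flags) =
            $rule =~ m{^s/((?:[^/\\]|\\.)*)/((?:[^/\\]|\\.)*)/(\w*)$}
            or return $text;                 # malformed rule: no-op
        my $re = $flags =~ /i/ ? qr/$pat/i : qr/$pat/;
        if ($flags =~ /g/) {
            $text =~ s/$re/$rep/g;           # $rep inserted as a literal
        } else {
            $text =~ s/$re/$rep/;
        }
        return $text;
    }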
From: Bob M. <mce...@dr...> - 2002-08-15 20:52:39
Reinier Post [rp...@wi...] wrote:
> On Thu, Aug 15, 2002 at 10:01:51AM -0500, Bob McElrath wrote:
> > It sounds like you want XSLT. FilterProxy has an XSLT module.
>
> Yes, I think I'll use XSLT, but I want more.
>
> > > So far all I've done is install and start FilterProxy. The
> > > configuration screens more or less appear to work, but I haven't
> > > discovered how to actually filter something yet.
> >
> > Well, just point your browser at the proxy and everything that goes
> > through it is filtered... ;)
>
> IE 6 hangs on me on every document I tried after the first few.
> Mozilla has no problem. I would like to use this thing in reverse proxy
> mode though, but it's probably best to put Squid or the Apache proxy in
> front of it.

This was just reported to me a few days ago. Attached is a patch that
may fix it. Please apply it to 0.30 and let me know the results.
Warnings may appear in the log file. If you can, please send them to me;
it will help debug the problem. (I don't use any MS stuff...)

The guy that reported it sent me some debug logs. At this point it looks
like IE is making a request and then very quickly closing the
connection, which isn't nice.

> > I've tried to self-document the Skeleton module to make it easier for
> > people writing new modules. You should take a look at that if you
> > want to write modules. But it sounds like using the existing XSLT
> > module might be more what you're looking for.
>
> Except that XSLT is hairy ...

Yeah. That's definitely true. ;)

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics

"No nation could preserve its freedom in the midst of continual
warfare." -- James Madison, April 20, 1795
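[FilterProxy's XSLT module is built on the XML::LibXSLT CPAN module; for
readers who have never touched it, a minimal standalone transform with
that library looks like this. The file names are placeholders, and this
is not FilterProxy's own code.]

    use XML::LibXML;
    use XML::LibXSLT;

    my $parser     = XML::LibXML->new();
    my $xslt       = XML::LibXSLT->new();
    my $source     = $parser->parse_file('page.xml');      # document to filter
    my $style_doc  = $parser->parse_file('strip-ads.xsl'); # your stylesheet
    my $stylesheet = $xslt->parse_stylesheet($style_doc);
    my $results    = $stylesheet->transform($source);
    print $stylesheet->output_string($results);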
From: Bob M. <mce...@dr...> - 2002-08-03 16:02:46
Adam Duck [du...@in...] wrote:
> Hello.
>
> Oh well, I was using FilterProxy all the time under Linux and have now
> switched to FreeBSD. I have perl 5.8.0 installed and grabbed
> HTML::Mason from the ports collection (p5-HTML-Mason-1.12.1) -- but
> CPAN has the same problems; I installed that, too, afterwards
> (bsdpan-HTML-Mason-1.1201). Anyway, FilterProxy says:
>
> Can't locate object method "new" via package "HTML::Mason::Parser"
> (perhaps you forgot to load "HTML::Mason::Parser"?) at
> ./FilterProxy.pl line 203.
>
> which is correct, because Mason/Parser.pm says:
>
> die "The Parser module is no longer a part of HTML::Mason. Please see ".
>     "the Lexer and Compiler modules, its replacements.\n";
>
> So, where do I go from here?

Yeah, I know. The Mason people changed their interface (grr). I haven't
had a chance to look at it yet. If you do have a chance to look at it,
I'll happily accept a patch. ;) It might be as simple as
s/Parser/Lexer/g.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics

"No nation could preserve its freedom in the midst of continual
warfare." -- James Madison, April 20, 1795
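[For reference, the interface change at issue: Mason 1.0x built an
Interp around an explicit Parser object, while Mason 1.1x dropped the
Parser in favor of Lexer/Compiler objects that Interp constructs itself.
A sketch of the two styles; the paths are placeholders, and this is not
FilterProxy's actual initialization code.]

    use HTML::Mason;

    # Old (Mason 1.0x) style, which FilterProxy.pl 0.30 still used:
    #   my $parser = HTML::Mason::Parser->new;
    #   my $interp = HTML::Mason::Interp->new(parser    => $parser,
    #                                         comp_root => '/path/to/html',
    #                                         data_dir  => '/path/to/data');

    # New (Mason 1.1x) style -- no Parser object at all:
    my $interp = HTML::Mason::Interp->new(comp_root => '/path/to/html',
                                          data_dir  => '/path/to/data');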
From: Bob M. <mce...@dr...> - 2002-07-31 17:08:48
dotson [do...@re...] wrote:
> Bob,
> I've run the proxy on my Unix build machine. Next I am going to attempt
> to install it under Cygwin and then under XP itself. Can you point me
> at any info regarding these installs?

Sorry, I know next to nothing about Windoze. I did receive a report a
long time ago that someone had gotten it working under windows, but more
recently someone mentioned that it just doesn't work. :( I don't know.
Please let me know your success/failure.

> Have you turned your proxy into a web service? I did something like
> this years ago with PICS for a company called NetShepherd. We used PICS
> definitions to send rules to the server, which then expanded selections
> based on rules. It wasn't about censorship (which is what everybody in
> the press thought -- we weren't netNanny) but trying to build
> intelligent communities of information based on filtered rules.

There was a company that tried to turn FP into a service. (kplab.com;
they called the service "linearC".) They have gone under, however, part
of the internet boom. They wrote new modules too that I haven't seen.
Unfortunately, since they only operated a service and never distributed
a product, they aren't obligated to release their code. :(

There are others though that run spam-filtering services:
http://www.eclipse.net/adfilter/ (junkbuster) -- free, even. (ok, that's
the only one I found)

> Anyway, I've been screen scraping and annotating documents with
> xml/xslt and have thought about some type of proxy filter to speed
> things up (although I'm not convinced that it will).

So what exactly are you doing with xslt? If you're rewriting/stripping
ads and such, I'd be interested to see your stylesheets. While the XSLT
module was implemented, I don't have much in the way of stylesheets.

I hope in the next version to implement a central server, so when you
write rules you can choose to upload them to the mothership. You can
then also go to sourceforge and browse other people's rules, and
download them into filterproxy. This may go toward your "community"
idea. Right now it's too much trouble to copy rules in and out of
filterproxy and mail them to the -devel or -users list. Practically,
nobody sends me rules. :(

So do you want to give me a brief spiel on what PICS is and why it might
be useful for FilterProxy? I looked at w3.org/PICS briefly, but I don't
immediately see how it could relate to ad-filtering. (It seems like it
would duplicate my URL regex functionality.)

If you're interested in any of the above ideas and know perl, I could
use some help. ;) I'm pretty busy these days. :( You know, that whole
graduating thing. Trouble.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics

"No nation could preserve its freedom in the midst of continual
warfare." -- James Madison, April 20, 1795
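[The "URL regex functionality" mentioned above, in miniature: match each
requested URL against a blocklist of patterns before fetching. The
patterns here are illustrative only; FilterProxy's real rules live in
its configuration.]

    my @blocked = (
        qr{^https?://ads?\.},    # common ad-server hostnames
        qr{/banners?/}i,         # banner image paths
    );

    sub url_blocked {
        my ($url) = @_;
        return scalar grep { $url =~ $_ } @blocked;
    }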
From: Bob M. <mce...@dr...> - 2002-07-16 23:21:09
Andreas Banze [an...@ba...] wrote:
> I'm neither a perl nor a http crack, so I'm mainly searching for your
> help/advice:
>
> I need a proxy that allows me to check binary data for viruses. It
> seems that filterproxy is the right toy to do it.
>
> Problem: Because I'm not a http crack (and I didn't find it in the docs
> I read up to now): The mime type _should_ represent the correct
> description for the content, right? So if the content is identified by
> the mime header as text/html, it should not be usable as a binary when
> downloaded?

Well, remember that if you're worried about downloading malicious code
from a malicious server, the mime-types can be set to anything by the
server's operator. Heck, they could be "cartoon/mickeymouse". I presume
you're also worried about stupid users, who may download something, save
it to disk, and manually execute it. Then it doesn't matter what the
mime-type was. By the time it's saved to disk the mime-type is lost.

BTW identifying a windoze executable is easy. See the 'file' command
under any good unix. A FilterProxy module could check for the signature
of an executable first, and pass everything through that is not
executable. (should be fast)

> Or is the mimetype only a hint that may be overridden by the extension
> of the file (probably it'll not work in the browser, but with "save as"
> the file is usable)?

No browser should pay attention to the extension of the file. This is a
bad, bad violation of the HTTP spec, and such browsers should be burned
at the stake. There is NO INFORMATION in the URL about the file type.
That's what the Content-Type and Content-Encoding headers are for.

I believe IE uses extensions. Not sure about Netscape 4.x, but I
wouldn't be surprised. (But we're all using mozilla anyway,
right?!?!?!?! ;)

> If the first is correct, then writing a filter for filterproxy to scan
> all binaries for viruses would be correct. If the second is correct, I
> would need to add overhead by checking the filetype of every
> transferred file. (Overhead should be minimal, but unfortunately I have
> many users.)

You could write a filter to scan binaries for viruses. This would be
pretty simple. Don't trust the mime-type; filter ALL data that comes
through. (I think one famous IE hack is a mime-type text/html with an
extension .exe -- IE stupidly executes it.) But I've never really used
windows; I'm no expert on virii. The hard part is maintaining a database
of signatures for virus binaries. Do you have a ready source for such a
thing? (but see below)

> I didn't dig too much in your sources, but I think it is possible to
> exchange the mime header and the content while the proxy works (e.g. to
> exchange the binary header and the binary file with a webpage that
> states you've got a virus).

This should be no problem. Change the response to some HTML, or maybe
send a redirect.

> So for the second question: How good is filterproxy in terms of load
> and scalability? Is it mature enough to be used in such a way, or
> should I stop dreaming and get viruswall or another expensive content
> filtering system?

I don't know. As far as HTML filtering, I would not recommend
FilterProxy for large deployment. Filtering HTML is a CPU intensive task
(consider that it can take your browser ~seconds to load a complex
page). The majority of the load comes from parsing HTML. On my computer
FilterProxy takes ~0.05 seconds on most pages. So you do the math for
your situation; I think 20 users would be a practical maximum on this
machine. (That's on my 500MHz alpha -- recent AMD/Intel PCs are *much*
faster.) 99.9% of the time it takes is parsing HTML.

Now, just checking for virus signatures could be much faster than this.
But you should benchmark the signature-checking code to evaluate whether
FilterProxy would be fast enough. The HTTP proxy portion of FilterProxy
is very fast, but I have never benchmarked it with high load. Maybe turn
off all filtering, and use apache's dbench or something to test it. (If
you do any testing, please copy me the results ;)

> thanks in advance for any (even short) answer (sorry for not using the
> mailinglist, but I'm already on too many of them and for one question
> it's a little bit too much of an effort -- I promise that I'll
> subscribe for further questions.

No biggie. ;) I'm the only one that sends stuff to that list anyway.

Here's something a web search turned up: http://www.amavis.org/
It's a virus/mail scanner in perl. Shouldn't be hard to write a
FilterProxy module that uses it.

From a FilterProxy module, you can get the entire document in a $scalar.
If you want to write such a module, take a look at html/Skeleton.html
and FilterProxy/Skeleton.pm, which are heavily commented and should give
you a starting point.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
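[A sketch of the executable-signature check suggested above. DOS and
Windows executables begin with the two-byte magic number "MZ", so a
filter can cheaply pass everything else through untouched. Hypothetical
filter logic, not an actual FilterProxy module; scan_for_viruses() is a
made-up name for whatever scanner backend you plug in.]

    sub looks_like_windows_exe {
        my ($content_ref) = @_;                       # scalar ref, as in FP modules
        return 0 unless length($$content_ref) >= 2;
        return substr($$content_ref, 0, 2) eq 'MZ';   # DOS/PE magic number
    }

    # Usage inside a filter: only hand suspected binaries to the scanner.
    # scan_for_viruses($content_ref) if looks_like_windows_exe($content_ref);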
From: Bob M. <mce...@dr...> - 2002-07-12 17:27:52
klas jarnefelt [bl...@dj...] wrote:
> i got the old version working again with some help from charon, though
> i'm still not exactly sure what the problem was.

Great! My debian machine is far away right now, and I just realized I've
never actually done an apt-get install filterproxy. I tried it, and it
failed. :( I'll deal with it when I get home in a month.

> the error with ultimatemetal.com + compress module seems to be opera
> only. is there any way i can make it work with the compress module
> turned off?

Can you send me some log snippets or something? It might be useful to go
into the Header module and turn on dumping headers to the log file, and
send me that. The Compress module is enabled, and configured to filter
.* on the main page, right?

To get it to work without the Compress module, you have to configure
your browser not to send the Accept-Encoding: gzip header, which just
about all browsers send. Alternatively, you could delete this header
using FilterProxy's Header module. If the server sees that header (and
is configured to send compressed content), it will send compressed
content back to you.

> thanks a bunch for helping. =)

No prob.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
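[The header trick described above, sketched with HTTP::Request from LWP
(which FilterProxy is built on): removing Accept-Encoding before the
request goes upstream keeps the server from replying with gzipped
content. Illustrative only -- in FilterProxy itself you would do this
through the Header module's configuration rather than in code.]

    use HTTP::Request;

    my $req = HTTP::Request->new(GET => 'http://www.ultimatemetal.com/forum/');
    $req->header('Accept-Encoding' => 'gzip');    # what the browser sent
    $req->remove_header('Accept-Encoding');       # what the proxy can do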
From: Bob M. <mce...@dr...> - 2002-07-11 18:56:58
klas jarnefelt [bl...@dj...] wrote:
> so i installed filterproxy and all dependencies from debian sid, tried
> it out on a few pages and it seemed to be working. then i tried adding
> some filtering rules for ultimatemetal.com, but they wouldn't work.
> using the "edit filtering for this page" bookmark only showed garbage
> for source, and checking the process list i had a defunct filterproxy
> session that wouldn't die until i killed the other one. (note that
> filtering other pages than um.com still worked while the process was
> defunct)

Defunct processes are normal. They are not a problem. They are reaped
the next time filterproxy sees any activity (i.e. you load another
page). I initially had written filterproxy to reap these processes
immediately, but due to the non-re-entrancy of signal handlers in linux
and perl, this could cause crashes. So it's defunct processes or
crashes. ;)

The 'edit filtering' bookmark seems to work fine for me on this site
(for the URL http://www.ultimatemetal.com/forum/). I'm using FilterProxy
0.30. If you get garbage, it may be because the Compress module isn't
enabled. Chances are ultimatemetal.com is sending gzipped pages, and if
Compress isn't enabled then FP can't decompress them. This also means
that the pages won't be filtered! This is a common problem. I've updated
some of the docs to make this more clear.

> so i went back to reading your page, found some info about modules from
> cpan and figured installing those might solve my problem, but after
> installing a load of modules and their dependencies filterproxy
> wouldn't start at all:
>
> Can't locate object method "new" via package "HTML::Mason::Parser"
> (perhaps you forgot to load "HTML::Mason::Parser"?) at
> /usr/sbin/FilterProxy line 196.

Ack! They've made Mason incompatible with filterproxy! Well, the simple
solution is to fall back to the older version. Hopefully the debian
dependencies are correct and the debian package libhtml-mason-perl will
work. If not, report a bug against debian. The package maintainer is
ch...@de....

> so i added use HTML::Mason::Parser; right after use HTML::Mason; and
> get this instead:
>
> Starting FilterProxy: The Parser module is no longer a part of
> HTML::Mason. Please see the Lexer and Compiler modules, its
> replacements.
>
> i hope this is useful. i'd really like to get it working again -
> filterproxy is the only software i've found so far that does what i
> want.

In the next version I'll update FP to work with the new Mason. Thanks
for pointing this out.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
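[A sketch of the deferred-reaping strategy described above: reap with a
non-blocking waitpid() from the main loop instead of from a SIGCHLD
handler, since perl signal handlers of that era were not re-entrant
safe. Illustrative only; the function name is made up and this is not
FilterProxy's exact code.]

    use POSIX qw(WNOHANG);

    sub reap_children {
        # Non-blocking: collects any exited (defunct) children and
        # returns immediately when none are left.
        while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
            # child $pid reaped; its <defunct> entry leaves the process list
        }
    }

    # Called at the top of the accept/request loop rather than from %SIG:
    # reap_children();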
From: Bob M. <mce...@dr...> - 2002-06-27 15:10:22
Sean Maguire [ma...@mi...] wrote:
> Hello again,
>
> In response to your offer, pointers on matching stylesheet elements
> with rewrite rules would be welcome. I have not yet figured out how to
> set up an appropriate XSLT stylesheet (while I am sure it would be a
> valuable skill, I have not yet had any other need to deal with them),
> and have failed to have even the most explicit matches work on the code
> example (tested with lynx using FilterProxy).
>
> Since there is a fixed version of Mozilla out now, it is less of a
> concern. However, considering the potential for sneaking in unpleasant
> formatting with stylesheets, I would like to be able to effectively
> rewrite those elements also.

Well, the most straightforward thing to do would be to add a new
keyword. Each Rewrite keyword (tag/tagblock/regex) is a function in
FilterProxy/Rewrite.pm. The simplest similar function in this file to
use as an example is probably "regex". These functions are passed:

    my($content_ref, $spec_ref, $start, $end) = @_;

The first is a reference to the document. The second is a reference to
the rule being used. I use the /\G/g construct to parse $spec_ref, so
that pos($$spec_ref) should point to the next keyword after your new
keyword when your function is done. The $start and $end are the position
of any existing match. You should use pos($$content_ref) = $end; and
then m/\G/g to start searching where the last attempt left off.

This function should return 2 numbers, corresponding to the beginning
and end of the matched substring. If none is found, it should return
undef for both.

> Great program that FilterProxy, btw.

Thanks! BTW, did the regex rule I sent you in my last mail work?

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics
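[A sketch of a new Rewrite keyword written against the interface
described above: a hypothetical 'styleblock' matcher that finds the next
<style>...</style> block. It ignores $spec_ref (this keyword takes no
argument) but follows the stated contract of resuming the search at $end
via pos() and returning the match offsets, or undef for both when
nothing more matches. Not part of the actual Rewrite module.]

    sub styleblock {
        my ($content_ref, $spec_ref, $start, $end) = @_;
        pos($$content_ref) = $end;    # resume where the last match ended
        if ($$content_ref =~ m{\G.*?(<style\b[^>]*>.*?</style>)}gsi) {
            # $-[1] and $+[1] hold the offsets of the captured block
            return ($-[1], $+[1]);
        }
        return (undef, undef);        # no more <style> blocks
    }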