pycs-devel Mailing List for Python Community Server (Page 16)
Status: Alpha
Brought to you by:
myelin
From: Phillip P. <pp...@my...> - 2003-03-13 10:23:04
|
http://www.pycs.net/allyourrss.html is a little mini-aggregator for pycs.net. It's in CVS now, as /rss ... do what you will. If anyone feels like hacking it to use a template (Cheetah or something) to generate the output, that would be cool. Adding it to the Makefile so it's installed with PyCS, and getting it to use pycs_paths.py, would be handy too ;) Cheers, Phil |
|
From: Phillip P. <pp...@my...> - 2003-03-13 10:00:49
|
Hi, I've decided that the safest way to run htsearch is to fork inside the module script, run htsearch in the child process, and let the OS clean up after it. (How to do this in Python, for people who're interested: http://www.myelin.co.nz/post/2003/3/13/#200303135) However, I found that the module handler catches SystemExit exceptions, meaning that the child processes weren't being allowed to exit properly. I've changed pycs_module_handler to just re-raise if it gets a SystemExit, but it looks like Medusa also catches it. Here's a quick diff to get Medusa to re-raise too:

RCS file: /cvsroot/oedipus/medusa/http_server.py,v
retrieving revision 1.10
diff -u -r1.10 http_server.py
--- http_server.py	18 Dec 2002 14:55:44 -0000	1.10
+++ http_server.py	13 Mar 2003 09:58:20 -0000
@@ -495,6 +495,8 @@
                 # This isn't used anywhere.
                 # r.handler = h # CYCLE
                 h.handle_request (r)
+            except SystemExit:
+                raise
             except:
                 self.server.exceptions.increment()
                 (file, fun, line), t, v, tbinfo = asyncore.compact_traceback()

I guess we should push this one over to the Medusa people too ... (I haven't put any of the search stuff into CVS yet BTW, but will soonish hopefully). Cheers, Phil |
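The fix above can be illustrated with a small standalone sketch (the function names here are illustrative, not PyCS's or Medusa's actual handler code): a bare `except:` clause also catches SystemExit, so a forked child calling sys.exit() never actually exits; re-raising SystemExit before the catch-all restores the expected behaviour.

```python
import sys

def dispatch_swallowing(task):
    # Old Medusa behaviour: a bare except also traps SystemExit,
    # so a child process calling sys.exit() is kept alive.
    try:
        task()
    except:
        return "exception swallowed"
    return "ok"

def dispatch_reraising(task):
    # The patched behaviour: let SystemExit propagate, catch the rest.
    try:
        task()
    except SystemExit:
        raise
    except:
        return "exception swallowed"
    return "ok"
```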
|
From: Georg B. <gb...@mu...> - 2003-03-12 13:34:31
|
Hi! > Congratulations! One more patch (the timezone in the logging was wrong and it logged in GMT instead of localtime) later, and now it looks quite good: http://muensterland.org/statistics/ Nice. The next thing would be to find a way to do that per user. Maybe I'll just split the stuff by user path and run single instances of webalizer, or I'll just put in some grouping for some of the users. > In other news, we almost have another search engine backend available: > http://www.myelin.co.nz/post/2003/3/13/#200303131 Fine! The context in search results is one thing I miss with swish++. bye, Georg |
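Splitting the combined log per user becomes simple once the rewriting has normalized URLs to /users/&lt;name&gt;/ paths, as discussed in this thread. A minimal sketch of the idea (the regex and function name are my own, not code from CVS) — each line is routed to a per-user bucket that webalizer could then process separately:

```python
import re
from collections import defaultdict

# Match the user name in a combined-log request line like
# "GET /users/phil/index.html HTTP/1.0"
USER_RE = re.compile(r'"[A-Z]+ /users/([^/]+)/')

def split_by_user(log_lines):
    # Collect each log line under the user whose path it hits.
    per_user = defaultdict(list)
    for line in log_lines:
        m = USER_RE.search(line)
        if m:
            per_user[m.group(1)].append(line)
    return per_user
```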
|
From: Phillip P. <ph...@my...> - 2003-03-12 13:07:38
|
> But it works. I now have a nice and shiny combined log with remote host
> IPs, referrers and user agent information, but it is created on the
> community server. And it uses all rewriting rules, so I get only
> normalized URLs (/users/xxxxxx/ stuff). This can be split by user and
> so I could set up webalizer to just sum up stuff for one user. Or do other
> nice things with that :-)
Congratulations!
In other news, we almost have another search engine backend available:
http://www.myelin.co.nz/post/2003/3/13/#200303131
I just realised that ht://Dig has a number of classes using static member
variables that don't seem to be cleaned up properly, so I'm going to have to
change all that if we want to ever be able to do more than one search per
PyCS process (reloading _htsearch.so might help, but I bet I'd end up with
one hell of a memory leak). Ahh, CGI ...
Cheers,
Phil :)
|
|
From: Georg B. <gb...@mu...> - 2003-03-12 13:00:20
|
Hi!

> I already have a hack working (not yet checked in, though) that will
> patch the http_request objects in a way that they log in the combined
> log format (with referrers and user-agent info). I currently investigate
> how complicated it would be to get Apache pass on the client address in a
> header, so I could use that in the logging to replace the apache machine
> header.

Ok, it is now working. I have added a new vhostfrom rule to the rewrite.conf.default and added several patches in pycs.py and pycs_rewrite_handler.py. The main problem is that Medusa doesn't give a nice way to specify which class to use for http requests. To do all this nicely, I would have to overload the full hierarchy and make changes to several methods and classes. To prevent that (as that would likely break with newer releases where the inner workings change), I just patch some class objects with setattr. This will break with newer versions too, if some key components change. But that's only a very small amount of added code, with actually only one dependency on inner workings: I assume that http_request objects have a header and _header_cache instance variable like they do now. So if someone wants to dig into the code, be warned: it is butt ugly ;-) But it works. I now have a nice and shiny combined log with remote host IPs, referrers and user agent information, but it is created on the community server. And it uses all rewriting rules, so I get only normalized URLs (/users/xxxxxx/ stuff). This can be split by user, and so I could set up webalizer to just sum up stuff for one user. Or do other nice things with that :-) bye, Georg |
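The setattr-patching approach Georg describes can be sketched in isolation (the class and method names below are stand-ins, not Medusa's actual API): instead of subclassing the whole request hierarchy, a single method on the existing class object is replaced at runtime, and every existing and future instance picks up the new behaviour.

```python
class HttpRequest:
    # Stand-in for medusa's http_request; only what the sketch needs.
    def __init__(self, client):
        self.client = client

    def log_line(self):
        # Original common-log-style output: no referrer, no user agent.
        return '%s - - "GET /"' % self.client

def combined_log_line(self):
    # Replacement that appends referrer and user-agent fields,
    # as in the combined log format.
    referer = getattr(self, 'referer', '-')
    agent = getattr(self, 'user_agent', '-')
    return '%s - - "GET /" "%s" "%s"' % (self.client, referer, agent)

# Patch the class object in place, as the message describes.
setattr(HttpRequest, 'log_line', combined_log_line)
```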
|
From: Phillip P. <ph...@my...> - 2003-03-12 11:59:16
|
> > I only analyse what comes in from Apache, because that gives me the
> > client IP address.
>
> I am currently working out how to solve that, too. :-)
>
> I already have a hack working (not yet checked in, though) that will patch
> the http_request objects in a way that they log in the combined log format
> (with referrers and user-agent info). I currently investigate how
> complicated it would be to get Apache pass on the client address in a
> header, so I could use that in the logging to replace the apache machine
> header.
You could always continue the ~~vhost~~ thing and turn it into
~~vhost~~/ip.address/server/path ...
BTW this may be useful:
http://httpd.apache.org/docs/mod/mod_headers.html
Now, can we get it to take input from mod_rewrite? :-)
Cheers,
Phil
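The mod_headers idea sketched above would look something like this on the receiving side: Apache sets a request header carrying the real client address before proxying, and the logging code prefers that header over the peer address it actually sees. The header name X-Client-IP is an assumption for illustration, not something the list settled on.

```python
def client_address(headers, peer_addr):
    # headers: the proxied request's header dict, names lower-cased.
    # If Apache injected the real client IP via a header, prefer it;
    # otherwise all we can log is the Apache machine's own address.
    return headers.get('x-client-ip', peer_addr)
```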
|
|
From: Georg B. <gb...@mu...> - 2003-03-12 11:42:26
|
Hi!

> Nothing from my end. In fact I totally ignore the logs coming out of
> the PyCS process ;-)
>
> I only analyse what comes in from Apache, because that gives me the
> client IP address.

I am currently working out how to solve that, too. :-) I already have a hack working (not yet checked in, though) that will patch the http_request objects so that they log in the combined log format (with referrer and user-agent info). I am currently investigating how complicated it would be to get Apache to pass on the client address in a header, so I could use that in the logging to replace the Apache machine header. This would allow me to create full combined logs for the machine and then split them up to produce statistics for user directories, with all the information that would be available from the Apache machine. Actually I don't like running webalizer on the Apache machine, because there it doesn't have the rewritten addresses. Since I use Manila-style host names, I get a lot of accesses to stuff like /weblog/index.html - but can't tell whether that's for hugo.muensterland.org, witch.muensterland.org or pyds.muensterland.org :-/ bye, Georg |
|
From: Phillip P. <pp...@my...> - 2003-03-12 11:33:53
|
Hi, > If you know of something that exists and might break with this change, > notify me and I will have to make this logging behaviour configurable. Nothing from my end. In fact I totally ignore the logs coming out of the PyCS process ;-) I only analyse what comes in from Apache, because that gives me the client IP address. > Another change in CVS is that now there is the /status activated in > medusa. It's only a simple status page and doesn't include too much > information, but I think we should support it with our own handlers, in > the long run. Might be a nice place for a quick glance on how your server > performs. Good point. When I coded the server in the first place, I turned off everything I didn't immediately need, because I was in a hurry and didn't want to have to bother checking to make sure it was secure. Then, I never went back to do the extra work and get it all going again ... ;-) So if you think /status is OK, I don't mind having that turned back on again. Cheers, Phil |
|
From: Georg B. <gb...@mu...> - 2003-03-12 11:12:47
|
Hi! > I am unsatisfied with how PyCS currently does logging: it's all in one big > file and _before_ rewriting takes place. This makes for very ugly > URLs when PyCS runs behind an Apache. My idea is to provide common log > file format per user, but _after_ rewriting takes place. I think I found it. pycs_rewrite_handler.py doesn't change the request.request field on rewriting. I changed this so that it now constructs a new request and puts it in there. This should work out nicely, as it doesn't change anything else in the system, just the field and the code that depends on it (and that should - in my opinion - get the rewritten address). But this change (just checked into CVS) might break stuff that depends on the access.log written by PyCS. So if you have a log analyzer working on your PyCS-generated access.log, things have changed and you won't find original URIs in there. I checked Phil's make_referer.py script, which reads the Apache log files and so isn't influenced by my change. But there might be other stuff out there. If you know of something that exists and might break with this change, notify me and I will have to make this logging behaviour configurable. Another change in CVS is that the /status page is now activated in Medusa. It's only a simple status page and doesn't include too much information, but I think we should support it with our own handlers in the long run. Might be a nice place for a quick glance at how your server performs. bye, Georg |
|
From: Georg B. <gb...@mu...> - 2003-03-10 14:29:15
|
Hi! I am unsatisfied with how PyCS currently does logging: it's all in one big file and _before_ rewriting takes place. This makes for very ugly URLs when PyCS runs behind an Apache. My idea is to provide common log file format per user, but _after_ rewriting takes place. Now the question: how is that accomplished? The logging currently is added on instantiation of the http_server, so it looks like that way can't be used; a manual log handler would have to be added instead. Is there an easy way in Medusa to do that? The base idea is to be able to do webalizer-like stuff with those logfiles per user and so enrich the information a user can get from his cloud. At least hits/pages/files and stuff like that should be doable (although you don't get hosts/visits, as the IP gets lost when it runs behind an Apache). I am thinking about overviews of hits/pages/files/kbs per hour, per day, weekly and stuff like that. bye, Georg |
|
From: Georg B. <gb...@mu...> - 2003-03-04 15:48:12
|
Hi! Stumbled over several small bugs in the named modules, mostly wrong quoting/unquoting stuff. Should now work better. And changed several of the regexps to better recognize search URLs where the search term isn't the first parameter (this couldn't show up before, as the Medusa bug prevented parameters after the first one from showing up :-) ). bye, Georg |
|
From: Georg B. <gb...@mu...> - 2003-03-04 12:43:59
|
Hi! Actually the Medusa bug in PyCS masked a bug in PyDS. Now that PyCS works right, PyDS stumbles :-) This is quite weird: two bugs working together to create an actually working experience. I had heard about that before, but never seen it live myself. Whew. Ok, so you need to apply a small patch to PyDS/MacrosTool.py in order to get PyDS working nicely again with the count.py counter - without the patch, you will not have any more referers in your referers list. So it's not that big a problem, but annoying it is. Just edit PyDS/MacrosTool.py and change the lines with the %%26 in them (two assignments to html) to use & instead of the %%26. This will be fixed in the 0.4.15 I am currently working on and am hopefully releasing today. bye, Georg |
|
From: Georg B. <gb...@mu...> - 2003-03-04 11:10:59
|
Hi! > ... so will forward it on to them. Unless you get there first ;-) Since I am now at work and only have limited access to home email, you can send in the patch :-) bye, Georg |
|
From: Georg B. <gb...@mu...> - 2003-03-04 11:09:20
|
Hi! > Shall we forward this to AMK? I just subscribed to the -dev list and will send the patch in, as soon as I am subscribed. bye, Georg |
|
From: Phillip P. <pp...@my...> - 2003-03-04 11:05:27
|
On Tue, Mar 04, 2003 at 11:59:06AM +0100, Georg Bauer wrote:

> > path, qs = urllib.splitquery(request)
> > if '%' in path:
> >     request = unquote(path) + qs
>
> Ok, I tried a bit, this is what looks like it works (and is currently used
> on muensterland.org to test it):
>
> at the top of http_server.py:
>
> from urllib import unquote, splitquery
>
> in the found_terminator method:
>
> rpath, rquery = splitquery(request)
> if '%' in rpath:
>     if rquery:
>         request = unquote(rpath)+'?'+rquery
>     else:
>         request = unquote(rpath)
>
> Did I miss some problem? Or should this really work?

Looks like it should work to me... just joined the medusa-dev mailing list: http://mail.python.org/mailman/listinfo/medusa-dev ... so will forward it on to them. Unless you get there first ;-) Cheers, Phil |
|
From: Georg B. <gb...@mu...> - 2003-03-04 11:00:26
|
Hi!

> path, qs = urllib.splitquery(request)
> if '%' in path:
>     request = unquote(path) + qs

Ok, I tried a bit; this is what looks like it works (and is currently used on muensterland.org to test it). At the top of http_server.py:

from urllib import unquote, splitquery

In the found_terminator method:

rpath, rquery = splitquery(request)
if '%' in rpath:
    if rquery:
        request = unquote(rpath)+'?'+rquery
    else:
        request = unquote(rpath)

Did I miss some problem? Or should this really work? bye, Georg |
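Georg's fix is written against Python 2's urllib (splitquery/unquote, which Medusa used). For reference, a rough modern equivalent of the same logic — unquote only the path, and leave the query string encoded for later per-parameter decoding — might read:

```python
from urllib.parse import unquote

def unquote_request_path(request):
    # Split off the query string first, then unquote only the path
    # part, mirroring the splitquery-based fix in the message above.
    path, sep, query = request.partition('?')
    if '%' in path:
        path = unquote(path)
    return path + sep + query
```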
|
From: Phillip P. <pp...@my...> - 2003-03-04 10:56:16
|
> > IMHO this should become:
> >
> > if '%' in uri:
> >     path, qs = urllib.splitquery(uri)
> >     url = unquote(path) + qs
>
> Hmm. It's in line 469ff - there is actually "request" in the code, not
> "uri" - are you using an older/newer version than 0.5.3?

I'm running the latest CVS version: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/oedipus/medusa/http_server.py?rev=1.10&content-type=text/vnd.viewcvs-markup

Looks like request is 'GET /foo/bar/... HTTP/1.1' and uri is just the URI part of that. The latest change (2 months old) was to move the unquoting down a bit in the code so it only unquoted uri, not request, fixing part of the problem (URLs with spaces ended up decoding totally wrongly) but not all of it (thus our difficulty now).

> path, qs = urllib.splitquery(request)
> if '%' in path:
>     request = unquote(path) + qs
>
> So we don't unquote if there is only quoting in the query? This should fix
> the problem, I think. If I didn't overlook some problem, that is ...

Yeah. Not that it makes much of a difference; if s doesn't have any '%'s, unquote(s) should IMHO equal s anyway ... Shall we forward this to AMK? Cheers, Phil :) |
|
From: Georg B. <gb...@mu...> - 2003-03-04 10:41:34
|
Hi!

> I take it that the offending bit is:
>
> # unquote path if necessary (thanks to Skip Montanaro for pointing
> # out that we must unquote in piecemeal fashion).
> if '%' in uri:
>     uri = unquote (uri)

Yep, that's the bugger.

> IMHO this should become:
>
> if '%' in uri:
>     path, qs = urllib.splitquery(uri)
>     url = unquote(path) + qs

Hmm. It's in line 469ff - there is actually "request" in the code, not "uri" - are you using an older/newer version than 0.5.3?

path, qs = urllib.splitquery(request)
if '%' in path:
    request = unquote(path) + qs

So we don't unquote if there is only quoting in the query? This should fix the problem, I think. If I didn't overlook some problem, that is ... bye, Georg |
|
From: Phillip P. <ph...@my...> - 2003-03-04 10:23:48
|
OK, more on this.
I take it that the offending bit is:
# unquote path if necessary (thanks to Skip Montanaro for pointing
# out that we must unquote in piecemeal fashion).
if '%' in uri:
    uri = unquote (uri)
IMHO this should become:
if '%' in uri:
path, qs = urllib.splitquery(uri)
url = unquote(path) + qs
That should unquote the path, but leave the rest to do later. What do you
think?
Cheers,
Phil :)
|
|
From: Phillip P. <pp...@my...> - 2003-03-04 10:06:19
|
On Tue, Mar 04, 2003 at 09:01:26AM +0100, Georg Bauer wrote:

> >We could always fix it ourselves and send a patch to the Medusa
> >maintainers - there seems to be a reasonable amount of activity going
> >on in that project, so I'm sure they'd be happy to hear from us ...
>
> Sure, we can. But I have to admit that I don't have an idea how to do
> that _right_, the only ideas coming up to me currently are bad and ugly
> hacks (like tearing the request apart, unquoting partial stuff,
> reconstructing it - must be the binary/textfile issues Hal pointed me
> to, those make my brain hurt ;-) ). But I am not sure that things won't
> break. Hmm. Do you have a nice idea? If yes, go ahead :-)

Hmm ... I'll take a look. I didn't think it was that hard -- basically, given an HTTP request:

>>> import urllib
>>> url = 'http://foo.com/bar/baz?' + urllib.urlencode((('baz','boz'), ('abc', 'a=b&c?d')))
>>> url
'http://foo.com/bar/baz?baz=boz&abc=a%3Db%26c%3Fd'

We can just split by &, then by =, then unquote to get the values:

>>> path, qs = urllib.splitquery(url)
>>> path
'http://foo.com/bar/baz'
>>> qs
'baz=boz&abc=a%3Db%26c%3Fd'
>>> bits = qs.split('&')
>>> bits
['baz=boz', 'abc=a%3Db%26c%3Fd']
>>> for bit in bits:
...     key,value = urllib.splitvalue(bit)
...     (key, urllib.unquote(value))
...
('baz', 'boz')
('abc', 'a=b&c?d')

That gives you all the bits out of the query string ... presumably Medusa gets the rest right already ... (BTW doesn't Medusa give us a copy of the full query string anyway? In PyCS I think each script calls pycs_http_util to split it up ...)

Cheers,
Phil :)

BTW - here's the raw code for the above, if you want to hack around:

import urllib
url = 'http://foo.com/bar/baz?' + urllib.urlencode((('baz','boz'), ('abc', 'a=b&c?d')))
url
path, qs = urllib.splitquery(url)
path
qs
bits = qs.split('&')
bits
for bit in bits:
    key,value = urllib.splitvalue(bit)
    (key, urllib.unquote(value))

so I guess:

def urldecode(url):
    path, qs = urllib.splitquery(url)
    return [(key,urllib.unquote(value)) for key,value in qs.split('&')] |
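The urldecode one-liner at the end of that message has a small bug: qs.split('&') yields strings like 'abc=a%3Db', which cannot be unpacked directly into (key, value). A corrected version of the same idea, written against the modern urllib.parse since Python 2's splitquery/splitvalue are long gone:

```python
from urllib.parse import unquote

def urldecode(url):
    # Split off the query string, then decode each key=value pair
    # individually -- unquoting only after splitting on '&' and '=',
    # which is the whole point of the piecemeal approach.
    path, _, qs = url.partition('?')
    pairs = []
    for bit in qs.split('&'):
        key, _, value = bit.partition('=')
        pairs.append((key, unquote(value)))
    return pairs
```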
|
From: Georg B. <gb...@mu...> - 2003-03-03 14:03:27
|
Hi! There is a bug in Medusa that creates problems for PyCS and PyDS when passing URIs as parameters to handlers via GET methods. Medusa unquotes the request in the http_server.py module, in the http_channel class, in the found_terminator method. It unquotes the _full_ request line, not only the command and path parts. This produces problems when one of the parameters you try to pass in is a URI, as is the case with the counter script that creates the referer entries. This is the reason why in the referer lists URIs only show their first parameter. The problem is, the unquote removes the quote-protection from the parameter values. Since we interpret the query part after the global unquote, the previously protected additional parameters of the passed-in URI now become parameters of the called URI. I don't have a good idea how to fix this without touching Medusa (which I wouldn't like to do, as this complicates setup), and so I contacted the upstream author about it and left the bug in the system. But if the upstream author doesn't come up with something, we will have to fix it ourselves, as it really creates problems. Anyone of you with a good idea? bye, Georg |
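The failure mode Georg describes is easy to reproduce in a few lines (modern urllib.parse here; Medusa's code used Python 2's urllib, but the mechanics are identical): unquoting the full request line before splitting the query string turns the embedded URI's own parameters into top-level parameters.

```python
from urllib.parse import quote, unquote

# A URI we want to pass as a single parameter value:
inner = 'http://example.org/page?a=1&b=2'
request = '/counter?url=' + quote(inner, safe='')

# The embedded '&' is percent-encoded, so there is exactly one parameter:
assert request.split('?', 1)[1].count('&') == 0

# Medusa's global unquote strips that protection...
broken = unquote(request)

# ...and now 'b=2' looks like a second top-level parameter, which is
# why referer URIs showed up with only their first parameter.
assert broken.split('&')[1] == 'b=2'
```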
|
From: Georg B. <gb...@mu...> - 2003-03-03 08:21:31
|
Hi! >> How about some Fink-based instructions for use under Mac OS X? In >> fact, why not release the whole thing under Fink? Hmm. Actually with the advent of 10.2, Fink isn't really needed for PyCS/PyDS, since Python 2.2 is already included with OS X. And so we only need packages to install the stuff that is installed after Python. I started with a .pkg installer for PyDS for OS X, but that produced a 24 MB monster, so I dropped that idea. I have now switched my machine at home to OS X 10.2, so I think I will switch to using the builtin Python some time in the future and reevaluate that area. Fink installations should be able to use most of the Debian infrastructure that's in PyDS, and since PyCS needs the same things, it should be possible to share most packages (only the pycs*.deb and pyds*.deb are different, all other packages are identical), so if anyone is willing to do that, go ahead. I haven't made any Fink packages up to now and don't think I will, at least not before I have native OS X packages ready. Actually, installing stuff on OS X got much simpler with 10.2; it's mostly python setup.py install now. The only problems left are metakit, where I still have to send some build process patches upstream, and PIL, where you need an installed libjpeg development library. bye, Georg |
|
From: Phillip P. <pp...@my...> - 2003-03-03 03:50:51
|
PyCS? That would be cool, although I don't have a Mac, so it would be rather hard for me ;) Georg Bauer uses one most of the time, I think, so you might want to ask him. I'll forward this message to the dev mailing list - let's see what he has to say. Cheers, Phil On Sun, Mar 02, 2003 at 08:39:12PM -0600, Alan Sill wrote: > Hi, > > How about some Fink-based instructions for use under Mac OS X? In > fact, why not release the whole thing under Fink? > > Thanks, > Alan > > On Friday, February 28, 2003, at 06:27 AM, Phillip Pearson wrote: > >On Thu, Feb 27, 2003 at 05:47:33AM -0600, Alan Sill wrote: > >>What sort of security can I use (I know, that's not the idea but ... > >>;-/ ) to authenticate users and/or limit postings to originate from > >>certain domains, etc.? The university and lab are pretty paranoid > >>about this sort of thing, even though everything we do is > >>public-domain, non-classified, non-military etc. > > > >Hmm ... if you run it behind an Apache server, and firewall off port > >5445, you can use Apache access controls to limit access by IP. > >PyCS's authentication module lets you restrict access to certain blogs > >by login name -- I think there are some notes on how it works in > >pycs_auth_handler.py. > > > >(Set the default user to have no access, and then you can create > >logins, etc...) > > > >>Also, I notice you don't seem to use a backing database (at least I > >>can't recall being asked to set one up). Your public site must > >>generate a lot of traffic -- how do you keep that straight? > > > >It doesn't generate all that much, actually ;) > > > >The data is stored in a MetaKit database, and people's blogs are > >stored as static HTML in the /var/lib/pycs/www/users/* > >directories... so it's pretty quick. The updates page doesn't seem as > >fast as it could be, and I have my doubts about some of the hit count > >functions, but I'm not having any trouble. You'll probably see more > >load in your lab than I do on my site, but I really doubt you'll faze > >the server. > > > >(I think it should be able to serve at least 50 hits/second when > >showing blogs etc, and maybe handle 10-50 writes a second. So until > >you've got 1000-10000 users, you shouldn't have any trouble!) |
|
From: Georg B. <gb...@mu...> - 2003-03-01 20:00:25
|
Hi! > Hmm. Looks fine to me, except that several files are missing (images and > stylesheet). This might be due to the fact that PyDS isn't really > Windows-aware and needs a $HOME directory for the user and wants to > create a .PyDS directory there and populate it. So all this might go bad. Hal Winer gave a bug report on this, too, and I think I might have that one nailed down now. Read through the description at http://pyds.muensterland.org/; there is now a description of how to resolve at least the upstreaming-doesn't-find-files problem. bye, Georg |
|
From: Georg B. <gb...@mu...> - 2003-03-01 08:44:32
|
Hi! I just added some translations for the search form that Phil added, and made a small fix so that swish.py now correctly ignores the # lines that are output by swish++. bye, Georg |