[Filterproxy-devel] Re: : : FilterProxy XSLT
From: Bob M. <mce...@dr...> - 2002-01-11 20:14:22
Mario Lang [la...@zi...] wrote:
> Bob McElrath <mce...@dr...> writes:
>
> > So as long as all the xsl files are in one directory (or in a dir
> > relative to xsl/), we should be able to include them by specifying a
> > relative path.
> >
> > In the example you sent me, (for slashdot) I'm seeing black-on-black text.
> > Do you get that too?
>
> I realized it after a sighted coworker pointed it out to me :)

Are you truly blind?  Is FilterProxy+XSLT a useful tool for you?  That would
be neat!

How are you able to navigate, edit, and create complex documents like XML?
Do you use a braille terminal, or is speech synthesis useful?  How do you
handle all the wacky characters in XML like <>:, etc.?  Braille doesn't have
characters for those, does it?

Please forgive my curiosity, I don't mean to intrude.  I don't think I've ever
met a blind computer user before!  I can't imagine using a computer without
sight!  I use very high resolution on my monitor and small fonts to get the
most information on my screen; doing it without sight must be very
challenging!

> It has to do with the html somehow being wrongly parsed, but I never
> really checked why.  I use XSL primarily for extracting
> text content out of overdesigned webpages.  Just as
> that slashdot example shows...

What is interesting to me is that you could use it to remove ads by examining
the content.  The only problem is that it's very site-specific.

> ... Ahh, I found it.  The original page has the same
> color and bgcolor set to 000000.  Seems that they used some other tricks
> to make the text readable...  So after extracting that stable,
> those tricks get lost it seems.
>
> Maybe we can add sanity checks to the xsl file in some way,
> but I am fairly new to xsl either.  In fact, as the comment sections of
> those now in CVS show, they are written by T.v.Raman for the Emacspeak
> audio desktop.

I'm very new to it also.
Someone pointed it out to me a long time ago, as the "right" way to do what I
was trying to do with the Rewrite module.  However, since XSLT won't let you
use regexes, I think both have their place.

> > It's pretty fast too, on my machine that slashdot rule only takes 0.3s.
> > Neato.  ;)  (the rewrite rules on slashdot take ~3s for me)
>
> Thats basicly because we really use libxslt and libxml for
> the hard work.  Although I am suprised that it is really that much faster,
> xslt is a quite heavy operation.

My machine is pretty slow, comparatively, at doing Rewrite rules (it's an
alpha).  Comparable x86 machines often beat me in Rewrite times by a factor
of more than 3.  I'm not sure why.  It probably has to do with poor code
optimization on the alpha.

Parsing html/xml, with its matched-nested-tag structure, is very
computationally intensive.  I've toyed with the idea of "treeing" the
document first, to make multiple traversals much faster.  I'm sure this is
how the C libxml/libxslt does it.  The Perl module HTML::Parser does this
too, but somehow ends up being slower than my regexes anyway.  (Initial
versions of FilterProxy used HTML::Parser, but I found I could write a faster
regex, and since then I've made many speed enhancements.)

Frankly, the speed hit in parsing matched-tag documents (like XML) is why
I've been totally skeptical of people using XML for everything (i.e.
xml-rpc).  But anyway...XSLT is still a cool idea.

> I thank you for those fixes.  Saved me alot of time..

No prob.  I was excited to play with XSLT!

> > Problem is that errors/warnings generated by Mason don't have accurate
> > line numbers.
>
> I already had to experience that.  And its no fun, truly.

I was using Mason 0.89, they're up to 1.0.4.  I'll have to do some more
checks to see if the problem still exists with newer versions.  I'll bitch at
them if it's still giving bad line numbers.
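
To make the "treeing" idea above concrete, here is a minimal sketch.  It is
written in Python using only the standard library, not in FilterProxy's Perl,
and it does not claim to show how libxml/libxslt or HTML::Parser work
internally.  Every name in it (Node, TreeBuilder, build_tree, find_all,
invisible_text_nodes, SAMPLE) is hypothetical and chosen for illustration.
The document is parsed once into a tree, and then several independent
traversals reuse that tree: one pulls out text, another is a rough version of
the "sanity check" for the color/bgcolor problem discussed earlier in the
thread.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become "current".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element in the tree; children are Node objects or plain strings."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []

    def walk(self):
        """Yield this node and every descendant element, depth first."""
        yield self
        for child in self.children:
            if isinstance(child, Node):
                yield from child.walk()

    def text(self):
        """Concatenate all text found beneath this node."""
        return "".join(
            child.text() if isinstance(child, Node) else child
            for child in self.children
        )


class TreeBuilder(HTMLParser):
    """Builds a Node tree in a single pass over the markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def build_tree(html):
    """Parse the markup once and return the root of the tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_all(root, tag):
    """Traversal 1: collect every element with the given tag name."""
    return [node for node in root.walk() if node.tag == tag]


def invisible_text_nodes(root):
    """Traversal 2: elements whose text color equals their background color."""
    hits = []
    for node in root.walk():
        bg = node.attrs.get("bgcolor")
        fg = node.attrs.get("text") or node.attrs.get("color")
        if bg is not None and fg is not None and bg.lower() == fg.lower():
            hits.append(node)
    return hits


# Made-up sample page, not the real slashdot markup.
SAMPLE = ('<html><body bgcolor="000000" text="000000"><table><tr>'
          '<td>Story <b>one</b><br></td></tr></table><p>Ad</p></body></html>')

if __name__ == "__main__":
    tree = build_tree(SAMPLE)                           # parse once
    print([td.text() for td in find_all(tree, "td")])   # first traversal
    print([n.tag for n in invisible_text_nodes(tree)])  # second traversal
```

The parse cost is paid once in build_tree(); each further rule is then a cheap
walk over nodes already in memory rather than another scan of the raw text.
Whether that beats a well-tuned regex in practice depends on the page and the
rule, so this should be read as a sketch of the trade-off and not as a
benchmark.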

> > BTW, you should join the filterproxy-devel list, and we can continue
> > discussions there, so others can see what you've done!  ;)
>
> Done :-)

I've added you to filterproxy-devel.  It seems you didn't get subscribed...

> I tried to send the reply for you to the list, but
> sf.net seems to have problems.  I cant send to that list,
> and there is no archive on the sf.net webpages.

It should work now.

Geocrawler was written by a stoned monkey with massive brain contusions.
It's almost completely useless as an archive anyway, and flaky at best in
actually archiving any messages.

For a while the mailing lists showed up under "public forums" and that was
really cool, but they broke it.  I think the sourceforge people are using us
as guinea pigs for all their ideas, then when they get the code perfect, they
remove the feature from the public site and put the feature in their (closed)
commercial code.

Cheers,
-- Bob

Bob McElrath (rsm...@st...)
Univ. of Wisconsin at Madison, Department of Physics