Thread: [Xweb-developers] Re: Changes to XWeb
|
From: Peter B. <pe...@pe...> - 2003-11-05 21:28:00
|
[moved to the dev list -- for those tuning in: Hendrik submitted a few
patches and I plan to do some changes to XWeb together with Jon (mentioned
below). I tried to add some hints about the preceding discussions, it was
only two mails back and forth]
Hendrik Lipka wrote:
>Wednesday, November 5, 2003, 2:51:30 PM, you wrote:
>
>>I have thought about making big architectural changes for a long time by
>>now, up to a complex processing model with streams, meta-data,
>>multi-plexing and others. But a first step would be getting closer to
>>the model of Lagoon or Transmorpher with simple one input, one output
>>processors in a queue and implicit SAX- to ASCII-stream conversions.
>>
>
>Personally, I'm happy with the current model. It's easy to understand, and
>fulfills most needs, I think.
>
It is limiting, e.g. you can't do things like first XSLT, then SVG
transformation. Same for FOP. And you can't combine the power of UNIX tools
into the process by running stuff like sed or scripting languages at some
stages. Freemarker would be another thing I'd like to support
(http://freemarker.sourceforge.net/). To do this properly, XWeb needs a
general notion of a toolchain, not just for XSLT.
>>Some pointers:
>>- http://transmorpher.inrialpes.fr/docs/compare.html
>>- http://meganesia.int.gu.edu.au/~pbecker/xweb/processingModel.html
>>
>
>Interesting read.
>
>>But it would probably be handy for people who want to manage the whole
>>process around this with Ant -- like the ftp or http uploads, maybe
>>
>
>Sounds like me :)
>
>>My suspicion is that you will end up rendering again quite often. But at
>>least in the situation of a typo-fix and similar local changes it should
>>be ok. We would need a way to define extra dependencies, though -- at
>>
>
>Something like a <depenson> element below the entries should do. When a
>style changes, everything is rebuilt, otherwise the dependencies are used.
>
Yes, that was what I was thinking of.
>>Libraries needed in 1.3, which are part of 1.4:
>>- RegExp library like ORO
>>
>
>I think ORO is the most complete RegEx library around, and it's only 65k.
>
The main thing I'd like to get rid of is Xerces since it is so huge. Do
you know how close the ORO API is to the JDK 1.4 RegEx approach?
>>- something for logging like log4j
>>
>
>Log4J is also one of the most complete loggers, and not _that_ large.
>
It is not just about size, it is also about being mainstream. But of
course Jakarta is reasonably mainstream and there are other things I might
want to use from them, e.g. the CLI or FileUpload stuff from Commons. The
Commons Logging package might be a good idea, too -- esp. in the case we
want to go a mixed 1.3/1.4 route.
>>- something for image output like JIMI (another bigger lib in the
>>current XWeb)
>>
>
>JDK1.4 has some image processing classes, but the ImageIO lib is still
>required :(
>
ImageIO is part of the JDK -- I use it in some other programs to export
PNGs and JPGs. No extra libraries needed. And the API is a lot nicer than
JIMI or JAI (the latter being incredibly bad in design).
[discussing the idea of an <entryset>, which uses regular expressions or
globbing and gets expanded to a number of <entry>s]
>>globbing bit might require writing some matcher of our own, but that is
>>easy, too.
>>
>
>You are really optimistic :) My first test case would be something like
>source='??some*.?htm?' target='$1next$2.html'
>Renaming just the extensions would be easier...
>
You could always map it to something like "..some.*\..htm." and run the
RegExp machinery. I find the glob format a lot easier for simple things
like matching file names and I think there are many people who use XWeb
but don't know much about RegExp. Forgetting to escape the dot would be
a first problem.
[adding the file name also as id]
>>>My website has a list of downloads, and with such generated IDs I could
>>>make sure all links are indeed correct.
>>>
>>Couldn't you just id the section (or better directory) and do a match on
>>"\\directory[id='downloads']\file"?
>>
>
>I wanted to generate the links just by giving the ID.
>
I am a bit afraid of namespace pollution. If you just use a file name as
ID, there are lots of IDs generated. And if you want to use the file
name anyway, you can just put it into a URL. The internal linking feature
of the generic stylesheet could be extended to evaluate something like
href="!downloads/myFile.pdf". Admittedly a bit more typing than just
"!myFile.pdf", but as I said: the other option is a huge collection of IDs
around.
Another option would be doing both with optional ID generation, possibly
with an id pattern attribute on the <entryset>. This could look like this:
  <fileset sourceFiles="*.xml" targetFiles="$1.pdf" type="docbookPDF"
ids="pdf_$1"/>
The @ids would be optional and no ids would be generated if it is missing.
>>Ok -- I don't think Jon is subscribed yet. Jon: can you make sure you
>>are on the list so we can all take it there? I suspect some of the old
>>
>
>He was not on CC...
>
My sentbox claims so. He should get this and we'll meet tomorrow anyway.
>>I'll go through the changes and might commit some to the trunk. After
>>that I'll post some update on the list.
>>
>
>I'll stay tuned.
>
Back soon.
Peter
|
|
From: Hendrik L. <hen...@gm...> - 2003-11-06 08:24:49
|
Wednesday, November 5, 2003, 9:46:56 PM, you wrote:
> The main thing I'd like to get rid of is Xerces since it is so huge. Do
> you know how close the ORO API is to the JDK 1.4 RegEx approach?
Just looking into it:
It seems to support the full set of POSIX RegEx (the type of RegEx
supported is not stated, so I assume POSIX). But it does not support
substitution, so this must be done manually (grouping is supported,
though). I still would prefer ORO: it is the most complete lib out there,
it's small, and fairly stable.
>>>- something for logging like log4j
> It is not just about size, it is also about being mainstream. But of
Log4j is the most mainstream you can get for logging. It's small, fast,
stable, and pretty configurable (I always use the notion of 2 log targets:
one for the application [structured by the application components], one for
debugging [structured by the Java classes]. It's really simple to do). It
has a lot of tools (just have a look at the supported Appenders...). And it
supports Java since 1.2, too. Commons Logging and the util.logging from 1.4
are just inferior.
> ImageIO is part of the JDK -- I use it in some other programs to export
> PNGs and JPGs. No extra libraries needed. And the API is a lot nicer
> than JIMI or JAI (the latter being incredibly bad in design).
I seem to have confused this with the JAI Image I/O (from
http://developer.java.sun.com/developer/earlyAccess/jai_imageio/), which is
what I use (in addition to the Java Imaging Utilities)
>>You are really optimistic :) My first test case would be something like
>>source='??some*.?htm?' target='$1next$2.html'
> You could always map it to something like "..some.*\..htm." and run the
> RegExp machinery.
That's what ORO is doing internally for glob expressions. The problem there
is that for substitution with RegEx, one has to use () expressions in the
matcher RegEx to generate the groups used for substitution. And as they are
regular chars in a glob expression, one cannot specify them for the
substitution... Possible solution: find all '.' and '*' expressions in the
glob expression, and surround them with () in the generated RegEx.
> I find the glob format a lot easier for simple things
ACK
> like matching file names and I think there are many people who use XWeb
> but don't know much about RegExp. Forgetting to escape the dot would be
> a first problem.
> [adding the file name also as id]
[Discussing the ID generation when copying multiple files at once]
> I am a bit afraid of namespace pollution. If you just use a file name as
> ID, there are lots of IDs generated. And if you want to use the file
> name anyway, you can just put it into a URL. The internal linking
I wanted the ID to make sure the link is correct (I will get a warning if
it is not because of a typo). Maybe a link checker for internal links would
solve this?
> Another option would be doing both with optional ID generation, possibly
> with an id pattern attribute on the <entryset>. This could look like this:
>   <fileset sourceFiles="*.xml" targetFiles="$1.pdf" type="docbookPDF"
> ids="pdf_$1"/>
> The @ids would be optional and no ids would be generated if it is missing.
That's what I was thinking anyway...
hli
-- 
Møøse trained to mix concrete and            Hendrik Lipka
sign complicated insurance forms      hendrik.lipka@gmx.de
                                       www.hendriklipka.de
|
|
From: Peter B. <pe...@pe...> - 2003-11-06 09:22:00
|
Hendrik Lipka wrote:
>Wednesday, November 5, 2003, 9:46:56 PM, you wrote:
>
>
>
>>The main thing I'd like to get rid of is Xerces since it is so huge. Do
>>you know how close the ORO API is to the JDK 1.4 RegEx approach?
>>
>>
>
>Just looking into it:
>It seems to support the full set of POSIX RegEx (the type of RegEx
>supported is not stated, so I assume POSIX). But it does not support
>substitution, so this must be done manually (grouping is supported,
>though). I still would prefer ORO: it is the most complete lib out there,
>it's small, and fairly stable.
>
>
I just finished the 1.4 version -- but it wouldn't be hard to change at
all, at the moment it is more about getting things going. I did the $n
replacements manually, but it is just this bit:
    for (int j = 1; j <= matcher.groupCount(); j++) {
        targetName = targetName.replaceAll("\\$" + String.valueOf(j),
                                           matcher.group(j));
    }
Not too hard at all :-)
>>>>- something for logging like log4j
>>>>
>>>>
>>It is not just about size, it is also about being mainstream. But of
>>
>>
>
>Log4j is the most mainstream you can get for logging.
>
Maybe at the moment, but I think the "official" API might take over quickly.
>It's small, fast,
>stable, and pretty configurable (I always use the notion of 2 log targets:
>one for the application [structured by the application components], one for
>debugging [structured by the Java classes]. It's really simple to do).
>
How do you configure log4j? I have mixed feelings about the JDK
approach: on one hand you can configure it externally and in extreme
detail, which means you can turn specific logging parts on on a client
machine. On the other hand the logging.properties file is not the best
place to fiddle around with (once you found the right one) and giving
specific logging options via command line is a bit painful.
>It
>has a lot of tools (just have a look at the supported Appenders...).
>
The plain and XML output the JDK produces seems good enough for me -- I
don't see why we need to log into a database or an IM network in XWeb.
And I don't think that would be hard to do with the JDK logging
framework. I'd actually write a custom one anyway since I want to log
into a frontend with output that allows interaction.
>And it
>supports Java since 1.2, too. Commons Logging and the util.logging from 1.4
>are just inferior.
>
>
What exactly do the 345kb give me in comparison to util.logging? Don't
get me wrong -- I don't say log4j is not an option at all, I just want
to know what it gives me, since I don't know much about it.
>>ImageIO is part of the JDK -- I use it in some other programs to export
>>PNGs and JPGs. No extra libraries needed. And the API is a lot nicer
>>than JIMI or JAI (the latter being incredibly bad in design).
>>
>>
>
>I seem to have confused this with the JAI Image I/O (from
>http://developer.java.sun.com/developer/earlyAccess/jai_imageio/), which is
>what I use (in addition to the Java Imaging Utilities)
>
>
JAI is even more evil than JIMI :-) Why the heck do you want to reduce
an OO environment down to a command line interface as done there
(http://java.sun.com/products/java-media/jai/forDevelopers/jai-apidocs/index.html)?
javax.imageio is the most sensible API from the three I tried (imageio,
JAI, JIMI).
>>>You are really optimistic :) My first test case would be something like
>>>source='??some*.?htm?' target='$1next$2.html'
>>>
>>>
>>You could always map it to something like "..some.*\..htm." and run the
>>RegExp machinery.
>>
>>
>
>That's what ORO is doing internally for glob expressions. The problem there
>is that for substitution with RegEx, one has to use () expressions in the
>matcher RegEx to generate the groups used for substitution. And as they are
>regular chars in a glob expression, one cannot specify them for the
>substitution... Possible solution: find all '.' and '*' expressions in the
>glob expression, and surround them with () in the generated RegEx.
>
>
Currently it looks like this:
    if (!"regex".equals(child.getAttributeValue("mode"))) {
        // @todo we probably need more escapes here
        sourceFilesAttrib =
            sourceFilesAttrib.replaceAll("\\.", "\\.");
        sourceFilesAttrib =
            sourceFilesAttrib.replaceAll("\\*", "(.*)");
        sourceFilesAttrib =
            sourceFilesAttrib.replaceAll("\\?", "(.)");
    }
I have to add the other regexp special characters for the escapes.
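A possible way to cover all the special characters at once, as a rough sketch and not the code that is in XWeb: walk through the glob, quote every literal run with Pattern.quote and turn each '*' and '?' into a capture group. The class and method names are made up for illustration. One thing to watch with the replaceAll approach is that the replacement string has its own escaping, so a replacement of "\\." yields a plain dot again and the dot stays unescaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of XWeb: converts a glob into a regex
// in which every wildcard becomes a capture group usable as $1, $2, ...
final class GlobToRegex {

    static String toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                flush(literal, regex);
                regex.append(c == '*' ? "(.*)" : "(.)");
            } else {
                literal.append(c);
            }
        }
        flush(literal, regex);
        return regex.toString();
    }

    // Appends the collected literal text in quoted form, so that '.', '(',
    // '[' and friends lose their regex meaning.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(toRegex("??some*.?htm?"));
        Matcher m = p.matcher("mysomething.xhtml");
        if (m.matches()) {
            System.out.println(m.group(3)); // thing
        }
    }
}
```

Quoting whole literal runs avoids having to maintain a list of special characters by hand.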
>>I find the glob format a lot easier for simple things
>>
>>
>
>ACK
>
>
>
>>like matching file names and I think there are many people who use XWeb
>>but don't know much about RegExp. Forgetting to escape the dot would be
>>a first problem.
>>
>>
>
>
>
>>[adding the file name also as id]
>>
>>
>
>[Discussing the ID generation when copying multiple files at once]
>
>
>>I am a bit afraid of namespace pollution. If you just use a file name as
>>ID, there are lots of IDs generated. And if you want to use the file
>>name anyway, you can just put it into a URL. The internal linking
>>
>>
>
>I wanted the ID to make sure the link is correct (I will get a warning if
>it is not because of a typo). Maybe a link checker for internal links would
>solve this?
>
>
The problem is that this gets resolved in the stylesheet. I added some
checking into the latest versions I used. I think they never made it
back into XWeb -- I'll check that and add them if necessary. But the
only way I found to give feedback is to put stuff on stdout via
xsl:message -- which gets lost in XWeb's verbosity. I think the problem
can only be fixed by better reporting in general. One idea would be to
collect all stdout from the stylesheets (if possible) and to list it
at the end, independent of the rest of the feedback. Another idea would
be reducing the verbosity of XWeb itself.
>>Another option would be doing both with optional ID generation, possibly
>>with an id pattern attribute on the <entryset>. This could look like this:
>> <fileset sourceFiles="*.xml" targetFiles="$1.pdf" type="docbookPDF"
>>ids="pdf_$1"/>
>>The @ids would be optional and no ids would be generated if it is missing.
>>
>>
>
>That's what I was thinking anyway...
>
>
Haven't done that bit yet, will add it now. I got this far:
<fileset sourceFiles="*.png" type="copy"/>
<fileset sourceFiles="*.png" targetFiles="$1_th.png" type="thumbnail"/>
Which copies a bunch of images and generates a set of thumbnails. My
example application is a gallery section for an XWeb site, where I want
to be able to drop in a PNG or an SVG and get it added to a thumbnail
page as well as copied across.
Note that I also reduce the need for attributes -- the @targetFile(s) is
now optional everywhere, it defaults to the file name of the source(s).
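To illustrate the expansion with the optional target, here is a minimal sketch under a few assumptions: the source pattern has already been turned into a regex, the file names come in as a plain list, and the class and method names are invented for this mail and are not the ones in the XWeb sources. A missing targetFiles (null here) falls back to the source file name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of expanding one <fileset> into source/target pairs.
final class FilesetExpansion {

    static Map<String, String> expand(List<String> fileNames,
                                      String sourceRegex,
                                      String targetTemplate) {
        Pattern pattern = Pattern.compile(sourceRegex);
        Map<String, String> entries = new LinkedHashMap<>();
        for (String name : fileNames) {
            Matcher m = pattern.matcher(name);
            if (!m.matches()) {
                continue; // file is not part of this set
            }
            String target = name; // default: same name as the source
            if (targetTemplate != null) {
                target = targetTemplate;
                // highest index first, so "$10" is not mistaken for "$1"
                for (int j = m.groupCount(); j >= 1; j--) {
                    String group = m.group(j);
                    target = target.replace("$" + j, group == null ? "" : group);
                }
            }
            entries.put(name, target);
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.png", "b.png", "notes.txt");
        System.out.println(expand(files, "(.*)\\.png", "$1_th.png"));
        // {a.png=a_th.png, b.png=b_th.png}
        System.out.println(expand(files, "(.*)\\.png", null));
        // {a.png=a.png, b.png=b.png}
    }
}
```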
Are there any good ideas around how to determine order for the
corresponding <entryset>s? I want to do something like this:
<entryset sourceFiles="*.xhtml" targetFiles="$1.html" names="$1"
ids="$1" type="XHTML"/>
Which should fill up a whole section with XHTML documents, using the
file names as part of the navigation. In that case order is somehow
relevant, and it is neither alphabetical nor chronological. One idea I
had was going alphabetical and then using this instead:
<entryset mode="regex" sourceFiles="(\d*)(\D.*).xhtml"
targetFiles="$2.html" names="$2" ids="$2" type="XHTML"/>
Which should work, but the syntax is not really obvious unless someone
knows regex quite well. The input files would then have to have
preceding numbers indicating their order. Alternatively this could be used:
<entryset mode="regex" sourceFiles="(.)(.*).xhtml"
targetFiles="$2.html" names="$2" ids="$2" type="XHTML"/>
Which is the same trick with a single leading character to indicate
order. Another variant would be a prefix separated by some character, e.g.:
<entryset mode="regex" sourceFiles="([^ ]) (.*).xhtml"
targetFiles="$2.html" names="$2" ids="$2" type="XHTML"/>
Any other ideas? Any ideas how to make that approach more accessible,
i.e. to get the same without the regexs? I don't really mind naming the
files in a certain scheme, but judging from the user feedback many find
the environment variables a killer, so I don't want to come up with
regular expressions :-)
Peter
|
|
From: Hendrik L. <hen...@gm...> - 2003-11-06 10:11:23
|
Thursday, November 6, 2003, 10:18:03 AM, you wrote:
> for (int j = 1; j <= matcher.groupCount(); j++) {
>     targetName = targetName.replaceAll("\\$"+ String.valueOf(j),
>                                        matcher.group(j));
> }
Hmm. Better do it backwards, otherwise $10 and $1 would get mixed (I'm not
sure how to specify a '0' after a '$1', though...)
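Roughly what doing it backwards could look like, as a sketch only (the class and method names are invented here): the groups are substituted from the highest index down, and String.replace is used instead of replaceAll so that neither the '$' nor the group text is interpreted as regex or replacement syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the "highest index first" idea.
final class GroupSubstitution {

    /**
     * Replaces $1, $2, ... in the template with the groups of a matcher
     * that has already matched. "$10" is handled before "$1".
     * Caveat: a group value that itself contains "$n" would be substituted
     * again by a later pass.
     */
    static String substitute(String template, Matcher matcher) {
        String result = template;
        for (int j = matcher.groupCount(); j >= 1; j--) {
            String group = matcher.group(j);
            result = result.replace("$" + j, group == null ? "" : group);
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(.*)\\.xml").matcher("manual.xml");
        if (m.matches()) {
            System.out.println(substitute("$1.pdf", m)); // manual.pdf
        }
    }
}
```

This still leaves the question of a literal '0' directly after '$1' open; that would need a delimiter in the syntax.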
> Maybe at the moment, but I think the "official" API might take over quickly.
Don't think so. log4j is _really_ widespread. All larger projects I know
are using it (sometimes hidden behind an own logging layer to make easy
replacement possible).
> How do you configure log4j?
Depends: if I have an XML config file already there for the application, the
log4j config is just part of it (using a <log4j:configuration>
element). If not, I just specify a properties file containing the
configuration. In the simplest cases, I configure everything
programmatically.
> The plain and XML output the JDK produces seems good enough for me -- I
> don't see why we need to log into a database or an IM network in XWeb.
Think about someone integrating XWeb into a larger application - maybe they
need such a thing. And if someone runs XWeb via a cron job, IM or mail
notification would be handy...
Log4j also allows for very flexible output formatting (on the screen, keep
everything short, but be verbose in the debug log). I also like the
'appender additivity' - the debug log should contain all application log
messages, but not vice versa.
> And I don't think that would be hard to do with the JDK logging
> framework. I'd actually write a custom one anyway since I want to log
> into a frontend with output that allows interaction.
I try to never re-invent the wheel...
> What exactly do the 345kb give me in comparison to util.logging? Don't
> get me wrong -- I don't say log4j is not an option at all, I just want
> to know what it gives me, since I don't know much about it.
- it does not force me to JDK 1.4
- it is complete
- appender additivity
- flexible output formatting
- flexible configuration
- widely used, and very stable
- many tools and enhancements available
- the de-facto standard for logging
I think I could live with the java.util.logging, as long as it allows for a
flexible logging scheme and easy configuration.
> if (!"regex".equals(child.getAttributeValue("mode"))) {
>     // @todo we probably need more escaped here
>     sourceFilesAttrib =
>         sourceFilesAttrib.replaceAll("\\.", "\\.");
>     sourceFilesAttrib =
>         sourceFilesAttrib.replaceAll("\\*", "(.*)");
>     sourceFilesAttrib =
>         sourceFilesAttrib.replaceAll("\\?", "(.)");
> }
> I have to add the other regexp special characters for the escapes.
Maybe you can look into the ORO class transforming the glob into a regular
expression... As I said - why re-invent the wheel? It is hard to do it
right, and easy to make the same mistakes again...
> Are there any good ideas around how to determine order for the
> corresponding <entryset>s? I want to do something like this:
>   <entryset sourceFiles="*.xhtml" targetFiles="$1.html" names="$1"
> ids="$1" type="XHTML"/>
> Which should fill up a whole section with XHTML documents, using the
> file names as part of the navigation. In that case order is somehow
> relevant, and it is neither alphabetical nor chronological.
I would skip this feature. If a user wants a specific order, he should
specify it explicitly.
Maybe your regex solution could be optional, but only for advanced users.
Another idea: add an 'order' attribute, like:
  <entryset mode="transform" order="number" sourceFiles="*.xhtml"
targetFiles="*.html" names="*" ids="*" type="XHTML"/>
where the 'order' attribute specifies how to determine the navigation
order. It could map to plugin-classes / predefined classes.
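As a rough sketch of such a mapping, assuming the value of 'order' is simply looked up in a table of comparators: the names "alphabetical" and "number" and the class itself are only placeholders for illustration, and real plugin loading is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry: maps the value of an 'order' attribute to a comparator.
final class EntryOrder {

    private static final Map<String, Comparator<String>> ORDERS = new HashMap<>();
    static {
        ORDERS.put("alphabetical", Comparator.naturalOrder());
        ORDERS.put("number", Comparator.comparingInt(EntryOrder::leadingNumber));
    }

    // Reads up to nine leading digits; names without a number sort last.
    static int leadingNumber(String name) {
        int end = 0;
        while (end < name.length() && end < 9 && Character.isDigit(name.charAt(end))) {
            end++;
        }
        return end == 0 ? Integer.MAX_VALUE : Integer.parseInt(name.substring(0, end));
    }

    static List<String> sort(List<String> names, String order) {
        Comparator<String> comparator = ORDERS.get(order);
        if (comparator == null) {
            throw new IllegalArgumentException("unknown order: " + order);
        }
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(comparator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("10contact", "2about", "1index");
        System.out.println(sort(names, "number"));       // [1index, 2about, 10contact]
        System.out.println(sort(names, "alphabetical")); // [10contact, 1index, 2about]
    }
}
```

The second line of output shows why plain alphabetical order is not enough for numbered files: "10" sorts before "1i" and "2".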
hli
-- 
Møøse trained to mix concrete and            Hendrik Lipka
sign complicated insurance forms      hendrik.lipka@gmx.de
                                       www.hendriklipka.de
|
|
From: Peter B. <pe...@pe...> - 2003-11-06 13:09:21
|
Hendrik Lipka wrote:
>Thursday, November 6, 2003, 11:33:28 AM, you wrote:
>
>
>
>>syntax. I was thinking about using Ant-style syntax in some other spots
>>later on, i.e. the "$10" would be "{$10}" instead.
>>
>>
>
>In this case, it would be nice to have it done properly and to introduce
>named replacement, to be able to insert any value into a filename...
>
>
Where would the other values come from?
One thing I was thinking of is introducing parameters for the
documentStyles -- if you look at this site section as an example:
http://www.kvocentral.org/publications/index.html -- each year is
created from the same input file using the same stylesheet with just a
different parameter "year", which filters publications out of the input
file (BibTeXML). For each page I need a different documentStyle at the
moment, since there is no way to attach a parameter to the entries. At
the moment there are a bunch of definitions like this:
  <documentStyle type="XMLPublications2003">
    <xsl stylesheet="layout/generic.xsl" navigationElement="html">
      <parameter name="nav.main.pos" value="left"/>
      <parameter name="nav.sec.pos" value="nested"/>
      <parameter name="nav.sec.visible" value="current"/>
      <parameter name="nav.sec.firstEntry" value="section"/>
      <parameter name="style.markup.firstLetter" value="on"/>
      <parameter name="style.markup.linkTypes" value="on"/>
      <parameter name="feature.include.footer"
                 value="footer.xml"/>
      <parameter name="feature.internalLink.token" value="!"/>
      <xsl stylesheet="layout/publications.xsl">
        <parameter name="year" value="2003"/>
        <xsl stylesheet="layout/replaceAuthors.xslt"/>
      </xsl>
    </xsl>
  </documentStyle>
Good thing I don't change much in there and the copy and paste happens
only once a year, but it is definitely not the way I'd like to do it.
The first thing I'd like to change is the option to set a
property/parameter/variable as child of an <entry>, so the innermost
<parameter> of the <xsl> would be:
<parameter name="year" value="{$year}"/>
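How the resolving could work, as a sketch under the assumption that the values set on an <entry> end up in a simple map: every "{$name}" in a parameter value is replaced by the entry's value for that name, and an undefined name is reported instead of being silently dropped. The class name and the exact placeholder grammar are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical resolver for "{$name}" placeholders in parameter values.
final class PlaceholderResolver {

    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\{\\$([A-Za-z_][A-Za-z0-9_.]*)\\}");

    static String resolve(String value, Map<String, String> entryParameters) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = entryParameters.get(m.group(1));
            if (replacement == null) {
                throw new IllegalArgumentException("undefined parameter: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> entry = new HashMap<>();
        entry.put("year", "2003");
        System.out.println(resolve("{$year}", entry));            // 2003
        System.out.println(resolve("pubs-{$year}.html", entry));  // pubs-2003.html
    }
}
```

With something like this a single documentStyle could serve all years, with each <entry> supplying its own "year".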
The other bit that's missing is reuse of tasks, in this case the layout
part is the same for most types on the site (apart from copying and the
BibTeX creation). Instead of giving the parameters for the layout every
time (and changing it everywhere if needed), I'd like to call other
tasks. The syntax I think of would be serial, not nested and I think of
introducing <task> as an element. A <documentStyle> would then define
the images to be rendered plus a task to call. Tasks can call each other
to allow maximum reuse.
[..log4j descriptions...]
Ok, it does sound quite handy and slightly better than util.logging.
Here are the main differences I have figured out so far -- + is for
log4j, - is for util.logging:
+ no search for the right logging.properties. With util.logging you
have to either edit the file in the lib dir of the JVM you are
currently using or give some command line property to point to another
configuration. Not what I want my users to do, I'd like to send them a
file and say: put this in that dir and send me the file called XYZ after
you ran the program
- syntax is simpler in util.logging. Properties format seems good
enough, XML is overkill.
+ more available outputs
- download size
+ works with older JDKs/JVMs
Not much difference as far as I can tell, but maybe it is not too bad an
idea if I just try it and check the details. The move to 1.4 could be
postponed for a while.
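For the record, the configuration search can probably be avoided in util.logging too: LogManager.readConfiguration(InputStream) accepts the properties from any stream, so the application could load a file from its own directory. A minimal sketch, with a made-up class name and a made-up logger name "xweb.render"; the in-memory configuration in main only stands in for a file shipped to a user.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.LogManager;
import java.util.logging.Logger;

// Hypothetical setup helper for java.util.logging.
final class LoggingSetup {

    // Replaces the current logging configuration with the given properties.
    static void configure(InputStream properties) {
        try {
            LogManager.getLogManager().readConfiguration(properties);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience variant for a properties file next to the application.
    static void configure(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            configure(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String config =
            "handlers=java.util.logging.ConsoleHandler\n"
            + ".level=INFO\n"
            + "xweb.render.level=FINE\n";
        configure(new ByteArrayInputStream(config.getBytes(StandardCharsets.ISO_8859_1)));
        Logger logger = Logger.getLogger("xweb.render");
        System.out.println(logger.getLevel());
    }
}
```

That does not change the other points in the comparison, it only takes the edge off the first one.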
>>Do you have a pointer to a good log4j tutorial? Otherwise I can google
>>for one.
>>
>>
>
>Starting here is a good idea:
>http://jakarta.apache.org/log4j/docs/documentation.html
>The short manual and the JavaDoc are a good start. Additionally,
>http://supportweb.cs.bham.ac.uk/documentation/tutorials/docsystem/build/tutorials/log4j/log4j.html
>makes a good read.
>
>You can also ask me for examples - I will then go and look for some of my
>code parts...
>
>
Ok, I'll read some of that and then just try.
>>Yeah -- maybe I should just go with the regex things for now since that
>>fits neatly in there, if people can't handle it but want to do something
>>similar, we can always add easier options later.
>>
>>
>
>I think many users will be happy with specifying order explicitly, as
>normally you won't have this many entries in the navigation. In most cases,
>only such directories as images or downloads contain so many entries that
>you would use a 'copy many files' task...
>
>
But the number scheme would give me a nice feeling of being back to
Basic V2 again :-)
I'll leave the <entryset> option with the usual regex replacement
abilities, but no extra options for the order for now -- let's see how
people use it. That's always the fun part -- I am quite astonished what
people made with XWeb so far. It was originally written for my sites,
then I published it and did the smart move of writing a manual, so now I
get emails from people I don't know every fortnight or so :-)
Peter
|