Incremental update of a website
Brought to you by:
peterbecker
It would be a nice feature if pages were only transformed with XSL when they have changed.
For example:
the xweb file has been changed --> all pages are rebuilt.
a stylesheet has been changed --> all pages are rebuilt.
else --> pages are only rebuilt when their source files have changed.
Just like makefiles... ;-)
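A minimal sketch of the three rules above, assuming a plain modification-time comparison as make does. The function name `needs_rebuild` and its parameters are made up for illustration and are not part of XWeb.

```python
import os


def needs_rebuild(source, output, xweb_file, stylesheets):
    """Return True if `output` must be regenerated (make-style mtime check)."""
    if not os.path.exists(output):
        return True  # nothing has been generated yet
    out_time = os.path.getmtime(output)
    # Rules 1 and 2: a changed xweb file or stylesheet invalidates every page.
    global_inputs = [xweb_file, *stylesheets]
    if any(os.path.getmtime(p) > out_time for p in global_inputs):
        return True
    # Rule 3: otherwise only a changed source file triggers a rebuild.
    return os.path.getmtime(source) > out_time
```

Calling this once per page gives the "else" branch for free: a page whose inputs are all older than its output is simply skipped.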
Logged In: YES
user_id=41603
This would indeed be a useful feature for bigger websites.
Since I am currently not aware of any big websites using XWeb
;-) and implementing a global option like this (and it
should be an option) is a little tricky (only the first one,
though), I won't give it high priority.
PeterB
Logged In: YES
user_id=680
Something like this would be all but necessary if you wanted
an upload utility built into XWeb. Otherwise the uploader
wouldn't know what had changed, so it would have to upload the
entire site for every little update. And with Evil Dial-up
Connections still quite common...
Oh, and you could probably use Jakarta's Ant for this.
Logged In: YES
user_id=41603
This feature is definitely trickier than it seemed at
first glance. The problem is that changes in the
makefile (website.xweb) would affect every single result if
dependencies are handled at the file level. This is not
true for other makefiles; they usually don't have this
dependency.
The simple form David proposes could be implemented without
too much effort, but I am still not convinced that it is
really useful -- it makes sense when you try to create your
content, but you can check this by calling the XSLT
processor yourself. Of course the trick with the just-copy
XSLT stylesheet (I usually put it in the examples as
debug.xsl) is not exactly obvious, but it works quite well.
It would make sense to put the makefile into the output,
run a tree diff against the current version and then decide
which parts to recreate. Even this would not be as useful
as it might seem, since every change in the <structure>
part will affect all pages that use the [navigationElement]
attribute -- thus creating a dependency from file
addition/deletion to nearly all pages in a typical site.
It would reduce the number of buttons rendered, which takes a
significant amount of time, esp. since I have to create
temporary files for this due to problems with the Batik DOM
implementation.
About the upload feature: by now I think this should not be
part of the backend. If you use the command line you should
be able to find some tool that handles the synchronization
for you (I should look for such a thing and put it in the
links section; if possible I'll add something to the
distribution). In the long run I want to have nice GUI
frontends -- they could add this feature, too. I'd like to
keep the backend more focused on its real job -- the good
ol' UNIX way. It took me a while to believe that the '-z'
on tar is superfluous, but a nice script and some magic(4)
would do the same thing -- see SuSE's 'less' as an example
(uses gzip transparently if needed) ;-)
Unfortunately all people that announced their interest in
doing the frontends gave up quite fast. My code is not that
bad :-(
Logged In: YES
user_id=117495
The simplest dependency is:
-- xweb file hasn't changed --> no images have to be rendered.
I know all the other dependencies are harder to find, like
-- source file hasn't changed --> source file doesn't have to be processed if no stylesheet has changed.
Logged In: YES
user_id=41603
Ok, the main file -> buttons dependency is simple, assuming
we copy the main file into the output (or store a timestamp
somewhere). Although I am still not convinced that this
itself will help much I will see that I implement it once I
have global options (planned for 0.5, mainly to allow
control of the log/debug output).
Here is the plan:
If incremental processing is set (Java Properties), XWeb will
copy the makefile into the main output directory, where it is
at first used only as a timestamp. Then recognizing the following
dependencies can be implemented in order:
1) makefile -> images
2) makefile, stylesheet(s), input -> output
3) <structure>, <documentStyle>, stylesheet(s) for this,
input -> output
I don't think (1) will help much but it is a start. (2)
would mean adding some code to figure out if a
<documentStyle> has to be updated (needs additional
timestamps for stylesheets), then comparing input and output
and (3) means introducing a diff on the makefile.
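Step (1) of the plan can be sketched roughly as follows. This is an illustration only, not XWeb code: the function names `images_outdated` and `record_makefile` are invented here, and the sketch assumes the copy keeps the name of the makefile and sits directly in the output directory.

```python
import os
import shutil


def images_outdated(makefile, output_dir):
    """Step (1): images depend only on the makefile."""
    stamp = os.path.join(output_dir, os.path.basename(makefile))
    if not os.path.exists(stamp):
        return True  # no previous run recorded
    return os.path.getmtime(makefile) > os.path.getmtime(stamp)


def record_makefile(makefile, output_dir):
    """Copy the makefile into the output directory after a successful run.

    copy2 keeps the modification time, so the copy works as a timestamp now
    and could be diffed against a newer makefile later (step 3).
    """
    os.makedirs(output_dir, exist_ok=True)
    return shutil.copy2(makefile, output_dir)
```

Keeping a full copy rather than a bare timestamp costs little and leaves the door open for the diff in step (3).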
Don't expect too much of this for 1.0. I'd like to get to
the full release soon, and after a break I'll try to change
the document processing part to a more process-oriented view
anyway -- the result will be similar to SVG's filter elements
and will allow streaming XML and binaries and forking output
(e.g. for embedded SVG/MathML). I'll try to document this
idea in the new website. It is an extension of the concept
used for creating http://xml.apache.org (xml-stylebook, I am
in contact with its author).
This will change the dependencies again, but maybe it will be
clearer then: a processor could be marked "dirty" (e.g.
because its stylesheet or its definition in the makefile has
changed), and every document of a type that uses this processor
has to be recreated independent of the input/output timestamps.
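The "dirty" processor idea might look something like the sketch below. The `Processor` class and `must_recreate` function are hypothetical names chosen for this example; the planned process-oriented design is not specified in enough detail here to say how it would really be modelled.

```python
import os
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    dirty: bool = False  # set when its stylesheet or makefile definition changed


def must_recreate(processors, source, output):
    """A document is recreated if any processor it uses is dirty,
    regardless of timestamps; otherwise fall back to the mtime check."""
    if any(p.dirty for p in processors):
        return True
    if not os.path.exists(output):
        return True
    return os.path.getmtime(source) > os.path.getmtime(output)
```

The point of the flag is that it overrides the timestamps: an output that is newer than its input is still recreated when one of its processors changed.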
Logged In: YES
user_id=117495
Why copy the makefile to the output directory? You could just check whether the .xweb file is newer
than one or more of the output (.html) files; then you know it has changed.
BTW, stylesheets: I think this could be tricky, because you'll have to follow all the xsl:include elements too,
so you'd need extra XSL processing...
Timestamps for stylesheets: if the stylesheet is newer than one (or more) of the generated output files,
all have to be recompiled.
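For what it's worth, following the includes needs only a plain XML parse of each stylesheet rather than a full XSLT run. The sketch below is an assumption-laden illustration: the helper names `stylesheet_closure` and `newest_mtime` are invented, it treats every `href` as a relative file path, and it ignores external entities and anything pulled in through `document()`.

```python
import os
import xml.etree.ElementTree as ET

XSL_NS = "{http://www.w3.org/1999/XSL/Transform}"


def stylesheet_closure(path, seen=None):
    """Return the set of stylesheet files reachable from `path`
    through xsl:include and xsl:import (local file hrefs only)."""
    seen = set() if seen is None else seen
    path = os.path.abspath(path)
    if path in seen:
        return seen  # guard against include cycles
    seen.add(path)
    root = ET.parse(path).getroot()
    # xsl:include and xsl:import are top-level children of the stylesheet.
    for tag in ("include", "import"):
        for elem in root.findall(XSL_NS + tag):
            href = elem.get("href")
            if href:
                child = os.path.join(os.path.dirname(path), href)
                stylesheet_closure(child, seen)
    return seen


def newest_mtime(path):
    """Newest modification time among a stylesheet and everything it pulls in."""
    return max(os.path.getmtime(p) for p in stylesheet_closure(path))
```

Comparing `newest_mtime(...)` with the timestamp of the generated files then covers a change in an included stylesheet as well as in the top-level one.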
Logged In: YES
user_id=41603
Comparing the makefile to all output files (not just the
HTML) would be a lot to do -- both from the implementation
point of view (creating all names for images) and the
runtime point of view (even my small sites have a large
number of files due to mouseOver and other images). This
might end in a worse performance than without incremental
updates. Copying the makefile should be easy and allows
switching to the diffs later.
I wasn't really aware of the <xsl:include> problem yet, but
of course this makes the problem even larger. The same
applies to external entities in XML input and maybe
XIncludes later.
The argument with the stylesheet is correct: it should be
possible to assign each <documentStyle> a timestamp: the
newest stylesheet used. If either the input file or the
documentStyle is newer than the output file it has to be
recreated.
But I think this discussion shows that getting really
interesting results out of an incremental approach is hard.
Although it would be nice to update just a single page
without recreating the rest (e.g. due to a fix of a typo), I
can't see how to get there easily, even if we ignore some of the
more critical problems like included stylesheets (by just
defining that incremental updates are limited there).
I'll keep an eye on this problem but I don't think I'll put
much work into it since I consider other aspects more
important/promising. Of course the usual OSS rule applies:
whoever volunteers can do it himself ;-)
Logged In: YES
user_id=41603
Discussion continued on mailing list:
http://www.geocrawler.com/lists/3/SourceForge/12472/0/5996300/