Re: [Xweb-developers] Re:Changes to XWeb

Hendrik Lipka wrote:

>Wednesday, November 5, 2003, 9:46:56 PM, you wrote:
>
>  
>
>>The main thing I'd like to get rid of is Xerces since it is so huge. Do
>>you know how close the ORO API is to the JDK 1.4 RegEx approach?
>>    
>>
>
>Just looking into it:
>It seems to support the full set of POSIX RegEx (the type of RegEx
>supported is not stated, so I assume POSIX). But it does not support
>substitution, so this must be done manually (grouping is supported,
>though). I still would prefer ORO: it is the most complete lib out there,
>it's small, and fairly stable.
>  
>
I just finished the 1.4 version -- but it wouldn't be hard to change at 
all; at the moment it is more about getting things going. I did the $n 
replacements manually, but it is just this bit:

     for (int j = 1; j <= matcher.groupCount(); j++) {
          targetName = targetName.replaceAll("\\$" + String.valueOf(j), matcher.group(j));
     }

Not too hard at all :-)
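
The loop above has two edge cases worth keeping in mind: a matched group that itself contains "$" or "\" is interpreted as replacement syntax by replaceAll, and "$1" also matches the start of "$10". A minimal sketch of a single-pass variant follows. The class and method names are made up for illustration and are not part of XWeb, and Matcher.quoteReplacement needs Java 5 or later, so this is an assumption beyond the 1.4 baseline discussed here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not XWeb code): expands $n references in a target pattern. */
final class TargetNameExpander {
    // One or two digits after a dollar sign, e.g. $1 or $12.
    private static final Pattern GROUP_REF = Pattern.compile("\\$(\\d{1,2})");

    /** 'source' must already have matched successfully (matches() or find()). */
    static String expand(String targetPattern, Matcher source) {
        Matcher ref = GROUP_REF.matcher(targetPattern);
        StringBuffer out = new StringBuffer();
        while (ref.find()) {
            int n = Integer.parseInt(ref.group(1));
            String value;
            if (n >= 1 && n <= source.groupCount()) {
                // A group that did not take part in the match yields null.
                value = source.group(n) == null ? "" : source.group(n);
            } else {
                // Unknown group number: keep the reference text unchanged.
                value = ref.group();
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            ref.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        ref.appendTail(out);
        return out.toString();
    }
}
```

Calling TargetNameExpander.expand("$1.pdf", matcher) after a successful matcher.matches() would then stand in for the loop.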

>>>>- something for logging like log4j
>>>>        
>>>>
>>It is not just about size, it is also about being mainstream. But of
>>    
>>
>
>Log4j is the most mainstream you can get for logging. 
>
Maybe at the moment, but I think the "official" API might take over quickly.

>It's small, fast,
>stable, and pretty configurable (I always use the notion of 2 log targets:
>one for the application [structured by the application components], one for
>debugging [structured by the Java classes]. It's really simple to do). 
>
How do you configure log4j? I have mixed feelings about the JDK 
approach: on one hand you can configure it externally and in extreme 
detail, which means you can switch on specific parts of the logging on a 
client machine. On the other hand the logging.properties file is not the 
best place to fiddle around with (once you have found the right one) and 
giving specific logging options via the command line is a bit painful.
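
One way around hunting for the right logging.properties might be to let the application feed its own configuration to the JDK LogManager. The following is only a sketch of that idea: the class name and the logger name "xweb.demo" are invented for the example, and the property values are arbitrary.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.LogManager;
import java.util.logging.Logger;

/** Hypothetical setup class: loads java.util.logging settings supplied by the application. */
final class LoggingSetup {
    // Held in a static field so the logger is not garbage collected.
    static final Logger LOG = Logger.getLogger("xweb.demo");

    /** Replaces the current configuration with the given properties stream. */
    static void configure(InputStream properties) {
        try {
            LogManager.getLogManager().readConfiguration(properties);
        } catch (IOException e) {
            throw new IllegalStateException("could not read logging configuration", e);
        }
    }

    /** Built-in defaults in the same syntax as logging.properties. */
    static void configureDefaults() {
        String config =
              "handlers=java.util.logging.ConsoleHandler\n"
            + ".level=INFO\n"
            + "java.util.logging.ConsoleHandler.level=ALL\n"
            + "xweb.demo.level=FINE\n";
        configure(new ByteArrayInputStream(config.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The same properties can still be supplied externally through the standard java.util.logging.config.file system property, so a client machine could override the defaults without touching the JRE's own file.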

>It
>got a lot of tools (just have a look at the supported Appenders...).
>
The plain and XML output the JDK produces seems good enough for me -- I 
don't see why we need to log into a database or an IM network in XWeb. 
And I don't think that would be hard to do with the JDK logging 
framework. I'd actually write a custom one anyway since I want to log 
into a frontend with output that allows interaction.
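
To give an idea of how small such a custom handler could be, here is a sketch that simply keeps the records in memory so a frontend can display them later. The class name and the snapshot() method are invented for this example; only java.util.logging.Handler and its three abstract methods come from the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

/** Hypothetical handler: collects log records for later display in a frontend. */
final class FrontendLogHandler extends Handler {
    private final List<LogRecord> records =
            Collections.synchronizedList(new ArrayList<LogRecord>());

    @Override
    public void publish(LogRecord record) {
        // isLoggable applies this handler's level and optional filter.
        if (isLoggable(record)) {
            records.add(record);
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered outside the list, so there is nothing to flush.
    }

    @Override
    public void close() {
        records.clear();
    }

    /** Returns a copy so the caller can iterate without holding a lock. */
    List<LogRecord> snapshot() {
        synchronized (records) {
            return new ArrayList<LogRecord>(records);
        }
    }
}
```

It would be attached with Logger.addHandler(new FrontendLogHandler()), and the frontend could poll snapshot() or be notified from publish().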

>And it
>supports Java since 1.2, too. Commons Logging and the util.logging from 1.4
>are just inferior.
>  
>
What exactly do the 345 KB give me in comparison to util.logging? Don't 
get me wrong -- I'm not saying log4j is not an option at all, I just want 
to know what it gives me, since I don't know much about it.

>>ImageIO is part of the JDK -- I use it in some other programs to export
>>PNGs and JPGs. No extra libraries needed. And the API is a lot nicer 
>>than JIMI or JAI (the latter being incredibly bad in design).
>>    
>>
>
>I seem to have confused this with the JAI Image I/O (from
>http://developer.java.sun.com/developer/earlyAccess/jai_imageio/), which is
>what I use (in addition to the Java Imaging Utilities)
>  
>
JAI is even more evil than JIMI :-) Why the heck do you want to reduce 
an OO environment down to a command line interface as done there 
(http://java.sun.com/products/java-media/jai/forDevelopers/jai-apidocs/index.html)?

javax.imageio is the most sensible API of the three I tried (imageio, 
JAI, JIMI).
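
For comparison, this is roughly what exporting and re-reading a PNG looks like with javax.imageio alone. It is a sketch with invented class and method names, working on byte arrays so that no file paths have to be assumed.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: encodes and decodes images using only the JDK's ImageIO. */
final class PngRoundTrip {
    static byte[] encodePng(BufferedImage image) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // write() returns false when no writer for the format is installed.
            if (!ImageIO.write(image, "png", out)) {
                throw new IOException("no PNG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static BufferedImage decode(byte[] data) {
        try {
            // read() returns null when no registered reader understands the data.
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(data));
            if (image == null) {
                throw new IOException("unsupported image format");
            }
            return image;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing to a file instead only means passing a java.io.File to ImageIO.write in place of the stream.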

>>>You are really optimistic :) My first test case would be something like
>>>source='??some*.?htm?' target='$1next$2.html'
>>>      
>>>
>>You could always map it to something like "..some.*\..htm." and run the
>>RegExp machinery.
>>    
>>
>
>That's what ORO is doing internally for glob expressions. The problem there
>is that for substitution with RegEx, one has to use () expressions in the
>matcher RegEx to generate the groups used for substitution. And as they are
>regular chars in a glob expression, one cannot specify them for the
>substitution... Possible solution: find all '.' and '*' expressions in the
>glob expression, and surround them with () in the generated RegEx.
>  
>
Currently it looks like this:

                if (!"regex".equals(child.getAttributeValue("mode"))) {
                    // @todo we probably need more escapes here
                    // the replacement needs four backslashes to emit a literal "\."
                    sourceFilesAttrib = sourceFilesAttrib.replaceAll("\\.", "\\\\.");
                    sourceFilesAttrib = sourceFilesAttrib.replaceAll("\\*", "(.*)");
                    sourceFilesAttrib = sourceFilesAttrib.replaceAll("\\?", "(.)");
                }

I still have to add escapes for the other regexp special characters.
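
A single pass over the glob might be easier to extend than a chain of replaceAll calls, because each character is looked at exactly once and the generated groups cannot be re-escaped by a later step. The sketch below assumes only '*' and '?' are glob wildcards; the class name and the list of characters to escape are my own choices for illustration.

```java
/** Hypothetical converter: turns a simple glob into a regex with one group per wildcard. */
final class GlobPatterns {
    // Characters that have a special meaning in java.util.regex outside a character class.
    private static final String REGEX_SPECIALS = "\\.[]{}()+^$|";

    static String toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                regex.append("(.*)");          // any run of characters, captured
            } else if (c == '?') {
                regex.append("(.)");           // exactly one character, captured
            } else if (REGEX_SPECIALS.indexOf(c) >= 0) {
                regex.append('\\').append(c);  // literal in the glob, so escape it
            } else {
                regex.append(c);
            }
        }
        return regex.toString();
    }
}
```

With this, the test case source='??some*.?htm?' becomes a regex with five groups, so the '*' part is $3 rather than $1 in the target.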

>>I find the glob format a lot easier for simple things
>>    
>>
>
>ACK
>
>  
>
>>like matching file names and I think there are many people who use XWeb 
>>but don't know much about RegExp. Forgetting to escape the dot would be 
>>a first problem.
>>    
>>
>
>  
>
>>[adding the file name also as id]
>>    
>>
>
>[Discussing the ID generation when copying multiple files at once]
>  
>
>>I am a bit afraid of namespace pollution. If you just use a file name as 
>>ID, there are lots of IDs generated. And if you want to use the file 
>>name anyway, you can just put it into a URL. The internal linking 
>>    
>>
>
>I wanted the ID to make sure the link is correct (I will get a warning if
>it is not because of a typo). Maybe a link checker for internal links would
>solve this?
>  
>
The problem is that this gets resolved in the stylesheet. I added some 
checking to the latest versions I used. I think those checks never made 
it back into XWeb -- I'll check that and add them if necessary. But the 
only way I found to give feedback is to put stuff on stdout via 
xsl:message -- which gets lost in XWeb's verbosity. I think the problem 
can only be fixed by better reporting in general. One idea would be to 
collect all stdout from the stylesheets (if possible) and to list it 
at the end, separately from the rest of the feedback. Another idea would 
be to reduce the verbosity of XWeb itself.
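
The "collect and list at the end" idea could be sketched by temporarily swapping System.out while a stylesheet runs. This is only an illustration with an invented class name: it is not thread-safe, and whether xsl:message really ends up on stdout (rather than stderr or an ErrorListener) depends on the XSLT processor, which I am assuming here.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/** Hypothetical helper: captures everything a task prints to System.out. */
final class StdoutCollector {
    static String capture(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream replacement = new PrintStream(buffer, true);
        System.setOut(replacement);
        try {
            task.run();
        } finally {
            // Always restore the real stream, even if the task throws.
            replacement.flush();
            System.setOut(original);
        }
        return buffer.toString();
    }
}
```

The captured text of each transformation could be appended to a list and printed in one block after the normal progress output.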

>>Another option would be doing both with optional ID generation, possibly
>>with an id pattern attribute on the <entryset>. This could look like this:
>>  <fileset sourceFiles="*.xml" targetFiles="$1.pdf" type="docbookPDF"
>>ids="pdf_$1"/>
>>The @ids would be optional and no ids would be generated if it is missing.
>>    
>>
>
>That's what I was thinking anyway...
>  
>
Haven't done that bit yet, will add it now. I got this far:

 <fileset sourceFiles="*.png" type="copy"/>
 <fileset sourceFiles="*.png" targetFiles="$1_th.png" type="thumbnail"/>

Which copies a bunch of images and generates a set of thumbnails. My 
example application is a gallery section for an XWeb site, where I want 
to be able to drop in a PNG or an SVG and get it added to a thumbnail 
page as well as copied across.
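
For anyone curious what the scaling step of a "thumbnail" type amounts to, here is a rough sketch using plain Java 2D. It is not the XWeb implementation; the class name, the maxSize parameter and the choice of bilinear interpolation are assumptions made for the example.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical helper: scales an image down so it fits into a square of maxSize pixels. */
final class ThumbnailSketch {
    static BufferedImage scaleToFit(BufferedImage source, int maxSize) {
        int longestSide = Math.max(source.getWidth(), source.getHeight());
        // Never scale up: images smaller than maxSize keep their size.
        double factor = Math.min(1.0, (double) maxSize / longestSide);
        int width = Math.max(1, (int) Math.round(source.getWidth() * factor));
        int height = Math.max(1, (int) Math.round(source.getHeight() * factor));

        BufferedImage thumb = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = thumb.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return thumb;
    }
}
```

The result can be written out with javax.imageio under the name produced by the targetFiles pattern.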

Note that I also reduced the need for attributes -- the @targetFile(s) is 
now optional everywhere, it defaults to the file name of the source(s).

Does anyone have good ideas on how to determine the order for the 
corresponding <entryset>s? I want to do something like this:

  <entryset sourceFiles="*.xhtml" targetFiles="$1.html" names="$1" 
ids="$1" type="XHTML"/>

Which should fill up a whole section with XHTML documents, using the 
file names as part of the navigation. In that case the order matters, 
and it is neither alphabetical nor chronological. One idea I had was 
going alphabetical and then using this instead:

  <entryset mode="regex" sourceFiles="(\d*)(\D.*)\.xhtml" 
targetFiles="$2.html" names="$2" ids="$2" type="XHTML"/>

Which should work, but the syntax is not really obvious unless someone 
knows regex quite well. The input files would then need leading 
numbers indicating their order. Alternatively this could be used:

  <entryset mode="regex" sourceFiles="(.)(.*)\.xhtml" 
targetFiles="$2.html" names="$2" ids="$2" type="XHTML"/>

Which is the same trick with a single leading character to indicate 
order. Another variant would be a prefix separated by some character, e.g.:

  <entryset mode="regex" sourceFiles="([^ ]*) (.*)\.xhtml" 
targetFiles="$2.html" names="$2" ids="$2" type="XHTML"/>
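
To make the numeric-prefix variant concrete: the processing would boil down to sorting the file names alphabetically and then stripping the prefix with the regex. A minimal sketch, with an invented class name and with the dot before "xhtml" escaped:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: derives ordered entry names from numerically prefixed file names. */
final class OrderedEntries {
    // Group 1: optional leading digits, group 2: the name used for $2.
    private static final Pattern NUMBERED = Pattern.compile("(\\d*)(\\D.*)\\.xhtml");

    static List<String> orderedNames(String[] fileNames) {
        String[] sorted = fileNames.clone();
        Arrays.sort(sorted);                 // plain alphabetical order
        List<String> names = new ArrayList<String>();
        for (String fileName : sorted) {
            Matcher m = NUMBERED.matcher(fileName);
            if (m.matches()) {               // files not matching the pattern are skipped
                names.add(m.group(2));
            }
        }
        return names;
    }
}
```

Because the sort is alphabetical rather than numeric, "2foo.xhtml" would come after "10bar.xhtml", so the prefixes would need a fixed width (01, 02, ...) or the sort would have to compare group 1 as a number.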

Any other ideas? Any ideas how to make that approach more accessible, 
i.e. to get the same without the regexes? I don't really mind naming the 
files according to a certain scheme, but judging from the user feedback 
many find the environment variables a killer, so I don't want to come up 
with regular expressions :-)

   Peter