Re: [Atox-user] Question about parsing?
From: Magnus L. H. <ma...@he...> - 2004-04-23 15:28:23
Gregory Serpolet <GREGORY.SERPOLET@BULL.NET>:
>
> Hello to all atox users,

Hi! Not that many of us, yet ;)

> I'm trying to parse an Ascii file into an XML one. My XML output
> format has already been defined.

In that case, Atox seems like a possible solution. (I should warn you
that the current implementation is a bit slow if your format is
reasonably complex and your files are sort of big... I *hope* to
improve this in the future, but it might mean recoding large chunks
in C or the like, so don't count on it :)

> I wish to insert some contents of the ascii file into some specific
> XML tag.

Right.

> The problem is that some of the tags are inserted at the beginning of
> the text and the ascii file's structure doesn't follow the same
> decomposition.

I see. It seems like XSLT would be useful here. If you check out the
upcoming 0.5 release (a working version with documentation can be
found in CVS) you'll see that XSLT fragments can now be used inside
Atox format files.

[snip]

> I'm just discovering your tool "atox" and I see that it's a top-down
> parser.

Yes, the Atox parsing itself is mainly suited for making the
structure that exists in the file explicit. However, once you've done
that, XSLT can do almost anything. Atox has been designed with this
in mind -- anything you can easily do in XSLT is not a priority in
Atox. But, as I said, in 0.5 you can now put XSLT templates inside
your format file, to keep everything in one file and make sure your
Atox output is correct (i.e. you won't have to go through a
semi-correct format before using XSLT to "fix it"). You can, of
course, also use XSLT afterward to create various outputs from your
XML format (e.g. an XHTML representation or the like).

> Is there some subtle solutions which could perform this kind of
> parsing.

OK, I'm not sure I understand your file format 100% (e.g., is there
more than one bug report in the ASCII file?) but I'll just give an
example of how you can reorder stuff (using a simplified version of
your format). I can try to work with your exact format if you give me
some details :)

So, let's assume the ASCII file only contains the following (without
the indent):

  Bug Report: blablabla
  Machine ID: 20234165
  Date: 31616561

Here is a possible format file (without the ax:format stuff):

  <Collect>
    <Bug>
      <ax:del>Bug Report:\s+</ax:del>
      (?=\n)
    </Bug>
    <MachineID>
      <ax:del>Machine ID:\s+</ax:del>
      (?=\n)
    </MachineID>
    <Date>
      <ax:del>Date:\s+</ax:del>
      (?=\n|\Z)
    </Date>
  </Collect>

  <xsl:template match="Collect">
    <Collect>
      <xsl:apply-templates select="MachineID"/>
      <xsl:apply-templates select="Date"/>
      <xsl:apply-templates select="Bug"/>
    </Collect>
  </xsl:template>

Note that my XSLT here probably isn't ideal -- it doesn't process the
whitespace in the Collect element, so all three child elements will
be put on a single line -- but it demonstrates how this can be done,
at least.

(To use xsl templates in Atox, you'll have to add

  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

to the ax:format tag, alongside the xmlns:ax declaration.)

Feel free to ask if you need clarifications, or if you need any
features that aren't present.

-- 
Magnus Lie Hetland      "Oppression and harassment is a small price to pay
http://hetland.org       to live in the land of the free." -- C. M. Burns
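
Editorial illustration (not part of the original message, and not
Atox itself): the two steps described above, first making the
structure of the ASCII file explicit and then reordering the
resulting elements, can be sketched with nothing but the Python
standard library. This is a minimal sketch under stated assumptions:
it does not use or imitate the Atox API, the names FIELDS,
OUTPUT_ORDER, parse_fields, build_collect and convert are
hypothetical, and it assumes each field sits on its own line as in
the three-line sample.

```python
import re
import xml.etree.ElementTree as ET

# Sample input from the message (without the indent).
SAMPLE = "Bug Report: blablabla\nMachine ID: 20234165\nDate: 31616561\n"

# Hypothetical field table: (XML element name, label used in the ASCII file).
FIELDS = [
    ("Bug", "Bug Report"),
    ("MachineID", "Machine ID"),
    ("Date", "Date"),
]

# Child order produced by the xsl:template in the message.
OUTPUT_ORDER = ["MachineID", "Date", "Bug"]


def parse_fields(text):
    """Step 1: make the existing structure explicit, in input order."""
    values = {}
    for name, label in FIELDS:
        # Drop the label and the whitespace after it, then capture the
        # rest of the line (roughly what ax:del plus the lookahead do).
        pattern = r"^%s:[ \t]+(.*)$" % re.escape(label)
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            values[name] = match.group(1).strip()
    return values


def build_collect(values):
    """Step 2: emit the children in the reordered sequence."""
    root = ET.Element("Collect")
    for name in OUTPUT_ORDER:
        if name in values:
            ET.SubElement(root, name).text = values[name]
    return root


def convert(text):
    """Run both steps and serialize the result as a string."""
    return ET.tostring(build_collect(parse_fields(text)), encoding="unicode")


print(convert(SAMPLE))
```

As with the XSLT in the message, the three child elements come out on
a single line, since no whitespace is added between them. Keeping the
parsing step and the reordering step separate mirrors the split
between the Atox format and the XSLT template.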