Re: [Xmldb-org-general] Native XML DB for storing and querying large files

Jochen Wiedmann <joc...@fr...> wrote:

> If it weren't for open source, I'd recommend my employer's database,
> Tamino.

Thanks for the suggestion. I saw that Tamino is one of the best-known products in this field.

> > And I wrote an implementation of XML:DB which is able to store and query
> > large files.
> I'd be interested in how you implemented the query.

Okay, my supervisor told me that I am allowed to share information about it.

I call the project myx.

Myx is an implementation of the XML:DB API (an XML database repository layer) on top of MySQL. Querying is possible using dom4j and Saxon. Myx is capable of storing and querying large documents.

So I simply store the whole XML file in a MySQL CLOB column and implement XML:DB on top of it. You might ask: why not use the file system? MySQL has privileges and user/password authentication, and it is easy to expose a database on the Internet. I must agree that an FTP server could serve the same purpose. At the moment the table has two columns: name char(256), content largetext. We might add more later, e.g. some general indexing: we could extract information with XSLT and store it alongside every XML document, which would make it easy to query many documents.

I wrote two query implementations, one with Saxon and the other with dom4j. Saxon is the better fit because it can evaluate XPath on large files.

Today I added support for xmldbGUI (thanks to Jeff). It is still a bit buggy: for example, when you try to export a large document you get an OutOfMemoryError, because xmldbGUI uses DOM to export an XMLResource (damn DOM, it causes problems with large files!).
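
The OutOfMemoryError comes from building a whole tree before writing anything. As a rough sketch of an export path that never asks for a DOM, assuming the stored content can be handed over as a Reader, an identity transform from a StreamSource to a StreamResult could be used. The names StreamingExport and export are made up for this example and are not part of xmldbGUI or XML:DB, and whether the copy really runs in constant memory depends on the Transformer implementation in use.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of xmldbGUI or the XML:DB API.
final class StreamingExport {

    /**
     * Copies an XML document from in to out with an identity transform.
     * No DOM tree is requested by this code; the input is parsed and the
     * output is serialized by the Transformer.
     */
    static void export(Reader in, Writer out) throws TransformerException {
        // newTransformer() without a stylesheet gives an identity transform
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> declaration so the copy stays close to the input
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        identity.transform(new StreamSource(in), new StreamResult(out));
    }
}
```

In an XML:DB setting the same idea would probably go through XMLResource.getContentAsSAX(ContentHandler) instead of getContentAsDOM(), but that depends on how the resource implementation produces its SAX events.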

Here is the implementation of the queryResource method of XML:DB's XPath query service. The whole project's source might be released as open source if somebody wants it (send me an email).

    /**
     * Evaluates an XPath expression against one stored document using Saxon.
     *
     * @see org.xmldb.api.modules.XPathQueryService#queryResource(java.lang.String, java.lang.String)
     */
    public ResourceSet queryResource(String id, String query)
            throws XMLDBException {
        // The whole document is read from the collection as a single String
        String content = (String) col.getResource(id).getContent();
        InputSource is = new InputSource(new StringReader(content));
        SAXSource ss = new SAXSource(is);

        List matched;
        try {
            XPathEvaluator xpe = new XPathEvaluator(ss);
            // Compile the XPath expression, then evaluate it
            XPathExpression extract = xpe.createExpression(query);
            matched = extract.evaluate();
        } catch (XPathException e) {
            // Pass the Saxon error message on to the caller
            throw new XMLDBException(ErrorCodes.VENDOR_ERROR,
                    "XPathException: " + e.getMessage());
        }

        // Convert the list of matched nodes into a ResourceSet
        ResourceSetImpl rs = new ResourceSetImpl(col);
        for (Iterator iter = matched.iterator(); iter.hasNext();) {
            // Each match becomes one resource holding the node's string value
            NodeInfo ni = (NodeInfo) iter.next();
            XMLResourceImpl xmlResource = new XMLResourceImpl(col,
                    "QueryResult", "QueryResult", ni.getStringValue());
            rs.addResource(xmlResource);
        }
        return rs;
    }

> To me the 
> specification of XML:DB seems to be problematic because of the Iterator? 

It is not problematic.

> To me one would prefer an event based model, which could allow to pass 
> control on the input stream to the caller at the beginning of the file?

Well, ... this I can't understand!
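
If the question is about SAX-style callbacks, where each match is handed to the caller as soon as the parser reaches it instead of being collected into a list first, then it might look roughly like the sketch below. This is only a guess at what is meant. EventQuery, MatchCallback and streamMatches are made-up names, not part of XML:DB, and the sketch matches plain element names rather than XPath, so it is not a replacement for the Saxon query.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of an event-based query; not part of the XML:DB API.
final class EventQuery {

    /** Hypothetical callback that receives one match at a time. */
    public interface MatchCallback {
        void match(String text);
    }

    /**
     * Parses the input with SAX and reports the text of every element named
     * elementName as soon as its end tag is seen. Nothing is collected here;
     * the caller decides what to keep. Nested elements with the same name
     * are not handled.
     */
    static void streamMatches(InputSource in, final String elementName,
                              final MatchCallback callback) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a matching element
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals(elementName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (text != null && qName.equals(elementName)) {
                    // Hand the match to the caller right away
                    callback.match(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
    }
}
```

With this shape the caller sees the first match before the rest of the file has been read, which is the difference from returning a complete ResourceSet and iterating over it afterwards.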
