From: Makoto O. <on...@ac...> - 2007-02-18 09:17:39
Dinesh,

I fixed the xsort bug and modified filestream.cpp so that
xsort detects file system errors such as
"no space left on devices in /tmp".

The latest release is xmlltk1.11.
URL http://sourceforge.net/projects/xmltk

makoto

From: "Venkataramanaidu, Dinesh (Dinesh)** CTR **" <vd...@al...>
Subject: RE: [Xmltk-devs] Sorting a huge XML file
Date: Wed, 24 Jan 2007 18:13:51 +0530
Message-ID: <3BE...@ii...>

>Hello Makoto,
>
>It's wonderful that you could run a series of tests and were able to
>reproduce the problem.
>
>We are actually working on a prototype which is purely based on
>XML streams. XML sorting is an important step in the prototype. We are
>looking at using the xsort utility to perform this stream-based XML
>sorting, provided it can sort using a reasonable amount of memory.
>
>Based on your solution we ran a few more tests to study the behavior of
>the xsort utility further. Attached please find the details of our test
>results. In summary, we found that the amount of memory to be allocated
>is roughly equivalent to the size of the blocks we are sorting. For
>example, to sort blocks of 1GB we need to have ~900MB of RAM
>pre-allocated.
>
>I hope you will agree that pre-allocating such a large amount of memory
>in advance for sorting streams is not ideal. Can something be done in
>the disk-sort-merge algorithm to alleviate this problem? Is this a very
>difficult bug to fix?
>
>Once again, I truly appreciate your help in this regard.
>
>Thanks,
>Dinesh.V
>
>
>-----Original Message-----
>From: Makoto ONIZUKA [mailto:oni...@la...]
>Sent: Sunday, January 21, 2007 11:16 AM
>To: vd...@al...
>Cc: xml...@li...
>Subject: Re: [Xmltk-devs] Sorting a huge XML file
>
>
>Dinesh,
>
>I hit the same problem. As far as I know/guess:
>- the initial memory size of xsort is fixed at 38MB.
>- xsort with the default memory setting works if the output
>  size doesn't exceed around 100MB.
>- so, there are some bugs in the disk-sort-merge algorithm
>  in xsort. I can't fix it soon.
>
>The easiest solution is to set the memory size.
>xsort -m {memory size MB} -c "top" -e "*" -k ...;
>
>makoto
>
>
>From: Makoto ONIZUKA <oni...@la...>
>Subject: Re: [Xmltk-devs] Sorting a huge XML file
>Date: Sat, 20 Jan 2007 15:48:58 +0900 (JST)
>Message-ID: <200...@la...>
>
>>Dinesh,
>>
>>I have tested xsort using synthetic XML data
>>that satisfies your condition; the total size of
>>BLOCK1 and BLOCK2 exceeds 100MB.
>>
>>The two examples below work correctly.
>> xsort -c "top" -e "*" -k ...
>> xsort -c "top/*" -e "*" -k ...
>>
>>So, can you give me more details of your example?
>>
>>makoto
>>
>>
>>From: "Venkataramanaidu, Dinesh (Dinesh)** CTR **" <vd...@al...>
>>Subject: [Xmltk-devs] Sorting a huge XML file
>>Date: Wed, 10 Jan 2007 15:37:22 +0530
>>Message-ID: <3BE...@ii...>
>>
>>>Hello XML Toolkit developers,
>>>
>>>While looking for a tool to sort XML tags I came across the
>>>xsort utility on SourceForge. This tool impressed me very much after
>>>my initial testing with sample XML files.
>>>Now I am trying to use the xsort utility against bigger XML files,
>>>and I am encountering a problem.
>>>
>>>In the XML template below, if the total size of the descendants of
>>>BLOCK1 (or BLOCK2) exceeds ~100MB, then running xsort on this XML file
>>>simply returns <TOP/>, while the same thing works perfectly fine when
>>>the descendants' size is less than 100MB. (For my testing, I am just
>>>embedding XML files of a certain size as the descendants of each
>>>BLOCK.)
>>>
>>>
>>><TOP>
>>>  <BLOCK1 id="1">
>>>    ......
>>>  </BLOCK1>
>>>  <BLOCK2 id="2">
>>>    ......
>>>  </BLOCK2>
>>></TOP>
>>>
>>>Any input from your side on this issue would be of great help to me.
>>>Thank you in advance.
>>>
>>>Thanks,
>>>Dinesh.V
>>>
>>>_______________________________________________
>>>Xmltk-devs mailing list
>>>Xml...@li...
>>>https://lists.sourceforge.net/lists/listinfo/xmltk-devs
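
The thread refers to xsort's "disk-sort-merge" algorithm without showing it. As a rough illustration of why such an algorithm should not need memory proportional to the block size, here is a minimal external merge sort sketch in C++. It is not xsort's code: the names external_sort, spill_run, sort_lines, max_keys and tmp_prefix are hypothetical, and newline-delimited strings stand in for the XML elements being sorted. Like the filestream.cpp change described above, the sketch checks the stream state after writing so that a full device is reported instead of silently producing truncated output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <functional>
#include <queue>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Sort the buffered keys, write them to a temporary run file, and clear the
// buffer. Throws if the file system rejects the write (e.g. a full device).
static std::string spill_run(std::vector<std::string>& keys,
                             const std::string& tmp_prefix, std::size_t index) {
    std::sort(keys.begin(), keys.end());
    const std::string path = tmp_prefix + std::to_string(index);
    std::ofstream out(path);
    for (const std::string& k : keys) out << k << '\n';
    out.flush();
    if (!out) throw std::runtime_error("cannot write run file: " + path);
    keys.clear();
    return path;
}

// Sort newline-delimited keys from `in` to `out`, buffering at most
// `max_keys` keys at once, whatever the total input size.
void external_sort(std::istream& in, std::ostream& out, std::size_t max_keys,
                   const std::string& tmp_prefix) {
    assert(max_keys > 0);

    // Phase 1: cut the input into sorted runs that each fit the budget.
    std::vector<std::string> run_paths;
    std::vector<std::string> buffer;
    std::string line;
    while (std::getline(in, line)) {
        buffer.push_back(line);
        if (buffer.size() == max_keys)
            run_paths.push_back(spill_run(buffer, tmp_prefix, run_paths.size()));
    }
    if (!buffer.empty())
        run_paths.push_back(spill_run(buffer, tmp_prefix, run_paths.size()));

    // Phase 2: k-way merge; the heap holds one pending key per run.
    std::vector<std::ifstream> runs;
    for (const std::string& p : run_paths) runs.emplace_back(p);

    using Entry = std::pair<std::string, std::size_t>;  // (key, run index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (std::getline(runs[r], line)) heap.emplace(line, r);

    while (!heap.empty()) {
        const Entry top = heap.top();
        heap.pop();
        out << top.first << '\n';
        if (std::getline(runs[top.second], line)) heap.emplace(line, top.second);
    }

    runs.clear();  // close the run files before deleting them
    for (const std::string& p : run_paths) std::remove(p.c_str());
    if (!out) throw std::runtime_error("cannot write sorted output");
}

// Convenience wrapper for trying the sketch on an in-memory string.
std::string sort_lines(const std::string& text, std::size_t max_keys,
                       const std::string& tmp_prefix) {
    std::istringstream in(text);
    std::ostringstream out;
    external_sort(in, out, max_keys, tmp_prefix);
    return out.str();
}
```

Calling sort_lines with five keys and max_keys set to 2 spills three run files and merges them, so the peak buffer is two keys plus one pending key per run. In this sketch the memory budget plays a role loosely analogous to the -m option mentioned in the thread, except that exceeding it only produces more runs on disk.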