From: Venkataramanaidu, D. (Dinesh)** C. ** <vd...@al...> - 2007-01-24 12:44:48
Hello Makoto,

It's wonderful that you were able to run a series of tests and reproduce the problem.

We are working on a prototype that is based purely on XML streams, and XML sorting is an important step in it. We are looking at using the xsort utility for this stream-based XML sorting, provided it can sort using a reasonable amount of memory.

Based on your solution, we ran a few more tests to study the behavior of the xsort utility further. Attached please find the details of our test results. In summary, we found that the amount of memory that must be allocated is roughly equal to the size of the blocks being sorted. For example, to sort blocks of 1GB we need ~900MB of RAM pre-allocated.

I hope you will agree that pre-allocating such a large amount of memory for sorting streams is not ideal. Can we do something in the disk-sort-merge algorithm to alleviate this problem? Is this a very difficult bug to fix?

Once again, I truly appreciate your help in this regard.

Thanks,
Dinesh.V

-----Original Message-----
From: Makoto ONIZUKA [mailto:oni...@la...]
Sent: Sunday, January 21, 2007 11:16 AM
To: vd...@al...
Cc: xml...@li...
Subject: Re: [Xmltk-devs] Sorting a huge XML file

Dinesh,

I hit the same problem. As far as I know/guess:
- the initial memory size of xsort is fixed at 38MB.
- xsort with the default memory setting works if the output size doesn't
  exceed around 100MB.
- so, there are some bugs in the disk-sort-merge algorithm in xsort.

I can't fix it soon. The easiest solution is to set the memory size.

  xsort -m {memory size MB} -c "top" -e "*" -k ...;

makoto

From: Makoto ONIZUKA <oni...@la...>
Subject: Re: [Xmltk-devs] Sorting a huge XML file
Date: Sat, 20 Jan 2007 15:48:58 +0900 (JST)
Message-ID: <200...@la...>

>Dinesh,
>
>I have tested xsort using synthetic XML data
>that satisfies your condition; the total size of
>BLOCK1 and BLOCK2 exceeds 100MB.
>
>The two examples below work correctly.
>  xsort -c "top" -e "*" -k ...
>  xsort -c "top/*" -e "*" -k ...
>
>So, can you give me more details of your example?
>
>makoto
>
>
>From: "Venkataramanaidu, Dinesh (Dinesh)** CTR **" <vd...@al...>
>Subject: [Xmltk-devs] Sorting a huge XML file
>Date: Wed, 10 Jan 2007 15:37:22 +0530
>Message-ID: <3BE...@ii...>
>
>>Hello XML Toolkit developers,
>>
>>While looking for a tool to sort XML tags, I came across the
>>xsort utility on SourceForge. The tool impressed me very much in
>>my initial testing with sample XML files.
>>Now I am trying to use the xsort utility on bigger XML files,
>>and I am encountering a problem.
>>
>>In the XML template below, if the total size of the descendants of
>>BLOCK1 (or BLOCK2) exceeds ~100MB, then running xsort on this XML file
>>simply returns <TOP/>, while the same thing works perfectly fine when
>>the size of the descendants is less than 100MB. (For my testing, I am
>>just embedding XML files of a certain size as the descendants of each
>>BLOCK.)
>>
>>
>><TOP>
>>  <BLOCK1 id="1">
>>    ......
>>  </BLOCK1>
>>  <BLOCK2 id="2">
>>    ......
>>  </BLOCK2>
>></TOP>
>>
>>Input from your side regarding this issue would be of great help to me.
>>Thanks in advance for your input.
>>
>>Thanks,
>>Dinesh.V
>>
>>_______________________________________________
>>Xmltk-devs mailing list
>>Xml...@li...
>>https://lists.sourceforge.net/lists/listinfo/xmltk-devs
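
For readers following the disk-sort-merge question raised in this thread, here is a minimal sketch of the general technique (an external merge sort with a fixed memory budget). It is not xsort's implementation and says nothing about where the xsort bug lies. The names `external_sort`, `_spill_run`, `_read_run` and `max_bytes_in_memory` are hypothetical, and it assumes each record is a single-line string (for example, a sort key followed by a serialized block) with no embedded newlines.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _spill_run(records: List[str]) -> str:
    """Sort one in-memory batch and write it to a temp file, one record per line."""
    records.sort()
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(rec + "\n")
    return path


def _read_run(path: str) -> Iterator[str]:
    """Stream a sorted run back from disk, one record at a time."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(records: Iterable[str], max_bytes_in_memory: int) -> Iterator[str]:
    """Yield records in sorted order while holding roughly one batch in memory.

    Assumption: records are single-line strings compared lexicographically.
    """
    run_paths: List[str] = []
    batch: List[str] = []
    batch_bytes = 0

    # Phase 1: cut the input into sorted runs that each fit the memory budget.
    for rec in records:
        batch.append(rec)
        batch_bytes += len(rec.encode("utf-8"))
        if batch_bytes >= max_bytes_in_memory:
            run_paths.append(_spill_run(batch))
            batch, batch_bytes = [], 0
    if batch:
        run_paths.append(_spill_run(batch))

    # Phase 2: k-way merge of the runs; only one record per run is buffered.
    runs = [_read_run(p) for p in run_paths]
    try:
        yield from heapq.merge(*runs)
    finally:
        for r in runs:
            r.close()  # closes any run file that is still open
        for p in run_paths:
            os.remove(p)
```

A call such as `list(external_sort(lines, max_bytes_in_memory=38 * 1024 * 1024))` would sort with roughly a 38MB working set regardless of the total input size, which is the property asked for above: memory use tied to the chosen budget, not to the size of the block being sorted.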