From: Andrej v. d. Z. <and...@gm...> - 2010-03-26 10:47:19
Hi,

I am a happy user of libxmlpp for some time now. One thing I could not find out yet. How can I ignore empty text nodes?

The problem is that I construct a DOM document by reading two XML files and merging them together with import_node(). I also remove some nodes manually with remove_child(). Finally I do a doc->write_to_stream_formatted(cout, "UTF-8") and end up with ugly output with holes in it, like this:

<?xml version="1.0" encoding="UTF-8"?>
<session xmlns="XXX" version="1.0" clientName="Belastingdienst" projectName="GBV" phase="1" iteration="1" sessionName="M2 - AUTO TEST3">
<hosts>
<host name="apmvsq1" cluster="apmvsp1+2" type="mainframe" smtEnabled="0" numCpus="666" opsys="ToBeOverwittenByParser" hardware="ToBeOverWrittenByParser" serviceLevel="ToBeOverwittenByParser" cpuSpeed="1" cpuSpeedBenchmark="MIPS" cacheHitPerc="95" cacheHitTime="0.0015" cacheMissTime="0.005" monitoringTool="ascb" timeDiff="67"/>







</hosts>
<msrs><msr xmlns="XXX" type="SU" guiName="SU1" ignoreEteValidator="0"><parser-logs><ete filename="SU1/20100304-GBV-M2-SU1.ete"/><resource filename="SU1/20100304-GBV-M2-SU1.mf" hostname="apmvsq1"/></parser-logs></msr></msrs></session>

I understand that these are empty text-nodes, but I wish to ignore them. How can I do that without writing my own version of write_to_stream_formatted()? I just wish to ignore them altogether.

Thank you,
Andrej
From: Murray C. <mu...@mu...> - 2010-03-26 10:52:12
On Fri, 2010-03-26 at 19:47 +0900, Andrej van der Zee wrote:
> Hi,
>
> I am a happy user of libxmlpp for some time now. One thing I could not
> find out yet. How can I ignore empty text nodes?

By checking for them in your application when you read the XML document. It's entirely up to your application to decide whether white space is interesting.

> [...]

--
mu...@mu...
www.murrayc.com
www.openismus.com
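The application-side check described in this reply can be as small as a single predicate. The following is only a sketch: `is_whitespace_only` is a made-up helper name, not part of libxml++, and it assumes the text of each node (for instance the content of an xmlpp::TextNode) is available as a std::string.

```cpp
#include <string>

// Hypothetical helper, not a libxml++ function.
// Returns true if `text` is empty or consists only of XML white space
// (space, tab, carriage return, line feed), the same characters that
// XPath's normalize-space() strips.
bool is_whitespace_only(const std::string& text)
{
  return text.find_first_not_of(" \t\r\n") == std::string::npos;
}
```

An application walking the tree would then skip, or remove, any text node for which this predicate returns true, as long as white space carries no meaning in that document.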
From: Andrej v. d. Z. <and...@gm...> - 2010-03-26 11:07:15
Hi,

> Yes. I guess you are doing that somehow anyway during your "merge" of
> two documents.

Actually I am using import_node().

> I guess I'd accept a patch that adds an
> Element::remove_empty_text_nodes() method. Then you could call that on
> get_root_node().

I will have a look when I have time.

Though, I am doubting the correctness of the "doc->write_to_stream_formatted()" method now. I expect the output to be formatted, but if you look at the last few nodes, they are not formatted at all. See below. Am I misunderstanding something?

Thank you,
Andrej

-------------- output of write_to_stream_formatted(), is it really formatted? ----------------

<?xml version="1.0" encoding="UTF-8"?>
<session xmlns="http://www.contentional.eu" version="1.0" clientName="Belastingdienst" projectName="GBV" phase="1" iteration="1" sessionName="M2 - AUTO TEST3">
<hosts>
<host name="apmvsq1" cluster="apmvsp1+2" type="mainframe" smtEnabled="0" numCpus="666" opsys="ToBeOverwittenByParser" hardware="ToBeOverWrittenByParser" serviceLevel="ToBeOverwittenByParser" cpuSpeed="1" cpuSpeedBenchmark="MIPS" cacheHitPerc="95" cacheHitTime="0.0015" cacheMissTime="0.005" monitoringTool="ascb" timeDiff="67"/>
</hosts>
<msrs><msr xmlns="http://www.contentional.eu" type="SU" guiName="SU1" ignoreEteValidator="0"><parser-logs><ete filename="SU1/20100304-GBV-M2-SU1.ete"/><resource filename="SU1/20100304-GBV-M2-SU1.mf" hostname="apmvsq1"/></parser-logs></msr></msrs></session>
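The remove_empty_text_nodes() method quoted in this message is only a proposal in the thread, so the following is a rough sketch of the idea on a toy tree, not libxml++ code. `ToyNode` and its members are invented for illustration; a real patch would operate on xmlpp::Element and its children instead.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a DOM node; NOT a libxml++ type.
struct ToyNode {
  bool is_text = false;   // true for text nodes, false for elements
  std::string value;      // element name or text content
  std::vector<std::unique_ptr<ToyNode>> children;
};

// Hypothetical equivalent of the proposed remove_empty_text_nodes():
// drops text children that are empty or white space only, then
// recurses into the remaining children.
void remove_empty_text_nodes(ToyNode& element)
{
  auto& kids = element.children;
  kids.erase(
    std::remove_if(kids.begin(), kids.end(),
      [](const std::unique_ptr<ToyNode>& child) {
        return child->is_text &&
               child->value.find_first_not_of(" \t\r\n") == std::string::npos;
      }),
    kids.end());

  for (auto& child : kids)
    remove_empty_text_nodes(*child);
}
```

Calling this once on the root of the merged tree, before serializing, would remove the white-space-only nodes everywhere. Text nodes with real content are left untouched.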
From: Murray C. <mu...@mu...> - 2010-03-26 11:12:06
On Fri, 2010-03-26 at 20:07 +0900, Andrej van der Zee wrote:
> Though, I am doubting the correctness of the
> "doc->write_to_stream_formatted()" method now. I expect the output to
> be formatted, but if you look at the last few nodes, they are not
> formatted at all. See below. Am I misunderstanding something?

I stopped using it in Bakery (now in Glom) too. I think it just gives up when it finds child text nodes, because it can't generically know if they should be indented with white space.

I did this:
http://git.gnome.org/browse/bakery/commit/?id=54d85442f58228609a147f461934b75387fa8d7d

--
mu...@mu...
www.murrayc.com
www.openismus.com
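The "gives up when it finds child text nodes" behaviour can be illustrated with a small model. This is a simplified sketch of the idea only, not the actual serializer behind write_to_stream_formatted(). `DemoNode` and `serialize` are invented names, and the rule that one text child disables indentation for the whole subtree is an assumption based on the explanation above.

```cpp
#include <string>
#include <vector>

// Toy DOM node; NOT a libxml++ or libxml2 type.
struct DemoNode {
  bool is_text;                    // true for text nodes
  std::string value;               // element name or text content
  std::vector<DemoNode> children;
};

// Simplified model of a "formatted" serializer: once an element has a
// text child, indentation is switched off for that element's subtree,
// because added white space could change the content.
void serialize(const DemoNode& node, bool format, int depth, std::string& out)
{
  if (node.is_text) { out += node.value; return; }

  // Give up on formatting if any direct child is a text node.
  if (format) {
    for (const DemoNode& child : node.children)
      if (child.is_text) { format = false; break; }
  }

  out += "<" + node.value;
  if (node.children.empty()) { out += "/>"; return; }
  out += ">";
  if (format) out += "\n";
  for (const DemoNode& child : node.children) {
    if (format) out += std::string(2 * (depth + 1), ' ');
    serialize(child, format, depth + 1, out);
    if (format) out += "\n";
  }
  if (format) out += std::string(2 * depth, ' ');
  out += "</" + node.value + ">";
}
```

In this model, a tree with no text nodes is indented throughout. A single white-space text node under the root is enough to make everything below it come out exactly as stored, so parsed parts keep their old line breaks and freshly imported parts end up on one line. That matches the mixed output shown earlier in the thread.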
From: Knut A. R. <kn...@if...> - 2010-03-27 15:19:54
* Andrej van der Zee
> I am a happy user of libxmlpp for some time now. One thing I could not
> find out yet. How can I ignore empty text nodes?
[...]

Hi. I have used the following XPath-based technique to remove such nodes in order to "normalize" XML documents that are stored in pretty-printed format. The reason I had them stored pretty-printed was to improve readability in a test suite. I would not use pretty-printing on XML that is going directly between machines.

    #include <libxml++/libxml++.h>

    // Removes every text node that is empty or contains only white space.
    void remove_whitespace_nodes(xmlpp::Document & doc)
    {
      // Select all text nodes whose normalized content is the empty string.
      xmlpp::NodeSet whitespace_nodes =
        doc.get_root_node()->find("//text()[normalize-space()='']");

      for (xmlpp::NodeSet::const_iterator it = whitespace_nodes.begin();
           it != whitespace_nodes.end();
           ++it)
      {
        // Detach each matching node from its parent element.
        (*it)->get_parent()->remove_child(*it);
      }
    }

Make sure the XPath above does not identify nodes that carry meaning to your application; otherwise you would have to refine it somehow.

--
Sincerely,
Knut Aksel Røysland
From: Andrej v. d. Z. <and...@gm...> - 2010-03-28 01:46:38
Hi,

> void remove_whitespace_nodes(xmlpp::Document & doc)
> {
>   xmlpp::NodeSet whitespace_nodes =
>     doc.get_root_node()->find("//text()[normalize-space()='']");
>   for (xmlpp::NodeSet::const_iterator it = whitespace_nodes.begin();
>        it != whitespace_nodes.end();
>        ++it)
>   {
>     (*it)->get_parent()->remove_child(*it);
>   }
> }

Thank you, that's a nice one!

Greets,
Andrej