|
From: Andrej v. d. Z. <and...@gm...> - 2010-03-26 10:47:19
|
Hi,
I am a happy user of libxmlpp for some time now. One thing I could not
find out yet. How can I ignore empty text nodes? The problem is that I
construct a dom document by reading two XML files and merging them
together with import_node(). I also remove some nodes manually with
remove_child(). Finally I do a doc->write_to_stream_formatted(cout,
"UTF-8") and end up with ugly output with holes in them like this:
<?xml version="1.0" encoding="UTF-8"?>
<session xmlns="XXX" version="1.0" clientName="Belastingdienst"
projectName="GBV" phase="1" iteration="1" sessionName="M2 - AUTO
TEST3">
<hosts>
<host name="apmvsq1" cluster="apmvsp1+2" type="mainframe"
smtEnabled="0" numCpus="666" opsys="ToBeOverwittenByParser"
hardware="ToBeOverWrittenByParser"
serviceLevel="ToBeOverwittenByParser" cpuSpeed="1"
cpuSpeedBenchmark="MIPS" cacheHitPerc="95" cacheHitTime="0.0015"
cacheMissTime="0.005" monitoringTool="ascb" timeDiff="67"/>







</hosts>
<msrs><msr xmlns="XXX" type="SU" guiName="SU1"
ignoreEteValidator="0"><parser-logs><ete
filename="SU1/20100304-GBV-M2-SU1.ete"/><resource
filename="SU1/20100304-GBV-M2-SU1.mf"
hostname="apmvsq1"/></parser-logs></msr></msrs></session>
I understand that these are empty text-nodes, but I wish to ignore
them. How can I do that without writing my own version of
write_to_stream_formatted()? I just wish to ignore them altogether.
Thank you,
Andrej
|
|
From: Murray C. <mu...@mu...> - 2010-03-26 10:52:12
|
On Fri, 2010-03-26 at 19:47 +0900, Andrej van der Zee wrote:
> Hi,
>
> I am a happy user of libxmlpp for some time now. One thing I could not
> find out yet. How can I ignore empty text nodes?

By checking for them in your application when you read the XML document.
It's entirely up to your application to decide whether white space is
interesting.

> The problem is that I
> construct a dom document by reading two XML files and merging them
> together with import_node(). I also remove some nodes manually with
> remove_child(). Finally I do a doc->write_to_stream_formatted(cout,
> "UTF-8") and end up with ugly output with holes in them like this:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <session xmlns="XXX" version="1.0" clientName="Belastingdienst"
> projectName="GBV" phase="1" iteration="1" sessionName="M2 - AUTO
> TEST3">
> <hosts>
> <host name="apmvsq1" cluster="apmvsp1+2" type="mainframe"
> smtEnabled="0" numCpus="666" opsys="ToBeOverwittenByParser"
> hardware="ToBeOverWrittenByParser"
> serviceLevel="ToBeOverwittenByParser" cpuSpeed="1"
> cpuSpeedBenchmark="MIPS" cacheHitPerc="95" cacheHitTime="0.0015"
> cacheMissTime="0.005" monitoringTool="ascb" timeDiff="67"/>
>
>
>
>
>
>
>
> </hosts>
> <msrs><msr xmlns="XXX" type="SU" guiName="SU1"
> ignoreEteValidator="0"><parser-logs><ete
> filename="SU1/20100304-GBV-M2-SU1.ete"/><resource
> filename="SU1/20100304-GBV-M2-SU1.mf"
> hostname="apmvsq1"/></parser-logs></msr></msrs></session>
>
>
> I understand that these are empty text-nodes, but I wish to ignore
> them. How can I do that without writing my own version of
> write_to_stream_formatted()? I just wish to ignore them altogether.
>
> Thank you,
> Andrej

--
mu...@mu...
www.murrayc.com
www.openismus.com
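The check suggested in this reply could look roughly like the following. This is only a minimal sketch using the standard library; is_whitespace_only() is a made-up helper name, not a libxml++ function. In a libxml++ application the string would presumably come from a text node's get_content().

```cpp
#include <string>

// Hypothetical helper (not part of libxml++): returns true when `text`
// consists only of the four XML white space characters (space, tab,
// carriage return, line feed). An empty string also counts.
bool is_whitespace_only(const std::string& text)
{
  return text.find_first_not_of(" \t\r\n") == std::string::npos;
}
```

An application would call this on each text node while walking the tree and skip the node when it returns true. Whether such nodes really are uninteresting remains the application's decision.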
|
From: Andrej v. d. Z. <and...@gm...> - 2010-03-26 11:07:15
|
Hi,

> Yes. I guess you are doing that somehow anyway during your "merge" of
> two documents.

Actually I am using import_node().

> I guess I'd accept a patch that adds an
> Element::remove_empty_text_nodes() method. Then you could call that on
> get_root_node().

I will have a look when I have time.

Though, I am doubting the correctness of the
"doc->write_to_stream_formatted()" method now. I expect the output to
be formatted, but if you look at the last few nodes, they are not
formatted at all. See below. Am I misunderstanding something?

Thank you,
Andrej

-------------- output of write_to_stream_formatted(), is it really formatted? ----------------
<?xml version="1.0" encoding="UTF-8"?>
<session xmlns="http://www.contentional.eu" version="1.0"
clientName="Belastingdienst" projectName="GBV" phase="1" iteration="1"
sessionName="M2 - AUTO TEST3">
<hosts>
<host name="apmvsq1" cluster="apmvsp1+2" type="mainframe"
smtEnabled="0" numCpus="666" opsys="ToBeOverwittenByParser"
hardware="ToBeOverWrittenByParser"
serviceLevel="ToBeOverwittenByParser" cpuSpeed="1"
cpuSpeedBenchmark="MIPS" cacheHitPerc="95" cacheHitTime="0.0015"
cacheMissTime="0.005" monitoringTool="ascb" timeDiff="67"/>
</hosts>
<msrs><msr xmlns="http://www.contentional.eu" type="SU" guiName="SU1"
ignoreEteValidator="0"><parser-logs><ete
filename="SU1/20100304-GBV-M2-SU1.ete"/><resource
filename="SU1/20100304-GBV-M2-SU1.mf"
hostname="apmvsq1"/></parser-logs></msr></msrs></session>
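The Element::remove_empty_text_nodes() method mentioned in the quote is only a proposal in this thread. What such a method might do can be sketched on a stand-in tree. ToyNode below is a hypothetical type invented for illustration, not a libxml++ class, and the free function only borrows the proposed name.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// Stand-in for a DOM node; hypothetical, not a libxml++ type.
struct ToyNode
{
  bool is_text = false;                            // true for text nodes
  std::string content;                             // text, or element name
  std::vector<std::unique_ptr<ToyNode>> children;  // used by elements only
};

// Recursively drops text children that hold nothing but XML white space.
void remove_empty_text_nodes(ToyNode& element)
{
  std::vector<std::unique_ptr<ToyNode>>& kids = element.children;
  kids.erase(
    std::remove_if(kids.begin(), kids.end(),
      [](const std::unique_ptr<ToyNode>& child) {
        return child->is_text &&
               child->content.find_first_not_of(" \t\r\n") == std::string::npos;
      }),
    kids.end());

  // Descend into the children that are left.
  for (std::unique_ptr<ToyNode>& child : kids)
    remove_empty_text_nodes(*child);
}
```

Called on the root of the toy tree, this leaves elements and meaningful text untouched and removes only the white-space-only text nodes, which is the effect the proposed method would be expected to have on get_root_node().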
|
From: Murray C. <mu...@mu...> - 2010-03-26 11:12:06
|
On Fri, 2010-03-26 at 20:07 +0900, Andrej van der Zee wrote:
> Though, I am doubting the correctness of the
> "doc->write_to_stream_formatted()" method now. I expect the output to
> be formatted, but if you look at the last few nodes, they are not
> formatted at all. See below. Am I misunderstanding something?

I stopped using it in Bakery (now in Glom) too. I think it just gives up
when it finds child text nodes, because it can't generically know if
they should be indented with white space.

I did this:
http://git.gnome.org/browse/bakery/commit/?id=54d85442f58228609a147f461934b75387fa8d7d

--
mu...@mu...
www.murrayc.com
www.openismus.com
|
From: Knut A. R. <kn...@if...> - 2010-03-27 15:19:54
|
* Andrej van der Zee
> I am a happy user of libxmlpp for some time now. One thing I could not
> find out yet. How can I ignore empty text nodes?
[...]
Hi.
I have used the following XPath-based technique to remove such nodes in
order to "normalize" XML documents that are stored in pretty-printed
format. The reason I had them stored pretty-printed was to improve
readability in a test suite. I would not use pretty-printing on XML that
is going directly between machines.
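#include <libxml++/libxml++.h>

// Removes every text node that contains only white space.
void remove_whitespace_nodes(xmlpp::Document & doc)
{
  // Select all text nodes whose content is empty after normalization.
  xmlpp::NodeSet whitespace_nodes =
    doc.get_root_node()->find("//text()[normalize-space()='']");
  for (xmlpp::NodeSet::const_iterator it = whitespace_nodes.begin();
       it != whitespace_nodes.end();
       ++it)
  {
    // The node set only holds pointers, so removing from the tree is safe.
    (*it)->get_parent()->remove_child(*it);
  }
}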
Make sure the XPath above does not identify nodes that carry meaning
to your application; otherwise you would have to refine it somehow.
--
Sincerely,
Knut Aksel Røysland
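To see which text nodes the predicate normalize-space()='' selects, it may help to spell out what XPath 1.0 normalize-space() does: it strips leading and trailing white space and collapses each inner run of white space to a single space. The following is a plain C++ sketch of that rule for illustration only; it is not how libxml2 implements the function, and the name normalize_space is chosen here just to mirror the XPath one.

```cpp
#include <string>

// Illustrative re-implementation of XPath 1.0 normalize-space() on a
// std::string; white space means space, tab, CR and LF.
std::string normalize_space(const std::string& s)
{
  std::string out;
  bool pending_space = false;  // a white space run was seen after some text
  for (char c : s) {
    if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
      pending_space = !out.empty();  // leading white space is dropped
    } else {
      if (pending_space)
        out += ' ';                  // collapse the run to one space
      out += c;
      pending_space = false;
    }
  }
  return out;                        // trailing white space is never added
}
```

A text node matches the XPath predicate exactly when this result is empty, so a node such as "\n   \n" is selected while " a " is not. Text that consists of white space but still matters to the application would be selected as well, which is the case the warning above is about.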
|
|
From: Andrej v. d. Z. <and...@gm...> - 2010-03-28 01:46:38
|
Hi,
>
> void remove_whitespace_nodes(xmlpp::Document & doc)
> {
>   xmlpp::NodeSet whitespace_nodes =
>     doc.get_root_node()->find("//text()[normalize-space()='']");
>   for (xmlpp::NodeSet::const_iterator it = whitespace_nodes.begin();
>        it != whitespace_nodes.end();
>        ++it)
>   {
>     (*it)->get_parent()->remove_child(*it);
>   }
> }
>
Thank you, that's a nice one!
Greets,
Andrej
|