From: Dave N. <dc...@us...> - 2006-06-20 03:53:34
|
Maynard and I have been working on the XML schema that was proposed several months ago and have made some refinements to it. The opreport -X implementation was based on the initial patch posted by Junichi, and is a first attempt, so further refinement is anticipated. The only user interface modification (in addition to the -X option) is a --verbose=xml option, which outputs some extra information that might be useful for debugging a problem.

Included in the attachment are:
- a patch file, xml_patch, against an Oprofile CVS snapshot dated 05-05-06 (I was unable to get past cvs_login to get a more recent snapshot).
- the updated schema, oprof.6.xsd, which describes the XML that is produced by opreport -X
- some example output generated by opreport -X

We have made some efforts to streamline the XML that was in the original schema but haven't made any efforts to shorten the names of the XML tags. It has been suggested that a tool like gzip could compress the XML file down to a size similar to that of an XML file with abbreviated tags. If this is not sufficient, it would be pretty easy to generate abbreviated tags. The main change that was made to the schema was to get rid of extraneous nesting levels, and to add summary (aggregated) sample data for each container: Process, Application, Module, Symbol.

I will be out of town until July, but Maynard should be available to field any questions/comments. |
From: Dave N. <dc...@us...> - 2006-06-20 03:57:59
Attachments:
xml_patch.tar
|
Maynard and I have been working on the XML schema that was proposed several months ago and have made some refinements to it. The opreport -X implementation was based on the initial patch posted by Junichi, and is a first attempt, so further refinements are anticipated. The only user interface modification (in addition to the -X option) is a --verbose=xml option, which outputs some extra information that might be useful for debugging a problem. Sorting options are not allowed in conjunction with -X, because the XML file imposes either a Process/Thread/Module hierarchy (when using --separate=thread) or an Application/Module hierarchy on the output.

Included in the attachment are:
- a patch file, xml_patch, against an Oprofile CVS snapshot dated 05-05-06 (I was unable to get past cvs_login to get a more recent snapshot).
- the updated schema, oprof.6.xsd, which describes the XML that is produced by opreport -X
- some example output generated by opreport -X

We have made some efforts to streamline the XML that was in the original schema but haven't made any efforts to shorten the names of the XML tags. It has been suggested that a tool like gzip could compress the XML file down to a size similar to that of an XML file with abbreviated tags. If this is not sufficient, it would be pretty easy to generate abbreviated tags. The main change that was made to the schema was to get rid of extraneous nesting levels, and to add summary (aggregated) sample data for each container: Process, Application, Module, Symbol.

I will be out of town until July, but Maynard should be available to field any questions/comments. |
From: John L. <le...@mo...> - 2006-06-20 09:44:30
|
On Mon, Jun 19, 2006 at 08:59:38PM -0700, Dave Nomura wrote:
> Maynard and I have been working on the XML schema that was proposed
> several months ago and have made some refinements to it. The opreport

One initial comment: XML has ways to refer to other document elements, but you're not using them, instead you have an integer index. I /think/ that the libxml APIs make it easier to look up elements based on an element id, could you investigate whether the references should be XML-like, please?

thanks
john |
From: John L. <le...@mo...> - 2006-06-20 10:05:43
|
On Tue, Jun 20, 2006 at 10:44:22AM +0100, John Levon wrote:
> > Maynard and I have been working on the XML schema that was proposed
> > several months ago and have made some refinements to it. The opreport
>

Second one:

<SummaryData count="1,,"/>

Even though it's more verbose, this needs to be split somehow.

How is a parser supposed to know what these values mean? Is it supposed to notice the "separated_cpus" ?

SummaryData is a weird, overly generic name. Shouldn't it look something like:

<classes type="cpu" size="4">
  <cpu id="c1">1</cpu> <!-- can expand later to include core, socket ... -->
  ...
</classes>

<symbol id="1">
  <count class="c1">34</count>
  <count class="c3">34</count>
</symbol>

Then this whole separate "event_num" thing shouldn't be necessary, a count always refers to the class, which has more information as needed. Currently you have this weird separation of class types of CPU, thread, process, event, as all different. This does not reflect the data well, look at our profile classes code - all of these are just classes. You should be able to use the scheme above for all these types instead of 'hardcoding' this stuff, e.g.

<classes type="thread" size="X">
  <thread id="c1" pid="454">2</thread>
  ...
</classes>

<symbol id="45">
  <count class="c1">34</count>
  ...

Any file names should always be full paths. Stuff like Module probably need to have id references to a module table too.

"source_file" and "source_line" can both drop the "source_" bit.

The examples have stuff like:

<Profile>
cpu_name="ppc64"
platform="Linux-ppc64"
MHz="1656.4"
title="opreport -l session:dcnthr tgid:5590 -g -X "

That's not even valid XML afaics. Have you considered using libxml to write out XML?

The XML should all be lowercase.

And your patch is backwards, I can't read it like this.

regards
john |
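A minimal sketch of how a consumer could read the layout John proposes, using Python's standard-library `xml.etree.ElementTree`. Each `<count>` names a class, and the class table carries the meaning, so no positional decoding of something like `count="1,,"` is needed. Element and attribute names follow John's example; the sample document and the `counts_by_cpu` helper are illustrative assumptions, not part of any patch.

```python
import xml.etree.ElementTree as ET

# Sample document in the shape John sketches; ids and counts are made up.
DOC = """
<profile>
  <classes type="cpu" size="4">
    <cpu id="c1">1</cpu>
    <cpu id="c3">3</cpu>
  </classes>
  <symbol id="1">
    <count class="c1">34</count>
    <count class="c3">34</count>
  </symbol>
</profile>
"""

def counts_by_cpu(root, symbol_id):
    """Return {cpu_number: samples} for one symbol, resolved via the class table."""
    # Map class id -> CPU number, taken from the <classes type="cpu"> table.
    cpu_of_class = {
        c.get("id"): int(c.text)
        for c in root.findall("./classes[@type='cpu']/cpu")
    }
    symbol = root.find(f"./symbol[@id='{symbol_id}']")
    # Each <count> says which class it belongs to; no positional decoding needed.
    return {cpu_of_class[c.get("class")]: int(c.text) for c in symbol.findall("count")}

root = ET.fromstring(DOC)
print(counts_by_cpu(root, "1"))  # {1: 34, 3: 34}
```

The same lookup works unchanged for `type="thread"` or event classes, which is the point of treating them all as classes.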
From: Maynard J. <may...@us...> - 2006-06-21 14:32:27
|
John,

Thanks for the quick response. As my team member, Dave Nomura, said, he'll be out of the office for a couple of weeks or so, and I will answer questions and comments about his patch. I was tied up yesterday, but I hope to break free today or tomorrow to reply to your comments.

Regards,
-Maynard

John Levon wrote:
> On Tue, Jun 20, 2006 at 10:44:22AM +0100, John Levon wrote:
>
>
>>>Maynard and I have been working on the XML schema that was proposed
>>>several months ago and have made some refinements to it. The opreport
>>
>
> Second one:
>
> <SummaryData count="1,,"/>
>
> Even though it's more verbose, this needs to be split somehow.
>
> How is a parser supposed to know what these values mean? Is it supposed
> to notice the "separated_cpus" ?
>
> SummaryData is a weird, overly generic name. Shouldn't it look something like:
>
> <classes type="cpu" size="4">
> <cpu id="c1">1</cpu> <!-- can expand later to include core, socket ... -->
> ...
> </classes>
>
> <symbol id="1">
> <count class="c1">34</count>
> <count class="c3">34</count>
> </symbol>
>
> Then this whole separate "event_num" thing shouldn't be necessary, a
> count always refers to the class, which has more information as needed.
> Currently you have this weird separation of class types of CPU, thread,
> process, event, as all different. This does not reflect the data well,
> look at our profile classes code - all of these are just classes. You
> should be able to use the scheme above for all these types instead of
> 'hardcoding' this stuff, e.g.
>
> <classes type="thread" size="X">
> <thread id="c1" pid="454">2</thread>
> ...
> </classes>
>
> <symbol id="45">
> <count class="c1">34</count>
> ...
>
> Any file names should always be full paths. Stuff like Module probably
> need to have id references to a module table too.
>
> "source_file" and "source_line" can both drop the "source_" bit.
>
> The examples have stuff like:
>
> <Profile>
> cpu_name="ppc64"
> platform="Linux-ppc64"
> MHz="1656.4"
> title="opreport -l session:dcnthr tgid:5590 -g -X "
>
> That's not even valid XML afaics. Have you considered using libxml to
> write out XML?
>
> The XML should all be lowercase.
>
> And your patch is backwards, I can't read it like this.
>
> regards
> john
>
>
> _______________________________________________
> oprofile-list mailing list
> opr...@li...
> https://lists.sourceforge.net/lists/listinfo/oprofile-list |
From: Maynard J. <may...@us...> - 2006-06-22 22:35:41
Attachments:
oprof-xml-patch2
|
John Levon wrote:
> On Tue, Jun 20, 2006 at 10:44:22AM +0100, John Levon wrote:
>
>
>>>Maynard and I have been working on the XML schema that was proposed
>>>several months ago and have made some refinements to it. The opreport
>>
>
> Second one:
> [snip]
>
> And your patch is backwards, I can't read it like this.

Yikes! Maybe I "encouraged" Dave a bit too strongly to post his patch before he went on vacation. Attached is a patch that you all should be able to apply and play with. I will respond to your other comments in a separate note. I wanted to get this patch fixed up and made available first. I also fixed a few errors that the recent gcc complained about.

Regards,
-Maynard

>
> regards
> john |
From: Junichi U. <da...@ne...> - 2006-06-21 12:13:37
|
Hi,

A generic question:

The output XML looks pretty much unparsable to the human eye (one needs to follow references to parse the output), is this really the way forward?


A nit:

<Profile>
cpu_name="ppc64"
separated_cpus="8"
platform="Linux-ppc64"
MHz="1656.4"
title="opreport -l image:/home/dcn/work/test/round1,/home/dcn/work/test/round2 session:dcnall -g -X "

should probably be

<Profile
cpu_name="ppc64"
separated_cpus="8"
platform="Linux-ppc64"
MHz="1656.4"
title="opreport -l image:/home/dcn/work/test/round1,/home/dcn/work/test/round2 session:dcnall -g -X "
>

regards,
junichi
--
dancer@{debian.org,netfort.gr.jp} Debian Project |
From: John L. <le...@mo...> - 2006-06-21 12:16:19
|
On Wed, Jun 21, 2006 at 09:13:34PM +0900, Junichi Uekawa wrote:
> The output XML looks pretty much unparsable to the human eye (one
> needs to follow references to parse the output), is this really the
> way forward?

The alternative is massive duplication of data, making the files huge.

regards
john |
From: Maynard J. <may...@us...> - 2006-06-23 21:59:05
|
Junichi Uekawa wrote:
> Hi,
>
> A generic question:
>
> The output XML looks pretty much unparsable to the human eye (one
> needs to follow references to parse the output), is this really the
> way forward?
>
>
> A nit:
>
> <Profile>
> cpu_name="ppc64"
> separated_cpus="8"
> platform="Linux-ppc64"
> MHz="1656.4"
> title="opreport -l image:/home/dcn/work/test/round1,/home/dcn/work/test/round2 session:dcnall -g -X "
>

Profile is the root element. Its closing tag is at the end of the XML document.

-Maynard

> should probably be
>
> <Profile
> cpu_name="ppc64"
> separated_cpus="8"
> platform="Linux-ppc64"
> MHz="1656.4"
> title="opreport -l image:/home/dcn/work/test/round1,/home/dcn/work/test/round2 session:dcnall -g -X "
> >
>
> regards,
> junichi |
From: John L. <le...@mo...> - 2006-06-24 02:21:42
|
On Fri, Jun 23, 2006 at 04:58:52PM -0500, Maynard Johnson wrote:
> > <Profile>
> > cpu_name="ppc64"
> > separated_cpus="8"
> > platform="Linux-ppc64"
> > MHz="1656.4"
> > title="opreport -l image:/home/dcn/work/test/round1,/home/dcn/work/test/round2 session:dcnall -g -X "
> >
> Profile is the root element. Its closing tag is at the end of the XML
> document.

However, this is not valid XML. Elements may only contain other elements, not random attributes.

regards
john |
From: Maynard J. <may...@us...> - 2006-06-23 23:54:20
|
John Levon wrote:
> On Mon, Jun 19, 2006 at 08:59:38PM -0700, Dave Nomura wrote:
>
>
>>Maynard and I have been working on the XML schema that was proposed
>>several months ago and have made some refinements to it. The opreport
>
>
> One initial comment: XML has ways to refer to other document elements,
> but you're not using them, instead you have an integer index. I /think/
> that the libxml APIs make it easier to look up elements based on an
> element id, could you investigate whether the references should be
> XML-like, please?

I won't pretend to be an XML guru, so if anyone on the list has any suggestions, fire away. One possible mechanism to use for referencing elements is XPath, which is included with libxml2. However, the syntax for referencing an entity seems pretty verbose. Adding the XPath into the XML document wouldn't be absolutely necessary, however. Since the element type being referenced will be known by the consuming tool (e.g., when processing a Symbol element, references will be for elements of type SymbolData), the tool itself could generate the XPath spec.

We got many comments, both on this mailing list and from internal tool developers who are already developing some XML-based tools, that we needed to minimize the size of the XML instances. We even removed the "id" attribute from the referenced elements on the assumption that the consuming tool could internally store the elements in arrays and then reference numbers would simply index into those tables. Note: since the XML parser returns a tree, this implies the consuming tool would (for performance reasons) probably create arrays holding pointers to referenced tree nodes (like SymbolData and DetailData elements). If we decide to use XPath in one way or another, the "id" attribute can easily be put back into the referenced elements.

Perhaps we've gone too far in the direction of minimizing the size of the XML document. Feedback or suggestions from the community would be appreciated.

Regards,
-Maynard

>
> thanks
> john |
From: John L. <le...@mo...> - 2006-06-24 02:24:45
|
On Fri, Jun 23, 2006 at 06:54:10PM -0500, Maynard Johnson wrote:
> I won't pretend to be an XML guru, so if anyone on the list has any
> suggestions, fire away. One possible mechanism to use for referencing
> elements is xpath, which is included with libxml2. However, the syntax

XPath is for querying document structure; we do not need that. We only need references, i.e. an "id" based approach.

> We got many comments, both on this mailing list and from internal tool
> developers who are already developing some XML-based tools, that we
> needed to minimize the size of the XML instances. We even removed the
> "id" attribute from the referenced elements on the assumption that the
> consuming tool could internally store the elements in arrays and then
> reference numbers would simply index into those tables.

Yet this is a very un-XML way of doing things. Look at the libxml2 API and compare approaches. It is very easy with 'id' and not very easy with an array.

> Perhaps we've gone too far in the direction of minimizing the size of
> the xml document. Feedback or suggestions from the community would be
> appreciated.

I would love to hear more from interested parties.

regards
john |
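To make the "compare approaches" point concrete, here is a small Python sketch (standard-library `xml.etree.ElementTree`) contrasting the positional scheme in the patch with an id-based one. The element names `symboltable`/`symboldata` and the `ref`/`idref` attributes are hypothetical, chosen only for the illustration. In libxml2, attributes the parser knows to be IDs (for example `xml:id`, or attributes declared as ID in a DTD) can be looked up with `xmlGetID()`; ElementTree has no such index, so this sketch uses a path predicate instead.

```python
import xml.etree.ElementTree as ET

# Positional scheme: <symbol ref="1"> means "the second <symboldata> in the table".
BY_INDEX = """
<profile>
  <symboltable>
    <symboldata name="main"/>
    <symboldata name="compute"/>
  </symboltable>
  <symbol ref="1"/>
</profile>
"""

# Id scheme: <symbol idref="s2"> names its target element directly.
BY_ID = """
<profile>
  <symboltable>
    <symboldata id="s1" name="main"/>
    <symboldata id="s2" name="compute"/>
  </symboltable>
  <symbol idref="s2"/>
</profile>
"""

def resolve_by_index(root):
    # The consumer must rebuild the array and rely on element order never changing.
    table = root.findall("./symboltable/symboldata")
    return table[int(root.find("symbol").get("ref"))].get("name")

def resolve_by_id(root):
    # The reference is self-describing; an attribute match finds the target.
    ref = root.find("symbol").get("idref")
    return root.find(f"./symboltable/symboldata[@id='{ref}']").get("name")

print(resolve_by_index(ET.fromstring(BY_INDEX)))  # compute
print(resolve_by_id(ET.fromstring(BY_ID)))        # compute
```

Both resolve to the same symbol. The id form costs a few bytes per element but survives reordering, filtering, or splicing of the document, and a consumer that still wants arrays can build them from the ids.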
From: Maynard J. <may...@us...> - 2006-06-26 22:41:05
|
John Levon wrote:
> On Tue, Jun 20, 2006 at 10:44:22AM +0100, John Levon wrote:
>
>
>>>Maynard and I have been working on the XML schema that was proposed
>>>several months ago and have made some refinements to it. The opreport
>>
>
> Second one:
>
> <SummaryData count="1,,"/>
>
> Even though it's more verbose, this needs to be split somehow.
>
> How is a parser supposed to know what these values mean? Is it supposed
> to notice the "separated_cpus" ?

We struggled with this decision. Yes, the idea (to be explained in a TBD chapter on generating and using XML output) would have been to have the consuming tool key off the presence of the separated_cpus attribute. When profiling a system with a large number of CPUs, this seemed like a good space saver. But I wouldn't put up much of a fight to separate this out per CPU.

>
> SummaryData is a weird, overly generic name. Shouldn't it look something like:

This schema probably should have had more explanation along with it. SummaryData is exactly that. For a profile taken with --separate=all, the XML would contain a collection of Process elements with the following structure:

<Process>
  <Thread>
    <Module>
      <Symbol>

John, this is, in fact, the hierarchy you had outlined in your Feb 15 note to an earlier schema proposal of mine. In this earlier schema, I purposely did not include summary sample data at each level of the hierarchy since I felt this was information that could be computed by the consuming tool. However, the tool developers that reviewed the schema were adamant that they would want summary data at every level for the purpose of iterative parsing/processing of very large profiles. I felt that other tool developers would probably have a similar view, so we added SummaryData to the schema.

>
> <classes type="cpu" size="4">
> <cpu id="c1">1</cpu> <!-- can expand later to include core, socket ... -->
> ...
> </classes>
>
> <symbol id="1">
> <count class="c1">34</count>
> <count class="c3">34</count>
> </symbol>
>
> Then this whole separate "event_num" thing shouldn't be necessary, a
> count always refers to the class, which has more information as needed.
> Currently you have this weird separation of class types of CPU, thread,
> process, event, as all different. This does not reflect the data well,
> look at our profile classes code - all of these are just classes. You
> should be able to use the scheme above for all these types instead of
> 'hardcoding' this stuff, e.g.
>
> <classes type="thread" size="X">
> <thread id="c1" pid="454">2</thread>
> ...
> </classes>
>
> <symbol id="45">
> <count class="c1">34</count>
> ...
>
> Any file names should always be full paths.

There's a '--long-filenames' option to opreport. Our intent was to try to support the same options as opreport, as long as they made sense for XML output.

> Stuff like Module probably need to have id references to a module table too.

Yes, good point.

>
> "source_file" and "source_line" can both drop the "source_" bit.
>
> The examples have stuff like:
>
> <Profile>
> cpu_name="ppc64"
> platform="Linux-ppc64"
> MHz="1656.4"
> title="opreport -l session:dcnthr tgid:5590 -g -X "
>
> That's not even valid XML afaics. Have you considered using libxml to
> write out XML?

Hmmm . . . I used xmllint on the examples (cpu.xml, all.xml, lib.xml, and thr.xml) and they all pass validation. What part do you see that's not valid XML? As far as using libxml, I'll have to wait for Dave to reply to that. I'm not even sure if he investigated, since I believe he used Junichi's patch (Feb 5) as a starting point.

>
> The XML should all be lowercase.

No problem.

Regards,
-Maynard

>
> And your patch is backwards, I can't read it like this.
>
> regards
> john |
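Maynard notes that summary data at each level could in principle be computed by the consuming tool. A minimal Python sketch of that computation, under assumed names: lowercase `process`/`thread`/`module`/`symbol` elements, a `count` attribute on each symbol, and a `total` attribute written back onto every container. None of these names come from the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical --separate=all style tree; all names and counts are invented.
DOC = """
<process pid="100">
  <thread tid="100">
    <module name="/bin/app">
      <symbol name="main" count="5"/>
      <symbol name="work" count="20"/>
    </module>
    <module name="/lib/libc.so.6">
      <symbol name="memcpy" count="7"/>
    </module>
  </thread>
  <thread tid="101">
    <module name="/bin/app">
      <symbol name="work" count="11"/>
    </module>
  </thread>
</process>
"""

def annotate_totals(elem):
    """Recursively store the aggregated sample total on every container element."""
    if elem.tag == "symbol":
        return int(elem.get("count"))
    total = sum(annotate_totals(child) for child in elem)
    elem.set("total", str(total))
    return total

root = ET.fromstring(DOC)
annotate_totals(root)
print(root.get("total"))                                 # 43
print([t.get("total") for t in root.findall("thread")])  # ['32', '11']
```

The catch, and the reviewers' argument for emitting SummaryData, is that this needs the whole tree in memory before any total is known. That matters once --details output makes the file very large.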
From: John L. <le...@mo...> - 2006-06-26 23:00:40
|
On Mon, Jun 26, 2006 at 05:40:55PM -0500, Maynard Johnson wrote:
> > SummaryData is a weird, overly generic name. Shouldn't it look something like:
> This schema probably should have had more explanation along with it.
> SummaryData is exactly that. For a profile taken with --separate=all,
> the XML would contain a collection of Process elements with the
> following structure:
>
> <Process>
> <Thread>
> <Module>
> <Symbol>

I think this is probably OK. I do not quite see the need for SummaryData at each level, but I won't argue about that too much. However, it does need to be renamed to <count>.

> > <classes type="cpu" size="4">
> > <cpu id="c1">1</cpu> <!-- can expand later to include core, socket ... -->
> > ...
> > </classes>
> >
> > <symbol id="1">
> > <count class="c1">34</count>
> > <count class="c3">34</count>
> > </symbol>
> >
> > Then this whole separate "event_num" thing shouldn't be necessary, a
> > count always refers to the class, which has more information as needed.
> > Currently you have this weird separation of class types of CPU, thread,
> > process, event, as all different. This does not reflect the data well,
> > look at our profile classes code - all of these are just classes. You
> > should be able to use the scheme above for all these types instead of
> > 'hardcoding' this stuff, e.g.

You didn't comment on this. I can see the argument for the hierarchical approach for processes and threads, in fact, now you remind me. But event num, mask, CPU etc all need to be like I describe above, I think.

> > Any file names should always be full paths.
> There's a '--long-filenames' option to opreport. Our intent was to try
> to support the same options as opreport, as long as they made sense for
> XML output.

I don't think it makes sense to have short file names in XML. It's a presentation feature, XML is not presentation.

> > <Profile>
> > cpu_name="ppc64"
> > platform="Linux-ppc64"
> > MHz="1656.4"
> > title="opreport -l session:dcnthr tgid:5590 -g -X "
> >
> > That's not even valid XML afaics. Have you considered using libxml to
> > write out XML?
> Hmmm . . . I used xmllint on the examples (cpu.xml, all.xml, lib.xml,
> and thr.xml) and they all pass validation. What part do you see that's
> not valid XML?

I guess it's valid since <Profile>'s contents are just "text", but it's certainly unstructured. Ask yourself how I would get the value of "MHz" out via XSLT selectors. All these should be attributes upon the <profile> element.

regards
john |
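John's XSLT question can be illustrated with a small Python sketch (standard-library `xml.etree.ElementTree`). Both forms below are well-formed, but only the attribute form exposes the value structurally; in XSLT/XPath it would be reachable as `/profile/@mhz`. The lowercase attribute name `mhz` is an assumption based on John's "all lowercase" request, not a name taken from the schema.

```python
import xml.etree.ElementTree as ET

# Form used in the example files: the "attributes" are really character data.
AS_TEXT = """<Profile>
cpu_name="ppc64"
MHz="1656.4"
</Profile>"""

# Form John asks for: real attributes on the (lowercase) root element.
AS_ATTRS = '<profile cpu_name="ppc64" mhz="1656.4"></profile>'

text_root = ET.fromstring(AS_TEXT)
attr_root = ET.fromstring(AS_ATTRS)

print(text_root.get("MHz"))  # None: the value is buried in text content
print(attr_root.get("mhz"))  # 1656.4

# Getting the value from the text form needs ad hoc string parsing, which
# would already break on the real title="..." value because it contains spaces.
pairs = dict(token.split("=", 1) for token in text_root.text.split())
print(pairs["MHz"].strip('"'))  # 1656.4
```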
From: Maynard J. <may...@us...> - 2006-06-30 18:28:58
|
John Levon wrote:
> On Mon, Jun 26, 2006 at 05:40:55PM -0500, Maynard Johnson wrote:
>
>
>>>SummaryData is a weird, overly generic name. Shouldn't it look something like:
>>
>>This schema probably should have had more explanation along with it.
>>SummaryData is exactly that. For a profile taken with --separate=all,
>>the XML would contain a collection of Process elements with the
>>following structure:
>>
>> <Process>
>> <Thread>
>> <Module>
>> <Symbol>
>
>
> I think this is probably OK. I do not quite see the need for SummaryData
> at each level, but I won't argue about that too much. However, it does
> need to be renamed to <count>.
>
>
>>><classes type="cpu" size="4">
>>><cpu id="c1">1</cpu> <!-- can expand later to include core, socket ... -->
>>>...
>>></classes>
>>>
>>><symbol id="1">
>>><count class="c1">34</count>
>>><count class="c3">34</count>
>>></symbol>
>>>
>>>Then this whole separate "event_num" thing shouldn't be necessary, a
>>>count always refers to the class, which has more information as needed.
>>>Currently you have this weird separation of class types of CPU, thread,
>>>process, event, as all different. This does not reflect the data well,
>>>look at our profile classes code - all of these are just classes. You
>>>should be able to use the scheme above for all these types instead of
>>>'hardcoding' this stuff, e.g.
>
>
> You didn't comment on this. I can see the argument for the hierarchical
> approach for processes and threads, in fact, now you remind me. But
> event num, mask, CPU etc all need to be like I describe above, I think.

Dave had actually investigated this approach at one point, but eventually backed away from it. IIRC, the reason he didn't stick with it was because he (and others who reviewed the schema) felt it important to place the sample data out of line from the Process->Thread->Module->Symbol elements. This doesn't seem that big a deal when the XML instance includes only basic sample data, but when the user asks for --details and/or --debug-info, the amount of data explodes. It's easy to see then why having summary data at the 4 levels of the hierarchy described above would be beneficial to a tool consuming the XML. The tool could actually display the top-level information while parsing the rest of the file for the detailed sample data.

I believe there are other tool developers out there who are interested in an XML output for oprofile, and we'd be interested to hear your opinions on this. The schema design we came up with is certainly more complicated than it would be if we just kept all the sample data inline, directly under each Symbol element. Is the complexity worth it?

Regards,
-Maynard

>
>
>>>Any file names should always be full paths.
>>
>>There's a '--long-filenames' option to opreport. Our intent was to try
>>to support the same options as opreport, as long as they made sense for
>>XML output.
>
>
> I don't think it makes sense to have short file names in XML. It's a
> presentation feature, XML is not presentation.

Good point. We can always do this as the default.

>
>
>>><Profile>
>>> cpu_name="ppc64"
>>> platform="Linux-ppc64"
>>> MHz="1656.4"
>>> title="opreport -l session:dcnthr tgid:5590 -g -X "
>>>
>>>That's not even valid XML afaics. Have you considered using libxml to
>>>write out XML?
>>
>>Hmmm . . . I used xmllint on the examples (cpu.xml, all.xml, lib.xml,
>>and thr.xml) and they all pass validation. What part do you see that's
>>not valid XML?
>
>
> I guess it's valid since <Profile>'s contents are just "text", but it's
> certainly unstructured. Ask yourself how I would get the value of "MHz"
> out via XSLT selectors. All these should be attributes upon the <profile>
> element.
>
> regards
> john |
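Maynard's point that a tool "could actually display the top-level information while parsing the rest of the file" can be sketched with a streaming parser. This Python example uses `xml.etree.ElementTree.iterparse` and assumes a hypothetical layout in which each container's summary `<count>` child comes before its detail children. The element names and the `stream_summaries` helper are illustrative only.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical layout: each container's summary <count> precedes its children,
# so a streaming reader sees the totals before the bulky detail data.
DOC = b"""
<profile>
  <module name="/bin/app">
    <count>31</count>
    <symbol name="main"><count>5</count><detail>...</detail></symbol>
    <symbol name="work"><count>26</count><detail>...</detail></symbol>
  </module>
</profile>
"""

def stream_summaries(source):
    """Yield (container tag, name, total) as soon as each summary count is parsed."""
    stack = []  # currently open elements, innermost last
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()
        if elem.tag == "count" and stack:
            parent = stack[-1]  # the container this summary belongs to
            yield parent.tag, parent.get("name"), int(elem.text)
        elif elem.tag == "detail":
            elem.clear()  # discard detail data once seen to bound memory use

for tag, name, total in stream_summaries(io.BytesIO(DOC)):
    print(tag, name, total)
# module /bin/app 31
# symbol main 5
# symbol work 26
```

The module total is reported before any symbol detail is read. This is the property the reviewers wanted from per-level summary data, and it only holds if the schema guarantees that ordering.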
From: John L. <le...@mo...> - 2006-07-02 16:14:08
|
On Fri, Jun 30, 2006 at 01:28:43PM -0500, Maynard Johnson wrote:
> > You didn't comment on this. I can see the argument for the hierarchical
> > approach for processes and threads, in fact, now you remind me. But
> > event num, mask, CPU etc all need to be like I describe above, I think.
> Dave had actually investigated this approach at one point, but
> eventually backed away from it. IIRC, the reason he didn't stick with
> it was because he (and others who reviewed the schema) felt it important
> to place the sample data out of line from the
> Process->Thread->Module->Symbol elements. This doesn't seem that big a
> deal when the XML instance includes only basic sample data, but when the
> user asks for --details and/or --debug-info, the amount of data
> explodes. It's easy to see then why having summary data at the 4 levels
> of the hierarchy described above would be beneficial to a tool consuming
> the XML. The tool could actually display the top-level information
> while parsing the rest of the file for the detailed sample data.

I'm not really talking about the summary data at each level, but the layout of how you're hard-coding stuff... please review my example...

regards
john |
From: Maynard J. <may...@us...> - 2006-07-05 23:04:40
|
John Levon wrote:
> On Fri, Jun 30, 2006 at 01:28:43PM -0500, Maynard Johnson wrote:
>
>
>>>You didn't comment on this. I can see the argument for the hierarchical
>>>approach for processes and threads, in fact, now you remind me. But
>>>event num, mask, CPU etc all need to be like I describe above, I think.
>>
>>Dave had actually investigated this approach at one point, but
>>eventually backed away from it. IIRC, the reason he didn't stick with
>>it was because he (and others who reviewed the schema) felt it important
>>to place the sample data out of line from the
>>Process->Thread->Module->Symbol elements. This doesn't seem that big a
>>deal when the XML instance includes only basic sample data, but when the
>>user asks for --details and/or --debug-info, the amount of data
>>explodes. It's easy to see then why having summary data at the 4 levels
>>of the hierarchy described above would be beneficial to a tool consuming
>>the XML. The tool could actually display the top-level information
>>while parsing the rest of the file for the detailed sample data.
>
>
> I'm not really talking about the summary data at each level, but the
> layout of how you're hard-coding stuff... please review my example...

In the generation of XML, we want to output all the sample data available. As I understand it, the profile classes are used as a means of processing the data into merged chunks that can be easily displayed in the column-based report. Nevertheless, we could still use something akin to the approach you suggest. But instead of there being separate class types, there would be just one type, with an instance for each event_num/cpu/(optional)unit_mask combination. I must admit that since the processors I'm familiar with don't use unit masks, I'm not sure how well the unit mask data fits in here. If there are an arbitrary number of unit masks for any given event, then this technique won't work.

Regards,
-Maynard

>
> regards
> john |
From: John L. <le...@mo...> - 2006-07-06 00:54:21
|
On Wed, Jul 05, 2006 at 06:04:20PM -0500, Maynard Johnson wrote:
> akin to the approach you suggest. But instead of there being separate
> class types, there would be just one type, with an instance for each
> event_num/cpu/(optional)unit_mask combination. I must admit that since

Could you look into it?

> the processors I'm familiar with don't use unit masks, I'm not sure how
> well the unit mask data fits in here. If there are an arbitrary number
> of unit masks for any given event, then this technique won't work.

There isn't; generally there's a selection you can choose from, or a mask. But really you just need to notice that there's the same event but two different unit masks, for example.

regards
john |
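The single-class-type idea Maynard describes, and John's note about the same event appearing with two unit masks, can be sketched in Python: every distinct (event, unit mask, CPU) combination that actually occurs gets one class id, and counts refer to it. The sample data, the `class`/`event`/`unit_mask`/`cpu` attribute names, and the helper functions are all assumptions for illustration, not the schema the thread eventually adopted.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw samples: (event name, unit mask, cpu, symbol, count).
SAMPLES = [
    ("CYCLES", 0x0, 0, "main", 34),
    ("CYCLES", 0x0, 2, "main", 34),
    ("L2_MISS", 0x3, 0, "main", 5),
    ("L2_MISS", 0x5, 0, "work", 9),  # same event, different unit mask
]

def build_classes(samples):
    """Give every distinct (event, unit mask, cpu) combination its own class id."""
    class_ids = {}
    for event, mask, cpu, _sym, _count in samples:
        key = (event, mask, cpu)
        if key not in class_ids:
            class_ids[key] = f"c{len(class_ids) + 1}"
    return class_ids

def to_xml(samples):
    class_ids = build_classes(samples)
    root = ET.Element("profile")
    classes = ET.SubElement(root, "classes")
    for (event, mask, cpu), cid in class_ids.items():
        ET.SubElement(classes, "class", id=cid, event=event,
                      unit_mask=hex(mask), cpu=str(cpu))
    symbols = {}
    for event, mask, cpu, sym, count in samples:
        if sym not in symbols:
            symbols[sym] = ET.SubElement(root, "symbol", name=sym)
        # "class" is a Python keyword, so the attribute is passed as a dict.
        ET.SubElement(symbols[sym], "count",
                      {"class": class_ids[(event, mask, cpu)]}).text = str(count)
    return root

root = to_xml(SAMPLES)
print(len(root.find("classes")))  # 4
print(ET.tostring(root.find("symbol[@name='work']"), encoding="unicode"))
# <symbol name="work"><count class="c4">9</count></symbol>
```

Because only combinations that occur in the samples get a class, the class table stays small when the sample data is sparse. Dave raises this same size consideration in his later summary.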
From: Dave N. <dc...@us...> - 2006-08-01 19:12:01
|
John,

I'm sorry that we haven't been able to follow up on the opreport -X discussions until now. Between vacations and conferences, both Maynard and I have been away from the office for most of July, but we are ready to get back to these issues. Let me summarize the issues that you raised in previous posts:

1. Why are we generating array indexes into the detailTable or symbolTable instead of using the more XML-like approach of element ids?

We'll add the lookup id attribute to detailData and symbolData. XML consumer tool writers are free to ignore these and implement symbolTable and detailTable as arrays if the tree lookup performance is really bad.

2. Our syntax for collecting sample count data for multiple CPUs into a single comma-separated list, count=",,12,34,,56", is weird.

Actually, the full encoding of counts should also have included syntax for events and masks. For example:

    (cpu0_count:cpu1_count:...:cpuN_count, ..., cpu0_count:...:cpuN_count):mM
     \_____________ event0 _____________/       \_______ eventE ________/  ^ mask

An instance might look like this (4 CPUs, 3 events, 2 masks):

    count="(1::3:,:::,:4:5:):m3, (:::,:2::,4::5:):m5"

Your response to this is that it is too weird, and I think you suggested that we should use a class type that enumerates every combination of CPU, event, and unit mask. My first reaction to your proposal was that it would explode the size of the XML when --details is used, but on further reflection I think that explosion in XML size only happens with denser sample data, i.e. when most of the instructions that have sample data have data for multiple combinations of CPU, event, and mask. In my very limited experience with oprofile profiles I didn't see this happen. We'll put the class types into the schema, output the list of all combinations of CPU, event, and mask in some form, and reference these class identifiers in the sample data as you suggest. If XML size becomes a bigger issue, we might want to explore the above encoding, or maybe have some option like --dense to cause opreport to generate a denser encoding of counts.

3. SummaryData is a weird, overly generic name. It should be renamed to "count".

I think Maynard explained in one of his mails that we have two different XML elements to represent sample data, both of which contain an attribute called "count". sampleData is a --details data point with optional source_line and source_file attributes; summaryData is the aggregation of all the sampleData elements associated with a container (Thread, Module, Symbol) and has no need for source_line or source_file attributes. If you don't think this distinction is useful, we could just define:

    <xs:element name="count">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:nonNegativeInteger">
            <xs:attribute name="class" type="xs:string" use="required"/>
            <xs:attribute name="line" type="xs:nonNegativeInteger" use="optional"/>
            <xs:attribute name="file" type="xs:string" use="optional"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>

4. Any file names should always be full paths.

Agreed.

5. The following XML, although valid, is unusual and hard to access. Maybe we should be using libxml to write out XML.

    <Profile>
    cpu_name="ppc64"
    platform="Linux-ppc64"
    MHz="1656.4"
    title="opreport -l session:dcnthr tgid:5590 -g -X "

We will change the code to use the xmlwriter interfaces for writing XML.

If you have further comments on any of the above issues, or any other issues that I've missed, please let us know. |
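Dave's plan for item 5 is to switch to libxml2's xmlwriter interfaces (functions such as `xmlTextWriterStartElement` and `xmlTextWriterWriteAttribute`). The sketch below shows the same idea with Python's standard library. It is an analogue, not the libxml2 API. The library builds the markup, so the metadata lands as real attributes on the root element and special characters in values such as the title are escaped. The lowercase attribute names and the sample title are assumptions.

```python
import xml.etree.ElementTree as ET

def profile_root(cpu_name, platform, mhz, title):
    """Build the root element with the profile metadata as real attributes."""
    return ET.Element("profile", {
        "cpu_name": cpu_name,
        "platform": platform,
        "mhz": mhz,
        "title": title,
    })

# A title containing characters that hand-written printf output could get wrong.
title = 'opreport -l image:/tmp/a&b "quoted" -X'
root = profile_root("ppc64", "Linux-ppc64", "1656.4", title)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Round trip: the library escaped & and " so the value survives intact.
assert ET.fromstring(xml_text).get("title") == title
```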