From: <sv...@va...> - 2006-03-20 10:29:40
Author: weidendo
Date: 2006-03-20 10:29:30 +0000 (Mon, 20 Mar 2006)
New Revision: 5781

Log:
Callgrind merge: documentation

Added:
   trunk/callgrind/docs/cl-entities.xml
   trunk/callgrind/docs/cl-format.xml
   trunk/callgrind/docs/cl-manual.xml
   trunk/callgrind/docs/index.xml
   trunk/callgrind/docs/man-annotate.xml
   trunk/callgrind/docs/man-callgrind.xml
   trunk/callgrind/docs/man-control.xml
Modified:
   trunk/callgrind/docs/Makefile.am
   trunk/docs/xml/manual.xml
   trunk/docs/xml/tech-docs.xml
   trunk/docs/xml/valgrind-manpage.xml

Modified: trunk/callgrind/docs/Makefile.am
===================================================================
--- trunk/callgrind/docs/Makefile.am 2006-03-20 10:27:30 UTC (rev 5780)
+++ trunk/callgrind/docs/Makefile.am 2006-03-20 10:29:30 UTC (rev 5781)
@@ -1 +1,8 @@
-EXTRA_DIST =
+EXTRA_DIST = \
+	cl-entities.xml \
+	cl-manual.xml \
+	cl-format.xml \
+	index.xml \
+	man-annotate.xml \
+	man-control.xml \
+	man-callgrind.xml

Added: trunk/callgrind/docs/cl-entities.xml
===================================================================
--- trunk/callgrind/docs/cl-entities.xml (rev 0)
+++ trunk/callgrind/docs/cl-entities.xml 2006-03-20 10:29:30 UTC (rev 5781)
@@ -0,0 +1,22 @@
+<!-- callgrind release + version stuff -->
+<!ENTITY cl-version "0.10.1">
+<!ENTITY cl-date "November 25 2005">
+
+<!-- copyright length of years -->
+<!ENTITY cl-lifespan "2000-2005">
+
+<!-- website + email -->
+<!ENTITY cl-email "Jos...@gm...">
+<!ENTITY cl-url "http://www.valgrind.org/info/developers.html">
+
+<!-- external urls used in the docs. kept in here because when -->
+<!-- they change it's a real pain tracking them down in the docs -->
+<!ENTITY vg-url "http://www.valgrind.org/">
+<!ENTITY cg-doc-url "http://www.valgrind.org/docs/manual/cg-manual.html">
+<!ENTITY cg-tool-url "http://www.valgrind.org/info/tools.html#cachegrind">
+<!ENTITY cl-gui "http://kcachegrind.sourceforge.net/cgi-bin/show.cgi/KcacheGrindIndex">
+
+<!-- path/to/callgrind/docs in valgrind install tree -->
+<!-- only used in the manpages -->
+<!ENTITY cl-doc-path "/usr/share/doc/valgrind/html/callgrind.html">
+<!ENTITY cl-doc-url "http://www.valgrind.org/docs/manual/cl-manual.html">

Added: trunk/callgrind/docs/cl-format.xml
===================================================================
--- trunk/callgrind/docs/cl-format.xml (rev 0)
+++ trunk/callgrind/docs/cl-format.xml 2006-03-20 10:29:30 UTC (rev 5781)
@@ -0,0 +1,551 @@
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+<chapter id="cl-format" xreflabel="Callgrind Format Specification">
+<title>Callgrind Format Specification</title>
+
+<para>This chapter describes the Callgrind Profile Format, Version 1.</para>
+
+<para>A synonymous name is "Calltree Profile Format". These names actually mean
+the same since Callgrind was previously named Calltree.</para>
+
+<para>The format description is meant for the user to be able to understand the
+file contents; but more important, it is given for authors of measurement or
+visualization tools to be able to write and read this format.</para>
+
+<sect1 id="cl-format.overview" xreflabel="Overview">
+<title>Overview</title>
+
+<para>The profile data format is ASCII based.
+It is written by Callgrind, and it is upwards compatible
+to the format used by Cachegrind (i.e. Cachegrind uses a subset). It can
+be read by callgrind_annotate and KCachegrind.</para>
+
+<para>This chapter gives an overview of format features and examples.
+For detailed syntax, look at the format reference.</para>
+
+<sect2 id="cl-format.overview.basics" xreflabel="Basic Structure">
+<title>Basic Structure</title>
+
+<para>Each file has a header part of an arbitrary number of lines of the
+format "key: value". The lines with key "positions" and "events" define
+the meaning of cost lines in the second part of the file: the value of
+"positions" is a list of subpositions, and the value of "events" is a list
+of event type names. Cost lines consist of subpositions followed by 64-bit
+counters for the events, in the order specified by the "positions" and "events"
+header lines.</para>
+
+<para>The "events" header line is always required in contrast to the optional
+line for "positions", which defaults to "line", i.e. a line number of some
+source file. In addition, the second part of the file contains position
+specifications of the form "spec=name". "spec" can be e.g. "fn" for a
+function name or "fl" for a file name. Cost lines are always related to
+the function/file specifications given directly before.</para>
+
+</sect2>
+
+<sect2 id="cl-format.overview.example1" xreflabel="Simple Example">
+<title>Simple Example</title>
+
+<para>
+<screen>events: Cycles Instructions Flops
+fl=file.f
+fn=main
+15 90 14 2
+16 20 12</screen></para>
+
+<para>The above example gives profile information for event types "Cycles",
+"Instructions", and "Flops". Thus, cost lines give the number of CPU cycles
+passed by, number of executed instructions, and number of floating point
+operations executed while running code corresponding to some source
+position.
As there is no line specifying the value of "positions", it defaults
+to "line", which means that the first number of a cost line is always a line
+number.</para>
+
+<para>Thus, the first cost line specifies that in line 15 of source file
+"file.f" there is code belonging to function "main". While running, 90 CPU
+cycles passed by, and 2 of the 14 instructions executed were floating point
+operations. Similarly, the next line specifies that there were 12 instructions
+executed in the context of function "main" which can be related to line 16 in
+file "file.f", taking 20 CPU cycles. If a cost line specifies fewer event counts
+than given in the "events" line, the rest is assumed to be zero. I.e., there
+was no floating point instruction executed relating to line 16.</para>
+
+<para>Note that regular cost lines always give self (also called exclusive)
+cost of code at a given position. If you specify multiple cost lines for the
+same position, these will be summed up. On the other hand, in the example above
+there is no specification of how many times function "main" actually was
+called: profile data only contains sums.</para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.associations" xreflabel="Associations">
+<title>Associations</title>
+
+<para>The most important extension to the original format of Cachegrind is the
+ability to specify call relationship among functions. More generally, you
+specify associations among positions. For this, the second part of the
+file also can contain association specifications. These look similar to
+position specifications, but consist of 2 lines. For calls, the format
+looks like
+<screen>
+ calls=(Call Count) (Destination position)
+ (Source position) (Inclusive cost of call)
+</screen></para>
+
+<para>The destination only specifies subpositions like line number.
Therefore,
+to be able to specify a call to another function in another source file, you
+have to precede the above lines with a "cfn=" specification for the name of the
+called function, and a "cfl=" specification if the function is in another
+source file. The 2nd line looks like a regular cost line with the difference
+that inclusive cost spent inside of the function call has to be specified.</para>
+
+<para>Other associations are, for example, (conditional) jumps. See the
+reference below for details.</para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.example2" xreflabel="Extended Example">
+<title>Extended Example</title>
+
+<para>The following example shows 3 functions, "main", "func1", and
+"func2". Function "main" calls "func1" once and "func2" 3 times. "func1" calls
+"func2" 2 times.
+<screen>events: Instructions
+
+fl=file1.c
+fn=main
+16 20
+cfn=func1
+calls=1 50
+16 400
+cfl=file2.c
+cfn=func2
+calls=3 20
+16 400
+
+fn=func1
+51 100
+cfl=file2.c
+cfn=func2
+calls=2 20
+51 300
+
+fl=file2.c
+fn=func2
+20 700</screen></para>
+
+<para>One can see that in "main" only code from line 16 is executed where also
+the other functions are called. Inclusive cost of "main" is 820, which is the
+sum of self cost 20 and costs spent in the calls: 400 for the call to "func1"
+and 400 for the three calls to "func2".</para>
+
+<para>Function "func1" is located in "file1.c", the same as "main". Therefore,
+a "cfl=" specification for the call to "func1" is not needed. The function
+"func1" only consists of code at line 51 of "file1.c", where "func2" is called.</para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.compression1" xreflabel="Name Compression">
+<title>Name Compression</title>
+
+<para>With the introduction of association specifications like calls it
+becomes necessary to specify the same function or same file name multiple times.
As
+absolute filenames or symbol names in C++ can be quite long, it is advantageous
+to be able to specify integer IDs for position specifications.</para>
+
+<para>To support name compression, a position specification can be not only of
+the format "spec=name", but also "spec=(ID) name" to specify a mapping of an
+integer ID to a name, and "spec=(ID)" to reference a previously defined ID
+mapping. There is a separate ID mapping for each position specification,
+i.e. you can use ID 1 for both a file name and a symbol name.</para>
+
+<para>With string compression, the example from 1.4 looks like this:
+<screen>events: Instructions
+
+fl=(1) file1.c
+fn=(1) main
+16 20
+cfn=(2) func1
+calls=1 50
+16 400
+cfl=(2) file2.c
+cfn=(3) func2
+calls=3 20
+16 400
+
+fn=(2)
+51 100
+cfl=(2)
+cfn=(3)
+calls=2 20
+51 300
+
+fl=(2)
+fn=(3)
+20 700</screen></para>
+
+<para>As position specifications carry no information themselves, but only change
+the meaning of subsequent cost lines or associations, they can appear
+everywhere in the file without any negative consequence. In particular, you can
+define name compression mappings directly after the header, and before any cost
+lines. Thus, the above example can also be written as
+<screen>events: Instructions
+
+# define file ID mapping
+fl=(1) file1.c
+fl=(2) file2.c
+# define function ID mapping
+fn=(1) main
+fn=(2) func1
+fn=(3) func2
+
+fl=(1)
+fn=(1)
+16 20
+...</screen></para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.compression2" xreflabel="Subposition Compression">
+<title>Subposition Compression</title>
+
+<para>If a Calltree data file should hold costs for each assembler instruction
+of a program, you specify subposition "instr" in the "positions:" header line,
+and each cost line has to include the address of some instruction. Addresses
+are allowed to have a size of 64bit to support 64bit architectures.
This
+motivates subposition compression: instead of every cost line starting with
+a 16 character long address, one is allowed to specify relative subpositions.</para>
+
+<para>A relative subposition always is based on the corresponding subposition
+of the last cost line, and starts with a "+" to specify a positive difference,
+a "-" to specify a negative difference, or consists of "*" to specify the same
+subposition. Assume the following example (subpositions can always be specified
+as hexadecimal numbers, beginning with "0x"):
+<screen>positions: instr line
+events: ticks
+
+fn=func
+0x80001234 90 1
+0x80001237 90 5
+0x80001238 91 6</screen></para>
+
+<para>With subposition compression, this looks like
+<screen>positions: instr line
+events: ticks
+
+fn=func
+0x80001234 90 1
++3 * 5
++1 +1 6</screen></para>
+
+<para>Remark: For assembler annotation to work, instruction addresses have to
+be corrected to correspond to addresses found in the original binary. I.e. for
+relocatable shared objects, often a load offset has to be subtracted.</para>
+
+</sect2>
+
+
+<sect2 id="cl-format.overview.misc" xreflabel="Miscellaneous">
+<title>Miscellaneous</title>
+
+<sect3 id="cl-format.overview.misc.summary" xreflabel="Cost Summary Information">
+<title>Cost Summary Information</title>
+
+<para>For the visualization to be able to show cost percentage, a sum of the
+cost of the full run has to be known. Usually, it is assumed that this is the
+sum of all cost lines in a file. But sometimes, this is not correct. Thus, you
+can specify a "summary:" line in the header giving the full cost for the
+profile run.
This has another effect: an import filter can show a progress bar
+while loading a large data file if it knows the cost sum in advance.</para>
+
+</sect3>
+
+<sect3 id="cl-format.overview.misc.events" xreflabel="Long Names for Event Types and inherited Types">
+<title>Long Names for Event Types and inherited Types</title>
+
+<para>Event types for cost lines are specified in the "events:" line with an
+abbreviated name. For visualization, it makes sense to be able to specify some
+longer, more descriptive name. For an event type "Ir" which means "Instruction
+Fetches", this can be specified with the header line
+<screen>event: Ir : Instruction Fetches
+events: Ir Dr</screen></para>
+
+<para>In this example, "Dr" itself has no long name associated. The order of
+"event:" lines and the "events:" line is of no importance. Additionally,
+inherited event types can be introduced for which no raw data is available, but
+which are calculated from given types. Continuing the last example, you could add
+<screen>event: Sum = Ir + Dr</screen>
+to specify an additional event type "Sum", which is calculated by adding costs
+for "Ir" and "Dr".</para>
+
+</sect3>
+
+</sect2>
+
+</sect1>
+
+<sect1 id="cl-format.reference" xreflabel="Reference">
+<title>Reference</title>
+
+<sect2 id="cl-format.reference.grammar" xreflabel="Grammar">
+<title>Grammar</title>
+
+<para>
+<screen>ProfileDataFile := FormatVersion? Creator? PartData*</screen>
+<screen>FormatVersion := "version:" Space* Number "\n"</screen>
+<screen>Creator := "creator:" NoNewLineChar* "\n"</screen>
+<screen>PartData := (HeaderLine "\n")+ (BodyLine "\n")+</screen>
+<screen>HeaderLine := (empty line)
+  | ('#' NoNewLineChar*)
+  | PartDetail
+  | Description
+  | EventSpecification
+  | CostLineDef</screen>
+<screen>PartDetail := TargetCommand | TargetID</screen>
+<screen>TargetCommand := "cmd:" Space* NoNewLineChar*</screen>
+<screen>TargetID := ("pid"|"thread"|"part") ":" Space* Number</screen>
+<screen>Description := "desc:" Space* Name Space* ":" NoNewLineChar*</screen>
+<screen>EventSpecification := "event:" Space* Name InheritedDef? LongNameDef?</screen>
+<screen>InheritedDef := "=" InheritedExpr</screen>
+<screen>InheritedExpr := Name
+  | Number Space* ("*" Space*)? Name
+  | InheritedExpr Space* "+" Space* InheritedExpr</screen>
+<screen>LongNameDef := ":" NoNewLineChar*</screen>
+<screen>CostLineDef := "events:" Space* Name (Space+ Name)*
+  | "positions:" "instr"? (Space+ "line")?</screen>
+<screen>BodyLine := (empty line)
+  | ('#' NoNewLineChar*)
+  | CostLine
+  | PositionSpecification
+  | AssociationSpecification</screen>
+<screen>CostLine := SubPositionList Costs?</screen>
+<screen>SubPositionList := (SubPosition+ Space+)+</screen>
+<screen>SubPosition := Number | "+" Number | "-" Number | "*"</screen>
+<screen>Costs := (Number Space+)+</screen>
+<screen>PositionSpecification := Position "=" Space* PositionName</screen>
+<screen>Position := CostPosition | CalledPosition</screen>
+<screen>CostPosition := "ob" | "fl" | "fi" | "fe" | "fn"</screen>
+<screen>CalledPosition := "cob" | "cfl" | "cfn"</screen>
+<screen>PositionName := ( "(" Number ")" )? (Space* NoNewLineChar* )?</screen>
+<screen>AssociationSpecification := CallSpecification
+  | JumpSpecification</screen>
+<screen>CallSpecification := CallLine "\n" CostLine</screen>
+<screen>CallLine := "calls=" Space* Number Space+ SubPositionList</screen>
+<screen>JumpSpecification := ...</screen>
+<screen>Space := " " | "\t"</screen>
+<screen>Number := HexNumber | (Digit)+</screen>
+<screen>Digit := "0" | ... | "9"</screen>
+<screen>HexNumber := "0x" (Digit | HexChar)+</screen>
+<screen>HexChar := "a" | ... | "f" | "A" | ... | "F"</screen>
+<screen>Name := Alpha (Digit | Alpha)*</screen>
+<screen>Alpha := "a" | ... | "z" | "A" | ... | "Z"</screen>
+<screen>NoNewLineChar := all characters without "\n"</screen>
+</para>
+
+</sect2>
+
+<sect2 id="cl-format.reference.header" xreflabel="Description of Header Lines">
+<title>Description of Header Lines</title>
+
+<para>The header has an arbitrary number of lines of the format
+"key: value". Possible <emphasis>key</emphasis> values for the header are:</para>
+
+<itemizedlist>
+
+  <listitem>
+    <para><computeroutput>version: number</computeroutput> [Callgrind]</para>
+    <para>This is used to distinguish future profile data formats. A
+    major version of 0 or 1 is supposed to be upwards compatible with
+    Cachegrind's format. It is optional; if not appearing, version 1
+    is assumed.
Otherwise, this has to be the first header line.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>pid: process id</computeroutput> [Callgrind]</para>
+    <para>This specifies the process ID of the supervised application
+    for which this profile was generated.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>cmd: program name + args</computeroutput> [Cachegrind]</para>
+    <para>This specifies the full command line of the supervised
+    application for which this profile was generated.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>part: number</computeroutput> [Callgrind]</para>
+    <para>This specifies a sequentially incremented number for each dump
+    generated, starting at 1.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>desc: type: value</computeroutput> [Cachegrind]</para>
+    <para>This specifies various information for this dump. For some
+    types, the semantics are defined, but any description type is allowed.
+    Unknown types should be ignored.</para>
+    <para>There are the types "I1 cache", "D1 cache", "L2 cache", which
+    specify parameters used for the cache simulator. These are the only
+    types originally used by Cachegrind. Additionally, Callgrind uses
+    the following types: "Timerange" gives a rough range of the basic
+    block counter, for which the cost of this dump was collected.
+    Type "Trigger" states the reason why this trace was generated.
+    E.g.
program termination or forced interactive dump.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>positions: [instr] [line]</computeroutput> [Callgrind]</para>
+    <para>For cost lines, this defines the semantics of the first numbers.
+    Any combination of "instr", "bb" and "line" is allowed, but has to be
+    in this order which corresponds to position numbers at the start of
+    the cost lines later in the file.</para>
+    <para>If "instr" is specified, the position is the address of an
+    instruction whose execution raised the events given later on the
+    line. This address is relative to the offset of the binary/shared
+    library file to not have to specify relocation info. For "line",
+    the position is the line number of a source file, which is
+    responsible for the events raised. Note that the mapping of "instr"
+    and "line" positions is given by the debugging line information
+    produced by the compiler.</para>
+    <para>This field is optional. If not specified, only "line" is
+    assumed.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>events: event type abbreviations</computeroutput> [Cachegrind]</para>
+    <para>A list of short names of the event types logged in this file.
+    The order is the same as in cost lines. The first event type is the
+    second or third number in a cost line, depending on the value of
+    "positions". Callgrind does not add additional cost types.
Specify
+    exactly once.</para>
+    <para>Cost types from original Cachegrind are:
+      <itemizedlist>
+        <listitem>
+          <para><command>Ir</command>: Instruction read access</para>
+        </listitem>
+        <listitem>
+          <para><command>I1mr</command>: Instruction Level 1 read cache miss</para>
+        </listitem>
+        <listitem>
+          <para><command>I2mr</command>: Instruction Level 2 read cache miss</para>
+        </listitem>
+        <listitem>
+          <para>...</para>
+        </listitem>
+      </itemizedlist>
+    </para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>summary: costs</computeroutput> [Callgrind]</para>
+    <para><computeroutput>totals: costs</computeroutput> [Cachegrind]</para>
+    <para>The total number of events covered by this trace
+    file. Both keys have the same meaning, but the "totals:" line
+    happens to be at the end of the file, while "summary:" appears in
+    the header. This was added to allow postprocessing tools to know
+    the total cost in advance. The two lines always give the same cost
+    counts.</para>
+  </listitem>
+
+</itemizedlist>
+
+</sect2>
+
+<sect2 id="cl-format.reference.body" xreflabel="Description of Body Lines">
+<title>Description of Body Lines</title>
+
+<para>There exist lines
+<computeroutput>spec=position</computeroutput>. The values for position
+specifications are arbitrary strings. When starting with "(" and a
+digit, it's a string in compressed format. Otherwise it's the real
+position string. This allows for file and symbol names as position
+strings, as these never start with "(" + <emphasis>digit</emphasis>.
+The compressed format is either "(" <emphasis>number</emphasis> ")"
+<emphasis>space</emphasis> <emphasis>position</emphasis> or only
+"(" <emphasis>number</emphasis> ")". The first relates
+<emphasis>position</emphasis> to <emphasis>number</emphasis> in the
+context of the given format specification from this line to the end of
+the file; it makes the (<emphasis>number</emphasis>) an alias for
+<emphasis>position</emphasis>.
Compressed format is always
+optional.</para>
+
+<para>Position specifications allowed:</para>
+<itemizedlist>
+
+  <listitem>
+    <para><computeroutput>ob=</computeroutput> [Callgrind]</para>
+    <para>The ELF object where the cost of next cost lines happens.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>fl=</computeroutput> [Cachegrind]</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>fi=</computeroutput> [Cachegrind]</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>fe=</computeroutput> [Cachegrind]</para>
+    <para>The source file including the code which is responsible for
+    the cost of next cost lines. "fi="/"fe=" is used when the source
+    file changes inside of a function, i.e. for inlined code.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>fn=</computeroutput> [Cachegrind]</para>
+    <para>The name of the function where the cost of next cost lines
+    happens.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>cob=</computeroutput> [Callgrind]</para>
+    <para>The ELF object of the target of the next call cost lines.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>cfl=</computeroutput> [Callgrind]</para>
+    <para>The source file including the code of the target of the
+    next call cost lines.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>cfn=</computeroutput> [Callgrind]</para>
+    <para>The name of the target function of the next call cost
+    lines.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>calls=</computeroutput> [Callgrind]</para>
+    <para>The number of nonrecursive calls which are responsible for the
+    cost specified by the next call cost line. This is the cost spent
+    inside of the called function.</para>
+    <para>After "calls=" there MUST be a cost line. This is the cost
+    spent in the called function.
The first number is the source line
+    from where the call happened.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>jump=count target position</computeroutput> [Callgrind]</para>
+    <para>Unconditional jump, executed count times, to the given target
+    position.</para>
+  </listitem>
+
+  <listitem>
+    <para><computeroutput>jcnd=exe.count jumpcount target position</computeroutput> [Callgrind]</para>
+    <para>Conditional jump, executed exe.count times with jumpcount
+    jumps to the given target position.</para>
+  </listitem>
+
+</itemizedlist>
+
+</sect2>
+
+</sect1>
+
+</chapter>
\ No newline at end of file

Added: trunk/callgrind/docs/cl-manual.xml
===================================================================
--- trunk/callgrind/docs/cl-manual.xml (rev 0)
+++ trunk/callgrind/docs/cl-manual.xml 2006-03-20 10:29:30 UTC (rev 5781)
@@ -0,0 +1,810 @@
+<?xml version="1.0"?> <!-- -*- sgml -*- -->
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
+[ <!ENTITY % cl-entities SYSTEM "cl-entities.xml"> %cl-entities; ]>
+
+<chapter id="cl-manual" xreflabel="Callgrind Manual">
+<title>Callgrind Manual</title>
+
+
+<sect1 id="cl-manual.use" xreflabel="Overview">
+<title>Overview</title>
+
+<para>Callgrind is a Valgrind tool, able to run applications under
+supervision to generate profiling data. By default, this data consists of
+the number of instructions executed on a run, related to source lines, and
+call relationship among functions together with call counts.
+Optionally, a cache simulator (similar to Cachegrind) can produce
+further information about the memory access behavior of the application.
+</para>
+
+<para>The profile data is written out to a file at program
+termination.
For presentation of the data, and interactive control
+of the profiling, two command line tools are provided:</para>
+<variablelist>
+  <varlistentry>
+  <term><command>callgrind_annotate</command></term>
+  <listitem>
+    <para>This command reads in the profile data, and prints a
+    sorted list of functions, optionally with annotation.</para>
+    <para>You can read the manpage here: <xref
+      linkend="callgrind-annotate"/>.</para>
+    <para>For graphical visualization of the data, check out
+    <ulink url="&cl-gui;">KCachegrind</ulink>.</para>
+
+  </listitem>
+  </varlistentry>
+
+  <varlistentry>
+  <term><command>callgrind_control</command></term>
+  <listitem>
+    <para>This command enables you to interactively observe and control
+    the status of currently running supervised applications. You can
+    get statistics, the current stack trace, and request
+    zeroing of counters, and dumping of profiles.</para>
+    <para>You can read the manpage here: <xref linkend="callgrind-control"/>.</para>
+  </listitem>
+  </varlistentry>
+</variablelist>
+
+<para>To use Callgrind, you must specify
+<computeroutput>--tool=callgrind</computeroutput> on the Valgrind
+command line or use the supplied script
+<computeroutput>callgrind</computeroutput>.</para>
+
+<para>Callgrind's cache simulation is based on the
+<ulink url="&cg-tool-url;">Cachegrind tool</ulink> of the
+<ulink url="&vg-url;">Valgrind</ulink> package. Read
+<ulink url="&cg-doc-url;">Cachegrind's documentation</ulink> first;
+this page describes the features supported in addition to
+Cachegrind's features.</para>
+
+</sect1>
+
+
+<sect1 id="cl-manual.purpose" xreflabel="Purpose">
+<title>Purpose</title>
+
+
+  <sect2 id="cl-manual.devel"
+         xreflabel="Profiling as part of Application Development">
+  <title>Profiling as part of Application Development</title>
+
+  <para>With application development, usually, one of the last steps is
+  to improve the runtime performance.
To not waste time on
+  optimizing functions which are rarely used, one needs to know
+  in which part of the program most of the time is spent.</para>
+
+  <para>This is done with a technique called profiling. The program
+  is run under control of a profiling tool, which gives the time
+  distribution of executed functions in the run. After examination
+  of the program's profile, it should be clear if and where optimization
+  is useful. Afterwards, one should verify any runtime changes by another
+  profile run.</para>
+
+  </sect2>
+
+
+  <sect2 id="cl-manual.tools" xreflabel="Profiling Tools">
+  <title>Profiling Tools</title>
+
+  <para>The best known is the GCC profiling tool <command>GProf</command>:
+  one needs to compile an application with the compiler option
+  <computeroutput>-pg</computeroutput>; running the program generates
+  a file <computeroutput>gmon.out</computeroutput>, which can be
+  transformed into human readable form with the command line tool
+  <computeroutput>gprof</computeroutput>. A disadvantage here is the
+  required compilation step for preparing the executable; additionally, the
+  application should be statically linked.</para>
+
+  <para>Another profiling tool is <command>Cachegrind</command>, part
+  of <ulink url="&vg-url;">Valgrind</ulink>. It uses the processor
+  emulation of Valgrind to run the executable, and catches all memory
+  accesses for the trace. The user program does not need to be
+  recompiled; it can use shared libraries and plugins, and the profile
+  measuring doesn't influence the trace results. The trace includes
+  the number of instruction/data memory accesses and 1st/2nd level
+  cache misses, and relates it to source lines and functions of the
+  run program. A disadvantage is the slowdown involved in the
+  processor emulation: it is around 50 times slower.</para>
+
+  <para>Cachegrind can only deliver a flat profile. There is no call
+  relationship among the functions of an application stored.
Thus,
+  inclusive costs, i.e. costs of a function including the cost of all
+  functions called from there, cannot be calculated. Callgrind extends
+  Cachegrind by including call relationship and exact event counts
+  spent while doing a call.</para>
+
+  <para>Because Callgrind (and Cachegrind) is based on simulation, the
+  slowdown due to processing the synthetic runtime events does not
+  influence the results. See <xref linkend="cl-manual.usage"/> for more
+  details on the possibilities.</para>
+
+  </sect2>
+
+</sect1>
+
+
+<sect1 id="cl-manual.usage" xreflabel="Usage">
+<title>Usage</title>
+
+  <sect2 id="cl-manual.basics" xreflabel="Basics">
+  <title>Basics</title>
+
+  <para>To start a profile run for a program, execute:
+  <screen>callgrind [callgrind options] your-program [program options]</screen>
+  </para>
+
+  <para>While the simulation is running, you can observe execution with
+  <screen>callgrind_control -b</screen>
+  This will print out a current backtrace.
To annotate the backtrace with
+  event counts, run
+  <screen>callgrind_control -e -b</screen>
+  </para>
+
+  <para>After program termination, a profile data file named
+  <computeroutput>callgrind.out.pid</computeroutput>
+  is generated with <emphasis>pid</emphasis> being the process ID
+  of the execution of this profile run.</para>
+
+  <para>The data file contains information about the calls made in the
+  program among the functions executed, together with events of type
+  <command>Instruction Read Accesses</command> (Ir).</para>
+
+  <para>If you are additionally interested in memory accesses of your
+  program, and if an access can be satisfied by loading from 1st/2nd
+  level cache, use Callgrind with the option
+  <option><xref linkend="opt.simulate-cache"/>=yes</option>.
+  This will further slow down the run approximately by a factor of 2.</para>
+
+  <para>If the program section you want to profile is somewhere in the
+  middle of the run, it is beneficial to
+  <emphasis>fast forward</emphasis> to this section without any
+  profiling at all, and switch it on later. This is achieved by using
+  <option><xref linkend="opt.instr-atstart"/>=no</option>
+  and interactively running
+  <computeroutput>callgrind_control -i on</computeroutput> before the
+  interesting code section is about to be executed.</para>
+
+  <para>If you want to be able to see assembler annotation, specify
+  <option><xref linkend="opt.dump-instr"/>=yes</option>. This will produce
+  profile data at instruction granularity. Note that this type of annotation
+  is only available with KCachegrind. For assembler annotation, it also is
+  interesting to see more details of the control flow inside of functions,
+  i.e. (conditional) jumps.
This information will be collected by further specifying + <option><xref linkend="opt.collect-jumps"/>=yes</option>.</para> + + </sect2> + + + <sect2 id="cl-manual.dumps" + xreflabel="Multiple dumps from one program run"> + <title>Multiple dumps from one program run</title> + + <para>Often, you aren't interested in time characteristics of a full + program run, but only of a small part of it (e.g. execution of one + algorithm). If there are multiple algorithms or one algorithm + running with different input data, it's even useful to get different + profile information for multiple parts of one program run.</para> + + <para>In full detail, a generated profile data file is named +<screen> +callgrind.out.<emphasis>pid</emphasis>.<emphasis>part</emphasis>-<emphasis>threadID</emphasis> +</screen> + </para> + <para>where <emphasis>pid</emphasis> is the PID of the running + program, <emphasis>part</emphasis> is a number incremented on each + dump (".part" is skipped for the dump at program termination), and + <emphasis>threadID</emphasis> is a thread identification + ("-threadID" is only used if you request dumps of individual + threads with <option><xref linkend="opt.separate-threads"/>=yes</option>).</para> + + <para>There are different ways to generate multiple profile dumps + while a program is running under Callgrind's supervision. Still, + all methods trigger the same action, viz. "dump all profile + information since the last dump or program start, and zero cost + counters afterwards".
To allow for zeroing cost counters without + dumping, there is a second action "zero all cost counters now". + The different methods are:</para> + <itemizedlist> + + <listitem> + <para><command>Dump on program termination.</command> + This method is the standard way and doesn't need any special + action from your side.</para> + </listitem> + + <listitem> + <para><command>Spontaneous, interactive dumping.</command> Use + <screen>callgrind_control -d [hint [PID/Name]]</screen> to + request the dumping of profile information of the supervised + application with PID or Name. <emphasis>hint</emphasis> is an + arbitrary string you can optionally specify to later be able to + distinguish profile dumps. The control program will not terminate + before the dump is completely written. Note that the application + must be actively running for detection of the dump command. So, + for a GUI application, resize the window, or for a server, send a + request.</para> + <para>If you are using <ulink url="&cl-gui;">KCachegrind</ulink> + for browsing of profile information, you can use the toolbar + button <command>Force dump</command>. This will request a dump + and trigger a reload after the dump is written.</para> + </listitem> + + <listitem> + <para><command>Periodic dumping after execution of a specified + number of basic blocks</command>. For this, use the command line + option <option><xref linkend="opt.dump-every-bb"/>=count</option>. + The resolution of the internal basic block counter of Valgrind is + only rough, so you should specify an interval of at least 50000 + basic blocks.</para> + </listitem> + + <listitem> + <para><command>Dumping at enter/leave of all functions whose name + starts with</command> <emphasis>funcprefix</emphasis>. Use the + options <option><xref linkend="opt.dump-before"/>=funcprefix</option> + and <option><xref linkend="opt.dump-after"/>=funcprefix</option>.
+ To zero cost counters before entering a function, use + <option><xref linkend="opt.zero-before"/>=funcprefix</option>. + The prefix method for specifying function names was chosen to + ease the use with C++: you don't have to specify full + signatures.</para> <para>You can specify these options multiple + times for different function prefixes.</para> + </listitem> + + <listitem> + <para><command>Program controlled dumping.</command> + Put <screen><![CDATA[#include <valgrind/callgrind.h>]]></screen> + into your source and add + <computeroutput>CALLGRIND_DUMP_STATS;</computeroutput> where you + want a dump to happen. Use + <computeroutput>CALLGRIND_ZERO_STATS;</computeroutput> to only + zero cost counters.</para> + <para>In Valgrind terminology, this mechanism is called "Client + requests". The given macros generate a special instruction + pattern with no effect at all (i.e. a NOP). Only when run under + Valgrind does the CPU simulation engine detect the special + instruction pattern and trigger special actions like the ones + described above.</para> + </listitem> + </itemizedlist> + + <para>If you are running a multi-threaded application and specify the + command line option <option><xref linkend="opt.separate-threads"/>=yes</option>, + every thread will be profiled on its own and will create its own + profile dump. Thus, the last two methods will only generate one dump + of the currently running thread.
With the other methods, you will get + multiple dumps (one for each thread) on a dump request.</para> + + </sect2> + + + + <sect2 id="cl-manual.limits" + xreflabel="Limiting range of event collection"> + <title>Limiting range of event collection</title> + + <para>For events (function enter/leave, + instruction execution, memory access) to be aggregated into event counts, + first, the events must be recognizable by Callgrind, and second, + the collection state must be switched on.</para> + + <para>Event recognition is only possible if <emphasis>instrumentation</emphasis> + for program code is switched on. This is the default, but for faster + execution (identical to <computeroutput>valgrind --tool=none</computeroutput>), + it can be temporarily switched off until the program reaches the parts that + are worth profiling. Callgrind can start without instrumentation + by specifying option <option><xref linkend="opt.instr-atstart"/>=no</option>. + The instrumentation state can be switched on interactively + with <screen>callgrind_control -i on</screen> + and off by specifying "off" instead of "on". + Furthermore, instrumentation state can be programmatically changed with + the macros <computeroutput>CALLGRIND_START_INSTRUMENTATION;</computeroutput> + and <computeroutput>CALLGRIND_STOP_INSTRUMENTATION;</computeroutput>. + </para> + + <para>In addition to instrumentation, collection must be enabled for events + to be counted. This, too, is by default the case. + You can explicitly control for which part of your program you want to + collect events by using + <option><xref linkend="opt.toggle-collect"/>=funcprefix</option>. + This will toggle the collection state on entering and leaving a + function. When specifying this option, the default collection state + at program start is "off". Thus, only events happening while running + inside of functions starting with <emphasis>funcprefix</emphasis> will + be collected.
Recursive + calls of functions with <emphasis>funcprefix</emphasis> do not trigger + any action.</para> + + <para>It is important to note that with instrumentation switched off, the + cache simulator cannot see any memory access events, and thus, any + simulated cache state will be frozen and wrong without instrumentation. + Therefore, to get useful cache events (hits/misses) after switching on + instrumentation, the cache first must warm up, + probably leading to many <emphasis>cold misses</emphasis> + which would not have happened in reality. If you do not want to see these, + start actual collection a few million instructions after you have switched + on instrumentation.</para> + + + </sect2> + + + + <sect2 id="cl-manual.cycles" xreflabel="Avoiding cycles"> + <title>Avoiding cycles</title> + + <para>A group of functions in which any two are connected by a + call chain from one to the other is called a cycle. For example, + with A calling B, B calling C, and C calling A, the three functions + A, B, C build up one cycle.</para> + + <para>If a call chain goes multiple times around inside of a cycle, + with profiling, you cannot tell whether event counts come from the + first round or the second. Thus, it makes no sense to attach any inclusive + cost to a call among functions inside of one cycle. + If "A > B" appears multiple times in a call chain, you + have no way to partition the one big sum of all appearances of "A > + B". Thus, for profile data presentation, all functions of a cycle are + seen as one big virtual function.</para> + + <para>Unfortunately, if you have an application using some callback + mechanism (like any GUI program), or even with normal polymorphism (as + in OO languages like C++), it's quite possible to get large cycles. + As it is often impossible to say anything about performance behaviour + inside of cycles, it is useful to introduce some mechanisms to avoid + cycles in call graphs altogether.
This is done by treating the same + function in different ways, depending on the current execution + context, either by giving it different names or by ignoring calls to + certain functions altogether.</para> + + <para>There is an option to ignore calls to a function with + <option><xref linkend="opt.fn-skip"/>=funcprefix</option>. E.g., you + usually do not want to see the trampoline functions in the PLT sections + for calls to functions in shared libraries. You can see the difference + if you profile with <option><xref linkend="opt.skip-plt"/>=no</option>. + If a call is ignored, cost events happening will be attached to the + enclosing function.</para> + + <para>If you have a recursive function, you can distinguish the first + 10 recursion levels by specifying + <option><xref linkend="opt.fn-recursion-num"/>=funcprefix</option>. + You can do the same for all functions with + <option><xref linkend="opt.fn-recursion"/>=10</option>, but this will + give you much bigger profile data files. In the profile data, you will see + the recursion levels of "func" as the different functions with names + "func", "func'2", "func'3" and so on.</para> + + <para>If you have call chains "A > B > C" and "A > C > B" + in your program, you usually get a "false" cycle "B &lt;&gt; C". Use + <option><xref linkend="opt.fn-caller-num"/>=B</option> + <option><xref linkend="opt.fn-caller-num"/>=C</option>, + and functions "B" and "C" will be treated as different functions + depending on the direct caller. Using the apostrophe for appending + this "context" to the function name, you get "A > B'A > C'B" + and "A > C'A > B'C", and there will be no cycle. Use + <option><xref linkend="opt.fn-caller"/>=3</option> to get a 2-caller + dependency for all functions.
Again, this will multiply the + profile data size.</para> + + </sect2> + +</sect1> + + +<sect1 id="cl-manual.options" xreflabel="Command line option reference"> +<title>Command line option reference</title> + +<para> +This reference groups options into classes, and uses the same order as +the output of <computeroutput>callgrind --help</computeroutput>. +</para> + +<sect2 id="cl-manual.options.misc" + xreflabel="Miscellaneous options"> +<title>Miscellaneous options</title> + +<variablelist id="cmd-options.misc"> + + <varlistentry> + <term><option>--help</option></term> + <listitem> + <para>Show summary of options. This is a short version of this + manual section.</para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--version</option></term> + <listitem> + <para>Show version of callgrind.</para> + </listitem> + </varlistentry> + +</variablelist> +</sect2> + +<sect2 id="cl-manual.options.creation" + xreflabel="Dump creation options"> +<title>Dump creation options</title> + +<para> +These options influence the name and format of the profile data files. +</para> + +<variablelist id="cmd-options.creation"> + + <varlistentry id="opt.base"> + <term> + <option><![CDATA[--base=<prefix> [default: callgrind.out] ]]></option> + </term> + <listitem> + <para>Specify another base name for the dump file names. To + distinguish different profile runs of the same application, + <computeroutput>.&lt;pid&gt;</computeroutput> is appended to the + base dump file name with + <computeroutput>&lt;pid&gt;</computeroutput> being the process ID + of the profile run (with multiple dumps happening, the file name + is modified further; see below).</para> <para>This option is + especially useful if your application changes its working + directory. Usually, the dump file is generated in the current + working directory of the application at program termination.
By + giving an absolute path with the base specification, you can force + a fixed directory for the dump files.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.dump-instr" xreflabel="--dump-instr"> + <term> + <option><![CDATA[--dump-instr=<no|yes> [default: no] ]]></option> + </term> + <listitem> + <para>This specifies that event counts at instruction granularity + should be available in the profile data file. This allows assembler + annotation, but currently can only be shown with KCachegrind.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.dump-line" xreflabel="--dump-line"> + <term> + <option><![CDATA[--dump-line=<no|yes> [default: yes] ]]></option> + </term> + <listitem> + <para>This specifies that event counts at source line granularity + should be available in the profile data file. This allows source + annotation for source which was compiled with debug information ("-g"). + This should always be enabled.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.compress-strings" xreflabel="--compress-strings"> + <term> + <option><![CDATA[--compress-strings=<no|yes> [default: yes] ]]></option> + </term> + <listitem> + <para>This option influences the output format of the profile data. + It specifies whether strings (file and function names) should be + identified by numbers. This shrinks the file size, but makes the file harder + for humans to read (which is not recommended either way).</para> + <para>However, this currently has to be switched off if + the files are to be read by + <computeroutput>callgrind_annotate</computeroutput>!</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.compress-pos" xreflabel="--compress-pos"> + <term> + <option><![CDATA[--compress-pos=<no|yes> [default: yes] ]]></option> + </term> + <listitem> + <para>This option influences the output format of the profile data.
+ It specifies whether numerical positions are always specified as absolute + values or are allowed to be relative to previous numbers. + Relative positions shrink the file size.</para> + <para>However, this currently has to be switched off if + the files are to be read by + <computeroutput>callgrind_annotate</computeroutput>!</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.combine-dumps" xreflabel="--combine-dumps"> + <term> + <option><![CDATA[--combine-dumps=<no|yes> [default: no] ]]></option> + </term> + <listitem> + <para>When multiple profile data parts are to be generated, these + parts are appended to the same output file if this option is set to + "yes". Not recommended.</para> + </listitem> + </varlistentry> + +</variablelist> +</sect2> + +<sect2 id="cl-manual.options.activity" + xreflabel="Activity options"> +<title>Activity options</title> + +<para> +These options specify when different actions regarding event counts are to +be executed. For interactive control use +<computeroutput>callgrind_control</computeroutput>.
+</para> + +<variablelist id="cmd-options.activity"> + + <varlistentry id="opt.dump-every-bb" xreflabel="--dump-every-bb"> + <term> + <option><![CDATA[--dump-every-bb=<count> [default: 0, never] ]]></option> + </term> + <listitem> + <para>Dump profile data every &lt;count&gt; basic blocks.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.dump-before" xreflabel="--dump-before"> + <term> + <option><![CDATA[--dump-before=<prefix> ]]></option> + </term> + <listitem> + <para>Dump when entering a function starting with &lt;prefix&gt;.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.zero-before" xreflabel="--zero-before"> + <term> + <option><![CDATA[--zero-before=<prefix> ]]></option> + </term> + <listitem> + <para>Zero all costs when entering a function starting with &lt;prefix&gt;.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.dump-after" xreflabel="--dump-after"> + <term> + <option><![CDATA[--dump-after=<prefix> ]]></option> + </term> + <listitem> + <para>Dump when leaving a function starting with &lt;prefix&gt;.</para> + </listitem> + </varlistentry> + +</variablelist> +</sect2> + +<sect2 id="cl-manual.options.collection" + xreflabel="Data collection options"> +<title>Data collection options</title> + +<para> +These options specify when events are to be aggregated into event counts. +Also see <xref linkend="cl-manual.limits"/>.</para> + +<variablelist id="cmd-options.collection"> + + <varlistentry id="opt.instr-atstart" xreflabel="--instr-atstart"> + <term> + <option><![CDATA[--instr-atstart=<yes|no> [default: yes] ]]></option> + </term> + <listitem> + <para>Specify whether you want Callgrind to start simulation and + profiling from the beginning. If not, Callgrind will not be able + to collect any information, including calls, but the slowdown will + be at most a factor of around 4, which is the minimum Valgrind + overhead.
Instrumentation can be interactively switched on via + <computeroutput>callgrind_control -i on</computeroutput>.</para> + <para>Note that the resulting call graph will most probably not + contain <computeroutput>main</computeroutput>, but all the + functions executed after instrumentation was switched on. + Instrumentation can also be switched on/off programmatically. See the + Callgrind include file + <computeroutput>&lt;callgrind.h&gt;</computeroutput> for the macros + you have to use in your source code.</para> <para>For cache + simulation, results will be a little bit off when switching on + instrumentation later in the program run, as the simulator starts + with an empty cache at that moment. Switch on event collection + later to cope with this error.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.collect-atstart"> + <term> + <option><![CDATA[--collect-atstart=<yes|no> [default: yes] ]]></option> + </term> + <listitem> + <para>Specify whether event collection is switched on at the beginning + of the profile run.</para> + <para>To only look at parts of your program, you have two + possibilities:</para> + <orderedlist> + <listitem> + <para>Zero event counters before entering the program part you + want to profile, and dump the event counters to a file after + leaving that program part.</para> + </listitem> + <listitem> + <para>Switch on/off collection state as needed to only see + events counted while inside of the program part you + want to profile.</para> + </listitem> + </orderedlist> + <para>The second option can be used if the program part you want to + profile is called many times. Option 1, i.e. creating a lot of + dumps, is not practical here.</para> <para>Collection state can be + toggled at entering and leaving of a given function with the + option <xref linkend="opt.toggle-collect"/>. For this, collection + state should be switched off at the beginning.
Note that the + specification of <computeroutput>--toggle-collect</computeroutput> + implicitly sets + <computeroutput>--collect-atstart=no</computeroutput>.</para> + <para>Collection state can also be toggled by using a Valgrind + User Request in your application. For this, include + <computeroutput>valgrind/callgrind.h</computeroutput> and specify + the macro + <computeroutput>CALLGRIND_TOGGLE_COLLECT</computeroutput> at the + needed positions. This will only have an effect if run under + supervision of the Callgrind tool.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.toggle-collect" xreflabel="--toggle-collect"> + <term> + <option><![CDATA[--toggle-collect=<prefix> ]]></option> + </term> + <listitem> + <para>Toggle collection on entering/leaving a function starting with + &lt;prefix&gt;.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.collect-jumps" xreflabel="--collect-jumps="> + <term> + <option><![CDATA[--collect-jumps=<no|yes> [default: no] ]]></option> + </term> + <listitem> + <para>This specifies whether information for (conditional) jumps + should be collected. As with assembler annotation, callgrind_annotate is currently not + able to show you the data. You have to use KCachegrind to get jump + arrows in the annotated code.</para> + </listitem> + </varlistentry> + +</variablelist> +</sect2> + +<sect2 id="cl-manual.options.separation" + xreflabel="Cost entity separation options"> +<title>Cost entity separation options</title> + +<para> +These options specify how event counts are related to execution contexts. +More specifically, they specify e.g. whether the recursion level or the +call chain leading to a function should be accounted for, or whether the +thread ID should be remembered.
+Also see <xref linkend="cl-manual.cycles"/>.</para> + +<variablelist id="cmd-options.separation"> + + <varlistentry id="opt.separate-threads" xreflabel="--separate-threads"> + <term> + <option><![CDATA[--separate-threads=<no|yes> [default: no] ]]></option> + </term> + <listitem> + <para>This option specifies whether profile data should be generated + separately for every thread. If yes, the file names get "-threadID" + appended.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.fn-recursion" xreflabel="--fn-recursion"> + <term> + <option><![CDATA[--fn-recursion=<level> [default: 2] ]]></option> + </term> + <listitem> + <para>Separate function recursions, up to a maximum of &lt;level&gt; levels. + See <xref linkend="cl-manual.cycles"/>.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.fn-caller" xreflabel="--fn-caller"> + <term> + <option><![CDATA[--fn-caller=<callers> [default: 0] ]]></option> + </term> + <listitem> + <para>Separate contexts by at most &lt;callers&gt; functions in the + call chain. See <xref linkend="cl-manual.cycles"/>.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.skip-plt" xreflabel="--skip-plt"> + <term> + <option><![CDATA[--skip-plt=<no|yes> [default: yes] ]]></option> + </term> + <listitem> + <para>Ignore calls to/from PLT sections.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.fn-skip" xreflabel="--fn-skip"> + <term> + <option><![CDATA[--fn-skip=<function> ]]></option> + </term> + <listitem> + <para>Ignore calls to/from a given function. E.g. if you have a + call chain A > B > C, and you specify function B to be + ignored, you will only see A > C.</para> + <para>This is very convenient for skipping functions handling callback + behaviour. E.g. for the SIGNAL/SLOT mechanism in Qt, you only want + to see the function emitting a signal to call the slots connected + to that signal.
First, determine the real call chain to see the + functions that need to be skipped, then use this option.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.fn-group"> + <term> + <option><![CDATA[--fn-group<number>=<function> ]]></option> + </term> + <listitem> + <para>Put a function into a separation group. This influences the + context name for cycle avoidance. All functions inside of such a + group are treated as being the same when building the context name, which + reflects the call chain leading to a context. By specifying function + groups with this option, you can shorten the context name, as functions + in the same group will not appear in sequence in the name. </para> + </listitem> + </varlistentry> + + <varlistentry id="opt.fn-recursion-num" xreflabel="--fn-recursion10"> + <term> + <option><![CDATA[--fn-recursion<number>=<function> ]]></option> + </term> + <listitem> + <para>Separate &lt;number&gt; recursions for &lt;function&gt;. + See <xref linkend="cl-manual.cycles"/>.</para> + </listitem> + </varlistentry> + + <varlistentry id="opt.fn-caller-num" xreflabel="--fn-caller2"> + <term> + <option><![CDATA[--fn-caller<number>=<function> ]]></option> + </term> + <listitem> + <para>Separate &lt;number&gt; callers for &lt;function&gt;. + See <xref linkend="cl-manual.cycles"/>.</para> + </listitem> + </varlistentry> + +</variablelist> +</sect2> + +<sect2 id="cl-manual.options.simulation" + xreflabel="Cache simulation options"> +<title>Cache simulation options</title> + +<variablelist id="cmd-options.simulation"> + + <varlistentry id="opt.simulate-cache" xreflabel="--simulate-cache"> + <term> + <option><![CDATA[--simulate-cache=<yes|no> [default: no] ]]></option> + </term> + <listitem> + <para>Specify whether you want to do full cache simulation. Disabled by + default; only instruction read accesses will be profiled.</para> + <para>Note however, that estimating of how much real time your + pr... [truncated message content] |