2002-01-29  John Levon  <moz@compsoc.man.ac.uk>

  * start this document

Design for new op_report
------------------------

1. Why a redesign?
------------------

It's worth asking why a redesign is worthwhile. There are a number of reasons.
The current pp code is partly based on very old, crappy code of mine. As we have
rounded out our idea of the kinds of use cases pp has, several "hack" solutions
have been added. Whilst the code generally works pretty well, there are several
things we would like to do that the current architecture makes difficult.
For me, the code would be improved by taking a more object-oriented approach.

A redesign would prove advantageous because we now know much better what
we want to do. Things like merge and diff can be built into the design, rather
than bolted on. Working code can still be reused where sensible, but at the
same time we can do things like turning the symbol processing on its head, as
has been discussed before.

Whilst the current code works well, its maintainability is becoming suspect. These
are really problems at the design level. Hence I have started this document in
an attempt to set out the rationale for a new design: what we want to do, and how
we decide to do it.

Is this a radical rewrite of everything? I don't really think so. It is more a
sensible reorganisation of the "meat" code, plus the addition of a flexible
architecture for triggering the "meat" code.

2. Design aims
--------------

This section discusses our basic design aims for the pp code.

1) flexibility

We want to be as flexible as possible. For example, the design, the code, and the
interface should allow inspection of an arbitrary number of symbols, images, sessions,
etc. I should be able to run "op_time" on a subset of profiles and get the right
results.

2) convenience

The user wants results, not hassle. We should avoid, for example, requiring the user
to specify sample filenames, and we should cope gracefully with failed bfd_openr() calls, etc.

3) performance

Some of our operations are inherently slow, but this doesn't mean we should ignore
performance.

4) readability and clarity

The output should be readable and clear. Where possible, it should be machine-readable
as well. We should not output useless or misleading information (consider the current
output of oprofpp -kl /lib/libc-2.2.2.so).

3. Feature requirements
-----------------------

Image summaries

The user should be able to request summaries of an arbitrary selection of images, with
percentages to match. These should (automatically?) show summaries for attached shared
library images where applicable.

Symbol summaries

The user should be able to request symbol-based summaries for an arbitrary number of
images. Sorting must be possible on a per-symbol basis. Anonymous symbols must be catered for
and displayed (including symbol-less images). Demangling must be supported in all cases.

Symbol details

The user should be able to request symbol profiles for an arbitrary number of symbols over
any set of images. This should include a raw output format for further text processing, and
a disassembly option.

File range details

The user should be able to request profiles for an arbitrary interval within an image's profile,
along with the usual disassembly options etc.

Counters

All operations must be capable of displaying all counters, including a calculated "total count"
across all counters. Any sorting operation must be able to sort by any particular
counter or by the total.

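A minimal C++ sketch of a per-symbol counter record that derives the "total count" on demand and hands a sort either one counter or the total. The names counter_values and kTotal, and the limit of two counters, are hypothetical and not existing oprofile code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

// Hypothetical counter limit; the real value would come from the oprofile headers.
constexpr std::size_t kMaxCounters = 2;

// Selector meaning "use the calculated total rather than a single counter".
constexpr int kTotal = -1;

struct counter_values {
    std::array<unsigned long, kMaxCounters> val{};  // one count per hardware counter

    // The calculated "total count" across all counters.
    unsigned long total() const {
        return std::accumulate(val.begin(), val.end(), 0UL);
    }

    // Sort key: a single counter, or the total when sel == kTotal.
    unsigned long key(int sel) const {
        if (sel == kTotal)
            return total();
        assert(sel >= 0 && static_cast<std::size_t>(sel) < kMaxCounters);
        return val[static_cast<std::size_t>(sel)];
    }
};
```

Deriving the total on demand avoids keeping a separate field in sync. The counters_t sketch in section 5 caches it instead, which is cheaper if the total is read often.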

Header checks

Headers must be checked and displayed.

Sessions

We must have simple support for specifying one or more sessions to operate on.

Merging

We must support merging of profiles for a given image. This includes merging profiles which are
not strictly compatible (e.g. merging a counter 0 profile with a counter 1 profile).

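As a sketch of the counter 0 + counter 1 case, assume each input profile has already been reduced to per-symbol totals keyed by symbol name. This is a simplification of the symbol_profile objects in section 5, and the names single_profile, merged_profile and merge_into are hypothetical. Merging can then pair symbols by name and give each input its own counter slot:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

constexpr std::size_t kMaxCounters = 2;  // hypothetical limit

// Per-symbol sample counts from one profile (a single counter), keyed by symbol name.
using single_profile = std::map<std::string, unsigned long>;

// Per-symbol counts with one slot per counter.
using merged_profile = std::map<std::string, std::array<unsigned long, kMaxCounters>>;

// Add `in` into counter slot `ctr` of `out`. Merging a second profile of the same
// counter into the same slot sums them; using a different slot lets a counter 0
// profile and a counter 1 profile sit side by side.
void merge_into(merged_profile & out, single_profile const & in, std::size_t ctr) {
    assert(ctr < kMaxCounters);
    for (auto const & [name, count] : in)
        out[name][ctr] += count;  // operator[] value-initialises new rows to zero
}
```

A symbol seen in only one of the inputs simply gets zero in the other slot.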

Difference

It should be possible to take the difference between two profiles of an arbitrary number of images. This
should be done at the symbol level to allow for changes in the image layout. Session diffs should
be possible at both the summary and the detailed level.

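One way to do the symbol-level pairing is sketched below in C++. It assumes each profile has been reduced to a map from symbol name to sample count, and symbol_counts and diff are hypothetical names. Symbols are matched by name, so relocation between builds does not matter, and a symbol missing from one side is compared against zero.

```cpp
#include <cassert>
#include <map>
#include <string>

// Per-symbol sample totals for one profile, keyed by symbol name.
using symbol_counts = std::map<std::string, long>;

// Symbol-level diff: pair symbols by name rather than by address, so a symbol
// that moved between two versions of an image is still compared with itself.
// A symbol present in only one profile is diffed against zero.
symbol_counts diff(symbol_counts const & before, symbol_counts const & after) {
    symbol_counts result;
    for (auto const & [name, count] : before)
        result[name] -= count;
    for (auto const & [name, count] : after)
        result[name] += count;
    return result;
}
```

Matching purely by name will confuse distinct static functions that happen to share a name, so a real implementation would need some extra disambiguation.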

Source generation

We should be able to generate annotated source code and mixed source/asm for any input where that makes sense.

Exclusion/inclusion

We should allow the arbitrary inclusion/exclusion of both images and symbols in all cases. We should
allow cut-off percentages as well (which do not affect the calculated relative percentages).

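A cut-off need not disturb the relative percentages. If the total is computed before filtering, hiding small entries leaves the remaining numbers unchanged. A minimal sketch, where row and apply_cutoff are hypothetical names:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct row {
    std::string name;
    unsigned long count;
};

// Compute each row's percentage of the *full* total, then drop rows below the
// cut-off. Because the total is taken before filtering, hiding small entries
// does not change the percentages shown for the remaining ones.
std::vector<std::pair<std::string, double>>
apply_cutoff(std::vector<row> const & rows, double min_percent) {
    unsigned long total = 0;
    for (auto const & r : rows)
        total += r.count;

    std::vector<std::pair<std::string, double>> shown;
    if (total == 0)
        return shown;
    for (auto const & r : rows) {
        double pct = 100.0 * static_cast<double>(r.count) / static_cast<double>(total);
        if (pct >= min_percent)
            shown.emplace_back(r.name, pct);
    }
    return shown;
}
```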

4. Suggested interface
----------------------

I think we should have one binary, "op_report", that is used for everything.

We should support "./binary", "/bin/binary", and the sample file, as we do right now,
as ways of identifying an image (shared-samples is a special case here). We need arbitrary
inclusion and exclusion lists. Something like:

op_report ... --images ALL --exclude-images /lib/libc-2.2.so,./binary

Note the special case "ALL". We could perhaps also support "KERNEL", "SHARED", etc.

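Such lists could be resolved as in the sketch below. It assumes the set of images with samples is already known, and that "./binary", "/bin/binary" and sample file names have already been mapped to one canonical name (not shown). select_images and split_list are hypothetical helpers.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split a comma-separated option value such as "/lib/libc-2.2.so,./binary".
std::vector<std::string> split_list(std::string const & value) {
    std::vector<std::string> items;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ','))
        if (!item.empty())
            items.push_back(item);
    return items;
}

// Resolve --images / --exclude-images against the images that have samples.
// "ALL" selects every available image; exclusions are applied afterwards.
std::vector<std::string> select_images(std::vector<std::string> const & available,
                                       std::string const & images_opt,
                                       std::string const & exclude_opt) {
    std::vector<std::string> wanted = split_list(images_opt);
    std::set<std::string> include(wanted.begin(), wanted.end());
    std::vector<std::string> excluded = split_list(exclude_opt);
    std::set<std::string> exclude(excluded.begin(), excluded.end());
    bool all = include.count("ALL") != 0;

    std::vector<std::string> chosen;
    for (auto const & img : available)
        if ((all || include.count(img) != 0) && exclude.count(img) == 0)
            chosen.push_back(img);
    return chosen;
}
```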

For the shared-samples case, we need to consider how to deal with it. I think that:

op_report /my/bin

should by default consider its shared libraries. We may need to support a --merge option,
or we could do this automatically.

We can also add an

  --ignore-images blah

option. This would process the images anyway, but not include their details in the
output. (To clarify: --exclude-images affects the percentage counts, --ignore-images does not.)

We can use the same for symbols:

op_report --symbols ALL --exclude-symbols fgetc --ignore-symbols blah / --ignore-less-than / --exclude-less-than
etc.

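The difference between the exclude and ignore options comes down to which total the percentages are taken against. A minimal sketch, with hypothetical names (image_count, report_line, summarise) and images identified by plain path strings:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct image_count {
    std::string image;
    unsigned long samples;
};

struct report_line {
    std::string image;
    double percent;
};

// Excluded images are dropped entirely, so they no longer count towards the total.
// Ignored images stay in the total but are not printed.
std::vector<report_line> summarise(std::vector<image_count> const & images,
                                   std::set<std::string> const & exclude,
                                   std::set<std::string> const & ignore) {
    unsigned long total = 0;
    for (auto const & ic : images)
        if (exclude.count(ic.image) == 0)
            total += ic.samples;

    std::vector<report_line> out;
    if (total == 0)
        return out;
    for (auto const & ic : images) {
        if (exclude.count(ic.image) != 0 || ignore.count(ic.image) != 0)
            continue;
        out.push_back({ic.image, 100.0 * static_cast<double>(ic.samples) /
                                     static_cast<double>(total)});
    }
    return out;
}
```

With /bin/ls at 60 samples and two other images at 20 each, excluding libc reports /bin/ls at 75%, while ignoring libc leaves it at 60%.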

For sorting purposes, something like:

  --sort reverse,1

      list in reverse order, sorted primarily by counter 1 value

  --sort t 

      sort by the total value of counter 0 + counter 1 + ...

  --sort i / --sort s

      make the primary sort image-based / symbol-based.

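The sketch below shows how these --sort arguments might be parsed and turned into a comparator. The text doesn't say what the default order is, so it assumes largest value first (making "reverse" ascending). It also assumes an image-based primary sort groups entries by image name. sort_spec, parse_sort, sort_value and sort_entries are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parsed form of a --sort argument such as "reverse,1" or "t".
struct sort_spec {
    bool reverse = false;   // "reverse": smallest value first
    bool by_total = false;  // "t": sort by the sum over all counters
    int counter = 0;        // "<n>": sort by counter n
    char primary = 's';     // "i" = image-based, "s" = symbol-based primary sort
};

sort_spec parse_sort(std::string const & arg) {
    sort_spec spec;
    std::istringstream in(arg);
    std::string tok;
    while (std::getline(in, tok, ',')) {
        if (tok == "reverse")
            spec.reverse = true;
        else if (tok == "t")
            spec.by_total = true;
        else if (tok == "i" || tok == "s")
            spec.primary = tok[0];
        else if (!tok.empty() &&
                 std::all_of(tok.begin(), tok.end(),
                             [](unsigned char c) { return std::isdigit(c) != 0; }))
            spec.counter = std::stoi(tok);
    }
    return spec;
}

struct entry {
    std::string image;
    std::string symbol;
    std::vector<unsigned long> counts;  // one value per counter
};

unsigned long sort_value(entry const & e, sort_spec const & spec) {
    if (!spec.by_total)
        return e.counts.at(static_cast<std::size_t>(spec.counter));
    unsigned long sum = 0;
    for (unsigned long c : e.counts)
        sum += c;
    return sum;
}

// Largest value first by default; "reverse" flips that. An image-based primary
// sort groups entries by image name before comparing values.
void sort_entries(std::vector<entry> & v, sort_spec const & spec) {
    std::stable_sort(v.begin(), v.end(), [&spec](entry const & a, entry const & b) {
        if (spec.primary == 'i' && a.image != b.image)
            return a.image < b.image;
        unsigned long va = sort_value(a, spec);
        unsigned long vb = sort_value(b, spec);
        return spec.reverse ? va < vb : va > vb;
    });
}
```

Using std::stable_sort keeps entries with equal keys in their previous order, so successive sorts can act as secondary keys.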

For diffing:

op_report --diff -r 0 -r 1 /lib/libc.so

This gives the difference report between the current session and session 1 for that image. We may need --merge too.
We want to be able to diff two library images from shared-samples, I suppose. How would we go
about this? Maybe:

op_report --diff /bin/ls:/lib/libc.so /bin/make:/lib/libc.so

Should we make them specifiable like this?

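If we do go with the primary:library form, parsing it is straightforward. A hypothetical sketch (image_spec and parse_image_spec are illustrative names) that splits on the first ':':

```cpp
#include <cassert>
#include <string>

// One side of a --diff argument: a shared library as seen from a particular
// primary binary's shared-samples, or a plain image when no ':' is present.
struct image_spec {
    std::string primary;  // e.g. "/bin/ls"; empty for a plain image
    std::string image;    // e.g. "/lib/libc.so"
};

image_spec parse_image_spec(std::string const & arg) {
    std::string::size_type colon = arg.find(':');
    if (colon == std::string::npos)
        return {"", arg};
    return {arg.substr(0, colon), arg.substr(colon + 1)};
}
```

One drawback: a ':' in the primary binary's path would be misparsed, which may argue for a different separator.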

Listing: without doing any calculations, show what is available, plus some simple stats, e.g.

op_report --list ALL        (ALL is optional)

session3/ [Date XX/XX/2001 XX:XX, finished XX...]
  /lib/libc-2.2.so: X big, symbols found, counter setup was blah blah
  ...
current/
  /blah/blah

?

I think an interface similar to this is pretty nice: it's simple and clear enough.
Shared samples is probably going to be the most confusing part.

I saw an interesting project on SourceForge: a basic, single-file, monolithic SQL92 engine
that doesn't require a client/server setup. Would it be totally insane to populate such a database
with our profile data? It would allow truly arbitrary queries, and as a by-product would provide
the results caching that would be nice. Alternatively, we could hand-code something similar
just for caching results.

5. OO design
------------

The primary object we have is the image. Every other object is related to exactly
one image. (Note that I am treating the same symbol in different versions of the same image as
essentially separate, although during a diff we compare them.)

  class image {
      string path;
      bfd * abfd;                  // handle returned by bfd_openr()
      list<profile> profiles;
  };

Now each image has any number of profiles:

  class profile {
      image & owner;               // the one image this profile belongs to
      sample_file file;
      session_nr session;
      container<symbol_profile> symbols;
      counters_t totals;
  };

  class sample_file {
      void * mmap;                 // start of the mmap()ed sample data
      // ...
  };

  class counters_t {
      ulong total;                 // calculated sum of val[] across all counters
      ulong val[OP_MAX_COUNTERS];  // one sample count per counter
  };

  class symbol_profile {
      bfd_sym & sym;
      string pretty_name, name;    // demangled and raw symbol names
      ulong size;
      vector<counters_t> vals;     // one entry per offset within the symbol (size entries)
      counters_t totals;
  };

So the profile constructor will open and parse the sample file, creating the symbol_profile
objects as necessary.

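The sketch below shows a rough version of the bucketing step such a constructor might perform. It assumes, hypothetically, that the sample file has already been read into a flat array of counts indexed by image offset, and that symbols arrive as name/start/size triples rather than real BFD symbols. Samples outside every symbol go into a single "<anonymous>" entry, in line with the requirement that anonymous symbols and symbol-less images are still displayed. sym_info, sym_profile and build_symbol_profiles are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a symbol read via BFD.
struct sym_info {
    std::string name;
    unsigned long start;  // offset of the symbol within the image
    unsigned long size;
};

// Cut-down version of the symbol_profile sketched above.
struct sym_profile {
    std::string name;
    unsigned long start;
    std::vector<unsigned long> vals;  // one count per offset in [start, start + size)
    unsigned long total = 0;
};

// Distribute per-offset sample counts over the symbols covering them.
// samples[i] is the (hypothetical) count recorded at image offset i.
std::vector<sym_profile> build_symbol_profiles(std::vector<sym_info> syms,
                                               std::vector<unsigned long> const & samples) {
    // Emit symbols in address order, following the image layout.
    std::sort(syms.begin(), syms.end(),
              [](sym_info const & a, sym_info const & b) { return a.start < b.start; });

    std::vector<bool> covered(samples.size(), false);
    std::vector<sym_profile> out;
    for (auto const & s : syms) {
        sym_profile p{s.name, s.start, std::vector<unsigned long>(s.size, 0UL), 0};
        for (unsigned long off = 0; off < s.size; ++off) {
            unsigned long addr = s.start + off;
            if (addr >= samples.size())
                break;  // symbol runs past the end of the sample data
            p.vals[off] = samples[addr];
            p.total += samples[addr];
            covered[addr] = true;
        }
        out.push_back(std::move(p));
    }

    // Samples outside every symbol (e.g. in a symbol-less image) are kept in
    // a single anonymous entry so that they are still displayed.
    sym_profile anon{"<anonymous>", 0, {}, 0};
    for (std::size_t addr = 0; addr < samples.size(); ++addr)
        if (!covered[addr])
            anon.total += samples[addr];
    if (anon.total != 0)
        out.push_back(std::move(anon));
    return out;
}
```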

How do we represent the shared-samples relationship between the primary image and its shared libs (and the kernel profile)?
How do we represent the data needed for source annotation?

The summaries then become simple iterators across images and profiles, carrying exclusion info etc. as necessary.

The diff operation means pairing up symbols in two profiles to generate a new object, perhaps
a class profile_diff.

Merging is probably the same pairing operation in reverse: summing instead of subtracting.

Sorting is then just a matter of sorting a container of profiles/symbol_profiles by the chosen predicate.