From: Adnan K. <Adn...@ne...> - 2005-06-20 18:29:40
Hi there,

I'm a hardware architect, and I'd like to use Valgrind to address an old problem that I've typically faced: the collection of instruction execution traces. I'm not entirely sure, but I doubt that many people outside the processor design fraternity have this problem. I've been using Valgrind in conjunction with the cachegrind tool to collect a list of load/store memory address references, which I want to feed into a cycle-accurate latency simulator.

Traditionally, collecting instruction traces has been a tedious, usually hardware-based process. However, faster and faster processors have relegated this technique to obsolescence: hardware monitoring devices aren't fast enough, and most processors don't offer a direct method of collecting the instructions executed.

In researching alternatives, I've come across a few tools that execute the application in a virtual environment. Of the several that I have evaluated, a couple may have been faster, but NONE has come close to Valgrind in terms of reliability and stability.

So as a simple test, I examined the cachegrind code and uncommented the debug printf statements in the log_*_cache_access() functions. This worked quite well and produced exactly what I was looking for. In fact, I even managed to use Valgrind 2.4 to run the entire SPECjbb2000 suite to completion, which is no small feat!

So here are the questions I wanted to pose to the developers:

1. Do you foresee any problems with the way I've used cachegrind to generate an address reference stream? I'm specifically interested in the fidelity of the addresses and any possible "differences" from the native execution of the application.
I know this sounds silly, especially since the application does execute, but I haven't gone through all of the Valgrind code to ascertain this.

2. I would also be very interested in differentiating process switches. I need this because I want to collect traces on a uniprocessor system and then play the role of the OS in my multiprocessor latency simulator, so that I can schedule threads onto different processors. I'm not sure whether this is entirely possible, or how accurate the results would be, but it's something I'd like to try. I know the CR3 register keeps track of the page table of an application, and different values in the CR3 register indicate a process swap (correct me if I'm wrong). At the very least, this would be useful for my purpose. If you have a different suggestion as to how I might accomplish this, I would truly appreciate it.

Any other pointers would be extremely helpful.

Regards,
Adnan
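
For illustration, the replay idea in question 2 could be sketched roughly as follows. Everything here is an assumption made for the example: the record layout `<tid> <L|S> <hex address> <size>` is hypothetical and is not what cachegrind's debug printfs emit, and the names `MemRef`, `parse_line` and `split_by_cpu` as well as the "tid modulo CPU count" placement policy are invented. It only shows how a tagged address stream might be split into per-processor streams before being fed to a simulator.

```python
from collections import defaultdict
from typing import NamedTuple


class MemRef(NamedTuple):
    tid: int   # hypothetical thread/process tag attached to each record
    kind: str  # "L" for load, "S" for store
    addr: int  # virtual address of the access
    size: int  # access size in bytes


def parse_line(line: str) -> MemRef:
    """Parse one record of the assumed form '<tid> <L|S> <hex addr> <size>'."""
    tid, kind, addr, size = line.split()
    if kind not in ("L", "S"):
        raise ValueError(f"unknown access kind: {kind!r}")
    return MemRef(int(tid), kind, int(addr, 16), int(size))


def split_by_cpu(lines, num_cpus):
    """Place each thread on a simulated CPU (tid modulo num_cpus).

    The original order of references is preserved within each CPU's stream.
    """
    per_cpu = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the trace
        ref = parse_line(line)
        per_cpu[ref.tid % num_cpus].append(ref)
    return dict(per_cpu)


# A tiny made-up trace: thread 1 touches its stack, thread 2 touches the heap.
sample = """\
1 L 0xbffff000 4
1 S 0xbffff004 4
2 L 0x0804a010 8
1 L 0xbffff008 4
"""

streams = split_by_cpu(sample.splitlines(), num_cpus=2)
for cpu, refs in sorted(streams.items()):
    print(cpu, [(r.kind, hex(r.addr), r.size) for r in refs])
```

A static modulo placement is only the simplest possible policy. A real replay would presumably also need timestamps or instruction counts per record to decide when a thread may migrate, which this sketch does not model.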