From: Paulo E. C. <pau...@gm...> - 2013-07-16 17:29:22
On 16/07/13 16:39, doug sanden wrote:
> Paulo,
>> I can confirm that the fixtures generated seem different per machine...
>> Even when testing on two similar "Linux" architectures the fixtures
>> seemed to differ.
>> I think this might be down to a combination of library versions, graphics
>> cards etc ...
> I have bit differences in 20% of comparisons on a single win32 platform,
> but they are scattered across the image -not header related- and larger
> color spreads on textured objects.
> That's why I use the graphicsMagick mean square error measure.

Ah, btw, I noticed that when running the test in the console it's best if
you're not interacting physically with it, as it seems to generate
different images here and there.

>> Further, I commented out the code that was writing the header of the
>> bmp file and that seemed to generate more stable results across runs.
>> At least I don't have any bit differences between runs.
> The header issue is puzzling to me. I should have zeroed the header and put only useful, stable numbers in it. There are a few gotchas that I didn't look into: struct alignment -if a compiler pads a struct to an even 8 bytes and I write the struct out as a binary blob, I'll get padding, which is bad- and intel vs motorola byte ordering -I'm not sure how to fix that.
>
> 64 BIT VS 32 BIT, INTEL VS MOTOROLA
>
> The padding can be fixed by writing each element of the struct out separately. And do I have the right data types to avoid 64-bit ints? One way to tell is to try the graphicsmagick or gimp readers on each platform and see if they can read my header generated on the various platforms, assuming gimp and graphicsmagick have the right idea.
> (Right now I have only 32-bit hardware and compilers, and opengl 2 only on one machine)

I mentioned before I was having trouble opening the .bmp files being
generated. Not even graphicsmagick would do the trick.
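Both gotchas doug mentions -struct padding and intel vs motorola byte order- go away if each header field is written out separately in an explicit byte order. A minimal sketch in Python (illustrative only, not FreeWRL's actual C snapshot code): `struct.pack` with the `<` prefix emits little-endian fields, which is what .bmp requires, and never inserts compiler padding.

```python
import struct

def write_bmp_header(f, width, height, bpp=24):
    """Write BITMAPFILEHEADER + BITMAPINFOHEADER field by field.

    The '<' prefix forces little-endian and disables alignment padding,
    so the 54-byte header is bit-identical on any platform, 32- or
    64-bit, intel or motorola.
    """
    row_size = (width * (bpp // 8) + 3) & ~3      # bmp rows pad to 4 bytes
    image_size = row_size * height
    # BITMAPFILEHEADER (14 bytes): magic, file size, reserved, pixel offset
    f.write(struct.pack('<2sIHHI', b'BM', 54 + image_size, 0, 0, 54))
    # BITMAPINFOHEADER (40 bytes): every field fixed-width and explicit,
    # nothing left to the compiler
    f.write(struct.pack('<IiiHHIIiiII',
                        40, width, height, 1, bpp,
                        0,                        # BI_RGB, uncompressed
                        image_size,
                        2835, 2835,               # ~72 DPI in pixels/metre
                        0, 0))
```

Readers like gimp or graphicsmagick can then be used to sanity-check the output on each platform, as doug suggests.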
But honestly I was not really interested in opening them; the point was
just to prove that one could generate the fixtures and test them for
comparison, even if restricted to the same machine they were generated on.

>> I used a different approach though ...
>>
>> First I commented out that time reduction hack so that I could record
>> things smoothly.
>> https://github.com/pecastro/freewrl/commit/4926a80062179dd1223296537eb45bf04b9a17eb
>>
>> Using recording mode, I manually created one master .fwplay whilst
>> moving in the scenegraph back and forth, rotating etc., and in between
>> each move taking some manual snapshots.
>> This master file was created using freewrl/tests/2.wrl
>> https://github.com/pecastro/freewrl/commit/044850835cc7d56e4f6c2b52360485134620e4d3
>>
>> Then I used this .fwplay file to test all of the .wrl/x3d files.
>> I iterate over the list of .wrl files to be tested and for each one I
>> amend the .fwplay file, substituting the scenefile for the current
>> file being tested.
>> I then run freewrl in playback mode for that specific .wrl file.
>> I kept all the headerless .bmp files generated and committed them to a
>> local branch.
>>
>> Then I built freewrl from scratch, ran exactly the same procedure of
>> running the script in playback mode, and in the end it was a matter of
>> asking my source control system if any of those .bmp files had changed.
> Great idea.
> So summary: this method is good for checking geometry rendering changes in detail, but doesn't allow for scene-specific mouse or keyboard input -except see below^- such as clicking something.
> What it's good for:
> a) inspecting scene rendering in finer detail
> b) using only one fwplay, with high-fidelity avatar movements, to view each scene using the same avatar movements for each scene

Yes, for scene-specific testing I suppose the methodology of having a
.fwplay per test would be preferable...
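The amend-and-replay loop described above might be scripted roughly like this. Both the .fwplay grammar assumed here (the scene named as a bare path on one line) and the `--playback` flag are guesses for illustration; the real recording format and freewrl's actual command line may differ.

```python
import re
import subprocess
from pathlib import Path

def amend_fwplay(fwplay_text, scene_path):
    """Swap the scene file named in a master .fwplay recording.

    Assumes -- hypothetically -- that the recording names its scene as a
    bare path on a line of its own; the real .fwplay grammar may differ.
    """
    return re.sub(r'^\S+\.(?:wrl|x3d)\s*$', str(scene_path),
                  fwplay_text, count=1, flags=re.MULTILINE)

def run_suite(master_fwplay, scene_files, freewrl='freewrl'):
    """Replay the one master recording against every scene under test.

    The snapshots dropped during playback become the fixtures; afterwards
    source control reports whether any of them changed.
    """
    text = Path(master_fwplay).read_text()
    for scene in scene_files:
        Path('current.fwplay').write_text(amend_fwplay(text, scene))
        # '--playback' is an assumed flag name, not freewrl's documented CLI
        subprocess.run([freewrl, '--playback', 'current.fwplay'], check=True)
```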
The point of this exercise was to prove the possibility and understand the requirements.

> ^It might also be very helpful when you have one giant test scene with everything in it -or everything that's of interest for your development changes- and you use just that one scene for testing. In that case you can do clicking and keyboarding for that scene. For example we have 52 tests, and you could put them all side by side in a single scene file, then navigate the avatar between them during a test run.

I think so, though I'm not familiar with how you'd accomplish that.

> Q. if you have 100 test scenes, how long does it take to run the test on all of them?
> (the degraded-frame-rate, one-fwplay-per-small-scene method takes 3 minutes to playback and compare 104 scenes on a 32 bit pentium)

Testing against all the freewrl/tests files takes ~30 min. On a headless
machine, that is, testing in a remote session, it would take up to 3
hours, as the rendering is done in software rather than by the graphics
card.
The time the test takes to run is of course related to the quantity of
things the .fwplay is doing. Also, the biggest chunk of time is spent
initializing a new instance of freewrl for each test file.
The real point of this was to rehearse the possibility of having a
headless machine running the test suite in a Continuous Integration
style. Hence, the time it takes to run the tests is not necessarily
important right now; the tests could run on a machine somewhere,
independent of local development, smoke-testing changes as they're
committed to the repository. This kind of testing is more like a safety
net against nefarious changes that would impact the perceived visual
rendering.
As a side note, my headerless bmp files differ between console snapshots
and remote-session snapshots.

> Thanks for the link. And the ideas. I wonder if the two methods can be combined somehow, perhaps as options / parameters, so developers can conveniently choose.
Doug, you were right when you said in reply to John that "This type of
testing won't say if something is right or wrong, only if something
changed."
This is just black-box testing and it's far away from any form of unit
testing... but that said, I do think it's better than nothing.

My suggestion for an approach going forward is the following:

Pick the most iconic .wrl/x3d files in terms of testing features
(movement, texture, spatial orientation, whatever), as I'm assuming that
some features are common amongst many of the test files.
Create a specific .fwplay file for each, using an extended version of the
methodology that I used (exercising the various modes, moving about
whilst stopping to take snapshots at points in time).
Use those .fwplay files to generate the fixtures: either correct BMPs, or
the headerless BMPs that only care about the pixel frame, or striking the
BMP file format altogether and just writing the pixel representation to a
file, which actually sounds similar to what commenting out the header was
achieving.
Once there's an agreed .fwplay for each of the most iconic .wrl files,
fixtures could be generated on the different platforms by running
playback mode, and committed or kept somewhere for reference.
Then it would be a matter of running the test script, which would go
about picking each .fwplay in playback mode, generating the snapshots
and, in the end, comparing them.

CONS:
- The overall size of the fixtures: 6 headerless bmp snapshots times 58 test files equals roughly 160MB on my machine (Fedora Linux).
- Not yet 100% sure of the feasibility of this kind of testing vs O.S. updates, package/library updates or other minor changes in freewrl.
- Not entirely sure this is the route you'd want to go as a team...

PROS:
- Some assurance against future changes.
- Most of it seems to be done; now it's just a matter of gluing the pieces together.
- Some automated testing rather than manual.
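The final "compare them" step need not go through source control at all. A byte-for-byte sketch (the directory layout and .bmp naming here are made up) that reports which committed fixtures no longer match a fresh playback run:

```python
import filecmp
from pathlib import Path

def changed_fixtures(reference_dir, fresh_dir):
    """List fixture .bmp files that differ from freshly generated ones.

    Byte-for-byte comparison, i.e. the same answer source control gives
    when asked whether the committed snapshot files changed.
    """
    changed = []
    for fixture in sorted(Path(reference_dir).glob('*.bmp')):
        candidate = Path(fresh_dir) / fixture.name
        if not candidate.exists() or not filecmp.cmp(fixture, candidate,
                                                     shallow=False):
            changed.append(fixture.name)
    return changed
```

An empty result means the rendering survived the change; any name in the list is a scene worth eyeballing.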
From: doug s. <hig...@ho...> - 2013-07-16 18:19:08
>> That's why I use the graphicsMagick mean square error measure.
> Ah, btw, I noticed that when running the test in the console it's best if
> you're not interacting physically with it, as it seems to generate
> different images here and there.

I noticed that too, on linux, and wondered if the mouse cursor steals a bit, and maybe it shows up as the low bit. If so, can you move the mouse cursor off before hitting the 'x' snapshot? The graphicsmagick mse measure shows 0.0 if it's just a few dozen low-order bits.

>>
>> 64 BIT VS 32 BIT, INTEL VS MOTOROLA
> I mentioned before I was having trouble opening the .bmp files being
> generated. Not even graphicsmagick would do the trick.
> But honestly I was not really interested in opening them; the point was
> just to prove that one could generate the fixtures and test them for
> comparison, even if restricted to the same machine they were being
> generated in.

If you -or developers who use your method- switch to the graphicsmagick mse, you'll need a proper image file format with a header (to give gm the height x width etc).

> As a side note, my headerless bmp files differ between console snapshots
> and remote session snapshots.

Interesting. I wonder what the difference is.

> Doug, you were right when you said in reply to John that "This type of
> testing won't say if something is right or wrong, only if something
> changed."
> This is just black box testing and it's far away from any form of unit
> testing... but that said, I do think it's better than nothing.
> My suggestion for an approach going forward is the following:
>
> To pick the most iconic .wrl/x3d file in terms of testing features (
> and committed or kept somewhere for reference.
> ...
> Then it would be a matter of running the test script that would go about
> picking each .fwplay in playback mode, generating the snapshots and in
> the end compare them.
OK, sounds generally good, although I don't have any comprehensive/iconic
scenes prepared - mostly hundreds of small test files (which I'd need to
spend days sorting through to find good, orthogonal ones). And we do make
improvements. Different developers will have their own special test
scenes, so developers need a convenient way to locally re-generate the
fixtures for their special/private scenes.
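For reference, the mean square error measure behaves exactly as described above. The sketch below is the standard normalised MSE formula in Python, not GraphicsMagick's own code: a handful of flipped low-order bits (e.g. mouse-cursor artifacts) produces a value that rounds to 0.0, while a genuine rendering change stands out.

```python
def mean_square_error(pixels_a, pixels_b, max_value=255):
    """Normalised MSE between two equal-length pixel sample buffers.

    0.0 means identical; 1.0 means every sample is maximally different.
    A few dozen low-bit flips contribute so little to the sum that the
    result rounds to 0.0, which is why the measure tolerates cursor noise.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError('images differ in size')
    scale = len(pixels_a) * float(max_value) ** 2
    return sum((a - b) ** 2 for a, b in zip(pixels_a, pixels_b)) / scale
```

This is the same idea `gm compare -metric mse` reports, which is why gm needs a readable header: without height x width it cannot line the two sample buffers up.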