From: Anton B. <an...@sa...> - 2004-03-19 14:38:41
Hi,

As alluded to in my last email, I'm seeing a problem where oprofiled will die in sfile_log_sample. A backtrace is at the end of the email.

I poked around a bit and noticed it was processing samples in /bin/bash when the problem occurred. Some debugging showed we had two different cookies that pointed to /bin/bash and that we set up two separate struct sfiles (both pointing to the same sample file):

cookie c035e2a0
/var/lib/oprofile/samples/current/{root}//bin/bash/{dep}/{root}//bin/bash/TIMER.0.0.all.all.all

cookie d7bddf20
/var/lib/oprofile/samples/current/{root}//bin/bash/{dep}/{root}//bin/bash/TIMER.0.0.all.all.all

The installed version of bash has an inode number of 1501678:

# ls -li /bin/bash
1501678 -rwxr-xr-x 1 root root 733508 Feb 23 00:42 /bin/bash

And it turns out not everyone who has /bin/bash open has that inode:

# cat /proc/8795/maps | grep bash
10000000-100a6000 r-xp 00000000 03:0c 1501678 /bin/bash

# cat /proc/1202/maps | grep bash
10000000-100a6000 r-xp 00000000 03:0c 2187594 /bin/bash

At this stage it's pretty obvious prelink is causing my problems: it replaced the old prelinked bash with a newer one. Now we have two different things in oprofile operating on the same sample file. Eventually the hash code gets confused and oprofiled exits.

How should this problem be handled? It would be good to be able to detect if the file is deleted, or at the very least continue in the presence of these errors.

Thoughts?
Anton

--
#3  0x100046d4 in sfile_log_sample (trans=0x48156d78) at opd_sfile.c:343
        err = 7
        pc = 5194226692075320537
        file = (samples_odb_t *) 0x4815b8fc
#4  0x100052f4 in opd_put_sample (trans=0xbffff7d0, pc=879052) at opd_trans.c:99
        event = 2202244478516
#5  0x1000584c in opd_process_samples (
    buffer=0x200 <Address 0x200 out of bounds>, count=3221222964) at opd_trans.c:268
        trans = {buffer = 0x481702e4 "ÿÿÿÿ", remaining = 78121,
  tracing = TRACING_OFF, current = 0x1004c878, last = 0x1003dfd0,
  cookie = 3506415136, app_cookie = 3524271712, pc = 879052, last_pc = 170568,
  event = 0, in_kernel = 0, cpu = 0, tid = 10149, tgid = 10149}
        code = 0
#6  0x100027a4 in opd_do_samples (opd_buf=0xbffff7d0 "H\027\002ä",
    count=-1073744332) at init.c:127
        num = 268566528
#7  0x100028a8 in opd_do_read (buf=0x10020084 "\020", size=1209383164) at init.c:177
        count = -1073743920
#8  0x100037c8 in main (argc=8, argv=0xbffff924) at oprofiled.c:473
        err = 512
        rlim = {rlim_cur = 2048, rlim_max = 2048}
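The manual check above (comparing the inode from ls -li with the inode column of /proc/PID/maps) could be automated. A minimal sketch in C, assuming the /proc/<pid>/maps line layout shown above; parse_maps_line() and mapping_replaced() are hypothetical helpers, not part of oprofile, and for brevity only the inode is compared (the device field is skipped):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: parse one /proc/<pid>/maps line such as
 * "10000000-100a6000 r-xp 00000000 03:0c 2187594 /bin/bash".
 * Stores the inode and pathname; returns 0 on success, -1 if the line
 * is malformed, has no pathname, or the pathname does not fit. */
static int parse_maps_line(const char *line, unsigned long *inode,
                           char *path, size_t pathlen)
{
	int consumed = 0;
	const char *p;
	size_t len;

	assert(path != NULL && pathlen > 0);

	/* Skip address range, perms, offset and dev:dev (assignment
	 * suppressed), read the inode, then " %n" skips blanks and records
	 * where the pathname starts. */
	if (sscanf(line, "%*lx-%*lx %*s %*lx %*x:%*x %lu %n",
	           inode, &consumed) != 1 || consumed == 0)
		return -1;

	p = line + consumed;
	len = strcspn(p, "\n");
	if (len == 0 || len >= pathlen)
		return -1;   /* anonymous mapping, or buffer too small */
	memcpy(path, p, len);
	path[len] = '\0';
	return 0;
}

/* Hypothetical helper: 1 if the file now at `path` has a different inode
 * from the one a process has mapped (e.g. prelink replaced it), 0 if it
 * is the same inode, -1 if stat() fails. Devices are not compared. */
static int mapping_replaced(unsigned long mapped_inode, const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (unsigned long)st.st_ino != mapped_inode ? 1 : 0;
}
```

For PID 1202 above, parse_maps_line() would report inode 2187594 while stat("/bin/bash") reports 1501678, so mapping_replaced() would return 1. This is a diagnostic aid rather than a fix inside the daemon.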
From: John L. <le...@mo...> - 2004-03-19 16:15:43
On Sat, Mar 20, 2004 at 01:33:30AM +1100, Anton Blanchard wrote:
> Some debugging showed we had two different cookies that pointed to /bin/bash
> and that we set up two separate struct sfiles (both pointing to the same
> sample file):

Ouch.

> cookie c035e2a0
> /var/lib/oprofile/samples/current/{root}//bin/bash/{dep}/{root}//bin/bash/TIMER.0.0.all.all.all
>
> cookie d7bddf20
> /var/lib/oprofile/samples/current/{root}//bin/bash/{dep}/{root}//bin/bash/TIMER.0.0.all.all.all

I must admit I didn't consider this case properly. We simply cannot
handle dcache aliasing at all (yes, we suck).

> How should this problem be handled? It would be good to be able to
> detect if the file is deleted

The problem is not deletion as such; it's that the file was deleted and
then a different inode was found at the same path.

Currently I cannot think of any solution other than a /full/ hash table
search for cookie aliases somehow. There's just no way to detect the
failure (opening the same sample file twice) further down the line...

regards
john
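A "full hash table search for cookie aliases" could look roughly like the sketch below. It assumes the sfile table is keyed by cookie, so an alias (a different cookie resolving to the same sample file) may sit in any bucket and every chain has to be walked before a new sfile is created. struct sfile_sketch, the table size and the hash function are hypothetical stand-ins, not oprofile's real code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_HASH_SIZE 64   /* hypothetical bucket count */

/* Hypothetical, stripped-down stand-in for an sfile entry. */
struct sfile_sketch {
	unsigned long cookie;        /* cookie reported by the kernel */
	const char *sample_name;     /* sample file the cookie resolved to */
	struct sfile_sketch *next;   /* hash chain; table is keyed by cookie */
};

static struct sfile_sketch *sketch_table[SKETCH_HASH_SIZE];

static size_t sketch_hash(unsigned long cookie)
{
	return (size_t)((cookie >> 4) % SKETCH_HASH_SIZE);
}

static void sketch_insert(struct sfile_sketch *sf)
{
	size_t h;

	assert(sf != NULL && sf->sample_name != NULL);
	h = sketch_hash(sf->cookie);
	sf->next = sketch_table[h];
	sketch_table[h] = sf;
}

/* The "full" search: since buckets are chosen by cookie, an alias can be
 * in any bucket, so every chain is walked. Cost: O(number of sfiles)
 * for each new cookie. */
static struct sfile_sketch *find_cookie_alias(unsigned long cookie,
                                              const char *sample_name)
{
	size_t i;
	struct sfile_sketch *sf;

	for (i = 0; i < SKETCH_HASH_SIZE; i++) {
		for (sf = sketch_table[i]; sf != NULL; sf = sf->next) {
			if (sf->cookie != cookie &&
			    strcmp(sf->sample_name, sample_name) == 0)
				return sf;
		}
	}
	return NULL;
}
```

With Anton's two cookies, inserting c035e2a0 first and then searching for d7bddf20 with the same sample name returns the existing entry, at the price of a linear scan every time a new cookie appears.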
From: Philippe E. <ph...@wa...> - 2004-03-19 20:14:04
On Fri, 19 Mar 2004 at 16:14 +0000, John Levon wrote:
> > cookie c035e2a0
> > .../{dep}/{root}//bin/bash/TIMER.0.0.all.all.all
> >
> > cookie d7bddf20
> > .../{dep}/{root}//bin/bash/TIMER.0.0.all.all.all
>
> I must admit I didn't consider this case properly. We simply cannot
> handle dcache aliasing at all (yes, we suck).

We can't accept aliasing. Handling aliasing can be done in a simple way:
add a level of indirection to samples_odb_t:

struct samples_odb_data;

struct samples_odb_t {
	struct samples_odb_data * data;
};

struct samples_odb_data {
	/* all data that was in the old samples_odb_t struct */
	const char * filename;
	int ref_count;
	struct samples_odb_t * next_hash;
};

Then hash by sample filename to retrieve an existing sample file, and use
ref counting. But then we would be accepting invalid data: samples from
distinct sources going to the same sample file...

> > How should this problem be handled? It would be good to be able to
> > detect if the file is deleted

Well, it's not difficult; we can either:

- abort()
- augment the sample filename with a uid (the cookie?)
- delete the old sample file and mark this sfile as "closed and can't be
  reopened"; samples going to the old inode will be ignored.

The last seems fine: the user has no way to access this deleted inode
anyway, so he can't use these samples.

regards,
Phil
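The indirection-plus-refcount idea might be fleshed out as follows. This is a hedged sketch: the *_sketch names, the bucket count and the djb2 string hash are assumptions, and the actual opening/mmap of the sample file is left out:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DATA_HASH_SIZE 64   /* hypothetical bucket count */

/* Shared per-file state: what the old handle held, plus a refcount. */
struct odb_data_sketch {
	char *filename;
	int ref_count;
	struct odb_data_sketch *next_hash;
};

/* Per-sfile handle: just a pointer to the shared data. */
struct odb_handle_sketch {
	struct odb_data_sketch *data;
};

static struct odb_data_sketch *data_table[DATA_HASH_SIZE];

static size_t name_hash(const char *s)
{
	size_t h = 5381;   /* djb2 string hash */

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % DATA_HASH_SIZE;
}

/* Look the sample file up by name; reuse it if it is already open. */
static int odb_open_shared(struct odb_handle_sketch *h, const char *filename)
{
	size_t b = name_hash(filename);
	struct odb_data_sketch *d;

	for (d = data_table[b]; d != NULL; d = d->next_hash) {
		if (strcmp(d->filename, filename) == 0) {
			d->ref_count++;
			h->data = d;
			return 0;
		}
	}

	d = calloc(1, sizeof(*d));
	if (d == NULL)
		return -1;
	d->filename = malloc(strlen(filename) + 1);
	if (d->filename == NULL) {
		free(d);
		return -1;
	}
	strcpy(d->filename, filename);
	/* A real version would open and map the sample file here. */
	d->ref_count = 1;
	d->next_hash = data_table[b];
	data_table[b] = d;
	h->data = d;
	return 0;
}

/* Drop one reference; the shared data is unlinked and freed with the last. */
static void odb_close_shared(struct odb_handle_sketch *h)
{
	struct odb_data_sketch *d = h->data;
	struct odb_data_sketch **pp;

	assert(d != NULL && d->ref_count > 0);
	h->data = NULL;
	if (--d->ref_count > 0)
		return;

	for (pp = &data_table[name_hash(d->filename)]; *pp; pp = &(*pp)->next_hash) {
		if (*pp == d) {
			*pp = d->next_hash;
			break;
		}
	}
	free(d->filename);
	free(d);
}
```

Opening Anton's bash sample file through both cookies would yield one shared odb_data_sketch with ref_count == 2. That avoids the double open, but, as noted above, it also silently merges samples from two different inodes into one file.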