|
From: Christoph B. <bar...@or...> - 2005-11-22 10:57:11
|
Hi,
I use the current SVN version of valgrind. On two of our machines running
RedHat Enterprise Linux 3 we get a strange error. On all other machines it
does not occur.
Essentially we have the following method:
static void
xrMessage()
{
    /* Get time */
    char t[24];
    time_t timer;
    time(&timer);
    /* Write to stdout. */
    strftime(t,24,"%b%d %H:%M:%S: ",localtime(&timer));
    printf("%s\n", t);
}
On all machines except those two, this prints the current time in the
specified format. On the suspect machines localtime sometimes returns NULL or
the date is wrong.
When we run this code without valgrind or in a normal debugger, the error does
not occur at all. It appears only when valgrind is involved.
Another strange thing is that within the same second (as returned by time)
localtime gives different results. But if we call localtime twice in a row,
both calls are either wrong or correct.
There are no errors reported by valgrind before the wrong dates are printed.
Maybe this indicates an error somewhere else, but how can I find it? Or is
this an error inside valgrind?
Greets,
Christoph Bartoschek
|
|
From: Christoph B. <bar...@or...> - 2005-11-22 12:07:29
|
I forgot to mention that I failed to extract a small test program which shows
the error. All small test programs work fine on all machines.
Additionally, the errors always occur at the same points in the program.
Christoph Bartoschek
|
|
From: Dennis L. <pla...@in...> - 2005-11-22 12:21:27
|
At 13:08 22.11.2005, Christoph Bartoschek wrote:
>I forgot to mention that I failed to extract a small test program which shows
>the error. All small test programs work fine on all machines.
>
>Additionally, the errors always occur at the same points in the program.
Is this a multithreaded program? Maybe the call is not thread-safe.
greets
Dennis
Carpe quod tibi datum est
|
|
From: Christoph B. <bar...@or...> - 2005-11-22 12:31:25
|
On Tuesday, 22 November 2005 13:23, Dennis Lubert wrote:
> At 13:08 22.11.2005, Christoph Bartoschek wrote:
> >I forgot to mention that I failed to extract a small test program which
> > shows the error. All small test programs work fine on all machines.
> >
> >Additionally, the errors always occur at the same points in the program.
>
> Is this a multithreaded program? Maybe the call is not thread-safe.
>
> greets
>
> Dennis
There is no multithreading, signaling or memory sharing involved.
Christoph
|
|
From: Igmar P. <mai...@jd...> - 2005-11-22 12:28:49
|
> I forgot to mention that I failed to extract a small test program which shows
> the error. All small test programs work fine on all machines.
Smells like a multithread issue to me. If you use threads: don't use time(),
but use gettimeofday_r() instead (and notice the _r postfix).
Regards,
Igmar
|
|
From: Christoph B. <bar...@or...> - 2005-11-22 15:44:49
|
I've just found out that it is not localtime that is incorrect; time() gives
the wrong result. I did not recognize it because the lower 32 bits of the value
are correct and I cast the time_t value to int. After printing the full
unsigned long value I see that the results are sometimes wrong and that they
differ only in the upper 32 bits.
The funny thing is that
    time_t val1;
    time_t val2;
    val1 = time(&val2);
always gives the correct value for val1 but not for val2.
Is valgrind somehow involved in setting the value for val2? Is valgrind
emulating time()?
... 5 minutes later:
Now I think valgrind is causing the error. I changed the sequence to the
following code:
    time_t val2 = 0xAAAAAAAA00000000;
    time_t val1 = time(&val2);
Now I always get an error with valgrind, and val2 still has 0xAAAAAAAA in the
upper 32 bits. Is it possible that valgrind's time function only sets the lower
32 bits of the parameter?
Christoph Bartoschek
|
|
From: Julian S. <js...@ac...> - 2005-11-22 15:54:01
|
What arch is this on, x86 or amd64?
J
On Tuesday 22 November 2005 15:45, Christoph Bartoschek wrote:
> I've just found out that it is not localtime that is incorrect; time()
> gives the wrong result.
|
|
From: Christoph B. <bar...@or...> - 2005-11-22 16:01:08
|
On Tuesday, 22 November 2005 16:45, Julian Seward wrote:
> What arch is this on, x86 or amd64?
>
> J
uname -a
Linux rheinmetall 2.4.21-32.ELsmp #1 SMP Fri Apr 15 21:03:28 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/issue
Red Hat Enterprise Linux WS release 3 (Taroon Update 6)
Kernel \r on an \m
Christoph
|
|
From: Christoph B. <bar...@or...> - 2005-11-22 15:50:30
|
Now I have a small test program:
extern "C" {
#include <time.h>
}
#include <iostream>
int main() {
    time_t timer = 0xAAAAAAAA00000000;
    time(&timer);
    std::cout << timer << std::endl;
}
Christoph Bartoschek
|
|
From: Tom H. <to...@co...> - 2005-11-22 16:14:34
|
In message <200...@or...>
Christoph Bartoschek <bar...@or...> wrote:
> Now I have a small test program:
>
> extern "C" {
> #include <time.h>
> }
> #include <iostream>
>
> int main() {
>     time_t timer = 0xAAAAAAAA00000000;
>     time(&timer);
>     std::cout << timer << std::endl;
>
> }
Hmm. That works for me on a 64 bit machine:
dellow [~/vgtest] % uname -a
Linux dellow.uk.cyberscience.com 2.6.13-1.1532_FC4smp #1 SMP Thu Oct 20 01:42:06 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
dellow [~/vgtest] % ./time
0x43834387
dellow [~/vgtest] % /tmp/valgrind-debug/bin/valgrind -q ./time
0x43834388
(mine is a C program using %lx to print the time).
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Christoph B. <bar...@or...> - 2005-11-22 16:20:13
|
On Tuesday, 22 November 2005 17:14, Tom Hughes wrote:
> Hmm. That works for me on a 64 bit machine:
>
> dellow [~/vgtest] % uname -a
> Linux dellow.uk.cyberscience.com 2.6.13-1.1532_FC4smp #1 SMP Thu Oct 20 01:42:06 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
> dellow [~/vgtest] % ./time
> 0x43834387
> dellow [~/vgtest] % /tmp/valgrind-debug/bin/valgrind -q ./time
> 0x43834388
>
> (mine is a C program using %lx to print the time).
As I said, it only fails on our RedHat Enterprise Linux 3 machines with
kernel 2.4.x, not on SuSe 64bit machines with 2.6.x kernel.
Christoph
|
|
From: Dirk M. <dm...@gm...> - 2005-11-22 16:27:32
|
On Tuesday 22 November 2005 17:21, Christoph Bartoschek wrote:
> As I said. It only fails on our RedHat Enterprise Linux 3 machines with
> kernel 2.4.x. Not on SuSe 64bit machines with 2.6.x kernel.
It could be that kernel 2.4.x behaved differently and valgrind isn't able to
cope with that.
Dirk
|
|
From: Tom H. <to...@co...> - 2005-11-22 16:27:49
|
In message <200...@or...>
Christoph Bartoschek <bar...@or...> wrote:
> On Tuesday, 22 November 2005 17:14, Tom Hughes wrote:
>
>> Hmm. That works for me on a 64 bit machine:
>>
>> dellow [~/vgtest] % uname -a
>> Linux dellow.uk.cyberscience.com 2.6.13-1.1532_FC4smp #1 SMP Thu Oct 20 01:42:06 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
>> dellow [~/vgtest] % ./time
>> 0x43834387
>> dellow [~/vgtest] % /tmp/valgrind-debug/bin/valgrind -q ./time
>> 0x43834388
>>
>> (mine is a C program using %lx to print the time).
>
> As I said. It only fails on our RedHat Enterprise Linux 3 machines with
> kernel 2.4.x. Not on SuSe 64bit machines with 2.6.x kernel.
Ah right. Unfortunately I haven't got any 64 bit machines with 2.4
kernels on them anymore.
Do you actually see a call to time in the system call trace if you
use --trace-syscalls=yes?
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Christoph B. <bar...@or...> - 2005-11-22 16:35:40
|
On Tuesday, 22 November 2005 17:27, Tom Hughes wrote:
> Ah right. Unfortunately I haven't got any 64 bit machines with 2.4
> kernels on them anymore.
>
> Do you actually see a call to time in the system call trace if you
> use --trace-syscalls=yes?
Yes,
SYSCALL[19858,1](201) sys_time ( 0x7FF000378 )[sync] --> Success(0x438348BC)
|
|
From: Tom H. <to...@co...> - 2005-11-23 10:58:01
|
In message <200...@or...>
Christoph Bartoschek <bar...@or...> wrote:
> On Tuesday, 22 November 2005 17:27, Tom Hughes wrote:
>> Ah right. Unfortunately I haven't got any 64 bit machines with 2.4
>> kernels on them anymore.
>>
>> Do you actually see a call to time in the system call trace if you
>> use --trace-syscalls=yes?
>
> Yes,
>
> SYSCALL[19858,1](201) sys_time ( 0x7FF000378 )[sync] --> Success(0x438348BC)
This is a bug in your kernel - in 2.4.21 the time system call is
mapped to sys_time, which treats its argument as a pointer to an int
and only fills in 4 bytes.
In 2.4.32 it has been changed to use sys_time64 in the x86_64 arch
specific code, and that treats the pointer as a pointer to a long and
fills in all 8 bytes.
The reason you don't normally see this is that time is a vsyscall so
is normally done in user space but valgrind deliberately disables the
vsyscalls and forces them to be done by real system calls to the
kernel.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Christoph B. <bar...@or...> - 2005-11-23 11:07:55
|
On Wednesday, 23 November 2005 11:57, Tom Hughes wrote:
> In message <200...@or...>
>
> Christoph Bartoschek <bar...@or...> wrote:
> > On Tuesday, 22 November 2005 17:27, Tom Hughes wrote:
> >> Ah right. Unfortunately I haven't got any 64 bit machines with 2.4
> >> kernels on them anymore.
> >>
> >> Do you actually see a call to time in the system call trace if you
> >> use --trace-syscalls=yes?
> >
> > Yes,
> >
> > SYSCALL[19858,1](201) sys_time ( 0x7FF000378 )[sync] -->
> > Success(0x438348BC)
>
> This is a bug in your kernel - in 2.4.21 the time system call is
> mapped to sys_time, which treats its argument as a pointer to an int
> and only fills in 4 bytes.
>
> In 2.4.32 it has been changed to use sys_time64 in the x86_64 arch
> specific code, and that treats the pointer as a pointer to a long and
> fills in all 8 bytes.
>
> The reason you don't normally see this is that time is a vsyscall so
> is normally done in user space but valgrind deliberately disables the
> vsyscalls and forces them to be done by real system calls to the
> kernel.
>
Ok, thanks.
Christoph
|
|
From: Tom H. <to...@co...> - 2005-11-23 19:27:28
Attachments:
valgrind-time.patch
|
In message <200...@or...>
Christoph Bartoschek <bar...@or...> wrote:
> On Wednesday, 23 November 2005 11:57, Tom Hughes wrote:
>
> > This is a bug in your kernel - in 2.4.21 the time system call is
> > mapped to sys_time, which treats its argument as a pointer to an int
> > and only fills in 4 bytes.
> >
> > In 2.4.32 it has been changed to use sys_time64 in the x86_64 arch
> > specific code, and that treats the pointer as a pointer to a long and
> > fills in all 8 bytes.
> >
> > The reason you don't normally see this is that time is a vsyscall so
> > is normally done in user space but valgrind deliberately disables the
> > vsyscalls and forces them to be done by real system calls to the
> > kernel.
>
> Ok, thanks.
Try this patch - it tweaks our wrapper for time to update the
value pointed at by the argument using the return value.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Christoph B. <bar...@or...> - 2005-11-25 09:59:47
|
On Wednesday, 23 November 2005 20:27, Tom Hughes wrote:
>
> Try this patch - it tweaks our wrapper for time to update the
> value pointed at by the argument using the return value.
>
This patch works, thanks.
Christoph
|