Until a few days ago I was able to go to:
/nagios/cgi-bin/perfparse.cgi?all_bin=1&group_name=**ALL**
with no issues.
I started getting segfaults yesterday, and the page was cut
off at a particular host. I purged all data from the db
for the last host printed, and after that
things started working again.
Now (today), I got the same segfault at a different host.
Running perfparse.cgi from the command line I get the
following:
QUERY_STRING='all_bin=1&group_name=**ALL**'
/usr/lib/nagios/cgi/perfparse.cgi
[a bunch of HTML... good stuff]
<td bgcolor="#FFFFFF"><font face="Arial, Helvetica"
size=2> runtime=s </td>
<td bgcolor="#FFFFFF"><font face="Arial, Helvetica"
size=2> UP </td></tr>
Segmentation fault
strace gave me:
write(1, " <td bgcolor=\"#FFFFFF\"><font fa"..., 104
<td bgcolor="#FFFFFF"><font face="Arial, Helvetica"
size=2><nobr> 2006-03-02 14:00:50 </td>
) = 104
write(1, " <td bgcolor=\"#FFFFFF\" align=ri"..., 92
<td bgcolor="#FFFFFF" align=right><font face="Arial,
Helvetica" size=2> 0 </td>
) = 92
write(1, " <td bgcolor=\"#FFFFFF\"><font fa"..., 88
<td bgcolor="#FFFFFF"><font face="Arial, Helvetica"
size=2> runtime=s </td>
) = 88
write(1, " <td bgcolor=\"#FFFFFF\"><font fa"..., 81
<td bgcolor="#FFFFFF"><font face="Arial, Helvetica"
size=2> UP </td>
) = 81
write(1, "</tr>\n", 6</tr>
) = 6
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
And finally, gdb gave me:
$> QUERY_STRING='all_bin=1&group_name=**ALL**' gdb
/usr/lib/nagios/cgi/perfparse.cgi
<td bgcolor="#FFFFFF"><font face="Arial, Helvetica"
size=2> UP </td></tr>
Program received signal SIGSEGV, Segmentation fault.
0x0017e045 in mysql_fetch_row () from
/usr/lib/mysql/libmysqlclient.so.14
(gdb) bt
#0 0x0017e045 in mysql_fetch_row () from
/usr/lib/mysql/libmysqlclient.so.14
#1 0x0804bab8 in displayAllBin () at cgi_bin_report.c:125
#2 0x0804a4c5 in main (argc=1, argv=0xbfd1ee14) at
perfgraph.c:179
(gdb)
I ran make clean, then re-configured and re-compiled everything,
then re-installed to make sure I'm not hitting a
mismatched library somewhere. Same issue.
In cgi_bin_report.c I made the following changes, and
that made things a bit better:
169
170 printf("%s\n","</tr>");
171
172 }
173
174 printf("%s\n","</table>");
175
176
177 }
Before that, the crash was in line 170 of that file.
I'm sure that shouldn't matter because the crash seems
to come from the mysql client library itself.
$> profile-computer
#==============================================================================#
# profile-computer 1.15 Luis Mondesi <lemsx1@gmail.com>
# http://lems.kiskeyix.org/toolbox/?f=profile-computer&d=1
#==============================================================================#
Host Name: venus.dev.americanhm.com
System Kernel: Linux venus.dev.americanhm.com
2.6.14-1.1653_FC4smp #1 SMP Tue Dec 13 21:46:01 EST
2005 i686 i686 i386 GNU/Linux
#==============================================================================#
CPU Info: Pentium III (Coppermine)
Total Processors: 2
Bogomips total: 3383
#==============================================================================#
Memory: 1034384 kB
Virtual Memory (swap): 2031608 kB
#==============================================================================#
Host bridge: Intel Corporation 440GX - 82443GX Host bridge
PCI bridge: Intel Corporation 440GX - 82443GX AGP bridge
SCSI storage controller: Adaptec AIC-7896U2/7897U2
SCSI storage controller: Adaptec AIC-7896U2/7897U2
Ethernet controller: Intel Corporation 82557/8/9
[Ethernet Pro 100] (rev 08)
ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA
(rev 02)
IDE interface: Intel Corporation 82371AB/EB/MB PIIX4
IDE (rev 01)
USB Controller: Intel Corporation 82371AB/EB/MB PIIX4
USB (rev 01)
Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
VGA compatible controller: Cirrus Logic GD 5480 (rev 23)
PCI bridge: Digital Equipment Corporation DECchip 21150
(rev 06)
PCI bridge: Texas Instruments PCI2031 (rev 01)
#==============================================================================#
LSB_VERSION: 1.3
#==============================================================================#
Library: libc6
Compiler Version: gcc (GCC) 4.0.2 20051125 (Red Hat
4.0.2-8)
#==============================================================================#
/proc/cmdline
ro root=/dev/VolGroup00/LogVol00
#==============================================================================#
Logged In: YES
user_id=239796
I downgraded the mysql client, server, and devel packages for
FC4 from the current RPM (from updates), 4.1.16, to 4.1.15 and
then to 4.1.14. To NO avail...
Oh, I made sure I recompiled perfparse.cgi every time I did a
downgrade.
Here is the output of perfparse-log2mysql --show_config:
Perfparse-log2* [options]
# File where Perfparse logs messages
# Error_Log = "string"
Error_Log = "/var/log/nagios/perfparse.log"
# Rotate Perfparse log files
# Error_Log_Rotate = "Y/N"
Error_Log_Rotate = "Yes"
# When perfparse cannot parse a line, it drops it to that file
# Drop_File = "string"
Drop_File = "/tmp/perfparse.drop"
#
# Drop_File_Rotate = "Y/N"
Drop_File_Rotate = "Yes"
# Log source from nagios (or other tools) that perfparse will scan
# Authorized values: a file name, '-' for stdin, '|' for a fifo and '>' for a host:port socket
# For sockets, a command 'history' will be sent before retreiving the data
# Service_Log = "string"
Service_Log = "/var/log/nagios/serviceperf.log"
# Save the read position in the nagios log file ? If yes, perfparse will start from that position instead of from the beginning
# Service_Log_Save_Position = "Y/N"
Service_Log_Save_Position = "No"
# Path for files containing the read position for nagios log files
# Service_Log_Position_Mark_Path = "string"
Service_Log_Position_Mark_Path = "/var/tmp"
# Start timestamp for history retreiving (positive is absolute, negative is relative to end tm)
# History_Start_Tm = "value"
History_Start_Tm = "-86400"
# End timestamp for history retreiving (positive is absolute, negative is relative to Now)
# History_End_Tm = "value"
History_End_Tm = "-30"
# Show status bar when running
# Show_Status_Bar = "Y/N"
Show_Status_Bar = "no"
# Print a report at the end of the processing
# Do_Report = "Y/N"
Do_Report = "no"
# Dummy hostname if gethostname() does not work
# Dummy_Hostname = "string"
Dummy_Hostname = "localhost"
# Don't store raw data
# No_Raw_Data = "Y/N"
No_Raw_Data = "no"
# Don't store bin data
# No_Bin_Data = "Y/N"
No_Bin_Data = "no"
# Path where storage modules are
# Storage_Modules_Dir = "string"
Storage_Modules_Dir = "/usr/lib"
# Modules to load (Coma separated values)
# Storage_Modules_Load = "string"
Storage_Modules_Load = "mysql"
# Storage Module : mysql
# ==============================
# Database user
# DB_User = "string"
DB_User = "nagios"
# Database password
# DB_Pass = "string"
DB_Pass = "nagios"
# Database name
# DB_Name = "string"
DB_Name = "nagios"
# Database hostname
# DB_Host = "string"
DB_Host = "127.0.0.1"
The string:
2006/03/06 11:15:16 [ storage.c:95 27013 ]
storage_mysql module successfully loaded
is printed to the screen every time.
Ok, I'm making progress on this.
I took the .src.rpm package for 5.0.18 from dev.mysql.com
and compiled it on the local host (rpmbuild --rebuild ...). Once
that was done, I recompiled the perl-DBD-mysql module and
restarted nagios. Then I recompiled perfparse against this new
mysqlclient library and, lo and behold, it worked fine. At
least for the all_bin=1&group_name=**ALL** page. Some graphs
work and others do not.
For the graphs that don't work, I tried purging the db for
the host:
$> /usr/bin/perfparse-db-purge
An error occured with the SQL:
"DELETE perfdata_service FROM
perfdata_service,perfdata_host WHERE
perfdata_service.host_name = perfdata_host.host_name AND
perfdata_host.is_deleted = 1"
Failure Message:
"Cannot delete or update a parent row: a foreign key
constraint fails (`nagios/perfdata_service_raw`, CONSTRAINT
`perfdata_service_raw_ibfk_1` FOREIGN KEY (`host_name`,
`service_description`) REFERENCES `perfdata_service`
(`host_name`, `service_description`))"
That got me to a dead end. Not sure how to fix this.
I tried copying and pasting the URL for one of the binary
graphs that don't work (the one that segfaults):
QUERY_STRING='graph=1&host=DNS1&service=DNS+Checks&metric=%5F'
gdb /usr/lib/nagios/cgi/perfparse.cgi
<BODY BGcolor="#EEFFFF" TEXT="#000000" LINK="#000000"
VLINK="#000000" ALINK="#000000" onload="isAbsRelVisible()">
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1209047360 (LWP 9549)]
0x00b34ad9 in isNull (iCol=0) at dbms.c:72
warning: Source file is more recent than executable.
72
(gdb) bt
#0 0x00b34ad9 in isNull (iCol=0) at dbms.c:72
#1 0x08053c7e in getRange (dMaxRangeLocal=0x8191478,
dMinRangeLocal=0x8191498)
at cgi_graph.c:723
#2 0x080547f9 in displayGraphHeader () at cgi_graph.c:409
#3 0x0804bf07 in main (argc=1, argv=0xbf805274) at
perfgraph.c:162
I saw that the error comes from dbms.c:72, so I changed those
lines to:
63 int iData(int iCol)
64 {
65 if (result_row==NULL)
66 return 0;
67 if (result_row[iCol] && result_row[iCol][0])
68 return atoi(result_row[iCol]);
69 else
70 return 0;
71 }
72
73 int isNull(int iCol)
74 {
75 if (result_row==NULL)
76 return FALSE;
77
78 if (result_row[iCol] == NULL)
79 return TRUE;
80 return FALSE;
81 }
82
83 char *sData(int iCol)
84 {
85 if (result_row==NULL)
86 return "";
87
88 if (result_row[iCol] && result_row[iCol][0])
89 return result_row[iCol];
90 else
91 return "";
92 }
After that the program doesn't crash, but gives me blank
graphs ;-)
Inching closer...
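Note that the patched isNull() above returns FALSE both when no row
was fetched (result_row==NULL) and when the column holds a value, so a
caller cannot tell those two cases apart. Below is a minimal sketch of
one way to keep them distinct. It uses a plain char ** as a stand-in
for the row type returned by the client library, and the names
cell_state, cell_state_of and cell_int_or are hypothetical, not
PerfParse functions. Whether this has anything to do with the blank
graphs is not established here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a fetched row: an array of C strings, one per column.
 * A NULL element means SQL NULL; a NULL row means nothing was fetched.
 * (Assumption: this mirrors how result_row is used in dbms.c above.) */
typedef char **row_t;

/* Hypothetical tri-state so "no row" is not confused with "has value". */
typedef enum { CELL_NO_ROW, CELL_SQL_NULL, CELL_VALUE } cell_state;

/* Classify column iCol without dereferencing a NULL row.
 * The caller must still make sure iCol is below the column count. */
cell_state cell_state_of(row_t row, int iCol)
{
    assert(iCol >= 0);
    if (row == NULL)
        return CELL_NO_ROW;
    if (row[iCol] == NULL)
        return CELL_SQL_NULL;
    return CELL_VALUE;
}

/* Integer value of a column, or fallback when there is no row,
 * the column is SQL NULL, or the string is empty. */
int cell_int_or(row_t row, int iCol, int fallback)
{
    if (cell_state_of(row, iCol) != CELL_VALUE || row[iCol][0] == '\0')
        return fallback;
    return atoi(row[iCol]);
}
```

A caller such as getRange() could then check for CELL_NO_ROW and skip
the range computation, instead of silently using 0 for a row that was
never fetched.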
Got the latest code from CVS and it compiled fine and fixed
all my problems.
You should probably release a bug-fix release or a major
release as soon as possible.