Syslogd2 Wiki

High capacity syslog data collection, filtering, and management.

Brought to you by: efreesmeyer

Home (obsolete)

(Please check this wiki often as it is being updated frequently until I'm comfortable that its content is complete.)

Syslogd2 Fills Multiple Roles in Modern Network & Host Management

(Link to updated (in-progress) wiki)
(Known bugs, omitted functionality and work-arounds)

Since the first syslog daemon was written for BSD Unix in the early 1980s, the basic structure and capabilities of Unix/Linux syslog daemons have not changed. This is true even though the uses of syslog, and the demands on syslog processing, have increased dramatically in three ways:
- Throughput: traffic volume has grown.
- Selectivity: it matters which messages have value to which audiences, and how long that value lasts.
- Network transport: centralized analysis servers have been introduced. Transmitting large amounts of (low-value) UDP data through firewalls or over "high-cost" links between sites or buildings causes performance problems, and data is lost when those links go down.

Syslog daemons (syslogd) have been (marginally) updated since then. However, they have not kept up even with host-management volumes or needs, let alone the much higher traffic and processing demands of network-management applications. Today, syslog-based network management systems cannot improve substantially until two problems are addressed: network congestion, and the extraction of useful information from the overwhelming amount of raw syslog data. Syslogd2's design is an attempt to address these issues.

Network Management Syslog Edge Collection

Syslogd2 was originally written to address the high-volume demands of firewall log collection and data reduction. Since the original prototype, Syslogd2 has gained several features that support general network-management data collection. These apply whether it is deployed as a network-edge data collector or as a host's syslog service daemon.

Syslogd2 is specifically designed to support ultra-high traffic levels (>1000 events per second). It filters and reduces this volume of data in real time before transmitting it over (possibly congested or "expensive") links to centralized syslog-analysis servers. If that transmission must transit firewalls, Syslogd2 can use TCP connections instead of UDP. During link outages it can also spool data to disk until the forwarding connection is re-established, then flush the spool files once the data has been successfully transmitted.
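The spool-and-forward behavior described above can be sketched as follows. This is a minimal illustration, not Syslogd2's code. It assumes a hypothetical `send` callable that raises `OSError` while the forwarding connection is down, and a hypothetical spool file holding one message per line. Spooled data is resent in order before any new message, and the spool file is removed only after every spooled message has been sent.

```python
import os


class SpoolingForwarder:
    """Forward messages; spool them to disk while the link is down (illustrative only)."""

    def __init__(self, send, spool_path):
        self.send = send              # hypothetical transport: raises OSError on failure
        self.spool_path = spool_path  # hypothetical spool file, one message per line

    def _spool(self, message):
        with open(self.spool_path, "a", encoding="utf-8") as f:
            f.write(message + "\n")

    def flush_spool(self):
        """Resend spooled messages in order; delete the spool only when all succeed."""
        if not os.path.exists(self.spool_path):
            return True
        with open(self.spool_path, encoding="utf-8") as f:
            pending = [line.rstrip("\n") for line in f]
        for i, message in enumerate(pending):
            try:
                self.send(message)
            except OSError:
                # Link dropped again: keep only the messages not yet sent.
                with open(self.spool_path, "w", encoding="utf-8") as f:
                    f.writelines(m + "\n" for m in pending[i:])
                return False
        os.remove(self.spool_path)
        return True

    def forward(self, message):
        # Preserve ordering: drain any existing spool before sending new data.
        if not self.flush_spool():
            self._spool(message)
            return
        try:
            self.send(message)
        except OSError:
            self._spool(message)
```

Draining the spool before sending new traffic preserves message order at the central server. The cost is that new messages wait until the backlog clears.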

When deployed as decentralized "network instrumentation", Syslogd2 can enhance any existing syslog analysis server by pre-qualifying (pre-filtering) data at the network edge. Pre-filtering removes syslog traffic that would otherwise traverse internal network links only to be discarded at the central server as irrelevant. It also reduces (or even eliminates) port congestion at the syslog server, which otherwise occurs when too many devices send too much data to a single host or application. Most network-management edge-collection roles are expected to run Syslogd2 in one of its high-speed MultiThread models (3, 4 or 5) [Syslogd2 MultiThread Models].

There is generally some "cross-over" between syslog-based management of network devices and syslog-based host management (even application management). One reason is that many "network devices" are now deployed as applications running on dedicated Linux operating systems (load balancers, VPN servers, even firewalls and routers). Other devices are based on MS Windows or proprietary operating systems that may produce log data, but not in a convenient single-line, printable format.

Logs from systems that produce binary output (I believe Cisco IP Phone call centers are among these) may need to be received and 'translated' into syslog format before they can be integrated into syslog-based management systems. Syslogd2 allows applications to receive and 'pre-process' such input. The application then writes it to dedicated Syslogd2 input sockets on edge-collection devices, which filter it and forward it to existing (centralized) syslog processing servers.
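The sketch below is a rough illustration of such a pre-processing application. It formats an already-translated record as a traditional BSD-style syslog line (`<PRI>Mmm dd hh:mm:ss host tag: text`, with PRI = facility * 8 + severity) and sends it to a Syslogd2 UDP input port. The function names, host name, tag and port are hypothetical, and the binary-to-text translation step itself is not shown.

```python
import socket
import time


def format_bsd_syslog(facility, severity, host, tag, text, now=None):
    """Build a BSD-style syslog line: <PRI>Mmm dd hh:mm:ss host tag: text."""
    pri = facility * 8 + severity
    t = time.localtime(now)
    # The day of month is space-padded to two characters, as in "Jan  2".
    stamp = "%s %2d %s" % (time.strftime("%b", t), t.tm_mday, time.strftime("%H:%M:%S", t))
    return f"<{pri}>{stamp} {host} {tag}: {text}"


def send_to_input(message, host, port):
    """Send one message as a single UDP datagram to a syslog input port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))
```

For example, `send_to_input(format_bsd_syslog(16, 6, "callmgr", "xlate", "call ended"), "127.0.0.1", 5514)` would deliver a local0.info message to a hypothetical input port 5514.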

Host Management Syslog Data Collection

Syslogd2 also has a role as a host-level syslog data-collection daemon. This role will only grow as networks field more and more Linux systems (engineering workstations, servers, and even standard desktops and laptops), and as pre-filtering syslog data before forwarding it to syslog-consolidation hosts becomes more urgent. Because Syslogd2 provides detailed on-host filtering, each host can both log its syslog data locally and filter it before forwarding it to one or more consolidation log servers. This reduces the volume of (mostly useless) messages that otherwise clog syslog-based applications and frustrate administrators tasked with monitoring logs.

Syslogd2 host-management roles are generally expected to use Syslogd2's smaller-footprint (and lower-volume) multi-thread models 1 and 2 [Syslogd2 MultiThread Models].
- Multi-thread model 1 consists of a single threadpool with one or more threads to process syslog data. This is the model used by most (if not all) other syslog daemons today.
- Model 2 introduces multiple threadpools, each with multiple threads, so that inputs can be segregated.

Syslogd2 also supports user-definable input sockets (IP or Linux, datagram (UDP) or streaming (TCP)), and it can use flat-text files as input (adding facility, priority, and even pseudo-hostnames as desired). Combined with model 2, these features bring the role of Syslogd2 as a host-logging daemon close to that of a network-management edge collector.
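Python's `threading` and `queue` modules can illustrate the difference between models 1 and 2. This is an assumed, simplified picture rather than Syslogd2's implementation. Each pool owns one queue drained by its own worker threads, and each input (the input and pool names are hypothetical) is bound to exactly one pool, so a flood on one input cannot starve the others. Model 1 is the special case of a single pool.

```python
import queue
import threading


class ThreadPool:
    """One queue drained by a fixed number of worker threads."""

    def __init__(self, name, nthreads, handler):
        self.name = name
        self.queue = queue.Queue()
        self.handler = handler
        self.threads = [threading.Thread(target=self._worker, daemon=True)
                        for _ in range(nthreads)]
        for t in self.threads:
            t.start()

    def _worker(self):
        while True:
            item = self.queue.get()
            try:
                if item is None:  # shutdown sentinel: one per worker
                    return
                self.handler(self.name, item)
            finally:
                self.queue.task_done()

    def submit(self, message):
        self.queue.put(message)

    def shutdown(self):
        for _ in self.threads:
            self.queue.put(None)
        for t in self.threads:
            t.join()


def build_model2(pool_sizes, input_map, handler):
    """pool_sizes: {pool: nthreads}; input_map: {input: pool}.

    Returns (pools, dispatch), where dispatch(input, message) routes the
    message to the one pool that owns that input.
    """
    pools = {name: ThreadPool(name, n, handler) for name, n in pool_sizes.items()}

    def dispatch(input_name, message):
        pools[input_map[input_name]].submit(message)

    return pools, dispatch
```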

Scalability

Syslogd2 has a run-time 'switch' that converts it from a Linux syslog service daemon into a stand-alone background application, so it can run alongside an existing syslog daemon. This may be necessary if custom output filters are being used to translate rsyslog output for a particular application. In this scenario, Syslogd2 may be able to monitor the IP port(s), logging and filtering incoming data before relaying it to the default /dev/log socket. That socket is in turn monitored by rsyslog, syslog-ng or some other customized daemon.

If Syslogd2 is deployed generically on Linux systems as the host logger, it can provide consistent feature sets across an organization. Because its configuration is (basically) traditional syslog configuration, it also reduces system administrators' learning curves. Deploying Syslogd2 generically would also let any Linux host act as a syslog data collector / data filter for its own (locally installed) applications, or receive and filter data from other devices at will.

Syslogd2 can be scaled up or down as a particular host's data-collection requirements dictate. It can be as simple as 2 threads (one parent/monitor thread plus one data-collection thread), or as complex as the administrator wants, with hundreds of active threads. Syslogd2 supports 5 distinct multi-threading 'models' [Syslogd2 MultiThread Models]:
- a single threadpool of one or more threads,
- multiple independent threadpools with varying numbers of threads,
- 3 models designed for high-volume network-management requirements.

These 5 models provide wide latitude to trade memory footprint and resource usage against performance and overall throughput.

Syslogd2 specifically addresses the limitations of current syslog daemons. These limitations are common to all syslog-based management systems (high traffic loads, network-link and port congestion, CPU load on routers and firewalls, etc.). They are a direct result of using a host-logger design from the early 1980s for the modern roles of network data collection and reporting.

Base Enhancements in Syslogd2

Syslogd2 supports a variety of CAP_* features that can enable or disable major capabilities. References to CAP_* symbols refer to these capabilities. See [Syslogd2 CAP_-abilities] for a complete listing of these.
Syslogd2 provides the following 'base' enhancements for syslog configuration. Because they are part of the 'base' code components, these enhancements are present whether the binary supports one operational thread or 100.

Default support for both IPv4 and IPv6. Syslogd2 checks for system libraries at compile-time and includes what it finds. The administrator can (at run-time) disable the use of either IPv4 or IPv6 system-wide or on a port-by-port basis.
Additional facilities that can be used for routing, sorting, or other purposes. There are 16 by default; the number can be changed at compile time (range 0-1000). Names are extra0 through extra<n-1>. Syslogd2 also allows access to and use of the 4 reserved facilities between the FTP facility and local0, as reserved0 through reserved3.
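The sketch below builds the resulting facility-name table.
- Codes 0-11 (kern through ftp) and 16-23 (local0-local7) are the standard syslog facility codes.
- reserved0-reserved3 take the codes 12-15, which traditional Linux headers leave reserved between ftp and local0.
- The numeric codes assigned to extra0 through extra<n-1> are not stated here. Numbering them from 24 upward is purely an assumption for illustration.

```python
STANDARD_FACILITIES = [
    "kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
    "uucp", "cron", "authpriv", "ftp",
]


def build_facility_table(n_extra=16):
    """Map facility names to numeric codes (the extra* numbering is an assumption)."""
    if not 0 <= n_extra <= 1000:
        raise ValueError("n_extra must be in the range 0-1000")
    table = {name: code for code, name in enumerate(STANDARD_FACILITIES)}  # 0-11
    table.update({f"reserved{i}": 12 + i for i in range(4)})  # 12-15, between ftp and local0
    table.update({f"local{i}": 16 + i for i in range(8)})     # 16-23
    # Hypothetical: extra facilities numbered after local7.
    table.update({f"extra{i}": 24 + i for i in range(n_extra)})
    return table
```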
Additional syntax for selecting destinations. In addition to '*' and 'none', which can be used for either facilities or priorities, Syslogd2 allows '<', '=', '>', '!' and '~' in any combination after the period between a facility and a priority. See also [Configuring Syslogd2].
- '!' and '=' are already documented for traditional Linux syslog daemons.
- '<=' is the default action for a selector of the form "<facility>.<priority>".
- '>' simply extends this logic: user.>warning selects all user-facility messages that user.warning (user.<=warning) would have ignored.
- What is perhaps new is Syslogd2's negate operator ('~'), which explicitly UNSETS rather than SETS matching facility/priority bits.
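The sketch below shows one possible reading of these operators, applied to the priority half of a selector. It treats each facility's selection as an 8-bit mask (bit 0 = emerg, bit 7 = debug). This interpretation is an assumption, not taken from Syslogd2's source:
- '<', '=' and '>' choose which numeric comparisons against the named priority match. The default is '<=', meaning the named priority and everything more severe.
- '!' complements the matched set.
- '~' clears the matched bits instead of setting them.

```python
PRIORITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def apply_selector(mask, spec):
    """Apply a priority spec such as 'warning', '>warning', '!=err' or '~<=info'
    to an 8-bit priority mask (bit n set means priority n is selected)."""
    if spec == "none":
        return 0
    ops = ""
    while spec and spec[0] in "<=>!~":
        ops += spec[0]
        spec = spec[1:]
    if spec == "*":
        selected = set(range(8))
    else:
        target = PRIORITIES.index(spec)
        relations = set(ops) & set("<=>") or {"<", "="}  # default is '<='
        selected = {
            p for p in range(8)
            if ("<" in relations and p < target)
            or ("=" in relations and p == target)
            or (">" in relations and p > target)
        }
    if "!" in ops:
        selected = set(range(8)) - selected  # '!' inverts the comparison result
    bits = sum(1 << p for p in selected)
    return mask & ~bits if "~" in ops else mask | bits  # '~' unsets, others set
```

Under this reading, `*.*` followed by `*.~>warning` would leave only warning and more severe messages selected. That is the subtractive behavior '~' is described as adding.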
Delayed IP Resolution. On startup, Syslogd2 checks the status of the IP network. If the network is not yet up, it delays any attempt to initialize or resolve any IP input or output socket until periodic re-checks indicate the network has started. At that point, Syslogd2 attempts to initialize all configured IP interfaces as a background task (run either by the parent thread or by a housekeeping thread).
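The deferred start-up amounts to a polling loop, which might look like the sketch below. The `network_is_up` and `init_ip_sockets` callables, the retry interval and the optional check limit are hypothetical stand-ins. The text above says only that Syslogd2 re-checks periodically and then initializes its IP interfaces from the parent or a housekeeping thread.

```python
import time


def wait_then_init(network_is_up, init_ip_sockets, interval=5.0, max_checks=None,
                   sleep=time.sleep):
    """Poll until the network is up, then initialize IP sockets.

    Returns True once sockets were initialized, or False if max_checks
    failed checks were made without the network coming up.
    """
    checks = 0
    while not network_is_up():
        checks += 1
        if max_checks is not None and checks >= max_checks:
            return False
        sleep(interval)
    init_ip_sockets()
    return True
```

In a daemon, this loop would run in a background thread so that local inputs such as /dev/log can be served while the network is still down.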
Distinction between "usable" and "resolved" destinations. When resolving IP destinations, Syslogd2 attempts to fully resolve every address and name. This lets it combine multiple output lines that reference the same destination via different identifiers, which avoids the duplicate messages that can occur with current (simpler-design) syslog daemons.
- A 'resolved' destination is one that has been successfully looked up in DNS, the local /etc/hosts file, or the provided cache file (if any) [Configuring Syslogd2].
- A 'usable' destination is one for which Syslogd2 has been unable to obtain all known addresses and aliases via 'resolution', but which contains a numeric IP address and a numeric port designator (if provided). Such a destination allows a connection to be established "to something", but it cannot be deconflicted with other possible aliases of that host. Syslogd2 periodically attempts to resolve usable addresses in the background.
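The sketch below illustrates the resolved/usable distinction and the merging of output lines that point at the same place. It assumes a hypothetical `resolve` callable that performs name lookups (DNS, /etc/hosts or a cache) and returns an empty set on failure. Destinations are merged only when their address set and port are identical, which is a simplification of whatever de-confliction Syslogd2 actually performs.

```python
import ipaddress


def numeric_ip(text):
    """Return the canonical form of a numeric IPv4/IPv6 address, or None."""
    try:
        return str(ipaddress.ip_address(text))
    except ValueError:
        return None


def merge_destinations(destinations, resolve):
    """Classify (host, port) destinations and merge those reaching the same addresses.

    Returns (merged, pending): merged maps (frozenset(addresses), port) to
    {"status": "resolved" | "usable", "names": [...]}; pending lists
    destinations that are neither resolved nor usable yet.
    """
    merged, pending = {}, []
    for host, port in destinations:
        addrs = set(resolve(host))
        if addrs:
            status = "resolved"
        elif (ip := numeric_ip(host)) is not None:
            status, addrs = "usable", {ip}
        else:
            pending.append((host, port))  # retry resolution later
            continue
        entry = merged.setdefault((frozenset(addrs), port), {"status": status, "names": []})
        entry["names"].append(host)
        if status == "resolved":
            entry["status"] = "resolved"  # a resolved alias upgrades a usable entry
    return merged, pending
```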
Facility / priority override. Syslogd2 allows a specification of facility, priority or facility.priority on any input specification to force all incoming traffic on that port to the given (partial or full) value.
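The override can be sketched as a rewrite of the leading `<PRI>` field, where PRI = facility * 8 + severity. Passing only a facility or only a severity models the 'partial' override. The function name is hypothetical. The `<13>` (user.notice) fallback for messages without a PRI follows the relay rule in RFC 3164, not anything stated about Syslogd2.

```python
import re

PRI_RE = re.compile(r"^<(\d{1,3})>")


def override_pri(message, facility=None, severity=None):
    """Rewrite the leading <PRI> of a syslog message, replacing the facility,
    the severity, or both (None keeps the original part)."""
    m = PRI_RE.match(message)
    # RFC 3164 relays treat a missing PRI as <13>, i.e. user (1) . notice (5).
    old_fac, old_sev = divmod(int(m.group(1)), 8) if m else (1, 5)
    new_fac = old_fac if facility is None else facility
    new_sev = old_sev if severity is None else severity
    body = message[m.end():] if m else message
    return f"<{new_fac * 8 + new_sev}>{body}"
```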
Host override. Syslogd2 allows a hostname to be specified on any --tailfile specification. That name becomes the 'host' from which all such messages appear to originate, instead of the local hostname. This allows creation of 'pseudo-hosts', such as mysql.hostname.domain for a MySQL error file.
RFC3339 time format support. If any applications write RFC3339-format timestamps to --tailfile text files, Syslogd2 will transparently read and parse these timestamps. The same applies to any external traffic entering an IP port.
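The minimal RFC 3339 parser below shows what such timestamps contain. Unlike the year-less, zone-less traditional syslog stamp, an RFC 3339 timestamp has a full date, a time with optional fractional seconds, and a 'Z' or numeric UTC offset. This is a standalone sketch, not Syslogd2's parser. Fractional seconds beyond microseconds are truncated, and leap seconds (':60') are rejected by `datetime`.

```python
import re
from datetime import datetime, timedelta, timezone

RFC3339_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})[Tt ](\d{2}):(\d{2}):(\d{2})(\.\d+)?([Zz]|[+-]\d{2}:\d{2})$")


def parse_rfc3339(stamp):
    """Parse e.g. '2023-05-04T12:34:56.789+02:00' or '1985-04-12T23:20:50.52Z'."""
    m = RFC3339_RE.match(stamp.strip())
    if not m:
        raise ValueError(f"not an RFC 3339 timestamp: {stamp!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    frac, offset = m.group(7), m.group(8)
    # Pad or truncate the fraction to exactly 6 digits (microseconds).
    micro = int((frac[1:] + "000000")[:6]) if frac else 0
    if offset in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)
```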
Syslogd2 currently logs parsing and startup errors to a user-specified file instead of looping the information back to itself for two reasons:
- (1) If Syslogd2 crashes on startup and is unable to output its own syslog errors to the syslog facility, the error file may indicate why.
- (2) Syslogd2 supports a CAP-ability (CAP_WHATIF) that produces a configuration display based on current contents of the configuration file. The file output can be used to diagnose and troubleshoot complex configurations without impacting live log files.
All Syslogd2 output files, pipes and Linux sockets accept uid, gid, and mode sub-options that override default file permissions. Selected files can therefore be tailored (different directory, group permissions) to give interested groups access to log files.
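The sketch below shows how mode, uid and gid sub-options could be applied when a file output is opened, using standard POSIX calls exposed by Python's `os` module. The function name and the 0o600 default are hypothetical. Only regular files are covered (not pipes or Linux sockets), and changing ownership to another user normally requires root.

```python
import os


def open_log_output(path, mode=None, uid=None, gid=None):
    """Open a log file for appending and apply optional mode/uid/gid overrides."""
    # Hypothetical default permissions for newly created files (subject to umask).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        if mode is not None:
            os.fchmod(fd, mode)  # explicit chmod is not reduced by the umask
        if uid is not None or gid is not None:
            # -1 leaves that id unchanged.
            os.fchown(fd, -1 if uid is None else uid, -1 if gid is None else gid)
    except OSError:
        os.close(fd)
        raise
    return os.fdopen(fd, "a")
```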
Unless restricted at compile time, Syslogd2 also provides the following (as base functionality). (See also CAP_SINGLEPOOL, CAP_SINGLETHREAD and CAP_SINGLEPORT in [Syslogd2 CAP_-abilities])
- Run-time configuration of multiple input sources into one or more threadpools - each with an independent number of threads.
- Configuration and use of arbitrary numbers of arbitrary IP UDP ports (in addition to the default syslog port 514).
  - The default port can be further configured or disabled entirely at runtime.
- Configuration and use of arbitrary numbers of arbitrary datagram Linux ports (in addition to the default log port at /dev/log).
  - The default log port can be further configured or disabled entirely at run-time.
Any command-line parameter can be moved into the configuration file, which helps because there can be many of them and they can be lengthy. The exceptions are --configfile=<config-file-location>, --version and --help.
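The option-merging rule can be sketched as below. Syslogd2's real configuration-file syntax for these parameters is described in [Configuring Syslogd2] and is not reproduced here. The sketch assumes:
- hypothetical config lines beginning with `--`,
- hypothetical option names,
- that a value given on the command line overrides the same option from the file.

Only the exclusion of --configfile, --version and --help comes from the text above.

```python
# Options that, per the text above, may only appear on the command line.
CLI_ONLY = {"--configfile", "--version", "--help"}


def parse_options(tokens, from_config):
    """Turn '--name=value' / '--name' tokens into a dict (bare flags map to True)."""
    opts = {}
    for tok in tokens:
        name, sep, value = tok.partition("=")
        if from_config and name in CLI_ONLY:
            raise ValueError(f"{name} is only valid on the command line")
        opts[name] = value if sep else True
    return opts


def merge_options(argv, config_lines):
    """Merge options from hypothetical '--' lines in the config file with argv.

    Assumption: a value given on the command line overrides the config file.
    """
    file_tokens = [ln.strip() for ln in config_lines if ln.strip().startswith("--")]
    merged = parse_options(file_tokens, from_config=True)
    merged.update(parse_options([a for a in argv if a.startswith("--")], from_config=False))
    return merged
```

Repeatable options (such as several --tailfile entries) would need list values rather than the single value per name kept here.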

For further reading:

Wiki: CAP_*-abilities
Wiki: Configuring CAP_*-abilities
Wiki: Configuring Syslogd2
Wiki: Future Features
Wiki: MultiThread Models
Wiki: Related Projects
Wiki: Syslogd2 Layered View
Wiki: Tour Syslogd2
Wiki: Welcome and Introduction
