Syslogd2 Wiki

High capacity syslog data collection, filtering, and management.

Brought to you by: efreesmeyer

InputProcessing

Syslogd2 Input Processing Overview

Linux Sockets: (keyword: --input) This is the basic, most fundamental input source-type. The default system log socket is a datagram Linux socket usually located at /dev/log. Syslogd2 extends the use of Linux sockets through connection options, allowing both datagram and streaming sockets to be specified on virtually any local filesystem. Before a streaming Linux socket connection can be configured, the CAP_STREAMIN compile-time option must be enabled.

Linux socket pathnames must be absolute file-paths and are recognizable by their initial '/' character.
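
As a rough illustration of the pathname rule above, the following Python sketch classifies an input specification and binds a datagram Linux socket. The function names is_linux_socket_path and open_datagram_socket are hypothetical and are not part of Syslogd2. Removing a stale socket file before binding is an assumption of this sketch, not documented Syslogd2 behaviour.

```python
import os
import socket


def is_linux_socket_path(spec: str) -> bool:
    """Return True when the spec looks like a Linux socket pathname.

    Per the rule above, such pathnames are absolute and start with '/'.
    """
    return spec.startswith("/")


def open_datagram_socket(path: str) -> socket.socket:
    """Bind a datagram Linux (AF_UNIX) socket at the given absolute path."""
    if not is_linux_socket_path(path):
        raise ValueError("Linux socket pathnames must be absolute")
    if os.path.exists(path):
        os.unlink(path)  # assumption: remove a stale socket left by an earlier run
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock
```

A test client can then send to the same path with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM).sendto(data, path).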

IP Sockets: (keyword: --input) This source-type is a close second to the basic Linux sockets because it requires that the Linux IP networking stack be configured before it can function. Like the Linux socket type, Syslogd2 supports IP sockets that can be datagram (UDP) (such as the default IP port on UDP/514) or streaming (TCP) connections and allows the user to define as many sockets as desired on any desired IP port as either IPv4, IPv6 or "both" subject to certain network-implementation restrictions:

(1) IP sockets may be specified by hostname or address. Ports may be specified numerically or by name.

(a) When specifying a hostname or the "*" wildcard, the IP version may be specified with the "version" option keyword.
(b) The "version" option accepts values of "4", "6" or "46". The default is "46" to accept either (or both) address families.

(2) If a UDP socket is specified without a port, it is assumed to be the default syslog port (514).
(3) Support for IPv4 or IPv6 can be globally disabled (administratively) with global boolean values.

(a) If IPv4 is to be supported (and is not disabled), at least one interface must be configured with an IP v4 address.
(b) If IPv6 is to be supported (and is not disabled), at least one interface must be configured with an IP v6 address.

(4) Before TCP connections can be configured, the CAP_STREAMIN compile-time option must be enabled.
(5) Syslogd2 will check the network status before attempting to resolve or open IP connections. If the network is not available, it will either cancel or defer the IP connection, depending on the connection-recovery options selected by the applicable global settings.
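
The address rules above can be sketched with the standard resolver. This is only an illustration: the function resolve_udp_input and the table VERSION_TO_FAMILY are hypothetical names, and how Syslogd2 itself maps the "version" values internally is not stated in this page. The sketch assumes "4", "6" and "46" correspond to the IPv4, IPv6 and unspecified address families.

```python
import socket

DEFAULT_SYSLOG_PORT = 514  # default syslog port from rule (2)

# Assumed mapping from the "version" option values to address families.
VERSION_TO_FAMILY = {
    "4": socket.AF_INET,
    "6": socket.AF_INET6,
    "46": socket.AF_UNSPEC,  # accept either (or both) families
}


def resolve_udp_input(host, port=None, version="46"):
    """Resolve a UDP input spec into (family, sockaddr) candidates.

    host may be a hostname, an address, or "*" for the wildcard.
    port may be a number, a service name, or None for the default.
    """
    family = VERSION_TO_FAMILY[version]
    node = None if host == "*" else host
    service = DEFAULT_SYSLOG_PORT if port is None else port
    infos = socket.getaddrinfo(
        node, service, family, socket.SOCK_DGRAM, 0, socket.AI_PASSIVE
    )
    return [(fam, sockaddr) for fam, _type, _proto, _name, sockaddr in infos]
```

Each returned pair could be passed to socket.socket(family, socket.SOCK_DGRAM) and bind(sockaddr). Binding to port 514 normally requires elevated privileges.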

Text-Files: (keyword: --tailfile) Syslogd2 offers two options for monitoring text-file input sources. Both options perform similarly to the "logwatch" tool and directly read each new line as a distinct syslog event.

(1) Use the Linux filesystem facility inotify. This facility will alert Syslogd2 whenever a monitored file is modified, allowing Syslogd2 to respond immediately to file updates.

(2) Poll the list of files assigned to each threadpool on a periodic basis and read whatever data (if any) is available, then wait a while and poll again.

The default case (without the "poll" option) is to use the Linux filesystem's inotify feature (if available) to receive a socket notification when the filesystem detects a change. The inotify feature that this option depends on may not be available in all Linux versions or on all Linux-supported (especially non-native) filesystems. When implemented, the inotify method is expected to be more responsive (and perhaps faster) than the polling method. The inotify code is not yet implemented.
(3) Inotify input connections are considered to be a "socket-type" instead of "tailfile" type of input. They are therefore not compatible with tailfile-type threadpools.
(4) Use the "poll" option keyword to force Syslogd2 to periodically "poll" the file to see if new input is available. Use of the poll keyword makes the input connection of type tailfile and incompatible with socket-type input threadpools.
(5) Polling should work on any filesystem (even non-native or remote filesystems such as NTFS, FAT or NFS) and on local Linux filesystems that do not support the inotify feature, such as reiserfs or (perhaps) gfs2.
(6) Both polling-type and inotify-type threadpools use the same 'sequence' of numeric identifiers.
(7) Both polling and inotify methods default to a threadpool-id of zero (0) which is differentiated internally based on thread-pool and connection-type.
(8) For configuration purposes, the two types of tailfile connections cannot be mixed in the same threadpool or assigned to opposing threadpool types.
(9) If the inotify feature is not available on the filesystem on which the target file resides, the fallback will be to use the polling method.

Monitoring text-files is an optional feature and must be enabled at compile-time with the CAP_TAILFILES compile-time declaration. Once the CAP_TAILFILES declaration is compiled into the binary, it is available for use. You may define as many text-files as you wish to be monitored by Syslogd2.
NOTE: Polled tailfiles cannot share threadpools with either socket inputs or inotify tailfiles.
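
The polling method described above can be sketched as follows. This is a minimal illustration, not Syslogd2's implementation: the class PolledFile and the function poll_once are hypothetical names, and restarting from the top of the file after a truncation is an assumption this page does not cover.

```python
import os


class PolledFile:
    """Remember how far a text file has been read between polls."""

    def __init__(self, path):
        self.path = path
        self.offset = 0      # byte offset of the next unread data
        self.partial = b""   # trailing bytes not yet ended by a newline

    def poll(self):
        """Return the complete new lines since the last poll (may be empty)."""
        size = os.path.getsize(self.path)
        if size < self.offset:
            # Assumption: the file was truncated or replaced, so start over.
            self.offset = 0
            self.partial = b""
        with open(self.path, "rb") as fh:
            fh.seek(self.offset)
            data = fh.read()
            self.offset = fh.tell()
        # Everything before the last newline is complete; keep the remainder.
        *lines, self.partial = (self.partial + data).split(b"\n")
        return [line.decode("utf-8", "replace") for line in lines]


def poll_once(files):
    """One polling pass over every file assigned to a threadpool."""
    events = []
    for pf in files:
        events.extend((pf.path, line) for line in pf.poll())
    return events
```

A polling thread would call poll_once on its list of files, treat each returned line as a distinct syslog event, sleep for the configured interval, and repeat.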

Kernel Input: (keyword: --kernel or --tailfile) Kernel input is differentiated from non-kernel primarily by use of a different default value that is used when no syslog priority value is provided in the message content (which is most kernel log messages).

(1) The default priority for non-kernel data can be changed with the global "--defaults= userfacility=<new-value>" command-line option. The default is user.notice.
(2) The default priority for kernel data can be changed with the global "--defaults= kernelfacility=<new-value>" command-line option. The default is kern.notice.
(3) The --kernel option does not accept a filename because it uses the system-defined filename.
(4) The --tailfile option with the kernel suboption indicates the file being 'tailed' is kernel input.
(5) Syslogd2 provides five mechanisms to read local kernel input.

(a) Kernel input via the tailfile method using the kernel option keyword to specify the file as containing kernel data. ("--tailfiles=<filename>,kernel..."). You may define as many kernel-input files as you wish with this tailfiles mechanism.
Alternatively, if CAP_KERNELTHREADS is not declared or is disabled, the --kernel option will use tailfiles-based options as a fallback mechanism ("--kernel=...[, procfs]").

1. Use the Linux filesystem inotify facility to have the filesystem notify a Syslogd2 socket when data becomes available. Not all filesystems support the inotify facility, potentially limiting availability. This method has not yet been implemented.
2. Use a polling algorithm by specifying the keyword poll. This method reads available data from all files assigned to the threadpool until no more data is immediately available. It then waits for a number of seconds before checking again to see if more data has arrived.

(b) Declare CAP_KERNELTHREADS at compile-time and do not disable the usage of the resulting kernelthreads with "--disable kernelthreads".

3. By default Kernel-threads will make Linux system-calls directly to the kernel and will read (and empty) the syslog ring-buffer for input. This option does not require the /proc filesystem to be mounted. It also causes the /proc/kmsg file to not be written.
Direct system-call is the default input method when CAP_KERNELTHREADS is declared and is not disabled.
4. To make the kernel-threads read the /proc/kmsg file instead of making direct kernel calls, define "procfs" with the --kernel command-line option.

(4.1) If the Linux inotify facility is available and the "poll" option is not also specified, the kernel-threads will use the inotify method to "tail" the kernel file.
(4.2) If inotify is not available or if "poll" is also specified on the kernel command-line, a polling algorithm will be used to read the file.

(6) Syslogd2 can be told to use a kernel filename other than the system default (/proc/kmsg) by using "--defaults= klogfile=<alternate-filepath>".
(7) Disabling the CAP_KERNELTHREADS feature can be done with "--disable=kernelthreads". Disabling all kernel logging can be done with "--disable kernellogging".
(8) The --kernel option automatically identifies the input as kernel-generated events from the local host.
(9) If CAP_KERNELTHREADS is not declared or if kernelthreads is disabled, the fallback for the --kernel option is to use tailfiles if CAP_TAILFILES is defined. If using tailfiles and the inotify feature is not available on the filesystem on which the target file resides, the fallback will be to use the polling method.
(10) If CAP_KERNELTHREADS is not defined, kernelthreads is disabled and CAP_TAILFILES is not defined, no kernel input can be obtained.
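
The fallback rules for the --kernel option can be summarised as a small decision function. This is a sketch of the rules as described in the list above, not code from Syslogd2: the function name, its parameters and the returned labels are all hypothetical, and the inotify branches describe intended behaviour because the page states that inotify support is not yet implemented.

```python
def choose_kernel_input(cap_kernelthreads, kernelthreads_disabled,
                        cap_tailfiles, procfs=False, poll=False,
                        inotify_available=True):
    """Return a label for the mechanism the --kernel option would end up using."""
    if cap_kernelthreads and not kernelthreads_disabled:
        if not procfs:
            # Default when kernelthreads are compiled in and enabled.
            return "kernelthread: direct system call"
        if inotify_available and not poll:
            return "kernelthread: inotify on kernel file"
        return "kernelthread: poll kernel file"
    if cap_tailfiles:
        # Fallback: treat the kernel file like any other tailfile.
        if inotify_available and not poll:
            return "tailfile: inotify"
        return "tailfile: poll"
    return "no kernel input"
```

The five non-empty outcomes correspond to the five mechanisms mentioned in item (5). The sketch does not model "--disable kernellogging".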

Use Compile-Time Declarations to define Syslogd2 Architecture

Syslogd2 can be deployed in multiple software architectures depending on which CAP_* declarations are selected for each binary image at compile time.
Starting with a "base" of defining nothing, Syslogd2 will use a model2 architecture which features multiple reader threadpools (each with multiple reader threads). This allows UDP traffic only, with no kernelthreads, no housekeeping threads, and no tailfile threads. The only input method is datagram socket input on as many IP/UDP or datagram Linux sockets as desired. Kernel input will be read (when implemented) via the inotify socket-notification filesystem mechanism if available.
The baseline above can be restricted using the CAP_SINGLE* declarations to deny multiple threadpools, to deny non-default ports or to deny multiple threads per threadpool.

The baseline above can also be expanded with additional capabilities: CAP_STREAMIN allows input TCP and streaming Linux socket connections; CAP_FILTERSIN enables filtering (both transmutation and pass/drop filtering) of data arriving on input connections; CAP_TAILFILES allows specification of generic text-file input and expands kernel input options; CAP_KERNELTHREADS enables kernel-specific input options; and so on.

There are also two CAP_* declarations of note that cause Syslogd2 to divide the work that each reader thread would otherwise be required to perform into 3 segments, offloading 2 of those segments onto other threadpool types.

In total, there are 3 "working types" of threadpools corresponding to the 3 phases of processing that each message must pass through. Remember that there may be multiple instances of each type of threadpool. These are:

Reader: This processing phase is responsible for obtaining data and breaking it up into individual syslog events. Each read of a stream input may contain multiple individual syslog messages while each read from a datagram socket contains only one. The reader threads are also responsible for obtaining "envelope" information for each message:

The time-of-receipt for the incoming message.
The raw message string.
The input-structure that contains the connection-specific instructions on how to process the message. If the input was from an IP socket, the peer address from the connection is also saved.
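
The reader phase can be pictured with the following sketch. It is illustrative only: the Envelope class, its field names and both functions are hypothetical, the input-structure is represented by a plain dict, and newline-delimited framing for stream input is an assumption because this page does not specify how stream messages are delimited.

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Envelope:
    received_at: float            # time-of-receipt
    raw: bytes                    # raw message string
    input_spec: dict              # stand-in for the input-structure
    peer: Optional[Tuple] = None  # peer address, IP sockets only


def split_stream(buffer: bytes):
    """Split a stream read into complete messages plus the unfinished tail.

    Assumes newline-delimited framing (not stated in the source).
    """
    *messages, tail = buffer.split(b"\n")
    return [m for m in messages if m], tail


def envelopes_from_read(data, input_spec, peer=None, stream=False):
    """Wrap one read result into one Envelope per syslog event.

    Returns (envelopes, tail); tail holds stream bytes that do not yet
    form a complete message and should be prepended to the next read.
    """
    now = time.time()
    if stream:
        messages, tail = split_stream(data)
    else:
        messages, tail = [data], b""  # one datagram is exactly one message
    return [Envelope(now, m, input_spec, peer) for m in messages], tail
```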

Worker: This processing phase is responsible for all aspects of "prepping" each incoming message and preparing it for output:

Parse the raw input string, extracting and interpreting the priority, time and host fields (if present).
Resolve the peer-addresses to hostnames or printable addresses.
Convert binary characters in the message to printable characters.
Prune unwanted hostname sub-domains.
Optionally execute a filter process on each input message, either to transform the message or to determine that it is unwanted, in which case processing stops before the output "phase".
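
A few of the worker steps above can be sketched as small functions. These are assumptions for illustration: the function names are hypothetical, the priority field is taken to be the conventional syslog "<PRI>" prefix with PRI = facility * 8 + severity, and the octal escape used for non-printable bytes is an assumed encoding that this page does not specify. The default values user.notice and kern.notice are the ones given in the Kernel Input section.

```python
import re

PRI_RE = re.compile(rb"^<(\d{1,3})>")

# Facility and severity numbers follow the conventional syslog numbering.
DEFAULT_USER_PRI = 1 * 8 + 5    # user.notice
DEFAULT_KERNEL_PRI = 0 * 8 + 5  # kern.notice


def parse_priority(raw: bytes, kernel: bool = False):
    """Return (facility, severity, rest); apply the default when no <PRI>."""
    match = PRI_RE.match(raw)
    if match and int(match.group(1)) <= 191:
        pri, rest = int(match.group(1)), raw[match.end():]
    else:
        pri = DEFAULT_KERNEL_PRI if kernel else DEFAULT_USER_PRI
        rest = raw
    return pri >> 3, pri & 7, rest


def make_printable(text: bytes) -> str:
    """Replace non-printable bytes with an octal escape (assumed encoding)."""
    return "".join(
        chr(b) if 32 <= b < 127 else "\\%03o" % b for b in text
    )


def prune_domain(hostname: str, unwanted: str) -> str:
    """Strip an unwanted trailing sub-domain such as '.example.com'."""
    if unwanted and hostname.endswith(unwanted):
        return hostname[: -len(unwanted)]
    return hostname
```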

Output: This processing phase is responsible for handling some number of output specifications. Specifically:

Compare each message facility+priority to each output selector until a match is found. Once a match is found:

Determine if an optional filter process needs to be applied to the message prior to output and execute a filter process (as above).

If the filter determines the message is to be dropped, immediately check the next output-selector specification. If the filter modifies the message, prepare an output string using the filter-modified contents.

Determine how to write the message and write it:

If the destination is a user-list, the message may be queued to a dedicated UserThreads threadpool or written by the output thread. If the destination cannot be written and spooling is defined and available, the output will be spooled until the connection is restored. Line-termination characters are set appropriately to write to pipes, files, user-terminals, sockets, etc.

After writing a single copy of the message to each destination, the next destination in the list is checked.
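
The output loop described above might look roughly like this. It is a simplified sketch under stated assumptions: the function deliver and the dictionary keys 'selector', 'filter' and 'write' are hypothetical, selectors are modelled as plain callables, and spooling, user-lists and destination-specific line termination are left out.

```python
def deliver(message, outputs):
    """Run one message through a list of output specifications.

    message: dict with 'facility', 'severity' and 'text'.
    outputs: list of dicts with 'selector' (callable), optional 'filter'
             (callable returning new text, or None to drop) and 'write'.
    Returns the number of destinations written.
    """
    written = 0
    for out in outputs:
        if not out["selector"](message["facility"], message["severity"]):
            continue  # selector does not match; check the next one
        text = message["text"]
        flt = out.get("filter")
        if flt is not None:
            text = flt(text)
            if text is None:
                continue  # filter dropped the message for this output
        # A newline stands in for destination-specific line termination.
        out["write"](text + "\n")
        written += 1
    return written
```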

By 'grouping' output destinations into output threadpools:

Each worker-thread need only compare against the threadpool to queue the message.
Each output threadpool need only process a limited number of destinations for each message.
Any delays imposed by network timeouts can be mitigated by isolating (grouping) streaming output into separate output threadpools from datagram output (or high-volume output from low-volume output).

If CAP_WORKERTHREADS is declared, "worker threadpools" may be defined in Syslogd2. The only parameters for worker threadpool configuration are a positive numeric id-number, a threadcount and a number of lines in their input queues. The latter two have global default values, which can be over-ridden by the --threadmaps option or by individual connection options.
Like reader threadpools, each worker-threadpool is identified by a positive integer (a different numeric sequence from the reader threadpool ids). Each reader threadpool takes an optional parameter (queue) to indicate which worker threadpool is to receive the data read by that threadpool. The default value for this queue parameter is zero (0). Each reader threadpool may only designate one worker-threadpool to process the data it receives, but each worker threadpool may process data for multiple reader threadpools.

When CAP_WORKERTHREADS is declared, the reader-threads are freed up to immediately read and prep the next message, resulting in a massively reduced lag-time seen at the input interfaces and much lower data-loss due to input-buffer overflow. This directly translates to a much higher potential throughput and lower congestion level on busy IP or Linux interfaces.

If CAP_WORKERTHREADS is not declared, the reader threads will do their workload passing what would have been queued values as subroutine parameters.
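
The hand-off from reader threads to a worker threadpool can be sketched with a bounded queue. This is an illustration in Python rather than Syslogd2's own code: the function names are hypothetical, a None sentinel is used here purely as a convenient shutdown signal, and the pool is keyed by the numeric id that a reader's "queue" parameter would name.

```python
import queue
import threading


def start_worker_pool(thread_count, queue_lines, handle):
    """Start a worker threadpool fed by one bounded input queue."""
    q = queue.Queue(maxsize=queue_lines)

    def run():
        while True:
            item = q.get()
            if item is None:  # sentinel: shut this thread down
                break
            handle(item)

    threads = [threading.Thread(target=run) for _ in range(thread_count)]
    for t in threads:
        t.start()
    return q, threads


def stop_worker_pool(q, threads):
    """Send one sentinel per thread, then wait for all of them to finish."""
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()


def reader_enqueue(worker_pools, envelope, queue_id=0):
    """Queue one envelope to the worker pool named by the reader's 'queue'."""
    worker_pools[queue_id][0].put(envelope)
```

Because the queue is bounded, a reader blocks only when the workers fall behind by more than queue_lines messages. Several readers may share one pool by passing the same queue_id.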

If CAP_OUTPUTTHREADS is declared, "output threads" may be defined in Syslogd2. Like worker-threads, output threads are defined by a positive numeric id-number, a threadcount and a number of lines in the input-queue. These parameters can also be globally set as default values or over-ridden by the --threadmaps command-line option, or may be set by individual output specification options. Each output specification may declare which output threadpool it belongs to. If the output threadpool is not otherwise defined, it will be created with default parameters.
If CAP_OUTPUTTHREADS is not declared, either the reader- or worker-threadpools will handle all output processing, passing what would have been queued values as subroutine parameters.

If neither CAP_WORKERTHREADS nor CAP_OUTPUTTHREADS is declared, the reader-threads must do all the work for each message they receive. This is the default operational "configuration" of Syslogd2 and the only configuration of all other known syslog daemons.
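
The effect of the two declarations on who performs each phase can be summarised in a short sketch. The function phase_owners is hypothetical, and it encodes one reading of the paragraphs above: without CAP_OUTPUTTHREADS, output is handled by whichever threadpool type performed the worker phase.

```python
def phase_owners(caps):
    """Map each processing phase to the threadpool type that performs it.

    caps: a set of declared CAP_* names.
    """
    has_workers = "CAP_WORKERTHREADS" in caps
    has_outputs = "CAP_OUTPUTTHREADS" in caps
    work_owner = "worker" if has_workers else "reader"
    if has_outputs:
        output_owner = "output"
    else:
        # Assumption: whoever did the worker phase continues into output.
        output_owner = work_owner
    return {"read": "reader", "work": work_owner, "output": output_owner}
```

With an empty set of declarations every phase maps to "reader", which matches the default configuration described above.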

It is largely the offloading of worker and output processing and the multiple copies of each threadpool-type that allows Syslogd2 to effectively utilize large numbers of concurrent threads to achieve extremely high throughput.

Use global settings to define connection-recovery and reconfiguration options.

Use input-parameters to specify individual characteristics for input connections and to over-ride global values.


Socket Input Options
