Syslogd2 Wiki

High capacity syslog data collection, filtering, and management.

Brought to you by: efreesmeyer

Tour Syslogd2

Let's start by selecting a Syslogd2 binary for use in our tour. If you have downloaded the source from SourceForge and compiled it using the default values, you will have 7 binary files to choose from, as named below. To identify the differences between them, show the version screen with ./syslogd2? -v or ./syslogd2? --version. Notice that the list of enabled / disabled capabilities changes between binaries. These differences in capability are the only reason there are so many variants of the same binary (see [Syslogd2 MultiThread Models]). The astute observer will have noticed the notations about facilities extra0 through extra15 and NLS not supported. NLS (internationalization) is a work-in-progress and will not be available until it actually works. We will come back to the 'extra' facilities.

syslogd2t: (tiny) -- minimum size and thread-count -- enforced limits on threadcounts, threadpools and selectable ports.
syslogd2s: (small)
syslogd2m: (medium)
syslogd2l: (large)
syslogd2h: (huge)
syslogd2g: (mega) -- all features, maximum capability
syslogd2d: (demo) -- a demonstration / experimentation binary -- same feature set as the 'mega' binary.

Now let's create a configuration file (or review the one currently in place). The default location for Syslogd2's configuration file is /etc/syslog.conf. You can specify a different file on the Syslogd2 commandline using the --configfile=<file> option (-c <file> for short). Any file using traditional format will suffice to start. We need to add one (syslogd2-specific) line to this file (on faith for now) to avoid conflict with the syslog service daemon currently running on your test host. Please add this line verbatim anywhere in the file:
# ~ --ApplicationMode

# This is a sample (traditional) syslog file.
# ~ --ApplicationMode
*.crit    - /var/log/critical.log
*.err     root,sysadmin
kern.*    - /var/log/kernel.log
mail.warning   - /var/log/mail.log
*.warning;user.notice   - /var/log/messages

That's the minimum. Start the Syslogd2 daemon (with the "-c" command-line parameter to specify the file location if it is non-default) and you are up and running with any of the Syslogd2 compiled binaries (except the "tiny" model). Syslogd2 requires at least one input and one output specification to operate. The --ApplicationMode option prevents the automatic creation of the default input sockets (Linux log socket and IP input socket), while the tiny model prevents the use of any input EXCEPT the default Linux log socket or the default IP socket.

Now let's assume you have an application that produces a log file you would like to have Syslogd2 monitor. You need a version of the binary that supports the CAP_TAILFILES CAP_-ability (remember the --version option). Of the default binaries, this is any binary except Tiny. Let's select the Medium option this time so we can build a more complex example.

We add to the config-file (or command-line) the following: --tailfile=<filename>. If we put this in the config-file, we need to ensure the line begins with a tilde ('~') followed by at least one space or tab. (We'll add this line to our config file.) NOTE: Syslogd2 (usually) ignores the first comment-character in a line, so use 2 comment characters (not necessarily contiguous) to start a comment.

Do not worry about the order of lines in the config-file. Syslogd2 parses the entire file before actually configuring anything. Whenever two lines conflict, the one with the lowest line-number will 'win'. The actual command-line is considered line 0. Default values are considered to be on line <n> + 1 where <n> is the last line number in the file.
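The precedence rule can be pictured with a few lines of Python. This is only a sketch of the rule as described above, not Syslogd2's actual parser; the function name, the (line-number, option, value) tuples and the option names used in the sample data are hypothetical.

```python
def resolve_options(settings, defaults, last_line):
    """Pick one value per option; the lowest line number wins.

    settings  -- iterable of (line_number, option, value);
                 the command line counts as line 0
    defaults  -- dict of built-in defaults, treated as line last_line + 1
    last_line -- number of the last line in the config file
    """
    candidates = list(settings)
    candidates += [(last_line + 1, opt, val) for opt, val in defaults.items()]
    winners = {}
    for line_no, option, value in sorted(candidates, key=lambda c: c[0]):
        # setdefault keeps the first (lowest-numbered) value seen per option
        winners.setdefault(option, value)
    return winners


# Illustrative data only: option names and paths are made up for the example.
settings = [
    (12, "configdir", "/etc/syslog.d"),
    (3, "spooldir", "/data/spool"),
    (0, "configdir", "/tmp/test.d"),      # given on the command line
    (40, "spooldir", "/var/tmp/spool"),
]
defaults = {"spooldir": "/var/spool/syslog", "configdir": "/etc/syslog.d"}
print(resolve_options(settings, defaults, last_line=40))
```

In this sample the command-line value for configdir beats the one on line 12, and line 3 beats line 40 for spooldir; the defaults only apply when nothing else sets an option.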

...
# The line below may cause problems for other log daemons unless the line starts with a '#' (making it a comment).
~ --tailfile = /var/opt/app 1/logs/logfile # this is NOT a valid Syslogd2 comment and \
                        will cause a Syslogd2 error complaint because there is only one comment character \
                        before the comment.
#~ --tail = /var/opt/app 1/log file 2.log # This is a valid Syslogd2 command and comment \
        that will not be read by other log daemons.  Note the line continuation using the \
        backslash ('\') character. (Max default length of a continued line is 1024 characters).
~ -r ## This is a backward-compatibility option that is short for `--enable inet, forwarding`\
        It enables the use of IP for input & output and the forwarding of data received via IP sockets\
        to IP destinations.

# NOTE: If a line begins with a '#' (not necessarily in column 1), the next token is checked for a '~' \
        or a valid facility name. If neither is found, the line is ignored as a comment; \
        if a tilde ('~'), it is command-line parameters; otherwise it's an output specification.
...

Notice that when command-line parameters are moved into the config file, spaces are allowed around delimiters, and the quotes and escape-characters that would otherwise be required on the command-line are dropped. Also, each file line that contains command-line parameters must begin with a tilde ('~') as the first non-comment-char, non-whitespace character.
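To make the line-type rules concrete, here is a rough Python sketch of how a config line could be sorted into "options", "output" or "comment". It is an interpretation of the rules described in this tour, not Syslogd2's code: the function name is hypothetical, and the facility list is an assumption (the usual syslog names plus extra0 through extra15).

```python
import re

# Assumed facility names: common syslog facilities plus the 'extra' ones.
FACILITIES = {
    "kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
    "uucp", "cron", "authpriv", "ftp",
    *(f"local{i}" for i in range(8)),
    *(f"extra{i}" for i in range(16)),
    "*",
}


def classify_line(line):
    """Return 'blank', 'options', 'output' or 'comment' for one config line."""
    text = line.strip()
    if not text:
        return "blank"
    if text.startswith("#"):
        # The first comment character is skipped; what follows decides the type.
        text = text[1:].lstrip()
    if text.startswith("~"):
        return "options"              # command-line parameters
    # The facility is whatever precedes the first '.', ',', ';', ':' or space.
    first = re.split(r"[.\s,;:]", text, maxsplit=1)[0]
    if first in FACILITIES:
        return "output"               # selector + destination
    return "comment"


for sample in (
    "# ~ --ApplicationMode",
    "*.crit    - /var/log/critical.log",
    "# extra7.*   @remotehost.domain, port=333, stream",
    "## extra4.* is the 'logged in' traffic",
    "# This is a sample (traditional) syslog file.",
):
    print(f"{classify_line(sample):8} {sample}")
```

The fourth sample shows why a second '#' is needed when a comment happens to start with a facility name.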

With these additions, Syslogd2 will now enable the default IP input port and will forward messages to the default syslog port on remote IP hosts. Syslogd2 will also create and use a specialized threadpool to monitor these files for updates, treating any new data-lines as syslog input. Maximum line length is 1024 bytes (which can be changed at compile-time).
If a line starts with a syslog priority value (<###>...), that facility and priority will be parsed and used -- otherwise the default user priority will be used (default is user.notice). The value to use between the <> is (<numeric-facility> * 8) + priority.
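As a worked example of the priority arithmetic, the Python sketch below encodes and decodes the <###> prefix. The numeric codes are the conventional syslog ones; the numbers Syslogd2 assigns to extra0 through extra15 are not given in this tour, so they are left out. The function names are made up for the illustration.

```python
import re

# Conventional syslog facility and priority codes.
FACILITIES = {
    "kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4, "syslog": 5,
    "lpr": 6, "news": 7, "uucp": 8, "cron": 9, "authpriv": 10, "ftp": 11,
}
FACILITIES.update({f"local{i}": 16 + i for i in range(8)})
PRIORITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def encode_pri(facility, priority):
    """Value to place between '<' and '>': (numeric facility * 8) + priority."""
    return FACILITIES[facility] * 8 + PRIORITIES[priority]


def split_pri(line, default_pri=13):
    """Return (facility_number, priority_number, rest_of_line).

    Lines without a <###> prefix get default_pri, here user.notice (13).
    """
    match = re.match(r"<(\d{1,3})>", line)
    if match:
        value, rest = int(match.group(1)), line[match.end():]
    else:
        value, rest = default_pri, line
    facility, priority = divmod(value, 8)
    return facility, priority, rest


print(encode_pri("user", "notice"))        # 1 * 8 + 5
print(split_pri("<188>link down"))         # 188 = 23 * 8 + 4 (local7.warning)
print(split_pri("no prefix on this line"))
```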

For all inputs (including IP and Linux sockets), Syslogd2 allows overriding the facility and priority of all incoming messages using comma-separated options, as follows:

# To change the facility value, leaving the priority value unchanged:
#~ --tail = /var/opt/app 1/log file 2.log, facility = extra3, hostname=app1.host.domain
# The 'hostname' option will cause Syslogd2 to report the data from this file as coming from the (pseudo-) host 'app1.host.domain'.

# To change the priority, leaving the facility unchanged:
#~ --tail = /var/opt/app 1/log file 2.log, priority = warning

# To force both priority and facility:
#~ --tail = /var/opt/app 1/log file 3.log, priority = extra8.notice # Note the use of the \
        'extra' facility to avoid conflict with other traffic.

So far, so good. Now let's define some TCP and streaming Linux input ports for other Syslogd2 processes to send us data or for local applications to connect to. For TCP or streaming Linux input support, we need to ensure we have the CAP_STREAMIN capability in our binary (--version option).

...
#~ --input=10.2.1.4, port=800, stream # Note the boolean values 'stream' for TCP, and 'datagram' for UDP 
#~ --input=myhost.domain, port=200, datagram, version=46 # '(IP) version' values are '4', '6', and '46'.
#~ --input=/tmp/syslogd2.socket1, stream # a user-defined streaming Linux input socket
#~ --input=/tmp/syslogd2.socket2, datagram # a user-defined datagram Linux input socket
...
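To try out a datagram input like the ones above, a test client only needs to send one message per datagram. The Python sketch below does that over UDP; the helper name is hypothetical, and the host and port in the usage comment simply mirror the sample --input line, so adjust them to whatever you actually configured. The default prefix <13> is user.notice, (1 * 8) + 5.

```python
import socket


def send_datagram(message, host, port, pri=13):
    """Send one syslog-style line as a single UDP datagram.

    Returns the number of bytes handed to the network stack. UDP gives no
    delivery confirmation, so a return value does not prove receipt.
    """
    payload = f"<{pri}>{message}".encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


# Example use (assumes a datagram input is listening on that address/port):
#     send_datagram("app1: test message", "myhost.domain", 200)
```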

In general, --tailfile (--tail) reads data from text files, while --input (-i) reads data from sockets and --kernel reads data from the system kernel but defaults unspecified facility.priority to system.notice instead of user.notice.
(The kernel-read routines are not yet final, so the short-term workaround is to define CAP_TAILFILES and use the --kernel option to read the default system kernel file at /proc/kmsg. The format for --kernel varies only slightly in that the filename is not given -- just the parameter list. There is no need to specify the --kernel option unless you wish to use non-default sub-options.)

To disable kernel logging or to have Syslogd2 NOT create default Linux datagram or UDP/IP ports, suppress them with one of the following:

    #~ --disable syslog, kernellog
    #~ --no syslog, klog            # klog is an alias for 'kernellog'
    #~ --suppress klog, syslog
    #~ --enable syslog=no, klog=no             # you get the idea

We'll come back to input configurations in a bit. For now, let's configure some extended output configurations. We will need (want) the CAP_STREAMOUT facility, so check the --version output to ensure it's enabled in our working binary. If not, select a binary where it is. Syslogd2 output configurations work much the same way as the input options (but different -- :-) ).

Let's start by defining some simple Linux outputs to show the general format. We know that traditional files use '|' for pipes, '*' to write to all logged-in users, '@' to connect to a remote UDP socket, and a list of users to write to selected user terminals. Syslogd2 extends that tradition.

Because Syslogd2 uses "delayed resolution" (where it waits for the IP network to come up before it attempts to resolve IP hosts/addresses and it will retry failed resolutions periodically until it succeeds), you can use hostnames, host aliases or addresses to designate hosts. Because Syslogd2 recognizes that two definitions to the same host might use either different addresses or a mix of names and addresses (thereby sending possibly duplicate data to the same host), Syslogd2 will always attempt to 'resolve' each host obtaining all addresses and names for a given IP host.
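The idea of resolving each destination to all of its addresses, retrying until the network cooperates, can be sketched in Python with the standard resolver. This is not how Syslogd2 implements it; the function names, retry count and delay are assumptions chosen for the illustration.

```python
import socket
import time


def resolve_all(host, port=514):
    """Return the set of every IP address the resolver reports for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)
    return {info[4][0] for info in infos}


def resolve_with_retry(host, attempts=3, delay=1.0):
    """Retry failed lookups a few times; return an empty set if all fail."""
    for attempt in range(attempts):
        try:
            return resolve_all(host)
        except socket.gaierror:
            if attempt + 1 < attempts:
                time.sleep(delay)      # e.g. the network is not up yet
    return set()


def same_host(addrs_a, addrs_b):
    """Two destinations are treated as one host if they share an address."""
    return bool(addrs_a & addrs_b)


print(resolve_with_retry("127.0.0.1", attempts=1))
print(same_host({"10.4.2.4", "10.4.2.28"}, {"10.4.2.28"}))
```

Comparing address sets rather than the names as typed is what lets a name and a numeric address for the same machine be recognized as one destination.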

extra3.*;user.warning               - /directory 1/file.log # , uid=root, gid=netstaff, mode=640   # additional parameters hidden from other daemons.

# Syslogd2 allows both commas and colons as alternative separators in the selector list (but no spaces) -- because I make lots of typos....
extra4.*,extra5.*          |/tmp/OutputPipe, uid=root,gid=appgid, mode=660  ## this will set Linux file permissions and ownership on the pipe.

extra4.>warning      - /var/log/notice_to_debug.log ## this file will be created with default 0600, root, root permissions.

extra6.*          @remotehost.domain    ## a default udp/syslog IP connection on whichever address family is configured, enabled and able to connect.
# To globally disable IPv6 in Syslog, use the command-line option `--disable ipv6`. This will force all outbound IP connections to use IPv4 only.

# extra7.*           @remotehost.domain, port=333, stream, ver=6 # 'hidden' from other daemons, this creates a TCP IPv6 connection to port 333
# extra8.*           @fec0:4::88, port = rcmd, stream  # 'hidden' from other daemons, this also creates a TCP IPv6 connection \
        after looking up the numeric value of the TCP/rcmd port in the /etc/services file.
# extra9.*            @/var/opt/socket1, stream, uid=root, gid=LogserverApp, mode=660 # attempts to open a 'hidden' streaming \
        Linux socket for output to an application.
# extra10.*        @/var/opt/socket2 # attempts to open a datagram (default) Linux socket with (default) permissions 0666,root,root
# extra11.*        user1, user2,  user3, # must hide this because 'extra11' is not valid outside Syslogd2. Note the (legal) spaces around \
        the delimiters in the user list.

Filters

The next topic to address in our tour is probably one of the most powerful in Syslogd2 -- filters. Filters are applied at two points in the processing of syslog messages. The filters themselves are stored in separate individual files located in the directory specified by the ConfigDir option (/etc/syslog.d by default). Filters are assigned to any input source or any output source with the filter=<relative-filename> sub-option where <relative-filename> is relative to the ConfigDir directory.

Input filters are applied after the message has been resolved and just before the defined destination-selectors are searched for matches. Input filters require the CAP_FILTERSIN CAP_-ability. (Check the --version command-line output.)

Output filters are applied after a match has been made to a destination-selector but before the data is written. (Once a message has been written to a destination, remaining selectors that have 'resolved' to the same destination are skipped. Selectors are tested in ascending line-number order). Output filters require the CAP_FILTERSOUT CAP_-ability. (Check the --version command-line output.) The filter to use is dependent on the output selector component -- not the output destination component of the configuration file line, allowing multiple selectors to use different filters.
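The output-side ordering can be sketched as a small Python loop. It reflects one reading of the paragraph above (a message discarded by a selector's filter has not been written, so a later selector may still deliver it) and is not taken from the Syslogd2 source. The rule tuples, the resolve callback and the sample hosts are all hypothetical.

```python
def dispatch(message, facility, rules, resolve=lambda name: name):
    """Return the (destination, message) pairs that would be written.

    rules   -- list of (line_no, facilities, destination, out_filter);
               out_filter is None, or a callable returning the (possibly
               changed) message, or None to discard it
    resolve -- maps a destination name to one canonical identity
    """
    written = set()
    deliveries = []
    for line_no, facilities, destination, out_filter in sorted(
        rules, key=lambda rule: rule[0]
    ):
        if facility not in facilities:
            continue                   # selector does not match
        target = resolve(destination)
        if target in written:
            continue                   # this destination already has the message
        result = out_filter(message) if out_filter else message
        if result is None:
            continue                   # discarded by this selector's filter
        deliveries.append((target, result))
        written.add(target)
    return deliveries


# Made-up data: a name and an address that resolve to the same host.
aliases = {"loghost.domain": "10.1.1.5", "10.1.1.5": "10.1.1.5"}


def drop_noise(msg):
    return None if "connection established" in msg else msg


rules = [
    (20, {"extra0"}, "10.1.1.5", None),
    (10, {"extra0", "extra4"}, "loghost.domain", drop_noise),
    (30, {"extra0"}, "/var/log/all.log", None),
]
print(dispatch("user bob logged in", "extra0", rules, lambda d: aliases.get(d, d)))
```

Here line 10 is tested first and writes to the host, so line 20 (same host under another name) is skipped, while the file on line 30 still receives its copy.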

This example is probably better understood in a network-management context, but should be easily adaptable to host-application filter requirements once the basic mechanics are understood.

--input       10.4.2.4, facility=extra0, filter = pixFilter1, ver=4  # syslog port 514 and datagram are defaults. This interface is assumed to be on \
        the same vlans as the pix.
--input          10.4.2.28, facility=extra1, filter=vpnFilter1, ver=4 # This interface is assumed to be on the same vlan as VPN syslog-producing hosts.

Now let's create the file 'pixFilter1'. All filters (both input and output) use the same syntax and format. The full filename will be /etc/syslog.d/pixFilter1 which is case-sensitive because of the Linux filesystem. Don't be concerned about over-commenting your filters. Syslogd2 parses each filter file during startup (or in a background process during run-time), removing comments while it builds the filter data structures.

# Filter files use a simpler syntax than the primary configuration file.
# A '#' character in column 1 (and ONLY in column 1) indicates a comment. No in-line comments are allowed.
# There are 2 basic filter-line-types.  One sets the default action of the filter to either 'pass' or 'discard'.
# This linetype consists of two letters on a line by themselves: either 'dp' or 'dd' for 'default pass' or 'default discard'.
# The default mode is 'dp'. An empty or non-existent file, or one with no entries, passes everything as if the filter were not defined.

# The 'working' line-type consists of 2 parts: a 'keystring' and a series of delimited data fields.
# The delimiter for each line is defined as the 1st non-whitespace character following the first whitespace in the line.
# The 'keystring' starts in column 1 and is terminated by the 1st whitespace encountered.
# The delimiter for a given line is not allowed to appear inside any of the data-fields defined on that line.

# The keystring consists of the characters "^$cnhfp" in any order, possibly repeating.
# The data-fields are positional based on the keystring. There must be a minimum of one data-field per keystring character.
# ^: Case-sensitive string-match between data-field and start of message content.
# $: Case-sensitive string-match between data-field and end of message content.
# c: Data field is a case-sensitive sub-string match to the message content.
# n: Data field is NOT a case-sensitive sub-string match to the message content.
# h: Data field is a case-NON-sensitive match to the resolved hostname (or IP if hostname won't resolve).
# f: Facility of message matches the facility specification in the data-field
# p: Priority of message matches the priority specification in the data field.

# At the first failure of a message to match any key-string criteria in a line (parsing the keystring left-to-right), the filter-line is discarded and the next filter-line in the file attempts a match.
# If the end-of-file is reached with no matching entry, the 'default' action is taken.
# If a match is found between a message and ALL keystring criteria for a message, one of two actions is taken:
#    If the number of data-fields exactly matches the number of characters in the keystring,
#         the action opposing the default-action set as of the end-of-file is taken immediately:
#             For a  default-pass filter, the message is discarded.
#             For a default-discard filter, the message is kept.
#    The default-action is allowed to change as often as desired during the processing of the filter file using the 'dd' and 'dp' operators.
#    If the number of data fields exceeds the number of keystring characters, the remaining line is a 'transformation filter'.
#        The keystring is reparsed left-to-right, substituting the 'transformation field' strings for the 'match' strings found earlier.
#            The number of transformation fields may be less than the number of keystring 'matching fields', so ensure that fields to be transformed are listed first in the keystring.
#        In the case of facility and priority values, the message's facility and priority values are reset to those specified in the
#            transformation field(s) of the filter.
#        Once the last transformation field is applied to a message, the filter is immediately exited and the revised message is kept for further processing.
#        Transformed messages are never discarded.

# The following line explicitly sets the filter's default mode to 'pass'
dp
# The following line changes all messages containing the string ' logged in' where the hostname contains the non-case-sensitive string "rtr1506" to facility extra12, then the filter is exited.
fch /*/ logged in/Rtr1506/extra12/
# The following line changes the facility to 'extra4' for any message containing the string ' logged in.' from any remaining hosts (where hostname does not contain the 'rtr1506' string).
fc /*/ logged in/extra4/
# Change the facility of all other router traffic to 'extra2'
fh  /*/rtr1506/extra2/
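The line format described in the filter comments can be checked with a short Python sketch that splits a filter line into its keystring, match fields and transformation fields. It follows the rules as written in this tour (delimiter = first non-whitespace character after the first whitespace; one match field per keystring character; anything left over is a transformation field) and is not derived from the Syslogd2 source. The function name is hypothetical, and it makes no attempt to interpret the '*' facility field.

```python
KEY_CHARS = set("^$cnhfp")


def parse_filter_line(line):
    """Return (keystring, match_fields, transform_fields), or None for a
    comment or blank line. 'dp' / 'dd' lines come back with empty lists."""
    line = line.rstrip("\n")
    if line.startswith("#") or not line.strip():
        return None
    parts = line.split(None, 1)          # keystring ends at the first whitespace
    keystring = parts[0]
    if len(parts) == 1:
        if keystring in ("dp", "dd"):
            return keystring, [], []
        raise ValueError(f"no data fields: {line!r}")
    if not set(keystring) <= KEY_CHARS:
        raise ValueError(f"bad keystring: {keystring!r}")
    rest = parts[1]                       # begins with the delimiter character
    delimiter = rest[0]
    fields = rest[1:].split(delimiter)
    if fields and not fields[-1].strip():
        fields.pop()                      # nothing follows the closing delimiter
    if len(fields) < len(keystring):
        raise ValueError(f"too few data fields: {line!r}")
    return keystring, fields[:len(keystring)], fields[len(keystring):]


for sample in ("dp",
               "fch /*/ logged in/Rtr1506/extra12/",
               "fh  /*/rtr1506/extra2/",
               "c /connection established/"):
    print(parse_filter_line(sample))
```

A line whose transformation list comes back empty is a match-only line; one with extra fields is a transformation line, and those fields apply to the leftmost keystring characters.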

We can now configure outputs as follows: (Remember that all traffic was forced to extra0 by the input statement above.)

# We need to use 2 '#' characters to force a comment below since first token after the '#' is a valid facility token.
## extra4.* is the 'logged in' messages from other than rtr1506, extra0 is everything else from that host (those hosts).
# assuming that rtr1506 is the internet router and the only other host logging to this address 10.4.2.4 is a pix:

#extra0.*;extra4.*         - /var/opt/networklogs/AllPixData.log, uid=root,gid=netstaff,mode=0640 # unfiltered -- log everything to disk
#extra0.*;extra4.*         @remotehost.domain,port=500, stream, ver=4, filter=outFilter1  # a (heavily) filtered TCP connection to port 500

## extra12 is the 'logged in' messages extracted from extra0 by the input filter, extra2 is the remaining rtr1506 msgs extracted by the input filter.
#extra12.*;extra2.*         - /var/opt/networklogs/AllRouter.log, uid=root, gid=netstaff, mode=0640, filter = outFilter3
#extra12.*;extra2.*         @remotehost.domain,port=500, stream, ver=4, filter=outFilter3  # a (heavily) filtered TCP connection to port 500

#extra4.*;extra12.*           - /var/opt/networklogs/AllUserLogins.log, uid=root, gid=netstaff,mode=0640
#extra4.*;extra12.*           @SecurityMonitorHost.domain,port=1000,stream,version=4, filter=outFilter2 #send same data to security office monitor

outFilter2's job is to convert facilities back to something the commercial log-analyser used by security can handle (local7) and looks like this:

# File outFilter2
dp
f /*/local7/

outFilter1's job is to reduce the traffic flow from the combined pix and router to something manageable. It discards all messages containing common strings and changes the remaining messages to facility local6 for a network-log-analyzer/alerting system. It might look like this:

# File outFilter1 -- outFilter3 would be similar, but tailored to router messages.
dp
# This is a match-only filter line that drops any message matching the string "connection established"
c /connection established/

c /connection terminated/
c /embryonic connection/
c /connection denied/
...
# Finally, convert remaining traffic to facility local6 because that's what the receiver expects.
f /*/local6/

Spool Files

Now that we know how to configure TCP and streaming output Linux sockets, it's time to think about the challenge of broken network links disrupting our syslog data transfers. The UDP protocol has no method to detect or re-transmit errors or dropped traffic during transmission, which makes it ideal for many uses, but there are some cases where the syslog traffic we have collected absolutely, positively MUST get through -- even if it is late. To address these requirements, we can use a TCP/Streaming Linux socket connection in combination with a local spool-file for those times when we are unable to reliably transmit data to our receiver.

Another reason for using TCP vs UDP traffic for forwarding data is to reduce the load on network router, switch and firewall CPUs. These devices are usually designed with slower, cheaper (sometimes underpowered ?) CPUs than host systems use. A large amount of UDP traffic forces them to use their CPUs to constantly check their rule-bases / router tables to see where each individual UDP packet should be routed or whether it should be dropped. TCP connections by contrast usually involve a single check of configuration tables followed by pushing the approved connection down to the hardware level where any further traffic between the two TCP ports need not be inspected because the connection has already been approved. The use of the TCP protocol for transmitting bulk syslog traffic can therefore greatly reduce the CPU burden on routers and firewalls.

Whenever Syslogd2 has a streaming connection (TCP, Streaming Linux or Pipes), it can detect when that connection fails and take action to attempt to re-establish the connection. If the connection cannot be immediately re-established, Syslogd2 can write the data to a disk-based "Spoolfile" for later transmission once the connection HAS been re-established. Configuring this capability requires a directory to store files in (known as spooldir) and the CAP_SPOOLFILES CAP_-ability (verify with the --version command-line option). Each streaming connection that is to use spooling in the event of connection failure must include the suboption --spoolfile or -sf in the output destination's option list. The --spoolfile option takes an optional parameter, a 'suggested' filename that Syslogd2 will attempt to honor in the absence of a name-conflict. If a spoolfile name is suggested by the user, it is relative to the spooldir directory, which defaults to /var/spool/syslog.

When spooling is activated and configured, the name of the spoolfile (as well as the connection it is 'backing up') is considered a destination option, while the decision whether to spool a particular message that has matched a particular selector specification is a selector-specific decision. This means that multiple selectors that share a common destination (perhaps by different names or addresses) may legally make different choices on whether or not to spool data if the data cannot be immediately transmitted.
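The spool-and-replay idea can be sketched in Python as follows. This is a toy model under stated assumptions, not Syslogd2's spooling code: the class name is hypothetical, the link is any callable that raises OSError while the connection is down, and the per-call spool flag stands in for the selector-specific decision described above.

```python
import os


class SpooledSender:
    """Send lines over a stream; park them on disk while the link is down."""

    def __init__(self, send, spool_path):
        self.send = send              # callable(bytes); raises OSError on failure
        self.spool_path = spool_path  # one spool file per destination

    def flush(self):
        """Replay spooled lines in order. Re-raises OSError if the link is
        still down, keeping the unsent lines in the spool file."""
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, "rb") as fh:
            lines = fh.read().splitlines()
        for index, line in enumerate(lines):
            try:
                self.send(line)
            except OSError:
                with open(self.spool_path, "wb") as fh:
                    fh.writelines(item + b"\n" for item in lines[index:])
                raise
        os.remove(self.spool_path)
        return len(lines)

    def deliver(self, line, spool=True):
        """Return 'sent', 'spooled' or 'dropped' for one message."""
        try:
            self.flush()              # older data goes out first
            self.send(line)
            return "sent"
        except OSError:
            if not spool:
                return "dropped"      # this selector chose not to spool
            with open(self.spool_path, "ab") as fh:
                fh.write(line.rstrip(b"\n") + b"\n")
            return "spooled"
```

Flushing before each send keeps spooled data ahead of new data, so the receiver sees messages late but in their original order.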

Moving on to [Tour Syslogd2 - Part 2] . . .
