DBD2 Wiki

A multi-threaded, multi-database tcp-based database insertion app.

Status: Beta

Brought to you by: efreesmeyer

TermsConcepts

CAP_*-abilities

The term CAP_*-abilities refers to the capabilities of DBD2 that are conditionally compiled into the code when certain compiler symbols are declared. Since these "special" compiler symbols all follow the format "CAP_*", the term CAP_*-abilities was coined.

The CAP_* declarations can be found in the shared/defines.h file in the DBD2 source-code directory. At the top of this file is a list of all defined CAP_* declarations and brief descriptions of what they do.

Immediately below the definitions, you will find a list of entries of the format "#define CAP_ ...". Any line starting with two forward slashes ('//') is commented out and will be ignored.

Adjust the list of '#define' statements (adding, removing, or commenting out entries) to select the desired capabilities of the DBD2 server. After selecting the CAP_*-abilities and saving the file, re-run the make command to re-compile the code with the revised feature set, then redeploy the re-compiled server. (Note that CAP_*-abilities do not apply to the dbcl client code, so dbcl will not need to be re-deployed.)
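Before re-running make, it can help to check which capabilities are currently switched on. The following is a minimal sketch, not part of DBD2: it scans text in the style of shared/defines.h and reports the CAP_* symbols that are not commented out. The sample contents and the CAP_EXAMPLE_* names are invented for illustration; only CAP_SYSLOG is mentioned elsewhere on this page.

```python
import re

# Hypothetical excerpt in the style of shared/defines.h; the CAP_EXAMPLE_*
# names are invented for illustration and are not real DBD2 symbols.
SAMPLE_DEFINES = """\
#define CAP_SYSLOG
// #define CAP_EXAMPLE_A
#define CAP_EXAMPLE_B
"""

# A line is active only if '#define' is the first token on it; a leading
# '//' turns the whole line into a comment, so it will not match.
ACTIVE_CAP = re.compile(r"^\s*#\s*define\s+(CAP_\w+)")


def active_capabilities(text):
    """Return the CAP_* symbols that are defined and not commented out."""
    found = []
    for line in text.splitlines():
        match = ACTIVE_CAP.match(line)
        if match:
            found.append(match.group(1))
    return found


print(active_capabilities(SAMPLE_DEFINES))  # ['CAP_SYSLOG', 'CAP_EXAMPLE_B']
```

To check a real source tree, pass the contents of shared/defines.h to active_capabilities() instead of the sample string.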

Connection-Spec

A "ConnectionSpec" is any input-source or output-target specification plus all connection-specific options specified for that source or target. Because a ConnectionSpec is a convenient place to configure every connection-specific parameter, it is somewhat difficult to define with specificity. A ConnectionSpec will frequently contain the threadpool assignment for a data source or target, as well as other DBD2-specific settings that manage other aspects of DBD2 processing. Connection-specific settings include stream-vs-datagram, IP port, IP address-family (if an IP hostname), database-connection-count and connection-control information (if a HostSpec), uid, gid and mode information (if a FileSpec or LogSpec), and other common information of a connection-specific nature (such as filter or spool settings in future releases).
A ConnectionSpec is used to consolidate all connection-specific settings (database, threadpool, and DBD2-options) for input and output connections into a single parameter -- especially in the global section where such consolidation prevents confusion as to which parameters apply to which inputs.

Specific variations of the term "<Connection-Spec>" include "<File-Spec>", "<Log-Spec>", "<Destination-Spec>", "<HostSpec>", "<InputSpec>", etc.

<File-Spec>::= An absolute filepath combined with optional uid, gid, mode, threadpool and other options.
<Log-Spec>::= An absolute filepath to a log file (stderr or stdout) with optional uid, gid, mode and level values. A <LogSpec> differs from a <FileSpec> in that the former is limited to uid, gid, mode and logging-level options, while the latter is expected to contain additional options (such as threadpool specifiers).
<Log-Spec> levels are defined to be the same as syslog priorities.
<HostSpec>::= An output line that provides all necessary information to establish an output connection to a database as well as optional DBD2 processing-control settings. Each output section in the configuration file must contain either a <FileSpec> or a <HostSpec>.
<Output-Spec>::= A generic term that covers both <HostSpec> and <FileSpec> and designates the primary connection of a configuration-file output section.
<InputSpec>::= An input specification that defines a socket input source plus all included options. The name comes from the Input= keyword that defines it.

Template Strings

A Template String is a string (no spaces or delimiters) of Template-Identifiers.
A Template-Identifier is an output destid letter + a template-index-number within that output definition.

A Template-Identifier of "a1" indicates the template with template-id "1" in the database (output section) whose destid is 'a'. A Template String of "a1b2c12a2" sends data to 4 distinct templates in 3 distinct databases (database 'a' has 2 different templates in use).
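The decomposition described above can be sketched as a small parser. This is an illustration rather than DBD2's own code, and it assumes that a destid is a single ASCII letter and that a template index is a run of decimal digits.

```python
import re

# One Template-Identifier: a single destid letter followed by a template index.
# Assumption: destids are ASCII letters; DBD2 itself may restrict this further.
IDENTIFIER = re.compile(r"([A-Za-z])(\d+)")


def parse_template_string(template_string):
    """Split a Template String into (destid, template_index) pairs."""
    pairs = []
    pos = 0
    while pos < len(template_string):
        match = IDENTIFIER.match(template_string, pos)
        if match is None:
            raise ValueError(
                f"unexpected character at offset {pos} in {template_string!r}"
            )
        pairs.append((match.group(1), int(match.group(2))))
        pos = match.end()
    return pairs


pairs = parse_template_string("a1b2c12a2")
print(pairs)                             # [('a', 1), ('b', 2), ('c', 12), ('a', 2)]
print(len(set(pairs)))                   # 4 distinct templates
print(len({dest for dest, _ in pairs}))  # 3 distinct databases
```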

SQL Variables

An SQL variable is a placeholder in a template-string that references a globally-defined sql-variable.
An SQL variable consists of a dollar-sign ('$') followed by a positive integer (an sql-number). An sql-number is globally unique and is used as an identifier into the global list of variable-containers.
A variable-container is a placeholder for a user-specified value (or a calculated value) identified either by an SQL variable or by a list of names <name> that are matched to incoming <name>=<value> pairs. Variable-containers are defined with the var keyword:

var = <sql number> = <non-case-sensitive-comma-separated-name-list>
var = 1 = jobid, jobnumber, job
var 2 userid, username
var 3 $source_host
var 4 $eventTime

If a variable-container name-list starts with a name that begins with a '$', it is a calculated value (an envelope variable). The name (minus the '$') is matched against the list of defined envelope-names, and the corresponding definition is used to calculate the value.
Envelope variables are defined with the '$' keyword:

$ = host = source, string, fqdn = source_host
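To make the relationship between sql-numbers, name lists and envelope variables concrete, here is a minimal sketch of how the var lines above could be read. It is not DBD2's parser: the regular expression simply accepts both spellings shown in the examples (with and without equal-signs), and the dictionary layout is an assumption made for illustration.

```python
import re

# Accepts both spellings shown above: "var = 1 = a, b" and "var 2 a, b".
VAR_LINE = re.compile(r"^var\s*=?\s*(\d+)\s*=?\s*(.+)$", re.IGNORECASE)


def parse_var_line(line):
    """Return (sql_number, names, envelope_name) for one 'var' line."""
    match = VAR_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a var line: {line!r}")
    # Names are not case-sensitive, so normalise them to lower case.
    names = [n.strip().lower() for n in match.group(2).split(",") if n.strip()]
    if not names:
        raise ValueError(f"empty name list: {line!r}")
    # A leading '$' on the first name marks a calculated (envelope) value.
    envelope = names[0][1:] if names[0].startswith("$") else None
    return int(match.group(1)), names, envelope


CONFIG = [
    "var = 1 = jobid, jobnumber, job",
    "var 2 userid, username",
    "var 3 $source_host",
    "var 4 $eventTime",
]

containers = {}      # sql-number -> container description (hypothetical layout)
name_to_number = {}  # incoming <name> -> sql-number
for line in CONFIG:
    number, names, envelope = parse_var_line(line)
    containers[number] = {"names": names, "envelope": envelope}
    if envelope is None:
        for name in names:
            name_to_number[name] = number

print(name_to_number["jobnumber"])  # 1, so an incoming JobNumber=<value> fills $1
print(containers[3]["envelope"])    # source_host
```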

Envelope Variables

There are 4 parts to an envelope variable definition:

(1) The '$' keyword that specifies we are defining an envelope variable.
(2) A 'field' selector. This can be one of:

time: All time-related values.
host: All host-related values.
templatekey: Set a new list of names that identify incoming template-key-string values.
rawinput: The entire (unprocessed) input string as received at the input source.
If CAP_SYSLOG is defined:
facpri: Syslog facility-priority values and derivatives.
msg: The syslog message-field.

(3) A list of formatting options.

raw: A placeholder option to get "unprocessed" data. No changes are made to the "raw" input values. This is the only valid formatting option for the rawinput and msg fields. For syslog fields, it specifies raw (as received) formatting. For time fields, it yields numeric data for server-time and unprocessed data for syslog-time; it is undefined for source-time.
int, numeric: Numeric output: GMT System-clock time for time fields, IP Addresses for host fields, numeric values for facpri fields.
string: All non-numeric (string) formats -- can be modified by other option values. (facpri names, hostnames, formatted time values, etc.)
<host-selection>: One of "source" for the host that directly connected to the DBD2 server, "server" for the DBD2 server itself, or "syslog" for the host-field from the syslog message.
fqdn: For host fields with string option only: specifies a Fully-Qualified-Domain-Name string format.
hostonly, nodomain: For host fields with string option only: specifies the hostname with all domain components stripped off.
IPv4: For host fields with numeric option only: specifies the "official" IPv4 address of the host.
IPv6: For host fields with numeric option only: specifies the "official" IPv6 address of the host.
gmt: For time fields only: specifies the time in the GMT (UTC) timezone -- regardless of the display format.
local: For time fields only: specifies the time in the local timezone -- regardless of the display format.
dbtime, db: For time fields with string option only: specifies time format as "yyyymmddHHMMSS".
syslogtime: For time fields with string option only: specifies time format as "mmm dd HH:MM:SS" -- the 'standard' syslog time-field format.
rfctime, rfc: For time fields with string option only: specifies RFC 3339 time format: "yyyy-mm-ddTHH:MM:SSZ" with the gmt tag or "yyyy-mm-ddTHH:MM:SS[+/-]HH:MM" with the local tag.
nofacility, nofac: For facpri fields only: specifies that only the priority value is to be provided. (The default is to translate both).
nopriority, nopri: For facpri fields only: specifies that only the facility value is to be provided. (The default is to translate both).
=<TimeStringIndex>: For time fields with string option only. As an alternative to the pre-defined time-string formats, use "string=<timestring-index>" to specify a time-string format.

(4) A comma-separated list of names with which to identify this envelope-variable.

Examples:
$ = host = source, string, fqdn = source_host
$ = time = syslog, local, string=200 = flowery_local_time
The equal-sign before the name-list field is mandatory when using TimeString indices.
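The four parts listed above can be pulled out of a definition line with a few string operations. The sketch below is an illustration only, assuming the fully equal-sign-separated form used in the examples; the returned dictionary and the function name are hypothetical and do not reflect DBD2 internals.

```python
def parse_envelope_definition(line):
    """Split a '$' definition into its field, options and names."""
    text = line.strip()
    if not text.startswith("$"):
        raise ValueError(f"not an envelope definition: {line!r}")
    body = text[1:].lstrip()
    if body.startswith("="):          # the '=' after the '$' keyword
        body = body[1:]
    # The field ends at the first '='; the name list starts after the last
    # one, so an option such as "string=200" stays inside the option list.
    field, first_eq, rest = body.partition("=")
    options_text, last_eq, names_text = rest.rpartition("=")
    if not first_eq or not last_eq:
        raise ValueError(f"expected '$ = field = options = names': {line!r}")
    options = [o.strip() for o in options_text.split(",") if o.strip()]
    names = [n.strip() for n in names_text.split(",") if n.strip()]
    # Optional TimeString index carried by a "string=<index>" option.
    index = None
    for option in options:
        key, eq, value = option.partition("=")
        if eq and key.strip() == "string":
            index = int(value)
    return {
        "field": field.strip(),
        "options": options,
        "names": names,
        "timestring_index": index,
    }


host = parse_envelope_definition("$ = host = source, string, fqdn = source_host")
when = parse_envelope_definition(
    "$ = time = syslog, local, string=200 = flowery_local_time"
)
print(host["field"], host["options"], host["names"])
print(when["options"], when["timestring_index"])
```

Splitting on the first and last equal-sign is what makes the final '=' significant when a "string=<index>" option is present, which matches the note under the second example.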

Time Strings

TimeStrings are defined using the timestring keyword. The parameter is a single "<TimeStringIndex> = <format>" value pair, where <TimeStringIndex> is a positive integer and <format> is an unquoted format string using strftime() conversion specifiers. (See the strftime manual page for valid syntax and parameters.)
Since the TimeString value is an actual format string for time values, literal text may be included anywhere in the string:

timestring = 100 = Test time 1: %w %b %e %H:%M:%S %Y --> "Test time 1: 5 Jan 1 22:23:13 2021"
timestring = 200 = %H%M hours and %S seconds on %A, %B %e in the year of our Lord: %Y --> "2223 hours and 13 seconds on Friday, January 1 in the year of our Lord: 2021"
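Because the format is handed to strftime(), a definition can be tried out before it goes into the configuration file. The following sketch is not DBD2 code: it splits a timestring line into its index and format and applies the format with Python's datetime.strftime, which defers to the platform's strftime(). The function name is hypothetical.

```python
from datetime import datetime


def parse_timestring(line):
    """Return (index, format) from a 'timestring = <index> = <format>' line."""
    keyword, _, rest = line.partition("=")
    if keyword.strip().lower() != "timestring":
        raise ValueError(f"not a timestring line: {line!r}")
    # Split on the first remaining '=' only, so the format may contain '='.
    index_text, eq, fmt = rest.partition("=")
    if not eq:
        raise ValueError(f"missing '= <format>' in {line!r}")
    return int(index_text), fmt.strip()


index, fmt = parse_timestring(
    "timestring = 100 = Test time 1: %w %b %e %H:%M:%S %Y"
)
stamp = datetime(2021, 1, 1, 22, 23, 13)  # the instant used in the examples above
print(index, repr(fmt))
print(stamp.strftime(fmt))
```

Note that %e pads single-digit days with a space on common C libraries, so the printed result may contain two spaces before the day number. %e is also not available on every platform.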

ThreadPools

A Threadpool in DBD2 is defined the same way it is defined in Syslogd2.

A ThreadPool consists (primarily) of some number of processing threads designated as "reader-threads", "worker-threads", or "output-threads" depending on the role they serve and the algorithm they execute in DBD2. Reader-threads obtain their data from external sources (a virtual FIFO queue), so they have no need of an explicit, application-defined FIFO (First-In/First-Out) queue to act as a source of incoming "work". Once data is read by the reader-threads, it is queued to a FIFO queue belonging to a worker-threadpool for further processing. After the data has been processed and is ready to be written out, it is queued to one or more FIFO queues belonging to output-threadpools, where output-threads do the output-specific work of final preparation and actual writing to the destinations.

A ThreadPool, then, is a group of some number of threads and the associated input FIFO queue that are dedicated to running a specific algorithm on each input message before forwarding it to the next stage of processing (or final destination). This definition treats the network buffers / Linux socket buffers as the (virtual) "FIFO queue" for reader-threads. Each ThreadPool in DBD2 is configured with a numeric identifier, a "threadcount", some number of "lines" (or message-slots) in the associated FIFO queue, and a destination for processed data.

Reader-threadpools use external data sources as their (virtual) FIFO buffer for input and the "queue" option to designate which worker-thread to send their output to.
Worker-threadpools use the processed Template-String values to determine which output destinations to send data to, so have no need of a configured output queue.
Output-threadpools are the last stage of processing in DBD2. Their "processing step" is to dispose of the message data. Therefore there is no "next stage" of processing and no output queue specifiers.
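The three-stage flow described above can be modelled with ordinary queues and threads. The sketch below is a toy model, not DBD2's implementation: the reader role is played by a plain loop over an in-memory list (standing in for the socket buffer), destids 'a' and 'b' are hypothetical, and each output pool has a single thread that "disposes" of a message by storing it in a list.

```python
import queue
import threading

STOP = object()  # sentinel that tells a thread to exit


def worker(work_q, output_qs):
    """Worker-thread: route each message to the output queue of each destid."""
    while True:
        item = work_q.get()
        if item is STOP:
            break
        destids, payload = item
        for destid in destids:
            output_qs[destid].put(payload)


def output(out_q, sink):
    """Output-thread: last stage, so it only disposes of (stores) the message."""
    while True:
        item = out_q.get()
        if item is STOP:
            break
        sink.append(item)


def run_pipeline(messages, worker_count=2):
    work_q = queue.Queue(maxsize=16)  # "lines" in the worker pool's FIFO queue
    output_qs = {"a": queue.Queue(), "b": queue.Queue()}
    sinks = {destid: [] for destid in output_qs}

    workers = [
        threading.Thread(target=worker, args=(work_q, output_qs))
        for _ in range(worker_count)
    ]
    outputs = [
        threading.Thread(target=output, args=(out_q, sinks[destid]))
        for destid, out_q in output_qs.items()
    ]
    for thread in workers + outputs:
        thread.start()

    # Reader role: the iterable stands in for the (virtual) socket-buffer FIFO.
    for message in messages:
        work_q.put(message)

    # Shut down stage by stage so no message is left behind in a queue.
    for _ in workers:
        work_q.put(STOP)
    for thread in workers:
        thread.join()
    for out_q in output_qs.values():
        out_q.put(STOP)
    for thread in outputs:
        thread.join()
    return sinks


result = run_pipeline([("a", "m1"), ("ab", "m2"), ("b", "m3")])
print({destid: sorted(items) for destid, items in result.items()})
```

With more than one worker-thread the arrival order at each output is not guaranteed, which is why the result is sorted before printing.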
