Algorithm

Robin

Back-end Algorithm

Input: 5min flow file
For each flow record:
* Parse line
* Exclude if it's not TCP or UDP
* Exclude if the source or the destination does not belong to a defined subnet
* Store the source and destination end points in memory
* Store the unidirectional flow in memory (update an existing unidir. flow if one has already been defined)
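
The first pass above can be sketched as follows. This is only an illustration under stated assumptions: the flow file format is not given here, so the comma-separated field layout, the `DEFINED_SUBNETS` list, the `UniFlow` record and the function names are all hypothetical. The sketch also assumes a record is dropped when either end point falls outside the defined subnets, and that "updating" an existing flow means keeping the earliest timestamp and summing the counters.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical configuration: which subnets count as "defined".
DEFINED_SUBNETS = [ipaddress.ip_network("10.0.0.0/8")]

@dataclass
class UniFlow:
    first_seen: float  # earliest timestamp seen for this flow
    packets: int
    octets: int

def in_defined_subnet(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DEFINED_SUBNETS)

def load_flows(lines):
    """Assumed line layout: ts,proto,src_ip,src_port,dst_ip,dst_port,packets,bytes"""
    endpoints = set()
    flows = {}  # (source end point, destination end point) -> UniFlow
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 8:
            continue  # skip malformed lines
        ts, proto, src_ip, src_port, dst_ip, dst_port, pkts, octets = fields
        proto = proto.upper()
        if proto not in ("TCP", "UDP"):
            continue  # only TCP and UDP are kept
        if not (in_defined_subnet(src_ip) and in_defined_subnet(dst_ip)):
            continue  # assumption: both ends must be in a defined subnet
        src = (src_ip, int(src_port), proto)
        dst = (dst_ip, int(dst_port), proto)
        endpoints.update((src, dst))
        flow = flows.get((src, dst))
        if flow is None:
            flows[(src, dst)] = UniFlow(float(ts), int(pkts), int(octets))
        else:
            # Update the existing unidirectional flow in place.
            flow.first_seen = min(flow.first_seen, float(ts))
            flow.packets += int(pkts)
            flow.octets += int(octets)
    return endpoints, flows
```

Keying the dictionary on the ordered (source, destination) pair makes the later lookup of a mirror flow a single dictionary access on the reversed key.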

For each unidirectional flow in memory:
* Check if a valid mirror unidirectional flow exists:
* If yes: merge the two flows to generate a bidirectional flow
    * Run each heuristic on the source and destination end points
    * Compute a detection probability using Bayesian inference run on the heuristic outputs
    * Label source and destination end points according to the detection probability (either client or server)
* If no: this flow is only unidirectional
    * Label source end point as client/scanner and destination end point as invalid
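
A minimal sketch of this second pass is given below. The page does not specify the form of the Bayesian inference or what makes a mirror flow "valid", so the sketch makes two assumptions: a mirror flow is simply a flow stored under the reversed key, and the heuristic outputs are combined with a naive Bayes rule that treats heuristics as independent. The function names, the vote encoding (+1, -1, 0) and the per-heuristic `accuracy` table are hypothetical.

```python
def server_probability(votes, accuracy, prior=0.5):
    """Probability that the first end point is the server.

    votes: heuristic name -> +1 (first end point is the server),
           -1 (second end point is the server), 0 (no decision).
    accuracy: heuristic name -> assumed probability that the heuristic is right.
    """
    odds = prior / (1.0 - prior)
    for name, vote in votes.items():
        p = accuracy[name]
        if vote > 0:
            odds *= p / (1.0 - p)
        elif vote < 0:
            odds *= (1.0 - p) / p
        # vote == 0: the heuristic abstains and leaves the odds unchanged
    return odds / (1.0 + odds)

def pair_flows(flows):
    """Split unidirectional flows into mirrored pairs and unmatched flows."""
    pairs, unmatched, seen = [], [], set()
    for src, dst in flows:
        if (src, dst) in seen:
            continue  # already merged as the mirror of an earlier flow
        if (dst, src) in flows:
            pairs.append((src, dst))
            seen.add((dst, src))
        else:
            unmatched.append((src, dst))
    return pairs, unmatched

def label_endpoints(flows, votes_for, accuracy):
    """votes_for(a, b) returns the vote dictionary for a bidirectional flow."""
    labels = {}
    pairs, unmatched = pair_flows(flows)
    for a, b in pairs:
        p = server_probability(votes_for(a, b), accuracy)
        if p > 0.5:
            labels[a], labels[b] = "server", "client"
        elif p < 0.5:
            labels[a], labels[b] = "client", "server"
    for src, dst in unmatched:
        # Simplification: a later label overwrites an earlier one; the page
        # does not say how conflicting labels are resolved.
        labels[src] = "client/scanner"
        labels[dst] = "invalid"
    return labels
```

With a prior of 0.5 and all heuristics abstaining, the probability stays at 0.5 and the end points are left unlabeled in this sketch.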

For each end point in memory:
* Compute metrics for this end point
* Output the node to a result file

Client/Server Detection Heuristics

  • H.0 Flow timing. Let t1 and t2 be the timestamps of
    the two unidirectional flows that make up a bidirectional
    flow. The source of the flow with the larger (more
    recent) timestamp is likely the server. The difference
    between t1 and t2 indicates how likely this heuristic
    is to identify the correct end point as the server. If
    the timestamps are identical, they cannot be used to
    decide which end point is the server.
  • H.1 Port number. Let p1 and p2 be the port numbers
    associated with a bidirectional flow. The end point
    with the smaller port number is likely the server. If
    the port numbers are identical, they cannot be used
    to decide which end point is the server.
  • H.2 Port number with threshold at 1024. If an end point
    has a port number lower than 1024, then it is likely
    a server. The value of 1024 corresponds to the limit
    under which ports are considered privileged and
    designated for well-known services. If both ports
    are below 1024, or both are 1024 or above, this
    heuristic cannot be used to decide which end point
    is the server.
  • H.3 Port number advertised in /etc/services. If the port
    number of an end point is listed in the standard
    UNIX file /etc/services, which compiles assigned and
    registered port numbers, then it is likely a server.
    If both port numbers or neither of them are listed in
    /etc/services, this heuristic cannot be used to decide
    which end point is the server.
  • H.4 Number of distinct ports related to a given end
    point. If two or more different port numbers (in
    different flows) are associated with an end point, the
    end point is likely a server. The number of different
    port numbers related to an end point indicates how
    likely this heuristic is to correctly identify the
    server. This heuristic comes from the fact that ports
    on the client side are often randomly selected.
    Therefore, ports on the client side of a connection
    are less likely to be used in other connections than
    ports on the server side. If both end points are
    related to the same number of ports, then this
    heuristic cannot be used to decide which end point is
    the server.
  • H.5 Number of distinct IP addresses related to a given
    end point. This heuristic is identical to the previous
    one but counts IP addresses instead of ports.
  • H.6 Number of distinct tuples related to a given end
    point. This heuristic is identical to the previous
    one but counts end points instead of single IP
    addresses. This heuristic is based on the observation
    that each server typically has two or more clients
    that use the service. Furthermore, even if only one
    real user accesses the service (e.g., identified by the
    IP address of the user’s machine), the communication
    will likely require multiple connections and the
    client side of the access often uses different port
    numbers. Thus, multiple end points will be detected
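
Several of the heuristics above reduce to a comparison between the two end points of a bidirectional flow. The sketch below illustrates H.0 to H.4 as vote functions that return +1 when the first end point looks like the server, -1 when the second does, and 0 when the heuristic cannot decide. The function names and the vote encoding are hypothetical, the set of known ports for H.3 is passed in rather than read from /etc/services, and the sketch ignores the part of H.0 and H.4 where the size of the difference indicates how reliable the vote is.

```python
from collections import defaultdict

def sign(x):
    return (x > 0) - (x < 0)

def h0_timing(t_a, t_b):
    # t_a: timestamp of the flow whose source is end point a (same for b).
    # The source of the later flow is likely the server.
    return sign(t_a - t_b)

def h1_port(port_a, port_b):
    # The end point with the smaller port number is likely the server.
    return sign(port_b - port_a)

def h2_privileged(port_a, port_b, threshold=1024):
    # Decides only when exactly one port is below the threshold.
    return int(port_a < threshold) - int(port_b < threshold)

def h3_services(port_a, port_b, known_ports):
    # known_ports: set of port numbers, e.g. parsed from /etc/services.
    return int(port_a in known_ports) - int(port_b in known_ports)

def peer_port_counts(pairs):
    """Number of distinct peer ports seen for each end point (ip, port, ...)."""
    peers = defaultdict(set)
    for a, b in pairs:
        peers[a].add(b[1])
        peers[b].add(a[1])
    return {endpoint: len(ports) for endpoint, ports in peers.items()}

def h4_distinct_ports(a, b, counts):
    # The end point related to more distinct ports is likely the server.
    return sign(counts.get(a, 0) - counts.get(b, 0))
```

H.5 and H.6 could follow the same pattern as `h4_distinct_ports`, counting distinct peer IP addresses or distinct peer end points instead of peer ports.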


