Aleksi Kallio

Chipster security architecture

This document describes the key areas that form the Chipster security architecture. Potential attack scenarios are also discussed for each area. Before reading this document, it is recommended to read at least [ArchitectureDescription].

Secure communications

  • All communication through brokers
    • Only brokers have listening sockets
    • All listening ports are handled by common Java class library protocol implementations (TCP/IP, SSL, HTTP)
    • Broker host is considered untrusted
    • It is recommended to run Tripwire or similar periodic checks on the broker host
  • Communication is done only with OpenWire (ActiveMQ) and HTTP (Jetty) protocols
  • Strongly encrypted protocols are used by default
    • Defaults: SSL for OpenWire, RC4 for HTTP
  • Hostile party sends a tampered message to broker
    • Compared to web servers and HTTP, the OpenWire protocol and ActiveMQ have a significantly smaller attack surface
    • JMS temporary topics are used for authenticated communication: to forge the identity of the sender, the attacker must be able to find out the strongly random topic identifier

Data confidentiality

The administrator can configure Chipster to use any cipher supported by the JCE implementation. The default cipher for data encryption is RC4 with 128-bit keys. RC4 was chosen for performance reasons. The main weaknesses of RC4 are the non-randomness of the first bytes of the keystream and the leaking of key information when long-term keys are used. In the Chipster setup keys are not long-term: each file of each user has a different key. Java SSL also typically uses RC4/128 together with RSA.

Stronger encryption can be achieved by using the AES algorithm with 256-bit keys. We have measured a drop of about XXX in transfer speed on a fast network when using AES/256; with the default setting of RC4 there is no performance drop in any realistic network (because the network is the bottleneck).

  • Attacker breaks into file broker host
    • No plain data is available
  • Attacker breaks into compute service host
    • Only plain data currently being processed is available
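The per-file key scheme described above can be sketched with the standard JCE API. This is only an illustration under stated assumptions, not Chipster's actual code: the class and method names are hypothetical, and the sketch uses AES in CBC mode (a transformation every Java platform is required to support) instead of the default RC4. The key size is a parameter, so 256 can be passed where the runtime allows it.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper (not a Chipster class): one fresh key per file.
final class PerFileEncryptionSketch {

    // Assumption: AES/CBC stands in for whatever cipher the administrator configures.
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a new random key; call once per file so that no key is long-term.
    static SecretKey newFileKey(int keyBits) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(keyBits, RANDOM);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Generates a random 16-byte initialisation vector (the AES block size).
    static byte[] newIv() {
        byte[] iv = new byte[16];
        RANDOM.nextBytes(iv);
        return iv;
    }

    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plain) {
        return run(Cipher.ENCRYPT_MODE, key, iv, plain);
    }

    static byte[] decrypt(SecretKey key, byte[] iv, byte[] encrypted) {
        return run(Cipher.DECRYPT_MODE, key, iv, encrypted);
    }

    private static byte[] run(int mode, SecretKey key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, key, new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a scheme like this, a host that stores only the output of `encrypt` holds no plain data, and disclosure of one key exposes a single file rather than a whole account.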

User authentication and authorisation

  • All authentication attempts are logged
    • A random session identifier is created and passed back to the client
    • Identifier is a UUID generated with java.util.UUID.randomUUID(): a 128-bit value, of which 122 bits come from a cryptographically strong pseudo-random number generator
    • Subsequent requests carry the session identifier, which is validated against the server's session identifier table
  • Server authentication
    • Secure random shared key is used to authenticate components to broker
  • If someone finds out private server certificate
    • Can deploy a modified version of the server component
  • If someone captures traffic between servers (bypassing SSL encryption somehow)
    • Uses public key based authentication, so the private key is not compromised
  • If someone captures traffic between client and server (bypassing SSL encryption somehow)
    • Random session identifier is transient, so the current session is compromised, but the whole user account is not
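The session identifier handling described above can be sketched as follows. This is a minimal illustration, not the Chipster implementation: the class `SessionTableSketch` and its methods are hypothetical, and only `java.util.UUID.randomUUID()` is taken from the text.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical session table (not a Chipster class).
final class SessionTableSketch {

    // Maps each live session identifier to the authenticated username.
    private final Map<UUID, String> sessions = new ConcurrentHashMap<>();

    // Called after a successful login; the identifier is returned to the client.
    UUID open(String username) {
        UUID id = UUID.randomUUID(); // version 4 UUID from a strong random source
        sessions.put(id, username);
        return id;
    }

    // Returns the username bound to the presented identifier,
    // or null if it is missing, malformed, or not in the table.
    String validate(String presentedId) {
        if (presentedId == null) {
            return null;
        }
        try {
            return sessions.get(UUID.fromString(presentedId));
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Removing the entry ends the session; a captured identifier is then useless.
    void close(UUID id) {
        sessions.remove(id);
    }
}
```

Because the table holds only transient identifiers and never the password, a captured identifier gives access to one session until `close` is called, not to the user account.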

Execution of external tools

  • User input to analysis programs is checked thoroughly to prevent code injection and other malicious content
    • There are three alternative policies for inserting parameters
    • Recommended: Parameters are not inserted into script code, but loaded directly into memory (e.g., Java based tools)
      • Simple exclusion filters or even whitelists can still be used, especially when parameters are used in downstream processing
    • Parameters are inserted into separated literals (such as "user input") AND separator character is strictly disallowed in all user input AND allowed characters are whitelisted
      • Even if escaping the literal is not possible, whitelisting guards against attacks at later stages of tool execution where user input is used in different contexts
    • Parameters are inserted directly to scripts, but only input that conforms to strict pattern is allowed (for example, decimal values)
      • Whitelisting allowed characters is not enough; the order of the characters must also be checked
      • As we don't have the security of separated literals in this case, we balance it with stricter pattern matching requirements
    • In all cases, the input is checked against a sensible maximum length (1000 characters by default)
    • As of now, other policies are not allowed (e.g., using only character whitelisting while inserting user input as is into an executable script)
  • Analysis server runs under one dedicated account and all calculation is done with that account
  • Only preconfigured analysis programs can be run, user cannot send own code to be run
  • Analyser can use external web servers for annotation, but handles them in such a way that compromised servers cannot be used for attacks
    • Does not store files using names given by external servers etc.
  • If there is a vulnerability in the analysis component allowing injection of hostile code, a modified analysis request could execute hostile code with analysis server's privileges
    • Request would have to go through many layers of checks
    • All logins are logged, so there would be a trace
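The second and third parameter insertion policies above, together with the maximum length check, can be sketched as follows. This is an illustration under assumptions rather than the Chipster code: the class name, the method names, the exact character whitelist and the decimal pattern are all hypothetical choices; only the 1000 character default and the double quote separator come from the text.

```java
import java.util.regex.Pattern;

// Hypothetical validator (not a Chipster class).
final class ParameterPolicySketch {

    static final int MAX_LENGTH = 1000; // default maximum from the text

    // Assumed whitelist for values placed inside a quoted literal.
    private static final Pattern LITERAL_WHITELIST = Pattern.compile("[A-Za-z0-9_. -]*");

    // Assumed strict pattern for bare insertion: checks character order, not only the set.
    private static final Pattern DECIMAL = Pattern.compile("-?[0-9]+(\\.[0-9]+)?");

    private static void checkLength(String value) {
        if (value == null || value.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("parameter missing or too long");
        }
    }

    // Policy: separated literal. The separator is disallowed AND characters are whitelisted.
    static String asQuotedLiteral(String value) {
        checkLength(value);
        if (value.indexOf('"') >= 0 || !LITERAL_WHITELIST.matcher(value).matches()) {
            throw new IllegalArgumentException("illegal character in parameter");
        }
        return "\"" + value + "\"";
    }

    // Policy: direct insertion. Only input matching the whole strict pattern is accepted.
    static String asBareDecimal(String value) {
        checkLength(value);
        if (!DECIMAL.matcher(value).matches()) {
            throw new IllegalArgumentException("parameter is not a decimal value");
        }
        return value;
    }
}
```

For example, `asBareDecimal("1..5")` is rejected even though every character is on a digits-and-dot whitelist, which is why the order of characters has to be checked when there is no surrounding literal.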

System reliability

  • Reliability is not critical in analysis environment
  • Monitoring system (Nagios) can be used to monitor hardware and the actual system
  • By flooding system with messages, the public broker can be DoS'ed
  • By flooding system with analysis requests, the analyser component can be DoS'ed
    • But requires user account/password or captured user session identifier
  • All of these scenarios are very far-fetched

Related

Wiki: ArchitectureDescription