ms-rfc:2
Subject: HTML Form Protection
Date: 19-07-2006
Obsoletes: none
Author: Carles Bonamusa cbonamusa@isecauditors.com

HTML Form Protection

Status of this Memo

This memo briefly describes a proposal for building HTML form protection facilities into mod_security.

Abstract

According to the HTML 4 specification, a form is a section of a document containing special elements, called controls, that allow users to enter data and submit it to an agent for processing (typically a web server). This is the most widely deployed mechanism in web applications whenever data must be gathered from the user.

This document presents an initial draft of ways to protect web forms from misuse through maliciously crafted requests issued by an attacker. The aim of these new mechanisms is to enforce the expected (correct) form data submission and to detect any suspicious data that deviates from the model expected by the web application.

HTML Forms

As stated above, forms are the mechanism HTML 4 defines to let web applications obtain data from the user. A form is delimited by a special HTML tag, <FORM>, and data is acquired through controls placed inside that section. The browser identifies these elements and renders them so that the user can enter data and finally submit it to a predefined URL.

To offer several ways of entering data, HTML provides various control types:

  • buttons
  • check boxes
  • radio buttons
  • menu
  • text
  • file selection
  • hidden fields
  • object
  • selection list

Each of these controls may receive a default value defined on the server side. The user then interacts with the controls rendered by the browser, changing their values, and finally submits the data by clicking a submit button.

When the user performs this last action, the HTML standard specifies how the browser must build the submission request. The request is based on the URL given in the form's action attribute. Every control that holds a value (a "successful control" in the terms of the standard) is then added to the request as a name=value pair.
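The rule deciding which controls end up in the request can be sketched as follows. This is a minimal Python illustration of a simplified subset of the HTML 4.01 successful-control rules; the `Control` class and the `successful_controls` function are hypothetical names introduced here, and file selections, objects and controls without a current value are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Control:
    """Hypothetical, simplified model of a rendered form control."""
    name: Optional[str]
    type: str              # "text", "hidden", "checkbox", "radio", "submit", ...
    value: str = ""
    checked: bool = False  # only meaningful for checkboxes and radio buttons
    disabled: bool = False


def successful_controls(controls: List[Control],
                        submitter: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return the name=value pairs a browser would add to the request."""
    pairs = []
    for c in controls:
        if not c.name or c.disabled:
            continue  # unnamed or disabled controls are never successful
        if c.type in ("checkbox", "radio") and not c.checked:
            continue  # only "on" checkboxes and the selected radio button count
        if c.type == "submit" and c.name != submitter:
            continue  # only the submit button that was activated is sent
        if c.type in ("reset", "button"):
            continue  # reset and push buttons are never submitted
        pairs.append((c.name, c.value))
    return pairs
```

For a form with a text field, an unchecked checkbox, a hidden field and two submit buttons, only the text field, the hidden field and the activated button are returned.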

There are two ways to send the gathered data back to the server, and one of them must be chosen when writing the form markup, through the method attribute. With the GET method, the browser builds the request as a URL from the form's action attribute, appending the name=value pairs of the successful controls as URL parameters. The other option is the POST method, where the form data set containing the controls' name=value pairs is sent in the body of the request instead of being appended to the request URL.
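As a sketch of the two submission methods, assuming the name=value pairs have already been collected, Python's standard `urllib.parse.urlencode` function produces the `application/x-www-form-urlencoded` encoding that HTML 4 uses for both methods. The `build_submission` helper is hypothetical; for GET it follows the HTML 4.01 wording of appending `?` and the encoded data set to the action URL.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def build_submission(action: str, method: str,
                     pairs: List[Tuple[str, str]]) -> Tuple[str, Optional[bytes]]:
    """Return (request URL, request body) for a form submission."""
    # HTML 4 encodes the form data set as application/x-www-form-urlencoded
    # for both methods; urlencode applies that encoding (spaces become '+').
    data = urlencode(pairs)
    if method.lower() == "get":
        # GET: the data set is appended to the action URL after a '?'.
        return f"{action}?{data}", None
    # POST: the URL is left unchanged and the data set travels in the body.
    return action, data.encode("ascii")
```

With GET the form data becomes visible in the URL; with POST the same encoded string is carried in the request body.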

Motivation

HTML forms are one of the most widespread ways for web applications to obtain data from the user, which is why they raise strong security concerns.

The main threat comes from the fact that gathering data and building the form request are tasks performed by the browser. A malicious attacker can therefore study the form definition (it is contained in the HTML code sent to the browser) and handcraft a request with modified control names and values, aiming to mislead and abuse the web application.

In addition, data validation lies outside the scope of the HTML standard and is clearly the application's responsibility, so every application must check the inputs it obtains via web forms. At best, this means every application must rewrite its data validation procedures for every form. Such tasks are tedious and therefore error prone. Moreover, under this model all security concerns are left to the application programmer, who must know not only normal HTML form behaviour but also every possible attack against web forms.

Taking all this into account, this document describes a mechanism for an intermediate layer that provides generic form protection by enforcing restrictions on form usage with the help of cryptographic functions.

Goals and restrictions

As already said, the main attack vectors against HTML forms involve supplying maliciously built data. There will therefore always be some degree of responsibility for data validation at the application level. In that field, mod_security already helps by allowing custom rules to be defined to protect each individual argument, but writing the appropriate rules requires knowing the application's behaviour.

That is not the goal of this effort; our goal is the opposite: to develop a generic shielding mechanism that needs little information about back-end applications and relies only on filtering, in an automated manner, the HTML code they serve.

Conversely, these new safety facilities must not interfere with normal application execution, so that applications can run unaware of the mod_security form protection features and need not be adapted in any way.

Given these restrictions, this proposal defines a mechanism that enforces the structural correctness of submitted form data based only on the form structure identified through static analysis of the HTML form definition.

Security Flaws

With all the previous statements in mind, the following list enumerates the kinds of anomalous data submissions that can clearly be identified as neither normal nor correct (suspicious enough to be treated as attacks) just by looking at the original form declaration:

  • Addition of parameters not present in the original form definition.
  • Modification of predefined values for hidden fields.
  • Modification of available option values inside a select control.
  • Modification of action url where the form is submitted.
  • Exceeding the size defined for a control.
  • Attempting to use the wrong submission method.

Protection Countermeasures

All of the previous flaws can easily be identified if some data about the form structure is stored when the form is sent to the user. The idea is therefore to filter the outgoing HTML code issued by the back-end server, searching for form definitions. Whenever a form tag is found, a form-description block is built for every submit button defined (since each may correspond to a different URL in the action attribute), containing the following data:

  • A list of defined parameters (name, type, and size if one is defined)
  • A list of hidden fields and their respective preassigned values.
  • For each select control, a list of its predefined available options.
  • The submission method.
  • The action URL.
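A minimal sketch of this static analysis step, using Python's standard `html.parser` module, could look like the following. `FormDescription` and `FormExtractor` are hypothetical names; `maxlength` is assumed to be the "size" worth enforcing (the HTML `size` attribute only sets the display width); and, for brevity, one description is built per form rather than per submit button. Options without an explicit `value` attribute are skipped.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Dict, List, Optional, Set


@dataclass
class FormDescription:
    """Hypothetical form-description block built from outgoing HTML."""
    action: str
    method: str
    params: Dict[str, str] = field(default_factory=dict)        # name -> control type
    max_sizes: Dict[str, int] = field(default_factory=dict)     # name -> maxlength
    hidden: Dict[str, str] = field(default_factory=dict)        # name -> preassigned value
    options: Dict[str, Set[str]] = field(default_factory=dict)  # select name -> allowed values


class FormExtractor(HTMLParser):
    """Collects one FormDescription per <form> element found in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.forms: List[FormDescription] = []
        self._form: Optional[FormDescription] = None
        self._select: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._form = FormDescription(
                action=a.get("action") or "",
                method=(a.get("method") or "get").lower())  # HTML 4 default is GET
            self.forms.append(self._form)
            return
        if self._form is None:
            return  # controls outside any form are ignored
        name = a.get("name")
        if tag in ("input", "textarea", "select", "button") and name:
            ctype = (a.get("type") or "text").lower() if tag == "input" else tag
            self._form.params[name] = ctype
            maxlength = a.get("maxlength") or ""
            if maxlength.isdigit():
                self._form.max_sizes[name] = int(maxlength)
            if ctype == "hidden":
                self._form.hidden[name] = a.get("value") or ""
            if tag == "select":
                self._select = name
                self._form.options.setdefault(name, set())
        elif tag == "option" and self._select is not None:
            value = a.get("value")
            if value is not None:  # options without a value attribute are skipped
                self._form.options[self._select].add(value)

    def handle_endtag(self, tag):
        if tag == "form":
            self._form = None
            self._select = None
        elif tag == "select":
            self._select = None
```

Feeding a served page to `FormExtractor` and reading its `forms` list yields one description per form, ready to be protected and attached to the page.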

This data is then encrypted and sent to the user attached to the corresponding form (it could be added as an additional hidden field, or issued as a cookie).
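The memo calls for encryption, but what the later check relies on is that the client cannot alter the description without detection. Since Python's standard library offers no symmetric cipher, the following sketch swaps in an HMAC-SHA256 signature, which gives integrity but not confidentiality; hiding the description from the user as well would need an authenticated cipher such as AES-GCM from a third-party library. The key, the `seal`, `unseal` and `hidden_field` functions and the `__form_desc` field name are all hypothetical.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret; in practice a random key never sent to clients.
SECRET_KEY = b"change-me-to-a-long-random-secret"
TAG_LEN = 32  # length of an HMAC-SHA256 digest in bytes


def seal(description: dict, key: bytes = SECRET_KEY) -> str:
    """Serialize a form description and prepend an HMAC-SHA256 tag."""
    payload = json.dumps(description, sort_keys=True,
                         separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + payload).decode("ascii")


def unseal(token: str, key: bytes = SECRET_KEY) -> dict:
    """Verify the tag and return the description; raise ValueError if altered."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    tag, payload = raw[:TAG_LEN], raw[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("form description was modified or forged")
    return json.loads(payload)


def hidden_field(token: str, name: str = "__form_desc") -> str:
    """Render the token as an extra hidden field (the field name is hypothetical)."""
    return f'<input type="hidden" name="{name}" value="{token}">'
```

The URL-safe Base64 alphabet needs no escaping inside an attribute value, so the same token could equally be issued as a cookie.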

Later, when the form is submitted and before the form data is passed to the back-end web application, mod_security can check the submission request in order to filter out malicious data. This process involves retrieving the form-description data and matching the actual request data against that predefined structure. Only when the request satisfies all restrictions is the form considered correct and forwarded to the originating application.
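The matching step might look like the following sketch, with one check per flaw in the list of security flaws. It assumes the description has already been verified and decoded into a plain dictionary with hypothetical keys (`action`, `method`, `params`, `max_sizes`, `hidden`, `options`), and that the request path and the stored action URL have been normalized to the same form. Parameters absent from the request are not reported, since browsers legitimately omit unchecked checkboxes and unused submit buttons.

```python
from typing import List, Tuple


def check_submission(desc: dict, method: str, action: str,
                     params: List[Tuple[str, str]]) -> List[str]:
    """Match a request against a form description; return violations found."""
    violations = []
    if method.lower() != desc["method"]:
        violations.append(f"method {method} not allowed")
    if action != desc["action"]:
        violations.append(f"action {action} does not match the form")
    for name, value in params:
        if name not in desc["params"]:
            violations.append(f"unexpected parameter {name}")
            continue  # nothing else is known about this parameter
        if name in desc["hidden"] and value != desc["hidden"][name]:
            violations.append(f"hidden field {name} modified")
        if name in desc["options"] and value not in desc["options"][name]:
            violations.append(f"value {value!r} is not an option of {name}")
        limit = desc["max_sizes"].get(name)
        if limit is not None and len(value) > limit:
            violations.append(f"{name} exceeds {limit} characters")
    return violations
```

A request would be forwarded to the application only when the returned list is empty.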

At this point it is important to note that this whole process constitutes an additional protection level; it does not eliminate the need for data validation by the application. It only enforces a structure that form requests must comply with, in order to narrow the range of possibilities an attacker has for abusing protected applications.

Final Considerations

Throughout the process described above, it is assumed that the full form definition is known when the page is served, that is, all form components are defined by the server-side application.

This scenario, although the most frequent, is not the only possible one. As with the rest of the page content, the form definition can be modified by client-side scripting, and there are indeed applications built with client-side JavaScript that modifies form components. These modifications often affect the action URI and hidden fields.

To protect such applications seamlessly without interfering with this client-side behaviour, a way to selectively enable or disable a subset of the listed elements will be needed. With that option, it would be easy to enforce the server-side definition for some elements of a form while ignoring the action or hidden fields when they are known to be modified on the client side by legitimate JavaScript.
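One possible shape for this option is a per-form policy listing the checks to switch off. The sketch below assumes the matching step reports each violation tagged with the name of the check that raised it; the check identifiers, the `POLICY` table and the `enforced` function are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical identifiers, one per element of the form-description block.
CHECKS = frozenset({"params", "hidden", "options", "size", "method", "action"})

# Hypothetical per-form policy, keyed by the action URL stored in the
# description: checks listed here are disabled because legitimate
# client-side JavaScript is known to change those elements.
POLICY: Dict[str, FrozenSet[str]] = {
    "/search": frozenset({"action"}),    # a script rewrites the action URL
    "/checkout": frozenset({"hidden"}),  # a script fills in hidden fields
}

Violation = Tuple[str, str]  # (check identifier, human-readable detail)


def enforced(violations: List[Violation], form_action: str,
             policy: Dict[str, FrozenSet[str]] = POLICY) -> List[Violation]:
    """Drop violations raised by checks the policy disables for this form."""
    disabled = policy.get(form_action, frozenset())
    unknown = disabled - CHECKS
    if unknown:  # reject typos instead of silently weakening protection
        raise ValueError(f"unknown checks in policy: {sorted(unknown)}")
    return [v for v in violations if v[0] not in disabled]
```

Keying the policy by the action URL recorded in the protected description, rather than by the request URL, matters here: the request URL is exactly what a client-side script may have changed.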