We need to properly escape, encode, and validate user input and generated output. Does anyone know of a best practice?
To escape HTML special characters, one option is htmlspecialchars().
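A minimal sketch of output escaping with htmlspecialchars(), assuming UTF-8 pages. The sample comment string is made up for illustration. Passing ENT_QUOTES makes single quotes get converted too, which matters when the value lands inside an HTML attribute.

```php
<?php
// Untrusted text, e.g. a user comment (sample value for illustration).
$comment = '<script>alert("x")</script> & O\'Reilly';

// Escape right before writing into HTML; ENT_QUOTES converts both ' and ".
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// The same escaped value is safe in element content and in a quoted attribute.
echo '<p title="' . $safe . '">' . $safe . '</p>', "\n";
```

Escaping at output time, rather than when the input arrives, means the stored value stays in its original form.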
To prevent SQL injection, one option is mysql_real_escape_string().
One of the comments on the mysql_real_escape_string manual page had a snippet for quickly cleaning the usual suspects ($_GET, $_POST, $_COOKIE, $_REQUEST). The only problem is that it overwrites the dirty data with no way to recover it. mysql_real_escape_string is a good way to prevent SQL injection with the mysql interface, but data cleaned this way suits the database, not necessarily HTML output: characters escaped for MySQL may not need escaping in HTML, and vice versa. Should our approach be to keep two arrays for our input (SQLSAFE and HTMLSAFE) and only use those arrays when accessing the input data?
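The two-array idea could be sketched like this. It assumes a current PHP (7.4+ for arrow functions), which is newer than the versions discussed in this thread. `escape_recursive` is a hypothetical helper, and the `$raw` array stands in for a superglobal such as $_GET. The input is copied rather than overwritten, so the raw data stays recoverable, and nested arrays are handled recursively.

```php
<?php
// Hypothetical helper: apply an escaping callback to every string in a
// (possibly nested) input array, returning a NEW array. The input is untouched.
function escape_recursive(array $input, callable $escape): array
{
    $out = [];
    foreach ($input as $key => $value) {
        $out[$key] = is_array($value)
            ? escape_recursive($value, $escape)
            : $escape((string) $value);
    }
    return $out;
}

// Stand-in for $_GET / $_POST so the sketch runs from the command line.
$raw = ['name' => "O'Reilly <b>", 'tags' => ['a&b']];

// HTML-safe copy of the input.
$HTMLSAFE = escape_recursive($raw, fn ($s) => htmlspecialchars($s, ENT_QUOTES, 'UTF-8'));

// An SQL-safe copy would be built the same way with an open connection, e.g.
// escape_recursive($raw, fn ($s) => mysqli_real_escape_string($link, $s));
```

Because each copy is derived from the untouched original, adding another context later (for example, JSON output) does not require un-escaping anything.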
How much validation do we want to do? Do we care about the input beyond making sure it's a string or a number? Do we want to check how many characters are in a string input and truncate it if it's too long, or just let MySQL truncate it? (MySQL only truncates silently when strict SQL mode is off; in strict mode the insert fails with an error.)
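One possible level of validation, sketched with filter_var() (available since PHP 5.2) and a character-length check. The field names, ranges, and the 50-character limit are hypothetical, and mb_strlen() assumes the mbstring extension is installed. This sketch rejects over-long input instead of truncating it; mb_substr() could be used to truncate instead if that is the policy we pick.

```php
<?php
// Hypothetical validator: an integer age in a plausible range, or null.
function validate_age($value): ?int
{
    $age = filter_var($value, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    return $age === false ? null : $age;
}

// Hypothetical validator: a non-empty name of at most $maxChars characters
// (characters, not bytes, so multibyte UTF-8 text is counted correctly).
function validate_name($value, int $maxChars = 50): ?string
{
    if (!is_string($value)) {
        return null;
    }
    $value = trim($value);
    if ($value === '' || mb_strlen($value, 'UTF-8') > $maxChars) {
        return null;
    }
    return $value;
}
```

Rejecting rather than truncating avoids silently storing something the user did not type, and checking in PHP means the behaviour does not depend on MySQL's SQL mode.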
NOTE: The best solution for preventing SQL injection is to use bind variables in the SQL statements, which means moving up to the mysqli interface (requires PHP 5.0 or higher) or using PDO (shipped with PHP 5.1).
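A minimal PDO sketch of bind variables. It uses an in-memory SQLite database (assumes the pdo_sqlite driver) as a stand-in so it runs without a MySQL server. With MySQL, only the DSN would change, e.g. 'mysql:host=localhost;dbname=test'. The table and the hostile sample value are made up for illustration.

```php
<?php
// In-memory SQLite stands in for MySQL so the sketch runs anywhere.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

// Untrusted input is passed as a bound value, never spliced into the SQL text.
$name = "Robert'); DROP TABLE users;--";

$insert = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$insert->execute([':name' => $name]);

$select = $pdo->prepare('SELECT name FROM users WHERE name = :name');
$select->execute([':name' => $name]);
$stored = $select->fetchColumn(); // the hostile string comes back as plain data
```

With mysqli the equivalent uses `?` placeholders: `$stmt = $mysqli->prepare('INSERT INTO users (name) VALUES (?)'); $stmt->bind_param('s', $name); $stmt->execute();`.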
Based on the poll so far, it seems that PHP 5.1 or 5.2 should be the version we target. That clears the way for using bind variables.
I think converting queries to bind variables is about the same amount of work as adding mysql_real_escape_string calls, so we might as well leapfrog straight to bind variables instead.
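To illustrate the "about the same work" point, here is the same lookup written both ways. mysql_real_escape_string needs a live MySQL connection, so PDO::quote() on an in-memory SQLite database (assumes pdo_sqlite) stands in for the escaping approach. Each conversion is roughly a one-line change per query.

```php
<?php
// In-memory SQLite stands in for MySQL so the comparison runs without a server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('O''Reilly')");

$name = "O'Reilly"; // untrusted input

// Escaping approach: quote the value and splice it into the SQL string.
$viaEscape = $pdo->query('SELECT id FROM users WHERE name = ' . $pdo->quote($name))
                 ->fetchColumn();

// Bind-variable approach: the same query with a placeholder.
$stmt = $pdo->prepare('SELECT id FROM users WHERE name = ?');
$stmt->execute([$name]);
$viaBind = $stmt->fetchColumn();
```

Both return the same row, but with the bound version there is no way to forget the escaping call on one variable.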
Could we look at what some more widely used packages do? Something like WordPress, and how it handles user input that allows some HTML formatting?
I downloaded the WordPress code. From what I can see, they check whether get_magic_quotes_gpc is active and, if so, apply stripslashes to the input. Then they escape it with mysqli_real_escape_string, which is called through a custom db object they have. There is also some extra code to recurse into nested arrays. Finally, they take the cleaned $_GET and $_POST input and put it into $_REQUEST. They mostly seem to apply strip_tags at a later point in the processing.
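A rough sketch of the pipeline described above, not WordPress's actual code. `map_deep_strings` and `clean_request` are hypothetical names. The magic-quotes flag is passed in as a parameter because get_magic_quotes_gpc() was removed in PHP 8; on old PHP you would pass its return value. The escaper is a caller-supplied callback (in practice a database escaping function), and the code assumes PHP 7.4+.

```php
<?php
// Hypothetical helper: apply $fn to every string in a nested array.
// array_map with a single array preserves string keys.
function map_deep_strings($value, callable $fn)
{
    if (is_array($value)) {
        return array_map(fn ($v) => map_deep_strings($v, $fn), $value);
    }
    return is_string($value) ? $fn($value) : $value;
}

// Hypothetical pipeline: undo magic quotes, escape, then rebuild the
// combined request array from the cleaned GET and POST data.
function clean_request(array $get, array $post, bool $magicQuotes, callable $escape): array
{
    if ($magicQuotes) {
        $get = map_deep_strings($get, 'stripslashes');
        $post = map_deep_strings($post, 'stripslashes');
    }
    $get = map_deep_strings($get, $escape);
    $post = map_deep_strings($post, $escape);

    // POST values override GET values with the same key.
    return [$get, $post, array_merge($get, $post)];
}
```

Note that this shares the drawback raised earlier in the thread: once the escaped arrays replace the superglobals, the raw input is gone.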