PHP Security / Overview / Data Filtering Thu, May 17. 2007
Data Filtering
数据过滤
As stated previously, data filtering is the cornerstone of Web application security, and this is independent of programming language or platform. It involves the mechanism by which you determine the validity of data that is entering and exiting the application, and a good software design can help developers to:
正如前面所说,数据过滤是 Web 程序安全的奠基石,这个规则和 编程语言以及平台无关。数据过滤包括了你如何检测程序输入输出的数据的合法性的机制,一个好的软件设计能够在如下方面帮助开发者:
- Ensure that data filtering cannot be bypassed,
- 确保没有办法越过数据过滤操作,
- Ensure that invalid data cannot be mistaken for valid data, and
- 确认不合法的数据不能被误认为是合法的数据,还有
- Identify the origin of data.
- 识别出数据的来源。
Opinions about how to ensure that data filtering cannot be bypassed vary, but there are two general approaches that seem to be the most common, and both of these provide a sufficient level of assurance.
如何确保不能越过数据过滤有很多种方法,不过最常用的也就两种方法,这两种方法都能够提供足够的保险。
The Dispatch Method
调度法
One method is to have a single PHP script available directly from the Web (via URL). Everything else is a module included with include or require as needed. This method usually requires that a GET variable be passed along with every URL, identifying the task. This GET variable can be considered the replacement for the script name that would be used in a more simplistic design. For example:
第一种方法是只允许 Web (通过 URL 调用)直接访问到一个单一 PHP 脚本。所有其他的东西都是在需要的时候通过 include 或者 require 加载进来的模块。这个方法通常需要在每个 URL 里都传递一个特定的 GET 变量,来识别要完成的任务。这个 GET 变量可以理解成对简单设计里使用的脚本名称的取代。例如:
http://example.org/dispatch.php?task=print_form
The file dispatch.php is the only file within document root. This allows a developer to do two important things:
dispatch.php 文件是根目录里唯一的文件。这让开发者能够做下面两件重要的事情:
- Implement some global security measures at the top of dispatch.php and be assured that these measures cannot be bypassed.
- 在 dispatch.php 的开始实现全局的安全措施,并且确保这些措施不能被越过。
- Easily see that data filtering takes place when necessary, by focusing on the control flow of a specific task.
- 你把注意力集中在一个具体的任务的控制流程时,在需要的时候可以很容易的看到数据过滤是如何起作用的。
To further explain this, consider the following example dispatch.php script:
请看下面的例子,以进一步解释 dispatch.php 脚本:
<?php
/* Global security measures */

// A missing task is treated like an unknown one and falls through to default.
$task = isset($_GET['task']) ? $_GET['task'] : '';

switch ($task)
{
    case 'print_form':
        include '/inc/presentation/form.inc';
        break;
    case 'process_form':
        // process.inc must set this to true for the form to be accepted.
        $form_valid = false;
        include '/inc/logic/process.inc';
        if ($form_valid)
        {
            include '/inc/presentation/end.inc';
        }
        else
        {
            include '/inc/presentation/form.inc';
        }
        break;
    default:
        include '/inc/presentation/index.inc';
        break;
}
?>
If this is the only public PHP script, then it should be clear that the design of this application ensures that any global security measures taken at the top cannot be bypassed. It also lets a developer easily see the control flow for a specific task. For example, instead of glancing through a lot of code, it is easy to see that end.inc is only displayed to a user when $form_valid is true, and because it is initialized as false just before process.inc is included, it is clear that the logic within process.inc must set it to true, otherwise the form is displayed again (presumably with appropriate error messages).
如果这个文件是唯一的公开的 PHP 脚本,那么很明显了,这个程序的设计确保了程序开始处的全局安全措施不能被越过。这样的设计也能让开发者很容易看清楚某一个特别的任务的控制流程。例如,不用看完大段的代码,就能够很容易看到只有在 $form_valid 的值为 true 的时候,end.inc 才会展示给用户,因为在加载 process.inc 前 $form_valid 就初始化为 false 了,所以非常明显的,process.inc 中的程序逻辑必须把 $form_valid 设置为 true,否则这个表单就会重复显示(推测是为了显示错误信息)。
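The same single-entry-point idea can also be written as a lookup table, which makes the whitelist of tasks explicit. The following is only a minimal sketch: the function name resolve_task() and the $routes table are hypothetical, the paths are reused from the example, and the $form_valid branching of process_form is left out for brevity.

```php
<?php
// Hypothetical helper: map a requested task to the module that handles it.
// Anything that is not in the table falls back to the index page, which
// mirrors the default case of the switch statement.
function resolve_task($task, $routes, $fallback)
{
    // Only exact string keys from the whitelist are accepted.
    if (is_string($task) && array_key_exists($task, $routes)) {
        return $routes[$task];
    }

    return $fallback;
}

$routes = array(
    'print_form'   => '/inc/presentation/form.inc',
    'process_form' => '/inc/logic/process.inc',
);

// In dispatch.php the first argument would come from $_GET['task'].
$module = resolve_task('print_form', $routes, '/inc/presentation/index.inc');

// include $module;  // commented out so the sketch runs without the modules
```

Because the user-supplied value is only ever used as an array key, it never becomes part of a file path.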
NOTE:
If you use a directory index file such as index.php (instead of dispatch.php), you can use URLs such as http://example.org/?task=print_form.
You can also use the Apache ForceType directive or mod_rewrite to accommodate URLs such as http://example.org/app/print-form.
备注:
如果你用的是类似于 index.php 这样的 directory index 文件 (而不是 dispatch.php),你可以用 http://example.org/?task=print_form 这样的 URL 来访问。
你还可以用 Apache 的 ForceType 命令或者 mod_rewrite 模块重写类似http://example.org/app/print-form 这样的 URL。
The Include Method
Include 法
Another approach is to have a single module that is responsible for all security measures. This module is included at the top (or very near the top) of all PHP scripts that are public (available via URL). Consider the following security.inc script:
security.inc 脚本:
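<?php
switch ($_POST['form'])
{
    case 'login':
        // Whitelist of the field names this form may submit, in order.
        $allowed = array();
        $allowed[] = 'form';
        $allowed[] = 'username';
        $allowed[] = 'password';

        // Field names that were actually received.
        $sent = array_keys($_POST);

        if ($allowed == $sent)
        {
            include '/inc/logic/process.inc';
        }
        break;
}
?>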
In this example, each form that is submitted is expected to have a form variable named form that uniquely identifies it, and security.inc has a separate case to handle the data filtering for that particular form. An example of an HTML form that fulfills this requirement is as follows:
在这个例子里,所有客户端提交的表单都应该有一个名为 form 的表单变量以唯一标示这个表单,security.inc 也有一个独立的 case 语句处理对应每一个表单特定的数据过滤。下面是一个匹配这个请求的 HTML 表单的例子:
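A form that meets this requirement might look like the following minimal sketch, written as a PHP helper that returns the markup. The function name render_login_form() and the action path /receive.php are assumptions made for illustration; only the field names form, username, and password and the value login come from the security.inc example.

```php
<?php
// Hypothetical helper that returns the markup for a login form.
// The hidden "form" field is what identifies the form to security.inc.
function render_login_form($action)
{
    // Escape the action URL before placing it in an attribute.
    $action = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');

    return <<<HTML
<form action="{$action}" method="POST">
<input type="hidden" name="form" value="login" />
<p>Username: <input type="text" name="username" /></p>
<p>Password: <input type="password" name="password" /></p>
<p><input type="submit" /></p>
</form>
HTML;
}

// '/receive.php' is an assumed path, not one taken from the text.
echo render_login_form('/receive.php');
```

The submit button has no name attribute, so only form, username, and password are sent, in that order.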
An array named $allowed is used to identify exactly which form variables are allowed, and the list of submitted field names must match it exactly, including order, for the form to be processed. Control flow is determined elsewhere, and process.inc is where the actual data filtering takes place.
这个名为 $allowed 的数组被用来准确地标示哪些表单变量是允许的,而且数组的元素顺序必须和要处理的表单的元素顺序完全一样。控制流程在其他的地方写,process.inc 是真正用来进行数据过滤的程序。
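The comparison of allowed and submitted field names can be tried in isolation. This is a minimal sketch: the helper name fields_match() and the sample values are hypothetical, and plain arrays stand in for $_POST.

```php
<?php
// Hypothetical helper: true only if the submitted field names are exactly
// the allowed ones, in the same order.
function fields_match($allowed, $input)
{
    $sent = array_keys($input);

    // Two arrays are == only if they hold the same key/value pairs,
    // so both an extra field and a different order cause a mismatch.
    return $allowed == $sent;
}

$allowed = array('form', 'username', 'password');

$expected  = array('form' => 'login', 'username' => 'chris', 'password' => 'secret');
$extra     = $expected + array('admin' => '1');
$reordered = array('username' => 'chris', 'form' => 'login', 'password' => 'secret');

var_dump(fields_match($allowed, $expected));  // bool(true)
var_dump(fields_match($allowed, $extra));     // bool(false)
var_dump(fields_match($allowed, $reordered)); // bool(false)
```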
NOTE:
A good way to ensure that security.inc is always included at the top of every PHP script is to use the auto_prepend_file directive.
备注:
为了确保每个 PHP 脚本都在头部引用了 security.inc,有一个好方法就是使用 auto_prepend_file指令。
Filtering Examples
数据过滤实例
It is important to take a whitelist approach to your data filtering, and while it is impossible to give examples for every type of form data you may encounter, a few examples can help to illustrate a sound approach.
给你的数据过滤创建一个白名单是很重要的,既然没有办法给每种你可能遇到的表单数据类型都举一个例子,那么下面几个例子能够帮助你理解正确的数据过滤的方法。
The following validates an email address:
下面的例子是用来验证电子邮件地址的:
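<?php
$clean = array();

$email_pattern =
    '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i';

// Copy the address into $clean only if it matches the pattern.
if (preg_match($email_pattern, $_POST['email']))
{
    $clean['email'] = $_POST['email'];
}
?>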
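The email check can be wrapped in a function so that it is easy to try against sample input. This is a minimal sketch under a few assumptions: the name is_valid_email() is hypothetical, an $input array stands in for $_POST, and the D modifier is an addition that is not part of the pattern in the text.

```php
<?php
// Hypothetical wrapper around the pattern from the text. The D modifier is
// an addition: without it, "$" also matches just before a trailing newline.
function is_valid_email($value)
{
    $email_pattern = '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/iD';

    // Reject arrays and other non-strings before handing the value to PCRE.
    return is_string($value) && preg_match($email_pattern, $value) === 1;
}

$clean = array();
$input = array('email' => 'chris@example.org'); // stands in for $_POST

if (isset($input['email']) && is_valid_email($input['email'])) {
    $clean['email'] = $input['email'];
}
```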
The following ensures that $_POST['color'] is red, green, or blue:
下面一个例子确保 $_POST['color'] 的取值只能是 red、green 或 blue:
<?php
$clean = array();

switch ($_POST['color'])
{
    // Only these three values are accepted; anything else is ignored.
    case 'red':
    case 'green':
    case 'blue':
        $clean['color'] = $_POST['color'];
        break;
}
?>
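When the list of acceptable values grows, the same whitelist can be kept in an array. This is a minimal sketch using in_array() rather than the switch statement above; the function name filter_color() is hypothetical.

```php
<?php
// Hypothetical helper: return the value if it is on the whitelist,
// otherwise null.
function filter_color($value)
{
    $allowed = array('red', 'green', 'blue');

    // The third argument enables strict comparison, so only strings that
    // are identical to a whitelisted value are accepted.
    return in_array($value, $allowed, true) ? $value : null;
}

$clean = array();
$color = filter_color('green'); // in a real script: the submitted value

if ($color !== null) {
    $clean['color'] = $color;
}
```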
The following example ensures that $_POST['num'] is an integer:
下面的例子用来保证 $_POST['num'] 是一个整数:
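<?php
$clean = array();

// The value is an integer if converting it to an integer and back
// to a string leaves it unchanged.
if ($_POST['num'] == strval(intval($_POST['num'])))
{
    $clean['num'] = $_POST['num'];
}
?>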
The following example ensures that $_POST['num'] is a float:
下面的例子用来保证 $_POST['num'] 是一个浮点数:
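<?php
$clean = array();

// The value is a float if converting it to a float and back
// to a string leaves it unchanged.
if ($_POST['num'] == strval(floatval($_POST['num'])))
{
    $clean['num'] = $_POST['num'];
}
?>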
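The two round-trip checks can be tried against sample strings. This is a minimal sketch: the function names are hypothetical, and the is_string() guard is an addition so that arrays submitted in place of a single value are rejected up front.

```php
<?php
// Hypothetical helper: does the string survive a trip through intval()?
function is_integer_string($value)
{
    return is_string($value) && $value == strval(intval($value));
}

// Hypothetical helper: does the string survive a trip through floatval()?
function is_float_string($value)
{
    return is_string($value) && $value == strval(floatval($value));
}

var_dump(is_integer_string('42'));    // bool(true)
var_dump(is_integer_string('42abc')); // bool(false)
var_dump(is_float_string('3.14'));    // bool(true)
var_dump(is_float_string('3.14abc')); // bool(false)
```

The comparison is loose (==), as in the examples above, so two numeric strings are compared by value rather than character by character.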
Naming Conventions
命名约定
Each of the previous examples makes use of an array named $clean. This illustrates a good practice that can help developers identify whether data is potentially tainted. You should never make a practice of validating data and leaving it in $_POST or $_GET, because it is important for developers to always be suspicious of data within these arrays.
上面的所有例子都使用了一个名为 $clean 的数组。这展示了一种很好的,能够帮助开发者区分数据是否可靠的习惯。你不得再验证数据后还把这些数据留在 $_POST 或 $_GET 数组里,因为对开发者来说,不信任这些数组里的数据是很重要的。
In addition, a more liberal use of $clean can allow you to consider everything else to be tainted, and this more closely resembles a whitelist approach and therefore offers an increased level of security.
另外,一个更自由的使用 $clean 的方法就是你假设其他的所有变量都是不保险的,这更接近于使用白名单的方法,因此这样提供了一个更高的安全级。
If you only store data in $clean after it has been validated, the only risk in a failure to validate something is that you might reference an array element that doesn't exist rather than potentially tainted data.
如果你把通过验证的数据都存储在 $clean 里,那么一些变量验证失败会引起唯一的风险就是你会引用到一个空的数组元素,而不是原来那个不可靠的数据。
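The effect of this convention can be seen in a short sketch. The function name filter_input_data() and the sample values are hypothetical, and a plain array stands in for $_POST.

```php
<?php
// Hypothetical filter: only values that pass validation are copied
// into $clean; everything else is simply absent.
function filter_input_data($input)
{
    $clean = array(); // starts empty, so nothing is trusted by default

    if (isset($input['color'])
        && in_array($input['color'], array('red', 'green', 'blue'), true)) {
        $clean['color'] = $input['color'];
    }

    if (isset($input['num']) && is_string($input['num'])
        && $input['num'] == strval(intval($input['num']))) {
        $clean['num'] = $input['num'];
    }

    return $clean;
}

$clean = filter_input_data(array('color' => 'purple', 'num' => '42'));

var_dump(isset($clean['color'])); // bool(false): the invalid value never arrived
var_dump($clean['num']);          // string(2) "42"
```

Code that forgets to check a field ends up reading a missing element of $clean, not the tainted value.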
Timing
时序
Once a PHP script begins processing, the entire HTTP request has been received. This means that the user does not have another opportunity to send data, and therefore no data can be injected into your script after it starts (even if register_globals is enabled). Any value you assign to a variable yourself replaces whatever the request may have placed there, which is why initializing your variables is such a good practice.
一旦 PHP 脚本开始执行,就说明整个 HTTP 请求都已经被接受了。这也就意味着用户没有机会再发送任何数据了,因此,也就无法向脚本注入任何数据(即使启用了 register_globals 选项)。所以说,初始化变量是一个很好的习惯。
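The value of initialization can be sketched with a small simulation. Everything here is an assumption made for illustration: the function names are hypothetical, and extract() with EXTR_SKIP stands in for what register_globals did before the script started running.

```php
<?php
// Request data becomes local variables before any other code runs,
// which imitates register_globals.
function check_access_uninitialized($request, $password_ok)
{
    extract($request, EXTR_SKIP);

    if ($password_ok) {
        $authorized = true;
    }

    // $authorized may have come from the request instead of the check.
    return !empty($authorized);
}

function check_access_initialized($request, $password_ok)
{
    extract($request, EXTR_SKIP);

    // The request has fully arrived, so this assignment has the last word.
    $authorized = false;

    if ($password_ok) {
        $authorized = true;
    }

    return $authorized;
}

$request = array('authorized' => '1'); // e.g. ?authorized=1 in the URL

var_dump(check_access_uninitialized($request, false)); // bool(true)
var_dump(check_access_initialized($request, false));   // bool(false)
```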
