Author(s) Module: Olaf Conrad, (c) 2004
Author(s) Wiki Documentation: Vern Cimmery, (c) 2011 (kapcimmery at hotmail dot com)
License Wiki Documentation: Creative Commons Attribution-ShareAlike 3.0 Unported License
Description:
Multiple regression analysis assumes the existence of a linear relationship between the dependent and independent variables. The multiple regression equation involving two independent variables can be written as:
Y = a + b1X1 + b2X2
where Y is the dependent variable; X1 and X2 are the independent variables; “a” is the intercept; and “b1” and “b2” are the coefficients of the independent variables X1 and X2. The intercept represents the value of Y when the values of the independent variables are equal to zero, and the parameter coefficients indicate the change in Y for a one-unit increase in the corresponding independent variable.
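As a minimal sketch of how the intercept and coefficients are read, the equation can be evaluated directly. The coefficient values below are made up for illustration and do not come from any SAGA output; the function name `predict` is likewise hypothetical.

```python
def predict(a, b1, b2, x1, x2):
    """Evaluate Y = a + b1*X1 + b2*X2 for one pair of independent values."""
    return a + b1 * x1 + b2 * x2


# Hypothetical coefficients, chosen only to illustrate the interpretation.
a, b1, b2 = 12.0, -0.5, 0.25

# The intercept is the value of Y when both independent variables are zero.
y_at_zero = predict(a, b1, b2, 0.0, 0.0)

# A coefficient is the change in Y for a one-unit increase in its variable,
# with the other variable held fixed (here X2 stays at 4.0).
change_per_unit_x1 = predict(a, b1, b2, 3.0, 4.0) - predict(a, b1, b2, 2.0, 4.0)

print(y_at_zero, change_per_unit_x1)
```

Here `y_at_zero` equals the intercept and `change_per_unit_x1` equals `b1`, which matches the description of the parameters given above.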
The SAGA multiple regression analysis module uses a forward selection procedure.
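The following is a generic sketch of forward selection, not the SAGA implementation. It assumes that candidates are ranked by the gain in R-squared and that selection stops when the best gain falls below a threshold; the module's actual entry and stopping criteria are not described here. The function names, the `min_gain` parameter, and the standardized sample values are all hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Use the largest available pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        total = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - total) / aug[row][row]
    return x


def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the given predictor columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    ss_res = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
                 for r, t in zip(rows, y))
    return 1.0 - ss_res / ss_tot


def forward_selection(predictors, y, min_gain=0.01):
    """Add one predictor at a time, always the one that raises R-squared most."""
    selected, best_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {name: r_squared([predictors[s] for s in selected + [name]], y)
                  for name in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] - best_r2 < min_gain:
            break  # assumed stopping rule: the improvement is too small
        selected.append(candidate)
        remaining.remove(candidate)
        best_r2 = scores[candidate]
    return selected, best_r2


# Made-up, standardized sample values; 'unrelated' has no effect on y.
predictors = {
    "elevation": [-1, -1, 1, 1, -1, -1, 1, 1],
    "aspect": [-1, 1, -1, 1, -1, 1, -1, 1],
    "unrelated": [-1, -1, -1, -1, 1, 1, 1, 1],
}
y = [10 + 3 * e + 1 * s
     for e, s in zip(predictors["elevation"], predictors["aspect"])]

selected, r2 = forward_selection(predictors, y)
print(selected, round(r2, 6))
```

With this sample data the strongest predictor enters first, the weaker one second, and the unrelated one is left out because it adds nothing to the fit.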
Figure MRA.1. Settings page for module Multiple Regression Analysis (Grids/Points).
Input
Grid system
The entry for the ‘Grid system’ parameter must be the grid system that the input grid data layers for the '>>Grids' parameter belong to. To be selectable, the grid system must be loaded in the current work session.
>>Grids
The grid data layer or layers chosen for this parameter are the independent variables in the multiple regression formula. These grid data layers must be part of the grid system chosen for the 'Grid system' parameter; otherwise, they will not be available to choose.
>>Shapes
The ‘>>Shapes’ parameter is where you choose a point shapes data layer that will be the dependent variable for the multiple regression analysis. The list you choose from will display all shapes data layers (point, line, and polygon) loaded for the work session. It is up to the user to make sure a point shapes data layer is chosen. If a shapes data layer that is not a point shape data layer is chosen, the points defining the line or polygon features will be interpreted as points for this module.
Attribute
Unlike grid data layers, shapes data layers use attribute tables to provide characteristics of their objects or features. The ‘Attribute’ parameter is where you choose which attribute in the attribute table for the point shapes data layer will provide the numeric data for the dependent variable of the multiple regression analysis. When you click with the mouse pointer in the value field to the right of the ‘Attribute’ label, a list of the attributes for the point shapes data layer entered for the ‘>>Shapes’ parameter is displayed. Choose the attribute you want to use from the list.
Output
<<Regression
The ‘<<Regression’ parameter is the grid data layer output from this module. The default entry for this parameter is “[create]”. Normally you will use the default. The output grid data layer will be named using the name of the input point shapes data layer concatenated with “(Multiple Regression Analysis (Grids/Points))”. This is a rather long name for a data layer.
<Residuals
The optional ‘<Residuals’ output is a point shapes data layer made up of the same point objects as the input shapes data layer. One of the three attributes for the objects will be the residual data value. The default for this parameter is “[not set]”. This parameter can be set to "[create]". When the "[create]" option is chosen, the output shapes data layer will use the name of the input point shapes data layer concatenated with: “[Residuals]”.
The attribute table for the output ‘<Residuals’ point shapes data layer is made up of one record for each point or object. Each record has three columns or data fields. The first column contains the original data value for the point. The second column is called the trend value. This column contains the data value for the point calculated using the multiple regression formula. The third column is the residual. The residual is the positive or negative difference between the original and trend values for the point.
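The three columns can be sketched as follows. This is an illustration only: the field names, the sample values, and the coefficients are made up, and the sign convention (original minus trend) is an assumption, since the text above only says the residual may be positive or negative.

```python
def residual_records(observed, predictor_values, intercept, coefficients):
    """Build one record per point with original, trend, and residual values."""
    records = []
    for original, xs in zip(observed, predictor_values):
        # Trend: the value calculated with the multiple regression formula.
        trend = intercept + sum(b * x for b, x in zip(coefficients, xs))
        records.append({
            "original": original,
            "trend": trend,
            # Assumed sign convention: original minus trend.
            "residual": original - trend,
        })
    return records


# Hypothetical points: observed value and (elevation, aspect) at each point.
observed = [10.0, 9.5, 7.0]
predictor_values = [(100.0, 90.0), (400.0, 180.0), (700.0, 270.0)]

records = residual_records(observed, predictor_values,
                           intercept=11.0, coefficients=(-0.005, 0.0))
for record in records:
    print(record)
```

A positive residual under this convention means the observed value is higher than the regression predicts at that point; a negative residual means it is lower.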
<Details
The optional ‘<Details’ parameter is located in the ‘Tables’ section of the module settings page. The default for this parameter is “[not set]”. This parameter can be set to “[create]”. You can choose “[create]” or an existing table file by clicking in the value field to the right of the ‘<Details’ label. A list of the table files active in the work session and the “[create]” choice will display. If you choose an existing table, its content will be replaced by the table output from the module. If you choose “[create]”, a table displaying tabular analysis results will be created. The default name for the ‘<Details’ table is “Multiple Regression Analysis”. “[create]” would be the normal choice if you want this optional output.
Options
Grid Interpolation
The method the module will use for data interpolation can be chosen with the ‘Grid Interpolation’ parameter. The default choice is to use the "B-Spline Interpolation" approach. The other choices are Nearest Neighbor, Bilinear Interpolation, Inverse Distance Interpolation, and Bicubic Spline Interpolation.
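To show why the choice of method matters, the sketch below reads a grid value at a point that falls between cell centers, using two of the simpler methods. It is not the SAGA code: the grid is assumed to be a list of rows, positions are given in fractional cell units within the grid, and the function names are hypothetical. The default B-Spline method is not shown.

```python
def nearest_neighbor(grid, x, y):
    """Return the value of the cell whose center is closest to (x, y)."""
    return grid[int(y + 0.5)][int(x + 0.5)]


def bilinear(grid, x, y):
    """Weight the four surrounding cell values by their distance to (x, y)."""
    x0, y0 = int(x), int(y)
    # Clamp the upper neighbors so positions on the last row/column still work.
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


# Made-up 2 x 2 grid of cell values.
grid = [[10.0, 20.0],
        [30.0, 40.0]]

print(nearest_neighbor(grid, 0.25, 0.75))
print(bilinear(grid, 0.25, 0.75))
```

Nearest neighbor returns an existing cell value unchanged, while bilinear interpolation returns a blend of the surrounding cells, so the value attached to each point as an independent variable can differ with the method chosen.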
Execution Example:
In this example, the Multiple Regression Analysis (Grids/Points) module will be used to create a grid data layer for average annual temperature for the Olympic Peninsula in Washington State, US. Average annual temperature data is available for weather stations at a variety of locations in and near the peninsula. Observers have concluded that average annual temperature appears to have a relationship with elevation and slope aspect.
The Multiple Regression Analysis (Grids/Points) module uses the average annual temperature data for point locations as the dependent variable and elevation and slope aspect as the independent variables. The point locations are objects in the point shapes data layer named 'OlyPenTempAvg'. The two grid data layers for elevation and aspect are ‘OPdem’ and ‘OPAspect2’, respectively.
1. Execute the Multiple Regression Analysis (Grids/Points) module.
2. For the 'Grid system' parameter, choose the grid system that the input grid data layers for the '>>Grids' parameter belong to.
3. The grid data layers 'OPdem' and 'OPAspect2' are chosen from the drop-down list of layers for the '>>Grids' parameter.
4. Verify that the "[create]" option is chosen for the output parameter '<<Regression'.
5. The point shapes layer 'OlyPenTempAvg' is entered for the '>>Shapes' parameter.
6. 'TA-ANNUAL' is chosen from the list of attributes for the 'OlyPenTempAvg' layer for the 'Attribute' parameter.
7. Choose the "[create]" option for the '<Residuals' parameter. The default is "[not set]".
8. Choose the "[create]" option for the '<Details' parameter. The default is "[not set]".
9. Use the default entry "B-Spline Interpolation" for the Options parameter 'Grid Interpolation'.
10. Once the module parameters and options have been set, click on the 'Okay' button to execute the module.
11. When module execution is complete, the outputs will appear in the 'Tab' area of the Workspace.
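The steps above can be pictured with a small end-to-end sketch. It is one plausible reading of the workflow, not the module's code: it assumes the grids are read at the point locations, a least-squares fit is made, and the fitted equation is then evaluated for every cell to produce the regression grid. The 3 x 3 grids and the station values are made-up stand-ins for 'OPdem', 'OPAspect2', and 'OlyPenTempAvg', and points are placed directly on cells so no interpolation is needed.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        total = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - total) / aug[row][row]
    return x


# Made-up stand-ins for the elevation and aspect grids (same grid system).
dem = [[0.0, 100.0, 200.0], [100.0, 200.0, 300.0], [200.0, 300.0, 400.0]]
aspect = [[0.0, 90.0, 180.0], [270.0, 0.0, 90.0], [180.0, 270.0, 0.0]]

# Made-up stations: (column, row, average annual temperature).
stations = [(0, 0, 12.0), (1, 0, 11.18), (0, 1, 11.54), (2, 2, 8.0), (1, 2, 9.54)]

# Read both grids at each station; the leading 1.0 is the intercept column.
rows = [[1.0, dem[r][c], aspect[r][c]] for c, r, _ in stations]
temps = [t for _, _, t in stations]

# Least-squares fit through the normal equations (X^T X) beta = X^T y.
xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * t for row, t in zip(rows, temps)) for i in range(3)]
a, b1, b2 = solve(xtx, xty)

# Evaluate the fitted equation for every cell to build the regression grid.
regression = [[a + b1 * e + b2 * s for e, s in zip(dem_row, aspect_row)]
              for dem_row, aspect_row in zip(dem, aspect)]

for grid_row in regression:
    print([round(value, 2) for value in grid_row])
```

The station values were generated from a known equation (12 - 0.01 x elevation + 0.002 x aspect), so the fit recovers those coefficients and the regression grid contains a temperature estimate for cells that have no station.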
Notes:
1. The output grid data layer for the '<<Regression' parameter will be named using the name of the input point shapes data layer concatenated with "(Multiple Regression Analysis (Grids/Points))".
2. The entry for the 'Attribute' parameter specifies the data source for the dependent variable of the multiple regression analysis.
3. The entry for the 'Attribute' parameter must be a numeric variable.
4. The output shapes data layer for the '<Residuals' parameter will use the name of the input point shapes data layer concatenated with: "[Residuals]".
5. The default name for the output '<Details' table is "Multiple Regression Analysis".