(1)  $Y = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$

Here Y is the dependent variable, and X1,…,Xn are the n independent variables. In calculating the weights, a, b1,…,bn, regression analysis ensures maximal prediction of the dependent variable from the set of independent variables. This is usually done by least squares estimation.

This approach can be applied to analyze multivariate time series data when one of the variables is dependent on a set of other variables. We can model the dependent variable Y on the set of independent variables. At any time instant when we are given the values of the independent variables, we can predict the value of Y from Eq. 1.
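
As a concrete illustration of Eq. 1 and least squares estimation, the following sketch fits the weights with NumPy; the data, the number of independent variables, and all names are made up for the example.

```python
import numpy as np

# Hypothetical data: 3 independent variables (columns of X) and one dependent variable Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = 2.0 + X @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.5, size=100)

# Least squares estimation of the weights a, b1, ..., bn in Eq. 1:
# stack a column of ones so the intercept a is estimated alongside b1..bn.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
a, b = coef[0], coef[1:]

# Predict Y for new values of the independent variables.
x_new = np.array([0.2, -1.0, 0.5])
y_hat = a + b @ x_new
print(a, b, y_hat)
```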

In time series analysis, it is possible to do regression analysis against a set of past values of the variables. This is known as autoregression (AR). Let us consider n variables. We have a time series corresponding to each variable. At time t, the vector Zt represents the values of the n variables. The general autoregressive model assumes that Zt can be represented as:

(2)  $Z_t = A_1 Z_{t-1} + A_2 Z_{t-2} + \dots + A_p Z_{t-p} + E_t$

where each Ai (an n × n matrix) is the autoregression coefficient. Zt is the column vector of length n, denoting the values of the time series variables at time t. p is the order of the filter which is generally much less than the length of the series. The noise term or residual, Et, is almost always assumed to be Gaussian white noise.
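
A minimal sketch of how the coefficient matrices A1, …, Ap of Eq. 2 can be estimated by least squares, assuming NumPy is available; the simulated bivariate series and the choice p = 1 are purely illustrative.

```python
import numpy as np

def fit_var(Z, p):
    """Estimate A_1..A_p in Eq. 2 by least squares.

    Z : array of shape (T, n), one row per time step, one column per variable.
    Returns a list of p coefficient matrices, each n x n.
    """
    T, n = Z.shape
    # Regressors: the lagged vectors Z_{t-1}, ..., Z_{t-p}, stacked side by side.
    rows = [np.concatenate([Z[t - k] for k in range(1, p + 1)]) for t in range(p, T)]
    X = np.asarray(rows)                       # shape (T - p, n * p)
    Y = Z[p:]                                  # targets Z_t, shape (T - p, n)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # B has shape (n * p, n)
    # Rows k*n:(k+1)*n of B correspond to lag k+1; transpose to get A_{k+1}.
    return [B[k * n:(k + 1) * n].T for k in range(p)]

# Hypothetical example: simulate a bivariate VAR(1) and recover A_1.
rng = np.random.default_rng(1)
A1_true = np.array([[0.6, 0.1], [-0.2, 0.5]])
Z = np.zeros((500, 2))
for t in range(1, 500):
    Z[t] = A1_true @ Z[t - 1] + rng.normal(scale=0.1, size=2)

A1_hat, = fit_var(Z, p=1)
print(A1_hat)
```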

In a more general case, we can consider that the values Zt−1, Zt−2, …, Zt−p are themselves noisy. Adding lagged noise terms to Eq. 2, we get the ARMA (autoregressive moving average) equation:

(3)  $Z_t = A_1 Z_{t-1} + A_2 Z_{t-2} + \dots + A_p Z_{t-p} + E_t - B_1 E_{t-1} - B_2 E_{t-2} - \dots - B_p E_{t-p}$

Here B1, …, Bp (each an n × n matrix) are the MA (moving average) coefficients. These coefficients can be determined using the standard Box-Jenkins methodology (Box et al., 1994).

AR and ARMA models provide a convenient way to predict what should be happening to all of the time series simultaneously, and they allow one series to draw on the others to improve prediction accuracy.
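
Assuming the statsmodels package is available, its VARMAX class estimates a multivariate ARMA of the form in Eq. 3 by maximum likelihood (rather than the full Box-Jenkins identification procedure); the bivariate series and the order (1, 1) below are placeholders, not data from this chapter.

```python
import numpy as np
from statsmodels.tsa.statespace.varmax import VARMAX  # assumes statsmodels is installed

# Placeholder bivariate series with mild AR structure; in practice Z holds the observed variables.
rng = np.random.default_rng(2)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
Z = np.zeros((300, 2))
for t in range(1, 300):
    Z[t] = A1 @ Z[t - 1] + rng.normal(scale=0.3, size=2)

# Fit a VARMA(p, q) model (the multivariate ARMA of Eq. 3) and forecast ahead.
model = VARMAX(Z, order=(1, 1))      # p = 1 AR lag, q = 1 MA lag
result = model.fit(disp=False)       # maximum likelihood estimation
print(result.forecast(steps=5))      # predicted values for the next 5 time points
```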


URL: https://www.sciencedirect.com/science/article/pii/B978012369378550017X

Multivariate Analysis of Forensic Fraud, 2000–2010

Brent E. Turvey, in Forensic Fraud, 2013

Hierarchical multiple regression analyses presented in Tables 9-4 through 9-6 revealed numerous “pools of variation,” each suggesting immediate recommendations for additional related research to determine the nature of potentially significant relationships between specific variables examined in the present study.9

1. In Table 9-4, Job Description characteristics (JLAB, JTEC, JLEX, and JMED) combine to account for 17% of all variance among Dissemblers. This evidences that one or more are significantly correlated with that particular approach to fraud. Further study is required to reveal which.

2. In Table 9-4, Examiner characteristics (Isolated Incident, Science Education, History of Addiction, Criminal History, and History of Fraud) combine to account for 2% of all variance found among Dissemblers. This evidences that one or more are significantly correlated with that particular approach to fraud. Further study is required to reveal which.

3. In Table 9-4, Job Description characteristics (JLAB, JTEC, JLEX, and JMED) combine to account for 6% of all variance found among Simulators. This evidences that one or more are significantly correlated with that particular approach to fraud. Further study is required to reveal which.

4. In Table 9-5, Job Description characteristics (JLAB, JTEC, JLEX, and JMED) combine to account for 6% of all variance found related to Severity of Consequences to the examiner. This evidences that one or more are significantly correlated with that outcome. Further study is required to reveal which.

5. In Table 9-5, Examiner characteristics (Isolated Incident, Science Education, History of Addiction, Criminal History, and History of Fraud) combine to account for 17% of all variance found related to Severity of Consequences to the examiner. This evidences that one or more are significantly correlated with that outcome. Further study is required to reveal which.

6. In Table 9-5, Job Description characteristics (JLAB, JTEC, JLEX, and JMED) combine to account for 10% of all variance found related to Cases Reviewed. This evidences that one or more are significantly correlated with that outcome. Further study is required to reveal which.

7. In Table 9-5, Examiner characteristics (Isolated Incident, Science Education, History of Addiction, Criminal History, and History of Fraud) combine to account for 6% of all variance found related to Cases Reviewed. This evidences that one or more are significantly correlated with that outcome. Further study is required to reveal which.

8. In Table 9-6, Job Description characteristics (JLAB, JTEC, JLEX, and JMED) combine to account for 5% of all variance found related to Money. This evidences that one or more are significantly correlated with fraud involving that type of evidence. Further study is required to reveal which.

9. In Table 9-6, Examiner characteristics (Isolated Incident, Science Education, History of Addiction, Criminal History, and History of Fraud) combine to account for 7% of all variance found related to Money. This evidences that one or more are significantly correlated with fraud involving that type of evidence. Further study is required to reveal which.

10. In Table 9-6, Employer characteristics (Employer Independence, Accredited Lab, and Independent Audit) combine to account for 28% of all variance related to Chain of Custody evidence. This evidences that one or more are significantly correlated with fraud involving that type of evidence. Further study is required to reveal which.

11. In Table 9-6, Job Description characteristics (JLAB, JTEC, JLEX, and JMED) combine to account for 5% of all variance related to Chain of Custody evidence. This evidences that one or more are significantly correlated with fraud involving that type of evidence. Further study is required to reveal which.

12. In Table 9-6, Examiner characteristics (Isolated Incident, Science Education, History of Addiction, Criminal History, and History of Fraud) combine to account for 8% of all variance related to Chain of Custody evidence. This evidences that one or more are significantly correlated with fraud involving that type of evidence. Further study is required to reveal which.

13. In Table 9-6, Job Description characteristics (JLAB, JTEC, JLEX, and JMED) combine to account for 16% of all variance related to Drug1 evidence. This evidences that one or more are significantly correlated with fraud involving that type of evidence. Further study is required to reveal which.

Again, these research recommendations are those explicitly suggested by the current data, in order to gain a greater understanding of the relationships that the data suggest.
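
The percentages above can be read as the increment in R² contributed by a block of predictors in a hierarchical regression. A rough sketch of that computation follows, assuming statsmodels and pandas are available; the column names merely echo the study's labels and the data are randomly generated, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm  # assumed available

def r_squared(df, outcome, predictors):
    """Fit an OLS model and return its R-squared."""
    X = sm.add_constant(df[predictors])
    return sm.OLS(df[outcome], X).fit().rsquared

# Hypothetical data frame; "Dissembler" is treated as a continuous score purely for illustration.
rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(200, 6)),
                  columns=["JLAB", "JTEC", "JLEX", "JMED", "Dissembler", "ScienceEd"])

block1 = ["JLAB", "JTEC", "JLEX", "JMED"]   # Job Description block
block2 = block1 + ["ScienceEd"]             # add an Examiner-characteristics variable

r2_step1 = r_squared(df, "Dissembler", block1)
r2_step2 = r_squared(df, "Dissembler", block2)
print("R2 change when the second block is added:", r2_step2 - r2_step1)
```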


URL: https://www.sciencedirect.com/science/article/pii/B9780124080737000094

Regression Analysis

Sue Nugus, in Financial Planning Using Excel (Second Edition), 2009

Multiple linear regression

When two or more independent (X) variables are required for a prediction, the analysis is referred to as multiple linear regression.

The simple linear regression equation can be adapted to accommodate multiple independent variables in the following way:

$Y = A_0 + A_1 X_1 + A_2 X_2 + \dots + A_n X_n$

Theoretically there is no limit to the number of independent variables that can be analysed, but within the spreadsheet the maximum is 75. This is a far greater number than would ever actually be required in a business situation.

For example, Figure 5.14 represents expenditure on advertising and sales promotion together with sales achieved, where the advertising and sales promotion are independent variables and the sales achieved is the dependent variable. The requirement is to estimate future sales by entering advertising and promotion expenditure, and applying the multiple regression equation to predict the sales value.


Figure 5.14. Multiple regression scenario

In Excel there are no built-in functions for multiple regression and therefore the command method is required. Figure 5.15 shows the results of the Regression command where the range c3:c12 from Figure 5.14 was specified as the dependent variable and the range a3:b12 from Figure 5.14 was specified as the independent variables. The regression output has been placed on the same sheet commencing in cell a15.


Figure 5.15. Results of Regression command for multiple linear regression

To calculate the estimated sales value in cell g5, the data in cells b30, b31, and b32 shown in Figure 5.15 are required, and the formula for cell g5 is:

=(G6*B31)+(G7*B32)+B30

To use this multiple regression model, data must first be entered into cells g6 and g7. Figure 5.16 shows the estimated sales value if the advertising expenditure is 250 and the promotion expenditure is 125.


Figure 5.16. Results of multiple regression using the command method

Provided the X and Y data does not change, different values can be entered into cells g6 and g7 and the estimated sales value will recalculate correctly. However, because the command method has been used for the regression, if any of the X or Y data changes it will be necessary to execute the Regression command again.
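
For readers working outside Excel, the same calculation can be sketched with NumPy; the advertising, promotion, and sales figures below are invented rather than taken from Figure 5.14, and the inputs 250 and 125 mirror the example in the text.

```python
import numpy as np

# Invented advertising / promotion spend and sales figures (not the book's data).
advertising = np.array([100, 120, 140, 150, 160, 180, 200, 210, 230, 240], dtype=float)
promotion   = np.array([ 50,  55,  60,  70,  75,  80,  90,  95, 105, 110], dtype=float)
sales       = np.array([500, 540, 600, 630, 660, 700, 760, 780, 840, 870], dtype=float)

# Equivalent of the spreadsheet Regression output: intercept plus two coefficients.
X = np.column_stack([np.ones_like(advertising), advertising, promotion])
intercept, b_adv, b_promo = np.linalg.lstsq(X, sales, rcond=None)[0]

# Equivalent of the cell formula =(G6*B31)+(G7*B32)+B30 with g6 = 250 and g7 = 125.
estimated_sales = intercept + b_adv * 250 + b_promo * 125
print(estimated_sales)
```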

Selecting variables for multiple regression

A problem encountered when applying multiple regression methods in forecasting is deciding which variables to use. In the interests of obtaining as accurate a forecast as possible, it is obvious that all the relevant variables should be included. However, it is often not feasible to do this, because, apart from the complexity of the resulting equation, the following factors must be considered:

In a forecasting situation, the inclusion of variables that do not significantly contribute to the regression merely inflates the error of the resulting prediction.

The cost of monitoring many variables may be very high – so it is advisable to exclude variables that are not contributing significantly.

There is no point including terms that contribute less than the error variance, unless there is strong prior evidence that a particular variable should be included.

For successful forecasting, an equation that is stable over a wide range of conditions is necessary. The smaller the number of variables in the equation, the more stable and reliable the equation.


URL: https://www.sciencedirect.com/science/article/pii/B9781856175517000057

Markov Chain Monte Carlo Methods: Computation and Inference

Siddhartha Chib, in Handbook of Econometrics, 2001

8.3 Normal and student-t regression models

Consider the univariate regression model defined by the specification

$y_i \mid \mathcal{M}, \beta, \sigma^2 \sim N(x_i'\beta, \sigma^2), \quad i \le n, \qquad \beta \sim N_k(\beta_0, B_0), \qquad \sigma^2 \sim \mathcal{IG}\left(\frac{\nu_0}{2}, \frac{\delta_0}{2}\right).$

The target distribution is

$\pi(\beta, \sigma^2 \mid \mathcal{M}, y) \propto p(\beta)\, p(\sigma^2) \prod_{i=1}^{n} f(y_i \mid x_i'\beta, \sigma^2),$

and MCMC simulation proceeds by a Gibbs chain defined through the full conditional distributions

$\beta \mid y, \mathcal{M}, \sigma^2; \qquad \sigma^2 \mid y, \mathcal{M}, \beta.$

Each of these distributions is straightforward to derive because conditioned on σ2 both the prior and the likelihood have Gaussian forms (and hence the updated distribution is Gaussian with moments found by completing the square for the terms in the exponential function) while conditioned on β, the updated distribution of σ2 is inverse gamma with parameters found by adding the exponents of the prior and the likelihood.

Algorithm 5: Gaussian multiple regression

(1) Sample
$\beta \sim N_k\left(B_n\left[B_0^{-1}\beta_0 + \sigma^{-2}\sum_{i=1}^{n} x_i y_i\right], B_n\right), \qquad B_n = \left(B_0^{-1} + \sigma^{-2}\sum_{i=1}^{n} x_i x_i'\right)^{-1}$

(2) Sample
$\sigma^2 \sim \mathcal{IG}\left(\frac{\nu_0 + n}{2}, \frac{\delta_0 + \sum_{i=1}^{n}(y_i - x_i'\beta)^2}{2}\right)$

(3) Go to 1.
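
A compact NumPy sketch of Algorithm 5; the simulated data, prior hyperparameters, and number of iterations are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data for y_i = x_i' beta + noise.
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.8, size=n)

# Prior hyperparameters (arbitrary): beta ~ N_k(beta0, B0), sigma^2 ~ IG(nu0/2, delta0/2).
beta0, B0_inv = np.zeros(k), np.eye(k) / 100.0
nu0, delta0 = 4.0, 2.0

XtX, Xty = X.T @ X, X.T @ y
sigma2, draws = 1.0, []
for it in range(2000):
    # Step 1: draw beta | y, sigma^2 from N_k(B_n [B0^{-1} beta0 + sigma^{-2} X'y], B_n).
    Bn = np.linalg.inv(B0_inv + XtX / sigma2)
    mean = Bn @ (B0_inv @ beta0 + Xty / sigma2)
    beta = rng.multivariate_normal(mean, Bn)
    # Step 2: draw sigma^2 | y, beta from IG((nu0 + n)/2, (delta0 + sum of squared residuals)/2).
    resid = y - X @ beta
    shape = (nu0 + n) / 2.0
    rate = (delta0 + resid @ resid) / 2.0
    sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)   # if G ~ Gamma(shape, rate) then 1/G ~ IG(shape, rate)
    draws.append(np.concatenate([beta, [sigma2]]))

print(np.mean(draws[500:], axis=0))   # posterior means after a burn-in of 500 draws
```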

This algorithm can be easily modified to permit the observations yi to follow a Student-t distribution. The modification, proposed by Carlin and Polson (1991), utilizes the fact that if

$\lambda_i \sim \mathcal{G}\left(\frac{\xi}{2}, \frac{\xi}{2}\right),$

and

$y_i \mid \mathcal{M}, \beta, \sigma^2, \lambda_i \sim N\left(x_i'\beta, \lambda_i^{-1}\sigma^2\right),$

then

$y_i \mid \mathcal{M}, \beta, \sigma^2 \sim f_T\left(y_i \mid x_i'\beta, \sigma^2, \xi\right), \quad i \le n.$

Hence, if one defines ψ = (β, σ2,{λi}) then, conditioned on {λi}, the model is Gaussian and a variant of Algorithm 5 can be used. Furthermore, conditioned on (β, σ2), the full conditional distribution of {λi} factors into a product of independent Gamma distributions.

Algorithm 6: Student-t multiple regression

(1) Sample
$\beta \sim N_k\left(B_{n,\lambda}\left[B_0^{-1}\beta_0 + \sigma^{-2}\sum_{i=1}^{n} \lambda_i x_i y_i\right], B_{n,\lambda}\right), \qquad B_{n,\lambda} = \left(B_0^{-1} + \sigma^{-2}\sum_{i=1}^{n} \lambda_i x_i x_i'\right)^{-1}.$

(2) Sample
$\sigma^2 \sim \mathcal{IG}\left(\frac{\nu_0 + n}{2}, \frac{\delta_0 + \sum_{i=1}^{n} \lambda_i (y_i - x_i'\beta)^2}{2}\right).$

(3) Sample
$\lambda_i \sim \mathcal{G}\left(\frac{\xi + 1}{2}, \frac{\xi + \sigma^{-2}(y_i - x_i'\beta)^2}{2}\right), \quad i \le n.$

(4) Go to 1.
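
Relative to the sketch given after Algorithm 5, the only new ingredient is the draw of the latent weights λi in step 3; a fragment of that step (with ξ standing for the assumed degrees of freedom) might look as follows.

```python
import numpy as np

rng = np.random.default_rng(5)

def draw_lambdas(y, X, beta, sigma2, xi):
    """Step 3 of Algorithm 6: lambda_i ~ Gamma((xi + 1)/2, rate = (xi + sigma^{-2}(y_i - x_i'beta)^2)/2)."""
    resid2 = (y - X @ beta) ** 2
    shape = (xi + 1.0) / 2.0
    rate = (xi + resid2 / sigma2) / 2.0
    return rng.gamma(shape, 1.0 / rate)   # NumPy's gamma is parameterized by scale = 1/rate

# In steps 1 and 2, each x_i y_i and x_i x_i' term (and each squared residual)
# is then weighted by the corresponding lambda_i before sampling beta and sigma^2.
```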

Another modification of Algorithm 5 is to Zellner’s seemingly unrelated regression model (SUR). In this case a vector of p observations is generated from the model

$y_t \mid \mathcal{M}, \beta, \Omega \sim N(X_t\beta, \Omega), \quad t \le n, \qquad \beta \sim N_k(\beta_0, B_0), \qquad \Omega^{-1} \sim W_p(\nu_0, R_0),$

where $y_t = (y_{1t}, \dots, y_{pt})'$, $X_t = \mathrm{diag}(x_{1t}', \dots, x_{pt}')$, $\beta = (\beta_1', \dots, \beta_p')'$ is $k \times 1$, and $k = \sum_i k_i$.

To deal with this model, a two block MCMC approach can be used as proposed by Blattberg and George (1991) and Percy (1992). Chib and Greenberg (1995b) extend that algorithm to SUR models with hierarchical priors and time-varying parameters of the type considered by Gammerman and Migon (1993).

For the SUR model, the posterior density of the parameters is proportional to

$\pi(\beta)\,\pi(\Omega^{-1}) \times |\Omega^{-1}|^{n/2}\exp\left\{-\frac{1}{2}\sum_{t=1}^{n}(y_t - X_t\beta)'\Omega^{-1}(y_t - X_t\beta)\right\},$

and the MCMC algorithm is defined by the full conditional distributions

$\beta \mid y, \mathcal{M}, \Omega^{-1}; \qquad \Omega^{-1} \mid y, \mathcal{M}, \beta.$

These are both tractable, with the former a normal distribution and the latter a Wishart distribution.

Algorithm 7: Gaussian SUR

(1) Sample
$\beta \sim N_k\left(B_n\left[B_0^{-1}\beta_0 + \sum_{t=1}^{n} X_t'\Omega^{-1}y_t\right], B_n\right), \qquad B_n = \left(B_0^{-1} + \sum_{t=1}^{n} X_t'\Omega^{-1}X_t\right)^{-1}.$

(2) Sample
$\Omega^{-1} \sim W_p\left(\nu_0 + n, \left[R_0^{-1} + \sum_{t=1}^{n}(y_t - X_t\beta)(y_t - X_t\beta)'\right]^{-1}\right)$

(3) Go to 1.
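
A sketch of Algorithm 7, assuming SciPy's wishart distribution is available for step 2; the dimensions and data in the small demonstration are invented.

```python
import numpy as np
from scipy.stats import wishart   # assumed available for the Wishart draw

rng = np.random.default_rng(6)

def sur_gibbs(y, X, beta0, B0_inv, nu0, R0, iters=1000):
    """Sketch of Algorithm 7. y: (n, p) stacked y_t; X: (n, p, k) stacked X_t matrices."""
    n, p = y.shape
    k = X.shape[2]
    R0_inv = np.linalg.inv(R0)
    Omega_inv = np.eye(p)
    draws = []
    for _ in range(iters):
        # Step 1: beta | y, Omega^{-1} ~ N_k(B_n [B0^{-1} beta0 + sum X_t' Omega^{-1} y_t], B_n).
        A = B0_inv + np.einsum('tij,il,tlk->jk', X, Omega_inv, X)
        b = B0_inv @ beta0 + np.einsum('tij,il,tl->j', X, Omega_inv, y)
        Bn = np.linalg.inv(A)
        beta = rng.multivariate_normal(Bn @ b, Bn)
        # Step 2: Omega^{-1} | y, beta ~ W_p(nu0 + n, [R0^{-1} + sum e_t e_t']^{-1}).
        E = y - np.einsum('tij,j->ti', X, beta)
        S = R0_inv + E.T @ E
        Omega_inv = wishart.rvs(df=nu0 + n, scale=np.linalg.inv(S), random_state=rng)
        draws.append(beta)
    return np.asarray(draws)

# Tiny invented example: p = 2 equations, k = 4 stacked coefficients, n = 50 periods.
n, p, k = 50, 2, 4
X_demo = rng.normal(size=(n, p, k))
y_demo = rng.normal(size=(n, p))
out = sur_gibbs(y_demo, X_demo, beta0=np.zeros(k), B0_inv=np.eye(k) / 100,
                nu0=p + 2, R0=np.eye(p), iters=200)
print(out.mean(axis=0))
```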


URL: https://www.sciencedirect.com/science/article/pii/S1573441201050103

The Poland Medical Bundle

Jacek Jakubowski PhD, ... Grzegorz Migut MSc, in Practical Predictive Analytics and Decisioning Systems for Medicine, 2015

Collinearity of Predictors

Highly correlated predictors used in multiple regression do not contribute any new information to the model. Such variables unnecessarily complicate the model and usually lead to improper results. Collinearity leads to overestimation of the standard errors of the estimates, and hence to a false assessment of the significance of the analyzed variables; in many cases it makes estimation of the model impossible (Hosmer and Lemeshow, 2000; Vittinghoff et al., 2005). If two or more variables are highly correlated with each other, it is worth considering keeping only one of them. To identify highly correlated variables, the user can analyze the correlation matrix. When the number of predictors exceeds 10, this analysis may be very time consuming; a better approach is then to use advanced statistical methods such as principal components analysis to identify bundles of highly correlated variables. To conduct such an analysis, click the Representatives button located in the Collinearity group box.

In the Select representatives dialog (Figure N.59) there are two distinguished groups of potentially correlated variables: “Weight pound” and “BMI,” and “SBP” and “DBP.” Now we can generate a correlation matrix for bundles of correlated variables. In Figure N.60, you can see the correlation matrix of variables included in the first bundle that appears after clicking the Correlations button. You can see a very strong correlation between the variables, allowing you to safely eliminate one of them. To remove variables, deselect the Include field in the rows corresponding to those variables. If a bundle of variables consists of more than two variables, the user can automatically select a certain number of representatives by clicking the blue tick.


Figure N.59. The Select representatives dialog.


Figure N.60. The correlation matrix for bundle of variables.

After selection of representatives is complete, click the OK button to go back to the Simple logistic regression dialog. Eliminated variables are excluded from further analyses.
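
Outside the dialogs described above, the same collinearity check can be approximated with a correlation matrix and variance inflation factors (VIFs). The sketch below uses synthetic data whose variable names simply echo the text; it is not the bundle's actual data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Synthetic predictors mimicking the bundles in the text: weight/BMI and SBP/DBP are
# constructed to be highly correlated, while age is independent of both.
n = 500
weight = rng.normal(170, 30, n)
df = pd.DataFrame({
    "Weight_pound": weight,
    "BMI": weight / 5.0 + rng.normal(0, 1.0, n),
    "SBP": rng.normal(130, 15, n),
    "Age": rng.normal(50, 10, n),
})
df["DBP"] = 0.6 * df["SBP"] + rng.normal(0, 4.0, n)

# Correlation matrix: pairs with |r| close to 1 are candidates for dropping one variable.
print(df.corr().round(2))

# Variance inflation factor: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the others. Values well above roughly 10 usually signal collinearity.
def vif(data, col):
    others = data.drop(columns=[col])
    X = np.column_stack([np.ones(len(data)), others.to_numpy()])
    yv = data[col].to_numpy()
    coef, *_ = np.linalg.lstsq(X, yv, rcond=None)
    resid = yv - X @ coef
    r2 = 1 - resid @ resid / ((yv - yv.mean()) @ (yv - yv.mean()))
    return 1.0 / (1.0 - r2)

for col in df.columns:
    print(col, round(vif(df, col), 1))
```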


URL: https://www.sciencedirect.com/science/article/pii/B9780124116436000314

Conditional Credit Migrations

Stefan Trueck, Svetlozar T. Rachev, in Rating Based Modeling of Credit Risk, 2009

9.5.1 Forecasts Using the Factor Model Approach

Following Kim (1999), the multiple regression model (11.3) is used for modeling and forecasting the continuous credit cycle index Zt. It is assumed that the index follows a standardized normal distribution. Thus, a probit model will allow us to create unbiased forecasts of the inverse normal CDF of Zt, given the most recent information about the economic state and the estimated coefficients. Note that unlike Kim (1999), who uses only one credit cycle index based on speculative default probabilities, we consider two credit cycle indices: one for speculative grade and one for investment grade issues. For the investment grade issues, we use cumulative defaults of issuers rated Aaa, Aa, A, and Baa, while for the speculative grade issues, default probabilities from Ba to C were included. As an example, Figure 9.4 reports the observed default frequencies for the noninvestment grade rating classes Ba, B, and C that were used for estimation of the speculative grade credit cycle index.

In a second step, the forecasts of the credit cycle indices are used for determining conditional migration probabilities $\hat{p}_t(i,j \mid Z_t)$. The adjustment is conducted following the procedure described in Section 3.1. However, for finding the optimal weights $w_{Inv}$ and $w_{Spec}$ for the systematic risk indices, minimizing the discrepancies between the forecasted conditional and the actually observed transition probabilities, we introduce some model extensions. We allow for a more general weighting of the difference between the forecasted and the empirically observed transition probability in each cell. Hence, the weights for each of the cells are assigned according to some function f:


FIGURE 9.4. Moody’s historical default rates for speculative rating classes Ba (dotted), B (dashed), and C (solid) for the period 1984–1999.

(9.27)  $\min \sum_{j}\sum_{i} f\big(i, j, p_t(i,j), \hat{p}_t(i,j \mid Z_t)\big)$

where the outcome of $f\big(i, j, p_t(i,j), \hat{p}_t(i,j \mid Z_t)\big)$ may depend on the row i and column j of the cell as well as on the forecasted and actually observed transition probabilities $\hat{p}_t(i,j \mid Z_t)$ and $p_t(i,j)$.

To achieve a better interpretation of the results, we will also use the risk-sensitive difference indices suggested in Chapter 7 as optimization criteria for the distance between the forecasted and actual migration matrix. Recall that, based on the estimated model, the parameter w and the shifts on the migrations are determined according to some optimization criterion. In fact, this is a crucial point of the model when it comes to forecasting credit migration matrices. While Belkin et al. (1998b) suggest minimizing a weighted expression of this form, Wei (2003) uses the absolute percentage deviation based on the L1 norm or a pseudo R2 as goodness-of-fit criteria. As was illustrated in Chapter 6, most of the distance measures suggested in the literature so far do not quantify differences between migration matrices adequately in terms of risk. However, forecasts of transition matrices will be used especially for determining credit VaR and for portfolio and risk management purposes. Therefore, risk-sensitive difference indices in particular may be a rewarding approach for measuring the difference between forecasted and observed matrices. Following the results in Chapter 7, we suggest that the difference between migration matrices P = (pij) and Q = (qij) be determined in a weighted cell-by-cell calculation. Following Trueck and Rachev (2007), we will include two risk-sensitive directed difference indices in our analysis as optimization criteria:

(9.28)  $D_1(P,Q) = \sum_{i=1}^{n}\sum_{j=1}^{n-1} d(i,j) + \sum_{i=1}^{n} n \cdot d(i,n)$

(9.29)  $D_2(P,Q) = \sum_{i=1}^{n}\sum_{j=1}^{n-1} d(i,j) + \sum_{i=1}^{n} n^2 \cdot d(i,n)$

To compare the results with standard criteria, we will consider the classic $L_1$ and $L_2$ metrics

(9.30)  $D_{L_1}(P,Q) = \sum_{i=1}^{n}\sum_{j=1}^{n} |p_{ij} - q_{ij}|$

and

(9.31)  $D_{L_2}(P,Q) = \sum_{i=1}^{n}\sum_{j=1}^{n} (p_{ij} - q_{ij})^2$

Further, the measure of so-called normalized squared differences, $\mathrm{NSD}_{\mathrm{symm}}$,

(9.32)  $D_{\mathrm{NSD}}(P,Q) = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{(p_{ij} - q_{ij})^2}{p_{ij}} \quad \text{for } p_{ij} \neq 0$

is included in the analysis. Note that these criteria can also be used to evaluate the distance between forecasted and observed transition matrices for the numerical adjustment methods and the chosen benchmark models.
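
The standard criteria in Eqs. (9.30) through (9.32) are straightforward to compute directly; in the sketch below the two small transition matrices are made up and stand in for a forecasted and an observed migration matrix.

```python
import numpy as np

def d_l1(P, Q):
    """L1 distance of Eq. (9.30): sum of absolute cell-by-cell differences."""
    return np.abs(P - Q).sum()

def d_l2(P, Q):
    """L2-type distance of Eq. (9.31): sum of squared cell-by-cell differences."""
    return ((P - Q) ** 2).sum()

def d_nsd(P, Q):
    """Normalized squared differences of Eq. (9.32), summed over cells with p_ij != 0."""
    mask = P != 0
    return (((P - Q) ** 2)[mask] / P[mask]).sum()

# Made-up 3-state transition matrices standing in for forecasted and observed migrations.
P = np.array([[0.90, 0.08, 0.02],
              [0.10, 0.80, 0.10],
              [0.00, 0.00, 1.00]])
Q = np.array([[0.88, 0.09, 0.03],
              [0.12, 0.78, 0.10],
              [0.00, 0.00, 1.00]])

print(d_l1(P, Q), d_l2(P, Q), d_nsd(P, Q))
```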


URL: https://www.sciencedirect.com/science/article/pii/B9780123736833000105

Maternal education and community characteristics as indicators of nutritional status of children—application of multivariate regression

Suresh C. Babu, Shailendra N. Gajanan, in Food Security, Poverty and Nutrition Policy Analysis (Third Edition), 2022

Tests about the equation

There are several hypothesis tests in a multiple regression output, but all of them try to determine whether the underlying model parameters are actually zero. The first question that may be asked is: “Is the multiple regression model any good at all?” To address this question, we want to test the null hypothesis in Eq. (10.1) as follows:

(10.11)  $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$

The alternative hypothesis is that the slope coefficients are not all equal to zero. The null hypothesis can be tested using the F-statistic as given by Eq. (10.8). The following analysis of variance (ANOVA) tables (Tables 10.6 and 10.7) divide the observed variability in the dependent variables (namely ZWA and ZHA) into two parts: the regression sum of squares (SSB) and the residual sum of squares (SSW). The total sum of squares is the sum of these two numbers. We can derive the coefficient of multiple determination (R2) by dividing the regression sum of squares by the total sum of squares.

Table 10.6. Analysis of variance table for weight-for-age Z-scores.

Model   Source       Sum of squares   df    Mean square   F       Sig.
I       Regression   89.139           11    8.104         6.797   0.00
        Residual     284.936          239   1.192
        Total        374.075          250
II      Regression   95.757           11    8.705         7.475   0.00
        Residual     278.318          239   1.165
        Total        374.075          250

Table 10.7. Analysis of variance table for height-for-age Z-scores.

Model   Source       Sum of squares   df    Mean square   F       Sig.
I       Regression   70.948           11    6.450         3.901   0.00
        Residual     335.649          203   1.653
        Total        406.597          214
II      Regression   81.525           11    7.411         4.628   0.00
        Residual     325.072          203   1.601
        Total        406.597          214

The observed significance level for the F-statistic tells us if the null hypothesis can be rejected as in Eq. (10.11).

As is evident from Tables 10.6 and 10.7, the coefficient of multiple determination (R2) of Tables 10.4 and 10.5 can be obtained as the ratio of the regression sum of squares to the total sum of squares. For example, for Model 1 of determinants of ZWA, the ratio of regression sum of squares to the total sum of squares is (89.139/374.075) = 0.238. The computation of the F-values from Eq. (10.8) can be demonstrated as follows for Model 1 of the weight-for-age regressions:

$F = \mathrm{MSB}/\mathrm{MSW} = 8.1035/1.1922 = 6.797$

The critical value of F for 11 and 239 degrees of freedom at the significance level α = 0.05 is 1.84. Since the obtained F value exceeds the critical value, we can infer that the slope coefficients are not all zero and conclude that the multiple regression model is better than just using the mean.
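
The arithmetic above can be reproduced directly from the Model I row of Table 10.6; the critical value comes from the F distribution, assuming SciPy is available.

```python
from scipy.stats import f  # assumed available

# Model I figures from Table 10.6 (weight-for-age Z-scores).
ss_regression, df_regression = 89.139, 11
ss_residual, df_residual = 284.936, 239
ss_total = ss_regression + ss_residual

r_squared = ss_regression / ss_total                                      # ~0.238
f_value = (ss_regression / df_regression) / (ss_residual / df_residual)   # ~6.797
f_critical = f.ppf(0.95, df_regression, df_residual)                      # ~1.84 at alpha = 0.05

print(round(r_squared, 3), round(f_value, 3), round(f_critical, 2))
```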


URL: https://www.sciencedirect.com/science/article/pii/B9780128204771000140

Statistics, Overview

Guy M. Robinson, in International Encyclopedia of Human Geography (Second Edition), 2020

Multiple Regression

Simple linear regression [y = f(x)] may be extended to more complex multiple regression [y = f(x1, x2, x3, …, xn)]. In multiple regression, the equation takes the following form:

$y = a + b_1 x_1 + b_2 x_2 + e$

where $a$ = the intercept on the y-axis, $b_1$ and $b_2$ = partial regression coefficients, $x_1$ and $x_2$ = the independent variables, and $e$ = an error term. The partial regression coefficients can only be compared with one another if they are transformed into beta weights ($\beta_i$), where $\beta_i = b_i(s_{x_i}/s_y)$, $b_i$ = slope coefficient of independent variable $x_i$, and $s_{x_i}$ and $s_y$ = the standard deviations of the independent variable $x_i$ and the dependent variable $y$, respectively.

βi-weights show how much change in the dependent variable is produced by a standardized change in one of the independent variables, with the influence of the other independent variables controlled, enabling assessment of the effects of the individual independent variables in the regression equation: the higher the βi-weight, the greater the rate at which the dependent variable increases with an increase in the particular independent variable.

In multiple regression the multiple correlation coefficient ($r_{y \cdot x_1 x_2 x_3 \dots x_n}$) measures the correlation between y and all the independent variables. If there is correlation between the independent variables then collinearity exists, which reduces the extent to which the dependent variable is explained and indicates redundancy among the independent variables. This redundancy may be eliminated using stepwise entry of variables into the equation, adding variables in order of their importance in reducing the variance of y, starting with the most important.
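
Beta weights can equivalently be obtained by rescaling the raw slopes with βi = bi(sxi/sy); the short sketch below uses synthetic data to show the calculation.

```python
import numpy as np

rng = np.random.default_rng(8)

# Synthetic data: y depends on x1 and x2, which are measured on very different scales.
n = 300
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 50, n)
y = 3.0 + 2.0 * x1 + 0.05 * x2 + rng.normal(0, 1, n)

# Raw partial regression coefficients b1, b2 from ordinary least squares.
X = np.column_stack([np.ones(n), x1, x2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Beta weights: beta_i = b_i * (s_xi / s_y), making the slopes comparable across variables.
beta1 = b1 * x1.std(ddof=1) / y.std(ddof=1)
beta2 = b2 * x2.std(ddof=1) / y.std(ddof=1)
print(round(beta1, 3), round(beta2, 3))
```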

Why is multiple regression called multiple?

Multiple regression requires two or more predictor variables, and this is why it is called multiple regression.

What is multiple regression called?

Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. Multiple regression is an extension of simple linear (OLS) regression, which uses just one explanatory variable.

What is multiple regression quizlet?

Multiple regression extends the principles of linear regression by using more than one variable as a predictor. It shows the relative importance of the predictors (whether one accounts for a greater share of the variance) and whether a dependent variable is best predicted by a combination of variables rather than by one alone.

What is regression and multiple regression?

Regression analysis is a common statistical method used in finance and investing. Linear regression is one of the most common techniques of regression analysis. Multiple regression is a broader class of regressions that encompasses linear and nonlinear regressions with multiple explanatory variables.