The Two-Way ANOVA compares 1) the mean differences in the test variable (dependent variable) between the groups of the first independent variable, which is the first "main effects" factor; and 2) the same mean differences when those groups are sub-divided by a second independent variable, which is the second "main effects" factor. The primary purpose of a Two-Way ANOVA is to understand whether there is an interaction between the two main effects factors (the two independent variables) on the test variable (dependent variable).

(1) The dependent variable (test variable) is continuous (interval or ratio).

(2) The two independent variables (factor variables) are categorical (nominal or ordinal) and there can be two, three, or more groups in each independent variable.

(3) The participants (samples) have no relationship with the other participants in their group or with the participants in the other groups.

(4) The participants (samples) for each group are taken at random from the population.

(5) The dependent variable (test variable) has a reasonably normal distribution for each group. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).

(6) The dependent variable (test variable) has equal variance (homogeneity) between all the groups. Homogeneity means you want the standard deviation measurements for the groups to be roughly the same as each other.
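Assumptions (5) and (6) can also be checked outside SPSS. As an illustrative sketch (using Python's scipy with made-up calorie values, not the tutorial's data), a Shapiro-Wilk test per group might look like:

```python
from scipy import stats

# Hypothetical calorie samples for three venue groups (illustrative values only).
groups = {
    "Putney":     [3300, 4100, 3800, 4500, 3900, 4200, 4000],
    "East Sheen": [5200, 5600, 5100, 5800, 5400, 5700, 5500],
    "Tooting":    [6800, 7200, 6900, 7400, 7000, 7100, 6700],
}

# Shapiro-Wilk per group: a p-value above .05 suggests normality is not violated.
for name, values in groups.items():
    stat, p = stats.shapiro(values)
    print(f"{name}: W = {stat:.3f}, p = {p:.3f}")
```

This mirrors what SPSS reports in its Tests of Normality table, one row per group.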

**(Q1)** Does the dependent variable (Calories) have a reasonably normal distribution across the three venue groups?

(Answer: Yes). All the p-values for the three venue groups in both the Kolmogorov-Smirnov and Shapiro-Wilk tests are above the critical .05 alpha level. Therefore, as these tests for normal distribution have not been violated, the data for each venue can be considered as having a normal distribution.

**(Q2)** Does the dependent variable (Calories) have an equal variance between the three venue groups?

(Answer: Yes). The variances (whisker-to-whisker) for the three groups, although not exactly 100% equal, are roughly the same as each other... happiness.

To start the analysis, click *Analyze > General Linear Model > Univariate*

This will bring up the **Univariate** dialogue box. To carry out the test, move the dependent (scale) variable into the **Dependent Variable:** placard and the two independent (factor) variables into the **Fixed Factor(s):** placard.

We will walk through several of the option buttons to bolt on a few helpful and important measurements. For a simple Two-Way ANOVA test where you have __not__ included extra random factors or covariates, the **Model** and **Contrasts** options can be left at their default settings.

There are only two factors, so place one on the **Horizontal Axis:** placard and the other on the **Separate Lines:** placard.

Next click the **Add** button, then select the type of chart you want (Line or Bar) and click the **Continue** button.

Next, open the **Post Hoc** button. You should only carry out Post Hoc testing when 1) it is on a main effect variable, 2) that main effect variable has three or more groups (or levels), and 3) it is important to know between which groups the differences that exist are statistically significant. In our example, the Venue variable is the only main effect variable that has three levels (Putney, East Sheen, and Tooting), and I moved it into the **Post Hoc Tests for:** placard.

In every Two-Way ANOVA there is an interaction variable created between the two main effects variables. In our example, it will be the interaction between Venue and Gender (Venue * Gender), which examines if there is a statistically significant difference between a male and a female at each respective venue. To examine this interaction variable, open the **EM Means** (Estimated Marginal Means) button.

In our example, I moved the interaction variable (Venue * Gender) into the **Display Means for:** placard. Next, tick the **Compare main effects** check box.

The next area to open is the **Save** button. You would have tested for the assumption of a reasonably normal distribution for the dependent variable for each group prior to running the test. However, if you have a complex ANOVA model, such as, 2 x 2 x 4 (which has 16 groups); or if there are only a few observations per group, which makes it difficult to check for normal distribution; or if you have a covariate in the model, then saving and testing the residuals is often the better way to test that the assumption of normal distribution is satisfied. In our example, I selected the **Unstandardized** residuals and **Cook's distance** options.

Finally, open the **Options** button. Here you want to select from the **Display** list the **Descriptive statistics**, **Estimates of effect size**, and **Homogeneity tests** options.

Remember, you will not need all these extra options to run an initial and straightforward Two-Way ANOVA test. But we have walked through most of them to give you the confidence to find and add what you may need for your project. When finished, click the **OK** button to run the test.

The results will appear in the SPSS Output Viewer. In the **Between-Subjects Factors** table there is the sample size (N) for each group. And in the **Tests of Between-Subjects Effects** table there are the key test results -- the F-value (test statistic) and p-value (Sig) for each factor.

Here in this example, the differences in calories for the meals eaten across the three venues are statistically significant (F = 11.532; p < .001). But the differences in calories for the meals eaten across the two genders are __not__ statistically significant (F = .012; p = .915). Equally, the differences in the calories for the meals eaten across the three venues depending on whether it was a man or woman eating the meal (the interaction factor) are __not__ statistically significant (F = .641; p = .533).
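For readers working outside SPSS, the same kind of two-way model can be sketched in Python with statsmodels. The data below are made-up long-format values, not the tutorial's sample, and note that SPSS reports Type III sums of squares by default whereas this sketch uses Type II:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical long-format data (illustrative values, not the tutorial's sample).
df = pd.DataFrame({
    "Calories": [3400, 4200, 3900, 5600, 5200, 5800, 7100, 6900, 7000,
                 4100, 3800, 4500, 5400, 5700, 5300, 6800, 7200, 7050],
    "Venue": (["Putney"] * 3 + ["East Sheen"] * 3 + ["Tooting"] * 3) * 2,
    "Gender": ["Male"] * 9 + ["Female"] * 9,
})

# Two main effects plus their interaction: Calories ~ Venue * Gender
model = smf.ols("Calories ~ C(Venue) * C(Gender)", data=df).fit()
table = anova_lm(model, typ=2)  # one row per effect, with F and PR(>F)
print(table)
```

The printed table plays the same role as SPSS's Tests of Between-Subjects Effects output: one F and p-value per main effect and one for the interaction.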

This is the standard result for the Two-Way ANOVA test if you simply entered the variables in the main Univariate dialogue box and did not select any of the extra options. As you can see it gives the basic result for the variables entered, but it provides no further (and often important) details.

Looking at the extra options that were selected, in the **Descriptives** table there are the key metrics for each group -- sample size (N), mean, and standard deviation. From these measurements you can understand in greater detail why the test indicated (or did not indicate) a statistically significant result for the variables tested.

Here in this example, the mean calories for the meals eaten move from 3972 (Putney) to 5465 (East Sheen) to 7012 (Tooting). This is a difference of between 1520 calories (minimum) and 3040 calories (maximum), depending on how you pair up the three venues, which gives numeric evidence as to why the differences between the venues were statistically significant. Conversely, the difference between the males (5761) and females (5541) was only 220 calories, which gives similar numeric evidence as to why the difference between the genders was __not__ statistically significant.

Finally, for the interaction between the venue and gender, at Putney the difference between male (3429) and female (4282) was 853 calories, at East Sheen the difference between male (5846) and female (5179) was 667 calories, and at Tooting the difference between male (7020) and female (7006) was 14 calories. Here we can see that there was virtually no difference between a man eating at Tooting as compared to a woman eating at Tooting. Equally, you might suspect the difference between a man eating at Putney as compared to a woman eating at Putney would be statistically significant, as there is a 24.8% difference in calories. However, the sample size is very small, with only 4 males as compared to 7 females -- SPSS is counter-balancing (mathematically) a robust difference against extremely small sample sizes with reasonably large variances (standard deviations).
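This kind of cell-by-cell breakdown (N, mean, SD per Venue x Gender combination) is easy to reproduce for any long-format dataset. A sketch with pandas, using made-up values:

```python
import pandas as pd

# Hypothetical long-format sample (illustrative values, not the tutorial's data).
df = pd.DataFrame({
    "Calories": [3429, 3500, 3350, 4282, 4300, 4250, 4290,
                 5846, 5900, 5800, 5179, 5200, 5150, 5180],
    "Venue": ["Putney"] * 7 + ["East Sheen"] * 7,
    "Gender": (["Male"] * 3 + ["Female"] * 4) * 2,
})

# N, mean, and SD for each Venue x Gender cell, mirroring SPSS's Descriptives table.
desc = df.groupby(["Venue", "Gender"])["Calories"].agg(["count", "mean", "std"])
print(desc)
```

Scanning the counts per cell also makes imbalanced group sizes (like the 4 males vs 7 females above) immediately visible.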

In the **Levene's Test of Equality of Error Variances** table there are the key test metrics for homogeneity (equality) of variance. If the data were normally distributed and there were no significant outliers, then you would expect all four measurements (p-values) to agree. And this is true in our example, with all the p-values virtually the same, between 0.287 and 0.308. The measurement you would refer to (and quote) in your write-up would be the top row titled, **Based on Mean**.
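Levene's test is also available outside SPSS. A sketch with scipy on hypothetical samples; `center="mean"` corresponds to the "Based on Mean" row of SPSS's table:

```python
from scipy import stats

# Hypothetical calorie samples per venue (illustrative, not the tutorial's data).
putney  = [3300, 4100, 3800, 4500, 3900, 4200, 4000]
sheen   = [5200, 5600, 5100, 5800, 5400, 5700, 5500]
tooting = [6800, 7200, 6900, 7400, 7000, 7100, 6700]

# center="mean" matches the "Based on Mean" variant of Levene's test.
stat, p = stats.levene(putney, sheen, tooting, center="mean")
print(f"Levene: W = {stat:.3f}, p = {p:.3f}")  # p > .05 suggests homogeneity holds
```

Changing `center` to `"median"` gives the Brown-Forsythe style variant, analogous to the "Based on Median" row.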

Normal distribution and homogeneity of variance are important test assumptions; and having met these two assumptions will give you strong confidence that the test results are reliable. Later, we will conduct a secondary analysis on the Residuals and Cook's Distance metrics (that we saved earlier) as further evidence that the model meets these two important assumptions.

The **Post Hoc Tests** table will indicate where (between which pair-wise comparisons of the three venues) the statistically significant difference has occurred. Here in our example it was only in the main effect Venue variable (and not in the main effect Gender variable) that the differences in calories for the meals eaten were statistically significant (F

The **Post Hoc** test examines only the main effect variables and not the interaction variable (Venue*Gender). Here in our example, we have seen all three pair-wise comparisons of the venues have differences in the calories that are statistically significant. But we do not know if there are statistically significant differences between the male and female genders at each of these venues. The result of the **Pairwise Comparisons** table (created from the **EM Means** options) will examine this.

As mentioned earlier, here again we are looking for any large mean difference that would have a p-value below the critical alpha level 0.05. Here in our example for the female gender there is a statistically significant difference at the Putney to Tooting (p = .002) and the East Sheen to Tooting (p = .029) venues. But for the male gender there is only a statistically significant difference at the Putney to Tooting (p = .001) venue pairing.
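To see how pairwise venue comparisons like these are produced outside SPSS, here is a Tukey HSD sketch with statsmodels on made-up values (one adjusted p-value per venue pairing):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical calorie values and venue labels (illustrative, not the real sample).
calories = np.array([3300, 4100, 3800, 4500, 3900,
                     5200, 5600, 5100, 5800, 5400,
                     6800, 7200, 6900, 7400, 7000])
venues = ["Putney"] * 5 + ["East Sheen"] * 5 + ["Tooting"] * 5

# One row per pairwise venue comparison, with Tukey-adjusted p-values.
result = pairwise_tukeyhsd(calories, venues, alpha=0.05)
print(result.summary())
```

With three venues there are exactly three pairings, matching the three rows of the SPSS Post Hoc table.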

The **Profile Plots** will give a snapshot of the test results of the two main effect factors (Venue and Gender) and the interaction factor (Venue*Gender). Here in our example you can visualise the robust jumps (the green dashes are the estimated average) in calories across the three venues -- 3972 (Putney) to 5465 (East Sheen) to 7012 (Tooting) -- which is where the statistically significant differences are occurring.

If you imagine the three venues collected into one plot, the three red dots (males) would merge near an average of 5761 calories and the three blue dots (females) would merge near an average of 5542 calories. This is a very small mean difference of 219 calories (only a 3.95% increase in calories from females to males), and hence we can see why the effect of the second main factor, gender, was not statistically significant.

You can also visualize the interaction between a man or woman at each individual venue, with Tooting showing virtually no difference between the genders and Putney showing the greatest difference between the genders. You should also notice that the red line (male) and the blue line (female) cross over each other between Putney and East Sheen. This 'crossing over' is very typical of an interaction effect, rather than the lines staying parallel to each other. However, despite this crossing over, there is still not enough evidence to indicate the interaction factor is statistically significant. This could be because of the small sample sizes, the actual mean difference between the male and female at each venue, and the high degree of overlap in the 95% confidence intervals.

Normal distribution and homogeneity of variance are important test assumptions that provide strong confidence that the test results are reliable; and you would normally confirm these assumptions are met prior to constructing the Two-Way ANOVA model. However, this may not always be possible with complex modelling tests that include random factors or covariates.

Therefore, it is equally important to carry out secondary analysis on two model-created variables -- Unstandardized Residuals and Cook's Distance -- that we saved earlier as further evidence that the model meets these important assumptions.

In the histogram chart of the Unstandardized Residuals, we can see there is a reasonably normal distribution of the data. The histogram is showing the data values are centrally gathered around the mean (not skewed to the left or right tails and no significant outliers).

In the scatter chart, which is a comparison of Cook's Distance to the model's dependent variable Calories, we can see that for every observation in Calories (N = 40) the Cook's D values all have relatively the same distance up from the baseline (X-axis), and that there are no predominant 'up-spikes' in the data array.

That said, there are two values that are just sneaking over the allowable limit (values that are 3X higher than the mean of the data array) which are indicators of likely influential observations which should be investigated. In our example the mean of our Cook's Distance variable is 0.03 and therefore 3X this mean is 0.09. Any up-spike values in Cook's Distance could bring into question the 100% numerical accuracy (e.g., means, standard deviations, 95% confidence intervals, F score, p-values, etc.) within the model. In our example it is only 2 values out of 40 samples (5.0%), and they are both just over the allowable limit.
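The 3x-mean rule of thumb is easy to apply if you export the saved Cook's Distance values, or recompute them directly. A sketch with statsmodels on made-up data (one deliberately extreme value included so something gets flagged):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data with one deliberately extreme value (illustrative only).
df = pd.DataFrame({
    "Calories": [3400, 4200, 3900, 5600, 5200, 5800, 7100, 6900, 7000, 9500],
    "Venue": ["Putney"] * 3 + ["East Sheen"] * 3 + ["Tooting"] * 4,
})

model = smf.ols("Calories ~ C(Venue)", data=df).fit()
cooks = model.get_influence().cooks_distance[0]  # one Cook's D per observation

# Flag observations above the 3x-mean threshold used in the tutorial.
threshold = 3 * cooks.mean()
flagged = [i for i, d in enumerate(cooks) if d > threshold]
print(f"threshold = {threshold:.3f}, flagged rows: {flagged}")
```

Flagged rows are candidates for investigation, not automatic deletion, exactly as the tutorial advises.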

For secondary analysis these charts provide evidence that the Two-Way ANOVA model with the dependent variable Calories as an outcome of (or predicted by) the two independent variables (Venue and Gender) meets, or indicates areas of concern, as regards the key assumptions of normal distribution, homogeneity of variance, and the degree of numerical accuracy from our sample.

Happiness... you should now understand how to perform the Two-Way ANOVA in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:

- how2stats (External videos). This YouTube channel hosts various video guides for statistical metrics and analysis tests using SPSS.

By the end you should be able to:

--Understand statistical metrics and what they measure

--Know how to run various statistical tests

--Read and interpret the SPSS results

- SPSS Tutorials (External website). Useful SPSS guides, videos, and quizzes from Loughborough and Coventry Universities.

By the end you should be able to:

--Understand statistical metrics and what they measure

--Know how to run various statistical tests

--Read and interpret the SPSS results

A Repeated Measures ANOVA is used to compare the means of three (or more) variables where the participants (or cases) are the same for each variable. This can occur: (1) when participants are measured multiple times to see changes in response to an intervention; or (2) when participants are exposed to a different condition for each variable and we want to compare the responses to each of these conditions; for example, we measured response time in a driving simulator while listening to 1-heavy metal, 2-jazz, and 3-classical.

As illustrated above the simplest Repeated Measures ANOVA involves three variables all measured on the same participants. Whatever distinguishes these three variables (time of measurement, an intervention, a different condition) is titled the "Within-Subjects Factor" in SPSS, which will determine if differences in the three means between the repeated measurements are statistically significant.

(1) The participants (samples) are the same participants in all the variables, and are taken at random from the population.

(2) The dependent variables (test variables) are continuous (interval or ratio).

(3) The independent variables (if you have these) are categorical (nominal or ordinal), and there can be two, three, four, or more groups in each independent variable.

(4) The dependent variables (test variables) have a reasonably normal distribution. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).

(5) The dependent variables (test variables) have equal variance (sphericity) when compared to each other in every possible multi-pairwise combination. This will normally occur when the standard deviations for each variable are roughly the same.

**(Q1)** Do the four dependent variables (Time.0900-2000) have a reasonably normal distribution?

(Answer: It seems, Yes). All the histograms display a reasonably normal distribution. Albeit the chart for Time 1300 looks slightly skewed (weighted) toward the lower numbers (2.5, 5.0, 7.5), we can still say it is within the limits of a normal distribution.

**(Q2)** Do the four dependent variables (Time.0900-2000) have equal variance (sphericity)?

(Answer: It seems, Yes). The variances (standard deviations) for all four time variables by smokers (e-vape, cigarette, and cigar) are roughly similar. Imagine in the above multi-bar chart that a balloon could take the exact shape of the upper and lower limits of all the standard deviations. All the differences in the standard deviations would distort the balloon away from 100% perfect sphericity (equal variance). The large deviations (as at Time 0900 and Time 1300 for the cigarette smokers) would expand bumps in the balloon; and the small deviations (as at Time 1600 and Time 2000 for e-vape smokers) would contract dimples in the balloon. The question is how many and how great these areas of expansion and contraction must be to violate sphericity, taking into account all the deviations in all the data groups. Happiness for us, there is a test within the Repeated Measures ANOVA that examines sphericity.

To start the analysis, click *Analyze > General Linear Model > Repeated Measures*

This will bring up the **Repeated Measures Define Factor(s)** dialogue box. First give the factor a suitable title in the **Within-Subject Factor Name:** placard and enter the number of repeated measurements in the **Number of Levels:** placard. Then click the **Add** button, followed by the **Define** button.

The **Repeated Measures** dialogue box will open. In the **Within-Subjects Variables** placard, move in the repeated measurement variables in their correct order (Time.0900 to Time.2000).

As we are running a simple Repeated Measures ANOVA, that is, without any between-subject factors or any covariates, we can therefore accept the default setting for the **Model** option. But click the **Contrasts** button.

**Note 1:** There is the

In the **Contrasts** options box, in the **Change Contrast** area, select **Repeated** from the **Contrast:** drop-down menu and click the **Change** button. This will compare each repeated measurement to the one that follows it (e.g., Level 1 vs Level 2).

Next click the **Plots** button, move the one factor you created into the **Horizontal Axis:** placard, and click the **Add** button.

As we are running a simple Repeated Measures ANOVA, that is, without any between-subject factors or any covariates, we can therefore accept the default settings for the **Post Hoc**, **EM Means**, and **Save** options.

**Note 2:** There is the

**Note 3:** There is the

Finally, open the **Options** button. Here you want to select from the **Display** list the **Descriptive statistics** and **Estimates of effect size** options.

In the main **Repeated Measures** dialogue box, click the **OK** button to run the test.

The results will appear in the SPSS Output Viewer. In the **Within-Subject Factors** and **Descriptive Statistics** tables there are the variables tested and their key metrics -- sample size (N), mean, and standard deviation.

Here in our example there is a 4.4-point decrease (36.0%) in cortisol measured between 0900 and 1300; and as they are the same people, you would expect this large decrease to be statistically significant. Conversely, you might expect the gradual increases from 1300 to 1600 to 2000 to not be statistically significant.

Next, there is the **Mauchly's Test of Sphericity** table, which is the key measurement for the equality of variance between the dependent variables. In an ideal world, you want the p-value (Sig) to be above the critical alpha threshold (0.05), as this would indicate the variance in all the dependent variables does not violate sphericity (equality of variance). Here in our example, this is the case with the p-value reported at 0.068. Look back at the (Q2) assumption check above, where we suspected this would be the case.

However, if the p-value is below the critical alpha threshold (0.05), and therefore sphericity is violated, then you will have to report one of the other results (Greenhouse-Geisser or Huynh-Feldt), whichever is appropriate based on the epsilon value for these sphericity corrections (Girden, 1992; Howell, 2002; Field, 2013).

In the ** Tests of Within-Subjects Effects** table are the key test results to report -- F-value (test statistic), p-value (Sig), and df (degrees of freedom). Here in our example (as mentioned earlier) the Mauchly's Test of Sphericity was not violated and therefore we report the Sphericity Assumed result. If needed, the other results (Greenhouse-Geisser and Huynh-Feldt) are listed here. When reporting the degrees of freedom (df), be sure to include both the actual value and the error value.
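For reference, a one-way repeated measures ANOVA on a within-subjects Time factor can also be run in Python with statsmodels' AnovaRM. The cortisol values below are made up (loosely echoing the tutorial's per-time means), and note that AnovaRM does not report Mauchly's test or sphericity corrections:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format cortisol data: 6 subjects x 4 times (illustrative).
base = [12.0, 7.8, 8.3, 9.9]  # rough per-time means, loosely echoing the tutorial
rows = []
for subj in range(6):
    for t, time in enumerate(["0900", "1300", "1600", "2000"]):
        noise = ((subj * 7 + t * 3) % 5) * 0.1  # deterministic jitter
        rows.append({"subject": subj, "time": time,
                     "cortisol": base[t] + 0.3 * subj + noise})
df = pd.DataFrame(rows)

# One within-subjects factor (time); the table reports F Value, Num DF, Den DF, Pr > F.
res = AnovaRM(df, depvar="cortisol", subject="subject", within=["time"]).fit()
print(res.anova_table)
```

The degrees of freedom columns (Num DF and Den DF) correspond to the actual and error df values you are asked to report from the SPSS table.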

Here the **Tests of Within-Subjects Effects** table indicates that in the Time factor (there are four dependent variables in this factor -- Time.0900, Time.1300, Time.1600, and Time.2000), the differences in the means (across some or all) of the four repeated measurements are statistically significant (F

The following **Tests of Within-Subjects Contrasts** table pairs together the four repeated measurements to indicate where the statistically significant difference is occurring in the Time factor.

Here in our example the difference in the means that is statistically significant occurs at Level 1 vs Level 2 (Time.0900 vs Time.1300) and again at Level 3 vs Level 4 (Time.1600 vs Time.2000). Both these pairs of repeated measurements have large F scores and p-values below the critical alpha threshold (0.05).
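A Level 1 vs Level 2 style contrast is essentially a paired comparison on the same subjects. A sketch with scipy, using made-up readings for 8 hypothetical subjects:

```python
from scipy import stats

# Hypothetical cortisol readings for 8 subjects at two adjacent times.
time_0900 = [12.1, 12.5, 11.8, 12.9, 12.0, 12.4, 12.2, 12.6]
time_1300 = [7.6, 8.1, 7.4, 8.3, 7.9, 7.7, 8.0, 7.5]

# Paired t-test: the same subjects measured twice, like a Level 1 vs Level 2 contrast.
t, p = stats.ttest_rel(time_0900, time_1300)
print(f"t = {t:.3f}, p = {p:.5f}")
```

Because every subject drops by roughly the same large amount, the paired test is highly significant here, mirroring the large F scores in the contrasts table.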

Finally, the ** Profile Plots** provide a snapshot of the test results for the means of cortisol across the four repeated measurements. The plot is good visual evidence of the test result. The chart clearly displays the statistically significant decrease (36.0%) between Time.0900 and Time.1300 (12.22 down to 7.82). Equally, there is a statistically significant increase (18.2%) between Time.1600 and Time.2000 (8.35 up to 9.87).
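The quoted percentage changes follow directly from the plotted means; a quick arithmetic check:

```python
# Percentage changes between the plotted cortisol means quoted in the tutorial.
drop = (12.22 - 7.82) / 12.22 * 100   # 0900 -> 1300 decrease
rise = (9.87 - 8.35) / 8.35 * 100     # 1600 -> 2000 increase
print(f"decrease = {drop:.1f}%, increase = {rise:.1f}%")  # → 36.0% and 18.2%
```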

Happiness... you should now understand how to perform the Repeated Measures ANOVA test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:

- how2stats (External videos). This YouTube channel hosts various video guides for statistical metrics and analysis tests using SPSS.

By the end you should be able to:

--Understand statistical metrics and what they measure

--Know how to run various statistical tests

--Read and interpret the SPSS results

- SPSS Tutorials (External website). Useful SPSS guides, videos, and quizzes from Loughborough and Coventry Universities.

By the end you should be able to:

--Understand statistical metrics and what they measure

--Know how to run various statistical tests

--Read and interpret the SPSS results

A Repeated Measures (Factorial) ANOVA is used to compare the means of three (or more) variables where the participants (or cases) are the same for each variable. This can occur: (1) when participants are measured multiple times to see changes in response to an intervention; or (2) when participants are exposed to a different condition for each variable and we want to compare the responses to each of these conditions; for example, we measured response time in a driving simulator while listening to 1-heavy metal, 2-jazz, and 3-classical.

As illustrated above, a Repeated Measures (Factorial) ANOVA involves three (or more) variables all measured on the same participants. Whatever distinguishes these variables (time of measurement, an intervention, a different condition) is the "Within-Subjects Factor" in SPSS, which will determine if the differences in the means between the repeated measurements are statistically significant.

In addition, for the *Factorial* part of the test, you must have an independent grouping variable with two, three, four, or more groups. Here in our example we will have a SmokerType variable with three groups (cigars, cigarettes, and e-vape).

(1) The participants (samples) are the same participants in all the variables, and are taken at random from the population.

(2) The dependent variables (test variables) are continuous (interval or ratio).

(3) The independent variables are categorical (nominal or ordinal), and there can be two, three, four, or more groups in each independent variable.

(4) The dependent variables (test variables) have a reasonably normal distribution. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).

(5) The dependent variables (test variables) have equal variance (sphericity) when compared to each other in every possible pairwise combination of the independent variable. This will normally occur when the standard deviations for each pairwise combination are roughly the same.

**(Q1)** Do the four dependent variables (Time.XX:XX) have a reasonably normal distribution?

(Answer: It seems, Yes). All the histograms display a reasonably normal distribution. Albeit the chart for Time 1300 looks slightly skewed (weighted) toward the lower numbers (2.5, 5.0, 7.5), we can still say it is within the limits of a normal distribution.

**(Q2)** Do the four dependent variables (Time.XX:XX) divided into the three groups have equal variance (sphericity)?

(Answer: It seems, Yes). The variances (standard deviations) for all four Time variables divided by smokers (cigar, cigarette, and e-vape) are roughly similar. Imagine in the above multi-bar chart that a balloon could take the exact shape of the upper and lower limits of all the standard deviations. All the differences in the standard deviations would distort the balloon away from 100% perfect sphericity (equal variance). The large deviations (as at Time 0900 and Time 1300 for the cigarette smokers) would expand bumps in the balloon; and the small deviations (as at Time 1600 and Time 2000 for e-vape smokers) would contract dimples in the balloon. The question is how great these areas of expansion and contraction must be to violate sphericity, taking into account all the deviations in all the data groups. Happiness for us, there is a test within the Repeated Measures (Factorial) ANOVA that examines sphericity.

To start the analysis, click *Analyze > General Linear Model > Repeated Measures*

This will bring up the **Repeated Measures Define Factor(s)** dialogue box. First give the factor a suitable title in the **Within-Subject Factor Name:** placard and enter the number of repeated measurements in the **Number of Levels:** placard. Then click the **Add** button, followed by the **Define** button.

The **Repeated Measures** dialogue box will open. In the **Within-Subjects Variables** placard, move in the repeated measurement variables in their correct order (the four Time variables).

Next, move the independent variable into the **Between-Subjects Factor(s):** placard. As mentioned earlier, this is the SmokerType variable with its three groups (cigar, cigarette, and e-vape).

We can accept the SPSS default setting for the **Model** option. Next, click the **Contrasts** button and, as before, select the **Repeated** contrast for the Time factor.

Next, click the **Plots** button, move each main effect factor (one at a time) into the **Horizontal Axis:** placard and click the **Add** button. For the interaction plot, place one factor on the **Horizontal Axis:** placard and the other on the **Separate Lines:** placard, then click **Add** again.

Next, open the **Post Hoc** button. If you remember from the Two-Way ANOVA test, you should only carry out Post Hoc testing when 1) the main effect variable has three or more groups (levels), and 2) it is important to know between which groups the differences that exist are statistically significant. In our example, the SmokerType variable is the only main effect variable we have, and (happiness) it has three levels (cigars, cigarettes, and e-vape). I have moved it into the **Post Hoc Tests for:** placard.

In this model, SPSS will create an interaction variable (SmokerType * Time) to examine if there are statistically significant differences between the times we took a measurement (0900, 1300, 1600, 2000) and what type of smoker was measured (cigar, cigarette, e-vape) at each time. To examine this interaction in detail, open the **EM Means** button.

In the **EM Means** dialogue box, first move the interaction variable (SmokerType * Time) into the **Display Means for:** placard.


The next area to open is the **Save** button. You would have tested for the assumption of a reasonably normal distribution for the repeated measurements of the dependent variable for each of the independent groups prior to running the test. However, if you have a complex Repeated Measures (Factorial) ANOVA model, such as, 2 x 2 x 3 (which has 12 groups); or if there are only a few observations per group, which makes it difficult to check for normal distribution; or if you have a covariate in the model, then saving and testing the residuals is often the better way to test that the assumption of normal distribution is satisfied. In the Save box, select the **Unstandardized** residuals and **Cook's distance** options.

Finally, open the **Options** button. Here you want to select from the **Display** list the **Descriptive statistics** and **Estimates of effect size** options.

In the main **Repeated Measures** dialogue box, click the **OK** button to run the test.

The results will appear in the SPSS Output Viewer. In the **Within-Subject Factors** and **Between-Subjects Factors** tables there are the variables and groups tested.

In the **Descriptive Statistics** table there are the key statistical measurements -- the sample size (N), the mean, and the standard deviation. By reviewing these you should have some intuitive perception as to where the results might be statistically significant, that is, 1) comparing between the four repeated times (the rows labelled, Totals) and 2) comparing between each smoker type (cigar, cigarette, and e-vape) at each time measured. Here in our example there is a 4.4-point decrease (36.0%) in cortisol measured between 0900 and 1300; and as they are the same people, you would expect this large decrease to be statistically significant. Conversely, you might expect the gradual increases from 1300 to 1600 to 2000 not to be statistically significant. It is interesting to note that at all four measurements the cigar smokers had the highest level of cortisol.
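This SmokerType by Time grid of cell means is straightforward to reproduce for any long-format dataset. A pandas sketch with made-up values:

```python
import pandas as pd

# Hypothetical long-format data: 2 subjects per smoker type x 4 times (illustrative).
means = {"cigar": [14.0, 9.5, 10.0, 11.5],
         "cigarette": [12.0, 7.5, 8.2, 9.6],
         "e-vape": [11.5, 7.0, 8.4, 9.3]}
rows = []
for smoker, levels in means.items():
    for rep in range(2):
        for t, time in enumerate(["0900", "1300", "1600", "2000"]):
            rows.append({"smoker": smoker, "time": time,
                         "cortisol": levels[t] + 0.2 * rep})
df = pd.DataFrame(rows)

# Mean cortisol per SmokerType x Time cell, like the Descriptive Statistics table.
table = pd.pivot_table(df, values="cortisol", index="smoker", columns="time")
print(table)
```

Scanning across a row shows each smoker type's pattern over time; scanning down a column compares the smoker types at one measurement time.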

Next, there is the **Mauchly's Test of Sphericity** table, which is the key measurement for the equality of variance between the dependent variables and the independent variable. In an ideal world, you want the p-value (Sig) to be above the critical alpha threshold (0.05), as this would indicate that all the variance in the pairwise comparisons (four repeated measurements divided by three groups) does not violate sphericity (equality of variance). Here in our example, this is the case with the p-value reported at 0.077. Look back at the (Q2) assumption check above, where we suspected this would be the case.

However, if the p-value is below the critical alpha threshold (0.05), and therefore sphericity is violated, then you will have to report one of the other results (Greenhouse-Geisser or Huynh-Feldt). These corrections on sphericity are based on their epsilon value, as per the following diagram [Girden (1992), Howell (2002), and Field (2013)].

In the ** Tests of Within-Subjects Effects** table are the key test results to report -- F-value (test statistic), p-value (Sig), and df (degrees of freedom). Here in our example (as mentioned earlier) the Mauchly's Test of Sphericity was not violated and therefore we report the Sphericity Assumed result. If needed, the other results (Greenhouse-Geisser and Huynh-Feldt) are listed here. When reporting the degrees of freedom (df), be sure to include both the actual value and the error value.

Here the **Tests of Within-Subjects Effects** table indicates that in the Time factor (there are four dependent variables in this factor -- Time.0900, Time.1300, Time.1600, and Time.2000), the differences in the means (across some or all) of the four repeated measurements are statistically significant (F

In addition, the interaction variable (Time * SmokerType) is not statistically significant (F _{(6)} = .665 , p = .678). This indicates that the differences in cortisol measured between a cigar, cigarette, and e-vape smoker at each of the four repeated measurements were not statistically significant.

Before we go further, we will look at two of the plots (line charts) we created that illustrate these two different results. The first plot is for the Time factor with the four different repeated measurements. Note: it is important to remember that the Time factor is not only the four repeated measurements, but it is also all smokers (cigar, cigarette, and e-vape) combined. As the line chart shows (and as mentioned earlier), there is a 4.4 unit decrease (36.0%) in cortisol measured between 0900 and 1300; and as they are the same people, you would expect this large decrease to be statistically significant. Conversely, you might expect the gradual increases from 1300 to 1600 to 2000 to not be statistically significant.

Remember, the **Tests of Within-Subjects Effects** table is not indicating exactly where the statistically significant differences are occurring between the repeated measurements, but just that somewhere across the four measurements there are differences that are statistically significant (F

The next plot we want to look at is for the interaction variable (Time * SmokerType), which was not statistically significant (F _{(6)} = .665 , p = .678). However, unlike the first plot, here the four repeated measurements in the Time factor are divided by the three smoker types (cigar, cigarette, and e-vape). Here we can see all three lines are roughly running parallel with each other, which is a strong indication of no interaction; that is, all the smoker types are basically following the same pattern in their cortisol levels... all drop down at the 1300 time (#2), and then all gradually increase across the 1600 time (#3) and the 2000 time (#4).

That said, there is a crossover between the red line (cigarette) and green line (e-vape) from the 1300 time to the 1600 time, which does indicate that at these two times it made a difference which smoker type it was; however, there is not enough evidence for this to be statistically significant. Equally, across all the times the blue line (cigar) is always much higher, while the red (cigarette) and green (e-vape) lines are always nearly side-by-side. This indicates that across all the times it does make a difference that it is a cigar smoker and not a cigarette or e-vape smoker. But again, there is not enough evidence for this to be statistically significant.

The following ** Tests of Within-Subjects Contrasts** table pairs together the four repeated measurements to indicate where the statistically significant differences are occurring in both the main effect factor (Time) and in the interaction variable (Time * SmokerType).

Here in our example, the difference in the means that is statistically significant occurs at Level 1 vs Level 2 (Time.0900 vs Time.1300). Remember, this is where we had the 4.4 unit (36.0%) decrease in cortisol measured in the smokers. Something we did not notice earlier is that at Level 3 vs Level 4 (Time.1600 vs Time.2000) the difference in cortisol measured is also statistically significant (F _{(1, 25)} = 11.435 , p = 0.002). Again, be sure to report the Error(Time) degrees of freedom (df) value when quoting these metrics.

In this same ** Tests of Within-Subjects Contrasts** table, if we look at the interaction variable (Time * SmokerType) in the sequential comparisons across the four repeated measurements, there are no comparisons that are statistically significant... they all have low F-scores and p-values above the critical alpha level (0.05).

However, if there were comparisons that were statistically significant, and you wanted to examine between which smoker types the differences occur at each of the four repeated measurements, then you would use the ** Pairwise Comparisons** table under the ** EM Means** option.

In our example, out of 12 possible pairs (3 smoker-type pairs X 4 repeated measurements) this occurred only twice (16.7%), at the #3 level (1600 time) -- the cigar against cigarette smokers and the cigar against e-vape smokers. But 2 comparisons out of 12 total comparisons are not enough mathematical evidence (from an overall perspective) to indicate the interaction is statistically significant (16.7% as Yes versus 83.3% as No).

Finally, the ** Tests of Between-Subjects Effects** table provides the results for the SmokerType independent variable. Here the four repeated measurements are merged into one variable and then compared between the three smoker types (cigar, cigarette, and e-vape). As the table shows, the differences between the three smoker types are not statistically significant.

There are interesting results here in the ** Tests of Between-Subjects Effects** table, as the line chart shows the decrease between the cigar and cigarette smokers is 3.02 units of cortisol, which is a 26.5% decrease (on average). In setting up this model we configured the

Normal distribution and homogeneity of variance are important test assumptions that provide strong confidence that the test results are reliable; and you would normally confirm these assumptions are met prior to constructing the model. However, this may not always be possible with complex modelling tests that include random factors or covariates.

Therefore, it is equally important to carry out secondary analysis on the two 'model created' variables -- Unstandardized Residuals and Cook's Distance -- that we saved earlier as further evidence that the model meets these important assumptions.

In the histogram charts of the Unstandardized Residuals, we can see a reasonably normal distribution of the data at all but the 0900 time. That said, all the histograms show a large proportion of the data gathered on the negative side rather than centrally around the mean, with the 0900 time being the biggest offender.

And you might expect this, as we are measuring cortisol levels, which lend themselves to more low-range scores than high-end scores. The simple solution would be to apply a log transformation to the original data, and then re-run the model using the log-transformed data.
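As an illustration of that fix, a log transformation simply compresses the long right tail so that high scores pull in towards the centre. A minimal sketch in Python (the cortisol values below are made up for illustration, not the study data):

```python
import math

# Hypothetical right-skewed cortisol readings (illustrative, not the study data)
cortisol = [4.2, 5.1, 5.8, 6.3, 7.0, 8.9, 12.4, 18.7]

# A natural-log transform compresses the long right tail
log_cortisol = [math.log(x) for x in cortisol]

print(log_cortisol)
```

After transforming, you would re-check the residual histograms before trusting the re-run model.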

In the scatter chart (which is an average of the four repeated measurements), Cook's Distance is compared to the model's dependent variable (Time). In an ideal world, we want to see that for every observation in our dependent variable (N = 28) the Cook's values all sit at relatively the same distance up from the baseline (X-axis) with no up-spike values. Any up-spike values are indicators of likely influential observations, which should be investigated. That said, there are 6 up-spike data values (21.4%) that approach or exceed the allowable limit (Cook's values that are 3X higher than the mean of the data array).

Here in our example the mean of the data array is 0.04, and the allowable limit is therefore 0.12 (3X the mean). The six up-spike values (albeit only two have breached the allowable limit) would need to be investigated to verify the degree of influence (over-bearing weight) they may be having on the model. Up-spike Cook's values can bring into question the numerical accuracy (e.g., means, standard deviations, 95% confidence intervals, F scores, p-values, etc.) within the model.
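The 3X-the-mean rule of thumb is easy to apply yourself once the Cook's Distance variable has been saved to the dataset. A minimal sketch in Python (the Cook's values below are invented for illustration, not the study data):

```python
from statistics import mean

# Hypothetical Cook's Distance values saved by the model (not the study data)
cooks = [0.01, 0.03, 0.02, 0.05, 0.04, 0.13, 0.02, 0.16, 0.03, 0.02]

limit = 3 * mean(cooks)  # the 3X-the-mean rule of thumb described above

# Flag any observation whose Cook's value exceeds the allowable limit
flagged = [(i, d) for i, d in enumerate(cooks) if d > limit]
print(limit, flagged)
```

Each flagged row number points at an observation you would go back and investigate in the dataset.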

For secondary analysis, these charts provide evidence of where the model meets, or indicate areas of concern with, the key assumptions of normal distribution, homogeneity of variance, and the degree of numerical accuracy from our sample.

Happiness... you should now understand how to perform the Repeated Measures (Factorial) ANOVA test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:

- how2stats (External videos): This YouTube channel hosts various video guides for statistical metrics and analysis tests using SPSS.

By the end you should be able to:

--Understand statistical metrics and what they measure

--Know how to run various statistical tests

--Read and interpret the SPSS results

- SPSS Tutorials (External website): Useful SPSS guides, videos, and quizzes from Loughborough and Coventry Universities.

By the end you should be able to:

--Understand statistical metrics and what they measure

--Know how to run various statistical tests

--Read and interpret the SPSS results

The ANCOVA (Analysis of Covariance) is similar to the One-way ANOVA, as it is used to detect a difference in means of three (or more) independent groups; but the difference occurs in that at the same time we are controlling for a 'secondary' variable (covariate).

In any experiment some of the unexplained variability can be due to an additional, secondary variable (covariate). The covariate may not be the targeted focus of the research hypothesis but could influence the main dependent (test) variable. If we can remove (or isolate) the effect of this secondary variable, we can demonstrate a more accurate picture of the true effect from the independent (factor) variable. This is the main goal of ANCOVA (Analysis of Covariance).

(1) The dependent variable (test variable) is continuous (interval or ratio).

(2) The independent (factor) variables are categorical (nominal or ordinal) and there should be at least three (or more) groups in each independent (factor) variable.

(3) The independent covariates (secondary variables) are continuous (interval or ratio). And there is a linear relationship between the dependent test variable and the independent covariates. This linear relationship must be for all the groups in the factor variables.

(4) The regression lines of slope expressing these linear relationships should all be reasonably parallel (homogeneity of regression slopes).

(5) The participants (samples) have no relationship between the other participants in their group or between the participants from the other groups.

(6) The participants (samples) for each group are taken at random from the population.

(7) The dependent variable (test variable) has a reasonably normal distribution for each group. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).

(8) The dependent variable (test variable) has equal variance (homogeneity) between all the groups. Homogeneity means you want the standard deviation measurements for the groups to be roughly the same to each other.

** (Q1)** Does the dependent test variable Calories 1) have a reasonably normal distribution in each of the venue groups, and 2) is there homogeneity of variance between the venue groups?

(Answer: (1) Yes and (2) Yes). All the IQR ranges (blue boxes) are reasonably central to the box plot, with the median (black line) not excessively off-centred within the IQR range. This centrality (symmetrical shape) of the data is a strong indicator of normal distribution across each venue. Equally, the whisker-to-whisker spread (variance) of each box plot is reasonably similar, and therefore exhibits good evidence of homogeneity of variance across each venue.

** (Q2)** Do the regression lines of slope for the three venue groups demonstrate reasonable homogeneity for their slopes?

(Answer: Yes). Although the three lines are not 100% parallel (which would be a perfect homogeneity of regression slopes), the angles of the slope of the three lines are not wildly different to each other. Therefore, we can affirm that the regression slopes display reasonable homogeneity across the venue groups.

To start the analysis, click *Analyze > General Linear Model > Univariate*

This will bring up the ** Univariate** dialogue box. To carry out the test, move the dependent (scale) variable into the ** Dependent Variable:** placard, the factor variable into the ** Fixed Factor(s):** placard, and the covariate into the ** Covariate(s):** placard.

We will walk through several of the option buttons to bolt-on a few helpful and important measurements. For a simple ANCOVA test where you have __not__ included several random factors or covariates, the ** Model** and ** Contrasts** buttons can be left at their default settings.

There is only one factor, and we can place it on the ** Horizontal Axis:** placard. Next click the ** Add** button, and then the ** Continue** button.

The ** Post Hoc** button is greyed-out because we have included a covariate in the model. Similar to Post Hoc testing, we can still examine between which venue groups the differences are statistically significant; but in SPSS we can only do this through the ** EM Means** option by comparing the estimated marginal means.

Next open the ** EM Means** button. Move the one main factor into the ** Display Means for:** placard and tick the ** Compare main effects** option.

The next area to open is the ** Save** button. You would have tested for the assumption of a reasonably normal distribution for the dependent variable across each group prior to running the test. However, if you have a complex model, such as two main factor variables; or if there are only a few observations across the groups in the main factor variables (which makes it difficult to check for normal distribution); or if you have a covariate in the model; then saving and testing the residuals is often the better way to verify that the assumption of normal distribution is satisfied. In our example, I selected the ** Unstandardized** residuals and ** Cook's distance** options.

Finally, open the ** Options** button. Here you want to tick the option for ** Homogeneity tests**, which produces Levene's test of equality of error variances.

Remember, you will not need all these extra options to run an initial and straightforward ANCOVA test. But we have walked through most of them to give you the confidence to find and add what you may need for your project. When finished, click the ** OK** button to run the test.

The results will appear in the SPSS Output Viewer. As mentioned earlier in the Univariate dialogue box, you can just move the variables into the correct placards, and then click the ** OK** button to obtain a quick and simple result.

In the ** Between-Subject Factors** table there is the sample size (N) for each group in the Venue variable.

- The test is controlling (compensating for) the dependent variable (Calories) by the shared variance with the covariate (SatFats), that is, how much variation in SatFats accounts for the variation in Calories.
- The test examines the dependent-to-covariate variables as a relationship (correlation test), and the dependent-to-factor variables as a difference (t-test).

Here in this example, the differences in calories for the meals eaten across the three venues are statistically significant (F = 4.916 ; p = .013) while controlling for the saturated fats measured in those meals. Equally, the relationship (correlation) between the calories for the meals eaten across the three venues and the saturated fats measured in those meals is also statistically significant (F = 32.067 ; p = .001).

Looking at the extra options that were selected, in the ** Levene's Test of Equality of Error Variances** table there are the key test metrics for homogeneity (equality) of variance. If the data were normally distributed and there were no significant outliers, then you would expect the p-value to be above the critical alpha threshold (0.05).

This is true with our example, where the p-value (Sig) is 0.188. These measurements (F score and p-value) are what you would refer to (and quote) in your write-up, indicating that between the three venue groups the variances (estimated standard deviations) in the dependent variable (Calories) do not violate homogeneity of variance. Remember, you may have looked at homogeneity of variance for calories between the three venue groups separately before running this test. But now, in the ANCOVA model, the calories variable is being adjusted (controlled for) by saturated fats, which will change the earlier measurements.
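For the curious, what Levene's test computes can be sketched in a few lines: it is a one-way ANOVA F-test run on the absolute deviations of each value from its own group mean (the mean-centred variant of the test). A minimal Python sketch with made-up numbers, not the study data:

```python
from statistics import mean

def levene_W(groups):
    # Levene's statistic (mean-centred variant): a one-way ANOVA F
    # computed on absolute deviations from each group's mean
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(z)                              # number of groups
    n = sum(len(g) for g in z)              # total observations
    grand = mean([x for g in z for x in g])
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z) / (k - 1)
    within = sum((x - mean(g)) ** 2 for g in z for x in g) / (n - k)
    return between / within

# Hypothetical calorie-like samples per venue (equal spread by construction)
putney, sheen, tooting = [10, 12, 11, 13], [20, 22, 21, 23], [30, 33, 31, 32]
print(levene_W([putney, sheen, tooting]))
```

With equal spreads the statistic is near zero (p-value well above 0.05); the more the group spreads differ, the larger W grows.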

If we look at the Estimated Marginal Means option, in the ** Estimates** table there are the key metrics for each group -- estimated mean, standard error, and 95% confidence interval. From these measurements you can understand in greater detail why the test indicated (or did not indicate) a statistically significant result for the dependent variable tested. Here in this example, the estimated means in calories between the three venues move from 4579 (Putney) to 5843 (East Sheen) to 6014 (Tooting).

In the ** Pairwise Comparisons** table there are all the possible 'pairs' of venue comparisons. You are looking for large differences, and consequently a p-value below the critical alpha threshold (0.05). In our example there are only two venue pairs -- Putney to East Sheen and Putney to Tooting -- that meet this criterion. These numbers (mean difference, p-value, and 95% confidence intervals) provide the evidence as to why the differences between these venue pairs were statistically significant, and in your write-up you would quote these numbers.

Please note that we could not perform ** Post Hoc** tests (this button was greyed-out) because in the ANCOVA model there is a covariate. Post Hoc tests are performed on the actual means and actual standard deviations of the dependent variable (Calories) across the three groups of the fixed factor (Venue). We cannot determine these 'actual' measurements in this model because of the control (or compensation allowed for) by the covariate (SatFats), which therefore only allows for 'estimates'.

The option ** Profile Plots** will give a snapshot of the test results of the dependent variable (Calories) across the three groups of the fixed factor (Venue) as compensated for, or controlled for, by the covariate (SatFats). Here in this example I have added an extra plot of the actual means across the three venues as a comparison.

The two plots together allow you to understand better what is happening to the dependent variable (Calories) when you control for the amount that saturated fats are accounting for the variance in calories -- Putney increased from 3973 to 4580 (roughly 600 calories), East Sheen increased from 5251 to 5844 (again 600 calories), while Tooting decreased from 7013 to 6014 (roughly 1000 calories).

It is important to note that SPSS has indicated (via footnotes in the different result tables) that the estimates for the dependent variable (Calories) in this ANCOVA model were determined with the covariate (SatFats) evaluated at 56.4850. This is the mean value of the SatFats variable over the entire dataset, that is, all 40 samples.
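The adjustment SPSS applies can be sketched as: each venue's actual mean is shifted by the pooled regression slope times the gap between that venue's covariate mean and the grand covariate mean (the 56.4850 above). The slope and venue covariate means below are hypothetical, chosen only to illustrate the direction of the shifts, not taken from the study output:

```python
# Adjusted mean = actual mean - b * (group covariate mean - grand covariate mean)
# Hypothetical pooled slope and venue covariate means (illustrative only)
b = 50.0                # assumed pooled within-group slope of Calories on SatFats
grand_satfats = 56.485  # covariate mean over the whole dataset (from the text)

venues = {              # (actual Calories mean, assumed venue SatFats mean)
    "Putney":  (3973, 44.3),
    "Tooting": (7013, 76.5),
}

adjusted = {v: m - b * (sx - grand_satfats) for v, (m, sx) in venues.items()}
print(adjusted)
```

A venue whose meals carry less saturated fat than average (Putney here) is adjusted upwards, while a venue above the covariate average (Tooting) is adjusted downwards, which matches the direction of the shifts seen in the two plots.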

Normal distribution and homogeneity of variance are important test assumptions that provide strong confidence that the test results are reliable; and you would normally confirm these assumptions are met prior to constructing the ANCOVA model. However, this may not always be possible with complex modelling tests that include random factors or covariates.

Therefore, it is equally important to carry out secondary analysis on the two 'model created' variables -- Unstandardized Residuals and Cook's Distance -- that we saved earlier as further evidence that the model meets these important assumptions.

In the histogram chart of the Unstandardized Residuals, we can see there is a reasonably normal distribution of the data. The histogram is showing the data values are centrally gathered around the mean (not skewed to the left or right tails and no significant outliers). As confirmation, the p-value in the Shapiro-Wilk test is above the critical alpha threshold (0.05) indicating the data are not violating a normal distribution (not statistically different from a normal distribution).

In the scatter chart, which is a comparison of Cook's Distance to the dependent variable (Calories), we want to see that for every observation in Calories (N = 40) that the Cook's values all have relatively the same distance up from the baseline (X-axis), and that there are no predominant 'up-spikes' in the data array. Any outstanding up-spikes (Cook's values that are 3X higher than the mean of the data array) are indicators of likely influential observations which should be investigated. These up-spike Cook's values would bring into question the numerical accuracy (e.g., means, standard deviations, 95% confidence intervals, F score, p-values, etc.) within the model.

Here in this example, there is one up-spike value out of the 40 samples. This up-spike value would need to be investigated to verify the degree of influence (over-bearing weight) it may be having on the model.

For secondary analysis, these charts provide evidence of where the ANCOVA model meets, or indicate areas of concern with, the key assumptions of normal distribution, homogeneity of variance, and the degree of numerical accuracy from our sample.

Happiness... you should now understand how to perform the ANCOVA in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:

- how2stats (External videos): This YouTube channel hosts various video guides for statistical metrics and analysis tests using SPSS.

By the end you should be able to:

--Understand statistical metrics and what they measure

--Know how to run various statistical tests

--Read and interpret the SPSS results

- SPSS Tutorials (External website): Useful SPSS guides, videos, and quizzes from Loughborough and Coventry Universities.

By the end you should be able to:

--Understand statistical metrics and what they measure

--Know how to run various statistical tests

--Read and interpret the SPSS results

The Single Linear Regression test (aka: Simple Regression) can be seen as the continuation of correlation, that is, the two test variables 1) should have a correlation with each other, and 2) the correlation should be statistically significant.

Here in regression we want to be able to predict the value of one test variable by the value of the other test variable (hence the need for a correlation between them). The variable we want to predict is called the dependent variable (or outcome variable); and the variable we are using to predict is called the independent variable (or predictor variable).

(1) The two test variables are continuous (interval or ratio).

(2) There is a linear relationship between the two test variables.

(3) The participants (samples) have no relationship between the other participants, and are taken at random from the population.

(4) The two test variables have a reasonably normal distribution. Normal distribution means the data should not be heavily skewed to the left or right tails, and there should not be significant outliers (better to have no outliers).

(5) The two test variables have equal variance (homogeneity / homoscedasticity) when compared to each other. Homogeneity (or homoscedasticity) means you want the variance in the data (as plotted between two variables) to be reasonably the same along the entire line of best fit.

(6) After completing the single linear regression test, you will need to check that the residuals (errors) of the regression line have a reasonably normal distribution as confirmation that the regression model is reliable.

** (Q1)** For a single linear regression test which predictor variable could you use for the outcome variable Energy (kcal)?

(Answer: Fats). You might say (and you would be correct to say) that all the predictor variables in the ** Correlations** table (Fats, Sugar, Protein, Fibre) have a correlation with the outcome variable, Energy (kcal). Equally, every correlation is statistically significant, as all the p-values are below the critical 0.05 alpha level. However, the best predictor variable would be Fats, as it has the highest correlation coefficient (.477).

** (Q2)** Do the two test variables have homogeneity (or homoscedasticity) between each other?

(Answer: Yes). The variance (between the two red dashed lines) from the plotted data values between the two variables (X-axis is Fats and Y-axis is Energy) is reasonably the same along the entire line of best fit. Equally, we can see from this scatter chart that the relationship is linear as the plotted data values are progressing in one direction at relatively the same rate (magnitude) of movement.

To start the analysis, click *Analyze > Regression > Linear...*

This will bring up the Linear Regression dialogue box. To carry out the test, move the outcome variable into the ** Dependent:** placard and the predictor variable into the ** Independent(s):** placard.

After returning to the Linear Regression dialogue box, click the ** Plots...** button. Move the *ZPRED variable into the ** X:** placard and the *ZRESID variable into the ** Y:** placard.

Next, select the ** Histogram** and ** Normal probability plot** options.

Click the ** Continue** button at the bottom of this dialogue box to return to the main Linear Regression dialogue box.

After returning to the Linear Regression dialogue box, click the ** OK** button at the bottom of the dialogue box... Wow, that was a lot of boxes!

The results will appear in the SPSS Output Viewer. There are three key tables with several important test metrics. In the ** Model Summary** table, there are the ** r** correlation score and the ** r^{2}** score, which indicates the shared variance between the two variables.

Next, there is the ** ANOVA** table, and we might consider this as the 'fitting room' for the regression model. You are in a store and you pick out some clothes you want to buy. But you go into a fitting room to see how well the clothes fit your body shape. This is what is happening here in this ** ANOVA** table.

There is the ** F-test** score, which indicates the strength (magnitude) of how well the linear regression equation fits the two variables in the model, as opposed to the null hypothesis (i.e., there is no (null) fit with the two variables used). There is also the p-value (Sig) for the regression model, indicating if the fit (suitability) of the linear regression equation is statistically significant.

Finally, there is the ** Coefficients** table, which lists the regression equation coefficients, the intercept, and their statistical significance. In our example of white, brown, and seeded breads, the regression equation [Y = A + (B * X)] would become:

Y (Energy (kcal)) = 221.461 + (5.865 * (Fats))

There are also the ** 95% C.I.** around both the Y-intercept (Constant) value and the X-predictor (Fats) value in the regression equation. This gives us a high-low measure of accuracy (confidence) as to how well our sample data values are likely to represent (or include) the actual values in the population. There are also the t-scores and p-values (Sig), which indicate the strength and statistical significance of the coefficients as compared to the null hypothesis (a zero numeric value).

Coming back to our example, if we randomly took a loaf of white, brown, or seeded bread off the supermarket shelf and read from the label that there were 3, or 5, or 8 grams of fats, then we could estimate (predict) the level of energy (kcal) that loaf is likely to have from our regression equation. And we would have a reasonably high level of confidence that the estimate would be accurate, as the 95% C.I.s in the regression model are very narrow -- 213 to 230 for the Y-intercept and 4.2 to 7.5 for the X-predictor (Fats).
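Plugging those label values into the fitted equation from the ** Coefficients** table:

```python
# The fitted equation from the Coefficients table:
# Energy (kcal) = 221.461 + 5.865 * Fats
def predict_energy(fats_g):
    return 221.461 + 5.865 * fats_g

for grams in (3, 5, 8):
    print(grams, "g fats ->", round(predict_energy(grams), 1), "kcal")
```

So a loaf listing 3 g of fats would be estimated at roughly 239 kcal, and one listing 8 g at roughly 268 kcal.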

Finally, as required by the earlier test assumptions, we have the charts and plots to confirm if the residuals between the two variables tested in the regression model meet the assumption for normal distribution and the assumption for homoscedasticity (equal variance).

In both the histogram and the P-P plot we can see there is a very reasonable normal distribution for the residuals. The histogram shows the data values centrally gathered around the mean (i.e., not skewed to the left or right tails and no significant outliers). Equally, the P-P plot shows a very tight wrapping (closeness) of the plotted data values to the line of fit.

The scatter chart of the predictor (X-axis) to outcome (Y-axis) residuals shows reasonably good homoscedasticity; that is, there is the same variance in the plotted data values across the chart, there is very little bunching up of the data values into tight clumps, and the top and bottom halves (split along the red dashed line) are roughly mirror images of one another.

Therefore, by this post-testing of the residuals, we have strong confirmation that the regression model meets the required assumptions for the test, and that the test result is reliable.

Happiness... you should now understand how to perform the Single Linear Regression test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:

- how2stats (External videos): This YouTube channel hosts various video guides for statistical metrics and analysis tests using SPSS.

By the end you should be able to:

--Understand statistical metrics and what they measure

--Know how to run various statistical tests

--Read and interpret the SPSS results

- SPSS Tutorials (External website): Useful SPSS guides, videos, and quizzes from Loughborough and Coventry Universities.

By the end you should be able to:

--Understand statistical metrics and what they measure

--Know how to run various statistical tests

--Read and interpret the SPSS results

Multiple linear regression is the next level up from simple linear regression. It is used to predict the value of the dependent variable based on the values of two, three, or more independent variables.

As in single linear regression, it attempts to model the relationship between the dependent variable (outcome / target variable) and the independent variables (predictor / explanatory variables) by fitting a linear equation to the observed data. Equally, multiple linear regression also allows you to determine the overall fit (variance accounted for by the predictors) of the model and the relative degree of contribution of each of the predictors to the model.

(1) The dependent variable is continuous (interval or ratio).

(2) The independent variables can be either continuous (interval or ratio) or categorical (ordinal or nominal).

(3) The participants (observed samples) have no relationship between the other participants and are taken at random from the population. This independence of observations can be verified using the Durbin-Watson statistic in SPSS.

(4) There is a linear correlation between the dependent variable (outcome) and each of the independent variables (predictors).

(5) The independent variables should not be strongly correlated with each other (multicollinearity), as this compromises the model's ability to determine the degree to which each independent variable accounts for the variance in the dependent variable.

(6) All the continuous (interval or ratio) variables should have a reasonably normal distribution. Normal distribution means the data should not be heavily skewed to the left or right tails, and there should not be significant outliers (better to have no outliers).

(7) The variables in the model need to show equal variance (homoscedasticity) when compared to each other. Homoscedasticity means you want the data (as plotted collectively between the variables) to show reasonably the same variance along the entire line of best fit.

(8) After completing the multiple linear regression test, you will need to check that the residuals (errors) of the regression line have a reasonably normal distribution as confirmation that the regression model is reliable.

** (Q1)** For a multiple linear regression test, do the dependent variable (Muscle (kg)) and the independent variables (Weight(kg), Fat(%), BMI, BMR) indicate a reasonably normal distribution?

(Answer: Yes). All the p-values for the Shapiro-Wilk test are above the critical alpha threshold (0.05). In our example when the two tests (Kolmogorov-Smirnov and Shapiro-Wilk) show disagreement, as in the BMR variable, preference should be given to the Shapiro-Wilk result (Razali and Yap, 2011: 25 ; Moni and Shuaib, 2015: 18).

** (Q2)** Do the four independent variables (predictors) show signs of collinearity between each other?

(Answer: Yes). There is a robust correlation between the two predictor variables Weight (kg) and BMR (rho = 0.804). If we calculate the r^{2} for these two variables (0.804 x 0.804 = 0.646), it indicates these two variables account for (or share) 64.6% of their variance with one another. In the multiple regression model you would exclude one of these as a predictor variable, as they violate the assumption of no strong multicollinearity between the independent (predictor) variables.

To start the analysis, click *Analyze > Regression > Linear*

This will bring up the Linear Regression dialogue box. To carry out the test, move the one outcome variable into the ** Dependent:** placard and the multiple predictor variables into the ** Independent(s):** placard.

Before moving on, be sure to select ** Backward** in the drop-down menu for the ** Method:** option.

First, click the ** Statistics** button and select the confidence intervals option, which is set at the 95% level. Be advised there are options for correlation tests and collinearity diagnostics; however, we did this separately by running a Spearman's Rho correlation on all the model variables, as mentioned in the Quick Quiz section. Click the ** Continue** button.

After returning to the Linear Regression dialogue box, click the ** Plots** button. In the scatter plot section, move the *ZPRED variable into the ** X:** placard and the *ZRESID variable into the ** Y:** placard.

In the ** Standardized Residual Plots** section, select the ** Histogram** and ** Normal probability plot** options.

Some tutors may want you to perform some in-depth secondary analysis to verify the model is meeting the test assumptions. There are two common statistical measurements found in the ** Save** options -- Standardized (or Unstandardized) Residuals and Cook's Distance. If you select these, SPSS will create these variables in your dataset to be used for secondary analysis. Click the ** Continue** button.

After returning to the main Linear Regression dialogue box, click the ** OK** button at the bottom of the dialogue box to run the test.

The results will appear in the SPSS Output Viewer. There are several key tables with the important test metrics. In the ** Variables Entered / Removed** table, there are two regression models listed with the predictor variables that were used (or removed) in each model.

Next in the ** Model Summary** table, there is the ** r^{2}** score for each model, which indicates the shared variance between the predictor variables and the outcome variable.

Shared variance means to what degree the predictor variables account for (or can explain) the variance in the outcome variable. Here in our example, the predictor variables used in each model are accounting for 99.9% (** r^{2} **= .999) of the variance in the outcome variable (Muscle) -- astonishing.

Next, there is the ** ANOVA** table, and we might consider this as the 'fitting room' for the regression model. You are in a store and you pick out some clothes you want to buy. But you go into a fitting room to see how well the clothes fit your body shape. This is what is happening here in this ** ANOVA** table.

There is the F-score, which indicates the strength (magnitude) of how well a linear, straight-line equation fits the variables in each model, as opposed to the null hypothesis (i.e., there is no (null) fit with the variables used). There is also the p-value (Sig) for the regression model, indicating if the fit (suitability) of the linear, straight-line equation is statistically significant. Be sure to report both the regression and residual degrees of freedom (df) in your write-up, for example, F _{(2, 39)} = 22915.4 , p < .001

Here in our example, (for both models) we have a high magnitude F-score (14916.5 and 22915.4), and both models are statistically significant (p < .001). These results indicate that 1) a linear, straight-line equation has a strong, robust fit between the predictor variables and the outcome variable used in each model, and 2) this fit (suitability) of the linear, straight-line equation is statistically significant.
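As a side note, the F-score is tied to the r^{2} value and the degrees of freedom by a standard formula, F = (r^{2} / k) / ((1 - r^{2}) / (n - k - 1)), where k is the number of predictors and n the sample size. A quick pure-Python sketch (note that plugging in the rounded r^{2} = .999 from the output gives only a ballpark match to the reported F-score, because SPSS rounds r^{2} to three decimals):

```python
def f_from_r2(r2, k, n):
    """F-score implied by the shared variance (r-squared),
    k predictors, and n observations."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Model #2 from the example: k = 2 predictors (Weight, Fat), n = 42.
# The rounded r-squared (.999) gives only an approximation of the
# reported F-score (22915.4) because of that rounding.
print(round(f_from_r2(0.999, 2, 42), 1))
```

The gap between this approximation and the reported value is a useful reminder to read F, df, and p directly from the ANOVA table rather than back-calculating them from rounded output.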

Finally, there is the ** Coefficients** table, which lists the regression equation coefficients and their statistical significance for both models. This table gives the answer as to which predictors (independent variables) are the best to estimate the outcome (dependent variable).

The first key metric is the p-value (Sig.) for each predictor variable in each model presented. In our example, for Model #1 the predictor BMI has a p-value of 0.779, which is over the critical alpha level (0.05). This indicates that in the presence of the other predictors (Weight and Fat), the BMI variable is not statistically significant; and therefore, it should be excluded from the regression model.

SPSS will then work "backwards" and recalculate the regression model with the remaining predictor variables to verify the new coefficients, p-values, t-scores, 95% confidence intervals, etc. It will continue this backwards process until all predictor variables that are __not__ statistically significant are removed.
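This backwards process (backward elimination) can be sketched in a few lines of pure Python. The p-values below are hypothetical placeholders standing in for what SPSS recomputes at each step -- only BMI's 0.779 is taken from the example output:

```python
# Hypothetical p-values for each candidate model (illustrative only;
# BMI's 0.779 mirrors the example, the rest are made up).
P_VALUES = {
    frozenset({"Weight", "Fat", "BMI"}): {"Weight": 0.001, "Fat": 0.001, "BMI": 0.779},
    frozenset({"Weight", "Fat"}): {"Weight": 0.001, "Fat": 0.001},
}

def backward_eliminate(predictors, alpha=0.05):
    """Repeatedly drop the least significant predictor until every
    remaining predictor is significant at the alpha level."""
    preds = set(predictors)
    while preds:
        pvals = P_VALUES[frozenset(preds)]
        worst = max(preds, key=lambda p: pvals[p])  # highest p-value
        if pvals[worst] <= alpha:
            break  # all remaining predictors are significant
        preds.remove(worst)
    return sorted(preds)

print(backward_eliminate(["Weight", "Fat", "BMI"]))
```

In real use, the p-values would be recomputed from the data at each step rather than looked up from a table; the loop structure, however, is exactly the "work backwards until only significant predictors remain" logic described above.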

Happiness... in our example, SPSS only had to make one further iteration of the regression model, which is Model #2 with the two best predictors (Weight and Fat). Therefore, the final regression equation [Y = A + (B * X_{1}) + (C * X_{2})] becomes:

Y (Muscle) = 11.381 + (0.811 * Weight) + (-0.778 * Fats)
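Once you have the final equation, making a prediction is simple arithmetic. A short sketch using the coefficients above (the weight and fat values plugged in are made up for illustration):

```python
def predict_muscle(weight, fats):
    """Estimate Muscle from the final regression equation:
    Y = 11.381 + (0.811 * Weight) + (-0.778 * Fats)."""
    return 11.381 + (0.811 * weight) + (-0.778 * fats)

# Hypothetical person: weight of 70, body fat of 20 (illustrative values).
print(round(predict_muscle(70, 20), 3))
```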

There are additional metrics that you can include in your final write-up: 1) There are the 95% Confidence Intervals around both the constant value (Y-intercept) and the predictor values (Weight and Fats). These give us a high-low measure of accuracy (confidence) as to how well our sample data are likely to represent the actual values in the population. 2) There are the t-scores and the p-values (Sig), which indicate the strength and statistical significance of the coefficients as compared to the null hypothesis (a zero numeric value). 3) There are the Beta scores (Standardized Coefficients), which indicate which predictors, in the presence of each other, are stronger (accounting for more shared variance) than the other predictors.

Coming back to our example, if we measured a person's weight and fat percentage, we could then estimate (predict) the amount of muscle that person is likely to have from our regression equation. And we would have a reasonably high level of confidence that the estimate would be accurate, as the 95% C.I.s in the regression model are all very narrow: 10.7 to 12.1 for the Constant (Y-intercept), 0.80 to 0.82 for the X_{1} predictor (Weight), and -0.79 to -0.76 for the X_{2} predictor (Fats).

Finally, as required by the earlier test assumptions, we have the charts and plots to confirm if the residuals between the variables tested in the regression model meet the assumption for normal distribution and the assumption for homoscedasticity (equal variance).

In both the histogram and the P-P plot, we can see there is a reasonably normal distribution for the residuals. The histogram shows the data values centrally gathered around the mean (i.e., not skewed to the left or right tails). Equally, the P-P plot shows a reasonable degree of wrapping (closeness) of the data values to the line of fit. However, in an ideal world, we would prefer the data values to be tighter to the line of fit.

However, both charts are showing an issue with outliers (red arrows) which will cause an unwanted distortion to the numerical accuracy of the statistical measurements (e.g., means, standard deviations, 95% confidence intervals, F score, p-values, etc.) within the model.

Next, the scatter chart below -- the predictors (X-axis) to outcome (Y-axis) residuals -- shows a reasonable degree of homoscedasticity. That is, there is the same variance in the plotted data values across the chart, and there is very little bunching up of the data values into tight clumps. However, the top and bottom halves (split along the red dashed line) are __not__ as strong a mirror image of each other as would be preferred.
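A very crude numeric companion to this visual check is to compare the residual variance in the left half of the chart with that in the right half; a ratio near 1 suggests roughly equal variance. This is a sketch only, assuming the residuals are supplied in order of their predicted values, with invented numbers:

```python
def variance(values):
    """Sample variance (n - 1 denominator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def spread_ratio(residuals):
    """Crude homoscedasticity check: ratio of residual variance in the
    first half of the data (ordered by predicted value) to the second half.
    A ratio near 1.0 suggests roughly equal variance across the chart."""
    half = len(residuals) // 2
    return variance(residuals[:half]) / variance(residuals[half:])

# Invented residuals whose spread widens in the second half (ratio < 1).
print(spread_ratio([-1, 1, -1, 1, -2, 2, -2, 2]))
```

Formal tests such as Levene's exist for this, but a quick ratio like this is often enough to decide whether the visual impression of "equal spread" is worth investigating further.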

Coming back to the outliers, the final scatter chart is a comparison of Cook's Distance to the dependent variable (Muscle). We want to see that for every observation in Muscle (N = 42), the Cook's values all have relatively the same distance up from the baseline (X-axis), and that there are no predominant 'up-spikes' in the data array. Any outstanding up-spikes (Cook's values that are 3X higher than the mean of the data array) are indicators of likely influential observations, which should be investigated. As mentioned previously, these up-spike Cook's values would bring into question the numerical accuracy of the statistical measurements (e.g., means, standard deviations, 95% confidence intervals, F score, p-values, etc.) within the model.
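The up-spike rule described above (flag any Cook's value more than 3X the mean of the data array) is easy to automate. A pure-Python sketch, using invented Cook's values rather than the example dataset:

```python
def flag_influential(cooks_values, factor=3.0):
    """Return the (zero-based) indices of Cook's distance values that are
    more than `factor` times the mean of the whole array -- the 'up-spikes'."""
    mean = sum(cooks_values) / len(cooks_values)
    return [i for i, d in enumerate(cooks_values) if d > factor * mean]

# Invented Cook's values: two obvious up-spikes among small baseline values.
cooks = [0.02, 0.03, 0.02, 0.45, 0.01, 0.03, 0.02, 0.60, 0.02]
print(flag_influential(cooks))
```

If you saved Cook's Distance as a variable in the ** Save** step, the same check can be run on that column to list the observations worth investigating.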

Here in this regression model, there are 5 up-spike values (12%) out of the 42 observations. These up-spike values would need to be investigated to verify the degree of influence (over-bearing weight) they may have on the model. That said, if on an exam you were 88% accurate in your answers, you might be well chuffed with that result.

These charts for secondary analysis provide evidence that the regression model is meeting the key assumptions of normal distribution, homoscedasticity (equality of variance), and numerical accuracy from our sample -- or they indicate areas of concern.

Happiness... you should now understand how to perform the Multiple Linear Regression test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:

By the end you should be able to:

--Understand statistical metrics and what they measure

--Know how to run various statistical tests

--Read and interpret the SPSS results
