There are a number of different ways to calculate descriptive statistics in SPSS. We will use the Frequencies menu option. To start the analysis, click on Analyze > Descriptive Statistics > Frequencies.
This will bring up the Frequencies dialogue box. You can move the scale variable for which you wish to calculate the descriptive statistics into the Variable(s) box. You can drag and drop the scale variable; or first select it, and then click the arrow button in the centre of the dialogue box.
Once you have moved the scale variable into the right-hand Variable(s) box, first untick the Display frequency tables option. Next, click the Statistics button. This will bring up the Frequency Statistics dialogue box, where it is possible to choose a number of descriptive measures.
Once you have ticked the descriptive measures you want, click the Continue button, and then click the OK button in the Frequencies dialogue box to carry out the analysis.
The table of statistics is displayed in the SPSS Output Viewer. It is fairly self-explanatory, displaying all the descriptive measures that you selected.
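As a cross-check outside SPSS, the same descriptive measures can be computed with Python's standard library. This is a minimal sketch; the weight values are illustrative, not the tutorial's dataset.

```python
# Descriptive statistics comparable to SPSS Frequencies > Statistics.
# The weights below are illustrative values, not the tutorial's dataset.
import statistics

weights = [68.5, 72.0, 75.5, 75.5, 80.0, 84.5, 91.0]

mean = statistics.mean(weights)          # arithmetic mean
median = statistics.median(weights)      # middle value
mode = statistics.mode(weights)          # most frequent value
std_dev = statistics.stdev(weights)      # sample standard deviation (n - 1)
value_range = max(weights) - min(weights)

print(f"Mean: {mean:.2f}, Median: {median}, Mode: {mode}")
print(f"Std. deviation: {std_dev:.2f}, Range: {value_range}")
```

Note that, like SPSS, `statistics.stdev` uses the sample (n - 1) formula rather than the population formula.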
Happiness... you should now be able to calculate descriptive statistics in SPSS. However, if you want to explore further, here are two sites:
A frequency table will display the count and percentage for each level (group) in a categorical variable. We will use the Frequencies menu option. To start the analysis, click on Analyze > Descriptive Statistics > Frequencies.
This will bring up the Frequencies dialogue box. You can move the categorical variable (nominal or ordinal) for which you wish to create the frequency table into the Variable(s) box. You can drag and drop the categorical variable; or first select it, and then click the arrow button in the centre of the dialogue box.
Once you have moved the categorical variable into the right-hand Variable(s) box, be sure the Display frequency tables option is ticked. Next, click the OK button to create the table.
The frequency table is displayed in the SPSS Output Viewer. It is fairly self-explanatory, displaying the count (frequency) and the percentage for each level (group) within the categorical variable that you selected (below are two examples).
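The same count-and-percentage table can be built in a few lines of Python as a cross-check. This is a minimal sketch; the gender values are illustrative, not the tutorial's dataset.

```python
# A frequency table (count and percentage per level), mirroring the SPSS output.
# The gender values below are illustrative, not the tutorial's dataset.
from collections import Counter

gender = ["Female", "Male", "Male", "Female", "Male", "Female", "Male"]

counts = Counter(gender)
total = len(gender)

print(f"{'Level':<10}{'Frequency':>10}{'Percent':>10}")
for level, count in counts.most_common():
    print(f"{level:<10}{count:>10}{count / total * 100:>9.1f}%")
```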
Happiness... you should now be able to complete frequency tables in SPSS. However, if you want to explore further, here are two sites:
There are a number of excellent charts in SPSS to give visual interpretation to your data. We will look at four key charts as a starting reference, but you should be able to develop more charts as a follow-on from this guide.
For all the charts in this guide, we will use the Chart Builder. To start, click on Graphs > Chart Builder.
This will open the Chart Builder dialogue box, and I have labelled 6 areas to help navigate through the Chart Builder:
After opening the Chart Builder, select Histogram from the (3) list of chart types, and choose the first variant from the (4) list of variants. Next, drag the scale variable you want to chart onto the X-axis placard in the (5) preview / sandbox area (in my example I used Weight_kg). Finally, open the (6) properties side panel and tick the Display normal curve option. Click the OK button when finished.
After opening the Chart Builder, select Bar from the (3) list of chart types, and choose the first variant from the (4) list of variants. Next, drag the categorical variable you want to chart onto the X-axis placard in the (5) preview / sandbox area (in my example I used Gender). And then, drag the scale variable you want to chart onto the Y-axis placard in the (5) preview / sandbox area (in my example I used Weight_kg). Finally, open the (6) properties side panel and tick the Display error bars option, and select the type of error bars -- Confidence Intervals, or Standard Error (with 1 as the multiplier), or Standard Deviation (with 1 as the multiplier). Click the OK button when finished.
After opening the Chart Builder, select Boxplot from the (3) list of chart types, and choose the first variant from the (4) list of variants. Next, drag the categorical variable you want to chart onto the X-axis placard in the (5) preview / sandbox area (in my example I used Gender). And then, drag the scale variable you want to chart onto the Y-axis placard in the (5) preview / sandbox area (in my example I used Weight_kg). Click the OK button when finished.
After opening the Chart Builder, select Scatter/Dot from the (3) list of chart types, and choose the first variant from the (4) list of variants. Next, drag the scale variable you want to chart onto the X-axis placard in the (5) preview / sandbox area (in my example I used Weight_kg). And then drag the second scale variable you want to chart onto the Y-axis placard in the (5) preview / sandbox area (in my example I used Muscle_kg). Finally, open the (6) properties side panel and tick the Linear Fit Lines option, and select Total as the type of line. Click the OK button when finished.
After creating any chart in SPSS it will appear in the SPSS Output Viewer. If you double-click on the chart the Chart Editor will open; and there are menus and quick tools to change the text formatting, the scaling of X-axis and Y-axis, to add data labels, to add trendlines, and much more. When finished, close the Chart Editor and the changes will update on the original chart.
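Outside SPSS, the same kind of chart can be sketched with matplotlib. The snippet below reproduces the Scatter/Dot chart with a "Total" linear fit line; the weight and muscle values are illustrative, not the tutorial's dataset.

```python
# A scatter chart with a "Total" linear fit line, similar to the Chart Builder
# result. The weight/muscle values are illustrative, not the tutorial's dataset.
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np

weight_kg = np.array([70.0, 74.0, 78.0, 82.0, 86.0, 90.0])
muscle_kg = np.array([30.5, 32.0, 34.5, 35.0, 37.5, 39.0])

# Least-squares line of best fit (the "Total" fit line in SPSS terms).
slope, intercept = np.polyfit(weight_kg, muscle_kg, 1)

fig, ax = plt.subplots()
ax.scatter(weight_kg, muscle_kg)
ax.plot(weight_kg, slope * weight_kg + intercept)
ax.set_xlabel("Weight_kg")
ax.set_ylabel("Muscle_kg")
fig.savefig("scatter_fit.png")
```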
Happiness... you should now be able to create charts in SPSS. However, if you want to explore further, here are two sites:
There are a number of parametric assumptions that are requirements for certain statistical tests in SPSS. We will look at four key assumptions as the starting requirement for the majority of these tests.
The variable must be a scale measurement type. You are not concerned with parametric assumptions for variables that are nominal or ordinal measurement types.
A scale variable (interval or ratio) measures quantity, and every unit of measure represents an equal division. Equal divisions mean that 4 feet is 2x longer than 2 feet and that 10 minutes is 5x longer than 2 minutes.
The normal distribution, also known as the Gaussian distribution, is a probability function that describes how the values of a variable are spread out. It is a symmetric distribution showing that data near the mean are more frequent in occurrence and the probabilities for values further away from the mean taper off equally in both directions. In a graph, normal distribution will appear as a bell-shaped curve.
You can examine a scale variable for normal distribution either with a histogram (as above) or with a Q-Q plot (not shown). You can test for normal distribution with the Kolmogorov-Smirnov test or the Shapiro-Wilk test. To start the analysis, click on Analyze > Descriptive Statistics > Explore.
This will open the Explore dialogue box. Move the scale variable to be tested into the right-hand Dependent List: box. [As a side note: you can put a categorical variable in the Factor List: box if you want to split the dependent list variable in order to test each group separately.] Next, in the Display section (at the bottom), tick the Plots radio button. Finally, click the Plots... options button (on the far right side).
This will open the Explore: Plots dialogue box. Tick the Normality plots with tests option. There are other options you may (or may not) want to tick. When finished click the Continue button, and then the OK button in the original Explore dialogue box.
The result will appear in the SPSS Output Viewer. The Kolmogorov-Smirnov test and the Shapiro-Wilk test results appear in the Tests of Normality statistics table.
Most often they will agree. However (as is the case in our example), the Kolmogorov-Smirnov test (p = .042) shows the data failed normal distribution, but the Shapiro-Wilk test (p = .090) shows the data passed normal distribution. When they do not agree, most researchers will select the Shapiro-Wilk result. It is a more robust test, it does not have the Lilliefors correction applied, and it manages small sample sizes better.
There are also a variety of charts in the result -- Q-Q plot, Stem & Leaf, Histogram (if you ticked this option), and Boxplot -- all of which provide good visual evidence of normal (or non-normal) distribution, as confirmation and a visual inspection of the normality test result.
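The Shapiro-Wilk test can be reproduced outside SPSS with scipy as a cross-check. Note that SPSS applies the Lilliefors correction to its Kolmogorov-Smirnov test, which scipy's plain `kstest` does not, so the Shapiro-Wilk test is the easier one to replicate. The sample below is illustrative, not the tutorial's dataset.

```python
# Shapiro-Wilk normality test, comparable to the SPSS Explore output.
# The sample below is illustrative, not the tutorial's dataset.
from scipy import stats

muscle_kg = [28.0, 30.5, 31.0, 32.5, 33.0, 33.5, 34.0, 34.5,
             35.0, 35.5, 36.0, 37.0, 38.5, 40.0, 46.0]

sw_stat, sw_p = stats.shapiro(muscle_kg)
print(f"Shapiro-Wilk: W = {sw_stat:.3f}, p = {sw_p:.3f}")
# A p-value above the critical 0.05 alpha level means the data do not
# significantly deviate from a normal distribution.
```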
Another important property within parametric assumptions involves outliers -- you do not want to have many of these. There are several ways to detect these little monsters in the data with charts, such as Stem & Leaf, Histograms, Q-Q Plots, and Boxplots. Below is a Q-Q plot and a Boxplot of the Muscle_kg data from the earlier Explore result (the outliers are underlined in green).
There are three outliers in the data, and one is an extreme outlier (marked as an asterisk symbol in the boxplot). With three of these little monsters in the data, you can understand better why the two normality tests are disagreeing. And you can also understand why I call them 'monsters'. In this case, as already stated, you would accept the Shapiro-Wilk result and consider the data as having normal distribution.
If you scroll back up to the Q-Q plot, and imagine the three outliers not there, you can see that the rest of the data (40 out of 43 values which is 93%) has a fairly good distribution around the line of fit. Again this may help to understand why these two tests of normality are contradicting each other. It seems the Kolmogorov-Smirnov test is more influenced by the outliers (thus failing normal distribution), while the Shapiro-Wilk test gives more weight to the 93% majority (thus passing normal distribution).
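The boxplot's outlier rules can also be applied directly: SPSS flags values more than 1.5 x IQR beyond the quartiles as outliers (circles) and more than 3 x IQR beyond as extreme outliers (asterisks). A minimal sketch with illustrative data (not the tutorial's dataset):

```python
# Boxplot-style outlier detection using the SPSS conventions: beyond
# 1.5 x IQR from the quartiles = outlier (circle), beyond 3 x IQR =
# extreme outlier (asterisk). The data below are illustrative.
import statistics

data = [30, 31, 32, 33, 33, 34, 34, 35, 35, 36, 37, 38, 46, 52]

q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
iqr = q3 - q1

mild = [x for x in data
        if q1 - 3 * iqr <= x < q1 - 1.5 * iqr
        or q3 + 1.5 * iqr < x <= q3 + 3 * iqr]
extreme = [x for x in data if x < q1 - 3 * iqr or x > q3 + 3 * iqr]

print("Mild outliers:", mild)
print("Extreme outliers:", extreme)
```

Note that quartile formulas differ slightly between packages, so values sitting exactly on a fence may be classified differently than in the SPSS chart.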
The final property we want to add into the mix of parametric assumptions is homogeneity of variance. This means a scale variable should have fairly equal variance when split into the respective levels within a categorical variable. For example, the males' data for Muscle_kg should have a similar variance to the females' data for Muscle_kg. I have used a Bar chart (see below) with standard deviation error bars as a good visual check for homogeneity of variance.
You can see the two error bars (black whiskers) are not exactly equal. And therefore initially you might think the two groups (male and female) do not have homogeneity of variance. However, you can allow for a certain amount of discrepancy from exact equality and still not violate homogeneity of variance.
The error bar in the female data is about 40% longer than the error bar in the male data. This difference is allowable; and in fact it is not until the difference exceeds 200% (double) or even 300% (triple) that the property of homogeneity of variance is violated... amazing!
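The same visual check can be expressed as a number: compute each group's standard deviation and take the ratio of the larger to the smaller. A ratio of roughly 2 or more is the usual cause for concern. The values below are illustrative, not the tutorial's dataset.

```python
# A quick homogeneity-of-variance check: compare the standard deviations of
# the two groups. The values below are illustrative, not the tutorial's data.
import statistics

male_muscle = [34.0, 35.5, 36.0, 37.5, 38.0, 40.0]
female_muscle = [24.0, 26.5, 28.0, 29.5, 31.0, 33.5]

sd_male = statistics.stdev(male_muscle)
sd_female = statistics.stdev(female_muscle)

ratio = max(sd_male, sd_female) / min(sd_male, sd_female)
print(f"SD ratio: {ratio:.2f}")  # below ~2 suggests homogeneity holds
```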
Homogeneity of variance must also be checked when testing two scale variables against each other. In this case a Scatter/Dot chart can be used as a good visual check.
We can see that throughout the Weight variable (70kg - 75kg - 80kg - 85kg - 90kg) the Muscle_kg variable is spread within a fairly parallel pathway. The exception is at 100kg; however, there are only two values out that far, which is a small percentage (4.5%) of all the values. Therefore it is fairly reasonable to say these two variables have homogeneity of variance.
A quick review of our top four parametric assumptions:
Happiness... you should now be able to test for parametric assumptions in SPSS. However, if you want to explore further, here are two sites:
The Student T-test (aka: Independent Samples T-test) compares the means of two independent groups to determine if there is reasonable evidence (within the sample) that the population means for these two groups have a statistically significant difference.
(1) The dependent variable (test variable) is continuous (interval or ratio).
(2) The independent variable (factor variable) should be two independent groups in which the samples (participants) have no relationship with the other participants in their group or with the participants in the other group.
(3) The samples (participants) for each group are taken at random from the population.
(4) The two groups have roughly the same number of participants (sample size).
(5) The dependent variable (test variable) has a reasonably normal distribution for both groups. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).
(6) The dependent variable (test variable) has equal variance (homogeneity) between both groups. Homogeneity means you want the standard deviation measurements for the two groups to be reasonably similar to each other.
(Q1) Does the dependent variable (Weight) have a reasonably normal distribution for both groups (male and female)?
(Answer: Yes). The data for both groups is certainly not heavily skewed to the left or right tails, and the data values (blue bins) pretty much gather in and around the centre of the bell curve... happiness.
(Q2) Does the dependent variable (Weight) have equal variance between both groups (male and female)?
(Answer: Yes). The variance (whisker-to-whisker) for both groups, although not exactly equal, is certainly not excessively different from each other. But wait... no, no, no... there are a few outliers, and this could violate one of the assumptions of this test. One interesting point regarding these outliers is that none are measured as extreme. In SPSS extreme outliers are marked with the asterisk (*) symbol in a Boxplot chart.
Here is where SPSS will not help you. You as the researcher must look at the SPSS results and make some relevant interpretation. In this example there are 3 outliers, and the Student T-test would prefer 0 outliers. You the researcher will need to make a decision and support that decision with evidence.
In the write-up for this test you could indicate that you elected to run the Student T-test because the data met the assumptions of normal distribution and homogeneity of variance across the two groups. Equally, there are similar sample sizes in the two groups with 21 females to 22 males (include the histogram charts, a gender frequency table, and a Kolmogorov-Smirnov or Shapiro-Wilk test as evidence). However, not all the assumptions for this test were met perfectly. There were 3 outliers, which 1) is only 6.9% of the data, and 2) none of the 3 outliers were measured as extreme (include the Boxplot chart as evidence). In an ideal world this test prefers 0 outliers, but the few outliers that exist are certainly not excessive in number or significant in distance from the median.
To start the analysis, click Analyze > Compare Means > Independent-Samples T Test.
This will bring up the Independent-Samples T Test dialogue box. To carry out the test, move the dependent (scale) variable into the Test Variable(s): placard. Next move the independent (nominal or ordinal) variable into the Grouping Variable: placard. Click on the Define Groups... button, and enter the correct numeric values that represent each group. Click the Continue button, and then click the OK button at the bottom of the main dialogue box.
The results will appear in the SPSS Output Viewer. In the Group Statistics table there are the key group metrics -- sample size (N), mean, and standard deviation. From these measurements you should develop an intuitive perspective as to whether the test will indicate a statistically significant difference or not. Here in this example, there is approximately a 3 kg difference in weight between the females and the males -- the females (83.42 kg) weigh only 3.7% more than the males (80.40 kg). You would not expect this small difference to be statistically significant. Equally, there is a reasonably similar standard deviation measurement for the two gender groups, and therefore, you would expect that the two groups do not violate homogeneity of variance.
In the Independent Samples Test table there are the key test metrics -- equality of variance and then all the t-test measurements. In this example, first we see (as we estimated earlier from the two standard deviations) the two groups do not violate homogeneity as the p-value (0.147) in the Levene's Test for Equality of Variances is above the critical 0.05 alpha level. Therefore, in the second part of this table, we read (and report) all the metrics from the top row which is labeled, Equal variances assumed.
The measurements in this second part of the table give you the t-score, the degrees of freedom, the p-value, the mean difference, and the 95% C.I. of the difference. Here in this example the t-score (1.407) is relatively small, and we were expecting that as there is only a 3 kg difference in weight. Equally, the p-value (0.167) is above the critical 0.05 alpha level, indicating this difference between the females' weight and the males' weight is not statistically significant, which we were also expecting as the 3 kg difference is only a 3.7% magnitude of change.
Finally, the 95% C.I. of the difference provides a high / low range as to where the difference (3 kg) between these two gender groups might actually exist in the population. Here the males' weight could actually be 7.3 kg lower than the females' weight, or the males' weight could actually overtake and exceed the females' weight by 1.3 kg. This is a range of approximately 8 kg from high to low, which is fairly narrow -- excellent. But keep in mind this range moves from a negative scale, crosses the 0 threshold, and then moves into a positive scale. So, at some point the difference could be 0 kg, that is, the females and males weigh the same -- a nil difference.
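The whole walkthrough above can be reproduced with scipy as a cross-check: the t-test itself, plus a hand-rolled 95% C.I. of the mean difference matching the "Equal variances assumed" row. The weight values are illustrative, not the tutorial's dataset.

```python
# Student (independent samples) t-test with a 95% C.I. of the mean difference,
# comparable to SPSS's "Equal variances assumed" row. Illustrative data.
import math
from scipy import stats

female = [78.0, 80.5, 82.0, 83.5, 85.0, 87.5, 88.0]
male = [74.0, 76.5, 78.0, 79.5, 81.0, 83.5, 84.0]

t_stat, p_value = stats.ttest_ind(female, male, equal_var=True)

# 95% C.I. of the difference, using the pooled standard error.
n1, n2 = len(female), len(male)
mean_diff = sum(female) / n1 - sum(male) / n2
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * stats.tvar(female) + (n2 - 1) * stats.tvar(male)) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_crit = stats.t.ppf(0.975, df)

print(f"t({df}) = {t_stat:.3f}, p = {p_value:.3f}")
print(f"Mean difference = {mean_diff:.2f} kg, "
      f"95% C.I. [{mean_diff - t_crit * se:.2f}, {mean_diff + t_crit * se:.2f}]")
```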
Happiness... you should now understand how to perform the Student T-test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
The Mann-Whitney U test compares the medians or mean ranks of two independent groups and is commonly used when the dependent variable is either categorical (ordinal) or continuous (interval or ratio) and does not meet the assumptions for the Independent Samples T-test (aka: Student T-test).
(1) The dependent variable (test variable) can be categorical (ordinal) or continuous (interval or ratio) in its measurement type.
(2) The independent variable should be two independent groups in which the samples (participants) have no relationship with the other participants in their group or with the participants in the other group.
(3) The samples (participants) for each group are taken at random from the population.
(4) The sample size can be disproportionate or unbalanced in the number of participants in each group.
(5) The dependent variable (test variable) for one or both groups can be non-normal in its distribution. Non-normal distribution means the data can be heavily skewed to the left or right tails, and/or it may have significant outliers.
(6) The dependent variable (test variable) for one or both groups may (or may not) have a similar shape (homogeneity) in its variance. It is extremely unlikely that the variance of the two groups will be identical, and therefore, the Mann-Whitney U test will test between the mean ranks of the dependent variable for both groups.
(Q1) Would you use the Mann-Whitney U test for the following data on users and non-users of a weight training supplement?
(Answer: Yes). The participant count (frequency) for the two groups is certainly not in a balanced proportion with User at 52 (73%) and Non-user at 19 (27%). Also the dependent variable (Muscle_kg) violates normal distribution for the smaller Non-user group as indicated by both the Kolmogorov-Smirnov (p = .006) and the Shapiro-Wilk (p = .020) tests of normality, which are below the critical 0.05 alpha level.
(Q2) Does the Boxplot give support for using the Mann-Whitney U test to compare between the users and non-users of the weight training supplement?
(Answer: Yes). The total variance (whisker-to-whisker and including outliers) between the two groups, although not exactly equal, is certainly not wildly different from each other. And you could argue the two groups have homogeneity of variance. However, there are several outliers in the non-user group; and one is an extreme outlier, as marked with the asterisk (*) symbol. This number and condition of outliers in the non-user group would give support for choosing the Mann-Whitney U test to analyse the data.
To start the analysis, click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples...
This will bring up the Two-Independent-Samples Tests dialogue box. To carry out the test, move the dependent variable (scale or ordinal) into the Test Variable List: placard. Next move the independent variable (nominal or ordinal) into the Grouping Variable: placard. Click on the Define Groups... button, and enter the correct numeric values that represent each group. Click the Continue button. Verify that the Mann-Whitney U test is selected in the Test Type section. Finally, click the OK button on the bottom of the main dialogue box.
The results will appear in the SPSS Output Viewer. In the Ranks table there are the key group metrics -- sample size (N) and mean rank. From these measurements you should develop an intuitive perspective as to whether the test will indicate a statistically significant difference or not. Here in this example, there is approximately a 3.8 point difference between the mean rank of the non-users (19.84) and the mean rank of the users (23.71) as regards their muscle mass. This is a difference of about 4 places in rank, and you would not expect this small difference to be statistically significant.
In the Test Statistics table there are the key test metrics -- the Mann-Whitney U score, the p-value (Asymp. Sig.), and some researchers will also report the Z score. In this example, we see (as we estimated earlier from the two mean ranks) the difference between the two groups is not statistically significant as the p-value (0.316) is above the critical 0.05 alpha level. In your report write-up you should also include the Mann-Whitney U score as further support that indicates the difference is not statistically significant.
The Mann-Whitney U test converts the raw data values for the dependent variable into a rank -- 1st, 2nd, 3rd, 4th, and so forth. Then it adds the converted ranks for all the participants in their respective group to achieve that group's total "sum of ranks". If you divide the sum of ranks by the number of participants, you will get the mean rank (or what is the typical participant's rank). Remember, in statistics we tend to determine 1) what is a typical member in my sample and 2) what is the variance around that typical member.
The Mann-Whitney U test is much simpler to understand and appreciate, as it is not concerned with normal distribution of the dependent variable, and it is not concerned with homogeneity of variance between the two groups.
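The ranking procedure just described can be reproduced with scipy as a cross-check: rank the pooled values, average the ranks within each group, and run the U test. The values below are illustrative, not the tutorial's dataset.

```python
# Mann-Whitney U test with mean ranks, comparable to the SPSS Ranks and
# Test Statistics tables. The values below are illustrative data.
from scipy import stats

non_user = [26.0, 27.5, 28.0, 29.5, 30.0, 31.5]
user = [28.5, 30.5, 32.0, 33.5, 34.0, 35.5, 36.0]

u_stat, p_value = stats.mannwhitneyu(non_user, user, alternative="two-sided")

# Mean ranks: rank the pooled values, then average within each group.
pooled = non_user + user
ranks = stats.rankdata(pooled)
mean_rank_non_user = ranks[:len(non_user)].mean()
mean_rank_user = ranks[len(non_user):].mean()

print(f"U = {u_stat}, p = {p_value:.3f}")
print(f"Mean rank (non-user) = {mean_rank_non_user:.2f}, "
      f"mean rank (user) = {mean_rank_user:.2f}")
```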
Happiness... you should now understand how to perform the Mann-Whitney U test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
In this tutorial, we will look at how to conduct the One-Way ANOVA test in SPSS (aka: One Factor ANOVA or One-Way Analysis of Variance), and how to interpret the results of the test. The One-Way ANOVA test compares the means of three or more independent groups to determine if there is reasonable evidence that the population means for these three or more groups have a statistically significant difference.
(1) The dependent variable (test variable) is continuous (interval or ratio).
(2) The independent variable (factor variable) is categorical (nominal or ordinal) and should be three or more independent groups in which the samples (participants) have no relationship with the other participants in their group or with the participants in the other groups.
(3) The samples (participants) for each group are taken at random from the population.
(4) All the groups have roughly the same number of participants (sample size).
(5) The dependent variable (test variable) has a reasonably normal distribution for each group. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).
(6) The dependent variable (test variable) has equal variance (homogeneity) between all the groups. Homogeneity means you want the standard deviation measurements for the groups to be roughly the same as each other.
(Q1) Does the dependent variable (Fat %) have a reasonably normal distribution across the three groups?
(Answer: Yes). The data for the three groups is certainly not heavily skewed to the left or right tails, and the data values (blue bins) pretty much gather in and around the centre of the bell curve... happiness.
(Q2) Does the dependent variable (Fat %) have equal variance between the three groups?
(Answer: Yes). The variance (whisker-to-whisker) for the three groups, although not exactly equal, is certainly not excessively different from each other. But wait... no, no, no... there are a few outliers, and this could violate one of the assumptions of this test. One interesting point regarding these outliers is that none are measured as extreme. In SPSS extreme outliers are marked with the asterisk (*) symbol in a Boxplot chart.
Here is where SPSS will not help you. You as the researcher must look at the SPSS results and make some relevant interpretation. In this example there are two outliers, and the One-Way ANOVA test would prefer zero outliers. You the researcher will need to make a decision and support that decision with evidence.
In the write-up for this test you could indicate that you elected to run the ANOVA test because the data met the assumptions of normal distribution and homogeneity of variance across the three groups. Equally, there are similar sample sizes from 13 to 15 participants in each group (include the histogram charts, a participants per sessions Frequency table, and a Kolmogorov-Smirnov or Shapiro-Wilk test as evidence). However, not all the assumptions for this test were met perfectly. There were two outliers, which 1) is only 4.6% of the data, and 2) neither of the two outliers was measured as extreme (include the Boxplots as evidence). In an ideal world this test prefers 0 outliers, but the few outliers that exist are certainly not excessive in number nor significant in distance from the median.
To start the analysis, click Analyze > Compare Means > One-Way ANOVA.
This will bring up the One-Way ANOVA dialogue box. To carry out the test, move the dependent (scale) variable into the Dependent List: placard. Next move the independent (nominal or ordinal) variable into the Factor: placard. There is an Options... button, where you can select descriptive statistics for the three groups, a homogeneity of variance test, other extra statistics, and a means plot.
Note: If the dependent variable violates the homogeneity of variance test (a p-value below the critical 0.05 alpha level), then researchers will re-run the One-Way ANOVA test, and in the Options... section they will select the Welch statistic. This variant of the One-Way ANOVA test is not concerned with homogeneity of variance between the different groups.
After selecting any extra options that you want, click the Continue button, and then click the OK button at the bottom of the main dialogue box.
The results will appear in the SPSS Output Viewer. In the Descriptives table there are the key group metrics -- sample size (N), mean, standard deviation, and 95% C.I. From these measurements you should develop an intuitive perspective as to whether the test will indicate a statistically significant difference or not. Here in this example, there is approximately a 1.5 and 1.9 point difference in fat (%) between the three groups, with 3 times a week at 14.8%, 4 times a week at 16.3%, and 5 times a week at 14.4%. You would not expect these small differences to be statistically significant. Equally, there are almost identical standard deviation measurements for the three groups; and therefore, you would expect that the groups do not violate homogeneity of variance.
In the Test of Homogeneity of Variances table there are the key test metrics for homogeneity (equality) of variance. If the data were normally distributed and there were no significant outliers, then you would expect all four measurements to agree. And this is true in our example, with all the p-values virtually the same, between 0.95 and 0.96. The measurement you would refer to (and quote) in your write-up would be the top row titled, Based on Mean. Here in our example the p-value is 0.955 (well above the critical 0.05 alpha level), which indicates that between the three groups the dependent variable does not violate homogeneity of variance.
Finally, in the ANOVA table you have the degrees of freedom, the F-score, and the p-value. Here in this example the F-score (1.418) is relatively small, and we were expecting that as there is only a 1.5 and 1.9 point difference in fat (%). Equally, the p-value (0.254) is above the critical 0.05 alpha level, indicating the difference between the three groups is not statistically significant, which we were expecting based on the results in the Descriptives table as mentioned earlier.
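Both tables can be reproduced outside SPSS with scipy as a cross-check: Levene's test (the "Based on Mean" row) followed by the One-Way ANOVA F-test. The fat (%) values below are illustrative, not the tutorial's dataset.

```python
# Levene's test and One-Way ANOVA, comparable to the SPSS Test of Homogeneity
# of Variances and ANOVA tables. The fat (%) values are illustrative data.
from scipy import stats

three_times = [11.8, 13.0, 14.2, 15.4, 16.6, 17.8]
four_times = [13.3, 14.5, 15.7, 16.9, 18.1, 19.3]
five_times = [11.5, 12.7, 13.9, 15.1, 16.3, 17.5]

lev_stat, lev_p = stats.levene(three_times, four_times, five_times,
                               center="mean")  # SPSS "Based on Mean" row
f_stat, p_value = stats.f_oneway(three_times, four_times, five_times)

print(f"Levene: p = {lev_p:.3f}")   # above 0.05 -> homogeneity holds
print(f"ANOVA: F = {f_stat:.3f}, p = {p_value:.3f}")
```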
Secondary post hoc testing can be completed if the original One-Way ANOVA result indicated that the differences between the groups were statistically significant. You would need to re-run the test, and in the One-Way ANOVA dialogue box there is a Post Hoc button you would click.
This will open the One-Way ANOVA: Post Hoc Multiple Comparisons dialogue box, and you can select one of the Post Hoc tests recommended by your tutor. The three most common seem to be: LSD, Bonferroni, and Tukey; and in my example I have selected the LSD test.
In the Multiple Comparisons table you are looking for any comparison with a large mean difference, which should result in a corresponding p-value below the critical 0.05 alpha level. In my example there are two comparisons like this: the 3 times / week to the 5 times / week, and the 4 times / week to the 5 times / week. In your write-up you would list these two comparisons with the evidence of the mean difference and p-value respectively.
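A post hoc comparison can be sketched outside SPSS with pairwise t-tests and a Bonferroni correction (one of the options SPSS offers alongside LSD and Tukey; LSD itself is simply the uncorrected p-values). The fat (%) values below are illustrative, not the tutorial's dataset.

```python
# Post hoc sketch: pairwise t-tests with a Bonferroni adjustment, similar in
# spirit to the SPSS Multiple Comparisons table. Illustrative data.
from itertools import combinations
from scipy import stats

groups = {
    "3 times / week": [14.0, 14.6, 15.2, 15.8, 16.4],
    "4 times / week": [15.5, 16.1, 16.7, 17.3, 17.9],
    "5 times / week": [11.9, 12.5, 13.1, 13.7, 14.3],
}

pairs = list(combinations(groups, 2))
results = {}
for name_a, name_b in pairs:
    t_stat, p_raw = stats.ttest_ind(groups[name_a], groups[name_b])
    p_adj = min(p_raw * len(pairs), 1.0)  # Bonferroni adjustment
    results[(name_a, name_b)] = p_adj
    print(f"{name_a} vs {name_b}: p (adjusted) = {p_adj:.3f}")
```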
Happiness... you should now understand how to perform the One-Way ANOVA in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
The Kruskal-Wallis test compares the medians or mean ranks of three or more independent groups and is commonly used when the dependent variable is either categorical (ordinal) or continuous (interval or ratio) and does not meet the assumptions for the One-Way ANOVA test.
(1) The dependent variable (test variable) can be categorical (ordinal) or continuous (interval or ratio) in its measurement type.
(2) The independent variable should be three or more independent groups in which the samples (participants) have no relationship with the other participants in their group or with the participants in the other groups.
(3) The sample size can be disproportionate or unbalanced in the number of participants in each group.
(4) The dependent variable (test variable) for one or all the groups can be non-normal in its distribution. Non-normal distribution means the data can be heavily skewed to the left or right tails, and/or it may have significant outliers.
(5) The dependent variable (test variable) for one or all the groups may (or may not) have a similar shape (homogeneity) in its variance. It is extremely unlikely that the variance for the groups will be identical, and therefore, the Kruskal-Wallis test will test between the mean ranks of the dependent variable for all the groups.
(Q1) Do the three bread types (White, Brown, Seeded) have balanced or equal proportions?
(Answer: No) The Brown and Seeded bread types are fairly balanced (equal) in their sample size at 24 (16.1%) and 28 (18.8%) respectively. However the White bread type has a sample size that is more than 3 times larger at 97 (65.1%). The Kruskal-Wallis test is more suited to manage groups with disproportionate (unequal) sample sizes.
(Q2) Do the three bread types (White, Brown, Seeded) have a normal distribution?
(Answer: No) The Seeded bread appears the most normal, with the data values located around the mean (top of the bell curve). However, the Brown bread is starting to show a higher distribution of data values on the left tail and some outliers above 2.0 grams. And the White bread shows this same skewed distribution (overweighted on the left tail and outliers on the right tail) to a much higher degree, with several extreme outliers at 4.0 to 8.0 grams. The Kruskal-Wallis test is well suited to managing groups whose test data are not normally distributed and/or have a high number of outliers.
To start the analysis, click Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples
This will bring up the Tests for Several Independent Samples dialogue box. To carry out the test, move the dependent (scale or ordinal) variable into the Test Variable List: placard. Next move the independent (nominal or ordinal) variable into the Grouping Variable: placard. Click on the Define Range... button, and enter the correct numeric values that represent all the groups. Click the Continue button. Verify that the Kruskal-Wallis test is selected in the Test Type section. Finally, click the OK button at the bottom of the main dialogue box.
The results will appear in the SPSS Output Viewer. In the Ranks table there are the key group metrics -- sample size (N) and mean rank. From these measurements you should develop an intuitive perspective as to whether the Kruskal-Wallis test will indicate a statistically significant difference or not. Here in this example, there is approximately a 9-point difference between the mean rank of the White bread (65.12) and the mean rank of the Brown bread (74.29) with regard to their saturated fat. You would not expect this moderate (roughly 14%) difference to be statistically significant.
However, there is approximately a 44.7 point difference between the mean rank of the White bread (65.12) and the mean rank of the Seeded bread (109.84). You would expect this large (68.6%) difference to be statistically significant. Equally, there is approximately a 35.5 point difference between the mean rank of the Brown bread (74.29) and the mean rank of the Seeded bread (109.84). You would expect this large (47.8%) difference to be statistically significant.
In the Test Statistics table there are the key test metrics -- the Kruskal-Wallis H score, the degrees of freedom (df), and the p-value (Asymp. Sig.). In this example, we see (as we estimated earlier from the mean ranks between the bread types) the difference between the three groups is statistically significant, as the p-value (0.000) is below the critical 0.05 alpha level. In your report write-up you should also include the Kruskal-Wallis H score as further support that the difference is statistically significant.
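For readers who also work outside SPSS, the same test can be reproduced in a few lines. Below is a minimal sketch using Python's SciPy library; the saturated-fat values are invented for illustration and are not the bread dataset used in this guide.

```python
# Kruskal-Wallis H test with SciPy; the three samples are hypothetical
# saturated-fat readings, not the guide's real bread data.
from scipy import stats

white  = [0.5, 0.7, 0.6, 0.9, 4.1, 0.8, 0.5, 6.2]
brown  = [0.9, 1.1, 0.8, 1.3, 2.1, 1.0]
seeded = [1.8, 2.2, 2.5, 1.9, 2.8, 2.4, 2.1]

# H statistic and p-value; df is the number of groups minus one
h, p = stats.kruskal(white, brown, seeded)
print(f"H = {h:.3f}, df = 2, p = {p:.4f}")
```

As in SPSS, the groups may have unequal sample sizes; kruskal accepts any number of samples of any length.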
The Kruskal-Wallis test converts the raw data values for the dependent variable into a rank -- 1st, 2nd, 3rd, 4th, and so forth. Then it adds the converted ranks for all the participants in their respective group to obtain that group's "sum of ranks". If you divide the sum of ranks by the number of participants, you get the mean rank (or what a typical participant's rank is). Remember, in statistics we tend to determine 1) what is a typical member in my sample and 2) what is the variance around that typical member.
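The rank conversion described above can be sketched directly. This example uses SciPy's rankdata on a tiny invented pooled sample; the group labels are hypothetical.

```python
# Pool all values, rank them across groups, then compute each group's
# sum of ranks and mean rank -- exactly the quantities in the Ranks table.
import numpy as np
from scipy.stats import rankdata

values = np.array([0.5, 0.9, 1.8, 0.7, 1.1, 2.2])   # pooled data (invented)
groups = np.array(["W", "B", "S", "W", "B", "S"])    # group of each value

ranks = rankdata(values)   # 1st, 2nd, 3rd ... across the pooled sample
for g in ["W", "B", "S"]:
    grp = ranks[groups == g]
    print(g, "sum of ranks:", grp.sum(), "mean rank:", grp.mean())
```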
The Kruskal-Wallis test is much simpler to understand and appreciate, as it is not concerned with normal distribution of the dependent variable, and it is not concerned with homogeneity of variance between the three groups.
Sadly, what it does not report is exactly between which groups the statistical difference exists. In our example, is the statistical difference between the White and Brown breads, or between the White and Seeded breads, or between the Brown and Seeded breads? To find exactly where the statistical difference exists between our three bread groups, you would need to run three separate Mann-Whitney U tests, one for each of the pairwise comparisons listed above.
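Those three pairwise follow-up tests can be sketched with SciPy's mannwhitneyu; the data below are invented. Note that when running three tests, many analysts also lower the alpha level (e.g., a Bonferroni adjustment of 0.05 / 3 ≈ 0.0167) to guard against false positives from multiple comparisons.

```python
# Pairwise Mann-Whitney U follow-up tests on hypothetical bread data.
from itertools import combinations
from scipy.stats import mannwhitneyu

breads = {
    "White":  [0.5, 0.7, 0.6, 0.9, 4.1, 0.8],
    "Brown":  [0.9, 1.1, 0.8, 1.3, 2.1, 1.0],
    "Seeded": [1.8, 2.2, 2.5, 1.9, 2.8, 2.4],
}

# One U test per pair of groups: White/Brown, White/Seeded, Brown/Seeded
for (name_a, a), (name_b, b) in combinations(breads.items(), 2):
    u, p = mannwhitneyu(a, b, alternative="two-sided")
    print(f"{name_a} vs {name_b}: U = {u:.1f}, p = {p:.4f}")
```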
Happiness... you should now understand how to perform the Kruskal-Wallis test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
The Paired Samples T-test (aka: Repeated Measures T-test) compares the means of two measurements taken from the same participant or sample object. It is commonly used for a measurement at two different times (e.g., pre-test and post-test score with an intervention administered between the two scores), or a measurement taken under two different conditions (e.g., a test under a control condition and an experimental condition).
The Paired Samples T-test determines if there is evidence that the mean difference between the paired measurements is significantly different from a zero difference.
(1) The dependent variable (test variable) is continuous (interval or ratio).
(2) The independent variable consists of two related groups. Related groups means the participants (or sample objects) for both measurements of the dependent variable are the same participants.
(3) The participants (or sample objects) are taken at random from the population.
(4) The dependent variables (test variables) have a reasonably normal distribution. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).
(Note) When testing the assumptions related to normal distribution and outliers, you must create and use a new variable that represents the difference between the two paired measurements. Do not test the original two paired measurements themselves.
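The note above (test the difference, not the two raw measurements) can be sketched as follows, using an invented pre/post sample and SciPy's Shapiro-Wilk test for normality.

```python
# Build the difference variable and test IT for normality -- not the
# original paired measurements. Scores are hypothetical.
import numpy as np
from scipy.stats import shapiro

pre  = np.array([12, 15, 11, 14, 13, 16, 12, 15, 14, 13])
post = np.array([14, 18, 12, 18, 15, 19, 14, 20, 15, 16])

diff = post - pre          # the new variable to check, per the note above
w, p = shapiro(diff)       # Shapiro-Wilk W statistic and p-value
print(f"W = {w:.3f}, p = {p:.3f}")  # p > 0.05 suggests no evidence against normality
```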
(Q1) You want to examine the alertness of both male and female students at 09:00 am lectures and at 1:00 pm (after lunch) lectures. Do you use the Independent Samples T-test or the Paired Samples T-test?
(Answer: Independent Samples T-test). Although the experiment design sounds like a before and after intervention, it would be highly unlikely that at the two different times (09:00 am and 1:00 pm) the students in the lectures would be the same students.
(Q2) You have surveyed students on what they eat (over one week) for breakfast and lunch. A diet-plan app has calculated the energy level of the food eaten for each participant. You used SPSS to create a new variable which is the difference between the breakfast meal and the lunch meal, and you created a histogram to check for normal distribution and outliers. From the histogram below, would you use the Independent Samples T-test or the Paired Samples T-test?
(Answer: Paired Samples T-test). Here in this experiment design there are the same students surveyed for their breakfast meal and for their lunch meal. Equally, the histogram shows a very reasonable normal distribution (no extreme skewness on the left or right tails) and with no significant outliers... happiness!
To start the analysis, click Analyze > Compare Means > Paired Samples T Test
This will bring up the Paired-Samples T Test dialogue box. To carry out the test, move the two dependent (scale) variables into the Paired Variables: placard. And then click the OK button at the bottom of the dialogue box.
The results will appear in the SPSS Output Viewer. In the Paired Samples Statistics table there are the key group metrics -- sample size (N), mean, and standard deviation. From these measurements you should develop an intuitive perspective as to whether the test will indicate a statistically significant difference or not. Here in this example, there is approximately a 505 calorie difference (on average) between the lunch meal (880 calories) and the dinner meal (1385 calories), which is a 57.4% increase in calories from lunch to dinner. You would expect this sizeable difference to be statistically significant.
In the Paired Samples Test table there are the key test metrics -- 95% confidence intervals, the t-score, the degrees of freedom (df), and the p-value. In this example, we can see (as we estimated earlier from the two means) the t-score (34.242) is extremely large, and we were expecting this as there was a 505 calorie difference between the two meals. Equally, the p-value (0.000) is well below the critical 0.05 alpha level indicating the difference in calories between the lunch meal and dinner meal is statistically significant, which also we were expecting as the 505 calorie difference is a 57.4% magnitude of change.
Finally, the 95% C.I. of the difference provides a high / low range of accuracy as to where this difference (505 calories) between the two meals might actually exist in the population. Here the calorie difference could actually be as high as 534 calories or as low as 474 calories. This is only a 60 calorie range from high to low providing strong confidence that the mean difference in our sample accurately represents what is likely to be the mean difference in the population.
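For comparison, here is a minimal sketch of the same style of analysis in Python with SciPy; the lunch/dinner calorie values are invented, not the guide's dataset. The 95% CI of the mean difference is computed by hand from the t distribution.

```python
# Paired t-test plus a hand-computed 95% CI of the mean difference.
# Meal calories are hypothetical.
import numpy as np
from scipy import stats

lunch  = np.array([850, 900, 870, 910, 880, 860, 895, 875])
dinner = np.array([1350, 1420, 1380, 1400, 1390, 1360, 1410, 1370])

t, p = stats.ttest_rel(dinner, lunch)

diff = dinner - lunch
n = len(diff)
se = diff.std(ddof=1) / np.sqrt(n)              # standard error of the mean difference
margin = stats.t.ppf(0.975, df=n - 1) * se      # t critical value * SE
lo, hi = diff.mean() - margin, diff.mean() + margin
print(f"t = {t:.3f}, p = {p:.4f}, 95% CI of difference: [{lo:.1f}, {hi:.1f}]")
```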
Happiness... you should now understand how to perform the Paired Samples T-test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
The Wilcoxon Sign test (aka: Wilcoxon Signed-Rank) compares two measurements taken from the same participant or sample object. It is commonly used for a measurement at two different times (e.g., pre-test and post-test score with an intervention administered between the two scores), or a measurement taken under two different conditions (e.g., a test under a control condition and an experimental condition).
The Wilcoxon Sign test determines if there is evidence that the median difference between the paired measurements is significantly different from a zero difference.
(1) The dependent variable (test variable) is continuous (interval or ratio) or it can be categorical (ordinal).
(2) The independent variable consists of two related groups. Related groups means the participants (or sample objects) for both measurements of the dependent variable are the same participants.
(3) The participants (or sample objects) are taken at random from the population.
(4) The dependent variables (test variables) do not need to have a normal distribution. This test does not require normality or homoscedasticity (the data having the same scatter or spread) within the dependent variables. Non-normal distribution means the data can be skewed to the left or right tails, and the data can have a significant number of outliers.
(Q1) You want to examine caffeine markers in a group of students. One week the students will receive a normal cup of coffee (control group), and the next week the same students will receive a cup of coffee with an additive (experiment group). The research is set up as a double blind, so that neither the students nor the researchers know which cup of coffee is normal or has the additive. Could you use the Wilcoxon Sign test to analyse the data?
(Answer: Yes) The experiment design is set-up as two related (dependent) groups tested twice, once as the control group and once as the experiment group.
(Q2) You have collected the data for the two coffee groups (control and experiment). You used SPSS to create a boxplot to visualise the data side-by-side. From the boxplot, as an intuitive perspective, would the Wilcoxon Sign test indicate a statistically significant difference?
(Answer: Yes) Here in this boxplot there is very little overlap between the two interquartile ranges (IQR). Remember, the IQRs represent 50% of the data values. Therefore, for almost 50% of the experiment group (or more if we include the whiskers), the caffeine markers are different from when the same person was in the control group.
To start the analysis, click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples
This will bring up the Two-Related-Samples Tests dialogue box. To carry out the test, move the two dependent (scale or ordinal) variables into the Test Pairs: placard. And then click the OK button at the bottom of the dialogue box.
The results will appear in the SPSS Output Viewer. In the Ranks table there are the key group metrics -- sample size (N), mean rank, and sum of ranks. From these measurements you should develop an intuitive perspective as to whether the test will indicate a statistically significant difference or not. Here in this example, there are 0 negative ranks to 50 positive ranks -- and take note there are only 50 students in the sample. So, out of 50 students, all of them had a positive rank; not a single student had a negative rank. If you threw 50 darts at a dart board (a random action), would they all land on the top half and not a single one land on the bottom half? Never! Something is happening here that violates the laws of random probability. If there is no bias, trickery, or tom-foolery, if everything is equal with the students and the coffee, then you would expect roughly 25 negative ranks and 25 positive ranks. As the result is extremely skewed, with everyone in a positive rank, we would expect the test to indicate the difference is statistically significant.
As the footnotes under the Ranks table indicate, a negative rank is where the experiment group's caffeine marker (BPM) is lower than the same person's caffeine marker when they were in the control group. In other words, their second test measurement (experiment) was lower than their first test measurement (control). And, of course, a positive rank is just the opposite. As mentioned earlier, under random probability we would expect roughly a 25 to 25 ratio, that is, half the students to have a lower second score and half the students to have a higher second score. The further we move away from this equal and random ratio, the more likely the result will be statistically significant.
In the Test Statistics table there are the key test metrics -- the test score (Z) and the p-value (Asymp. Sig). In this example, we can see (as we estimated earlier from the negative and positive ranks) the z-score (6.169) is extremely large, and we were expecting this as there was a 0 to 50 negative to positive ratio in the ranks. Equally, the p-value (0.000) is well below the critical 0.05 alpha level indicating the difference in the caffeine markers for the control to the experiment is statistically significant, which also we were expecting as the 0 to 50 ratio in ranks is a 100% magnitude of change.
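The same test can be sketched with SciPy's wilcoxon. The caffeine values below are invented, with all 8 participants higher under the experiment condition, mirroring the 0-to-50 pattern above on a much smaller scale.

```python
# Wilcoxon signed-rank test on invented paired caffeine markers.
import numpy as np
from scipy.stats import wilcoxon

control    = np.array([62, 65, 61, 64, 63, 66, 60, 65])
experiment = np.array([70, 74, 69, 72, 71, 75, 68, 73])

# Count positive/negative ranks the way SPSS's Ranks table does
n_positive = int(np.sum(experiment > control))
n_negative = int(np.sum(experiment < control))

stat, p = wilcoxon(control, experiment)   # W = smaller of the two rank sums
print(f"positive ranks: {n_positive}, negative ranks: {n_negative}")
print(f"W = {stat:.1f}, p = {p:.4f}")
```

With every difference in the same direction, the smaller rank sum is 0 and the p-value falls below 0.05 even at this tiny sample size.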
Happiness... you should now understand how to perform the Wilcoxon Sign Test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
In this guide, we will look at how to conduct the Chi-Square Test for Independence (aka: Chi-Square Test of Association or Pearson's Chi-Square Test), and how to interpret the results of the test. The Chi-Square Test for Independence determines whether there is a relationship (association) between categorical variables. Equally, this test only determines an association between categorical variables, and will not provide any indications about causation.
(1) Only categorical variables can be analysed.
(2) Each categorical variable (nominal or ordinal) should have two or more independent groups in which the samples (participants) have no relationship with the other participants in their group or with the participants in the other groups in the variable.
(3) The samples (participants) for each variable are taken at random from the population.
(4) The categorical variables are not paired samples (pre-test/post-test observations).
(5) There should be relatively large sample sizes for each group in all the variables (e.g. the expected frequencies should be at least 5 for the majority (80%) of the groups for all the variables).
(Q1) You want to test for an association between gender and whether a respondent indicates that better lighting will improve safety. Can you use the two variables listed below?
(Answer: Yes). Both variables are categorical in their type. Equally there are adequate sample sizes for all the groups.
(Q2) Does the clustered bar chart for the two test variables indicate there is likely to be a statistically significant association?
(Answer: Yes). We can see the males are more associated with the No response, while the females are more associated with the Yes response.
To start the analysis, click Analyze > Descriptive Statistics > Crosstabs
This will bring up the Crosstabs dialogue box. To perform the analysis, move one categorical variable into the Row(s) placard and the other categorical variable into the Column(s) placard. Next, click on the Statistics option button.
In the Crosstabs: Statistics box tick the Chi-Square option, and then click the Continue button to return to the main dialogue box. After returning to the main Crosstabs dialogue box, click the Cells option button.
In the Crosstabs: Cell Display box tick the Observed and Expected options, and then click the Continue button to return to the main dialogue box.
Next, at the bottom left corner of the main dialogue box, tick the Display clustered bar charts option. Finally, click the OK button at the bottom of the main dialogue box.
The results will appear in the SPSS Output Viewer. The Crosstabulation table provides the observed count and expected count for the groups in relation to each categorical variable. Similar to the clustered bar chart discussed earlier, these observed and expected counts should give you an intuitive perspective as to whether an association is likely (or not likely) to exist.
In our example, for the males we observed a 14 to 7 (No/Yes) split, and we should have achieved a 11 to 11 split; we are roughly 3 (No) and 4 (Yes) participants out of balance from our expected split. For the females we observed a 5 to 12 (No/Yes) split, and we should have achieved a 9 to 9 split; we are roughly 4 (No) and 3 (Yes) participants out of balance from our expected split.
The Chi-Square Tests table provides the key test metrics -- the Pearson Chi-Square statistic and the p-value. Here in our example we have a reasonably strong Pearson test statistic (5.216) and a p-value (0.022) which is below the critical 0.05 alpha level, therefore indicating a statistically significant result.
In your write-up you should quote both these metrics as evidence that there is a statistically significant association between males who are more likely to answer No to the question if better lighting would improve safety, while females are more likely to answer Yes to this same question.
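The crosstab result can also be reproduced outside SPSS. The sketch below feeds the observed counts quoted in this example into SciPy's chi2_contingency; correction=False disables Yates' continuity correction so the statistic matches SPSS's Pearson Chi-Square value.

```python
# Chi-Square Test for Independence on the observed counts from the example
# (males 14 No / 7 Yes, females 5 No / 12 Yes).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[14, 7],    # male:   No, Yes
                     [5, 12]])   # female: No, Yes

# correction=False gives the plain Pearson chi-square, as SPSS reports it
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
print("expected counts:\n", expected.round(1))
```

The expected counts come out at 10.5/10.5 for males and 8.5/8.5 for females, which is what the guide rounds to "11 to 11" and "9 to 9".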
Happiness... you should now understand how to perform the Chi-Square Test for Independence in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
The Pearson Correlation test (aka: Pearson Product-Moment) measures the strength and direction, which is the r coefficient in the test, of a linear relationship between two continuous variables. Pearson's correlation attempts to draw a line of best fit through the data of the two variables, and the r coefficient indicates how far away the data points are from this line of best fit (e.g., if the data values are all compacted on and squeezed around the line, the r coefficient is high; conversely, if the data values are spread out and dispersed away from the line, the r coefficient is low).
(1) The two test variables are continuous (interval or ratio).
(2) There is a linear relationship between the two test variables.
(3) The participants (samples) have no relationship with the other participants, and are taken at random from the population.
(4) The two test variables have a reasonably normal distribution. Normal distribution means the data should not be heavily skewed to the left or right tails, and there should not be significant outliers (better to have no outliers).
(5) The two test variables have equal variance (homogeneity) when compared to each other. Homogeneity means you want the variance in the data (as plotted between two variables) to be reasonably the same along the entire line of best fit.
(Q1) Do the two test variables have a reasonably normal distribution?
(Answer: Yes). The data for both variables are certainly not heavily skewed to the left or right tails, and the data values (blue bins) pretty much gather in and around the centre of the bell curve. That said, the distribution for Protein is starting to spread toward the two tails which may be questionable, and a Kolmogorov-Smirnov or Shapiro-Wilk test would be advisable to run to confirm any suspicions.
(Q2) Do the two test variables have homogeneity between each other?
(Answer: Yes). The variance (between the two red dashed lines) from the plotted data values between the two variables (X-axis is Protein and Y-axis is Energy) is reasonably the same along the entire line of best fit. Equally, we can see from this scatter chart that the relationship is linear as the plotted data values are progressing in one direction at relatively the same rate (magnitude) of movement.
To start the analysis, click Analyze > Correlate > Bivariate
This will bring up the Bivariate Correlations dialogue box. To carry out the test, move the two test (scale) variables into the Variables: placard. Next, in the Correlation Coefficients section be sure the Pearson option is ticked, and untick the Flag significant correlations option. Finally, click the OK button at the bottom of the dialogue box.
The results will appear in the SPSS Output Viewer. In the Correlations table there are the key test metrics -- sample size (N), Pearson's Coefficient (which is the r score) and the p-value (which is Sig.(2-tailed) in SPSS). Here in this example, the r score (.373) is low-medium in its strength, and it is positive in its direction. The p-value (.000), which is below the critical 0.05 alpha level, indicates that this low-medium correlation is statistically significant.
I have included a General Interpretation table for the correlation coefficient metric by Prof. Halsey. Remember, the r score can vary from -1.00 through .00 to +1.00. The further the r score moves away from .00 (zero), the stronger the correlation, which is true for both positive and negative scores.
These measurements indicate that as the protein levels in the 149 breads tested increases so also energy (kcal) increases as the r score is a positive number. However, the strength (or magnitude) of this correlation is low-medium (r = .373). Finally, this mild correlation is statistically significant (p <.001) which implies there is good evidence from the sample data that this correlation between protein levels in breads and energy (kcal) is very likely to exist for white, brown, and seeded breads in general.
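For reference, Pearson's r can be computed outside SPSS in a couple of lines. The protein/energy values below are invented for illustration, not the 149-bread dataset from the guide.

```python
# Pearson's r and its p-value on hypothetical protein/energy data.
import numpy as np
from scipy.stats import pearsonr

protein = np.array([7.9, 8.4, 9.1, 10.2, 11.0, 8.8, 9.6, 10.5])
energy  = np.array([240, 255, 250, 270, 285, 245, 265, 275])

r, p = pearsonr(protein, energy)   # r coefficient and Sig.(2-tailed)
print(f"r = {r:.3f}, p = {p:.4f}")
```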
Oops... what's missing? (Answer: the 95% C.I.) To produce the 95% confidence interval, re-run the test; and in the Bivariate Correlations dialogue box, there is the Bootstrap... button. If you open this button, you can activate this metric.
Happiness... you should now understand how to perform the Pearson's Correlation test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
The Spearman's Correlation test (aka: Spearman Rank) measures the strength and direction, which is the rho coefficient (r_{s}) in the test, of a monotonic relationship between two continuous or ordinal variables.
The Spearman's correlation is the nonparametric version of the Pearson's correlation, that is, the Spearman's correlation should be used when the parametric assumptions (normal distribution and homogeneity of variance) for the Pearson's correlation are violated.
(1) The two test variables are continuous (interval or ratio) or they can be categorical (ordinal).
(2) There is a monotonic relationship between the two test variables.
(3) The participants (samples) have no relationship with the other participants, and are taken at random from the population.
(4) The two test variables (one or both) can be non-normal in their distribution. Non-normal distribution means the data can be heavily skewed to the left or right tails, and/or it may have significant outliers.
(5) The two test variables (for one or both) can have unequal variance (heterogeneity) when compared to each other. Heterogeneity means the variance in the data (as plotted between the two variables) will not be the same along the entire line of best fit.
(Q1) Do the two test variables have a reasonably normal distribution?
(Answer: No). The data for the Energy (kcal) variable certainly has a normal distribution, with the data values (the blue bins) centrally gathered in and around the top of the bell curve. However, the data for the Fats variable is heavily skewed and has some extreme outliers beyond 6.0 grams. As the Fats variable does not meet the assumption of normal distribution, the Spearman's Correlation test should be used.
(Q2) Is the relationship between the two test variables linear or monotonic?
(Answer: monotonic). The movement (rate of change) of the plotted data values is always progressing in a positive direction. However, there is a steep rate of change from 0.0 to 2.0 grams, and then from 2.0 to 8.0 grams the rate of change becomes relatively flat. As this relationship is monotonic rather than linear, a Spearman's Correlation test should be used.
To start the analysis, click Analyze > Correlate > Bivariate
This will bring up the Bivariate Correlations dialogue box. To carry out the test, move the two test (scale) variables into the Variables: placard. Next, in the Correlation Coefficients section be sure the Spearman option is ticked, and untick the Flag significant correlations option. Finally, click the OK button at the bottom of the dialogue box.
The results will appear in the SPSS Output Viewer. In the Correlations table there are the key test metrics -- sample size (N), Spearman's rho (which is the r_{s} score) and the p-value (which is Sig.(2-tailed) in SPSS). Here in this example, the r_{s} score (.543) is high-moderate in its strength, and it is positive in its direction. The p-value (.000), which is below the critical 0.05 alpha level, indicates that this high-moderate correlation is statistically significant.
I have included a General Interpretation table for the correlation coefficient metric by Prof. Halsey. Remember, the r_{s} score can vary from -1.00 through .00 to +1.00. The further the r_{s} score moves away from .00 (zero), the stronger the correlation, which is true for both positive and negative scores.
These measurements indicate that as the fat levels in the 149 breads tested increases so also energy (kcal) increases as the r_{s} score is a positive number. Equally, the strength (or magnitude) of this correlation is high-moderate (r_{s} = .543). Finally, this high-moderate correlation is statistically significant (p <.001) which implies there is good evidence that this correlation between fat and energy is very likely to exist for white, brown, and seeded breads in general.
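A minimal sketch outside SPSS: Spearman's rho is simply Pearson's r computed on the ranks of the data, which the example below demonstrates with invented fat/energy values.

```python
# Spearman's rho equals Pearson's r on the ranked data.
# Fat/energy values are hypothetical.
import numpy as np
from scipy.stats import spearmanr, pearsonr, rankdata

fat    = np.array([0.8, 1.2, 0.5, 2.1, 3.4, 0.9, 5.8, 1.6])
energy = np.array([235, 250, 230, 265, 280, 252, 290, 255])

rho, p = spearmanr(fat, energy)
r_on_ranks, _ = pearsonr(rankdata(fat), rankdata(energy))   # same value as rho
print(f"rho = {rho:.3f}, p = {p:.4f}, Pearson on ranks = {r_on_ranks:.3f}")
```

Because only the rank order matters, the extreme 5.8-gram value has no more pull on rho than any other top-ranked point, which is why the test tolerates skew and outliers.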
Oops... what's missing? (Answer: the 95% C.I.) To produce the 95% confidence interval, re-run the test; and in the Bivariate Correlations dialogue box, there is the Bootstrap... button. If you open this button, you can activate this metric.
Happiness... you should now understand how to perform the Spearman's Correlation test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
The Two-Way ANOVA compares 1) the mean differences in the test variable (dependent variable) between the groups of the first independent variable, which is the first "main effects" factor; and 2) the same mean differences between these same groups when they are sub-divided by a second independent variable, which is the second "main effects" factor. The primary purpose of a Two-Way ANOVA is to understand if there is an interaction between the two main effects factors (the two independent variables) on the test variable (dependent variable).
(1) The dependent variable (test variable) is continuous (interval or ratio).
(2) The two independent variables (factor variables) are categorical (nominal or ordinal) and there can be two, three, or more groups in each independent variable.
(3) The participants (samples) have no relationship between the other participants in their group or between the participants from the other groups.
(4) The participants (samples) for each group are taken at random from the population.
(5) The dependent variable (test variable) has a reasonably normal distribution for each group. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).
(6) The dependent variable (test variable) has equal variance (homogeneity) between all the groups. Homogeneity means you want the standard deviation measurements for the groups to be roughly the same to each other.
(Q1) Does the dependent variable (Calories) have a reasonably normal distribution across the three venue groups?
(Answer: Yes). All the p-values for the three venue groups in both the Kolmogorov-Smirnov and Shapiro-Wilk tests are above the critical .05 alpha level. Therefore, as these tests for normal distribution have not been violated, the data for each venue can be considered as having a normal distribution.
(Q2) Does the dependent variable (Calories) have equal variance between the three venue groups?
(Answer: Yes). The variance (whisker-to-whisker) for the three groups, although not exactly 100% equal, are roughly the same to each other... happiness.
To start the analysis, click Analyze > General Linear Model > Univariate
This will bring up the Univariate dialogue box. To carry out the test, move the dependent (scale) variable into the Dependent Variable: placard. Next move the two independent (nominal or ordinal) variables into the Fixed Factor(s): placard. There are eight extra option buttons which you can configure with helpful and important statistical metrics. However, you can just move the variables into the correct placards, and then click the OK button to obtain a quick and simple result.
We will walk through several of the option buttons to bolt-on a few helpful and important measurements. For a simple Two-Way ANOVA test where you have not included extra random factors or covariates, then the Model and Contrasts buttons can be left as is on the SPSS default settings. We will click the Plots button for some helpful chart tools.
There are only two factors, so place one on the Horizontal Axis: placard and the other on the Separate Lines: placard. [[ Tip 1: place the factor variable with the least number of groups as the lines; the Gender variable with male and female means there will be only two lines in the chart.]]
Next click the Add button, then select the type of chart you want (Line or Bar) and select the Include error bars option. You can have either Confidence Intervals or Standard Error. When finished, click the Continue button at the bottom of this box to return to the main Univariate dialogue box.
Next, open the Post Hoc button. You should only carry out Post Hoc testing when 1) the variables are main effect variables, 2) the main effect variables have three or more groups (or levels), and 3) it is important to know between which groups the statistically significant differences exist. In our example, the Venue variable is the only main effect variable that has three levels (Putney, East Sheen, and Tooting), and I moved it into the Post Hoc Tests for: placard. Select the type of Post Hoc test you want (the most common seem to be LSD, Bonferroni, and Tukey), and then click the Continue button at the bottom of this box to return to the main Univariate dialogue box.
In every Two-Way ANOVA there is an interaction variable created between the two main effects variables. In our example, it will be the interaction between Venue and Gender (Venue * Gender), which examines if there is a statistically significant difference between a male and a female at each respective venue. To examine this interaction variable, open the EM Means (Estimated Marginal Means) button.
In our example, I moved the interaction variable (Venue * Gender) into the Display Means for: placard. Next, tick the Compare simple main effects option, and then from the drop-down menu select the Confidence interval adjustment. In our example, I selected the LSD adjustment. Finally, click the Continue button at the bottom of this box to return to the main Univariate dialogue box.
The next area to open is the Save button. You would have tested the assumption of a reasonably normal distribution for the dependent variable in each group prior to running the test. However, if you have a complex ANOVA model, such as a 2 x 2 x 4 (which has 16 groups); or if there are only a few observations per group, which makes it difficult to check for normal distribution; or if you have a covariate in the model, then saving and testing the residuals is often the better way to confirm that the assumption of normal distribution is satisfied. In our example, I selected Residuals -- Unstandardized and Diagnostics -- Cook's Distance. Click the Continue button at the bottom of this box to return to the main Univariate dialogue box.
Finally, open the Options button. Here, in the Display section, select Descriptive statistics and Homogeneity tests. But there are other statistical metrics that could be important to your project, such as Estimates of effect size and Observed power. Click the Continue button at the bottom of this box to return to the main Univariate dialogue box.
Remember, you will not need all of these extra options to run an initial and straightforward Two-Way ANOVA test. But we have walked through most of them to give you the confidence to find and add what you may need for your project. When finished, click the OK button to run the test.
The results will appear in the SPSS Output Viewer. In the Between-Subjects Factors table there is the sample size (N) for each group. And in the Tests of Between-Subjects Effects table there are the test results for the two main effect factors (Venue and Gender) and for the interaction factor (Venue*Gender).
Here in this example, the differences in calories for the meals eaten across the three venues are statistically significant (F = 11.532; p < .001). But the differences in calories for the meals eaten across the two genders are not statistically significant (F = .012; p = .915). Equally, the differences in the calories for the meals eaten across the three venues and whether it was a man or woman eating the meal (interaction factor) are not statistically significant (F = .641; p = .533).
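These F statistics come from partitioning the total variation in calories into the two main effects, their interaction, and the within-group (error) variation. For readers who like to see the arithmetic, here is a minimal Python sketch of that partition for a balanced two-factor design. The data values are made up for illustration only; they are not the Putney/East Sheen/Tooting sample:

```python
from itertools import product

# Hypothetical balanced data: (venue, gender) -> n observations per cell
data = {
    ("V1", "M"): [3400, 3600], ("V1", "F"): [4200, 4400],
    ("V2", "M"): [5700, 5900], ("V2", "F"): [5100, 5300],
    ("V3", "M"): [7000, 7100], ("V3", "F"): [6900, 7100],
}
venues = ["V1", "V2", "V3"]
genders = ["M", "F"]
n = 2                                       # observations per cell
all_vals = [x for cell in data.values() for x in cell]
grand = sum(all_vals) / len(all_vals)

# Marginal and cell means
mean_v = {v: sum(sum(data[(v, g)]) for g in genders) / (n * len(genders)) for v in venues}
mean_g = {g: sum(sum(data[(v, g)]) for v in venues) / (n * len(venues)) for g in genders}
mean_cell = {k: sum(vals) / n for k, vals in data.items()}

# Sums of squares for each source of variation
ss_a = n * len(genders) * sum((mean_v[v] - grand) ** 2 for v in venues)
ss_b = n * len(venues) * sum((mean_g[g] - grand) ** 2 for g in genders)
ss_ab = n * sum((mean_cell[(v, g)] - mean_v[v] - mean_g[g] + grand) ** 2
                for v, g in product(venues, genders))
ss_within = sum((x - mean_cell[k]) ** 2 for k, vals in data.items() for x in vals)

df_a, df_b = len(venues) - 1, len(genders) - 1
df_ab = df_a * df_b
df_within = len(venues) * len(genders) * (n - 1)

# Each F is a mean square divided by the within-group (error) mean square
f_a = (ss_a / df_a) / (ss_within / df_within)       # main effect A (venue)
f_b = (ss_b / df_b) / (ss_within / df_within)       # main effect B (gender)
f_ab = (ss_ab / df_ab) / (ss_within / df_within)    # interaction A*B
```

For a balanced design the four sums of squares add up exactly to the total sum of squares, which is a useful sanity check on the partition.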
This is the standard result for the Two-Way ANOVA test if you simply entered the variables in the main Univariate dialogue box and did not select any of the extra options. As you can see, it gives the basic result for the variables entered, but it provides no further (and often important) details.
Looking at the extra options that were selected, in the Descriptives table there are the key metrics for each group -- sample size (N), mean, and standard deviation. From these measurements you can understand in greater detail why the test indicated (or did not indicate) a statistically significant result for the variables tested.
Here in this example, the differences in calories between the three venues move from 3972 (Putney) to 5465 (East Sheen) to 7012 (Tooting). Depending on how you pair up the three venues, the pair-wise differences range from 1493 calories (minimum) to 3040 calories (maximum), which gives numeric evidence as to why the differences between the venues were statistically significant. Conversely, the difference between the males (5761) and females (5541) was only 220 calories, which gives similar numeric evidence as to why the difference between the genders was not statistically significant.
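The pair-wise differences quoted above are simple arithmetic on the Descriptives table means. A short Python sketch using the means quoted in the text:

```python
from itertools import combinations

# Group means quoted from the Descriptives table
venue_means = {"Putney": 3972, "East Sheen": 5465, "Tooting": 7012}

# Absolute difference for every pair-wise comparison of venues
diffs = {(a, b): abs(venue_means[a] - venue_means[b])
         for a, b in combinations(venue_means, 2)}
smallest, largest = min(diffs.values()), max(diffs.values())

# The gender comparison is a single pair
gender_means = {"Male": 5761, "Female": 5541}
gender_diff = gender_means["Male"] - gender_means["Female"]   # 220 calories
```

Large pair-wise differences (here up to 3040 calories) point toward statistically significant effects, while the small 220-calorie gender gap points toward a non-significant one.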
Finally, for the interaction between the venue and gender, at Putney the difference between male (3429) and female (4282) was 853 calories, at East Sheen the difference between male (5846) and female (5179) was 667 calories, and at Tooting the difference between male (7020) and female (7006) was 14 calories. Here we can see that there was virtually no difference between a man eating at Tooting as compared to a woman eating at Tooting. Equally, you might suspect the difference between a man eating at Putney as compared to a woman eating at Putney would be statistically significant, as there is a 24.8% difference in calories. However, the sample size is very small with only 4 males as compared to 7 females -- SPSS is counter-balancing (mathematically) a robust difference against extremely small sample sizes with reasonably large variances (standard deviations).
In the Levene's Test of Equality of Error Variances table there are the key test metrics for homogeneity (equality) of variance. If the data were normally distributed and there were no significant outliers, then you would expect all four measurements (p-values) to agree. And this is true in our example, with all the p-values virtually the same, between 0.287 and 0.308. The measurement you would refer to (and quote) in your write-up is the top row, titled Based on Mean. Here in our example the p-value is 0.289 (well above the critical 0.05 alpha level), which indicates that the variances (standard deviations) in the dependent variable across the three groups do not violate homogeneity of variance.
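The Based on Mean row of Levene's test is, in essence, a one-way ANOVA carried out on the absolute deviations of each observation from its own group mean. A minimal Python sketch of that W statistic, run on made-up calorie-style samples (not the study data):

```python
from statistics import mean

def levene_w(groups):
    """Levene's W statistic, centred on the group means (the Based on Mean row)."""
    # Absolute deviation of each value from its own group mean
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    zbar_i = [mean(zi) for zi in z]             # per-group mean deviation
    zbar = mean(x for zi in z for x in zi)      # overall mean deviation
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    between = sum(len(g) * (zb - zbar) ** 2 for g, zb in zip(groups, zbar_i))
    within = sum((x - zb) ** 2 for zi, zb in zip(z, zbar_i) for x in zi)
    return ((n_total - k) / (k - 1)) * between / within

# Hypothetical samples for three venues; W near 0 means very similar spreads
w = levene_w([[3400, 4000, 4500], [5100, 5400, 6000], [6800, 7000, 7300]])
```

When every group has exactly the same spread, the between-group part of W collapses to zero, which is why a small W (and a large p-value) indicates homogeneity of variance.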
Normal distribution and homogeneity of variance are important test assumptions; and having met these two assumptions will give you strong confidence that the test results are reliable. Later, we will conduct a secondary analysis on the Residuals and Cook's Distance metrics (that we saved earlier) as further evidence that the model meets these two important assumptions.
The Post Hoc Tests table will indicate where (between which pair-wise comparisons of the three venues) the statistically significant differences have occurred. Here in our example it was only in the main effect Venue variable (and not in the main effect Gender variable) that the differences in calories for the meals eaten were statistically significant (F(2) = 11.532; p < .001). In this Post Hoc Tests table we are looking for any large mean difference with a p-value below the critical alpha level 0.05. Here in our example all three pair-wise comparisons (Putney to East Sheen, East Sheen to Tooting, and Tooting to Putney) have differences in the calories of the meals eaten that are statistically significant. In your write-up you would list the pairs and quote the mean differences and p-values; you could also include the 95% confidence interval around each difference as evidence for where the mean difference in the population is likely to lie.
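Corrections such as Bonferroni guard against inflated Type I error when several pair-wise comparisons are run at once. One common framing of the rule (SPSS equivalently adjusts the reported p-values rather than the alpha level) is sketched below; the 0.05 alpha and the three venues come from the text, the rule itself is the standard Bonferroni one:

```python
from itertools import combinations

venues = ["Putney", "East Sheen", "Tooting"]
pairs = list(combinations(venues, 2))       # every pair-wise comparison
alpha = 0.05
bonferroni_alpha = alpha / len(pairs)       # each pair is tested at a stricter level

def significant(p_value, threshold=bonferroni_alpha):
    """Flag a pair-wise p-value against the Bonferroni-adjusted threshold."""
    return p_value < threshold
```

With three comparisons the per-pair threshold tightens from 0.05 to roughly 0.0167, so a p-value of .03 that would pass an uncorrected test no longer counts as significant.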
The Post Hoc test examines only the main effect variables and not the interaction variable (Venue*Gender). Here in our example, we have seen all three pair-wise comparisons of the venues have differences in the calories that are statistically significant. But we do not know if there are statistically significant differences between the male and female genders at each of these venues. The result of the EM Means will examine this interaction of gender on each venue.
As mentioned earlier, here again we are looking for any large mean difference with a p-value below the critical alpha level 0.05. Here in our example, for the female gender there is a statistically significant difference at the Putney to Tooting (p = .002) and the East Sheen to Tooting (p = .029) venue pairings. But for the male gender there is only a statistically significant difference at the Putney to Tooting (p = .001) venue pairing.
The Profile Plots will give a snapshot of the test results of the two main effect factors (Venue and Gender) and the interaction factor (Venue*Gender). Here in our example you can visualise the robust jumps (the green dashes are the estimated average) in calories across the three venues -- 3972 (Putney) to 5465 (East Sheen) to 7012 (Tooting) which is where the statistically significant differences are occurring.
If you imagine the three venues collected into one plot, the three red dots (males) would merge near an average of 5761 calories and the three blue dots (females) would merge near an average of 5542 calories. This is a very small mean difference of 219 calories (only a 3.95% increase in calories from females to males), and hence we can see why the effect of the second main factor, gender, was not statistically significant.
You can also visualise the interaction between a man or woman at each individual venue, with Tooting showing virtually no difference between the genders and Putney showing the greatest difference between the genders. You should also notice that the red line (male) and the blue line (female) cross over each other between Putney and East Sheen. This 'crossing over' is very typical of an interaction effect, rather than the lines staying parallel to each other. However, despite this crossing over, there is still not enough evidence to indicate the interaction factor is statistically significant. This could be because of the small sample sizes, the actual mean difference between the male and female at each venue, and the high degree of overlap in the 95% confidence intervals.
Normal distribution and homogeneity of variance are important test assumptions that provide strong confidence that the test results are reliable; and you would normally confirm these assumptions are met prior to constructing the Two-Way ANOVA model. However, this may not always be possible with complex modelling tests that include random factors or covariates.
Therefore, it is equally important to carry out secondary analysis on two model-created variables -- Unstandardized Residuals and Cook's Distance -- that we saved earlier as further evidence that the model meets these important assumptions.
In the histogram chart of the Unstandardized Residuals, we can see there is a reasonably normal distribution of the data. The histogram is showing the data values are centrally gathered around the mean (not skewed to the left or right tails and no significant outliers).
In the scatter chart, which is a comparison of Cook's Distance to the model's dependent variable Calories, we can see that for every observation in Calories (N = 40) that the Cook's D values all have relatively the same distance up from the baseline (X-axis), and that there are no predominant 'up-spikes' in the data array.
That said, there are two values that are just sneaking over the allowable limit (values more than 3X higher than the mean of the data array), which are indicators of likely influential observations that should be investigated. In our example the mean of our Cook's Distance variable is 0.03, and therefore 3X this mean is 0.09. Any up-spike values in Cook's Distance could bring into question the full numerical accuracy (e.g., means, standard deviations, 95% confidence intervals, F score, p-values, etc.) of the model. In our example it is only 2 values out of 40 samples (5.0%), and they are both just over the allowable limit.
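The 3X-the-mean rule of thumb described above is easy to automate once the Cook's Distance values have been saved. A short Python sketch using made-up Cook's D values (not the study's 40 observations):

```python
from statistics import mean

def flag_influential(cooks_d, multiplier=3):
    """Return the indices of observations whose Cook's Distance exceeds
    multiplier x the mean D (the 3X-mean rule of thumb from the text)."""
    cutoff = multiplier * mean(cooks_d)
    return [i for i, d in enumerate(cooks_d) if d > cutoff]

# Hypothetical Cook's D values: mostly flat with two up-spikes
d_values = [0.01, 0.01, 0.01, 0.01, 0.10, 0.01, 0.01, 0.01, 0.01, 0.10]
suspects = flag_influential(d_values)
```

The flagged indices identify the rows of the data file you would go back and inspect for data-entry errors or genuinely unusual participants.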
As secondary analysis, these charts provide evidence of whether the Two-Way ANOVA model, with the dependent variable Calories as an outcome of (or predicted by) the two independent variables (Venue and Gender), meets (or indicates areas of concern regarding) the key assumptions of normal distribution, homogeneity of variance, and the degree of numerical accuracy from our sample.
Happiness... you should now understand how to perform the Two-Way ANOVA in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
A Repeated Measures ANOVA is used to compare the means of three (or more) variables where the participants (or cases) are the same for each variable. This can occur: (1) when participants are measured multiple times to see changes in response to an intervention; or (2) when participants are exposed to a different condition for each variable and we want to compare the response to each of these conditions; for example, we measured response time in a driving simulator while listening to 1-heavy metal, 2-jazz, and 3-classical.
As illustrated above the simplest Repeated Measures ANOVA involves three variables all measured on the same participants. Whatever distinguishes these three variables (time of measurement, an intervention, a different condition) is titled the "Within-Subjects Factor" in SPSS, which will determine if differences in the three means between the repeated measurements are statistically significant.
(1) The participants (samples) are the same participants in all the variables, and are taken at random from the population.
(2) The dependent variables (test variables) are continuous (interval or ratio).
(3) The independent variables (if you have these) are categorical (nominal or ordinal), and there can be two, three, four, or more groups in each independent variable.
(4) The dependent variables (test variables) have a reasonably normal distribution. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).
(5) The dependent variables (test variables) have equal variance (sphericity) when compared to each other in every possible multi-pairwise combination. This will normally occur when the standard deviations for each variable are roughly the same.
(Q1) Do the four dependent variables (Time.0900-2000) have a reasonably normal distribution?
(Answer: It seems, Yes). All the histograms display a reasonably normal distribution. Albeit the chart for Time 1300 looks slightly skewed (weighted) toward the lower numbers (2.5, 5.0, 7.5), we can still say it is within the limits of a normal distribution.
(Q2) Do the four dependent variables (Time.0900-2000) have equal variance (sphericity)?
(Answer: It seems, Yes). The variances (standard deviations) for all four time variables by smokers (e-vape, cigarette, and cigar) are roughly similar. Imagine in the above multi-bar chart that a balloon could take the exact shape of the upper and lower limits of all the standard deviations. All the differences in the standard deviations would distort the balloon away from 100% perfect sphericity (equal variance). The large deviations (as at Time 0900 and Time 1300 for the cigarette smokers) would expand bumps in the balloon; and the small deviations (as at Time 1600 and Time 2000 for e-vape smokers) would contract dimples in the balloon. The question is how many and how great must these areas of expansion and contraction be to violate sphericity, taking into account all the deviations in all the data groups? Happiness for us, there is a test within the Repeated Measures ANOVA that examines sphericity.
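Sphericity also has a precise numerical description: the variances of the difference scores between every pair of repeated measurements should be roughly equal. A minimal Python sketch of those pairwise difference-score variances, using made-up cortisol-style data (not the study sample):

```python
from itertools import combinations
from statistics import variance

# Each list holds one measurement time for the same five participants
# (hypothetical values for illustration only)
measurements = {
    "T0900": [12, 13, 11, 14, 12],
    "T1300": [8, 7, 9, 8, 7],
    "T1600": [8, 9, 8, 9, 8],
    "T2000": [10, 9, 10, 11, 10],
}

# Variance of the pairwise difference scores: sphericity holds when
# these variances are roughly equal across all pairs
diff_vars = {}
for a, b in combinations(measurements, 2):
    diffs = [x - y for x, y in zip(measurements[a], measurements[b])]
    diff_vars[(a, b)] = variance(diffs)
```

Mauchly's test, which SPSS runs for you, formalises the judgement of whether these difference-score variances are "equal enough".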
To start the analysis, click Analyze > General Linear Model > Repeated Measures
This will bring up the Repeated Measures Define Factor(s) dialogue box. First, give the factor a suitable title in the Within-Subject Factor Name: placard and enter a number in the Number of Levels: placard for the total variables used, and then click the Add button. In our example the variables are different hours and we repeated the measurement 4 times. Second, enter a suitable title in the Measure Name: placard, and then click the Add button. In our example the variables are measuring cortisol levels in saliva samples. Finally, click the Define button at the bottom of the dialogue box.
The Repeated Measures dialogue box will open. In the Within-Subject Variable: placard, you need to replace the ? marks with the variables you have measured. In our example, it is Time.0900, Time.1300, Time.1600, and Time.2000. You can select them all (in the correct order) and use the move-across arrow in the middle. We will walk through several of the option buttons to bolt on a few helpful and important measurements.
As we are running a simple Repeated Measures ANOVA, that is, without any between-subject factors or any covariates, we can therefore accept the default setting for the Model option. But click the Contrasts button as we need to change this option.
Note 1: There is the Two-Way (Factorial) ANOVA guide to review if you are including any between-subject factors, such as, between cigars, cigarettes, and e-vape (Smoker.Type).
In the Contrasts options box, in the Change Contrast section, select Repeated from the drop-down menu. Be sure to click the Change button after you make the selection. When finished, click the Continue button at the bottom of this box to return to the main Repeated Measures dialogue box.
Next click the Plots button, move the one factor you created into the Horizontal Axis: placard, and then select the type of chart you want (Line or Bar). Next, select the Include error bars option and choose either Confidence Intervals or Standard Error. When finished, click the Continue button at the bottom of this box to return to the main Repeated Measures dialogue box.
As we are running a simple Repeated Measures ANOVA, that is, without any between-subject factors or any covariates, we can therefore accept the default settings for the Post Hoc, EM Means, and Save options.
Note 2: There is the Two-Way (Factorial) ANOVA guide to review if you are including any between-subject factors with Post Hoc analysis.
Note 3: There is the ANCOVA (Analysis of Covariance) guide to review if you are including any covariates with EM Means analysis.
Finally, open the Options button. Here, in the Display section, select Descriptive statistics. There are other statistical metrics that could be important to your project, such as Estimates of effect size and Observed power. Click the Continue button at the bottom of this box to return to the main Repeated Measures dialogue box.
In the main Repeated Measures dialogue box, click the OK button to run the test.
The results will appear in the SPSS Output Viewer. In the Within-Subject Factors and Descriptive Statistics tables there are the key statistical measurements -- the dependent variables, the sample size (N), the mean, and the standard deviation. By reviewing these you should have some intuitive perception as to where the results might be statistically significant.
Here in our example there is a 4.4 point decrease (36.0%) in cortisol measured between 0900 and 1300; and as they are the same people, you would expect this large decrease to be statistically significant. Conversely, you might expect the gradual increases between 1300 to 1600 to 2000 not to be statistically significant.
Next, there is the Mauchly's Test of Sphericity table, which is the key measurement for the equality of variance between the dependent variables. In an ideal world, you want the p-value (Sig) to be above the critical alpha threshold (0.05), as this would indicate the variance in all the dependent variables does not violate sphericity (equality of variance). Here in our example this is the case, with the p-value reported at 0.068. Look back at the Descriptive Statistics table: the very close similarity in the four standard deviations also gives evidence that sphericity would most likely not be violated.
However, if the p-value is below the critical alpha threshold (0.05), and therefore sphericity is violated, then you will have to report one of the other results (Greenhouse-Geisser or Huynh-Feldt), which are based on the epsilon value for these corrections for sphericity (Girden (1992), Howell (2002), Field (2013)).
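The Greenhouse-Geisser and Huynh-Feldt rows apply their epsilon by scaling both degrees of freedom before the p-value is looked up. A quick sketch of the arithmetic, using the uncorrected df (3, 81) from this example and a purely illustrative epsilon of 0.75:

```python
def correct_df(df_effect, df_error, epsilon):
    """Scale both degrees of freedom by the sphericity-correction epsilon."""
    return df_effect * epsilon, df_error * epsilon

# Uncorrected df from the Sphericity Assumed row; epsilon here is
# illustrative only, not a value from the study output
gg_df = correct_df(3, 81, 0.75)
```

Smaller corrected degrees of freedom make the F test more conservative, which is the whole point of the correction when sphericity is violated.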
In the Tests of Within-Subjects Effects table are the key test results to report -- F-value (test statistic), p-value (Sig), and df (degrees of freedom). Here in our example (as mentioned earlier) the Mauchly's Test of Sphericity was not violated and therefore we report the Sphericity Assumed result. If needed, the other results (Greenhouse-Geisser and Huynh-Feldt) are listed here. When reporting the degrees of freedom (df), be sure to include both the actual value and the error value.
Here the Tests of Within-Subjects Effects table indicates that in the Time factor (there are four dependent variables in this factor -- Time.0900, Time.1300, Time.1600, and Time.2000), the differences in the means (across some or all) of the four repeated measurements are statistically significant (F(3, 81) = 22.428, p < .001). However, it is not indicating exactly where the statistically significant differences are occurring, that is, between which repeated measurements.
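The within-subjects F above is formed by first removing the stable differences between participants, so the error term reflects only how inconsistently participants respond across the conditions. A minimal Python sketch of that partition for a one-way repeated-measures design, on made-up data (not the cortisol sample):

```python
# rows = participants, columns = repeated measurements (hypothetical data)
scores = [
    [12, 8, 8, 10],
    [13, 7, 9, 9],
    [11, 9, 8, 11],
    [14, 8, 9, 10],
]
n = len(scores)        # participants
k = len(scores[0])     # repeated measurements

grand = sum(sum(row) for row in scores) / (n * k)
subj_means = [sum(row) / k for row in scores]
cond_means = [sum(row[j] for row in scores) / n for j in range(k)]

ss_total = sum((x - grand) ** 2 for row in scores for x in row)
ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)       # removed from error
ss_conditions = n * sum((m - grand) ** 2 for m in cond_means)     # effect of interest
ss_error = ss_total - ss_subjects - ss_conditions                 # residual error term

df_cond, df_error = k - 1, (n - 1) * (k - 1)
f_stat = (ss_conditions / df_cond) / (ss_error / df_error)
```

Because the participant-to-participant variation is stripped out of the error term, a repeated-measures design can detect a condition effect with far fewer participants than an independent-groups design.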
The following Tests of Within-Subjects Contrasts table pairs together the four repeated measurements to indicate where the statistically significant differences are occurring in the Time factor.
Here in our example the difference in the means that is statistically significant occurs at Level 1 vs Level 2 (Time.0900 vs Time.1300) and again at Level 3 vs Level 4 (Time.1600 vs Time.2000). Both these pairs of repeated measurements have large F scores and p-values below the critical alpha threshold (0.05).
Finally, the Profile Plots provide a snapshot of the test results for the means of cortisol across the four repeated measurements. The plot is good visual evidence of the test result. The chart clearly displays the statistically significant decrease (36.0%) between Time.0900 and Time.1300 (12.22 down to 7.82). Equally, there is a statistically significant increase (18.2%) between Time.1600 and Time.2000 (8.35 up to 9.87).
Happiness... you should now understand how to perform the Repeated Measures ANOVA test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
A Repeated Measures (Factorial) ANOVA is used to compare the means of three (or more) variables where the participants (or cases) are the same for each variable. This can occur: (1) when participants are measured multiple times to see changes in response to an intervention; or (2) when participants are exposed to a different condition for each variable and we want to compare the response to each of these conditions; for example, we measured response time in a driving simulator while listening to 1-heavy metal, 2-jazz, and 3-classical.
As illustrated above Repeated Measures (Factorial) ANOVA involves three (or more) variables all measured on the same participants. Whatever distinguishes these variables (time of measurement, an intervention, a different condition) is the "Within-Subjects Factor" in SPSS, which will determine if differences in the means between the repeated measurements are statistically significant.
In addition, for the Factorial part of the test, you must have an independent grouping variable with two, three, four, or more groups. Here in our example we will have a SmokerType variable with three groups (cigars, cigarettes, and e-vape).
(1) The participants (samples) are the same participants in all the variables, and are taken at random from the population.
(2) The dependent variables (test variables) are continuous (interval or ratio).
(3) The independent variables are categorical (nominal or ordinal), and there can be two, three, four, or more groups in each independent variable.
(4) The dependent variables (test variables) have a reasonably normal distribution. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).
(5) The dependent variables (test variables) have equal variance (sphericity) when compared to each other in every possible pairwise combination of the independent variable. This will normally occur when the standard deviations for each pairwise combination are roughly the same.
(Q1) Do the four dependent variables (Time.XX:XX) have a reasonably normal distribution?
(Answer: It seems, Yes). All the histograms display a reasonably normal distribution. Albeit the chart for Time 1300 looks slightly skewed (weighted) toward the lower numbers (2.5, 5.0, 7.5), we can still say it is within the limits of a normal distribution.
(Q2) Do the four dependent variables (Time.XX:XX) divided into the three groups have equal variance (sphericity)?
(Answer: It seems, Yes). The variances (standard deviations) for all four Time variables divided by smokers (cigar, cigarette, and e-vape) are roughly similar. Imagine in the above multi-bar chart that a balloon could take the exact shape of the upper and lower limits of all the standard deviations. All the differences in the standard deviations would distort the balloon away from 100% perfect sphericity (equal variance). The large deviations (as at Time 0900 and Time 1300 for the cigarette smokers) would expand bumps in the balloon; and the small deviations (as at Time 1600 and Time 2000 for e-vape smokers) would contract dimples in the balloon. The question is how great must these areas of expansion and contraction be to violate sphericity, taking into account all the deviations in all the data groups? Happiness for us, there is a test within the Repeated Measures (Factorial) ANOVA that examines sphericity.
To start the analysis, click Analyze > General Linear Model > Repeated Measures
This will bring up the Repeated Measures Define Factor(s) dialogue box. First, give the factor a suitable title in the Within-Subject Factor Name: placard and enter a number in the Number of Levels: placard for the total variables used, and then click the Add button. In our example, the variables are different hours and we repeated the measurement 4 times. Second, enter a suitable title in the Measure Name: placard, and then click the Add button. In our example, the variables are measuring cortisol levels in saliva samples. Finally, click the Define button at the bottom of the dialogue box.
The Repeated Measures dialogue box will open. In the Within-Subject Variable: placard, you need to replace the ? marks with the variables you have measured. In our example, it is Time.0900, Time.1300, Time.1600, and Time.2000. You can select them all (in the correct order) and use the move-across arrow in the middle. We will walk through several of the option buttons to bolt on a few helpful and important measurements.
Next, move the independent variable into the Between-Subject Factor(s): placard. As mentioned earlier, this is the Factorial part of the test. In our example, we are using the SmokerType variable, which has three groups (cigars, cigarettes, and e-vape).
We can accept the SPSS default setting for the Model option. Next, click the Contrasts button, as we need to change this option. In the Contrasts options box, in the Change Contrast section, select Repeated from the drop-down menu. Be sure to click the Change button after you make the selection. When finished, click the Continue button at the bottom of this box to return to the main Repeated Measures dialogue box.
Next, click the Plots button, move each main effect factor (one at a time) into the Horizontal Axis: placard, and click the Add button. Also create a chart for the interaction between the Time and SmokerType variables by moving Time on the Horizontal Axis: placard and SmokerType on the Separate Lines: placard, and click the Add button. Select the type of chart you want (Line or Bar). Then select the Include error bars option and choose either Confidence Intervals or Standard Error. When finished, click the Continue button at the bottom of this box to return to the main Repeated Measures dialogue box.
Next, open the Post Hoc button. If you remember from the Two-Way ANOVA test, you can only carry out Post Hoc testing when: 1) it is on a main effect variable that has three or more groups (levels); and 2) it is important to know between which groups the statistically significant differences exist. In our example, the SmokerType variable is the only main effect variable we have, and (happiness) it has three levels (cigars, cigarettes, and e-vape). I have moved it into the Post Hoc Tests for: placard. Then select the type of Post Hoc test you want (the most common seem to be LSD, Bonferroni, and Tukey), and then click the Continue button at the bottom of this box to return to the main Repeated Measures dialogue box.
In this model, SPSS will create an interaction variable (SmokerType * Time) to examine whether there are statistically significant differences between the times we took a measurement (0900, 1300, 1600, 2000) and what type of smoker was measured (cigar, cigarette, e-vape) at each time. To examine this interaction in detail, open the EM Means button.
In the EM Means dialogue box, first move the interaction variable (SmokerType * Time) into the Display Means for: placard. Next, tick the Compare simple main effects option, and then from the Confidence interval adjustment drop-down menu select the adjustment type. In our example, I selected the LSD adjustment. Finally, click the Continue button at the bottom of this box to return to the main Repeated Measures dialogue box.
The next area to open is the Save button. You would have tested for the assumption of a reasonably normal distribution for the repeated measurements of the dependent variable for each of the independent groups prior to running the test. However, if you have a complex Repeated Measures (Factorial) ANOVA model, such as, 2 x 2 x 3 (which has 12 groups); or if there are only a few observations per group, which makes it difficult to check for normal distribution; or if you have a covariate in the model, then saving and testing the residuals is often the better way to test that the assumption of normal distribution is satisfied. In the Save box, select the Residuals -- Unstandardized and Diagnostics -- Cook's Distance. Click the Continue button at the bottom of this box to return to the main Repeated Measures dialogue box.
Finally, open the Options button. Here, in the Display section, select Descriptive statistics. However, there are other statistical metrics that could be important to your project, such as Estimates of effect size and Observed power. You only need to tick the Homogeneity tests option if you have only two repeated measurements; as our example has four repeated measurements, we do not need to tick this metric. Click the Continue button at the bottom of this box to return to the main Repeated Measures dialogue box.
In the main Repeated Measures dialogue box, click the OK button to run the test.
The results will appear in the SPSS Output Viewer. In the Within-Subject Factors and Between-Subject Factors tables, there are the basic variable details -- the repeated measurements for the dependent variable and the levels (groups) for the independent variable with their sample size (N).
In the Descriptive Statistics table there are the key statistical measurements -- the sample size (N), the mean, and the standard deviation. By reviewing these you should have some intuitive perception as to where the results might be statistically significant, that is, 1) comparing between the four repeated times (the rows labelled Totals) and 2) comparing between each smoker type (cigar, cigarette, and e-vape) at each time measured. Here in our example there is a 4.4-point decrease (36.0%) in cortisol measured between 0900 and 1300; and as they are the same people, you would expect this large decrease to be statistically significant. Conversely, you might expect the gradual increases from 1300 to 1600 to 2000 not to be statistically significant. It is interesting to note that at all four measurements the cigar smokers had the highest level of cortisol.
Next, there is the Mauchly's Test of Sphericity table, which is the key measurement for the equality of variance between the dependent variables and the independent variable. In an ideal world, you want the p-value (Sig) to be above the critical alpha threshold (0.05), as this would indicate all the variance in the pairwise comparisons (four repeated measurements divided by three groups) does not violate sphericity (equality of variance). Here in our example this is the case, with the p-value reported at 0.077. Look back at the Descriptive Statistics table: the close similarity in all the standard deviations gives you intuitive evidence that sphericity would not be violated.
However, if the p-value is below the critical alpha threshold (0.05), and therefore sphericity is violated, then you will have to report one of the corrected results (Greenhouse-Geisser or Huynh-Feldt). These corrections for sphericity are based on their epsilon value, as per the following diagram [Girden (1992), Howell (2002), and Field (2013)].
In the Tests of Within-Subjects Effects table are the key test results to report -- F-value (test statistic), p-value (Sig), and df (degrees of freedom). Here in our example (as mentioned earlier) Mauchly's Test of Sphericity indicated that sphericity was not violated, and therefore we report the Sphericity Assumed result. If needed, the other results (Greenhouse-Geisser and Huynh-Feldt) are listed here. When reporting the degrees of freedom (df), be sure to include both the effect value and the error value.
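If sphericity had been violated, the corrected rows simply rescale both degrees of freedom by the epsilon estimate before the p-value is computed. A minimal Python sketch of that arithmetic (illustrative only -- SPSS applies this for you, and the function name is mine):

```python
def epsilon_correct(df_effect, df_error, epsilon):
    """Rescale both within-subjects degrees of freedom by the epsilon
    estimate (Greenhouse-Geisser or Huynh-Feldt), where 0 < epsilon <= 1."""
    return epsilon * df_effect, epsilon * df_error

# Sphericity assumed (epsilon = 1) keeps the df at (3, 75), as reported above;
# an epsilon of 0.75 would shrink them to (2.25, 56.25).
print(epsilon_correct(3, 75, 1.0))
print(epsilon_correct(3, 75, 0.75))
```

The smaller corrected df make the F test more conservative, which is exactly the point of the correction.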
Here the Tests of Within-Subjects Effects table indicates that in the Time factor (there are four dependent variables in this factor -- Time.0900, Time.1300, Time.1600, and Time.2000), the differences in the means (across some or all) of the four repeated measurements are statistically significant (F(3, 75) = 21.759, p < .001). However, it is not indicating exactly where the statistically significant differences are occurring, that is, between which repeated measurements.
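As a quick sanity check outside SPSS, the reported p-value can be recovered from the F statistic and its two degrees of freedom with any F-distribution routine; a sketch assuming scipy is available:

```python
from scipy.stats import f

# Survival function: P(F >= 21.759) under an F distribution
# with (3, 75) degrees of freedom.
p = f.sf(21.759, dfn=3, dfd=75)
print(p < 0.001)  # True -- consistent with the reported p < .001
```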
In addition, the interaction variable (Time * SmokerType) is not statistically significant (F(6) = .665, p = .678). This indicates that the differences in cortisol measured between a cigar, cigarette, and e-vape smoker at each of the four repeated measurements were not statistically significant.
Before we go further, we will look at two of the plots (line charts) we created that illustrate these two different results. The first plot is for the Time factor with the four different repeated measurements. Note: it is important to remember that the Time factor is not only the four repeated measurements, but it is also all smokers (cigar, cigarette, and e-vape) combined. As the line chart shows (and as mentioned earlier), there is a 4.4 unit decrease (36.0%) in cortisol measured between 0900 and 1300; and as they are the same people, you would expect this large decrease to be statistically significant. Conversely, you might expect the gradual increases from 1300 to 1600 to 2000 to not be statistically significant.
Remember, the Tests of Within-Subjects Effects table is not indicating exactly where the statistically significant differences are occurring between the repeated measurements, but just that somewhere across the four measurements there are differences that are statistically significant (F(3, 75) = 21.759, p < .001).
The next plot we want to look at is for the interaction variable (Time * SmokerType), which was not statistically significant (F(6) = .665, p = .678). However, unlike the first plot, here the four repeated measurements in the Time factor are divided by the three smoker types (cigar, cigarette, and e-vape). Here we can see all three lines are roughly running parallel with each other, which is a strong indication of no interaction, that is, all the smokers are following basically the same pattern in their cortisol levels... all drop down at the 1300 time (#2), and then all gradually increase across the 1600 time (#3) and the 2000 time (#4).
That said, there is a crossover between the red line (cigarette) and green line (e-vape) from the 1300 time to the 1600 time, which does indicate that at these two times it made some difference which smoker type it was -- however, not enough to be statistically significant. Equally, across all the times the blue line (cigar) is always much higher, while the red (cigarette) and green (e-vape) lines are always nearly side-by-side with each other. This does indicate that across all the times it makes a difference that it is a cigar smoker rather than a cigarette or e-vape smoker. But again, not enough to be statistically significant.
The following Tests of Within-Subjects Contrasts table pairs together the four repeated measurements to indicate where the statistically significant differences are occurring in both the main effect factor (Time) and in the interaction variable (Time * SmokerType).
Here in our example the difference in the means that is statistically significant occurs at Level 1 vs Level 2 (Time.0900 vs Time.1300). Remember, this is where we had the 4.4-unit decrease (36.0%) in cortisol measured in the smokers. Something we did not notice earlier is that at Level 3 vs Level 4 (Time.1600 vs Time.2000) the difference in cortisol measured is also statistically significant (F(1, 25) = 11.435, p = 0.002). Again, be sure to report the Error(Time) degrees of freedom (df) value when quoting these metrics.
In this same Tests of Within-Subjects Contrasts table, if we look at the interaction variable (Time * SmokerType) in the sequential comparisons across the four repeated measurements, there are no comparisons that are statistically significant... they all have low F-scores and p-values above the critical alpha level (0.05).
However, if there were comparisons that were statistically significant, and you want to examine between which smoker type at each of the four repeated measurements, then the Pairwise Comparisons table under the Estimated Marginal Means result would provide these details (listed below).
In our example, out of 12 possible pairs (3 smoker types x 4 repeated measurements) this occurred only twice (16.7%), at the #3 level (1600 time) -- the cigar against cigarette smokers and the cigar against e-vape smokers. But 2 comparisons out of 12 are not enough mathematical evidence (from an overall perspective) to indicate the interaction is statistically significant (16.7% as Yes versus 83.3% as No).
Finally, the Tests of Between-Subjects Effects table provides the results for the SmokerType independent variable. Here in these results the four repeated measurements are merged as one variable and then compared between the three smoker types (cigar, cigarette, and e-vape). As the table shows, the differences between the three smoker types are not statistically significant (F(2, 25) = 2.956, p = 0.070). We can see these results in the third plot (line chart) we created.
There are interesting results here in the Tests of Between-Subjects Effects table, as the line chart shows the decrease between the cigar and cigarette smokers is 3.02 units of cortisol, which is a 26.5% decrease (on average). In setting up this model we configured the Post Hoc option to examine the main effects of the independent variable (SmokerType). However, as there is no statistical significance here, there is no need to look at all the pairwise comparisons in the Post Hoc table.
Normal distribution and homogeneity of variance are important test assumptions that provide strong confidence that the test results are reliable; and you would normally confirm these assumptions are met prior to constructing the model. However, this may not always be possible with complex modelling tests that include random factors or covariates.
Therefore, it is equally important to carry out secondary analysis on the two 'model created' variables -- Unstandardized Residuals and Cook's Distance -- that we saved earlier as further evidence that the model meets these important assumptions.
In the histogram charts of the Unstandardized Residuals, we can see there is a reasonably normal distribution of the data in all but the 0900 time. That said, all the histograms show a large proportion of the data gathered on the negative side rather than centrally around the mean, with the 0900 time being the biggest offender.
And you might expect this, as we are measuring cortisol levels which would lend itself to more low-range scores rather than high-end scores. The simple solution would be to apply a log transformation to the original data, and then re-run the model using the log transformed data.
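To see why a log transformation helps, here is a hedged sketch on synthetic positively skewed data (not the cortisol values themselves), comparing skewness before and after the transform:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
# Synthetic low-range biological scores: positively skewed (long right tail).
x = rng.lognormal(mean=2.0, sigma=0.6, size=500)

print(skew(x))          # clearly positive
print(skew(np.log(x)))  # much closer to zero after the log transform
```

You would then re-run the model on the log-transformed column, exactly as the text above suggests.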
The scatter chart (which is an average of the four repeated measurements) compares Cook's Distance to the model's dependent variable (Time). In an ideal world, we want to see that for every observation in our dependent variable (N = 28) the Cook's values all have relatively the same distance up from the baseline (X-axis), with no up-spike values. Any up-spike values are indicators of likely influential observations which should be investigated. That said, there are 6 data values (21.4%) that are at (or above) the allowable limit (Cook's values that are 3X higher than the mean of the data array).
Here in our example the mean of the data array is 0.04, and the allowable limit is therefore 0.12 (3X the mean). The six up-spike values (albeit only two have breached the allowable limit) would need to be investigated to verify the degree of influence (over-bearing weight) they may be having on the model. Up-spike values in Cook's Distance can bring into question the numerical accuracy (e.g., means, standard deviations, 95% confidence intervals, F score, p-values, etc.) within the model.
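The 3X-the-mean rule is easy to apply outside SPSS once the Cook's Distance column has been saved; a minimal sketch (the function name and the data values are hypothetical, not the example's saved values):

```python
import numpy as np

def flag_influential(cooks, multiplier=3.0):
    """Return the indices of Cook's values above multiplier x the array mean,
    together with the limit used."""
    cooks = np.asarray(cooks, dtype=float)
    limit = multiplier * cooks.mean()
    return np.flatnonzero(cooks > limit), limit

# Hypothetical Cook's Distance values: eight small, flat values and two up-spikes.
cooks = [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.20, 0.25]
spikes, limit = flag_influential(cooks)
print(spikes)  # the last two positions breach the 3x-mean limit
```

The flagged rows are the observations you would go back and investigate.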
For secondary analysis these charts provide evidence that the model meets, or indicates areas of concern, as regards the key assumptions of normal distribution, homogeneity of variance, and the degree of numerical accuracy from our sample.
Happiness... you should now understand how to perform the Repeated Measures (Factorial) ANOVA test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
The ANCOVA (Analysis of Covariance) is similar to the One-way ANOVA, as it is used to detect a difference in the means of three (or more) independent groups; the difference is that, at the same time, we are controlling for a 'secondary' variable (covariate).
In any experiment some of the unexplained variability can be due to some additional, secondary variable (covariate). The covariate may not be the targeted focus of the research hypothesis but could influence the main dependent (test) variable. If we can remove (or isolate) the effect of this secondary variable, we can demonstrate a more accurate picture of the true effect from the independent (factor) variable. This is the main goal of ANCOVA (Analysis of Covariance).
(1) The dependent variable (test variable) is continuous (interval or ratio).
(2) The independent (factor) variables are categorical (nominal or ordinal), and there should be at least three groups in each independent (factor) variable.
(3) The independent covariates (secondary variables) are continuous (interval or ratio). And there is a linear relationship between the dependent test variable and the independent covariates. This linear relationship must be for all the groups in the factor variables.
(4) The regression lines of slope expressing these linear relationships should all be reasonably parallel (homogeneity of regression slopes).
(5) The participants (samples) have no relationship with the other participants in their group or with the participants from the other groups.
(6) The participants (samples) for each group are taken at random from the population.
(7) The dependent variable (test variable) has a reasonably normal distribution for each group. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).
(8) The dependent variable (test variable) has equal variance (homogeneity) between all the groups. Homogeneity means you want the standard deviation measurements for the groups to be roughly the same to each other.
(Q1) Does the dependent test variable Calories 1) have a reasonably normal distribution in each of the venue groups, and 2) is there homogeneity of variance between the venue groups?
(Answer: (1) Yes and (2) Yes). All the IQR ranges (blue boxes) are reasonably central to the box plot, with the median (black line) not excessively off-centred within the IQR range. This centrality (symmetrical shape) of the data is a strong indicator of normal distribution across each venue. Equally, the whisker-to-whisker spread (variance) of each box plot is reasonably similar, and is therefore good evidence of homogeneity of variance across each venue.
(Q2) Do the regression lines of slope for the three venue groups demonstrate reasonable homogeneity for their slopes?
(Answer: Yes). Although the three lines are not 100% parallel (which would be a perfect homogeneity of regression slopes), the angles of the slope of the three lines are not wildly different to each other. Therefore, we can affirm that the regression slopes display reasonable homogeneity across the venue groups.
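Outside SPSS, the same homogeneity-of-slopes check can be eyeballed by fitting a separate least-squares slope per group; a hedged sketch with hypothetical data for three venue-like groups that share a common underlying slope of about 2:

```python
import numpy as np

rng = np.random.default_rng(1)

def group_slope(x, y):
    # Least-squares slope of y on x for one group.
    return np.polyfit(x, y, 1)[0]

# Hypothetical covariate/outcome data for three groups with the same true slope.
slopes = []
for _ in range(3):
    x = rng.uniform(0, 10, 50)
    y = 2.0 * x + rng.normal(0, 1, 50)
    slopes.append(group_slope(x, y))

print(slopes)  # all close to 2 -> reasonable homogeneity of regression slopes
```

If one group's slope came out wildly different from the others, the homogeneity-of-regression-slopes assumption would be in doubt.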
To start the analysis, click Analyze > General Linear Model > Univariate.
This will bring up the Univariate dialogue box. To carry out the test, move the dependent (scale) variable into the Dependent Variable: placard. Next move the independent (nominal or ordinal) variable into the Fixed Factor(s): placard. Finally, move the independent (scale) covariate into the Covariate(s): placard. There are eight extra option buttons which you can configure with helpful and important statistical metrics. However, you can just move the variables into the correct placards, and then click the OK button to obtain a quick and simple result.
We will walk through several of the option buttons to bolt-on a few helpful and important measurements. For a simple ANCOVA test where you have not included several random factors or covariates, then the Model and Contrasts buttons can be left as is on the SPSS default settings. We will click the Plots button for some helpful chart tools.
There is only one factor, which we can place on the Horizontal Axis: placard. Next click the Add button, then select the type of chart you want (Line or Bar) and select the Include error bars option. You can have either Confidence Intervals or Standard Error. When finished, click the Continue button at the bottom of this box to return to the main Univariate dialogue box.
The Post Hoc button is greyed-out because we have included a covariate in the model. Similar to Post Hoc testing, we can still examine between which venue groups the differences are statistically significant. But SPSS can only 'estimate' these differences, as we are controlling (or adjusting) the dependent variable (Calories) by the covariate (SatsFats).
Next open the EM Means button. Move the one main factor into the Display Means for: placard, and then tick the Compare main effects option. There are three possible mathematical comparisons, and in my example I chose the LSD comparison. When finished, click the Continue button at the bottom of this box to return to the main Univariate dialogue box.
The next area to open is the Save button. You would normally have tested the assumption of a reasonably normal distribution for the dependent variable across each group prior to running the test. However, if you have a complex model, such as two main factor variables; or if there are only a few observations across the groups in the main factor variables (which makes it difficult to check for normal distribution); or if you have a covariate in the model; then saving and testing the residuals is often the better way to verify that the assumption of normal distribution is satisfied. In our example, I selected Residuals -- Unstandardized and Diagnostics -- Cook's Distance. Click the Continue button at the bottom of this box to return to the main Univariate dialogue box.
Finally, open the Options button. Here you want to tick the Homogeneity tests option in the Display section. But there are other statistical metrics that could be important to your project, such as Estimates of effect size and Observed power. Click the Continue button at the bottom of this box to return to the main Univariate dialogue box.
Remember, you will not need all these extra options to run an initial, straightforward ANCOVA test. But we have walked through most of them to give you the confidence to find and add what you may need for your project. When finished, click the OK button to run the test.
The results will appear in the SPSS Output Viewer. As mentioned earlier in the Univariate dialogue box, you can just move the variables into the correct placards, and then click the OK button to obtain a quick and simple result.
In the Between-Subject Factors table there is the sample size (N) for each group in the Venue variable. And in the Tests of Between-Subjects Effects table there is the test result for the main effect factor (Venue) and for the covariate (SatFats). Two important items to keep in mind here are:
Here in this example, the differences in calories for the meals eaten across the three venues are statistically significant (F = 4.916, p = .013) while controlling for the saturated fats measured in those meals. Equally, the relationship (correlation) between the calories for the meals eaten across the three venues and the saturated fats measured in those meals is also statistically significant (F = 32.067, p = .001).
Looking at the extra options that were selected, in the Levene's Test of Equality of Error Variances table there are the key test metrics for homogeneity (equality) of variance. If the data were normally distributed and there were no significant outliers, then you would expect the p-value to be above the critical alpha threshold (0.05).
This is true with our example, where the p-value (Sig) is 0.188. You would refer to (and quote) these measurements (F score and p-value) in your write-up, as they indicate that between the three venue groups the variances (estimated standard deviations) in the dependent variable (Calories) do not violate homogeneity of variance. Remember, you may have looked at homogeneity of variance for calories between the three venue groups separately before using this test. But now, in the ANCOVA model, the calories variable is being adjusted (controlled for) by saturated fats, which will change the earlier measurements.
If we look at the Estimated Marginal Means option, in the Estimates table there are the key metrics for each group -- estimated mean, standard error, and 95% confidence interval. From these measurements you can understand in greater detail why the test indicated (or did not indicate) a statistically significant result for the dependent variable tested. Here in this example, the estimated means in calories between the three venues move from 4579 (Putney) to 5843 (East Sheen) to 6014 (Tooting).
In the Pairwise Comparisons table there are all the possible 'pairs' of venue comparisons. You are looking for large differences and consequently a p-value below the critical alpha threshold (0.05). In our example there are only two venue pairs -- Putney to East Sheen and Putney to Tooting -- that meet this criterion. These numbers (mean difference, p-value, and 95% confidence intervals) provide evidence as to why the differences between these venue pairs were statistically significant, and in your write-up you would quote these numbers.
Please note that we could not perform Post Hoc tests (this button was greyed-out) because in the ANCOVA model there is a covariate. Post Hoc tests are performed on the actual means and actual standard deviations of the dependent variable (Calories) across the three groups of the fixed factor (Venue). We cannot determine these 'actual' measurements here in this model because of the control (or compensation) allowed for by the covariate (SatFats), which therefore only allows for 'estimates'.
The option Profile Plots will give a snapshot of the test results of the dependent variable (Calories) across the three groups of the fixed factor (Venue) as compensated for, or controlled for, by the covariate (SatFats). Here in this example I have added an extra plot of the actual means across the three venues as a comparison.
The two plots together allow you to better understand what is happening to the dependent variable (Calories) when you control for the amount of variance in calories accounted for by saturated fats -- Putney increased from 3973 to 4580 (roughly 600 calories), East Sheen increased from 5251 to 5844 (again roughly 600 calories), while Tooting decreased from 7013 to 6014 (roughly 1000 calories).
It is important to note that SPSS has indicated (via footnotes in the different result tables) that the estimates for the dependent variable (Calories) in this ANCOVA model were evaluated with the covariate (SatFats) held at 56.4850. This is the mean value of the SatFats variable over the entire dataset, that is, all 40 samples.
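This is the heart of the ANCOVA adjustment: each group's raw mean is slid along the common regression slope to the value expected at the grand covariate mean. A minimal sketch of that arithmetic (the function name and all numbers are hypothetical, not the SPSS output):

```python
def adjusted_mean(group_mean_y, group_mean_cov, grand_mean_cov, slope):
    """Raw group mean adjusted to the grand mean of the covariate."""
    return group_mean_y - slope * (group_mean_cov - grand_mean_cov)

# A hypothetical group averaging 100 on the outcome, whose covariate mean (5)
# sits one unit above the grand covariate mean (4), with a common slope of 10:
print(adjusted_mean(100.0, 5.0, 4.0, 10.0))  # 90.0
```

A group whose meals were unusually high on the covariate gets pulled down, and a group unusually low gets pulled up -- which is why the estimated means above differ from the raw means.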
Normal distribution and homogeneity of variance are important test assumptions that provide strong confidence that the test results are reliable; and you would normally confirm these assumptions are met prior to constructing the ANCOVA model. However, this may not always be possible with complex modelling tests that include random factors or covariates.
Therefore, it is equally important to carry out secondary analysis on the two 'model created' variables -- Unstandardized Residuals and Cook's Distance -- that we saved earlier as further evidence that the model meets these important assumptions.
In the histogram chart of the Unstandardized Residuals, we can see there is a reasonably normal distribution of the data. The histogram is showing the data values are centrally gathered around the mean (not skewed to the left or right tails and no significant outliers). As confirmation, the p-value in the Shapiro-Wilk test is above the critical alpha threshold (0.05) indicating the data are not violating a normal distribution (not statistically different from a normal distribution).
In the scatter chart, which is a comparison of Cook's Distance to the dependent variable (Calories), we want to see that for every observation in Calories (N = 40) that the Cook's values all have relatively the same distance up from the baseline (X-axis), and that there are no predominant 'up-spikes' in the data array. Any outstanding up-spikes (Cook's values that are 3X higher than the mean of the data array) are indicators of likely influential observations which should be investigated. These up-spike Cook's values would bring into question the numerical accuracy (e.g., means, standard deviations, 95% confidence intervals, F score, p-values, etc.) within the model.
Here in this example, there is one up-spike value out of the 40 samples. This up-spike value would need to be investigated to verify the degree of influence (over-bearing weight) it may be having on the model.
For secondary analysis these charts provide evidence that the ANCOVA model meets, or indicates areas of concern, as regards the key assumptions of normal distribution, homogeneity of variance, and the degree of numerical accuracy from our sample.
Happiness... you should now understand how to perform the ANCOVA in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
The Single Linear Regression test (aka: Simple Regression) can be seen as the continuation of correlation, that is, the two test variables 1) should have a correlation with each other, and 2) the correlation should be statistically significant.
Here in regression we want to be able to predict the value of one test variable by the value of the other test variable (hence the need for a correlation between them). The variable we want to predict is called the dependent variable (or outcome variable); and the variable we are using to predict is called the independent variable (or predictor variable).
(1) The two test variables are continuous (interval or ratio).
(2) There is a linear relationship between the two test variables.
(3) The participants (samples) have no relationship with the other participants, and are taken at random from the population.
(4) The two test variables have a reasonably normal distribution. Normal distribution means the data should not be heavily skewed to the left or right tails, and there should not be significant outliers (better to have no outliers).
(5) The two test variables have equal variance (homogeneity / homoscedasticity) when compared to each other. Homogeneity (or homoscedasticity) means you want the variance in the data (as plotted between two variables) to be reasonably the same along the entire line of best fit.
(6) After completing the single linear regression test, you will need to check that the residuals (errors) of the regression line have a reasonably normal distribution as confirmation that the regression model is reliable.
(Q1) For a single linear regression test which predictor variable could you use for the outcome variable Energy (kcal)?
(Answer: Fats). You might say (and you would be correct to say this) that all the predictor variables in the Correlations table (Fats, Sugar, Protein, Fibre) have a correlation with the outcome variable which is Energy (kcal). Equally, every correlation is statistically significant, as all the p-values are below the critical 0.05 alpha level. However, the best predictor variable would be Fats, as it has the highest (.477) coefficient score.
(Q2) Do the two test variables have homogeneity (or homoscedasticity) between each other?
(Answer: Yes). The variance (between the two red dashed lines) from the plotted data values between the two variables (X-axis is Fats and Y-axis is Energy) is reasonably the same along the entire line of best fit. Equally, we can see from this scatter chart that the relationship is linear as the plotted data values are progressing in one direction at relatively the same rate (magnitude) of movement.
To start the analysis, click Analyze > Regression > Linear...
This will bring up the Linear Regression dialogue box. To carry out the test, move the outcome variable into the Dependent: placard and the predictor variable into the Independent: placard. Next, click the Statistics... button and select the confidence intervals option which is set for the 95% level. Click the Continue button at the bottom of this dialogue box to return to the main Linear Regression dialogue box.
After returning to the Linear Regression dialogue box, click the Plots... button. Move the *ZPRED variable into the X: axis placard and the *ZRESID variable into the Y: axis placard. [[Tip 1: In any chart the predictor (independent) variable should be on the X axis.]]
Next, select the Histogram and Normal probability plot options. [[Tip 2: Here you are creating a number of charts and plots to test that the residuals (errors) of the regression line have a reasonably normal distribution as per the earlier Test Assumptions section.]]
Click the Continue button at the bottom of this dialogue box to return to the main Linear Regression dialogue box.
After returning to the Linear Regression dialogue box, click the OK button at the bottom of the dialogue box... Wow, that was a lot of boxes!
The result will appear in the SPSS Output Viewer. There are three key tables with several important test metrics. In the Model Summary table, there is R (the Pearson's correlation coefficient), which indicates the strength and direction of any correlation between the predictor and outcome variables. There is also R^{2} (R multiplied by R), which indicates the amount of shared variance between the two test variables. Here shared variance means the degree to which the predictor variable accounts for (or can explain) the variance in the outcome variable. Here in our example, the fat levels in the breads tested account for (or can explain) 25.6% (R^{2} = .256) of the variance in the energy (kcal) levels.
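The relationship between R and R^{2} is easy to verify outside SPSS; a hedged sketch with hypothetical fats/energy data (numpy's corrcoef gives Pearson's R for a one-predictor model, and squaring it gives the shared variance):

```python
import numpy as np

rng = np.random.default_rng(7)
fats = rng.uniform(1, 10, 60)                       # hypothetical fat values (g)
energy = 221.0 + 5.9 * fats + rng.normal(0, 8, 60)  # hypothetical kcal values

r = np.corrcoef(fats, energy)[0, 1]  # Pearson's correlation coefficient
r_squared = r ** 2                   # shared variance explained by the predictor

print(r, r_squared)
```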
Next, there is the ANOVA table, and we might consider this as the 'fitting room' for the regression model. You are in a store and you pick out some clothes you want to buy. But you go into a fitting room to see how well the clothes fit your body shape. This is what is happening here in this ANOVA table. There are different mathematical equations that can be used to predict one value from another value. Here SPSS is testing the fit (suitability) of a linear, straight-line equation with the shape of the two variables used in the model.
There is the F-test score, which indicates the strength (magnitude) of how well the linear regression equation fits the two variables in the model, as opposed to the null hypothesis (i.e., there is no (null) fit with the two variables used). There is also the p-value (Sig) for the regression model, indicating whether the fit (suitability) of the linear regression equation is statistically significant. Here in our example, we have a high-magnitude F-test score (50.486), and the model is statistically significant (p < .001), indicating that 1) a linear, straight-line equation has a strong, robust fit with the shape of the correlation between the two variables in this model, and 2) the fit (suitability) of this linear equation to that shape is statistically significant.
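For a single-predictor model, the ANOVA table's F statistic is tied directly to R^{2}; a hedged sketch of that relationship (the function name is mine, not SPSS's, and the numbers are illustrative):

```python
def model_f(r_squared, n, k=1):
    """Overall regression F from R^2, with n observations and k predictors."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# A model explaining half the variance with 12 observations and one predictor:
print(model_f(0.5, 12))  # close to 10.0
```

The larger the shared variance (and the larger the sample), the larger the F score -- which is why the strong R^{2} above comes paired with a high-magnitude F.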
Finally, there is the Coefficients table which lists the regression equation coefficients, the intercept, and their statistical significance. In our example of white, brown, and seeded breads, the regression equation (Y = A + (B * X_{1})) would become:
Y (Energy (kcal)) = 221.461 + (5.865 * (Fats))
There are also the 95% C.I. around both Y-intercept (Constant) value and the X-predictor (Fats) value in the regression equation. This gives us a high-low measure of accuracy (confidence) as to how well our sample data values are likely to represent (or include) the actual values in the population. There are also the T-test score and the p-values (Sig) metrics which indicate the strength and statistical significance of the coefficients as compared to the null hypothesis (a zero numeric value).
Coming back to our example if we randomly took a loaf of white, brown, and seeded bread off the supermarket shelf and we read from the label there were 3, or 5, or 8 grams of fats, then we could estimate (predict) the levels of energy (kcal) that loaf is likely to have from our regression equation. And we would have a reasonably high level of confidence that the estimate would be accurate, as the 95% C.I. in the regression model are very narrow -- 213 to 230 for our Y-intercept and 4.2 to 7.5 for our X-predictor (Fats).
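The prediction step described above is just the fitted equation evaluated at the fat values read from the label; a sketch using the coefficients reported in the Coefficients table (the function name is mine):

```python
def predict_energy(fats_g):
    """Energy (kcal) predicted from fats (g): Y = 221.461 + 5.865 * X."""
    return 221.461 + 5.865 * fats_g

# Estimated energy for loaves with 3, 5, and 8 grams of fats.
for fats in (3, 5, 8):
    print(fats, round(predict_energy(fats), 1))  # 239.1, 250.8, 268.4
```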
Finally, as required by the earlier test assumptions, we have the charts and plots to confirm if the residuals between the two variables tested in the regression model meet the assumption for normal distribution and the assumption for homoscedasticity (equal variance).
In both the histogram and the P-P plot we can see there is a very reasonable normal distribution for the residuals. The histogram shows the data values centrally gathered around the mean (i.e., not skewed to the left or right tails and no significant outliers). Equally, the P-P plot shows a very tight wrapping (closeness) of the plotted data values to the line of fit.
The scatter chart of the predictor (X-axis) to outcome (Y-axis) residuals shows reasonably good homoscedasticity, that is, there is the same variance in the plotted data values across the chart, there is very little bunching up of the data values into tight clumps, and the top and bottom halves (split along the red dashed line) are roughly mirror images of one another.
Therefore by this post-testing of the residuals, we have strong confirmation that the regression model meets the required assumptions for the test, and that the test result is therefore reliable.
Happiness... you should now understand how to perform the Single Linear Regression test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
Multiple linear regression is the next level up from simple linear regression. It is used to predict a value of the dependent variable based on the values of two, three, or more independent variables.
As in single linear regression, it attempts to model the relationship between the dependent variable (outcome / target variable) and the independent variables (predictor / explanatory variables) by fitting a linear equation to the observed data. Equally, multiple linear regression also allows you to determine the overall fit (variance accounted for by the predictors) of the model and the hierarchical degree of contribution of each of the predictors to the model.
(1) The dependent variable is continuous (interval or ratio).
(2) The independent variables can be either continuous (interval or ratio) or categorical (ordinal or nominal).
(3) The participants (observed samples) have no relationship with one another and are taken at random from the population. This independence of observations can be verified using the Durbin-Watson statistic in SPSS.
(4) There is a linear correlation between the dependent variable (outcome) and each of the independent variables (predictors).
(5) The independent variables should not be strongly correlated with each other (multicollinearity), which compromises the model's accuracy to determine the degree each independent variable accounts for the variance in the dependent variable.
(6) All the continuous (interval or ratio) variables should have a reasonably normal distribution. Normal distribution means the data should not be heavily skewed to the left or right tails, and there should not be significant outliers (better to have no outliers).
(7) The variables in the model need to show equal variance (homoscedasticity) when compared to each other. Homoscedasticity means you want the data (as plotted collectively between the variables) to be reasonably the same variance along the entire line of best fit.
(8) After completing the multiple linear regression test, you will need to check that the residuals (errors) of the regression line have a reasonably normal distribution as confirmation that the regression model is reliable.
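To make assumption (3) concrete: the Durbin-Watson statistic that SPSS reports is computed from the regression residuals, with values near 2 suggesting independent observations and values near 0 or 4 suggesting positive or negative autocorrelation. A minimal sketch of the calculation (the residuals below are invented for illustration):

```python
# Durbin-Watson: sum of squared successive differences of the
# residuals, divided by the sum of squared residuals.

def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Invented residuals with no obvious run pattern.
resid = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.1]
print(round(durbin_watson(resid), 2))  # -> 2.61, close to the ideal of 2
```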
(Q1) For a multiple linear regression test does the dependent variable (Muscle (kg)) and the independent variables (Weight(kg), Fat(%), BMI, BMR) indicate a reasonably normal distribution?
(Answer: Yes). All the p-values for the Shapiro-Wilk test are above the critical alpha threshold (0.05). In our example, when the two tests (Kolmogorov-Smirnov and Shapiro-Wilk) disagree, as with the BMR variable, preference should be given to the Shapiro-Wilk result (Razali and Yap, 2011: 25; Moni and Shuaib, 2015: 18).
(Q2) Do the four independent variables (predictors) show signs of collinearity between each other?
(Answer: Yes). There is a robust correlation between the two predictor variables Weight (kg) and BMR (rho = 0.804). If we calculate r^{2} for these two variables (0.804 x 0.804 = 0.646), it indicates that they account for (or share) 64.6% of the variance with one another. In the multiple regression model you would exclude one of these as a predictor variable, as they are violating the assumption of no strong multicollinearity between the independent (predictor) variables.
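That shared-variance arithmetic can be reproduced step by step (a trivial sketch, using only the rho value from our example):

```python
# Squaring the correlation coefficient gives the proportion of
# shared variance between the two predictors.
rho = 0.804                      # Spearman's rho between Weight and BMR
r_squared = rho ** 2             # 0.804 x 0.804
shared_variance = r_squared * 100
print(round(r_squared, 3), f"{shared_variance:.1f}%")  # -> 0.646 64.6%
```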
To start the analysis, click Analyze > Regression > Linear
This will bring up the Linear Regression dialogue box. To carry out the test, move the one outcome variable into the Dependent: box and the multiple predictor variables into the Independent: box. In our example, we have excluded the predictor variable BMR because of a strong collinearity with Weight, as mentioned above in the Quick Quiz section. We will walk through several of the option buttons, which you can configure for helpful and important statistical metrics.
Before moving on, be sure to select Backward in the drop-down menu for the Method type.
First, click the Statistics button and select the confidence intervals option, which is set for the 95% level. Be advised there is an option for correlation tests and collinearity diagnostics. However, we did this separately by running a Spearman's Rho correlation on all the model variables, as mentioned in the Quick Quiz section. Click the Continue button at the bottom of this dialogue box to return to the main Linear Regression dialogue box.
After returning to the Linear Regression dialogue box, click the Plots button. In the Scatter chart section, move the *ZPRED variable into the X: box and the *ZRESID variable into the Y: box. [[Tip 1: In any chart, the predictor (independent) variable should be on the X axis.]]
In the Standardize Residual Plots section, select the Histogram and Normal probability plot options. [[Tip 2: Here you are creating a number of charts and plots to test that the residuals (errors) of the regression line have a reasonably normal distribution as outlined in the Test Assumptions section.]] Click the Continue button at the bottom of this dialogue box to return to the main Linear Regression dialogue box.
Some tutors may want you to perform some in-depth secondary analysis to verify the model is meeting the test assumptions. There are two common statistical measurements found in the Save options -- Standardized (or Unstandardized) Residuals and Cook's Distance. If you select these, SPSS will create these variables in your dataset to be used for secondary analysis. Click the Continue button at the bottom of this dialogue box to return to the main Linear Regression dialogue box.
After returning to the main Linear Regression dialogue box, click the OK button at the bottom of the dialogue box to run the test.
The results will appear in the SPSS Output Viewer. There are several key tables with the important test metrics. In the Variables Entered / Removed table, there are two regression models listed with the predictor variables that were used (or removed) in each model.
Next, in the Model Summary table, there is the r metric (the Pearson correlation coefficient), which indicates the strength and direction of any correlation between the predictors and the outcome variable. There is also the r^{2} metric (r multiplied by r), which indicates the amount of shared variance between the predictor variables and the outcome variable for each model (see the variables listed in the footnotes under the table).
Shared variance means the degree to which the predictor variables account for (or can explain) the variance in the outcome variable. Here in our example, the predictor variables used in each model are accounting for 99.9% (r^{2} = .999) of the variance in the outcome variable (Muscle) -- astonishing -- the r^{2} value is normally never this high.
Next, there is the ANOVA table, and we might consider this as the 'fitting room' for the regression model. You are in a store and you pick out some clothes you want to buy. But, you go into a fitting room to see how well the clothes fit to your body shape. This is what is happening here in this ANOVA table. SPSS is mathematically testing the fit (suitability) of a linear, straight-line equation with the shape of the predictor variables to the outcome variable used in each model.
There is the F-score, which indicates the strength (magnitude) of how well a linear, straight-line equation fits the variables in each model, as opposed to the null hypothesis (i.e., there is no (null) fit with the variables used). There is also the p-value (Sig) for the regression model, indicating whether the fit (suitability) of the linear, straight-line equation is statistically significant. Be sure to report both the regression and residual degrees of freedom (df) in your write-up, for example, F_{(2, 39)} = 22915.4, p < .001
Here in our example, (for both models) we have a high magnitude F-score (14916.5 and 22915.4), and both models are statistically significant (p < .001). These results indicate that 1) a linear, straight-line equation has a strong, robust fit with the shape of the predictor variables to the outcome variable used in each model, and 2) that this fit (suitability) of the linear, straight-line equation is statistically significant.
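For the curious, the F-score in the ANOVA table is assembled from the regression and residual sums of squares, each divided by its degrees of freedom. A minimal sketch (the sums of squares below are invented; only the df values echo our example's Model #2):

```python
# F = (regression mean square) / (residual mean square).
# A large F means the model explains far more variance than is
# left over in the residuals.

def f_score(ss_regression, ss_residual, df_regression, df_residual):
    ms_regression = ss_regression / df_regression  # mean square (model)
    ms_residual = ss_residual / df_residual        # mean square (error)
    return ms_regression / ms_residual

# e.g. 2 predictors and N = 42 observations -> df = (2, 39)
print(round(f_score(ss_regression=500.0, ss_residual=10.0,
                    df_regression=2, df_residual=39), 1))  # -> 975.0
```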
Finally, there is the Coefficients table, which lists the regression equation coefficients and their statistical significance for both models. This table gives the answer as to which predictors (independent variables) are the best to estimate the outcome (dependent variable).
The first key metric is the p-values (Sig.) for each predictor variable in each model presented. In our example, for Model #1 the predictor BMI has a p-value of 0.779, which is over the critical alpha level (0.05). This indicates that in the presence of the other predictors (Weight and Fat), the BMI variable is not statistically significant; and therefore, it should be excluded from the regression model.
SPSS will then work "backwards" and recalculate the regression model with the remaining predictor variables to verify the new coefficients, p-values, t-scores, 95% confidence intervals, etc. It will continue this backwards process until all predictor variables that are not statistically significant are removed.
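The backward process can be sketched conceptually as a loop that drops the worst predictor until every remaining p-value clears the alpha threshold. (The p-values below echo our example's Model #1; note that real backward elimination, as SPSS does it, refits the model and so recalculates the p-values after every removal -- this sketch only shows the selection logic.)

```python
# A conceptual sketch of the "Backward" method: repeatedly drop
# the predictor with the highest p-value until all remaining
# p-values are at or under the alpha threshold.

def backward_eliminate(p_values, alpha=0.05):
    """p_values: dict mapping predictor name -> p-value (Sig.)."""
    kept = dict(p_values)
    while kept:
        worst = max(kept, key=kept.get)
        if kept[worst] <= alpha:
            break
        del kept[worst]  # in real use, the model is refitted here
    return sorted(kept)

print(backward_eliminate({"Weight": 0.001, "Fat": 0.003, "BMI": 0.779}))
# -> ['Fat', 'Weight'] : BMI (p = 0.779) is dropped, as in the example
```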
Happiness... in our example, SPSS only had to make one iteration of the regression model, which is Model #2 with the two best predictors (Weight and Fat). Therefore, the final regression equation [Y = A + (B * X_{1}) + (C * X_{2})] would become:
Y (Muscle) = 11.381 + (0.811 * Weight) + (-0.778 * Fats)
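Plugging the final coefficients into this equation, a prediction can be computed directly (the person's measurements below are invented for illustration):

```python
# The final regression equation from the example, as a function.

def predict_muscle(weight_kg, fat_pct):
    return 11.381 + (0.811 * weight_kg) + (-0.778 * fat_pct)

# A hypothetical person: 80 kg at 20% body fat.
print(round(predict_muscle(80, 20), 2))  # -> 60.7 kg of estimated muscle
```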
There are additional metrics that you can include in your final write-up: 1) There are the 95% Confidence Intervals around both the constant value (Y-intercept) and the predictor values (Weight and Fats). This gives us a high-low measure of accuracy (confidence) as to how well our sample data are likely to represent the actual values in the population. 2) There are the t- scores and the p-values (Sig), which indicate the strength and statistical significance of the coefficients as compared to the null hypothesis (a zero numeric value). 3) There are the Beta scores (Standardized Coefficients), which indicate in the presence of each other which predictors are stronger (accounting for more shared variance) than the other predictors.
Coming back to our example if we measured a person's weight and fat percentage, we then could estimate (predict) the amount of muscle that person is likely to have from our regression equation. And we would have a reasonably high level of confidence that the estimate would be accurate, as the 95% C.I. in the regression model are all very narrow at 10.7 to 12.1 for the Constant (Y-intercept), at 0.80 to 0.82 for the X_{1} predictor (Weight), and at -0.79 to -0.76 for the X_{2} predictor (Fats).
Finally, as required by the earlier test assumptions, we have the charts and plots to confirm if the residuals between the variables tested in the regression model meet the assumption for normal distribution and the assumption for homoscedasticity (equal variance).
In both the histogram and P-P plot we can see there is a reasonably normal distribution for the residuals. The histogram is showing the data values are centrally gathered around the mean (i.e., not skewed to the left or right tails). Equally, the P-P plot is showing a reasonable degree of wrapping (closeness) of the data values to the line of fit. However, in an ideal world, we would prefer the data values to be tighter to the line of fit.
However, both charts are showing an issue with outliers (red arrows) which will cause an unwanted distortion to the numerical accuracy of the statistical measurements (e.g., means, standard deviations, 95% confidence intervals, F score, p-values, etc.) within the model.
Next, the scatter chart below -- the predictors (X-axis) to outcome (Y-axis) residuals -- shows a reasonable degree of homoscedasticity. That is, there is the same variance in the plotted data values across the chart, and there is very little bunching up of the data values into tight clumps. However, the top and bottom halves (split along the red dashed line) are not as strong a mirror image of each other as would be preferred.
Coming back to the outliers, the final scatter chart is a comparison of Cook's Distance to the dependent variable (Muscle). We want to see that for every observation in Muscle (N = 42) the Cook's values all have relatively the same distance up from the baseline (X-axis), and that there are no predominant 'up-spikes' in the data array. Any outstanding up-spikes (Cook's values that are 3X higher than the mean of the data array) are indicators of likely influential observations, which should be investigated. As mentioned previously, these up-spike Cook's values would bring into question the numerical accuracy of the statistical measurements (e.g., means, standard deviations, 95% confidence intervals, F score, p-values, etc.) within the model.
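The 3X-mean up-spike screen described here can be sketched in a few lines (the Cook's values below are invented for illustration; in practice you would use the Cook's Distance variable that SPSS saved into your dataset):

```python
# Flag any Cook's Distance value more than 3x the mean of the
# whole array -- the 'up-spike' rule of thumb described above.

def flag_influential(cooks_values, multiplier=3.0):
    threshold = multiplier * (sum(cooks_values) / len(cooks_values))
    return [i for i, d in enumerate(cooks_values) if d > threshold]

cooks = [0.01, 0.02, 0.01, 0.30, 0.02, 0.01, 0.25, 0.02]
print(flag_influential(cooks))  # -> [3, 6]: the two up-spike observations
```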
Here in this regression model, there are 5 up-spike values (12%) out of the 42 observations. These up-spike values would need to be investigated to verify the degree of influence (over-bearing weight) they may have on the model. That said, if on an exam you were 88% accurate in your answers, you might be well chuffed with that result.
These charts for secondary analysis will either provide evidence that the regression model is meeting the key assumptions of normal distribution, homoscedasticity (equality of variance), and numerical accuracy from our sample, or will indicate areas of concern.
Happiness... you should now understand how to perform the Multiple Linear Regression test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites: