
Online Study Tool Kit

A selection of our online resources to support your studies

Basic Statistics

AA Team Guide for Descriptive Statistics

There are a number of different ways to calculate descriptive statistics in SPSS. We will use the Frequencies menu option. To start the analysis, click on Analyze > Descriptive Statistics > Frequencies.

Frequencies menu path in SPSS

 

This will bring up the Frequencies dialogue box. Move the scale variable for which you wish to calculate descriptive statistics into the Variable(s) box. You can drag and drop the variable, or first select it and then click the arrow button in the centre of the dialogue box.

Frequencies dialogue box in SPSS

 

Once you have moved the scale variable into the right-hand Variable(s) box, first untick the Display frequency tables option. Next, click the Statistics button. This will bring up the Frequency Statistics dialogue box, where it is possible to choose a number of descriptive measures.

Frequencies Statistics dialogue box in SPSS

 

Once you have ticked the descriptive measures you want, click the Continue button, and then click the OK button in the Frequencies dialogue box to carry out the analysis.
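If you want to cross-check SPSS's output outside of SPSS, the same descriptive measures can be sketched with Python's built-in statistics module (the weights below are made-up illustration values, not the dataset used in this guide):

```python
import statistics

# Hypothetical sample of body weights in kg (not the guide's dataset)
weights = [72.5, 80.1, 78.3, 85.0, 90.2, 76.4, 82.7, 79.9]

print("N:", len(weights))
print("Mean:", round(statistics.mean(weights), 2))
print("Median:", round(statistics.median(weights), 2))
# Sample standard deviation (n - 1 denominator), which is what SPSS reports
print("Std. deviation:", round(statistics.stdev(weights), 2))
print("Minimum:", min(weights))
print("Maximum:", max(weights))
```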

 


The Result

The table of statistics is displayed in the SPSS Output Viewer. It is fairly self-explanatory, displaying all the descriptive measures that you selected.

Table of descriptive statistics in SPSS

 


Further Study

Happiness... you should now be able to calculate descriptive statistics in SPSS. However, if you want to explore further, here are two sites:


AA Team Guide for Frequency Tables

A frequency table will display the count and percentage for each level (group) in a categorical variable. We will use the Frequencies menu option. To start the analysis, click on Analyze > Descriptive Statistics > Frequencies.

Frequencies menu path in SPSS

 

This will bring up the Frequencies dialogue box. Move the categorical variable (nominal or ordinal) for which you wish to create the frequency table into the Variable(s) box. You can drag and drop the categorical variable, or first select it and then click the arrow button in the centre of the dialogue box.

Frequencies dialogue box in SPSS

 

Once you have moved the categorical variable into the right-hand Variable(s) box, be sure the Display frequency tables option is ticked. Next, click the OK button to create the table.

 


The Result

The frequency table is displayed in the SPSS Output Viewer. It is fairly self-explanatory, displaying the count (frequency) and the percentage for each level (group) within the categorical variable that you selected (below are two examples).

Frequency table results in SPSS
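As a cross-check outside of SPSS, the count-and-percentage structure of a frequency table can be sketched in a few lines of Python (the Gender values below are made-up illustration data, not the guide's dataset):

```python
from collections import Counter

# Hypothetical categorical data (not the guide's dataset)
gender = ["Female", "Male", "Female", "Female", "Male",
          "Male", "Female", "Male", "Male"]

counts = Counter(gender)
total = len(gender)
# Print each level with its count (frequency) and percentage,
# largest group first, as in an SPSS frequency table
for level, count in counts.most_common():
    print(f"{level}: {count} ({count / total:.1%})")
```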

 


Further Study

Happiness... you should now be able to complete frequency tables in SPSS. However, if you want to explore further, here are two sites:


AA Team Guide for Charts

There are a number of excellent charts in SPSS to give visual interpretation to your data. We will look at four key charts as a starting reference, but you should be able to develop more charts as a follow-on from this guide.

  1. Histogram
  2. Bar
  3. Boxplot
  4. Scatter/Dot

For all the charts in this guide, we will use the Chart Builder. To start, click on Graphs > Chart Builder.

Menu path for the Chart Builder in SPSS

This will open the Chart Builder dialogue box, and I have labelled 6 areas to help navigate through the Chart Builder:

  1. list of variables
  2. chart construction tabs
  3. list of chart types
  4. variants for each chart type
  5. preview / sandbox area to construct the chart
  6. expand button for properties side panel

The Chart Builder dialogue box in SPSS

 


1) Histogram

After opening the Chart Builder, select Histogram from the (3) list of chart types, and choose the first variant from the (4) list of variants. Next, drag the scale variable you want to chart onto the X-axis placard in the (5) preview / sandbox area (in my example I used Weight_kg). Finally, open the (6) properties side panel and tick the Display normal curve option. Click the OK button when finished.

The Chart Builder dialogue box for a Histogram in SPSS

 


2) Bar

After opening the Chart Builder, select Bar from the (3) list of chart types, and choose the first variant from the (4) list of variants. Next, drag the categorical variable you want to chart onto the X-axis placard in the (5) preview / sandbox area (in my example I used Gender). Then drag the scale variable you want to chart onto the Y-axis placard in the (5) preview / sandbox area (in my example I used Weight_kg). Finally, open the (6) properties side panel, tick the Display error bars option, and select the type of error bars -- Confidence Intervals, or Standard Error (with 1 as the multiplier), or Standard Deviation (with 1 as the multiplier). Click the OK button when finished.

The Chart Builder dialogue box for a Bar chart in SPSS

 


3) Boxplot

After opening the Chart Builder, select Boxplot from the (3) list of chart types, and choose the first variant from the (4) list of variants. Next, drag the categorical variable you want to chart onto the X-axis placard in the (5) preview / sandbox area (in my example I used Gender). Then drag the scale variable you want to chart onto the Y-axis placard in the (5) preview / sandbox area (in my example I used Weight_kg). Click the OK button when finished.

The Chart Builder dialogue box for a Boxplot in SPSS

 


4) Scatter/Dot

After opening the Chart Builder, select Scatter/Dot from the (3) list of chart types, and choose the first variant from the (4) list of variants. Next, drag the scale variable you want to chart onto the X-axis placard in the (5) preview / sandbox area (in my example I used Weight_kg). Then drag the second scale variable you want to chart onto the Y-axis placard in the (5) preview / sandbox area (in my example I used Muscle_kg). Finally, open the (6) properties side panel, tick the Linear Fit Lines option, and select Total as the type of line. Click the OK button when finished.

 


5) Chart Editor

After creating any chart in SPSS, it will appear in the SPSS Output Viewer. If you double-click on the chart, the Chart Editor will open; there are menus and quick tools to change the text formatting and the scaling of the X-axis and Y-axis, to add data labels, to add trendlines, and much more. When finished, close the Chart Editor and the changes will update on the original chart.

The Chart Editor in SPSS

 


Further Study

Happiness... you should now be able to create charts in SPSS. However, if you want to explore further, here are two sites:


AA Team Guide for Parametric Assumptions

There are a number of parametric assumptions that are requirements for certain statistical tests in SPSS. We will look at four key assumptions as the starting requirement for the majority of these tests.

  1. Scale Variable
  2. Normal Distribution
  3. Outliers
  4. Homogeneity of Variance

 


1) Scale Variable

The variable must be a scale measurement type. You are not concerned with parametric assumptions for variables that are nominal or ordinal measurement types.

List of variables in SPSS

 

A scale variable (interval or ratio) measures quantity, where every unit of measure represents an equal division. Equal divisions mean that 4 feet is 2x longer than 2 feet and that 10 minutes is 5x longer than 2 minutes.

Ruler as example of equal intervals

 


2) Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a probability function that describes how the values of a variable are spread out. It is a symmetric distribution showing that data near the mean are more frequent in occurrence and the probabilities for values further away from the mean taper off equally in both directions. In a graph, normal distribution will appear as a bell-shaped curve.

Bell curve of normal distribution

 

You can examine a scale variable for normal distribution either with a histogram (as above) or with a Q-Q plot (not shown). You can test for normal distribution with the Kolmogorov-Smirnov test or the Shapiro-Wilk test. To start the analysis, click on Analyze > Descriptive Statistics > Explore.

Explore menu path in SPSS

 

This will open the Explore dialogue box. Move the scale variable to be tested into the right-hand Dependent List: box. [As a side note: you can put a categorical variable in the Factor List: box if you want to split the dependent list variable in order to test each group separately.] Next, in the Display section (at the bottom), tick the Plots radio button. Finally, click the Plots... button (on the far right side).

Explore dialogue box in SPSS

 

This will open the Explore: Plots dialogue box. Tick the Normality plots with tests option. There are other options you may (or may not) want to tick. When finished click the Continue button, and then the OK button in the original Explore dialogue box.

Explore: Plots dialogue box in SPSS

 


2A) The Result for Normal Distribution

The result will appear in the SPSS Output Viewer. The Kolmogorov-Smirnov test and the Shapiro-Wilk test results appear in the Tests of Normality statistics table.

Test of normality result table in SPSS

 

Most often they will agree. However (as is the case in our example), the Kolmogorov-Smirnov test (p = .042) shows the data failed normal distribution, but the Shapiro-Wilk test (p = .090) shows the data passed. When they do not agree, most researchers will select the Shapiro-Wilk result: it is a more robust test, it does not have the Lilliefors correction applied, and it manages small sample sizes better.

There are also a variety of charts in the result -- Q-Q plot, Stem & Leaf, Histogram (if you ticked this option), and Boxplot. All of which provide good visual evidence of normal (or non-normal) distribution, as confirmation and a visual inspection into the normality test result.
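If you want to reproduce the two normality tests outside of SPSS, both are available in scipy.stats. This is a sketch with made-up data; note that plain scipy.stats.kstest does not apply the Lilliefors correction that SPSS's Explore procedure uses, so its p-value will differ from the SPSS output:

```python
from scipy import stats

# Hypothetical muscle-mass sample in kg (not the guide's dataset)
muscle = [30.1, 32.4, 31.8, 29.9, 33.2, 30.7, 31.1, 32.0, 30.5, 31.6,
          29.4, 32.8, 31.3, 30.9, 31.9, 30.2, 32.2, 31.0, 30.6, 31.4]

# Shapiro-Wilk test
sw_stat, sw_p = stats.shapiro(muscle)
print(f"Shapiro-Wilk: W = {sw_stat:.3f}, p = {sw_p:.3f}")

# Kolmogorov-Smirnov test against a normal distribution using the
# sample's own mean and sample SD (no Lilliefors correction here)
ks_stat, ks_p = stats.kstest(muscle, "norm",
                             args=(stats.tmean(muscle), stats.tstd(muscle)))
print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {ks_p:.3f}")

# p > 0.05 on either test means the data do not significantly
# depart from normal distribution
```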

 


3) Outliers

Another important property within parametric assumptions involves outliers -- you do not want to have many of these. There are several ways to detect these little monsters in the data with charts, such as Stem & Leaf, Histogram, Q-Q Plots, and Boxplots. Below is a Q-Q plot and a Boxplot of the Muscle_kg data from the earlier Explore result (the outliers are underlined in green).

Q-Q plot in SPSS

 

Boxplot in SPSS

 

There are three outliers in the data, and one is an extreme outlier (marked as an asterisk symbol in the boxplot). With three of these little monsters in the data, you can understand better why the two normality tests are disagreeing. And you can also understand why I call them 'monsters'. In this case, as already stated, you would accept the Shapiro-Wilk result and consider the data as having normal distribution.

If you scroll back up to the Q-Q plot, and imagine the three outliers not there, you can see that the rest of the data (40 out of 43 values which is 93%) has a fairly good distribution around the line of fit. Again this may help to understand why these two tests of normality are contradicting each other. It seems the Kolmogorov-Smirnov test is more influenced by the outliers (thus failing normal distribution), while the Shapiro-Wilk test gives more weight to the 93% majority (thus passing normal distribution).
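The boxplot's definition of an outlier can also be checked by hand: SPSS marks values beyond 1.5 x IQR from the quartiles as outliers, and beyond 3 x IQR as extreme outliers (the asterisk). Here is a sketch with made-up data; Python's quartile method differs slightly from SPSS's weighted-average definition, so borderline cases may not match exactly:

```python
import statistics

# Hypothetical sample with one suspicious value (not the guide's dataset)
data = [30.1, 31.2, 29.8, 30.7, 31.5, 30.3, 29.9, 30.9, 31.1, 45.0]

# Quartiles via the inclusive method
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Tukey's boxplot fences: 1.5 x IQR for ordinary outliers,
# 3 x IQR for extreme outliers
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
extreme = [x for x in data if x < q1 - 3 * iqr or x > q3 + 3 * iqr]
print("Outliers:", outliers)
print("Extreme outliers:", extreme)
```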

 


4) Homogeneity of Variance

The final property we want to add into the mix of parametric assumptions is homogeneity of variance. This means a scale variable should have fairly equal variance when split into the respective levels within a categorical variable. For example, the males' data for Muscle_kg should have a similar variance to the females' data for Muscle_kg. I have used a Bar chart (see below) with standard deviation error bars as a good visual check for homogeneity of variance.

Bar chart with standard deviation error bars in SPSS

 

You can see the two error bars (black whiskers) are not exactly equal. And therefore initially you might think the two groups (male and female) do not have homogeneity of variance. However, you can allow for a certain amount of discrepancy from exact equality and still not violate homogeneity of variance.

The error bar in the female data is about 40% longer than the error bar in the male data. This amount of difference is allowable; in fact, it is not until the difference exceeds 200% (double) or even 300% (triple) that the property of homogeneity of variance is violated... amazing!
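The same rule-of-thumb comparison of the two standard deviations can be sketched in Python (the male and female samples below are made-up illustration values, not the guide's dataset):

```python
import statistics

# Hypothetical Muscle_kg samples for two groups (not the guide's dataset)
male = [33.1, 35.2, 34.0, 36.5, 32.8, 35.9, 34.6]
female = [28.4, 31.2, 27.9, 30.5, 29.1, 32.0, 28.8]

sd_male = statistics.stdev(male)
sd_female = statistics.stdev(female)

# Rule of thumb: if the larger SD is no more than about double the
# smaller, homogeneity of variance is usually considered acceptable
ratio = max(sd_male, sd_female) / min(sd_male, sd_female)
print(f"SD male: {sd_male:.2f}, SD female: {sd_female:.2f}, ratio: {ratio:.2f}")
print("Roughly equal variance" if ratio < 2 else "Variances look unequal")
```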

Homogeneity of variance must also be checked when testing two scale variables against each other. In this case a Scatter/Dot chart can be used as a good visual check.

Scatter/Dot chart for homogeneity of variance in SPSS

 

We can see that throughout the Weight variable (70kg - 75kg - 80kg - 85kg - 90kg) the Muscle_kg variable is spread within a fairly parallel pathway -- except at 100kg; however, there are only two values out that far, which is a small percentage (4.5%) of all the values. Therefore it is fairly reasonable to say these two variables have homogeneity of variance.

 


Review

A quick review of our top four parametric assumptions:

  • a scale (interval or ratio) measurement
  • normal distribution
  • few (or no significant) outliers
  • homogeneity of variance

 


Further Study

Happiness... you should now be able to test for parametric assumptions in SPSS. However, if you want to explore further, here are two sites:


Tests for Differences

AA Team Guide for the Student T-test

The Student T-test (aka: Independent Samples T-test) compares the means of two independent groups to determine if there is reasonable evidence (within the sample) that the difference between the population means for these two groups is statistically significant.

 


Test Assumptions

(1) The dependent variable (test variable) is continuous (interval or ratio).

(2) The independent variable (factor variable) should be two independent groups in which the samples (participants) have no relationship with the other participants in their group or with the participants from the other group.

(3) The samples (participants) for each group are taken at random from the population.

(4) Both groups have roughly the same number of participants.

(5) The dependent variable (test variable) has a reasonably normal distribution for both groups. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).

(6) The dependent variable (test variable) has equal variance (homogeneity) between both groups. Homogeneity means you want the standard deviation measurements for the two groups to be reasonably similar to each other.

 


Quick Quiz

(Q1) Does the dependent variable (Weight) have a reasonably normal distribution for both groups (male and female)?

Histogram charts to inspect for normal distribution

 

(Answer: Yes). The data for both groups is certainly not heavily skewed to the left or right tails, and the data values (blue bins) pretty much gather in and around the centre of the bell curve... happiness.

 

(Q2) Does the dependent variable (Weight) have equal variance between both groups (male and female)?

Boxplots to inspect for equal variance

 

(Answer: Yes). The variance (whisker-to-whisker) for both groups, although not exactly equal, is certainly not excessively different to each other. But wait... no, no, no... there are a few outliers, and this could violate one of the assumptions of this test. One interesting point regarding these outliers is that none are measured as extreme. In SPSS extreme outliers are marked with the asterisk (*) symbol in a Boxplot chart.

Here is where SPSS will not help you. You as the researcher must look at the SPSS results and make some relevant interpretation. In this example there are 3 outliers, and the Student T-test would prefer 0 outliers. You the researcher will need to make a decision and support that decision with evidence.

In the write-up for this test you could indicate that you elected to run the Student T-test because the data met the assumptions of normal distribution and homogeneity of variance across the two groups. Equally, there are similar sample sizes in the two groups, with 21 females to 22 males (include the histogram charts, a gender frequency table, and a Kolmogorov-Smirnov or Shapiro-Wilk test as evidence). However, not all the assumptions for this test were met perfectly. There were 3 outliers, which 1) is only 7% of the data, and 2) none of the 3 outliers were measured as extreme (include the Boxplot chart as evidence). In an ideal world this test prefers 0 outliers, but the few outliers that exist are certainly not excessive in number or significant in distance from the median.

 


Student T-test

To start the analysis, click Analyze > Compare Means > Independent-Samples T Test.

Student T-test menu path in SPSS

 

This will bring up the Independent-Samples T Test dialogue box. To carry out the test, move the dependent (scale) variable into the Test Variable(s): placard. Next move the independent (nominal or ordinal) variable into the Grouping Variable: placard. Click on the Define Groups... button, and enter the correct numeric values that represent each group. Click the Continue button, and then click the OK button at the bottom of the main dialogue box.

Student T-test dialogue box in SPSS

 


The Result

The result will appear in the SPSS Output Viewer. In the Group Statistics table there are the key group metrics -- sample size (N), mean, and standard deviation. From these measurements you should develop an intuitive perspective as to whether the test will indicate a statistically significant difference or not. Here in this example, there is approximately a 3 kg difference in weight between the females and the males -- the females (83.42 kg) weigh only 3.7% more than the males (80.40 kg). You would not expect this small difference to be statistically significant. Equally, there is a reasonably similar standard deviation measurement for the two gender groups, and therefore you would expect that the two groups do not violate homogeneity of variance.

Student T-test result in SPSS

 

In the Independent Samples Test table there are the key test metrics -- equality of variance and then all the t-test measurements. In this example, first we see (as we estimated earlier from the two standard deviations) the two groups do not violate homogeneity as the p-value (0.147) in the Levene's Test for Equality of Variances is above the critical 0.05 alpha level. Therefore, in the second part of this table, we read (and report) all the metrics from the top row which is labeled, Equal variances assumed.

These measurements in the second part of the table give you the t-score, the degrees of freedom, the p-value, the mean difference, and the 95% C.I. of the difference. Here in this example the t-score (1.407) is relatively small, and we were expecting that as there is only a 3 kg difference in weight. Equally, the p-value (0.167) is above the critical 0.05 alpha level, indicating the difference between the females' weight and the males' weight is not statistically significant, which we were also expecting as the 3 kg difference is only a 3.7% magnitude of change.

Finally, the 95% C.I. of the difference provides a high / low range as to where the difference (3 kg) between these two gender groups might actually exist in the population. Here the males' weight could actually be 7.3 kg lower than the females' weight, or the males' weight could actually overtake and exceed the females' weight by 1.3 kg. This is a range of about 8.6 kg from high to low, which is fairly narrow. But keep in mind this range starts on a negative scale, crosses the 0 threshold, and then moves onto a positive scale. So, at some point the difference could be 0 kg, that is, the females and males weigh the same -- a nil difference.
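The whole procedure (Levene's test first, then reading the appropriate t-test row) can be reproduced with scipy.stats; the two samples below are made-up illustration values, not the guide's dataset:

```python
from scipy import stats

# Hypothetical weight samples for two independent groups (not the guide's data)
female = [83.1, 85.4, 80.2, 84.7, 82.9, 86.1, 81.5, 83.8]
male = [80.3, 79.1, 82.4, 78.6, 81.2, 80.9, 79.8, 81.7]

# Levene's test for equality of variances (SPSS reports this first)
lev_stat, lev_p = stats.levene(female, male)
print(f"Levene's test: p = {lev_p:.3f}")

# If Levene's p > 0.05, use the "Equal variances assumed" row,
# i.e. the classic Student t-test (equal_var=True); otherwise Welch's
t_stat, p_value = stats.ttest_ind(female, male, equal_var=(lev_p > 0.05))
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```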

 


Further Study

Happiness... you should now understand how to perform the Student T-test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:


AA Team Guide for the Mann-Whitney U Test

The Mann-Whitney U test compares the medians or mean ranks of two independent groups and is commonly used when the dependent variable is either categorical (ordinal) or continuous (interval or ratio) and does not meet the assumptions for the Independent Samples T-test (aka: Student T-test).

 


Test Assumptions

(1) The dependent variable (test variable) can be categorial (ordinal) or continuous (interval or ratio) in its measurement type.

(2) The independent variable should be two independent groups in which the samples (participants) have no relationship with the other participants in their group or with the participants from the other group.

(3) The samples (participants) for each group are taken at random from the population.

(4) The sample size can be disproportionate or unbalanced in the number of participants in each group.

(5) The dependent variable (test variable) for one or both groups can be non-normal in its distribution. Non-normal distribution means the data can be heavily skewed to the left or right tails, and/or it may have significant outliers.

(6) The dependent variable (test variable) for one or both groups may (or may not) have a similar shape (homogeneity) in its variance. It is extremely unlikely that the variance of the two groups will be identical, and therefore, the Mann-Whitney U test will test between the mean ranks of the dependent variable for both groups.

 


Quick Quiz

(Q1) Would you use the Mann-Whitney U test for the following data on users and non-users of a weight training supplement ?

Frequency table and tests for normality in SPSS

 

(Answer: Yes). The participant count (frequency) for the two groups is certainly not in a balanced proportion with User at 52 (73%) and Non-user at 19 (27%). Also the dependent variable (Muscle_kg) violates normal distribution for the smaller Non-user group as indicated by both the Kolmogorov-Smirnov (p = .006) and the Shapiro-Wilk (p = .020) tests of normality, which are below the critical 0.05 alpha level.

 

(Q2) Does the Boxplot give support for using the Mann-Whitney U test to compare between the users and non-users of the weight training supplement ?

Boxplot for supplement users and their muscle data in SPSS

 

(Answer: Yes). The total variance (whisker-to-whisker and including outliers) between the two groups, although not exactly equal, is certainly not wildly different, and you could argue the two groups have homogeneity of variance. However, there are several outliers in the non-user group, and one is an extreme outlier, as marked with the asterisk (*) symbol. This number and condition of outliers in the non-user group would give support for choosing the Mann-Whitney U test to analyse the data.

 


Mann-Whitney U Test

To start the analysis, click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples...

Mann-Whitney U menu path in SPSS

 

This will bring up the Two-Independent-Samples Tests dialogue box. To carry out the test, move the dependent variable (scale or ordinal) into the Test Variable List: placard. Next move the independent variable (nominal or ordinal) into the Grouping Variable: placard. Click on the Define Groups... button, and enter the correct numeric values that represent each group. Click the Continue button. Verify that the Mann-Whitney U test is selected in the Test Type section. Finally, click the OK button on the bottom of the main dialogue box.

Mann-Whitney U (Two Independent Samples) dialogue box in SPSS

 


The Result

The result will appear in the SPSS Output Viewer. In the Ranks table there are the key group metrics -- sample size (N) and mean rank. From these measurements you should develop an intuitive perspective as to whether the test will indicate a statistically significant difference or not. Here in this example, there is approximately a 3.9 point difference between the mean rank of the non-users (19.84) and the mean rank of the users (23.71) as regards their muscle mass. This is a difference of about 4 places in rank, and you would not expect this small difference to be statistically significant.

Mann-Whitney U test result in SPSS

 

In the Test Statistics table there are the key test metrics -- the Mann-Whitney U score, the p-value (Asymp. Sig.), and some researchers will also report the Z score. In this example, we see (as we estimated earlier from the two mean ranks) the difference between the two groups is not statistically significant as the p-value (0.316) is above the critical 0.05 alpha level. In your report write-up you should also include the Mann-Whitney U score as further support that indicates the difference is not statistically significant.

The Mann-Whitney U test converts the raw data values for the dependent variable into ranks -- 1st, 2nd, 3rd, 4th, and so forth. Then it adds the converted ranks for all the participants in their respective group to achieve that group's total "sum of ranks". If you divide the sum of ranks by the number of participants, you get the mean rank (or what is the typical participant's rank). Remember, in statistics we tend to determine 1) what is a typical member in my sample and 2) what is the variance around that typical member.
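The rank conversion described above, and the test itself, can be sketched with scipy.stats. The user and non-user samples below are made-up illustration values, chosen with no ties so the manual ranking stays simple:

```python
from scipy import stats

# Hypothetical Muscle_kg values for users and non-users (not the guide's data)
users = [33.2, 34.8, 32.9, 35.1, 34.0, 33.7]
non_users = [31.5, 32.4, 30.8, 33.0]

# Manual sum of ranks / mean rank, as the test computes internally
combined = sorted(users + non_users)
rank_of = {v: i + 1 for i, v in enumerate(combined)}  # no ties in this sample
user_ranks = [rank_of[v] for v in users]
print("User sum of ranks:", sum(user_ranks))
print("User mean rank:", round(sum(user_ranks) / len(users), 2))

# The test itself
u_stat, p_value = stats.mannwhitneyu(users, non_users, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
```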

The Mann-Whitney U test is much simpler to understand and appreciate, as it is not concerned with normal distribution of the dependent variable, and it is not concerned with homogeneity of variance between the two groups.

 


Further Study

Happiness... you should now understand how to perform the Mann-Whitney U test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:


AA Team Guide for the One-Way ANOVA Test

In this tutorial, we will look at how to conduct the One-Way ANOVA test in SPSS (aka: One Factor ANOVA or One-Way Analysis of Variance), and how to interpret the results of the test. The One-Way ANOVA test compares the means of three or more independent groups to determine if there is reasonable evidence that the population means for these three or more groups have a statistically significant difference.

 


Test Assumptions

(1) The dependent variable (test variable) is continuous (interval or ratio).

(2) The independent variable (factor variable) is categorical (nominal or ordinal) and should be three or more independent groups in which the samples (participants) have no relationship with the other participants in their group or with the participants from the other groups.

(3) The samples (participants) for each group are taken at random from the population.

(4) All the groups have roughly the same number of participants.

(5) The dependent variable (test variable) has a reasonably normal distribution for each group. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).

(6) The dependent variable (test variable) has equal variance (homogeneity) between all the groups. Homogeneity means you want the standard deviation measurements for the groups to be roughly the same as each other.

 


Quick Quiz

(Q1) Does the dependent variable (Fat %) have a reasonably normal distribution across the three groups?

Histogram chart for percentage of fat

 

(Answer: Yes). The data for the three groups is certainly not heavily skewed to the left or right tails, and the data values (blue bins) pretty much gather in and around the centre of the bell curve... happiness.

 

(Q2) Does the dependent variable (Fat %) have equal variance between the three groups?

Boxplot chart for percentage of fat

 

(Answer: Yes). The variance (whisker-to-whisker) for the three groups, although not exactly equal, is certainly not excessively different to each other. But wait... no, no, no... there are a few outliers, and this could violate one of the assumptions of this test. One interesting point regarding these outliers is that none are measured as extreme. In SPSS extreme outliers are marked with the asterisk (*) symbol in a Boxplot chart.

Here is where SPSS will not help you. You as the researcher must look at the SPSS results and make some relevant interpretation. In this example there are two outliers, and the One-Way ANOVA test would prefer zero outliers. You the researcher will need to make a decision and support that decision with evidence.

In the write-up for this test you could indicate that you elected to run the ANOVA test because the data met the assumptions of normal distribution and homogeneity of variance across the three groups. Equally, there are similar sample sizes, from 13 to 15 participants in each group (include the histogram charts, a participants per sessions Frequency table, and a Kolmogorov-Smirnov or Shapiro-Wilk test as evidence). However, not all the assumptions for this test were met perfectly. There were two outliers, which 1) is only 4.6% of the data, and 2) neither of the two outliers was measured as extreme (include the Boxplots as evidence). In an ideal world this test prefers 0 outliers, but the few outliers that exist are certainly not excessive in number nor significant in distance from the median.

 


One-Way ANOVA Test

To start the analysis, click Analyze > Compare Means > One-Way ANOVA.

One-Way ANOVA menu path in SPSS

 

This will bring up the One-Way ANOVA dialogue box. To carry out the test, move the dependent (scale) variable into the Dependent List: placard. Next move the independent (nominal or ordinal) variable into the Factor: placard. There is an Options... button where you can select descriptive statistics for the three groups, a homogeneity of variance test, and other extras such as a means plot.

Note: If the dependent variable violates the homogeneity of variance test (a p-value below the critical 0.05 alpha level), then researchers will re-run the One-Way ANOVA test and in the Options... section select the Welch statistic. This variant of the One-Way ANOVA test is not concerned with homogeneity of variance between the different groups.

After selecting any extra options that you want, click the Continue button, and then click the OK button at the bottom of the main dialogue box.

One-Way ANOVA dialogue box in SPSS

 


The Result

The result will appear in the SPSS Output Viewer. In the Descriptives table there are the key group metrics -- sample size (N), mean, standard deviation, and 95% C.I. From these measurements you should develop an intuitive perspective as to whether the test will indicate a statistically significant difference or not. Here in this example, there is approximately a 1.5 to 1.9 point difference in fat (%) between the three groups, with 3 times a week at 14.8%, 4 times a week at 16.3%, and 5 times a week at 14.4%. You would not expect this small difference to be statistically significant. Equally, there are almost identical standard deviation measurements for the three groups, and therefore you would expect that the groups do not violate homogeneity of variance.

One-Way ANOVA test results in SPSS

 

In the Test of Homogeneity of Variances table there are the key test metrics for homogeneity (equality) of variance. If the data were normally distributed and there were no significant outliers, then you would expect all four measurements to agree. And this is true in our example, with all the p-values virtually the same, between 0.95 and 0.96. The measurement you would refer to (and quote) in your write-up is the top row, titled Based on Mean. Here in our example the p-value is 0.955 (well above the critical 0.05 alpha level), which indicates that between the three groups the dependent variable does not violate homogeneity of variance.

Finally, in the ANOVA table you have the degrees of freedom, the F-score, and the p-value. Here in this example the F-score (1.418) is relatively small, and we were expecting that as there are only 1.5 and 1.9 point differences in fat (%). Equally, the p-value (0.254) is above the critical 0.05 alpha level, indicating the difference between the three groups is not statistically significant, which we were expecting based on the results in the Descriptives table mentioned earlier.
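As a side note, the same One-Way ANOVA and homogeneity check can be reproduced outside SPSS, for example in Python with SciPy. The values below are made up for illustration and are not the guide's dataset.

```python
from scipy import stats

# Hypothetical fat (%) values for the three exercise groups
# (made-up numbers, not the guide's data).
three_per_week = [14.1, 15.2, 14.8, 13.9, 15.6, 14.5]
four_per_week  = [16.0, 16.8, 15.9, 16.5, 16.2, 16.4]
five_per_week  = [14.0, 14.9, 14.2, 13.8, 15.0, 14.5]

# Levene's test (the "Based on Mean" row in SPSS): a p-value above
# 0.05 means homogeneity of variance is not violated.
lev_stat, lev_p = stats.levene(three_per_week, four_per_week,
                               five_per_week, center='mean')

# One-Way ANOVA: the F-score and p-value from SPSS's ANOVA table.
f_stat, p_value = stats.f_oneway(three_per_week, four_per_week, five_per_week)
print(f"Levene p = {lev_p:.3f}, F = {f_stat:.3f}, p = {p_value:.3f}")
```

If Levene's p falls below 0.05, the Welch variant mentioned above would be the fallback; SciPy has no direct Welch ANOVA call, so a separate package would be needed there.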

 


Post Hoc Testing

Secondary post hoc testing can be completed if the original One-Way ANOVA result indicated that the differences between the groups were statistically significant. You would need to re-run the test, and in the One-Way ANOVA dialogue box click the Post Hoc button.

Post Hoc dialogue box in SPSS

 

This will open the One-Way ANOVA: Post Hoc Multiple Comparisons dialogue box, and you can select one of the post hoc tests recommended by your tutor. The three most common seem to be LSD, Bonferroni, and Tukey; in my example I have selected the LSD test.

 


Post Hoc Results

Post Hoc test results in SPSS

 

In the Multiple Comparisons table you are looking for any comparison with a large mean difference, which should result in a corresponding p-value below the critical 0.05 alpha level. In my example there are two comparisons like this: 3 times a week versus 5 times a week, and 4 times a week versus 5 times a week. In your write-up you would list these two comparisons with the evidence of the mean difference and p-value respectively.
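For comparison outside SPSS, SciPy (1.8 or later) offers a Tukey HSD post hoc; here is a sketch with made-up group data, not the guide's dataset.

```python
from scipy import stats

# Made-up measurements for three groups.
g1 = [14.1, 15.2, 14.8, 13.9, 15.6]
g2 = [16.0, 16.8, 15.9, 16.5, 16.2]
g3 = [12.0, 12.9, 12.2, 11.8, 13.0]

# Tukey's HSD returns a p-value for every pairwise comparison,
# analogous to SPSS's Multiple Comparisons table.
res = stats.tukey_hsd(g1, g2, g3)
print(res.pvalue)  # 3x3 matrix of pairwise p-values
```

The LSD and Bonferroni options have no single SciPy call; a Bonferroni adjustment can be approximated by multiplying plain pairwise t-test p-values by the number of comparisons.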

 


Further Study

Happiness... you should now understand how to perform the One-Way ANOVA in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:


AA Team Guide for the Kruskal-Wallis Test

The Kruskal-Wallis test compares the medians or mean ranks of three or more independent groups and is commonly used when the dependent variable is either categorical (ordinal) or continuous (interval or ratio) and does not meet the assumptions for the One-Way ANOVA test.

 


Test Assumptions

(1) The dependent variable (test variable) can be categorical (ordinal) or continuous (interval or ratio) in its measure type.

(2) The independent variable should be three or more independent groups in which the samples (participants) have no relationship with the other participants in their group or with the participants from the other groups.

(3) The sample size can be disproportionate or unbalanced in the number of participants in each group.

(4) The dependent variable (test variable) for one or all of the groups can be non-normal in its distribution. Non-normal distribution means the data can be heavily skewed to the left or right tails, and/or it may have significant outliers.

(5) The dependent variable (test variable) for one or all of the groups may (or may not) have a similar shape (homogeneity) in its variance. It is extremely unlikely that the variance for the groups will be identical; therefore, the Kruskal-Wallis test compares the mean ranks of the dependent variable for all the groups.

 


Quick Quiz

(Q1) Do the three bread types (White, Brown, Seeded) have balanced or equal proportions?

Frequency tables for bread types in SPSS

(Answer: No) The Brown and Seeded bread types are fairly balanced (equal) in their sample sizes at 24 (16.1%) and 28 (18.8%) respectively. However, the White bread type has a sample size that is more than 3 times larger at 97 (65.1%). The Kruskal-Wallis test is better suited to manage groups with disproportionate (unequal) sample sizes.

 

(Q2) Do the three bread types (White, Brown, Seeded) have a normal distribution?

Histogram chart for saturated fat in three bread types

(Answer: No) The Seeded bread appears the most normal, with the data values located around the mean (top of the bell curve). However, the Brown bread is starting to show a higher distribution of data values on the left tail and some outliers above 2.0 grams. And the White bread shows this same skewed distribution (overweight on the left tail and outliers on the right tail) to a much higher degree, with several extreme outliers at 4.0 to 8.0 grams. The Kruskal-Wallis test is better suited to manage groups where the test data are not normally distributed and/or have a high number of outliers.

 


Kruskal-Wallis Test

To start the analysis, click Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples

Kruskal-Wallis menu path in SPSS

 

This will bring up the Tests for Several Independent Samples dialogue box. To carry out the test, move the dependent (scale or ordinal) variable into the Test Variable List: box. Next, move the independent (nominal or ordinal) variable into the Grouping Variable: box. Click on the Define Range... button, and enter the correct numeric values that represent all the groups. Click the Continue button. Verify that the Kruskal-Wallis test is selected in the Test Type section. Finally, click the OK button at the bottom of the main dialogue box.

 

Dialogue box for Kruskal-Wallis test in SPSS

 


The Result

The result will appear in the SPSS Output Viewer. In the Ranks table there are the key group metrics -- sample size (N) and mean rank. From these measurements you should develop an intuitive perspective as to whether the Kruskal-Wallis test will indicate a statistically significant difference or not. Here in this example, there is approximately a 9.1 point difference between the mean rank of the White bread (65.12) and the mean rank of the Brown bread (74.29) as regards their saturated fat. You would not expect this moderate (13.9%) difference to be statistically significant.

However, there is approximately a 44.7 point difference between the mean rank of the White bread (65.12) and the mean rank of the Seeded bread (109.84). You would expect this large (68.6%) difference to be statistically significant. Equally, there is approximately a 35.5 point difference between the mean rank of the Brown bread (74.29) and the mean rank of the Seeded bread (109.84). You would expect this large (47.8%) difference to be statistically significant.

 

Kruskal-Wallis test result in SPSS

 

In the Test Statistics table there are the key test metrics -- the Kruskal-Wallis H score, the degrees of freedom (df), and the p-value (Asymp. Sig.). In this example, we see (as we estimated earlier from the mean ranks between the bread types) that the difference between the three groups is statistically significant, as the p-value (0.000) is below the critical 0.05 alpha level. In your report write-up you should also include the Kruskal-Wallis H score as further support that the difference is statistically significant.

The Kruskal-Wallis test converts the raw data values for the dependent variable into a rank -- 1st, 2nd, 3rd, 4th, and so forth. Then it adds all the converted ranks for all the participants in their respective group to achieve that group's "sum of ranks". If you divide the sum of ranks by the number of participants, you will get the mean rank (or what a typical participant's rank is). Remember, in statistics we tend to determine 1) what is a typical member in our sample and 2) what is the variance around that typical member.
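The sum-of-ranks arithmetic described above can be sketched in a few lines of Python, using two tiny made-up groups:

```python
from scipy.stats import rankdata

# Two tiny made-up groups; rank all values across the pooled data.
group_a = [1.2, 1.5, 1.1]
group_b = [2.0, 2.4, 1.9]
ranks = rankdata(group_a + group_b)        # 1st, 2nd, 3rd, ... overall
ranks_a, ranks_b = ranks[:3], ranks[3:]

# sum of ranks / number of participants = mean rank
mean_rank_a = sum(ranks_a) / len(ranks_a)  # (2 + 3 + 1) / 3 = 2.0
mean_rank_b = sum(ranks_b) / len(ranks_b)  # (5 + 6 + 4) / 3 = 5.0
```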

The Kruskal-Wallis test is much simpler to understand and appreciate, as it is not concerned with normal distribution of the dependent variable, and it is not concerned with homogeneity of variance between the three groups.

Sadly, what it does not report is exactly between which groups the statistical difference exists. In our example, is the statistical difference between the White and Brown breads, between the White and Seeded breads, or between the Brown and Seeded breads? To find exactly where the statistical difference exists between our three bread groups, you would need to run three separate Mann-Whitney U tests, one on each of the three pair-wise comparisons listed above.
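Outside SPSS, the same two-stage analysis -- Kruskal-Wallis first, then pairwise Mann-Whitney U follow-ups -- can be sketched in Python with SciPy. The bread values are made up, and a Bonferroni adjustment is added because three follow-up tests are run:

```python
from itertools import combinations
from scipy import stats

# Made-up saturated fat (g) values for the three bread types.
breads = {
    "White":  [0.4, 0.5, 0.3, 0.6, 0.5, 0.4],
    "Brown":  [0.5, 0.7, 0.6, 0.5, 0.8, 0.6],
    "Seeded": [1.1, 1.3, 1.2, 1.4, 1.0, 1.2],
}

# Kruskal-Wallis H score and overall p-value.
h_stat, p_value = stats.kruskal(*breads.values())

# Pairwise follow-ups, Bonferroni-adjusted for the 3 comparisons.
pairs = list(combinations(breads.items(), 2))
for (name1, x), (name2, y) in pairs:
    u_stat, p = stats.mannwhitneyu(x, y, alternative="two-sided")
    print(name1, "vs", name2, "adjusted p =", min(p * len(pairs), 1.0))
```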

 


Further Study

Happiness... you should now understand how to perform the Kruskal-Wallis test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:


AA Team Guide for the Paired Samples T-test

The Paired Samples T-test (aka: Repeated Measures T-test) compares the means of two measurements taken from the same participant or sample object. It is commonly used for a measurement at two different times (e.g., pre-test and post-test score with an intervention administered between the two scores), or a measurement taken under two different conditions (e.g., a test under a control condition and an experiment condition).

The Paired Samples T-test determines if there is evidence that the mean difference between the paired measurements is significantly different from a zero difference.

 


Test Assumptions

(1) The dependent variable (test variable) is continuous (interval or ratio).

(2) The independent variable consists of two related groups. Related groups means the participants (or sample objects) for both measurements of the dependent variable are the same participants.

(3) The participants (or sample objects) are taken at random from the population.

(4) The dependent variables (test variables) have a reasonable normal distribution. Normal distribution means you do not want data heavily skewed to the left or right tails, and you do not want significant outliers (better to have no outliers).

(Note) When testing the assumptions related to normal distribution and outliers, you must create and use a new variable that represents the difference between the two paired measurements. Do not test the original two paired measurements themselves.

 


Quick Quiz

(Q1) You want to examine the alertness of both male and female students at 09:00 am lectures and at 1:00 pm (after lunch) lectures. Do you use the Independent Samples T-test or the Paired Samples T-test?

University students at lectures

 

(Answer: Independent Samples T-test). Although the experiment design sounds like a before and after intervention, it would be highly unlikely that at the two different times (09:00 am and 1:00 pm) the students in the lectures would be the same students.

 

(Q2) You have surveyed students on what they eat (over one week) for breakfast and lunch. A diet-plan app has calculated the energy level of the food eaten for each participant. You used SPSS to create a new variable which is the difference between the breakfast meal and the lunch meal, and you created a histogram to check for normal distribution and outliers. From the histogram below, would you use the Independent Samples T-test or the Paired Samples T-test?

A histogram chart for energy (kcal) in SPSS

 

(Answer: Paired Samples T-test). Here in this experiment design there are the same students surveyed for their breakfast meal and for their lunch meal. Equally, the histogram shows a very reasonable normal distribution (no extreme skewness on the left or right tails) and with no significant outliers... happiness!

 


Paired Samples T-test

To start the analysis, click Analyze > Compare Means > Paired Samples T Test

Paired Samples menu path in SPSS

 

This will bring up the Paired-Samples T Test dialogue box. To carry out the test, move the two dependent (scale) variables into the Paired Variables: box. And then click the OK button at the bottom of the dialogue box.

Paired-Samples T Test dialogue box in SPSS

 


The Result

The result will appear in the SPSS Output Viewer. In the Paired Samples Statistics table there are the key group metrics -- sample size (N), mean, and standard deviation. From these measurements you should develop an intuitive perspective as to whether the test will indicate a statistically significant difference or not. Here in this example, there is approximately a 505 calorie difference (on average) between the lunch meal (880 calories) and the dinner meal (1385 calories), which is a 57.4% increase in calories from lunch to dinner. You would expect this sizeable difference to be statistically significant.

Paired Samples T-test results in SPSS

 

In the Paired Samples Test table there are the key test metrics -- the 95% confidence interval, the t-score, the degrees of freedom (df), and the p-value. In this example, we can see (as we estimated earlier from the two means) that the t-score (34.242) is extremely large, and we were expecting this as there was a 505 calorie difference between the two meals. Equally, the p-value (0.000) is well below the critical 0.05 alpha level, indicating the difference in calories between the lunch meal and the dinner meal is statistically significant, which we were also expecting as the 505 calorie difference is a 57.4% magnitude of change.

Finally, the 95% C.I. of the difference provides a high/low range as to where this difference (505 calories) between the two meals might actually exist in the population. Here the calorie difference could actually be as high as 534 calories or as low as 474 calories. This is only a 60 calorie range from high to low, providing strong confidence that the mean difference in our sample accurately represents what is likely to be the mean difference in the population.
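As an aside, the Paired Samples T-test and the 95% C.I. of the difference can be reproduced in Python with SciPy; the meal values below are made up, not the guide's data.

```python
import math
import statistics
from scipy import stats

# Made-up calories for the same 8 participants' lunch and dinner.
lunch  = [850, 900, 870, 880, 910, 860, 890, 880]
dinner = [1390, 1400, 1360, 1385, 1420, 1350, 1395, 1380]

# Paired Samples T-test: t-score and p-value (df = n - 1).
res = stats.ttest_rel(dinner, lunch)

# 95% C.I. of the mean difference, built from the paired differences.
diffs = [d - l for d, l in zip(dinner, lunch)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
print(f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}, 95% CI = {ci}")
```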

 


Further Study

Happiness... you should now understand how to perform the Paired Samples T-test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:


AA Team Guide for the Wilcoxon Sign Test

The Wilcoxon Sign test (aka: Wilcoxon Signed-Rank) compares two measurements taken from the same participant or sample object. It is commonly used for a measurement at two different times (e.g., pre-test and post-test scores with an intervention administered between the two scores), or a measurement taken under two different conditions (e.g., a test under a control condition and an experiment condition).

The Wilcoxon Sign test determines if there is evidence that the median difference between the paired measurements is significantly different from a zero difference.

 


Test Assumptions

(1) The dependent variable (test variable) is continuous (interval or ratio) or it can be categorical (ordinal).

(2) The independent variable consists of two related groups. Related groups means the participants (or sample objects) for both measurements of the dependent variable are the same participants.

(3) The participants (or sample objects) are taken at random from the population.

(4) The dependent variables (test variables) do not need to have a normal distribution. This test does not require normality or homoscedasticity (the data having the same scatter or spread) within the dependent variables. Non-normal distribution means the data can be skewed to the left or right tails, and the data can have a significant number of outliers.

 


Quick Quiz

(Q1) You want to examine caffeine markers in a group of students. One week the students will receive a normal cup of coffee (control group), and the next week the same students will receive a cup of coffee with an additive (experiment group). The research is set up as a double blind, so that neither the students nor the researchers know which cup of coffee is normal and which has the additive. Could you use the Wilcoxon Sign test to analyse the data?

Students drinking coffee

 

(Answer: Yes) The experiment design is set up as two related (dependent) groups tested twice, once as the control group and once as the experiment group.

 

(Q2) You have collected the data for the two coffee groups (control and experiment). You used SPSS to create a boxplot to visualise the data side-by-side. From the boxplot, as an intuitive perspective, would the Wilcoxon Sign test indicate a statistically significant difference?

Boxplot chart in SPSS

 

(Answer: Yes) Here in this boxplot there is very little overlap between the two interquartile ranges (IQR). Remember, the IQRs represent 50% of the data values. Therefore, for almost 50% of the experiment group (or greater if we include the whiskers), the caffeine markers are different from when the same person was in the control group.

 


Wilcoxon Sign Test

To start the analysis, click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples

Wilcoxon Sign Test menu path in SPSS

 

This will bring up the Two-Related-Samples Tests dialogue box. To carry out the test, move the two dependent (scale or ordinal) variables into the Test Pairs: box. And then click the OK button at the bottom of the dialogue box.

Dialogue box for Wilcoxon Sign test in SPSS

 


The Result

The result will appear in the SPSS Output Viewer. In the Ranks table there are the key group metrics -- sample size (N), mean rank, and sum of ranks. From these measurements you should develop an intuitive perspective as to whether the test will indicate a statistically significant difference or not. Here in this example, there are 0 negative ranks to 50 positive ranks -- take note there are only 50 students in the sample. So, out of 50 students all of them had a positive rank; not a single student had a negative rank. If you had 50 darts and threw them at a dartboard (a random action), would they all land on the top half, with not a single dart landing on the bottom half? Never! Something is happening here that is violating the laws of random probability and equality. If there is no bias, trickery, or tom-foolery, if everything is equal with the students and the coffee, then you would expect 25 negative ranks and 25 positive ranks. As the result is extremely skewed with everyone in a positive rank, we would expect the test to indicate the difference is statistically significant.

Wilcoxon Sign Test result in SPSS

 

As the footnotes under the Ranks table indicate, a negative rank is where the experiment group's caffeine marker (BPM) is lower than the same person's caffeine marker when they were in the control group. In other words, their second test measurement (experiment) was lower than their first test measurement (control). And, of course, a positive rank is just the opposite. As mentioned earlier, in the mathematics of random probability we are expecting a 25 to 25 ratio, that is, half the students to have a lower second score and half the students to have a higher second score. The further we move away from this equal and random ratio, the more likely the result will be statistically significant.

In the Test Statistics table there are the key test metrics -- the test score (Z) and the p-value (Asymp. Sig.). In this example, we can see (as we estimated earlier from the negative and positive ranks) that the z-score (6.169) is extremely large, and we were expecting this as there was a 0 to 50 negative-to-positive ratio in the ranks. Equally, the p-value (0.000) is well below the critical 0.05 alpha level, indicating the difference in the caffeine markers from the control to the experiment is statistically significant, which we were also expecting as the 0 to 50 ratio in ranks is a 100% magnitude of change.
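The same signed-rank logic can be sketched in Python with SciPy; the BPM values below are made up, using 10 students rather than the guide's 50.

```python
from scipy import stats

# Made-up caffeine markers (BPM) for the same 10 students under
# the control coffee and the additive coffee.
control    = [70, 72, 68, 75, 71, 69, 74, 73, 70, 72]
experiment = [78, 80, 75, 82, 79, 76, 81, 80, 77, 79]

# Signed-rank test on the paired differences; here every difference
# is positive, so the smaller rank sum (the statistic) is 0.
stat, p = stats.wilcoxon(experiment, control)
print(f"W = {stat}, p = {p:.4f}")
```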

 


Further Study

Happiness... you should now understand how to perform the Wilcoxon Sign Test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:


Tests for Relationships

AA Team Guide for the Chi-Square Test for Independence

In this guide, we will look at how to conduct the Chi-Square Test for Independence (aka: Chi-Square Test of Association or Pearson's Chi-Square Test), and how to interpret the results of the test. The Chi-Square Test for Independence determines whether there is a relationship (association) between categorical variables. Equally, this test only determines an association between categorical variables, and will not provide any indications about causation.

 


Test Assumptions

(1) Only categorical variables can be analysed.

(2) Each categorical variable (nominal or ordinal) should have two or more independent groups in which the samples (participants) have no relationship with the other participants in their group or with the participants in the other groups in the variable.

(3) The samples (participants) for each variable are taken at random from the population.

(4) The categorical variables are not paired samples (pre-test/post-test observations).

(5) There should be relatively large sample sizes for each group in all the variables (e.g. the expected frequencies should be at least 5 for the majority (80%) of the groups for all the variables).

 


Quick Quiz

(Q1) You want to test for an association between gender and the likelihood of indicating that better lighting will improve safety. Can you use the two variables listed below?

Two frequency tables in SPSS

 

(Answer: Yes). Both variables are categorical in their type. Equally, there are adequate sample sizes for all the groups.

 

(Q2) Does the clustered bar chart for the two test variables indicate there is likely to be a statistically significant association?

A clustered bar chart in SPSS

 

(Answer: Yes). We can see the males are more associated with the No response, while the females are more associated with the Yes response.

 


Chi-Square Test for Independence

To start the analysis, click Analyze > Descriptive Statistics > Crosstabs

Menu path for Chi-Square Test for Independence in SPSS

 

This will bring up the Crosstabs dialogue box. To perform the analysis, move one categorical variable into the Row(s) box and the other categorical variable into the Column(s) box. Next, click on the Statistics option button.

The Crosstabs dialogue box in SPSS

 

In the Crosstabs: Statistics box tick the Chi-Square option, and then click the Continue button to return to the main dialogue box. After returning to the main Crosstabs dialogue box, click the Cells option button.

In the Crosstabs: Cell Display box tick the Observed and Expected options, and then click the Continue button to return to the main dialogue box.

The Crosstabs: Cells Display dialogue box in SPSS

 

Next, at the bottom left corner of the main dialogue box, tick the Display clustered bar charts option. Finally, click the OK button at the bottom of the main dialogue box.

 


The Result

The result will appear in the SPSS Output Viewer. The Crosstabulation table provides the observed count and expected count for the groups in relation to each categorical variable. Similar to the clustered bar chart discussed earlier, these observed and expected counts should give you an intuitive perspective as to whether an association is likely (or not likely) to exist.

In our example, for the males we observed a 14 to 7 (No/Yes) split, whereas the expected split is 10.5 to 10.5; we are 3.5 participants out of balance in each direction from our expected split. For the females we observed a 5 to 12 (No/Yes) split, whereas the expected split is 8.5 to 8.5; again we are 3.5 participants out of balance in each direction from our expected split.

Result for the Chi-Square Test for Independence in SPSS

 

The Chi-Square Tests table provides the test metrics -- the Pearson Chi-Square statistic and the p-value. Here in our example we have a reasonably strong Pearson chi-square statistic (5.216) and a p-value (0.022) which is below the critical 0.05 alpha level, therefore indicating a statistically significant result.

In your write-up you should quote both these metrics as evidence that there is a statistically significant association: males were more likely to answer No to the question of whether better lighting would improve safety, while females were more likely to answer Yes to this same question.
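This result can also be checked outside SPSS from the observed counts alone; here is a sketch in Python with SciPy, using the counts from the Crosstabulation table above (correction=False is assumed here, to match SPSS's plain Pearson Chi-Square row rather than the continuity-corrected one).

```python
from scipy.stats import chi2_contingency

# Observed counts: rows = gender (male, female), cols = (No, Yes).
observed = [[14, 7],
            [5, 12]]

# correction=False gives the plain Pearson chi-square statistic.
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.3f}")
# → chi2 = 5.216, df = 1, p = 0.022, matching the SPSS output;
# the expected counts are [[10.5, 10.5], [8.5, 8.5]].
```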

 


Further Study

Happiness... you should now understand how to perform the Chi-Square Test for Independence in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:


AA Team Guide for the Pearson's Correlation

The Pearson Correlation test (aka: Pearson Product-Moment) measures the strength and direction, which is the r coefficient in the test, of a linear relationship between two continuous variables. The Pearson's correlation attempts to draw a line of best fit through the data of the two variables, and the r coefficient indicates how far away the data points are from this line of best fit (e.g., if the data values are all compacted on and squeezed around the line, the r coefficient is high; conversely, if the data values are spread out and dispersed away from the line, the r coefficient is low).

 


Test Assumptions

(1) The two test variables are continuous (interval or ratio).

(2) There is a linear relationship between the two test variables.

(3) The participants (samples) have no relationship between the other participants, and are taken at random from the population.

(4) The two test variables have a reasonably normal distribution. Normal distribution means the data should not be heavily skewed to the left or right tails, and there should not be significant outliers (better to have no outliers).

(5) The two test variables have equal variance (homogeneity) when compared to each other. Homogeneity means you want the variance in the data (as plotted between two variables) to be reasonably the same along the entire line of best fit.

 


Quick Quiz

(Q1) Do the two test variables have a reasonably normal distribution?

Histogram charts in SPSS

 

(Answer: Yes). The data for both variables are certainly not heavily skewed to the left or right tails, and the data values (blue bins) pretty much gather in and around the centre of the bell curve. That said, the distribution for Protein is starting to spread toward the two tails which may be questionable, and a Kolmogorov-Smirnov or Shapiro-Wilk test would be advisable to run to confirm any suspicions.

 

(Q2) Do the two test variables have homogeneity between each other?

Scatter chart as relationship between protein and energy

 

(Answer: Yes). The variance (between the two red dashed lines) from the plotted data values between the two variables (X-axis is Protein and Y-axis is Energy) is reasonably the same along the entire line of best fit. Equally, we can see from this scatter chart that the relationship is linear as the plotted data values are progressing in one direction at relatively the same rate (magnitude) of movement.

 


Pearson's Correlation Test

To start the analysis, click Analyze > Correlate > Bivariate

Pearson's Correlation menu path in SPSS

 

This will bring up the Bivariate Correlations dialogue box. To carry out the test, move the two test (scale) variables into the Variables: box. Next, in the Correlation Coefficients section be sure the Pearson option is ticked, and untick the Flag significant correlations option. Finally, click the OK button at the bottom of the dialogue box.

Pearson's Correlation dialogue box in SPSS

 


The Result

The result will appear in the SPSS Output Viewer. In the Correlations table there are the key test metrics -- sample size (N), Pearson's coefficient (which is the r score), and the p-value (which is Sig. (2-tailed) in SPSS). Here in this example, the r score (.373) is low-medium in its strength, and it is positive in its direction. The p-value (.000), which is below the critical 0.05 alpha level, indicates that this low-medium correlation is statistically significant.

Pearson's Correlation test result in SPSS

 

I have included a General Interpretation table for the correlation coefficient metric by Prof. Halsey. Remember, the r score can vary from -1.00 through .000 to +1.00. The further the r score moves away from .000 (zero), the stronger the correlation is, which is true for both positive and negative scores.

These measurements indicate that as the protein levels in the 149 breads tested increase, so also the energy (kcal) increases, as the r score is a positive number. However, the strength (or magnitude) of this correlation is low-medium (r = .373). Finally, this mild correlation is statistically significant (p < .001), which implies there is good evidence from the sample data that this correlation between protein levels in breads and energy (kcal) is very likely to exist for white, brown, and seeded breads in general.

Oops... what's missing? (Answer: the 95% C.I.) To produce the 95% confidence interval, re-run the test; in the Bivariate Correlations dialogue box there is the Bootstrap... button. Click this button to activate this metric.
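For a picture of what that bootstrap does, here is a sketch in Python with SciPy and NumPy: compute r, then resample the pairs many times and take percentiles of the resampled r values as the 95% C.I. The data are synthetic, not the guide's bread dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic protein (g) and energy (kcal) values for 149 "breads".
protein = rng.normal(9.0, 2.0, size=149)
energy = 200 + 12 * protein + rng.normal(0, 25, size=149)

r, p = stats.pearsonr(protein, energy)

# Percentile bootstrap: resample pairs with replacement, recompute r,
# take the 2.5th and 97.5th percentiles as the 95% C.I.
n = len(protein)
boot_r = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot_r.append(stats.pearsonr(protein[idx], energy[idx])[0])
ci_low, ci_high = np.percentile(boot_r, [2.5, 97.5])
print(f"r = {r:.3f}, p = {p:.4f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```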

 


Further Study

Happiness... you should now understand how to perform the Pearson's Correlation test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:


AA Team Guide for the Spearman's Correlation

The Spearman's Correlation test (aka: Spearman Rank) measures the strength and direction, which is the rho coefficient (rs) in the test, of a monotonic relationship between two continuous or ordinal variables.

The Spearman's correlation is the nonparametric version of the Pearson's correlation, that is, the Spearman's correlation should be used when the parametric assumptions (normal distribution and homogeneity of variance) for the Pearson's correlation are violated.

 


Test Assumptions

(1) The two test variables are continuous (interval or ratio) or they can be categorical (ordinal).

(2) There is a monotonic relationship between the two test variables.

(3) The participants (samples) have no relationship between the other participants, and are taken at random from the population.

(4) The two test variables (one or both) can be non-normal in their distribution. Non-normal distribution means the data can be heavily skewed to the left or right tails, and/or it may have significant outliers.

(5) The two test variables (one or both) can have unequal variance (heterogeneity) when compared to each other. Heterogeneity means the variance in the data (as plotted between the two variables) will not be the same along the entire line of best fit.

 


Quick Quiz

(Q1) Do the two test variables have a reasonably normal distribution?

Histogram charts in SPSS

 

(Answer: No). The data for the Energy (kcal) variable certainly has a normal distribution, with the data values (the blue bins) centrally gathered in and around the top of the bell curve. However, the data for the Fats variable is heavily skewed and has some extreme outliers beyond 6.0 grams. As one of the two variables does not meet the assumption of normal distribution, the Spearman's Correlation test should be used.

 

(Q2) Is the relationship between the two test variables linear or monotonic?

Scatter chart for Fats and Energy (kcal) in SPSS

 

(Answer: monotonic). The movement (rate of change) of the plotted data values is always progressing in a positive direction. However, there is a steep rate of change from 0.0 to 2.0 grams, and then from 2.0 to 8.0 grams the rate of change becomes relatively flat. As this relationship is monotonic, a Spearman's Correlation test should be used.

 


Spearman's Correlation Test

To start the analysis, click Analyze > Correlate > Bivariate

Pearson's Correlation menu path in SPSS

 

This will bring up the Bivariate Correlations dialogue box. To carry out the test, move the two test (scale or ordinal) variables into the Variables: box. Next, in the Correlation Coefficients section be sure the Spearman option is ticked, and untick the Flag significant correlations option. Finally, click the OK button at the bottom of the dialogue box.

Bivariate Correlations dialogue box in SPSS

 


The Result

The result will appear in the SPSS Output Viewer. In the Correlations table there are the key test metrics -- sample size (N), Spearman's rho (which is the rs score), and the p-value (which is Sig. (2-tailed) in SPSS). Here in this example, the rs score (.543) is high-moderate in its strength, and it is positive in its direction. The p-value (.000), which is below the critical 0.05 alpha level, indicates that this high-moderate correlation is statistically significant.

Test result for Spearman's Correlation in SPSS

 

I have included a General Interpretation table for the correlation coefficient metric by Prof. Halsey. Remember, the rs score can vary from -1.00 through .000 to +1.00. The further the rs score moves away from .000 (zero), the stronger the correlation, which is true for both positive and negative scores.

These measurements indicate that as the fat levels in the 149 breads tested increase, the energy (kcal) also increases, as the rs score is a positive number. Equally, the strength (or magnitude) of this correlation is high-moderate (rs = .543). Finally, this high-moderate correlation is statistically significant (p < .001), which implies there is good evidence that this correlation between fat and energy is very likely to exist for white, brown, and seeded breads in general.

Oops... what's missing? (Answer: the 95% C.I.) To produce the 95% confidence interval, re-run the test; in the Bivariate Correlations dialogue box, click the Bootstrap... button and activate this metric.
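For the curious, the bootstrap the SPSS button performs can be sketched by hand: resample the paired data with replacement many times, recompute rs each time, and take the middle 95% of the resampled scores. This Python sketch uses hypothetical data and a simple percentile bootstrap; it is an illustration of the idea, not the exact SPSS algorithm.

```python
# Sketch of a percentile-bootstrap 95% C.I. for Spearman's rho,
# mirroring the idea behind the Bootstrap... button (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
fats = rng.exponential(scale=1.5, size=149)
energy = 221.0 + 5.9 * fats + rng.normal(scale=15, size=149)

n = len(fats)
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)               # resample pairs with replacement
    boot.append(stats.spearmanr(fats[idx], energy[idx])[0])

lo, hi = np.percentile(boot, [2.5, 97.5])          # middle 95% of resampled rs scores
print(f"95% bootstrap C.I. for rs: [{lo:.3f}, {hi:.3f}]")
```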

 


Further Study

Happiness... you should now understand how to perform the Spearman's Correlation test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:


AA Team Guide for the Linear Regression

The Linear Regression test (aka: Simple Regression) can be seen as the continuation of correlation, that is, the two test variables 1) should have a correlation with each other, and 2) the correlation should be statistically significant.

Here in linear regression we want to be able to predict the value of one test variable by the value of the other test variable (hence the need for a correlation between them). The variable we want to predict is called the dependent variable (or outcome variable); and the variable we are using to predict is called the independent variable (or predictor variable).

 


Test Assumptions

(1) The two test variables are continuous (interval or ratio).

(2) There is a linear relationship between the two test variables.

(3) The participants (samples) are independent of one another, and are taken at random from the population.

(4) The two test variables have a reasonably normal distribution. Normal distribution means the data should not be heavily skewed to the left or right tails, and there should not be significant outliers (better to have no outliers).

(5) The two test variables have equal variance (homogeneity / homoscedasticity) when compared to each other. Homogeneity (or homoscedasticity) means you want the variance in the data (as plotted between two variables) to be reasonably the same along the entire line of best fit.

(6) After completing the linear regression test, you will need to check that the residuals (errors) of the regression line have a reasonably normal distribution as confirmation that the regression model is reliable.

 


Quick Quiz

(Q1) For a linear regression test which predictor variable could you use for the outcome variable Energy (kcal)?

A correlation matrix table for multiple variables in SPSS

 

(Answer: Fats). You might say (and you would be correct) that all the predictor variables in the Correlations table (Fats, Sugar, Protein, Fibre) have a correlation with the outcome variable, Energy (kcal). Equally, every correlation is statistically significant, as all the p-values are below the critical 0.05 alpha level. However, the best predictor variable is Fats, as it has the highest coefficient score (.477).
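The selection rule used here -- pick the candidate with the largest absolute correlation with the outcome -- can be sketched in Python. All the column data below are made-up stand-ins for the bread variables, so the printed numbers will not match the SPSS table.

```python
# Sketch (not SPSS output): rank candidate predictors of Energy by the
# absolute value of their Pearson correlation. Data are hypothetical.
import numpy as np

rng = np.random.default_rng(5)
n = 149
fats = rng.exponential(1.5, n)
sugar = rng.exponential(2.0, n)
protein = rng.normal(9, 2, n)
fibre = rng.normal(4, 1, n)
energy = 221 + 5.9 * fats + 2.0 * sugar + rng.normal(0, 15, n)

predictors = {"Fats": fats, "Sugar": sugar, "Protein": protein, "Fibre": fibre}
corrs = {name: np.corrcoef(x, energy)[0, 1] for name, x in predictors.items()}
best = max(corrs, key=lambda name: abs(corrs[name]))  # largest |r| wins
print("best predictor:", best, round(corrs[best], 3))
```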

 

(Q2) Do the two test variables have homogeneity (or homoscedasticity) between each other?

Scatter chart for Fats and Energy (kcal) in SPSS

 

(Answer: Yes). The variance (between the two red dashed lines) of the plotted data values between the two variables (X-axis is Fats and Y-axis is Energy) is reasonably the same along the entire line of best fit. Equally, we can see from this scatter chart that the relationship is linear, as the plotted data values progress in one direction at relatively the same rate (magnitude) of movement.

 


Linear Regression Test

To start the analysis, click Analyze > Regression > Linear...

Linear Regression menu path in SPSS

 

This will bring up the Linear Regression dialogue box. To carry out the test, move the outcome variable into the Dependent: box and the predictor variable into the Independent(s): box. Next, click the Statistics... button and select the Confidence intervals option, which is set at the 95% level. Click the Continue button at the bottom of this dialogue box to return to the main Linear Regression dialogue box.

Linear Regression dialogue box with Statistics options in SPSS

 

After returning to the Linear Regression dialogue box, click the Plots... button. Move the *ZPRED variable into the X: axis box and the *ZRESID variable into the Y: axis box. [[Tip 1: In any chart the predictor (independent) variable should be on the X axis.]]

Next, select the Histogram and Normal probability plot options. [[Tip 2: Here you are creating a number of charts and plots to test that the residuals (errors) of the regression line have a reasonably normal distribution as per the earlier Test Assumptions section.]]

Click the Continue button at the bottom of this dialogue box to return to the main Linear Regression dialogue box.

Linear Regression dialogue box with Plots options in SPSS

 

After returning to the Linear Regression dialogue box, click the OK button at the bottom of the dialogue box... Wow, that was a lot of boxes!

 


The Result

The result will appear in the SPSS Output Viewer. There are three key tables with several important test metrics. In the Model Summary table there is R (the Pearson's Correlation coefficient), which indicates the strength and direction of any correlation between the predictor and outcome variables. There is also R² (R multiplied by R), which indicates the amount of shared variance between the two test variables. Here, shared variance means the degree to which the predictor variable accounts for (or can explain) the variance in the outcome variable. In our example, the fat levels in the breads tested account for, or can explain, 25.6% (R² = .256) of the variance in the energy (kcal) levels.
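The R and R² figures in the Model Summary table can be reproduced outside SPSS with an ordinary least-squares fit. This Python sketch uses scipy.stats.linregress on hypothetical data, so the printed values will differ from the worked example.

```python
# Illustrative equivalent of the Model Summary metrics (hypothetical data):
# R is the Pearson correlation; R-squared is the shared variance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
fats = rng.exponential(scale=1.5, size=149)
energy = 221.0 + 5.9 * fats + rng.normal(scale=15, size=149)

res = stats.linregress(fats, energy)
r_squared = res.rvalue ** 2   # proportion of Energy's variance explained by Fats
print(f"R = {res.rvalue:.3f}, R^2 = {r_squared:.3f}")
```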

The Model Summary table in the Linear Regression test result in SPSS

 

Next, there is the ANOVA table, and we might consider this as the 'fitting room' for the regression model. You are in a store and you pick out some clothes you want to buy. But you go into a fitting room to see how well the clothes fit your body shape. This is what is happening here in the ANOVA table. There are different mathematical equations that can be used to predict one value from another value. Here SPSS is testing the fit (suitability) of a linear, straight-line equation against the shape of the two variables used in the model.

The ANOVA table in the Linear Regression test result in SPSS

 

There is the F-test score, which indicates the strength (magnitude) of how well the linear regression equation fits the two variables in the model, as opposed to the null hypothesis (i.e., there is no (null) fit with the two variables used). There is also the p-value (Sig.) for the regression model, indicating whether the fit (suitability) of the linear regression equation is statistically significant. Here in our example we have a high-magnitude F-test score (50.486), and the model is statistically significant (p < .001), indicating that 1) a linear, straight-line equation has a strong, robust fit with the shape of the correlation between the two variables in this model, and 2) the fit (suitability) of this linear equation is statistically significant.

The Coefficients table in the Linear Regression test result in SPSS

 

Finally, there is the Coefficients table which lists the regression equation coefficients, the intercept, and their statistical significance. In our example of white, brown, and seeded breads, the regression equation (Y = A + (B * X1)) would become:

Y (Energy (kcal)) = 221.461 + (5.865 * (Fats))

There are also the 95% C.I. around both the Y-intercept (Constant) value and the X-predictor (Fats) value in the regression equation. These give us a high-low measure of accuracy (confidence) as to how well our sample values are likely to represent (or include) the actual values in the population. There are also the t-test scores and the p-values (Sig.), which indicate the strength and statistical significance of the coefficients as compared to the null hypothesis (a zero numeric value).

Coming back to our example: if we randomly took a loaf of white, brown, or seeded bread off the supermarket shelf and read from the label that it contained 3, 5, or 8 grams of fats, then we could estimate (predict) the level of energy (kcal) that loaf is likely to have from our regression equation. And we would have a reasonably high level of confidence that the estimate would be accurate, as the 95% C.I. in the regression model are very narrow -- 213 to 230 for the Y-intercept and 4.2 to 7.5 for the X-predictor (Fats).
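Plugging those fat levels into the fitted equation is simple enough to do by hand, or in a couple of lines of Python. The coefficients below are taken directly from the Coefficients table in the worked example.

```python
# Predict Energy (kcal) from Fats (grams) using the fitted regression
# equation from the worked example: Y = 221.461 + (5.865 * Fats).
def predict_energy(fats_g: float) -> float:
    """Return the predicted energy (kcal) for a given fat level in grams."""
    return 221.461 + 5.865 * fats_g

for grams in (3, 5, 8):
    print(f"{grams} g fats -> {predict_energy(grams):.1f} kcal predicted")
```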

 


Secondary Analysis on Residuals

Finally, as required by the earlier test assumptions, we have the charts and plots to confirm if the residuals between the two variables tested in the regression model meet the assumption for normal distribution and the assumption for homoscedasticity (equal variance).

Charts for normal distribution of residuals in SPSS

 

In both the histogram and the P-P plot we can see a very reasonable normal distribution for the residuals. The histogram shows the data values gathered centrally around the mean (i.e., not skewed to the left or right tails, and no significant outliers). Equally, the P-P plot shows a very tight wrapping (closeness) of the plotted data values to the line of fit.

Charts for homoscedasticity of residuals in SPSS

 

The scatter chart of the predictor (X-axis) to outcome (Y-axis) residuals shows reasonably good homoscedasticity, that is, there is the same variance in the plotted data values across the chart, there is very little bunching up of the data values into tight clumps, and the top and bottom halves (split along the red dashed line) are roughly mirror images of one another.
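The same residual post-check can be done numerically. This Python sketch fits a simple regression on hypothetical data, extracts the residuals, and applies a Shapiro-Wilk normality test, which parallels what the histogram and P-P plot show visually.

```python
# Illustrative post-check on residuals (hypothetical data, not SPSS output):
# fit the line, compute residuals, then test them for normal distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
fats = rng.exponential(scale=1.5, size=149)
energy = 221.0 + 5.9 * fats + rng.normal(scale=15, size=149)

res = stats.linregress(fats, energy)
residuals = energy - (res.intercept + res.slope * fats)

stat, p = stats.shapiro(residuals)   # Shapiro-Wilk on the residuals
print(f"Shapiro-Wilk on residuals: p = {p:.4f}")  # p > .05 supports normality
```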

Therefore, by this post-testing of the residuals, we have strong confirmation that the regression model meets the required assumptions for the test, and that the test result is reliable.

 


Further Study

Happiness... you should now understand how to perform the Linear Regression test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites: