In this guide, we will look at how to conduct the Chi-Square Test for Independence (aka: Chi-Square Test of Association or Pearson's Chi-Square Test), and how to interpret the results of the test. The Chi-Square Test for Independence determines whether there is a relationship (association) between categorical variables. Note that this test only determines whether an association exists between categorical variables; it does not provide any indication of causation.
(1) Only categorical variables can be analysed.
(2) Each categorical variable (nominal or ordinal) should have two or more independent groups, meaning the samples (participants) have no relationship with the other participants in their own group or with the participants in the other groups of the variable.
(3) The samples (participants) for each variable are taken at random from the population.
(4) The categorical variables are not paired samples (pre-test/post-test observations).
(5) There should be relatively large sample sizes for each group in all the variables (e.g. the expected frequencies should be at least 5 for the majority (80%) of the groups for all the variables).
(Q1) You want to test whether gender is associated with the view that better lighting will improve safety. Can you use the two variables listed below?
(Answer: Yes). Both variables are categorical in their type. Equally there are adequate sample sizes for all the groups.
(Q2) Does the clustered bar chart for the two test variables indicate there is likely to be a statistically significant association?
(Answer: Yes). We can see the males are more associated with the No response, while the females are more associated with the Yes response.
To start the analysis, click Analyze > Descriptive Statistics > Crosstabs
This will bring up the Crosstabs dialogue box. To perform the analysis, move one categorical variable into the Row(s) placard and the other categorical variable into the Column(s) placard. Next, click on the Statistics option button.
In the Crosstabs: Statistics box tick the Chi-Square option, and then click the Continue button to return to the main dialogue box. After returning to the main Crosstabs dialogue box, click the Cells option button.
In the Crosstabs: Cell Display box tick the Observed and Expected options, and then click the Continue button to return to the main dialogue box.
Next, at the bottom left corner of the main dialogue box, tick the Display clustered bar charts option. Finally, click the OK button at the bottom of the main dialogue box.
The results will appear in the SPSS Output Viewer. The Crosstabulation table provides the observed count and expected count for the groups in relation to each categorical variable. Similar to the clustered bar chart discussed earlier, these observed and expected counts should give you an intuitive perspective as to whether an association is likely (or not likely) to exist.
In our example, for the males we observed a 14 to 7 (No/Yes) split, whereas with no association we would expect roughly an 11 to 11 split; we are roughly 3 (No) and 4 (Yes) participants away from the expected split. For the females we observed a 5 to 12 (No/Yes) split, whereas we would expect roughly a 9 to 9 split; we are roughly 4 (No) and 3 (Yes) participants away from the expected split.
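As a rough cross-check outside SPSS, the expected counts can be recomputed by hand from the observed counts quoted above (row total multiplied by column total, divided by the grand total). The sketch below uses plain Python; the variable names are illustrative only and are not part of SPSS. With these observed counts the expected values come out as fractions, which the rounded splits quoted above approximate.

```python
# Observed counts from the example: rows = gender, columns = (No, Yes).
observed = {"Male": [14, 7], "Female": [5, 12]}

row_totals = {group: sum(counts) for group, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected count under no association:
# row total * column total / grand total.
expected = {
    group: [row_totals[group] * col_total / grand_total
            for col_total in col_totals]
    for group in observed
}

for group in observed:
    print(group, "observed:", observed[group], "expected:", expected[group])

# Assumption (5): check whether every expected count is at least 5.
all_at_least_5 = all(e >= 5 for row in expected.values() for e in row)
print("All expected counts >= 5:", all_at_least_5)
```

The same check is a quick way to confirm the sample-size assumption before trusting the test result.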
The Chi-Square Tests table provides the test metrics -- the Pearson Chi-Square statistic and the p-value. Here in our example we have a reasonably strong Pearson test statistic (5.216) and a p-value (0.022) which is below the critical 0.05 alpha level, therefore indicating a statistically significant result.
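For readers who want to see where these two numbers come from, here is a minimal sketch in plain Python that recomputes the Pearson Chi-Square statistic from the observed counts in the example. It assumes no continuity correction is applied, and it uses a closed-form tail probability that is valid only for 1 degree of freedom (a 2 x 2 table); larger tables would need a proper chi-square distribution function from a statistics library.

```python
import math

# Observed counts: rows = Male, Female; columns = No, Yes.
observed = [[14, 7], [5, 12]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi2 += (obs - exp) ** 2 / exp

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail probability of the chi-square distribution.
# This erfc shortcut holds for df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```

Run as written, this should agree with the statistic (5.216) and p-value (0.022) reported in the Chi-Square Tests table.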
In your write-up you should quote both these metrics as evidence of a statistically significant association between gender and the response: males are more likely to answer No to the question of whether better lighting would improve safety, while females are more likely to answer Yes to this same question.
Happiness... you should now understand how to perform the Chi-Square Test for Independence in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
The Pearson Correlation test (aka: Pearson Product-Moment) measures the strength and direction of a linear relationship between two continuous variables, and reports them as the r coefficient. The Pearson's correlation attempts to draw a line of best fit through the data of the two variables, and the r coefficient indicates how far away the data points are from this line of best fit (e.g., if the data values are all compacted on and squeezed around the line, the r coefficient is high; conversely, if the data values are spread out and dispersed away from the line, the r coefficient is low).
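To make the "tight around the line versus spread away from the line" idea concrete, here is a small sketch in plain Python. The two data sets are made up purely for illustration (they are not the bread data used later), and `pearson_r` is a hypothetical helper name written for this example, not an SPSS function.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration data (an assumption, not from the guide).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_tight = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 15.9]   # hugs a straight line
y_spread = [5.0, 1.0, 9.0, 4.0, 12.0, 6.0, 8.0, 15.0]   # scattered around it

r_tight = pearson_r(x, y_tight)
r_spread = pearson_r(x, y_spread)

print(f"r (tight)  = {r_tight:.3f}")
print(f"r (spread) = {r_spread:.3f}")
```

Both relationships are positive, but the tightly clustered points give an r close to 1, while the scattered points give a noticeably lower r.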
(1) The two test variables are continuous (interval or ratio).
(2) There is a linear relationship between the two test variables.
(3) The participants (samples) are independent of one another, and are taken at random from the population.
(4) The two test variables have a reasonably normal distribution. Normal distribution means the data should not be heavily skewed to the left or right tails, and there should not be significant outliers (better to have no outliers).
(5) The two test variables have equal variance (homogeneity) when compared to each other. Homogeneity means you want the variance in the data (as plotted between two variables) to be reasonably the same along the entire line of best fit.
(Q1) Do the two test variables have a reasonably normal distribution?
(Answer: Yes). The data for both variables are certainly not heavily skewed to the left or right tails, and the data values (blue bins) pretty much gather in and around the centre of the bell curve. That said, the distribution for Protein is starting to spread toward the two tails which may be questionable, and a Kolmogorov-Smirnov or Shapiro-Wilk test would be advisable to run to confirm any suspicions.
(Q2) Do the two test variables have homogeneity between each other?
(Answer: Yes). The variance (between the two red dashed lines) from the plotted data values between the two variables (X-axis is Protein and Y-axis is Energy) is reasonably the same along the entire line of best fit. Equally, we can see from this scatter chart that the relationship is linear as the plotted data values are progressing in one direction at relatively the same rate (magnitude) of movement.
To start the analysis, click Analyze > Correlate > Bivariate
This will bring up the Bivariate Correlations dialogue box. To carry out the test, move the two test variables (scale) into the Variables: placard. Next, in the Correlation Coefficients section be sure the Pearson option is ticked, and untick the Flag significant correlations option. Finally, click the OK button at the bottom of the dialogue box.
The results will appear in the SPSS Output Viewer. In the Correlations table there are the key test metrics -- sample size (N), Pearson's Coefficient (which is the r score) and the p-value (which is Sig.(2-tailed) in SPSS). Here in this example, the r score (.373) is low-medium in its strength, and it is positive in its direction. The p-value is displayed as .000, which means it is smaller than SPSS shows at three decimal places and should be reported as p < .001. As it is below the critical 0.05 alpha level, this low-medium correlation is statistically significant.
I have included a General Interpretation table for the correlation coefficient metric by Prof. Halsey. Remember, the r score can vary from -1.000 through .000 to +1.000. The further the r score moves away from .000 (zero), the stronger the correlation is, which is true for both positive and negative scores.
These measurements indicate that as the protein level in the 149 breads tested increases, the energy (kcal) also increases, because the r score is a positive number. However, the strength (or magnitude) of this correlation is low-medium (r = .373). Finally, this mild correlation is statistically significant (p < .001), which implies there is good evidence from the sample data that this correlation between protein levels in breads and energy (kcal) is very likely to exist for white, brown, and seeded breads in general.
Oops... what's missing? (Answer: the 95% C.I.) To produce the 95% confidence interval, re-run the test; and in the Bivariate Correlations dialogue box, there is the Bootstrap... button. If you open this button, you can activate this metric.
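The Bootstrap option in SPSS builds the interval by resampling the raw data, which is not reproduced in this guide. As a rough analytic alternative (a different technique, not what SPSS does), the Fisher z-transformation can approximate a 95% confidence interval from just the r score and the sample size. The sketch below is plain Python; `fisher_ci` is a hypothetical helper name, and the result will generally differ somewhat from a bootstrap interval.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)                      # transform r to the z scale
    se = 1 / math.sqrt(n - 3)              # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    # Build the interval on the z scale, then transform back to r.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Values reported in the example: r = .373, N = 149.
lower, upper = fisher_ci(0.373, 149)
print(f"95% CI for r: {lower:.3f} to {upper:.3f}")
```

An interval that excludes zero is consistent with the statistically significant p-value reported in the Correlations table.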
Happiness... you should now understand how to perform the Pearson's Correlation test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites:
The Spearman's Correlation test (aka: Spearman Rank) measures the strength and direction of a monotonic relationship between two continuous or ordinal variables, and reports them as the rho coefficient (rs).
The Spearman's correlation is the nonparametric version of the Pearson's correlation, that is, the Spearman's correlation should be used when the parametric assumptions (normal distribution and homogeneity of variance) for the Pearson's correlation are violated.
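One way to see why Spearman's correlation copes with non-linear but monotonic data is that it is simply a Pearson correlation computed on the ranks of the values rather than on the values themselves. The sketch below, in plain Python, uses made-up data (not the bread data) in which y always rises with x but not in a straight line; `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helper names for this illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up monotonic but non-linear data (an assumption for illustration).
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 32]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

Because y never decreases as x increases, the ranks line up perfectly and rho reaches its maximum, while Pearson's r falls short of 1 because the points do not sit on a straight line.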
(1) The two test variables are continuous (interval or ratio) or they can be categorical (ordinal).
(2) There is a monotonic relationship between the two test variables.
(3) The participants (samples) are independent of one another, and are taken at random from the population.
(4) The two test variables (one or both) can be non-normal in their distribution. Non-normal distribution means the data can be heavily skewed to the left or right tails, and/or it may have significant outliers.
(5) The two test variables (for one or both) can have unequal variance (heterogeneity) when compared to each other. Heterogeneity means the variance in the data (as plotted between the two variables) will not be the same along the entire line of best fit.
(Q1) Do the two test variables have a reasonably normal distribution?
(Answer: No). The data for the Energy (kcal) variable certainly has a normal distribution, with the data values (the blue bins) centrally gathered in and around the top of the bell curve. However, the data for the Fats variable is heavily skewed and has some extreme outliers beyond 6.0 grams. As only one of the two variables meets the assumption of normal distribution, the Spearman's Correlation test should be used.
(Q2) Is the relationship between the two test variables linear or monotonic?
(Answer: monotonic). The movement (rate of change) of the plotted data values is always progressing in a positive direction. However, it is an exponential rate of change from 0.0 to 2.0 grams, and then from 2.0 to 8.0 grams the rate of change becomes relatively flat. As this relationship is monotonic, a Spearman's Correlation test should be used.
To start the analysis, click Analyze > Correlate > Bivariate
This will bring up the Bivariate Correlations dialogue box. To carry out the test, move the two test variables (scale) into the Variables: placard. Next, in the Correlation Coefficients section be sure the Spearman option is ticked, and untick the Flag significant correlations option. Finally, click the OK button at the bottom of the dialogue box.
The results will appear in the SPSS Output Viewer. In the Correlations table there are the key test metrics -- sample size (N), Spearman's rho (which is the rs score) and the p-value (which is Sig.(2-tailed) in SPSS). Here in this example, the rs score (.543) is high-moderate in its strength, and it is positive in its direction. The p-value is displayed as .000, which means it is smaller than SPSS shows at three decimal places and should be reported as p < .001. As it is below the critical 0.05 alpha level, this high-moderate correlation is statistically significant.
I have included a General Interpretation table for the correlation coefficient metric by Prof. Halsey. Remember, the rs score can vary from -1.000 through .000 to +1.000. The further the rs score moves away from .000 (zero), the stronger the correlation is, which is true for both positive and negative scores.
These measurements indicate that as the fat level in the 149 breads tested increases, the energy (kcal) also increases, because the rs score is a positive number. Equally, the strength (or magnitude) of this correlation is high-moderate (rs = .543). Finally, this high-moderate correlation is statistically significant (p < .001), which implies there is good evidence that this correlation between fat and energy is very likely to exist for white, brown, and seeded breads in general.
Oops... what's missing? (Answer: the 95% C.I.) To produce the 95% confidence interval, re-run the test; and in the Bivariate Correlations dialogue box, there is the Bootstrap... button. If you open this button, you can activate this metric.
Happiness... you should now understand how to perform the Spearman's Correlation test in SPSS and how to interpret the result. However, if you want to explore further, here are two sites: