What is the t-test and when should I use it?
A t-test (sometimes called Student’s t-test) is used to determine if the means of two sample groups are significantly different. So, for LC-MS data, we would typically use a t-test to find out if the amount of our variable of interest (protein, peptide, small molecule, or m/z + retention time combination) is different between experimental groups.
When using the classical t-test, we assume that the underlying data has a normal distribution and that our experimental groups have equal variance; the test is most robust when the groups are also of equal size. If the two samples have unequal variances, particularly in combination with unequal sample sizes, we should use the Welch t-test instead. This option can be selected from the t-test dialog box.
The reason for this is that unequal variances can alter the Type I error rate. That is, we might falsely reject the null hypothesis and report a significant difference of means where no true difference exists. The more unequal the variances of the two populations, the greater the effect on the Type I error rate.1 Therefore, it is important to use the Welch t-test if it is appropriate for your data.
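To make the difference between the two tests concrete, here is a minimal sketch of the textbook formulas using only the Python standard library. It is not MarkerView Software's implementation, and the peak areas and function names are made up for illustration. The classical test pools the two variances, while the Welch test keeps them separate and adjusts the degrees of freedom (the Welch-Satterthwaite approximation).

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Classical two-sample t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Each group contributes its own variance of the mean; nothing is pooled.
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical peak areas: a small, tight group and a larger, noisier group.
control = [10.1, 9.8, 10.3, 10.0]
treated = [12.5, 8.9, 14.2, 11.0, 15.8, 9.4, 13.1, 12.0]

print("Student (t, df):", student_t(control, treated))
print("Welch   (t, df):", welch_t(control, treated))
```

When the two groups are the same size, both functions return the same t statistic and differ only in the degrees of freedom; the results diverge most when the group sizes and the variances are both unequal. Outside MarkerView Software, a statistics library would normally be used for this, for example SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` for the Welch test.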
Non-parametric tests (i.e. for data that does not have a normal distribution) are not supported in MarkerView™ Software. Fortunately, the shape of the distribution has little impact on the Type I error rate, and for studies with a large sample size, t-tests can be used even for skewed data.1,2
Once you have defined your experimental groups in MarkerView Software (via the Samples table), you can perform the t-test and it will automatically compare all groups in pairs. A new pane will open with a table containing p- and t-values, plus profile and box-and-whisker plots. There will be more on that later.
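As a small illustration of what "all groups in pairs" means, the sketch below lists every pairwise comparison for a set of groups. The group names are hypothetical stand-ins for whatever is defined in the Samples table, and the code only enumerates the pairs; it does not reproduce the software's behavior.

```python
from itertools import combinations

# Hypothetical group names, standing in for groups defined in the Samples table.
groups = ["control", "low dose", "high dose"]

# Every unordered pair of groups gets its own t-test.
pairs = list(combinations(groups, 2))
for first, second in pairs:
    print(f"{first} vs {second}")
```

With k groups there are k(k-1)/2 pairs, so three groups give three comparisons and four groups give six.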
The t-value is a measure of how well the variable distinguishes between two groups: it is the difference between the group means divided by the standard error of that difference. The p-value is the probability that a difference at least this large would occur by chance if there were no true difference between the groups. If the absolute value of t exceeds a calculated critical value, then the variable distinguishes the groups at the chosen confidence level; t can be positive or negative depending on the direction of the subtraction of the means. The p-value always lies between 0 and 1, and the smaller the value, the less likely it is that a difference this large would arise by chance alone.
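The relationship between t, the degrees of freedom, and the p-value can be sketched numerically. The example below is an illustration only, not the method MarkerView Software uses: it integrates the density of Student's t distribution with Simpson's rule to get a two-sided p-value. The function names and the `steps` parameter are assumptions made for this sketch.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: probability of |T| >= |t| under the null hypothesis.

    Integrates the density over [0, |t|] with Simpson's rule (steps must be even).
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return max(0.0, 1.0 - central_area)


# The sign of t only reflects the order of subtraction; p is the same.
print(two_sided_p(2.228, 10))
print(two_sided_p(-2.228, 10))
```

For 10 degrees of freedom, 2.228 is the tabulated two-sided critical value at the 0.05 level, so both calls should print a value close to 0.05. Because the p-value is obtained by subtracting from 1, this sketch loses precision for very small p-values; a statistics library is the better choice for real work.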
The best way to visualize these results is to plot the p-value computed for each variable versus its log fold change. This is known as a volcano plot, and it allows you to see both how large and how significant the change in a specific variable is between the two groups. The example on the left is from our rat dataset. Notice that there are relatively few data points on the plot. This is because we are using a simplified data set for the purposes of this tutorial series. A volcano plot from a “real data” set is shown on the right. This example compares blood samples taken from turtles living in two geographical locations.3
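As a rough sketch of how the coordinates of a volcano plot are derived, the example below computes one point per variable. The log base and the p-value scaling used by MarkerView Software are not stated here, so the base-2 fold change and the -log10(p) transform are assumptions (both are common conventions), and the feature names, responses, and p-values are invented.

```python
import math
from statistics import mean


def volcano_point(group_a, group_b, p_value):
    """Return (log2 fold change, -log10 p) for one variable.

    Assumes both group means are positive and p_value is greater than 0.
    """
    fold_change = mean(group_b) / mean(group_a)
    return math.log2(fold_change), -math.log10(p_value)


# Hypothetical responses and p-values for three variables.
features = {
    "feature_1": ([100.0, 110.0, 90.0], [400.0, 420.0, 380.0], 0.0001),
    "feature_2": ([200.0, 210.0, 190.0], [205.0, 195.0, 200.0], 0.8),
    "feature_3": ([800.0, 760.0, 840.0], [200.0, 190.0, 210.0], 0.001),
}

points = {name: volcano_point(a, b, p) for name, (a, b, p) in features.items()}
for name, (x, y) in points.items():
    print(f"{name}: log2FC={x:+.2f}, -log10(p)={y:.2f}")
```

With these transforms, a variable that is unchanged between groups sits near the center of the x-axis at a low y value, while strongly increased or decreased variables with small p-values move out to the left or right and up.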
Typically the features we are most interested in are located at the extremes of the plot: those with the largest log fold change and the smallest p-value. Individual features or groups of features can be selected directly from the volcano plot by drawing a box around the data points. Features can also be selected directly from the Loadings plot, or by highlighting a row in the t-test data table. Once selected, the underlying data is plotted as an Area or % Response plot. A box-and-whisker plot can also be computed for that feature to show the distribution of the data, where the box spans the interquartile range and the whiskers show the highest and lowest observations.
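The numbers behind a box-and-whisker plot of this kind can be sketched in a few lines. This is an illustration under assumptions: the responses are invented, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the convention used by MarkerView Software.

```python
from statistics import quantiles


def box_summary(values):
    """Five-number summary behind a simple box-and-whisker plot."""
    q1, median, q3 = quantiles(values, n=4)  # quartile cut points
    return {
        "low": min(values),    # lower whisker: lowest observation
        "q1": q1,              # bottom of the box
        "median": median,
        "q3": q3,              # top of the box
        "high": max(values),   # upper whisker: highest observation
    }


# Hypothetical responses for one feature in one group.
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

The box is drawn from `q1` to `q3`, so its height is the interquartile range, and the whiskers extend to `low` and `high`.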
Let’s revisit our rat example.
- Delacre, M. et al. (2019) “Taking Parametric Assumptions Seriously: Arguments for the Use of Welch's F-test instead of the Classical F-test in One-way ANOVA.” International Review of Social Psychology, 32(1):13.
- Fagerland, M. W. (2012) “t-tests, non-parametric tests, and large studies--a paradox of statistical practice?” BMC Medical Research Methodology, 12:78.
- Heffernan, A. L. et al. (2017) “Non-targeted, high resolution mass spectrometry strategy for simultaneous monitoring of xenobiotics and endogenous compounds in green sea turtles on the Great Barrier Reef.” Science of the Total Environment, 599–600:1251–1262.