Most engineers I know do not apply advanced statistical tools in their work. Whatever the reasons for this, it is a detriment to engineering practice. In this series, my goal is to show some applications of these techniques to mine engineering work using MS Excel. In this first post of the series, I will introduce the Analysis ToolPak add-in in MS Excel and discuss applications of t-tests.
The t-test is a statistical procedure used to compare the means of two populations. This is useful for determining whether there is a significant difference between two data sets (e.g. cycle times from two different operators on the same machine, excavator performance before and after a change in blast design, or performance before and after implementing process improvement strategies). You can read about comparing means in one of my favorite statistical resources here. The procedure (a hypothesis test) assumes the data are independent and normally distributed, and makes a further assumption about the variances (equal or unequal). You may want to review your knowledge of hypothesis tests here. The typical null hypotheses (and corresponding alternate hypotheses) for the t-test are:
- H0: µ1 = µ2 vs. H1: µ1 ≠ µ2
- H0: µ1 ≥ µ2 vs. H1: µ1 < µ2
- H0: µ1 ≤ µ2 vs. H1: µ1 > µ2
The sample means and standard deviations are used to estimate the test statistic. The calculated test statistic is then compared to the critical value read from the Student’s t distribution at the desired significance level, α. The comparison leads to one of two decisions: (i) the null hypothesis is rejected; or (ii) there is not enough evidence to reject the null hypothesis.
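For readers who want to see the arithmetic behind the tool, here is a minimal sketch in Python that computes the equal-variance (pooled) test statistic by hand and checks it against scipy's built-in test. The sample values are hypothetical, made up purely for illustration.

```python
import math
from scipy import stats

# Hypothetical before/after samples, for illustration only
before = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
after = [11.2, 11.5, 11.0, 11.4, 11.3, 11.1]

n1, n2 = len(before), len(after)
m1 = sum(before) / n1  # sample means
m2 = sum(after) / n2
s1 = sum((x - m1) ** 2 for x in before) / (n1 - 1)  # sample variances
s2 = sum((x - m2) ** 2 for x in after) / (n2 - 1)

# Pooled variance under the equal-variance assumption
sp2 = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

# Test statistic: difference in means over its standard error
t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# scipy's equal-variance two-sample test should agree
t_scipy, p = stats.ttest_ind(before, after, equal_var=True)
print(t, t_scipy, p)
```

The manual statistic and scipy's agree, which is a useful sanity check when you are learning what Excel is doing under the hood.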
Most engineers were taught this in engineering school. Microsoft Excel’s Analysis ToolPak makes this easier to do (you don’t have to do any calculations yourself). If you don’t know how to activate this add-in, you will find instructions here. Once activated, the tool shows up on the “Data” tab under “Analysis.” The tool provides several statistical procedures, including 3 t-tests, which are the subject of this post. The 3 tests provided by the add-in are “Paired Two Sample for Means”, “Two-Sample Assuming Equal Variances”, and “Two-Sample Assuming Unequal Variances.” These tests make different assumptions about the source and properties of the data: that the two samples represent before-treatment and after-treatment observations on the same subjects; that the population variances are equal; and that the population variances are not equal, respectively. The paired two sample for means test is good for cases where the data is naturally paired (e.g. when you collect data on all your operators before and after making a change and the data identifies each operator’s data as a pair of values).
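If you ever need to reproduce these results outside Excel, the same three tests have direct counterparts in Python's scipy library. A short sketch, again with made-up cycle-time numbers purely for illustration:

```python
from scipy import stats

# Hypothetical cycle times (minutes) per operator, before and
# after a change; each position is the same operator (paired)
before = [5.2, 4.9, 5.5, 5.1, 5.0, 5.3, 4.8, 5.4]
after = [4.7, 4.5, 5.0, 4.6, 4.4, 4.9, 4.3, 4.8]

# "Paired Two Sample for Means" -> paired (related-samples) t-test
t_paired, p_paired = stats.ttest_rel(before, after)

# "Two-Sample Assuming Equal Variances" -> pooled t-test
t_eq, p_eq = stats.ttest_ind(before, after, equal_var=True)

# "Two-Sample Assuming Unequal Variances" -> Welch's t-test
t_uneq, p_uneq = stats.ttest_ind(before, after, equal_var=False)

print(p_paired, p_eq, p_uneq)
```

Note that when the data really is paired, the paired test typically has more power than either two-sample test because it removes operator-to-operator variability from the comparison.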
To illustrate, step by step, how to use this add-in, I am going to use data from a before-and-after process improvement scenario.
- Prepare your data in two groups in a sheet where you can easily select the ranges.
- Start the data analysis function and select the right test (out of the 3) for your case. In our illustration, we assume the variances are equal (there is a way to test this assumption too). Also, decide what your null hypothesis is (e.g. one of the 3 above). The tool also lets you specify a hypothesized difference in the means (enter zero to test whether the means are equal).
- Choose the two-tail or one-tail test depending on whether you want to test for equality of means or for one mean being greater/less than the other. In our example, we are interested in whether the mean after the process improvement is significantly different (at 95% confidence) from the mean before the improvement. In that case, our null hypothesis is number 1 in the list above and we are interested in the two-tail test.
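The steps above can be sketched in code as well. This is not part of the Excel workflow itself, just a compact way to see the whole decision process at once; the performance numbers and the 0.05 significance level mirror the scenario in this post, but the data values are invented for illustration.

```python
from scipy import stats

ALPHA = 0.05  # significance level (95% confidence)

# Hypothetical performance data before and after a process change
before = [103.0, 98.5, 101.2, 99.8, 102.4, 100.7, 97.9, 101.9]
after = [94.1, 95.8, 93.2, 96.0, 94.7, 95.3, 93.9, 94.5]

# Equal-variance two-sample test; the hypothesized mean difference
# is zero (H0: mu1 = mu2), i.e. case 1 in the list above
t_stat, p_two_tail = stats.ttest_ind(before, after, equal_var=True)

# Two-tail decision rule: reject H0 when the p-value is below alpha
if p_two_tail < ALPHA:
    decision = "reject H0: the means differ"
else:
    decision = "fail to reject H0: not enough evidence"
print(decision)
```

The same structure works for a one-tail question; you would halve the two-tail p-value (after checking the sign of the test statistic) or use the `alternative` argument in newer scipy versions.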
Once you go through this process and run the function, you should obtain results like the one below. Excel always reports both the one- and two-tail results, so you need to choose carefully which one to use. In our case, we use the two-tail result.
Our result shows a p-value of 1.22 × 10⁻⁹, which is less than 0.05, our significance level. (The p-value is the probability of observing a test statistic at least as extreme as ours if the null hypothesis were true, not the probability of rejecting the null hypothesis.) Any time the p-value is less than α, we reject the null hypothesis. Therefore, we conclude that there is enough evidence to reject the null hypothesis (that the population means are equal). In other words, our process improvement made a difference: the mean performance is now different.
I hope this explains the basics of t-tests and that you are now more likely to use this technique in mine engineering analysis. If you have any comments or questions, use the comment section below to let me know. If you have examples of scenarios where you have applied the t-test in mine engineering analysis, use the same section to share them.