ExcelStats2007.xls

8/11/2019 ExcelStats2007.xls


Sheet: Introduction File: 246707238.xls.ms_office Page 1 of 26

ExcelStats 2007.xls Version 1.0 2/15/08

Using Excel to do Statistics: Some Helpful Notes

[email protected]

Johnson Graduate School of Management

Cornell University

Ithaca NY 14853

This workbook is intended for teaching purposes. You are welcome to use it in any manner, and change it as you see fit. It comes without any guarantee whatsoever, and is distributed free of charge.

This workbook tells you how to do a bunch of statistics calculations using Excel. Excel has an Add-In called the Analysis ToolPak. In versions of Excel before 2007, you can find out if you have it working by going into the Tools menu and selecting Add-Ins.

For Excel 2007:

1. Click the Microsoft Office Button, and then click Excel Options.

2. Click Add-Ins, and then in the Manage box, select Excel Add-ins, and click Go.

3. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.

4. In the same box, select the Analysis ToolPak - VBA check box, and then click OK.

5. After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.

Excel has 2 ways to do almost every statistical analysis, and in many cases this workbook illustrates both.

There are separate sheets for each topic listed below. You can find the sheets by selecting the appropriate "tab" at the bottom of the screen.

Contents: These are the sheets in this workbook.

Introduction

Sorting

Frequencies & Graphs

Histogram

Scatter Plot

Descriptive Statistics

Rank & Percentile

Covariance

Correlation

Sampling

Confidence Intervals

One-Sample t-tests

Two-Sample t-tests

Regression

Additional File Available:

PredInt.xls: Contains a Visual Basic macro to do multiple regression with Prediction Intervals, a feature that is not included in the Regression tool in the Analysis ToolPak.

If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it. If you get prompted that the Analysis ToolPak is not currently installed, click Yes to install it.



Sorting a Data Set

*** Sorting does not require the Data Analysis package.

Sorting changes the data set. If you want to be able to restore the original order of the data, begin by numbering the data points. The first column of the Example Data Set contains these numbers.

Example Data Set        Sorted "Ascending" by a        Sorted "Descending" by b
Numb.  a   b            Numb.  a   b                   Numb.  a   b
1      1   2            1      1   2                   6      5   11
2      2   3            2      2   3                   8      6   10
3      3   4            11     2   5                   9      5   7
4      4   3            3      3   4                   7      8   6
5      3   5            5      3   5                   5      3   5
6      5   11           10     3   4                   11     2   5
7      8   6            4      4   3                   3      3   4
8      6   10           6      5   11                  10     3   4
9      5   7            9      5   7                   2      2   3
10     3   4            8      6   10                  4      4   3
11     2   5            7      8   6                   1      1   2

Instructions for the Sort Menu Item

Select the data set. (Hold down the Left Mouse Button and drag the cursor over cells A6 to C17.)

On the Data tab, select Sort.

Use the arrow next to the Sort By window to select a from the pull-down list, and then click OK.

The results should look like the data in the first shaded area (Sorted "Ascending" by a) next to the Example Data Set above.

Now repeat the steps, except select b from the pull-down list, and Largest to Smallest under Order.

The results should look like the data in the second shaded area (Sorted "Descending" by b) next to the Example Data Set.

To return the data set to its original order, repeat the steps, selecting Numb. from the pull-down list, and select Smallest to Largest under Order.

The Add Level button may be used to resolve ties.

For example, select Sort By a and Smallest to Largest, and then click Add Level.

In the new boxes, select Then by b and Smallest to Largest.

Compare the results to the first shaded area next to the Example Data Set. The order has been arranged so that b is ascending when a is constant. For example, look at the three points for which a = 3. The values for b are 4, 4 and 5, whereas in the first shaded area they are 4, 5 and 4.
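The same three sorts can be sketched in Python for readers who want to check the shaded results outside Excel. This is only an illustration. It assumes the Example Data Set is held as (Numb., a, b) tuples. Python's sorted() is stable, so rows with equal keys keep their original order, and for this data that reproduces the order Excel shows for ties.

```python
# Example Data Set from the Sorting sheet, as (Numb., a, b) rows.
data = [
    (1, 1, 2), (2, 2, 3), (3, 3, 4), (4, 4, 3), (5, 3, 5), (6, 5, 11),
    (7, 8, 6), (8, 6, 10), (9, 5, 7), (10, 3, 4), (11, 2, 5),
]

# Sorted "Ascending" by a: sorted() is stable, so tied rows keep their original order.
by_a = sorted(data, key=lambda row: row[1])

# Sorted "Descending" by b: reverse=True still keeps tied rows in their original order.
by_b_desc = sorted(data, key=lambda row: row[2], reverse=True)

# Two levels, like Add Level: sort by a, then by b within equal values of a.
by_a_then_b = sorted(data, key=lambda row: (row[1], row[2]))

# Restore the original order by sorting on the Numb. column.
restored = sorted(by_b_desc, key=lambda row: row[0])

print("Numb. after sorting by a:       ", [row[0] for row in by_a])
print("Numb. after sorting by b (desc):", [row[0] for row in by_b_desc])
print("b where a = 3 (a, then b):      ", [row[2] for row in by_a_then_b if row[1] == 3])
```

The last line prints the b values 4, 4 and 5 for the three rows with a = 3, which matches the two-level sort described above.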



Counting and Graphing Frequency of Observations

Data may or may not be numerical. The four counting functions illustrated below treat numbers, text, and blank cells differently:

COUNTA counts all entries, ignoring blanks.

COUNT counts only numbers, excluding blanks.

COUNTBLANK counts the number of blank cells.

COUNTIF counts the number of entries that match a specified condition.

Data (cells A9:B24):

a          d
1          High
2          Low
3          Med
4          Med
3          Med
(blank)    High
5          Low
2          Med
1          Med
2          Low
-          Low
?          Med
1          High
3          Low
2          Low
2          Low

Excel Functions for column a:

Entries = 15    =COUNTA($A$9:$A$24)
Numbers = 13    =COUNT($A$9:$A$24)
Blanks = 1      =COUNTBLANK($A$9:$A$24)

a    Freq.
1    3    =COUNTIF($A$9:$A$24, "=" & E13)
2    5    =COUNTIF($A$9:$A$24, "=" & E14)
3    3
4    1
5    1
6    0

In each COUNTIF formula, the first argument is the range to be counted and the second is the condition.

Excel Functions for column d:

Entries = 16
Numbers = 0
Blanks = 0

d       Freq.
Low     7    =COUNTIF($B$9:$B$24,"=" & E25)
Med     6    =COUNTIF($B$9:$B$24,"=" & E26)
High    3    =COUNTIF($B$9:$B$24,"=" & E27)
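The four counting rules can be imitated in a few lines of Python, which may help make the distinctions concrete. This is a sketch, not Excel's implementation. It assumes a blank cell is represented by None. The helper names counta, count, countblank and countif_equal are hypothetical stand-ins for the worksheet functions, and countif_equal handles only the "=" condition used on this sheet.

```python
# Columns a and d from the Frequencies & Graphs sheet (cells A9:B24).
# None stands for a blank cell; "-" and "?" are text entries, not numbers.
col_a = [1, 2, 3, 4, 3, None, 5, 2, 1, 2, "-", "?", 1, 3, 2, 2]
col_d = ["High", "Low", "Med", "Med", "Med", "High", "Low", "Med",
         "Med", "Low", "Low", "Med", "High", "Low", "Low", "Low"]


def counta(cells):
    """Like COUNTA: count every cell that is not blank."""
    return sum(1 for c in cells if c is not None)


def count(cells):
    """Like COUNT: count numeric cells only (True/False are not counted)."""
    return sum(1 for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool))


def countblank(cells):
    """Like COUNTBLANK: count blank cells."""
    return sum(1 for c in cells if c is None)


def countif_equal(cells, value):
    """Like COUNTIF(range, "=" & value): count cells equal to value."""
    return sum(1 for c in cells if c == value)


print(counta(col_a), count(col_a), countblank(col_a))  # 15 13 1
print({v: countif_equal(col_a, v) for v in range(1, 7)})
print({v: countif_equal(col_d, v) for v in ("Low", "Med", "High")})
```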

Graphing Frequencies

Frequencies may be graphed in several ways. We will illustrate two kinds of bar charts and a pie chart.

Standard Bar Chart (Column Chart):

Excel builds charts from the Charts group on the Insert tab. It works much faster if you select the range that contains your data before you start making the graph. So begin by selecting the range E24:F27 above. Then,

On the Insert tab, in the Charts group, select Column.

Then click on the first picture under 2-D Column.

A chart appears, and your next job is to make it look like the example chart, "Distribution of Shipment Sizes".

As long as the chart is selected, you should see Chart Tools at the top of the screen.

On the Layout tab, in the Labels group:

click on Legend and select None;

click on Axis Titles, select Primary Vertical Axis Title, Rotated Title, and then type the word Frequency. (This enters the title for the Y axis.)

click on Chart Title, select Above Chart, and type the words Distribution of Shipment Sizes.

Right-click on a blank spot to the right of the chart title, select Font, and change the font size to 10.

[Chart: "Distribution of Shipment Sizes", a column chart of Frequency for Low, Med and High]

Now you may move your chart and change its size. To move it, just click once on it and drag it to a new location. To change the size, click once on it and use the "handles" (little black boxes) on the corners.

Horizontal Bar Chart with Stacked Bars:

Select the range that contains your data and labels. This is E24:F27 in the example.

On the Insert tab, in the Charts group, select Bar.

Then click on the second picture under 2-D Bar.

Important: On the Design tab, in the Data group, click the Switch Row/Column button.

Now you may resize and move the graph as you please.

[Chart: a single horizontal bar for Freq., stacked into Low, Med and High segments, on a 0 to 20 scale]




Pie Chart:

Select the range that contains your data and labels, E24:F27.

On the Insert tab, in the Charts group, select Pie.

Then click on the first picture under 2-D Pie.

Click on the title and hit the Delete key.

Click on the legend and hit the Delete key.

On the Layout tab, in the Labels group, select Data Labels, More Data Label Options.

In the popup window, un-check all of the options except the following:

Under "Label Contains", select Category Name, Percentage, and Show Leader Lines.

Under "Label Position", select Outside End.

[Pie chart: Low 44%, Med 37%, High 19%]



Histograms

The Histogram tool in the Data Analysis package is a fast way to get a picture and table of the distribution

of your data. An example is shown below. Also shown are the Excel functions that give the same information.

NOTE: The Histogram tool cannot describe more than one variable at a time.

The term Bin refers to the Upper Limit of the range for which the frequency is calculated. Bin 2 in the table below has frequency 6, because "Data" contains 6 values that are strictly greater than 1 and less than or equal to 2.

Data:
1, 2, 3, 4, 3, 1, 5, 2, 1, 2, 3, 2, 1, 3, 2, 2

Output from Histogram Tool:

Bin     Frequency
1       4
2       6
3       4
4       1
More    1

Excel Functions:

Bin     Freq.   Cumul.
1       4       4
2       6       10
3       4       14
4       1       15
1000    1       16

Instructions for the Histogram Data Analysis Tool.

On the Data tab, in the Analysis group, select Data Analysis.

Select Histogram (double-click on it, or select OK).

Select the Input Range window, and either type or select the area that contains the data.

On the first try, leave Bin Range blank. Later you may wish to customize the histogram by putting a range into this box (an example is given later).

If your area includes names for the variables, select the Labels checkbox.

If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output. For example, if you type D8, the output will begin at cell D8 and continue down and to the right.

Check Chart Output if you want Excel to create a graph.

Click OK

UNFORTUNATE NOTE: At the time of this writing, there is an unfixed bug in Excel. If you try to move the chart created by Histogram, it will separate into two pieces. However, if you save the file, then close it, and then open it again, the chart will remain in one piece. So I recommend doing that now.

[Chart: Histogram tool output, Frequency (vertical axis) by Bin 1, 2, 3, 4, More (horizontal axis)]



Improving the appearance of the histogram (after saving, closing and reopening the file):

The chart above, created by the Histogram tool, has been modified to look better.

First, I changed what was displayed inside the chart:

Delete the title (right-click on it and select Delete).

Delete the legend (same way).

Center the plot area in the chart (click above one of the bars and drag).

Next, I changed its shape:

Single-click on the chart and drag one of the "handles" (little boxes in the corners).

Formatting the numbers on the axes:

Sometimes the Histogram tool creates bins with many more decimal places than necessary. This has an unfortunate effect on the appearance of the horizontal axis, but it is easy to fix.

Since the problem did not occur in the example above, we first have to create the problem and then fix it.

To create the problem:

Select cell D9 and enter the formula =1/3

Now look at the graph. Notice how the display has changed on the horizontal axis. Not very pretty, is it?

To fix the problem:

The format of the numbers on the chart is the same as their format on the spreadsheet. Therefore,

Select the range of numbers below the word "Bin" (cells D8:D13 in the example above).

On the Home tab, in the Number group, in the pull-down list, select More Number Formats.

In the popup window, select Number from the list of options and change the decimal places to 2.

Notice that the numbers in the graph are now displayed with 2 decimal places, which looks better.

You can use this method for any axis on any Excel graph, displaying however many decimal places are appropriate for the situation.

Using Bins that You Choose

To tell Excel what bins you want to use for the data, put their location in the Bin Range box of the Histogram dialog. Notice that I had to include one cell above the desired range of bins, because the "Labels" box is checked.

Desired Bins:
2
4
6

Output from Histogram Tool:

Desired Bins    Frequency
2               10
4               5
6               1
More            0
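The "Bin is an upper limit" rule can be sketched in Python to check both tables on this sheet. This illustrates the stated rule and is not the Histogram tool's own code. A value is counted in the first bin whose upper limit is greater than or equal to it, and any value above the last bin is counted as "More". The function name histogram_counts is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

# The "Data" column from the Histogram sheet.
data = [1, 2, 3, 4, 3, 1, 5, 2, 1, 2, 3, 2, 1, 3, 2, 2]


def histogram_counts(values, bins):
    """Count values per bin, treating each bin as an inclusive upper limit.

    Returns one count per bin plus a final count for "More" (values above
    the largest bin).
    """
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left gives the index of the first bin >= v,
        # or len(bins) when v exceeds every bin (the "More" slot).
        counts[bisect_left(bins, v)] += 1
    return counts


default = histogram_counts(data, [1, 2, 3, 4])
print(default, list(accumulate(default)))  # frequencies and cumulative counts
print(histogram_counts(data, [2, 4, 6]))   # the "Desired Bins" example
```

With bins 1 to 4 this gives 4, 6, 4, 1 and 1 (More), with cumulative counts 4, 10, 14, 15, 16. With bins 2, 4 and 6 it gives 10, 5, 1 and 0.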



Scatter Diagrams (Scatter Plots)

Scatter Plots offer a way to visualize the relationship between two variables. The Charts group on Excel's Insert tab makes it fairly easy to construct one. An example is shown below.

Example Data Set:

a y b

1 33 2

2 23 3

3 14 4

4 55 3

3 3 5

5 44 11

8 35 6

6 98 10

5 41 7

3 77 4

2 8 5

Instructions for Scatter Plots

Follow these steps to reproduce the example chart, "Plot of a vs. b". Notice that it plots a and b, but that in the data, variable y is in the column between a and b.

Begin by selecting the data range. Click on cell A6 and drag to cell A17; then, holding down the Ctrl key, click on C6 and drag to C17. This selects both a and b, leaving out y.

On the Insert tab, in the Charts group, select Scatter, and click on the first picture.

Right-click on the legend (on the right side of the chart) and select Delete

Click on the chart title and type "Plot of a vs. b". Right-click on the title, select Font, and change the font size to 12.

On the Layout tab, use the Axis Titles button to insert titles for both axes as shown.

On the Layout tab, use the Gridlines button to insert gridlines as shown.

Now you may move your chart and change its size. To move it, just click once on it and drag it to a new location. To change the size, click once on it and use the "handles" (little black boxes).

[Chart: "Plot of a vs. b", a scatter plot of the a and b columns, with axis titles a and b and gridlines]



Descriptive Statistics

The Descriptive Statistics tool in the Data Analysis package is a fast way to get a bunch of numbers that describe your data. An example is shown below, together with the built-in Excel functions that give the same information. Copy the Excel Functions to the next column to get a description of variable b.

Example Data Set:

a    b
1    2
2    3
3    4
4    3
3    5
5    11
8    6
6    10
5    7
3    4
2    5

Statistic                  Descriptive Statistics Tool (a)    Excel Functions (a)
Mean                       3.818182                           3.818182
Standard Error             0.615234                           0.615234
Median                     3                                  3
Mode                       3                                  3
Standard Deviation         2.040499                           2.040499
Sample Variance            4.163636                           4.163636
Kurtosis                   0.260801                           0.260801
Skewness                   0.730477                           0.730477
Range                      7                                  7
Minimum                    1                                  1
Maximum                    8                                  8
Sum                        42                                 42
Count                      11                                 11
Largest(2)                 6                                  6
Smallest(2)                2                                  2
Confidence Level(95.0%)    1.370826                           1.370826
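For comparison, the same summary can be computed with Python's statistics module. This is a sketch. It uses the adjusted sample formulas for skewness and excess kurtosis that Microsoft documents for Excel's SKEW() and KURT(). The Confidence Level row is left out because it needs a Student's t value, which the standard library does not provide. The function name describe is hypothetical.

```python
import math
import statistics

a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]


def describe(x, k=2):
    """Summary statistics in the order the Descriptive Statistics tool lists them."""
    n = len(x)
    mean = statistics.mean(x)
    sd = statistics.stdev(x)                 # sample standard deviation, divides by n - 1
    z = [(v - mean) / sd for v in x]         # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(x),
        "Mode": statistics.mode(x),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(x),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(x) - min(x),
        "Minimum": min(x),
        "Maximum": max(x),
        "Sum": sum(x),
        "Count": n,
        f"Largest({k})": sorted(x, reverse=True)[k - 1],
        f"Smallest({k})": sorted(x)[k - 1],
    }


for label, value in describe(a).items():
    print(f"{label:20} {value:.6f}" if isinstance(value, float) else f"{label:20} {value}")
```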

Instructions for the Descriptive Statistics Data Analysis Tool.

On the Data tab, in the Analysis group, select Data Analysis.

Select Descriptive Statistics (double-click on it, or select OK).

Select the Input Range window, and either type or select the area that contains the data.

If your data is arranged so that each vertical column represents a variable, select the Columns button.

If your input range includes names for the variables, select the Labels in First Row checkbox.

If you want the results to be written on the current worksheet, select the Output Range button, click on the window next to that button, and either type in or select a location for the output. (If you type E6, the output will begin at cell E6 and continue down and to the right.)

Most Important: Check the Summary Statistics box.

Checking the Confidence Level for the Mean box gives a "Confidence Level" in the output, which is equal to half of the width of a confidence interval.

Kth Largest or Kth Smallest: Checking these boxes and entering "2" causes the output to include the second largest and second smallest values in the data set.



Click OK



Sample Covariance

Covariance measures the degree to which things "vary together". In that regard it is almost the same as correlation (see the Correlation sheet). In fact, correlation is more useful for quantifying the relationship between two variables. The most common use of covariance is when you are adding two random variables, such as when you are forming a portfolio of different stocks.

Unfortunately, Excel 2007 does not offer an "unbiased" sample estimate of covariance. This is an error that should have been remedied long ago, but Microsoft has not seen fit to fix it. To understand the problem, consider Excel's variance function. There are two versions: Sample Variance VAR() and Population Variance VARP(). Both of these compute the sum of squared differences from the sample mean. However, Sample Variance corrects for a statistical bias by dividing that sum by (n-1), where n is the size of the sample. Population Variance divides by n, and therefore gives a smaller answer. Population Variance is correct ONLY IF the sample is, in fact, the entire population. Sample Variance is appropriate when the sample is a small fraction of the population, which is the more usual case.

To be consistent with VAR() and VARP(), Excel should label its covariance function, COVAR(), as the Population Covariance, and should also provide a Sample Covariance function that divides by (n-1).

Until such a change is made, you can obtain unbiased estimates of covariance by multiplying Excel's values by the ratio n/(n-1). The Sample Covariance table below does this correction.

Finally, please note that the diagonal values in the covariance table are variances. Thus, 1.25 is the Population Variance of a and 1.66667 is the Sample Variance of a.

Example Data Set with n = 4

a    b    c
1    2    5
2    3    4
3    5    2
4    4    3

Covariance Data Analysis Tool:

      a       b       c
a     1.25
b     1       1.25
c     -1      -1.25   1.25

Covariance Excel Function (Population Covariance):

      a       b       c
a     1.25
b     1       1.25
c     -1      -1.25   1.25

Sample Covariance: Excel Function multiplied by n/(n-1):

      a            b            c
a     1.6666667
b     1.3333333    1.6666667
c     -1.333333    -1.666667    1.6666667
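The n/(n-1) correction can be checked with a short Python sketch. The helper covariance() is hypothetical. It divides the sum of cross-products by n for the population version (what the sheet calls Excel's covariance) and by n-1 for the sample version.

```python
# Example Data Set with n = 4 from the Covariance sheet.
a = [1, 2, 3, 4]
b = [2, 3, 5, 4]
c = [5, 4, 2, 3]


def covariance(x, y, sample=True):
    """Sum of cross-products about the means, divided by n - 1 (sample) or n (population)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return s / (n - 1) if sample else s / n


n = len(a)
population_ab = covariance(a, b, sample=False)   # 1.0, as in the population table
corrected_ab = population_ab * n / (n - 1)       # multiply by n/(n-1)
print(population_ab, corrected_ab, covariance(a, b))
print(covariance(a, a, sample=False), covariance(a, a))  # variances of a
```

Multiplying the population value by n/(n-1) gives the same number as dividing by n-1 directly, and the diagonal entries (such as covariance(a, a)) are the variances.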

Instructions for the Covariance Data Analysis Tool.

On the Data tab, in the Analysis group, select Data Analysis.

Select Covariance (double-click on it, or select OK).

Select the Input Range window, and either type or select the area that contains the data.

If your data is arranged so that each vertical column represents a variable, select the Columns button. Otherwise, select the Rows button.

If your input range includes names for the variables, select the Labels in first row checkbox.

If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output. For example, if you type F15, the output will begin at cell F15 and continue down and to the right.

Make sure that the Output Range does not overlap with the Input Range.

Click OK

Remember, if you want unbiased estimates, multiply Excel's Covariance by n/(n-1).



Sample Correlation

Correlation is a way to quantify a linear relationship between variables. The value of correlation is between -1 and +1. Positive correlation means that the variables tend to move in the same direction. That is, if one variable is above its mean, the other one is likely to be above its mean, too. Height and weight of people are positively correlated, because very tall people usually weigh more than very short people. Note that this is not always true, so the correlation is less than +1.0.

Negative correlation means that they tend to move in opposite directions. Mountain climbers know that there is a negative correlation between altitude and stamina, because of decreasing oxygen.

Correlation of +1 or -1 means that the relationship between the two variables is perfectly linear. When this happens, a "scatter plot" of the two variables yields a straight line. In the example below, variables b and c have correlation of -1.

Example Data Set:

a    b    c
1    2    5
2    3    4
3    5    2
4    4    3

Correlation Data Analysis Tool:

      a      b     c
a     1
b     0.8    1
c     -0.8   -1    1

Correlation Excel Function:

      a      b     c
a     1
b     0.8    1
c     -0.8   -1    1
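A Python sketch of the Pearson correlation makes one point explicit. The divisor (n or n-1) appears in both the numerator and the denominator and cancels, so unlike covariance there is no population/sample distinction to worry about. The helper name pearson is hypothetical.

```python
import math

# Example Data Set from the Correlation sheet.
a = [1, 2, 3, 4]
b = [2, 3, 5, 4]
c = [5, 4, 2, 3]


def pearson(x, y):
    """Pearson correlation: sum of cross-products over the root of the two sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    # Dividing sxy, sxx and syy by n (or by n - 1) would cancel, so no divisor is needed.
    return sxy / math.sqrt(sxx * syy)


print(pearson(a, b), pearson(a, c), pearson(b, c))
```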

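Instructions for the Correlation Data Analysis Tool.

On the Data tab, in the Analysis group, select Data Analysis.

Select Correlation (double-click on it, or select OK).

Select the Input Range window, and either type or select the area that contains the data.

If your data is arranged so that each vertical column represents a variable, select the Columns button. Otherwise, select the Rows button.

If your input range includes names for the variables, select the Labels checkbox.

If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output. Make sure that the Output Range does not overlap with the Input Range.

Click OK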

[Charts: scatter plots of a and b, and of b and c; the points for b and c fall on a straight line]



Random Sampling

Example Data Set:

FUND               RandNo
Benchmarrk Div     0.999969
Bradford           0.172857
BT INstit Treas    0.263466
Capital Cash       0.082888
Fidelity Cash      0.275826
Flex-fund          0.119541
Fortis             0.110691
Freedom Cash       0.078951
Galaxy Money       0.291818
MarketWatch        0.183844
Nationwide         0.165838
NCC Funds          0.27604
Piermont Money     0.220191

Example After Sorting:

FUND               RandNo
Freedom Cash       0.078951
Capital Cash       0.082888
Fortis             0.110691
Flex-fund          0.119541
Nationwide         0.165838
Bradford           0.172857
MarketWatch        0.183844
Piermont Money     0.220191
BT INstit Treas    0.263466
Fidelity Cash      0.275826
NCC Funds          0.27604
Galaxy Money       0.291818
Benchmarrk Div     0.999969

To select a random sample of size n,

Put random numbers into the column next to the data set (instructions given below).

Select the first random number, and then go to the Data tab and press the Sort Smallest to Largest button. (See alternative instructions on worksheet Sorting.)

Your sample is the first n rows.

Here is how to put random numbers into cells B17:B29:

Tab: Data, Analysis group, Data Analysis, Random Number Generation
Number of Variables: (leave blank)
Number of Random Numbers: (leave blank)
Distribution: Uniform
Parameters: Between 0 and 1
Output Range: B17:B29

For a Sample of 8, choose the first 8 after sorting on the Random Numbers.
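The same "attach a random number, sort, take the first n" procedure can be sketched in Python. The fund names and RandNo values are copied from the example. The helpers first_n_after_sorting and random_sample are hypothetical. The second one draws fresh uniform numbers with Python's random module instead of using Excel's Random Number Generation tool.

```python
import random

# FUND and RandNo columns from the Example Data Set (names as they appear on the sheet).
rand_no = {
    "Benchmarrk Div": 0.999969, "Bradford": 0.172857, "BT INstit Treas": 0.263466,
    "Capital Cash": 0.082888, "Fidelity Cash": 0.275826, "Flex-fund": 0.119541,
    "Fortis": 0.110691, "Freedom Cash": 0.078951, "Galaxy Money": 0.291818,
    "MarketWatch": 0.183844, "Nationwide": 0.165838, "NCC Funds": 0.27604,
    "Piermont Money": 0.220191,
}


def first_n_after_sorting(numbers, n):
    """Sort items by their random number (ascending) and keep the first n."""
    return sorted(numbers, key=numbers.get)[:n]


def random_sample(items, n, seed=None):
    """Attach a fresh Uniform(0, 1) number to each item, then take the first n after sorting."""
    rng = random.Random(seed)
    numbers = {item: rng.random() for item in items}
    return first_n_after_sorting(numbers, n)


print(first_n_after_sorting(rand_no, 8))  # the first 8 rows of "Example After Sorting"
print(random_sample(list(rand_no), 8, seed=1))
```

Because every item gets an independent uniform number, each subset of 8 funds is equally likely to come first after sorting. That is why this procedure gives a simple random sample.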



Confidence Intervals

There are two ways to do confidence intervals: use Built-in Excel functions, or use information from the Descriptive Statistics tool in the Data Analysis package. They are both illustrated below.

Confidence Intervals from the Descriptive Statistics Data Analysis Tool.

First, generate the descriptive statistics (see the Descriptive Stats sheet in this workbook): On the Data tab, in the Analysis group, select Data Analysis, Descriptive Statistics.

Select your data range,

Check the Confidence Level for the Mean box and enter your desired confidence level in the box,

Check the Summary Statistics box.

Click OK. You should get the output shown below.

Example Data Set:

a    b
1    2
2    3
3    4
4    3
3    5
5    11
8    6
6    10
5    7
3    4
2    5

Output from Descriptive Statistics Tool (a):

Mean                        3.818182
Standard Error              0.615234
Median                      3
Mode                        3
Standard Deviation          2.040499
Sample Variance             4.163636
Kurtosis                    0.260801
Skewness                    0.730477
Range                       7
Minimum                     1
Maximum                     8
Sum                         42
Count                       11
Confidence Level (95.0%)    1.370826

Then, to get the confidence interval, add and subtract the "Confidence Level" from the "Mean".

Calculations:
Lower Confidence Limit: 2.447356    =E16-E29
Upper Confidence Limit: 5.189008    =E16+E29

Interpretation: We have 95% confidence that the population mean for variable a is in the interval 2.447 to 5.189.

Confidence Intervals using Built-in Excel Functions.

Basic Calculations:
Average: 3.818182               =AVERAGE(A15:A25)
Standard Deviation: 2.040499    =STDEV(A15:A25)
Sample Size, n: 11              =COUNT(A15:A25)

Probability Calculations:
Confidence: 0.95
Student's t (2-tail): 2.228139  =TINV(1-E42,E41-1)

The Confidence Interval:
Lower Confidence Limit: 2.447356    =E39-E43*E40/SQRT(E41)
Upper Confidence Limit: 5.189008    =E39+E43*E40/SQRT(E41)
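The same interval can be reproduced in Python. The standard library has no Student's t quantile, so this sketch uses a small hand-rolled replacement. The t density is integrated with Simpson's rule (t_cdf) and inverted by bisection (t_inv_2tail), which is meant to return what TINV returns. Both helper names are hypothetical. A library such as SciPy (scipy.stats.t) would normally supply these.

```python
import math
import statistics


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be
    even) and uses the symmetry of the distribution about 0.
    """
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    x = abs(t)
    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_inv_2tail(alpha, df):
    """Two-tailed critical value: the t with P(|T| > t) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * (1 - t_cdf(mid, df)) > alpha:
            lo = mid  # tail area still too big: the critical value is further right
        else:
            hi = mid
    return (lo + hi) / 2


a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
confidence = 0.95
n = len(a)
average = statistics.mean(a)
sd = statistics.stdev(a)
t_crit = t_inv_2tail(1 - confidence, n - 1)
half_width = t_crit * sd / math.sqrt(n)  # the "Confidence Level" row
print(round(t_crit, 6), round(average - half_width, 6), round(average + half_width, 6))
```

The half-width is the "Confidence Level" reported by the Descriptive Statistics tool. Subtracting it from the average and adding it to the average gives the two limits.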



One-Sample t-Test

The easiest way to do a One-Sample t-Test in Excel is to use a Confidence Interval. However, this method does not give a p-value directly. The second method is to construct the test statistic and compare it to a critical value. The test statistic can be used to compute a p-value. Both methods are illustrated below.

One-Sample t-Tests using Confidence Intervals

Two-tail test: set up a (1 - α) confidence interval (see the sheet Confidence Intervals for instructions) and reject H0 (the Null Hypothesis) if the value specified in H0 is outside the confidence interval.

One-tail test: use a confidence level of (1 - 2α) and reject H0 if the whole confidence interval lies on the side of the H0 value predicted by the Alternative Hypothesis, Ha. (For example, if Ha says the mean is larger than the H0 value, reject H0 if that value is below the lower confidence limit.)

Example Data (a): 1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2

Calculation of the Confidence Intervals:

Two-tail test: For α = 0.05, set up a 95% confidence interval:
Average: 3.8181818
Standard Deviation: 2.040499
Sample Size, n: 11
Confidence: 0.95
Student's t (2-tail): 2.2281389
Lower Confidence Limit: 2.4473559
Upper Confidence Limit: 5.1890077

One-tail test: For α = 0.05, set up a 90% confidence interval:
Average: 3.8181818
Standard Deviation: 2.040499
Sample Size, n: 11
Confidence: 0.90
Student's t (2-tail): 1.8124611
Lower Confidence Limit: 2.7030948
Upper Confidence Limit: 4.9332688

Hypothesis Tests using Confidence Intervals:

In the following examples, assume that 4.4 has been given as the value to use in the null hypothesis. (This is the value often referred to as μ0. Thus, μ0 = 4.4 in the examples.)

Two-tail test:

Example 1: H0: μ = 4.4, Ha: μ ≠ 4.4 (not equal)
Reject H0 if 4.4 is outside the 95% confidence interval.
Result: 4.4 is between 2.447 and 5.189, so the null hypothesis is NOT rejected.

One-tail tests (90% confidence interval):

Example 2: H0: μ ≤ 4.4, Ha: μ > 4.4
Reject H0 if 4.4 is below the lower confidence limit.
Result: 4.4 is not below 2.703, so the null hypothesis is NOT rejected.

Example 3: H0: μ ≥ 4.4, Ha: μ < 4.4
Reject H0 if 4.4 is above the upper confidence limit.
Result: 4.4 is not above 4.933, so the null hypothesis is NOT rejected.

Note: If you choose to do a one-tail test, you must do one or the other of these, NEVER BOTH.



One-Sample t-Tests using the Test Statistic

The test statistic is (sample average - μ0)/(standard error). The critical level is the value from the Student's t distribution. There are two ways to test the hypothesis (they give the same result):

Hypothesis test using the Test Statistic:

For 2 tails, reject H0 if the test statistic is larger in absolute value than the critical level.

For 1 tail, reject H0 if the test statistic is larger in absolute value than the critical level and its sign is in the direction predicted by Ha.

Hypothesis test using the P-values:

For 2 tails, reject H0 if the p-value is smaller than α.

For 1 tail, reject H0 if the p-value is smaller than α and the direction is consistent with Ha.

Basic Calculations:
Average: 3.8181818
Standard Deviation: 2.040499
Sample Size, n: 11

Probability Calculations:
Hypothesized Value, μ0: 4.4
α: 0.05
t-ratio, or Test Statistic: -0.945687
p-value, one-tail: 0.183299
t Critical one-tail: 1.8124611
p-value, two-tail: 0.3665981
t Critical two-tail: 2.2281389

Tests using the Test Statistic (t-ratio):

Two-tail test:

Example: H0: μ = 4.4, Ha: μ ≠ 4.4 (not equal)
Result: the absolute value of the t-ratio, 0.94569, is smaller than the two-tail critical value of 2.22814, so the null hypothesis is NOT rejected.

One-tail tests:

Example: H0: μ ≤ 4.4, Ha: μ > 4.4
Result: the sample average of 3.818 is below 4.4, which IS NOT consistent with Ha, so the null hypothesis is NOT rejected.

Example: H0: μ ≥ 4.4, Ha: μ < 4.4
Result: the sample average of 3.818 is below 4.4, which IS consistent with Ha, but the absolute value of the t-ratio, 0.94569, is smaller than the one-tail critical value of 1.81246, so the null hypothesis is NOT rejected.


Same Tests, using the p-value

Two-tail test:

Example: H0: μ = 4.4, Ha: μ ≠ 4.4 (not equal)
Result: the two-tail p-value of 0.3666 is larger than 0.05, so the null hypothesis is NOT rejected.

One-tail tests:

Example: H0: μ ≤ 4.4, Ha: μ > 4.4
Result: the sample average of 3.818 is below 4.4, which IS NOT consistent with Ha, so the null hypothesis is NOT rejected.

Example: H0: μ ≥ 4.4, Ha: μ < 4.4
Result: the sample average of 3.818 is below 4.4, which IS consistent with Ha, but the one-tail p-value of 0.1833 is larger than 0.05, so the null hypothesis is NOT rejected.

Note: Test Statistic and P-value ALWAYS GIVE THE SAME RESULT.
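Here is a Python sketch of the same one-sample test. It uses a hand-rolled t_cdf helper (hypothetical; Simpson's rule on the t density, because the standard library has no t distribution). The alternative argument is an assumption of this sketch: "two-sided", "greater" for Ha: μ > μ0, or "less" for Ha: μ < μ0. Computing the one-sided p-value in the direction of Ha builds in the "direction consistent with Ha" rule, so a result in the wrong direction gets a p-value above 0.5.

```python
import math
import statistics


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t: Simpson's rule on [0, |t|] plus symmetry about 0."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    x = abs(t)
    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sample_t(x, mu0, alternative="two-sided"):
    """Return (t-ratio, p-value) for H0 about the mean against the given alternative."""
    n = len(x)
    se = statistics.stdev(x) / math.sqrt(n)       # standard error
    t = (statistics.mean(x) - mu0) / se           # test statistic
    if alternative == "greater":                  # Ha: mu > mu0
        p = 1 - t_cdf(t, n - 1)
    elif alternative == "less":                   # Ha: mu < mu0
        p = t_cdf(t, n - 1)
    else:                                         # Ha: mu != mu0
        p = 2 * t_cdf(-abs(t), n - 1)
    return t, p


a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
alpha = 0.05
for alt in ("two-sided", "greater", "less"):
    t, p = one_sample_t(a, 4.4, alt)
    print(alt, round(t, 6), round(p, 6), "reject H0" if p < alpha else "do not reject H0")
```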



Two-Sample t-Tests.

There are three t-Tests in the Excel Data Analysis Tools, and each has a corresponding built-in function.

Data Analysis Tool: Excel Spreadsheet Formula:

t-Test: Paired Two-Sample for Means = TTEST(Array1, Array2, Tails, 1)

t-Test: Two-Sample Assuming Equal Variances = TTEST(Array1, Array2, Tails, 2)

t-Test: Two-Sample Assuming Unequal Variances = TTEST(Array1, Array2, Tails, 3)

The formulas above give only the p-value for the test. The Data Analysis Tools give the complete analysis.

t-Tests using the built-in function TTEST(array1, array2, tails, type)

Example Data Set:

a    b
1    2
2    3
3    4
4    3
3    5
5    11
8    6
6    10
5    7
3    4
2    5

                           Paired      Equal σ     Unequal σ
Hypothesized Difference    0           0           0
p-value, one-tail:         0.016746    0.069742    0.0705883
p-value, two-tail:         0.033492    0.139485    0.1411765

TTEST(Array1, Array2, Tails, Type)
Array1 is the first data set.
Array2 is the second data set.
Tails = 1 for a one-tail test, or 2 for a two-tail test.
Type = 1 for a Paired Two-sample test.
Type = 2 for a Two-sample test assuming Equal variance.
Type = 3 for a Two-sample test assuming Unequal variance.
Example: =TTEST($A$14:$A$24, $B$14:$B$24, 2, 1)
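The three TTEST types can be sketched in Python to show what each one computes. This is an illustration, not Excel's code. The t_cdf helper is the same hand-rolled Simpson's-rule approximation (hypothetical), and paired_t, pooled_t and welch_t are hypothetical names. The Welch degrees of freedom are left fractional here, while the Data Analysis tool reports them rounded (df 18). Because of that, the Type 3 p-value may differ from TTEST in the last digits.

```python
import math
import statistics


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t: Simpson's rule on [0, |t|] plus symmetry about 0."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    x = abs(t)
    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def paired_t(x, y):
    """Type 1: t-ratio and df for the differences x - y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1


def pooled_t(x, y):
    """Type 2: two-sample t assuming equal variances (pooled variance)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, n1 + n2 - 2


def welch_t(x, y):
    """Type 3: two-sample t with unequal variances and Welch-Satterthwaite df."""
    v1, v2 = statistics.variance(x) / len(x), statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    return t, df


a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
b = [2, 3, 4, 3, 5, 11, 6, 10, 7, 4, 5]
for name, fn in (("Paired", paired_t), ("Equal", pooled_t), ("Unequal", welch_t)):
    t, df = fn(a, b)
    one_tail = t_cdf(-abs(t), df)  # like TTEST(a, b, 1, type)
    print(name, round(t, 5), round(df, 2), round(one_tail, 6), round(2 * one_tail, 6))
```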

t-Tests using the t-Test Data Analysis tools

At this point you should be familiar with how to use the input boxes, so here is a brief list of the steps.

Tab: Data, group: Analysis, Data Analysis, t-Test: Two-Sample Assuming Equal Variances

Put the addresses of the two variables in their respective Input Range boxes.

In the Hypothesized Mean Difference box:

If your "null hypothesis" is that the two population means are equal, leave the box blank.

If your "null hypothesis" is that the two population means are different by a specified amount: (In this case, the variable hypothesized to have the larger mean MUST BE "Variable 1". For example, if H0 is that b has a larger mean than a, then the Variable 1 Input Range must contain variable b.) Then, type the hypothesized difference in the Hypothesized Mean Difference box. For example, if the null hypothesis states that Variable 1's population mean is 7.4 units larger than Variable 2's, enter 7.4 in the Hypothesized Mean Difference box.

If your Variable Ranges include a name for each variable, check the Labels box.

The Alpha box is where you enter the type I error probability. (Excel's output does not report this value, so be sure to note what value you used.)

Enter your Output Options in the usual way, and click OK.

Examples of each of the Tools are given on the next 2 pages.

In the worksheet, select a cell in the TTEST table to see its formula. In the example formula above, Array 1 and Array 2 are the two data ranges, 2 selects two tails, and 1 selects the Paired Two-Sample test.




t-Test: Paired Two Sample for Means, Hypothesized Diff. = 0

                                a           b
Mean                            3.818182    5.454545
Variance                        4.163636    8.272727
Observations                    11          11
Pearson Correlation             0.645926
Hypothesized Mean Difference    0
df                              10
t Stat                          -2.46321

Built-in function TTEST

p-value, one-tail: P(T




t-Test: Two-Sample Assuming Unequal Variances, Hypothesized Diff. = 0

                                a           b
Mean                            3.818182    5.454545
Variance                        4.163636    8.272727
Observations                    11          11
Hypothesized Mean Difference    0
df                              18
t Stat                          -1.53897

Built-in function TTEST

p-value, one-tail: P(T



Regression

Regression is a method to fit a linear function to a data set.

The objective is to estimate values of b0, b1 and b2 in the following equation:

y = b0 + b1 x1 + b2 x2

In this equation, y is called the Dependent Variable (sometimes called the Criterion Variable), x1 and x2 are called the Independent Variables (or Predictor Variables), b0 is called the "intercept", and b1 and b2 are the "slopes".

Collectively, b0, b1 and b2 are referred to as the Coefficients. (This is their label in the output.)

The results for the Example Data Set are shown below the instructions.

Example Data Set:

y x1 x2

-3 2 5

2 3 4

11 5 2

9 6 4

8 4 3

Instructions for the Regression Data Analysis Tool.


On the Data tab, in the Analysis group, select Data Analysis.

Select Regression (double-click on it, or select OK).

Select the Input Y Range window, and select the area that contains the Dependent Variable.

Select the Input X Range window, and select the area that contains the Independent Variable(s). (If there are 2 or more Independent Variables, they must be side-by-side in the worksheet.)

You may specify Constant is Zero to force b0 (the intercept) to be zero.

If the first row of your area contains names for the variables, select Labels in the First Row.

You may set a Confidence Level for the confidence intervals for the coefficients.

If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output.

Make sure that the Output Range does not overlap with the Input Range.

Select additional output and plots that you would like. Excel's Normal Probability Plot is incorrect.

Click OK

Note: Graphs produced by Excel's Regression program are badly sized. However, it is easy to change the size by clicking on the graph and dragging one of the corner "handles". Output is shown below.

As of this writing, Microsoft Excel's Regression package malfunctions. I recommend using the Prediction Interval macro that I wrote, which is in a file called PredInt.xls, available on the class web site, with instructions.

An alternative link for this file is:
http://forum.johnson.cornell.edu/faculty/mcclain/Software/PredInt.htm




SUMMARY OUTPUT

Regression Statistics
Multiple R           0.99322
R Square             0.986486
Adjusted R Square    0.972973
Standard Error       0.948683
Observations         5

ANOVA
              df    SS       MS      F     Significance F
Regression    2     131.4    65.7    73    0.0135135
Residual      2     1.8      0.9
Total         4     133.2

              Coefficients   Standard Error   t Stat     P-value     Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept     5.2            2.894823         1.79631    0.2142827   -7.2554266    17.655427     -7.2554266    17.6554266
x1            2.3            0.360555         6.379052   0.0237044   0.7486554     3.8513446     0.7486554     3.851344584
x2            -2.5           0.5              -5         0.0377496   -4.6513279    -0.3486721    -4.6513279    -0.348672137

RESIDUAL OUTPUT
Observation   Predicted y   Residuals   Standard Residuals
1             -2.7          -0.3        -0.447214
2             2.1           -0.1        -0.149071
3             11.7          -0.7        -1.043498
4             9             1.78E-15    2.65E-15
5             6.9           1.1         1.639783
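The coefficients, their standard errors, R Square, the Standard Error and F can be reproduced with a short pure-Python least-squares sketch. It assumes exactly two independent variables and solves the centered normal equations with Cramer's rule. The function name fit_two_predictors is hypothetical, and this is not a description of how Excel computes its output.

```python
import math

# Example Data Set from the Regression sheet.
y = [-3, 2, 11, 9, 8]
x1 = [2, 3, 5, 6, 4]
x2 = [5, 4, 2, 4, 3]


def fit_two_predictors(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 using centered sums of squares."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * w for u, w in zip(d1, dy))
    s2y = sum(v * w for v, w in zip(d2, dy))
    det = s11 * s22 - s12 * s12                  # nonzero unless x1 and x2 are collinear
    b1 = (s1y * s22 - s12 * s2y) / det           # Cramer's rule on the 2x2 system
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    predicted = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]
    residuals = [obs - p for obs, p in zip(y, predicted)]
    sse = sum(r * r for r in residuals)          # Residual SS
    sst = sum(w * w for w in dy)                 # Total SS
    mse = sse / (n - 3)                          # Residual MS (3 coefficients estimated)
    se_b0 = math.sqrt(mse * (1 / n + (m1 * m1 * s22 - 2 * m1 * m2 * s12 + m2 * m2 * s11) / det))
    return {
        "coefficients": (b0, b1, b2),
        "standard_errors": (se_b0, math.sqrt(mse * s22 / det), math.sqrt(mse * s11 / det)),
        "r_square": 1 - sse / sst,
        "standard_error": math.sqrt(mse),
        "F": ((sst - sse) / 2) / mse,
        "residuals": residuals,
    }


fit = fit_two_predictors(y, x1, x2)
print(fit["coefficients"], fit["standard_errors"], fit["r_square"], fit["F"])
```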

One of the "Line Fit Plots" as produced by Excel. Note that the plot is much "flatter" than is customary, and it is a "Column Chart".

The other "Line Fit Plot" after changing its height to a more suitable value, and converting it to a "Line Chart".

[Chart: "x1 Line Fit Plot", y and Predicted y plotted against x1]

[Chart: "x2 Line Fit Plot", y and Predicted y plotted against x2]




Residual Plots are shown below.

One of the "Residual Plots" as produced by Excel. Note that the plot is much "flatter" than is customary. "Scatter Plots" are easier to interpret if they are nearly square.

The other "Residual Plot" after changing its height to a more suitable value.

NOTE: Plots of Standardized Residuals are easily obtained, if you clicked the Standardized Residuals checkbox. Make copies of the Residual Plots, and then change the data source so that the dependent variable is the standardized residuals. This plot is an example for variable x2. Note that I added gridlines as well.

[Chart: "x1 Residual Plot", Residuals against x1]

[Chart: "x2 Residual Plot", Residuals against x2]

[Chart: "x2 Standardized Residual Plot", standardized residuals against x2, with gridlines]

