ExcelStats.xls

8/11/2019 ExcelStats.xls

1/21

Sheet: Introduction File: 246706782.xls.ms_office Page 1 of 21

ExcelStats.xls Version 7.1 11/4/03

Using Excel to do Statistics: Some Helpful Notes


Johnson Graduate School of Management

Cornell University

Ithaca NY 14853

This workbook is intended for teaching purposes. You are welcome to use it in any manner,

and change it as you see fit. It comes without any guarantee whatsoever, and is distributed

free of charge. Changes are frequent, so check back frequently for a new version.

This workbook tells you how to do a bunch of Statistics calculations using Excel. Excel has an Add-In

called the Analysis ToolPak. To find out if you have it working, go into the Tools menu and select Add-Ins.

Here is how I give that sort of instruction in these notes: Menu: Tools, Add-Ins

A window with the title Add-Ins will appear, listing the Add-Ins you currently have available.

Menu: Tools, Data Analysis

Excel has 2 ways to do almost every statistical analysis, and in many cases this workbook illustrates both.

There are separate sheets for each topic listed below. You can find the sheets by selecting the appropriate

"tab" at the bottom of the screen.

Contents: These are the sheets in this workbook.

Introduction

Sorting

Frequencies & Graphs

Histogram

Scatter Plot

Descriptive Statistics

Rank & Percentile

Covariance

Correlation

Sampling

Confidence Intervals

One-Sample t-tests

Two-Sample t-tests

Regression

Additional Files Available:

PivotTab.xls: Explains how to use Excel's Pivot Tables to do Cross-Tabs.

It also contains a macro to do the Chi-squared test for a contingency table.

PredInt.xls: Contains a Visual Basic macro to do multiple regression with Prediction Intervals,

a feature that is not included in the Regression tool in the Analysis ToolPak.

If you see Analysis ToolPak and also Analysis ToolPak - VBA in the list, make sure the boxes next to them have

check marks. If not, clicking on a box makes the check mark appear. Then click OK.

If either Analysis ToolPak or Analysis ToolPak - VBA is NOT in the list, you have to install it from the disks or

CD that came with Microsoft Office or Microsoft Excel. If you don't know how to do that, get help.

Once you have "attached" BOTH Analysis ToolPaks (that is, checked the boxes), a new menu item will appear under

the Tools menu. That item will be used throughout these spreadsheets. Try it now.


Sheet: Sorting

Sorting a Data Set

*** Sorting does not require the Data Analysis package.

Sorting changes the data set. If you want to be able to restore the original order of the data, begin by

numbering the data points. The first column of the Example Data Set contains these numbers.

Example Data Set      Sorted "Ascending" by a      Sorted "Descending" by b

Numb.  a   b          Numb.  a   b                 Numb.  a   b
1      1   2          1      1   2                 6      5   11
2      2   3          2      2   3                 8      6   10
3      3   4          11     2   5                 9      5   7
4      4   3          3      3   4                 7      8   6
5      3   5          5      3   5                 5      3   5
6      5   11         10     3   4                 11     2   5
7      8   6          4      4   3                 3      3   4
8      6   10         6      5   11                10     3   4
9      5   7          9      5   7                 2      2   3
10     3   4          8      6   10                4      4   3
11     2   5          7      8   6                 1      1   2

Instructions for the Sort Menu Item

Select the data set. (Hold down the Left Mouse Button and drag the cursor over cells A6 to C17.)

Select menu item Data, Sort

Use the arrow next to the Sort By window to select a from the pull-down list.

Click OK

The results should look like the data in the first shaded area next to the Example Data Set above.

Now repeat the steps, except select b from the pull-down list, and click the Descending button.

Click OK

The results should look like the data in the second shaded area next to the Example Data Set.

To return the data set to its original order, repeat the steps, selecting Numb. from the pull-down list, and click Ascending.

The Then By boxes may be used to resolve ties.

For example, select Sort By a and Then By b, using Ascending for both.

Compare the results to the first shaded area next to the Example Data Set. The order has been arranged so that b is ascending when a is constant. For example, look at the three points for which a = 3. The values for b are 4, 4 and 5, whereas in the first shaded area they are 4, 5 and 4.
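For readers who want to check the sorted tables outside Excel, here is a minimal Python sketch of the same three sorts. It is not part of the workbook; the tuple layout (Numb., a, b) and the variable names are my own assumptions. It relies on Python's sort being stable, which happens to reproduce the tie order shown in the shaded areas.

```python
# Rows of the Example Data Set as (Numb., a, b) tuples (layout is an assumption).
data = [
    (1, 1, 2), (2, 2, 3), (3, 3, 4), (4, 4, 3), (5, 3, 5), (6, 5, 11),
    (7, 8, 6), (8, 6, 10), (9, 5, 7), (10, 3, 4), (11, 2, 5),
]

# Sort By a, Ascending. Python's sort is stable, so ties keep their prior order.
by_a = sorted(data, key=lambda row: row[1])

# Sort By b, Descending.
by_b_desc = sorted(data, key=lambda row: row[2], reverse=True)

# Sort By a, Then By b, both Ascending: ties in a are resolved by b.
by_a_then_b = sorted(data, key=lambda row: (row[1], row[2]))

# Restore the original order by sorting on the Numb. column.
restored = sorted(by_b_desc, key=lambda row: row[0])

# Values of b for the three points with a = 3, before and after tie-breaking.
b_when_a_is_3 = [row[2] for row in by_a if row[1] == 3]
b_when_a_is_3_tiebroken = [row[2] for row in by_a_then_b if row[1] == 3]
```

Numbering the rows first is what makes `restored` possible, which is the same reason the notes recommend the Numb. column.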

[Sort dialog box: Sort By (a), Then By, Then By, each with Ascending and Descending options; My List Has: Header Row / No Header Row]


Sheet: Frequencies & Graphs

Counting and Graphing Frequency of Observations

Data may or may not be numerical. The four counting functions illustrated below take this into account:

COUNTA counts all entries, ignoring blanks.

COUNT counts only numbers.

COUNTBLANK counts the number of blank cells.

COUNTIF counts the number of entries that match a specified condition.

Data (cells A9:B24):

a          d
1          High
2          Low
3          Med
4          Med
3          Med
(blank)    High
5          Low
2          Med
1          Med
2          Low
-          Low
?          Med
1          High
3          Low
2          Low
2          Low

Excel Functions for a:

Entries = 15    =COUNTA($A$9:$A$24)
Numbers = 13    =COUNT($A$9:$A$24)
Blanks  = 1     =COUNTBLANK($A$9:$A$24)

a    Freq.
1    3        =COUNTIF($A$9:$A$24, "=" & E13)
2    5        =COUNTIF($A$9:$A$24, "=" & E14)
3    3
4    1
5    1
6    0

In COUNTIF, the first argument is the Range to be Counted and the second argument is The Condition.

Excel Functions for d:

Entries = 16
Numbers = 0
Blanks  = 0

d       Freq.
Low     7        =COUNTIF($B$9:$B$24,"=" & E25)
Med     6        =COUNTIF($B$9:$B$24,"=" & E26)
High    3        =COUNTIF($B$9:$B$24,"=" & E27)
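To make the difference between the four counting functions concrete, here is a rough Python sketch that mimics them on the example columns. The helper names are mine, not Excel's, `None` stands in for the blank cell, and the row in which the blank falls is an assumption (the counts do not depend on it).

```python
# Columns a and d of the example data; None stands for the blank cell.
col_a = [1, 2, 3, 4, 3, None, 5, 2, 1, 2, "-", "?", 1, 3, 2, 2]
col_d = ["High", "Low", "Med", "Med", "Med", "High", "Low", "Med",
         "Med", "Low", "Low", "Med", "High", "Low", "Low", "Low"]


def counta(cells):
    """Count all non-blank entries (in the spirit of COUNTA)."""
    return sum(1 for c in cells if c is not None)


def count(cells):
    """Count only numeric entries (in the spirit of COUNT)."""
    return sum(1 for c in cells if isinstance(c, (int, float)))


def countblank(cells):
    """Count blank cells (in the spirit of COUNTBLANK)."""
    return sum(1 for c in cells if c is None)


def countif(cells, value):
    """Count entries equal to a given value (COUNTIF with an "=" condition)."""
    return sum(1 for c in cells if c == value)


# Frequency tables like the ones built with COUNTIF above.
freq_a = {v: countif(col_a, v) for v in range(1, 7)}
freq_d = {v: countif(col_d, v) for v in ("Low", "Med", "High")}
```

The entries "-" and "?" are what separate Entries (15) from Numbers (13) for column a.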

Graphing Frequencies

Frequencies may be graphed in several ways. We will illustrate two kinds of bar charts and a pie chart.

Standard Bar Chart (Column Chart):

Excel has a Chart Wizard to help you. It works much faster if you select the range that contains your data

before you start making the graph. So begin by selecting the range E24:F27 above. Then,

Select menu item Insert, Chart

Chart Wizard, Step 1 of 4 appears showing a selection of chart types.

Under Chart Type: select Column. Then click on the first Chart sub-type.

Click Next>

[Column chart: Distribution of Shipment Sizes; vertical axis Frequency (0 to 8), categories Low, Med, High]



Chart Wizard, Step 2 of 4: The Data range box should show the data range you selected before starting.

Click Next>

Chart Wizard, Step 3 of 4:

You can type a title for your chart, and for each axis, if you want to.

Select the Legend tab. Un-checking Show Legend will cause the legend to disappear.

Click Next>

Chart Wizard, Step 4 of 4 allows you to place the chart on the current sheet, or insert a new sheet.

Don't worry! If you place it on the current sheet, you can move it later. It will not affect any of your data.

Click Finish

Now you may move your chart and change its size. To move it, just click once on it and drag it to a

new location. To change the size, click once on it and use the "handles" (little black boxes) on the corners.

Horizontal Bar Chart with Stacked Bars:

Select the range that contains your data and labels. This is E24:F27 in the example.

Select menu item Insert, Chart

Chart Wizard, Step 1 of 4: Under Chart Type select Bar, and select the second Chart Sub-type.

Click Next>

Chart Wizard, Step 2 of 4: The Data Range should show all the data, including the labels.

Important: Click the Rows button.

Click Next>

Chart Wizard, Step 3 of 4:

On the Axes tab, un-check Category (X) axis.

On the Data Labels tab, select Show Value.

Click Finish

[Chart Wizard Step 2 dialog box: Data range: ='Frequencies & Graphs'!$E$24:$F$27; Data Series in: Rows / Columns]

[Stacked horizontal bar chart: Freq., with segments Low = 7, Med = 6, High = 3; axis from 0 to 20]


Sheet: Histogram

Histograms

The Histogram tool in the Data Analysis package is a fast way to get a picture and table of the distribution

of your data. An example is shown below, together with the built-in Excel functions that give the

same information. The Histogram tool cannot describe more than one variable at a time.

The chart created by the Histogram tool usually needs to be modified. I have done so in the example.

Instructions for modifying the appearance are given in the last part of this note.

Data    Output from Histogram Tool    Excel Functions:

a       Bin     Frequency             Bin     Freq.   Cumul.
1       1       4                     1       4       4
2       2       6                     2       6       10
3       3       4                     3       4       14
4       4       1                     4       1       15
3       More    1                     1000    1       16
1
5
2
1
2
3
2
1
3
2
2

Instructions for the Histogram Data Analysis Tool.

Select menu item Tools, Data Analysis

Select Histogram (double-click on it, or select OK)

Select the Input Range window, and either type or select the area that contains the data.

On the first try, leave Bin Range blank. Later you may wish to customize the histogram by putting a range into this box (an example is given later).

If your area includes names for the variables, select the Labels checkbox.

If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output. For example, if you type D8, the output will begin at cell D8 and continue down and to the right.

Check Chart Output if you want Excel to create a graph.

Click OK

[Histogram dialog box: Input Range: $B$8:$B$24; Bin Range: (blank); Labels; Output Range: D8; New Worksheet Ply; New Workbook; Pareto (sorted histogram); Cumulative Percentage; Chart Output]

[Histogram chart: Frequency (0 to 7) by Bin (1, 2, 3, 4, More)]



Improving the appearance of the histogram:

The chart above, created by the Histogram tool, has been modified to look better.

First, I changed its shape:

Single-click on the chart and drag one of the "handles" (little boxes in the corners).

Then, I changed what was displayed inside the chart:

Delete the title (right-click on it and select Clear). Delete the legend (same way).

Stretch the plot area to fill the box (click on the grey area and drag the handles).

Formatting the numbers on the axes:

Sometimes the histogram tool creates bins with many more decimal places than is necessary. This

has an unfortunate effect on the appearance of the horizontal axis, but it is easy to fix.

Since the problem did not occur in the example above, we first have to create the problem and then fix it.

To create the problem:

Select cell D9 and enter the number 1.23456789

Now look at the graph. Notice how the display has changed on the horizontal axis. Not very pretty, is it?

To fix the problem:

The format of the numbers on the chart is the same as their format on the spreadsheet. Therefore,

Select the range of numbers below the word "Bin" (cells D8:D13 in the example above).

Menu: Format, Cells, Number,

then select Number from the list of options and change the decimal places to 2.

Notice that the numbers in the graph are now displayed with 2 decimal places, which looks better.

You can use this method for any axis on any Excel graph, displaying however many decimal places

are appropriate for the situation.

Using Bins that You Choose

To tell Excel what bins you want to use for the data, put the range that contains them in the Bin Range box.

Notice that I had to include one cell above the desired range of bins, because the "Labels" box is checked.

Desired Bins    Output from Histogram Tool

                Bin     Frequency
2               2       10
4               4       5
6               6       1
                More    0
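The frequencies above are consistent with each bin counting the values that are less than or equal to the bin and greater than the previous bin, with "More" catching everything above the last bin. Here is a small Python sketch under that assumption; the function name `histogram` is hypothetical, and the bins must be given in ascending order.

```python
from bisect import bisect_left
from itertools import accumulate

# Variable a from the Histogram sheet.
values = [1, 2, 3, 4, 3, 1, 5, 2, 1, 2, 3, 2, 1, 3, 2, 2]


def histogram(data, bins):
    """Count values per bin; a value v lands in the first bin with v <= bin.

    The returned list has one extra entry at the end for "More".
    """
    counts = [0] * (len(bins) + 1)
    for v in data:
        counts[bisect_left(bins, v)] += 1
    return counts


default_counts = histogram(values, [1, 2, 3, 4])   # bins from the first example
custom_counts = histogram(values, [2, 4, 6])       # the "Desired Bins"
cumulative = list(accumulate(default_counts))      # the Cumul. column
```

With the desired bins 2, 4 and 6, the sketch gives 10, 5, 1 and 0, matching the tool's output.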

[Histogram dialog box: Input Range: $B$8:$B$24; Bin Range: $C$75:$C$78; Labels (checked); Output Range: D75; New Worksheet Ply; New Workbook; Pareto (sorted histogram); Cumulative Percentage; Chart Output]


Sheet: Scatter Plot

Scatter Diagrams (Scatter Plots)

Scatter Plots offer a way to visualize the relationship between two variables. Excel's Chart Wizard

makes it fairly easy to construct one. An example is shown below.

Example Data Set:

a y b

1 33 2

2 23 3

3 14 4

4 55 3

3 3 5

5 44 11

8 35 6

6 98 10

5 41 7

3 77 4

2 8 5

Instructions for Scatter Plots

Follow these steps to reproduce the chart above. Notice that it plots a and b, but that in the data, variable y is in the column between a and b.

Begin by selecting the data range. Click on cell A6. Then, holding down the Ctrl key, click and drag to cell A17; then, continuing to hold Ctrl, click on C6 and drag to C17. This selects both a and b, leaving out y.

Excel calls this "non-contiguous range selection."

Select menu item Insert, Chart

Chart Wizard, Step 1 of 4 shows a selection of chart types.

Under Chart Type: select XY (Scatter). Then click on the first Chart sub-type.

Click Next>

Chart Wizard, Step 2 of 4: The Data range box should show the data range you selected before starting.

Note the comma between the two ranges.

Click Next>

Chart Wizard, Step 3 of 4:

Select the Titles tab. Type a title for your chart, and for each axis.

Select the Legend tab. Un-checking Show Legend will cause the legend to disappear.

Select the Grid Lines tab. Check Major Gridlines for both the X axis and the Y axis.

Click Next>

Chart Wizard, Step 4 of 4 allows you to place the chart on the current sheet, or insert a new sheet. Don't worry! You can move it later, and it will not affect any of your data.

Click Finish

Now you may move your chart and change its size. To move it, just click once on it and drag it to a

new location. To change the size, click once on it and use the "handles" (little black boxes).

[Scatter chart: Plot of a vs b; horizontal axis a (0 to 10), vertical axis b (0 to 12)]

[Chart Wizard Step 2 dialog box: Data range: ='Scatter Plot'!$A$6:$A$17, 'Scatter Plot'!$C$6:$C$17]


Sheet: Descriptive Stats


The Descriptive Statistics tool in the Data Analysis package is a fast way to get a bunch of numbers that

describe your data. An example is shown below, together with the built-in Excel functions that give the

same information. Copy the Excel Functions to the next column to get a description of variable b.

Example Data Set    Output from Descriptive Statistics Tool    Excel Functions:

a    b                                 a           a
1    2     Mean                        3.818182    3.818182
2    3     Standard Error              0.615234    0.615234
3    4     Median                      3           3
4    3     Mode                        3           3
3    5     Standard Deviation          2.040499    2.040499
5    11    Sample Variance             4.163636    4.163636
8    6     Kurtosis                    0.260801    0.260801
6    10    Skewness                    0.730477    0.730477
5    7     Range                       7           7
3    4     Minimum                     1           1
2    5     Maximum                     8           8
           Sum                         42          42
           Count                       11          11
           Largest(2)                  6           6
           Smallest(2)                 2           2
           Confidence Level(95.0%)     1.370826    1.370826
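As a cross-check on the table above, most of these statistics can be recomputed with Python's standard `statistics` module. This is only a sketch for variable a; the variable names are mine, and kurtosis, skewness and the confidence level are left out because the standard library has no direct equivalents.

```python
import math
import statistics

# Variable a from the Example Data Set.
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]

n = len(a)                                   # Count
mean = statistics.mean(a)                    # Mean
median = statistics.median(a)                # Median
mode = statistics.mode(a)                    # Mode
std_dev = statistics.stdev(a)                # Standard Deviation (n - 1 denominator)
variance = statistics.variance(a)            # Sample Variance (n - 1 denominator)
std_error = std_dev / math.sqrt(n)           # Standard Error of the mean
data_range = max(a) - min(a)                 # Range
second_largest = sorted(a)[-2]               # Largest(2)
second_smallest = sorted(a)[1]               # Smallest(2)
```

The standard error is just the standard deviation divided by the square root of the count, which is why the tool reports both.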

Instructions for the Descriptive Statistics Data Analysis Tool.

Select menu item Tools, Data Analysis

Select Descriptive Statistics (double-click on it, or select OK)

Select the Input Range window, and either type or select the area that contains the data.

If your data is arranged so that each vertical column represents a variable, select the Columns button.

If your input range includes names for the variables, select the Labels in First Row checkbox.

If you want the results to be written on the current worksheet, select the Output Range button, click on the window next to that button and either type in or select a location for the output. (If you type E6, the output will begin at cell E6 and continue down and to the right.)

Most Important: Check the Summary Statistics box.

Checking the Confidence Level for the Mean box gives a "Confidence Level" in the output, which is equal to half of the width of a confidence interval.

Kth Largest or Kth Smallest: Checking the boxes and entering "2" as shown would cause the output

to include the second smallest and second largest values in the data set.

Click OK

[Descriptive Statistics dialog box: Input Range: $A$9:$A$20; Grouped By: Columns / Rows; Labels in First Row; Output Range: E6; New Worksheet Ply; New Workbook; Summary statistics; Confidence Level for Mean: 95%; Kth Largest: 2; Kth Smallest: 2]



Sheet: Rank & Percentile

Rank and Percentile

The Rank and Percentile tool in the Data Analysis package is a fast way to get a copy of your data,

sorted from largest to smallest, with the associated ranks. An example is shown below.

The Excel functions don't exactly repeat what the tool does. The tool begins by numbering the data

points, then sorting them in descending order, and finally inserting ranks and percent ranks. The two

related Excel functions, RANK() and PERCENTRANK(), are shown for the first 3 data points. The

first point in the Data is a = 2, and this value is tied for 7th rank. That puts it at the 26.6 percentile

of the data. The second data point is a = 4, which is in sole possession of rank 2, percentile 93.3.

Data          Related Excel Functions    Output from Rank and Percentile Tool

Point  a      Rank   Percent             Point   a    Rank   Percent
1      2      7      26.60%              7       5    1      100.00%
2      4      2      93.30%              2       4    2      93.30%
3      3      3      66.60%              3       3    3      66.60%
4      1                                 5       3    3      66.60%
5      3                                 11      3    3      66.60%
6      1                                 14      3    3      66.60%
7      5                                 1       2    7      26.60%
8      2                                 8       2    7      26.60%
9      1                                 10      2    7      26.60%
10     2                                 12      2    7      26.60%
11     3                                 15      2    7      26.60%
12     2                                 16      2    7      26.60%
13     1                                 4       1    13     .00%
14     3                                 6       1    13     .00%
15     2                                 9       1    13     .00%
16     2                                 13      1    13     .00%
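The ranks and percentages in the table can be reproduced with two short rules: the rank of a value is one plus the number of larger values, and the percent rank is the number of smaller values divided by (n - 1). The Python sketch below assumes those rules and assumes the percentage is truncated (not rounded) to three decimals, which is what the 26.60% and 66.60% entries suggest. The function names are hypothetical.

```python
# Variable a from the Rank & Percentile sheet, in data-point order.
a = [2, 4, 3, 1, 3, 1, 5, 2, 1, 2, 3, 2, 1, 3, 2, 2]


def rank_desc(x, values):
    """Rank with 1 = largest; tied values share the same rank."""
    return 1 + sum(1 for v in values if v > x)


def percent_rank(x, values):
    """Fraction of values below x out of (n - 1), truncated to 3 decimals."""
    below = sum(1 for v in values if v < x)
    return ((1000 * below) // (len(values) - 1)) / 1000


# Rebuild the tool's output: (Point, a, Rank, Percent), largest value first.
# Python's sort is stable, so ties stay in data-point order.
tool_output = sorted(
    ((i + 1, x, rank_desc(x, a), percent_rank(x, a)) for i, x in enumerate(a)),
    key=lambda row: row[2],
)
```

For the first data point (a = 2), four values are smaller, so the percent rank is 4/15, shown as 26.60%.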

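Instructions for the Rank and Percentile Data Analysis Tool.

Select menu item Tools, Data Analysis

Select Rank and Percentile (double-click on it, or select OK)

Select the Input Range window, and either type or select the area that contains the data.

If your data is arranged so that each vertical column represents a variable, select the Columns button. Otherwise, select the Rows button.

If your input range includes names for the variables, select the Labels in First Row checkbox.

If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output.

Make sure that the Output Range does not overlap with the Input Range.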

Click OK

[Rank and Percentile dialog box: Input Range: $B$10:$B$26; Grouped By: Columns / Rows; Labels in First Row; Output Range: E11; New Worksheet Ply; New Workbook]

Related Excel functions for the first data point:

=RANK($B11,$B$11:$B$26)

=PERCENTRANK($B$11:$B$26,$B11)


Sheet: Covariance

Sample Covariance

Covariance measures the degree to which things "vary together". In that regard it is almost the

same as correlation (see the next page). In fact, correlation is more useful for quantifying the

relationship between two variables. The most common use of Covariance is when you are adding

two random variables, such as when you are forming a portfolio of different stocks.

Excel offers two ways to estimate the covariance between pairs of variables. Unfortunately, they are both

"biased estimators" that divide by n rather than n-1. To obtain the "unbiased sample estimate", multiply by n/(n-1).

(Previous versions of Excel used the unbiased method in the Data Analysis Tool.)

The (n-1) method is almost* always preferred. (Use n if the data set is the entire population.)

The table beginning at cell F20 shows how the built-in Excel function can be modified to use (n-1).

Example Data Set with n = 4        Covariance Data Analysis Tool

a    b    c                             a       b        c
1    2    5                        a    1.25
2    3    4                        b    1       1.25
3    5    2                        c    -1      -1.25    1.25
4    4    3

Covariance Excel Function (denominator = n)

     a       b        c
a    1.25
b    1       1.25
c    -1      -1.25    1.25

Covariance Excel Function multiplied by n/(n-1)

     a            b            c
a    1.6666667
b    1.3333333    1.6666667
c    -1.333333    -1.666667    1.6666667
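The relationship between the two tables can be checked by hand or with a few lines of Python. The sketch below computes the covariance both ways; the function name and its `sample` flag are my own, and the data are the three example columns.

```python
# The Example Data Set with n = 4.
a = [1, 2, 3, 4]
b = [2, 3, 5, 4]
c = [5, 4, 2, 3]


def covariance(x, y, sample=False):
    """Covariance of x and y; divides by n, or by n - 1 when sample=True."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (n - 1 if sample else n)


cov_ab_n = covariance(a, b)                      # denominator n
cov_ab_n1 = covariance(a, b, sample=True)        # denominator n - 1
scaled = cov_ab_n * len(a) / (len(a) - 1)        # the n/(n-1) correction
```

Multiplying the n-denominator result by n/(n-1) gives exactly the (n-1) result, which is what the second table does.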

Instructions for the Covariance Data Analysis Tool.


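Select menu item Tools, Data Analysis

Select Covariance (double-click on it, or select OK)

Select the Input Range window, and either type or select the area that contains the data.

If your data is arranged so that each vertical column represents a variable, select the Columns button.

Otherwise, select the Rows button.

If your input range includes names for the variables, select the Labels in First Row checkbox.

If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output.

For example, if you type F15, the output will begin at cell F15 and continue down and to the right.

Make sure that the Output Range does not overlap with the Input Range.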

Click OK

[Covariance dialog box: Input Range: $B$14:$D$18; Grouped By: Columns / Rows; Labels in First Row; Output Range: F15; New Worksheet Ply; New Workbook]


Sheet: Correlation

Sample Correlation

Excel offers two ways to estimate the correlation between pairs of variables. The value of correlation

is between -1 and +1. Positive correlation means that the variables tend to move in the same

direction. That is, if one variable is above its mean, the other one is likely to be above its mean, too.

Height and weight of people are positively correlated, because very tall people usually weigh more

than very short people. Note that this is not always true, so the correlation is less than +1.0.

Negative correlation means that they tend to move in opposite directions. Mountain climbers know

that there is a negative correlation between altitude and stamina, because of decreasing oxygen.

Correlation of +1 or -1 means that the relationship between the two variables is perfectly linear.

When this happens, a "scatter plot" of the two variables yields a straight line. In the example below,

variables b and c have correlation of -1.

Example Data Set:        Correlation Data Analysis Tool:

a    b    c                   a       b     c
1    2    5              a    1
2    3    4              b    0.8     1
3    5    2              c    -0.8    -1    1
4    4    3

Correlation Excel Function:

     a       b     c
a    1
b    0.8     1
c    -0.8    -1    1
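Correlation is covariance rescaled by the two standard deviations, which is why it always falls between -1 and +1. Here is a minimal Python sketch of that calculation on the example data; the function name is mine.

```python
import math

# The Example Data Set.
a = [1, 2, 3, 4]
b = [2, 3, 5, 4]
c = [5, 4, 2, 3]


def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r_ab = correlation(a, b)
r_ac = correlation(a, c)
r_bc = correlation(b, c)
```

In this data set c = 7 - b for every row, so b and c lie on a straight line with negative slope and their correlation is exactly -1.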

Instructions for the Correlation Data Analysis Tool.


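Select menu item Tools, Data Analysis

Select Correlation (double-click on it, or select OK)

Select the Input Range window, and either type or select the area that contains the data.

If your data is arranged so that each vertical column represents a variable, select the Columns button.

Otherwise, select the Rows button.

If your input range includes names for the variables, select the Labels in First Row checkbox.

If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output.

Make sure that the Output Range does not overlap with the Input Range.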

Click OK

[Scatter charts: b plotted against a, and c plotted against b; all axes run from 0 to 6]

[Correlation dialog box: Input Range: $B$14:$D$18; Grouped By: Columns / Rows; Labels in First Row; Output Range: F14; New Worksheet Ply; New Workbook]


Sheet: Sampling

Random Sampling

Example Data Set:               Example After Sorting:

FUND               RandNo       FUND               RandNo
Benchmarrk Div     0.999969     Freedom Cash       0.078951
Bradford           0.172857     Capital Cash       0.082888
BT INstit Treas    0.263466     Fortis             0.110691
Capital Cash       0.082888     Flex-fund          0.119541
Fidelity Cash      0.275826     Nationwide         0.165838
Flex-fund          0.119541     Bradford           0.172857
Fortis             0.110691     MarketWatch        0.183844
Freedom Cash       0.078951     Piermont Money     0.220191
Galaxy Money       0.291818     BT INstit Treas    0.263466
MarketWatch        0.183844     Fidelity Cash      0.275826
Nationwide         0.165838     NCC Funds          0.27604
NCC Funds          0.27604      Galaxy Money       0.291818
Piermont Money     0.220191     Benchmarrk Div     0.999969

To select a random sample of size n,

Put random numbers into the column next to the data set (instructions given below).

Select the first random number and then go to the Standard Toolbar and press the Sort Ascending (A to Z) button (or use Menu: Data, Sort, Ascending. See instructions on worksheet Sorting.)

Your sample is the first n rows.

Here is how to put random numbers into cells B17:B29:

Menu: Tools, Data Analysis, Random Number Generation

Number of Variables: (leave blank)

Number of Random Numbers: (leave blank)

Distribution: Uniform

Parameters: Between 0 and 1

Output Range: B17:B29

For a sample of 8, choose the first 8 after sorting on the random numbers.
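The same "attach a random number, sort, take the first n" idea can be sketched in Python. The first part reuses the random numbers printed in the example so the result matches the sorted table; the second part draws fresh random numbers. The function names and the seed value are my own choices for illustration.

```python
import random

# FUND and RandNo columns from the Example Data Set.
funds = {
    "Benchmarrk Div": 0.999969, "Bradford": 0.172857, "BT INstit Treas": 0.263466,
    "Capital Cash": 0.082888, "Fidelity Cash": 0.275826, "Flex-fund": 0.119541,
    "Fortis": 0.110691, "Freedom Cash": 0.078951, "Galaxy Money": 0.291818,
    "MarketWatch": 0.183844, "Nationwide": 0.165838, "NCC Funds": 0.27604,
    "Piermont Money": 0.220191,
}


def sample_by_sorting(keyed, n):
    """Sort the items on their random numbers and keep the first n."""
    ordered = sorted(keyed, key=keyed.get)
    return ordered[:n]


def random_sample(items, n, seed=None):
    """Give every item a fresh Uniform(0, 1) number, then sort and keep n."""
    rng = random.Random(seed)
    keyed = {item: rng.random() for item in items}
    return sample_by_sorting(keyed, n)


first_eight = sample_by_sorting(funds, 8)          # matches the example
picked = random_sample(list(funds), 8, seed=1)     # a new random sample
```

Because every ordering of the random numbers is equally likely, the first n rows after sorting form a simple random sample without replacement.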


Sheet: Confidence Intervals

Confidence Intervals

There are two ways to do confidence intervals: use Built-in Excel functions, or use information from

the Descriptive Statistics tool in the Data Analysis package. They are both illustrated below.

Confidence Intervals from the Descriptive Statistics Data Analysis Tool.

First, generate the descriptive statistics (see the Descriptive Stats sheet in this workbook):

Menu: Tools, Data Analysis, Descriptive Statistics

Select your data range,

Check the Confidence Level for the Mean box and enter your desired confidence level in the box,

Check the Summary Statistics box.

Click OK. You should get the output shown below.

Example Data Set    Output from Descriptive Statistics Tool

a    b                                 a
1    2     Mean                        3.818182
2    3     Standard Error              0.615234
3    4     Median                      3
4    3     Mode                        3
3    5     Standard Deviation          2.040499
5    11    Sample Variance             4.163636
8    6     Kurtosis                    0.260801
6    10    Skewness                    0.730477
5    7     Range                       7
3    4     Minimum                     1
2    5     Maximum                     8
           Sum                         42
           Count                       11
           Confidence Level(95.0%)     1.370826

Then, to get the confidence interval, subtract the "Confidence Level" from the "Mean" for the lower limit and add it to the "Mean" for the upper limit.

Calculations:

Lower Confidence Limit: 2.447356    = E15 - E28
Upper Confidence Limit: 5.189008    = E15 + E28

Interpretation: "We have 95% confidence that the population mean for variable a is in the interval 2.447 to 5.189."

Confidence Intervals using Built-in Excel Functions.

Basic Calculations:
    Average:                   3.818182    =AVERAGE(A15:A25)
    Standard Deviation:        2.040499    =STDEV(A15:A25)
    Sample Size, n:            11          =COUNT(A15:A25)

Probability Calculations:
    Confidence:                0.95
    Student's t (2-tail):      2.228139    =TINV(1-E42,E41-1)

The Confidence Interval:
    Lower Confidence Limit:    2.447356    =E39-E43*E40/SQRT(E41)
    Upper Confidence Limit:    5.189008    =E39+E43*E40/SQRT(E41)
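The spreadsheet formulas above translate almost line for line into Python. In this sketch the Student's t value is simply copied from the worksheet (the TINV result for 95% confidence and 10 degrees of freedom) rather than computed, because the standard library has no inverse t function; everything else is recomputed from the data. Variable names are mine.

```python
import math
import statistics

# Variable a from the Example Data Set.
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]

# Copied from the worksheet: TINV(1 - 0.95, 11 - 1). Not computed here.
T_CRITICAL_95_DF10 = 2.228139

n = len(a)
average = statistics.mean(a)
std_dev = statistics.stdev(a)

# Half-width of the interval: t * s / sqrt(n). This is the tool's "Confidence Level".
half_width = T_CRITICAL_95_DF10 * std_dev / math.sqrt(n)

lower_limit = average - half_width
upper_limit = average + half_width
```

The half-width comes out to about 1.3708, the same number the Descriptive Statistics tool labels "Confidence Level(95.0%)".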


Sheet: One-Sample t-tests

One-Sample t-Test

The easiest way to do a One-Sample t-Test in Excel is to use a Confidence Interval. However, this method

does not give a p-value directly. The second method is to construct the test statistic and compare it to a

critical value. The test statistic can be used to compute a p-value. Both methods are illustrated below.

One-Sample t-Tests using Confidence Intervals

Two-tail test: set up a (1 - α) confidence interval (see the sheet Confidence Intervals for instructions)

and reject H0 (the Null Hypothesis) if the value specified in H0 is outside the confidence interval.

One-tail test: use a confidence level of (1 - 2α) and reject H0 if the value specified in H0 is

outside of the confidence interval, with the whole interval lying on the side of that value predicted by the Alternative Hypothesis, Ha.

Example Data (variable a): 1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2

Calculation of the Confidence Intervals:

                            Two-tail test: for α = 0.05,    One-tail test: for α = 0.05,
                            set up a 95% confidence         set up a 90% confidence
                            interval:                       interval:
Average:                    3.8181818                       3.8181818
Standard Deviation:         2.040499                        2.040499
Sample Size, n:             11                              11
Confidence:                 0.95                            0.90
Student's t (2-tail):       2.2281389                       1.8124611
Lower Confidence Limit:     2.4473559                       2.7030948
Upper Confidence Limit:     5.1890077                       4.9332688

Hypothesis Tests using Confidence Intervals:

In the following examples, assume that 4.4 has been given as the value to use in the null hypothesis. (This is the value often referred to as μ0. Thus, μ0 = 4.4 in the examples.)

Two-tail test:

Example: H0: μ = 4.4, Ha: μ <> 4.4 (not equal)

Reject H0 if 4.4 is outside the confidence interval.

Result: 4.4 is between 2.447 and 5.189, so the null hypothesis is NOT rejected.

One-tail test:

Example: H0: μ < 4.4, Ha: μ > 4.4

Reject H0 if 4.4 is below the lower confidence limit (that is, the whole interval lies above 4.4).

Result: 4.4 is not below 2.703, so the null hypothesis is NOT rejected.

Example: H0: μ > 4.4, Ha: μ < 4.4

Reject H0 if 4.4 is above the upper confidence limit (that is, the whole interval lies below 4.4).

Result: 4.4 is not above 4.933, so the null hypothesis is NOT rejected.



One-Sample t-Tests using the Test Statistic

The test statistic is (sample average - μ0)/(standard error). The critical level is the value from the

Student's t distribution. There are two ways to test the hypothesis (they give the same result):

Hypothesis test using the Test Statistic:

For 2 tails, reject H0 if the test statistic is larger in absolute value than the critical level.

For 1 tail, reject H0 if the test statistic is larger than the critical level in the direction predicted by Ha.

Hypothesis test using the P-values:

For 2 tails, reject H0 if the p-value is smaller than α.

For 1 tail, reject H0 if the p-value is smaller than α and the direction is consistent with Ha.

Basic Calculations:
    Average:                       3.8181818
    Standard Deviation:            2.040499
    Sample Size, n:                11

Probability Calculations:
    Hypothesized Value, μ0:        4.4
    α:                             0.05
    t-ratio, or Test Statistic:    -0.945687
    p-value, one-tail:             0.183299
    t Critical one-tail:           1.8124611
    p-value, two-tail:             0.3665981
    t Critical two-tail:           2.2281389
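For anyone who wants to see where the t-ratio and p-values come from, here is a rough Python sketch. The t-ratio follows the formula given above. The p-value is obtained by numerically integrating the Student's t density with Simpson's rule, which is my own substitute for Excel's built-in distribution functions and is not how the workbook computes it; the helper names are hypothetical.

```python
import math
import statistics

# Variable a from the example, and the hypothesized value from H0.
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
mu0 = 4.4

n = len(a)
std_error = statistics.stdev(a) / math.sqrt(n)
t_ratio = (statistics.mean(a) - mu0) / std_error


def t_density(x, df):
    """Probability density of Student's t with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_density(0, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3


p_one_tail = upper_tail(t_ratio, n - 1)
p_two_tail = 2 * p_one_tail
```

Both p-values are well above 0.05, so neither test rejects H0, in agreement with the comparisons against the critical values.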

Tests using the Test Statistic (t-ratio):

Two-tail test (H0: μ = 4.4, Ha: μ <> 4.4):

Result: absolute value of t-ratio of 0.94569 is smaller than the critical value of 2.22814, so the null hypothesis is NOT rejected.

One-tail test (H0: μ < 4.4, Ha: μ > 4.4):

Result: sample average of 3.818 is below 4.4, which IS NOT consistent with Ha, so the null hypothesis is NOT rejected.

One-tail test (H0: μ > 4.4, Ha: μ < 4.4):

Result: sample average of 3.818 is below 4.4, which IS consistent with Ha, but the absolute value of the t-ratio of 0.94569 is smaller than the critical value of 1.81246, so the null hypothesis is NOT rejected.

Same Tests, using the p-value

Two-tail test (H0: μ = 4.4, Ha: μ <> 4.4):

Result: p-value of 0.3666 is larger than 0.05, so the null hypothesis is NOT rejected.

One-tail test (H0: μ < 4.4, Ha: μ > 4.4):

Result: sample average of 3.818 is below 4.4, which IS NOT consistent with Ha, so the null hypothesis is NOT rejected.

One-tail test (H0: μ > 4.4, Ha: μ < 4.4):

Result: sample average of 3.818 is below 4.4, which IS consistent with Ha, but the p-value of 0.1833 is larger than 0.05, so the null hypothesis is NOT rejected.



Sheet: Two-Sample t-tests

Two-Sample t-Tests.

There are three t-Tests in the Excel Data Analysis Tools, and each has a corresponding built-in function.

Data Analysis Tool: Excel Spreadsheet Formula:

t-Test: Paired Two-Sample for Means = TTEST(Array1, Array2, Tails, 1)

t-Test: Two-Sample Assuming Equal Variances = TTEST(Array1, Array2, Tails, 2)

t-Test: Two-Sample Assuming Unequal Variances = TTEST(Array1, Array2, Tails, 3)

The formulas give only the p-value for the test. The Data Analysis Tools give the complete analysis.

t-Tests using the built-in function TTEST(array1, array2, tails, type)

Example Data Set (cells A14:B24):

a    b
1    2
2    3
3    4
4    3
3    5
5    11
8    6
6    10
5    7
3    4
2    5

                            Paired      Equal σ     Unequal σ
Hypothesized Difference     0           0           0
p-value, one-tail:          0.016746    0.069742    0.0705883
p-value, two-tail:          0.033492    0.139485    0.1411765

TTEST(Array1, Array2, Tails, Type)

    Array1 is the first data set.
    Array2 is the second data set.
    Tails = 1 for a one-tail test, or 2 for a two-tail test
    Type = 1 for a Paired Two-sample test
    Type = 2 for a Two-sample test assuming Equal variance
    Type = 3 for a Two-sample test assuming Unequal variance

Example: = TTEST($A$14:$A$24, $B$14:$B$24, 2, 1)

t-Tests using the t-Test Data Analysis tools

At this point you should be familiar with how to use the input boxes, so here is a brief list of the steps.

Menu: Tools, Data Analysis, t-Test: Two-Sample Assuming Equal Variances

Put the addresses of the two variables in their respective Input Range boxes.

In the Hypothesized Mean Difference box,

If your "null hypothesis" is that the two population means are equal, leave the box blank.

If your "null hypothesis" is that the two population means are different by a specified amount:

First, make sure that the variable hypothesized to have the larger mean is "Variable 1".

If not, go back and re-do the Input Range boxes.

Then, type the hypothesized difference in the Hypothesized Mean Difference box.

For example, if the null hypothesis states that Variable 1's population mean is 7.4 units

larger than Variable 2's, enter 7.4 in the Hypothesized Mean Difference box.

If your Variable Ranges include a name for each variable, check the Labels box.

The Alpha box is where you enter the type I error probability. (Excel's output does not report this

value, so be sure to note what value you used.)

Enter your Output Options in the usual way, and click OK.

Examples of each of the Tools are given on the next 2 pages.

Select a cell to see the TTEST formula.

(In the TTEST example, the arguments are, in order: Array 1, Array 2, 2 tails, and the Paired Two-Sample test.)


t-Test: Paired Two Sample for Means, Hypothesized Diff. = 0

                                a           b
Mean                            3.818182    5.454545
Variance                        4.163636    8.272727
Observations                    11          11
Pearson Correlation             0.645926
Hypothesized Mean Difference    0
df                              10
t Stat                          -2.46321

Built-in function TTEST

p-value, one-tail: P(T


t-Test: Two-Sample Assuming Unequal Variances, Hypothesized Diff. = 0

                                a           b
Mean                            3.818182    5.454545
Variance                        4.163636    8.272727
Observations                    11          11
Hypothesized Mean Difference    0
df                              18
t Stat                          -1.53897

Built-in function TTEST

p-value, one-tail: P(T
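The t Stat and df values in the two outputs above can be recomputed directly from columns a and b. The Python sketch below uses the textbook formulas for the paired test and for the unequal-variance test (with the Welch-Satterthwaite degrees of freedom); I am assuming, not asserting, that these are the formulas behind Excel's tools, and the variable names are mine. Only the test statistics are computed, not the p-values.

```python
import math
import statistics

# Columns a and b of the Example Data Set.
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
b = [2, 3, 4, 3, 5, 11, 6, 10, 7, 4, 5]
n_a, n_b = len(a), len(b)

# Paired test: a one-sample t-test on the row-by-row differences.
diffs = [x - y for x, y in zip(a, b)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n_a))
df_paired = n_a - 1

# Unequal variances: each sample contributes its own variance / n.
va = statistics.variance(a) / n_a
vb = statistics.variance(b) / n_b
t_unequal = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)

# Welch-Satterthwaite approximation; the output above shows it as a whole number.
df_unequal = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
```

The paired statistic is larger in absolute value because a and b are positively correlated, so the differences vary less than the two columns do separately.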


Sheet: Regression

Regression

Regression is a method to fit a linear function to a data set.

The objective is to estimate values of b0, b1 and b2 in the following equation:

y = b0 + b1 x1 + b2 x2

In this equation, y is called the Dependent Variable (sometimes called the Criterion Variable),

x1 and x2 are called the Independent Variables (or Predictor Variables),

b0 is called the "intercept", and b1 and b2 are the "slopes".

Collectively, b0, b1 and b2 are referred to as the Coefficients. (This is their label in the output.)

The results for the Example Data Set are shown below the instructions.

Example Data Set:

y x1 x2

-3 2 5

2 3 4

11 5 2

9 6 4

8 4 3

Instructions for the Regression Data Analysis Tool.


Select menu item Tools, Data Analysis

Select Regression (double-click on it, or select OK)

Select the Input Y Range window, and select the area that contains the Dependent Variable.

Select the Input X Range window, and select the area that contains the Independent Variable(s).

(If there are 2 or more Independent Variables, they must be side-by-side in the worksheet.)

You may specify Constant is Zero to force b0 (the intercept) to be zero.

If the first row of your area contains names for the variables, select Labels in First Row.

You may set a Confidence Level for the confidence intervals for the coefficients.

If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output.

Make sure that the Output Range does not overlap with the Input Range.

Select additional output and plots that you would like.

Click OK

Note: Graphs produced by Excel's Regression program are badly sized. However, it is easy to change the size

by clicking on the graph and dragging one of the corner "handles". An example is given below the output.

[Regression dialog box: Input Y Range: $B$14:$B$18; Input X Range: $C$14:$D$18; Labels in First Row; Constant is Zero; Confidence Level: 95%; Output Range: $A$44; New Worksheet Ply; New Workbook; Residuals; Standardized Residuals; Residual Plots; Line Fit Plots; Normal Probability Plots]



SUMMARY OUTPUT

Regression Statistics

Multiple R           0.99322
R Square             0.986486
Adjusted R Square    0.972973
Standard Error       0.948683
Observations         5

ANOVA

              df    SS       MS      F     Significance F
Regression    2     131.4    65.7    73    0.0135135
Residual      2     1.8      0.9
Total         4     133.2

             Coefficients    Standard Error    t Stat      P-value      Lower 95%     Upper 95%     Lower 95.0%    Upper 95.0%
Intercept    5.2             2.894823          1.79631     0.2142827    -7.2554266    17.655427     -7.2554266     17.6554266
x1           2.3             0.360555          6.379052    0.0237044    0.7486554     3.8513446     0.7486554      3.851344584
x2           -2.5            0.5               -5          0.0377496    -4.6513279    -0.3486721    -4.6513279     -0.348672137

RESIDUAL OUTPUT

Observation    Predicted y    Residuals
1              -2.7           -0.3
2              2.1            -0.1
3              11.7           -0.7
4              9              5.33E-15
5              6.9            1.1
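The coefficients and residuals above can be reproduced with ordinary least squares. The Python sketch below sets up the normal equations and solves them by Gaussian elimination; this is a textbook approach chosen to keep the example free of external libraries, and I am not claiming it is how Excel's Regression tool computes its results. The function names are hypothetical.

```python
# Example Data Set for the regression.
y = [-3, 2, 11, 9, 8]
x1 = [2, 3, 5, 6, 4]
x2 = [5, 4, 2, 4, 3]


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def least_squares(response, *predictors):
    """Return [b0, b1, ...] minimizing the sum of squared residuals."""
    rows = [[1.0] + [float(p[i]) for p in predictors] for i in range(len(response))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, response)) for i in range(k)]
    return solve_linear(xtx, xty)


b0, b1, b2 = least_squares(y, x1, x2)
predicted = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]
residuals = [yi - pi for yi, pi in zip(y, predicted)]
residual_ss = sum(r * r for r in residuals)
```

The residual sum of squares, 1.8, is the "Residual" SS in the ANOVA table, and the tiny residual for observation 4 is rounding noise around zero, as in Excel's 5.33E-15.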

One of the "Line Fit Plots" as produced by Excel. Note that the plot is much flatter than is customary.

The other "Line Fit Plot" after changing its height to a more suitable value. Note that you can see the difference between "y" and "Predicted y" in this graph, whereas they are "on top of each other" in the other graph. Increasing the height reveals the errors (or residuals).

[Chart: x1 Line Fit Plot; y and Predicted y against x1 (0 to 8), vertical axis y from -5 to 15]

[Chart: x2 Line Fit Plot; y and Predicted y against x2 (0 to 6), vertical axis y from -4 to 14]
