ExcelStats2007.xls


Transcript of ExcelStats2007.xls

  • 8/11/2019 ExcelStats2007.xls

    1/26

    Sheet: Introduction File: 246707238.xls.ms_office Page 1 of 26

    ExcelStats 2007.xls Version 1.0 2/15/08

    Using Excel to do Statistics: Some Helpful Notes

    [email protected]

    Johnson Graduate School of Management

    Cornell University

    Ithaca NY 14853

    This workbook is intended for teaching purposes. You are welcome to use it in any manner,

    and change it as you see fit. It comes without any guarantee whatsoever, and is distributed

    free of charge.

    This workbook tells you how to do a bunch of Statistics calculations using Excel. Excel has an Add-In

    called the Analysis ToolPak. To find out if you have it working, go into the Tools menu and select Add-Ins.

    For Excel 2007:

    1. Click the Microsoft Office Button, and then click Excel Options.

    2. Click Add-Ins, and then in the Manage box, select Excel Add-ins, and click Go.

    3. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.

    4. In the same box, select the Analysis ToolPak - VBA check box, and then click OK.

    5. After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.

    Excel has 2 ways to do almost every statistical analysis, and in many cases this workbook illustrates both.

    There are separate sheets for each topic listed below. You can find the sheets by selecting the appropriate

    "tab" at the bottom of the screen.

    Contents: These are the sheets in this workbook.

    Introduction

    Sorting

    Frequencies & Graphs

    Histogram

    Scatter Plot

    Descriptive Statistics

    Rank & Percentile

    Covariance

    Correlation

    Sampling

    Confidence Intervals

    One-Sample t-tests

    Two-Sample t-tests

    Regression

    Additional File Available:

    PredInt.xls: Contains a Visual Basic macro to do multiple regression with Prediction Intervals,

    a feature that is not included in the Regression tool in the Analysis ToolPak.

    If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it. If you get prompted that the Analysis ToolPak is not currently installed, click Yes to install it.


    Sorting a Data Set

    *** Sorting does not require the Data Analysis package.

    Sorting changes the data set. If you want to be able to restore the original order of the data, begin by

    numbering the data points. The first column of the Example Data Set contains these numbers.

    Example Data Set       Sorted "Ascending" by a       Sorted "Descending" by b

    Numb.  a    b          Numb.  a    b                 Numb.  a    b
    1      1    2          1      1    2                 6      5    11
    2      2    3          2      2    3                 8      6    10
    3      3    4          11     2    5                 9      5    7
    4      4    3          3      3    4                 7      8    6
    5      3    5          5      3    5                 5      3    5
    6      5    11         10     3    4                 11     2    5
    7      8    6          4      4    3                 3      3    4
    8      6    10         6      5    11                10     3    4
    9      5    7          9      5    7                 2      2    3
    10     3    4          8      6    10                4      4    3
    11     2    5          7      8    6                 1      1    2

    Instructions for the Sort Menu Item

    Select the data set. (Hold down the Left Mouse Button and drag the cursor over cells A6 to C17.)

    On the Data tab, select Sort.

    Use the arrow next to the Sort By window to select a from the pull-down list, and then click OK.

    The results should look like the data in the first shaded area next to the Example Data Set above.

    Now repeat the steps, except select b from the pull-down list, and Largest to Smallest under Order.

    The results should look like the data in the second shaded area next to the Example Data Set.

    To return the data set to its original order, repeat the steps, selecting Numb. from the pull-down list,

    and select Smallest to Largest under Order.

    The Add Level button may be used to resolve ties.

    For example, select Sort By a and Smallest to Largest, and then click Add Level.

    In the new boxes, select Then by b and Smallest to Largest.

    Compare the results to the first shaded area next to the Example Data Set. The order

    has been arranged so that b is ascending when a is constant. For example, look at the

    three points for which a = 3. The values for b are 4, 4 and 5, whereas in the first shaded area they

    are 4, 5 and 4.
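    The same sorts can be sketched outside Excel. The following is a minimal Python illustration, not part of the workbook; the variable names are made up for the example. It relies on Python's sort being stable (ties keep their original relative order), which happens to reproduce the tie order shown in the shaded areas above, though Excel's tie handling should not be assumed to match in general.

```python
# Example Data Set from the Sorting sheet, as (Numb., a, b) rows.
data = [
    (1, 1, 2), (2, 2, 3), (3, 3, 4), (4, 4, 3), (5, 3, 5), (6, 5, 11),
    (7, 8, 6), (8, 6, 10), (9, 5, 7), (10, 3, 4), (11, 2, 5),
]

# Sort by a, smallest to largest. Rows with equal a keep their original order.
by_a = sorted(data, key=lambda row: row[1])

# Sort by b, largest to smallest. reverse=True still keeps ties in original order.
by_b_desc = sorted(data, key=lambda row: row[2], reverse=True)

# Two-level sort (the "Add Level" idea): by a, then by b within equal a.
by_a_then_b = sorted(data, key=lambda row: (row[1], row[2]))

# Restore the original order by sorting on the numbering column.
restored = sorted(by_a_then_b, key=lambda row: row[0])

print([row[0] for row in by_a])
print([row[2] for row in by_a_then_b if row[1] == 3])
```

    Numbering the rows first is what makes the last step possible, which is the same reason the sheet recommends the Numb. column.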


    Counting and Graphing Frequency of Observations

    Data may or may not be numerical. The four counting functions illustrated below take this into account:

    COUNTA counts all entries, ignoring blanks.

    COUNT counts only numbers, excluding blanks.

    COUNTBLANK counts the number of blank cells.

    COUNTIF counts the number of entries that match a specified condition.

    Data (cells A9:B24; one cell in column a is blank):

    a        d
    1        High
    2        Low
    3        Med
    4        Med
    3        Med
    (blank)  High
    5        Low
    2        Med
    1        Med
    2        Low
    -        Low
    ?        Med
    1        High
    3        Low
    2        Low
    2        Low

    Excel Functions:

    a     Entries = 15    =COUNTA($A$9:$A$24)
          Numbers = 13    =COUNT($A$9:$A$24)
          Blanks = 1      =COUNTBLANK($A$9:$A$24)

    a     Freq.
    1     3               =COUNTIF($A$9:$A$24, "=" & E13)
    2     5               =COUNTIF($A$9:$A$24, "=" & E14)
    3     3
    4     1
    5     1
    6     0

    In each COUNTIF formula, the first argument is the Range to be Counted and the second is The Condition.

    d     Entries = 16
          Numbers = 0
          Blanks = 0

    d     Freq.
    Low   7               =COUNTIF($B$9:$B$24,"=" & E25)
    Med   6               =COUNTIF($B$9:$B$24,"=" & E26)
    High  3               =COUNTIF($B$9:$B$24,"=" & E27)
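    As a cross-check on the counts above, here is a small Python sketch, outside Excel and not part of the workbook. The helper names (counta, count, countblank, countif_equal) are hypothetical stand-ins chosen to mirror the Excel functions, and a blank cell is represented by None, which is an assumption of this example.

```python
from collections import Counter

# Column a: numbers, one blank cell (None) and two text entries ("-" and "?").
col_a = [1, 2, 3, 4, 3, None, 5, 2, 1, 2, "-", "?", 1, 3, 2, 2]
# Column d: text categories only.
col_d = ["High", "Low", "Med", "Med", "Med", "High", "Low", "Med",
         "Med", "Low", "Low", "Med", "High", "Low", "Low", "Low"]

def counta(cells):
    """Count all non-blank entries (like COUNTA)."""
    return sum(1 for c in cells if c is not None)

def count(cells):
    """Count only numeric entries (like COUNT)."""
    return sum(1 for c in cells
               if isinstance(c, (int, float)) and not isinstance(c, bool))

def countblank(cells):
    """Count blank cells (like COUNTBLANK)."""
    return sum(1 for c in cells if c is None)

def countif_equal(cells, value):
    """Count entries equal to a given value (like COUNTIF with "=" & value)."""
    return sum(1 for c in cells if c == value)

freq_a = {v: countif_equal(col_a, v) for v in range(1, 7)}
freq_d = Counter(col_d)  # counts every category in one pass

print(counta(col_a), count(col_a), countblank(col_a))
print(freq_a)
print(dict(freq_d))
```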

    Graphing Frequencies

    Frequencies may be graphed in several ways. We will illustrate two kinds of bar charts and a pie chart.

    Standard Bar Chart (Column Chart):

    [Chart: "Distribution of Shipment Sizes", a column chart of Frequency (vertical axis, 0 to 10) for Low, Med and High]

    Excel has a Chart Wizard to help you. It works much faster if you select the range that contains your data

    before you start making the graph. So begin by selecting the range E24:F27 above. Then,

    On the Insert tab in the Charts group, select Column.

    Then click on the first picture under 2-D Column.

    A chart appears, and your next job is to make it look like the figure above.

    As long as the chart is selected, you should see Chart Tools at the top of the screen.

    On the Layout tab, in the Labels group:

    click on Legend and select None

    click on Axis Titles, select Primary Vertical Axis Title, Rotated Title

    and then type the word Frequency. (This enters the title for the Y axis.)

    click on Chart Title, select Above Chart, and type the words Distribution of Shipment Sizes.

    Right-click on a blank spot to the right of the chart title, and select Font, and change the font size to 10.

    Now you may move your chart and change its size. To move it, just click once on it and drag it to a

    new location. To change the size, click once on it and use the "handles" (little black boxes) on the corners.

    Horizontal Bar Chart with Stacked Bars:

    [Chart: one horizontal stacked bar labeled "Freq." with segments Low, Med and High; horizontal axis from 0 to 20]

    Select the range that contains your data and labels. This is E24:F27 in the example.

    On the Insert tab in the Charts group, select Bar.

    Then click on the second picture under 2-D Bar.

    Important: On the Design tab, in the Data group, click the Switch Row/Column button.

    Now you may resize and move the graph as you please.

    Pie Chart:

    [Chart: pie chart with outside data labels Low 44%, Med 37%, High 19%]

    Select the range that contains your data and labels, E24:F27.

    On the Insert tab in the Charts group, select Pie.

    Then click on the first picture under 2-D Pie.

    Click on the title and hit the Delete key.

    Click on the legend and hit the Delete key.

    On the Layout tab in the Labels group, select Data Labels, More Data Label Options.

    In the popup window, un-check all of the options except the following:

    Under "Label Contains", select Category Name, Percentage, and Show Leader Lines

    Under "Label Position" select Outside End


    Histograms

    The Histogram tool in the Data Analysis package is a fast way to get a picture and table of the distribution

    of your data. An example is shown below. Also shown are the Excel functions that give the same information.

    NOTE: The Histogram tool cannot describe more than one variable at a time.

    The term Bin refers to the Upper Limit of the range for which the frequency is calculated. Bin 2 in the table

    below has frequency 6, because "Data" contains 6 values that are strictly greater than 1 and less-than-or-equal to 2.

    Data     Output from Histogram Tool     Excel Functions

             Bin      Frequency             Bin      Freq.    Cumul.
    1        1        4                     1        4        4
    2        2        6                     2        6        10
    3        3        4                     3        4        14
    4        4        1                     4        1        15
    3        More     1                     1000     1        16
    1
    5
    2
    1
    2
    3
    2
    1
    3
    2
    2
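    The bin rule described above (each bin is an upper limit, and a value belongs to the first bin it does not exceed) can be sketched in Python. This is an illustration outside Excel, not the Histogram tool's own code; the function name histogram is made up for the example.

```python
from bisect import bisect_left
from itertools import accumulate

# The "Data" column from the Histogram sheet.
data = [1, 2, 3, 4, 3, 1, 5, 2, 1, 2, 3, 2, 1, 3, 2, 2]

def histogram(values, bins):
    """Count values per bin, where each bin is an inclusive upper limit.

    `bins` must be sorted ascending. The last count is the "More" bucket
    for values above the largest bin.
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left returns the index of the first bin >= v,
        # or len(bins) when v exceeds every bin ("More").
        counts[bisect_left(bins, v)] += 1
    return counts

default_counts = histogram(data, [1, 2, 3, 4])   # bins 1, 2, 3, 4, More
cumulative = list(accumulate(default_counts))     # running totals
custom_counts = histogram(data, [2, 4, 6])        # bins 2, 4, 6, More

print(default_counts, cumulative, custom_counts)
```

    The second call uses the bins 2, 4 and 6 that appear later on this sheet under "Using Bins that You Choose".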

    Instructions for the Histogram Data Analysis Tool.

    On the Data tab, in the Analysis group, select Data Analysis.

    Select Histogram (double-click on it, or select OK).

    Select the Input Range window, and either type or select the area that contains the data.

    On the first try, leave Bin Range blank. Later you may wish to customize the histogram by putting a range into this box (an example is given later).

    If your area includes names for the variables, select the Labels checkbox.

    If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output. For example, if you type D8, the output will begin at cell D8 and continue down and to the right.

    Check Chart Output if you want Excel to create a graph.

    Click OK

    UNFORTUNATE NOTE: At the time of this writing, there is an unfixed bug in Excel. If you try to move the

    chart created by Histogram, it will separate into two pieces. However, if you save the file, then close it, and then

    open it again, the chart will remain in one piece. So I recommend doing that now.

    [Chart: histogram created by the tool, Frequency (vertical axis, 0 to 7) by Bin (1, 2, 3, 4, More)]


    Improving the appearance of the histogram (after saving, closing and reopening the file):

    The chart above, created by the Histogram tool, has been modified to look better.

    First, I changed what was displayed inside the chart:

    Delete the title (right-click on it and select Delete).

    Delete the legend (same way).

    Center the plot area in the chart (click above one of the bars and drag).

    Next, I changed its shape:

    Single-click on the chart and drag one of the "handles" (little boxes in the corners).

    Formatting the numbers on the axes:

    Sometimes the histogram tool creates bins with many more decimal places than is necessary. This

    has an unfortunate effect on the appearance of the horizontal axis, but it is easy to fix.

    Since the problem did not occur in the example above, we first have to create the problem and then fix it.

    To create the problem:

    Select cell D9 and enter the formula =1/3

    Now look at the graph. Notice how the display has changed on the horizontal axis. Not very pretty, is it?

    To fix the problem:

    The format of the numbers on the chart is the same as their format on the spreadsheet. Therefore,

    Select the range of numbers below the word "Bin" (cells D8:D13 in the example above).

    On the Home tab, in the Number group, in the pull-down list, select More Number Formats

    In the popup window, select Number from the list of options and change the decimal places to 2.

    Notice that the numbers in the graph are now displayed with 2 decimal places, which looks better.

    You can use this method for any axis on any Excel graph, displaying however many decimal places

    are appropriate for the situation.

    Using Bins that You Choose

    To tell Excel what bins you want to use for the data, enter the range that contains them in the Bin Range box.

    Notice that I had to include one cell above the desired range of bins, because the "Labels" box is checked.

    Input              Output from Histogram Tool

    Desired Bins       Desired Bins    Frequency
    2                  2               10
    4                  4               5
    6                  6               1
                       More            0


    Scatter Diagrams (Scatter Plots)

    Scatter Plots offer a way to visualize the relationship between two variables. Excel's Chart Group

    makes it fairly easy to construct one. An example is shown below.

    Example Data Set:

    a y b

    1 33 2

    2 23 3

    3 14 4

    4 55 3

    3 3 5

    5 44 11

    8 35 6

    6 98 10

    5 41 7

    3 77 4

    2 8 5

    Instructions for Scatter Plots

    Follow these steps to reproduce the chart above. Notice that it plots a and b, but that in the data, variable

    y is in the column between a and b.

    Begin by selecting the data range. Click on cell A6. Then, holding down the Ctrl key, click and drag to cell

    A17; then, continuing to hold Ctrl, click on C6 and drag to C17. This selects both a and b, leaving out y.

    On the Insert tab, in the Charts group, select Scatter, and click on the first picture.

    Right-click on the legend (on the right side of the chart) and select Delete.

    Click on the chart title and type "Plot of a vs. b".

    Right-click on the title, select Font and change the font size to 12.

    On the Layout tab, use the Axis Titles button to insert titles for both axes as shown.

    On the Layout tab, use the Gridlines button to insert gridlines as shown.

    Now you may move your chart and change its size. To move it, just click once on it and drag it to a

    new location. To change the size, click once on it and use the "handles" (little black boxes).

    [Chart: "Plot of a vs. b", a scatter plot with a on the horizontal axis (0 to 10) and b on the vertical axis (0 to 12)]


    Descriptive Statistics

    The Descriptive Statistics tool in the Data Analysis package is a fast way to get a bunch of numbers that

    describe your data. An example is shown below, together with the built-in Excel functions that give the

    same information. Copy the Excel Functions to the next column to get a description of variable b.

    Example Data Set

    a     b
    1     2
    2     3
    3     4
    4     3
    3     5
    5     11
    8     6
    6     10
    5     7
    3     4
    2     5

    Results for variable a:

                                Output from Descriptive     Excel
                                Statistics Tool             Functions
    Mean                        3.818182                    3.818182
    Standard Error              0.615234                    0.615234
    Median                      3                           3
    Mode                        3                           3
    Standard Deviation          2.040499                    2.040499
    Sample Variance             4.163636                    4.163636
    Kurtosis                    0.260801                    0.260801
    Skewness                    0.730477                    0.730477
    Range                       7                           7
    Minimum                     1                           1
    Maximum                     8                           8
    Sum                         42                          42
    Count                       11                          11
    Largest(2)                  6                           6
    Smallest(2)                 2                           2
    Confidence Level(95.0%)     1.370826                    1.370826
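    Several of the values in the table can be reproduced with a short Python sketch using only the standard library. This is an illustration outside Excel, not the workbook's method; kurtosis, skewness and the confidence level are left out because the standard library has no direct functions for them.

```python
import math
import statistics

# Variable a from the Example Data Set.
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]

n = len(a)                            # Count
mean = statistics.mean(a)             # Mean
median = statistics.median(a)         # Median
mode = statistics.mode(a)             # Mode (most frequent value)
stdev = statistics.stdev(a)           # Standard Deviation (divides by n-1)
variance = statistics.variance(a)     # Sample Variance (divides by n-1)
std_error = stdev / math.sqrt(n)      # Standard Error of the mean
data_range = max(a) - min(a)          # Range
second_largest = sorted(a)[-2]        # Largest(2)
second_smallest = sorted(a)[1]        # Smallest(2)

print(round(mean, 6), round(std_error, 6), round(stdev, 6), round(variance, 6))
```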

    Instructions for the Descriptive Statistics Data Analysis Tool.

    On the Data tab, in the Analysis group, select Data Analysis.

    Select Descriptive Statistics (double-click on it, or select OK).

    Select the Input Range window, and either type or select the area that contains the data.

    If your data is arranged so that each vertical column represents a variable, select the Columns button.

    If your input range includes names for the variables, select the Labels in First Row checkbox.

    If you want the results to be written on the current worksheet, select the Output Range button, click on the window next to that button and either type in or select a location for the output. (If you type E6, the output will begin at cell E6 and continue down and to the right.)

    Most Important: Check the Summary Statistics box.

    Checking the Confidence Level for the Mean box gives a "Confidence Level" in the output, which is equal to half of the width of a confidence interval.

    Kth Largest or Kth Smallest: Checking the boxes and entering "2" as shown causes the output to include the second smallest and second largest values in the data set.


    Click OK


    Sample Covariance

    Covariance measures the degree to which things "vary together". In that regard it is almost the

    same as correlation (see the next page). In fact, correlation is more useful for quantifying the

    relationship between two variables. The most common use of Covariance is when you are adding

    two random variables, such as when you are forming a portfolio of different stocks.

    Unfortunately, Excel does not offer an "unbiased" sample estimate of covariance. This is an error that should

    have been remedied long ago, but Microsoft has not seen fit to fix it. To understand the problem, consider

    Excel's variance function. There are two versions: Sample Variance VAR() and Population Variance VARP().

    Both of these compute the sum of squared differences from the sample mean. However, Sample Variance

    corrects for a statistical bias by dividing that sum by (n-1), where n is the size of the sample. Population Variance

    divides by n, and therefore gives a smaller answer. Population Variance is correct ONLY IF the sample

    is, in fact, the entire population. Sample Variance is appropriate when the sample is a small fraction of the

    population, which is the more usual case.

    To be consistent, Excel should have called their covariance function COVP() or the Population Covariance,

    and should change the definition of COV() to Sample Covariance and calculate it using (n-1).

    Until they make such a change, you can obtain unbiased estimates of covariance by multiplying Excel's

    values by the ratio n/(n-1). The example below, in red, does this correction.

    Finally, please note that the diagonal values in the covariance table are variances. Thus, 1.25 is the Population

    Variance of a and 1.66667 is the Sample Variance of a.

    Example Data Set with n = 4

    a     b     c
    1     2     5
    2     3     4
    3     5     2
    4     4     3

    Covariance Data Analysis Tool

          a         b         c
    a     1.25
    b     1         1.25
    c     -1        -1.25     1.25

    Covariance Excel Function (Population Covariance)

          a         b         c
    a     1.25
    b     1         1.25
    c     -1        -1.25     1.25

    Sample Covariance: Excel Function multiplied by n/(n-1)

          a            b            c
    a     1.6666667
    b     1.3333333    1.6666667
    c     -1.333333    -1.666667    1.6666667
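    The n/(n-1) correction can be checked with a short Python sketch. This is an illustration outside Excel; the function names population_covariance and sample_covariance are made up for the example and are not Excel functions.

```python
# Example Data Set with n = 4.
a = [1, 2, 3, 4]
b = [2, 3, 5, 4]
c = [5, 4, 2, 3]

def population_covariance(x, y):
    """Average product of deviations from the means (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

def sample_covariance(x, y):
    """Unbiased estimate: population covariance multiplied by n/(n-1)."""
    n = len(x)
    return population_covariance(x, y) * n / (n - 1)

# Lower triangle of each table, row by row: aa, ba, bb, ca, cb, cc.
pairs = [(a, a), (b, a), (b, b), (c, a), (c, b), (c, c)]
population_table = [population_covariance(x, y) for x, y in pairs]
sample_table = [sample_covariance(x, y) for x, y in pairs]

print(population_table)
print([round(v, 6) for v in sample_table])
```

    The diagonal entries (a with a, b with b, c with c) are the variances, which is why 1.25 and 1.6666667 appear there.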

    Instructions for the Covariance Data Analysis Tool.

    On the Data tab, in the Analysis group, select Data Analysis.

    Select Covariance (double-click on it, or select OK).

    Select the Input Range window, and either type or select the area that contains the data.

    If your data is arranged so that each vertical column represents a variable, select the Columns button. Otherwise, select the Rows button.

    If your area includes names for the variables, select the Labels in first row checkbox.

    If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output. For example, if you type F15, the output will begin at cell F15 and continue down and to the right.

    Make sure that the Output Range does not overlap with the Input Range.

    Click OK

    Remember, if you want Unbiased estimates, multiply Excel's Covariance by n/(n-1).


    Sample Correlation

    Correlation is a way to quantify a linear relationship between variables. The value of correlation

    is between -1 and +1. Positive correlation means that the variables tend to move in the same

    direction. That is, if one variable is above its mean, the other one is likely to be above its mean, too.

    Height and weight of people are positively correlated, because very tall people usually weigh more

    than very short people. Note that this is not always true, so the correlation is less than +1.0.

    Negative correlation means that they tend to move in opposite directions. Mountain climbers know

    that there is a negative correlation between altitude and stamina, because of decreasing oxygen.

    Correlation of +1 or -1 means that the relationship between the two variables is perfectly linear.

    When this happens, a "scatter plot" of the two variables yields a straight line. In the example below,

    variables b and c have correlation of -1.

    Example Data Set:

    a     b     c
    1     2     5
    2     3     4
    3     5     2
    4     4     3

    Correlation Data Analysis Tool:

          a        b        c
    a     1
    b     0.8      1
    c     -0.8     -1       1

    Correlation Excel Function:

          a        b        c
    a     1
    b     0.8      1
    c     -0.8     -1       1
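    The correlation table can be reproduced with a few lines of Python. This is a sketch outside Excel, computing the usual Pearson correlation directly from deviations; the function name correlation is chosen for the example.

```python
import math

# Example Data Set.
a = [1, 2, 3, 4]
b = [2, 3, 5, 4]
c = [5, 4, 2, 3]

def correlation(x, y):
    """Pearson correlation: co-deviation divided by the product of spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r_ab = correlation(a, b)
r_ac = correlation(a, c)
r_bc = correlation(b, c)  # c = 7 - b for every row, so this is exactly linear

print(round(r_ab, 6), round(r_ac, 6), round(r_bc, 6))
```

    Unlike covariance, the n versus (n-1) choice cancels out of the ratio, so no correction factor is needed here.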

    Instructions for the Correlation Data Analysis Tool.

    On the Data tab, in the Analysis group, select Data Analysis.

    Select Correlation (double-click on it, or select OK).

    Select the Input Range window, and either type or select the area that contains the data.

    If your data is arranged so that each vertical column represents a variable, select the Columns button. Otherwise, select the Rows button.

    If your area includes names for the variables, select the Labels checkbox.

    If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output.

    Make sure that the Output Range does not overlap with the Input Range.

    Click OK

    [Charts: two scatter plots, one of b against a and one of c against b, with both axes running from 0 to 6]


    Random Sampling

    Example Data Set:                  Example After Sorting:

    FUND              RandNo           FUND              RandNo
    Benchmarrk Div    0.999969         Freedom Cash      0.078951
    Bradford          0.172857         Capital Cash      0.082888
    BT INstit Treas   0.263466         Fortis            0.110691
    Capital Cash      0.082888         Flex-fund         0.119541
    Fidelity Cash     0.275826         Nationwide        0.165838
    Flex-fund         0.119541         Bradford          0.172857
    Fortis            0.110691         MarketWatch       0.183844
    Freedom Cash      0.078951         Piermont Money    0.220191
    Galaxy Money      0.291818         BT INstit Treas   0.263466
    MarketWatch       0.183844         Fidelity Cash     0.275826
    Nationwide        0.165838         NCC Funds         0.27604
    NCC Funds         0.27604          Galaxy Money      0.291818
    Piermont Money    0.220191         Benchmarrk Div    0.999969

    To select a random sample of size n,

    Put random numbers into the column next to the data set (instructions given below).

    Select the first random number, then go to the Data tab and press the ascending sort button. (See alternative instructions on worksheet Sorting.)

    Your sample is the first n rows.

    Here is how to put random numbers into cells B17:B29:

    Tab: Data, Analysis group, Data Analysis, Random Number Generation

    Number of Variables: (leave blank)

    Number of Random Numbers: (leave blank)

    Distribution: Uniform

    Parameters: Between 0 and 1

    Output Range: B17:B29

    For a Sample of 8, choose the first 8 after sorting on the Random Numbers.
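    The same idea (attach a uniform random number to each row, sort on it, keep the first n rows) can be sketched in Python. This is an illustration outside Excel; the function name random_sample and the seed argument are assumptions of the example, and the random numbers will not match those in the sheet.

```python
import random

# Fund names as they appear in the Example Data Set.
funds = [
    "Benchmarrk Div", "Bradford", "BT INstit Treas", "Capital Cash",
    "Fidelity Cash", "Flex-fund", "Fortis", "Freedom Cash", "Galaxy Money",
    "MarketWatch", "Nationwide", "NCC Funds", "Piermont Money",
]

def random_sample(items, n, seed=None):
    """Pick n items by pairing each with a uniform(0, 1) number and sorting."""
    rng = random.Random(seed)                      # seed makes the run repeatable
    keyed = [(rng.random(), item) for item in items]
    keyed.sort(key=lambda pair: pair[0])           # sort on the random number only
    return [item for _, item in keyed[:n]]         # the sample is the first n rows

sample = random_sample(funds, 8, seed=1)
print(sample)
```

    Python's standard library also offers random.sample for this purpose; the longer version above is shown only because it mirrors the steps on the sheet.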


    Confidence Intervals

    There are two ways to do confidence intervals: use Built-in Excel functions, or use information from

    the Descriptive Statistics tool in the Data Analysis package. They are both illustrated below.

    Confidence Intervals from the Descriptive Statistics Data Analysis Tool.

    First, generate the descriptive statistics (see the Descriptive Stats sheet in this workbook):

    On the Data tab, in the Analysis group, select Data Analysis, Descriptive Statistics.

    Select your data range,

    Check the Confidence Level for the Mean box and enter your desired confidence level in the box,

    Check the Summary Statistics box.

    Click OK. You should get the output shown below.

    Example Data Set

    a     b
    1     2
    2     3
    3     4
    4     3
    3     5
    5     11
    8     6
    6     10
    5     7
    3     4
    2     5

    Output from Descriptive Statistics Tool (variable a)

    Mean                         3.818182
    Standard Error               0.615234
    Median                       3
    Mode                         3
    Standard Deviation           2.040499
    Sample Variance              4.163636
    Kurtosis                     0.260801
    Skewness                     0.730477
    Range                        7
    Minimum                      1
    Maximum                      8
    Sum                          42
    Count                        11
    Confidence Level (95.0%)     1.370826

    Then, to get the confidence interval, add and subtract the "Confidence Level" from the "Mean".

    Calculations:

    Lower Confidence Limit:      2.447356     = E16 - E29
    Upper Confidence Limit:      5.189008     = E16 + E29

    Interpretation: We have 95% confidence that the population mean for variable a

    is in the interval 2.447 to 5.189.

    Confidence Intervals using Built-in Excel Functions.

    Basic Calculations:

    Average:                     3.818182     =AVERAGE(A15:A25)
    Standard Deviation:          2.040499     =STDEV(A15:A25)
    Sample Size, n:              11           =COUNT(A15:A25)

    Probability Calculations:

    Confidence:                  0.95
    Student's t (2-tail):        2.228139     =TINV(1-E42,E41-1)

    The Confidence Interval:

    Lower Confidence Limit:      2.447356     =E39-E43*E40/SQRT(E41)
    Upper Confidence Limit:      5.189008     =E39+E43*E40/SQRT(E41)
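    The same arithmetic can be followed step by step in Python. This is a sketch outside Excel; because the standard library has no inverse Student's t function, the critical value 2.228139 is simply copied from the sheet (the TINV result for 95% confidence and 10 degrees of freedom) rather than computed.

```python
import math
import statistics

# Variable a from the Example Data Set.
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]

n = len(a)                          # Sample Size, n
mean = statistics.mean(a)           # Average
stdev = statistics.stdev(a)         # Standard Deviation (n-1 in the denominator)

# Assumption: taken from the sheet, not computed here.
t_critical = 2.228139

# Half-width of the interval; the Descriptive Statistics tool labels
# this quantity "Confidence Level (95.0%)".
half_width = t_critical * stdev / math.sqrt(n)

lower = mean - half_width           # Lower Confidence Limit
upper = mean + half_width           # Upper Confidence Limit

print(round(half_width, 6), round(lower, 6), round(upper, 6))
```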


    One-Sample t-Test

    The easiest way to do a One-Sample t-Test in Excel is to use a Confidence Interval. However, this method

    does not give a p-value directly. The second method is to construct the test statistic and compare it to a

    critical value. The test statistic can be used to compute a p-value. Both methods are illustrated below.

    One-Sample t-Tests using Confidence Intervals

    Two-tail test: set up a (1 - α) confidence interval (see the sheet Confidence Intervals for instructions)

    and reject H0 (the Null Hypothesis) if the value specified in H0 is outside the confidence interval.

    One-tail test: use a confidence level of (1 - 2α) and reject H0 if the entire confidence interval lies on the side

    of the value specified in H0 that is predicted by the Alternative Hypothesis, Ha.

    Example Data

    a: 1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2

    Calculation of the Confidence Intervals:

                                Two-tail test: for α = 0.05,     One-tail test: for α = 0.05,
                                set up a 95% confidence          set up a 90% confidence
                                interval:                        interval:
    Average:                    3.8181818                        3.8181818
    Standard Deviation:         2.040499                         2.040499
    Sample Size, n:             11                               11
    Confidence:                 0.95                             0.90
    Student's t (2-tail):       2.2281389                        1.8124611
    Lower Confidence Limit:     2.4473559                        2.7030948
    Upper Confidence Limit:     5.1890077                        4.9332688

    Hypothesis Tests using Confidence Intervals:

    In the following examples, assume that 4.4 has been given as the value to use in the null hypothesis.

    (This is the value often referred to as μ0. Thus, μ0 = 4.4 in the examples.)

    Two-tail test:

    Example 1: H0: μ = 4.4, Ha: μ ≠ 4.4

    Reject H0 if 4.4 is outside the confidence interval.

    Result: 4.4 is between 2.447 and 5.189, so the null hypothesis is NOT rejected.

    One-tail test:

    Example 2: H0: μ ≤ 4.4, Ha: μ > 4.4

    Reject H0 if 4.4 is below the lower confidence limit (that is, the whole interval lies above 4.4).

    Result: 4.4 is not below 2.703, so the null hypothesis is NOT rejected.

    Example 3: H0: μ ≥ 4.4, Ha: μ < 4.4

    Reject H0 if 4.4 is above the upper confidence limit (that is, the whole interval lies below 4.4).

    Result: 4.4 is not above 4.933, so the null hypothesis is NOT rejected.

    Note: If you choose to do a one-tail test, you must do one or the other of these, NEVER BOTH.


    One-Sample t-Tests using the Test Statistic

    The test statistic is (sample average - μ0)/(standard error). The critical level is the value from the

    Student's t distribution. There are two ways to test the hypothesis (they give the same result):

    Hypothesis test using the Test Statistic:

    For 2 tails, reject H0 if the test statistic is larger in absolute value than the critical level.

    For 1 tail, reject H0 if the test statistic is larger than the critical level in the direction predicted by Ha.

    Hypothesis test using the P-values:

    For 2 tails, reject H0 if the p-value is smaller than α.

    For 1 tail, reject H0 if the p-value is smaller than α and the direction is consistent with Ha.

    Basic Calculations:

    Average:                       3.8181818
    Standard Deviation:            2.040499
    Sample Size, n:                11

    Probability Calculations:

    Hypothesized Value, μ0:        4.4
    α:                             0.05
    t-ratio, or Test Statistic:    -0.945687
    p-value, one-tail:             0.183299
    t Critical one-tail:           1.8124611
    p-value, two-tail:             0.3665981
    t Critical two-tail:           2.2281389
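    The t-ratio and p-values above can be reproduced in Python. This is a sketch outside Excel and not how Excel computes them: the standard library has no Student's t distribution, so the tail probability is obtained here by numerically integrating the t density with Simpson's rule. The function names t_pdf and t_upper_tail are made up for the example.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): one half minus the area from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    h = b / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
mu0 = 4.4                               # Hypothesized Value

n = len(a)
std_error = statistics.stdev(a) / math.sqrt(n)
t_ratio = (statistics.mean(a) - mu0) / std_error

p_one_tail = t_upper_tail(t_ratio, n - 1)   # area beyond |t| in one tail
p_two_tail = 2 * p_one_tail                 # both tails

print(round(t_ratio, 6), round(p_one_tail, 6), round(p_two_tail, 6))
```

    Note that p_one_tail is only the tail area beyond the absolute value of the t-ratio; for a one-tail test the direction of the sample average still has to be checked against Ha, as the sheet describes.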

    Tests using the Test Statistic (t-ratio):

    Two-tail test:

    Example: H0: μ = 4.4, Ha: μ ≠ 4.4

    Result: absolute value of t-ratio of 0.94569 is smaller than the two-tail critical value of 2.22814,

    so the null hypothesis is NOT rejected.

    One-tail test:

    Example: H0: μ ≤ 4.4, Ha: μ > 4.4

    Result: sample average of 3.818 is below 4.4, which IS NOT consistent with Ha,

    so the null hypothesis is NOT rejected.

    Example: H0: μ ≥ 4.4, Ha: μ < 4.4

    Result: sample average of 3.818 is below 4.4, which IS consistent with Ha,

    but the absolute value of the t-ratio of 0.94569 is smaller than the one-tail critical value of 1.81246,

    so the null hypothesis is NOT rejected.

    Same Tests, using the p-value

    Two-tail test:

    Example: H0: μ = 4.4, Ha: μ ≠ 4.4

    Result: p-value of 0.3666 is larger than 0.05,

    so the null hypothesis is NOT rejected.

    One-tail test:

    Example: H0: μ ≤ 4.4, Ha: μ > 4.4

    Result: sample average of 3.818 is below 4.4, which IS NOT consistent with Ha,

    so the null hypothesis is NOT rejected.

    Example: H0: μ ≥ 4.4, Ha: μ < 4.4

    Result: sample average of 3.818 is below 4.4, which IS consistent with Ha,

    but the p-value of 0.1833 is larger than 0.05,

    so the null hypothesis is NOT rejected.

    Note: Test Statistic and P-value ALWAYS GIVE THE SAME RESULT.


    Two-Sample t-Tests.

    There are three t-Tests in the Excel Data Analysis Tools, and each has a corresponding built-in function.

    Data Analysis Tool: Excel Spreadsheet Formula:

    t-Test: Paired Two-Sample for Means = TTEST(Array1, Array2, Tails, 1)

    t-Test: Two-Sample Assuming Equal Variances = TTEST(Array1, Array2, Tails, 2)

    t-Test: Two-Sample Assuming Unequal Variances = TTEST(Array1, Array2, Tails, 3)

    The formulas above give only the p-value for the test. The Data Analysis Tools give the complete analysis.

    t-Tests using the built-in function TTEST(array1, array2, tails, type)

    Example Data Set

    a     b
    1     2
    2     3
    3     4
    4     3
    3     5
    5     11
    8     6
    6     10
    5     7
    3     4
    2     5

                                 Paired       Equal σ      Unequal σ
    Hypothesized Difference      0            0            0
    p-value, one-tail:           0.016746     0.069742     0.0705883
    p-value, two-tail:           0.033492     0.139485     0.1411765

    TTEST(Array1, Array2, Tails, Type)

    Array1 is the first data set.

    Array2 is the second data set.

    Tails = 1 for a one-tail test, or 2 for a two-tail test

    Type = 1 for a Paired Two-sample test

    Type = 2 for a Two-sample test assuming Equal variance

    Type = 3 for a Two-sample test assuming Unequal variance

    Example: = TTEST($A$14:$A$24, $B$14:$B$24, 2, 1)

    t-Tests using the t-Test Data Analysis tools

    At this point you should be familiar with how to use the input boxes, so here is a brief list of the steps.

    Tab: Data, group: Analysis, Data Analysis, t-Test: Two-Sample Assuming Equal Variances

    Put the addresses of the two variables in their respective Input Range boxes.

    In the Hypothesized Mean Difference box:

      If your "null hypothesis" is that the two population means are equal, leave the box blank.

      If your "null hypothesis" is that the two population means differ by a specified amount,
      type the hypothesized difference in the Hypothesized Mean Difference box.
      (In this case, the variable hypothesized to have the larger mean MUST BE "Variable 1". For example,
      if H0 is that b has a larger mean than a, then Variable 1 Input Range must contain variable b.)
      For example, if the null hypothesis states that Variable 1's population mean is 7.4 units
      larger than Variable 2's, enter 7.4 in the Hypothesized Mean Difference box.

    If your Variable Ranges include a name for each variable, check the Labels box.

    The Alpha box is where you enter the type I error probability. (Excel's output does not report this
    value, so be sure to note what value you used.)

    Enter your Output Options in the usual way, and click OK.
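To show how the hypothesized difference enters the calculation, here is a minimal Python sketch of the equal-variance t statistic. It assumes the usual textbook formula t = (mean1 - mean2 - d0) / standard error; the function name and the hypothesized difference of 1 are made up for illustration. Variable b is passed first because it has the larger sample mean.

```python
import math
from statistics import mean, variance

a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
b = [2, 3, 4, 3, 5, 11, 6, 10, 7, 4, 5]

def pooled_t(variable1, variable2, hypothesized_diff=0.0):
    """Equal-variance t statistic for H0: mean1 - mean2 = hypothesized_diff."""
    n1, n2 = len(variable1), len(variable2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(variable1)
                  + (n2 - 1) * variance(variable2)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(variable1) - mean(variable2) - hypothesized_diff) / standard_error

# Box left blank: H0 says the two population means are equal.
t_equal_means = pooled_t(b, a)

# Hypothetical H0: the mean of b is 1 unit larger than the mean of a.
t_diff_of_one = pooled_t(b, a, hypothesized_diff=1.0)

print(round(t_equal_means, 5), round(t_diff_of_one, 5))
```

With b as Variable 1 the statistic for a zero difference is positive; it has the same magnitude as the t Stat Excel reports when a is Variable 1.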

    Examples of each of the Tools are given on the next 2 pages.

    Select a cell to see the TTEST formula.

    (Labels on the example formula: Array 1, Array 2, 2-tails, Paired Two-Sample test.)


    t-Test: Paired Two Sample for Means, Hypothesized Diff. = 0

                                    a           b
    Mean                            3.818182    5.454545
    Variance                        4.163636    8.272727
    Observations                    11          11
    Pearson Correlation             0.645926
    Hypothesized Mean Difference    0
    df                              10
    t Stat                          -2.46321

    Built-in function TTEST

    p-value, one-tail: P(T


    t-Test: Two-Sample Assuming Unequal Variances, Hypothesized Diff. = 0

                                    a           b
    Mean                            3.818182    5.454545
    Variance                        4.163636    8.272727
    Observations                    11          11
    Hypothesized Mean Difference    0
    df                              18
    t Stat                          -1.53897

    Built-in function TTEST

    p-value, one-tail: P(T


    Sheet: Regression File: 246707238.xls.ms_office Page 23 of 26

    Regression

    Regression is a method to fit a linear function to a data set.

    The objective is to estimate values of b0, b1 and b2 in the following equation:

    y = b0 + b1 x1 + b2 x2

    In this equation, y is called the Dependent Variable (sometimes called the Criterion Variable),
    x1 and x2 are called the Independent Variables (or Predictor Variables),
    b0 is called the "intercept", and b1 and b2 are the "slopes".

    Collectively, b0, b1 and b2 are referred to as the Coefficients. (This is their label in the output.)

    The results for the Example Data Set are shown below the instructions.

    Example Data Set:

      y    x1   x2
     -3     2    5
      2     3    4
     11     5    2
      9     6    4
      8     4    3
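For readers who want to check the fit outside Excel, the following Python sketch estimates b0, b1 and b2 for this data set by solving the least-squares normal equations. It is an illustration only, not the method Excel uses internally, and the helper names are made up.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(value)] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def least_squares(y, *predictors):
    """Return [b0, b1, ...] minimizing the sum of squared residuals."""
    rows = [[1.0, *values] for values in zip(*predictors)]  # leading 1 = intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

y = [-3, 2, 11, 9, 8]
x1 = [2, 3, 5, 6, 4]
x2 = [5, 4, 2, 4, 3]

b0, b1, b2 = least_squares(y, x1, x2)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

The estimates agree with the Coefficients column of the regression output (5.2, 2.3 and -2.5).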

    Instructions for the Regression Data Analysis Tool.

    On the Data tab, in the Analysis group, select Data Analysis.

    Select Regression (double-click on it, or select OK).

    Select the Input Y Range window, and select the area that contains the Dependent Variable.

    Select the Input X Range window, and select the area that contains the Independent Variable(s).
    (If there are 2 or more Independent Variables, they must be side-by-side in the worksheet.)

    You may specify Constant is Zero to force b0 (the intercept) to be zero.

    If the first row of your area contains names for the variables, select Labels in the First Row.

    You may set a Confidence Level for the confidence intervals for the coefficients.

    If you want the results to be written on the current worksheet, select the Output Range button,
    then click on the window next to that button and either type in or select a location for the output.
    Make sure that the Output Range does not overlap with the Input Range.

    Select additional output and plots that you would like.
    Excel's Normal Probability Plot is incorrect.

    Click OK.

    Note: Graphs produced by Excel's Regression program are badly sized. However, it is easy to change the size

    by clicking on the graph and dragging one of the corner "handles". Output is shown below.


    As of this writing, Microsoft Excel's Regression package malfunctions. I recommend using the
    PredictionInterval macro that I wrote, which is in a file called PredInt.xls,
    available on the class web site, with instructions.

    An alternative link for this file is:
    http://forum.johnson.cornell.edu/faculty/mcclain/Software/PredInt.htm

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.99322
    R Square             0.986486
    Adjusted R Square    0.972973
    Standard Error       0.948683
    Observations         5

    ANOVA
                  df    SS       MS      F     Significance F
    Regression    2     131.4    65.7    73    0.0135135
    Residual      2     1.8      0.9
    Total         4     133.2

                 Coefficients  Standard Error  t Stat     P-value     Lower 95%    Upper 95%    Lower 95.0%  Upper 95.0%
    Intercept    5.2           2.894823        1.79631    0.2142827   -7.2554266   17.655427    -7.2554266   17.6554266
    x1           2.3           0.360555        6.379052   0.0237044   0.7486554    3.8513446    0.7486554    3.851344584
    x2           -2.5          0.5             -5         0.0377496   -4.6513279   -0.3486721   -4.6513279   -0.348672137
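The Regression Statistics and ANOVA figures follow from the coefficients and the example data. The Python sketch below recomputes them using the standard sums-of-squares formulas. It is an illustration with made-up variable names, not a description of Excel's internals.

```python
import math

y = [-3, 2, 11, 9, 8]
x1 = [2, 3, 5, 6, 4]
x2 = [5, 4, 2, 4, 3]
b0, b1, b2 = 5.2, 2.3, -2.5  # Coefficients column of the output

n = len(y)  # observations
k = 2       # independent variables

predicted = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]
residuals = [obs - fit for obs, fit in zip(y, predicted)]

y_bar = sum(y) / n
ss_total = sum((obs - y_bar) ** 2 for obs in y)
ss_residual = sum(r ** 2 for r in residuals)
ss_regression = ss_total - ss_residual

df_regression = k
df_residual = n - k - 1
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual
r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
standard_error = math.sqrt(ms_residual)

print("SS:", round(ss_regression, 4), round(ss_residual, 4), round(ss_total, 4))
print("F:", round(f_statistic, 4))
print("R Square:", round(r_square, 6), "Adjusted:", round(adjusted_r_square, 6))
print("Multiple R:", round(multiple_r, 5), "Standard Error:", round(standard_error, 6))
```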

    RESIDUAL OUTPUT

    Observation    Predicted y    Residuals    Standard Residuals
    1              -2.7           -0.3         -0.447214
    2              2.1            -0.1         -0.149071
    3              11.7           -0.7         -1.043498
    4              9              1.78E-15     2.65E-15
    5              6.9            1.1          1.639783
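The residual table can be reproduced with a short Python sketch. The scaling used for the Standard Residuals column is an assumption inferred from the numbers shown, not something the workbook states: each residual is divided by sqrt(SS residual / (n - 1)).

```python
import math

y = [-3, 2, 11, 9, 8]
x1 = [2, 3, 5, 6, 4]
x2 = [5, 4, 2, 4, 3]
b0, b1, b2 = 5.2, 2.3, -2.5  # Coefficients column of the output

predicted = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]
residuals = [obs - fit for obs, fit in zip(y, predicted)]

# Assumed scaling: standard deviation of the residuals with n - 1 in the
# denominator (the residuals sum to zero, so no mean is subtracted).
n = len(residuals)
residual_sd = math.sqrt(sum(r ** 2 for r in residuals) / (n - 1))
standard_residuals = [r / residual_sd for r in residuals]

for i, (fit, res, std) in enumerate(zip(predicted, residuals, standard_residuals), 1):
    print(i, round(fit, 4), round(res, 4), round(std, 6))
```

Observation 4 has a residual that is zero up to floating-point rounding, which is why Excel displays it as 1.78E-15.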

    One of the "Line Fit Plots" as produced by Excel. Note that the plot is much "flatter"
    than is customary, and it is a "Column Chart".

    [Chart: x1 Line Fit Plot. Series: y and Predicted y. Horizontal axis: x1; vertical axis: y.]

    The other "Line Fit Plot" after changing its height to a more suitable value,
    and converting it to a "Line Chart".

    [Chart: x2 Line Fit Plot. Series: y and Predicted y. Vertical axis: y.]


    Residual Plots are shown below.

    One of the "Residual Plots" as produced by Excel. Note that the plot is much "flatter"
    than is customary. "Scatter Plots" are easier to interpret if they are nearly square.

    [Chart: x1 Residual Plot. Horizontal axis: x1; vertical axis: Residuals.]

    The other "Residual Plot" after changing its height to a more suitable value.

    [Chart: x2 Residual Plot. Horizontal axis: x2; vertical axis: Residuals.]

    NOTE:
    Plots of Standardized Residuals are easily obtained, if you clicked the Standardized
    Residuals checkbox. Make copies of the Residual Plots, and then change the data
    source so that the dependent variable is the standardized residuals.
    This plot is an example for variable x2. Note that I added gridlines as well.

    [Chart: x2 Standardized Residual Plot. Horizontal axis: x2; vertical axis: Residuals.]
