ExcelStats.xls


  • 8/11/2019 ExcelStats.xls

    1/21

    Sheet: Introduction File: 246706782.xls.ms_office Page 1 of 21

    ExcelStats.xls Version 7.1 11/4/03

    Using Excel to do Statistics: Some Helpful Notes

    [email protected]

    Johnson Graduate School of Management

    Cornell University

    Ithaca NY 14853

    This workbook is intended for teaching purposes. You are welcome to use it in any manner,

    and change it as you see fit. It comes without any guarantee whatsoever, and is distributed

    free of charge. Changes are frequent, so check back frequently for a new version.

    This workbook tells you how to do a bunch of Statistics calculations using Excel. Excel has an Add-In
    called the Analysis ToolPak. To find out if you have it working, go into the Tools menu and select Add-Ins.
    Here is how I give that sort of instruction in these notes: Menu: Tools, Add-Ins

    A window with the title Add-Ins will appear, listing the Add-Ins you currently have available.

    If you see Analysis ToolPak and also Analysis ToolPak - VBA in the list, make sure the boxes next to them have
    check marks. If not, clicking on a box makes the check mark appear. Then click OK.

    If either Analysis ToolPak or Analysis ToolPak - VBA is NOT in the list, you have to install it from the disks or
    CD that came with Microsoft Office or Microsoft Excel. If you don't know how to do that, get help.

    Once you have "attached" BOTH Analysis ToolPaks (that is, checked the boxes), a new menu item will appear under
    the Tools menu. That item will be used throughout these spreadsheets. Try it now. Menu: Tools, Data Analysis

    Excel has 2 ways to do almost every statistical analysis, and in many cases this workbook illustrates both.

    There are separate sheets for each topic listed below. You can find the sheets by selecting the appropriate
    "tab" at the bottom of the screen.

    Contents: These are the sheets in this workbook.

    Introduction
    Sorting
    Frequencies & Graphs
    Histogram
    Scatter Plot
    Descriptive Statistics
    Rank & Percentile
    Covariance
    Correlation
    Sampling
    Confidence Intervals
    One-Sample t-tests
    Two-Sample t-tests
    Regression

    Additional Files Available:

    PivotTab.xls: Explains how to use Excel's Pivot Tables to do Cross-Tabs.
    It also contains a macro to do the Chi-squared test for a contingency table.

    PredInt.xls: Contains a Visual Basic macro to do multiple regression with Prediction Intervals,
    a feature that is not included in the Regression tool in the Analysis ToolPak.

    Sheet: Sorting

    Sorting a Data Set

    *** Sorting does not require the Data Analysis package.

    Sorting changes the data set. If you want to be able to restore the original order of the data, begin by

    numbering the data points. The first column of the Example Data Set contains these numbers.

    Example Data Set        Sorted "Ascending" by a     Sorted "Descending" by b

    Numb.   a    b          Numb.   a    b              Numb.   a    b
    1       1    2          1       1    2              6       5    11
    2       2    3          2       2    3              8       6    10
    3       3    4          11      2    5              9       5    7
    4       4    3          3       3    4              7       8    6
    5       3    5          5       3    5              5       3    5
    6       5    11         10      3    4              11      2    5
    7       8    6          4       4    3              3       3    4
    8       6    10         6       5    11             10      3    4
    9       5    7          9       5    7              2       2    3
    10      3    4          8       6    10             4       4    3
    11      2    5          7       8    6              1       1    2

    Instructions for the Sort Menu Item

    Select the data set. (Hold down the Left Mouse Button and drag the cursor over cells A6 to C17.)

    Select menu item Data, Sort

    Use the arrow next to the Sort By window to select a from the pull-down list.

    Click OK

    The results should look like the data in the first shaded area next to the Example Data Set above.

    Now repeat the steps, except select b from the pull-down list, and click the Descending button.

    Click OK

    The results should look like the data in the second shaded area next to the Example Data Set.

    To return the data set to its original order, repeat the steps, selecting Numb. from the pull-down list,
    and click Ascending.

    The Then By boxes may be used to resolve ties.

    For example, select Sort By a and Then By b, using Ascending for both.

    Compare the results to the first shaded area next to the Example Data Set. The order
    has been arranged so that b is ascending when a is constant. For example, look at the
    three points for which a = 3. The values for b are 4, 4 and 5, whereas in the first shaded area they
    are 4, 5 and 4.
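
    The same sorts can be sketched outside Excel. The Python below is only an illustration, not part of the workbook: it assumes each row is a (Numb., a, b) tuple and relies on Python's sort being stable, which happens to leave ties in the same order as the shaded areas above.

```python
# Example Data Set from the Sorting sheet, one tuple per row: (Numb., a, b)
rows = [
    (1, 1, 2), (2, 2, 3), (3, 3, 4), (4, 4, 3), (5, 3, 5), (6, 5, 11),
    (7, 8, 6), (8, 6, 10), (9, 5, 7), (10, 3, 4), (11, 2, 5),
]

# Sort By a, Ascending (ties keep their original order)
by_a = sorted(rows, key=lambda r: r[1])

# Sort By b, Descending
by_b_desc = sorted(rows, key=lambda r: r[2], reverse=True)

# Sort By a, Then By b, Ascending for both (b resolves ties in a)
by_a_then_b = sorted(rows, key=lambda r: (r[1], r[2]))

# Restore the original order by sorting on the Numb. column
restored = sorted(by_b_desc, key=lambda r: r[0])

print([r[0] for r in by_a])
print([r[0] for r in by_b_desc])
print([r[2] for r in by_a_then_b if r[1] == 3])
```

    Numbering the rows first is what makes the last step possible: without the Numb. column there is nothing to sort on to get the original order back.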

    [Dialog box: Sort. Sort By: a (Ascending / Descending); two Then By boxes (Ascending / Descending);
    My List Has: Header Row / No Header Row.]

    Sheet: Frequencies & Graphs

    Counting and Graphing Frequency of Observations

    Data may or may not be numerical. The four counting functions illustrated below take this into account:

    COUNTA counts all entries, ignoring blanks.

    COUNT counts only numbers.

    COUNTBLANK counts the number of blank cells.

    COUNTIF counts the number of entries that match a specified condition.

    Data (cells A9:B24):

    a          d
    1          High
    2          Low
    3          Med
    4          Med
    3          Med
    (blank)    High
    5          Low
    2          Med
    1          Med
    2          Low
    -          Low
    ?          Med
    1          High
    3          Low
    2          Low
    2          Low

    Excel Functions, variable a:

    Entries =   15    =COUNTA($A$9:$A$24)
    Numbers =   13    =COUNT($A$9:$A$24)
    Blanks =    1     =COUNTBLANK($A$9:$A$24)

    a     Freq.
    1     3     =COUNTIF($A$9:$A$24, "=" & E13)
    2     5     =COUNTIF($A$9:$A$24, "=" & E14)
    3     3
    4     1
    5     1
    6     0

    In each COUNTIF formula, the first argument is the Range to be Counted and the second is The Condition.

    Excel Functions, variable d:

    Entries =   16
    Numbers =   0
    Blanks =    0

    d     Freq.
    Low   7     =COUNTIF($B$9:$B$24,"=" & E25)
    Med   6     =COUNTIF($B$9:$B$24,"=" & E26)
    High  3     =COUNTIF($B$9:$B$24,"=" & E27)
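
    For readers who want to check the counts by hand, here is a rough Python sketch of what the four functions do with this data. The helper names (counta, count, countblank, countif) are made up for the illustration and are not Excel's implementation; a blank cell is represented by None.

```python
# Columns a and d from the sheet; None stands for the blank cell.
a = [1, 2, 3, 4, 3, None, 5, 2, 1, 2, "-", "?", 1, 3, 2, 2]
d = ["High", "Low", "Med", "Med", "Med", "High", "Low", "Med",
     "Med", "Low", "Low", "Med", "High", "Low", "Low", "Low"]

def counta(cells):
    """Count all entries, ignoring blanks (like COUNTA)."""
    return sum(c is not None for c in cells)

def count(cells):
    """Count only numeric entries (like COUNT)."""
    return sum(isinstance(c, (int, float)) and not isinstance(c, bool) for c in cells)

def countblank(cells):
    """Count blank cells (like COUNTBLANK)."""
    return sum(c is None for c in cells)

def countif(cells, value):
    """Count entries equal to a given value (like COUNTIF with an "=" condition)."""
    return sum(c == value for c in cells)

print(counta(a), count(a), countblank(a))
print({v: countif(a, v) for v in range(1, 7)})
print({v: countif(d, v) for v in ("Low", "Med", "High")})
```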

    Graphing Frequencies

    Frequencies may be graphed in several ways. We will illustrate two kinds of bar charts and a pie chart.

    Standard Bar Chart (Column Chart):

    Excel has a Chart Wizard to help you. It works much faster if you select the range that contains your data
    before you start making the graph. So begin by selecting the range E24:F27 above. Then,

    Select menu item Insert, Chart

    Chart Wizard, Step 1 of 4 appears, showing a selection of chart types.

    Under Chart Type: select Column. Then click on the first Chart sub-type.

    Click Next>

    [Chart: "Distribution of Shipment Sizes", a column chart of Frequency (axis 0 to 8) for Low, Med and High.]


    Chart Wizard, Step 2 of 4: The Data range box should show the data range you selected before starting.

    Click Next>

    Chart Wizard, Step 3 of 4:

    You can type a title for your chart, and for each axis, if you want to.

    Select the Legend tab. Un-checking Show Legend will cause the legend to disappear.

    Click Next>

    Chart Wizard, Step 4 of 4 allows you to place the chart on the current sheet, or insert a new sheet.

    Don't worry! If you place it on the current sheet, you can move it later. It will not affect any of your data.

    Click Finish

    Now you may move your chart and change its size. To move it, just click once on it and drag it to a
    new location. To change the size, click once on it and use the "handles" (little black boxes) on the corners.

    Horizontal Bar Chart with Stacked Bars:

    Select the range that contains your data and labels. This is E24:F27 in the example.

    Select menu item Insert, Chart

    Chart Wizard, Step 1 of 4: Under Chart Type select Bar, and select the second Chart Sub-type.

    Click Next>

    Chart Wizard, Step 2 of 4: The Data Range should show all the data, including the labels.

    Important: Click the Rows button.

    Click Next>

    Chart Wizard, Step 3 of 4:

    On the Axes tab, un-check Category (X) axis.

    On the Data Labels tab, select Show Value.

    Click Finish

    [Dialog box: Chart Wizard, Step 2 of 4. Data range: ='Frequencies & Graphs'!$E$24:$F$27; Data Series in: Rows / Columns.]

    [Chart: horizontal stacked bar of Freq. (axis 0 to 20), with segments Low = 7, Med = 6, High = 3.]

    Sheet: Histogram

    Histograms

    The Histogram tool in the Data Analysis package is a fast way to get a picture and table of the distribution

    of your data. An example is shown below, together with the built-in Excel functions that give the

    same information. The Histogram tool cannot describe more than one variable at a time.

    The chart created by the Histogram tool usually needs to be modified. I have done so in the example.

    Instructions for modifying the appearance are given in the last part of this note.

    Data (a): 1, 2, 3, 4, 3, 1, 5, 2, 1, 2, 3, 2, 1, 3, 2, 2

    Output from Histogram Tool        Excel Functions:

    Bin     Frequency                 Bin     Freq.   Cumul.
    1       4                         1       4       4
    2       6                         2       6       10
    3       4                         3       4       14
    4       1                         4       1       15
    More    1                         1000    1       16

    Instructions for the Histogram Data Analysis Tool.

    Select menu item Tools, Data Analysis

    Select Histogram (double-click on it, or select OK)

    Select the Input Range window, and either type or select the area that contains the data.

    On the first try, leave Bin Range blank. Later you may wish to customize the histogram by putting a range
    into this box (an example is given later).

    If your area includes names for the variables, select the Labels checkbox.

    If you want the results to be written on the current worksheet, select the Output Range
    button, then click on the window next to that button
    and either type in or select a location for the output. For example, if you type D8, the output will begin at
    cell D8 and continue down and to the right.

    Check Chart Output if you want Excel to create a graph.

    Click OK

    [Dialog box: Histogram. Input: Input Range $B$8:$B$24, Bin Range (blank), Labels. Output Options: Output Range D8
    (or New Worksheet Ply, or New Workbook), Pareto (sorted histogram), Cumulative Percentage, Chart Output.]

    [Chart: histogram of Frequency (axis 0 to 7) by Bin (1, 2, 3, 4, More).]


    Improving the appearance of the histogram:

    The chart above, created by the Histogram tool, has been modified to look better.

    First, I changed its shape:

    Single-click on the chart and drag one of the "handles" (little boxes in the corners).

    Then, I changed what was displayed inside the chart:

    Delete the title (right-click on it and select Clear). Delete the legend (same way).

    Stretch the plot area to fill the box (click on the grey area and drag the handles).

    Formatting the numbers on the axes:

    Sometimes the histogram tool creates bins with many more decimal places than is necessary. This

    has an unfortunate effect on the appearance of the horizontal axis, but it is easy to fix.

    Since the problem did not occur in the example above, we first have to create the problem and then fix it.

    To create the problem:

    Select cell D9 and enter the number 1.23456789

    Now look at the graph. Notice how the display has changed on the horizontal axis. Not very pretty, is it?

    To fix the problem:

    The format of the numbers on the chart is the same as their format on the spreadsheet. Therefore,

    Select the range of numbers below the word "Bin" (cells D8:D13 in the example above).

    Menu: Format, Cells, Number,

    then select Number from the list of options and change the decimal places to 2.

    Notice that the numbers in the graph are now displayed with 2 decimal places, which looks better.

    You can use this method for any axis on any Excel graph, displaying however many decimal places

    are appropriate for the situation.

    Using Bins that You Choose

    To tell Excel what bins you want to use for the data, put the Bin Range in the Bin Range box of the Histogram dialog.

    Notice that I had to include one cell above the desired range of bins, because the "Labels" box is checked.

    [Dialog box: Histogram. Input: Input Range $B$8:$B$24, Bin Range $C$75:$C$78, Labels. Output Options: Output Range D75
    (or New Worksheet Ply, or New Workbook), Pareto (sorted histogram), Cumulative Percentage, Chart Output.]

                      Output from Histogram Tool
    Desired Bins      Bin     Frequency
    2                 2       10
    4                 4       5
    6                 6       1
                      More    0
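
    To see how the bin counts come about, here is a small Python sketch. It assumes that each bin value is an upper limit (a value is counted in the first bin it does not exceed) and that anything above the last bin goes to "More"; that reading is consistent with both output tables on this sheet. The function name histogram is only for illustration.

```python
# Variable a from the Histogram sheet.
data = [1, 2, 3, 4, 3, 1, 5, 2, 1, 2, 3, 2, 1, 3, 2, 2]

def histogram(values, bins):
    """Return one count per bin plus a final count for "More".

    Each bin is treated as an upper limit: a value is counted in the
    first bin that is greater than or equal to it.
    """
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)
    for v in values:
        for i, upper in enumerate(bins):
            if v <= upper:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # larger than every bin: "More"
    return counts

print(histogram(data, [1, 2, 3, 4]))  # default-style bins, last entry is "More"
print(histogram(data, [2, 4, 6]))     # the Desired Bins 2, 4, 6
```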

    Sheet: Scatter Plot

    Scatter Diagrams (Scatter Plots)

    Scatter Plots offer a way to visualize the relationship between two variables. Excel's Chart Wizard

    makes it fairly easy to construct one. An example is shown below.

    Example Data Set:

    a y b

    1 33 2

    2 23 3

    3 14 4

    4 55 3

    3 3 5

    5 44 11

    8 35 6

    6 98 10

    5 41 7

    3 77 4

    2 8 5

    Instructions for Scatter Plots

    Follow these steps to reproduce the chart above. Notice that it plots a and b, but that in the data, variable
    y is in the column between a and b.

    Begin by selecting the data range. Click on cell A6. Then, holding down the Ctrl key, click and drag to cell
    A17; then, continuing to hold Ctrl, click on C6 and drag to C17. This selects both a and b, leaving out y.
    Excel calls this "non-contiguous range selection."

    Select menu item Insert, Chart

    Chart Wizard, Step 1 of 4 shows a selection of chart types.

    Under Chart Type: select XY (Scatter). Then click on the first Chart sub-type.

    Click Next>

    Chart Wizard, Step 2 of 4: The Data range box should show the data range you selected before starting.
    Note the comma between the two ranges.

    Click Next>

    Chart Wizard, Step 3 of 4:

    Select the Titles tab. Type a title for your chart, and for each axis.

    Select the Legend tab. Un-checking Show Legend will cause the legend to disappear.

    Select the Grid Lines tab. Check Major Gridlines for both the X axis and the Y axis.

    Click Next>

    Chart Wizard, Step 4 of 4 allows you to place the chart on the current sheet, or insert a new sheet.
    Don't worry! You can move it later, and it will not affect any of your data.

    Click Finish

    Now you may move your chart and change its size. To move it, just click once on it and drag it to a
    new location. To change the size, click once on it and use the "handles" (little black boxes).

    [Chart: "Plot of a vs b", an XY scatter with a on the horizontal axis (0 to 10) and b on the vertical axis (0 to 12).
    Data range: ='Scatter Plot'!$A$6:$A$17,'Scatter Plot'!$C$6:$C$17]

    Sheet: Descriptive Stats

    Descriptive Statistics

    The Descriptive Statistics tool in the Data Analysis package is a fast way to get a bunch of numbers that

    describe your data. An example is shown below, together with the built-in Excel functions that give the

    same information. Copy the Excel Functions to the next column to get a description of variable b.

    Example Data Set     Output from Descriptive Statistics Tool     Excel Functions:

    a    b                                            a              a
    1    2               Mean                         3.818182       3.818182
    2    3               Standard Error               0.615234       0.615234
    3    4               Median                       3              3
    4    3               Mode                         3              3
    3    5               Standard Deviation           2.040499       2.040499
    5    11              Sample Variance              4.163636       4.163636
    8    6               Kurtosis                     0.260801       0.260801
    6    10              Skewness                     0.730477       0.730477
    5    7               Range                        7              7
    3    4               Minimum                      1              1
    2    5               Maximum                      8              8
                         Sum                          42             42
                         Count                        11             11
                         Largest(2)                   6              6
                         Smallest(2)                  2              2
                         Confidence Level(95.0%)      1.370826       1.370826
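
    Most of these numbers are easy to reproduce by hand. The Python sketch below uses the standard statistics module on variable a; it is only a cross-check, and it leaves out Kurtosis, Skewness and the Confidence Level, which need formulas not spelled out on this sheet.

```python
import math
import statistics

# Variable a from the Example Data Set.
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
n = len(a)

summary = {
    "Mean": statistics.mean(a),
    "Standard Error": statistics.stdev(a) / math.sqrt(n),  # s / sqrt(n)
    "Median": statistics.median(a),
    "Mode": statistics.mode(a),
    "Standard Deviation": statistics.stdev(a),   # sample value, divides by n-1
    "Sample Variance": statistics.variance(a),   # sample value, divides by n-1
    "Range": max(a) - min(a),
    "Minimum": min(a),
    "Maximum": max(a),
    "Sum": sum(a),
    "Count": n,
    "Largest(2)": sorted(a, reverse=True)[1],
    "Smallest(2)": sorted(a)[1],
}

for name, value in summary.items():
    print(f"{name:20s} {value}")
```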

    Instructions for the Descriptive Statistics Data Analysis Tool.

    Select menu item Tools, Data Analysis

    Select Descriptive Statistics (double-click on it, or select OK)

    Select the Input Range window, and either type or select the area that contains the data.

    If your data is arranged so that each vertical column represents a variable, select the Columns button.

    If your input range includes names for the variables, select the Labels in First Row checkbox.

    If you want the results to be written on the current worksheet, select the Output Range button,
    click on the window next to that button and either type in or select a location for the output.
    (If you type E6, the output will begin at cell E6 and continue down and to the right.)

    Most Important: Check the Summary Statistics box.

    Checking the Confidence Level for Mean box gives a "Confidence Level" in the output, which is equal to
    half of the width of a confidence interval.

    Kth Largest or Kth Smallest: Checking the boxes and entering "2" as shown would cause the output
    to include the second smallest and second largest values in the data set.

    Click OK

    [Dialog box: Descriptive Statistics. Input: Input Range $A$9:$A$20, Grouped By: Columns / Rows, Labels in First Row.
    Output Options: Output Range E6 (or New Worksheet Ply, or New Workbook), Summary statistics,
    Confidence Level for Mean: 95 %, Kth Largest: 2, Kth Smallest: 2.]

    Sheet: Rank & Percentile

    Rank and Percentile

    The Rank and Percentile tool in the Data Analysis package is a fast way to get a copy of your data,

    sorted from largest to smallest, with the associated ranks. An example is shown below.

    The Excel functions don't exactly repeat what the tool does. The tool begins by numbering the data

    points, then sorting them in descending order, and finally inserting ranks and percent ranks. The two

    related Excel functions, RANK() and PERCENTRANK() are shown for the first 3 data points. The

    first point in the Data is a = 2, and this value is tied for 7th rank. That puts it at the 26.6 percentile

    of the data. The second data point is a = 4, which is in sole possession of rank 2, percentile 93.3.

    Data            Related Excel Functions     Output from Rank and Percentile Tool

    Point   a       Rank    Percent             Point   a    Rank   Percent
    1       2       7       26.60%              7       5    1      100.00%
    2       4       2       93.30%              2       4    2      93.30%
    3       3       3       66.60%              3       3    3      66.60%
    4       1                                   5       3    3      66.60%
    5       3                                   11      3    3      66.60%
    6       1                                   14      3    3      66.60%
    7       5                                   1       2    7      26.60%
    8       2                                   8       2    7      26.60%
    9       1                                   10      2    7      26.60%
    10      2                                   12      2    7      26.60%
    11      3                                   15      2    7      26.60%
    12      2                                   16      2    7      26.60%
    13      1                                   4       1    13     .00%
    14      3                                   6       1    13     .00%
    15      2                                   9       1    13     .00%
    16      2                                   13      1    13     .00%
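
    The ranks and percent ranks in the table can be checked with a few lines of Python. This is a sketch under two assumptions that match the numbers above: rank is one plus the number of larger values (so ties share the best rank), and percent rank is the number of smaller values divided by (n - 1). Excel displays the percent rank cut off at three digits, so 4/15 shows as 26.60%.

```python
# Variable a from the Rank & Percentile sheet, in original order.
data = [2, 4, 3, 1, 3, 1, 5, 2, 1, 2, 3, 2, 1, 3, 2, 2]

def rank_desc(x, values):
    """Rank with 1 = largest; tied values all get the best rank of the tie."""
    return 1 + sum(v > x for v in values)

def percent_rank(x, values):
    """Fraction of the other data points that lie below x."""
    return sum(v < x for v in values) / (len(values) - 1)

# What the tool does: number the points, sort descending, attach rank and percent.
numbered = list(enumerate(data, start=1))
table = [
    (point, value, rank_desc(value, data), percent_rank(value, data))
    for point, value in sorted(numbered, key=lambda p: p[1], reverse=True)
]

for point, value, rank, pct in table:
    print(point, value, rank, f"{pct:.2%}")
```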

    Instructions for the Rank and Percentile Data Analysis Tool.

    Select menu item Tools, Data Analysis

    Select Rank and Percentile (double-click on it, or select OK)

    Select the Input Range window, and either type or select the area that contains the data.

    If your data is arranged so that each vertical column represents a variable, select the Columns button.

    If your area includes names for the variables, select the Labels checkbox.

    If you want the results to be written on the current worksheet, select the Output Range
    button, then click on the window next to that button
    and either type in or select a location for the output.

    Make sure that the Output Range does not overlap with the Input Range.

    Click OK

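    [Dialog box: Rank and Percentile. Input: Input Range $B$10:$B$26, Grouped By: Columns / Rows, Labels in First Row.
    Output Options: Output Range E11 (or New Worksheet Ply, or New Workbook).]

    Formulas used for the Related Excel Functions (first data point):

    =RANK($B11,$B$11:$B$26)

    =PERCENTRANK($B$11:$B$26,$B11)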

    Sheet: Covariance

    Sample Covariance

    Covariance measures the degree to which things "vary together". In that regard it is almost the

    same as correlation (see the next page). In fact, correlation is more useful for quantifying the

    relationship between two variables. The most common use of Covariance is when you are adding

    two random variables, such as when you are forming a portfolio of different stocks.

    Excel offers two ways to estimate the covariance between pairs of variables. Unfortunately, they are both

    "biased estimators" that divide by n rather than n-1. To obtain the "unbiased sample estimate", multiply by n/(n-1).

    (Previous versions of Excel used the unbiased method in the Data Analysis Tool.)

    The (n-1) method is almost* always preferred. (Use n if the data set is the entire population.)

    The table beginning at cell F20 shows how the built-in Excel function can be modified to use (n-1).

    Example Data Set with n = 4

    a    b    c
    1    2    5
    2    3    4
    3    5    2
    4    4    3

    Covariance Data Analysis Tool

         a        b        c
    a    1.25
    b    1        1.25
    c    -1       -1.25    1.25

    Covariance Excel Function (denominator = n)

         a        b        c
    a    1.25
    b    1        1.25
    c    -1       -1.25    1.25

    Covariance Excel Function multiplied by n/(n-1)

         a            b            c
    a    1.6666667
    b    1.3333333    1.6666667
    c    -1.333333    -1.666667    1.6666667
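
    The difference between the two denominators is easy to see in code. The Python sketch below is an illustration only (the function name covariance and its sample flag are made up here): it computes the covariance with n in the denominator, as in the tables above, and with n-1, which is the same as multiplying by n/(n-1).

```python
# Example Data Set with n = 4.
a = [1, 2, 3, 4]
b = [2, 3, 5, 4]
c = [5, 4, 2, 3]

def covariance(x, y, sample=False):
    """Covariance of x and y; divides by n, or by n-1 when sample=True."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (n - 1 if sample else n)

n = len(a)
print(covariance(a, b))                # denominator n
print(covariance(a, b) * n / (n - 1))  # multiplied by n/(n-1)
print(covariance(a, b, sample=True))   # denominator n-1, same number
```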

    Instructions for the Covariance Data Analysis Tool.

    Select menu item Tools, Data Analysis

    Select Covariance (double-click on it, or select OK)

    Select the Input Range window, and either type or select the area that contains the data.

    If your data is arranged so that each vertical column represents a variable, select the Columns button.
    Otherwise, select the Rows button.

    If your area includes names for the variables, select the Labels checkbox.

    If you want the results to be written on the current worksheet, select the Output Range
    button, then click on the window next to that button
    and either type in or select a location for the output.
    For example, if you type F15, the output will begin at cell F15 and continue down and to the right.

    Make sure that the Output Range does not overlap with the Input Range.

    Click OK

    [Dialog box: Covariance. Input: Input Range $B$14:$D$18, Grouped By: Columns / Rows, Labels in First Row.
    Output Options: Output Range F15 (or New Worksheet Ply, or New Workbook).]

    Sheet: Correlation

    Sample Correlation

    Excel offers two ways to estimate the correlation between pairs of variables. The value of correlation

    is between -1 and +1. Positive correlation means that the variables tend to move in the same

    direction. That is, if one variable is above its mean, the other one is likely to be above its mean, too.

    Height and weight of people are positively correlated, because very tall people usually weigh more

    than very short people. Note that this is not always true, so the correlation is less than +1.0.

    Negative correlation means that they tend to move in opposite directions. Mountain climbers know

    that there is a negative correlation between altitude and stamina, because of decreasing oxygen.

    Correlation of +1 or -1 means that the relationship between the two variables is perfectly linear.

    When this happens, a "scatter plot" of the two variables yields a straight line. In the example below,

    variables b and c have correlation of -1.

    Example Data Set:

    a    b    c
    1    2    5
    2    3    4
    3    5    2
    4    4    3

    Correlation Data Analysis Tool:

         a       b      c
    a    1
    b    0.8     1
    c    -0.8    -1     1

    Correlation Excel Function:

         a       b      c
    a    1
    b    0.8     1
    c    -0.8    -1     1
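
    As a cross-check on the table, here is a short Python sketch of the usual (Pearson) correlation formula: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name correlation is just for this illustration.

```python
import math

# Example Data Set from the Correlation sheet.
a = [1, 2, 3, 4]
b = [2, 3, 5, 4]
c = [5, 4, 2, 3]

def correlation(x, y):
    """Pearson correlation coefficient of x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

print(correlation(a, b))  # positive: a and b tend to move together
print(correlation(b, c))  # -1: here c = 7 - b, a perfectly linear relationship
```

    Because the n (or n-1) in the covariance cancels against the same factor in the standard deviations, the choice of denominator does not matter for correlation.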

    Instructions for the Correlation Data Analysis Tool.

    Select menu item Tools, Data Analysis

    Select Correlation (double-click on it, or select OK)

    Select the Input Range window, and either type or select the area that contains the data.

    If your data is arranged so that each vertical column represents a variable, select the Columns button.
    Otherwise, select the Rows button.

    If your area includes names for the variables, select the Labels checkbox.

    If you want the results to be written on the current worksheet, select the Output Range
    button, then click on the window next to that button
    and either type in or select a location for the output.

    Make sure that the Output Range does not overlap with the Input Range.

    Click OK

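    [Charts: two scatter plots, b against a and c against b, each with both axes running from 0 to 6.]

    [Dialog box: Correlation. Input: Input Range $B$14:$D$18, Grouped By: Columns / Rows, Labels in First Row.
    Output Options: Output Range F14 (or New Worksheet Ply, or New Workbook).]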

    Sheet: Sampling

    Random Sampling

    Example Data Set:                  Example After Sorting:

    FUND               RandNo          FUND               RandNo
    Benchmarrk Div     0.999969        Freedom Cash       0.078951
    Bradford           0.172857        Capital Cash       0.082888
    BT INstit Treas    0.263466        Fortis             0.110691
    Capital Cash       0.082888        Flex-fund          0.119541
    Fidelity Cash      0.275826        Nationwide         0.165838
    Flex-fund          0.119541        Bradford           0.172857
    Fortis             0.110691        MarketWatch        0.183844
    Freedom Cash       0.078951        Piermont Money     0.220191
    Galaxy Money       0.291818        BT INstit Treas    0.263466
    MarketWatch        0.183844        Fidelity Cash      0.275826
    Nationwide         0.165838        NCC Funds          0.27604
    NCC Funds          0.27604         Galaxy Money       0.291818
    Piermont Money     0.220191        Benchmarrk Div     0.999969

    To select a random sample of size n,

    Put random numbers into the column next to the data set (instructions given below).

    Select the first random number and then go to the Standard Toolbar and press the Sort Ascending button (the A-Z icon)
    (or use Menu: Data, Sort, Ascending. See instructions on worksheet Sorting.)

    Your sample is the first n rows.

    Here is how to put random numbers into cells B17:B29:

    Menu: Tools, Data Analysis, Random Number Generation
    Number of Variables: (leave blank)
    Number of Random Numbers: (leave blank)
    Distribution: Uniform
    Parameters Between: 0 and 1
    Output Range: B17:B29

    For a Sample of 8, choose the first 8 after sorting on the Random Numbers.
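
    The same procedure (attach a random number to every row, sort on it, keep the first n rows) can be sketched in Python. This is only an illustration of the idea: the function name random_sample and the seed value are arbitrary choices, and the random numbers drawn will not match the ones shown in the table.

```python
import random

# Fund names as they appear in the Example Data Set.
funds = [
    "Benchmarrk Div", "Bradford", "BT INstit Treas", "Capital Cash",
    "Fidelity Cash", "Flex-fund", "Fortis", "Freedom Cash", "Galaxy Money",
    "MarketWatch", "Nationwide", "NCC Funds", "Piermont Money",
]

def random_sample(items, n, rng=None):
    """Pair each item with a Uniform(0, 1) number, sort ascending, keep the first n."""
    rng = rng or random.Random()
    keyed = [(rng.random(), item) for item in items]
    keyed.sort(key=lambda pair: pair[0])
    return [item for _, item in keyed[:n]]

# A seeded generator makes the example repeatable; omit it for a fresh sample.
sample = random_sample(funds, 8, random.Random(2003))
print(sample)
```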

    Sheet: Confidence Intervals

    Confidence Intervals

    There are two ways to do confidence intervals: use Built-in Excel functions, or use information from
    the Descriptive Statistics tool in the Data Analysis package. They are both illustrated below.

    Confidence Intervals from the Descriptive Statistics Data Analysis Tool.

    First, generate the descriptive statistics (see the Descriptive Stats sheet in this workbook):

    Menu: Tools, Data Analysis, Descriptive Statistics

    Select your data range,

    Check the Confidence Level for the Mean box and enter your desired confidence level in the box,

    Check the Summary Statistics box.

    Click OK. You should get the output shown below.

    Example Data Set     Output from Descriptive Statistics Tool

    a    b                                            a
    1    2               Mean                         3.818182
    2    3               Standard Error               0.615234
    3    4               Median                       3
    4    3               Mode                         3
    3    5               Standard Deviation           2.040499
    5    11              Sample Variance              4.163636
    8    6               Kurtosis                     0.260801
    6    10              Skewness                     0.730477
    5    7               Range                        7
    3    4               Minimum                      1
    2    5               Maximum                      8
                         Sum                          42
                         Count                        11
                         Confidence Level (95.0%)     1.370826

    Then, to get the confidence interval, subtract the "Confidence Level" from the "Mean" for the lower limit
    and add it to the "Mean" for the upper limit.

    Calculations:
        Lower Confidence Limit:   2.447356    = E15 - E28
        Upper Confidence Limit:   5.189008    = E15 + E28

    Interpretation: "We have 95% confidence that the population mean for variable a
    is in the interval 2.447 to 5.189."

    Confidence Intervals using Built-in Excel Functions.

    Basic Calculations:
        Average (E39):                3.818182    =AVERAGE(A15:A25)
        Standard Deviation (E40):     2.040499    =STDEV(A15:A25)
        Sample Size, n (E41):         11          =COUNT(A15:A25)

    Probability Calculations:
        Confidence (E42):             0.95
        Student's t (2-tail) (E43):   2.228139    =TINV(1-E42,E41-1)

    The Confidence Interval:
        Lower Confidence Limit:       2.447356    =E39-E43*E40/SQRT(E41)
        Upper Confidence Limit:       5.189008    =E39+E43*E40/SQRT(E41)
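
    The arithmetic behind the interval can be checked with the Python sketch below. One assumption to note: Python's standard library has no inverse Student's t function, so the sketch simply reuses the value 2.228139 that TINV returned above (95% confidence, 10 degrees of freedom) instead of computing it.

```python
import math
import statistics

# Variable a from the Example Data Set.
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]

# Critical value copied from the sheet: TINV(1 - 0.95, 11 - 1).
t_two_tail = 2.228139

n = len(a)
mean = statistics.mean(a)
std_dev = statistics.stdev(a)           # sample standard deviation (n-1)
std_err = std_dev / math.sqrt(n)

half_width = t_two_tail * std_err       # the tool's "Confidence Level(95.0%)"
lower = mean - half_width
upper = mean + half_width

print(round(half_width, 6), round(lower, 6), round(upper, 6))
```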

    Sheet: One-Sample t-tests

    One-Sample t-Test

    The easiest way to do a One-Sample t-Test in Excel is to use a Confidence Interval. However, this method
    does not give a p-value directly. The second method is to construct the test statistic and compare it to a
    critical value. The test statistic can also be used to compute a p-value. Both methods are illustrated below.

    One-Sample t-Tests using Confidence Intervals

    Two-tail test: set up a (1 - α) confidence interval (see the sheet Confidence Intervals for instructions)
    and reject H0 (the Null Hypothesis) if the value specified in H0 is outside the confidence interval.

    One-tail test: use a confidence level of (1 - 2α) and reject H0 if the value specified in H0 is
    outside of the confidence interval and the interval lies on the side of that value predicted by the
    Alternative Hypothesis, Ha.

    Example Data (variable a): 1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2

    Calculation of the Confidence Intervals:

                                Two-tail test: for α = 0.05,         One-tail test: for α = 0.05,
                                set up a 95% confidence interval     set up a 90% confidence interval
    Average:                    3.8181818                            3.8181818
    Standard Deviation:         2.040499                             2.040499
    Sample Size, n:             11                                   11
    Confidence:                 0.95                                 0.90
    Student's t (2-tail):       2.2281389                            1.8124611
    Lower Confidence Limit:     2.4473559                            2.7030948
    Upper Confidence Limit:     5.1890077                            4.9332688

    Hypothesis Tests using Confidence Intervals:

    In the following examples, assume that 4.4 has been given as the value to use in the null hypothesis.
    (This is the value often referred to as μ0. Thus, μ0 = 4.4 in the examples.)

    Two-tail test:

    Example: H0: μ = 4.4, Ha: μ <> 4.4 (not equal)
    Reject H0 if 4.4 is outside the 95% confidence interval.
    Result: 4.4 is between 2.447 and 5.189,
    so the null hypothesis is NOT rejected.

    One-tail test (using the 90% confidence interval):

    Example: H0: μ < 4.4, Ha: μ > 4.4
    Reject H0 if 4.4 is below the lower confidence limit (the whole interval lies above 4.4).
    Result: 4.4 is not below 2.703,
    so the null hypothesis is NOT rejected.

    Example: H0: μ > 4.4, Ha: μ < 4.4
    Reject H0 if 4.4 is above the upper confidence limit (the whole interval lies below 4.4).
    Result: 4.4 is not above 4.933,
    so the null hypothesis is NOT rejected.


    One-Sample t-Tests using the Test Statistic

    The test statistic is (sample average - μ0)/(standard error). The critical level is the value from the
    Student's t distribution. There are two ways to test the hypothesis (they give the same result):

    Hypothesis test using the Test Statistic:

    For 2 tails, reject H0 if the test statistic is larger in absolute value than the critical level.

    For 1 tail, reject H0 if the test statistic is larger than the critical level in the direction predicted by Ha.

    Hypothesis test using the P-values:

    For 2 tails, reject H0 if the p-value is smaller than α.

    For 1 tail, reject H0 if the p-value is smaller than α and the direction is consistent with Ha.

    Basic Calculations:
        Average:                       3.8181818
        Standard Deviation:            2.040499
        Sample Size, n:                11

    Probability Calculations:
        Hypothesized Value, μ0:        4.4
        α:                             0.05
        t-ratio, or Test Statistic:    -0.945687
        p-value, one-tail:             0.183299
        t Critical one-tail:           1.8124611
        p-value, two-tail:             0.3665981
        t Critical two-tail:           2.2281389
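
    For anyone who wants to verify the t-ratio and p-values without Excel, here is a Python sketch. It is not how Excel computes them: since the standard library has no Student's t distribution, the tail probability is approximated by integrating the t density numerically (Simpson's rule), and the helper names t_pdf and t_upper_tail are invented for this example.

```python
import math
import statistics

# Variable a from the Example Data, and the hypothesized value from the sheet.
a = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]
mu0 = 4.4
alpha = 0.05

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): 0.5 minus the area from 0 to |t| (Simpson's rule, even steps)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n = len(a)
std_err = statistics.stdev(a) / math.sqrt(n)
t_ratio = (statistics.mean(a) - mu0) / std_err   # (sample average - mu0) / standard error
p_one_tail = t_upper_tail(t_ratio, n - 1)
p_two_tail = 2 * p_one_tail

print(round(t_ratio, 6), round(p_one_tail, 6), round(p_two_tail, 6))
print("reject H0 (two-tail)?", p_two_tail < alpha)
```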

    Tests using the Test Statistic (t-ratio):

    Two-tail test:

    Example: H0: μ = 4.4, Ha: μ ≠ 4.4 (not equal)

    Result: absolute value of t-ratio of 0.94569 is

    smaller than the two-tail critical value of 2.22814,

    so the null hypothesis is NOT rejected.

    One-tail tests:

    Example: H0: μ ≤ 4.4, Ha: μ > 4.4

    Result: sample average of 3.818 is below 4.4,

    which IS NOT consistent with Ha,

    so the null hypothesis is NOT rejected.

    Example: H0: μ ≥ 4.4, Ha: μ < 4.4

    Result: sample average of 3.818 is below 4.4,

    which IS consistent with Ha,

    but the absolute value of the t-ratio of 0.94569 is

    smaller than the one-tail critical value of 1.81246,

    so the null hypothesis is NOT rejected.

    Same Tests, using the p-value

    Two-tail test:

    Example: H0: μ = 4.4, Ha: μ ≠ 4.4 (not equal)

    Result: p-value of 0.3666 is larger than 0.05,

    so the null hypothesis is NOT rejected.

    One-tail tests:

    Example: H0: μ ≤ 4.4, Ha: μ > 4.4

    Result: sample average of 3.818 is below 4.4,

    which IS NOT consistent with Ha,

    so the null hypothesis is NOT rejected.

    Example: H0: μ ≥ 4.4, Ha: μ < 4.4

    Result: sample average of 3.818 is below 4.4,

    which IS consistent with Ha,

    but the p-value of 0.1833 is larger than 0.05,

    so the null hypothesis is NOT rejected.
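
The two decision rules (test statistic and p-value) can be written side by side to confirm that they agree. The sketch below is illustrative Python, not part of the workbook: the constants are the values the workbook reports for this example, and the function names and the labels "two-sided", "greater" and "less" are assumptions chosen for the illustration.

```python
# Values reported in the workbook for the example (mu0 = 4.4, n = 11).
T_RATIO = -0.945687
P_ONE_TAIL = 0.183299
P_TWO_TAIL = 0.3665981
T_CRIT_ONE_TAIL = 1.8124611
T_CRIT_TWO_TAIL = 2.2281389
ALPHA = 0.05


def reject_by_statistic(alternative):
    """Decision based on the t-ratio and the critical level."""
    if alternative == "two-sided":   # Ha: mu is not equal to mu0
        return abs(T_RATIO) > T_CRIT_TWO_TAIL
    if alternative == "greater":     # Ha: mu > mu0
        return T_RATIO > T_CRIT_ONE_TAIL
    if alternative == "less":        # Ha: mu < mu0
        return T_RATIO < -T_CRIT_ONE_TAIL
    raise ValueError(f"unknown alternative: {alternative}")


def reject_by_p_value(alternative):
    """Decision based on the p-value and alpha."""
    if alternative == "two-sided":
        return P_TWO_TAIL < ALPHA
    if alternative == "greater":
        # The sample average must lie above mu0 (positive t-ratio).
        return T_RATIO > 0 and P_ONE_TAIL < ALPHA
    if alternative == "less":
        # The sample average must lie below mu0 (negative t-ratio).
        return T_RATIO < 0 and P_ONE_TAIL < ALPHA
    raise ValueError(f"unknown alternative: {alternative}")


decisions = {
    alt: (reject_by_statistic(alt), reject_by_p_value(alt))
    for alt in ("two-sided", "greater", "less")
}
for alt, (by_stat, by_p) in decisions.items():
    print(f"{alt:9s}  reject by statistic: {by_stat}   reject by p-value: {by_p}")
```

For this data set both rules give "do not reject" for all three alternatives.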

    Two-Sample t-Tests.

    There are three t-Tests in the Excel Data Analysis Tools, and each has a corresponding built-in function.

    Data Analysis Tool:                                Excel Spreadsheet Formula:

    t-Test: Paired Two-Sample for Means                = TTEST(Array1, Array2, Tails, 1)

    t-Test: Two-Sample Assuming Equal Variances        = TTEST(Array1, Array2, Tails, 2)

    t-Test: Two-Sample Assuming Unequal Variances      = TTEST(Array1, Array2, Tails, 3)

    The formulas give only the p-value for the test. The Data Analysis Tools give the complete analysis.

    t-Tests using the built-in function TTEST(array1, array2, tails, type)

    Example Data Set:

    a    b
    1    2
    2    3
    3    4
    4    3
    3    5
    5    11
    8    6
    6    10
    5    7
    3    4
    2    5

    Results from TTEST:

                                Paired      Equal σ     Unequal σ
    Hypothesized Difference     0           0           0
    p-value, one-tail:          0.016746    0.069742    0.0705883
    p-value, two-tail:          0.033492    0.139485    0.1411765

    TTEST(Array1, Array2, Tails, Type)

    Array1 is the first data set.

    Array2 is the second data set.

    Tails = 1 for a one-tail test, or 2 for a two-tail test

    Type = 1 for a Paired Two-sample test

    Type = 2 for a Two-sample test assuming Equal variance

    Type = 3 for a Two-sample test assuming Unequal variance

    Example: = TTEST($A$14:$A$24, $B$14:$B$24, 2, 1)

    t-Tests using the t-Test Data Analysis tools

    At this point you should be familiar with how to use the input boxes, so here is a brief list of the steps.

    Menu: Tools, Data Analysis, t-Test: Two-Sample Assuming Equal Variances

    Put the addresses of the two variables in their respective Input Range boxes.

    In the Hypothesized Mean Difference box:

        If your "null hypothesis" is that the two population means are equal, leave the box blank.

        If your "null hypothesis" is that the two population means are different by a specified amount:

            First, make sure that the variable hypothesized to have the larger mean is "Variable 1".

            If not, go back and re-do the Input Range boxes.

            Then, type the hypothesized difference in the Hypothesized Mean Difference box.

            For example, if the null hypothesis states that Variable 1's population mean is 7.4 units

            larger than Variable 2's, enter 7.4 in the Hypothesized Mean Difference box.

    If your Variable Ranges include a name for each variable, check the Labels box.

    The Alpha box is where you enter the type I error probability. (Excel's output does not report this

    value, so be sure to note what value you used.)

    Enter your Output Options in the usual way, and click OK.

    Examples of each of the Tools are given on the next 2 pages.
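
The effect of the Hypothesized Mean Difference box can be illustrated with a short calculation. This Python sketch assumes the usual pooled-variance formula, in which the hypothesized difference is subtracted from the difference between the two sample means. The function name `pooled_t` and the hypothesized difference of 1.0 are hypothetical choices for the illustration and do not come from the workbook.

```python
import math
from statistics import mean, variance

# Example data set; "b" has the larger mean, so it plays the role of Variable 1.
variable_1 = [2, 3, 4, 3, 5, 11, 6, 10, 7, 4, 5]   # column b
variable_2 = [1, 2, 3, 4, 3, 5, 8, 6, 5, 3, 2]     # column a


def pooled_t(x, y, hypothesized_diff=0.0):
    """Equal-variance t statistic for H0: mean(x) - mean(y) = hypothesized_diff."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y) - hypothesized_diff) / std_error


t_no_diff = pooled_t(variable_1, variable_2)            # box left blank (difference 0)
t_with_diff = pooled_t(variable_1, variable_2, 1.0)     # hypothetical difference of 1.0

print(f"t Stat, hypothesized difference 0:   {t_no_diff:.5f}")
print(f"t Stat, hypothesized difference 1.0: {t_with_diff:.5f}")
```

With Variable 1 chosen as the variable with the larger mean, a positive hypothesized difference moves the t statistic toward zero, which is why the order of the Input Range boxes matters.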

    Select a cell to see the TTEST formula.

    (Labels on the example formula: Array 1, Array 2, 2-tails, Paired Two-Sample test.)

    t-Test: Paired Two Sample for Means, Hypothesized Diff. = 0

                                    a           b
    Mean                            3.818182    5.454545
    Variance                        4.163636    8.272727
    Observations                    11          11
    Pearson Correlation             0.645926
    Hypothesized Mean Difference    0
    df                              10
    t Stat                          -2.46321

    Built-in function TTEST

    p-value, one-tail: P(T

    t-Test: Two-Sample Assuming Unequal Variances, Hypothesized Diff. = 0

                                    a           b
    Mean                            3.818182    5.454545
    Variance                        4.163636    8.272727
    Observations                    11          11
    Hypothesized Mean Difference    0
    df                              18
    t Stat                          -1.53897

    Built-in function TTEST

    p-value, one-tail: P(T

    Regression

    Regression is a method to fit a linear function to a data set.

    The objective is to estimate values of b0, b1 and b2 in the following equation:

    y = b0 + b1 x1 + b2 x2

    In this equation, y is called the Dependent Variable (sometimes called the Criterion Variable),

    x1 and x2 are called the Independent Variables (or Predictor Variables),

    b0 is called the "intercept", and b1 and b2 are the "slopes".

    Collectively, b0, b1 and b2 are referred to as the Coefficients. (This is their label in the output.)

    The results for the Example Data Set are shown below the instructions.

    Example Data Set:

    y x1 x2

    -3 2 5

    2 3 4

    11 5 2

    9 6 4

    8 4 3
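
For readers who want to check the coefficients independently of Excel, the least-squares fit for the Example Data Set can be computed by solving the normal equations. This is a minimal Python sketch under that assumption. It uses a small hand-written elimination routine (`solve`, an illustrative helper) instead of a numerical library, and it is not how the Regression tool is implemented.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Example Data Set from this sheet.
y = [-3, 2, 11, 9, 8]
x1 = [2, 3, 5, 6, 4]
x2 = [5, 4, 2, 4, 3]

# Design matrix with a leading column of ones for the intercept b0.
rows = [[1.0, u, v] for u, v in zip(x1, x2)]

# Normal equations: (X'X) b = X'y
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]

b0, b1, b2 = solve(xtx, xty)
predicted = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
print("Predicted y:", [round(p, 4) for p in predicted])
```

The fitted values correspond to the "Coefficients" and "Predicted y" columns of the Regression output.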

    Instructions for the Regression Data Analysis Tool.

    Select menu item Tools, Data Analysis

    Select Regression (double-click on it, or select OK)

    Select the Input Y Range window, and select the area that contains the Dependent Variable.

    Select the Input X Range window, and select the area that contains the Independent Variable(s).

    (If there are 2 or more Independent Variables, they must be side-by-side in the worksheet.)

    You may specify Constant is Zero to force b0 (the intercept) to be zero.

    If the first row of your area contains names for the variables, select Labels in the First Row.

    You may set a Confidence Level for the confidence intervals for the coefficients.

    If you want the results to be written on the current worksheet, select the Output Range button, then click on the window next to that button and either type in or select a location for the output.

    Make sure that the Output Range does not overlap with the Input Range.

    Select additional output and plots that you would like.

    Click OK

    Note: Graphs produced by Excel's Regression program are badly sized. However, it is easy to change the size

    by clicking on the graph and dragging one of the corner "handles". An example is given below the output.

    Regression dialog box, as filled in for the example:

    Input
        Input Y Range: $B$14:$B$18
        Input X Range: $C$14:$D$18
        Labels in First Row
        Constant is Zero
        Confidence Level: 95 %

    Output Options
        Output Range: $A$44
        New Worksheet Ply:
        New Workbook

    Residuals
        Residuals
        Standardized Residuals
        Residual Plots
        Line Fit Plots

    Normal Probability
        Normal Probability Plots

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.99322
    R Square             0.986486
    Adjusted R Square    0.972973
    Standard Error       0.948683
    Observations         5

    ANOVA
                  df    SS       MS      F     Significance F
    Regression    2     131.4    65.7    73    0.0135135
    Residual      2     1.8      0.9
    Total         4     133.2

                 Coefficients  Standard Error  t Stat    P-value    Lower 95%   Upper 95%   Lower 95.0%  Upper 95.0%
    Intercept    5.2           2.894823        1.79631   0.2142827  -7.2554266  17.655427   -7.2554266   17.6554266
    x1           2.3           0.360555        6.379052  0.0237044  0.7486554   3.8513446   0.7486554    3.851344584
    x2           -2.5          0.5             -5        0.0377496  -4.6513279  -0.3486721  -4.6513279   -0.348672137

    RESIDUAL OUTPUT

    Observation    Predicted y    Residuals
    1              -2.7           -0.3
    2              2.1            -0.1
    3              11.7           -0.7
    4              9              5.33E-15
    5              6.9            1.1
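
The entries under "Regression Statistics" and "ANOVA" all follow from the observed and predicted y values. As a hedged illustration in Python (the variable names are assumptions, and the formulas are the standard textbook definitions rather than anything taken from Excel's source), they can be recomputed like this:

```python
import math

# Observed y from the Example Data Set and Predicted y from the residual output.
y = [-3, 2, 11, 9, 8]
predicted = [-2.7, 2.1, 11.7, 9.0, 6.9]
k = 2            # number of independent variables (x1 and x2)
n = len(y)       # number of observations

y_bar = sum(y) / n
ss_total = sum((v - y_bar) ** 2 for v in y)
ss_residual = sum((v - p) ** 2 for v, p in zip(y, predicted))
ss_regression = ss_total - ss_residual

df_regression = k
df_residual = n - k - 1

r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
standard_error = math.sqrt(ss_residual / df_residual)
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"SS: regression {ss_regression:.1f}, residual {ss_residual:.1f}, total {ss_total:.1f}")
print(f"Multiple R        {multiple_r:.6f}")
print(f"R Square          {r_square:.6f}")
print(f"Adjusted R Square {adjusted_r_square:.6f}")
print(f"Standard Error    {standard_error:.6f}")
print(f"F                 {f_statistic:.2f}")
```

The p-value labelled Significance F additionally requires the F distribution and is not recomputed in this sketch.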

    One of the "Line Fit Plots" as produced by Excel. Note that the plot is much flatter than is customary.

    The other "Line Fit Plot" after changing its height to a more suitable value. Note that you can see the difference between "y" and "Predicted y" in this graph, whereas they are "on top of each other" in the other graph. Increasing the height reveals the errors (or residuals).

    [Figure: x1 Line Fit Plot. y and Predicted y plotted against x1.]

    [Figure: x2 Line Fit Plot. y and Predicted y plotted against x2.]