mba15 -1.ppt

Transcript of mba15 -1.ppt

  • 7/25/2019 mba15 -1.ppt

    1/44

    Business Statistics

    Why statistics?

    Decision making is often based on analysis of data.

    Statistics helps you make sense of the data by using tools that summarize, present and analyze the data.

    The decision maker can also ascertain the confidence in the decisions.

    Examples

    How many newspapers should the vendor stock to maximize revenue?
    - Depends on the probability distribution of demand and expected profit

    Are two or more market segments significantly different?
    - Hypothesis testing

    What proportion of people are happy with the Sixth Pay Commission report?
    - Parameter estimation

    Sample vs. Population

    Population is the entire group/collection of individuals/objects/things that we want information about.

    Sample is the part of the population that we actually examine to gather information.

    Example - We wish to find the average dividend percentage of all companies traded at NSE.

    All stocks traded at NSE comprise the population

    1% of the stocks selected for gathering information is the sample

    Subdivision within Statistics

    Descriptive Statistics
    - Collect
    - Organize
    - Summarize
    - Display
    - Analyze

    Inferential Statistics
    - Predict and forecast values of population parameters
    - Test hypotheses about values of population parameters
    - Make decisions

    Descriptive statistics - data and frequency distribution

    The following are the departure delays in minutes of 42 flights selected at random from a particular airport (listed here in ascending order).

     0  0  0  0  4  5  8
    10 12 12 13 13 15 20
    22 23 25 25 26 27 34
    38 40 40 42 44 45 45
    45 46 47 48 48 50 50
    50 53 55 56 56 67 95

    Frequency Distribution

    Table with two columns listing:
    - Each and every group or class or interval of values
    - Associated frequency of each group

    Frequency is the number of observations assigned to each group

    Sum of frequencies is the number of observations

    Class midpoint is the middle value of a group or class or interval

    Relative frequency is the percentage/proportion of total observations in each class

    Sum of relative frequencies = 1
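    The definitions above can be sketched in Python for the airline delay data. This is only an illustration: the class width of 10 minutes and the function name frequency_distribution are assumptions, since the classes used in the original table are not recoverable from the transcript.

```python
from collections import Counter

# The 42 departure delays (minutes) from the airline example.
DELAYS = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]


def frequency_distribution(values, width):
    """Return rows of (lower, upper, midpoint, frequency, relative frequency).

    Each class covers lower <= value < upper.
    """
    counts = Counter((v // width) * width for v in values)
    n = len(values)
    rows = []
    for lower in range(0, max(values) + 1, width):
        freq = counts.get(lower, 0)
        rows.append((lower, lower + width, lower + width / 2, freq, freq / n))
    return rows


# A class width of 10 minutes is an assumption made for this sketch.
table = frequency_distribution(DELAYS, width=10)
for lower, upper, mid, freq, rel in table:
    print(f"{lower:>3}-{upper:<3} midpoint={mid:5.1f} freq={freq:2d} rel={rel:.3f}")
```

    The frequencies add up to the number of observations and the relative frequencies add up to 1, as stated above.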

    Frequency distribution

    [Table: Delay in minutes, Frequency]

    Frequency distribution - histogram

    Two variable frequency distribution - cross tabulation

    A joint frequency distribution of two variables (e.g. ownership of airline, delay in minutes)

    Descriptive statistics - measures

    Measures of Location

    Measures of Variability

    Skewness and Kurtosis

    Association between two variables

    Measures of Location

    Arithmetic Mean

    Median

    Mode

    Percentiles

    Quartiles

    Arithmetic mean

    The mean of a data set is the average of all the data values.

    Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

    Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

    Mean - example

    Average delay in flight departure

    $\bar{x}$ = 1354/42 = 32.2381 minutes
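    The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch; the name DELAYS is just a label chosen here for the 42 delay values of the airline example.

```python
# The 42 departure delays (minutes) from the airline example.
DELAYS = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]

total = sum(DELAYS)      # sum of all observations
n = len(DELAYS)          # number of observations
mean_delay = total / n   # sample mean

print(total, n, round(mean_delay, 4))  # 1354 42 32.2381
```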

    Median

    It is the middle item in a data set that is arranged in ascending/descending order

    If there are n observations then the Median = (n+1)/2 th observation.

    computation rule
    - if n is odd then (n+1)/2 is an integer
    - if n is even then use the average of the (n/2)th and (n/2 + 1)th observations

    Example

    Sorted 42 observations (read down each column)

     0  22  45
     0  23  46
     0  25  47
     0  25  48
     4  26  48
     5  27  50
     8  34  50
    10  38  50
    12  40  53
    12  40  55
    13  42  56
    13  44  56
    15  45  67
    20  45  95

    median is the average of the 21st and 22nd observations
    = (34 + 38)/2
    = 36
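    The computation rule for the median can be sketched in Python as follows. The function name median_by_rule is illustrative; the standard library's statistics.median gives the same result.

```python
# The 42 departure delays (minutes) from the airline example.
DELAYS = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]


def median_by_rule(values):
    """Median using the odd/even rule (positions are 1-based in the rule)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the (n+1)/2-th observation, which is index mid (0-based)
        return ordered[mid]
    # n even: average of the (n/2)-th and (n/2 + 1)-th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_rule(DELAYS))  # 36.0
```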

    Mode

    Mode is the most frequently occurring observation
    - mode in the example is 0

    The greatest frequency can occur at two or more different values.

    If the data have exactly two modes, the data are bimodal.

    If the data have more than two modes, the data are multimodal.
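    A short Python sketch of finding the mode or modes, using statistics.multimode from the standard library (Python 3.8 or later). The second list is a made-up illustration of bimodal data, not part of the airline example.

```python
from statistics import multimode

# The 42 departure delays (minutes) from the airline example.
DELAYS = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]

# multimode returns every value that attains the greatest frequency.
delay_modes = multimode(DELAYS)
print(delay_modes)  # [0]  (a delay of 0 occurs four times)

# Hypothetical data with two modes, i.e. bimodal.
bimodal_modes = multimode([1, 1, 2, 2, 3])
print(bimodal_modes)  # [1, 2]
```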

    Percentiles and Quartiles

    Given any set of ordered numerical observations, the Pth percentile in the ordered set is that value below which lie P% (P percent) of the observations in the set.

    The position of the Pth percentile is given by (n+1)P/100, where n is the number of observations in the set.

    Example

    Calculate the 45th percentile of the airline delay data

    the position of the 45th percentile is 45*(42+1)/100 = 19.35th

    value of the 45th percentile
    = 19th observation + .35 of (20th - 19th observation)
    = 26 + .35(27 - 26)
    = 26.35
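    The position rule (n+1)P/100 with linear interpolation between neighbouring observations can be sketched in Python. The function name percentile and the clamping at the two ends of the data are assumptions of this sketch, not something the slides specify.

```python
# The 42 departure delays (minutes) from the airline example.
DELAYS = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]


def percentile(values, p):
    """P-th percentile using position (n+1)P/100 and linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) * p / 100      # 1-based, possibly fractional
    k = int(position)                 # whole part: the k-th observation
    fraction = position - k           # fractional part of the position
    if k < 1:                         # assumption: clamp below the first value
        return ordered[0]
    if k >= n:                        # assumption: clamp above the last value
        return ordered[-1]
    lower, upper = ordered[k - 1], ordered[k]
    return lower + fraction * (upper - lower)


p45 = percentile(DELAYS, 45)
print(round(p45, 2))  # 26.35
```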

    Quartiles

    Quartiles are special names for particular percentiles

    Q1 = 25th percentile

    Q2 = 50th percentile = median

    Q3 = 75th percentile

    Measures of Variability

    Interquartile range

    The interquartile range of a data set is the difference between the third quartile and the first quartile.

    It is the range for the middle 50% of the data.

    It overcomes the sensitivity to extreme data values.
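    As an illustration, the quartiles and interquartile range of the airline delay data can be computed with statistics.quantiles from the Python standard library (Python 3.8 or later). Its "exclusive" method places the i-th of m sorted points at i/(m+1), which corresponds to the (n+1)P/100 position rule. The resulting numbers are computed here and do not appear in the slides.

```python
from statistics import quantiles

# The 42 departure delays (minutes) from the airline example.
DELAYS = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(DELAYS, n=4, method="exclusive")
iqr = q3 - q1  # range covered by the middle 50% of the data

print(q1, q2, q3, iqr)  # 12.75 36.0 48.0 35.25
```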

    Variance

    The variance is a measure of variability that utilizes all the data.

    It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

    Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$

    Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

    Standard deviation

    The standard deviation of a data set is the positive square root of the variance.

    It is measured in the same units as the data, making it more easily comparable to the mean than the variance is.

    If the data set is a sample, the standard deviation is denoted s.

    If the data set is a population, the standard deviation is denoted $\sigma$ (sigma).

    Coefficient of Variation

    The coefficient of variation indicates how large the standard deviation is in relation to the mean.

    If the data set is a sample, the coefficient of variation is computed as follows:

    $\frac{s}{\bar{x}} (100)$

    If the data set is a population, the coefficient of variation is computed as follows:

    $\frac{\sigma}{\mu} (100)$

    Example

    Variance
    = 465.89 minutes squared

    Standard Deviation
    = 21.585 minutes

    Coefficient of Variation
    = 21.584/32.2381 (100) = 66.95%
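    These figures can be reproduced with the Python standard library. A minimal sketch, treating the 42 delays as a sample (divisor n - 1), which is the reading that matches the variance quoted above.

```python
from statistics import stdev, variance

# The 42 departure delays (minutes) from the airline example.
DELAYS = [
    0, 0, 0, 0, 4, 5, 8, 10, 12, 12, 13, 13, 15, 20,
    22, 23, 25, 25, 26, 27, 34, 38, 40, 40, 42, 44, 45, 45,
    45, 46, 47, 48, 48, 50, 50, 50, 53, 55, 56, 56, 67, 95,
]

mean_delay = sum(DELAYS) / len(DELAYS)
sample_var = variance(DELAYS)        # sum of squared deviations / (n - 1)
sample_sd = stdev(DELAYS)            # positive square root of the variance
cv = sample_sd / mean_delay * 100    # coefficient of variation, in percent

print(round(sample_var, 2), round(sample_sd, 3), round(cv, 2))
# 465.89 21.585 66.95
```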

    Skewness

    - Skewness characterizes the degree of asymmetry of a distribution around its mean

    Positively skewed

    Symmetric or unskewed

    Negatively skewed

    Skewness

    Negatively skewed

    Skewness

    Symmetric

    Skewness

    Positively skewed

    Skewness - measure

    Skewness of a distribution is measured by

    $\text{Skewness} = \frac{\sum (X - \mu)^3}{N \sigma^3}$

    For a given data set you may use
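    One way to compute a moment-based skewness for a data set is sketched below in Python. It assumes the population form, the sum of (X - mu)^3 divided by N sigma^3; the function name population_skewness and the small test lists are illustrative and not taken from the slides.

```python
from math import sqrt


def population_skewness(values):
    """Third central moment divided by the cube of the population std. dev."""
    n = len(values)
    mu = sum(values) / n
    sigma = sqrt(sum((x - mu) ** 2 for x in values) / n)
    return sum((x - mu) ** 3 for x in values) / (n * sigma ** 3)


# Hypothetical data sets chosen to show the sign of the measure.
symmetric = [1, 2, 3, 4, 5]      # mirror image around its mean
right_tail = [1, 1, 1, 10]       # long tail to the right
left_tail = [-10, -1, -1, -1]    # long tail to the left

print(population_skewness(symmetric))       # 0.0
print(population_skewness(right_tail) > 0)  # True (positively skewed)
print(population_skewness(left_tail) < 0)   # True (negatively skewed)
```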

    Kurtosis

    Kurtosis characterizes the relative peakedness or flatness of a symmetric distribution compared to the normal distribution

    Platykurtic (relatively flat)

    Mesokurtic (normal)

    Leptokurtic (relatively peaked)

    Kurtosis

    Platykurtic - flat distribution

    Kurtosis

    Mesokurtic - not too flat and not too peaked

    Kurtosis

    Leptokurtic - peaked distribution

    Kurtosis - measure

    Kurtosis for a distribution is measured by

    $\text{Kurtosis} = \beta_2 - 3$

    where

    $\beta_2 = \frac{\sum (X - \mu)^4}{N \sigma^4}$

    For a given data set you may use
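    A comparable Python sketch for kurtosis, assuming the measure is the fourth central moment divided by sigma^4, minus 3, so that a normal distribution scores 0. The function name excess_kurtosis and the sample list are illustrative.

```python
def excess_kurtosis(values):
    """Fourth central moment over the squared population variance, minus 3."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n   # population variance
    m4 = sum((x - mu) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3


# Hypothetical evenly spread data: flatter than a normal curve (platykurtic).
flat = [1, 2, 3, 4, 5]
k_flat = excess_kurtosis(flat)
print(round(k_flat, 4))  # -1.3
```

    A negative value points to a relatively flat (platykurtic) shape and a positive value to a relatively peaked (leptokurtic) one.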

    Association between two variables

    Delay Passengers Delay Passengers Delay Passengers

    23 62 26 2+ 2 64

    0 6+ 01 2 71

    06 23 12 27 34 70

    62 +3 27 22 64

    11 02 0 20 02 73

    2 24 4 20 +2 63

    00 64 17 62 04 64

    +1 62 67 27 22

    +1 26 04 61 + 02

    12 2 0 2 2 7+

    +3 7 02 6+ 26 60

    2 73 25 16 6

    02 63 30 63 07 6+

    13 26 52 05 1 04

    Association between two variables

    Scatter plot

    Covariance

    Correlation Coefficient

    Scatter Plot

    Scatter plots are used to identify any underlying relationships among pairs of data sets.

    The plot consists of a scatter of points, each point representing an observation.

    Scatter Plot

    Covariance

    The covariance is a measure of the linear association between two variables.

    Positive values indicate a positive relationship.

    Negative values indicate a negative relationship.

    Covariance

    If the data sets are samples, the covariance is denoted by

    $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

    If the data sets are populations, the covariance is denoted by

    $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$

    ; 1.01 in the

    %irline

    eample
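    The two covariance formulas can be sketched in Python as below. The paired values are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the delay and passenger figures of the airline example. The function names are illustrative as well.

```python
def sample_covariance(xs, ys):
    """Sum of products of deviations from the sample means, over n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def population_covariance(xs, ys):
    """Same sum of products, divided by N instead of n - 1."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n


# Hypothetical paired observations (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(sample_covariance(xs, ys))      # 1.5
print(population_covariance(xs, ys))  # 1.2
```

    The positive result reflects that larger x values tend to go with larger y values in this toy data.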

    Correlation Coefficient

    The coefficient can take on values between -1 and +1.

    Values near -1 indicate a strong negative linear relationship.

    Values near +1 indicate a strong positive linear relationship.

    If the data sets are samples, the coefficient is

    $r_{xy} = \frac{s_{xy}}{s_x s_y}$

    If the data sets are populations, the coefficient is

    $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$

    x y= ; .+1+ in %irlineeample
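    A minimal Python sketch of the sample correlation coefficient, built as the sample covariance divided by the product of the two sample standard deviations. The paired values are hypothetical, not the airline data, and the function name sample_correlation is illustrative.

```python
from math import sqrt


def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y), all computed with the n - 1 divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


# Hypothetical paired observations (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = sample_correlation(xs, ys)
print(round(r, 4))  # 0.7746

# A perfectly linear increasing relationship gives a value of +1.
r_linear = sample_correlation(xs, [2 * x + 1 for x in xs])
print(round(r_linear, 4))  # 1.0
```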