
The importance of effect size: A statistical example using physical condition measurements

Article in Revista Internacional de Medicina y Ciencias de la Actividad Fisica y del Deporte · December 2012




Rev.int.med.cienc.act.fís.deporte- vol.12 - número 48 - ISSN: 1577-0354


Tejero-González, C.M.; Castro-Morera, M. y Balsalobre-Fernández, C. (2012). Importancia del tamaño del efecto. Una ejemplificación estadística con medidas de condición física / The importance of effect size: a statistical example using physical condition measurements. Revista Internacional de Medicina y Ciencias de la Actividad Física y el Deporte, vol. 12 (48), pp. 715-727. http://cdeporte.rediris.es/revista/revista48/artimportancia318.htm

ORIGINAL

THE IMPORTANCE OF EFFECT SIZE: A STATISTICAL EXAMPLE USING PHYSICAL CONDITION MEASUREMENTS

IMPORTANCIA DEL TAMAÑO DEL EFECTO. UNA EJEMPLIFICACIÓN ESTADÍSTICA CON MEDIDAS DE CONDICIÓN FÍSICA

Tejero-González, C.M.1; Castro-Morera, M.2 & Balsalobre-Fernández, C.3

1 Lecturer. Department of Educación Física, Deporte y Motricidad Humana, Universidad Autónoma de Madrid (Spain). Personal web: www.uam.es/carlos.tejero; e-mail: [email protected]

2 Lecturer. Department of Métodos de Investigación y Diagnóstico en Educación, Universidad Complutense de Madrid (Spain). E-mail: [email protected]

3 Research Student. Department of Educación Física, Deporte y Motricidad Humana, Universidad Autónoma de Madrid (Spain). E-mail: [email protected]

Spanish-English translators: Trágora Traducciones, e-mail: [email protected], Trágora SCA.

UNESCO Code: 5802.06. Analysis, Modelling and Statistical Planning.

European Council classification: 17 Others: Statistics applied to physical activity sciences.

Received November 8, 2010

Accepted February 20, 2012


ABSTRACT

It is common in physical activity and sports science, as in other scientific disciplines, to use designs that compare groups in order to determine whether statistically significant differences exist. However, information on the magnitude of any differences found is not always provided. This work highlights the importance of accompanying statistical significance with values that describe the effect size. To this end, and for didactic purposes, we compare the physical abilities of adolescent boys and girls and illustrate the estimation of the delta parameter using the corrected Hedges' g.

KEYWORDS: effect size, Hedges' g, delta parameter, statistical significance, α and β, group comparison, physical condition.

INTRODUCTION

It is common in physical activity and sports science, as in other scientific disciplines, to come across reports that compare the results of various groups in order to determine whether statistically significant differences exist between them in a specific characteristic or variable. A statistically significant difference is one that would be unlikely to arise by chance alone if the independent variable had no influence on the response or dependent variable. In this sense, the p value gauges whether the difference between two groups could be a sampling accident: it is the probability of obtaining a difference at least as large as the one observed when there is in fact no difference between the groups. However, there are some well-known problems with the use of significance tests, because the p value is the result of two factors, namely the size of the difference and the size of the sample. Thus, significant results can be obtained when the difference between the groups is large even though the sample is small, and also when the sample is very large even though the difference is small.

As a result, researchers must take two possible errors, known as type I and type II errors, into account when designing a study or interpreting its results. A type I error, also known as a false positive, occurs when a null hypothesis that is actually true is rejected; in other words, the researcher infers that there is a statistically significant difference when there is none. Similarly, a type II error, also known as a false negative, occurs when a null hypothesis that is actually false is not rejected; in other words, the researcher infers the absence of a statistically significant difference when one actually exists. Two indicators, α (alpha) and β (beta), are used to control type I and type II errors, respectively.

Perhaps the better known of these is the alpha level, or statistical significance level, which indicates the type I risk assumed by the researcher. The scientific community has established two standards for alpha: α=0.05, to perform estimations with a maximum type I error margin of 5% (95% confidence level), and α=0.01, for an error margin of 1% (99% confidence level). However, the beta level, which indicates the risk assumed by the researcher that a type II error, or false negative, may occur, is also relevant. As with alpha, the scientific community has established two standards for beta: β=0.10, when a maximum type II error margin of 10% needs to be ensured (power of 90%), and β=0.20, for an error margin of 20% (power of 80%). One way of improving both α and β is to increase the sample size (Cañadas, Borges, Sánchez and San Luis, 2000).
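The dependence of β on sample size can be illustrated with a short sketch. It is not part of the original study: it assumes a one-sided, two-sample z test with equal group sizes and a known common standard deviation, and the function name `approx_power` and the value δ = 0.3 are hypothetical choices made only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a one-sided, two-sample z test.

    Assumptions (illustrative only): equal group sizes, a known common
    standard deviation, and a true standardised mean difference `delta`.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)          # rejection threshold
    noncentrality = delta * sqrt(n_per_group / 2)   # expected z under H1
    return std_normal.cdf(noncentrality - z_crit)


# Same small effect, increasingly large samples: beta shrinks as n grows.
for n in (10, 30, 100, 300):
    power = approx_power(delta=0.3, n_per_group=n)
    print(f"n per group = {n:3d}  power = {power:.2f}  beta = {1 - power:.2f}")
```

With the effect held fixed, only the sample size changes the outcome of the test, which is why a p value alone says little about the magnitude of a difference.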
However, neither of these indicators provides information on the magnitude or importance of any differences found. The researcher must therefore use another type of indicator, known as the effect size or magnitude of the difference (Fan, 2001; Frías, Pascual and García, 2000; Monterde, Pascual and Frías, 2000; Thompson, 2006; Thomas and Nelson, 2007; Valera and Sánchez, 1997). The effect size quantifies how much the levels of the independent variable differ in their effect, and so complements the probability associated with the null hypothesis: it reports the magnitude of a difference as well as confirming its existence.

The work of Dowson (2000), who studied the effect of the time of study (morning or afternoon) on learning, is a good example of the use of effect size. The study was performed with an incidental sample of 38 individuals, who were randomly assigned to a morning or an afternoon group and presented with the same learning stimulus. Their understanding of the text was then assessed on the basis of the number of correct answers provided (maximum of 20). The mean scores were 15.2 for the morning group and 17.9 for the afternoon group. These findings raise two substantive questions: is the difference between the two groups sufficiently large, and can it be concluded that more is learned in the afternoon than in the morning? Alongside statistical significance, one means of answering this type of question is the effect size. If there is no overlap between the distributions of the two groups, the difference is important, whereas the larger the overlap, the less important the difference between the groups. To represent this reasoning graphically, Figure 1 shows two situations in which the importance of the difference varies with the overlap of the distributions, with the differences on the left being large and significant but those on the right being less relevant.

FIGURE 1: Illustration of the effect magnitude: distribution overlaps.
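The link between a standardised difference and the overlap of two distributions can be sketched numerically. The example below is an illustration rather than the authors' procedure: it assumes two normal distributions with equal standard deviations and uses `NormalDist.overlap` from the Python standard library; the helper name `overlap_for_delta` is hypothetical.

```python
from statistics import NormalDist


def overlap_for_delta(delta):
    """Shared area (0 to 1) of two unit-variance normal curves.

    The curves are assumed to differ only in their means, which are
    `delta` standard deviations apart (an illustrative assumption).
    """
    return NormalDist(0, 1).overlap(NormalDist(delta, 1))


# Larger standardised differences leave less overlap between the groups.
for delta in (0.0, 0.2, 0.5, 0.8, 1.4):
    print(f"delta = {delta:.1f}  overlap = {overlap_for_delta(delta):.2f}")
```

An overlap of 1 means the two groups are indistinguishable, and the overlap falls steadily as the standardised difference grows.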

Various procedures can be used to estimate the effect size, including, but not limited to, the coefficient of determination, eta squared, omega squared and phi (Rosnow, Rosenthal and Rubin, 2000; Sink and Stroh, 2006; Trusty, Thompson and Petrocelli, 2004; Vacha and Thompson, 2004). However, rather than reviewing all the methods available, and in order to show exactly what the effect size is and to help the reader understand its utility and relevance, we concentrate here on the standardised mean difference or delta parameter (hereafter δ), estimated using the corrected Hedges' g (hereafter gcorr), following the recommendations of Ledesma, Macbeth and Cortada de Kohan (2008). Three factors favour the use of gcorr: (1) it provides an accurate and unbiased estimate, (2) it is simple to calculate, and (3) its results are easy to interpret. To obtain gcorr, g must first be calculated and then corrected. The g parameter is obtained from:


$$ g = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)\,S_1^2 + (n_2 - 1)\,S_2^2}{n_1 + n_2 - 2}}} $$

where $\bar{X}_1$ is the arithmetic mean of group 1, $\bar{X}_2$ the arithmetic mean of group 2, $n_1$ the sample size for group 1, $S_1^2$ the variance of the scores for group 1, $n_2$ the sample size for group 2 and $S_2^2$ the variance of the scores for group 2.

g is then corrected as follows:

$$ g_{\mathrm{corr}} = g \left( 1 - \frac{3}{4\,\mathit{df} - 1} \right) $$

where $\mathit{df}$ denotes the degrees of freedom:

$$ \mathit{df} = n_1 + n_2 - 2 $$

In short, gcorr takes the difference between the means of the two groups and standardises it by dividing by their pooled standard deviation. The procedure therefore yields a standardised value (comparable to a z score), which is finally corrected to remove the bias associated with sample size. This value is very useful because it allows the percentage of cases in one group that fall below the mean of the other group to be inferred from the normal curve table. However, it should be noted that the normality and homoscedasticity assumptions must be fulfilled, especially for small sample sizes (less than 30 observations per group, for example; Pardo and San Martín, 2004).

In light of the above, this paper is intended to highlight the importance of accompanying statistical significance with values that indicate the effect size or magnitude of the differences. To this end, we have undertaken a group-comparison study specifically designed for this purpose that has not been published elsewhere.

METHOD

Participants

The sample consisted of 271 participants: 52% were males (group 1; n1 = 142) aged between 12 and 18 years (M = 14.44, SD = 1.52) and the remaining 48% were females (group 2; n2 = 129) in the same age range (M = 14.46, SD = 1.52). Participants were selected by non-random incidental sampling for ease of access. Participation was voluntary, authorised and unpaid.



Design, objective and variables

The study used a retrospective ex post facto design. Its objective was to determine whether adolescent males and females differ in physical condition, measured through three dependent variables: strength, speed and flexibility.

Hypotheses

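First. Men are stronger than women.

1st-H0: $\bar{X}_{\text{strength, men}} \le \bar{X}_{\text{strength, women}}$

1st-H1: $\bar{X}_{\text{strength, men}} > \bar{X}_{\text{strength, women}}$

Second. Men are faster than women.

2nd-H0: $\bar{X}_{\text{speed, men}} \le \bar{X}_{\text{speed, women}}$

2nd-H1: $\bar{X}_{\text{speed, men}} > \bar{X}_{\text{speed, women}}$

Third. Men are less flexible than women.

3rd-H0: $\bar{X}_{\text{flexibility, men}} \ge \bar{X}_{\text{flexibility, women}}$

3rd-H1: $\bar{X}_{\text{flexibility, men}} < \bar{X}_{\text{flexibility, women}}$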

Procedure

Data were collected by assessors holding a degree in physical activity and sports science, at a secondary school in the Autonomous Community of Madrid (Spain), during several physical education classes. The School Council, which consists of representatives elected by students, parents/guardians and teachers, authorised the study. Strength was assessed by means of a medicine ball throw test (Legido, Segovia and Ballesteros, 1995), using a 2 kg ball and defining strength as the distance thrown (in metres). To measure speed, the participants were asked to run a distance of 50 m (Rosandich, 1999), with the time in seconds used to express speed (a shorter time indicating greater speed). Flexibility was assessed by bending the upper body forwards from a sitting position (Sit and Reach test; Eurofit, 1993), using a flexibility bench equipped with a scale graduated in centimetres, with 0 located at the soles of the feet. Once all data had been collected and analysed, all participants received an individual report on their physical condition.



Data analysis

Data were analysed using inferential statistics for comparing groups, with initial α and β levels of 0.05 and 0.10, respectively. All analyses were performed with IBM SPSS Statistics 18.

RESULTS

Normality and homoscedasticity assumptions

All variables should fulfil the normality assumption when groups are compared using Student's t-test and the effect size is estimated, especially when the groups are small, which is not the case here. The Kolmogorov-Smirnov test with no Lilliefors significance correction showed that the normality assumption should not be rejected for the strength, speed and flexibility distributions in women (p=0.20, p=0.93 and p=0.40, respectively), or for the strength and flexibility distributions in men (p=0.74 and p=0.38, respectively). However, the normality hypothesis for the speed variable in men (p=0.03) had to be rejected. In light of the size of the group affected, and as the data for the other variables were compatible with the normal distribution hypothesis, this did not affect the aims of this study.

As far as the homoscedasticity or equality of variances assumption is concerned, the Levene test indicated that men and women had similar dispersion for speed (F=0.199; p=0.65) and flexibility (F=0.060; p=0.80), whereas this was not the case for strength (F=53.4; p<0.001). However, this finding does not affect our estimation of the effect size, as it occurs for only one of the three variables and both groups are large and of similar size (142 males and 129 females).

Means comparison

The Student's t-test for two independent samples, performed to compare the dependent variables between the two groups, gave p values that allowed the three null hypotheses proposed to be rejected. In other words, it can be inferred, with a confidence level of 99%, that it is false that men are as strong as or less strong than women (t=11.87; df=213.13; p<0.001), that men are as fast as or slower than women (t=-10.57; df=269; p<0.001), or that men are as flexible as or more flexible than women (t=-5.61; df=269; p<0.001). This empirical evidence allows us to provisionally conclude that men are stronger and faster than women, whereas women are more flexible than men (Table 1).
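The t statistics above can be approximated from summary statistics alone. The following sketch is not the authors' SPSS procedure: it recomputes the tests from the rounded means and standard deviations reported in Table 1, so the results are close to, but not identical with, the published values. The function names are hypothetical. The non-integer degrees of freedom reported for strength are what the unequal-variance (Welch) form of the test produces, so that form is assumed for strength here.

```python
from math import sqrt


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t for independent samples, assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t for independent samples, not assuming equal variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Rounded summary statistics from Table 1 (men first, then women).
t_strength, df_strength = welch_t(7.33, 1.85, 142, 5.23, 0.94, 129)
t_speed, df_speed = pooled_t(8.29, 1.06, 142, 9.60, 0.96, 129)
t_flex, df_flex = pooled_t(1.85, 7.8, 142, 7.26, 7.9, 129)

print(f"strength:    t = {t_strength:.2f}, df = {df_strength:.2f}")
print(f"speed:       t = {t_speed:.2f}, df = {df_speed}")
print(f"flexibility: t = {t_flex:.2f}, df = {df_flex}")
```

The small discrepancies with the reported t values are consistent with the rounding of the inputs; the signs for speed and flexibility are negative because the men's mean is subtracted from a larger women's mean.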



TABLE 1. t-Test for comparing independent means

Variable                                                  Gender   M      SD     p
Strength: metres a 2 kg ball is thrown                    Men      7.33   1.85   < 0.001
                                                          Women    5.23   0.94
Speed: seconds in 50 m                                    Men      8.29   1.06   < 0.001
                                                          Women    9.60   0.96
Flexibility: centimetres bending upper body when seated   Men      1.85   7.8    < 0.001
                                                          Women    7.26   7.9

Effect size

Estimating the effect size with gcorr, using the equations described in the introduction, gives the following delta magnitudes: δstrength = 1.39, δspeed = 1.27 and δflexibility = 0.68. These statistics correspond to the graphs contained in Figure 2.


FIGURE 2: Effect sizes and graphs. Three panels of overlaid distributions for women (solid line) and men (dashed line): medicine ball throw test in metres (δstrength = 1.39), 50 metres in seconds (δspeed = 1.27) and Sit and Reach in centimetres (δflexibility = 0.68).
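The calculation of gcorr can be sketched as follows. This is a minimal illustration, assuming the g and gcorr equations from the introduction and taking as input the rounded means and standard deviations from Table 1; the function name `hedges_g_corrected` is hypothetical. Because the inputs are rounded, the output differs slightly from the reported δ values (by about 0.02 at most).

```python
from math import sqrt


def hedges_g_corrected(m1, s1, n1, m2, s2, n2):
    """Corrected Hedges' g from group means, SDs and sample sizes."""
    df = n1 + n2 - 2
    # Pooled standard deviation of the two groups
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    g = (m1 - m2) / pooled_sd
    # Correction factor for sample-size bias
    return g * (1 - 3 / (4 * df - 1))


# Rounded summary statistics from Table 1 (men first, then women).
g_strength = hedges_g_corrected(7.33, 1.85, 142, 5.23, 0.94, 129)
g_speed = hedges_g_corrected(8.29, 1.06, 142, 9.60, 0.96, 129)
g_flex = hedges_g_corrected(1.85, 7.8, 142, 7.26, 7.9, 129)

# Magnitudes are reported; the sign only reflects the order of subtraction.
for label, g in (("strength", g_strength), ("speed", g_speed), ("flexibility", g_flex)):
    print(f"delta {label}: {abs(g):.2f}")
```

With 269 degrees of freedom the correction factor is very close to 1, so the correction matters mainly for small samples.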



A subsequent analysis of the cumulative probability of each difference found using a standardised normal distribution table (see, for example, Vincent, 2005) led to three new results: 91% of women have an equal or lower strength than the average for men (z=1.39), 89% of women have an equal or lower speed than the average for men (z=1.27), and 75% of men have an equal or lower flexibility than the average for women (z=0.68; Table 2).

TABLE 2. Standardised normal distribution

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

… … … … … … … … … … …

3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
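The table lookup can also be done programmatically. The sketch below assumes only the standard normal cumulative distribution function, available in the Python standard library as `NormalDist().cdf`; the helper name `share_below_other_mean` is hypothetical. It returns the Table 2 entries for z = 1.39, 1.27 and 0.68 (0.9177, 0.8980 and 0.7517), which are the proportions behind the percentages quoted in the text.

```python
from statistics import NormalDist


def share_below_other_mean(delta):
    """Proportion of the lower-scoring group at or below the other group's mean.

    Assumes normally distributed scores with a common standard deviation,
    so the proportion is the standard normal CDF evaluated at |delta|.
    """
    return NormalDist().cdf(abs(delta))


for label, delta in (("strength", 1.39), ("speed", 1.27), ("flexibility", 0.68)):
    print(f"{label}: z = {delta:.2f}  cumulative probability = {share_below_other_mean(delta):.4f}")
```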

DISCUSSION AND CONCLUSIONS

This work, which is based on two groups that are comparable in size and age distribution, has shown that men and women have different physical abilities (strength, speed and flexibility) during adolescence. To reach this conclusion, the two groups were compared using Student's t-test for independent samples on the three dependent variables, with the same statistical significance, namely p<0.01, being obtained in each case. In other words, the risk of a false positive assumed in rejecting each null hypothesis is less than 1% for all three variables. Up to this point, however, it can only be concluded that men and women differ in these variables and that the confidence level in this statement is greater than 99%. We know that men are significantly stronger and faster than women and that women are significantly more flexible than men, but we do not know the importance or magnitude of these differences.

To resolve this question, the standardised mean difference or delta parameter was estimated using gcorr, and the following values were found: δstrength=1.39, δspeed=1.27, δflexibility=0.68. These values allow us to infer, using the normal distribution table, that only 9% of women are stronger than the average for men, only 11% of women are faster than the average for men and only 25% of men are more flexible than the average for women. As a result, the δ values obtained, irrespective of the scales used to measure them (the variables analysed were measured in metres, seconds and centimetres), indicate that the largest difference between groups occurred for strength, followed by speed and, finally, flexibility. This information may be relevant for both theoretical and practical purposes and is not provided by statistical significance.

Therefore, as can be seen from the results described herein, and in agreement with other authors (Fernández-Cano and Fernández-Guerrero, 2009; Lustig and Trauser, 2004; Rhea, 2004; Smith and Honoré, 2008; Thompson, 1999), the effect size has many advantages. The most important of these are listed below:

1. it is relatively easy to calculate;

2. as an expression of standard deviation or explained variance, effect size provides an intuitive interpretation of the results and the magnitude of the differences, and as such can be considered to be indicative of practical significance whilst not being incompatible with statistical significance;

3. as a result of its dimensionless nature, it allows differences between variables with different units to be compared;

4. it is a statistic that allows more knowledge to be accumulated by allowing the meta-analytical comparison of results from different studies with the same objective; and

5. it is less dependent on sample size than statistical significance.

In short, the aim of this work was to highlight, using statistical arguments and examples, the advisability of accompanying statistical significance probabilities with other values that provide information on the effect size or magnitude of the differences. If this is not done, we strongly believe that the resulting report will be limited in the conclusions it can present.



LITERATURE REFERENCES

Cañadas, I., Borges, A., Sánchez, A. and San Luis, C. (2000). Estudio de la potencia de los contrastes de medias con dos y tres grupos con tamaño del efecto pequeño y en condiciones de normalidad y homo-heterocedasticidad. Psicothema, 12(2), 114-116.

Dowson, V. (2000). Time of day effects in school-children's immediate and delayed recall of meaningful material. TERSE Report (at http://www.cem.dur.ac.uk/ebeuk/research/terse/library.htm).

Eurofit (1993). Eurofit Tests of Physical Fitness, 2nd Edition. Strasbourg.

Fan, X. (2001). Statistical significance and effect size in education research: two sides of a coin. The Journal of Educational Research, 94(5), 275-282.

Fernández-Cano, A. and Fernández-Guerrero, I. (2009). Crítica y alternativas a la significación estadística en el contraste de hipótesis. Madrid: La Muralla.

Frías, M. D., Pascual, J. and García, J. F. (2000). Tamaño del efecto del tratamiento y significación estadística. Psicothema, 12(2), 236-240.

Ledesma, R., Macbeth, G. and Cortada de Kohan, N. (2008). Tamaño del efecto: revisión teórica y aplicaciones con el sistema estadístico ViSta. Revista Latinoamericana de Psicología, 40(3), 425-439.

Legido, J. C., Segovia, J. C. and Ballesteros, J. M. (1995). Valoración de la condición física por medio de test. Madrid: Ediciones pedagógicas.

Lustig, D. and Trauser, D. (2004). Effect size and rehabilitation research. Journal of Rehabilitation, 70(4), 3.

Monterde, H., Pascual, J. and Frías, M. D. (2000). Errores de interpretación de los métodos estadísticos: importancia y recomendaciones. Psicothema, 18(4), 848-856.

Pardo, A. and San Martín, R. (1994). Análisis de datos en psicología II. Madrid: Pirámide.

Rhea, M. (2004). Determining the magnitude of treatment effects in strength training research through the use of the effect size. Journal of Strength & Conditioning Research, 18(4), 918-920.

Rosandich, T. P. (1999). International Physical Fitness Test. The Sport Journal, 2(1).

Rosnow, R. L., Rosenthal, R. and Rubin, D. B. (2000). Contrasts and correlation in effect-size estimation. Psychological Science, 11(6), 446-453.

Sink, A. and Stroh, H. R. (2006). Practical significance: the use of effect sizes in school counseling research. Professional School Counseling Journal, 8(1), 115-120.

Smith, M. L. and Honoré, H. H. (2008). Effect size reporting in current health education literature. American Journal of Health Studies, 23(3), 130-135.

Thomas, J. R. and Nelson, J. K. (2007). Métodos de investigación en actividad física. Badalona: Editorial Paidotribo.

Thompson, B. (1999). Why "encouraging" effect size reporting is not working: The etiology of researcher resistance to changing practices. The Journal of Psychology, 133(2), 133-140.

Thompson, B. (2006). Research synthesis: effect sizes. In J. Green, G. Gamilli and P. B. Elmore (Eds.), Handbook of complementary methods in education research (pp. 583-603). Washington, DC: American Educational Research Association.

Trusty, J., Thompson, B. and Petrocelli, J. V. (2004). Practical guide for reporting effect size in quantitative research in the Journal of Counselling and Development. Journal of Counselling & Development, 82, 107-110.

Vacha, T. and Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology, 51(4), 473-481.

Valera, A. and Sánchez, J. (1997). Pruebas de significación y magnitud del efecto: Reflexiones y propuestas. Anales de Psicología, 13(1), 85-90.

Vincent, W. J. (2005). Statistics in Kinesiology (3rd ed.). Human Kinetics.

Total references: 23 (100%). Journal's own references: 0 (0%).



715

Tejero-González, C.M.; Castro-Morera, M. y Balsalobre-Fernández, C. (2012). Importancia del tamaño del efecto. Una ejemplificación estadística con medidas de condición física / The importance of effect size: a statistical example using physical condition measurements. Revista Internacional de Medicina y Ciencias de la Actividad Física y el Deporte vol. 12 (48) pp. 715-727 Http://cdeporte.rediris.es/revista/revista48/artimportancia318.htm

ORIGINAL

IMPORTANCIA DEL TAMAÑO DEL EFECTO. UNA

EJEMPLIFICACIÓN ESTADÍSTICA CON MEDIDAS DE CONDICIÓN FÍSICA

THE IMPORTANCE OF EFFECT SIZE: A STATISTICAL

EXAMPLE USING PHYSICAL CONDITION MEASUREMENTS

Tejero-González, C.M.1; Castro-Morera, M.2 y Balsalobre-Fernández, C.3

1 Profesor Contratado Doctor. Departamento de Educación Física, Deporte y Motricidad Humana de la Universidad Autónoma de Madrid (Spain). Web personal: www.uam.es/carlos.tejero; Correo electrónico: [email protected] 2 Profesora Titular. Departamento de Métodos de Investigación y Diagnóstico en Educación de la Universidad Complutense de Madrid (Spain). Correo electrónico: [email protected] 3 Personal de Investigación en Formación del Departamento de Educación Física, Deporte y Motricidad Humana de la Universidad Autónoma de Madrid (Spain). Correo electrónico: [email protected] Código UNESCO / UNESCO Code: 5802.06. Análisis, Realización de Modelos y Planificación Estadística / Analysis, Modelling and Statistical Planning. Clasificación del Consejo de Europa / European Council classification: 17 Otras: Estadística aplicada a las ciencias de la actividad física / Others: Statistics applied to physical activity sciences. Recibido 8 de noviembre de 2010 Received November 8, 2010 Aceptado 20 de febrero de 2012 Accepted February 20, 2012 RESUMEN

Tanto en las ciencias de la actividad física y del deporte como en otras áreas de conocimiento científico es habitual investigar con diseños que requieren comparar grupos, concluyendo sobre la existencia o no de diferencias estadísticamente significativas. Sin embargo, no siempre se informa sobre la magnitud de las diferencias encontradas. Este trabajo subraya la importancia de acompañar la significación estadística con valores que informen sobre el tamaño del efecto. Con este propósito y en aras de una argumentación didáctica, se contrasta la capacidad física de chicos y chicas adolescentes,

http://cdeporte.rediris.es/revista/revista48/artimportancia318.htm�

http://www.uam.es/carlos.tejero�





716

ejemplificando estadísticamente la estimación del parámetro delta con la g de Hedges ajustada. PALABRAS CLAVE: tamaño del efecto, g de Hedges, parámetro delta, significación estadística, α y β, comparación grupos, condición física. ABSTRACT It is common in the field of physical activity and sports science, as well as in other scientific disciplines, to use designs that require groups to be compared in order to determine the existence of statistically significant differences. However, information regarding the magnitude of any differences found is not always provided. This work highlights the importance of combining statistical significance with values that provide information regarding the effect size. With this in mind, and in order to provide a more didactic discussion, herein we compare the physical abilities of adolescent boys and girls and estimate the delta parameter statistically using the corrected Hedges' g parameter. KEYWORDS: effect size, Hedges' g, delta parameter, statistical significance, α and β, group comparison, physical condition. INTRODUCCIÓN

In physical activity and sports science, as in other scientific disciplines, it is common to find reports that compare group results in order to infer whether or not the groups differ significantly in a given characteristic or variable. Significance testing addresses whether the results observed in the response or dependent variable could plausibly have arisen by chance alone rather than through the action or influence of the independent variable, and it summarizes this with a probability (p).

In this sense, statistical significance reflects how likely it is that the difference between the two groups is an accident of sampling. In other words, p measures the probability of obtaining by chance a difference at least as large as the one observed, even if there were no real difference between the two groups. However, significance tests have a known problem: the p value depends on two things, the size of the difference and the size of the sample. A significant result can be obtained when the difference between groups is very large even though the sample is small, or when the sample is very large even though the difference is small.
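The dependence of p on both quantities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two independent groups of equal size, and a normal approximation in place of the exact t distribution. The function name `approx_p_value` and the scenario values are hypothetical and do not come from the study.

```python
from math import sqrt
from statistics import NormalDist


def approx_p_value(d: float, n_per_group: int) -> float:
    """Two-sided p value for a standardized mean difference d.

    Assumes two independent groups of equal size and uses the normal
    approximation z = d * sqrt(n / 2) instead of the exact t distribution.
    """
    z = abs(d) * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical scenarios: (standardized difference, participants per group)
small_effect_small_n = approx_p_value(0.1, 20)
small_effect_large_n = approx_p_value(0.1, 5000)
large_effect_small_n = approx_p_value(1.0, 10)

print(f"d=0.1, n=20:   p = {small_effect_small_n:.3f}")
print(f"d=0.1, n=5000: p = {small_effect_large_n:.2e}")
print(f"d=1.0, n=10:   p = {large_effect_small_n:.3f}")
```

Under these assumptions the same small difference is non-significant with 20 participants per group and highly significant with 5000, while a large difference reaches significance with only 10 per group.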

In this regard, whether when designing the study or when interpreting the results, researchers must guard against two possible errors: type I and type II. A type I error, also called a false positive, occurs when a null hypothesis that is actually true is rejected; that is, when the researcher infers that there is a statistically significant difference when in reality there is none. A type II error, also called a false negative, occurs when a null hypothesis that is actually false is retained, so that the researcher concludes that there is no statistically significant difference when one does exist. Two quantities are used to control type I and type II errors: α (alpha) and β (beta), respectively.

The better known of the two is the alpha level, or significance level, which indicates the type I risk the researcher is willing to assume. Two values of alpha are conventional in the scientific community: α = 0.05, for a maximum type I error rate of 5% (95% confidence level), and α = 0.01, for an error rate of 1% (99% confidence level).

The beta level is also relevant; it refers to the risk the researcher assumes of committing a type II error, or false negative. Two values of beta are conventional: β = 0.10, to keep the maximum type II error rate at 10% (statistical power of 90%), and β = 0.20, for an error rate of 20% (statistical power of 80%).

One possible strategy for improving the α and β levels is to increase the sample size (Cañadas, Borges, Sánchez and San Luis, 2000). However, neither α nor β adequately conveys the magnitude or importance of the differences, should there be any. It is precisely here that the researcher needs another kind of indicator: the effect size, or magnitude of the difference (Fan, 2001; Frías, Pascual and García, 2000; Monterde, Pascual and Frías, 2000; Thompson, 2006; Thomas and Nelson, 2007; Valera and Sánchez, 1997).
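The link between α, β and sample size can be made concrete with a short sketch. It uses the common normal-approximation formula for comparing two independent means, n = 2(z_α + z_β)² / δ² per group, which is an assumption introduced here and not a procedure taken from the article. The function name `n_per_group` and the δ values in the loop are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta: float, alpha: float = 0.05, beta: float = 0.10,
                one_sided: bool = True) -> int:
    """Approximate participants needed per group to detect delta.

    Normal-approximation formula for comparing two independent means:
    n = 2 * (z_alpha + z_beta)^2 / delta^2.
    """
    std_normal = NormalDist()
    tail = alpha if one_sided else alpha / 2
    z_alpha = std_normal.inv_cdf(1 - tail)
    z_beta = std_normal.inv_cdf(1 - beta)
    return ceil(2 * (z_alpha + z_beta) ** 2 / delta ** 2)


for delta in (0.2, 0.5, 0.8):  # hypothetical standardized differences
    print(delta, n_per_group(delta))
```

For fixed α and β, the required sample size grows quickly as the standardized difference to be detected shrinks, which is why a sufficiently large sample can make even a trivial difference significant.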

The effect size quantifies the difference between the levels of the independent variable and thus complements the information given by the probability associated with the null hypothesis: beyond confirming that a difference exists, it also reports how large that difference is.

To illustrate the conceptual contribution of effect magnitude, consider the study by Dowson (2000), who examined the effect of the variable time of study, with two levels (morning and afternoon), on the variable learning. The study used a convenience sample of 38 participants who were randomly assigned to the morning and afternoon groups and given the same learning stimulus. Text comprehension was measured as the number of correct answers (scale maximum: 20). The mean scores were 15.2 for the morning group and 17.9 for the afternoon group. At this point, two substantive questions arise: are the differences between the two groups large enough? Can it be concluded that people learn more in the afternoon than in the morning? Beyond statistical significance, one way to address such questions is to use the effect size. If the distributions of the two groups did not overlap at all, the difference would be substantial. At the other extreme, the greater the overlap, the less important the difference between the groups.

To represent this graphically, Figure 1 shows two situations in which the importance of the difference varies with the overlap of the distributions. On the left the differences are very large and significant; on the right the difference is less relevant.

Several procedures exist for estimating effect size; without aiming to be exhaustive, these include the coefficient of determination, eta squared, omega squared and phi (Rosnow, Rosenthal and Rubin, 2000; Sink and Stroh, 2006; Trusty, Thompson and Petrocelli, 2004; Vacha and Thompson, 2004). However, in order to illustrate what effect size is and to help the reader appreciate its usefulness and relevance, this paper does not review the various estimators and focuses instead on the standardized mean difference, or delta parameter (hereafter δ), obtained with the adjusted Hedges' g (hereafter g_adj), following the guidelines of Ledesma, Macbeth and Cortada de Kohan (2008). The authors adopt this choice on the basis of what they regard as three advantages of g_adj: (1) precise and unbiased estimation, (2) simplicity of calculation and (3) ease of interpretation.

FIGURE 1. Illustration of effect magnitude: overlap of distributions.

To obtain the adjusted Hedges' g (g_adj), g is first computed and then adjusted. The statistic g is given by:

$$ g = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)\,S_1^2 + (n_2 - 1)\,S_2^2}{n_1 + n_2 - 2}}} $$

where $\bar{X}_1$ is the arithmetic mean of group 1, $\bar{X}_2$ the arithmetic mean of group 2, $n_1$ the sample size of group 1, $S_1^2$ the variance of the scores in group 1, $n_2$ the sample size of group 2 and $S_2^2$ the variance of the scores in group 2. Then g is adjusted as follows:

$$ g_{adj} = g \left( 1 - \frac{3}{4\,df - 1} \right) $$

where the degrees of freedom are

$$ df = n_1 + n_2 - 2 $$
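The two formulas translate directly into code. The following Python sketch computes g and its adjusted version from raw scores; it assumes that the variances in the formula are sample variances with an n − 1 denominator, which the text does not state explicitly. The function name `hedges_g_adjusted` and the example scores are hypothetical and serve only to exercise the calculation.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g_adjusted(group1, group2):
    """Return (g, g_adj) for two independent samples of raw scores."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: sample variances (n - 1 denominator) weighted by df
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / df
    g = (mean(group1) - mean(group2)) / sqrt(pooled_var)
    g_adj = g * (1 - 3 / (4 * df - 1))  # small-sample bias correction
    return g, g_adj


# Hypothetical scores, invented only to exercise the function
group1 = [7.1, 8.4, 6.9, 7.8, 9.0, 6.5]
group2 = [5.2, 6.1, 4.8, 5.9, 5.5, 6.3]
g, g_adj = hedges_g_adjusted(group1, group2)
print(f"g = {g:.3f}, g_adj = {g_adj:.3f}")
```

The correction factor is always slightly below 1, so g_adj is a little smaller in absolute value than g, and the two converge as the degrees of freedom grow.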

In short, the adjusted g (g_adj) estimates the difference between the group means and standardizes it by dividing by the pooled standard deviation of the two groups, which yields a standardized value (a z score) from which the bias due to sample size is finally removed. This standardized value is very useful because, using the table of the normal curve, it allows one to infer the percentage of cases in one group that fall below the mean of the other group. In return, the assumptions of normality and homoscedasticity must hold, especially with small samples (for example, fewer than 30 observations per group) (Pardo and San Martín, 2004).

In view of the above, the aim of this article is to show the importance of accompanying statistical significance with values that report the effect, or magnitude, of the differences. To this end, we present a group comparison study designed and carried out expressly for this purpose and not previously published.

METHOD

Participants

The sample comprises 271 participants, of whom 52% are boys (group 1; n1 = 142), aged between 12 and 18 years (M = 14.44, SD = 1.52), and 48% are girls (group 2; n2 = 129), also aged between 12 and 18 years (M = 14.46, SD = 1.52). Participants were selected by non-random convenience sampling, for ease of access. Participation was voluntary, authorized and unrewarded.

Design, objective and variables

Retrospective ex post facto design. The objective of the study is to analyse whether adolescent boys and girls differ in their levels of physical condition, for which three dependent variables were measured: strength, speed and flexibility.

Hypotheses

First. Boys are stronger than girls.

1st H0: X̄ strength, boys ≤ X̄ strength, girls

1st H1: X̄ strength, boys > X̄ strength, girls

Second. Boys are faster than girls.

2nd H0: X̄ speed, boys ≤ X̄ speed, girls

2nd H1: X̄ speed, boys > X̄ speed, girls

Third. Boys are less flexible than girls.

3rd H0: X̄ flexibility, boys ≥ X̄ flexibility, girls

3rd H1: X̄ flexibility, boys < X̄ flexibility, girls

Procedure

Data were collected by graduates in physical activity and sports science at a secondary school in the Autonomous Community of Madrid (Spain), during several physical education classes. The School Council, the school's highest collegiate body, with elected representatives of students, parents or guardians and teaching staff, authorized the study.

Strength was measured with the medicine ball throw test (Legido, Segovia and Ballesteros, 1995), using a 2 kg ball and operationalizing strength as throwing distance in metres. To measure speed, participants ran a distance of 50 m (Rosandich, 1999), with the time in seconds taken as the expression of speed. Flexibility was assessed by forward trunk flexion from a seated position (Sit and Reach test) (Eurofit, 1993), using a flexibility bench with a scale graduated in centimetres on which the value 0 was placed at the soles of the feet.

Once the data had been collected and analysed, every participant received an individual report on his or her level of physical condition.

Data analysis

Inferential statistics for group comparison were used, with the α and β levels set from the outset at 0.05 and 0.10, respectively. The analyses were carried out with IBM SPSS Statistics 18.

RESULTS

Assumptions of normality and homoscedasticity

Both the comparison of groups with Student's t test and the estimation of effect size require the variables to be normally distributed, especially when the groups being compared are small, which is not the case in this study.

According to the Kolmogorov-Smirnov test without the Lilliefors significance correction, the assumption of normality cannot be rejected for the distributions of strength, speed and flexibility in girls (p = .20, p = .93 and p = .40, respectively), nor for strength and flexibility in boys (p = .74 and p = .38, respectively). The hypothesis of normality must, however, be rejected for speed in boys (p = .03). This is not decisive for the purpose of this study, given the large size of the affected group and the fact that the data for the remaining variables are compatible with a normal distribution.
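For readers without access to SPSS, the kind of check described above can be approximated with the Python standard library. The sketch below is an assumption about the procedure, not a reproduction of the authors' output: it fits a normal distribution using the sample mean and standard deviation, computes the Kolmogorov-Smirnov statistic D, and derives an asymptotic p value from the Kolmogorov limiting distribution without any Lilliefors correction. The function name `ks_normality` is hypothetical, and the data are simulated because the raw scores are not available.

```python
import random
from math import exp, sqrt
from statistics import NormalDist, mean, stdev


def ks_normality(sample, terms: int = 100):
    """Return (D, asymptotic p) for a one-sample K-S test against a normal
    distribution whose mean and SD are estimated from the sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d_stat = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # Largest gap between the empirical and fitted CDFs at this point
        d_stat = max(d_stat, (i + 1) / n - cdf, cdf - i / n)
    lam = sqrt(n) * d_stat
    # Kolmogorov limiting distribution (no Lilliefors correction)
    p = 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return d_stat, min(max(p, 0.0), 1.0)


random.seed(1)  # hypothetical data: 129 simulated sprint times
times = [random.gauss(9.6, 0.96) for _ in range(129)]
d_stat, p = ks_normality(times)
print(f"D = {d_stat:.3f}, p = {p:.2f}")
```

Because the parameters are estimated from the same data, this uncorrected p value tends to be conservative, so it is less likely to reject normality than the Lilliefors-corrected version.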

As for the assumption of homoscedasticity, or equality of variances, Levene's test indicates that boys and girls have the same dispersion in speed (F = .199; p = .65) and flexibility (F = .060; p = .80), but not in strength (F = 53.4; p < .001). Nevertheless, this violation is not decisive for estimating the effect size, because it occurs in only one of the three variables and because the groups are large and not unbalanced (142 boys and 129 girls, as noted in the description of the sample).

Comparison of means

Using Student's t test for two independent samples, we compared the two groups on the dependent variables and obtained p values that allow the three null hypotheses to be rejected. That is, it is plausible to infer, at a 99% confidence level, that it is false that boys have equal or less strength than girls (t = 11.87; df = 213.13; p < .001), that boys are equally fast or slower than girls (t = -10.57; df = 269; p < .001), or that boys have equal or greater flexibility than girls (t = -5.61; df = 269; p < .001).

This empirical evidence allows us to accept provisionally that boys are stronger and faster than girls, and that girls are more flexible than boys (Table 1).


TABLE 1. Student's t test for comparison of independent means

Variable                                          Sex     M      SD     p
Strength: metres thrown with a 2 kg ball          Boys    7.33   1.85   < .001
                                                  Girls   5.23   0.94
Speed: seconds over 50 m                          Boys    8.29   1.06   < .001
                                                  Girls   9.60   0.96
Flexibility: centimetres in seated trunk flexion  Boys    1.85   7.8    < .001
                                                  Girls   7.26   7.9
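The t statistics reported in the text can be recomputed approximately from the summary values in Table 1. The Python sketch below rests on two assumptions that the article does not state: that the fractional degrees of freedom reported for strength (213.13) come from a Welch-type correction for unequal variances, and that speed and flexibility use the ordinary pooled-variance test. The function names `welch_t` and `pooled_t` are hypothetical. Because the tabulated means and standard deviations are rounded, the results should only be expected to land close to the published values.

```python
from math import sqrt


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t and its approximate df (equal variances not assumed)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t with pooled variance (equal variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


N_BOYS, N_GIRLS = 142, 129
# Means and SDs as rounded in Table 1 (boys first, then girls)
t_strength, df_strength = welch_t(7.33, 1.85, N_BOYS, 5.23, 0.94, N_GIRLS)
t_speed, df_speed = pooled_t(8.29, 1.06, N_BOYS, 9.60, 0.96, N_GIRLS)
t_flex, df_flex = pooled_t(1.85, 7.8, N_BOYS, 7.26, 7.9, N_GIRLS)

print(f"strength:    t = {t_strength:.2f}, df = {df_strength:.1f}")
print(f"speed:       t = {t_speed:.2f}, df = {df_speed}")
print(f"flexibility: t = {t_flex:.2f}, df = {df_flex}")
```

The negative signs for speed and flexibility simply reflect that boys have the lower mean time and the lower mean reach.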

Effect size

Estimating the effect size with the adjusted g (g_adj), using the formulas described in the introduction, gives the following delta values: δ strength = 1.39, δ speed = 1.27 and δ flexibility = 0.68. These statistics correspond to the graphical representations shown in Figure 2.
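As a rough check of the strength value, the formulas from the introduction can be applied to the rounded means and standard deviations in Table 1, with $n_1 = 142$, $n_2 = 129$ and therefore $df = 269$. The symbol $S_{pooled}$ for the pooled standard deviation is introduced here for convenience.

$$ S_{pooled} = \sqrt{\frac{141 \cdot 1.85^2 + 128 \cdot 0.94^2}{269}} \approx 1.488 $$

$$ g = \frac{7.33 - 5.23}{1.488} \approx 1.411, \qquad g_{adj} = 1.411 \left( 1 - \frac{3}{4 \cdot 269 - 1} \right) \approx 1.407 $$

This gives about 1.41, close to the reported 1.39. The small gap is presumably due to rounding in the tabulated values; that explanation is an assumption, since the raw data are not reproduced here. With 269 degrees of freedom the correction factor is about 0.997, so the adjustment barely changes g in a sample of this size.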



Next, using the standard normal distribution table and reading off the cumulative probability for each standardized difference (see, for example, Vincent, 2005), three further results are inferred: 91% of girls have strength equal to or lower than the boys' mean (z = 1.39), 89% of girls have speed equal to or lower than the boys' mean (z = 1.27), and 75% of boys have flexibility equal to or lower than the girls' mean (z = 0.68) (Table 2).

TABLE 2. Standard normal distribution

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

… … … … … … … … … … …

3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
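The table lookup can also be done programmatically. The short Python sketch below evaluates the standard normal cumulative distribution at the three delta values reported in the text, using only the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Delta values reported in the text
deltas = {"strength": 1.39, "speed": 1.27, "flexibility": 0.68}

for variable, delta in deltas.items():
    below = std_normal.cdf(delta)  # proportion below the other group's mean
    print(f"{variable}: {below:.4f} below, {1 - below:.4f} above")
```

The cumulative probabilities match the corresponding cells of Table 2 (0.9177, 0.8980 and 0.7517), which the text reports as 91%, 89% and 75%.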

DISCUSSION AND CONCLUSIONS

This work, drawing on two groups that are statistically comparable in that they are of similar size and have the same age distribution, has shown that during adolescence boys and girls differ in their physical capacities of strength, speed and flexibility. The two groups were compared with Student's t test for independent samples on the three dependent variables, and the same statistical significance was observed in every case: p < 0.001. That is, the probability of a false-positive error is below 1% for all three variables.

So far, it can only be concluded that boys and girls differ in the variables mentioned and that the confidence level for this statement is above 99%. It is also known that boys are significantly stronger and faster than girls, and that girls are significantly more flexible, but not by how much one group exceeds the other. In other words, the importance or magnitude of the differences found remains unknown. To address this, the standardized mean difference, or delta parameter, was estimated with the adjusted g (g_adj), giving the following values: δ strength = 1.39, δ speed = 1.27, δ flexibility = 0.68. According to the normal distribution table, these values in turn imply that only 9% of girls are stronger than the boys' mean, only 11% of girls are faster than the boys' mean, and only 25% of boys are more flexible than the girls' mean.

Consequently, the δ values obtained, which do not depend on the measurement scale even though the variables were operationalized in metres, seconds and centimetres, indicate that the largest difference between the groups lies in strength, followed by speed and finally flexibility. This information may be relevant for theoretical and practical purposes, and statistical significance does not provide it.

Therefore, as the results presented show, and in line with the arguments of other authors (Fernández-Cano and Fernández-Guerrero, 2009; Lustig and Trauser, 2004; Rhea, 2004; Smith and Honoré, 2008; Thompson, 1999), effect size has many virtues. The following stand out:

1. effect size is relatively easy to calculate;

2. effect size, expressed in standard deviation units or as explained variance, gives an intuitive interpretation of the results and of the magnitude of the differences; it can be regarded as an indicator of practical significance and is not incompatible with statistical significance;

3. effect size allows differences to be compared between variables measured on different scales, because it is dimensionless;

4. effect size facilitates the accumulation of knowledge, since it allows meta-analytic comparison of the results of different studies of the same subject; and

5. effect size is less dependent on sample size than statistical significance is.

In short, the aim of this work has been to show, through argument and statistical illustration, the advisability of accompanying statistical significance probabilities with other values that report the effect size, or magnitude of the differences. Otherwise, in the authors' view, the report will be limited in the presentation of its conclusions.

REFERENCES

Cañadas, I., Borges, A., Sánchez, A. and San Luis, C. (2000). Estudio de la potencia de los contrastes de medias con dos y tres grupos con tamaño del efecto pequeño y en condiciones de normalidad y homo-heterocedasticidad. Psicothema, 12(2), 114-116.

Dowson, V. (2000). Time of day effects in school-children's immediate and delayed recall of meaningful material. TERSE Report (at http://www.cem.dur.ac.uk/ebeuk/research/terse/library.htm).

Eurofit (1993). Eurofit Tests of Physical Fitness, 2nd Edition. Strasbourg.

Fan, X. (2001). Statistical significance and effect size in education research: two sides of a coin. The Journal of Educational Research, 94(5), 275-282.

Fernández-Cano, A. and Fernández-Guerrero, I. (2009). Crítica y alternativas a la significación estadística en el contraste de hipótesis. Madrid: La Muralla.

Frías, M. D., Pascual, J. and García, J. F. (2000). Tamaño del efecto del tratamiento y significación estadística. Psicothema, 12(2), 236-240.

Ledesma, R., Macbeth, G. and Cortada de Kohan, N. (2008). Tamaño del efecto: revisión teórica y aplicaciones con el sistema estadístico ViSta. Revista Latinoamericana de Psicología, 40(3), 425-439.

Legido, J. C., Segovia, J. C. and Ballesteros, J. M. (1995). Valoración de la condición física por medio de test. Madrid: Ediciones pedagógicas.

Lustig, D. and Trauser, D. (2004). Effect size and rehabilitation research. Journal of Rehabilitation, 70(4), 3.

Monterde, H., Pascual, J. and Frías, M. D. (2000). Errores de interpretación de los métodos estadísticos: importancia y recomendaciones. Psicothema, 18(4), 848-856.

Pardo, A. and San Martín, R. (1994). Análisis de datos en psicología II. Madrid: Pirámide.

Rhea, M. (2004). Determining the magnitude of treatment effects in strength training research through the use of the effect size. Journal of Strength & Conditioning Research, 18(4), 918-920.

Rosandich, T. P. (1999). International Physical Fitness Test. The Sport Journal, 2(1).

Rosnow, R. L., Rosenthal, R. and Rubin, D. B. (2000). Contrasts and correlation in effect-size estimation. Psychological Science, 11(6), 446-453.

Sink, A. and Stroh, H. R. (2006). Practical significance: the use of effect sizes in school counseling research. Professional School Counseling Journal, 8(1), 115-120.

Smith, M. L. and Honoré, H. H. (2008). Effect size reporting in current health education literature. American Journal of Health Studies, 23(3), 130-135.

Thomas, J. R. and Nelson, J. K. (2007). Métodos de investigación en actividad física. Badalona: Editorial Paidotribo.

Thompson, B. (1999). Why "encouraging" effect size reporting is not working: The etiology of researcher resistance to changing practices. The Journal of Psychology, 133(2), 133-140.

Thompson, B. (2006). Research synthesis: effect sizes. In J. Green, G. Camilli and P. B. Elmore (Eds.), Handbook of complementary methods in education research (pp. 583-603). Washington, DC: American Educational Research Association.

Trusty, J., Thompson, B. and Petrocelli, J. V. (2004). Practical guide for reporting effect size in quantitative research in the Journal of Counselling and Development. Journal of Counselling & Development, 82, 107-110.

Vacha, T. and Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology, 51(4), 473-481.

Valera, A. and Sánchez, J. (1997). Pruebas de significación y magnitud del efecto: Reflexiones y propuestas. Anales de Psicología, 13(1), 85-90.

Vincent, W. J. (2005). Statistics in Kinesiology (3rd ed.). Human Kinetics.

Total references: 23 (100%). Journal's own references: 0 (0%).

