
Psychological Statistics - Complementary course of BSc Counselling Programme - III semester, CUCBCSS 2014 Admn.

PSYCHOLOGICAL STATISTICS
Complementary course of BSc Counselling Psychology
III semester - CUCBCSS 2014 Admission onwards
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
CALICUT UNIVERSITY.P.O., MALAPPURAM,
KERALA, INDIA – 673 635
School of Distance Education
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
STUDY MATERIAL
PSYCHOLOGICAL STATISTICS
III Semester (CUCBCSS – 2014 Admission onwards)
Complementary course of BSc Counselling Psychology - III
Prepared by:
Dr. Nice Mary Francis P
Assistant Professor , Dept of Psychology ,
Prajyoti Niketan College,
Pudukad , Thrissur 680301
Layout: Computer Section, SDE
CONTENTS
Module – I : Correlation
Module – II : Non-parametric Tests
MODULE – 1
CORRELATION
Correlation is a statistical tool that helps to study the relationship
between two or more than two variables. The measure of correlation,
called the correlation coefficient, summarizes in one figure the direction
and degree of correlation. Thus correlation analysis refers to the
techniques used in measuring the closeness of the relationship between
the variables. A very simple definition of correlation given by A.M. Tuttle is:
"An analysis of the covariation of two or more variables is usually called correlation."
The problem of analyzing the relation between different series can
be broken down into three steps.
1) Determining whether a relation exists and, if it does, measuring
it.
2) Testing whether it is significant.
3) Establishing the cause and effect relation if any.
It should be noted that the detection and analysis of correlation (i.e., covariation) between two statistical variables require the observations to be associated in pairs, each pair consisting of one value of each of the two variables. The pairing may be of almost any nature, such as observations made at the same time or place, over a period of time, or at different places.
The computation concerning the degrees of closeness is based on
the regression equation. However it is possible to perform correlation
analysis without actually having a regression equation.
Scatter Diagram
A scatter diagram is a graph that shows the relation between two variables. One axis is for one variable, the other axis for the other variable. The graph has a dot for each pair of scores: the dot is placed above that pair's score on the horizontal-axis variable and directly across from its score on the vertical-axis variable.
There are three main steps in making a scatter diagram:
(1) Draw the axes and decide which variable goes on which axis.
(2) Determine the range of values to use for each variable and mark them on the axes.
(3) Mark a dot for each pair of scores.
By looking at the scatter of the various points we can form an idea as to whether the variables are related or not. The more the plotted points scatter over the chart, the less relationship there is between the two variables. The more nearly the points come to falling on a line, the higher the degree of relationship.
Perfect Positive Correlation
If all the points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner, correlation is said to be perfectly positive (i.e., r = +1).

[Diagram: points on a straight rising line (Perfect +ve Correlation)]
PERFECT NEGATIVE CORRELATION
If all the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner of the diagram, correlation is said to be perfectly negative (i.e., r = -1).

[Diagram: points on a straight falling line (Perfect -ve Correlation)]
High Degree of Positive Correlation and Negative Correlation
If all the plotted points fall in a narrow band, there would be a high degree of correlation between the variables. Correlation shall be positive if the points show a rising tendency from the lower left-hand corner to the upper right-hand corner,

[Diagram: points in a narrow rising band (High Degree of +ve correlation)]

and negative if the points show a declining tendency from the upper left-hand corner to the lower right-hand corner of the diagram.
Low Degree of Positive Correlation and Negative Correlation
If the points are widely scattered over the diagram, it is an indication of very little relationship between the variables. Correlation shall be positive if the points are rising from the lower left-hand corner to the upper right-hand side, and negative if the points are running from the upper left-hand side to the lower right-hand side of the diagram.

[Diagrams: widely scattered points with a rising trend (Low degree of +ve correlation) and with a falling trend (Low degree of -ve correlation)]

If the plotted points lie on a straight line parallel to the X-axis or in a haphazard manner, it shows the absence of any relationship between the variables (r = 0).
PROBLEM
Given the following pairs of values of the variables x and y:

X : 2  3  7  6  8
Y : 6  5  5  8  12  9  11
(a) Make a scatter diagram.
(b) Do you think that there is any correlation between the variables x and y? Is it positive or negative? Is it high or low?
(c) By graphic inspection, draw an estimating line. (An estimating line or regression line is a line of average relationship.)
(a) [Scatter diagram omitted: the y-axis is scaled 0 to 25 and the x-axis 2 to 12; the plotted points rise from the lower left toward the upper right.]
(b) The variables x and y are correlated. The correlation is positive because the trend of the points is upward, rising from the lower left-hand corner to the upper right-hand corner of the diagram. The degree of relationship is high because the plotted points lie close to the line that shows the relationship between the variables.
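The visual judgment above can be cross-checked numerically: the sign of the average product of paired deviations from the means (the covariance) gives the direction of the trend in a scatter diagram. A minimal sketch, using illustrative (x, y) pairs rather than the exact data of the problem:

```python
# Sketch: checking the direction of a trend numerically.
# The (x, y) pairs below are illustrative, not the exact data above.
xs = [2, 3, 6, 7, 8, 9, 11]
ys = [5, 6, 8, 5, 12, 9, 11]

n = len(xs)
mx = sum(xs) / n   # mean of x
my = sum(ys) / n   # mean of y

# Covariance: average product of paired deviations from the means.
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# Positive covariance: points trend from lower left to upper right.
direction = "positive" if cov > 0 else "negative" if cov < 0 else "none"
print(direction)   # positive
```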
Merits and Limitations of Scatter Diagram
Merits
1) It is a simple and non-mathematical method of studying correlation
between the variables. As such it can be easily understood and a
rough idea can very quickly be formed as to whether or not the
variables are related.
2) It is not influenced by the size of extreme items, whereas most of the mathematical methods of finding correlation are influenced by extreme items.
3) Making a scatter diagram usually is the first step in investigating
the relationship between two variables.
Limitations
By applying this method we can get an idea about the direction of correlation and also whether it is high or low. But we cannot establish the exact degree of correlation between the variables; that is possible only by applying the mathematical methods.
Karl Pearson’s Coefficient of Correlation
Karl Pearson's method, popularly known as the Pearsonian coefficient of correlation, is most widely used in practice. The Pearsonian coefficient of correlation is denoted by the symbol 'r'. It is one of the very few symbols that is used universally for describing the degree of correlation between two series. The formula for computing the Pearsonian r is:
r = ∑xy / (N σx σy)

where
x = (X – X̄), y = (Y – Ȳ)
σx = standard deviation of series X
σy = standard deviation of series Y
N = number of paired observations
This method is to be applied only when the deviations of items are taken from the actual means and not from assumed means.
The value of the coefficient of correlation as obtained by the above formula shall always lie between ±1. When r = +1, there is perfect positive correlation between the variables. When r = -1, there is perfect negative correlation between the variables. However, in practice, such values of r as +1, -1 and 0 are rare. We normally get values which lie between +1 and -1, such as +.1, -.4, etc. The coefficient of correlation describes not only the magnitude of correlation but also its direction. Thus +.8 would mean that correlation is positive, because the sign of r is +, and the magnitude of correlation is .8.
The above formula for computing the Pearsonian coefficient of correlation can be transformed into the following form, which is easier to apply:

r = ∑xy / √(∑x² × ∑y²)

where x = (X – X̄) and y = (Y – Ȳ). This greatly simplifies the task of calculating correlation.
Steps
i) Take the deviations of the X series from the mean of X and denote these deviations by x.
ii) Square these deviations and obtain the total, i.e., ∑x².
iii) Take the deviations of the Y series from the mean of Y and denote these deviations by y.
iv) Square these deviations and obtain the total, i.e., ∑y².
v) Multiply the deviations of the X and Y series pairwise and obtain the total, i.e., ∑xy.
vi) Substitute the values of ∑xy, ∑x² and ∑y² in the above formula.
Problem
a) Calculate Karl Pearson's coefficient of correlation from the following data.

X : 6   8   12  15  18  20  24  28  31
Y : 10  12  15  15  18  25  22  26  28
Ans: Calculation of Karl Pearson's Correlation Coefficient

X     x = (X – 18)   x²     Y     y = (Y – 19)   y²     xy
6     -12            144    10    -9             81     +108
8     -10            100    12    -7             49     +70
12    -6             36     15    -4             16     +24
15    -3             9      15    -4             16     +12
18    0              0      18    -1             1      0
20    +2             4      25    +6             36     +12
24    +6             36     22    +3             9      +18
28    +10            100    26    +7             49     +70
31    +13            169    28    +9             81     +117
∑X = 162, ∑x = 0, ∑x² = 598; ∑Y = 171, ∑y = 0, ∑y² = 338; ∑xy = 431
r = ∑xy / √(∑x² × ∑y²)

∑xy = 431, ∑x² = 598, ∑y² = 338

r = 431 / √(598 × 338)
  = 431 / 449.582
  = + 0.959
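The hand computation above can be checked with a short script; a sketch of the deviation method in Python:

```python
# Sketch of the deviation method: r = Σxy / sqrt(Σx² · Σy²),
# where x and y are deviations of X and Y from their means.
import math

X = [6, 8, 12, 15, 18, 20, 24, 28, 31]
Y = [10, 12, 15, 15, 18, 25, 22, 26, 28]

mx = sum(X) / len(X)   # 162/9 = 18
my = sum(Y) / len(Y)   # 171/9 = 19

x = [v - mx for v in X]   # deviations of X
y = [v - my for v in Y]   # deviations of Y

sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy = 431
sum_x2 = sum(a * a for a in x)              # Σx² = 598
sum_y2 = sum(b * b for b in y)              # Σy² = 338

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(round(r, 3))   # 0.959
```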
2. Calculation of the correlation coefficient when a change of scale and origin is made

Since r is a pure number, shifting the origin and changing the scale of the series do not affect its value.
Find the coefficient of correlation from the following data:

X : 300  350  400  450  500  550  600  650  700
Y : 800  900  1000  1100  1200  1300  1400  1500  1600

Ans: In order to simplify the calculations, let us divide each value of x by 50 and each value of y by 100.

Calculation of correlation coefficient

X     x1 = X/50   x = (x1 – 10)   x²    Y      y1 = Y/100   y = (y1 – 12)   y²    xy
300   6           -4              16    800    8            -4              16    16
350   7           -3              9     900    9            -3              9     9
400   8           -2              4     1000   10           -2              4     4
450   9           -1              1     1100   11           -1              1     1
500   10          0               0     1200   12           0               0     0
550   11          +1              1     1300   13           +1              1     1
600   12          +2              4     1400   14           +2              4     4
650   13          +3              9     1500   15           +3              9     9
700   14          +4              16    1600   16           +4              16    16
∑x1 = 90, ∑x = 0, ∑x² = 60; ∑y1 = 108, ∑y = 0, ∑y² = 60; ∑xy = 60

(The means of the coded series are x̄1 = 90/9 = 10 and ȳ1 = 108/9 = 12.)

r = ∑xy / √(∑x² × ∑y²)

∑xy = 60, ∑x² = 60, ∑y² = 60

r = 60 / √(60 × 60) = 60/60 = 1
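The invariance property used in this example can be verified directly: computing r on the raw values and on the coded values (x/50, y/100) gives the same result. A sketch:

```python
# Sketch: r is unchanged by shifting the origin and changing the scale,
# as in the worked example (x divided by 50, y divided by 100).
import math

def pearson_r(X, Y):
    mx, my = sum(X) / len(X), sum(Y) / len(Y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sx2 = sum((a - mx) ** 2 for a in X)
    sy2 = sum((b - my) ** 2 for b in Y)
    return sxy / math.sqrt(sx2 * sy2)

X = [300, 350, 400, 450, 500, 550, 600, 650, 700]
Y = [800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600]

r_raw = pearson_r(X, Y)
r_coded = pearson_r([v / 50 for v in X], [v / 100 for v in Y])
print(round(r_raw, 6), round(r_coded, 6))   # both 1.0
```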
Assumptions of the Pearsonian Coefficient
Karl Pearson’s coefficient of correlation is based on the following
assumptions:
1) There is a linear relationship between the variables, i.e., when the two variables are plotted on a scatter diagram a straight line will be formed by the points so plotted.
2) The two variables under study are affected by a large number of
independent causes so as to form a normal distribution. Variables
like height, weight, price, demand, supply, etc., are affected by
such forces that a normal distribution is formed.
3) There is a cause and effect relationship between the forces
affecting the distribution of the items in the two series. If such a
relationship is not formed between the variables, i.e., If the
variables are independent, there cannot be any correlation.
Merits and Limitations of the Pearsonian Coefficient.
Amongst the mathematical methods used for measuring the degree
of relationship, Karl Pearson’s method is most popular. The correlation
coefficient summarizes in one figure not only the degree of correlation
but also the direction, i.e., whether correlation is positive or negative.
However, the utility of this coefficient depends in part on a wide
knowledge of the meaning of this ‘yardstick’ together with its limitations.
The chief limitations of the method are:
1. The correlation coefficient always assumes a linear relationship, regardless of whether that assumption is correct or not.
2. Great care must be exercised in interpreting the value of this coefficient, as the coefficient is very often misinterpreted.
3. The value of the coefficient is unduly affected by extreme items.
4. As compared with some other methods, this method is more time-consuming.
Coefficient of correlation and Probable error
The probable error of the coefficient of correlation helps in
interpreting its value. With the help of probable error it is possible to
determine the reliability of the value of the coefficient in so far as it
depends on the conditions of random sampling. The probable error of the coefficient of correlation is obtained as follows:

P.E. = 0.6745 × (1 – r²) / √N

If 0.6745 is omitted from the formula for the probable error, we get the standard error of the coefficient of correlation. The standard error of r, therefore, is

S.E. = (1 – r²) / √N
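As a sketch, the probable error for the earlier worked example (r = 0.959 from N = 9 pairs) can be computed as follows:

```python
# Sketch of the probable-error formulas above, applied to the earlier
# worked example (r = 0.959 from N = 9 paired observations).
import math

r, N = 0.959, 9
se = (1 - r ** 2) / math.sqrt(N)   # standard error of r
pe = 0.6745 * se                   # probable error of r
print(round(se, 4), round(pe, 4))
```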
Conditions for the use of probable error:
1) The data must approximate a normal frequency curve.
2) The statistical measure for which the P.E. is computed must have been calculated from a sample.
3) The sample must have been selected in an unbiased manner and the individual items must be independent.
Coefficient of Determination
One very convenient and useful way of interpreting the value of the coefficient of correlation between two variables is to use the square of the coefficient of correlation, which is called the coefficient of determination.
Properties of the Coefficient of Correlation
The following are the important properties of the correlation coefficient, r:
1) The coefficient of correlation lies between -1 and +1; symbolically, -1 ≤ r ≤ +1, or |r| ≤ 1.
2) The coefficient of correlation is independent of change of scale and origin of the variables x and y.
3) The coefficient of correlation is the geometric mean of the two regression coefficients; symbolically, r = √(bxy × byx).
RANK Correlation Coefficient
This method of finding out covariability, or the lack of it, between two variables was developed by the British psychologist Charles Edward Spearman in 1904. This measure is especially useful when a quantitative measure for certain factors cannot be fixed (as in the evaluation of leadership ability or the judging of beauty), but the individuals in the group can be arranged in order, thereby obtaining for each individual a number indicating his rank in the group. In any event, the rank correlation coefficient is applied to a set of ordinal rank numbers, with 1 for the individual ranked first in quantity or quality, and so on, to n for the individual ranked last in a group of n individuals.
Spearman's rank correlation coefficient is defined as:

R = 1 – (6 ∑D²) / (N³ – N)

where
R = rank coefficient of correlation
D = difference of ranks between paired items in the two series
N = number of pairs of observations

In rank correlation we may have two types of problems:
a) Where actual ranks are given
b) Where ranks are not given
(a) Where actual ranks are given: steps required for computation
(i) Take the difference of the two ranks, i.e., (R1 – R2), and denote these differences by D.
(ii) Square these differences and obtain the total ∑D².
(iii) Apply the formula
R = 1 – (6 ∑D²) / (N³ – N)
Problem
1. Two judges in a beauty competition rank the 12 entries as follows:

X : 1   2  3  4   5  6  7  8  9  10  11  12
Y : 12  9  6  10  3  5  4  7  8  2   11  1

What degree of agreement is there between the judgments of the two judges?
Calculation of Rank Correlation Coefficient

X (R1)   Y (R2)   D = (R1 – R2)   D²
1        12       -11             121
2        9        -7              49
3        6        -3              9
4        10       -6              36
5        3        +2              4
6        5        +1              1
7        4        +3              9
8        7        +1              1
9        8        +1              1
10       2        +8              64
11       11       0               0
12       1        +11             121
                                  ∑D² = 416
R = 1 – (6 ∑D²) / (N³ – N)

∑D² = 416, N = 12

R = 1 – (6 × 416) / (12³ – 12)
  = 1 – 2496/1716
  = 1 – 1.454
  = – 0.454
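A sketch of this computation in Python, using the two judges' rankings from the problem above:

```python
# Sketch of the rank-correlation formula R = 1 - 6ΣD²/(N³ - N),
# applied to the two judges' rankings.
R1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
R2 = [12, 9, 6, 10, 3, 5, 4, 7, 8, 2, 11, 1]

D2 = [(a - b) ** 2 for a, b in zip(R1, R2)]   # squared rank differences
N = len(R1)
R = 1 - 6 * sum(D2) / (N ** 3 - N)
print(sum(D2), round(R, 4))   # 416 -0.4545
```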
(b) Where ranks are not given: steps for computation
When we are given the actual data and not the ranks, it will be necessary to assign the ranks. Ranks can be assigned by taking either the highest value as 1 or the lowest value as 1, and we must follow the same method for both the variables.
Problems
1. Calculate Spearman's coefficient of rank correlation for the following data:

X : 53  98  95  81  75  61  59  55
Y : 47  25  32  37  30  40  39  45
Calculation of Rank Correlation Coefficient

X     R1   Y    R2   D = (R1 – R2)   D²
53    1    47   8    -7              49
98    8    25   1    +7              49
95    7    32   3    +4              16
81    6    37   4    +2              4
75    5    30   2    +3              9
61    4    40   6    -2              4
59    3    39   5    -2              4
55    2    45   7    -5              25
                                     ∑D² = 160

R = 1 – (6 ∑D²) / (N³ – N)
∑D² = 160, N = 8

R = 1 – (6 × 160) / (8³ – 8)
  = 1 – 960/504
  = 1 – 1.905
  = – 0.905
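When ranks are not given, they must first be assigned from the raw scores, as in the problem above. A sketch (this data has no tied scores; ties would require average ranks, which this simple helper does not handle):

```python
# Sketch: assign ranks from raw scores (lowest value gets rank 1, the
# same rule for both series), then apply R = 1 - 6ΣD²/(N³ - N).
# Note: no tied scores here; ties would need average ranks.
X = [53, 98, 95, 81, 75, 61, 59, 55]
Y = [47, 25, 32, 37, 30, 40, 39, 45]

def ranks(values):
    order = sorted(values)                      # ascending order
    return [order.index(v) + 1 for v in values]

R1, R2 = ranks(X), ranks(Y)
N = len(X)
D2 = sum((a - b) ** 2 for a, b in zip(R1, R2))
R = 1 - 6 * D2 / (N ** 3 - N)
print(D2, round(R, 3))   # 160 -0.905
```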
LINEAR CORRELATION
A relation between two variables that shows up on a scatter diagram as dots roughly following a straight line; a correlation with r not equal to 0.

Multiple Correlation
Correlation of a criterion variable with two or more predictor variables.

Partial Correlation Coefficient
A measure of the degree of correlation between two variables over and above the influence of one or more other variables.

Linear and Nonlinear (Curvilinear) Correlation
If the amount of change in one variable tends to bear a constant ratio to the amount of change in another variable, the correlation is said to be linear. Correlation is non-linear (curvilinear) if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.

Simple Correlation
Correlation between only two variables.
MODULE – 2
NON-PARAMETRIC TESTS
A parameter is a quantity (such as the mean or standard deviation) that characterizes a statistical population and that can be estimated from sample data. Statisticians have developed several types of tests for testing hypotheses. A test of hypothesis is also known as a test of significance. Tests of hypothesis can be classified into two categories:
i) Parametric tests, or standard tests of hypothesis.
ii) Non-parametric tests, or distribution-free tests of hypothesis.
Parametric tests are usually based on certain properties of the parent population from which we draw samples. Their assumptions are:
1. The samples are drawn from a normal population.
2. The sample size is large.
3. Assumptions about population parameters like the mean, variance, etc., must hold good.
But there are situations when an investigator does not want to make such assumptions. In such situations, statistical methods of testing hypotheses that are not based on assumptions about the parameters of the parent population are used; these are the non-parametric tests. Most of the non-parametric tests require measurement equivalent only to an ordinal (or even nominal) scale rather than an interval scale. The parametric methods make more assumptions than the non-parametric methods. Some of the important non-parametric or distribution-free tests are:
1) Tests of hypothesis concerning some location parameter for the given data (the one-sample sign test).
2) Tests of hypothesis concerning differences among two or more sets of data, such as the two-sample sign test, the rank-sum test and the signed-rank test.
3) Tests of hypothesis of the relationship between variables, such as rank correlation, Kendall's coefficient of concordance and other tests for dependence.
4) Tests of hypothesis concerning the variation in the given data, that is, tests analogous to ANOVA, namely the Kruskal-Wallis test.
5) Tests of the randomness of a sample, based on the theory of runs (namely the one-sample run test).
6) Tests of hypothesis to determine whether categorical data show dependency or not. The chi-square test can also be used to make comparisons between a theoretical population and actual data when categories are used.
In statistical tests two kinds of assertions are involved:
1) an assertion directly related to the purpose of the investigation, and
2) an assertion made to permit a probability statement.
The former is the assertion to be tested and is technically called a hypothesis, whereas the set of all other assertions is called the model. When we apply a test without a model, it is known as a distribution-free or non-parametric test. Such tests do not make assumptions about the parameters of the populations and thus do not make use of the parameters of the distribution. In other words, under a non-parametric or distribution-free test we do not assume that a particular distribution is applicable or that a certain value is attached to a parameter of the population.
For instance, while testing two training methods, say A and B, to determine the superiority of one over the other, if we do not assume that the scores of the trainees are normally distributed, or that the mean score of all trainees taking method A would be a certain value, then the testing method is known as a distribution-free or non-parametric test. In fact, there is a growing use of such tests in situations where the normality assumption is open to doubt. As a result, many distribution-free tests have been developed that do not depend on the shape of the distribution or the parameters of the underlying population.
ADVANTAGES OF NON-PARAMETRIC TESTS
Non-parametric tests are distribution-free, i.e., they do not require assumptions to be made about the population, such as normality or any other particular shape of the distribution. Generally they are simple to understand and easy to apply, even when sample sizes are small. Most of the non-parametric tests do not require lengthy calculations and hence are less time-consuming.
1) Non-parametric tests are applicable to all types of data, qualitative or quantitative.
2) Many non-parametric tests make it possible to work with very small samples, which is particularly helpful to researchers conducting pilot studies and to medical researchers working with rare diseases.
3) Non-parametric methods have less stringent assumptions than the classical procedures.
DISADVANTAGES OF NON-PARAMETRIC TESTS
1) If all the assumptions of the parametric tests are in fact met in the data, and the measurement is of the required strength, non-parametric tests make wasteful use of the data.
2) There are no non-parametric methods for testing the interactions in ANOVA.
3) Tables of critical values may not be easily available.
4) Non-parametric techniques sometimes lack sensitivity or power because they produce confidence intervals that are too wide.
USES OF NON-PARAMETRIC TESTS
1) When a quick or preliminary data analysis is needed.
2) When the assumptions of a distribution or of a parametric procedure are not satisfied, or the distribution is unknown.
3) When the data are only roughly scaled (qualitative data: nominal or ordinal).
4) When the basic question of interest is distribution-free or non-parametric in nature.
CHI-SQUARE TEST
The chi-square (χ²) distribution was first discovered by Helmert in 1875 and then rediscovered independently by Karl Pearson in 1900, who applied it as a test of 'goodness of fit'. Any fitting problem may be viewed as a problem of finding theoretical or expected frequencies for a given set of observed frequencies. Let O1, O2, …, Ok denote a set of observed frequencies and let E1, E2, …, Ek be the corresponding set of expected frequencies. We know that

∑ Oi = ∑ Ei = N   (summing over i = 1 to k)
The statistic for the chi-square test of goodness of fit is defined as

χ² = ∑ (Oi – Ei)² / Ei  ~  χ²(k – r – 1)   (the sum running from i = 1 to k)

where k is the number of cells and r is the number of parameters estimated. For example, in the Poisson case we estimate only one parameter, namely λ (lambda), the mean of the distribution. In this case r = 1 and we have

χ² ~ χ²(k – 1 – 1) = χ²(k – 2)

If no parameter is estimated, then chi-square follows

χ² ~ χ²(k – 1)
From the table we obtain the critical value χ²α corresponding to the level of significance α and the degrees of freedom (k – r – 1). If the calculated value of χ² is less than χ²α, we accept the null hypothesis H0 that the fit is good. This means that there is not much difference between the sets of observed and expected frequencies and that the difference found is not significant. There are some conditions to be satisfied before the χ² test of goodness of fit is applied:
1) The total frequency N should be ≥ 50.
2) All the expected frequencies should be ≥ 5. If some of them are less than 5, the corresponding cells are merged with their adjacent cells to make the expected frequency greater than or equal to 5; k is then the number of cells remaining. The test statistic is

χ² = ∑ (Oi – Ei)² / Ei  ~  χ²(k – r – 1)
   = ∑ (Oi² / Ei) – N  ~  χ²(k – r – 1)
χ² Test of Independence of Attributes
Consider two attributes A and B. We want to test the hypothesis that A and B are independent against the alternative that they are not independent. Suppose that attribute A is divided into r classes and B into s classes; we then have an r × s contingency table, with an observed frequency in each cell. The table is represented as follows.
A\B     B1    B2    …     Bs    Total
A1      F11   F12   …     F1s   F1.
A2      F21   F22   …     F2s   F2.
A3      F31   F32   …     F3s   F3.
⋮       ⋮     ⋮           ⋮     ⋮
Ar      Fr1   Fr2   …     Frs   Fr.
Total   F.1   F.2   …     F.s   N
We also have
F1. + F2. + F3. + … + Fr. = F.1 + F.2 + F.3 + … + F.s = N
To find the expected frequencies: the expected frequency E(F11) can be evaluated as follows.

P(F11) = P(A1 and B1) = P(A1) × P(B1),

since under the null hypothesis we assume that the attributes A and B are independent:

P(F11) = (F1./N) × (F.1/N)

∴ E(F11) = N × (F1./N) × (F.1/N) = (F1. × F.1) / N

Similarly E(F12), …, E(Frs), and so on. In general, the event AiBj, i = 1, 2, …, r and j = 1, 2, …, s, is the event of finding an individual having attributes Ai and Bj at the same time. We now calculate the χ² statistic. If the calculated value is greater than the table value corresponding to (r – 1) × (s – 1) degrees of freedom, we reject the null hypothesis that the attributes are independent.
2 × 2 Contingency Table
When the observations are given in the form of a 2 × 2 contingency table

a   b
c   d

the χ² statistic is

χ² = (ad – bc)² N / [(a + b)(c + d)(a + c)(b + d)]  ~  χ²(1)

The table value of χ²(1) for the 5% level of significance is 3.841, and for the 1% level it is 6.635.
The χ² test of homogeneity of proportions is concerned with the following question:
- Are the samples coming from homogeneous populations (homogeneous with respect to some classification)?
The null hypothesis states that all the populations are identical, against the alternative that they are not. To test this hypothesis, we assume that each of the k populations is subdivided into the same s categories. The expected frequencies corresponding to the observed frequencies can be calculated as follows:

Eij = (ith row total × jth column total) / N

If the calculated value of χ² is less than the table value corresponding to (k – 1)(s – 1) degrees of freedom, we accept the null hypothesis that the k populations are homogeneous; otherwise we reject H0.
YATES CORRECTION
When cell frequencies are small and χ² is just at the significance level, the correction suggested by Yates, popularly known as the Yates correction, is applied. It involves reducing the deviation of observed from expected frequencies, which of course reduces the value of χ². The rule of correction is to adjust the observed frequency in each cell of a 2 × 2 contingency table in such a way as to reduce the deviation of the observed from the expected frequency for that cell by 0.5; this adjustment is made in all the cells without disturbing the marginal totals. The formula for finding the value of χ² after applying the Yates correction can be stated as

χ²(corrected) = N (|ad – bc| – N/2)² / [(a + b)(c + d)(a + c)(b + d)]

In case we use the usual formula for calculating χ² = ∑ (Oij – Eij)² / Eij, the Yates correction can be applied as under:

χ²(corrected) = [ |O1 – E1| – 0.5 ]² / E1 + [ |O2 – E2| – 0.5 ]² / E2 + …

It may be emphasized that the Yates correction is made only in the case of a 2 × 2 table, and that too when cell frequencies are small.
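A sketch checking the Yates-corrected χ² two ways: via the per-cell formula with 0.5, and via the 2 × 2 shortcut written with N/2 (the form under which the two agree). The frequencies a, b, c, d below are made up for illustration:

```python
# Sketch: Yates-corrected chi-square for a 2 x 2 table, two ways.
# a, b, c, d are illustrative (made-up) small cell frequencies.
a, b, c, d = 10, 5, 3, 12
N = a + b + c + d   # 30

# Shortcut: N(|ad - bc| - N/2)^2 / [(a+b)(c+d)(a+c)(b+d)]
chi2_short = N * (abs(a * d - b * c) - N / 2) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d))

# Per-cell: sum of (|O - E| - 0.5)^2 / E over the four cells,
# with E = row total x column total / N.
rows, cols = (a + b, c + d), (a + c, b + d)
obs = ((a, b), (c, d))
chi2_cells = sum(
    (abs(obs[i][j] - rows[i] * cols[j] / N) - 0.5) ** 2
    / (rows[i] * cols[j] / N)
    for i in (0, 1) for j in (0, 1))

print(round(chi2_short, 4), round(chi2_cells, 4))   # both 4.8869
```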
CONVERSION OF χ² INTO THE COEFFICIENT OF CONTINGENCY
The χ² value may also be converted into the coefficient of contingency, especially in the case of a contingency table of higher order than 2 × 2, to study the magnitude of the relation or the degree of association between two attributes:

C = √( χ² / (χ² + N) )

While finding out the value of C we proceed on the assumption of the null hypothesis that the two attributes are independent and exhibit no association. The coefficient of contingency is also known as the coefficient of mean square contingency. This measure also comes under the category of non-parametric measures of relationship.
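A sketch of the conversion, with illustrative values for χ² and N (not taken from a worked example in the text):

```python
# Sketch: coefficient of contingency C = sqrt(chi2 / (chi2 + N)).
# chi2 and N below are illustrative values only.
import math

chi2 = 10.0   # a computed chi-square value (illustrative)
N = 100       # total number of observations (illustrative)

C = math.sqrt(chi2 / (chi2 + N))
print(round(C, 3))   # 0.302
```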
PROBLEM
1. A die is thrown 600 times, and the frequencies of the face numbers are as follows:

X  : 1    2    3    4    5    6
Oi : 92   87   90   110  113  108
Ei : 100  100  100  100  100  100

Test whether the die is unbiased.

H0: The die is unbiased.
H1: The die is biased.

χ² = ∑ (Oi² / Ei) – N  ~  χ²(k – r – 1)
   = 60666/100 – 600
   = 606.66 – 600
   = 6.66  ~  χ²(6 – 0 – 1) = χ²(5)

The table value of χ²(5) at the 5% level is 11.070. The calculated value is less than the table value; hence we accept H0, and the die is unbiased.
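A sketch of the goodness-of-fit computation for the die problem, checking both algebraic forms of the statistic used above:

```python
# Sketch of the goodness-of-fit statistic for the die problem.
O = [92, 87, 90, 110, 113, 108]   # observed face frequencies
E = [100] * 6                     # expected under H0 (unbiased die)

chi2 = sum((o - e) ** 2 / e for o, e in zip(O, E))          # Σ(O-E)²/E
chi2_alt = sum(o * o / e for o, e in zip(O, E)) - sum(O)    # ΣO²/E - N

print(round(chi2, 2))   # 6.66
```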
2. The following table relates to marital status and performance in an examination. Test whether performance depends on marital status.

Performance   Married   Unmarried   Total
Good          60        80          140
Bad           20        40          60
Total         80        120         200

H0: Performance and marital status are independent.
H1: Performance and marital status are not independent.

χ² = (ad – bc)² N / [(a + b)(c + d)(a + c)(b + d)]  ~  χ²(1)
   = (60 × 40 – 80 × 20)² × 200 / [(60 + 80)(20 + 40)(60 + 20)(80 + 40)]
   = 800² × 200 / 80640000
   = 1.5873

The table value χ²(1) = 3.841. Since the calculated value is less than the table value, we accept H0. Performance and marital status are independent.
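A sketch of the 2 × 2 shortcut computation for the marital-status example:

```python
# Sketch of the 2 x 2 shortcut formula for the marital-status table.
a, b = 60, 80   # Good: married, unmarried
c, d = 20, 40   # Bad: married, unmarried
N = a + b + c + d   # 200

chi2 = (a * d - b * c) ** 2 * N / (
    (a + b) * (c + d) * (a + c) * (b + d))
print(round(chi2, 4))   # 1.5873
```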
Important Characteristics of the χ² Test
1. This test is based on frequencies and not on parameters like the mean and SD.
2. The test is used for testing hypotheses and is not useful for estimation. This test possesses the additive property, which has already been explained.
3. This test is an important non-parametric test, as no rigid assumptions are necessary with regard to the type of population, no parameter values are needed, and relatively little mathematical detail is involved.
Median Test
The median test is a non-parametric test used to test the null hypothesis that two independent samples have been drawn from populations with equal medians. Here the null hypothesis is H0: the two population medians are equal, against the alternative that they are not equal.

Assumptions
a) The samples are selected independently and at random from their respective populations.
b) The populations are of the same form, differing only in location.
c) The variable of interest is continuous.
d) The level of measurement is at least ordinal.
e) The two samples need not be of equal size.
f) The test statistic follows, approximately, a chi-square distribution with 1 degree of freedom.

Calculation of the test statistic
1) Compute the common median of the two samples combined.
2) Determine for each group the number of observations falling above and below the common median. The resulting frequencies are arranged in a 2 × 2 table.
PROBLEMS
1. Members of a random sample of 12 male students from a rural junior high school and an independent random sample of 16 male students from an urban junior high school were given a test to measure the level of mental health. Test whether there is a difference in their average scores.

Level of mental health
Urban : 35  26  27  21  27  38  23  25  25  27  45  46  33  26  45  41
Rural : 29  50  43  22  42  47  42  32  50  37  34  31

Arranging the 28 observations in ascending order:
21, 22, 23, 25, 25, 26, 26, 27, 27, 27, 29, 31, 32, 33, 34, 35, 37, 38, 41, 42, 42, 43, 45, 45, 46, 47, 50, 50

Common median = (33 + 34)/2 = 33.5

H0: the two medians (averages) are equal.
H1: the two medians (averages) are not equal.

                                    Rural   Urban   Total
No. of observations above median    8       6       14
No. of observations below median    4       10      14
Total                               12      16      28

χ² = (8 × 10 – 6 × 4)² × 28 / (14 × 14 × 12 × 16)
   = 56² × 28 / 37632
   = 2.33

The table value of χ²(1) at the 5% level is 3.841. Since the calculated value is less than the table value, we accept H0; hence the two averages are equal.
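The two steps of the median test can be sketched for the mental-health data above (the χ² value uses the 2 × 2 shortcut formula):

```python
# Sketch of the median-test steps for the mental-health example.
urban = [35, 26, 27, 21, 27, 38, 23, 25, 25, 27, 45, 46, 33, 26, 45, 41]
rural = [29, 50, 43, 22, 42, 47, 42, 32, 50, 37, 34, 31]

# Step 1: common median of the combined samples.
combined = sorted(urban + rural)
n = len(combined)
median = (combined[n // 2 - 1] + combined[n // 2]) / 2   # 33.5

# Step 2: counts above/below the common median, as a 2 x 2 table.
a = sum(v > median for v in rural)   # rural above
b = sum(v > median for v in urban)   # urban above
c = sum(v < median for v in rural)   # rural below
d = sum(v < median for v in urban)   # urban below

N = a + b + c + d
chi2 = (a * d - b * c) ** 2 * N / (
    (a + b) * (c + d) * (a + c) * (b + d))
print(median, a, b, c, d, round(chi2, 2))
```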
Mann- Whitney U test is considered more powerful than median test for
comparing data of two unrelated samples. Where median test helps only
in comparing central tendencies of the populations from which there
samples have been deawn, the U test is capable of testing the differences
between population distributions in so many aspects other than the central
[Type text]
Page 25
School of Distance Education
tendency. The null hypothesis to be tested here is that there lies no
difference between the distributions of the two samples. The procedure
requires first determining the value of U by counting how many scores from
sample A precede each score of sample B, and then comparing it with the
critical value of U read from a given table for the required level of
significance and the given NL and NS (the larger and smaller sample sizes).
The procedure for computing U differs slightly with moderately large
samples (having N1 or N2 between 9 and 20). For large samples, we first
convert U into a Z value and then use this Z for accepting or rejecting Ho.
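The counting procedure described above can be sketched as follows. This is a hypothetical helper for two small independent samples; counting a tie as half is a common convention not spelled out in the text.

```python
# Sketch of the U statistic by direct counting: for each score of sample 2,
# count how many scores of sample 1 precede it (ties count as half).
def mann_whitney_u(sample_1, sample_2):
    u = sum(
        sum(1.0 if x < y else 0.5 if x == y else 0.0 for x in sample_1)
        for y in sample_2
    )
    u_prime = len(sample_1) * len(sample_2) - u
    # The smaller of U and U' is compared with the tabled critical value.
    return min(u, u_prime)
```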
PROBLEMS
Two plastics, each produced by a different process, were tested for
ultimate strength. The measurements in the accompanying table
represent breaking loads in units of 1000 pounds/inch sq. Do the data
present evidence of a difference between the locations of the
distributions of ultimate strength for the two plastics? Test by using the
Mann-Whitney U test with level of significance α = .10.
Plastic 1: 15.3, 18.7, 22.3, 17.6, 19.1, 14.8
Plastic 2: 21.2, 22.4, 18.3, 19.3, 17.1, 27.7
Ho : The locations of the distributions of the strength of two
plastics are the same.
H1 : They are different
Arranging the observations in ascending order:
14.8, 15.3, 17.1, 17.6, 18.3, 18.7, 19.1, 19.3, 21.2, 22.3, 22.4, 27.7
U = 2 + 3 + 5 + 5 + 6 + 6 = 27
U' = n1n2 - U = (6 x 6) - 27 = 9
The smaller of the two, 9, exceeds the critical value of 7 read from the U
table for n1 = n2 = 6 at α = .10, so we accept Ho. The locations of the
distributions of the strength of the two plastics are the same.
SIGN TEST
The sign test may be used for testing the significance of differences
between two correlated samples in which the data are available either in
ordinal measurement or simply expressed in terms of positive and
negative signs, showing the directions of the differences existing between
the observed scores of matched pairs.
The null hypothesis tested here is that the median change is 0, i.e.
there are equal numbers of positive and negative signs. If the number of
matched pairs is equal to or less than 10, a test of significance is applied
using the binomial probability distribution table. If the number is more
than 10, the distribution is assumed to be normal and Z values are used
for rejecting or accepting Ho.
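The small-sample case can be sketched in Python. This is a hypothetical helper, not from the text; ties (zero differences) are dropped, K is taken as the count of the less frequent sign, and the one-tailed binomial probability P(K ≤ k) with p = 1/2 is returned.

```python
from math import comb

# Sketch of the small-sample sign test for matched pairs (x, y).
def sign_test_p_value(pairs):
    # Drop ties, keep only the sign of each difference.
    signs = [(1 if x > y else -1) for x, y in pairs if x != y]
    n = len(signs)
    k = min(signs.count(1), signs.count(-1))
    # One-tailed binomial probability P(K <= k) under K ~ B(n, 1/2).
    p = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n, k, p
```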
PROBLEM
The following scores were obtained for 12 matched pairs:
Pair No     X      Y     Sign of (X - Y)
1          1.5     2           -
2          2       2           0
3          3.5     4           -
4          3       2.5         +
5          3.5     4           -
6          2.5     3           -
7          2       3.5         -
8          1.5     3           -
9          1.5     2.5         -
10         2       2.5         -
11         3       2.5         +
12         2       2.5         -
K = 2 (the number of plus signs); with the one tie discarded, n = 11 and
K ~ B(11, ½)
Ho: The two samples are the same
H1: They are different
P(K ≤ 2) = [C(11,0) + C(11,1) + C(11,2)]/2^11 = 67/2048 = 0.0327.
Comparing with α/2 = 0.05/2 = 0.025: since 0.0327 > 0.025 we accept Ho.
FORMULA FOR LARGE SAMPLE SIGN TEST
If the number of observations is large, the sign test statistic is
Z = (K - n/2) / (√n / 2)
Sign test for paired data: when the data to be analyzed consist of
observations in matched pairs and the assumptions underlying the
parametric test are not met, or the measurement scale is not adequate,
the sign test may be employed to test the null hypothesis that the median
difference is 0.
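The large-sample statistic above amounts to a one-line computation. The function name and the illustrative numbers in the test are our own; under Ho, K ~ B(n, ½), so the mean is n/2 and the standard deviation is √n/2.

```python
from math import sqrt

# Large-sample sign test statistic: K plus signs out of n non-tied pairs.
def sign_test_z(k, n):
    return (k - n / 2) / (sqrt(n) / 2)
```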
WILCOXON MATCHED-PAIRS SIGNED RANKS TEST
This test is considered more powerful than the sign test as it takes
into consideration the magnitude along with the direction of the
differences existing between matched pairs. Here we compute the
statistic T and compare it with the critical value of T read from a given
table for a particular significance level, drawing inferences about
rejecting or accepting Ho. In case the number of matched pairs is more
than 25, we first convert the computed T into a Z value and then, as
usual, use this Z for testing significance.
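The computation of T can be sketched as follows. This minimal version, written for illustration, ranks the non-zero differences by absolute size and takes T as the smaller of the positive-rank and negative-rank sums; tied absolute differences, which would need average ranks, are not handled.

```python
# Sketch of the Wilcoxon matched-pairs signed-ranks statistic T.
def wilcoxon_t(pairs):
    # Non-zero differences, ranked by absolute magnitude.
    diffs = [x - y for x, y in pairs if x != y]
    ranked = sorted(diffs, key=abs)
    t_plus = sum(rank for rank, d in enumerate(ranked, start=1) if d > 0)
    t_minus = sum(rank for rank, d in enumerate(ranked, start=1) if d < 0)
    return min(t_plus, t_minus)
```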
PROBLEM
Charles Darwin conducted an experiment to determine whether self-
fertilized and cross-fertilized plants have different growth rates. Pairs
of pea plants, one self-fertilized and the other cross-fertilized, were
planted in plots and their heights were measured after a specified period
of time. The data Darwin obtained were:
Pair   Cross   Self    Di    Rank of |Di|
1      188     139     49        11
2       96     163    -67        14
3      168     160      8         2
4      176     160     16         4
5      153     147      6         1
6      172     149     23         5
7      177     149     28         7
8      163     122     41         9
9      146     132     14         3
10     173     144     29         8
11     186     130     56        12
12     168     144     24         6
13     177     102     75        15
14     184     124     60        13
15      96     144    -48        10
Ho: Cross-fertilized and self-fertilized plants have the same growth rate
H1: They have different growth rates
T+ = 11 + 2 + 4 + 1 + 5 + 7 + 9 + 3 + 8 + 12 + 6 + 15 + 13
= 96
T- = 14 + 10 = 24, so T = min(T+, T-) = 24, with N = 15.
Z = (T - N(N + 1)/4) / √(N(N + 1)(2N + 1)/24) = (24 - 60)/√310,
so |Z| = 2.0447.
Since the calculated value exceeds the table value (1.96), we reject Ho.
LARGE SAMPLE APPROXIMATION TO THE MANN-WHITNEY U
TEST
When either n1 or n2 is greater than 20 we cannot use the Mann-Whitney
U table for the critical value. In this case we may compute a large sample
test statistic
Z = (U - n1n2/2) / √(n1n2(n1 + n2 + 1)/12)
which is referred to the standard normal distribution.
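The large-sample statistic can be sketched directly from the formula. The function name is our own; the mean n1n2/2 and variance n1n2(n1 + n2 + 1)/12 of U under Ho are the standard large-sample values.

```python
from math import sqrt

# Large-sample normal approximation for the Mann-Whitney U statistic.
def mann_whitney_z(u, n1, n2):
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u
```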