Psychological Statistics - Complementary course of BSc Counselling Programme - III semester, CUCBCSS 2014 Admn.
PSYCHOLOGICAL STATISTICS
Complementary course of BSc Counselling Psychology, III semester (CUCBCSS 2014 Admission onwards)

UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
CALICUT UNIVERSITY P.O., MALAPPURAM, KERALA, INDIA – 673 635

STUDY MATERIAL
Prepared by: Dr. Nice Mary Francis P, Assistant Professor, Dept of Psychology, Prajyoti Niketan College, Pudukad, Thrissur 680301
Layout: Computer Section, SDE

CONTENTS
Module - I
Module - II

MODULE – 1
CORRELATION

Correlation is a statistical tool that helps to study the relationship between two or more variables. The measure of correlation, called the correlation coefficient, summarizes in one figure both the direction and the degree of correlation. Correlation analysis thus refers to the techniques used in measuring the closeness of the relationship between the variables. A very simple definition of correlation is given by A.M. Tuttle: “An analysis of the covariation of two or more variables is usually called correlation”.

The problem of analyzing the relation between different series can be broken down into three steps:
1) Determining whether a relation exists and, if it does, measuring it.
2) Testing whether it is significant.
3) Establishing the cause-and-effect relation, if any.

It should be noted that the detection and analysis of correlation (i.e., covariation) between two statistical variables require some relationship that associates the observations in pairs, each pair consisting of one value of each of the two variables. The pairing relationship may be of almost any nature, such as observations made at the same time or place, over a period of time, or at different places.
The computation concerning the degree of closeness is based on the regression equation. However, it is possible to perform correlation analysis without actually having a regression equation.

Scatter Diagram
A scatter diagram is a graph that shows the relation between two variables. One axis is for one variable; the other axis is for the other variable. The graph has a dot for each pair of scores: the dot is placed above the score for that pair on the horizontal-axis variable and directly across from the score for that pair on the vertical-axis variable. There are three main steps in making a scatter diagram:
(1) Draw the axes and decide which variable goes on which axis.
(2) Determine the range of values to use for each variable and mark them on the axes.
(3) Mark a dot for each pair of scores.

By looking at the scatter of the various points we can form an idea as to whether the variables are related or not. The more the plotted points scatter over a chart, the less relationship there is between the two variables. The more nearly the points come to falling on a line, the higher the degree of relationship.

Perfect Positive Correlation
If all the points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner, correlation is said to be perfectly positive (i.e., r = +1).
[Figure: Perfect positive correlation (X on the horizontal axis, Y on the vertical axis)]

Perfect Negative Correlation
If all the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner of the diagram, correlation is said to be perfectly negative (i.e., r = -1).
[Figure: Perfect negative correlation (X on the horizontal axis, Y on the vertical axis)]

High Degree of Positive Correlation and Negative Correlation
If all the plotted points fall in a narrow band, there is a high degree of correlation between the variables. Correlation is positive if the points show a rising tendency from the lower left-hand corner to the upper right-hand corner.
[Figure: High degree of positive correlation]
Correlation is negative if the points show a declining tendency from the upper left-hand corner to the lower right-hand corner of the diagram.

Low Degree of Positive Correlation and Negative Correlation
If the points are widely scattered over the diagram, it is an indication of very little relationship between the variables. Correlation is positive if the points rise from the lower left-hand corner to the upper right-hand corner, and negative if the points run from the upper left-hand side to the lower right-hand side of the diagram.
[Figure: Low degree of positive correlation; low degree of negative correlation]

If the plotted points lie on a straight line parallel to the X-axis, or are scattered in a haphazard manner, there is no relationship between the variables (r = 0).

PROBLEM
Given the following pairs of values of the variables X and Y:
X: 2 3 5 6 8 9
Y: 6 5 7 8 12 11
(a) Make a scatter diagram.
(b) Do you think that there is any correlation between the variables X and Y? Is it positive or negative? Is it high or low?
(c) By graphic inspection, draw an estimating line. (An estimating line, or regression line, is a line of average relationship.)

[Figure: Scatter diagram of the given data, for parts (a) and (c)]
(b) The variables X and Y are correlated. The correlation is positive, because the trend of the points is upward, rising from the lower left-hand corner to the upper right-hand corner of the diagram. The degree of relationship is high, because the plotted points lie near a line, which would indicate a perfect relationship between the variables.

Merits and Limitations of Scatter Diagram
Merits
1) It is a simple and non-mathematical method of studying correlation between the variables. As such, it can be easily understood, and a rough idea can very quickly be formed as to whether or not the variables are related.
2) It is not influenced by the size of extreme items, whereas most of the mathematical methods of finding correlation are influenced by extreme items.
3) Making a scatter diagram is usually the first step in investigating the relationship between two variables.

Limitations
By applying this method we can get an idea about the direction of correlation and also whether it is high or low. But we cannot establish the exact degree of correlation between the variables, as is possible by applying the mathematical methods.

Karl Pearson’s Coefficient of Correlation
Karl Pearson’s method, popularly known as the Pearsonian coefficient of correlation, is the most widely used in practice. The Pearsonian coefficient of correlation is denoted by the symbol ‘r’. It is one of the very few symbols that are used universally for describing the degree of correlation between two series. The formula for computing Pearsonian r is:

$$ r = \frac{\sum xy}{N \sigma_x \sigma_y} $$

where $x = X - \bar{X}$, $y = Y - \bar{Y}$, $\sigma_x$ = standard deviation of series X, $\sigma_y$ = standard deviation of series Y, and N = number of paired observations.

This method is to be applied only when the deviations of items are taken from actual means and not from assumed means. The value of the coefficient of correlation obtained by the above formula always lies between -1 and +1. When r = +1, there is perfect positive correlation between the variables. When r = -1, there is perfect negative correlation between the variables. When r = 0, there is no linear correlation between them. However, in practice, such values of r as +1, -1 and 0 are rare. We normally get values which lie between +1 and -1, such as +0.1, -0.4, etc. The coefficient of correlation describes not only the magnitude of correlation but also its direction. Thus +0.8 would mean that correlation is positive, because the sign of r is +, and that the magnitude of correlation is 0.8. The above formula for computing the Pearsonian coefficient of correlation can be transformed into the following form, which is easier to apply:
$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} $$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$. This greatly simplifies the task of calculating correlation.

Steps
i) Take the deviations of the X series from the mean of X and denote these deviations by x.
ii) Square these deviations and obtain the total, i.e., Σx².
iii) Take the deviations of the Y series from the mean of Y and denote these deviations by y.
iv) Square these deviations and obtain the total, i.e., Σy².
v) Multiply the paired deviations of the X and Y series and obtain the total, i.e., Σxy.
vi) Substitute the values of Σxy, Σx² and Σy² in the above formula.

Problem
1. Calculate Karl Pearson’s coefficient of correlation from the following data.
X: 6 8 12 15 18 20 24 28 31
Y: 10 12 15 15 18 25 22 26 28

Ans: Here the mean of X = 162/9 = 18 and the mean of Y = 171/9 = 19.

Calculation of Karl Pearson’s Correlation Coefficient
X     x = X - 18   x²     Y     y = Y - 19   y²    xy
6     -12          144    10    -9           81    +108
8     -10          100    12    -7           49    +70
12    -6           36     15    -4           16    +24
15    -3           9      15    -4           16    +12
18    0            0      18    -1           1     0
20    +2           4      25    +6           36    +12
24    +6           36     22    +3           9     +18
28    +10          100    26    +7           49    +70
31    +13          169    28    +9           81    +117
ΣX = 162   Σx = 0   Σx² = 598   ΣY = 171   Σy = 0   Σy² = 338   Σxy = 431

$$ r = \frac{431}{\sqrt{598 \times 338}} = \frac{431}{\sqrt{202124}} = \frac{431}{449.582} = +0.959 $$

2. Calculation of correlation coefficient when change of scale and origin is made
Since r is a pure number, shifting the origin and changing the scale of the series do not affect its value.
Find the coefficient of correlation from the X and Y data given in the calculation table that follows.
Ans: In order to simplify calculations, let us divide each value of X by 50 and each value of Y by 100, and then take deviations from the means of these coded values.
Calculation of correlation coefficient
X     X' = X/50   x = X' - 10   x²    Y      Y' = Y/100   y = Y' - 12   y²    xy
300   6           -4            16    800    8            -4            16    16
350   7           -3            9     900    9            -3            9     9
400   8           -2            4     1000   10           -2            4     4
450   9           -1            1     1100   11           -1            1     1
500   10          0             0     1200   12           0             0     0
550   11          +1            1     1300   13           +1            1     1
600   12          +2            4     1400   14           +2            4     4
650   13          +3            9     1500   15           +3            9     9
700   14          +4            16    1600   16           +4            16    16
ΣX' = 90   Σx = 0   Σx² = 60   ΣY' = 108   Σy = 0   Σy² = 60   Σxy = 60

(Mean of X' = 90/9 = 10; mean of Y' = 108/9 = 12.)

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} = \frac{60}{\sqrt{60 \times 60}} = \frac{60}{60} = 1 $$

Assumptions of the Pearsonian Coefficient
Karl Pearson’s coefficient of correlation is based on the following assumptions:
1) There is a linear relationship between the variables, i.e., when the two variables are plotted on a scatter diagram, the plotted points will form a straight line.
2) The two variables under study are affected by a large number of independent causes so as to form a normal distribution. Variables like height, weight, price, demand, supply, etc., are affected by such forces that a normal distribution is formed.
3) There is a cause-and-effect relationship between the forces affecting the distribution of the items in the two series. If such a relationship is not formed between the variables, i.e., if the variables are independent, there cannot be any correlation.

Merits and Limitations of the Pearsonian Coefficient
Amongst the mathematical methods used for measuring the degree of relationship, Karl Pearson’s method is the most popular. The correlation coefficient summarizes in one figure not only the degree of correlation but also its direction, i.e., whether correlation is positive or negative. However, the utility of this coefficient depends in part on a wide knowledge of the meaning of this ‘yardstick’ together with its limitations. The chief limitations of the method are:
1. The correlation coefficient always assumes a linear relationship, regardless of whether that assumption is correct or not.
2. Great care must be exercised in interpreting the value of this coefficient, as very often the coefficient is misinterpreted.
3. The value of the coefficient is unduly affected by extreme items.
4. Compared with some other methods, this method is more time-consuming.

Coefficient of Correlation and Probable Error
The probable error of the coefficient of correlation helps in interpreting its value. With the help of the probable error it is possible to determine the reliability of the value of the coefficient in so far as it depends on the conditions of random sampling. The probable error of the coefficient of correlation is obtained as follows:

$$ P.E._r = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} $$

If 0.6745 is omitted from the formula of probable error, we get the standard error of the coefficient of correlation. The standard error of r, therefore, is

$$ S.E._r = \frac{1 - r^2}{\sqrt{N}} $$

Conditions for the use of probable error:
1) The data must approximate a normal frequency curve.
2) The statistical measure for which the P.E. is computed must have been calculated from a sample.
3) The sample must have been selected in an unbiased manner, and the individual items must be independent.

Coefficient of Determination
One very convenient and useful way of interpreting the value of the coefficient of correlation between two variables is to use the square of the coefficient of correlation, r², which is called the coefficient of determination. It gives the proportion of the variation in one variable that is explained by the other; for example, r = 0.9 gives r² = 0.81, i.e., 81%.

Properties of Coefficient of Correlation
The following are the important properties of the correlation coefficient, r:
1) The coefficient of correlation lies between -1 and +1; symbolically, $-1 \le r \le +1$ or $|r| \le 1$.
2) The coefficient of correlation is independent of change of scale and origin of the variables X and Y.
3) The coefficient of correlation is the geometric mean of the two regression coefficients.
Symbolically, $r = \pm\sqrt{b_{xy} \times b_{yx}}$, where $b_{xy}$ and $b_{yx}$ are the two regression coefficients and r takes their common sign.

RANK CORRELATION COEFFICIENT
This method of finding out covariability, or the lack of it, between two variables was developed by the British psychologist Charles Edward Spearman in 1904. This measure is especially useful when quantitative measures for certain factors cannot be fixed (like the evaluation of leadership ability or the judgment of female beauty), but the individuals in the group can be arranged in order, thereby obtaining for each individual a number indicating his rank in the group. In any event, the rank correlation coefficient is applied to a set of ordinal rank numbers, with 1 for the individual ranked first in quantity or quality, and so on, to n for the individual ranked last in a group of n individuals.

Spearman’s rank correlation is defined as:

$$ R = 1 - \frac{6 \sum D^2}{N^3 - N} $$

where R = rank coefficient of correlation, D = difference of ranks between paired items in the two series, and N = number of pairs.

In rank correlation we may have two types of problems:
a) Where actual ranks are given
b) Where ranks are not given

(a) Where actual ranks are given: steps required for computation
(i) Take the differences of the two ranks, i.e., (R1 - R2), and denote these differences by D.
(ii) Square these differences and obtain the total ΣD².
(iii) Apply the formula R = 1 - 6ΣD² / (N³ - N).

Problem
1. Two judges in a beauty competition rank the 12 entries as follows:
X: 1 2 3 4 5 6 7 8 9 10 11 12
Y: 12 9 6 10 3 5 4 7 8 2 11 1
What degree of agreement is there between the judgments of the two judges?
Calculation of Rank Correlation Coefficient
R1 (X)   R2 (Y)   D = R1 - R2   D²
1        12       -11           121
2        9        -7            49
3        6        -3            9
4        10       -6            36
5        3        +2            4
6        5        +1            1
7        4        +3            9
8        7        +1            1
9        8        +1            1
10       2        +8            64
11       11       0             0
12       1        +11           121
                                ΣD² = 416

ΣD² = 416, N = 12

$$ R = 1 - \frac{6 \times 416}{12^3 - 12} = 1 - \frac{2496}{1716} = 1 - 1.454 = -0.454 $$

The negative value shows a moderate degree of disagreement between the judgments of the two judges.

(b) Where ranks are not given: steps for computation
When we are given the actual data and not the ranks, it is necessary to assign the ranks. Ranks can be assigned by taking either the highest value as 1 or the lowest value as 1, but we must follow the same method for both variables.

Problems
1. Calculate Spearman’s coefficient of rank correlation for the following data:
X: 53 98 95 81 75 61 59 55
Y: 47 25 32 37 30 40 39 45

Calculation of Rank Correlation Coefficient (lowest value ranked 1)
X    R1   Y    R2   D = R1 - R2   D²
53   1    47   8    -7            49
98   8    25   1    +7            49
95   7    32   3    +4            16
81   6    37   4    +2            4
75   5    30   2    +3            9
61   4    40   6    -2            4
59   3    39   5    -2            4
55   2    45   7    -5            25
                                  ΣD² = 160

ΣD² = 160, N = 8

$$ R = 1 - \frac{6 \times 160}{8^3 - 8} = 1 - \frac{960}{504} = 1 - 1.905 = -0.905 $$

LINEAR CORRELATION
Relation between two variables that shows up on a scatter diagram as the dots roughly following a straight line; a correlation with r not equal to 0.

Multiple Correlation
Correlation of a criterion variable with two or more predictor variables.

Partial Correlation Coefficient
Measure of the degree of correlation between two variables over and above the influence of one or more other variables.

Linear and Nonlinear (Curvilinear) Correlation
If the amount of change in one variable tends to bear a constant ratio to the amount of change in another variable, the correlation is said to be linear. Correlation is non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.

Simple Correlation
Correlation in which only two variables are studied, e.g., a criterion and one other variable.
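The deviation (actual-mean) method of the first Pearson problem, the change-of-origin-and-scale check of the second, and the P.E. and S.E. formulas can be reproduced with a short Python sketch. It is an illustration added to these notes, not a prescribed procedure; the function names pearson_r, standard_error_r and probable_error_r are hypothetical, and r is recomputed from the data rather than taken as the rounded 0.959.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r using deviations from the actual means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Steps (i) and (iii): deviations from the actual means
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Steps (ii), (iv) and (v): the totals sum(x^2), sum(y^2), sum(xy)
    sum_x2 = sum(d * d for d in dx)
    sum_y2 = sum(d * d for d in dy)
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    # Step (vi): r = sum(xy) / sqrt(sum(x^2) * sum(y^2))
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


def standard_error_r(r, n):
    """S.E. of r = (1 - r^2) / sqrt(N)."""
    return (1 - r ** 2) / math.sqrt(n)


def probable_error_r(r, n):
    """P.E. of r = 0.6745 * S.E. of r."""
    return 0.6745 * standard_error_r(r, n)


# Problem 1: deviation method
X = [6, 8, 12, 15, 18, 20, 24, 28, 31]
Y = [10, 12, 15, 15, 18, 25, 22, 26, 28]
r1 = pearson_r(X, Y)
print(round(r1, 3))                      # 0.959
print(round(r1 ** 2, 3))                 # coefficient of determination

# Problem 2: coding the data does not change r
X2 = list(range(300, 701, 50))
Y2 = list(range(800, 1601, 100))
X2_coded = [x / 50 - 10 for x in X2]     # X' = X/50, deviation from 10
Y2_coded = [y / 100 - 12 for y in Y2]    # Y' = Y/100, deviation from 12
print(pearson_r(X2, Y2), pearson_r(X2_coded, Y2_coded))   # 1.0 1.0

# P.E. and S.E. of r for Problem 1 (N = 9 pairs)
print(round(probable_error_r(r1, len(X)), 4), round(standard_error_r(r1, len(X)), 4))
```

The coded series give the same r as the raw series, which is property 2 of the correlation coefficient in action.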
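Both kinds of rank-correlation problem worked above, ranks given and ranks not given, can be sketched in Python with the formula R = 1 - 6ΣD²/(N³ - N). This sketch assumes there are no tied values (tied scores would need average ranks, which it does not handle), and the helper names spearman_rank_correlation and ranks_lowest_first are hypothetical.

```python
def spearman_rank_correlation(r1, r2):
    """R = 1 - 6 * sum(D^2) / (N^3 - N) for two lists of ranks."""
    if len(r1) != len(r2) or len(r1) < 2:
        raise ValueError("need two rank lists of equal length")
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


def ranks_lowest_first(values):
    """Rank 1 = lowest value. Assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks; not handled here")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


# (a) Ranks given: the two judges
judge_x = list(range(1, 13))
judge_y = [12, 9, 6, 10, 3, 5, 4, 7, 8, 2, 11, 1]
print(round(spearman_rank_correlation(judge_x, judge_y), 4))   # -0.4545

# (b) Ranks not given: rank both variables by the same rule first
X = [53, 98, 95, 81, 75, 61, 59, 55]
Y = [47, 25, 32, 37, 30, 40, 39, 45]
R1 = ranks_lowest_first(X)
R2 = ranks_lowest_first(Y)
print(R1, R2)
print(round(spearman_rank_correlation(R1, R2), 3))             # -0.905
```

Printed to four decimals the judges' value is -0.4545, which the worked example truncates to -0.454.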
MODULE – 2
NON PARAMETRIC TESTS

A parameter is a quantity (such as the mean or standard deviation) that characterizes a statistical population and that can be estimated by calculations from sample data. Statisticians have developed several types of tests for the purpose of testing hypotheses. A test of hypothesis is also known as a test of significance. Tests of hypothesis can be classified into two categories:
i) Parametric tests, or standard tests of hypothesis.
ii) Non-parametric tests, or distribution-free tests of hypothesis.

Parametric tests are usually based on certain properties of the parent population from which we draw samples. Their assumptions are:
1. The samples are drawn from a normal population.
2. The sample size is large.
3. Assumptions about population parameters like the mean, variance, etc. must hold good.

But there are situations when an investigator does not want to make such assumptions. In such situations, statistical methods for testing hypotheses that are not based on assumptions about the parameters of the parent population are used; these are the non-parametric tests. Most non-parametric tests require only nominal or ordinal measurement, whereas parametric tests require measurement equivalent to at least an interval scale. Parametric methods make more assumptions than non-parametric methods.

Some of the important non-parametric or distribution-free tests are:
1) Tests of hypothesis concerning some location parameter for the given data (one-sample sign test).
2) Tests of hypothesis concerning the difference among two or more sets of data, such as the two-sample sign test, the rank-sum test and the signed-rank test.
3) Tests of hypothesis of the relationship between variables, such as rank correlation, Kendall’s coefficient of concordance and other tests for dependence.
4) Tests of hypothesis concerning variation in the given data, i.e., tests analogous to ANOVA, namely the Kruskal-Wallis test.
5) Tests of the randomness of a sample based on the theory of runs (namely, the one-sample runs test).
6) Tests of hypothesis to determine whether categorical data show dependency or not. The chi-square test can also be used to compare a theoretical population with actual data when categories are used.

In statistical tests two kinds of assertions are involved:
1) an assertion directly related to the purpose of the investigation, and
2) other assertions needed to make a probability statement.
The former is the assertion to be tested and is technically called a hypothesis, whereas the set of all other assertions is called the model. When we apply a test without a model, it is known as a distribution-free test or non-parametric test. Such tests do not make assumptions about the parameters of the population and thus do not make use of the parameters of the distribution. In other words, under a non-parametric or distribution-free test we do not assume that a particular distribution is applicable, or that a certain value is attached to a parameter of the population. For instance, while testing two training methods, say A and B, to determine the superiority of one over the other, if we do not assume that the scores of the trainees are normally distributed, or that the mean score of all trainees taking method A would be a certain value, then the testing method is known as a distribution-free or non-parametric test. In fact, there is a growing use of such tests in situations where the normality assumption is open to doubt. As a result, many distribution-free tests have been developed that do not depend on the shape of the distribution or on the parameters of the underlying population.

ADVANTAGES OF NON PARAMETRIC TESTS
Non-parametric tests are distribution-free, i.e., they do not require assumptions about the population, such as normality of the population or a particular shape of its distribution.
Generally they are simple to understand and easy to apply, even when sample sizes are small. Most non-parametric tests do not require lengthy calculations and hence are less time-consuming.
1) Non-parametric tests are applicable to all types of data, qualitative or quantitative.
2) Many non-parametric tests make it possible to work with very small samples, which is particularly helpful to researchers conducting pilot studies and to medical researchers working with rare diseases.
3) Non-parametric methods have less stringent assumptions than the classical procedures.

DISADVANTAGES OF NON PARAMETRIC TESTS
1) If all the assumptions of a parametric test are in fact met in the data and the measurement is of the required strength, non-parametric tests are wasteful of data, because they do not make use of the full information in the data.
2) There are no non-parametric methods for testing interactions in ANOVA.
3) Tables of critical values may not be easily available.
4) Non-parametric techniques sometimes lack sensitivity or power, because they produce confidence intervals that are too wide.

USES OF NON PARAMETRIC TESTS
1) When a quick or preliminary data analysis is needed.
2) When the assumptions of a distribution or of a parametric procedure are not satisfied, or it is unknown whether they are satisfied.
3) When the data are only roughly scaled (qualitative data: nominal or ordinal).
4) When the basic question of interest is distribution-free or non-parametric in nature.

CHI-SQUARE TEST
The chi-square distribution was first discovered by Helmert in 1875 and then rediscovered independently by Karl Pearson in 1900, who applied it as a test of ‘goodness of fit’. Any fitting problem is mainly viewed as a problem of finding theoretical or expected frequencies corresponding to a set of observed frequencies. Let O1, O2, ..., Ok denote a set of observed frequencies and let E1, E2, ..., Ek be the corresponding set of expected frequencies.
We know that

$$ \sum_{i=1}^{k} O_i = \sum_{i=1}^{k} E_i = N $$

The chi-square statistic for the test of goodness of fit is defined as

$$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{(k-r-1)} $$

where k is the number of cells and r is the number of parameters estimated. For example, in the Poisson case we estimate only one parameter, namely λ (lambda), the mean of the distribution. In this case r = 1 and we have

$$ \chi^2 \sim \chi^2_{(k-1-1)} = \chi^2_{(k-2)} $$

If no parameter is estimated, then $\chi^2 \sim \chi^2_{(k-1)}$.

From the table, find the value $\chi^2_\alpha$ corresponding to the level of significance α and (k - r - 1) degrees of freedom. If the calculated value of χ² is less than $\chi^2_\alpha$, we accept the null hypothesis H0 that the fit is good. This means that there is not much difference between the sets of observed and expected frequencies, and the difference found is not significant.

Some conditions must be satisfied before the χ² test of goodness of fit is applied:
1) Total frequency N ≥ 50.
2) All the expected frequencies should be ≥ 5. If some of them are less than 5, the corresponding cells are merged with their adjacent cells to make the expected frequency greater than or equal to 5; k is then the number of cells remaining.

The test statistic can also be written as

$$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{k} \frac{O_i^2}{E_i} - N \sim \chi^2_{(k-r-1)} $$

χ² test of independence of attributes
Consider two attributes A and B. We want to test the hypothesis that A and B are independent against the alternative that they are not independent. Suppose that attribute A is divided into r classes and B into s classes. We then have an r × s contingency table with an observed frequency in each cell. The table is represented as follows.

A / B    B1     B2     ...   Bs     Total
A1       F11    F12    ...   F1s    F1.
A2       F21    F22    ...   F2s    F2.
A3       F31    F32    ...   F3s    F3.
...      ...    ...    ...   ...    ...
Ar       Fr1    Fr2    ...   Frs    Fr.
Total    F.1    F.2    ...   F.s    N

We also have

F1. + F2. + ... + Fr. = F.1 + F.2 + ... + F.s = N

To find the expected frequencies: since under the null hypothesis we assume that the attributes A and B are independent, the expected frequency of the cell (A1, B1) can be evaluated as follows:

$$ P(A_1 \cap B_1) = P(A_1) \times P(B_1) = \frac{F_{1.}}{N} \times \frac{F_{.1}}{N} $$

$$ E(F_{11}) = N \times \frac{F_{1.}}{N} \times \frac{F_{.1}}{N} = \frac{F_{1.} \times F_{.1}}{N} $$

Similarly for E(F12), and so on. In general, for the event AiBj (i = 1, 2, ..., r; j = 1, 2, ..., s), i.e., finding an individual having attributes Ai and Bj at the same time,

$$ E(F_{ij}) = \frac{F_{i.} \times F_{.j}}{N} $$

We then calculate the χ² statistic. If the calculated value is greater than the table value corresponding to (r - 1)(s - 1) degrees of freedom, we reject the null hypothesis that the attributes are independent.

2 × 2 contingency table
When the observations are given in the form of a 2 × 2 contingency table with cells

a   b
c   d

the χ² statistic is

$$ \chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} \sim \chi^2_{(1)} $$

The table value of χ²(1) is 3.841 at the 5% level of significance and 6.635 at the 1% level.

The χ² test of homogeneity of proportions is concerned with the question: are the samples coming from homogeneous populations (homogeneous with respect to a certain classification)? The null hypothesis states that all the populations are identical, against the alternative that they are not. To test this hypothesis, we assume that each of the k populations is further subdivided into s categories. The expected frequencies corresponding to the observed frequencies are calculated as

$$ E_{ij} = \frac{(i\text{th row total}) \times (j\text{th column total})}{N} $$

If the calculated value of χ² is less than the table value corresponding to (k - 1)(s - 1) d.f., accept the null hypothesis that the k populations are homogeneous; otherwise reject H0.

YATES CORRECTION
When cell frequencies are small and χ² is just at the significance level, the correction suggested by Yates, popularly known as Yates’ correction, is applied.
It involves reducing the deviation of observed from expected frequencies, which of course reduces the value of χ². The rule of correction is to adjust the observed frequency in each cell of a 2 × 2 contingency table in such a way as to reduce the deviation of the observed from the expected frequency for that cell by 0.5; this adjustment is made in all the cells without disturbing the marginal totals. The formula for finding the value of χ² after applying Yates’ correction can be stated as

$$ \chi^2_{\text{corrected}} = \frac{N\left(|ad - bc| - \frac{N}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)} $$

If we use the usual formula $\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, Yates’ correction can be applied as under:

$$ \chi^2_{\text{corrected}} = \frac{(|O_1 - E_1| - 0.5)^2}{E_1} + \frac{(|O_2 - E_2| - 0.5)^2}{E_2} + \cdots $$

It may be emphasized that Yates’ correction is made only in the case of a 2 × 2 table, and that too when cell frequencies are small.

CONVERSION OF χ² INTO COEFFICIENT OF CONTINGENCY
The χ² value may also be converted into the coefficient of contingency, especially in the case of contingency tables of higher order than 2 × 2, to study the magnitude of the relation or the degree of association between two attributes, as shown below.

$$ C = \sqrt{\frac{\chi^2}{\chi^2 + N}} $$

While finding the value of C, we proceed on the assumption of the null hypothesis that the two attributes are independent and exhibit no association. The coefficient of contingency is also known as the coefficient of mean square contingency. This measure also comes under the category of non-parametric measures of relationship.

PROBLEM 1.
A die is thrown 600 times, and the frequencies of the face numbers are as follows. Test whether the die is unbiased.

Face (X)   Observed (Oi)   Expected (Ei)
1          92              100
2          87              100
3          90              100
4          110             100
5          113             100
6          108             100
Total      600             600

H0: The die is unbiased (so each Ei = 600/6 = 100).
H1: The die is biased.

$$ \chi^2 = \sum \frac{O_i^2}{E_i} - N = \frac{60666}{100} - 600 = 606.66 - 600 = 6.66 $$

No parameter is estimated (r = 0), so χ² ~ χ²(k - r - 1) = χ²(6 - 0 - 1) = χ²(5). The table value of χ²(5) at the 5% level is 11.070. The calculated value is less than the table value. Hence we accept H0: the die is unbiased.

2. The following table relates to marital status and performance in an examination. Test whether performance depends on marital status.

Performance   Married   Unmarried   Total
Good          60        80          140
Bad           20        40          60
Total         80        120         200

H0: Performance and marital status are independent.
H1: Performance and marital status are not independent.

$$ \chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} = \frac{200\,(60 \times 40 - 80 \times 20)^2}{(60+80)(20+40)(60+20)(80+40)} = \frac{800^2 \times 200}{80640000} = 1.5873 $$

The table value of χ²(1) at the 5% level is 3.841. Since the calculated value is less than the table value, we accept H0: performance and marital status are independent.

Important Characteristics of χ² Test
1. This test is based on frequencies and not on parameters like the mean and SD.
2. The test is used for testing hypotheses and is not useful for estimation. This test possesses the additive property.
3. This test is an important non-parametric test, as no rigid assumptions are necessary regarding the type of population, no parameter values are needed, and relatively little mathematical detail is involved.

Median Test
The median test is a non-parametric test that may be used to test the null hypothesis that two independent samples have been drawn from populations with equal medians. Here the null hypothesis is H0: the two population medians are equal, against the alternative that they are not equal.
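The goodness-of-fit statistic used in the die problem above can be sketched in Python. Both forms of the formula, Σ(O - E)²/E and ΣO²/E - N, are computed so that they can be compared; the function names are hypothetical, and the table value 11.070 is copied from the text rather than computed.

```python
def chi_square_gof(observed, expected):
    """Goodness of fit: sum of (O - E)^2 / E over the k cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_gof_shortcut(observed, expected):
    """Equivalent form sum(O^2 / E) - N, valid when sum(O) == sum(E) == N."""
    n = sum(observed)
    return sum(o * o / e for o, e in zip(observed, expected)) - n


observed = [92, 87, 90, 110, 113, 108]
n = sum(observed)
expected = [n / 6] * 6                 # unbiased die: 600 / 6 = 100 per face
df = len(observed) - 0 - 1             # k - r - 1 with r = 0 parameters estimated
chi2 = chi_square_gof(observed, expected)
print(round(chi2, 2), round(chi_square_gof_shortcut(observed, expected), 2), df)  # 6.66 6.66 5

TABLE_VALUE_5_PERCENT = 11.070         # chi-square(5) at 5%, as given in the text
print("accept H0" if chi2 < TABLE_VALUE_5_PERCENT else "reject H0")              # accept H0
```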
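For contingency tables, each expected frequency is (row total × column total)/N. A minimal Python sketch, with hypothetical function names, computes χ² this way for the marital-status table, checks it against the 2 × 2 shortcut formula, and adds Yates’ correction and the coefficient of contingency C = √(χ²/(χ² + N)). The median test introduced just above also ends in a χ² computed from such a 2 × 2 table.

```python
import math


def expected_frequencies(table):
    """E_ij = (row i total) * (column j total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[ri * cj / n for cj in col_totals] for ri in row_totals]


def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an r x s table of counts."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi_square_2x2(a, b, c, d, yates=False):
    """Shortcut for the 2 x 2 table [[a, b], [c, d]], optionally with Yates' correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Subtract N/2; clamping at 0 is this sketch's own choice
        diff = max(diff - n / 2, 0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))


# Rows: Good, Bad; columns: Married, Unmarried
table = [[60, 80], [20, 40]]
chi2, df = chi_square_independence(table)
print(round(chi2, 4), df)                                     # 1.5873 1
print(round(chi_square_2x2(60, 80, 20, 40), 4))               # 1.5873
print(round(chi_square_2x2(60, 80, 20, 40, yates=True), 4))   # 1.2153
print(round(contingency_coefficient(chi2, 200), 4))
```

As the text states, the Yates-corrected value is smaller than the uncorrected one. Clamping |ad - bc| - N/2 at zero is a choice made in this sketch so that a very small |ad - bc| is not over-corrected; the text does not discuss that case.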
Assumptions
a) Samples are selected independently and at random from their respective populations.
b) The populations are of the same form, differing only in location.
c) The variable of interest is continuous.
d) The level of measurement is at least ordinal.
e) The two samples need not have equal size.
f) The test statistic approximately follows a chi-square distribution with 1 degree of freedom.

Calculation of test statistic
1) Compute the common median of the two samples combined.
2) Determine for each group the number of observations falling above and below the common median. The resulting frequencies are arranged in a 2 × 2 table, from which χ² is computed.

PROBLEMS
1. Members of a random sample of 12 male students from a rural junior high school and an independent random sample of 16 male students from an urban junior high school were given a test to measure the level of mental health. Test whether there is a difference in their average scores.

Level of mental health
Urban (16 students): 35 26 27 21 27 38 23 25 25 27 45 46 33 26 45 41
Rural (12 students): 29 50 43 22 42 47 42 32 50 37 34 31

Observations in ascending order:
21 22 23 25 25 26 26 27 27 27 29 31 32 33 34 35 37 38 41 42 42 43 45 45 46 47 50 50
Common median = (33 + 34)/2 = 33.5

H0: The two medians (averages) are equal.
H1: The two medians (averages) are not equal.

                                      Rural   Urban   Total
No. of observations above median      8       6       14
No. of observations below median      4       10      14
Total                                 12      16      28

$$ \chi^2 = \frac{28\,(8 \times 10 - 6 \times 4)^2}{14 \times 14 \times 12 \times 16} = \frac{87808}{37632} = 2.33 $$

The table value of χ²(1) at the 5% level is 3.841. Since the calculated value is less than the table value, we accept H0; hence the two averages are equal.

MANN-WHITNEY U TEST
The Mann-Whitney U test is considered more powerful than the median test for comparing data of two unrelated samples. Whereas the median test helps only in comparing the central tendencies of the populations from which the samples have been drawn, the U test is capable of testing differences between population distributions in many aspects other than central tendency. The null hypothesis to be tested is that there is no difference between the distributions of the two samples. The procedure first requires determining the value of U by counting how many scores from sample A precede each score of sample B, and then comparing it with the critical value of U read from a table for the required level of significance and the given N1 and N2. The procedure for computing U differs slightly for moderately large samples (N1 or N2 between 9 and 20). For large samples, we first convert U into a Z value and then use the Z value for accepting or rejecting H0.

PROBLEMS
Two plastics, each produced by a different process, were tested for ultimate strength. The measurements in the accompanying table represent breaking loads in units of 1000 pounds per square inch. Do the data present evidence of a difference between the locations of the distributions of ultimate strength for the two plastics? Test by using the Mann-Whitney U test with level of significance α = .10.

Plastic 1: 15.3 18.7 22.3 17.6 19.1 14.8
Plastic 2: 21.2 22.4 18.3 19.3 17.1 27.7

H0: The locations of the distributions of the strength of the two plastics are the same.
H1: They are different.

Arranging the observations in ascending order:
14.8 15.3 17.1 17.6 18.3 18.7 19.1 19.3 21.2 22.3 22.4 27.7

Counting, for each Plastic 2 value in ascending order (17.1, 18.3, 19.3, 21.2, 22.4, 27.7), how many Plastic 1 values precede it:
U = 2 + 3 + 5 + 5 + 6 + 6 = 27

For n1 = n2 = 6 and α = .10, the upper critical value is U0 = 29, so the lower critical value is n1n2 - U0 = 6 × 6 - 29 = 7. Since 7 < 27 < 29, U does not fall in the rejection region, and we accept H0: the locations of the distributions of the strength of the two plastics are the same.

SIGN TEST
The sign test may be used for testing the significance of differences between two correlated samples in which the data are available either in ordinal measurement or simply expressed in terms of positive and negative signs, showing the directions of differences existing between the observed scores of matched pairs. The null hypothesis tested here is that the median change is 0, i.e., there are equal numbers of positive and negative signs.
If the number of matched pairs is 10 or fewer, the test of significance uses the binomial probability distribution table. If the number is more than 10, the distribution is assumed to be normal, and Z values are used for rejecting or accepting Ho.
PROBLEM

Pair No     1    2    3    4    5    6    7    8    9    10   11   12
X          1.5  2    3.5  3    3.5  2.5  2    1.5  1.5  2    3    2
Y          2    2    4    2.5  4    3    3.5  3    2.5  2.5  2.5  2.5
Sign (Y-X)  +    0    +    -    +    +    +    +    +    +    -    +

Pair 2 has a zero difference and is dropped, leaving n = 11. The less frequent sign (-) occurs K = 2 times, and under Ho, K ~ B(11, ½).
Ho: The two samples are the same.
H1: They are different.
P(K ≤ 2) = [C(11, 0) + C(11, 1) + C(11, 2)]/2^11 = 67/2048 = 0.0327. At α = 0.05 (two-tailed) this is compared with 0.05/2 = 0.025. Since 0.0327 > 0.025, we accept Ho.
FORMULA FOR LARGE SAMPLE SIGN TEST
If the number of observations is large, the sign test uses the test statistic
$Z = \frac{K - n/2}{\sqrt{n}/2}$
where K is the number of signs of one kind and n is the number of non-zero differences.
Sign test for paired data: when the data to be analyzed consist of observations in matched pairs and the assumptions underlying the paired t test are not met, or the measurement scale is not adequate, the sign test may be employed to test the null hypothesis that the median difference is 0.
WILCOXON MATCHED-PAIRS SIGNED-RANKS TEST
This test is considered more powerful than the sign test because it takes into account the magnitude as well as the direction of the differences between matched pairs. Here we compute the statistic T, compare it with the critical value of T read from a table for a particular significance level, and draw inferences about rejecting or accepting Ho. If the number of matched pairs is more than 25, we first convert the computed T into a Z score and then, as usual, use this Z for testing the significance.
PROBLEM
Charles Darwin conducted an experiment to determine whether self-fertilized and cross-fertilized plants have different growth rates. Pairs of maize (Zea mays) plants, one self-fertilized and the other cross-fertilized, were planted in plots, and their heights were measured after a specified period of time.
The data Darwin obtained were:

Pair   Cross   Self    Di   Rank of |Di|
1      188     139     49   11
2       96     163    -67   14
3      168     160      8    2
4      176     160     16    4
5      153     147      6    1
6      172     149     23    5
7      177     149     28    7
8      163     122     41    9
9      146     132     14    3
10     173     144     29    8
11     186     130     56   12
12     168     144     24    6
13     177     102     75   15
14     184     124     60   13
15      96     144    -48   10

Ho: Cross-fertilized and self-fertilized plants have the same growth rate.
H1: They have different growth rates.
Sum of the ranks of the positive differences:
T+ = 11 + 2 + 4 + 1 + 5 + 7 + 9 + 3 + 8 + 12 + 6 + 15 + 13 = 96
With n = 15, the normal approximation gives
$Z = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} = \frac{96 - 60}{\sqrt{310}} = 2.0447$
Since the calculated value is greater than the table value (1.96 at the 0.05 level), we reject Ho.
LARGE SAMPLE APPROXIMATION TO MANN-WHITNEY U TEST
When either n1 or n2 is greater than 20, we cannot use the Mann-Whitney U table to obtain the critical value. In this case we may compute the large-sample test statistic
$Z = \frac{U - n_1 n_2/2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}$
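The two matched-pairs procedures in this module can be checked with one short Python sketch: an exact binomial probability for the sign-test problem (12 pairs of X and Y scores) and the signed-rank sums with their normal approximation for Darwin's data. The function names `sign_test` and `signed_rank_sums` are hypothetical. Zero differences are dropped, as in the sign-test problem, and tied absolute differences receive the average of their ranks; that is the usual convention but an assumption here, and no such ties occur in Darwin's data.

```python
from math import comb, sqrt

# Sign test: the 12 matched pairs of X and Y scores
x = [1.5, 2, 3.5, 3, 3.5, 2.5, 2, 1.5, 1.5, 2, 3, 2]
y = [2, 2, 4, 2.5, 4, 3, 3.5, 3, 2.5, 2.5, 2.5, 2.5]

def sign_test(x, y):
    """Hypothetical helper: n, K and P(K or fewer of the rarer sign), K ~ B(n, 1/2)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(diffs)
    plus = sum(d > 0 for d in diffs)
    k = min(plus, n - plus)                            # count of the less frequent sign
    p = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n, k, p

n, k, p = sign_test(x, y)
print(n, k, round(p, 4))  # 11 2 0.0327  (compared with 0.05 / 2 = 0.025)

# Wilcoxon matched-pairs signed-ranks test: Darwin's data
cross = [188, 96, 168, 176, 153, 172, 177, 163, 146, 173, 186, 168, 177, 184, 96]
selfed = [139, 163, 160, 160, 147, 149, 149, 122, 132, 144, 130, 144, 102, 124, 144]

def signed_rank_sums(a, b):
    """Hypothetical helper: n, T+ and T- after ranking the absolute differences."""
    d = [u - v for u, v in zip(a, b) if u != v]        # drop zero differences
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1                                     # extend the block of tied |d|
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1        # average of ranks i+1 .. j+1
        i = j + 1
    t_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    t_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return len(d), t_plus, t_minus

n_w, t_plus, t_minus = signed_rank_sums(cross, selfed)
mean_t = n_w * (n_w + 1) / 4                           # 60 for n = 15
sd_t = sqrt(n_w * (n_w + 1) * (2 * n_w + 1) / 24)      # sqrt(310) for n = 15
z = (t_plus - mean_t) / sd_t
print(n_w, t_plus, t_minus, round(z, 4))  # 15 96.0 24.0 2.0447
print(abs(z) > 1.96)                      # True: reject Ho at the 0.05 level
```

T+ and T- always add up to n(n + 1)/2, which is 120 for n = 15, so T- = 24 is a quick check on the hand calculation of T+ = 96.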