GAM 470

Chi Squared Test

 

The chi-squared test is useful for comparing actual observations to expectations. Observations must be counted one at a time, so the total number of observations is always an integer. For example, to test the fairness of a keno game you might record the outcome of each ball drawn, keeping a tally for each number from 1 to 80. Each trial must also be independent.

Roulette

For example, suppose your roulette wheel produced the following outcomes over the last 1000 spins.

Outcome   Observations
1         20
2         25
3         23
4         38
5         27
6         30
7         30
8         22
9         28
10        21
11        21
12        28
13        15
14        29
15        27
16        30
17        19
18        29
19        31
20        26
21        26
22        21
23        21
24        30
25        36
26        24
27        28
28        18
29        26
30        24
31        27
32        23
33        31
34        31
35        23
36        27
0         34
0-0       31
Total     1000

 

Here is a graph of the results.

 

In roulette every number is equally likely, so the expected number of observations for each number on the roulette wheel is 1000/38 = 26.31579. Next let’s define some terms:

$a_i$ = actual number of observations for the number $i$.

$e_i$ = expected number of observations for the number $i$.

The chi-squared statistic for any sample is

$$\chi^2 = \sum_{i=1}^{n} \frac{(a_i - e_i)^2}{e_i}$$

where $n$ is the number of possible outcomes.
For example, for the number 1 there were 20 observations and 26.31579 expected. So the "measure of deviation" (to make up my own term) for the number 1 is $(20 - 26.31579)^2 / 26.31579 = 1.515789$.
The chi-squared statistic is the sum of all the measures of deviation, as follows.

Outcome   Observations   Expected   Measure of Deviation
1         20             26.3158    1.5158
2         25             26.3158    0.0658
3         23             26.3158    0.4178
4         38             26.3158    5.1878
5         27             26.3158    0.0178
6         30             26.3158    0.5158
7         30             26.3158    0.5158
8         22             26.3158    0.7078
9         28             26.3158    0.1078
10        21             26.3158    1.0738
11        21             26.3158    1.0738
12        28             26.3158    0.1078
13        15             26.3158    4.8658
14        29             26.3158    0.2738
15        27             26.3158    0.0178
16        30             26.3158    0.5158
17        19             26.3158    2.0338
18        29             26.3158    0.2738
19        31             26.3158    0.8338
20        26             26.3158    0.0038
21        26             26.3158    0.0038
22        21             26.3158    1.0738
23        21             26.3158    1.0738
24        30             26.3158    0.5158
25        36             26.3158    3.5638
26        24             26.3158    0.2038
27        28             26.3158    0.1078
28        18             26.3158    2.6278
29        26             26.3158    0.0038
30        24             26.3158    0.2038
31        27             26.3158    0.0178
32        23             26.3158    0.4178
33        31             26.3158    0.8338
34        31             26.3158    0.8338
35        23             26.3158    0.4178
36        27             26.3158    0.0178
0         34             26.3158    2.2438
0-0       31             26.3158    0.8338
Total     1000           1000       35.1200

So the chi-squared statistic is 35.12.
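If you would rather not add up 38 terms by hand, the same sum can be sketched in a few lines of Python. This is only an illustration of the formula above; the names observed, expected, and chi_squared_statistic are made up for this sketch and do not come from any library.

```python
# Observed counts from the roulette table, in the order 1..36, then 0, then 0-0.
observed = [20, 25, 23, 38, 27, 30, 30, 22, 28, 21, 21, 28,
            15, 29, 27, 30, 19, 29, 31, 26, 26, 21, 21, 30,
            36, 24, 28, 18, 26, 24, 27, 23, 31, 31, 23, 27,
            34, 31]


def chi_squared_statistic(actual, expected):
    """Sum of (a_i - e_i)^2 / e_i over all possible outcomes."""
    return sum((a - e) ** 2 / e for a, e in zip(actual, expected))


total_spins = sum(observed)  # 1000 spins in the sample

# On a fair wheel every one of the 38 numbers is equally likely.
expected = [total_spins / len(observed)] * len(observed)

statistic = chi_squared_statistic(observed, expected)
print(round(statistic, 2))  # 35.12
```

Passing the expected counts in as a list, rather than assuming they are all equal, lets the same function handle tests where the outcomes are not equally likely.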
Before you do anything with this you have to know the number of degrees of freedom. This is simply the number of possible outcomes of the test minus 1. In this case 38-1 = 37.
You can compare this statistic to chi-squared tables available in any respectable introductory statistics book. Tables are also available at these websites:
http://www.richland.edu/james/lecture/m170/tbl-chi.html
http://www.stat.ualberta.ca/~schmu/stat151/tables/chi-table.html
http://perdana.fsktm.um.edu.my/~tehyw/Principles%20of%20Biology%20-%20BIOL1-BIOL2%20%20Chi-Squared%20Table.htm
http://www.felixgrant.co.uk/resource/mathbits/chi-sqr-tbl.htm
There is also a chi-squared calculator here:
http://www.mirrorservice.org/sites/home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/goodness.htm
When I was in college we had to use statistics tables because there was no such thing as online calculators or Excel. However, it is much easier today to do a chi-squared test in Excel. In any cell put =chidist(35.12,37)
That should produce a "p" value of about 0.5574 = 55.74%.
What does this 0.5574 mean? It means that if you ran the same test on a guaranteed fair wheel, the probability of getting a distribution more skewed than what you observed is 55.74%.
A p value close to 50% means the degree of skewness is about what you would expect, as is the case with the roulette sample. A value close to 1 means the results fall very close to expectations, with less skewness than expected. A value close to 0 means the results are heavily skewed and the game may be biased.
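For readers without Excel, the tail probability that chidist returns can be sketched in plain Python. The function below, chi2_p_value (a made-up name), assumes the degrees of freedom are a positive integer and uses the standard finite-series forms of the chi-squared upper tail. Treat it as an illustration rather than a replacement for a tested statistics library; if SciPy happens to be installed, scipy.stats.chi2.sf(35.12, 37) should give the same answer.

```python
import math


def chi2_p_value(statistic, dof):
    """Upper-tail probability of a chi-squared variable (integer dof >= 1)."""
    if statistic <= 0:
        return 1.0
    half = statistic / 2.0

    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total

    # Odd dof: a two-sided normal tail plus a finite series.
    root = math.sqrt(statistic)
    normal_tail = math.erfc(root / math.sqrt(2.0))  # 2 * P(Z > root)
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term = root  # root^(2r-1) / (1*3*...*(2r-1)) for r = 1
    series = 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= statistic / (2 * r - 1)
        series += term
    return normal_tail + 2.0 * density * series


# Roulette sample: statistic 35.12 with 38 - 1 = 37 degrees of freedom.
print(f"{chi2_p_value(35.12, 37):.4f}")
```

For the roulette sample this prints a value of roughly 0.557, in line with the Excel result.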
Dice
The possible outcomes do not need to be equally likely. Let’s test the total in the roll of two dice. The following table shows the actual results and the expected totals based on 1000 rolls.

Dice Total   Observations   Expected
2            28             27.77778
3            74             55.55556
4            83             83.33333
5            108            111.1111
6            150            138.8889
7            159            166.6667
8            122            138.8889
9            115            111.1111
10           78             83.33333
11           47             55.55556
12           36             27.77778
Total        1000           1000

The following graph shows the same results.
The next table shows the chi-squared statistic calculations.

Dice Total   Observations   Probability   Expected    Measure of Deviation
2            28             0.0278        27.7778     0.0018
3            74             0.0556        55.5556     6.1236
4            83             0.0833        83.3333     0.0013
5            108            0.1111        111.1111    0.0871
6            150            0.1389        138.8889    0.8889
7            159            0.1667        166.6667    0.3527
8            122            0.1389        138.8889    2.0537
9            115            0.1111        111.1111    0.1361
10           78             0.0833        83.3333     0.3413
11           47             0.0556        55.5556     1.3176
12           36             0.0278        27.7778     2.4338
Total        1000           1.0000        1000.0000   13.7378

 
There are a total of 11 different outcomes, so the number of degrees of freedom is 11-1 = 10. The p value for a chi-squared statistic of 13.7378 and 10 degrees of freedom is chidist(13.7378,10) = 0.1853 = 18.53%. So the probability that a pair of fair dice would produce results more skewed than this over the same test is 18.53%.
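The whole dice test can be sketched in Python as a cross-check. The sketch below counts the ways to roll each total to get the probabilities, builds the expected counts, and then sums the measures of deviation. Because 10 degrees of freedom is an even number, the p value has a short closed form, which is used here in place of chidist. All of the variable names are made up for the illustration.

```python
import math
from collections import Counter
from itertools import product

# Observed totals from the 1000 rolls in the table.
observed = {2: 28, 3: 74, 4: 83, 5: 108, 6: 150, 7: 159,
            8: 122, 9: 115, 10: 78, 11: 47, 12: 36}

# Count how many of the 36 equally likely (die 1, die 2) pairs give each total.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

rolls = sum(observed.values())
expected = {total: rolls * ways[total] / 36 for total in observed}

# Chi-squared statistic: sum of (actual - expected)^2 / expected.
statistic = sum((observed[t] - expected[t]) ** 2 / expected[t] for t in observed)

# Degrees of freedom: number of possible outcomes minus 1.
dof = len(observed) - 1

# Closed form for an even number of degrees of freedom:
# p = exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!
half = statistic / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k)
                                for k in range(dof // 2))

print(round(statistic, 4), dof, round(p_value, 3))
```

This should print a statistic of about 13.7378, 10 degrees of freedom, and a p value of roughly 0.185. The closed form only works for an even number of degrees of freedom; for an odd number you would fall back on a table, Excel, or a statistics library.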
Let’s assume your computer is broken and you have to do all the calculations by hand, but you do have the following chi-squared statistics table.
For 10 degrees of freedom our chi-squared statistic falls between the values of 15.987 and 13.442 on the table. The value of 15.987 corresponds to a p value of 0.1 and the value of 13.442 corresponds to a p value of 0.2. So this table tells us the p value for a chi-squared statistic of 13.7378 with 10 degrees of freedom falls somewhere between 0.1 and 0.2. In other words, the probability that a fair pair of dice would produce results this skewed or more is 10% to 20%. Eyeballing the results, our value of 13.7378 falls much closer to the table value of 13.442, which corresponds to p=0.2, so the true p value is going to be closer to the 20% side.
 

Degrees of Freedom   p=0.01  p=0.025   p=0.05    p=0.1    p=0.2    p=0.3    p=0.4    p=0.5    p=0.6    p=0.7    p=0.8    p=0.9
1                     6.635    5.024    3.841    2.706    1.642    1.074    0.708    0.455    0.275    0.148    0.064    0.016
2                     9.210    7.378    5.991    4.605    3.219    2.408    1.833    1.386    1.022    0.713    0.446    0.211
3                    11.345    9.348    7.815    6.251    4.642    3.665    2.946    2.366    1.869    1.424    1.005    0.584
4                    13.277   11.143    9.488    7.779    5.989    4.878    4.045    3.357    2.753    2.195    1.649    1.064
5                    15.086   12.832   11.070    9.236    7.289    6.064    5.132    4.351    3.656    3.000    2.343    1.610
6                    16.812   14.449   12.592   10.645    8.558    7.231    6.211    5.348    4.570    3.828    3.070    2.204
7                    18.475   16.013   14.067   12.017    9.803    8.383    7.283    6.346    5.493    4.671    3.822    2.833
8                    20.090   17.535   15.507   13.362   11.030    9.524    8.351    7.344    6.423    5.527    4.594    3.490
9                    21.666   19.023   16.919   14.684   12.242   10.656    9.414    8.343    7.357    6.393    5.380    4.168
10                   23.209   20.483   18.307   15.987   13.442   11.781   10.473    9.342    8.295    7.267    6.179    4.865
11                   24.725   21.920   19.675   17.275   14.631   12.899   11.530   10.341    9.237    8.148    6.989    5.578
12                   26.217   23.337   21.026   18.549   15.812   14.011   12.584   11.340   10.182    9.034    7.807    6.304
13                   27.688   24.736   22.362   19.812   16.985   15.119   13.636   12.340   11.129    9.926    8.634    7.041
14                   29.141   26.119   23.685   21.064   18.151   16.222   14.685   13.339   12.078   10.821    9.467    7.790
15                   30.578   27.488   24.996   22.307   19.311   17.322   15.733   14.339   13.030   11.721   10.307    8.547
16                   32.000   28.845   26.296   23.542   20.465   18.418   16.780   15.338   13.983   12.624   11.152    9.312
17                   33.409   30.191   27.587   24.769   21.615   19.511   17.824   16.338   14.937   13.531   12.002   10.085
18                   34.805   31.526   28.869   25.989   22.760   20.601   18.868   17.338   15.893   14.440   12.857   10.865
19                   36.191   32.852   30.144   27.204   23.900   21.689   19.910   18.338   16.850   15.352   13.716   11.651
20                   37.566   34.170   31.410   28.412   25.038   22.775   20.951   19.337   17.809   16.266   14.578   12.443
21                   38.932   35.479   32.671   29.615   26.171   23.858   21.992   20.337   18.768   17.182   15.445   13.240
22                   40.289   36.781   33.924   30.813   27.301   24.939   23.031   21.337   19.729   18.101   16.314   14.041
23                   41.638   38.076   35.172   32.007   28.429   26.018   24.069   22.337   20.690   19.021   17.187   14.848
24                   42.980   39.364   36.415   33.196   29.553   27.096   25.106   23.337   21.652   19.943   18.062   15.659
25                   44.314   40.646   37.652   34.382   30.675   28.172   26.143   24.337   22.616   20.867   18.940   16.473
26                   45.642   41.923   38.885   35.563   31.795   29.246   27.179   25.336   23.579   21.792   19.820   17.292
27                   46.963   43.195   40.113   36.741   32.912   30.319   28.214   26.336   24.544   22.719   20.703   18.114
28                   48.278   44.461   41.337   37.916   34.027   31.391   29.249   27.336   25.509   23.647   21.588   18.939
29                   49.588   45.722   42.557   39.087   35.139   32.461   30.283   28.336   26.475   24.577   22.475   19.768
30                   50.892   46.979   43.773   40.256   36.250   33.530   31.316   29.336   27.442   25.508   23.364   20.599
31                   52.191   48.232   44.985   41.422   37.359   34.598   32.349   30.336   28.409   26.440   24.255   21.434
32                   53.486   49.480   46.194   42.585   38.466   35.665   33.381   31.336   29.376   27.373   25.148   22.271
33                   54.775   50.725   47.400   43.745   39.572   36.731   34.413   32.336   30.344   28.307   26.042   23.110
34                   56.061   51.966   48.602   44.903   40.676   37.795   35.444   33.336   31.313   29.242   26.938   23.952
35                   57.342   53.203   49.802   46.059   41.778   38.859   36.475   34.336   32.282   30.178   27.836   24.797
36                   58.619   54.437   50.998   47.212   42.879   39.922   37.505   35.336   33.252   31.115   28.735   25.643
37                   59.893   55.668   52.192   48.363   43.978   40.984   38.535   36.336   34.222   32.053   29.635   26.492
38                   61.162   56.895   53.384   49.513   45.076   42.045   39.564   37.335   35.192   32.992   30.537   27.343
39                   62.428   58.120   54.572   50.660   46.173   43.105   40.593   38.335   36.163   33.932   31.441   28.196
40                   63.691   59.342   55.758   51.805   47.269   44.165   41.622   39.335   37.134   34.872   32.345   29.051
41                   64.950   60.561   56.942   52.949   48.363   45.224   42.651   40.335   38.105   35.813   33.251   29.907
42                   66.206   61.777   58.124   54.090   49.456   46.282   43.679   41.335   39.077   36.755   34.157   30.765
43                   67.459   62.990   59.304   55.230   50.548   47.339   44.706   42.335   40.050   37.698   35.065   31.625
44                   68.710   64.201   60.481   56.369   51.639   48.396   45.734   43.335   41.022   38.641   35.974   32.487
45                   69.957   65.410   61.656   57.505   52.729   49.452   46.761   44.335   41.995   39.585   36.884   33.350
46                   71.201   66.616   62.830   58.641   53.818   50.507   47.787   45.335   42.968   40.529   37.795   34.215
47                   72.443   67.821   64.001   59.774   54.906   51.562   48.814   46.335   43.942   41.474   38.708   35.081
48                   73.683   69.023   65.171   60.907   55.993   52.616   49.840   47.335   44.915   42.420   39.621   35.949
49                   74.919   70.222   66.339   62.038   57.079   53.670   50.866   48.335   45.889   43.366   40.534   36.818
50                   76.154   71.420   67.505   63.167   58.164   54.723   51.892   49.335   46.864   44.313   41.449   37.689
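The table lookup described above for the dice test can also be sketched in Python. The code below copies only the row for 10 degrees of freedom, finds the two columns that bracket the statistic, and then makes a rough straight-line guess between them. The function names bracket_p_value and interpolate_p_value are made up for this sketch, and the straight-line estimate is only an approximation, since the true p value does not vary linearly between table entries.

```python
# p values heading the table columns, and the row for 10 degrees of freedom.
P_COLUMNS = [0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
ROW_DF_10 = [23.209, 20.483, 18.307, 15.987, 13.442, 11.781,
             10.473, 9.342, 8.295, 7.267, 6.179, 4.865]


def bracket_p_value(statistic, critical_values, p_columns):
    """Return (low_p, high_p) such that the p value lies between them.

    Critical values shrink as p grows, so scan from left to right.
    """
    if statistic >= critical_values[0]:
        return 0.0, p_columns[0]
    for i in range(1, len(critical_values)):
        if statistic >= critical_values[i]:
            return p_columns[i - 1], p_columns[i]
    return p_columns[-1], 1.0


def interpolate_p_value(statistic, critical_values, p_columns):
    """Rough straight-line estimate between the two bracketing entries."""
    low_p, high_p = bracket_p_value(statistic, critical_values, p_columns)
    if low_p == 0.0 or high_p == 1.0:
        raise ValueError("statistic falls outside the range of the table")
    high_crit = critical_values[p_columns.index(low_p)]
    low_crit = critical_values[p_columns.index(high_p)]
    fraction = (high_crit - statistic) / (high_crit - low_crit)
    return low_p + fraction * (high_p - low_p)


print(bracket_p_value(13.7378, ROW_DF_10, P_COLUMNS))  # (0.1, 0.2)
print(round(interpolate_p_value(13.7378, ROW_DF_10, P_COLUMNS), 3))
```

The bracket is the part you can trust. The interpolated figure, about 0.188 here, lands on the 20% side of the bracket, which matches the eyeball estimate.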

Chi-test function. In Excel you can get the "p" value directly with the function chitest(range of actual results, range of expected results). The following image shows this function applied to the dice test described previously.