5.4. How to Sample from the Pooled Data

Self-Assessment

Terminology Review

Use the flashcards below to help you review the terminology introduced in this chapter.

Permutation Tests: Code and additional discussion for Section 5.4.2 are included here.

When we carried out the bootstrap test, we observed that almost every bootstrap sample contained repeated values from the data. This is a consequence of resampling with replacement, and the probability of at least one repeated value increases as the size of the data increases. There is also nothing to prevent the same sample groups from appearing in different iterations, although the probability of repeated sample groups decreases with the size of the data.
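
This claim is easy to check numerically. As a quick sketch (not part of the original example): the probability that a bootstrap sample of size \(n\), drawn with replacement from \(n\) distinct values, contains at least one repeated value is \(1 - n!/n^n\), which approaches 1 rapidly:

import math

# P(at least one repeat) = 1 - P(all n draws are distinct)
for n in [3, 6, 10]:
  p_repeat = 1 - math.factorial(n) / n**n
  print(f'n = {n:2d}: P(at least one repeat) = {p_repeat:.4f}')

n =  3: P(at least one repeat) = 0.7778
n =  6: P(at least one repeat) = 0.9846
n = 10: P(at least one repeat) = 0.9996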

If our goal is to conduct a statistical test that directly breaks any dependence on the assignment of the data to the two groups, then it makes more sense to consider simply shuffling, or permuting, the data among the groups, with no repetitions allowed. When the data is small, we can simply try every permutation of the data across the two groups. This is a permutation test.

For the examples here, we will use the grades data from Section 5.4:

import numpy as np

grades1 = np.array([74.1, 74.5])
grades2 = np.array([79.4, 79.0, 78.4, 79.3])

pooled = np.hstack((grades1, grades2))

For instance, with six data points, the number of permutations is \(6! = 720\). To carry out an NHST, we first compute the test difference for each permutation. Then we determine the proportion of permutations whose test differences are at least as large as the observed test difference. This proportion is the \(p\)-value under the permutation test.
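
We can verify this count directly (a quick check using the math module from the Python standard library):

import math

print(math.factorial(6))

720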

We will use the itertools module from the Python standard library to find every permutation of the data across the two groups. Let’s first see how to get every permutation of a list of sequential integers:

import itertools
for perm in itertools.permutations(range(3)):
  print(perm)
(0, 1, 2)
(0, 2, 1)
(1, 0, 2)
(1, 2, 0)
(2, 0, 1)
(2, 1, 0)

For three values, we expect \(3!=6\) permutations, which is what we observe.

Note

itertools.permutations() returns an iterator: an object that can be looped over but that does not support indexing to retrieve individual values. Thus, we will always use itertools.permutations() in a for ... in statement.
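
To illustrate (a brief sketch, not part of the original example), next() can pull individual values from the iterator, and list() can collect them all, but indexing raises a TypeError:

perms = itertools.permutations(range(3))
print(next(perms))   # retrieves one permutation at a time
# perms[0]           # raises TypeError: 'itertools.permutations' object is not subscriptable
print(list(itertools.permutations(range(3))))

(0, 1, 2)
[(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]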

To map permutations into two groups, we first permute the entire pooled data and then partition it according to the size of the two groups. For instance, here are the first ten permutations of the pooled grades data:

for i, perm in enumerate(itertools.permutations(pooled)):
  print(perm)
  if i >= 9:
    break
(74.1, 74.5, 79.4, 79.0, 78.4, 79.3)
(74.1, 74.5, 79.4, 79.0, 79.3, 78.4)
(74.1, 74.5, 79.4, 78.4, 79.0, 79.3)
(74.1, 74.5, 79.4, 78.4, 79.3, 79.0)
(74.1, 74.5, 79.4, 79.3, 79.0, 78.4)
(74.1, 74.5, 79.4, 79.3, 78.4, 79.0)
(74.1, 74.5, 79.0, 79.4, 78.4, 79.3)
(74.1, 74.5, 79.0, 79.4, 79.3, 78.4)
(74.1, 74.5, 79.0, 78.4, 79.4, 79.3)
(74.1, 74.5, 79.0, 78.4, 79.3, 79.4)

Because itertools generates the permutations in a systematic order, varying the trailing positions fastest, only the last four values change over these ten permutations.
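
In fact, the first two positions will not change until every arrangement of the remaining four values has been generated. We can check this directly (a quick sketch):

import math

# 4! = 24 permutations share the same first two entries
n_fixed = math.factorial(len(pooled) - 2)
first_perms = list(itertools.permutations(pooled))[:n_fixed]
print(all(perm[:2] == (74.1, 74.5) for perm in first_perms))

True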

To partition the data, we can simply split each returned tuple using slicing:

for i, perm in enumerate(itertools.permutations(pooled)):
  print(perm[:len(grades1)], perm[len(grades1):])
  if i >= 9:
    break
(74.1, 74.5) (79.4, 79.0, 78.4, 79.3)
(74.1, 74.5) (79.4, 79.0, 79.3, 78.4)
(74.1, 74.5) (79.4, 78.4, 79.0, 79.3)
(74.1, 74.5) (79.4, 78.4, 79.3, 79.0)
(74.1, 74.5) (79.4, 79.3, 79.0, 78.4)
(74.1, 74.5) (79.4, 79.3, 78.4, 79.0)
(74.1, 74.5) (79.0, 79.4, 78.4, 79.3)
(74.1, 74.5) (79.0, 79.4, 79.3, 78.4)
(74.1, 74.5) (79.0, 78.4, 79.4, 79.3)
(74.1, 74.5) (79.0, 78.4, 79.3, 79.4)

Finally, we can carry out an NHST by determining the proportion of permutations that result in a sample test statistic at least as large as the observed test difference:

num_perms = 0
count = 0

observed_diff = np.round(np.mean(grades2) - np.mean(grades1), 3)

for perm in itertools.permutations(pooled):
  num_perms += 1
  # Split the permuted data across the two sample groups
  sample1 = np.array(perm[:len(grades1)])
  sample2 = np.array(perm[len(grades1):])
  # Compute the test statistic
  sample_diff = sample2.mean() - sample1.mean()

  # Count if at least as large as observed test difference
  if sample_diff >= observed_diff:
    count += 1

print(f'After {num_perms} permutations:')
print(f'Proportion of events with difference >= '
      f'{observed_diff:.3f} is {count/num_perms:.3f}')

After 720 permutations:
Proportion of events with difference >= 4.725 is 0.067

The proportion of permutations with a test difference at least as large as that observed in the data is 6.67%, so the null hypothesis cannot be rejected at the \(\alpha=0.05\) level. Note that the calculated \(p\)-value is larger than for the bootstrap test; this is typical of permutation tests. They are more conservative in the sense that the null hypothesis is less likely to be rejected than if a bootstrap test were applied.

One problem with exact permutation tests is that the number of possible permutations grows factorially with the length of the data, which is even faster than exponential growth. For 6 data points, there are only 720 permutations, but for 60 data points, there are approximately \(8.3 \times 10^{81}\) permutations, which is far too many to enumerate. As an alternative, we can use a Monte Carlo permutation test. As previously mentioned, this is usually implemented by sampling without replacement, which we can do in Python using npr.choice() with the keyword parameter replace=False.
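
A quick computation confirms just how fast this count grows (again using math.factorial()):

import math

print(f'{math.factorial(60):.3e}')

8.321e+81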

The simulation for a Monte Carlo permutation test is a hybrid between the simulation for bootstrapping and that for the exact permutation test. We still draw the data using npr.choice(), but we draw a full permutation of the data and then split it between the two sample groups. An example is shown below:

import numpy.random as npr

num_sims = 100_000
count = 0

observed_diff = np.round(np.mean(grades2) - np.mean(grades1), 3)

for sim in range(num_sims):
  # Get a random permutation of the pooled data:
  perm = npr.choice(pooled, len(pooled), replace=False)
  # Split the permuted data across the two sample groups
  sample1 = perm[:len(grades1)]
  sample2 = perm[len(grades1):]
  sample_diff = sample2.mean() - sample1.mean()

  # Count if at least as large as observed test difference
  if sample_diff >= observed_diff:
    count += 1

print(f'Relative frequency of events with difference >= '
      f'{observed_diff:.3f} is {count/num_sims:.3f}')

Relative frequency of events with difference >= 4.725 is 0.067

The relative frequency from the Monte Carlo test is very close to the proportion from the exact test. In this case, we repeat many of the same permutations because there are only 720 unique permutations, but we are drawing 100,000 permutations at random. The computed relative frequency is an estimate of the proportion of permutations that meet the specified criterion.
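
For larger data sets, it may be more convenient to use a library implementation instead of writing the simulation by hand. Below is a minimal sketch using SciPy's scipy.stats.permutation_test() (available in SciPy 1.7 and later); the keyword arguments shown follow SciPy's documented API, but this function is not used elsewhere in this chapter, so treat it as an optional alternative:

from scipy import stats

def diff_means(x, y):
  # Test statistic: difference of the group sample means
  return np.mean(y) - np.mean(x)

# permutation_type='independent' pools the data and reassigns it to
# groups of the original sizes; alternative='greater' counts permutations
# with a statistic at least as large as the observed one
res = stats.permutation_test((grades1, grades2), diff_means,
                             permutation_type='independent',
                             alternative='greater',
                             n_resamples=100_000)
print(f'p-value: {res.pvalue:.3f}')

Because there are only \(\binom{6}{2} = 15\) distinct ways to split six pooled values into groups of sizes 2 and 4, SciPy should perform an exact test here, and the resulting \(p\)-value should match the 0.067 computed above.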