Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.
Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.
import pandas as pd
import numpy as np
from scipy import stats
import math
data = pd.read_stata('data/us_job_market_discrimination.dta')
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)
data.head()
# Black Rate of Callback
n1 = len(data[data.race=='b']) # number of black-sounding names
sum_b = sum(data[data.race=='b'].call) # number of callbacks for black-sounding names
rate_b = sum_b / n1
# White Rate of Callback
n2 = len(data[data.race=='w']) # number of white-sounding names
sum_w = sum(data[data.race=='w'].call) # number of callbacks for white-sounding names
rate_w = sum_w / n2
rate_b, rate_w
print("The black rate of callback is {}".format(rate_b))
print("The white rate of callback is {}".format(rate_w))
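The same rates can also be computed in one step by grouping on race. The sketch below is a minimal illustration on a small made-up DataFrame (the rows are hypothetical, not taken from the dataset), assuming the same 'race' and 'call' columns described above.

```python
import pandas as pd

# Hypothetical toy data with the same two columns as the real dataset.
toy = pd.DataFrame({
    'race': ['b', 'b', 'b', 'b', 'w', 'w', 'w', 'w'],
    'call': [0, 0, 0, 1, 0, 1, 0, 1],
})

# The mean of a 0/1 column is the proportion of 1s, i.e. the callback rate.
rates = toy.groupby('race')['call'].mean()

# Callback counts and group sizes, side by side, for a quick sanity check.
counts = toy.groupby('race')['call'].agg(['sum', 'count'])

print(rates)
print(counts)
```

Because 'call' is coded as 0 and 1, the group mean and the ratio of callbacks to resumes are the same quantity, so this should agree with the manual calculation above.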
Requirements for CLT:
1) The samples must be independent: black-sounding and white-sounding names were randomly assigned to otherwise identical resumes, so the two groups can be treated as independent random samples. This requirement is met.
2) The sample size must be 'big enough':
The sample must not be larger than 10% of the entire population. With around 2400 resumes for each group, the sample is far below 10% of the millions of people of each race in America. This requirement is met.
The sample size should also be large enough that each group has at least 10 expected successes and 10 expected failures:
n1 * rate_b >= 10 and n1 * (1 - rate_b) >= 10
n2 * rate_w >= 10 and n2 * (1 - rate_w) >= 10
Checking these counts with the sample sizes n1 and n2:
n1 * rate_b, n1 * (1 - rate_b), n2 * rate_w, n2 * (1 - rate_w)
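A common form of the 'big enough' check counts the expected successes and failures in each sample. The helper below is a hypothetical sketch of that check: the function name, the threshold parameter and the example numbers are illustrative and do not come from the dataset.

```python
def success_failure_ok(n, rate, minimum=10):
    """Return True if a sample of size n with the given observed rate has
    at least `minimum` expected successes and `minimum` expected failures."""
    successes = n * rate
    failures = n * (1 - rate)
    return successes >= minimum and failures >= minimum

# Illustrative values only: made-up sample sizes and a made-up 5% rate.
print(success_failure_ok(2400, 0.05))  # large sample, plenty of successes
print(success_failure_ok(50, 0.05))    # small sample, too few successes
```

With the notebook's own variables, the equivalent calls would be success_failure_ok(n1, rate_b) and success_failure_ok(n2, rate_w).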
The appropriate tests for this problem are a confidence interval for the difference in proportions and a two-sided z-test. We use the z-statistic because the sample size is large.
Null Hypothesis (Ho): rate_w - rate_b = 0
Alternative Hypothesis (Ha): rate_w - rate_b != 0
Significance Level = .05
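Written out in symbols, the hypotheses and the two quantities used in this analysis take the standard textbook form below. The notation is introduced here for illustration: $\hat{p}_w$, $\hat{p}_b$ correspond to rate_w and rate_b, $n_w$, $n_b$ to n2 and n1, and $x_w$, $x_b$ to the callback counts sum_w and sum_b.

```latex
H_0:\; p_w - p_b = 0 \qquad H_a:\; p_w - p_b \neq 0

% 95% confidence interval for the difference (unpooled standard error)
(\hat{p}_w - \hat{p}_b) \;\pm\; 1.96\,
\sqrt{\frac{\hat{p}_w(1-\hat{p}_w)}{n_w} + \frac{\hat{p}_b(1-\hat{p}_b)}{n_b}}

% z-statistic under H_0 (pooled proportion \hat{p})
z = \frac{\hat{p}_w - \hat{p}_b}
         {\sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_w} + \frac{1}{n_b}\right)}},
\qquad
\hat{p} = \frac{x_w + x_b}{n_w + n_b}
```

The confidence interval uses each group's own rate in the standard error, while the test statistic uses the pooled rate, because the null hypothesis assumes both groups share one callback rate.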
p = rate_w - rate_b
std = math.sqrt((rate_w * (1-rate_w) / n2) + (rate_b * (1-rate_b)/n1))
z_score = 1.96
# with alpha = .05
# 95% chance that rate_w - rate_b is within 1.96 standard deviations from
# our sample proportion according to the z-table.
ME = z_score * std
lower = p - ME
upper = p + ME
print('Margin of Error: {}'.format(ME))
print('Confidence Interval: {}'.format([lower, upper]))
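The interval calculation above can be wrapped in a small reusable function. This is a minimal sketch, assuming the same unpooled standard error and the fixed critical value of 1.96; the function name and the example counts are made up for illustration.

```python
import math

def diff_proportion_ci(x1, n1, x2, n2, z_crit=1.96):
    """Confidence interval for p2 - p1 using the unpooled standard error.

    x1, x2 are success counts and n1, n2 are sample sizes.
    z_crit=1.96 gives an approximate 95% interval.
    """
    p1 = x1 / float(n1)
    p2 = x2 / float(n2)
    diff = p2 - p1
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_crit * se
    return diff - margin, diff + margin

# Made-up counts for illustration: 50 of 1000 versus 80 of 1000 callbacks.
lo_demo, hi_demo = diff_proportion_ci(50, 1000, 80, 1000)
print(lo_demo, hi_demo)
```

With the notebook's variables, the call would be diff_proportion_ci(sum_b, n1, sum_w, n2), which should reproduce lower and upper.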
#### Assuming the null hypothesis is true (rate_w - rate_b = 0), how likely is it to see a difference between the callback rates at least as large as the one we observed?
#### If such a difference occurs often, then the gap between the white and black callback rates is probably due to chance. If it rarely occurs, then the effect is significant.
Converting the difference in proportions into standard deviations gives a z-score, which we can translate into a probability. In other words: how many standard deviations away from zero (the difference under the null hypothesis) is the observed rate_w - rate_b, and what is the probability of landing that far out? If that probability is less than 5%, the difference is significant.
# New calculation for std because we are assuming the null hypothesis is true
# Therefore we must calculate the pooled std such that p1 = p2 = p_hat
# p_hat is the proportion of callbacks for the whole sample, disregarding race
p = rate_w - rate_b
p_hat = (sum_b + sum_w) / float(n1 + n2)
std = math.sqrt(p_hat * (1 - p_hat) * (1.0 / n1 + 1.0 / n2))
# Calculate z-score
# How many standard deviations away from zero is our sample statistic
z_score = (p - 0) / std
# Two-sided p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z_score))
z_score, p_value
The z-score is well beyond the 1.96 cutoff for a two-sided test at the .05 significance level, so a difference this large would be very unlikely if the null hypothesis were true.
Therefore the p-value is far below .05 (effectively 0), and we reject the null hypothesis.
We are 95% confident that the true difference between the white-sounding and black-sounding callback rates is between about .016 and .04.
Because this interval does not include zero, the data point to a real difference between the two groups, in which individuals with white-sounding names are favored and called back more often.
This means that the effect that we see (a difference in the proportion of callbacks between white-sounding and black-sounding names) is statistically significant.
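The z-test described above can be sketched as a standalone function. This is a minimal illustration, assuming a pooled proportion under the null hypothesis and using math.erfc from the standard library for the normal tail probability; the function name and the example counts are hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 using the pooled proportion.

    x1, x2 are success counts and n1, n2 are sample sizes.
    Returns (z, p_value).
    """
    p1 = x1 / float(n1)
    p2 = x2 / float(n2)
    pooled = (x1 + x2) / float(n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts for illustration: 50 of 1000 versus 80 of 1000 callbacks.
z_demo, p_demo = two_proportion_z_test(50, 1000, 80, 1000)
print(z_demo, p_demo)
```

With the notebook's variables, the call would be two_proportion_z_test(sum_b, n1, sum_w, n2). A p-value below the .05 significance level leads to rejecting the null hypothesis.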
No, this analysis does not show that race is the most important factor in callback success. It only looked at the relationship between race and callbacks, and a number of other variables in the data may have an equal or greater impact on the callback rate.
We must test the significance of these variables for the callback rate to determine which best predicts callback success.
The next step in this analysis would be to bring in the additional columns in the table and test them for significance and correlation.
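One possible way to start on that next step is a chi-square test of independence between callbacks and another categorical column. The sketch below is only an illustration under stated assumptions: the 2x2 counts are made up and the attribute is hypothetical, not a column confirmed to exist in the dataset.

```python
from scipy import stats

# Hypothetical 2x2 table of counts for some other binary resume attribute:
# rows = attribute absent / present, columns = no callback / callback.
table = [[900, 60],
         [850, 90]]

# chi2_contingency returns the statistic, the p-value, the degrees of
# freedom and the table of expected counts under independence.
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(chi2, p_value, dof)
```

On the real data, a table like this could be built with pd.crosstab for a column of interest against data.call, and a small p-value would suggest that the column is associated with the callback rate.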