Let’s say we have a dataset of people’s heights in inches. Someone asks us, “What’s the cutoff height for the tallest 2.2% of men?” In other words, given a sample that ranges from 59.85″ to 78.53″, at what height does the top 2.2% begin?
men <- rnorm(1000, 69.1, 2.9)
women <- rnorm(1000, 63.7, 2.7)
quantile(men, .978)   # 2.2%
quantile(women, .978) # 2.2%
hist(men, col=rgb(1,0,0,0.5), main = "Height of Men & Women (in.)", xlab = "Height (in)")
hist(women, col=rgb(0,0,1,0.5), xlab = "Height (in)", add=T)
We get the answer by calling quantile() with the dataset and the probability we are looking for. In the above example, the top 2.2% starts at the .978 quantile (the 97.8th percentile), since 1 - 0.022 = 0.978.
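As a quick sanity check on that conversion, here is a minimal sketch (my own addition; the seed and the variable names cutoff, share_above, and theoretical are illustrative). It checks that about 2.2% of the sample lies above the .978 quantile and compares the sample cutoff with the theoretical one from qnorm() for the same mean and standard deviation.

```r
set.seed(42)  # arbitrary seed, only so the check is repeatable
men <- rnorm(1000, mean = 69.1, sd = 2.9)

# Top 2.2% means 97.8% of values fall below the cutoff: 1 - 0.022 = 0.978
cutoff <- quantile(men, probs = 1 - 0.022)

# Share of the sample strictly above the cutoff; should be close to 0.022
share_above <- mean(men > cutoff)
print(share_above)

# Theoretical cutoff for a normal distribution with the same parameters
theoretical <- qnorm(0.978, mean = 69.1, sd = 2.9)
print(theoretical)
```

The sample cutoff will wander a little from run to run, while the qnorm() value depends only on the mean and standard deviation passed in.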
I’m using random data in this example, so each run creates a dataset with slightly different values. In this run, quantile() returns 74.95 for men.
That tells us that the tallest 2.2% of men are 74.95″ or taller.
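If you want the same number on every run, one option is to fix the random seed before generating the data. This is a minimal sketch rather than part of the original script; the seed value 123 and the names men_a and men_b are arbitrary.

```r
# Fixing the seed makes rnorm() produce the same draws each time
set.seed(123)
men_a <- rnorm(1000, 69.1, 2.9)

set.seed(123)
men_b <- rnorm(1000, 69.1, 2.9)

# Both vectors are identical, so the quantile is identical too
print(identical(men_a, men_b))
print(quantile(men_a, 0.978))
```

A different seed gives a different (but still repeatable) cutoff.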
hist(men, col=rgb(1,0,0,0.5), main = "Height of Men & Women (in.)", xlab = "Height (in)")
hist(women, col=rgb(0,0,1,0.5), xlab = "Height (in)", add=T)
Using two calls to hist(), I’ve combined the histograms into one graph with the “add=T” parameter on the second call. The dataset for men is colored red at 50% transparency (rgb(1,0,0,0.5)).
The dataset for women is colored blue, also at 50% transparency (rgb(0,0,1,0.5)).
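One thing to watch when overlaying histograms: the axes are fixed by the first plot, so bars from the second can be clipped if its range or counts extend further. A possible workaround, sketched here under my own assumptions (the 1-inch bin width, the seed, and the variable names are illustrative), is to compute shared breaks and a shared y-axis limit first, then draw both.

```r
set.seed(1)  # arbitrary seed for a repeatable picture
men <- rnorm(1000, 69.1, 2.9)
women <- rnorm(1000, 63.7, 2.7)

# Bin edges that cover both samples, 1 inch wide
lo <- floor(min(men, women))
hi <- ceiling(max(men, women))
shared_breaks <- seq(lo, hi, by = 1)

# Compute the histograms without drawing them
h_men <- hist(men, breaks = shared_breaks, plot = FALSE)
h_women <- hist(women, breaks = shared_breaks, plot = FALSE)

# Tallest bar across both, so neither histogram is cut off
y_max <- max(h_men$counts, h_women$counts)

plot(h_men, col = rgb(1, 0, 0, 0.5), ylim = c(0, y_max),
     main = "Height of Men & Women (in.)", xlab = "Height (in)")
plot(h_women, col = rgb(0, 0, 1, 0.5), add = TRUE)
```

With identical breaks, the bars of the two groups line up bin for bin, which makes the overlap easier to read.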
We can add a bit more, such as a vertical line at the 2.2% cutoff. Calling abline(v=74.95, col="red") after the histograms draws it on the graph.
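Because the data is random, a hard-coded 74.95 only matches the one run shown here. A small variation, sketched with an illustrative variable name (cutoff) and line styling of my own choosing, passes the computed quantile straight to abline() so the line always matches the current data.

```r
men <- rnorm(1000, 69.1, 2.9)

# Cutoff for the tallest 2.2% in this particular sample
cutoff <- quantile(men, 0.978)

hist(men, col = rgb(1, 0, 0, 0.5),
     main = "Height of Men (in.)", xlab = "Height (in)")

# Vertical line at the computed cutoff; lwd = 2 makes it easier to see
abline(v = cutoff, col = "red", lwd = 2)
```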