Using Quantiles to Find Values

Let’s say we have a dataset of people’s heights in inches. Someone asks us, “What’s the cutoff height for the tallest 2.2% of men?” In other words, given a population that ranges from 59.85″ to 78.53″, at what height does the top 2.2% begin?

men <- rnorm(1000, 69.1, 2.9)
women <- rnorm(1000, 63.7, 2.7)
quantile(men, .978) # 2.2%
quantile(women, .978) # 2.2%
hist(men, col=rgb(1,0,0,0.5), main = "Height of Men & Women (in.)", xlab = "Height (in)")
hist(women, col=rgb(0,0,1,0.5), xlab = "Height (in)", add=T)

We can answer this by calling the quantile() function and passing in the dataset and the quantile we are looking for. In the above example, the top 2.2% begins at the 0.978 quantile (the 97.8th percentile), since 1 − 0.022 = 0.978.
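As a quick sanity check on the 0.978 figure, here is a minimal sketch that computes the cutoff and then measures what share of the sample lies above it. The variable names cutoff and share_above are my own, chosen for illustration.

```r
# Simulated heights for men, using the same parameters as the example above
men <- rnorm(1000, 69.1, 2.9)

# The 0.978 quantile: 97.8% of the values fall at or below this height
cutoff <- quantile(men, 0.978)

# Share of the sample that is taller than the cutoff;
# this should come out close to 0.022, i.e. the top 2.2%
share_above <- mean(men > cutoff)

print(cutoff)
print(share_above)
```

If the share comes out near 0.022, the quantile argument and the "top 2.2%" wording agree.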

I’m using random data in this example, so each run creates the dataset with slightly different values. In this case, I get a quantile result of:

74.95292

That tells us that the tallest 2.2% of men are 74.95″ or taller.
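Because the data is random, the exact number will differ from run to run. One way to make a run repeatable is to fix the random seed before generating the data. The sketch below does that (the seed value 123 is arbitrary and not from the original example) and also compares the sample quantile with the theoretical quantile of the same normal distribution, computed with qnorm(). The variable names are hypothetical.

```r
# Fix the seed so rnorm() returns the same values on every run.
# The value 123 is an arbitrary choice for illustration.
set.seed(123)
men <- rnorm(1000, 69.1, 2.9)

# Cutoff estimated from the simulated sample
sample_cutoff <- quantile(men, 0.978)

# Cutoff of the underlying normal distribution itself (about 74.94)
theoretical_cutoff <- qnorm(0.978, mean = 69.1, sd = 2.9)

print(sample_cutoff)
print(theoretical_cutoff)
```

The two values should be close but not identical, since the sample cutoff is only an estimate based on 1,000 draws.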

Histogram

hist(men, col=rgb(1,0,0,0.5), main = "Height of Men & Women (in.)", xlab = "Height (in)")
hist(women, col=rgb(0,0,1,0.5), xlab = "Height (in)", add=T)

I’ve drawn two histograms with the hist() function and combined them into one graph by passing add=T to the second call. The dataset for men is colored red with 50% transparency (rgb(1,0,0,0.5)).

The dataset for women is colored blue, also with 50% transparency (rgb(0,0,1,0.5)).

We can add a bit more, such as a vertical line at the 2.2% cutoff. Calling abline(v=74.95, col="red") after the histograms draws it on the graph.
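Since the cutoff changes with each random dataset, it may be more convenient to pass the computed quantile to abline() instead of typing the number by hand. Here is a minimal sketch of that idea. The cutoff variable and the lwd setting are my additions, not part of the original example.

```r
# Same simulated data as in the example above
men <- rnorm(1000, 69.1, 2.9)
women <- rnorm(1000, 63.7, 2.7)

# Height at which the tallest 2.2% of men begins
cutoff <- quantile(men, 0.978)

# Overlay the two semi-transparent histograms (add = TRUE is the same as add=T)
hist(men, col = rgb(1, 0, 0, 0.5),
     main = "Height of Men & Women (in.)", xlab = "Height (in)")
hist(women, col = rgb(0, 0, 1, 0.5), add = TRUE)

# Vertical line at the computed cutoff rather than a hard-coded 74.95
abline(v = cutoff, col = "red", lwd = 2)
```

This way the line stays in the right place even when the data is regenerated.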
