A lot of online statistics courses use generated data that follows a normal distribution. This is great for a basic understanding of the normal distribution, but it doesn’t help much with real-world data, which rarely behaves so neatly.
Even if the histogram for some data looks bell-shaped, that alone isn’t strong evidence that the data is normally distributed. To get a clearer idea, there are a few tools we can use in R.
QQNorm & QQLine
In R there’s a function called qqnorm that draws a normal Q-Q plot: it plots the quantiles of your sample against the quantiles you would expect from a normal distribution. If the data is normally distributed, the points will fall roughly along a straight line.
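We can make the plot easier to read by running qqline, which adds a reference line to the existing Q-Q plot (by default it passes through the first and third quartiles). If the points stay close to the line, we have a stronger indication that the data is normally distributed. Points that curve away from the line, especially at the ends, suggest skew or heavy tails.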
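As a rough illustration of the two functions described above, here is a minimal sketch. The vector x is simulated purely for the example (it is not data from this post), so swap in your own numeric vector.

```r
# Hypothetical sample: 200 draws from a normal distribution (an assumption for illustration)
set.seed(42)
x <- rnorm(200, mean = 50, sd = 10)

# Draw the normal Q-Q plot; qqnorm also (invisibly) returns the plotted coordinates
qq <- qqnorm(x, main = "Normal Q-Q Plot")

# Add the reference line through the first and third quartiles
qqline(x, col = "red")
```

Note that qqline has to be called after qqnorm, because it draws on top of the plot that is already open.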
Shapiro-Wilk Test
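When you have continuous data, you can run the Shapiro-Wilk statistical test to check whether the data is consistent with a normal distribution. The test returns a p-value for the null hypothesis that the data comes from a normal distribution. If that p-value is greater than 0.05, then we fail to reject the null hypothesis: the test found no evidence against normality, which is not the same as proof that the data is normally distributed. If the p-value is below 0.05, that is evidence that the data is not normally distributed.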
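The test statistic is:
$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}$$
Here $x_{(i)}$ is the i-th smallest value in the sample, $\bar{x}$ is the sample mean, and the $a_i$ are constants derived from the expected order statistics of a standard normal sample. Values of W close to 1 are consistent with normality.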
In R we can run the Shapiro-Wilk test using the shapiro.test function:
    shapiro.test(data)

        Shapiro-Wilk normality test

    data:  data
    W = 0.89019, p-value = 0.1185
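To show how the result can be used in a script rather than just read off the console, here is a small sketch. Both samples are simulated for illustration (the names normal_sample and skewed_sample are made up for this example), and the 0.05 cutoff is the same convention used above.

```r
# Two hypothetical samples: one normal, one clearly right-skewed
set.seed(1)
normal_sample <- rnorm(100)
skewed_sample <- rexp(100)

# shapiro.test returns an "htest" object; the pieces can be pulled out by name
res_normal <- shapiro.test(normal_sample)
res_skewed <- shapiro.test(skewed_sample)

res_normal$statistic  # the W statistic
res_normal$p.value    # the p-value
res_skewed$p.value

# TRUE means "reject normality at the 5% level"
rejects_normality <- res_skewed$p.value < 0.05
```

Working with the p.value element directly makes it easy to run the test across many columns and collect the results.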
The shapiro.test function (from base R) only accepts between 3 and 5,000 observations; it stops with an error if you give it more.
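One possible way around the size limit, sketched below, is to test a random subsample of at most 5,000 values. This is only an assumption about what might suit your data, not a recommendation from the test's documentation, and the vector big is simulated for the example.

```r
# Hypothetical data set that is too large for shapiro.test
set.seed(7)
big <- rnorm(10000)

# Calling shapiro.test directly fails; capture the error message instead of stopping
too_big <- tryCatch(
  shapiro.test(big),
  error = function(e) conditionMessage(e)
)

# Test a random subsample of 5,000 observations instead
subsample <- sample(big, 5000)
res <- shapiro.test(subsample)
res$p.value
```

With samples this large the test can flag very small departures from normality, so it is worth looking at the Q-Q plot alongside the p-value.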