R: Multi-Scatter Plots

When analyzing results, it can help to compare graphs side by side. Base R has a few options for this, with no extra packages required.

Parameters

Using graphical parameters (the par() function), we can create a grid of graphs like so:

par(mfrow = c(2,2))

The command above sets the graphics environment to draw the next graphs in a grid. In this case the grid has 2 rows with 2 graphs per row, as defined by the vector c(2,2), which is built with R's combine function, c(). The parameter used is the multi-frame row, or mfrow, which fills the grid row by row.

After setting the environment, every plot rendered is drawn into the next free cell of this grid. To reset, we need to change the environment back:

par(mfrow = c(1, 1))
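A minimal, self-contained sketch of the par() approach, using R's built-in mtcars dataset as a stand-in for the suicide data. Instead of hard-coding the reset to c(1, 1), it keeps the value par() returns (the settings it replaced) and restores that afterwards; the name old_par is just an illustrative choice.

```r
# Stand-in data: the built-in mtcars dataset, not the suicide data from this post.
# par() invisibly returns the settings it replaced, so we can restore them later.
old_par <- par(mfrow = c(2, 2))  # 2 rows x 2 columns, filled row by row

# Each of the next four plots fills the next cell of the 2 x 2 grid
plot(mtcars$wt, mtcars$mpg, main = "Weight vs MPG")
plot(mtcars$hp, mtcars$mpg, main = "Horsepower vs MPG")
plot(mtcars$disp, mtcars$mpg, main = "Displacement vs MPG")
plot(mtcars$qsec, mtcars$mpg, main = "Quarter-mile time vs MPG")

# Restore whatever layout was active before (normally a single 1 x 1 frame)
par(old_par)
```

Saving and restoring the old settings is a common R idiom, and it also undoes any other parameters changed in the same par() call.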

Pairing

Pairs is a different approach. Here, the pairs() function creates a grid (a scatter-plot matrix) in which every variable is plotted against every other variable. This works with any number of variables, from two upward.
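As a quick illustration, pairs() also accepts a data frame (or numeric matrix) directly and plots every column against every other one. This sketch uses the four numeric columns of the built-in iris dataset, chosen only so the example runs without the suicide data.

```r
# Stand-in data: the four numeric measurement columns of the built-in iris dataset
measurements <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# Passing a data frame plots each column against every other column:
# 4 variables give a 4 x 4 grid with the variable names on the diagonal
pairs(measurements, main = "Iris measurements")

# Number of scatter panels off the diagonal: n * (n - 1)
n_vars <- ncol(measurements)
n_panels <- n_vars * (n_vars - 1)
print(n_panels)
```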

First, I subset the suicide data (the data frame master) to the rows for Albania:

albania <- master[master$country == 'Albania', ]
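The bracket filter above can also be written with base R's subset() function. A small sketch with the built-in mtcars dataset (standing in for master, with cyl == 4 standing in for country == 'Albania') shows that both forms select the same rows. One difference worth knowing: subset() drops rows where the condition is NA, while bracket indexing keeps them as rows of NAs.

```r
# Stand-in data: mtcars instead of master, and cyl == 4 instead of country == 'Albania'
four_cyl_brackets <- mtcars[mtcars$cyl == 4, ]
four_cyl_subset   <- subset(mtcars, cyl == 4)

# Both approaches select the same rows (the 4-cylinder cars)
print(nrow(four_cyl_brackets))
print(isTRUE(all.equal(four_cyl_brackets, four_cyl_subset)))
```

Applied to this post's data, the equivalent would presumably be subset(master, country == 'Albania').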

At this point, we can construct scatter plots comparing different variables for Albania:

pairs(~ albania$suicide_no + albania$year + albania$population + albania$`gdp_per_capita ($)`)

Notice the format. The formula starts with the tilde (~) symbol, followed by the first variable, and each subsequent variable is added with the + sign. Because the GDP column name contains a space and parentheses, it has to be wrapped in backticks. In the case above we're looking at a dataset of suicides in Albania, and the variables are the number of suicides, the year, the Albanian population and the GDP per capita.
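Repeating albania$ for every term works, but the formula method of pairs() also takes a data argument, so the column names can be used on their own. The sketch below uses a copy of the built-in mtcars dataset and renames one column to the hypothetical name `horse power (hp)`, purely to show that a non-syntactic name, like `gdp_per_capita ($)`, still needs backticks inside the formula.

```r
# Stand-in data: a copy of the built-in mtcars dataset
cars_df <- mtcars

# Hypothetical non-syntactic column name (contains spaces and parentheses),
# similar in spirit to `gdp_per_capita ($)` in the suicide data
names(cars_df)[names(cars_df) == "hp"] <- "horse power (hp)"

# With data = cars_df, the formula terms are plain column names;
# the non-syntactic name must be wrapped in backticks
pairs(~ mpg + wt + qsec + `horse power (hp)`, data = cars_df,
      main = "mtcars via the formula interface")
```

For the Albania subset, this form would presumably be pairs(~ suicide_no + year + population + `gdp_per_capita ($)`, data = albania). The diagonal labels then show the bare column names instead of expressions like albania$suicide_no.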

This produces a grid of scatter plots comparing every pair of variables:

R Scatter Plots of 4 Different Variables

Here's how to read it: each labeled box on the diagonal (such as the first one) names a variable, and every other panel plots the variable from its row on the Y axis against the variable from its column on the X axis.

For example, the first box is albania$suicide_no (number of suicides). To its right is a scatter plot with the number of suicides on the Y axis, plotted against albania$year on the X axis.

The second box in the first column plots the number of suicides along the X axis and years along the Y axis. See how that works? Each labeled box names the variable on the X axis for the panels above and below it, and the variable on the Y axis for the panels to its left and right.

So the bottom-right corner is labeled GDP per capita. In the panel directly above it, GDP per capita is plotted along the X axis, while the Y axis shows the population.
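Because each panel below the diagonal is a mirror image of one above it, the matrix can be halved. A sketch, again with the built-in mtcars dataset as a stand-in: setting lower.panel = NULL in pairs() keeps only the upper triangle, where the row variable is on the Y axis and the column variable is on the X axis.

```r
# Stand-in data: four columns of the built-in mtcars dataset
# lower.panel = NULL suppresses the mirrored panels below the diagonal
pairs(~ mpg + wt + hp + qsec, data = mtcars,
      lower.panel = NULL,
      main = "Upper triangle only")
```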

Benefit of Pairs

Quite quickly, the pairs output shows that only two graphs show a visual correlation, and they are really the same pair mirrored across the diagonal: GDP per capita and year. The Albanian suicide count shows no obvious correlation with the other variables in this dataset.

As for GDP per capita and year, the points trend up and to the right, a positive correlation. This is logical, as a country's GDP would usually be expected to change (typically grow) over time.
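Eyeballing the grid works well, but cor() can put numbers on the same comparisons. A sketch, assuming four built-in mtcars columns as stand-ins for the suicide variables: cor() on a numeric data frame returns a correlation matrix laid out like the pairs() grid, with 1s on the diagonal and mirrored values on either side.

```r
# Stand-in data: four numeric columns of the built-in mtcars dataset
vars <- mtcars[, c("mpg", "wt", "hp", "qsec")]

# Pearson correlation for every pair of columns (same layout as the pairs() grid)
cor_matrix <- cor(vars)
print(round(cor_matrix, 2))

# Values near +1 or -1 indicate a strong linear relationship; near 0, little or none
mpg_wt <- cor_matrix["mpg", "wt"]
print(round(mpg_wt, 2))
```

With this post's data, the analogue would be cor() on the numeric columns of the albania subset.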

More Pairs

Subsetting the suicide data to other nations (namely Russia and the USA) may help reveal trends or correlations in larger economies and populations.
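Rather than repeating the subset and pairs() calls for each country, the two steps can be wrapped in a small function. This is a sketch with a hypothetical helper, pairs_by_group(), demonstrated on the built-in iris dataset (species standing in for countries) so it runs without the suicide data.

```r
# Hypothetical helper: draw one pairs() plot per group value.
# df: data frame; group_col: name of the grouping column;
# groups: values to plot; vars: columns to include in each plot.
pairs_by_group <- function(df, group_col, groups, vars) {
  counts <- integer(0)
  for (g in groups) {
    # Same bracket filtering as the Albania subset, one group at a time
    group_df <- df[df[[group_col]] == g, vars]
    pairs(group_df, main = g)
    counts[g] <- nrow(group_df)  # record how many rows went into each plot
  }
  invisible(counts)
}

# Stand-in data: the built-in iris dataset, grouped by species
n_rows <- pairs_by_group(
  iris, "Species", c("setosa", "virginica"),
  c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
)
print(n_rows)
```

With the suicide data, the call would presumably pass master, "country", the country names spelled exactly as they appear in the data, and the columns to compare.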

Russia Suicide Data

The data from Russia shows no clear correlation involving the number of suicides. There is an odd jump in suicides at larger population values. The points are spread all over the place, so this isn't a good fit, but suicides do spike where the population was larger. The number of suicides also decreased over time, with fewer suicides from 2010 onwards.

USA Suicide Data

The USA data shows a stronger correlation between population growth and the number of suicides. I think this is somewhat expected: all things being equal, as a population grows, so too would the number of suicides. Yet the other countries sampled don't show that correlation as strongly.

We can also see that the number of suicides is increasing over time, although the points do not tightly fit a linear regression.
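To check how well a straight line describes a trend over time, a linear regression can be fitted with lm() and drawn on the scatter plot. This sketch assumes the built-in AirPassengers time series as stand-in data, converted to a data frame with year and passengers columns; the R-squared value summarizes how tightly the points follow the line.

```r
# Stand-in data: the built-in AirPassengers monthly series (1949-1960)
trend_df <- data.frame(
  year = as.numeric(time(AirPassengers)),   # fractional years, e.g. 1949.083
  passengers = as.numeric(AirPassengers)
)

# Fit a straight line: passengers as a function of time
fit <- lm(passengers ~ year, data = trend_df)

# Scatter plot with the fitted regression line on top
plot(passengers ~ year, data = trend_df, main = "Passengers over time")
abline(fit, col = "red")

# Slope (change per year) and R-squared (share of variance explained by the line)
print(coef(fit)[["year"]])
print(summary(fit)$r.squared)
```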

Of course, the strongest correlation in this set of graphs is between GDP per capita and time. It's unrelated to suicide, but it is a very tight correlation that stands out immediately in the plots.
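To print the strength of each relationship right in the grid, pairs() accepts custom panel functions. The sketch below, loosely modeled on the panel.cor example in R's ?pairs help page, defines a hypothetical helper panel_cor that writes the Pearson correlation in the upper panels; mtcars again stands in for the suicide data.

```r
# Hypothetical helper: write the Pearson correlation of x and y in a panel
panel_cor <- function(x, y, ...) {
  usr <- par("usr")             # remember the panel's coordinate system
  on.exit(par(usr = usr))       # and restore it when the function exits
  par(usr = c(0, 1, 0, 1))      # switch to a simple 0-1 coordinate system
  r <- cor(x, y, use = "complete.obs")
  text(0.5, 0.5, sprintf("r = %.2f", r), cex = 1.2)
  invisible(r)
}

# Stand-in data: four columns of the built-in mtcars dataset
# Scatter plots below the diagonal, correlation coefficients above it
pairs(~ mpg + wt + hp + qsec, data = mtcars, upper.panel = panel_cor)
```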

Finally, as a last example from this dataset, I subset the data to Mexico and ran the same pairs() plot, which produced the following scatter plots:

Mexico suicide data

Suicides in the Mexico data seem to correlate with the other variables more than in the other countries. There's an upward trend of suicides over time, and the number of suicides also rises as the population grows. The correlation is weaker for GDP per capita, but suicides still tend to increase as GDP per capita increases.
