Correlation in R

What is Correlation?

In statistics, correlation is a measure of the extent to which two or more variables fluctuate together. It describes both the strength and the direction of the relationship. There are two types: positive and negative. A positive correlation indicates the extent to which two variables increase or decrease together. A negative correlation indicates the extent to which one variable increases as the other decreases.

When you compute the correlation between two variables, the result indicates whether a linear relationship exists between them and whether that relationship is positive or negative.

How Correlation Works

The main result of a correlation analysis is the correlation coefficient, or “r”. It ranges from -1.0 to +1.0. If r is greater than zero, the relationship is positive; if it is less than zero, the relationship is negative. The closer r is to -1 or 1, the stronger the linear relationship between the two variables. If r is 0 or close to 0, there is little or no linear relationship between them.

If r is positive, then as one variable gets larger, the other tends to get larger too, and as one gets smaller, so does the other. If r is negative, then as one variable gets larger, the other tends to get smaller. This is also known as an inverse correlation.
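As a minimal sketch of how the sign and size of r behave, the snippet below uses base R's cor() on a few made-up vectors. The data and the variable names (x, y_up, y_down, y_curve) are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data, made up for illustration only
x <- c(1, 2, 3, 4, 5)
y_up <- 2 * x + 1       # rises in step with x
y_down <- 10 - 3 * x    # falls as x rises
y_curve <- (x - 3)^2    # U-shaped: depends on x, but not linearly

r_up <- cor(x, y_up)        # perfect positive linear relationship
r_down <- cor(x, y_down)    # perfect negative (inverse) linear relationship
r_curve <- cor(x, y_curve)  # no linear relationship, even though y depends on x

print(c(r_up = r_up, r_down = r_down, r_curve = r_curve))
```

The last case is worth noting: an r near 0 rules out a linear relationship only, not every kind of relationship.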

To interpret the correlation coefficient, it often helps to square it. The square of the coefficient is the proportion of the variation in one variable that is linearly related to the variation in the other. For example, if r is .5, squaring it gives .25; multiplied by 100, this means that 25% of the variance is shared between the two variables.
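The arithmetic above can be sketched in a few lines of R. The second half checks, on hypothetical made-up data, that the squared correlation matches the R-squared reported by a simple linear regression with lm(); the vectors x and y are assumptions for illustration only.

```r
# The worked example from the text: r = 0.5
r <- 0.5
r_squared <- r^2                    # 0.25
percent_shared <- 100 * r_squared   # 25, i.e. 25% of the variance

# Hypothetical data, made up for illustration only
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)

r_xy <- cor(x, y)
# For a regression with a single predictor, R-squared equals the squared correlation
fit_r2 <- summary(lm(y ~ x))$r.squared

print(c(r_squared = r_xy^2, lm_r_squared = fit_r2))
```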

Program for Correlation Plot in R

Load the libraries below

library(ggplot2)
library(dplyr)
library(corrgram)
library(corrplot)
library(readr)

Import your CSV file of product details

df <- read_csv("products-sale.csv")
View(df)

Compute the correlation matrix of the numeric columns

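# Keep only the numeric columns, since cor() requires numeric input
num.cols <- sapply(df, is.numeric)
cor.data <- cor(df[, num.cols])
print(cor.data)
# Plot the matrix; circle size and color reflect each coefficient
corrplot(cor.data, method = "circle")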
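If you do not have a CSV file at hand, the same steps can be tried on a dataset that ships with R. The sketch below uses the built-in iris data as a stand-in for the product data, which is an assumption made purely for illustration; the names df_demo, num_cols and cor_demo are hypothetical. The use = "complete.obs" argument tells cor() to drop rows with missing values, which real sales data may contain.

```r
# Built-in dataset used as a stand-in for the CSV data (illustration only)
df_demo <- iris

# Species is a factor, so it is filtered out; four numeric columns remain
num_cols <- sapply(df_demo, is.numeric)

# Drop rows containing NA before computing the correlations
cor_demo <- cor(df_demo[, num_cols], use = "complete.obs")

print(round(cor_demo, 2))
```

The result is a symmetric matrix with 1s on the diagonal, since every variable correlates perfectly with itself.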

Diagram of the output, with variables ordered by hierarchical clustering
corrplot(cor.data, order = "hclust", addrect = 2)

Second type of diagram
corrgram(cor.data, order = TRUE, lower.panel = panel.shade, upper.panel = panel.pie, text.panel = panel.txt)
