Descriptive Statistics using R

Consider the following data-set describing the customers of three treadmill products (TM195, TM498 and TM798). The goal is to perform descriptive analytics on this data to build a customer profile for each treadmill.

Product  Age  Gender  Education  Marital Status  Usage  Fitness  Income  Miles
TM195    18   Male    14         Single          3      4        29562   112
TM195    19   Male    15         Single          2      3        31836   75
TM195    19   Female  14         Partnered       4      3        30699   66
TM195    19   Male    12         Single          3      3        32973   85
TM195    20   Male    13         Partnered       4      2        35247   47
TM195    20   Female  14         Partnered       3      3        32973   66
TM195    21   Female  14         Partnered       3      3        35247   75
TM195    21   Male    13         Single          3      3        32973   85
TM195    21   Male    15         Single          5      4        35247   141
TM195    21   Female  15         Partnered       2      3        37521   85
TM498    19   Male    14         Single          3      3        31836   64
TM498    20   Male    14         Single          2      3        32973   53
TM498    20   Female  14         Partnered       3      3        34110   106
TM498    20   Male    14         Single          3      3        38658   95
TM498    21   Female  14         Partnered       5      4        34110   212
TM498    21   Male    16         Partnered       2      2        34110   42
TM498    21   Male    12         Partnered       2      2        32973   53
TM498    23   Male    14         Partnered       3      3        36384   95
TM498    23   Male    14         Partnered       3      3        38658   85
TM498    23   Female  16         Single          3      3        45480   95
TM798    31   Male    16         Partnered       6      5        89641   260
TM798    33   Female  18         Partnered       4      5        95866   200
TM798    34   Male    16         Single          5      5        92131   150
TM798    35   Male    16         Partnered       4      5        92131   360
TM798    38   Male    18         Partnered       5      5        104581  150
TM798    40   Male    21         Single          6      5        83416   200
TM798    42   Male    18         Single          5      4        89641   200
TM798    45   Male    16         Single          5      5        90886   160
TM798    47   Male    18         Partnered       4      5        104581  120
TM798    48   Male    18         Partnered       4      5        95508   180

The following code assumes the data-set is available as a .csv file, which is read into the R environment.
#Open the File dialog to select the .csv file containing the data-set
myFile <- file.choose()

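#Read the .csv content into R and store it in the myData variable
#stringsAsFactors = TRUE keeps text columns such as Product and Gender as factors,
#so summary() reports counts per category (the default is FALSE from R 4.0.0 onwards)
myData <- read.csv(myFile, header = TRUE, stringsAsFactors = TRUE)

#Confirm that myData is a data.frame
class(myData)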
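file.choose() opens a dialog (or a console prompt), which does not suit a script that should run unattended. As a hedged alternative, the sketch below writes three rows of the table to a temporary file and reads them back by path. The temporary path, the column name MaritalStatus (written without the space) and the variable names csvPath and sampleData are assumptions made for illustration, not part of the original workflow.

```r
# Assumption: a three-row excerpt of the treadmill table, with the
# "Marital Status" header written as MaritalStatus for this sketch
csvPath <- tempfile(fileext = ".csv")
writeLines(c(
  "Product,Age,Gender,Education,MaritalStatus,Usage,Fitness,Income,Miles",
  "TM195,18,Male,14,Single,3,4,29562,112",
  "TM498,19,Male,14,Single,3,3,31836,64",
  "TM798,31,Male,16,Partnered,6,5,89641,260"
), csvPath)

# Read the file by path instead of through a dialog; stringsAsFactors = TRUE
# turns the text columns (Product, Gender, MaritalStatus) into factors
sampleData <- read.csv(csvPath, header = TRUE, stringsAsFactors = TRUE)

class(sampleData)  # "data.frame"
str(sampleData)    # one line per column with its type and first values
```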

#Attach myData so that its columns (Product, Age, Gender, ...) can be referred to directly by name
attach(myData)
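attach() places the columns of myData on R's search path, which is why the plotting calls further down can use Age, Gender and Product without a myData$ prefix. A minimal sketch of that behaviour, next to with(), a common alternative that evaluates one expression inside the data frame and leaves the search path untouched. The data frame named excerpt is hypothetical, hand-copied from the first two rows of each product in the table.

```r
# Hypothetical excerpt: the first two rows of each product from the table above
excerpt <- data.frame(
  Product = factor(c("TM195", "TM195", "TM498", "TM498", "TM798", "TM798")),
  Age     = c(18, 19, 19, 20, 31, 33)
)

# attach(): columns become visible by bare name until detach() is called
attach(excerpt)
meanAgeAttached <- mean(Age)
detach(excerpt)

# with(): the same bare-name access, scoped to a single expression
meanAgeWith <- with(excerpt, mean(Age))

identical(meanAgeAttached, meanAgeWith)  # TRUE
```

One design point: attach() works on a copy of the data frame, so later changes to myData are not seen through the bare column names until it is detached and attached again.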

#Show summary for the entire file
summary(myData)
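For a numeric column, summary() reports six numbers: minimum, first quartile, median, mean, third quartile and maximum. A small check using the ten TM195 ages from the table (so the figures describe this excerpt, not the full file) that recomputes the same six values one at a time:

```r
# Ages of the ten TM195 customers listed in the table above
ageTM195 <- c(18, 19, 19, 19, 20, 20, 21, 21, 21, 21)

s <- summary(ageTM195)
s

# The same six statistics computed one by one (quantile() uses R's default type 7)
manual <- c(min(ageTM195),
            quantile(ageTM195, 0.25, names = FALSE),
            median(ageTM195),
            mean(ageTM195),
            quantile(ageTM195, 0.75, names = FALSE),
            max(ageTM195))

manual                            # 18.0 19.0 20.0 19.9 21.0 21.0
all.equal(as.numeric(s), manual)  # TRUE
```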

#Histogram of Age
hist(Age, col = heat.colors(14), main = "Histogram of Age", xlab = "Age")
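A histogram is a count of values per bin. hist() chooses the bins itself (by default Sturges' rule, rounded to "pretty" break points) and, with plot = FALSE, returns the breaks and counts instead of drawing. A sketch on the 30 ages in the table; the variable names are illustrative:

```r
# Age column of the 30 rows in the table above
age <- c(18, 19, 19, 19, 20, 20, 21, 21, 21, 21,
         19, 20, 20, 20, 21, 21, 21, 23, 23, 23,
         31, 33, 34, 35, 38, 40, 42, 45, 47, 48)

# Compute the histogram without drawing it
h <- hist(age, plot = FALSE)

# Bin edges and the number of customers falling into each bin
binTable <- data.frame(from  = head(h$breaks, -1),
                       to    = tail(h$breaks, -1),
                       count = h$counts)
binTable
```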

#BoxPlot of Age
boxplot(Age, horizontal = TRUE, col = "RED", main = "BoxPlot of Age")
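The box in a boxplot is drawn from five numbers that boxplot.stats() returns: lower whisker, lower hinge, median, upper hinge and upper whisker. Points lying more than 1.5 times the hinge spread beyond the box are returned separately as outliers. Applied to the 30 ages in the table:

```r
# Age column of the 30 rows in the table above
age <- c(18, 19, 19, 19, 20, 20, 21, 21, 21, 21,
         19, 20, 20, 20, 21, 21, 21, 23, 23, 23,
         31, 33, 34, 35, 38, 40, 42, 45, 47, 48)

bs <- boxplot.stats(age)
bs$stats  # 18 20 21 34 48
bs$out    # numeric(0): no age lies more than 1.5 x (34 - 20) = 21 years outside the box
```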

#BoxPlot of Age for each Gender
boxplot(Age~Gender, horizontal = TRUE, col = c("RED", "LIGHTBLUE"), main = "BoxPlot of Age by Gender")
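Grouping by Gender draws one box per level. The medians marked inside those boxes can be computed with tapply(); here the Age and Gender columns of the 30 rows in the table are copied into vectors by hand, so the result describes this excerpt only.

```r
# Age and Gender columns of the 30 rows in the table above
age <- c(18, 19, 19, 19, 20, 20, 21, 21, 21, 21,
         19, 20, 20, 20, 21, 21, 21, 23, 23, 23,
         31, 33, 34, 35, 38, 40, 42, 45, 47, 48)
gender <- c("Male", "Male", "Female", "Male", "Male",
            "Female", "Female", "Male", "Male", "Female",   # TM195
            "Male", "Male", "Female", "Male", "Female",
            "Male", "Male", "Male", "Male", "Female",       # TM498
            "Male", "Female", "Male", "Male", "Male",
            "Male", "Male", "Male", "Male", "Male")         # TM798

# Median Age within each Gender group
medAge <- tapply(age, gender, median)
medAge  # Female 21, Male 22
```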

#BoxPlot of Age for each Product
boxplot(Age~Product, horizontal = TRUE, col = c("RED", "LIGHTBLUE", "GREEN"), main = "BoxPlot of Age by Product")
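With plot = FALSE, boxplot() returns the statistics it would draw instead of drawing them, one column per group. A sketch for Age by Product on the 30 rows of the table; because each product occupies ten consecutive rows, the Product column is rebuilt with rep() rather than typed out:

```r
# Age column of the 30 rows in the table above
age <- c(18, 19, 19, 19, 20, 20, 21, 21, 21, 21,
         19, 20, 20, 20, 21, 21, 21, 23, 23, 23,
         31, 33, 34, 35, 38, 40, 42, 45, 47, 48)
# Each product covers ten consecutive rows of the table
product <- factor(rep(c("TM195", "TM498", "TM798"), each = 10))

b <- boxplot(age ~ product, plot = FALSE)
b$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$names  # "TM195" "TM498" "TM798"
```

In this excerpt the TM798 median (39) sits well above the TM195 and TM498 medians (20 and 21), which is the age gap visible in the plot.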

#Show summary grouped by Product
by(myData, INDICES = Product, FUN = summary)
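by() splits myData by Product and runs summary() on each piece. When one statistic per group is enough, aggregate() performs the same split-apply step and returns a single compact data frame. A sketch computing group means for the 30 rows of the table, with the columns copied by hand into a hypothetical data frame named treadmill:

```r
# Hypothetical data frame holding four columns of the table above
treadmill <- data.frame(
  Product   = factor(rep(c("TM195", "TM498", "TM798"), each = 10)),
  Education = c(14, 15, 14, 12, 13, 14, 14, 13, 15, 15,
                14, 14, 14, 14, 14, 16, 12, 14, 14, 16,
                16, 18, 16, 16, 18, 21, 18, 16, 18, 18),
  Usage     = c(3, 2, 4, 3, 4, 3, 3, 3, 5, 2,
                3, 2, 3, 3, 5, 2, 2, 3, 3, 3,
                6, 4, 5, 4, 5, 6, 5, 5, 4, 4),
  Fitness   = c(4, 3, 3, 3, 2, 3, 3, 3, 4, 3,
                3, 3, 3, 3, 4, 2, 2, 3, 3, 3,
                5, 5, 5, 5, 5, 5, 4, 5, 5, 5),
  Income    = c(29562, 31836, 30699, 32973, 35247, 32973, 35247, 32973, 35247, 37521,
                31836, 32973, 34110, 38658, 34110, 34110, 32973, 36384, 38658, 45480,
                89641, 95866, 92131, 92131, 104581, 83416, 89641, 90886, 104581, 95508)
)

# One row per Product, one column per variable, each cell the group mean
groupMeans <- aggregate(cbind(Education, Usage, Fitness, Income) ~ Product,
                        data = treadmill, FUN = mean)
groupMeans
```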

Some inferences from this summary are as follows:
  • Product TM798 is used mostly by Male customers
  • TM798 customers report a higher Usage frequency than customers of the other products
  • TM798 customers have higher Income and Education than customers of the other products
  • TM798 customers have the highest Fitness scores, which suggests they are satisfied with the fitness achieved using this product
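The first inference can be quantified with a cross-tabulation of Product against Gender. A sketch on the 30 rows of the table, with the Gender column copied by hand; prop.table(..., margin = 1) turns each row of counts into shares that sum to 1:

```r
# Product and Gender columns of the 30 rows in the table above
product <- factor(rep(c("TM195", "TM498", "TM798"), each = 10))
gender  <- c("Male", "Male", "Female", "Male", "Male",
             "Female", "Female", "Male", "Male", "Female",   # TM195
             "Male", "Male", "Female", "Male", "Female",
             "Male", "Male", "Male", "Male", "Female",       # TM498
             "Male", "Female", "Male", "Male", "Male",
             "Male", "Male", "Male", "Male", "Male")         # TM798

# Number of Female and Male customers for each product
counts <- table(product, gender)
counts

# Share of Female and Male customers within each product
round(prop.table(counts, margin = 1), 2)
```

In this excerpt, 9 of the 10 TM798 rows are Male, against 6 of 10 for TM195 and 7 of 10 for TM498.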
