Saturday, June 25, 2016

R lab - Exploring data: Coursera solution provided here

Here is the solution to the R lab "Exploring data" from the Basic Statistics course offered on Coursera: https://www.coursera.org/learn/basic-statistics/home/welcome
The answers are given below; with them you can get a 100% result in the R lab.

Basic Statistics

R lab - Exploring data

Checking the dimensions of your data
Q1. #Use the dim() function on mtcars
Answer:- dim(mtcars)

Data Structure

Q.2 According to R, what type of variable is am?
Answer:- factor

Levels

Q.3 # Look at the levels of the variable am
Answer:- levels(mtcars$am)
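
If you try this outside the lab, note that the copy of mtcars shipped with base R stores am as a plain number, so levels() returns NULL there. The lab environment presumably works on a copy where am was already converted to a factor; that is an assumption on my part. A minimal sketch of that conversion, where mtcars_lab is just a name made up for this example:

```r
# In a fresh R session every column of mtcars is numeric, including am
class(mtcars$am)     # "numeric"
levels(mtcars$am)    # NULL, because a numeric vector has no levels

# Assumption: the lab uses a copy in which am is a factor.
# mtcars_lab is a hypothetical name used only in this sketch.
mtcars_lab <- mtcars
mtcars_lab$am <- as.factor(mtcars_lab$am)

class(mtcars_lab$am)   # "factor"
levels(mtcars_lab$am)  # "0" "1"
```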

 

Recoding Variables

Q #Assign the value of mtcars to the new variable mtcars2
mtcars2 <- mtcars
Q #Assign the label "high" to mpgcategory where mpg is greater than or equal to 20
mtcars2$mpgcategory[mtcars2$mpg >= 20] <- "high"
Q #Assign the label "low" to mpgcategory where mpg is less than 20
mtcars2$mpgcategory[mtcars2$mpg < 20] <- "low"
Q #Assign mpgcategory as factor to mpgfactor
mtcars2$mpgfactor <- as.factor(mtcars2$mpgcategory)
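
The lab wants the recode done in separate steps, but the same result can probably be reached in one line with ifelse(). This is only an alternative sketch, not the form the lab checker expects:

```r
# Copy the data so the original mtcars stays untouched
mtcars2 <- mtcars

# One-step recode: "high" when mpg >= 20, otherwise "low"
mtcars2$mpgcategory <- ifelse(mtcars2$mpg >= 20, "high", "low")

# Convert the character labels to a factor, as in the lab
mtcars2$mpgfactor <- as.factor(mtcars2$mpgcategory)

# Quick check of how many cars fall in each category
table(mtcars2$mpgfactor)
```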

Examining Frequencies

Q #How many of the cars have a manual transmission?
13
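
The number comes from a frequency table of am, where 0 = automatic and 1 = manual. A short sketch of how to read it off (am_counts and manual_count are names I picked for the example):

```r
# Frequency table of transmission type (0 = automatic, 1 = manual)
am_counts <- table(mtcars$am)
am_counts

# Pull out the count for manual cars by its label "1"
manual_count <- am_counts[["1"]]
manual_count   # 13
```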

Cumulative Frequency

Q # What percentage of cars have 3 or 5 gears?
62.5
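
One way to get this figure is to turn the frequency table of gear into percentages and add up the shares for 3 and 5 gears. The variable names below are my own, and the lab may have intended you to read the value off a cumulative table instead:

```r
# Relative frequencies of gear, expressed as percentages
gear_pct <- prop.table(table(mtcars$gear)) * 100
gear_pct

# Cumulative percentages, in the order 3, 4, 5 gears
cumsum(gear_pct)

# Share of cars with 3 or 5 gears
pct_3_or_5 <- sum(gear_pct[c("3", "5")])
pct_3_or_5   # 62.5
```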

Making a Bar Graph

Q #Assign the frequency of the mtcars variable "am" to a variable called "height"
height <- table(mtcars$am)
Q #Create a barplot of "height"
barplot(height)

Labelling A Bar Graph

Q # vector of bar heights
height <- table(mtcars$am)
Q # Make a vector of the names of the bars called "barnames"
barnames <- c("automatic", "manual")
Q # Label the y axis "number of cars" and label the bars using barnames
barplot(height, ylab = "number of cars", names.arg = barnames)

Interpreting A Bar Graph

Q Based on the bar chart of transmission type that you made in the previous exercise, which type of transmission is most common? (remember, 0 = automatic, 1 = manual)

automatic

Histograms

Q # Make a histogram of the carb variable from the mtcars data set. Set the title to "Carburetors"
hist(mtcars$carb, main = "Carburetors")

Formatting Your Histogram

Q # arguments to change the y-axis scale to 0 - 20, label the x-axis and colour the bars red
hist(mtcars$carb, main = "Carburetors", ylim = c(0,20), xlab = "Number of Carburetors", col = "red")

Bar Graph vs. Histogram

Why did we make a bar graph of transmission (mtcars$am), but a histogram of carburetors (mtcars$carb)

Answer

Because transmission is categorical, and carb is continuous

Distributions

Take a look at the distributions in these histograms. Which of the following is correct?

Answer

Graph 1 is left skewed, graph 2 is normally distributed, graph 3 is right skewed.

Mean and Median

Q # Calculate the mean miles per gallon
mean(mtcars$mpg)
Q # Calculate the median miles per gallon
median(mtcars$mpg)

Mode

# Produce a sorted frequency table of `carb` from `mtcars`
sort(table(mtcars$carb), decreasing = TRUE)
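
The sorted table shows the mode at the top, but it is easy to miss a tie. Here is a small sketch that picks out every value with the highest count (carb_counts and modes are names used only for this example):

```r
# Frequency table of the number of carburetors
carb_counts <- table(mtcars$carb)

# Keep every value whose count equals the maximum count,
# so ties are reported instead of silently dropped
modes <- names(carb_counts)[carb_counts == max(carb_counts)]
modes   # "2" "4"
```

In mtcars both 2 and 4 carburetors occur 10 times, so carb has two modes.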

Range

# Minimum value
x <- min(mtcars$mpg)
# Maximum value
y <- max(mtcars$mpg)
# Calculate the range of mpg using x and y
y - x
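
As a cross-check, base R's range() returns the minimum and maximum together, and diff() gives the distance between them. A minimal sketch with variable names of my own choosing:

```r
# range() returns c(minimum, maximum)
mpg_limits <- range(mtcars$mpg)
mpg_limits

# diff() subtracts the first element from the second: max - min
mpg_range <- diff(mpg_limits)
mpg_range   # 23.5
```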

Quartiles

Q # What is the value of the second quartile of qsec?
17.7100
Q # What is the value of the first quartile of qsec?
16.8925
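
These two values match the quartiles of the qsec variable. A sketch of how they can be computed with quantile(), using R's default quantile method (q1 and q2 are names picked for this example):

```r
# Default call returns the 0%, 25%, 50%, 75% and 100% quantiles
qsec_quartiles <- quantile(mtcars$qsec)
qsec_quartiles

# First quartile (25%) and second quartile (50%, the median)
q1 <- qsec_quartiles[["25%"]]
q2 <- qsec_quartiles[["50%"]]
q1   # 16.8925
q2   # 17.71
```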

IQR and boxplot

Q # Make a boxplot of qsec
boxplot(mtcars$qsec)
Q # Calculate the interquartile range of qsec
IQR(mtcars$qsec)

IQR outliers

Q # What is the threshold value for an outlier below the first quartile?
13.88125
Q # What is the threshold value for an outlier above the third quartile?
21.91125
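
Both thresholds follow the usual 1.5 x IQR rule applied to qsec: the lower one is Q1 - 1.5 * IQR and the upper one is Q3 + 1.5 * IQR. A sketch of the calculation, with variable names that are my own:

```r
# First and third quartile of qsec
q <- quantile(mtcars$qsec, c(0.25, 0.75))

# Interquartile range (Q3 - Q1)
iqr <- IQR(mtcars$qsec)

# Thresholds under the 1.5 * IQR rule
lower <- q[["25%"]] - 1.5 * iqr
upper <- q[["75%"]] + 1.5 * iqr
lower   # 13.88125
upper   # 21.91125

# Values of qsec that fall outside the thresholds
mtcars$qsec[mtcars$qsec < lower | mtcars$qsec > upper]
```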

Standard Deviation

Q # Find the IQR of horsepower
IQR(mtcars$hp)
Q # Find the standard deviation of horsepower
sd(mtcars$hp)
Q # Find the IQR of miles per gallon
IQR(mtcars$mpg)
Q # Find the standard deviation of miles per gallon
sd(mtcars$mpg)

Mean, median and mode

Mean, median and mode are all measures of the average. In a perfect normal distribution the mean, median and mode are identical, but when the data are skewed this changes. In the graph on the right, which of the following statements is most accurate?
The mode is higher than the mean. It makes most sense to use the median to measure central tendency.

Calculating Z-scores

# Calculate the z-scores of mpg
(mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)
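
The manual formula can be compared with base R's scale(), which centers a vector and divides by its standard deviation by default. This is just a sanity-check sketch; z_manual and z_scaled are names made up for it:

```r
# Z-scores by hand: subtract the mean, divide by the standard deviation
z_manual <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)

# scale() returns a one-column matrix, so flatten it to a plain vector
z_scaled <- as.numeric(scale(mtcars$mpg))

# Both approaches should agree
all.equal(z_manual, z_scaled)

# Z-scores have mean 0 and standard deviation 1 (up to rounding error)
mean(z_scaled)
sd(z_scaled)
```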

Distributions And Z-scores

In the distribution shown on the right, what percentage of data will fall between the z-scores of -2 and 2?
95 %
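
The 95% is a rounded figure. Assuming the distribution in the exercise is a standard normal, the exact share can be checked with pnorm(), the normal cumulative distribution function (share_within_2 is a name chosen for this sketch):

```r
# P(-2 < Z < 2) for a standard normal distribution
share_within_2 <- pnorm(2) - pnorm(-2)

# As a percentage, rounded to two decimals
round(share_within_2 * 100, 2)   # 95.45
```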

Z-score Outliers

Outside of which boundaries might an observation be considered an outlier?
-3 and 3
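
To see the rule in action, here is a sketch that flags cars whose mpg z-score lies outside -3 and 3. The cutoff of 3 comes from the answer above; the variable names are mine:

```r
# Z-scores of mpg
z <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)

# Names of cars with a z-score below -3 or above 3
outlier_cars <- rownames(mtcars)[abs(z) > 3]
outlier_cars

# Smallest and largest z-score, to see how close mpg gets to the cutoff
range(z)
```

For mpg no car crosses the cutoff, so outlier_cars comes back empty.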

