WEEK 6:
A) Write an R script to find basic descriptive statistics using the summary(), str() and quantile() functions
on the mtcars & cars datasets.
Sol:-
Summary function
x<-c(1,2,3,4,5)
summary(x)
y<-c(2,3,4,5,6,7,8)
summary(y)
output:-
summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1 2 3 3 4 5
summary(y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.0 3.5 5.0 5.0 6.5 8.0
Str function
rv <- c(11, 18, 19, 21, 46)
rv
str(rv)
output:-
str(rv)
num [1:5] 11 18 19 21 46
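The exercise asks for str() on the mtcars and cars datasets, while the example above uses a plain vector. A minimal sketch of the same call on the two built-in data frames follows; the variable name cars_report is only illustrative.

```r
# Load the two built-in data frames named in the exercise.
data("mtcars")
data("cars")

# str() lists the number of observations, the number of variables,
# and the type and first few values of every column.
str(cars)
str(mtcars)

# str() prints its report and returns NULL invisibly, so use
# capture.output() when the report is needed as a character vector.
cars_report <- capture.output(str(cars))
cars_report[1]  # first line of the report (object class and dimensions)
```

For a data frame, str() is usually the quickest way to check column types before calling summary().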
quantile function in R
Load the mtcars dataset into the R session:
data("mtcars") # load the built-in mtcars dataset
head(mtcars)
output:-
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
nrow(mtcars)
output:-
## [1] 32
ncol(mtcars)
output:-
## [1] 11
tail(mtcars)
output:-
##                 mpg cyl  disp  hp drat    wt qsec vs am gear carb
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
summary(mtcars)
output:-
##       mpg             cyl             disp             hp
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
##       drat             wt             qsec             vs
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
##        am              gear            carb
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
##  Median :0.0000   Median :4.000   Median :2.000
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
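The exercise statement also names a quantile function and the cars dataset, which the listing above does not exercise directly. The following is a minimal sketch of how quantile() could be applied to one column of each dataset; the choice of the mpg and speed columns, the 10%/90% probabilities, and the variable names are illustrative assumptions.

```r
data("mtcars")
data("cars")

# With no probs argument, quantile() returns the 0%, 25%, 50%, 75% and 100%
# points, i.e. the minimum, the three quartiles and the maximum.
mpg_quartiles <- quantile(mtcars$mpg)
mpg_quartiles

# Other percentiles are requested through the probs argument.
mpg_tails <- quantile(mtcars$mpg, probs = c(0.10, 0.90))
mpg_tails

# The same calls work on the cars dataset.
speed_quartiles <- quantile(cars$speed)
speed_quartiles
summary(cars)

# Interquartile range: distance between the 75% and 25% points.
IQR(mtcars$mpg)
```

The quartiles returned for mtcars$mpg should agree, up to rounding, with the mpg column of the summary(mtcars) output.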
B) Write an R script to find a subset of a dataset using the subset() and aggregate() functions on the
iris dataset.
subset function:
data("iris") # load the built-in iris dataset
t<-data.frame(iris)
df<-subset(t,select=2:3)
df
output:-
df<-subset(t,select=2:3)
> df
Sepal.Width Petal.Length
1 3.5 1.4
2 3.0 1.4
3 3.2 1.3
4 3.1 1.5
5 3.6 1.4
6 3.9 1.7
7 3.4 1.4
8 3.4 1.5
9 2.9 1.4
10 3.1 1.5
11 3.7 1.5
12 3.4 1.6
13 3.0 1.4
14 3.0 1.1
15 4.0 1.2
16 4.4 1.5
17 3.9 1.3
18 3.5 1.4
19 3.8 1.7
20 3.8 1.5
21 3.4 1.7
22 3.7 1.5
23 3.6 1.0
24 3.3 1.7
25 3.4 1.9
26 3.0 1.6
27 3.4 1.6
28 3.5 1.5
29 3.4 1.4
30 3.2 1.6
31 3.1 1.6
32 3.4 1.5
33 4.1 1.5
34 4.2 1.4
35 3.1 1.5
36 3.2 1.2
37 3.5 1.3
38 3.6 1.4
39 3.0 1.3
40 3.4 1.5
41 3.5 1.3
42 2.3 1.3
43 3.2 1.3
44 3.5 1.6
45 3.8 1.9
46 3.0 1.4
47 3.8 1.6
48 3.2 1.4
49 3.7 1.5
50 3.3 1.4
51 3.2 4.7
52 3.2 4.5
53 3.1 4.9
54 2.3 4.0
... (rows 55 to 149 omitted)
150 3.0 5.1
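The example above selects columns only. subset() can also filter rows with a logical condition; a minimal sketch follows, in which the 3.5 threshold and the name setosa_wide are arbitrary illustrative choices.

```r
data("iris")

# Keep only the rows that satisfy the condition, and only the named columns.
setosa_wide <- subset(iris,
                      subset = Species == "setosa" & Sepal.Width > 3.5,
                      select = c(Sepal.Width, Petal.Length, Species))

head(setosa_wide)  # show the first rows instead of printing the whole result
nrow(setosa_wide)  # number of rows that passed the filter
```

Wrapping the result in head() keeps the printed output short when the subset still has many rows.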
aggregate function
The aggregate() function in R splits the data into subsets, computes a summary statistic for each
subset, and returns the result in grouped form. It is similar to GROUP BY in SQL and is useful for
aggregate operations such as sum, count, mean, minimum and maximum.
The examples below cover the following:
aggregate() computing the group mean
aggregate() computing the group sum
aggregate() computing the group counts
aggregate() computing the group maximum and minimum
Syntax for the aggregate() function in R:
aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)
x: an R object, usually a data frame
by: a list of grouping elements, each as long as the variables in x, by which the subsets are grouped
FUN: a function to compute the summary statistic
...: further arguments passed to FUN (for example na.rm = TRUE)
simplify: a logical indicating whether results should be simplified to a vector or matrix if possible
drop: a logical indicating whether to drop unused combinations of grouping values
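As a small illustration of the by argument, the sketch below names the list element so that the grouping column is not given the default name Group.1, and then shows the equivalent formula interface. The variable names agg_named and agg_formula are illustrative.

```r
data("iris")

# Naming the element of the `by` list gives the grouping column that name
# in the result instead of the default "Group.1".
agg_named <- aggregate(iris[, 1:4], by = list(Species = iris$Species), FUN = mean)
names(agg_named)

# Formula interface: "." on the left-hand side stands for every column
# not used on the right-hand side, grouped here by Species.
agg_formula <- aggregate(. ~ Species, data = iris, FUN = mean)
agg_formula
```

Both calls compute the same group means; the formula form is often shorter when the grouping variable is already a column of the data frame.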
Examples of the aggregate() function:
# Aggregate function in R with mean summary statistics
agg_mean = aggregate(iris[,1:4],by=list(iris$Species),FUN=mean, na.rm=TRUE)
agg_mean
OUTPUT:
Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 5.006 3.428 1.462 0.246
2 versicolor 5.936 2.770 4.260 1.326
3 virginica 6.588 2.974 5.552 2.026
# Aggregate function in R with SUM summary statistics
agg_sum = aggregate(iris[,1:4],by=list(iris$Species),FUN=sum, na.rm=TRUE)
agg_sum
OUTPUT:
Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 250.3 171.4 73.1 12.3
2 versicolor 296.8 138.5 213.0 66.3
3 virginica 329.4 148.7 277.6 101.3
# Aggregate function in R with COUNT
agg_count = aggregate(iris[,1:4],by=list(iris$Species),FUN=length)
agg_count
OUTPUT:
Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 50 50 50 50
2 versicolor 50 50 50 50
3 virginica 50 50 50 50
# Aggregate function in R with MAXIMUM
agg_max = aggregate(iris[,1:4],by=list(iris$Species),FUN=max, na.rm=TRUE)
agg_max
OUTPUT:
Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 5.8 4.4 1.9 0.6
2 versicolor 7.0 3.4 5.1 1.8
3 virginica 7.9 3.8 6.9 2.5
# Aggregate function in R with MINIMUM
agg_min = aggregate(iris[,1:4],by=list(iris$Species),FUN=min, na.rm=TRUE)
agg_min
OUTPUT:
Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 4.3 2.3 1.0 0.1
2 versicolor 4.9 2.0 3.0 1.0
3 virginica 4.9 2.2 4.5 1.4
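The minimum and maximum can also be obtained in a single call by letting FUN return a named vector. This is a sketch of one possible approach, not part of the original solution; the name agg_range and the restriction to Sepal.Length are illustrative.

```r
data("iris")

# FUN returns a named vector of length 2, so the Sepal.Length column of the
# result is a matrix with one "min" and one "max" column per species.
agg_range <- aggregate(iris["Sepal.Length"],
                       by = list(Species = iris$Species),
                       FUN = function(v) c(min = min(v), max = max(v)))
agg_range

# Individual statistics are read from the matrix column by name.
agg_range$Sepal.Length[, "min"]
agg_range$Sepal.Length[, "max"]
```

The values should match the Sepal.Length columns of the separate max and min outputs shown above.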