Centrality: Mean, Median, Mode
Centrality: Mean, Median, Mode
0 . 0 . 0 . 0 . 0
o . o . o . o . o . o
Centrality: Mean, Median, Mode
Sample : 2 , 4.4 , 3 , 3 , 2 , 2.2 , 2 , 4
2 , 2 , 2 , 2.2 , 3 , 3 , 4, 4.4 ( n=8 , n/2 = 4)
Centrality: Mean, Median, Mode
xdata <- c(2,4.4,3,3,2,2.2,2,4)
Quantiles, Percentiles, and the Five-Number Summary
The median = the 0.5th quantile = The 50th percentile
Sample : 2 , 4.4 , 3 , 3 , 2 , 2.2 , 2 , 4
2 , 2 , 2 , 2.2 , 3 , 3 , 4, 4.4
0.5th quantile = median = 2.6
Quantiles, Percentiles, and the Five-Number Summary
xdata <- c(2,4.4,3,3,2,2.2,2,4)
quantile(xdata,prob=0.8) # the 0.8th quan- tile (or 80th percentile)
## 80%
## 3.6
quantile(xdata,prob=c(0,0.25,0.5,0.75,1))
## 0% 25% 50% 75% 100%
## 2.00 2.00 2.60 3.25 4.40
summary(xdata) # A quartile is a type of quantile.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.000 2.600 2.825 3.250 4.400
Quantiles, Percentiles, and the Five-Number Summary
A quartile is a type of quantile.
Spread: Variance, Standard Deviation, and the Interquartile Range
xdata <- c(2,4.4,3,3,2,2.2,2,4)
ydata <- c(1,4.4,1,3,2,2.2,2,7)
mean(xdata)
## [1] 2.825
mean(ydata)
## [1] 2.825
plot(xdata,type="n",xlab="",ylab="data vector",yaxt="n",bty="n")
abline(h=c(3,3.5),lty=2,col="red")
abline(v=2.825,lwd=2,lty=3)
text(c(0.8,0.8),c(3,3.5),labels=c("x","y"))
points(jitter(c(xdata,ydata)),c(rep(3,length(xdata)), rep(3.5,length(ydata))))
the observations in ydata are more “spread out”
Spread: Variance, Standard Deviation, and the Interquartile Range
Spread: Variance, Standard Deviation, and the Interquartile Range
2 , 4.4 , 3 , 3 , 2 , 2.2 , 2 , 4 ( mean = 2.825)
Spread: Variance, Standard Deviation, and the Interquartile Range
0.953 represents the average distance of each observation from the mean
Spread: Variance, Standard Deviation, and the Interquartile Range,
Spread: Variance, Standard Deviation, and the Interquartile Range
xdata <- c(2,4.4,3,3,2,2.2,2,4)
var(xdata)
## [1] 0.9078571
sd(xdata)
## [1] 0.9528154
IQR(xdata)
## [1] 1.25
Covariance and Correlation
Covariance and Correlation
x = {x1,x2,…,xn}
y = {y1,y2,…,yn}
for i = 1,. . . ,n
When you get a positive result for rxy, it shows that there is a positive lin- ear relationship. When rxy = 0, this indicates that there is no linear relationship.
Covariance and Correlation
x = {2,4.4,3,3,2,2.2,2,4}
y = {1,4.4,1,3,2,2.2,2,7}
mean x and y = 2.825
positive relationship
Covariance and Correlation
Covariance and Correlation
Most common of these is Pearson’s product-moment correlation coefficient. (R default)
The correlation coefficient estimates the nature of the linear relationship between two sets of observations
−1 ≤ ρxy ≤ 1
ρxy = 1, which is a perfect positive linear relationship
Covariance and Correlation
x = {2,4.4,3,3,2,2.2,2,4}
y = {1,4.4,1,3,2,2.2,2,7}
(mean x and y = 2.825)
(sx = 0.953 and sy = 2.013)
(rxy = 1.479)
ρxy is positive
Covariance and Correlation
xdata <- c(2,4.4,3,3,2,2.2,2,4)
ydata <- c(1,4.4,1,3,2,2.2,2,7)
cov(xdata,ydata)
## [1] 1.479286
cor(xdata,ydata)
## [1] 0.7713962
Covariance and Correlation
xdata <- c(2,4.4,3,3,2,2.2,2,4)
ydata <- c(1,4.4,1,3,2,2.2,2,7)
plot(xdata,ydata,pch=13,cex=1.5)
Barplots and Pie Charts
station_data <- read.csv("https://web.itu.edu.tr/~tokerem/18397_Cekmekoy_Omerli_15dk.txt", header=T, sep = ";")
head(station_data)
## sta_no year month day hour minutes temp precipitation pressure
## 1 18397 2017 7 26 18 0 23.9 0 1003.0
## 2 18397 2017 7 26 18 15 23.9 0 1003.1
## 3 18397 2017 7 26 18 30 23.8 0 1003.2
## 4 18397 2017 7 26 18 45 23.8 0 1003.2
## 5 18397 2017 7 26 19 0 23.6 0 1003.2
## 6 18397 2017 7 26 19 15 23.2 0 1003.1
## relative_humidity
## 1 94
## 2 95
## 3 96
## 4 96
## 5 96
## 6 97
Barplots and Pie Charts
head(station_data$temp)
## [1] 23.9 23.9 23.8 23.8 23.6 23.2
barplot(station_data$temp)
Barplots and Pie Charts
table(station_data$temp)
##
## 19.2 19.5 20.1 20.4 20.5 20.6 20.7 20.8 20.9 21 21.2 21.4 21.6 21.7 21.9
## 1 1 1 4 6 4 2 6 2 1 2 3 1 2 1
## 22.1 22.2 22.3 22.4 22.5 22.6 22.7 22.8 23 23.1 23.2 23.6 23.8 23.9 24.2
## 1 5 3 3 8 7 1 1 2 1 3 1 2 2 2
## 25.1 25.4 25.5 25.6 25.8 26.1 26.2 26.6 26.9 27.1 27.4 27.6 27.8 28 28.4
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
## 28.5 28.8 29 29.2 29.3 29.4 29.5 29.6 30 30.1 30.2 30.4 30.8 30.9 31
## 1 1 2 2 2 1 1 1 1 2 1 4 1 3 1
## 31.2 31.5
## 1 1
f_temp <- table(station_data$temp)
Barplots and Pie Charts
barplot(f_temp)
Barplots and Pie Charts
barplot(f_temp,beside=TRUE,horiz=TRUE,las=1,
main="Frequency of Station Temperature",
names.arg=c("T"),legend.text=c("TEMP-f"),
args.legend=list(x="bottomright"))
Barplots and Pie Charts
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.2
qplot(factor(station_data$temp),geom="bar")
Barplots and Pie Charts
head(station_data$precipitation)
pie(table(station_data$precipitation),labels=c("V1","V2","V3","V4","V5"),col=c("white","blue","green","orange"),main="pie chart for precipitation")
Histogram
hist(station_data$temp)
Histogram
hist(station_data$tem,breaks=seq(19,32,1),col="green",main="Temp",xlab="HP")
abline(v=c(mean(station_data$temp),median(station_data$temp)), col=c("blue","red"),lty=c(2,3),lwd=2)
legend("topright",legend=c("mean T","median T"),lty=c(2,3),lwd=2,col=c("blue","red"))
Histogram
qplot(station_data$temp,geom="blank",main="Temp Hist",xlab="Temp")+
geom_histogram(color="black",fill="white",breaks=seq(19,32,1),closed="right") +
geom_vline(mapping=aes(xintercept=c(mean(station_data$tem), median(station_data$tem)), linetype=factor(c("mean","median"))) , col=c("blue","red"),show.legend=TRUE)+
scale_linetype_manual(values=c(2,3)) +
labs(linetype="")
Boxplot
boxplot(station_data$temp)
Boxplot
Scatter Plots
plot(station_data$temp,station_data$relative_humidity)
Scatter Plots
head(station_data[,7:10])
## temp precipitation pressure relative_humidity
## 1 23.9 0 1003.0 94
## 2 23.9 0 1003.1 95
## 3 23.8 0 1003.2 96
## 4 23.8 0 1003.2 96
## 5 23.6 0 1003.2 96
## 6 23.2 0 1003.1 97
library("GGally")
ggpairs(station_data[,7:10],axisLabels="internal")