The Basics of Simulation
This post explores some of the basic concepts of simulation. I mostly explore these concepts using basic probablity and the built in distribution functions. This is a reproducible example if you have R Studio just make sure you have installed the correct packages.
#Ideas from Probablity Course https://www.probabilitycourse.com/chapter13/chapter13.php
set.seed(123)
p <- 0.5
n <- 1000
U <- runif(n)
toss <- as.integer(U < p)
#cumalative number of heads
a <- numeric(n+1)
#running average of heads
avg <- numeric(n)
for(i in 2:n+1){
a[i] <- a[i-1] + toss[i-1]
avg[i-1] <- a[i]/(i-1)
}
plot(1:n, avg, type = "l", lwd = 5, col = "blue", ylab = "ProportionofHeads",
xlab = "CoinTossNumber")
rm(p,n,U,toss,a,avg, i)
set.seed(123)
p <- 0.2
n <- 1000
U <- runif(n)
#The function U < p creates Bernouli Random variables with probablity p, 1 if U < p 0 otherwise
#the sum of Bernouli variables is a Binomial of (n,p) so X is a Binomial(1000,0.2)
X <- sum(as.integer(U < p))
X
## [1] 198
#The built in function for binomial (number of observations, number of trials, probablity)
rbinom(1,n,p)
## [1] 196
#Arbitrary Distribution
#P(X =1) = 0.35, P(X = 2) = 0.15, P(X=3) = 0.4, P(X=4) = 0.1
#P(X=xi) = P(U element Ai) = pi
P <- c(0.35,0.5,0.9,1)
X <- c(1,2,3,4)
i <- 1
r <- runif(1)
while(r > P[i]){
i <- i + 1
}
X[i]
## [1] 1
#Create RV with density f(x) = 2.5*x*sqrt(x) = x^5/2
#Using inverse X^5/2 = U >> X = U^2/5
U <- runif(1)
X <- U^(2/5)
print(paste('Distrubution with density f(x) = X^(5/2):',X))
## [1] "Distrubution with density f(x) = X^(5/2): 0.938571171721709"
#Generate RV with density function Beta(2,4), g(x) =1 0<x<1
#f(x) = 20x(1-x)^3, g(x) = 1, f(x)/g(x) = 20x(1-x)^3
#Find smallest c such that f(x)/g(x) <= c
#Using differention d(f(x)/g(x))/dx >>> x = 1/4 >>> f(x)/g(x) <= 135/64 >>> f(x)/(c*g(x)) = 256x(1-x)^3
#This code keeps looping until U2 (which is f(x)/c*g(x)) dips below its bound. Hence it rejects higher values
n <- 1
rejects <- 0
while(n == 1){
U1 <- runif(1)
U2 <- runif(1)
rejects <- rejects + 1
if(U2 <= 256/27*U1*(1-U1)^3){
X <- U1
n <- 0
}
}
print(paste('Total Number of Rejections:', rejects))
## [1] "Total Number of Rejections: 3"
print(paste('Beta RV:', X))
## [1] "Beta RV: 0.0656281118281186"
print(paste('R produced Beta:', rbeta(1,2,4, ncp = 0)))
## [1] "R produced Beta: 0.281731495984851"
rm(i,n,p,P,r,U,X, rejects)
#Transformations of Uniform Distribution to other distrubtions
set.seed(123)
#Inverse Transformation to Exponential
#F(x) = 1 - e^-x
#X = F-1(U) = - ln(1-U) >>> - ln(U)
lambda <- 1
U <- runif(1)
X <- (-1/lambda)*log(U)
print(paste('Exponential RV:',X))
## [1] "Exponential RV: 1.24626281987372"
print(paste('R produced Exponential RV:',rexp(1,lambda)))
## [1] "R produced Exponential RV: 0.576610270887613"
#Using sums to create Gamma(n,lambda) from exp(lambda) Gamma(n,lambda) = sum_n(Exponential(lambda))
n <- 20
X <- (-1/lambda)*sum(log(U))
print(paste('Gamma RV:',X))
## [1] "Gamma RV: 1.24626281987372"
print(paste('R produced Gamma RV:', rgamma(1,n,lambda)))
## [1] "R produced Gamma RV: 18.4968091472022"
rm(n,lambda,X)
#Create Poisson Distribution which is the number of exponential arrivals in a given time period
#Ti = 1/lambdaln(Ui)
set.seed(123)
lambda <- 2
i <- 0
U <- runif(1)
Y <- -1/lambda*log(U)
sum <- Y
while(sum < 1){
U <- runif(1)
Y <- -1/lambda*log(U)
sum <- sum + Y
i <- i+1
}
X <- i
print(paste('Poisson RV:',X))
## [1] "Poisson RV: 2"
print(paste('R produced Poisson RV:', rpois(1,lambda)))
## [1] "R produced Poisson RV: 4"
#Creating Normals with the Box Mueller Method (inefficient because of the sqrt, cos, and sine functions)
#Z1 = sqrt(-2ln(U1)cos(2*pi*U2))
#Z2 = sqrt(-2ln(U1)sin(2*pi*U2))
n <- 5000
U1 <- runif(n)
U2 <- runif(n)
Z1 <- sqrt(-2*log(U1))*cos(2*pi*U2)
Z2 <- sqrt(-2*log(U1))*sin(2*pi*U2)
#Created Via R Function
Z3 <- rnorm(5000)
hist(Z1,col = 'wheat', label = T)
hist(Z3,col = 'wheat', label = T)
#Geometric Function - Loops Bernoulis until first success
# K <- number of failures plust 1 success
K <- 1
p <- 0.2
while(runif(1) > p){
K <- K +1
}
print(paste('Geometric RV:', K))
## [1] "Geometric RV: 8"
print(paste('R produced Geometric RV:',rgeom(1,p)))
## [1] "R produced Geometric RV: 1"
#Negative Binomial Method - Number of Geometric trials needed to get r success NegBin(1,r,p)
K <- 1
p <- 0.2
r <- 2
success <- 0
while(success < r){
if(runif(1) > p){
K <- K + 1
#failure
}else{
success <- success + 1
}
}
print(paste('Negative Binomial RV:', K+r-1))
## [1] "Negative Binomial RV: 6"
print(paste('R produced Negative Binomial:', rnbinom(1,r,p)))
## [1] "R produced Negative Binomial: 6"
rt(1,1,1) #number of variables, df, ncp
## [1] -0.06314223
Typical metrics for Queueing that can be extended to all types of simulations:
• L: average number of jobs in the system • W: average time spent in the system (cycle time) • Q: average number of jobs in queue • d: average time in queue • system utilization • system throughput • distribution of waiting time • distribution of system size • distribution of queue size
#M/M/1 Queue Simulation
#Source for this code http://web02.gonzaga.edu/faculty/burchn/R_files/Miscellaneous/queueing_theory_MM1.html
#variable saying how many arrivals per time period
lambda = .3
#saying the average departures per time period
mu = 1
#How long the simulation runs note not a number of events
time = 500
t = 0
#the length of a queue after a number of events. Aka Q_history[500] = 5 says the queue is 5 people long after 500 events it isn't a sum.
#It updates after for n during event n
Q_hist = 0
#Think this is the sum of all the queues
s = 0
#exponential distb with mean 1/rate T1 is time uuntil next event rate lamba + mu if there is something in queue otherwise just an arrival
T1 = rexp(1,rate=lambda)
#Initializing parameter
Q = 1
#Time until first event is T1
event_times = T1
#The time of the first event is T1
t = T1
#Same reasoning
num_event = 1
i <- 1
sims <- 10
#Busy time simulation 1:10
B <- c(1:10)
BT <- c(1:10)
#Average length of queue for simulation 1:10
L <- c(1:10)
#Average time customer spends in line
W <- c(1:10)
while (i <= sims){
print(i)
while (t<time) {
num_event = num_event+1
if(Q>0) {
# we checked to make sure queue was not empty
#odds someone arrives or leaves the queue
T1 = rexp(1,rate=lambda+mu)
#use p as random number to determine if next even is an arrival or a departure
p = runif(1,0,1)
Q_hist[num_event] = Q
#if p is less than lambda/(lamda+mu) it is an arrival otherwise it is a departure
Q = ifelse(p<lambda/(lambda+mu),Q+1,Q-1)
} else {
# here, the queue was empty, so only arrivals are possible
T1 = rexp(1,rate=lambda)
Q_hist[num_event] = Q
Q = 1
}
#new time is the original t plus the time to the next event
t = t+T1
#A vector that shows how long it is to the next event
event_times[num_event] = T1
s = s+T1*Q_hist[num_event]
}
#Time system is busy
BT[i] <- sum(event_times)-sum(event_times[which(Q_hist %in% 0)])
num_cust <- lambda*time
B[i] <- BT[i]/length(which(Q_hist %in% 0))
#Average queue length in the system
L[i] <- s/t
#Average time customer spends in line
W[i] <- L[i] / lambda
time = 500
t = 0
Q_hist = 0
s = 0
T1 = rexp(1,rate=lambda)
Q = 1
event_times = T1
t = T1
num_event = 1
i <- i + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
#rho/(1-rho) [.4285, 9, -11]
avg_num <- mean(L)
#rho/(1-rho)^2 [.6122, 90, 110]
variance <- var(L)
Sd2 <- sum((L-avg_num)^2 / (sims - 1))
# 1/(mu-lambda) [1.42, 10, -10]
busy_time <- mean(B)
avg_num
## [1] 0.4417826
variance
## [1] 0.003876663
Sd2
## [1] 0.003876663
busy_time
## [1] 1.452299
A quick Monte Carlo Simulation to estimate the value of the integral \(\int_{0.01}^1x^{-0.5}\,dx\)
s <- NULL
m <- 1000
a <- 0.01
b <- 1
set.seed(5)
x <- runif(m,a,b)
y <- 10*runif(m,0,1)
for (i in 1:m){
if(y[i]<(x[i]^(-0.5))){
s[i] <- 1
}
else{
s[i] <-0
}
}
nn<- sum(s)*(b-a)/m*10 #note that the addition of the area of the rectangle
print(nn)
## [1] 1.683
plot(x,y)
f2 <- function(x) sqrt(1-x^2)
s <- seq(-1 , 1 ,by=0.001)
plot(s,f2(s))
# Get the max value of function within the range
c <- ceiling(max(f2(s)))
# [1] 1
n <- 1000000
a <- -1
b <- 1
set.seed(5)
x <- runif(n,a,b)
y <- c*runif(n,0,1)
R <- sum(y < f2(x))/n
(b-a)*c*R
## [1] 1.57063
#[1] 1.57063 # multiply it by 2 to get full area
pi/2
## [1] 1.570796
#[1] 1.570796
Sim HW2 does a good job showing confidence intervals and convergence of normals, exponentials, and lognormals.
Sim HW4 shows how to run 5 tests for independance (Runs, Autocorrelation), uniformity (Chi-Squared and KS), or both (Serial)
Sim HW5 shows how to generate RVs recurseively until they drop below a certain variance.
#CLT Basics https://stats.stackexchange.com/questions/22557/central-limit-theorem-versus-law-of-large-numbers
#https://www.probabilitycourse.com/chapter7/7_2_4_convergence_in_distribution.php
#https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/
#LLN (WLLN - convergence in prob) (SLLN - almost sure convergence) (CLT - convergence in distribution)
#WLLN https://www.probabilitycourse.com/chapter7/7_2_5_convergence_in_probability.php
#SLLN and continous mapping theroem https://www.probabilitycourse.com/chapter7/7_2_7_almost_sure_convergence.php
#Probability Basics
#https://daviddalpiaz.github.io/r4sl/probability-review.html
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.