The Basics of Simulation

This post explores some of the basic concepts of simulation. I mostly explore these concepts using basic probablity and the built in distribution functions. This is a reproducible example if you have R Studio just make sure you have installed the correct packages.

#Ideas from Probablity Course https://www.probabilitycourse.com/chapter13/chapter13.php


set.seed(123)
p <- 0.5
n <- 1000
U <- runif(n)

toss <- as.integer(U < p)

#cumalative number of heads 
a <- numeric(n+1)
#running average of heads
avg <- numeric(n)

for(i in 2:n+1){
  a[i] <- a[i-1] + toss[i-1]
  avg[i-1] <- a[i]/(i-1)
}

plot(1:n, avg, type = "l", lwd = 5, col = "blue", ylab = "ProportionofHeads",
xlab = "CoinTossNumber")

rm(p,n,U,toss,a,avg, i)
set.seed(123)
p <- 0.2
n <- 1000
U <- runif(n)
#The function U < p creates Bernouli Random variables with probablity p, 1 if U < p 0 otherwise
#the sum of Bernouli variables is a Binomial of (n,p) so X is a Binomial(1000,0.2)

X <- sum(as.integer(U < p))
X
## [1] 198
#The built in function for binomial (number of observations, number of trials, probablity)
rbinom(1,n,p)
## [1] 196
#Arbitrary Distribution
#P(X =1) = 0.35, P(X = 2) = 0.15, P(X=3) = 0.4, P(X=4) = 0.1
#P(X=xi) = P(U element Ai) = pi
P <- c(0.35,0.5,0.9,1)
X <- c(1,2,3,4)
i <- 1
r <- runif(1)

while(r > P[i]){
  i <- i + 1
}

X[i]
## [1] 1
#Create RV with density f(x) = 2.5*x*sqrt(x) = x^5/2
#Using inverse X^5/2 = U >> X = U^2/5

U <- runif(1)
X <- U^(2/5)
print(paste('Distrubution with density f(x) = X^(5/2):',X))
## [1] "Distrubution with density f(x) = X^(5/2): 0.938571171721709"
#Generate RV with density function Beta(2,4), g(x) =1 0<x<1
#f(x) = 20x(1-x)^3, g(x) = 1, f(x)/g(x) = 20x(1-x)^3
#Find smallest c such that f(x)/g(x) <= c
#Using differention d(f(x)/g(x))/dx >>> x = 1/4 >>> f(x)/g(x) <= 135/64 >>> f(x)/(c*g(x)) = 256x(1-x)^3

#This code keeps looping until U2 (which is f(x)/c*g(x)) dips below its bound. Hence it rejects higher values

n <- 1
rejects <- 0
while(n == 1){
  U1 <- runif(1)
  U2 <- runif(1)
  rejects <- rejects + 1
  if(U2 <= 256/27*U1*(1-U1)^3){
    X <- U1
    n <- 0
  }
}

print(paste('Total Number of Rejections:', rejects))
## [1] "Total Number of Rejections: 3"
print(paste('Beta RV:', X))
## [1] "Beta RV: 0.0656281118281186"
print(paste('R produced Beta:', rbeta(1,2,4, ncp = 0)))
## [1] "R produced Beta: 0.281731495984851"
rm(i,n,p,P,r,U,X, rejects)
#Transformations of Uniform Distribution to other distrubtions

set.seed(123)
#Inverse Transformation to Exponential
#F(x) = 1 - e^-x
#X = F-1(U) = - ln(1-U) >>> - ln(U)

lambda <- 1
U <- runif(1)
X <- (-1/lambda)*log(U)
print(paste('Exponential RV:',X))
## [1] "Exponential RV: 1.24626281987372"
print(paste('R produced Exponential RV:',rexp(1,lambda)))
## [1] "R produced Exponential RV: 0.576610270887613"
#Using sums to create Gamma(n,lambda) from exp(lambda) Gamma(n,lambda) = sum_n(Exponential(lambda))
n <- 20
X <- (-1/lambda)*sum(log(U))
print(paste('Gamma RV:',X))
## [1] "Gamma RV: 1.24626281987372"
print(paste('R produced Gamma RV:', rgamma(1,n,lambda)))
## [1] "R produced Gamma RV: 18.4968091472022"
rm(n,lambda,X)

#Create Poisson Distribution which is the number of exponential arrivals in a given time period
#Ti = 1/lambdaln(Ui)

set.seed(123)

lambda <- 2
i <- 0
U <- runif(1)
Y <- -1/lambda*log(U)
sum <- Y
while(sum < 1){
  U <- runif(1)
  Y <- -1/lambda*log(U)
  sum <- sum + Y
  i <- i+1
}
X <- i
print(paste('Poisson RV:',X))
## [1] "Poisson RV: 2"
print(paste('R produced Poisson RV:', rpois(1,lambda)))
## [1] "R produced Poisson RV: 4"
#Creating Normals with the Box Mueller Method (inefficient because of the sqrt, cos, and sine functions)
#Z1 = sqrt(-2ln(U1)cos(2*pi*U2))
#Z2 = sqrt(-2ln(U1)sin(2*pi*U2))

n <- 5000
U1 <- runif(n)
U2 <- runif(n)

Z1 <- sqrt(-2*log(U1))*cos(2*pi*U2)
Z2 <- sqrt(-2*log(U1))*sin(2*pi*U2)

#Created Via R Function
Z3 <- rnorm(5000)
hist(Z1,col = 'wheat', label = T)

hist(Z3,col = 'wheat', label = T)

#Geometric Function - Loops Bernoulis until first success
# K <- number of failures plust 1 success 

K <- 1
p <- 0.2

while(runif(1) > p){
  K <- K +1
}
print(paste('Geometric RV:', K))
## [1] "Geometric RV: 8"
print(paste('R produced Geometric RV:',rgeom(1,p)))
## [1] "R produced Geometric RV: 1"
#Negative Binomial Method - Number of Geometric trials needed to get r success NegBin(1,r,p)

K <- 1
p <- 0.2
r <- 2
success <- 0

while(success < r){
  if(runif(1) > p){
    K <- K + 1
    #failure
  }else{
    success <- success + 1
  }
}

print(paste('Negative Binomial RV:', K+r-1))
## [1] "Negative Binomial RV: 6"
print(paste('R produced Negative Binomial:', rnbinom(1,r,p)))
## [1] "R produced Negative Binomial: 6"
rt(1,1,1) #number of variables, df, ncp
## [1] -0.06314223

Typical metrics for Queueing that can be extended to all types of simulations:

• L: average number of jobs in the system • W: average time spent in the system (cycle time) • Q: average number of jobs in queue • d: average time in queue • system utilization • system throughput • distribution of waiting time • distribution of system size • distribution of queue size

#M/M/1 Queue Simulation

#Source for this code http://web02.gonzaga.edu/faculty/burchn/R_files/Miscellaneous/queueing_theory_MM1.html

#variable saying how many arrivals per time period
lambda = .3
#saying the average departures per time period
mu = 1
#How long the simulation runs note not a number of events
time = 500
t = 0
#the length of a queue after a number of events. Aka Q_history[500] = 5 says the queue is 5 people long after 500 events it isn't a sum.
#It updates after for n during event n
Q_hist = 0
#Think this is the sum of all the queues 
s = 0
#exponential distb with mean 1/rate T1 is time uuntil next event rate lamba + mu if there is something in queue otherwise just an arrival
T1 = rexp(1,rate=lambda)
#Initializing parameter
Q = 1
#Time until first event is T1
event_times = T1
#The time of the first event is T1
t = T1
#Same reasoning
num_event = 1

i <- 1
sims <- 10
#Busy time simulation 1:10
B <- c(1:10)
BT <- c(1:10)
#Average length of queue for simulation 1:10
L <- c(1:10)
#Average time customer spends in line
W <- c(1:10)

while (i <= sims){
  print(i)
  while (t<time) {
  num_event = num_event+1
  if(Q>0) {
    # we checked to make sure queue was not empty
    #odds someone arrives or leaves the queue
    T1 = rexp(1,rate=lambda+mu)
    #use p as random number to determine if next even is an arrival or a departure
    p = runif(1,0,1)
    Q_hist[num_event] = Q
    #if p is less than lambda/(lamda+mu) it is an arrival otherwise it is a departure
    Q = ifelse(p<lambda/(lambda+mu),Q+1,Q-1)
    } else {
      # here, the queue was empty, so only arrivals are possible
      T1 = rexp(1,rate=lambda)
      Q_hist[num_event] = Q
      Q = 1
          }
  #new time is the original t plus the time to the next event
  t = t+T1
  #A vector that shows how long it is to the next event
  event_times[num_event] = T1
  s = s+T1*Q_hist[num_event]
  }
#Time system is busy
BT[i] <- sum(event_times)-sum(event_times[which(Q_hist %in% 0)])
num_cust <- lambda*time
B[i] <- BT[i]/length(which(Q_hist %in% 0))

#Average queue length in the system
L[i] <- s/t

#Average time customer spends in line
W[i] <- L[i] / lambda

time = 500
t = 0
Q_hist = 0
s = 0
T1 = rexp(1,rate=lambda)
Q = 1
event_times = T1
t = T1
num_event = 1

i <- i + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
#rho/(1-rho) [.4285, 9, -11]
avg_num <- mean(L) 

#rho/(1-rho)^2 [.6122, 90, 110]
variance <- var(L) 

Sd2 <- sum((L-avg_num)^2 / (sims - 1))

# 1/(mu-lambda) [1.42, 10, -10]
busy_time <- mean(B)

avg_num
## [1] 0.4417826
variance
## [1] 0.003876663
Sd2
## [1] 0.003876663
busy_time
## [1] 1.452299

A quick Monte Carlo Simulation to estimate the value of the integral \(\int_{0.01}^1x^{-0.5}\,dx\)

#https://stackoverflow.com/questions/22001977/monte-carlo-integration-in-r-getting-the-wrong-answer-using-hit-or-miss

s <- NULL

m <- 1000
a <- 0.01
b <- 1
set.seed(5)
x <- runif(m,a,b)
y <- 10*runif(m,0,1)

for (i in 1:m){
    if(y[i]<(x[i]^(-0.5))){
        s[i] <- 1
    }
    else{
        s[i] <-0
    }
}

nn<- sum(s)*(b-a)/m*10 #note that the addition of the area of the rectangle
print(nn)
## [1] 1.683
plot(x,y)

f2 <- function(x)   sqrt(1-x^2)

s <- seq(-1 , 1 ,by=0.001)
plot(s,f2(s))

# Get the max value of function within the range
c <- ceiling(max(f2(s)))
# [1] 1

n <- 1000000
a <- -1
b <- 1

set.seed(5)
x <- runif(n,a,b)
y <- c*runif(n,0,1)
R <- sum(y < f2(x))/n

(b-a)*c*R
## [1] 1.57063
#[1] 1.57063 # multiply it by 2 to get full area

pi/2
## [1] 1.570796
#[1] 1.570796

Sim HW2 does a good job showing confidence intervals and convergence of normals, exponentials, and lognormals.

Sim HW4 shows how to run 5 tests for independance (Runs, Autocorrelation), uniformity (Chi-Squared and KS), or both (Serial)

Sim HW5 shows how to generate RVs recurseively until they drop below a certain variance.

#CLT Basics https://stats.stackexchange.com/questions/22557/central-limit-theorem-versus-law-of-large-numbers
#https://www.probabilitycourse.com/chapter7/7_2_4_convergence_in_distribution.php
#https://www.analyticsvidhya.com/blog/2019/05/statistics-101-introduction-central-limit-theorem/

#LLN (WLLN - convergence in prob) (SLLN - almost sure convergence) (CLT - convergence in distribution)
#WLLN https://www.probabilitycourse.com/chapter7/7_2_5_convergence_in_probability.php
#SLLN and continous mapping theroem https://www.probabilitycourse.com/chapter7/7_2_7_almost_sure_convergence.php

#Probability Basics
#https://daviddalpiaz.github.io/r4sl/probability-review.html

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

Erick Jones
Erick Jones
PhD Candidate

Erick Jones is a Ph.D. candidate in Operations Research and Industrial Engineering who develops multi-systems optimization models to analyze how energy systems, water resources, supply chains, urban space, and transportation networks operate in concert to influence economic and environmental well-being.