# Chapter 5 Maximum likelihood estimation for ERGMs

The estimation procedure starts by assuming that the observed network $$y$$ is just one realisation from the set of many possible networks that could have been generated by the same underlying structural processes. Each network statistic $$s(y)$$ is weighted by a corresponding element of the parameter vector $$\theta$$, which controls how likely configurations of that type are to occur in the model.

The aim of the estimation is to find the parameter value under which the observed network has the highest possible likelihood of being generated by the model. In other words, when estimating an ERGM we seek the set of parameters under which random networks simulated from the model match the observed network in terms of their expected network statistics: $\begin{equation*} \mathbb{E}_{y\ |\ \theta}\left[s(y)\right] = s(y) \end{equation*}$ where $$\mathbb{E}_{y\ |\ \theta}$$ is the expectation under the likelihood distribution $$p(y\ |\ \theta)$$ and the right-hand side is the observed value of the statistics; this moment condition ensures that the probability mass of the ERGM is centred at the observed $$s(y)$$.
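
This condition can be verified on a toy model small enough for the normalising constant to be computed exactly. The sketch below (in base R, with entirely hypothetical values) uses a 3-node undirected graph with a single statistic, the number of edges, and checks that the expected statistic at the MLE equals the observed one:

```r
# Toy check of the moment condition E[s(y)] = s(y_obs) at the MLE.
# Model: p(y | theta) proportional to exp(theta * edges(y)) on a 3-node
# undirected graph, so s(y) takes values 0..3 and the normalising constant
# is a sum over all 2^3 = 8 graphs.
s_vals <- 0:3                    # possible edge counts
counts <- choose(3, s_vals)      # number of graphs with each count: 1, 3, 3, 1
s_obs  <- 2                      # the (hypothetical) observed network has 2 edges

# Log-likelihood: theta * s_obs - log c(theta)
loglik <- function(theta) theta * s_obs - log(sum(counts * exp(theta * s_vals)))
theta_hat <- optimize(loglik, c(-10, 10), maximum = TRUE)$maximum

# Expected edge count under theta_hat matches the observed value (2)
p <- counts * exp(theta_hat * s_vals)
p <- p / sum(p)
sum(p * s_vals)
```

Here the edges-only model reduces to a Bernoulli graph, so $$\hat{\theta} = \log 2$$ (each dyad is present with probability 2/3) and the expected number of edges is exactly 2.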

In practice, the MLE cannot be computed directly because the normalising constant $$c(\theta) = \sum_{y} \exp\{ \theta^T s(y) \}$$ is intractable: it is a sum over every possible graph on the given set of nodes.
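
The intractability comes from the size of the graph space: for an undirected network with $$n$$ nodes, the normalising constant is a sum over all $$2^{n(n-1)/2}$$ possible graphs. A quick check in R for $$n = 36$$, the size of the Lazega lawyer network analysed below:

```r
# Number of possible undirected graphs on n nodes: 2^(n*(n-1)/2)
n <- 36                     # nodes in the Lazega lawyer network
n_dyads  <- choose(n, 2)    # 630 dyads
n_graphs <- 2^n_dyads       # about 4.5e189 graphs -- far too many to enumerate
```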

## 5.1 Monte Carlo maximum likelihood estimation (MC-MLE)

The standard classical estimation method is the Monte Carlo maximum likelihood estimation (MC-MLE) procedure proposed by Geyer and Thompson (1992).

This involves simulating a set of random networks from an initial parameter estimate $$\theta_0$$ and then progressively refining the parameter values (until convergence) by measuring how closely the simulated networks match the observed network.

Let’s denote with $$q_{\theta}(y) = \exp\{ \theta^T s(y) \}$$ the unnormalised likelihood. Mathematically, the key identity of the MC-MLE method is the following: \begin{align} \frac {c(\theta)} {c(\theta_0)} = \mathbb{E}_{y\ |\ \theta_0} \left[ \frac {q_{\theta}(y)} {q_{\theta_0}(y)} \right] &= \sum_{y} \frac {q_{\theta}(y)} {q_{\theta_0}(y)} \frac {q_{\theta_0}(y)} {c(\theta_0)} \label{eqn:key} \\ &\approx \frac{1}{m}\sum_{i=1}^m \exp\left\{ (\theta-\theta_0)^T s(y_i) \right\}\notag \end{align} where $$\theta_0$$ is a fixed set of parameter values and $$\mathbb{E}_{y\ |\ \theta_0}$$ denotes an expectation taken with respect to the distribution $$p(y\ |\ \theta_0)$$. In practice, this ratio of normalising constants is approximated by importance sampling, using graphs $$y_1,\dots,y_m$$ sampled via MCMC from $$p(y\ |\ \theta_0)$$.

This yields the following approximated log-likelihood ratio: $$w_{\theta_0}(\theta)=\ell(\theta)-\ell(\theta_0) \approx (\theta - \theta_0)^T s(y) - \log\left[ \frac{1}{m}\sum_{i=1}^m \exp\left\{ (\theta-\theta_0)^T s(y_i) \right\} \right] \label{eqn:gey:thom}$$ where $$\ell(\cdot)$$ is the log-likelihood. $$w_{\theta_0}$$ is a function of $$\theta$$, and its maximiser serves as a Monte Carlo estimate of the MLE.
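
The maximisation of $$w_{\theta_0}$$ can be sketched directly in R. Everything below is illustrative rather than the ergm package's implementation: the matrix `S` holds stand-in draws in place of real MCMC samples of network statistics, and a log-sum-exp trick keeps the Monte Carlo average numerically stable:

```r
# Sketch: maximise the approximated log-likelihood ratio w_theta0(theta).
# S stands in for the statistics s(y_1), ..., s(y_m) of m networks
# simulated (via MCMC) at theta0; s_obs stands in for the observed s(y).
set.seed(1)
m <- 1000
S <- cbind(edges = rpois(m, 115), gwesp = rpois(m, 170))  # hypothetical draws
s_obs  <- c(115, 170)
theta0 <- c(-4, 1.4)                                      # current estimate

w <- function(theta) {
  d <- S %*% (theta - theta0)                 # (theta - theta0)^T s(y_i)
  lse <- max(d) + log(mean(exp(d - max(d))))  # stable log of the Monte Carlo average
  sum((theta - theta0) * s_obs) - lse
}

fit <- optim(theta0, function(th) -w(th))     # minimise -w over theta
theta_hat <- fit$par                          # Monte Carlo estimate of the MLE
```

In MC-MLE this maximisation is iterated: $$\hat{\theta}$$ becomes the new $$\theta_0$$, fresh networks are simulated, and the process repeats until convergence.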

A crucial aspect of this algorithm is the choice of $$\theta_0$$, which should ideally be close to the maximum likelihood estimate of $$\theta$$. Viewed as a function of $$\theta$$, $$w_{\theta_0}(\theta)$$ is very sensitive to this choice: a poorly chosen $$\theta_0$$ may lead to an objective function that cannot even be maximised. For this reason the maximum pseudolikelihood estimate (MPLE) is typically used as the starting value, as in the ergm output below.
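
A hypothetical one-statistic example illustrates why: as $$\theta$$ moves away from $$\theta_0$$, the importance weights $$\exp\{(\theta-\theta_0)^T s(y_i)\}$$ become dominated by a handful of simulated networks, so the Monte Carlo average behind $$w_{\theta_0}$$ is effectively computed from almost no data. The effective sample size (ESS) of the weights makes this visible:

```r
# Sketch: the effective sample size (ESS) of the importance weights
# collapses as theta moves away from theta0 (one hypothetical statistic).
set.seed(1)
s <- rpois(1000, 115)            # stand-in statistics simulated at theta0

ess <- function(delta) {         # delta = theta - theta0
  w <- exp(delta * (s - max(s))) # weights, rescaled for numerical stability
  sum(w)^2 / sum(w^2)
}

ess(0.001)   # theta close to theta0: nearly all 1000 draws contribute
ess(0.5)     # theta far from theta0: only a few draws dominate
```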

The result of the MC-MLE procedure is a set of estimated parameter values together with their standard errors, which measure the accuracy and significance of the estimates.

To implement the MC-MLE procedure we can use the `ergm` function:

library(statnet)

load(url("https://acaimo.github.io/teaching/network_data/lazega.RData"))

y <- network(Y, directed = FALSE)

# Add attributes to the network object y:
set.vertex.attribute(y, "Office", X$Office)
set.vertex.attribute(y, "School", X$School)
set.vertex.attribute(y, "Years", X$Years)
set.vertex.attribute(y, "Gender", X$Gender)
set.vertex.attribute(y, "Age", X$Age)
set.vertex.attribute(y, "Practice", X$Practice)

model.2 <- y ~ edges +
  nodematch("Office") +
  nodecov("Years") +
  gwesp(0.6, fixed = TRUE) +
  gwdegree(0.6, fixed = TRUE)

MLE.2 <- ergm(model.2)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 20:
## Optimizing with step length 1.
## The log-likelihood improved by 2.393.
## Step length converged once. Increasing MCMC sample size.
## Iteration 2 of at most 20:
## Optimizing with step length 1.
## The log-likelihood improved by 0.07188.
## Step length converged twice. Stopping.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Using 20 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 .
## This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
summary(MLE.2)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula:   y ~ edges + nodematch("Office") + nodecov("Years") + gwesp(0.6,
##     fixed = TRUE) + gwdegree(0.6, fixed = TRUE)
##
## Iterations:  2 out of 20
##
## Monte Carlo MLE Results:
##                   Estimate Std. Error MCMC % z value Pr(>|z|)
## edges            -4.283020   0.628367      0  -6.816   <1e-04 ***
## nodematch.Office  0.879122   0.172198      0   5.105   <1e-04 ***
## nodecov.Years    -0.013444   0.005872      0  -2.290    0.022 *
## gwesp.fixed.0.6   1.375084   0.284870      0   4.827   <1e-04 ***
## gwdeg.fixed.0.6   0.368117   0.622425      0   0.591    0.554
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##      Null Deviance: 873.4  on 630  degrees of freedom
##  Residual Deviance: 491.0  on 625  degrees of freedom
##
## AIC: 501    BIC: 523.2    (Smaller is better.)
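
The z values in the summary above are simply each estimate divided by its standard error; for example, for the edges parameter:

```r
# z value for the edges parameter: estimate / standard error
est <- -4.283020
se  <-  0.628367
round(est / se, 3)   # -6.816, as reported in the summary table
```
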
mcmc.diagnostics(MLE.2, vars.per.page = 5)
## Sample statistics summary:
##
## Iterations = 16384:4209664
## Thinning interval = 1024
## Number of chains = 1
## Sample size per chain = 4096
##
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
##
##                     Mean      SD Naive SE Time-series SE
## edges             3.6548  18.949  0.29607         0.7016
## nodematch.Office  2.3882  13.281  0.20751         0.4191
## nodecov.Years    91.7998 648.957 10.13996        22.8605
## gwesp.fixed.0.6   5.5210  35.059  0.54779         1.2743
## gwdeg.fixed.0.6   0.8887   4.052  0.06331         0.1509
##
## 2. Quantiles for each variable:
##
##                       2.5%      25%     50%     75%   97.5%
## edges              -31.000   -9.000  3.0000  16.000   42.00
## nodematch.Office   -23.000   -7.000  2.0000  11.000   29.00
## nodecov.Years    -1147.000 -354.500 85.0000 537.250 1408.88
## gwesp.fixed.0.6    -60.120  -18.606  4.7040  29.545   76.52
## gwdeg.fixed.0.6     -5.604   -2.138  0.3728   3.343   10.07
##
##
## Sample statistics cross-correlations:
##                      edges nodematch.Office nodecov.Years gwesp.fixed.0.6
## edges            1.0000000        0.8836640     0.9617906       0.9902512
## nodematch.Office 0.8836640        1.0000000     0.8846236       0.8872925
## nodecov.Years    0.9617906        0.8846236     1.0000000       0.9525337
## gwesp.fixed.0.6  0.9902512        0.8872925     0.9525337       1.0000000
## gwdeg.fixed.0.6  0.8407130        0.7348789     0.8045749       0.7818362
##                  gwdeg.fixed.0.6
## edges                  0.8407130
## nodematch.Office       0.7348789
## nodecov.Years          0.8045749
## gwesp.fixed.0.6        0.7818362
## gwdeg.fixed.0.6        1.0000000
##
## Sample statistics auto-correlation:
## Chain 1
##              edges nodematch.Office nodecov.Years gwesp.fixed.0.6
## Lag 0    1.0000000        1.0000000     1.0000000       1.0000000
## Lag 1024 0.6480951        0.6050305     0.6108816       0.6184689
## Lag 2048 0.4516968        0.4055946     0.4116749       0.4303103
## Lag 3072 0.3176008        0.2792022     0.2761935       0.2995633
## Lag 4096 0.2553005        0.2202678     0.2279571       0.2443971
## Lag 5120 0.1784598        0.1528834     0.1530760       0.1698626
##          gwdeg.fixed.0.6
## Lag 0          1.0000000
## Lag 1024       0.5878483
## Lag 2048       0.4206496
## Lag 3072       0.3096527
## Lag 4096       0.2487994
## Lag 5120       0.1811628
##
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
##            edges nodematch.Office    nodecov.Years  gwesp.fixed.0.6
##          -1.0556          -0.4626          -0.7383          -1.0443
##  gwdeg.fixed.0.6
##          -1.3035
##
## Individual P-values (lower = worse):
##            edges nodematch.Office    nodecov.Years  gwesp.fixed.0.6
##        0.2911665        0.6436366        0.4603290        0.2963563
##  gwdeg.fixed.0.6
##        0.1923937
## Joint P-value (lower = worse):  0.2936189 .

##
## MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
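
The Geweke burn-in diagnostic reported above compares the mean of each sampled statistic in an early window of the chain (first 10%) with its mean in a late window (last 50%); values near zero are consistent with stationarity. A rough sketch on a stand-in chain follows; note that coda's `geweke.diag` estimates the variances from the spectral density rather than with the naive sample variances used here:

```r
# Sketch of Geweke's diagnostic: z-score comparing the mean of the first
# 10% of a chain with the mean of its last 50%.
set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.6), n = 4096))  # stand-in MCMC chain

a <- x[1:floor(0.1 * length(x))]       # first 10% of the chain
b <- x[(length(x) / 2 + 1):length(x)]  # last 50% of the chain

# Naive z-score (spectral variance estimates would be used in practice)
z <- (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
z
```
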

## 5.2 Goodness-of-fit diagnostics

If the estimated ERGM is a good fit to the observed data, then networks $$y_1, \dots, y_m$$ simulated from its MLE should resemble the connectivity structure of the observed network $$y$$.

To check this, $$m = 100$$ graphs are simulated from the maximum likelihood estimate $$\hat{\theta}$$ of the parameter vector and compared to the observed graph in terms of higher-level network statistics that are not modelled explicitly:

model.2.gof <- gof(MLE.2 ~ degree + esp + dsp + distance,
                   control = control.gof.formula(nsim = 100))

### 5.2.1 Graphical diagnostics

par(mfrow = c(3, 2))
plot(model.2.gof, main = '')

### 5.2.2 Summaries

model.2.gof
##
## Goodness-of-fit for degree
##
##    obs min mean max MC p-value
## 0    2   0 2.53   8       1.00
## 1    3   0 1.66   6       0.58
## 2    2   0 2.16   8       1.00
## 3    4   0 2.61   7       0.62
## 4    2   0 2.97  10       0.86
## 5    4   0 3.77   9       1.00
## 6    4   0 3.28   7       0.80
## 7    1   1 3.29   7       0.26
## 8    1   0 3.28   8       0.24
## 9    5   0 2.76   7       0.32
## 10   1   0 2.20   6       0.74
## 11   1   0 1.81   6       0.96
## 12   2   0 1.19   6       0.56
## 13   3   0 0.82   4       0.12
## 14   0   0 0.56   3       1.00
## 15   1   0 0.39   2       0.68
## 16   0   0 0.29   2       1.00
## 17   0   0 0.17   1       1.00
## 18   0   0 0.14   2       1.00
## 19   0   0 0.08   1       1.00
## 20   0   0 0.02   1       1.00
## 21   0   0 0.02   1       1.00
##
## Goodness-of-fit for edgewise shared partner
##
##       obs min  mean max MC p-value
## esp0    5   0  3.77  10       0.76
## esp1   16   6 17.07  30       1.00
## esp2   29  16 29.40  42       1.00
## esp3   17  15 28.34  47       0.08
## esp4   23   4 18.51  35       0.58
## esp5   11   1 10.07  23       0.96
## esp6   10   0  4.99  19       0.26
## esp7    4   0  2.20   9       0.52
## esp8    0   0  0.76   5       1.00
## esp9    0   0  0.33   3       1.00
## esp10   0   0  0.13   2       1.00
## esp11   0   0  0.05   1       1.00
## esp12   0   0  0.01   1       1.00
##
## Goodness-of-fit for dyadwise shared partner
##
##       obs min   mean max MC p-value
## dsp0  245 113 242.11 412       1.00
## dsp1  138  84 137.62 215       0.96
## dsp2  106  54 107.06 164       0.92
## dsp3   56  27  70.89 118       0.60
## dsp4   44   4  39.26  74       0.72
## dsp5   22   1  19.06  43       0.78
## dsp6   12   0   8.80  29       0.54
## dsp7    6   0   3.25  14       0.44
## dsp8    1   0   1.21   7       1.00
## dsp9    0   0   0.47   4       1.00
## dsp10   0   0   0.21   2       1.00
## dsp11   0   0   0.05   1       1.00
## dsp12   0   0   0.01   1       1.00
##
## Goodness-of-fit for minimum geodesic distance
##
##     obs min   mean max MC p-value
## 1   115  73 115.63 153       0.96
## 2   275 151 276.03 370       1.00
## 3   148  47 122.36 202       0.46
## 4    21   0  19.34  73       0.76
## 5     2   0   2.39  21       0.62
## 6     0   0   0.19   4       1.00
## Inf  69   0  94.06 252       1.00
##
## Goodness-of-fit for model statistics
##
##                        obs        min       mean        max MC p-value
## edges             115.0000   73.00000  115.63000  153.00000       0.96
## nodematch.Office   85.0000   58.00000   86.13000  118.00000       0.98
## nodecov.Years    3812.0000 2487.00000 3842.88000 5413.00000       0.94
## gwesp.fixed.0.6   171.3841   95.32876  172.30463  239.91877       0.94
## gwdeg.fixed.0.6    57.7071   46.13701   57.94133   64.08262       0.80