Goodness-of-fit

Here, we present violin plots that show how well our simulations reproduce observed features of the data, including the distribution of running frequency values, across clubs (see Lospinoso and Snijders, 2019, and chapter 5.13 of the RSiena manual). For GOF plots for models with running volume as the behavior variable, see here.


Getting started

clean up

rm(list = ls())


general custom functions

  • fpackage.check: Check if packages are installed (and install if not) in R (source)
  • fload.R: load an R object from an .RData file and return it, so that it can be assigned to a new name.
fpackage.check <- function(packages) {
    lapply(packages, FUN = function(x) {
        if (!require(x, character.only = TRUE)) {
            install.packages(x, dependencies = TRUE)
            library(x, character.only = TRUE)
        }
    })
}


fload.R  <- function(fileName){
  load(fileName)
  get(ls()[ls() != "fileName"])
}
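
As a quick check of how fload.R behaves, the following minimal sketch saves a toy object to a temporary file and reads it back under a different name. The object and file here are hypothetical and only serve as illustration; the function is repeated so that the snippet runs on its own.

```r
# copy of fload.R from above, repeated so this snippet is self-contained
fload.R <- function(fileName) {
  load(fileName)
  get(ls()[ls() != "fileName"])
}

# hypothetical toy object, saved under the name 'original'
original <- list(id = 1:3, label = "toy")
tmp <- tempfile(fileext = ".RData")
save(original, file = tmp)

# read it back and assign it to a name of our choosing
renamed <- fload.R(tmp)
identical(renamed, original)
```

Note that fload.R assumes the file holds exactly one object; with several objects in one file, get() would receive more than one name.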


additional functions

  • GeodesicDistribution function: see here
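GeodesicDistribution <- function(i, data, sims, period, groupName,
                                 varName, levls = c(1:5, Inf),
                                 cumulative = TRUE, ...) {
  # extract the network for simulation i (or the observed network)
  x <- networkExtraction(i, data, sims, period, groupName, varName)
  require(sna)
  # geodesic distances in the symmetrized network
  a <- sna::geodist(sna::symmetrize(x))$gdist
  if (cumulative) {
    # number of pairs at distance <= each level
    gdi <- sapply(levls, function(l) sum(a <= l))
  } else {
    # number of pairs at distance == each level
    gdi <- sapply(levls, function(l) sum(a == l))
  }
  names(gdi) <- as.character(levls)
  gdi
}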
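
To make the cumulative counts concrete, here is a minimal base-R sketch on a hypothetical four-node network (a path 1 - 2 - 3 plus an isolate). It does not use sna; a small Floyd-Warshall loop stands in for sna::geodist(...)$gdist, so treat it as an illustration of the counting step only.

```r
# hypothetical undirected toy network: path 1 - 2 - 3, node 4 is an isolate
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[2, 1] <- 1
adj[2, 3] <- adj[3, 2] <- 1

# shortest path lengths via Floyd-Warshall (stand-in for the gdist matrix)
n <- nrow(adj)
d <- ifelse(adj == 1, 1, Inf)
diag(d) <- 0
for (k in seq_len(n)) {
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- min(d[i, j], d[i, k] + d[k, j])
    }
  }
}

# same counting step as in GeodesicDistribution with cumulative = TRUE
levls <- c(1:5, Inf)
cum_counts <- sapply(levls, function(l) sum(d <= l))
names(cum_counts) <- as.character(levls)
cum_counts
#>   1   2   3   4   5 Inf
#>   8  10  10  10  10  16
```

The counts are over ordered pairs and include the diagonal (distance 0). Because the last level is Inf, the count at Inf always equals n × n, which would explain why the plots below report zero variance for the statistic Inf.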


necessary packages

We install and load the packages we need later on:

  • RSiena

packages = c("RSiena")
fpackage.check(packages)

load data

We read in the sienaFit objects of our five clubs (with running frequency as the behavior variable) and keep model 5, our main model.

# large lists, which take a long time to load
# when facing storage capacity issues, check the capacity:
# memory.limit()
# we increase the limit
# memory.limit(size = 56000)
# (memory.limit() is Windows-only and no longer supported from R 4.2.0 onwards)

# fload.R is the loader defined under "general custom functions"
club1 <- fload.R("test/sienaFit/sienaFit_club1.RData")
club2 <- fload.R("test/sienaFit/sienaFit_club2.RData")
club3 <- fload.R("test/sienaFit/sienaFit_club3.RData")
club4 <- fload.R("test/sienaFit/sienaFit_club4.RData")
club5 <- fload.R("test/sienaFit/sienaFit_club5.RData")

# list main model (5)
list <- list(club1[[5]], club2[[5]],  club3[[5]], club4[[5]], club5[[5]])

# remove the excess data
rm(club1, club2, club3, club4, club5)
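
Since the full lists are large, an alternative is to load one file at a time and keep only the fifth element, so that the five complete lists never sit in memory together. The sketch below illustrates this pattern with hypothetical stand-in files written to a temporary folder (plain strings instead of sienaFit objects); load_one is a helper introduced here for illustration only.

```r
# hypothetical stand-in data: each file holds one list of 5 "models"
tmp_dir <- tempfile("sienaFit")
dir.create(tmp_dir)
for (k in 1:5) {
  fits <- as.list(paste0("club", k, "_model", 1:5))
  save(fits, file = file.path(tmp_dir, sprintf("sienaFit_club%d.RData", k)))
}

# illustrative helper: load a single-object .RData file into its own environment
load_one <- function(path) {
  env <- new.env()
  nm <- load(path, envir = env)
  stopifnot(length(nm) == 1)
  env[[nm]]
}

# keep only model 5 per club; the rest is discarded when the function returns
files <- file.path(tmp_dir, sprintf("sienaFit_club%d.RData", 1:5))
main_models <- lapply(files, function(f) load_one(f)[[5]])
length(main_models)
```

This may lower peak memory use compared with loading all clubs first and removing them afterwards, at the cost of a slightly less explicit script.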

calculate GOF

We calculate GOF (indegree, outdegree, geodesic distance, triad census and behavior distribution) for all clubs.

# make sure the output folder exists before saving
dir.create("test/GOF", recursive = TRUE, showWarnings = FALSE)

for (i in seq_along(list)) {
  # GOF diagnostics for the network variable "kudonet" of club i
  gofi <- sienaGOF(list[[i]], IndegreeDistribution,
                   verbose = TRUE, join = TRUE, varName = "kudonet")
  gofo <- sienaGOF(list[[i]], OutdegreeDistribution,
                   verbose = TRUE, join = TRUE, varName = "kudonet")
  gofgeo <- sienaGOF(list[[i]], GeodesicDistribution,
                     verbose = TRUE, join = TRUE, varName = "kudonet")
  goft <- sienaGOF(list[[i]], TriadCensus,
                   verbose = TRUE, join = TRUE, varName = "kudonet")

  # GOF diagnostics for the behavior variable "freq_run" (levels 0 to 7)
  gofbeh <- sienaGOF(list[[i]], BehaviorDistribution, levls = 0:7,
                     verbose = TRUE, join = TRUE, varName = "freq_run")

  # put the statistics in a list, in a fixed order:
  # indegree, outdegree, geodesic distance, triad census, behavior
  goflist <- list(gofi, gofo, gofgeo, goft, gofbeh)

  # save the list
  save(goflist, file = paste0("test/GOF/GOF_club", i, ".RData"))
}
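
sienaGOF compares the observed value of each auxiliary statistic with its distribution over the simulated networks. Following Lospinoso and Snijders (2019), the overall fit is summarized by a Monte Carlo p-value based on the Mahalanobis distance. The sketch below illustrates that idea in base R with hypothetical simulated and observed values; it is not RSiena's actual code, which differs in its details.

```r
set.seed(123)

# hypothetical data: 200 simulations of 3 auxiliary statistics
n_sims <- 200
sims <- matrix(rpois(n_sims * 3, lambda = c(5, 10, 3)),
               ncol = 3, byrow = TRUE)
obs <- c(6, 9, 4)  # hypothetical observed values

# mean and covariance of the simulated statistics
mu <- colMeans(sims)
S <- cov(sims)

# (squared) Mahalanobis distance of each simulation and of the observation
d_sim <- mahalanobis(sims, center = mu, cov = S)
d_obs <- mahalanobis(obs, center = mu, cov = S)

# Monte Carlo p-value: share of simulations at least as far away as the data
p_value <- mean(d_sim >= d_obs)
p_value
```

A small p-value means that the observed statistics lie far from what the model simulates, which points to poor fit on that statistic.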


Violin plot

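We produce violin plots for each club. In each saved list, elements 1 to 5 hold the GOF results for the indegree distribution, outdegree distribution, geodesic distance distribution, triad census and behavior distribution, respectively.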

Club 1

load("test/GOF/GOF_club1.RData")
plot(goflist[[1]])

plot(goflist[[2]])

plot(goflist[[3]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistic: Inf.

plot(goflist[[4]])

plot(goflist[[5]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistic: 7.

Club 2

load("test/GOF/GOF_club2.RData")
plot(goflist[[1]])

plot(goflist[[2]])

plot(goflist[[3]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistic: Inf.

plot(goflist[[4]])

plot(goflist[[5]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistic: 7.

Club 3

load("test/GOF/GOF_club3.RData")
plot(goflist[[1]])

plot(goflist[[2]])

plot(goflist[[3]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistic: Inf.

plot(goflist[[4]])

plot(goflist[[5]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistic: 7.

Club 4

load("test/GOF/GOF_club4.RData")
plot(goflist[[1]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistics: 7 8.

plot(goflist[[2]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistics: 6 7 8.

plot(goflist[[3]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistic: Inf.

plot(goflist[[4]])

plot(goflist[[5]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistics: 5 6 7.

Club 5

load("test/GOF/GOF_club5.RData")
plot(goflist[[1]])

plot(goflist[[2]])

plot(goflist[[3]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistic: Inf.

plot(goflist[[4]])

plot(goflist[[5]])
#> Note: some statistics are not plotted because their variance is 0.
#> This holds for the statistic: 7.
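
The per-club plotting code above repeats the same five calls. A more compact alternative is sketched here, assuming the GOF files saved earlier exist and RSiena is installed; otherwise the loop simply does nothing. The labels in stat_names are our own, and the explicit print() is used on the assumption that the plot method returns a lattice object, which is not drawn automatically inside a loop.

```r
gof_files <- sprintf("test/GOF/GOF_club%d.RData", 1:5)
stat_names <- c("indegree", "outdegree", "geodesic distance",
                "triad census", "behavior")

if (requireNamespace("RSiena", quietly = TRUE)) {
  for (f in gof_files[file.exists(gof_files)]) {
    load(f)  # restores the object 'goflist'
    for (j in seq_along(goflist)) {
      message(f, ": ", stat_names[j])
      # print() so that the plot is drawn inside the loop
      print(plot(goflist[[j]]))
    }
  }
}
```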


References

Lospinoso, J., and T. A. B. Snijders. 2019. “Goodness of Fit for Stochastic Actor-Oriented Models.” Methodological Innovations 12 (3). https://doi.org/10.1177/2059799119884282.



Copyright © 2021 Rob Franken