Vignette: IGEs through the lens of neighbor genotypic identity

Background

Indirect genetic effects (IGEs) are ubiquitous in nature and agriculture. To enable genome-wide association study (GWAS) and genomic prediction (GP) of IGEs, we have developed a multi-kernel mixed model incorporating neighbor genotypes. This offers flexible implementation of multiple models as follows

Model 1-1: Original Neighbor GWAS model

The first model corresponds to the original model of Neighbor GWAS (Sato et al. 2021), which explains the trait of \(i\)-th individual within a local space \(k\), \(y_{k_i}\), by own genetic effects at \(q\)-th SNP marker \(x_{q,k_i}\) and its mean similarity with \(j\)-th neighboring individual (\(x_{q,k_i} x_{q,k_j})\) over total \(J\) neighbors as follows.

\[y_{k_i} = \beta_0 + \beta_{q,1} x_{q,k_i} + \beta_{q,2} \sum_{k_j=1}^{J_k}{(x_{q,k_i} x_{q,k_j})}/J + u_{k_i} + e_{k_i}~~~\text{(Eq. 1-1a)}\] where \(\beta_{q,1}\) and \(\beta_{q,2}\) indicate self and neighbor genotypic effects, respectively. \(\beta_0\), \(u_{k_i}\), and \(e_{k_i}\) indicate the intercept, random effects, and residuals. Eq. 1-1a is also expressed as follows.

\[\mathbf{y} = \mathbf{X}_0\mathbf{\beta}_{0} + \mathbf{X}_\text{1}\mathbf{\beta}_\text{1} + \mathbf{X}_\text{2}\mathbf{\beta}_\text{2} + \mathbf{Z}\mathbf{u} + \mathbf{e}~~~\text{(Eq. 1-1b)}\]

where the random effects constitute additive effects from self and neighbor genotypes as \(\mathbf{u} \sim N(\mathbf{0},\sigma^2_1\mathbf{K}_1+\sigma^2_2\mathbf{K}_2)\). The former kernel \(\mathbf{K}_1\) is the ordinal genomic relatedness matrix (GRM), while the latter kernel \(\mathbf{K}_2 = \mathbf{X}_\text{2}\mathbf{X}_\text{2}^\mathsf{T}\) represents global allelic similarity whose elements are defined as \((\mathbf{K}_{2})_{k,l} = \sum_{q=1}^Q[(x_{q,k_i}x_{q,l_i}){\sum_{j=1}^{J_k}(x_{q,k_j})\sum_{j=1}^{J_l}(x_{q,l_j})}]/(Q-1)\) between local spaces \(k\) and \(l\).

Model 1-2: The proposed joint model

The second model extends the first model to incorporate covariance between self and neighbor genotypic effects, which enables joint modeling of polygenic and oligogenic IGEs.

\[y_{k_i} = \beta_{q,0} + \beta_{q,1} x_{q,k_i} + \beta_{q,12} \sum_{j_k=1}^{J_k}{x_{q,k_j}}/J_k + \beta_{q,2} \sum_{k_j=1}^J{(x_{q,k_i} x_{q,k_j})}/J_k + u_{k_i,1} + u_{k_i,2} + e_{k_i}\]

\[\mathbf{y} = \mathbf{X}_0\mathbf{\beta}_{0} + \mathbf{X}_1\mathbf{\beta}_{1} + \mathbf{X}_2\mathbf{\beta}_{2} + \mathbf{X}_{12}\mathbf{\beta}_{12} + \mathbf{Z}\mathbf{u}_1 + \mathbf{Z}\mathbf{u}_2 + \mathbf{e}\]

where

\[\left(\begin{array}{c} \mathbf{u}_1 \\ \mathbf{u}_2 \end{array} \right) \sim \text{MVN} [ \left(\begin{array}{c} \mathbf{0} \\ \mathbf{0} \end{array} \right), \left(\begin{array}{cc} \sigma^2_1 \mathbf{K}_\text{1} & \rho_{12}\sigma_1\sigma_2 \mathbf{K}_\text{12} \\ \rho_{12}\sigma_1\sigma_2 \mathbf{K}_\text{21} & \sigma^2_2 \mathbf{K}_\text{2} \end{array} \right) ]~~~\text{(Eq. 1).}\]

The random effects follow a multi-variate normal (MVN) distribution that considers the covariance \(\rho_{12}\sigma_1\sigma_2\) between DGEs and IGEs. The off-diagonal kernels \(\mathbf{K}_\text{12} = \mathbf{X}_1\mathbf{X}_2^\mathsf{T}\) and \(\mathbf{K}_\text{21}=\mathbf{X}_2\mathbf{X}_1^\mathsf{T}\) represent asymmetric effects from one allele to another in a neighborhood.

Model 2-1: Conventional IGE model (w/o cov)

Conventional IGE models can be implemented using RAINBOWR package. The third model corresponds to those without DGE-IGE covariance.

\[\mathbf{y} = \mathbf{X}_0\mathbf{\beta}_{0} + \mathbf{Z}_1\mathbf{u}_1 + \mathbf{Z}_2\mathbf{u}_2 + \mathbf{e}\]

where

\[\left(\begin{array}{c} \mathbf{u}_1 \\ \mathbf{u}_2 \end{array} \right) \sim \text{MVN} [ \left(\begin{array}{c} \mathbf{0} \\ \mathbf{0} \end{array} \right), \left(\begin{array}{cc} \sigma^2_1 & 0 \\ 0 & \sigma^2_2 \end{array} \right) \otimes \mathbf{K}_\text{1} ]~~~\text{(Eq. 2-2).}\]

Model 2-2: Conventional IGE model (w/ cov)

The fourth model corresponds to a conventional IGE model with DGE-IGE covariance.

\[\mathbf{y} = \mathbf{X}_0\mathbf{\beta}_{0} + \mathbf{Z}_1\mathbf{u}_1 + \mathbf{Z}_2\mathbf{u}_2 + \mathbf{e}\]

where

\[\left(\begin{array}{c} \mathbf{u}_1 \\ \mathbf{u}_2 \end{array} \right) \sim \text{MVN} [ \left(\begin{array}{c} \mathbf{0} \\ \mathbf{0} \end{array} \right), \left(\begin{array}{cc} \sigma^2_1 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma^2_2 \end{array} \right) \otimes \mathbf{K}_\text{1} ]~~~\text{(Eq. 2-2).}\]

Model 3-1: Modified Neighbor GWAS model

The fifth model corresponds to the modified Neighbor GWAS model that incorporates asymmetric effects between focal and neighboring genotypes (Sato et al. 2023, 2025).

\[y_{k_i} = \beta_0 + \beta_{q,\text{s}} x_{q,k_i} + \beta_{q,\text{n}} \sum_{k_j=1}^J{x_{q,k_j}}/J_k + \beta_{q,\text{s×n}} \sum_{k_j=1}^J{(x_{q,k_i} x_{q,k_j})}/J_k + u_{k_i} + e_{k_i}~~~\text{(Eq. 3-1a)}\]

\[\mathbf{y} = \mathbf{X}_0\mathbf{\beta}_{0} + \mathbf{X}_\text{s}\mathbf{\beta}_\text{s} + \mathbf{X}_\text{n}\mathbf{\beta}_\text{n} + \mathbf{X}_\text{s×n}\mathbf{\beta}_\text{s×n} + \mathbf{Z}\mathbf{u} + \mathbf{e}~~~\text{(Eq. 3-1b)}\]

This Eq. 3-1 considers a multiplicative interaction term for oligogenic IGEs (a) and its polygenic effects (b). The random effects are assumed to be additive as \(\mathbf{u} \sim N(\mathbf{0},\sigma^2_1\mathbf{K}_1+\sigma^2_2\mathbf{K}_2+\sigma^2_{12}\mathbf{K}_1\odot\mathbf{K}_2)\).

Workflow

In this section, we showcase how to implement the above-mentioned five models using example data.

Preparation

Install packages

We need to install the RAINBOWR and rNeighborGWAS packages. Both are available at CRAN and GitHub as follows.

CRAN (stable ver.)

install.packages("RAINBOWR")
install.packages("rNeighborGWAS")

GitHub (developer ver.)

devtools::install_github("KosukeHamazaki/RAINBOWR")
devtools::install_github("yassato/rNeighborGWAS")

Load packages and example data

First of all, let us load the two packages.

require(RAINBOWR)
require(rNeighborGWAS)

Subsequently, load the example data from the RAINBOWR package. The original data are available at Rice Diversity website (), which include phenotype and genotype data on approximately 400 rice varieties (Zhao et al. 2011).

data("Rice_Zhao_etal")
Rice_geno_score <- Rice_Zhao_etal$genoScore
Rice_geno_map <- Rice_Zhao_etal$genoMap
Rice_pheno <- Rice_Zhao_etal$pheno

Here we obtain the genotype data on 1311 SNP markers for 395 varieties as well as phenotype data on 36 agronomic traits for 413 varieties.

See(Rice_geno_score)

##                  L1        L2        L3        L4        L5        L6
## class     <integer> <integer> <integer> <integer> <integer> <integer>
## id1000223         1         1        -1        -1        -1        -1
## id1000556        -1        -1         1         1        -1         1
## id1000673        -1         1        -1         1        -1         1
## id1000830        -1         1         1         1         1         1
## id1000955        -1         1         1         1        -1         1
## id1001073        -1        -1        -1         1         1        -1
## [1] "class: data.frame"
## [1] "dimension: 1311 x 395"

See(Rice_geno_map)

##              marker       chr       pos
## class      <factor> <integer> <integer>
## id1000223 id1000223         1    420422
## id1000556 id1000556         1    655693
## id1000673 id1000673         1    740153
## id1000830 id1000830         1    913806
## id1000955 id1000955         1   1041748
## id1001073 id1001073         1   1172387
## [1] "class: data.frame"
## [1] "dimension: 1311 x 3"

See(Rice_pheno)

##       Flowering.time.at.Arkansas Flowering.time.at.Faridpur
## class                  <numeric>                  <integer>
## L1                   75.08333333                         64
## L3                          89.5                         66
## L4                          94.5                         67
## L5                          87.5                         70
## L6                   89.08333333                         73
## L7                           105                       <NA>
##       Flowering.time.at.Aberdeen FT.ratio.of.Arkansas.Aberdeen
## class                  <integer>                     <numeric>
## L1                            81                   0.926954733
## L3                            83                   1.078313253
## L4                            93                   1.016129032
## L5                           108                   0.810185185
## L6                           101                   0.882013201
## L7                           158                   0.664556962
##       FT.ratio.of.Faridpur.Aberdeen Culm.habit
## class                     <numeric>  <numeric>
## L1                      0.790123457          4
## L3                      0.795180723        7.5
## L4                      0.720430108          6
## L5                      0.648148148        3.5
## L6                      0.722772277          6
## L7                             <NA>          3
## [1] "class: data.frame"
## [1] "dimension: 413 x 36"

Let us analyze the trait on the first column.

trait.no <- 1
y <- as.matrix(Rice_pheno[, trait.no, drop = FALSE])

To exclude rare variants in advance, we set the cut-off value of minor allele frequency (MAF) at 0.05.

x.0 <- t(Rice_geno_score)
MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map,
                       min.MAF = 0.05)
x <- MAF.cut.res$x
map <- MAF.cut.res$map

Creating the (pseudo)score of neighbor interactions

The original data lack spatial information; therefore, for the example data we assume that varieties on the neighboring rows in the SNP matrix are cultivated in a neighborhood. Below the fixed-effect covariate xAdj indicates the mean allelic similarity between the focal and neighboring individuals as defined by the original Neighbor GWAS model (Sato et al. 2021), while the design matrix ZAdj specifies which individuals interact in a neighborhood as formulated by the ordinal IGE model (e.g., Bijma 2014).

xAdj <- array(data = NA, dim = dim(x), dimnames = dimnames(x))
ZAdj <- array(data = 0, dim = c(nrow(x), nrow(x)),
              dimnames = c(list(rownames(x)), list(rownames(x))))
for (i in 1:nrow(x)) {
  adjs <- (i - 1):(i + 1)
  adjs <- adjs[adjs %in% 1:nrow(x)]
  adjs <- adjs[adjs != i]
  nAdjs <- length(adjs)
  
  xAdj[i, ] <- x[i, , drop = FALSE] *
    apply(X = x[adjs, , drop = FALSE],
          MARGIN = 2, FUN = mean)
  ZAdj[i, adjs] <- 1 / nAdjs
}

The same thing can be made easy using the nei_coval function of rNeighborGWAS package.

smap <- cbind(x=c(1:nrow(x)), y=rep(1,nrow(x))) # spatial map
xAdj2 <- nei_coval(geno=x, smap=smap, scale=1+0.01) # refer to the nearest neighbors only
sum(xAdj2!=xAdj) # all same between this and xAdj above

## [1] 0

Genomic related matrices (GRMs)

Then we calculate a linear kernel composed of the ordinal GRM (i.e., kinship matrix \(\mathbf{K}_1\) that represents direct genetic effects, DGEs) and neighbor-weighted GRM (i.e., \(\mathbf{K}_2\)), which assumes additive effects from polygenic DGEs and IGEs.

K.A <- tcrossprod(x) / ncol(x)
K.Adj <- tcrossprod(xAdj) / ncol(xAdj)  # for neighbor kernel

The phenotypes and genotypes should be matched using a design matrix.

Z <- design.Z(pheno.labels = rownames(y),
              geno.names = rownames(K.A))  ### design matrix for random effects
ZAdj <- ZAdj[rownames(Z), , drop = FALSE]
pheno.mat <- y[rownames(Z), , drop = FALSE]

Three kernel combinations

The following shows combinations among the three kernels including the original Neighbor GWAS model. The first model shows how to implement the original Neighbor GWAS using the RAINBOWR package. This comprises the neighbor-weighted GRM \(\mathbf{K}_2\) in addition to the ordinal GRM \(\mathbf{K}_1\). In the RAINBOWR package, the nested list object specifies kernel combinations as follows.

ZETA1 <- list(A = list(Z = Z, K = K.A),
              Int = list(Z = Z, K = K.Adj))

The second model corresponds to the ordinal IGE model, which defines interacting individuals by the design matrix ZAdj. In this case, IGEs represent not interactive but one-sided effects from neighboring individuals.

ZETA2 <- list(A = list(Z = Z, K = K.A),
              Adj = list(Z = ZAdj, K = K.A))

The third model includes all of DGEs, IGEs, and their interactions. However, this model does not incorporate any covariance (which ranges from \(-\infty\) to \(\infty\)) but variance (>0) for the three kernels.

ZETA3 <- list(A = list(Z = Z, K = K.A),
              Int = list(Z = Z, K = K.Adj),
              Adj = list(Z = Z, K = K.A*K.Adj))

Covariance structure

In addition to the variance, let us include DGE-IGE covariance in the models 1-2 and 2-2. According to Eq. 1 of Sato & Hamazaki (yyyy), here we consider the covariance structure for ZETA1 as follows.

K12 <- tcrossprod(x, xAdj) / sqrt(ncol(x) * ncol(xAdj)) # = (q-1) when ncol(x) = ncol(xAdj)
K21 <- t(K12) # = "tcrossprod(xAdj, x) / sqrt(ncol(x) * ncol(xAdj))" but much faster

Note that K12 corresponds to K21 when transposed as K12 = t(K21).

In RAINBOWR, the covariance structure is specified in the off-diagonal elements of \(m \times m\) list when there are \(m\) kernels.

covList1 <- rep(list(rep(list(NULL), 2)), 2)
covList1[[1]][[2]] <- K12
covList1[[2]][[1]] <- K21

Then let us specify the covariance structure of ZETA2. In the ordinal IGE model, the kinship matrix \(\mathbf{K}_1\) is shared among DGEs and IGEs while the design matrix was different between DGEs and IGEs. So, we assign K.A to the two off-diagonal elements before specifying the design matrices Z and ZAdj.

covList2 <- rep(list(rep(list(NULL), 2)), 2)
covList2[[1]][[2]] <- K.A
covList2[[2]][[1]] <- K.A

For the model 3-1, we do not consider covariance for ZETA3 as it includes interaction terms instead of covariance.

Genomic prediction (GP) based on the five models

Model 1-1 & 1-2

The first and second model adopt ZETA1 as a multi-kernel: The former corresponds to the original Neighbor GWAS model without covariance while the latter presents those with the covariance. Both models incorporate self-genotypic effects K and neighbor genotypic interactions K.Adj as DGEs and IGEs, respectively.

EM3.res11 <- EM3.cpp(y = pheno.mat,
                     X0 = NULL,
                     ZETA = ZETA1)
EM3.res12 <- EM3.cov(y = pheno.mat,
                     X0 = NULL,
                     ZETA = ZETA1,
                     covList = covList1)

Model 2-1 & 2-2

The third and fourth model adopt ZETA2 as a multi-kernel: The former and latter represent models without and with covariance, respectively. Both models designate DGEs and IGEs by the design matrices Z and Z.Adj.

EM3.res21 <- EM3.cpp(y = pheno.mat,
                     X0 = NULL,
                     ZETA = ZETA2)
EM3.res22 <- EM3.cov(y = pheno.mat,
                     X0 = NULL,
                     ZETA = ZETA2,
                     covList = covList2, forceApproxK = TRUE)

## Error in eval(expr, envir) : K not positive semi-definite.

Note that, if the multi-kernel cannot be positive semi-definite, forceApproxK = TRUE approximates the kernel to the nearest positive-definite matrix.

Model 3-1

The fifth model considers DGE-IGE interactions based on element-wise interaction term between K and K.Adj instead of the covariance parameters. This model adopts ZETA3 including three kernels as K, K.Adj and K*K.Adj, but does not consider covariance among them.

EM3.res31 <- EM3.cpp(y = pheno.mat,
                     X0 = NULL,
                     ZETA = ZETA3)

In EM3.cov function, nIterOptimization option determines the maximum number of iterations until convergence.

Genome-wide association study (GWAS)

To make GWAS efficient, we calculate a weighted kernel \(\mathbf{K}'\) based on the kernel weights: \(\mathbf{K}' = \sigma^2_1\mathbf{K}_1+\sigma^2_2\mathbf{K}_2\) for Model 1-1; \(\mathbf{K}' = w_1\mathbf{K}_1+\rho_{12}\sqrt{w_1w_2}(\mathbf{K}_{12}+\mathbf{K}_{21})+w_2\mathbf{K}_2\) for Model 1-2; and \(\mathbf{K}' = \sigma^2_1\mathbf{K}_1+\sigma^2_2\mathbf{K}_2 + \sigma^2_{12}\mathbf{K}_1\odot\mathbf{K}_2\) for Model 3-1. Note that SNP-marker effects are not formulated in Model 2-1 and 2-2. The line of GWAS workflow are implemented as nei_lmm or covnei_lmm functions, which internally uses gaston package (Perdry & Dandine-Roulland 2023) and its lmm.diago function.

Model 1-1

This workflow corresponds to the original Neighbor GWAS (Sato et al. 2021). See also the vignette of rNeighborGWAS package at https://CRAN.R-project.org/package=rNeighborGWAS.

# excl. NAs
nonNAs <- (is.na(pheno.mat[,1])==FALSE)
pheno <- pheno.mat[nonNAs,,drop=FALSE]
geno <- x[row.names(pheno),]
g_nei <- xAdj[row.names(pheno),]

# run GWAS
GWAS.res11 <- nei_lmm(geno = geno,
                      g_nei = g_nei,
                      pheno = pheno,
                      addcovar = NULL,
                      response = "quantitative")

Model 1-2

This model incorporates the covariance parameter \(\rho_{12}\) into the original Neighbor GWAS. The following covnei_lmm function acts as a wrapper to bridge RAINBOWR and rNeighborGWAS packages, which requires EM3.res12 object of the RAINBOWR package.

covnei_lmm = function(geno, g_nei, pheno, X0=matrix(1,nrow=length(pheno)), EM3.res12, K1=NULL, K2=NULL, K12=NULL, K21=NULL, n_core=1L){
  q <- ncol(geno)
  
  if(is.null(K1)==TRUE){
    selfCross = tcrossprod(geno)
    K1 <- (q+selfCross)/(2*(q-1))
  }
  if(is.null(K2)==TRUE){
    neiCross = tcrossprod(g_nei)
    K2 <- neiCross/(q-1)
  }
  if(is.null(K12)==TRUE){
    K12 <- tcrossprod(geno,g_nei)/(q-1)
  }
  if(is.null(K21)==TRUE){
    K21 <- tcrossprod(g_nei,geno)/(q-1)
  }
  
  NAs <- (is.na(pheno)==FALSE)
  y <- pheno[NAs]
  geno <- geno[NAs,]
  g_nei <- g_nei[NAs,]
  X0 <- as.matrix(X0[NAs,])
  selfK <- K1[NAs,NAs]
  neiK = K2[NAs,NAs]
  K12 <- K12[NAs,NAs]
  K21 <- K21[NAs,NAs]
  
  K_prime <- EM3.res12$weights[1]*selfK + 
    EM3.res12$weights[2]*neiK + 
    EM3.res12$rhosMat[1,2]*sqrt(EM3.res12$weights[1]*EM3.res12$weights[2])*(K12 + K21)
  
  eigenK <- eigen(K_prime)
  lmm0 <- gaston::lmm.diago(Y=y,X=X0,eigenK=eigenK,verbose=FALSE)
  LL0 <- gaston::lmm.diago.profile.likelihood(tau=lmm0$tau,s2=lmm0$sigma2,h2=lmm0$tau/(lmm0$tau+lmm0$sigma2),Y=y,X=X0,eigenK=eigenK)
  
  test_i = function(i){
    X1 <- cbind(X0,geno[,i])
    lmm1 <- gaston::lmm.diago(Y=y,X=X1,eigenK=eigenK,verbose=FALSE)
    LL1 <- gaston::lmm.diago.profile.likelihood(tau=lmm1$tau,s2=lmm1$sigma2,h2=lmm1$tau/(lmm1$tau+lmm1$sigma2),Y=y,X=X1,eigenK=eigenK)
    
    X2 <- cbind(X1,g_nei[,i])
    lmm2 <- gaston::lmm.diago(Y=y,X=X2,eigenK=eigenK,verbose=FALSE)
    LL2 <- gaston::lmm.diago.profile.likelihood(tau=lmm2$tau,s2=lmm2$sigma2,h2=lmm2$tau/(lmm2$tau+lmm2$sigma2),Y=y,X=X2,eigenK=eigenK)
    b2 <- lmm2$BLUP_beta[(length(lmm2$BLUP_beta)-1):length(lmm2$BLUP_beta)]
    
    res_i <- c(b2,2*(LL1[1,1]-LL0[1,1]),2*(LL2[1,1]-LL1[1,1]))
    names(res_i) <- c("beta1","beta2","chisq1","chisq2")
    
    return(res_i)
  }
  
  res <- parallel::mcmapply(test_i,1:ncol(geno),mc.cores=n_core)
  res <- t(res)
  res <- as.data.frame(res)
  
  p1 <- pchisq(res$chisq1,1,lower.tail=FALSE)
  p2 <- pchisq(res$chisq2,1,lower.tail=FALSE)
  res <- data.frame(res,p1,p2)
  
  return(res)
}

GWAS.res12 <- covnei_lmm(geno = geno,
                      g_nei = g_nei,
                      pheno = pheno,
                      EM3.res12 = EM3.res12)

head(data.frame(map,GWAS.res12))

##              marker chr     pos      beta1      beta2    chisq1     chisq2
## id1000223 id1000223   1  420422  1.0950853  0.7386768 0.8519919 0.98647091
## id1000556 id1000556   1  655693  0.8596550 -0.2852650 0.8444014 0.18157486
## id1000673 id1000673   1  740153 -0.6728780 -0.1065715 0.5689475 0.02304063
## id1000830 id1000830   1  913806  0.6822974 -0.7380687 0.1687383 0.84352713
## id1000955 id1000955   1 1041748 -1.3783543  0.5487375 2.6608008 0.61958091
## id1001073 id1001073   1 1172387  1.1562109 -0.5314973 1.5818525 0.46410230
##                  p1        p2
## id1000223 0.3559894 0.3206064
## id1000556 0.3581410 0.6700233
## id1000673 0.4506774 0.8793515
## id1000830 0.6812356 0.3583900
## id1000955 0.1028489 0.4312031
## id1001073 0.2084941 0.4957130

Model 3-1

This model tests asymmetric neighbor effects \(\beta_{12}\) in addition to \(\beta_2\), which can be performed with asym=TRUE option in nei_lmm function.

GWAS.res31 <- nei_lmm(geno = geno,
                      g_nei = g_nei,
                      pheno = pheno,
                      addcovar = NULL,
                      response = "quantitative",
                      asym=TRUE)

head(data.frame(map,GWAS.res31))

##              marker chr     pos  beta_self    beta_nei   beta_sxn    p_self
## id1000223 id1000223   1  420422  1.1310357  0.62706894  0.2393227 0.3370861
## id1000556 id1000556   1  655693  0.9250039 -0.34678550  0.2827820 0.3398849
## id1000673 id1000673   1  740153 -0.6363843 -0.09520183 -0.7126124 0.4729718
## id1000830 id1000830   1  913806  0.7148453 -0.82373807  0.1175711 0.7004013
## id1000955 id1000955   1 1041748 -1.4105745  0.68318349 -0.6269288 0.1023959
## id1001073 id1001073   1 1172387  1.0603056 -0.40513604  0.2873576 0.2598081
##               p_nei     p_sxn
## id1000223 0.3424974 0.7435387
## id1000556 0.6001774 0.7090938
## id1000673 0.9358733 0.2921232
## id1000830 0.3365268 0.9182487
## id1000955 0.4144168 0.3650179
## id1001073 0.4969403 0.7290837

Output

Genomic prediction (GP)

To evaluate the five models, we put them all into a single list object.

modelNames <- c("M11", "M12", "M21", "M22", "M31")
EM3.res.list <- list(EM3.res11, EM3.res12, EM3.res21,
                     EM3.res22, EM3.res31)
names(EM3.res.list) <- modelNames

Log-likelihood

Here shows the log-likelihood of the five models. Although neighbors were haphazardly assigned in these examples, the models including the covariance \(\rho_{12}\) had a higher likelihood than those without \(\rho_{12}\).

(LLs <- sapply(X = EM3.res.list,
               FUN = function(x) {
                 x$LL
               }))

##       M11       M12       M21       M22       M31 
## -1289.823 -1289.744 -1289.823 -1285.941 -1289.823

Kernel weights

Then we see the estimated kernel weight \(w_1\) and \(w_2\) for the five models. As these examples involve toy data, \(w_2\) is estimated to be very small.

(weights <- sapply(X = EM3.res.list,
                   FUN = function(x) {
                     x$weights
                   }))

## $M11
## [1] 1 0
## 
## $M12
## [1] 0.996814328 0.003185672
## 
## $M21
## [1] 1 0
## 
## $M22
## [1] 0.97026118 0.02973882
## 
## $M31
## [1] 1 0 0

Covariance parameters

The covariance parameters \(\rho_{12}\) can be shown for Models 1-2 and 2-2. As \(\rho_{12}\) is unconstrained, its estimates do not necessarily range from -1 to +1.

(rhos <- sapply(X = EM3.res.list[c(2, 4)],
                FUN = function(x) {
                  x$rhosMat[2, 1]
                }))

##      M12      M22 
## 1.055870 1.028071

Total genetic variance

The fraction of total genetic variation is as follows.

(herits <- sapply(X = EM3.res.list,
                  FUN = function(x) {
                    x$Vu / (x$Vu + x$Ve)
                  }))

##       M11       M12       M21       M22       M31 
## 0.8429155 0.8423811 0.8429155 0.8603067 0.8429155

The proportion of phenotypic variation explained (PVE) by DGEs, IGEs, and their covariance can be calculated for Model 1-2 and 2-2 as follows.

# PVE by DGE
(pve1 <- sapply(X = EM3.res.list,
                  FUN = function(x) {
                    x$weights[1]*x$Vu / (x$Vu + x$Ve)
                  }))

##       M11       M12       M21       M22       M31 
## 0.8429155 0.8396976 0.8429155 0.8347222 0.8429155

# PVE by IGE
(pve2 <- sapply(X = EM3.res.list,
                  FUN = function(x) {
                    x$weights[2]*x$Vu / (x$Vu + x$Ve)
                  }))

##        M11        M12        M21        M22        M31 
## 0.00000000 0.00268355 0.00000000 0.02558451 0.00000000

# PVE by the covariance for Models 1-2 and 2-2
(pve12 <- sapply(X = EM3.res.list[c(2, 4)],
                FUN = function(x) {
                  x$rhosMat[2, 1]*sqrt(x$weights[1]*x$weights[2])*x$Vu / (x$Vu + x$Ve)
                }))

##        M12        M22 
## 0.05012179 0.15023903

Predicted phenotypes

Predicted phenotype values are given as follows.

y.preds <- sapply(X = EM3.res.list,
                  FUN = function(x) {
                    x$y.pred
                  })
cor(y.preds)

##           M11       M12       M21       M22       M31
## M11 1.0000000 0.9998845 1.0000000 0.9948186 1.0000000
## M12 0.9998845 1.0000000 0.9998845 0.9948154 0.9998845
## M21 1.0000000 0.9998845 1.0000000 0.9948186 1.0000000
## M22 0.9948186 0.9948154 0.9948186 1.0000000 0.9948186
## M31 1.0000000 0.9998845 1.0000000 0.9948186 1.0000000

psych::pairs.panels(y.preds)

The five models do not show large differences in their predicted values.

Estimates of random effects

We can also view random effects for polygenic DGEs and IGEs. As these involve toy data, IGE estimates are almost zero.

u.eachs <- sapply(X = EM3.res.list,
                  FUN = function(x) {
                    x$u.each
                  })
See(u.eachs$M11, fh = TRUE)

##                        V1
## class           <numeric>
## K_1_L1  -16.0262667714759
## K_1_L2   -2.1387771256543
## K_1_L3  -4.32658478952284
## K_1_L4  0.440184604114526
## K_1_L5   0.17124852050821
## K_1_L6 -0.453247418475304
## [1] "class: matrix & array"
## [1] "dimension: 790 x 1"

See(u.eachs$M11, fh = FALSE)

##                 V1
## class    <numeric>
## K_2_L649         0
## K_2_L650         0
## K_2_L651         0
## K_2_L652         0
## K_2_L654         0
## K_2_L41          0
## [1] "class: matrix & array"
## [1] "dimension: 790 x 1"

Genome-wide association study (GWAS)

For GWAS, we can depict Manhattan plots as follows.

Model 1-1

The following shows Manhattan plot of neighbor genotypic effects \(\beta_2\) for Model 1-1.

head(data.frame(map,GWAS.res11))

##              marker chr     pos  beta_self    beta_nei    p_self     p_nei
## id1000223 id1000223   1  420422  1.1043131  0.67056936 0.3370861 0.3424974
## id1000556 id1000556   1  655693  0.9093023 -0.36706116 0.3398849 0.6001774
## id1000673 id1000673   1  740153 -0.6318407 -0.05512477 0.4729718 0.9358733
## id1000830 id1000830   1  913806  0.7040809 -0.77643255 0.7004013 0.3365268
## id1000955 id1000955   1 1041748 -1.4309101  0.55525842 0.1023959 0.4144168
## id1001073 id1001073   1 1172387  1.0330929 -0.52400706 0.2598081 0.4969403

out11 <- data.frame(chr=map$chr,pos=map$pos,p=GWAS.res11$p_nei)
gaston::manhattan(x=out11,main="GWAS.res11; beta_2",las=1)

Model 1-2

The following shows Manhattan plot of neighbor genotypic effects \(\beta_2\) for Model 1-2.

head(data.frame(map,GWAS.res12))

##              marker chr     pos      beta1      beta2    chisq1     chisq2
## id1000223 id1000223   1  420422  1.0950853  0.7386768 0.8519919 0.98647091
## id1000556 id1000556   1  655693  0.8596550 -0.2852650 0.8444014 0.18157486
## id1000673 id1000673   1  740153 -0.6728780 -0.1065715 0.5689475 0.02304063
## id1000830 id1000830   1  913806  0.6822974 -0.7380687 0.1687383 0.84352713
## id1000955 id1000955   1 1041748 -1.3783543  0.5487375 2.6608008 0.61958091
## id1001073 id1001073   1 1172387  1.1562109 -0.5314973 1.5818525 0.46410230
##                  p1        p2
## id1000223 0.3559894 0.3206064
## id1000556 0.3581410 0.6700233
## id1000673 0.4506774 0.8793515
## id1000830 0.6812356 0.3583900
## id1000955 0.1028489 0.4312031
## id1001073 0.2084941 0.4957130

out12 <- data.frame(chr=map$chr,pos=map$pos,p=GWAS.res12$p2)
gaston::manhattan(x=out12,main="GWAS.res12; beta_2",las=1)

plot(-log10(out11$p),-log10(out12$p),las=1)

Here, Models 1-1 and 1-2 yielded similar p-values, indicating a limited role of the covariance parameter \(\rho_{12}\) in improving GWAS.

Model 3-1

Additionally, we can see asymmetric neighbor effects \(\beta_{12}\). The following shows Manhattan plot of \(\beta_{12}\) for Model 3-1.

head(data.frame(map,GWAS.res31))

##              marker chr     pos  beta_self    beta_nei   beta_sxn    p_self
## id1000223 id1000223   1  420422  1.1310357  0.62706894  0.2393227 0.3370861
## id1000556 id1000556   1  655693  0.9250039 -0.34678550  0.2827820 0.3398849
## id1000673 id1000673   1  740153 -0.6363843 -0.09520183 -0.7126124 0.4729718
## id1000830 id1000830   1  913806  0.7148453 -0.82373807  0.1175711 0.7004013
## id1000955 id1000955   1 1041748 -1.4105745  0.68318349 -0.6269288 0.1023959
## id1001073 id1001073   1 1172387  1.0603056 -0.40513604  0.2873576 0.2598081
##               p_nei     p_sxn
## id1000223 0.3424974 0.7435387
## id1000556 0.6001774 0.7090938
## id1000673 0.9358733 0.2921232
## id1000830 0.3365268 0.9182487
## id1000955 0.4144168 0.3650179
## id1001073 0.4969403 0.7290837

out31 <- data.frame(chr=map$chr,pos=map$pos,p=GWAS.res31$p_sxn)
gaston::manhattan(x=out31,main="GWAS.res31; beta_12",las=1)

Tips

Initial parameters of the kernel weights

In the default setting, initial parameters are given to the kernel weights as (1 / no. of kernels) in EM3.cov function. These initial parameters can be defined by users with parInitForWeights option.

EM3.res11 <- EM3.cpp(y = pheno.mat,
                     X0 = NULL,
                     ZETA = ZETA1)
w <- EM3.res11$weights
EM3.res12 <- EM3.cov(y = pheno.mat,
                     X0 = NULL,
                     ZETA = ZETA1,
                     parInitForWeights = w,
                     covList = covList1)

print(w); print(EM3.res12$weights) # weights remain the same

## [1] 1 0

## [1] 1 0

print(EM3.res12$rhosMat[1,2]) # covariance estimated as 0 due to w = c(1,0)

## [1] 0

Covariance with fixed kernel weights

In case \(\rho_{12}\) is difficult to estimate, we may fix the kernel weights by setting optimizeWeights = FALSE (whose default is TRUE) as follows.

EM3.res11 <- EM3.cpp(y = pheno.mat,
                     X0 = NULL,
                     ZETA = ZETA1)
w <- EM3.res11$weights
EM3.res12 <- EM3.cov(y = pheno.mat,
                     X0 = NULL,
                     ZETA = ZETA1,
                     optimizeWeights = FALSE,
                     parInitForWeights = w,
                     covList = covList1)

print(EM3.res12$rhosMat[1,2]) # the same as above due to w = c(1,0)

## [1] 0

Skip kernel calculations in GWAS

We can set pre-defined kernels for covnei_lmm function. This enables us to skip kernel calculations and thus speed it up when repeating GWAS for multiple traits. Default setting is K1 = NULL, K2 = NULL, K12 = NULL, and K21 = NULL, which takes plenty of time to run tcrossprod function internally.

GWAS.res12.list <- c()
for(i in 1:2){
  pheno <- as.matrix(Rice_pheno[rownames(x), i, drop = FALSE])
  GWAS.res12 <- covnei_lmm(geno = x,
                      g_nei = xAdj,
                      pheno = pheno,
                      EM3.res12 = EM3.res12,
                      K1 = K.A, K2 = K.Adj, 
                      K12 = K12, K21 = K21)
  GWAS.res12.list <- c(GWAS.res12.list, list(GWAS.res12))
}

head(data.frame(map,GWAS.res12.list[[1]]))

##              marker chr     pos      beta1       beta2    chisq1      chisq2
## id1000223 id1000223   1  420422  1.1047248  0.67085261 0.9184187 0.904102913
## id1000556 id1000556   1  655693  0.9096194 -0.36720733 0.9211814 0.304358147
## id1000673 id1000673   1  740153 -0.6317598 -0.05508234 0.5087679 0.003905352
## id1000830 id1000830   1  913806  0.7043789 -0.77653272 0.1378291 0.952550528
## id1000955 id1000955   1 1041748 -1.4307693  0.55528082 2.7406396 0.637505998
## id1001073 id1001073   1 1172387  1.0327485 -0.52390255 1.2654538 0.459201209
##                   p1        p2
## id1000223 0.33789052 0.3416839
## id1000556 0.33716496 0.5811631
## id1000673 0.47567271 0.9501704
## id1000830 0.71044876 0.3290709
## id1000955 0.09782561 0.4246154
## id1001073 0.26062128 0.4979975

head(data.frame(map,GWAS.res12.list[[2]]))

##              marker chr     pos      beta1      beta2     chisq1    chisq2
## id1000223 id1000223   1  420422 -0.2173537 -0.5249002 0.05926873 0.6593742
## id1000556 id1000556   1  655693 -0.2984151 -0.6536274 0.11584653 1.1833809
## id1000673 id1000673   1  740153  0.6683126 -0.5511518 0.89083349 0.7975076
## id1000830 id1000830   1  913806  0.4731345 -1.1611040 0.01844195 2.4346423
## id1000955 id1000955   1 1041748 -0.6181426 -0.8968470 1.15383137 2.1684414
## id1001073 id1001073   1 1172387 -1.3236568 -0.4987488 1.99477788 0.5320323
##                  p1        p2
## id1000223 0.8076555 0.4167810
## id1000556 0.7335837 0.2766688
## id1000673 0.3452515 0.3718396
## id1000830 0.8919785 0.1186810
## id1000955 0.2827485 0.1408689
## id1001073 0.1578422 0.4657537

References

Bijma, P. (2014). The quantitative genetics of indirect genetic effects: A selective review of modelling issues. Heredity, 112(1), 61-69. https://doi.org/10.1038/hdy.2013.15
Hamazaki, K., & Iwata, H. (2020). RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method. PLOS Computational Biology, 16(2), e1007663. https://doi.org/10.1371/journal.pcbi.1007663
Perdry, H., & Dandine-Roulland, C. (2023). gaston: Genetic Data Handling (QC, GRM, LD, PCA) & Linear Mixed Models. R package version 1.6, https://CRAN.R-project.org/package=gaston.
Sato, Y., Yamamoto, E., Shimizu, K. K., & Nagano, A. J. (2021). Neighbor GWAS: Incorporating neighbor genotypic identity into genome-wide association studies of field herbivory. Heredity, 126(4), 597-614. https://doi.org/10.1038/s41437-020-00401-w
Sato, Y., Takahashi, Y., Xu, C., & Shimizu, K. K. (2023). Detecting frequency-dependent selection through the effects of genotype similarity on fitness components. Evolution, 77(4), 1145-1157. https://doi.org/10.1093/evolut/qpad028
Sato, Y. (2025). Negative frequency-dependent selection underlies overyielding through neighbor genotypic effects in Arabidopsis thaliana. bioRxiv. https://doi.org/10.1101/2025.05.14.654149
Zhao, K., Tung, C.-W., Eizenga, G. C., Wright, M. H., Ali, M. L., Price, A. H., Norton, G. J., Islam, M. R., Reynolds, A., Mezey, J., McClung, A. M., Bustamante, C. D., & McCouch, S. R. (2011). Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nature Communications, 2(1), 467. https://doi.org/10.1038/ncomms1467

Faculty of Environmental Earth Science, Hokkaido University (email:yassato@ees.hokudai.ac.jp)↩︎
Institute for Advanced Academic Research (IAAR), Chiba University (email:hamazaki@ut-biomet.org)↩︎