1 Introduction
An important objective in spatial analyses is to make predictions with estimates of uncertainty of one or more spatial processes at unobserved locations given data collected at a set of observed locations. In traditional geostatistics, the set of observed locations, e.g., the sample, is finite while the population, consisting of all the locations within a continuous field that could be observed or sampled, is infinite. In some spatial analyses, however, the population is also finite. For example, there are a finite number of groundwater wells across central California to access underground aquifers or, in the American Community Survey, each census tract consists of a finite number of households. Even with a finite population, the entire population is rarely observed and the set of unobserved locations could be unknown. Sampling designs, such as systematic designs, stratified designs, or spatiallybalanced designs, aim to produce sample data that are representative of the population in order to infer population characteristics. The sampling design for groundwater wells might aim to optimize spatial coverage across the region whereas the sampling design of the American Community Survey is stratified such that tracts with fewer households are sampled at a higher rate. The sampling design has been shown to impact spatial prediction estimates at unobserved locations (Gelfand et al., 2012; Pati et al., 2011; Diggle et al., 2010). Here, we study the effects of the sampling design in spatial Gaussian process parameter estimation, spatial prediction, and uncertainty estimation given a finite population.
We begin with a general definition of informative sampling designs. Let denote the process of interest. Next, assume that is the set of population locations and is the set of sample locations. Informative sampling assumes that the distribution of the variable of interest, , conditional on the sample locations, , is not the same as that depending on the whole population, . Let
denote the sampling probability process. In survey statistics,
is considered known for all and provides information regarding the sampling design. It is often referred to as the sample inclusion probability for location . Informative sampling designs in spatial statistics have been studied primarily in the context of geostatistics under the notion of preferential sampling, as first introduced by Diggle et al. (2010). Preferential sampling is defined such that the joint distribution
. Let denote the location distribution function of . Under preferential sampling, a joint model is specified for and since the sample design is typically unknown.Spatial prediction, or kriging, is often approached as a two stage procedure. At the first stage, spatial dependence in is modeled by computing the empirical semivariogram. Under noninformative or nonpreferential sampling, the empirical semivariogram ordinates are unbiased estimates of the theoretical semivariogram (Zimmerman and Stein, 2010)
. Scatter plots of the empirical and theoretical estimates can be used to suggest parametric semivariogram functions for the data. Customary methods for estimating the semivariogram parameters are ordinary least squares, weighted least squares, maximum likelihood, and restricted maximum likelihood
(Zimmerman and Zimmerman, 1991). More recently, composite likelihood methods for semivariogram parameter estimation were proposed (Lele, 1997).Given semivariogram parameter estimates obtained from any of these approaches, the second stage is to use kriging equations (Cressie, 1993) to obtain prediction means and variances of the process at unobserved locations. Whereas the kriging mean is a function of the observed process, the kriging variance is only indirectly a function of the observed process through the semivariogram parameter estimates. Specifically, given a semivariogram function and a set of parameter estimates, the kriging variance is a function of only the locations for which the data were observed and the locations where predictions are desired.
Sampling designs can impact one or both of the stages of spatial kriging. Diggle et al. (2010)
illustrate bias in variogram estimation under preferential sampling. Specifically, under nonpreferential sampling designs that are either spatially random or clustered, empirical semivariogram estimates are shown to be unbiased. Under preferential sampling, the bias can be severe under either sampling design. This bias is the result of the reduction in the range of values of the response variable of interest, which, in turn, reduces the expected pairwise squared differences for a given spatial lag. For semivariogram estimation, a biascorrected empirical semivarigram estimate has been suggested to account for and detect informative or preferential sampling where weights are defined as the inverse sampling intensity
(Rathbun, 2010). This approach stems from the mark variogram estimator under preferential sampling (Ho and Stoyan, 2008), where parameter estimates can be obtained using method of moments.
Gelfand et al. (2012) argue that the impact of preferential sampling is more important for prediction surfaces than parameter estimation. They illustrate remarkable differences between the spatial prediction surfaces when the observations are chosen randomly versus chosen preferentially. In geostatistical modeling, Bayesian approaches have been developed to account for preferential sampling (e.g., Pati et al., 2011; Diggle et al., 2010; Gelfand et al., 2012). These approaches specify a spatial point process model (e.g., logGaussian Cox process) for the sample location distribution, , such that the models for and have a shared spatial random effect to adjust for the spatial intensity of the sampling design in the process of interest. The focus of this specification is on the first order effects for improved inference and prediction rather than uncertainty estimation.
Kriging variance estimates can vary dramatically across the spatial region based on different sample designs. To illustrate, we turn to nitrate concentration data collected in wells in California. Note that these data are discussed and analyzed more thoroughly in Section 5 and similar data were analyzed in ChanGolston et al. (2020). The population data consist of approximately 7000 wells in central California. The spatial covariance model is assumed fixed and known. We generate two sample realizations to illustrate the impact of sampling design on variance estimation. The two designs include a simple random sample and a spatially stratified sample. Given each set of sample locations, we obtain kriging variance estimates throughout the region. Figure 1 shows the ratio of the kriging variance estimates from the stratified sample relative to the simple random sample. Regions in red (blue) indicate kriging variances that are greater (less than) those obtained under simple random sampling. These deviations in variance estimates vary across the region as well as vary between the different sampling designs. Understanding these impacts of sampling design on spatial kriging uncertainty is the focus of this work.
In survey data, informative sampling designs are accounted for in unitlevel modeling and inference through the use of sampling weights, as done in the HorvitzThompson estimator for population inference (Horvitz and Thompson, 1952). A general approach for likelihoodbased inference with survey data and informative sampling designs is the use of pseudolikelihoods, where each unit’s contribution to the likelihood is weighted by its survey weight (Lumley and Scott, 2017). We note that the term “pseudolikelihood” was first introduced by Besag (1975) in the context of analyzing data having spatial dependence. His approach leveraged Markovian properties of autonormal models and approximated the likelihood as the product of marginal densities. While the pseudolikelihood is not the true likelihood function except in the case of independence, it is a computationally simpler method for parameter estimation than traditional likelihood estimation. See Parker et al. (2019) for a more detailed review of modern methods for unitlevel modeling under informative sampling.
The contribution of this work is the development of approaches for incorporating sampling designs in (1) the semivariogram parameter estimation stage and (2) the kriging equations for variance estimation. We turn to the composite likelihood approach of Lele (1997) for semivariogram estimation. To account for the sampling design, we define a weighted composite likelihood where the weights are either based on the survey weight or sampling intensity. Our weighted composite likelihood approach provides a general framework for semivarioagram parameter estimation under informative and preferential sampling. Given a semivariogram function and a set of parameter estimates, we then develop three schemes for obtaining kriging variance estimates to address the impacts of the sampling design.
The remainder of this paper is outlined as follows. In Section 2, we briefly review semivariogram estimation and then introduce our weighted composite likelihood approach to account for sampling designs. In Section 3 we propose three approaches for obtaining kriging variance estimates under informative sampling designs. We evaluate our approaches in the two stage spatial prediction procedure using a simulation study in Section 4. In Section 5 we apply our approach using two common sampling designs to estimate nitrate concentrations in wells across central California. Lastly, we conclude with a discussion and directions for future work in Section 6.
2 Variogram parameter estimation
Classical approaches for semivariogram estimation are method of moments, maximum likelihood (ML), and restricted maximum likelihood (REML). See Zimmerman and Zimmerman (1991) for a complete review and comparison of these approaches. Method of moments (Matheron, 1963) is likely the most common approach to semivariogram estimation where parameter estimates are obtained by ordinary least squares or weighted least squares (Cressie, 1993). It benefits from being robust and requires no strong distributional assumptions about the process of interest. In addition, plotting the empirical semivariogram is a valuable exploratory tool for investigating spatial dependence as a function of distance. Unfortunately, the method of moments approach also has the significant drawback of requiring subjective specification of distance lags and lag tolerance regions. Maximum likelihood and restricted maximum likelihood approaches are consistent estimators and benefit from not having to make these specifications. Yet, they require a fully specified probabilistic model for the process and are often computationally costly by involving large matrix inversion.
Lele (1997) proposed composite likelihood methods as an alternative approach for semivariogram estimation. The composite likelihood (CL) approach is computationally and mathematically simpler and less reliant on strong distributional assumptions. Specifically, the benefits of the composite approach over the approaches previously mentioned are that it (i) eliminates subjectively specifying the distance lags and lag tolerance regions, (ii) eliminates needing to specify a probabilistic model, and (iii) is computationally efficient. We begin with a brief review of the CL approach of Lele (1997) and then extend to the weighted composite likelihood approach to account for sampling designs.
Let denote the spatial process of interest where represents a spatial coordinate in the domain , often assumed to be twodimensional. Under ordinary kriging (Cressie, 1993), it is assumed that . The semivariogram for the spatial process, is given by
Stationarity in the process implies the semivariogram reduces to a function of the difference between the two locations, . In that case, the covariance between and can be defined as
(1) 
Let denote Euclidean distance between and . When the semivariogram depends only on the distance between locations such that , it is called isotropic. As such, we can write (1) as
(2) 
For , the set of contrasts is defined where . Then,
where represents the parameters of the semivariogram and . Lele (1997) defines the composite likelihood as the product of the marginal densities of the contrasts. Namely,
where each component of the composite likelihood, is a likelihood since is restricted to the class of valid density functions (Lindsay, 1988).
Curriero and Lele (1999) show that the composite likelihood estimate is unbiased as long as . That is, the consistency of the estimator is not reliant on the correct specification of the marginal distribution of the contrasts, . This is not true for maximum likelihood or restricted maximum likelihood approaches where misspecified joint distributions result in biased estimates. Following Curriero and Lele (1999) and for simplicity, we let such that
Then, the logcomposite likelihood is written
(3) 
and composite likelihood estimates of the parameters of are those that maximize (3).
Similar to restricted maximum likelihood, the composite likelihood approach eliminates the nuisance parameter by considering the contrasts . That is, the approach is based solely on the semivariogram parameters of interest. The composite likelihood approach and objective function in (3) is computationally simpler than the maximum likelihood and restricted maximum likelihood objective functions as it requires no matrix inversion.
Bevilacqua et al. (2012) developed an extension to this composite likelihood approach for spacetime covariance functions. They proposed a weighted version of the composite likelihood approach where cutoff weights, , are based on distance in space and time. That is, pairs of observations that are far apart in time and space are downweighted such that they contribute less to the estimation of the semivariogram parameters. The log weighted composite likelihood is written
(4) 
The weighted composite likelihood has been shown to have both computational advantages as well as improved statistical efficiency.
Here, we propose the weighted composite likelihood approach to semivariogram estimation as a way to incorporate sampling designs (e.g., survey weights, preferential sampling) into the likelihood to improve parameter estimation. We offer two different weighting schemes, the first stemming from the survey sampling literature and the second from geostatistics under preferential sampling.
2.1 Survey design weighted composite likelihood estimation
Our first approach is to use survey weights in the weighted composite likelihood in (4) where we define as the weight assigned to the pair of observations at locations and . In the survey literature, Binder (1983) and Skinner (1989) proposed pseudolikelihood analysis as an approach for incorporating survey weights into the likelihood to get designconsistent parameter estimates. For example, the pseudolog likelihood can be written
where is the specified density function of the data having parameters , denotes the set of sampled observations from some known population, and is the inverse sampling probability associated with observation . Here, we propose incorporating survey weights into the weighted composite likelihood. Assuming sampling is done independently, we might define the weights to be the product of the inverse sampling probabilities from the survey design for observation and . That is, . As such, our first log weighted composite likelihood is
(5) 
Parameter estimates of can be obtained by maximizing (5).
2.2 Sampling design weighted composite likelihood estimation
The second approach stems from geostatistics under preferential sampling. The biascorrected estimator for the mark semivariogram under preferential sampling (Ho and Stoyan, 2008; Rathbun, 2010) is
where is a kernel density function with bandwidth , and is the intensity function of the sample locations. This is a weighted version of the method of moments estimator where the weights are a function of the intensity, , and the kernel density function specification corresponds to the subjectivity in the lag tolerance regions discussed above.
For our second weighted composite likelihood, we propose using the inverse intensity estimates as our weights. That is,
(6) 
To estimate the semivariogram parameters, we must first estimate the intensity function for the spatial domain, . Estimates of can be obtained by plugging in estimates of and maximizing (6). Note that while our focus is on informative sampling designs where the sampling rate is known, this approach can also be used when the sampling design is unknown. While this approach does not take uncertainty quantification associated with estimates of into account, simulations showed minimal sensitivity to using the estimates rather than the true intensity. Methods for incorporating uncertainty in the estimation of into (6) is the subject of future research.
3 Kriging variance estimates
Next, we turn to spatial prediction given variogram parameter estimates. We propose three approaches for quantifying the effects of the sampling design on uncertainty estimation in spatial prediction for finite populations. Under each approach, the goal is to obtain an estimate of the kriging variance had the population been observed. We study the effects of the sampling design by comparing the kriging estimates obtained from the sample data to that which would have been obtained had we observed the entire population. The sampling design is assumed known, however, we consider both the case when the unobserved locations within the population are known and unknown.
We begin by reviewing ordinary kriging. Let denote an unobserved location for which we would like to predict the spatial process . That is, we want to predict given observations at population locations . Let denote the length
crosscovariance vector with
th element . Similarly, let denote the covariance matrix between all and for . As shown in Cressie (1993), the generalized least squares estimator of iswhere is a length vector of 1s. In addition, the ordinary kriging predictor is
(7) 
Here, we focus on the ordinary kriging variance, which we denote, . Given , the ordinary kriging variance is
(8) 
Note that the first two terms are the simple kriging variance and the last term is the penalty for estimating (Cressie, 1993). The ordinary kriging variance is lower for locations that are close to observation locations. The variance increases as distance increases between the prediction location and the observations.
The kriging variance estimate of in (8) assumes the population set of all locations is observed. The kriging variance is minimized when the entire population is observed. That is, for a sample of size where , the kriging variance is greater than that based on all locations. Without loss of generality, let denote the set of observed locations in the sample, and the remaining denote the unobserved locations in the population. Dropping the dependence on for ease of notation, we partition into the set of observed variables, , and unobserved variables, . We partition into the crosscovariances between and the variables at the observed set of locations and those unobserved. Similarly, let
denote the covariance matrix between all and for , which again we partition into the covariances between the observed set of locations and those unobserved. Now, the ordinary kriging predictor and variance estimate given only the observed locations are
(9) 
and
(10) 
Given a known covariance function and parameters, only when such that all of the locations in the population are in the sample.
If the unsampled locations are known, the first and preferred approach is to use them when computing the kriging variance estimate. This approach is optimal in the sense that the kriging variance estimates will not be impacted by the sampling design since (8) can be computed directly given . The other two approaches, which we detail below, assume that the unsampled locations are unknown. As mentioned above, the difference between the kriging variance computed using the sample and population decreases as sample size increases and the proximity of the sample locations to the prediction location decreases. Therefore, our two approaches aim to adjust the distance (scaling approach) between the observations and the prediction location and the number (simulation approach) of sampled locations.
3.1 Scaling approach
The scaling approach aims to adjust the distance in the spatial covariance function to account for the sample design. Assuming a stationary and isotropic covariance function, the scaling is done on the distance between the two locations. Alternatively, it can be thought of as a scaling of the spatial range of the covariance function. The approach is motivated by the function (Ripley, 1977) and function (Baddeley et al., 2015), measures commonly used in spatial point processes to detect deviations from complete spatial randomness (CSR). Recall that, given lag distance , is defined as the expected number of events within of an event in the point pattern, divided by the average intensity of the point pattern. Alternatively, defines the distribution of the distances from an event to its nearest event.
Let the random variable
denote the number of points within radius of the location . Assuming a homogenous point process (CSR) with intensity ,If we set this equal to some value , we can solve for and obtain
Thus, we expect points within a neighborhood with radius around . Additionally, under complete spatial randomness, . Using these results, we outline the scaling approach under both simple random sampling and informative sampling.
3.1.1 Simple random sampling
Assume the sampling design is a simple random sample (SRS) with known rate where . The intensity of the point pattern of the sample data is . Let the random variable denote the number of points in the sample within radius of . We compute the expected value of as
That is, the expected number of points in the sample within radius of is a function of the sampling rate . If we set and solve for , we get
suggesting a distance scaling of to relate the sample to the population. Under SRS, we can derive the analogous function using the sample data . Setting and solving for results in
Again, this suggests that the distance from an event to its nearest event in the population is times the distance from an event to its nearest event in the sample. Based on these two measures, we propose incorporating distance scaling into the covariance function when doing spatial kriging of using sample data.
Assume a stationary and isotropic covariance function, . Letting denote the covariance function given the sampling rate , we propose
(11) 
That is, given the sample data, the covariance between and is a function of the distance between the two locations scaled by . To obtain the kriging variance of using the sample data, we propose using the scale covariance function in (11) in place of the original covariance function in (10).
3.1.2 Informative sampling
The scaling approach can also be employed under informative sampling. Let the sampling rate now vary as a function of location, where is known for all . Recall that our goal is to obtain the kriging variance at location . In order to use the scaling approach, we must first estimate unless it is known. Assuming there exists spatial dependence in the sampling rate, we suggest using a kernel based smoother of to estimate . Other modelbased spatial prediction approaches, such as kriging on the log scale, are also available. In our simulation and application below, we use a Gaussian kernel smoother to obtain estimates of . To obtain the kriging variance of , we replace the covariance function in (10) with . Algorithm 3 outlines the scaling approach for estimating the kriging variance under both informative and noninformative sampling.
3.2 Simulation approach
As discussed above, the kriging variance estimate at the prediction location is a function of the covariance parameters and the observed locations. Our third approach for estimating the kriging variance is simulation based, where we randomly locate unsampled locations to mimic the population in order to mitigate the variance inflation that results from the sample size being less than the population.
Recall that in the survey designs considered here, we assume the total population size is known, but the unsampled locations are unknown. Under either informative or noninformative sampling, the simulation approach requires randomly locating events, which we will refer to as pseudoobservation locations, to be used when computing the kriging variance. That is, we generate pseudoobservation locations and then treat them as observed locations when computing the kriging variance in (10). For a stationary and isotropic covariance function, the closer the collection of pseudoobservation locations are to the set of the unsampled locations in terms of the distribution of their distances to the prediction location, the more accurate the simulation approach will be to estimating the populationbased kriging variance.
3.2.1 Simple random sampling
Under simple random sampling, the point pattern of sample locations offers a thinned representation of the population point pattern. In fact, independently thinning a spatial point pattern by randomly assigning each point to either the training or test set is a common approach for model validation as it preserves the properties of the point process (Leininger and Gelfand, 2017)
. As such, with a noninformative sampling design, the set of sampled locations can be used to approximate the intensity function of the population, which can be used to generate the pseudoobservation locations. For example, we might use kernel density estimation to obtain an estimate of the density surface, and then randomly locate the
pseudoobservations independently across the spatial region using rejection sampling (Gelfand and Schliep, 2018). That is, we randomly sample locations within the domain and retain them independently with probability proportional to the density estimate at the location. This procedure is repeated until pseudoobservation locations are collected. Whereas the population point process might exhibit interaction between points, as in Gibbs processes with attractive (clustering) or repulsive (inhibition) interactions, we found through simulation that this second order structure can be captured well by the first order intensity for the purposes of locating pseudoobservations (results not shown).3.2.2 Informative sampling
Under an informative sampling design, we must account for the fact that our kernel density estimate obtained from the point pattern of sampled locations is a biased representation of the population. That is, where is the population location distribution. Therefore, in the acceptreject step of the procedure above, the probability of retaining the randomly sampled point as a pseudoobservation location is scaled by the reciprocal of the sample rate at the location. Note that this, again, requires spatial estimation (e.g., kernel smoothing, kriging) of the sample rates discussed in the scaling approach in Section 3.1 to every proposed pseudoobservation location. Algorithm 5 outlines the scaling approach for estimating the kriging variance under both informative and noninformative sampling.
4 Simulation study
The objective of our simulation study is to assess the impact of varying sampling designs on kriging estimates. The simulation procedure consists of the following steps: (i) simulate the population data as well as prediction locations, (ii) sample from the population under a specified sample design, (iii) obtain variogram parameter estimates, and (iv) obtain kriging estimates at the prediction locations. Each of these steps is described below. We compare each of the approaches for parameter estimation and kriging variance proposed in Sections 2 and 3.
4.1 Simulating the population
We simulate data within the unit square using the following model specifications and parameterizations. Let and for be two independent mean 0 Gaussian processes with stationary and isotropic covariance functions. For both processes, we assume an exponential covariance function with spatial variance , and range parameter . The response variable of interest, , is defined as
(12) 
where represents independent measurement error. We assume and ) where .
The set of population locations are simulated under two different scenarios. The first assumes that the location distribution is independent of the process of interest whereas the second assumes the location distribution and process of interest to be dependent. Let and denote the intensity functions assumed to be generating the population locations. We assume logGaussian Cox processes (LGCPs) for both and . Thus, conditional on the intensity, the event locations are independent. Define
and
In our simulations, .
We simulate from each LGCP to obtain the two sets of population locations, , and . Our simulation resulted in population sizes of and . For each population, we simulate a realization of the process of interest according to (12). Let and denote the observations of populations 1 and 2, respectively. Lastly, we simulate prediction locations independently of and
from a uniform distribution over the domain for which we are interested in obtaining kriging estimates.
For each population, estimates of the exponential variogram parameters were obtained by maximizing the composite likelihood. Using these variogram parameter estimates, kriging means and variances were obtained for the set of prediction locations. The parameter estimates, means, and variances obtained using the population data are used as the metric of comparison for each of the sampling designs and estimation procedures described below.
4.2 Sampling designs
We propose three different sampling designs (a), (b), and (c), for each population with varying degrees of being preferential and informative. For population , let , , and denote the sampling rate processes for the three sampling designs. Define the sampling location distribution for sampling design (a) and population 1 as, . That is, the sampling location distribution is a product of the population location distribution and the sampling rate. The sampling location distribution for all sampling designs and populations can be defined analogously. The sampling designs for each population are given in Table 1.
Population 1  Population 2  

Sampling rate  Pref./Inf.  Sampling rate  Pref./Inf. 
(a)  N/N  (a)  Y/N 
(b)  Y/Y  (b)  Y/Y 
(c)  Y/Y  (c)  N/Y 
For some constant with , sampling design (a) for both populations is noninformative. For population 1, the sampling location distribution is independent of the variable of interest, , meaning it is nonpreferential. For population 2, since and are dependent, and , this sampling is preferential. The specification of sampling design (b) induces dependence between the sampling distribution and for both populations making it informative and preferential. Lastly, sampling design (c) results in a sampling location distribution of and for some constant . For population 1, this sampling design is informative and preferential, yet results in the sampling location distribution to be independent of the population location distribution. For population 2, the sampling location distribution is nonpreferential but the sampling design is informative.
4.3 Variogram estimation
For each sampling design, 100 independent realizations of sample data are obtained. Then, estimates of the exponential variogram parameters are obtained using the composite likelihood (CL) approach as well as both weighted versions of the composite likelihood. The first weighted version (WCL1) assigns weights using the sampling rates given in Table 1. The second weighted version (WCL2) assigns weights using the sampling location distribution.
We compare the parameter estimates under each approach for each population and sampling design. Figures 2 and 3 show the distribution of the estimates of the exponential variogram parameters for both populations 1 and 2, respectively, obtained using CL, WCL1, and WCL2. In addition, the estimates obtained from the population are indicated by the horizontal line in each panel. For sampling design (a) that is noninformative, the variogram estimates obtained from the CL approach show very little bias relative to the population estimates. Recall that for population 2, sampling design (a) is preferential. The CL estimates obtained for sampling designs (b) and (c) are biased for both populations. These designs are informative for both populations but for population 2, sampling design (c) is nonpreferential. The WCL1 estimates provide a reduction in bias for all three sampling designs, (a), (b), and (c) for both populations. The WCL2 estimates show improvement over the CL estimates for sampling designs that are both preferential and informative. On average, the WCL2 approach is less accurate than the WCL1 approach when compared to the populationbased estimates. For sampling design (a) for population 2 that is preferential and noninformative, the WCL2 estimates are more biased than the CL estimates. This is because, while the sampling location distribution and response variable are highly correlated, the sample is a simple random sample. Thus, the weights in (6) induce bias in the parameter estimates.
Given the additional information on the sampling design, the simulation study suggests that the WCL1 approach is the preferred approach as it shows accurate spatial Gaussian process parameter estimation over all combinations of preferential and informative sampling designs considered. The WCL2 approach is preferred over the CL approach only when the sampling design is known to be both preferential and informative.
4.4 Kriging estimation
Using the variogram parameter estimates for each of the 100 simulations obtained using CL, WCL1, and WCL2, we computed the kriging prediction means and variances for each population and sampling design. In general, we detected little difference in the outofsample performance for the three parameter estimation approaches in terms of the generalized least squares estimator of or mean square prediction error. As anticipated, bias was highest under preferential and informative sampling designs and very little bias is detected under noninformative or nonpreferential designs. Recall that our primary focus is on kriging variance estimation under varying sampling designs. See Diggle et al. (2010), Pati et al. (2011), and Gelfand et al. (2012) for methods for reducing bias in kriging prediction estimates under preferential sampling.
Figures 4 and 5 present the distribution of the ratio of the kriging variances obtained when using the sample data and population data for each design and population. Here, we compare the variance ratio between the sample and population using the CL, WCL1, and WCL2 parameter estimates. The kriging variance estimates based on the WCL1 and WCL2 parameter estimates are equal to or better at than those based on the CL for almost all population and sampling design combinations. One exception is population 2 under sampling design (c) which was informative but nonpreferential.
Kriging variances were also obtained using the scaling and simulation approaches proposed in Section 3 given the WCL1 and WCL2 parameter estimates. For both populations and all sampling designs, the variance ratio is close to 1 when using either the scaling or simulation procedures and the WCL1 parameter estimates, illustrating that these approaches can be used to obtain the population kriging variance estimates from sample data. Using the WCL2 parameter estimates, the scaling and simulation procedures underestimated the population kriging variance for sampling designs (a) and (c) for population 2. Recall that these sampling designs were preferential and noninformative and informative and nonpreferential, respectively, for which the WCL2 parameter estimates were biased. For all other populations and sampling designs, the scaling and simulation approaches resulted in estimates of the kriging variance similar to those obtained from the population data.
5 Nitrate concentration in wells in California
We apply our methods to investigate nitrate concentration in wells in California. Similar data were analyzed by ChanGolston et al. (2020) in studying various one and two stage sampling designs. In our analyses, we obtained nitrate concentrations from approximately 7000 wells. These data were collected between 2011 and 2021. Like in the analysis of ChanGolston et al. (2020), the nitrate concentrations were logtransformed and modeled using an exponential with nugget covariance function.
Figure 6 shows a kernel density estimate of the well locations across the region. There are clear areas of high concentrations of wells in the central part of the region and spanning the region from the northwest to the southeast. Also shown in Figure 6 are the log nitrate concentrations for the region. High concentrations in nitrate are found in the central part of the region. In general, these concentration levels appear positively correlated with the intensity surface (e.g., high concentration of nitrate in areas having high density of wells).
We consider two different sampling scenarios to represent plausible sampling schemes employed by those collecting nitrate samples throughout the region. The first is a simple random sample (SRS). The second is an informative sampling design using stratification based on the 69 subcounties making up the region. The SRS assumed a sampling rate of 0.21. For the stratified sampling, the sampling rates within each subcounty were assumed constant and were specified inversely proportional to the number of wells in the subcounty. Specifically, let denote the number of wells in the subcounty containing location . Then, for , for , for , for , for . These sampling rates were chosen such that the expected number of points were approximately equal for each sampling design. For each sampling scenario, the sampling rate for each observation in the sample was assumed known. One realization from each of these sampling designs was generated for this analysis. The random number of locations for the two sampling designs were 1432 and 1409, respectively.
Variogram estimates were obtained using the population data. In addition, for each sampling design, variogram estimates were obtained using the CL approach and the preferred WCL1 approach. For each design, the variogram estimates using the WCL1 approach are closer to those obtained using the population than those obtained by the CL approach.
Population Estimates  0.572  0.523  23.30 

CL  
SRS  0.576  0.534  24.60 
Stratified  0.601  0.516  20.85 
WCL1  
SRS  0.576  0.534  24.60 
Stratified  0.569  0.523  24.59 
Using the variogram parameter estimates from the WCL1, we obtained kriging variance estimates for each sampling design for 100 outofsample well locations. Variance estimates were obtained using the traditional kriging equations as well as using the scaling approach. The average kriging variance ratio using the sample data relative to the population data for the two sampling designs was 1.10 and 1.09, respectively. The ratio of the kriging variance estimates from the sample relative to the population are shown spatially in Figure 7 (left). Using the scaling approach, the average variance ratio for each sampling design was 1.02 and 1.01, respectively. Again, these are shown spatially in Figure 7 (right). The kriging variance estimates obtained using the scaling are very close to those obtained from the population (ratio 1) showing that this approach is able to estimate the kriging variances that one would have obtained had the entire population been sampled.
6 Discussion
Sampling designs can impact spatial prediction by producing biased variogram estimates and/or kriging estimates. We proposed using a weighted composite likelihood approach that incorporates sampling designs when estimating variogram parameters. These weights can be a function of either the known sampling rate or the estimated intensity function of the sample locations. Given the variogram estimates, we propose three approaches for estimating the populationbased kriging variances using the sample data. The first approach assumes the locations of the unsampled data are known and can be used directly in defining the covariance matrices of the kriging equation. When the locations of the unsampled locations are unknown, we developed the scaling and simulation approaches for kriging variance estimation. Through simulation, we showed that the scaling and simulation approaches in combination with the weighted composite likelihood parameter estimates can mitigate the impacts of sampling design on kriging.
The primary contribution of this work is the development of methods for correcting spatial Gaussian process parameter estimation under informative sampling designs. Under informative sampling designs, we found the weighted composite likelihood with weights a function of the sampling rates to be the best approach for covariance parameter estimation. This approach performed as well as or better than either the composite likelihood or sampling intensity weighted composite likelihood approach under any of the population and sampling design location and process distributions. Whereas the sampling intensity weighted composite likelihood approach was not uniformly better than the unweighted approach, parameter estimation was improved when the sampling design was both informative and preferential. Therefore, in practice we recommend testing for preferential sampling (Schlather et al., 2004; Guan and Afshartous, 2007; Watson, 2021) prior to using this approach. Under preferential and informative sampling, combining our approaches for correcting spatial parameter estimation with the joint modeling approaches (Diggle et al., 2010; Pati et al., 2011; Gelfand et al., 2012) that account for preferential sampling for improved prediction and inference is a direction of future work. Since the kriging mean is a function of the spatial covariance parameters, we anticipate improved prediction when accounting for preferential sampling in estimating both the first and second order structure.
The scaling approach has computational advantages over the simulation approach. Recall that the kriging equations involve matrix inversion. The scaling approach results in a covariance matrix having the same dimension of the sample data, . Conversely, the simulation approach produces a covariance matrix of dimension , which may preclude this approach for large . The advantage of the simulation approach is that it is able to capture fine scale variation in the sampling process since the pseudoobservations are obtained as a function of , which can be highly variable across the region. On the other hand, the scaling approach assumes is constant near location . That is, a constant scaling, is applied to the covariance function between and for all . An investigation into the possible bias in the scaling approach under varying specifications of the sampling rate process, , is the subject of future work.
In our analysis, we focused on spatial kriging under informative sampling for finite population. We assumed throughout that the spatial process was stationary and that the sampling rate was known for the set of sample locations. Ongoing work includes investigating whether these results translate to nonstationary processes. In addition, it is common in observational studies that one or both the sampling rates and population will be unknown. With unknown sampling rate and population size, the sampling intensity weighted composite likelihood approach can still be used for variogram estimation. For a finite population, the simulation approach for kriging could be employed where the pseudoobservation locations would be retained only as a function of the sample location intensity. Future research is necessary to investigate the effects of informative and preferential sampling on spatial prediction for unknown populations.
Acknowledgement
The work of the authors was supported by the National Institute of Statistical Science.
References
 Baddeley et al. (2015) Baddeley, A., Rubak, E., and Turner, R. (2015). Spatial point patterns: methodology and applications with R. CRC press.
 Besag (1975) Besag, J. (1975). Statistical analysis of nonlattice data. Journal of the Royal Statistical Society: Series D (The Statistician), 24(3):179–195.
 Bevilacqua et al. (2012) Bevilacqua, M., Gaetan, C., Mateu, J., and Porcu, E. (2012). Estimating space and spacetime covariance functions for large data sets: a weighted composite likelihood approach. Journal of the American Statistical Association, 107(497):268–280.
 ChanGolston et al. (2020) ChanGolston, A. M., Banerjee, S., and Handcock, M. S. (2020). Bayesian inference for finite populations under spatial process settings. Environmetrics, 31(3):e2606.
 Cressie (1993) Cressie, N. (1993). Statistics for Spatial Data. Wiley, New York, NY.
 Curriero and Lele (1999) Curriero, F. C. and Lele, S. (1999). A composite likelihood approach to semivariogram estimation. Journal of Agricultural, Biological, and Environmental Statistics, pages 9–28.
 Diggle et al. (2010) Diggle, P. J., Menezes, R., and Su, T.l. (2010). Geostatistical inference under preferential sampling. Journal of the Royal Statistical Society: Series C (Applied Statistics), 59(2):191–232.
 Gelfand et al. (2012) Gelfand, A. E., Sahu, S. K., and Holland, D. M. (2012). On the effect of preferential sampling in spatial prediction. Environmetrics, 23(7):565–578.
 Gelfand and Schliep (2018) Gelfand, A. E. and Schliep, E. M. (2018). Bayesian inference and computing for spatial point patterns. In NSFCBMS Regional Conference Series in Probability and Statistics, volume 10, pages i–125. JSTOR.
 Guan and Afshartous (2007) Guan, Y. and Afshartous, D. R. (2007). Test for independence between marks and points of marked point processes: a subsampling approach. Environmental and Ecological Statistics, 14(2):101–111.
 Ho and Stoyan (2008) Ho, L. P. and Stoyan, D. (2008). Modelling marked point patterns by intensitymarked Cox processes. Statistics & Probability Letters, 78(10):1194–1199.
 Horvitz and Thompson (1952) Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685.

Leininger and Gelfand (2017)
Leininger, T. J. and Gelfand, A. E. (2017).
Bayesian inference and model assessment for spatial point patterns using posterior predictive samples.
Bayesian Analysis, 12(1):1–30.  Lele (1997) Lele, S. (1997). Estimating functions for semivariogram estimation. Lecture NotesMonograph Series, pages 381–396.
 Lindsay (1988) Lindsay, B. G. (1988). Composite likelihood methods. Contemporary Mathematics, 80(1):221–239.
 Lumley and Scott (2017) Lumley, T. and Scott, A. (2017). Fitting regression models to survey data. Statistical Science, pages 265–278.
 Matheron (1963) Matheron, G. (1963). Principles of geostatistics. Economic Geology, 58(8):1246–1266.
 Parker et al. (2019) Parker, P. A., Janicki, R., and Holan, S. H. (2019). Unit level modeling of survey data for small area estimation under informative sampling: A comprehensive overview with extensions. arXiv preprint arXiv:1908.10488.
 Pati et al. (2011) Pati, D., Reich, B. J., and Dunson, D. B. (2011). Bayesian geostatistical modelling with informative sampling locations. Biometrika, 98(1):35–48.
 Rathbun (2010) Rathbun, S. L. (2010). Discussion on the paper by Diggle, Menesez, and Su. Journal of the Royal Statistical Society: Series C (Applied Statistics), 59(2):191–232.
 Ripley (1977) Ripley, B. D. (1977). Modelling spatial patterns. Journal of the Royal Statistical Society: Series B (Methodological), 39(2):172–192.
 Schlather et al. (2004) Schlather, M., Ribeiro Jr, P. J., and Diggle, P. J. (2004). Detecting dependence between marks and locations of marked point processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(1):79–93.

Watson (2021)
Watson, J. (2021).
A perceptron for detecting the preferential sampling of locations and times chosen to monitor a spatiotemporal process.
Spatial Statistics, 43:100500.  Zimmerman and Stein (2010) Zimmerman, D. L. and Stein, M. (2010). Classical geostatistical methods. In Gelfand, A. E., Diggle, P., Guttorp, P., and Fuentes, M., editors, Handbook of Spatial Statistics, pages 29–44. CRC Press, Boca Raton, FL.
 Zimmerman and Zimmerman (1991) Zimmerman, D. L. and Zimmerman, M. B. (1991). A comparison of spatial semivariogram estimators and corresponding ordinary kriging predictors. Technometrics, 33(1):77–91.
Comments
There are no comments yet.