If this is your first time using the spsurvey package, run
install.packages("spsurvey")
to install the package. You only need to run this code once per version of R.
After the spsurvey package is installed, load it at the start of each new R session by running
library(spsurvey)
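If you would rather not re-run the installation step by hand, an install-if-missing guard can be sketched as follows. It uses only base R's requireNamespace() and install.packages(). Placing it at the top of a script is an assumption about your workflow, not something spsurvey requires.

```r
# Install spsurvey only when it is not already available, then load it.
if (!requireNamespace("spsurvey", quietly = TRUE)) {
  install.packages("spsurvey")
}
library(spsurvey)
```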
If you use spsurvey in your work, please cite it. You can view the most recent citation by running
citation("spsurvey")
#>
#> To cite the spsurvey package in publications use:
#>
#> Dumelle, Michael., Kincaid, T. M., Olsen, A. R., and Weber, M. H.
#> (2022). spsurvey: Spatial Sampling Design and Analysis. R package
#> version 5.3.0.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {spsurvey: Spatial Sampling Design and Analysis},
#> author = {Michael Dumelle and Thomas M. Kincaid and Anthony R. Olsen and Marc H. Weber},
#> year = {2022},
#> note = {R package version 5.3.0},
#> }
spsurvey implements a design-based approach to statistical inference, with a focus on spatial data. A few terms are worth defining before we move forward with spsurvey, as they are used throughout the vignettes and documentation:
There are three additional vignettes in spsurvey:
Using the sp_summary() and sp_plot() functions to summarize and visualize sampling frames, design sites, and analysis data: vignette("EDA", "spsurvey")
Using the grts() function to implement the Generalized Random Tessellation Stratified (GRTS) algorithm (Stevens and Olsen, 2004) to select spatially balanced samples: vignette("sampling", "spsurvey")
Analyzing data: vignette("analysis", "spsurvey")
These vignettes cover some of the core functions (and arguments within those functions) in spsurvey. To learn more about features of spsurvey that are not covered in these vignettes, we encourage you to read spsurvey’s documentation available for download here. Help files for a particular function are viewable by running ?function_name after loading spsurvey. For example, to learn more about the grts() function, run ?grts.
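As a small sketch of the lookup steps described above, the following uses base R's vignette() and help() functions. The object name available is only illustrative.

```r
library(spsurvey)

# List the vignettes that ship with the installed version of spsurvey.
available <- vignette(package = "spsurvey")
available$results[, "Item"]

# Opens the same help page as ?grts, in a form that also works inside scripts.
help("grts", package = "spsurvey")
```

Running vignette("EDA", "spsurvey") (or any other listed name) then opens that vignette.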
The version 5.0.0 update to spsurvey implemented many significant changes to existing functions. As a result, some of your old code may not run properly under version 5.0.0. Though we recommend adapting your code to work with version 5.0.0, you may also install a previous version of spsurvey. For information regarding the installation of previous versions of R packages, please see the RStudio support page here. Old versions of spsurvey are also available for download in the release tags section of our GitHub repository here.
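One possible way to install an earlier release is sketched below. It assumes that the third-party remotes package is installed and that the release you need is still in the CRAN archive. The helper name install_old_spsurvey and the version string in the commented call are illustrative assumptions, not recommendations from the spsurvey authors.

```r
# Hypothetical helper: install a specific archived release of spsurvey from CRAN.
# Requires the remotes package (install.packages("remotes")).
install_old_spsurvey <- function(version) {
  remotes::install_version("spsurvey", version = version)
}

# Example call (not run): replace the version string with the release
# your old code was written for.
# install_old_spsurvey("4.1.4")
```

Installing an old release replaces the currently installed version, so consider doing this in a separate library or project environment.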
sf objects
The sampling functions in spsurvey (grts() and irs()) require that your sampling frame is an sf object. An sf object (shorthand for a “simple features” object) is an R object with a unique structure used to conveniently store spatial data. sf objects are constructed using the sf package (Pebesma, 2018). The sf package is loaded and installed alongside the spsurvey package, so you do not need to run install.packages("sf") or library(sf) to access the sf package if spsurvey is already installed and loaded. For more on the sf package, see here.
Next we discuss a few ways to construct sf objects in R. The first is to read a shapefile directly into R using sf::read_sf(). The second is to use the sf::st_sf() function or the sf::st_as_sf() function to combine an appropriate R object (most commonly a data frame) and an appropriate geometry object into an sf object. To illustrate one approach for turning a data frame into an sf object, we start with NE_Lakes_df, a data frame in spsurvey that contains variables and geographic coordinates (latitude and longitude coordinates) for lakes in the Northeastern United States. To turn NE_Lakes_df into NE_Lakes_geo, an sf object with geographic coordinates, run
NE_Lakes_geo <- st_as_sf(NE_Lakes_df, coords = c("XCOORD", "YCOORD"), crs = 4326)
NE_Lakes_geo
#> Simple feature collection with 195 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -73.64778 ymin: 41.07065 xmax: -69.96715 ymax: 42.73616
#> Geodetic CRS: WGS 84
#> First 10 features:
#> AREA AREA_CAT ELEV ELEV_CAT geometry
#> 1 10.648825 large 264.69 high POINT (-72.08896 42.55508)
#> 2 2.504606 small 557.63 high POINT (-73.18199 42.36727)
#> 3 3.979199 small 28.79 low POINT (-71.14074 42.15596)
#> 4 1.645657 small 212.60 high POINT (-73.06726 41.783)
#> 5 7.489052 small 239.67 high POINT (-72.2602 42.36255)
#> 6 86.533725 large 195.37 high POINT (-71.74634 41.87624)
#> 7 1.926996 small 158.96 high POINT (-73.48408 41.34238)
#> 8 6.514217 small 29.26 low POINT (-73.25487 41.20551)
#> 9 3.100221 small 204.62 high POINT (-72.20897 42.12512)
#> 10 1.868094 small 78.77 low POINT (-72.70233 42.18012)
The coords argument to sf::st_as_sf() specifies the columns in NE_Lakes_df that are the x-coordinates and y-coordinates. The crs argument specifies the coordinate reference system, which we discuss in more detail next.
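The other two construction routes mentioned earlier, sf::read_sf() and sf::st_sf(), can be sketched as follows. The shapefile is the North Carolina example file that ships with the sf package. The two sites, their attribute values, and the object names are hypothetical and exist only for illustration.

```r
library(sf)

# Route 1: read a shapefile from disk. This path points to the North
# Carolina example shapefile distributed with the sf package.
nc_path <- system.file("shape/nc.shp", package = "sf")
nc <- read_sf(nc_path)

# Route 2: combine a data frame with a geometry column using st_sf().
# The coordinates and attributes below are made up for illustration.
sites_df <- data.frame(site = c("A", "B"), depth_m = c(3.2, 7.5))
sites_geom <- st_sfc(
  st_point(c(-72.1, 42.5)),
  st_point(c(-73.2, 42.4)),
  crs = 4326
)
sites <- st_sf(sites_df, geometry = sites_geom)

class(nc)
class(sites)
```

In both cases the result carries its CRS with it, either read from the file or supplied through the crs argument of st_sfc().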
Spatial data and sf objects rely on coordinate reference systems. A coordinate reference system (CRS) provides a structure by which to identify unique locations on the Earth’s surface. CRSs are either geographic or projected. A geographic CRS uses longitude (east-west direction) and latitude (north-south direction) coordinates to represent location with respect to a specific ellipsoid or spheroid surface. Coordinates in a geographic CRS are measured in degrees, not in units like meters or feet, and this has important consequences. For example, the ground distance spanned by one degree of longitude differs from one latitude to another. Coordinates in a projected CRS are standard Cartesian coordinates with respect to a flat surface. They have x and y locations, an origin, and a unit of measurement (like meters or feet).
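To see why degree units matter, the sketch below measures the ground distance spanned by one degree of longitude at two latitudes. The coordinates and the helper lon_pair() are hypothetical. The example assumes a recent version of sf in which st_distance() computes geodesic distances for geographic coordinates without additional packages.

```r
library(sf)

# Hypothetical helper: two points one degree of longitude apart at a latitude.
lon_pair <- function(lat) {
  st_sfc(st_point(c(-72, lat)), st_point(c(-71, lat)), crs = 4326)
}
low <- lon_pair(30)
high <- lon_pair(60)

# Geodesic distance between the two points of each pair.
d_low <- st_distance(low[1], low[2])
d_high <- st_distance(high[1], high[2])

d_low
d_high
```

The same one-degree step covers noticeably less ground at the higher latitude, which is why distances and areas are usually computed in a projected CRS.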
You can move between coordinate systems using sf::st_transform(). For example, we can transform NE_Lakes_geo (which uses a geographic CRS) to NE_Lakes (which uses a projected CRS) by running
NE_Lakes <- st_transform(NE_Lakes_geo, crs = 5070)
NE_Lakes
#> Simple feature collection with 195 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 1834001 ymin: 2225021 xmax: 2127632 ymax: 2449985
#> Projected CRS: NAD83 / Conus Albers
#> First 10 features:
#> AREA AREA_CAT ELEV ELEV_CAT geometry
#> 1 10.648825 large 264.69 high POINT (1930929 2417191)
#> 2 2.504606 small 557.63 high POINT (1849399 2375085)
#> 3 3.979199 small 28.79 low POINT (2017323 2393723)
#> 4 1.645657 small 212.60 high POINT (1874135 2313865)
#> 5 7.489052 small 239.67 high POINT (1922712 2392868)
#> 6 86.533725 large 195.37 high POINT (1977163 2350744)
#> 7 1.926996 small 158.96 high POINT (1852292 2257784)
#> 8 6.514217 small 29.26 low POINT (1874421 2247388)
#> 9 3.100221 small 204.62 high POINT (1933352 2368181)
#> 10 1.868094 small 78.77 low POINT (1892582 2364213)
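A quick way to confirm that a transformation did what you expect is sketched below. It repeats the two steps from this vignette and then checks the result with sf::st_is_longlat(). The name NE_Lakes_proj is an illustrative choice that avoids masking the NE_Lakes data set shipped with spsurvey.

```r
library(spsurvey)

NE_Lakes_geo <- st_as_sf(NE_Lakes_df, coords = c("XCOORD", "YCOORD"), crs = 4326)
NE_Lakes_proj <- st_transform(NE_Lakes_geo, crs = 5070)

# TRUE for a geographic CRS, FALSE for a projected one.
st_is_longlat(NE_Lakes_geo)
st_is_longlat(NE_Lakes_proj)

# Transforming changes the coordinates, not the number of features.
nrow(NE_Lakes_geo) == nrow(NE_Lakes_proj)
```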
CRSs in R have traditionally been stored using EPSG codes or proj4string values. This meant that in order to transform your coordinates from one CRS to another, you needed two EPSG codes or proj4string values, one for each CRS. Recent updates to R’s handling of spatial data follow GDAL and PROJ (more information available here), and CRSs in sf objects are stored in R as lists with two components: input, which contains information regarding the EPSG code and proj4string; and wkt, an open geospatial standard format. For more information on CRSs and EPSG codes, see Pebesma (2018) and Lovelace et al. (2019). To search for various CRSs and EPSG codes, see here and here.
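The two components described above can be inspected directly, as in the following sketch. It uses sf::st_crs() with EPSG code 4326, and the object name crs_geo is illustrative.

```r
library(sf)

crs_geo <- st_crs(4326)

# The two stored components of a crs object.
names(unclass(crs_geo))
crs_geo$input

# Print the WKT representation in readable form.
cat(crs_geo$wkt, "\n")

# The EPSG code can also be extracted from the crs object.
crs_geo$epsg
```

The same accessors work on the CRS of any sf object, for example st_crs(NE_Lakes_geo)$input.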
spsurvey will use the CRS from your sf object, so it is your responsibility to make sure the sf object has an appropriate CRS. If the CRS is not specified correctly, you may get misleading results.
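One way to act on this advice is a small pre-flight check, sketched below. The helper check_crs() is hypothetical and not part of spsurvey or sf. It only uses sf::st_crs() and sf::st_is_longlat(), and the decision to warn on a geographic CRS is an assumption about what counts as "appropriate" for your analysis.

```r
library(spsurvey)

# Hypothetical helper: stop if the CRS is missing, warn if it is geographic.
check_crs <- function(x) {
  if (is.na(sf::st_crs(x))) {
    stop("The sf object has no CRS; set one with sf::st_set_crs().")
  }
  if (isTRUE(sf::st_is_longlat(x))) {
    warning("The sf object uses a geographic CRS; consider sf::st_transform().")
  }
  invisible(x)
}

# NE_Lakes is the example sf object that ships with spsurvey.
check_crs(NE_Lakes)
```

Because the helper returns its input invisibly, it can be placed in front of a sampling call without changing the rest of the code.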
Lovelace, R., Nowosad, J., & Muenchow, J. (2019). Geocomputation with R. CRC Press.
Pebesma, E. (2018). Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal, 10(1):439-446. https://doi.org/10.32614/RJ-2018-009
Stevens Jr, D. L. and Olsen, A. R. (2004). Spatially balanced sampling of natural resources. Journal of the American Statistical Association, 99(465):262-278.