Context
My questions are related to the changes induced by the upgrade from PROJ4 to PROJ6
and the consequences in various R spatial packages (sp
, sf
, raster
).
We receive now a lot of warnings about “Discarded datum” that looks a bit worrying and I’m a little puzzled about what I should do with that. I can see that this can have dire consequences in some situations and that they can be ignored in other circumstances.
It seems that I’m not the only one to be a little lost (see here). I hope that my questions with a specific reproducible example will help us to better understand the topic.
I understand that we can remove the warnings, and I have read the context : r-spatial blog post, Migration to PROJ6/GDAL3 and these workshop notes (mapview seems to handle this differently in more recent versions)
Questions
Question 1 :
Probably a naive question :
I understand that there is a need for a new notation/format (WKT) implemented in PROJ6 (eg because of a need of higher precision) but I don’t understand why there is a need to remove the datum part form the old proj4 string notation. Why not just keep it as it has always been (and implement the new features in the new WKT format/notation)
Question 2 :
It seems that we have 3 cases concerning the drop of the datum in the old proj4 format :
- no warning : the datum remains as in the old proj4string notation (
sf
défault ?) - we have a warning “Discarded datum (…) but +towgs84= values preserved”
- we have a warning “Discarded datum (…)” (in that case the “+towgs84=” part of the
string is discarded/dropped –>
sp
default ?? )
The example below illustrate different cases where we have these warnings.
Why do we have these 3 different cases on the same CRS (here EPSG 31370)?
What are the consequences of the removal of the datum and/or +towgs84
part ?
Should I be less worried by the second warning than by the third ?
Question 3 :
In the reproducible example below I try to extract the values from a raster corresponding to 3 points, the raster and the points having a different CRS. However depending on the approach used I get different results. I have the impression that this is related to this PROJ4 –> PROJ6 changes and datum dropping but I might be wrong. I created this example only because I wanted to understand the consequences of these “datum dropping” in the crs
I use the function raster::extract
and 3 different general approaches (each time for both
sf
and sp
objects for the points) from which I would expect the same output :
- no manual reprojection, I let
raster::extract
do the job to match the crs of the points and raster - reprojecting manually the points toward the crs of the raster
- reprojecting manually the raster toward the crs of the points
With the 3rd approach, I get 2 different set of values for sp
or sf
objects and
I get yet a third set of values with the second (and first approach) (then the values are the same if
I use sp
or sf
objects).
- Which values are the “correct” ones ? (probably
351.7868 236.4216 309.0073
) - Why are some approaches failing (is this related to the warning messages / datum stuff ?) ?
- When I do a projection of a spatial object and I have these warnings, how can I check that my spatial objects have been “correctly” projected ?
Question 4 :
Is it possible to provide general recommendations about what should be done when we have these warning messages ?
For example :
- If possible do not use the old proj4 string notation but use the WKT one instead
(but this is not always possible. Here for example - if I’m not wrong I’m forced to use
the old proj4 notation because this is what
raster
uses) - If I knew the epsg code of the raster used here, I guess the safest appoach would be
to reproject the points using that code with
sf
, eg :st_transform(SF, crs = xxxx)
Reproducible example
Create points and raster spatial data + check how the CRS is handled
SF points spatial object
Briefly : the CRS is mainly stored in WKT format.
The old proj4string is available on request and do not drop the datum/towgs84
part
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
library(sp)
library(raster)
# create 3 points
coo <- data.frame(x = c(246000, 247000, 246500), y = c(184000, 186000, 185000))
# create an sf spatial object
SF <- st_as_sf(coo, coords = c("x", "y"), crs = 31370)
# Check the CRS :
# the proj4string includes the datum/+towgs84 information - no warning
st_crs(SF)$proj4string
#> [1] "+proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333 +lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-99.059,53.322,-112.486,0.419,-0.83,1.885,-1 +units=m +no_defs"
# Same value with `raster::crs` but with a
# Warning "Discarded datum (...) but +towgs84= values preserved"
raster::crs(SF)
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown_based_on_International_1909_Hayford_ellipsoid in CRS definition,
#> but +towgs84= values preserved
#> CRS arguments:
#> +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
#> +lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl
#> +towgs84=-99.059,53.322,-112.486,0.419,-0.83,1.885,-1 +units=m +no_defs
# WKT
st_crs(SF)
#> Coordinate Reference System:
#> User input: EPSG:31370
#> wkt:
#> BOUNDCRS[
#> SOURCECRS[
#> PROJCRS["Belge 1972 / Belgian Lambert 72",
#> BASEGEOGCRS["Belge 1972",
#> DATUM["Reseau National Belge 1972",
#> ELLIPSOID["International 1924",6378388,297,
#> LENGTHUNIT["metre",1]]],
#> PRIMEM["Greenwich",0,
#> ANGLEUNIT["degree",0.0174532925199433]],
#> ID["EPSG",4313]],
#> CONVERSION["Belgian Lambert 72",
#> METHOD["Lambert Conic Conformal (2SP)",
#>
#> (...)
#>
#> ID["EPSG",1609],
#> REMARK["Scale difference is given by information source as 0.999999. Given in this record in ppm to assist application usage. Very similar parameter values (to slightly less precision) used for BD72 to ETRS89: see code 1652."]]]
cat(raster::wkt(SF)) # does not work with sf
#> Error in raster::wkt(SF): tentative d'obtenir le slot "crs" d'un objet (classe "sf") qui n'est pas un objet S4
SP points spatial object
Briefly : the CRS is mainly stored in proj4 string format and drops the datum and towgs84
part
unlike sf
).
the new WKT notation is stored as a comment in the CRS object but is different from sf
SP <- coo
coordinates(SP) <- ~x+y
proj4string(SP) <- CRS("+init=epsg:31370")
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Reseau_National_Belge_1972 in CRS definition
# the proj4 string do not contain the `towgs84` part
# Warning "Discarded datum (...)"
CRS("+init=epsg:31370")
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Reseau_National_Belge_1972 in CRS definition
#> CRS arguments:
#> +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
#> +lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m
#> +no_defs
# With `raster::crs` same proj4string but no warning
raster::crs(SP)
#> CRS arguments:
#> +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
#> +lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m
#> +no_defs
# WKT notation + a warning: this WKT is indeed different from the SF one (no datum here ?)
cat(comment(CRS("+init=epsg:31370")))
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Reseau_National_Belge_1972 in CRS definition
#> PROJCRS["Belge 1972 / Belgian Lambert 72",
#> BASEGEOGCRS["Belge 1972",
#> DATUM["Reseau National Belge 1972",
#> ELLIPSOID["International 1924",6378388,297,
#> LENGTHUNIT["metre",1]]],
#>
#> (...)
#>
#> USAGE[
#> SCOPE["unknown"],
#> AREA["Belgium - onshore"],
#> BBOX[49.5,2.5,51.51,6.4]]]
# Same output without warning
cat(raster::wkt(SP))
#> PROJCRS["Belge 1972 / Belgian Lambert 72",
#> BASEGEOGCRS["Belge 1972",
#> DATUM["Reseau National Belge 1972",
#> ELLIPSOID["International 1924",6378388,297,
#> LENGTHUNIT["metre",1]]],
#>
#> (...)
#>
#> USAGE[
#> SCOPE["unknown"],
#> AREA["Belgium - onshore"],
#> BBOX[49.5,2.5,51.51,6.4]]]
Raster
In raster
, both the old proj4 notation and the new WKT notation seem to be stored.
r <- raster::raster(system.file("external/test.grd", package="raster"))
raster::crs(r)
#> CRS arguments:
#> +proj=sterea +lat_0=52.1561605555556 +lon_0=5.38763888888889
#> +k=0.9999079 +x_0=155000 +y_0=463000 +datum=WGS84 +units=m +no_defs
cat(raster::wkt(r))
#> PROJCRS["unknown",
#> BASEGEOGCRS["unknown",
#> DATUM["World Geodetic System 1984",
#> ELLIPSOID["WGS 84",6378137,298.257223563,
#> LENGTHUNIT["metre",1]],
#> ID["EPSG",6326]],
#>
#> (...)
#>
#> AXIS["(N)",north,
#> ORDER[2],
#> LENGTHUNIT["metre",1,
#> ID["EPSG",9001]]]]
This dataset do not contain +towgs84
part in the proj4string. But when you do read
a raster with a +towgs84
part in the proj4string it seems to be dropped.
Non reproducible example :
GISfolder <- "/my/path"
tmp <- raster(paste0(GISfolder, 'my_file.tif'))
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Unknown_based_on_International_1909_Hayford_ellipsoid in CRS definition
raster::crs(tmp)
#> CRS arguments:
#> +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=49.8333339
#> +lat_2=51.1666672333333 +x_0=150000.01256 +y_0=5400088.4378 +ellps=intl
#> +units=m +no_defs
cat(raster::wkt(tmp))
#> PROJCRS["unknown",
#> BASEGEOGCRS["unknown",
#> DATUM["Unknown based on International 1909 (Hayford) ellipsoid",
#> ELLIPSOID["International 1909 (Hayford)",6378388,297,
#> LENGTHUNIT["metre",1,
#> ID["EPSG",9001]]]],
#> PRIMEM["Greenwich",0,
#> ANGLEUNIT["degree",0.0174532925199433],
#> ID["EPSG",8901]]],
#>
#> (...)
#>
#> AXIS["(N)",north,
#> ORDER[2],
#> LENGTHUNIT["metre",1,
#> ID["EPSG",9001]]]]
I should probably also explore what happens when we use the stars
package instead of
raster
but this question
is already quite long (and i have a lot of code build on the raster package)
Extract the values from the raster
Using the automatic reprojection from raster::extract
# extract the values from the raster,
# the function extract reprojects the points
# in the same crs as the raster layer
extract(r, SF)
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown_based_on_International_1909_Hayford_ellipsoid in CRS definition,
#> but +towgs84= values preserved
#> Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
#> Raster
#> [1] 351.7868 236.4216 309.0073
extract(r, SP)
#> Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
#> Raster
#> [1] 351.7868 236.4216 309.0073
Reproject manually the points toward the crs of the raster
SF with proj4string from the rasterSF_proj <- st_transform(SF, crs = raster::crs(r))
extract(r, SF_proj)
#> [1] 351.7868 236.4216 309.0073
SF with WKT from the raster
SF_proj <- st_transform(SF, crs = raster::wkt(r))
extract(r, SF_proj)
#> [1] 351.7868 236.4216 309.0073
SP with proj4string from the raster
SP_proj <- spTransform(SP, raster::crs(r))
extract(r, SP_proj)
#> [1] 351.7868 236.4216 309.0073
SP with WKT from the raster
The wkt fromat is not accepted by sp::spTransform
–> does not work
# error
SP_proj <- spTransform(SP, raster::wkt(r))
#> Error in CRS(CRSobj): PROJ4 argument-value pairs must begin with +: PROJCRS["unknown",
#> BASEGEOGCRS["unknown",
#> DATUM["World Geodetic System 1984",
#> ELLIPSOID["WGS 84",6378137,298.257223563,
#> LENGTHUNIT["metre",1]],
#> ID["EPSG",6326]],
#>
#> (...)
#>
#> AXIS["(N)",north,
#> ORDER[2],
#> LENGTHUNIT["metre",1,
#> ID["EPSG",9001]]]]
# extract(r, SP_proj)
Reproject manually the raster toward the crs of the points
Using the proj4string of SF (with datum)–> results different from the previous attempts
# EPSG 31370 proj4 string with the datum:
lambert72 <- sf::st_crs(31370)$proj4string
lambert72
#> [1] "+proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333 +lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +towgs84=-99.059,53.322,-112.486,0.419,-0.83,1.885,-1 +units=m +no_defs"
# there is a warning when we project the raster but the full string seems to be used
r2 <- raster::projectRaster(r, crs = lambert72)
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded datum Unknown_based_on_International_1909_Hayford_ellipsoid in CRS definition,
#> but +towgs84= values preserved
raster::crs(r2)
#> CRS arguments:
#> +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
#> +lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl
#> +towgs84=-99.059,53.322,-112.486,0.419,-0.83,1.885,-1 +units=m +no_defs
extract(r2, SP)
#> [1] 341.6399 222.1028 301.2286
Using the WKT of SF
–> does not work because raster::projectRaster
does not accept WKT format
for its crs
argument
lambert72 <- sf::st_crs(31370)
lambert72
#> Coordinate Reference System:
#> User input: EPSG:31370
#> wkt:
#> BOUNDCRS[
#> SOURCECRS[
#> PROJCRS["Belge 1972 / Belgian Lambert 72",
#> BASEGEOGCRS["Belge 1972",
#> DATUM["Reseau National Belge 1972",
#> ELLIPSOID["International 1924",6378388,297,
#> LENGTHUNIT["metre",1]]],
#> PRIMEM["Greenwich",0,
#>
#> (...)
#>
#> AREA["Belgium - onshore"],
#> BBOX[49.5,2.5,51.51,6.4]],
#> ID["EPSG",1609],
#> REMARK["Scale difference is given by information source as 0.999999. Given in this record in ppm to assist application usage. Very similar parameter values (to slightly less precision) used for BD72 to ETRS89: see code 1652."]]]
r2 <- raster::projectRaster(r, crs = lambert72)
#> Error in wkt(projto): tentative d'obtenir le slot "crs" d'un objet (classe "crs") qui n'est pas un objet S4
Using the proj4string of SP (without datum)
–> results different from the previous attempts
# EPSG 31370 proj4 string without the datum:
lambert72 <- sp::CRS("+init=epsg:31370")@projargs
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Reseau_National_Belge_1972 in CRS definition
lambert72
#> [1] "+proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333 +lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m +no_defs"
# warning
r3 <- raster::projectRaster(r, crs = lambert72)
#> Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO"): Discarded
#> datum Unknown_based_on_International_1909_Hayford_ellipsoid in CRS definition
raster::crs(r3)
#> CRS arguments:
#> +proj=lcc +lat_0=90 +lon_0=4.36748666666667 +lat_1=51.1666672333333
#> +lat_2=49.8333339 +x_0=150000.013 +y_0=5400088.438 +ellps=intl +units=m
#> +no_defs
extract(r3, coo)
#> [1] 348.5775 329.1199 277.2260
sessionInfo
sessionInfo()
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.4 LTS
#>
#> (...)
#>
#> other attached packages:
#> [1] raster_3.3-13 sp_1.4-2 sf_0.9-5 knitr_1.29
#>
#> (...)
#>
Used with :
GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
Created on 2020-09-03 by the reprex package (v0.3.0)
raster::projectRaster
function that fails to project correctly the raster whatever the format of the crs you provide (as CRS sp object or of course as proj4 string). I tried to clarify why each option works or fails in a separate answer (please correct me if I'm wrong) – Griffen