How to back transform normal score transformed data
Asked Answered
M

1

6

I have daily rainfall from 61 gauging stations for 12 years in a catchment(8000 Km2).

The goal is create 5Km and 25 Km resolution gridded daily rainfall product. As the no of stations are small and not all stations have rain even in rainy season, i choose to go with climatological variogram.

The rain gauge measurement for a typical day(irain) are as below with few missing values denoted by NA.

7.8  4.4 15.4 19.1  5.8    0   42  6.4   21    21     0     0     0  15.6 0     0    10     5   1.2     0  14.4    NA    25  13.2     0   9.2 2   6.6   7.8  13.2  15.4    NA     9     0  15.5     0  18.6     6 0   4.8  10.6     0    9     0  12.4    NA    12     0     3    14 10    10     0    68  21.8    18  14.8   5.4     7     0    NA

As the daily rainfall are skewed, to transform i tested with cube root transform and log transform (log1p) for each day individually. However for all days both the transforms does'nt fit as i tested with shapiro wilk test. So,i choose normal score score transform(NCR) as suggested by Grimes & Pardo(2009) Geostatistical analysis of rainfall. and used Prof. Ashton Shortridge code

The below code is used for generating a climaogical variogram for the monsoon season. Note i had used days where more than 30% of the stations reported rain. Did not find any references for this. Chose 30% as i get about 65% days to pass the threshold.

lag = 3
bins.vario = seq(0, 75, lag)
nb.bins = length(bins.vario) - 1 

nb.classes = numeric(nb.bins)  
vario.emp = array(0,c(nb.bins,6))
variance.emp = array(NA,c(10000,nb.bins))
vario.emp = as.data.frame(vario.emp)
class(vario.emp) = c("gstatVariogram","data.frame")
names(vario.emp) = c("np","dist","gamma","dir.hor","dir.ver","id")


nRows = nrow(stn.rain.subset)
for (i in 1:nRows) 
{   
  irain = stn.rain.subset[i,]
  isMissing = is.na(irain)
  isZero = (irain == 0)
  irain = irain[!isMissing & !isZero]
  irain = as.numeric(irain)
  rain.mean[[i]] = mean(irain); rain.var[[i]] = var(irain);

  # Testing with log-transformation
  # irain.logtt = log1p(irain)
  # # qqPlot(irain.logtt,distribution = "norm")
  # res = shapiro.test(irain.logtt)
  # pval[[i]] = res$p.value

  if (var(irain)>0)
  {
    # Scaling of the rain on each day. rain in mm. After scaling this is unitless
    irain.scaled = irain/sqrt(var(irain))
    irain.nsc = nscore(irain)
    score  = irain.nsc$nscore   

    easting = lon.UTM[!isMissing & !isZero] # Removing the stn.locs with NA values
    northing = lat.UTM[!isMissing & !isZero]

    rain = data.frame(easting,northing,score)
    names(rain) = c("easting","northing","rain")
    coordinates(rain) = c("easting","northing")  
    proj4string(rain) = UTM

    v = variogram(rain~1, data = rain,boundaries = bins.vario)

    bin.nb = ceiling(v$dist/lag)
    nb.classes[bin.nb] = nb.classes[bin.nb]+1
    vario.emp[bin.nb,] = vario.emp[bin.nb,]+v

}

The climatological variogram for non-zero rain:

Similarly, i had generated the indicator variogram.

The problem now is how to do kriging with the climatogical variogram.

"model" "psill" "range" "kappa" "ang1"  "ang2"  "ang3"  "anis1" "anis2"
"Nug"   0.446609415762287   0   0   0   0   0   1   1
"Sph"   0.533499909345213   51.7206027419321    0.5 0   0   0   1   1

Similary for kriging read each day non-zero, i had scaleed daily rain and transform it. Not sure if the below approach is correct or not?

rain = data.frame(easting,northing,score)
# Multiplying the nugget and sill with var(rain) for each day.
clim.vrmod$psill = clim.vrmod$psill * var(irain)
krige.ok = krige(rain[,3]~1,locations =~easting+northing,data = rain,newdata=output.grd,model = clim.vrmod) 
krige.ok$var1.pred.bt = (backtr(krige.ok$var1.pred,irain.nsc, tails='separate'))*sqrt(var(irain))
krige.ok$var1.se = (krige.ok$var1.var)

The confusions i have are as below:

  1. Is the var1.var the standard-error(mm) or the variance(mm2)?

  2. Should the climatogical variogram (nugget and sill) be scaled with the variance?

  3. Back-transformation should be done with seperate or none option?

Thanks for the help in advance.

Marrakech answered 8/5, 2017 at 14:23 Comment(3)
You need to break your question into pieces. Otherwise you won't get an answer. I am working in the same area and after some lines your question became overwhelming for me as your problem is specific to you and it's more like a thing that you faced in your research. Expecting everyone to understand that through some lines of text is not realistic.Indent
@ Masoud. Thanks for bringing up the problem with the questions i posed. As a researcher we feel i tried my best and asked a question as simple as possible, and why it seems complicated. I will break it down to 2 questions or more.Marrakech
Actually, it would help if you gave us an example of what you want to do and what you tried without having to rely on the theory behind. But your questions seem focused on theory, so you better try at stats.stackexchange.comGroth
T
0

A.E. Journel (Mathematical Geology, circa 1980) provides a detailed discussion of kriging of lognormal transformed data vs lognormal kriging. It shows the difficulties in using any non-linear transformation of the data prior to kriging. Note that the journal has changed names several times, it is now Mathematical Geosciences

With rainfall you also need to take into account whether the data represent accumulations over a time period (an hour, a day, a week, etc)

If the region of interest includes much variation in elevation then you need to incorporate the elevations of the data locations. In the literature there are at least two ways to do this; (1) model the mean of the rainfall as a function

Teenybopper answered 29/10, 2020 at 22:30 Comment(1)
Please provide bibliographic data of the article you mentioned.Hammerskjold

© 2022 - 2024 — McMap. All rights reserved.