Main function CorMID
Idea
The problem in GC-APCI-MS that we try to overcome is the formation of fragments forming superimposed MIDs. The ones we identified so far are [M+H], [M+], [M+H]-H2 and [M+H]+H2O-CH4. If we assume [M+H] to be generally the most abundant and hence use it as our fix point (base MID, shift = 0), than we observe superimposed MIDs starting at -2, -1 and +2 relative to [M+H] for [M+], [M+H]-H2 and [M+H]+H2O-CH4 respectively.
The basic idea of the correction is that we measure a superimposed/composite MID of one to several fragments all derived from the same base MID. This base MID (or correct MID, corMID) is exactly what we are looking for. Estimating it is complicated because we do not know the distribution of fragments, i.e. the amount of the individually occurring fragments or their ratios to each other respectively. Hence, we have to estimate corMID and the ratio vector r giving the distribution of present fragments, which together represent our measurement data optimally.
Example
Lets start with an artificial Glucose spectrum where 10% is M6 labeled:
<- "C21Si5"
fml <- CalcTheoreticalMDV(fml = fml, nbio=6, nmz=8)
td1 <- c(0.9,rep(0,5),0.1)
bMID <- apply(td1*bMID,2,sum)
md1 round(md1,4)
#> M+0 M+1 M+2 M+3 M+4 M+5 M+6 M+7 M+8
#> 0.4761 0.2318 0.1327 0.0424 0.0132 0.0030 0.0606 0.0253 0.0149
to obtain the measure distribution md1 which is our measured intensity values expressed relative. Please note that the measured MID contains additional peaks at M+7 and M+8, being the natural abundant isotopes of carbon atomes attached during derivatization. Now we may use CorMID
to decompose this back:
CorMID(int=md1, fml=fml, r=unlist(list("M+H"=1)))
#> M0 M1 M2 M3 M4 M5 M6
#> 89.84375 0.00000 0.00000 0.00000 0.00000 0.00000 10.15625
#> attr(,"err")
#> err
#> 0.002743298
#> attr(,"ratio")
#> M+H
#> 1
#> attr(,"ratio_status")
#> [1] "fixed"
#> attr(,"mid_status")
#> [1] "estimated"
Notice, that we allowed only [M+H] to be present in option r. The result is a labeled vector representing the corrected or baseMID together with information on the fitting error err and regarding the options used during the function call as attributes ratio, ratio_status and mid_status with mid being estimated and ratio being fixed during the function call.
We could achieve something similar testing for all currently defined fragments by omitting the r option:
CorMID(int=md1, fml=fml)
#> M0 M1 M2 M3 M4 M5 M6
#> 89.84375 0.00000 0.00000 0.00000 0.00000 0.00000 10.15625
#> attr(,"err")
#> err
#> 0.002743298
#> attr(,"ratio")
#> M+H M+ M-H M+H2O-CH4
#> 1 0 0 0
#> attr(,"ratio_status")
#> [1] "estimated"
#> attr(,"mid_status")
#> [1] "estimated"
We essentially get the same result as before (except for ratio related attributes) because there is no superimposition in our test data. Now lets generate more difficult composite data to be fit by including a 20% proton loss…
<- unlist(list("M-1"=0,0.8*md1)) + c(0.2*md1,0)
md2 round(md2,4)
#> M-1 M+0 M+1 M+2 M+3 M+4 M+5 M+6 M+7 M+8
#> 0.0952 0.4273 0.2120 0.1147 0.0365 0.0112 0.0145 0.0535 0.0232 0.0119
and let CorMID
decompose this back…
CorMID(int=md2, fml=fml)
#> M0 M1 M2 M3 M4 M5 M6
#> 89.84375 0.00000 0.00000 0.00000 0.00000 0.00000 10.15625
#> attr(,"err")
#> err
#> 0.002466178
#> attr(,"ratio")
#> M+H M+ M-H M+H2O-CH4
#> 0.8 0.2 0.0 0.0
#> attr(,"ratio_status")
#> [1] "estimated"
#> attr(,"mid_status")
#> [1] "estimated"
which is pretty close to the truth. :)
Function Details
Let’s look into the details of the function. Appart from some sanity checks and data preparation steps done by the wrapper function CorMID
the main idea is to model a theoretical distribution based on a provided sum formula and fit a base MID and fragment ratios according to measurement data by function FitMID
which is discussed in the following. The approach is brute force using two nested estimators for r and corMID seperately. It builds on the idea to test a crude grid of parameters first, identify the best solution and iteratively approache the true value by minimizing the grid.
The grid is set by an internal function poss_local
. Basically, if we have a two carbon molecule we expect a corMID of length=3 {M0, M1, M2}. Let’s assume that corMID = {0.9, 0, 0.1}. Using a wide grid we would than test the following possibilities:
:::poss_local(vec=c(0.5,0.5,0.5), d=0.5, length.out=3)
CorMID#> Var1 Var2 Var3
#> 3 1.0 0.0 0.0
#> 5 0.5 0.5 0.0
#> 7 0.0 1.0 0.0
#> 11 0.5 0.0 0.5
#> 13 0.0 0.5 0.5
#> 19 0.0 0.0 1.0
and identify {1, 0, 0} as best match after subjecting to a testing function. We decrease the step size of the grid by 50% and test in the next iteration:
:::poss_local(vec=c(1,0,0), d=0.25, length.out=3)
CorMID#> Var1 Var2 Var3
#> 3 1.000 0.000 0.000
#> 5 0.875 0.125 0.000
#> 7 0.750 0.250 0.000
#> 11 0.875 0.000 0.125
#> 13 0.750 0.125 0.125
#> 19 0.750 0.000 0.250
and will get closer to the truth and find {0.875, 0, 0.125} to give the lowest error.
In summary, using this approach we can approximate the optimal vectors of corMID and r in a finite number of iterations to reach a desired precision <0.1%. We can nest MID fitting inside ratio fitting and thereby do both in parallel.
The error function currently employed is simply the square root of the summed squared errors, comparing the provided measurement data and a reconstructed MID based on a specific corMID and r.