Rule of thumb for splitting infant deaths from deaths under 5

The log of the crude death rate under age 5, \(\log(M(0,5))\), has a roughly two-slope linear relationship with the log of the proportion of deaths under 5 that occur in the first year of life. This rule uses that relationship to estimate the proportion of under-5 deaths occurring at age 0, and then splits those deaths from deaths under 5. The relationship differs slightly between males and females. Invoke this method only if neither population nor deaths are available in a separate tabulation for single age 0.

lt_rule_4m0_D0(D04, M04, P04, Sex = c("m", "f"))

Arguments

D04	numeric. Deaths under age 5.
M04	numeric. Death rate under age 5.
P04	numeric. Exposure under age 5. A population estimate will do.
Sex	character, either `"m"` or `"f"`.

Value

Estimated deaths in age 0

Details

This relationship is not documented elsewhere. It was derived from the whole of the HMD, using the segmented package to fit a two-slope linear model. The fit can (and should) be reproduced using data from a more diverse collection, and even as-is the data should be subset to only those observations where deaths and populations were not split using HMD methods. To reproduce the analysis, prepare a data set in the format shown (but not executed) in the examples and follow the code steps given there.
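
As a reading aid, the two-slope model can be written out in the parametrization that segmented reports: a first slope plus a difference in slope past the breakpoint. The symbols below are chosen here for illustration only. They correspond to int1, s1, ds1 and brk in the examples, with \(x = \log(M(0,5))\) and \(D_0 / D_{0,5}\) the proportion of deaths under 5 occurring at age 0:

\[
\log\left(\frac{D_0}{D_{0,5}}\right) \approx \mathrm{int1} + \mathrm{s1} \cdot x + \mathrm{ds1} \cdot \max(x - \mathrm{brk},\, 0)
\]

Below the breakpoint the slope is s1, and above it the slope is s1 + ds1. Estimated deaths at age 0 then follow by exponentiating the right-hand side and multiplying by deaths under 5.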

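Regarding argument specification, D04 is always required. In addition, supply either M04 or P04: the rule needs the death rate under age 5, which can be derived from deaths and exposure when it is not given directly.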
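
The following is a minimal sketch of how such a rule could be applied, not the package's implementation. The function name split_D0_sketch() and the coefficient values int1, s1, ds1 and brk are hypothetical placeholders invented for illustration, and the cap on the proportion is likewise an assumption. It shows the argument handling (rate given directly, or derived from deaths and exposure) and a two-slope prediction on the log scale.

```r
# Sketch only: split_D0_sketch() and the default coefficients are
# hypothetical placeholders, NOT the values used by lt_rule_4m0_D0().
split_D0_sketch <- function(D04, M04, P04,
                            int1 = -0.6, s1 = -0.1, ds1 = 0.05, brk = -4) {
  # derive the under-5 death rate from deaths and exposure if not supplied
  if (missing(M04)) {
    M04 <- D04 / P04
  }
  lM5 <- log(M04)
  # two-slope line: slope s1 below brk, slope s1 + ds1 above it
  lDR0 <- int1 + s1 * lM5 + ds1 * pmax(lM5 - brk, 0)
  # proportion of under-5 deaths at age 0; capped at 1 (an assumption)
  prop0 <- pmin(exp(lDR0), 1)
  D04 * prop0
}

# the rate can be given directly or implied by deaths and exposure
D0_a <- split_D0_sketch(D04 = 2e4, M04 = 5 / 1000)
D0_b <- split_D0_sketch(D04 = 2e4, P04 = 4e6)
```

Because the arithmetic uses log(), exp() and pmax(), the sketch is vectorized over its inputs in the same way the examples describe for the real function.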

References

Muggeo VM (2008). “segmented: an R Package to Fit Regression Models with Broken-Line Relationships.” R News, 8(1), 20-25. https://cran.r-project.org/doc/Rnews/.

Human Mortality Database

Examples


# to reproduce the coefficient estimation
# that the method is based on:
if (FALSE) {
  # get data in this format:
  # dput(head(Dat))
  Dat <- structure(list(
    lDR0 = c(-0.182459515740971, -0.147321312595521,
             -0.138935222455608, -0.156873832361186,
             -0.134782094278661, -0.135213007819874),
    lM5  = c(-4.38578221044468, -4.56777854985643,
             -4.58851248767896, -4.57684656734645,
             -4.62854681692805, -4.61294106314254)),
    .Names = c("lDR0", "lM5"),
    class = c("data.frame"),
    row.names = c(NA, -6L))
 # where lDR0 is log(D0 / D0_4)
 # i.e. log of proportion of deaths < 5 in first year of life
 # and lM5 is log(M0_4)
 # i.e. log of death rate in first 5 years of life

 # then first fit a linear model:
  obj  <- lm(lDR0~lM5, data = Dat)
 # use segmented package:
  seg  <- segmented::segmented(obj)
 # breakpoint:
  seg$psi[2]     # brk
 # first intercept:
  seg$coef[1]    # int1
 # first slope:
  seg$coef[2]    # s1
 # difference in slope from 1st to second:
  seg$coef[3]    # ds1
 # make Dat come from some other dataset and you'll get different coefs.
 # It might be possible to group these into families, and in any case
 # they differ for males and females. This is just a rough start, to be
 # replaced if someone offers a superior method.

}
D0_4 <- 2e4
M0_4 <- 5/1000
P0_4 <- 4e6
# function usage is straightforward, and also vectorized.
D0   <- lt_rule_4m0_D0(D0_4, M0_4, Sex = "m")
# deaths in ages 1-4 are a separate step.
D1_4 <- D0_4 - D0

# to get M1_4 it's best to follow these steps:
# 1) get M0 using lt_rule_4m0_m0(M0_4),
M0   <- lt_rule_4m0_m0(M0_4)
# 2) get denom using P0 = D0 / M0
P0   <- D0 / M0
# 3) get denom P1_4 as P0_4 - P0
P1_4 <- P0_4 - P0
# 4) M1_4 = D1_4 / P1_4.
M1_4 <- D1_4 / P1_4

if (FALSE) {
plot(NULL, type = 'n', xlim = c(0, 5), ylim = c(1e-3, .025), log = "y",
    xlab = "Age", ylab = "rate (log scale)")
segments(0, M0_4, 5, M0_4)
segments(0, M0, 1, M0)
segments(1, M1_4, 5, M1_4)
text(1, c(M0, M1_4, M0_4), c("M0", "M1_4", "M0_4"), pos = 3)
}