Package 'Frames2'

Title: Estimation in Dual Frame Surveys
Description: Point and interval estimation in dual frame surveys. In contrast to classic sampling theory, where only one sampling frame is considered, dual frame methodology assumes that there are two frames available for sampling and that, overall, they cover the entire target population. Then, two probability samples (one from each frame) are drawn and information collected is suitably combined to get estimators of the parameter of interest.
Authors: Antonio Arcos <[email protected]>, Maria del Mar Rueda <[email protected]>, Maria Giovanna Ranalli <[email protected]> and David Molina <[email protected]>
Maintainer: David Molina <[email protected]>
License: GPL (>= 2)
Version: 0.2.1
Built: 2025-02-13 04:12:57 UTC
Source: https://github.com/cran/Frames2

Help Index


Bankier-Kalton-Anderson estimator

Description

Produces estimates for population total and mean using the Bankier-Kalton-Anderson estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.

Usage

BKA(ysA, ysB, pi_A, pi_B, pik_ab_B, pik_ba_A, domains_A, domains_B, 
conf_level = NULL)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about variable(s) of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about variable(s) of interest from s_B.

pi_A

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

pi_B

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

pik_ab_B

A numeric vector of size n_A containing first order inclusion probabilities according to sampling design in frame B for units belonging to overlap domain that have been selected in s_A.

pik_ba_A

A numeric vector of size n_B containing first order inclusion probabilities according to sampling design in frame A for units belonging to overlap domain that have been selected in s_B.

domains_A

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

The BKA estimator of the population total is given by

\hat{Y}_{BKA} = \sum_{i \in s_A}\tilde{d}_i^A y_i + \sum_{i \in s_B}\tilde{d}_i^B y_i

where

\tilde{d}_i^A = \left\{\begin{array}{ll} d_i^A & \textrm{if } i \in a\\ (1/d_i^A + 1/d_i^B)^{-1} & \textrm{if } i \in ab \end{array} \right.

and

\tilde{d}_i^B = \left\{\begin{array}{ll} d_i^B & \textrm{if } i \in b\\ (1/d_i^A + 1/d_i^B)^{-1} & \textrm{if } i \in ba \end{array} \right.

with d_i^A and d_i^B the design weights, obtained as the inverse of the first order inclusion probabilities, that is, d_i^A = 1/\pi_i^A and d_i^B = 1/\pi_i^B.

To estimate the variance of this estimator, the following approach proposed by Rao and Skinner (1996) is used:

\hat{V}(\hat{Y}_{BKA}) = \hat{V}(\sum_{i \in s_A}\tilde{z}_i^A) + \hat{V}(\sum_{i \in s_B}\tilde{z}_i^B)

with \tilde{z}_i^A = \delta_i(a)y_i + (1 - \delta_i(a))y_i\pi_i^A/(\pi_i^A + \pi_i^B) and \tilde{z}_i^B = \delta_i(b)y_i + (1 - \delta_i(b))y_i\pi_i^B/(\pi_i^A + \pi_i^B), where \delta_i(a) and \delta_i(b) are the indicator variables for domain a and domain b, respectively. If both first and second order probabilities are known, the variances and covariances involved in this calculation are estimated using functions VarHT and CovHT, respectively. If only first order probabilities are known, variances are estimated using Deville's method and covariances are estimated using the following expression:

\widehat{Cov}(\hat{X}, \hat{Y}) = \frac{\hat{V}(X + Y) - \hat{V}(X) - \hat{V}(Y)}{2}
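As an illustration of the adjusted weights defined above, the following base R sketch computes the BKA point estimate by hand. All data values and object names (y_A, pi_AB, w_A and so on) are hypothetical and chosen only for this example; this is not the internal code of BKA. It relies on the identity (1/d_i^A + 1/d_i^B)^{-1} = 1/(\pi_i^A + \pi_i^B).

```r
# Toy data (hypothetical values, not taken from DatA or DatB)
y_A   <- c(10, 12, 8, 15)            # variable of interest in s_A
dom_A <- c("a", "ab", "ab", "a")     # domain of each unit in s_A
pi_A  <- c(0.10, 0.20, 0.25, 0.10)   # inclusion probabilities under design A
pi_AB <- c(0, 0.05, 0.25, 0)         # probabilities under design B (0 outside overlap)

y_B   <- c(9, 11, 14)                # variable of interest in s_B
dom_B <- c("b", "ba", "b")           # domain of each unit in s_B
pi_B  <- c(0.05, 0.10, 0.05)         # inclusion probabilities under design B
pi_BA <- c(0, 0.40, 0)               # probabilities under design A (0 outside overlap)

# Adjusted weights: 1/pi outside the overlap domain and
# 1/(pi^A + pi^B) inside it
w_A <- ifelse(dom_A == "a", 1 / pi_A, 1 / (pi_A + pi_AB))
w_B <- ifelse(dom_B == "b", 1 / pi_B, 1 / (pi_B + pi_BA))

# BKA point estimate of the population total
Y_BKA <- sum(w_A * y_A) + sum(w_B * y_B)
Y_BKA
```

Units in the overlap domain receive a smaller weight than their single-frame design weight, because they could have been reached through either frame.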

Value

BKA returns an object of class "EstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

total and mean estimation for main variable(s).

VarEst

variance estimation for main variable(s).

If parameter conf_level is different from NULL, the object also includes the component

ConfInt

total and mean estimation and confidence intervals for main variable(s).

In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any). By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All the components of the object can be accessed by using function summary.

References

Bankier, M. D. (1986) Estimators Based on Several Stratified Samples With Applications to Multiple Frame Surveys. Journal of the American Statistical Association, Vol. 81, 1074 - 1079.

Kalton, G. and Anderson, D. W. (1986) Sampling Rare Populations. Journal of the Royal Statistical Society, Ser. A, Vol. 149, 65 - 82.

Rao, J. N. K. and Skinner, C. J. (1996) Estimation in Dual Frame Surveys with Complex Designs. Proceedings of the Survey Method Section, Statistical Society of Canada, 63 - 68.

Skinner, C. J. and Rao, J. N. K. (1996) Estimation in Dual Frame Surveys with Complex Designs. Journal of the American Statistical Association, Vol. 91, 433, 349 - 356.

See Also

JackBKA

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the BKA estimator of the population total for variable Leisure
BKA(DatA$Lei, DatB$Lei, PiklA, PiklB, DatA$ProbB, DatB$ProbA, 
DatA$Domain, DatB$Domain)

#Now, let us calculate the BKA estimator and a 90% confidence interval for the
#population total of variable Feeding, using only first order inclusion probabilities
BKA(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$ProbB, 
DatB$ProbA, DatA$Domain, DatB$Domain, conf_level = 0.90)

DF calibration estimator

Description

Produces estimates for population totals and means using the DF calibration estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.

Usage

CalDF(ysA, ysB, pi_A, pi_B, domains_A, domains_B, N_A = NULL, N_B = NULL, 
N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, 
xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear", conf_level = NULL)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about variable(s) of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about variable(s) of interest from s_B.

pi_A

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

pi_B

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

domains_A

A character vector of length n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of length n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

N_A

(Optional) A numeric value indicating the size of frame A.

N_B

(Optional) A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

xsAFrameA

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_A.

xsBFrameA

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_B. For units in domain b, these values are 0.

xsAFrameB

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_A. For units in domain a, these values are 0.

xsBFrameB

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_B.

xsT

(Optional) A numeric vector of length n or a numeric matrix or data frame of dimensions n x m_T, with m_T the number of auxiliary variables in both frames, containing auxiliary information for all units in the entire sample s = s_A \cup s_B.

XA

(Optional) A numeric value or vector of length m_A, with m_A the number of auxiliary variables in frame A, indicating the population totals for the auxiliary variables considered in frame A.

XB

(Optional) A numeric value or vector of length m_B, with m_B the number of auxiliary variables in frame B, indicating the population totals for the auxiliary variables considered in frame B.

X

(Optional) A numeric value or vector of length m_T, with m_T the number of auxiliary variables in both frames, indicating the population totals for the auxiliary variables considered in both frames.

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

The DF calibration estimator of the population total is given by

\hat{Y}_{CalDF} = \hat{Y}_a + \hat{\eta}\hat{Y}_{ab} + \hat{Y}_b + (1 - \hat{\eta})\hat{Y}_{ba}

where \hat{Y}_a = \sum_{i \in s_a}\tilde{d}_i y_i, \hat{Y}_{ab} = \sum_{i \in s_{ab}}\tilde{d}_i y_i, \hat{Y}_b = \sum_{i \in s_b}\tilde{d}_i y_i and \hat{Y}_{ba} = \sum_{i \in s_{ba}}\tilde{d}_i y_i, with \tilde{d}_i calibration weights which are calculated taking into account a different set of constraints, depending on the case. For instance, if N_A, N_B and N_{ab} are all known and no other auxiliary information is available, the calibration constraints are

\sum_{i \in s_a}\tilde{d}_i = N_a, \sum_{i \in s_{ab}}\tilde{d}_i = N_{ab}, \sum_{i \in s_{ba}}\tilde{d}_i = N_{ba}, \sum_{i \in s_b}\tilde{d}_i = N_b

The optimal value of \hat{\eta}, which minimizes the variance of the estimator, is given by \hat{V}(\hat{N}_{ba})/(\hat{V}(\hat{N}_{ab}) + \hat{V}(\hat{N}_{ba})). If both first and second order probabilities are known, variances are estimated using function VarHT. If only first order probabilities are known, variances are estimated using Deville's method.

Function covers following scenarios:

  • There is not any additional auxiliary variable

    • N_A, N_B and N_{ab} unknown

    • N_A and N_B known and N_{ab} unknown

    • N_{ab} known and N_A and N_B unknown

    • N_A, N_B and N_{ab} known

  • At least, information about one additional auxiliary variable is available

    • N_A and N_B known and N_{ab} unknown

    • N_{ab} known and N_A and N_B unknown

    • N_A, N_B and N_{ab} known

To obtain an estimator of the variance for this estimator, one can use Deville's expression

\hat{V}(\hat{Y}_{CalDF}) = \frac{1}{1-\sum_{k\in s} a_k^2}\sum_{k\in s}(1-\pi_k)\left(\frac{e_k}{\pi_k} - \sum_{l\in s} a_{l} \frac{e_l}{\pi_l}\right)^2

where a_k=(1-\pi_k)/\sum_{l\in s} (1-\pi_l) and e_k are the residuals of the regression with auxiliary variables as regressors.
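Deville's expression can be written in a few lines of base R. The sketch below is only an illustration of the formula as stated above: the function name deville_var and the toy residuals are hypothetical, and the code is not taken from the package.

```r
# Hypothetical helper implementing Deville's variance expression.
# e:   residuals e_k for the sampled units
# pik: first order inclusion probabilities pi_k of the same units
deville_var <- function(e, pik) {
  a <- (1 - pik) / sum(1 - pik)   # a_k = (1 - pi_k) / sum_l (1 - pi_l)
  z <- e / pik                    # expanded residuals e_k / pi_k
  sum((1 - pik) * (z - sum(a * z))^2) / (1 - sum(a^2))
}

# Toy residuals with equal inclusion probabilities (made-up values)
e   <- c(1, 2, 3, 4)
pik <- rep(0.5, 4)
deville_var(e, pik)
```

When all expanded residuals e_k/\pi_k are equal, the squared term vanishes and the estimated variance is zero, which gives a quick sanity check for any implementation.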

Value

CalDF returns an object of class "EstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

total and mean estimation for main variable(s).

VarEst

variance estimation for main variable(s).

If parameter conf_level is different from NULL, the object also includes the component

ConfInt

total and mean estimation and confidence intervals for main variable(s).

In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any). By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All the components of the object can be accessed by using function summary.

References

Ranalli, M. G., Arcos, A., Rueda, M. and Teodoro, A. (2013) Calibration estimation in dual frame surveys. arXiv:1312.0761 [stat.ME]

Deville, J. C., Sarndal, C. E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376 - 382

See Also

JackCalDF

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the DF calibration estimator for variable Feeding, without
#considering any auxiliary information
CalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)

#Now, let us calculate the DF calibration estimator for variable Clothing when the
#frame sizes and the overlap domain size are known
CalDF(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain, 
N_A = 1735, N_B = 1191, N_ab = 601)

#Finally, let us calculate the DF calibration estimator and a 90% confidence interval
#for the population total of variable Feeding, considering Income as auxiliary variable
#in frame A and Metres2 as auxiliary variable in frame B, with frame sizes and overlap
#domain size known.
CalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain, 
N_A = 1735, N_B = 1191, N_ab = 601, xsAFrameA = DatA$Inc, xsBFrameA = DatB$Inc, 
xsAFrameB = DatA$M2, xsBFrameB = DatB$M2, XA = 4300260, XB = 176553, 
conf_level = 0.90)

SF calibration estimator

Description

Produces estimates for population totals and means using the SF calibration estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.

Usage

CalSF(ysA, ysB, pi_A, pi_B, pik_ab_B, pik_ba_A, domains_A, domains_B, N_A = NULL,
N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, 
xsBFrameB = NULL, xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear", 
conf_level = NULL)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about variable(s) of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about variable(s) of interest from s_B.

pi_A

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

pi_B

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

pik_ab_B

A numeric vector of size n_A containing first order inclusion probabilities according to sampling design in frame B for units belonging to overlap domain that have been selected in s_A.

pik_ba_A

A numeric vector of size n_B containing first order inclusion probabilities according to sampling design in frame A for units belonging to overlap domain that have been selected in s_B.

domains_A

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

N_A

(Optional) A numeric value indicating the size of frame A.

N_B

(Optional) A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

xsAFrameA

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_A.

xsBFrameA

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_B. For units in domain b, these values are 0.

xsAFrameB

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_A. For units in domain a, these values are 0.

xsBFrameB

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_B.

xsT

(Optional) A numeric vector of length n or a numeric matrix or data frame of dimensions n x m_T, with m_T the number of auxiliary variables in both frames, containing auxiliary information for all units in the entire sample s = s_A \cup s_B.

XA

(Optional) A numeric value or vector of length m_A, with m_A the number of auxiliary variables in frame A, indicating the population totals for the auxiliary variables considered in frame A.

XB

(Optional) A numeric value or vector of length m_B, with m_B the number of auxiliary variables in frame B, indicating the population totals for the auxiliary variables considered in frame B.

X

(Optional) A numeric value or vector of length m_T, with m_T the number of auxiliary variables in both frames, indicating the population totals for the auxiliary variables considered in both frames.

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

The SF calibration estimator of the population total is given by

\hat{Y}_{CalSF} = \hat{Y}_a + \hat{Y}_{ab} + \hat{Y}_b

where \hat{Y}_a = \sum_{i \in s_a}\tilde{d}_i y_i, \hat{Y}_{ab} = \sum_{i \in (s_{ab} \cup s_{ba})}\tilde{d}_i y_i and \hat{Y}_b = \sum_{i \in s_b} \tilde{d}_i y_i, with \tilde{d}_i calibration weights which are calculated taking into account a different set of constraints, depending on the case. For instance, if N_A, N_B and N_{ab} are known and no other auxiliary information is available, the calibration constraints are

\sum_{i \in s_a}\tilde{d}_i = N_a, \sum_{i \in s_{ab} \cup s_{ba}}\tilde{d}_i = N_{ab}, \sum_{i \in s_b}\tilde{d}_i = N_b

Function covers following scenarios:

  • There is not any additional auxiliary variable

    • N_A, N_B and N_{ab} unknown

    • N_{ab} known and N_A and N_B unknown

    • N_A and N_B known and N_{ab} unknown

    • N_A, N_B and N_{ab} known

  • At least, information about one additional auxiliary variable is available

    • N_{ab} known and N_A and N_B unknown

    • N_A and N_B known and N_{ab} unknown

    • N_A, N_B and N_{ab} known

To obtain an estimator of the variance for this estimator, one can use Deville's expression

\hat{V}(\hat{Y}_{CalSF}) = \frac{1}{1-\sum_{k\in s} a_k^2}\sum_{k\in s}(1-\pi_k)\left(\frac{e_k}{\pi_k} - \sum_{l\in s} a_{l} \frac{e_l}{\pi_l}\right)^2

where a_k=(1-\pi_k)/\sum_{l\in s} (1-\pi_l) and e_k are the residuals of the regression with auxiliary variables as regressors.
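To make the role of the calibration constraints more concrete, the following base R sketch applies generic linear-distance calibration in the sense of Deville and Sarndal (1992): the starting weights d_i are adjusted to \tilde{d}_i = d_i(1 + x_i^T\lambda) so that the weighted sums of the calibration variables match known totals. The function name calib_linear, the toy starting weights and the domain labels are hypothetical, and this is not claimed to be how CalSF computes its weights internally. The domain sizes follow from the values used in the examples of this manual (N_A = 1735, N_B = 1191, N_ab = 601, hence N_a = 1134 and N_b = 590).

```r
# Hypothetical helper: linear calibration of starting weights d so that
# the weighted column sums of x reproduce the known totals
calib_linear <- function(d, x, totals) {
  x <- as.matrix(x)
  # lambda solves (sum_i d_i x_i x_i') lambda = totals - sum_i d_i x_i
  lambda <- solve(t(x * d) %*% x, totals - colSums(x * d))
  as.vector(d * (1 + x %*% lambda))
}

# Toy pooled sample with made-up starting weights
domain <- c("a", "a", "ab", "ba", "ab", "b", "b")
d      <- c(500, 600, 150, 200, 250, 300, 280)

# Calibration variables: indicators of a, the pooled overlap, and b
x <- cbind(a  = domain == "a",
           ab = domain %in% c("ab", "ba"),
           b  = domain == "b") * 1

totals <- c(1134, 601, 590)   # N_a, N_ab, N_b
w <- calib_linear(d, x, totals)

# The calibrated weights reproduce the domain sizes
colSums(x * w)
```

With indicator variables of disjoint domains, linear calibration reduces to rescaling the weights within each domain, which is why the constraints above are met exactly.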

Value

CalSF returns an object of class "EstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

total and mean estimation for main variable(s).

VarEst

variance estimation for main variable(s).

If parameter conf_level is different from NULL, the object also includes the component

ConfInt

total and mean estimation and confidence intervals for main variable(s).

In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any). By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All the components of the object can be accessed by using function summary.

References

Ranalli, M. G., Arcos, A., Rueda, M. and Teodoro, A. (2013) Calibration estimation in dual frame surveys. arXiv:1312.0761 [stat.ME]

Deville, J. C., Sarndal, C. E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376 - 382

See Also

JackCalSF

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the SF calibration estimator for variable Clothing, without
#considering any auxiliary information
CalSF(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$ProbB, DatB$ProbA, 
DatA$Domain, DatB$Domain)

#Now, let us calculate the SF calibration estimator for variable Leisure when the
#frame sizes and the overlap domain size are known
CalSF(DatA$Lei, DatB$Lei, PiklA, PiklB, DatA$ProbB, DatB$ProbA, DatA$Domain, 
DatB$Domain, N_A = 1735, N_B = 1191, N_ab = 601)

#Finally, let us calculate the SF calibration estimator and a 90% confidence interval
#for the population total of variable Feeding, considering Income and Metres2 as
#auxiliary variables, with frame sizes and overlap domain size known.
CalSF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$ProbB, DatB$ProbA, DatA$Domain, 
DatB$Domain, N_A = 1735, N_B = 1191, N_ab = 601, xsAFrameA = DatA$Inc, 
xsBFrameA = DatB$Inc, xsAFrameB = DatA$M2, xsBFrameB = DatB$M2, 
XA = 4300260, XB = 176553, conf_level = 0.90)

Summary of estimators

Description

Returns all possible estimators that can be computed according to the information provided

Usage

Compare(ysA, ysB, pi_A, pi_B, domains_A, domains_B, pik_ab_B = NULL, pik_ba_A = NULL, 
N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL,  
xsAFrameB = NULL, xsBFrameB = NULL, XA = NULL, XB = NULL, met = "linear", 
conf_level = NULL)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about variable(s) of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about variable(s) of interest from s_B.

pi_A

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

pi_B

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

domains_A

A character vector of length n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of length n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

pik_ab_B

(Optional) A numeric vector of size n_A containing first order inclusion probabilities according to sampling design in frame B for units belonging to overlap domain that have been selected in s_A.

pik_ba_A

(Optional) A numeric vector of size n_B containing first order inclusion probabilities according to sampling design in frame A for units belonging to overlap domain that have been selected in s_B.

N_A

(Optional) A numeric value indicating the size of frame A.

N_B

(Optional) A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

xsAFrameA

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_A.

xsBFrameA

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_B. For units in domain b, these values are 0.

xsAFrameB

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_A. For units in domain a, these values are 0.

xsBFrameB

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_B.

XA

(Optional) A numeric value or vector of length m_A, with m_A the number of auxiliary variables in frame A, indicating the population totals for the auxiliary variables considered in frame A.

XB

(Optional) A numeric value or vector of length m_B, with m_B the number of auxiliary variables in frame B, indicating the population totals for the auxiliary variables considered in frame B.

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

Compare(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)

Covariance estimator between two Horvitz - Thompson estimators

Description

Computes the covariance estimator between two Horvitz - Thompson estimators of population total from survey data obtained from a single stage sampling design

Usage

CovHT(y, x, pikl)

Arguments

y

A numeric vector of size n containing information about first variable of interest in the sample

x

A numeric vector of size n containing information about second variable of interest in the sample

pikl

A square numeric matrix of dimension n containing first and second order inclusion probabilities for units included in the sample

Details

Covariance estimator between two Horvitz - Thompson estimators of population total is given by

\widehat{Cov}(\hat{Y}_{HT}, \hat{X}_{HT}) = \sum_{k \in s}\sum_{l \in s} \frac{\pi_{kl} - \pi_k \pi_l}{\pi_{kl}}\frac{y_k}{\pi_k}\frac{x_l}{\pi_l}
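
The double sum above can be evaluated with matrix operations. The following base R sketch is an illustration of the formula only; the helper name cov_ht and the toy values are hypothetical, and the code is not the source of CovHT. It assumes that the diagonal of the matrix holds the first order inclusion probabilities, so that \pi_{kk} = \pi_k.

```r
# Hypothetical helper: covariance estimator between two
# Horvitz-Thompson estimators of totals
cov_ht <- function(y, x, pikl) {
  pik   <- diag(pikl)                          # first order probabilities
  delta <- (pikl - outer(pik, pik)) / pikl     # (pi_kl - pi_k pi_l) / pi_kl
  as.numeric(t(y / pik) %*% delta %*% (x / pik))
}

# Simple random sample without replacement of n = 2 units from N = 5:
# pi_k = 2/5 = 0.4 and pi_kl = (2 * 1) / (5 * 4) = 0.1
P <- matrix(c(0.4, 0.1,
              0.1, 0.4), 2, 2)
y <- c(2, 3.3)     # made-up sample values of the first variable
x <- c(13, 14)     # made-up sample values of the second variable

cov_ht(y, x, P)
```

Calling the helper with the same variable twice, as in cov_ht(y, y, P), gives the corresponding Horvitz-Thompson variance estimator.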

Value

A numeric value representing covariance estimator between two Horvitz - Thompson estimators for population total for considered values

References

Horvitz, D. G. and Thompson, D. J. (1952) A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663 - 685

Sarndal, C. E., Swensson, B. and Wretman, J. (1992) Model Assisted Survey Sampling. Springer-Verlag. New York.

See Also

HT, VarHT

Examples

##########   Example 1   ##########
Indicators <- c(1, 2, 3, 4, 5)
X <- c(13, 18, 20, 14, 9)
Y <- c(2, 0.5, 1.2, 3.3, 2)
#Let us draw a simple random sample without replacement of size 2
s <- sample(Indicators, 2)
sX <- X[s]
sY <- Y[s]
#Now, let us build the associated matrix of first and second order inclusion
#probabilities. For n = 2 and N = 5, pi_k = n/N = 0.4 and
#pi_kl = n(n - 1)/(N(N - 1)) = 0.1
Ps <- matrix(c(0.4, 0.1, 0.1, 0.4), 2, 2)
CovHT(sX, sY, Ps)

##########   Example 2   ##########
data(DatA)
attach(DatA)
data(PiklA)
#Let us calculate the Horvitz - Thompson estimator for the total of variable Clothing in frame A.
HT(Clo, ProbA)
#Let us calculate the Horvitz - Thompson estimator for the total of variable Feeding in frame A.
HT(Feed, ProbA)
#And now, let us compute the covariance between the previous estimators
CovHT(Clo, Feed, PiklA)

Joint sample database

Description

This dataset contains some variables from a real dual frame survey conducted in 2013 in Andalusia (Spain) by a scientific institute specialized in social topics. It is intended to show how to properly split a joint dual frame sample into subsamples, so that the functions of Frames2 can be used.

Usage

Dat

Format

Drawnby

Indicates whether individual was selected in the landline sample (1) or in the cell phone sample (2).

Stratum

Indicates the stratum each individual belongs to. For individuals selected in cell phone sample, value of this variable is NA.

Opinion

Response of the individual to the question: Do you think that immigrants currently living in Andalusia are quite a lot? 1 represents "yes" and 0 represents "no".

Landline

Indicates whether individual has a landline (1) or not (0).

Cell

Indicates whether individual has a cell phone (1) or not (0).

ProbLandline

First order inclusion probability of reaching the individual by landline.

ProbCell

First order inclusion probability of reaching the individual by cell phone.

Income

Monthly income (in euros) of the individual.

Details

The survey was based on two frames: a landline frame and a cell phone frame. Landline frame was stratified by province and simple random sampling without replacement was considered in cell phone frame. The size of the whole sample was n = 2402. Total of the variable Income in the whole population is X_{Income} = 12686232063.

Examples

data(Dat)
attach(Dat)

#We are going to split dataset Dat into two new datasets, each 
#one corresponding to a frame: frame containing individuals
#using landline and frame containing individuals using cell phone.

FrameLandline <- Dat[Landline == 1,]
FrameCell <- Dat[Cell == 1,]

#Equally, we can split the original dataset in three new different 
#datasets, each one corresponding to one domain: first domain containing
#individuals using only landline, second domain containing individuals
#using only cell phone and the third domain containing individuals
#using both landline and cell phone.

DomainLandline <- Dat[Landline == 1 & Cell == 0,]
DomainCell <- Dat[Landline == 0 & Cell == 1,]
DomainBoth <- Dat[Landline == 1 & Cell == 1,]

#From the domain datasets, we can build frame datasets

FrameLandline <- rbind(DomainLandline, DomainBoth)
FrameCell <- rbind(DomainCell, DomainBoth)
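
#A further sketch, using a small hypothetical data frame with the same column
#names as Dat (all values below are made up). It assumes that the landline
#frame plays the role of frame A and the cell phone frame that of frame B,
#and shows one possible way to obtain the two subsamples and the domain
#labels ("a"/"ab" and "b"/"ba") that the estimation functions expect.

```r
# Hypothetical toy sample with the same column names as Dat
toy <- data.frame(
  Drawnby  = c(1, 1, 2, 2, 1),   # 1 = landline sample, 2 = cell phone sample
  Landline = c(1, 1, 0, 1, 1),   # has a landline (1) or not (0)
  Cell     = c(0, 1, 1, 1, 1)    # has a cell phone (1) or not (0)
)

# Subsamples according to the frame each individual was drawn from
sA <- toy[toy$Drawnby == 1, ]    # landline sample, treated as s_A
sB <- toy[toy$Drawnby == 2, ]    # cell phone sample, treated as s_B

# Domain labels: overlap units are those that also belong to the other frame
domains_A <- ifelse(sA$Cell == 1, "ab", "a")
domains_B <- ifelse(sB$Landline == 1, "ba", "b")

domains_A
domains_B
```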

Database of household expenses for frame A

Description

This dataset contains some variables regarding household expenses for a sample of 105 households selected from a list of landline phones (say, frame A) in a particular city in a specific month.

Usage

DatA

Format

Domain

A string indicating the domain each household belongs to. Possible values are "a" if household belongs to domain a or "ab" if household belongs to overlap domain.

Feed

Feeding expenses (in euros) at the household

Clo

Clothing expenses (in euros) at the household

Lei

Leisure expenses (in euros) at the household

Inc

Household income (in euros). Values for this variable are only available for households included in frame A. For households included in domain b, value of this variable is set to 0.

Tax

Household municipal taxes (in euros) paid. Values for this variable are only available for households included in frame A. For households included in domain b, value of this variable is set to 0.

M2

Square meters of the house. Values for this variable are only available for households included in frame B. For households included in domain a, value of this variable is set to 0.

Size

Household size. Values for this variable are only available for households included in frame B. For households included in domain a, value of this variable is set to 0.

ProbA

First order inclusion probability in frame A. This probability is 0 for households included in domain b.

ProbB

First order inclusion probability in frame B. This probability is 0 for households included in domain a.

Stratum

A numeric value indicating the stratum each household belongs to.

Details

The sample, of size n_A = 105, has been drawn from a population of N_A = 1735 households with landline phone according to a stratified random sampling design. Population units were divided into 6 strata, with population sizes N_A^h = (727, 375, 113, 186, 115, 219). N_{ab} = 601 of the households in this population also have a mobile phone. Frame totals for the auxiliary variables in this frame are X_{Income}^A = 4300260 and X_{Taxes}^A = 215577.

See Also

PiklA

Examples

data(DatA)
attach(DatA)
#Let us perform a brief descriptive analysis of the three main variables
param <- data.frame(Feed, Clo, Lei)
summary (param)
hist (Feed)
hist (Clo)
hist (Lei)

Database of household expenses for frame B

Description

This dataset contains some variables regarding household expenses for a sample of 135 households selected from a list of mobile phones (say, frame B) in a particular city in a specific month.

Usage

DatB

Format

Domain

A string indicating the domain each household belongs to. Possible values are "b" if household belongs to domain b or "ba" if household belongs to overlap domain.

Feed

Feeding expenses (in euros) at the household

Clo

Clothing expenses (in euros) at the household

Lei

Leisure expenses (in euros) at the household

Inc

Household income (in euros). Values for this variable are only available for households included in frame A. For households included in domain b, value of this variable is set to 0.

Tax

Household municipal taxes (in euros) paid. Values for this variable are only available for households included in frame A. For households included in domain b, value of this variable is set to 0.

M2

Square meters of the house. Values for this variable are only available for households included in frame B. For households included in domain a, value of this variable is set to 0.

Size

Household size. Values for this variable are only available for households included in frame B. For households included in domain a, value of this variable is set to 0.

ProbA

First order inclusion probability in frame A. This probability is 0 for households included in domain b.

ProbB

First order inclusion probability in frame B. This probability is 0 for households included in domain a.

Details

The sample, of size n_B = 135, has been drawn from a population of N_B = 1191 households with mobile phone according to a simple random sampling without replacement design. N_{ab} = 601 of these households also have a landline phone. Frame totals for the auxiliary variables in this frame are X_{Metres2}^B = 176553 and X_{Size}^B = 3529.

See Also

PiklB

Examples

data(DatB)
attach(DatB)
#Let us perform a brief descriptive analysis of the three main variables
param <- data.frame(Feed, Clo, Lei)
summary (param)
hist (Feed)
hist (Clo)
hist (Lei)

Database of students' program choice for frame A

Description

This dataset contains some variables regarding the program choice for a sample of 180 students included in the sampling frame A.

Usage

DatMA

Format

Id_Pop

An integer from 1 to N, with N the number of students in the whole population, identifying the student within the population.

Id_Frame

An integer from 1 to N_A, with N_A the number of students in the frame, identifying the student within the frame.

Prog

A factor with three categories (academic, general and vocation) indicating the program choice of the student.

Ses

An ordinal factor with three categories (low, middle and high) indicating the socio-economical status of the student.

Read

A number indicating the mark of the student in a reading test.

Write

A number indicating the mark of the student in a writing test.

Sch_Size

A number indicating the size of the school the student belongs to.

Domain

A string indicating the domain each student belongs to. Possible values are "a" if student belongs to domain a or "ab" if student belongs to overlap domain.

ProbA

First order inclusion probability in frame A.

ProbB

First order inclusion probability in frame B. This probability is 0 for students included in domain a.

Details

The sample, of size n_A = 180, has been drawn from a population of N_A = 5500 students according to a sampling design with probabilities proportional to the size of the school. Students attending bigger schools therefore have a higher probability of being selected in the sample. N_{ab} = 2000 of the students in this population also belong to frame B.
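
As a rough illustration of why larger schools lead to higher selection probabilities, the sketch below computes first order inclusion probabilities proportional to a size measure. The school sizes and the sample size are hypothetical values chosen for the example, not taken from DatMA, and the formula assumes that no resulting probability exceeds 1.

```r
# Hypothetical size measures (e.g. school sizes) and sample size
sizes <- c(100, 200, 300, 400)
n <- 2

# Inclusion probabilities proportional to size: pi_k = n * x_k / sum(x)
# (valid as written only if every resulting value is at most 1)
pik <- n * sizes / sum(sizes)
pik
```

The probabilities add up to the sample size n, and the largest unit receives the largest probability.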

See Also

DatPopM

Examples

data(DatMA)
attach(DatMA)
#Let us perform a brief descriptive analysis of the main variable
summary (Prog)
#And let us do the same for the numerical auxiliary variables Read and Write
summary(Read)
summary(Write)

Database of students' program choice for frame B

Description

This dataset contains some variables regarding the program choice for a sample of 232 students included in the sampling frame B.

Usage

DatMB

Format

Id_Pop

An integer from 1 to N, with N the number of students in the whole population, identifying the student within the population.

Id_Frame

An integer from 1 to N_B, with N_B the number of students in the frame, identifying the student within the frame.

Prog

A factor with three categories (academic, general and vocation) indicating the program choice of the student.

Ses

An ordinal factor with three categories (low, middle and high) indicating the socio-economical status of the student.

Read

A number indicating the mark of the student in a reading test.

Write

A number indicating the mark of the student in a writing test.

Sch_Size

A number indicating the size of the school the student belongs to.

Domain

A string indicating the domain each student belongs to. Possible values are "b" if student belongs to domain b or "ba" if student belongs to overlap domain.

ProbA

First order inclusion probability in frame A. This probability is 0 for students included in domain b.

ProbB

First order inclusion probability in frame B.

Details

The sample, of size n_B = 232, has been drawn from a population of N_B = 6500 students according to a simple random sampling design. N_{ab} = 2000 of the students in this population also belong to frame A.

See Also

DatPopM

Examples

data(DatMB)
attach(DatMB)
#Let us perform a brief descriptive analysis of the main variable
summary (Prog)
#And let us do the same for the numerical auxiliary variables Read and Write
summary(Read)
summary(Write)

Database of auxiliary information for the whole population of students

Description

This dataset contains population-level information on the auxiliary variables for the whole population of students.

Usage

DatPopM

Format

Ses

An ordinal factor with three categories (low, middle and high) indicating the socio-economical status of the student.

Read

A number indicating the mark of the student in a reading test.

Write

A number indicating the mark of the student in a writing test.

Domain

A string indicating the domain each student belongs to. Possible values are "a" if student belongs to domain a, "b" if student belongs to domain b or "ab" if student belongs to overlap domain.

Details

The population size is N = 10000.
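
The population size can be cross-checked against the frame and overlap sizes reported for DatMA and DatMB (N_A = 5500, N_B = 6500, N_{ab} = 2000), assuming that the two frames together cover the whole population. The variable names below are illustrative only.

```r
# Frame and overlap sizes as documented for DatMA and DatMB
N_A  <- 5500
N_B  <- 6500
N_ab <- 2000

# Sizes of the non-overlapping domains
N_a <- N_A - N_ab   # students only in frame A
N_b <- N_B - N_ab   # students only in frame B

# Population size when the frames jointly cover the population
N <- N_a + N_b + N_ab
N
```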

See Also

DatMA DatMB

Examples

data(DatPopM)
attach(DatPopM)
#Let us perform a brief descriptive analysis of the three auxiliary variables
summary (Ses)
summary(Read)
summary(Write)

Domains

Description

Given a main vector, an auxiliary vector and a value of the latter, identifies the positions at which the auxiliary vector differs from the given value, then sets the entries of the main vector at those positions to zero.

Usage

Domains (y, domains, value)

Arguments

y

A numeric main vector of size n

domains

A numeric/character/logical auxiliary vector of size n

value

A value of the auxiliary vector

Value

A numeric vector, a copy of y, with some values set to zero depending on the values of domains and value

Examples

##########   Example 1   ##########
U <- c(13, 18, 20, 14, 9)
#Let us build an auxiliary vector indicating whether values in U are above or below the mean.
aux <- c("Below", "Above", "Above", "Below", "Below")
#Now, only values below the mean remain; the others are set to zero.
Domains (U, aux, "Below")

##########   Example 2   ##########
data(DatA)
attach(DatA)
#Let us calculate total feeding expenses corresponding to households in domain a.
sum (Domains (Feed, Domain, "a"))
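
The behaviour described for Domains can be mimicked in a few lines of base R. The function zero_outside below is a hypothetical stand-in written for illustration; it is not the package source and does not require Frames2.

```r
# Hypothetical base R equivalent of the behaviour documented for Domains:
# keep y where the auxiliary vector equals `value`, set it to zero elsewhere
zero_outside <- function(y, domains, value) {
  stopifnot(length(y) == length(domains))
  y[domains != value] <- 0
  y
}

U <- c(13, 18, 20, 14, 9)
aux <- c("Below", "Above", "Above", "Below", "Below")

kept <- zero_outside(U, aux, "Below")
kept
sum(kept)   # total restricted to the "Below" group
```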

Fuller-Burmeister estimator

Description

Produces estimates for population totals and means using the Fuller-Burmeister estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.

Usage

FB(ysA, ysB, pi_A, pi_B, domains_A, domains_B, conf_level = NULL)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about the variable(s) of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about the variable(s) of interest from s_B.

pi_A

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

pi_B

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

domains_A

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals.

Details

The Fuller-Burmeister estimator of the population total is given by

\hat{Y}_{FB} = \hat{Y}_a^A + \hat{\beta}_1\hat{Y}_{ab}^A + (1 - \hat{\beta}_1)\hat{Y}_{ab}^B + \hat{Y}_b^B + \hat{\beta}_2(\hat{N}_{ab}^A - \hat{N}_{ab}^B)

where the optimal values of \hat{\beta}, which minimize the variance of the estimator, are:

\left( \begin{array}{c} \hat{\beta}_1\\ \hat{\beta}_2 \end{array} \right) = - \left( \begin{array}{cc} \hat{V}(\hat{Y}_{ab}^A - \hat{Y}_{ab}^B) & \widehat{Cov}(\hat{Y}_{ab}^A - \hat{Y}_{ab}^B, \hat{N}_{ab}^A - \hat{N}_{ab}^B)\\ \widehat{Cov}(\hat{Y}_{ab}^A - \hat{Y}_{ab}^B, \hat{N}_{ab}^A - \hat{N}_{ab}^B) & \hat{V}(\hat{N}_{ab}^A - \hat{N}_{ab}^B) \end{array} \right)^{-1} \times \left( \begin{array}{c} \widehat{Cov}(\hat{Y}_a^A + \hat{Y}_b^B + \hat{Y}_{ab}^B, \hat{Y}_{ab}^A - \hat{Y}_{ab}^B)\\ \widehat{Cov}(\hat{Y}_a^A + \hat{Y}_b^B + \hat{Y}_{ab}^B, \hat{N}_{ab}^A - \hat{N}_{ab}^B) \end{array} \right)
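
The expression for \hat{\beta} is a 2 x 2 linear system, so it can be solved directly once the variance and covariance estimates are available. The sketch below uses made-up numbers for those estimates purely to show the algebra; the names M and cvec are illustrative and do not correspond to objects in the package.

```r
# Hypothetical variance/covariance estimates (illustrative values only)
# M: matrix of V(Y_ab^A - Y_ab^B), V(N_ab^A - N_ab^B) and their covariance
M <- matrix(c(4, 1,
              1, 2), nrow = 2, byrow = TRUE)
# cvec: covariances of (Y_a^A + Y_b^B + Y_ab^B) with each of the two differences
cvec <- c(-2, -0.5)

# beta = - M^{-1} cvec; solve(M, cvec) avoids forming the inverse explicitly
beta <- -solve(M, cvec)
beta
```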

Because the Fuller-Burmeister estimator is not defined for estimating population sizes, the mean is estimated as \hat{Y}_{FB} / \hat{N}_H, where \hat{N}_H is the estimate of the population size obtained with the Hartley estimator. The estimated variance of the Fuller-Burmeister estimator can be obtained through the expression

\hat{V}(\hat{Y}_{FB}) = \hat{V}(\hat{Y}_a^A) + \hat{V}(\hat{Y}^B) + \hat{\beta}_1[\widehat{Cov}(\hat{Y}_a^A, \hat{Y}_{ab}^A) - \widehat{Cov}(\hat{Y}^B, \hat{Y}_{ab}^B)] + \hat{\beta}_2[\widehat{Cov}(\hat{Y}_a^A, \hat{N}_{ab}^A) - \widehat{Cov}(\hat{Y}^B, \hat{N}_{ab}^B)]

If both first and second order probabilities are known, the variances and covariances involved in the calculation of \hat{\beta} and \hat{V}(\hat{Y}_{FB}) are estimated using functions VarHT and CovHT, respectively. If only first order probabilities are known, variances are estimated using Deville's method and covariances are estimated using the following expression

\widehat{Cov}(\hat{X}, \hat{Y}) = \frac{\hat{V}(X + Y) - \hat{V}(X) - \hat{V}(Y)}{2}
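
The covariance expression above follows from the identity V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y). A quick numerical check is sketched below; it uses the ordinary sample variance from base R as a stand-in for the variance estimator (an assumption made for illustration, since the package uses Deville's method), and the helper name cov_from_var is hypothetical.

```r
# Covariance recovered from three variances
cov_from_var <- function(v_sum, v_x, v_y) (v_sum - v_x - v_y) / 2

x <- c(2, 4, 4, 7, 9)
y <- c(1, 3, 2, 6, 5)

# Sample variances play the role of the variance estimates here
cov_xy <- cov_from_var(var(x + y), var(x), var(y))
cov_xy
cov(x, y)   # direct computation, for comparison
```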

Value

FB returns an object of class "EstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

total and mean estimation for main variable(s).

VarEst

variance estimation for main variable(s).

If parameter conf_level is different from NULL, object includes component

ConfInt

total and mean estimation and confidence intervals for main variable(s).

In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any). By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All components of the object can be accessed with function summary.

References

Fuller, W.A. and Burmeister, L.F. (1972). Estimation for Samples Selected From Two Overlapping Frames. ASA Proceedings of the Social Statistics Sections, 245 - 249.

See Also

Hartley JackFB

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the Fuller-Burmeister estimator for variable Clothing
FB(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain)

#Now, let us calculate the Fuller-Burmeister estimator and a 90% confidence interval
#for variable Leisure, considering only first order inclusion probabilities
FB(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$Domain, 
DatB$Domain, 0.90)

Hartley estimator

Description

Produces estimates for population totals and means using Hartley estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.

Usage

Hartley(ysA, ysB, pi_A, pi_B, domains_A, domains_B, conf_level = NULL)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about the variable(s) of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about the variable(s) of interest from s_B.

pi_A

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

pi_B

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

domains_A

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals.

Details

The Hartley estimator of the population total is given by

\hat{Y}_H = \hat{Y}_a^A + \hat{\theta}\hat{Y}_{ab}^A + (1 - \hat{\theta})\hat{Y}_{ab}^B + \hat{Y}_b^B

where \hat{\theta} \in [0, 1]. The optimum value of \hat{\theta}, which minimizes the variance of the estimator, is

\hat{\theta}_{opt} = \frac{\hat{V}(\hat{Y}_{ab}^B) + \widehat{Cov}(\hat{Y}_b^B, \hat{Y}_{ab}^B) - \widehat{Cov}(\hat{Y}_a^A, \hat{Y}_{ab}^A)}{\hat{V}(\hat{Y}_{ab}^A) + \hat{V}(\hat{Y}_{ab}^B)}

Taking into account the independence between s_A and s_B, an estimator for the variance of the Hartley estimator can be obtained as follows:

\hat{V}(\hat{Y}_H) = \hat{V}(\hat{Y}_a^A + \hat{\theta}\hat{Y}_{ab}^A) + \hat{V}((1 - \hat{\theta})\hat{Y}_{ab}^B + \hat{Y}_b^B)

If both first and second order probabilities are known, the variances and covariances involved in the calculation of \hat{\theta}_{opt} and \hat{V}(\hat{Y}_H) are estimated using functions VarHT and CovHT, respectively. If only first order probabilities are known, variances are estimated using Deville's method and covariances are estimated using the following expression

\widehat{Cov}(\hat{X}, \hat{Y}) = \frac{\hat{V}(X + Y) - \hat{V}(X) - \hat{V}(Y)}{2}
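
To make the role of \hat{\theta} concrete, the sketch below evaluates \hat{\theta}_{opt} and the resulting Hartley total from domain-level quantities. All numbers are hypothetical and chosen only so the arithmetic is easy to follow. Truncating \hat{\theta} to [0, 1] is an assumption made here to respect the stated range, not a documented behaviour of the package.

```r
# Hypothetical domain total estimates
Y_a    <- 1000   # domain a, from sample s_A
Y_ab_A <- 500    # overlap domain, from sample s_A
Y_ab_B <- 600    # overlap domain, from sample s_B
Y_b    <- 400    # domain b, from sample s_B

# Hypothetical variance and covariance estimates
V_ab_A    <- 300   # V(Y_ab^A)
V_ab_B    <- 100   # V(Y_ab^B)
Cov_b_abB <- -20   # Cov(Y_b^B, Y_ab^B)
Cov_a_abA <- -40   # Cov(Y_a^A, Y_ab^A)

# Optimal weight, truncated to [0, 1] (assumption, see lead-in)
theta <- (V_ab_B + Cov_b_abB - Cov_a_abA) / (V_ab_A + V_ab_B)
theta <- min(max(theta, 0), 1)

# Hartley estimator of the total
Y_H <- Y_a + theta * Y_ab_A + (1 - theta) * Y_ab_B + Y_b
c(theta = theta, Y_H = Y_H)
```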

Value

Hartley returns an object of class "EstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

total and mean estimation for main variable(s).

VarEst

variance estimation for main variable(s).

If parameter conf_level is different from NULL, object includes component

ConfInt

total and mean estimation and confidence intervals for main variable(s).

In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any). By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All components of the object can be accessed with function summary.

References

Hartley, H. O. (1962) Multiple Frames Surveys. Proceedings of the American Statistical Association, Social Statistics Sections, 203 - 206.

Hartley, H. O. (1974) Multiple frame methodology and selected applications. Sankhya C, Vol. 36, 99 - 118.

See Also

JackHartley

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the Hartley estimator for variable Feeding
Hartley(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)

#Now, let us calculate the Hartley estimator and a 90% confidence interval
#for variable Leisure, considering only first order inclusion probabilities
Hartley(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$Domain, 
DatB$Domain, 0.90)

Horvitz - Thompson estimator

Description

Computes the Horvitz - Thompson estimator

Usage

HT(y, pik)

Arguments

y

A numeric vector of size n containing information about variable of interest

pik

A numeric vector of size n containing first order inclusion probabilities for units included in y

Details

The Horvitz - Thompson estimator of the population total is given by

\hat{Y}_{HT} = \sum_{k \in s} \frac{y_k}{\pi_k}
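
The formula amounts to weighting each sampled value by the inverse of its inclusion probability. The base R sketch below (the helper ht_total is hypothetical and independent of the package function HT) also checks, on a small population, that averaging the estimator over all possible simple random samples of size 2 recovers the population total.

```r
# Horvitz-Thompson total: sum of sampled values over inclusion probabilities
ht_total <- function(y, pik) {
  stopifnot(length(y) == length(pik), all(pik > 0 & pik <= 1))
  sum(y / pik)
}

U <- c(13, 18, 20, 14, 9)        # small population
n <- 2
pik <- rep(n / length(U), n)     # SRS without replacement: pi_k = n / N

# Enumerate every possible sample of size n (one per column)
samples <- combn(U, n)
estimates <- apply(samples, 2, ht_total, pik = pik)

mean(estimates)   # average over all equally likely samples
sum(U)            # true population total
```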

Value

A numeric value: the Horvitz - Thompson estimate of the population total for the given values

References

Horvitz, D. G. and Thompson, D. J. (1952) A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663 - 685

See Also

VarHT

Examples

##########   Example 1   ##########
U <- c(13, 18, 20, 14, 9)
#A simple random sample of size 2 without replacement is drawn from population
s <- sample(U, 2)
ps <- c(0.4, 0.4)
HT(s, ps)

##########   Example 2   ##########
data(DatA)
attach(DatA)
#Let us estimate the population total for variable Feeding in frame A
HT(Feed, ProbA)

Confidence intervals for Bankier-Kalton-Anderson estimator based on jackknife method

Description

Calculates confidence intervals for Bankier-Kalton-Anderson estimator using jackknife procedure

Usage

JackBKA(ysA, ysB, piA, piB, pik_ab_B, pik_ba_A, domainsA, domainsB, 
conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL,
clusB = NULL, fcpA = FALSE, fcpB = FALSE)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about the variable(s) of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about the variable(s) of interest from s_B.

piA

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

piB

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

pik_ab_B

A numeric vector of size n_A containing first order inclusion probabilities according to the sampling design in frame B for units belonging to the overlap domain that have been selected in s_A.

pik_ba_A

A numeric vector of size n_B containing first order inclusion probabilities according to the sampling design in frame A for units belonging to the overlap domain that have been selected in s_B.

domainsA

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domainsB

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into L strata and a sample of size n_{Bl} is selected from the N_{Bl} units composing the l-th stratum. In this context, the jackknife variance estimator of an estimator \hat{Y}_c is given by

v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A} \sum_{i \in s_A} (\hat{Y}_c^A(i) - \overline{Y}_c^A)^2 + \sum_{l=1}^{L} \frac{n_{Bl} - 1}{n_{Bl}} \sum_{j \in s_{Bl}} (\hat{Y}_c^B(lj) - \overline{Y}_c^{Bl})^2

with \hat{Y}_c^A(i) the value of the estimator \hat{Y}_c after dropping the i-th unit from ysA and \overline{Y}_c^A the mean of the values \hat{Y}_c^A(i). Similarly, \hat{Y}_c^B(lj) is the value taken by \hat{Y}_c after dropping the j-th unit of the l-th stratum from sample ysB, and \overline{Y}_c^{Bl} is the mean of the values \hat{Y}_c^B(lj). If needed, a finite population correction factor can be included in either frame by replacing \hat{Y}_c^A(i) or \hat{Y}_c^B(lj) with \hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A} (\hat{Y}_c^A(i) - \hat{Y}_c) or \hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B} (\hat{Y}_c^B(lj) - \hat{Y}_c), where \overline{\pi}_A = \sum_{i \in s_A} \pi_{iA} / n_A and \overline{\pi}_B = \sum_{j \in s_B} \pi_{jB} / n_B. A confidence interval for any parameter of interest Y can then be calculated using the pivotal method.
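
The non-stratified term of the jackknife variance can be sketched for a single sample as follows. This is a simplified illustration in base R, not the code used by JackBKA: the helper jack_var, the toy data and the use of the sample mean as the estimator are all assumptions, and the interval uses a normal approximation as the pivot.

```r
# Delete-one jackknife variance for a non-stratified sample.
# `estimator` is any function mapping a sample to a single number;
# `pi_bar` (optional) is the mean first order inclusion probability.
jack_var <- function(y, estimator, pi_bar = NULL) {
  n <- length(y)
  full <- estimator(y)
  # Estimator recomputed after dropping each unit in turn
  loo <- vapply(seq_len(n), function(i) estimator(y[-i]), numeric(1))
  if (!is.null(pi_bar)) {
    # Finite population correction: shrink each replicate towards the
    # full-sample estimate by sqrt(1 - pi_bar)
    loo <- full + sqrt(1 - pi_bar) * (loo - full)
  }
  (n - 1) / n * sum((loo - mean(loo))^2)
}

y <- c(12, 15, 9, 20, 14, 17)
v     <- jack_var(y, mean)                  # no correction
v_fpc <- jack_var(y, mean, pi_bar = 0.25)   # with correction

# Normal-approximation interval at 95% confidence
ci <- mean(y) + c(-1, 1) * qnorm(0.975) * sqrt(v)
ci
```

For the sample mean, the uncorrected jackknife variance coincides with var(y) / n, and the correction multiplies it by 1 - pi_bar.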

Value

A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.

References

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

BKA

Examples

data(DatA)
data(DatB)

#Let us obtain a 95% jackknife confidence interval for variable Feeding,
#supposing a stratified sampling in frame A and a simple random sampling without
#replacement in frame B with no finite population correction factor in any frame.
JackBKA(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$ProbB,
DatB$ProbA, DatA$Domain, DatB$Domain, 0.95, "str", "srs",
strA = DatA$Stratum)

#Let us check how the interval estimation varies when a finite 
#population correction factor is considered in both frames.
JackBKA(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$ProbB,
DatB$ProbA, DatA$Domain, DatB$Domain, 0.95, "str", "srs", 
strA = DatA$Stratum, fcpA = TRUE, fcpB = TRUE)

Confidence intervals for dual frame calibration estimator based on jackknife method

Description

Calculates confidence intervals for dual frame calibration estimator using jackknife procedure

Usage

JackCalDF(ysA, ysB, piA, piB, domainsA, domainsB, N_A = NULL, N_B = NULL, 
N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, 
xsBFrameB = NULL, xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear", 
conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL,
clusB = NULL, fcpA = FALSE, fcpB = FALSE)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about the variable(s) of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about the variable(s) of interest from s_B.

piA

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

piB

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

domainsA

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domainsB

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

N_A

(Optional) A numeric value indicating the size of frame A

N_B

(Optional) A numeric value indicating the size of frame B

N_ab

(Optional) A numeric value indicating the size of the overlap domain

xsAFrameA

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_A.

xsBFrameA

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_B. For units in domain b, these values are 0.

xsAFrameB

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_A. For units in domain a, these values are 0.

xsBFrameB

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_B.

xsT

(Optional) A numeric vector of length n or a numeric matrix or data frame of dimensions n x m_T, with m_T the number of auxiliary variables in both frames, containing auxiliary information for all units in the entire sample s = s_A \cup s_B.

XA

(Optional) A numeric value or vector of length m_A, with m_A the number of auxiliary variables in frame A, indicating the population totals for the auxiliary variables considered in frame A.

XB

(Optional) A numeric value or vector of length m_B, with m_B the number of auxiliary variables in frame B, indicating the population totals for the auxiliary variables considered in frame B.

X

(Optional) A numeric value or vector of length m_T, with m_T the number of auxiliary variables in both frames, indicating the population totals for the auxiliary variables considered in both frames.

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into L strata and a sample of size n_{Bl} is selected from the N_{Bl} units composing the l-th stratum. In this context, the jackknife variance estimator of an estimator \hat{Y}_c is given by

v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2

with \hat{Y}_c^A(i) the value of the estimator \hat{Y}_c after dropping the i-th unit from ysA and \overline{Y}_{c}^{A} the mean of the values \hat{Y}_c^A(i). Similarly, \hat{Y}_c^B(lj) is the value taken by \hat{Y}_c after dropping the j-th unit of the l-th stratum from sample ysB and \overline{Y}_{c}^{Bl} is the mean of the values \hat{Y}_c^B(lj). If needed, a finite population correction factor can be included in either frame by replacing \hat{Y}_{c}^{A}(i) or \hat{Y}_{c}^{B}(lj) with \hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c}) or \hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c}), where \overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A and \overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B. A confidence interval for any parameter of interest, Y, can then be calculated using the pivotal method.
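As a rough illustration of the frame A term of the jackknife variance estimator, the sketch below computes a delete-one jackknife variance for a single non-stratified sample. The helper name jack_var_unstratified, the frame size N and the expansion estimator N * mean(y) are assumptions chosen for illustration; none of them is part of Frames2.

```r
# Delete-one jackknife variance for a generic estimator (illustrative helper,
# not a Frames2 function). `estimator` maps a numeric sample to one number.
jack_var_unstratified <- function(y, estimator) {
  n <- length(y)
  # Replicate i: value of the estimator after dropping the i-th unit
  reps <- vapply(seq_len(n), function(i) estimator(y[-i]), numeric(1))
  # (n - 1) / n times the sum of squared deviations from the replicate mean
  (n - 1) / n * sum((reps - mean(reps))^2)
}

set.seed(1)
N <- 1000                            # assumed frame size
y <- rnorm(25, mean = 50, sd = 10)   # simulated sample values
v <- jack_var_unstratified(y, function(s) N * mean(s))
```

For this linear estimator the result coincides with the textbook value N^2 * var(y) / n, which gives a quick sanity check of the helper.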

Value

A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.

References

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

CalDF

Examples

data(DatA)
data(DatB)

#Let us obtain a 95% jackknife confidence interval for variable Clothing,
#with frame sizes and overlap domain size known, supposing stratified
#sampling in frame A and simple random sampling without replacement
#in frame B, with no finite population correction factor in either frame.
JackCalDF(DatA$Clo, DatB$Clo, DatA$ProbA, DatB$ProbB, 
DatA$Domain, DatB$Domain, N_A = 1735, N_B = 1191, N_ab = 601, conf_level = 0.95,
sdA = "str", sdB = "srs", strA = DatA$Stratum)

#Finally, let us consider a finite population correction factor in both frames.
JackCalDF(DatA$Clo, DatB$Clo, DatA$ProbA, DatB$ProbB, 
DatA$Domain, DatB$Domain, N_A = 1735, N_B = 1191, N_ab = 601, conf_level = 0.95,
sdA = "str", sdB = "srs", strA = DatA$Stratum, fcpA = TRUE, fcpB = TRUE)

Confidence intervals for SF calibration estimator based on jackknife method

Description

Produces jackknife variance estimates for the SF calibration estimator, together with the corresponding confidence intervals.

Usage

JackCalSF(ysA, ysB, piA, piB, pik_ab_B, pik_ba_A, domainsA, domainsB, 
N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, 
xsAFrameB = NULL, xsBFrameB = NULL, xsT = NULL, XA = NULL, XB = NULL,  
X = NULL, met = "linear", conf_level, sdA = "srs", sdB = "srs", strA = NULL, 
strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about variable of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about variable of interest from s_B.

piA

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

piB

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

pik_ab_B

A numeric vector of size n_A containing first order inclusion probabilities according to sampling design in frame B for units belonging to overlap domain that have been selected in s_A.

pik_ba_A

A numeric vector of size n_B containing first order inclusion probabilities according to sampling design in frame A for units belonging to overlap domain that have been selected in s_B.

domainsA

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domainsB

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

N_A

(Optional) A numeric value indicating the size of frame A.

N_B

(Optional) A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

xsAFrameA

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_A.

xsBFrameA

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in s_B. For units in domain b, these values are 0.

xsAFrameB

(Optional) A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_A. For units in domain a, these values are 0.

xsBFrameB

(Optional) A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in s_B.

xsT

(Optional) A numeric vector of length n or a numeric matrix or data frame of dimensions n x m_T, with m_T the number of auxiliary variables in both frames, containing auxiliary information for all units in the entire sample s = s_A \cup s_B.

XA

(Optional) A numeric value or vector of length m_A, with m_A the number of auxiliary variables in frame A, indicating the population totals for the auxiliary variables considered in frame A.

XB

(Optional) A numeric value or vector of length m_B, with m_B the number of auxiliary variables in frame B, indicating the population totals for the auxiliary variables considered in frame B.

X

(Optional) A numeric value or vector of length m_T, with m_T the number of auxiliary variables in both frames, indicating the population totals for the auxiliary variables considered in both frames.

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into L strata and a sample of size n_{Bl} is selected from the N_{Bl} units composing the l-th stratum. In this context, the jackknife variance estimator of an estimator \hat{Y}_c is given by

v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2

with \hat{Y}_c^A(i) the value of the estimator \hat{Y}_c after dropping the i-th unit from ysA and \overline{Y}_{c}^{A} the mean of the values \hat{Y}_c^A(i). Similarly, \hat{Y}_c^B(lj) is the value taken by \hat{Y}_c after dropping the j-th unit of the l-th stratum from sample ysB and \overline{Y}_{c}^{Bl} is the mean of the values \hat{Y}_c^B(lj). If needed, a finite population correction factor can be included in either frame by replacing \hat{Y}_{c}^{A}(i) or \hat{Y}_{c}^{B}(lj) with \hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c}) or \hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c}), where \overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A and \overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B. A confidence interval for any parameter of interest, Y, can then be calculated using the pivotal method.
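The two-term structure of the jackknife variance estimator (one term for a non-stratified frame A, one sum over strata for frame B) can be sketched as follows. The function jack_var_dual, the toy additive estimator toy_est and the sizes N_A and N_Bl are hypothetical and serve only to make the formula concrete; they are not Frames2 functions or data.

```r
# Illustrative only: jackknife variance with frame A non-stratified and
# frame B stratified. `est` takes (yA, yB, strB) and returns one number.
jack_var_dual <- function(yA, yB, strB, est) {
  nA <- length(yA)
  # Frame A term: drop one unit of sample A at a time
  repsA <- vapply(seq_len(nA), function(i) est(yA[-i], yB, strB), numeric(1))
  vA <- (nA - 1) / nA * sum((repsA - mean(repsA))^2)

  # Frame B term: drop one unit at a time within each stratum
  vB <- 0
  for (l in unique(strB)) {
    idx <- which(strB == l)          # positions of stratum l in sample B
    nBl <- length(idx)
    repsB <- vapply(idx, function(j) est(yA, yB[-j], strB[-j]), numeric(1))
    vB <- vB + (nBl - 1) / nBl * sum((repsB - mean(repsB))^2)
  }
  vA + vB
}

# Toy additive estimator with assumed frame and stratum sizes
N_A <- 500
N_Bl <- c("1" = 200, "2" = 300)
toy_est <- function(yA, yB, strB) {
  mB <- tapply(yB, strB, mean)       # stratum means, named by stratum label
  N_A * mean(yA) + sum(N_Bl[names(mB)] * mB)
}

set.seed(2)
yA <- rnorm(20, 10, 2)
yB <- rnorm(30, 12, 3)
strB <- rep(c(1, 2), times = c(12, 18))
v <- jack_var_dual(yA, yB, strB, toy_est)
```

Because the toy estimator is linear, each term reduces to the usual N^2 * s^2 / n expression, so the output can be checked by hand.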

Value

A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.

References

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

CalSF

Examples

data(DatA)
data(DatB)

#Let us obtain a 95% jackknife confidence interval for variable Clothing,
#with frame sizes and overlap domain size known, supposing stratified
#sampling in frame A and simple random sampling without replacement
#in frame B, with no finite population correction factor in either frame.
JackCalSF(DatA$Clo, DatB$Clo, DatA$ProbA, DatB$ProbB, 
DatA$ProbB, DatB$ProbA, DatA$Domain, DatB$Domain, N_A = 1735, 
N_B = 1191, N_ab = 601, conf_level = 0.95, sdA = "str", sdB = "srs",
strA = DatA$Stratum)

Confidence intervals for Fuller-Burmeister estimator based on jackknife method

Description

Calculates confidence intervals for Fuller-Burmeister estimator using jackknife procedure

Usage

JackFB(ysA, ysB, piA, piB, domainsA, domains_B, conf_level, sdA = "srs", 
sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, 
fcpB = FALSE)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about variable of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about variable of interest from s_B.

piA

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

piB

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

domainsA

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into L strata and a sample of size n_{Bl} is selected from the N_{Bl} units composing the l-th stratum. In this context, the jackknife variance estimator of an estimator \hat{Y}_c is given by

v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2

with \hat{Y}_c^A(i) the value of the estimator \hat{Y}_c after dropping the i-th unit from ysA and \overline{Y}_{c}^{A} the mean of the values \hat{Y}_c^A(i). Similarly, \hat{Y}_c^B(lj) is the value taken by \hat{Y}_c after dropping the j-th unit of the l-th stratum from sample ysB and \overline{Y}_{c}^{Bl} is the mean of the values \hat{Y}_c^B(lj). If needed, a finite population correction factor can be included in either frame by replacing \hat{Y}_{c}^{A}(i) or \hat{Y}_{c}^{B}(lj) with \hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c}) or \hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c}), where \overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A and \overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B. A confidence interval for any parameter of interest, Y, can then be calculated using the pivotal method.
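The finite population correction described above shrinks each jackknife replicate towards the full-sample estimate. A minimal sketch of that adjustment follows, assuming simple random sampling without replacement and the expansion estimator N * mean(y); the helper fpc_adjust and the sizes N and n are hypothetical, not Frames2 objects.

```r
# Illustrative helper (not a Frames2 function): shrink jackknife replicates
# towards the full-sample estimate by sqrt(1 - mean inclusion probability).
fpc_adjust <- function(reps, full_est, pik) {
  full_est + sqrt(1 - mean(pik)) * (reps - full_est)
}

set.seed(3)
N <- 200                          # assumed frame size
n <- 40                           # assumed sample size
y <- rnorm(n, 30, 5)
pik <- rep(n / N, n)              # inclusion probabilities under SRS
full_est <- N * mean(y)
reps <- vapply(seq_len(n), function(i) N * mean(y[-i]), numeric(1))
reps_fpc <- fpc_adjust(reps, full_est, pik)

# Jackknife variance without and with the correction
v_plain <- (n - 1) / n * sum((reps - mean(reps))^2)
v_fpc   <- (n - 1) / n * sum((reps_fpc - mean(reps_fpc))^2)
```

Since every deviation is multiplied by the same factor, the corrected variance equals (1 - mean(pik)) times the uncorrected one, which here is the familiar (1 - n/N) factor.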

Value

A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.

References

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

FB

Examples

data(DatA)
data(DatB)

#Let us obtain a 95% jackknife confidence interval for variable Clothing,
#supposing stratified sampling in frame A and simple random sampling
#without replacement in frame B, with no finite population correction factor
#in either frame.
JackFB(DatA$Clo, DatB$Clo, DatA$ProbA, DatB$ProbB, DatA$Domain, 
DatB$Domain, 0.95, "str", "srs", strA = DatA$Stratum)

#Let us check how the interval estimation varies when a finite
#population correction factor is considered in both frames.
JackFB(DatA$Clo, DatB$Clo, DatA$ProbA, DatB$ProbB, DatA$Domain,
DatB$Domain, 0.95, "str", "srs", strA = DatA$Stratum,
fcpA = TRUE, fcpB = TRUE)

Confidence intervals for Hartley estimator based on jackknife method

Description

Calculates confidence intervals for Hartley estimator using jackknife procedure

Usage

JackHartley(ysA, ysB, piA, piB, domainsA, domainsB, conf_level, sdA = "srs", 
sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, 
fcpB = FALSE)

Arguments

ysA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x c containing information about variable of interest from s_A.

ysB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x c containing information about variable of interest from s_B.

piA

A numeric vector of length n_A or a square numeric matrix of dimension n_A containing first order or first and second order inclusion probabilities for units included in s_A.

piB

A numeric vector of length n_B or a square numeric matrix of dimension n_B containing first order or first and second order inclusion probabilities for units included in s_B.

domainsA

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domainsB

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into L strata and a sample of size n_{Bl} is selected from the N_{Bl} units composing the l-th stratum. In this context, the jackknife variance estimator of an estimator \hat{Y}_c is given by

v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2

with \hat{Y}_c^A(i) the value of the estimator \hat{Y}_c after dropping the i-th unit from ysA and \overline{Y}_{c}^{A} the mean of the values \hat{Y}_c^A(i). Similarly, \hat{Y}_c^B(lj) is the value taken by \hat{Y}_c after dropping the j-th unit of the l-th stratum from sample ysB and \overline{Y}_{c}^{Bl} is the mean of the values \hat{Y}_c^B(lj). If needed, a finite population correction factor can be included in either frame by replacing \hat{Y}_{c}^{A}(i) or \hat{Y}_{c}^{B}(lj) with \hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c}) or \hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c}), where \overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A and \overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B. A confidence interval for any parameter of interest, Y, can then be calculated using the pivotal method.
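Once a jackknife variance is available, the last step is the interval itself. The sketch below assumes that the pivot (estimate minus parameter, divided by the square root of the jackknife variance) is approximately standard normal; the reference distribution actually used by the package is not stated here, and the helper jack_ci with its argument names is hypothetical.

```r
# Illustrative helper (not a Frames2 function): normal-approximation
# confidence interval built from a point estimate and a jackknife variance.
jack_ci <- function(est, v_jack, conf_level = 0.95) {
  # Two-sided standard normal quantile for the requested level
  z <- qnorm(1 - (1 - conf_level) / 2)
  half_width <- z * sqrt(v_jack)
  c(lower = est - half_width, upper = est + half_width)
}

# Made-up numbers, only to show the call
ci <- jack_ci(est = 100, v_jack = 4, conf_level = 0.95)
```

The interval is symmetric around the estimate and widens as either the variance or the confidence level increases.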

Value

A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.

References

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

Hartley

Examples

data(DatA)
data(DatB)

#Let us obtain a 95% jackknife confidence interval for variable Feeding,
#supposing stratified sampling in frame A and simple random sampling
#without replacement in frame B, with no finite population correction
#factor in either frame.
JackHartley(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$Domain,
DatB$Domain, 0.95, "str", "srs", strA = DatA$Stratum)

#Let us check how the interval estimation varies when a finite
#population correction factor is considered in both frames.
JackHartley(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$Domain,
DatB$Domain, 0.95, "str", "srs", strA = DatA$Stratum, fcpA = TRUE,
fcpB = TRUE)

Confidence intervals for MLCDF estimator based on jackknife method

Description

Calculates confidence intervals for MLCDF estimator using jackknife procedure

Usage

JackMLCDF (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, xA, xB, ind_samA, 
ind_samB, ind_domA, ind_domB, N, N_ab = NULL, met = "linear", conf_level, sdA = "srs", 
sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, 
fcpB = FALSE)

Arguments

ysA

A data frame containing information about one or more factors, each one of length n_A, collected from s_A.

ysB

A data frame containing information about one or more factors, each one of length n_B, collected from s_B.

pik_A

A numeric vector of length n_A containing first order inclusion probabilities for units included in s_A.

pik_B

A numeric vector of length n_B containing first order inclusion probabilities for units included in s_B.

domains_A

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m, with m the number of auxiliary variables, containing auxiliary information in frame A for units included in s_A.

xsB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m, with m the number of auxiliary variables, containing auxiliary information in frame B for units included in s_B.

xA

A numeric vector of length N_A or a numeric matrix or data frame of dimensions N_A x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information for the units in frame A.

xB

A numeric vector of length N_B or a numeric matrix or data frame of dimensions N_B x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information for the units in frame B.

ind_samA

A numeric vector of length n_A containing the identifiers of the units of frame A (from 1 to N_A) that belong to s_A.

ind_samB

A numeric vector of length n_B containing the identifiers of the units of frame B (from 1 to N_B) that belong to s_B.

ind_domA

A character vector of length N_A indicating the domain each unit from frame A belongs to. Possible values are "a" and "ab".

ind_domB

A character vector of length N_B indicating the domain each unit from frame B belongs to. Possible values are "b" and "ba".

N

A numeric value indicating the size of the population.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into L strata and a sample of size n_{Bl} is selected from the N_{Bl} units composing the l-th stratum. In this context, the jackknife variance estimator of an estimator \hat{Y}_c is given by

v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2

with \hat{Y}_c^A(i) the value of the estimator \hat{Y}_c after dropping the i-th unit from ysA and \overline{Y}_{c}^{A} the mean of the values \hat{Y}_c^A(i). Similarly, \hat{Y}_c^B(lj) is the value taken by \hat{Y}_c after dropping the j-th unit of the l-th stratum from sample ysB and \overline{Y}_{c}^{Bl} is the mean of the values \hat{Y}_c^B(lj). If needed, a finite population correction factor can be included in either frame by replacing \hat{Y}_{c}^{A}(i) or \hat{Y}_{c}^{B}(lj) with \hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c}) or \hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c}), where \overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A and \overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B. A confidence interval for any parameter of interest, Y, can then be calculated using the pivotal method.
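When the response is a factor, the same delete-one scheme can be applied to each category separately. The sketch below does this for plain, equally weighted sample proportions, which is a deliberate simplification and not the MLCDF estimator; the helper jack_var_props and the toy factor prog are hypothetical.

```r
# Illustrative only (not a Frames2 function): jackknife variance of the
# sample proportion of each level of a factor, with equal weights.
jack_var_props <- function(f) {
  f <- factor(f)
  n <- length(f)
  # One row per replicate, one column per level; table() keeps all levels
  reps <- t(vapply(seq_len(n), function(i) {
    as.numeric(table(f[-i])) / (n - 1)
  }, numeric(nlevels(f))))
  colnames(reps) <- levels(f)
  # Centre each column on its replicate mean, then apply the (n - 1) / n factor
  centred <- sweep(reps, 2, colMeans(reps))
  (n - 1) / n * colSums(centred^2)
}

prog <- factor(c("a", "b", "b", "c", "a", "b", "c", "b", "a", "b"))
v <- jack_var_props(prog)
```

For an unweighted proportion p the result equals p * (1 - p) / (n - 1), so each entry of v can be verified directly from the category counts.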

Value

A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

MLCDF

Examples

data(DatMA)
data(DatMB)
data(DatPopM)

N <- nrow(DatPopM)
levels(DatPopM$Domain) <- c(levels(DatPopM$Domain), "ba")
DatPopMA <- subset(DatPopM, DatPopM$Domain == "a" | DatPopM$Domain == "ab")
DatPopMB <- subset(DatPopM, DatPopM$Domain == "b" | DatPopM$Domain == "ab")
DatPopMB[DatPopMB$Domain == "ab",]$Domain <- "ba"


#Let us obtain a 95% jackknife confidence interval for variable Prog,
#supposing pps sampling in frame A and simple random sampling
#without replacement in frame B, with no finite population correction
#factor in either frame.
JackMLCDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, 
DatMB$Domain, DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read, 
DatMA$Id_Frame, DatMB$Id_Frame, DatPopMA$Domain, DatPopMB$Domain, N, 
conf_level = 0.95, sdA = "pps", sdB = "srs")

Confidence intervals for MLCDW estimator based on jackknife method

Description

Calculates confidence intervals for MLCDW estimator using jackknife procedure

Usage

JackMLCDW (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, x, 
 ind_sam, N_A, N_B, N_ab = NULL, met = "linear", conf_level, sdA = "srs", 
 sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, 
 fcpA = FALSE, fcpB = FALSE)

Arguments

ysA

A data frame containing information about one or more factors, each one of length n_A, collected from s_A.

ysB

A data frame containing information about one or more factors, each one of length n_B, collected from s_B.

pik_A

A numeric vector of length n_A containing first order inclusion probabilities for units included in s_A.

pik_B

A numeric vector of length n_B containing first order inclusion probabilities for units included in s_B.

domains_A

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m, with m the number of auxiliary variables, containing auxiliary information in frame A for units included in s_A.

xsB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m, with m the number of auxiliary variables, containing auxiliary information in frame B for units included in s_B.

x

A numeric vector of length N or a numeric matrix or data frame of dimensions N x m, with m the number of auxiliary variables, containing auxiliary information for every unit in the population.

ind_sam

A numeric vector of length n = n_A + n_B containing the identifiers of the units of the population (from 1 to N) that belong to s_A or s_B.

N_A

A numeric value indicating the size of frame A.

N_B

A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where frame B has been divided into L strata and a sample of size n_{Bl} is selected from the N_{Bl} units composing the l-th stratum. In this context, the jackknife variance estimator of an estimator \hat{Y}_c is given by

v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2

with \hat{Y}_c^A(i) the value of estimator \hat{Y}_c after dropping the i-th unit from ysA and \overline{Y}_{c}^{A} the mean of the values \hat{Y}_c^A(i). Similarly, \hat{Y}_c^B(lj) is the value taken by \hat{Y}_c after dropping the j-th unit of the l-th stratum from sample ysB and \overline{Y}_{c}^{Bl} is the mean of the values \hat{Y}_c^B(lj). If needed, a finite population correction factor can be included in either frame by replacing \hat{Y}_{c}^{A}(i) or \hat{Y}_{c}^{B}(lj) with \hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c}) or \hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c}), where \overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A and \overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B. A confidence interval for any parameter of interest, Y, can then be calculated using the pivotal method.
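The frame A term of the formula can be sketched in base R as follows. This is only an illustration, not the package's implementation: the helper `jack_term` and its `estimator` argument (a function of the retained row indices) are hypothetical names, and the toy estimator is a plain expansion total rather than a dual frame estimator.

```r
# Delete-one jackknife term for a single, non-stratified sample.
# `estimator` is any function mapping a vector of retained row indices
# to a numeric estimate (hypothetical interface, for illustration only).
jack_term <- function(n, estimator) {
  # Replicate estimates: recompute the estimator with unit i removed
  reps <- vapply(seq_len(n),
                 function(i) estimator(setdiff(seq_len(n), i)),
                 numeric(1))
  # (n - 1) / n times the sum of squared deviations from the replicate mean
  (n - 1) / n * sum((reps - mean(reps))^2)
}

# Toy data (made up): expansion estimator of a total, N * mean(y)
y <- c(3, 7, 4, 9, 5, 8)
N <- 100
expansion_total <- function(idx) N * mean(y[idx])

v_A <- jack_term(length(y), expansion_total)
```

For this particular estimator the jackknife term coincides with the textbook expression N^2 * var(y) / n, which gives a quick sanity check of the loop.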

Value

A numeric matrix containing estimates of the population total and population mean and their corresponding confidence intervals obtained through the jackknife method.

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

MLCDW

Examples

data(DatMA)
data(DatMB)
data(DatPopM)

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
N_FrameA <- nrow(DatPopM[DatPopM$Domain == "a" | DatPopM$Domain == "ab",])
N_FrameB <- nrow(DatPopM[DatPopM$Domain == "b" | DatPopM$Domain == "ab",])


# Obtain a 95% jackknife confidence interval for variable Prog,
# assuming pps sampling in frame A and simple random sampling
# without replacement in frame B, with no finite population correction
# factor in either frame.
JackMLCDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, 
DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, N_FrameA, 
N_FrameB, conf_level = 0.95, sdA = "pps", sdB = "srs")

Confidence intervals for MLCSW estimator based on jackknife method

Description

Calculates confidence intervals for the MLCSW estimator using the jackknife procedure.

Usage

JackMLCSW (ysA, ysB, pik_A, pik_B, pik_ab_B, pik_ba_A, domains_A, 
 domains_B, xsA, xsB, x, ind_sam, N_A, N_B, N_ab = NULL, met = "linear", 
 conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, 
 clusB = NULL, fcpA = FALSE, fcpB = FALSE)

Arguments

ysA

A data frame containing information about one or more factors, each one of length n_A, collected from s_A.

ysB

A data frame containing information about one or more factors, each one of length n_B, collected from s_B.

pik_A

A numeric vector of length n_A containing first order inclusion probabilities for units included in s_A.

pik_B

A numeric vector of length n_B containing first order inclusion probabilities for units included in s_B.

pik_ab_B

A numeric vector of length n_A containing first order inclusion probabilities according to the sampling design in frame B for units belonging to the overlap domain that have been selected in s_A.

pik_ba_A

A numeric vector of length n_B containing first order inclusion probabilities according to the sampling design in frame A for units belonging to the overlap domain that have been selected in s_B.

domains_A

A character vector of length n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of length n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m, with m the number of auxiliary variables, containing auxiliary information in frame A for units included in s_A.

xsB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m, with m the number of auxiliary variables, containing auxiliary information in frame B for units included in s_B.

x

A numeric vector of length N or a numeric matrix or data frame of dimensions N x m, with m the number of auxiliary variables, containing auxiliary information for every unit in the population.

ind_sam

A numeric vector of length n = n_A + n_B containing the identifiers of the population units (from 1 to N) that belong to s_A or s_B.

N_A

A numeric value indicating the size of frame A.

N_B

A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where frame B has been divided into L strata and a sample of size n_{Bl} is selected from the N_{Bl} units composing the l-th stratum. In this context, the jackknife variance estimator of an estimator \hat{Y}_c is given by

v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2

with \hat{Y}_c^A(i) the value of estimator \hat{Y}_c after dropping the i-th unit from ysA and \overline{Y}_{c}^{A} the mean of the values \hat{Y}_c^A(i). Similarly, \hat{Y}_c^B(lj) is the value taken by \hat{Y}_c after dropping the j-th unit of the l-th stratum from sample ysB and \overline{Y}_{c}^{Bl} is the mean of the values \hat{Y}_c^B(lj). If needed, a finite population correction factor can be included in either frame by replacing \hat{Y}_{c}^{A}(i) or \hat{Y}_{c}^{B}(lj) with \hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c}) or \hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c}), where \overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A and \overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B. A confidence interval for any parameter of interest, Y, can then be calculated using the pivotal method.
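The stratified (frame B) term of the formula sums one jackknife term per stratum, each centred on the replicate mean of its own stratum. A minimal base R sketch follows, under stated assumptions: `jack_strat_term`, `strat_total` and the toy data are hypothetical, and the estimator is a simple stratified expansion total rather than the MLCSW estimator.

```r
# Stratified delete-one jackknife: replicates are centred within each stratum
# and weighted by (n_l - 1) / n_l before being summed over strata.
jack_strat_term <- function(strata, estimator) {
  n <- length(strata)
  # Replicate estimate obtained after dropping unit j (whatever its stratum)
  reps <- vapply(seq_len(n),
                 function(j) estimator(setdiff(seq_len(n), j)),
                 numeric(1))
  per_stratum <- tapply(reps, strata, function(r) {
    n_l <- length(r)
    (n_l - 1) / n_l * sum((r - mean(r))^2)
  })
  sum(per_stratum)
}

# Toy data (made up): two strata with known sizes N_l
y      <- c(2, 4, 6, 10, 14, 12, 16)
strata <- c(1, 1, 1, 2, 2, 2, 2)
N_l    <- c("1" = 30, "2" = 50)

# Stratified expansion total: sum over strata of N_l * stratum mean
strat_total <- function(idx) {
  means <- tapply(y[idx], strata[idx], mean)
  sum(N_l[names(means)] * means)
}

v_B <- jack_strat_term(strata, strat_total)
```

Dropping a unit only changes the mean of its own stratum, so for this estimator the result equals the sum over strata of N_l^2 * s_l^2 / n_l.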

Value

A numeric matrix containing estimates of the population total and population mean and their corresponding confidence intervals obtained through the jackknife method.

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

MLCSW

Examples

data(DatMA)
data(DatMB)
data(DatPopM)

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
N_FrameA <- nrow(DatPopM[DatPopM$Domain == "a" | DatPopM$Domain == "ab",])
N_FrameB <- nrow(DatPopM[DatPopM$Domain == "b" | DatPopM$Domain == "ab",])


# Obtain a 95% jackknife confidence interval for variable Prog,
# assuming pps sampling in frame A and simple random sampling
# without replacement in frame B, with no finite population correction
# factor in either frame.
JackMLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, 
DatMB$ProbA, DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, 
IndSample, N_FrameA, N_FrameB, conf_level = 0.95, sdA = "pps", sdB = "srs")

Confidence intervals for MLDF estimator based on jackknife method

Description

Calculates confidence intervals for the MLDF estimator using the jackknife procedure.

Usage

JackMLDF (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, xA, xB, ind_samA, 
ind_samB, ind_domA, ind_domB, N, conf_level, sdA = "srs", sdB = "srs", strA = NULL, 
strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)

Arguments

ysA

A data frame containing information about one or more factors, each one of length n_A, collected from s_A.

ysB

A data frame containing information about one or more factors, each one of length n_B, collected from s_B.

pik_A

A numeric vector of length n_A containing first order inclusion probabilities for units included in s_A.

pik_B

A numeric vector of length n_B containing first order inclusion probabilities for units included in s_B.

domains_A

A character vector of length n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of length n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m, with m the number of auxiliary variables, containing auxiliary information in frame A for units included in s_A.

xsB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m, with m the number of auxiliary variables, containing auxiliary information in frame B for units included in s_B.

xA

A numeric vector of length N_A or a numeric matrix or data frame of dimensions N_A x m_A, with m_A the number of auxiliary variables in frame A, containing auxiliary information for the units in frame A.

xB

A numeric vector of length N_B or a numeric matrix or data frame of dimensions N_B x m_B, with m_B the number of auxiliary variables in frame B, containing auxiliary information for the units in frame B.

ind_samA

A numeric vector of length n_A containing the identifiers of the units of frame A (from 1 to N_A) that belong to s_A.

ind_samB

A numeric vector of length n_B containing the identifiers of the units of frame B (from 1 to N_B) that belong to s_B.

ind_domA

A character vector of length N_A indicating the domain each unit from frame A belongs to. Possible values are "a" and "ab".

ind_domB

A character vector of length N_B indicating the domain each unit from frame B belongs to. Possible values are "b" and "ba".

N

A numeric value indicating the size of the population.

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where frame B has been divided into L strata and a sample of size n_{Bl} is selected from the N_{Bl} units composing the l-th stratum. In this context, the jackknife variance estimator of an estimator \hat{Y}_c is given by

v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2

with \hat{Y}_c^A(i) the value of estimator \hat{Y}_c after dropping the i-th unit from ysA and \overline{Y}_{c}^{A} the mean of the values \hat{Y}_c^A(i). Similarly, \hat{Y}_c^B(lj) is the value taken by \hat{Y}_c after dropping the j-th unit of the l-th stratum from sample ysB and \overline{Y}_{c}^{Bl} is the mean of the values \hat{Y}_c^B(lj). If needed, a finite population correction factor can be included in either frame by replacing \hat{Y}_{c}^{A}(i) or \hat{Y}_{c}^{B}(lj) with \hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c}) or \hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c}), where \overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A and \overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B. A confidence interval for any parameter of interest, Y, can then be calculated using the pivotal method.
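The finite population correction described above shrinks each replicate towards the full-sample estimate by a factor sqrt(1 - mean(pik)). The base R sketch below illustrates that substitution on made-up data; `fpc_adjust` is a hypothetical helper and the expansion total stands in for the actual estimator, so this should be read as an illustration of the formula rather than of the package internals.

```r
# Replace each replicate by  full + sqrt(1 - mean(pik)) * (replicate - full)
fpc_adjust <- function(reps, full_estimate, pik) {
  full_estimate + sqrt(1 - mean(pik)) * (reps - full_estimate)
}

# Toy data (made up): srs of n = 6 units from N = 24, so pik = n / N
y   <- c(3, 7, 4, 9, 5, 8)
n   <- length(y)
N   <- 24
pik <- rep(n / N, n)

full <- N * mean(y)                   # full-sample expansion total
reps <- N * (sum(y) - y) / (n - 1)    # delete-one replicates of that total

# Jackknife term computed from a vector of replicates
jack_var <- function(r) (length(r) - 1) / length(r) * sum((r - mean(r))^2)

v_plain <- jack_var(reps)
v_fpc   <- jack_var(fpc_adjust(reps, full, pik))
```

Because the adjustment rescales every deviation by the same factor, the corrected term equals (1 - mean(pik)) times the uncorrected one.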

Value

A numeric matrix containing estimates of the population total and population mean and their corresponding confidence intervals obtained through the jackknife method.

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

MLDF

Examples

data(DatMA)
data(DatMB)
data(DatPopM)

N <- nrow(DatPopM)
# Add "ba" as a valid level so overlap units can be relabelled for frame B
levels(DatPopM$Domain) <- c(levels(DatPopM$Domain), "ba")
# Split the population into its two frames
DatPopMA <- subset(DatPopM, Domain == "a" | Domain == "ab")
DatPopMB <- subset(DatPopM, Domain == "b" | Domain == "ab")
# Overlap units are labelled "ba" when seen from frame B
DatPopMB$Domain[DatPopMB$Domain == "ab"] <- "ba"


# Obtain a 95% jackknife confidence interval for variable Prog,
# assuming pps sampling in frame A and simple random sampling
# without replacement in frame B, with no finite population correction
# factor in either frame.
JackMLDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, 
DatMB$Domain, DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read, 
DatMA$Id_Frame, DatMB$Id_Frame, DatPopMA$Domain, DatPopMB$Domain, N, 0.95, 
"pps", "srs")

Confidence intervals for MLDW estimator based on jackknife method

Description

Calculates confidence intervals for the MLDW estimator using the jackknife procedure.

Usage

JackMLDW (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, x, 
ind_sam, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, 
clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)

Arguments

ysA

A data frame containing information about one or more factors, each one of length n_A, collected from s_A.

ysB

A data frame containing information about one or more factors, each one of length n_B, collected from s_B.

pik_A

A numeric vector of length n_A containing first order inclusion probabilities for units included in s_A.

pik_B

A numeric vector of length n_B containing first order inclusion probabilities for units included in s_B.

domains_A

A character vector of length n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of length n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m, with m the number of auxiliary variables, containing auxiliary information in frame A for units included in s_A.

xsB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m, with m the number of auxiliary variables, containing auxiliary information in frame B for units included in s_B.

x

A numeric vector of length N or a numeric matrix or data frame of dimensions N x m, with m the number of auxiliary variables, containing auxiliary information for every unit in the population.

ind_sam

A numeric vector of length n = n_A + n_B containing the identifiers of the population units (from 1 to N) that belong to s_A or s_B.

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where frame B has been divided into L strata and a sample of size n_{Bl} is selected from the N_{Bl} units composing the l-th stratum. In this context, the jackknife variance estimator of an estimator \hat{Y}_c is given by

v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2

with \hat{Y}_c^A(i) the value of estimator \hat{Y}_c after dropping the i-th unit from ysA and \overline{Y}_{c}^{A} the mean of the values \hat{Y}_c^A(i). Similarly, \hat{Y}_c^B(lj) is the value taken by \hat{Y}_c after dropping the j-th unit of the l-th stratum from sample ysB and \overline{Y}_{c}^{Bl} is the mean of the values \hat{Y}_c^B(lj). If needed, a finite population correction factor can be included in either frame by replacing \hat{Y}_{c}^{A}(i) or \hat{Y}_{c}^{B}(lj) with \hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c}) or \hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c}), where \overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A and \overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B. A confidence interval for any parameter of interest, Y, can then be calculated using the pivotal method.
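Once a jackknife variance is available, the pivotal method turns it into an interval. The sketch below assumes the pivot (estimate minus parameter, divided by the jackknife standard error) is treated as approximately standard normal; the text does not state the reference distribution, so this choice, the helper name `pivotal_ci` and the input values are assumptions made for illustration.

```r
# Symmetric interval: estimate -/+ z * sqrt(v_jack), with z the
# (1 + conf_level) / 2 quantile of the standard normal distribution.
pivotal_ci <- function(estimate, v_jack, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  half_width <- z * sqrt(v_jack)
  c(lower = estimate - half_width, upper = estimate + half_width)
}

# Made-up inputs: an estimated total of 600 with jackknife variance 400
ci <- pivotal_ci(estimate = 600, v_jack = 400, conf_level = 0.95)
```

The interval is centred on the point estimate, and its width grows with both the jackknife standard error and the requested confidence level.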

Value

A numeric matrix containing estimates of the population total and population mean and their corresponding confidence intervals obtained through the jackknife method.

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

MLDW

Examples

data(DatMA)
data(DatMB)
data(DatPopM)

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)


# Obtain a 95% jackknife confidence interval for variable Prog,
# assuming pps sampling in frame A and simple random sampling
# without replacement in frame B, with no finite population correction
# factor in either frame.
JackMLDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, 
DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, 0.95, 
"pps", "srs")

Confidence intervals for MLSW estimator based on jackknife method

Description

Calculates confidence intervals for the MLSW estimator using the jackknife procedure.

Usage

JackMLSW (ysA, ysB, pik_A, pik_B, pik_ab_B, pik_ba_A, domains_A, 
 domains_B, xsA, xsB, x, ind_sam, conf_level, sdA = "srs", sdB = "srs", 
 strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)

Arguments

ysA

A data frame containing information about one or more factors, each one of length n_A, collected from s_A.

ysB

A data frame containing information about one or more factors, each one of length n_B, collected from s_B.

pik_A

A numeric vector of length n_A containing first order inclusion probabilities for units included in s_A.

pik_B

A numeric vector of length n_B containing first order inclusion probabilities for units included in s_B.

pik_ab_B

A numeric vector of length n_A containing first order inclusion probabilities according to the sampling design in frame B for units belonging to the overlap domain that have been selected in s_A.

pik_ba_A

A numeric vector of length n_B containing first order inclusion probabilities according to the sampling design in frame A for units belonging to the overlap domain that have been selected in s_B.

domains_A

A character vector of length n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of length n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m, with m the number of auxiliary variables, containing auxiliary information in frame A for units included in s_A.

xsB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m, with m the number of auxiliary variables, containing auxiliary information in frame B for units included in s_B.

x

A numeric vector of length N or a numeric matrix or data frame of dimensions N x m, with m the number of auxiliary variables, containing auxiliary information for every unit in the population.

ind_sam

A numeric vector of length n = n_A + n_B containing the identifiers of the population units (from 1 to N) that belong to s_A or s_B.

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where frame B has been divided into L strata and a sample of size n_{Bl} is selected from the N_{Bl} units composing the l-th stratum. In this context, the jackknife variance estimator of an estimator \hat{Y}_c is given by

v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2

with \hat{Y}_c^A(i) the value of estimator \hat{Y}_c after dropping the i-th unit from ysA and \overline{Y}_{c}^{A} the mean of the values \hat{Y}_c^A(i). Similarly, \hat{Y}_c^B(lj) is the value taken by \hat{Y}_c after dropping the j-th unit of the l-th stratum from sample ysB and \overline{Y}_{c}^{Bl} is the mean of the values \hat{Y}_c^B(lj). If needed, a finite population correction factor can be included in either frame by replacing \hat{Y}_{c}^{A}(i) or \hat{Y}_{c}^{B}(lj) with \hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c}) or \hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c}), where \overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A and \overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B. A confidence interval for any parameter of interest, Y, can then be calculated using the pivotal method.
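In a dual frame setting each replicate drops one unit from one sample while the other sample is left whole, and the two frame terms are then added. The base R sketch below shows that mechanism with a single stratum in frame B (L = 1). Everything in it is an assumption made for illustration: `dual_total` is a hypothetical Hartley-type estimator with a fixed weight `theta` (not the MLSW estimator), the data are made up, and rescaling the retained weights by n / (n - 1) is one common convention that may differ from what the package does.

```r
# Hypothetical Hartley-type dual frame total with a fixed weight `theta`;
# it only serves as a concrete estimator to jackknife.
dual_total <- function(yA, wA, domA, yB, wB, domB, theta = 0.5) {
  cA <- ifelse(domA == "ab", theta, 1)      # down-weight overlap units from s_A
  cB <- ifelse(domB == "ba", 1 - theta, 1)  # complementary weight for s_B
  sum(cA * wA * yA) + sum(cB * wB * yB)
}

# Toy samples (made up); weights are inverse inclusion probabilities
yA <- c(5, 8, 6, 9); wA <- rep(10, 4); domA <- c("a", "a", "ab", "ab")
yB <- c(7, 4, 6);    wB <- rep(12, 3); domB <- c("b", "ba", "ba")
nA <- length(yA); nB <- length(yB)

# Replicates: drop one unit from one sample, rescale that sample's weights
# by n / (n - 1), and keep the other sample untouched.
repsA <- vapply(seq_len(nA), function(i)
  dual_total(yA[-i], wA[-i] * nA / (nA - 1), domA[-i], yB, wB, domB),
  numeric(1))
repsB <- vapply(seq_len(nB), function(j)
  dual_total(yA, wA, domA, yB[-j], wB[-j] * nB / (nB - 1), domB[-j]),
  numeric(1))

# Frame A term plus frame B term
v_J <- (nA - 1) / nA * sum((repsA - mean(repsA))^2) +
       (nB - 1) / nB * sum((repsB - mean(repsB))^2)
```

For a linear estimator like this one, each frame term reduces to n times the sample variance of the weighted contributions, which is a convenient check on the replicate loop.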

Value

A numeric matrix containing estimates of the population total and population mean and their corresponding confidence intervals obtained through the jackknife method.

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

MLSW

Examples

data(DatMA)
data(DatMB)
data(DatPopM)

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)


# Obtain a 95% jackknife confidence interval for variable Prog,
# assuming pps sampling in frame A and simple random sampling
# without replacement in frame B, with no finite population correction
# factor in either frame.
JackMLSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, 
DatMB$ProbA, DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, 
IndSample, 0.95, "pps", "srs")

Confidence intervals for the pseudo empirical likelihood estimator based on jackknife method

Description

Calculates confidence intervals for the pseudo empirical likelihood estimator using the jackknife procedure.

Usage

JackPEL(ysA, ysB, piA, piB, domainsA, domainsB, N_A = NULL, N_B = NULL, 
N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, 
XA = NULL, XB = NULL, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, 
clusA = NULL,clusB = NULL, fcpA = FALSE, fcpB = FALSE)

Arguments

ysA

A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A$ x $c$ containing information about variable of interest from $s_A$.

ysB

A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B$ x $c$ containing information about variable of interest from $s_B$.

piA

A numeric vector of length $n_A$ or a square numeric matrix of dimension $n_A$ containing first order or first and second order inclusion probabilities for units included in $s_A$.

piB

A numeric vector of length $n_B$ or a square numeric matrix of dimension $n_B$ containing first order or first and second order inclusion probabilities for units included in $s_B$.

domainsA

A character vector of size $n_A$ indicating the domain each unit from $s_A$ belongs to. Possible values are "a" and "ab".

domainsB

A character vector of size $n_B$ indicating the domain each unit from $s_B$ belongs to. Possible values are "b" and "ba".

N_A

(Optional) A numeric value indicating the size of frame A.

N_B

(Optional) A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

xsAFrameA

(Optional) A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A$ x $m_A$, with $m_A$ the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in $s_A$.

xsBFrameA

(Optional) A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B$ x $m_A$, with $m_A$ the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in $s_B$. For units in domain $b$, these values are 0.

xsAFrameB

(Optional) A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A$ x $m_B$, with $m_B$ the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in $s_A$. For units in domain $a$, these values are 0.

xsBFrameB

(Optional) A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B$ x $m_B$, with $m_B$ the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in $s_B$.

XA

(Optional) A numeric value or vector of length $m_A$, with $m_A$ the number of auxiliary variables in frame A, indicating the population totals for the auxiliary variables considered in frame A.

XB

(Optional) A numeric value or vector of length $m_B$, with $m_B$ the number of auxiliary variables in frame B, indicating the population totals for the auxiliary variables considered in frame B.

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum. In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2$$

with $\hat{Y}_c^A(i)$ the value of the estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_{c}^{A}$ the mean of the values $\hat{Y}_c^A(i)$. Similarly, $\hat{Y}_c^B(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_{c}^{Bl}$ is the mean of the values $\hat{Y}_c^B(lj)$. If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_{c}^{A}(i)$ or $\hat{Y}_{c}^{B}(lj)$ with $\hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c})$ or $\hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c})$, where $\overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A$ and $\overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B$. A confidence interval for any parameter of interest $Y$ can then be calculated using the pivotal method.
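The variance formula above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used by JackPEL: `jackknife_variance` and its arguments are hypothetical names, the estimator is passed in as an arbitrary function of the two samples, and any reweighting of the remaining units after a deletion is assumed to happen inside that function.

```r
# Hypothetical helper (not part of Frames2): delete-one jackknife variance
# for a non-stratified sample in frame A and a stratified sample in frame B.
# `estimator` is any function(yA, yB) returning a single number.
# `pibarA`, `pibarB` are mean inclusion probabilities (0 = no correction).
jackknife_variance <- function(estimator, yA, yB, strB,
                               pibarA = 0, pibarB = 0) {
  full <- estimator(yA, yB)
  nA <- length(yA)

  # Frame A: drop one unit at a time, then apply the optional correction
  repA <- vapply(seq_len(nA), function(i) estimator(yA[-i], yB), numeric(1))
  repA <- full + sqrt(1 - pibarA) * (repA - full)
  vA <- (nA - 1) / nA * sum((repA - mean(repA))^2)

  # Frame B: same idea, stratum by stratum
  vB <- 0
  for (l in unique(strB)) {
    idx <- which(strB == l)
    nBl <- length(idx)
    repB <- vapply(idx, function(j) estimator(yA, yB[-j]), numeric(1))
    repB <- full + sqrt(1 - pibarB) * (repB - full)
    vB <- vB + (nBl - 1) / nBl * sum((repB - mean(repB))^2)
  }
  vA + vB
}

# Toy data and a toy estimator, for illustration only
yA <- c(3, 5, 4, 6, 2)
yB <- c(10, 12, 9, 20, 22, 19)
strB <- c(1, 1, 1, 2, 2, 2)
toy_total <- function(yA, yB) 50 * mean(yA) + 30 * mean(yB)
print(jackknife_variance(toy_total, yA, yB, strB))
```

Passing the estimator as a function keeps the sketch independent of any particular dual frame estimator; only the resampling scheme is illustrated.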

Value

A numeric matrix containing estimates of the population total and population mean and their corresponding confidence intervals obtained through the jackknife method.

References

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

PEL


Confidence intervals for the pseudo maximum likelihood estimator based on jackknife method

Description

Calculates confidence intervals for the pseudo maximum likelihood estimator using the jackknife procedure.

Usage

JackPML(ysA, ysB, piA, piB, domainsA, domainsB, N_A, N_B, conf_level, 
sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL,  
fcpA = FALSE, fcpB = FALSE)

Arguments

ysA

A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A$ x $c$ containing information about variable of interest from $s_A$.

ysB

A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B$ x $c$ containing information about variable of interest from $s_B$.

piA

A numeric vector of length $n_A$ or a square numeric matrix of dimension $n_A$ containing first order or first and second order inclusion probabilities for units included in $s_A$.

piB

A numeric vector of length $n_B$ or a square numeric matrix of dimension $n_B$ containing first order or first and second order inclusion probabilities for units included in $s_B$.

domainsA

A character vector of size $n_A$ indicating the domain each unit from $s_A$ belongs to. Possible values are "a" and "ab".

domainsB

A character vector of size $n_B$ indicating the domain each unit from $s_B$ belongs to. Possible values are "b" and "ba".

N_A

A numeric value indicating the size of frame A.

N_B

A numeric value indicating the size of frame B.

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum. In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2$$

with $\hat{Y}_c^A(i)$ the value of the estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_{c}^{A}$ the mean of the values $\hat{Y}_c^A(i)$. Similarly, $\hat{Y}_c^B(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_{c}^{Bl}$ is the mean of the values $\hat{Y}_c^B(lj)$. If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_{c}^{A}(i)$ or $\hat{Y}_{c}^{B}(lj)$ with $\hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c})$ or $\hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c})$, where $\overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A$ and $\overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B$. A confidence interval for any parameter of interest $Y$ can then be calculated using the pivotal method.
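As a sanity check on the form of each term, consider the simplest possible case, which is an illustration and not taken from the package: a single sample $s$ of size $n$ with the sample mean $\bar{y}$ as the estimator. The delete-one values and their deviations can then be written in closed form, and the jackknife term reduces to the usual variance estimator of a mean, $s_y^2/n$.

```latex
\begin{aligned}
\hat{Y}(i) &= \frac{n\bar{y} - y_i}{n-1}, \qquad
\overline{Y} = \frac{1}{n}\sum_{i \in s}\hat{Y}(i) = \bar{y}, \qquad
\hat{Y}(i) - \overline{Y} = \frac{\bar{y} - y_i}{n-1}, \\
\frac{n-1}{n}\sum_{i \in s}\left(\hat{Y}(i) - \overline{Y}\right)^2
  &= \frac{n-1}{n}\cdot\frac{\sum_{i \in s}(y_i - \bar{y})^2}{(n-1)^2}
   = \frac{1}{n}\cdot\frac{1}{n-1}\sum_{i \in s}(y_i - \bar{y})^2
   = \frac{s_y^2}{n}.
\end{aligned}
```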

Value

A numeric matrix containing estimates of the population total and population mean and their corresponding confidence intervals obtained through the jackknife method.

References

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

PML

Examples

data(DatA)
data(DatB)

#Let us obtain a 95% jackknife confidence interval for variable Leisure,
#supposing a stratified sampling in frame A and a simple random sampling
#without replacement in frame B with no finite population correction
#factor in any frame.
JackPML(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$Domain, 
DatB$Domain, 1735, 1191, 0.95, "str", "srs", strA = DatA$Stratum)

#Let us check how interval estimation varies when a finite 
#population correction factor is considered in both frames.
JackPML(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$Domain, 
DatB$Domain, 1735, 1191, 0.95, "str", "srs", strA = DatA$Stratum,
fcpA = TRUE, fcpB = TRUE)

Confidence intervals for raking ratio estimator based on jackknife method

Description

Calculates confidence intervals for the raking ratio estimator using the jackknife procedure.

Usage

JackSFRR(ysA, ysB, piA, piB, pik_ab_B, pik_ba_A, domainsA, domainsB, N_A, 
N_B, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL,   
clusB = NULL, fcpA = FALSE, fcpB = FALSE)

Arguments

ysA

A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A$ x $c$ containing information about variable of interest from $s_A$.

ysB

A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B$ x $c$ containing information about variable of interest from $s_B$.

piA

A numeric vector of length $n_A$ or a square numeric matrix of dimension $n_A$ containing first order or first and second order inclusion probabilities for units included in $s_A$.

piB

A numeric vector of length $n_B$ or a square numeric matrix of dimension $n_B$ containing first order or first and second order inclusion probabilities for units included in $s_B$.

pik_ab_B

A numeric vector of size $n_A$ containing first order inclusion probabilities according to sampling design in frame B for units belonging to overlap domain that have been selected in $s_A$.

pik_ba_A

A numeric vector of size $n_B$ containing first order inclusion probabilities according to sampling design in frame A for units belonging to overlap domain that have been selected in $s_B$.

domainsA

A character vector of size $n_A$ indicating the domain each unit from $s_A$ belongs to. Possible values are "a" and "ab".

domainsB

A character vector of size $n_B$ indicating the domain each unit from $s_B$ belongs to. Possible values are "b" and "ba".

N_A

A numeric value indicating the size of frame A.

N_B

A numeric value indicating the size of frame B.

conf_level

A numeric value indicating the confidence level for the confidence intervals.

sdA

(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

sdB

(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs".

strA

(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A.

strB

(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B.

clusA

(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A.

clusB

(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B.

fcpA

(Optional) A logical value indicating if a finite population correction factor should be considered in frame A. Default is FALSE.

fcpB

(Optional) A logical value indicating if a finite population correction factor should be considered in frame B. Default is FALSE.

Details

Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum. In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_{A}-1}{n_{A}}\sum_{i\in s_A} (\hat{Y}_{c}^{A}(i) -\overline{Y}_{c}^{A})^2 + \sum_{l=1}^{L}\frac{n_{Bl}-1}{n_{Bl}} \sum_{j\in s_{Bl}} (\hat{Y}_{c}^{B}(lj) -\overline{Y}_{c}^{Bl})^2$$

with $\hat{Y}_c^A(i)$ the value of the estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_{c}^{A}$ the mean of the values $\hat{Y}_c^A(i)$. Similarly, $\hat{Y}_c^B(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_{c}^{Bl}$ is the mean of the values $\hat{Y}_c^B(lj)$. If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_{c}^{A}(i)$ or $\hat{Y}_{c}^{B}(lj)$ with $\hat{Y}_{c}^{A*}(i)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_A} (\hat{Y}_{c}^{A}(i) -\hat{Y}_{c})$ or $\hat{Y}_{c}^{B*}(lj)= \hat{Y}_{c}+\sqrt{1-\overline{\pi}_B} (\hat{Y}_{c}^{B}(lj) -\hat{Y}_{c})$, where $\overline{\pi}_A = \sum_{i \in s_A}\pi_{iA}/n_A$ and $\overline{\pi}_B = \sum_{j \in s_B}\pi_{jB}/n_B$. A confidence interval for any parameter of interest $Y$ can then be calculated using the pivotal method.
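The effect of the finite population correction described above can be seen in a small base R sketch. The helper names `fpc_replicates` and `jack_term`, and the toy numbers, are hypothetical and not part of Frames2; the point is only that shrinking the delete-one values towards the full-sample estimate multiplies the corresponding variance term by one minus the mean inclusion probability.

```r
# Hypothetical helper (not part of Frames2): shrink delete-one replicates
# towards the full-sample estimate by sqrt(1 - mean inclusion probability).
fpc_replicates <- function(replicates, full_estimate, incl_prob) {
  pi_bar <- mean(incl_prob)               # sum of inclusion probabilities / n
  full_estimate + sqrt(1 - pi_bar) * (replicates - full_estimate)
}

# Contribution of one (non-stratified) frame to the jackknife variance
jack_term <- function(replicates) {
  n <- length(replicates)
  (n - 1) / n * sum((replicates - mean(replicates))^2)
}

reps      <- c(98, 101, 103, 99, 100)   # toy delete-one estimates
incl_prob <- rep(0.2, 5)                # toy inclusion probabilities
full      <- 100                        # toy full-sample estimate

v_plain <- jack_term(reps)
v_fpc   <- jack_term(fpc_replicates(reps, full, incl_prob))
print(c(plain = v_plain, fpc = v_fpc, ratio = v_fpc / v_plain))
```

With a mean inclusion probability of 0.2, the corrected term is 0.8 times the uncorrected one, so the resulting confidence interval is narrower.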

Value

A numeric matrix containing estimates of the population total and population mean and their corresponding confidence intervals obtained through the jackknife method.

References

Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.

See Also

SFRR

Examples

data(DatA)
data(DatB) 

#Let us obtain a 95% jackknife confidence interval for variable Leisure,
#supposing a stratified sampling in frame A and a simple random sampling
#without replacement in frame B with no finite population correction 
#factor in any frame.
JackSFRR(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$ProbB, 
DatB$ProbA, DatA$Domain, DatB$Domain, 1735, 1191, 0.95, "str", "srs",
strA = DatA$Stratum)

#Let us check how interval estimation varies when a finite 
#population correction factor is considered in both frames.
JackSFRR(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$ProbB, 
DatB$ProbA, DatA$Domain, DatB$Domain, 1735, 1191, 0.95, "str", "srs", 
strA = DatA$Stratum, fcpA = TRUE, fcpB = TRUE)

Multinomial logistic calibration estimator under dual frame approach with auxiliary information from each frame

Description

Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design using a model calibrated dual frame approach with a possibly different set of auxiliary variables for each frame. Confidence intervals are also computed, if required.

Usage

MLCDF (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, xA, xB, ind_samA, 
 ind_samB, ind_domA, ind_domB, N, N_ab = NULL, met = "linear", conf_level = NULL)

Arguments

ysA

A data frame containing information about one or more factors, each one of dimension $n_A$, collected from $s_A$.

ysB

A data frame containing information about one or more factors, each one of dimension $n_B$, collected from $s_B$.

pik_A

A numeric vector of length $n_A$ containing first order inclusion probabilities for units included in $s_A$.

pik_B

A numeric vector of length $n_B$ containing first order inclusion probabilities for units included in $s_B$.

domains_A

A character vector of size $n_A$ indicating the domain each unit from $s_A$ belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size $n_B$ indicating the domain each unit from $s_B$ belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A$ x $m_A$, with $m_A$ the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in $s_A$.

xsB

A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B$ x $m_B$, with $m_B$ the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in $s_B$.

xA

A numeric vector of length $N_A$ or a numeric matrix or data frame of dimensions $N_A$ x $m_A$, with $m_A$ the number of auxiliary variables in frame A, containing auxiliary information for the units in frame A.

xB

A numeric vector of length $N_B$ or a numeric matrix or data frame of dimensions $N_B$ x $m_B$, with $m_B$ the number of auxiliary variables in frame B, containing auxiliary information for the units in frame B.

ind_samA

A numeric vector of length $n_A$ containing the identifiers of the units of frame A (from 1 to $N_A$) that belong to $s_A$.

ind_samB

A numeric vector of length $n_B$ containing the identifiers of the units of frame B (from 1 to $N_B$) that belong to $s_B$.

ind_domA

A character vector of length $N_A$ indicating the domain each unit from frame A belongs to. Possible values are "a" and "ab".

ind_domB

A character vector of length $N_B$ indicating the domain each unit from frame B belongs to. Possible values are "b" and "ba".

N

A numeric value indicating the size of the population.

N_ab

(Optional) A numeric value indicating the size of the overlap domain

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

The multinomial logistic calibration estimator in dual frame using auxiliary information from each frame is given, for a proportion, by

$$\hat{P}_{MLCi}^{DF} = \frac{1}{N} \left(\sum_{k \in s_A \cup s_B} w_k^{\circ} z_{ki}\right), \hspace{0.3cm} i = 1,...,m$$

with $m$ the number of categories of the response variable, $z_i$ the indicator variable for the $i$-th category of the response variable, and $w^{\circ}$ the calibration weights, which are calculated taking into account a different set of constraints depending on the case. For instance, if $N_A$, $N_B$ and $N_{ab}$ are known, the calibration constraints are

$$\sum_{k \in s_a}w_k^{\circ} = N_a, \quad \sum_{k \in s_{ab}}w_k^{\circ} = \eta N_{ab}, \quad \sum_{k \in s_{ba}}w_k^{\circ} = (1 - \eta) N_{ab}, \quad \sum_{k \in s_{b}}w_k^{\circ} = N_{b},$$

$$\sum_{k \in s_A}w_k^\circ p_{ki}^A = \sum_{k \in U_a} p_{ki}^A + \eta \sum_{k \in U_{ab}} p_{ki}^A$$

and

$$\sum_{k \in s_B}w_k^\circ p_{ki}^B = \sum_{k \in U_b} p_{ki}^B + (1 - \eta) \sum_{k \in U_{ba}} p_{ki}^B$$

with $\eta \in (0,1)$ and

$$p_{ki}^A = \frac{\exp(x_k^{'}\beta_i^A)}{\sum_{r=1}^m \exp(x_k^{'}\beta_r^A)},$$

where $\beta_i^A$ are the maximum likelihood parameters of the multinomial logistic model considering the original design weights $d^A$. $p_{ki}^B$ can be defined similarly.
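The probabilities $p_{ki}^A$ defined above are a softmax of linear predictors, which the following base R sketch makes concrete. It is illustrative only: `multinomial_probs` is a hypothetical helper that is not part of Frames2, the coefficient matrix is assumed to be already fitted, and the toy coefficients (with the first category set to zero as a reference) are made up.

```r
# Hypothetical helper (not part of Frames2):
# p_ki = exp(x_k' beta_i) / sum_r exp(x_k' beta_r)
# x:    n x q matrix of auxiliary values (one row per unit)
# beta: q x m matrix of coefficients (one column per category)
multinomial_probs <- function(x, beta) {
  eta <- x %*% beta                  # n x m matrix of linear predictors
  eta <- eta - apply(eta, 1, max)    # row-wise shift; ratios are unchanged
  expo <- exp(eta)
  expo / rowSums(expo)               # each row now sums to 1
}

# Toy inputs: intercept plus one auxiliary variable, three categories
x <- cbind(1, c(0, 1, 2))
beta <- cbind(c(0, 0), c(0.5, -1), c(-0.2, 0.3))
p <- multinomial_probs(x, beta)
print(round(p, 3))
```

Subtracting the row maximum before exponentiating is a numerical safeguard only and does not change the resulting probabilities.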

Value

MLCDF returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

estimates of class frequencies and proportions for the main variable(s).

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

See Also

JackMLCDF

Examples

data(DatMA)
data(DatMB)
data(DatPopM) 

N <- nrow(DatPopM)
levels(DatPopM$Domain) <- c(levels(DatPopM$Domain), "ba")
DatPopMA <- subset(DatPopM, DatPopM$Domain == "a" | DatPopM$Domain == "ab", stringAsFactors = FALSE)
DatPopMB <- subset(DatPopM, DatPopM$Domain == "b" | DatPopM$Domain == "ab", stringAsFactors = FALSE)
DatPopMB[DatPopMB$Domain == "ab",]$Domain <- "ba"

#Let us calculate proportions of categories of variable Prog using MLCDF estimator
#using Read as auxiliary variable
MLCDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain, 
DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read, DatMA$Id_Frame, DatMB$Id_Frame, 
DatPopMA$Domain, DatPopMB$Domain, N)

#Let us obtain 95% confidence intervals together with the estimations
MLCDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain, 
DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read, DatMA$Id_Frame, DatMB$Id_Frame, 
DatPopMA$Domain, DatPopMB$Domain, N, conf_level = 0.95)

Multinomial logistic calibration estimator under dual frame approach with auxiliary information from the whole population

Description

Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design using a model calibrated dual frame approach with auxiliary information from the whole population. Confidence intervals are also computed, if required.

Usage

MLCDW (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, x, ind_sam, N_A, 
 N_B, N_ab = NULL, met = "linear", conf_level = NULL)

Arguments

ysA

A data frame containing information about one or more factors, each one of dimension $n_A$, collected from $s_A$.

ysB

A data frame containing information about one or more factors, each one of dimension $n_B$, collected from $s_B$.

pik_A

A numeric vector of length $n_A$ containing first order inclusion probabilities for units included in $s_A$.

pik_B

A numeric vector of length $n_B$ containing first order inclusion probabilities for units included in $s_B$.

domains_A

A character vector of size $n_A$ indicating the domain each unit from $s_A$ belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size $n_B$ indicating the domain each unit from $s_B$ belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A$ x $m$, with $m$ the number of auxiliary variables, containing auxiliary information in frame A for units included in $s_A$.

xsB

A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B$ x $m$, with $m$ the number of auxiliary variables, containing auxiliary information in frame B for units included in $s_B$.

x

A numeric vector of length $N$ or a numeric matrix or data frame of dimensions $N$ x $m$, with $m$ the number of auxiliary variables, containing auxiliary information for every unit in the population.

ind_sam

A numeric vector of length $n = n_A + n_B$ containing the identifiers of the units of the population (from 1 to $N$) that belong to $s_A$ or $s_B$.

N_A

A numeric value indicating the size of frame A.

N_B

A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

The multinomial logistic calibration estimator in dual frame using auxiliary information from the whole population is given, for a proportion, by

$$\hat{P}_{MLCi}^{DW} = \frac{1}{N} \left(\sum_{k \in s_A \cup s_B} w_k^{\circ} z_{ki}\right), \hspace{0.3cm} i = 1,...,m$$

with $m$ the number of categories of the response variable, $z_i$ the indicator variable for the $i$-th category of the response variable, and $w^{\circ}$ the calibration weights, which are calculated taking into account a different set of constraints depending on the case. For instance, if $N_A$, $N_B$ and $N_{ab}$ are known, the calibration constraints are

$$\sum_{k \in s_a}w_k^{\circ} = N_a, \quad \sum_{k \in s_{ab}}w_k^{\circ} = \eta N_{ab}, \quad \sum_{k \in s_{ba}}w_k^{\circ} = (1 - \eta) N_{ab}, \quad \sum_{k \in s_{b}}w_k^{\circ} = N_{b}$$

and

$$\sum_{k \in s_A \cup s_B}w_k^\circ p_{ki}^{\circ} = \sum_{k \in U} p_{ki}^\circ$$

with $\eta \in (0,1)$ and

$$p_{ki}^{\circ} = \frac{\exp(x_k^{'}\beta_i^{\circ})}{\sum_{r=1}^m \exp(x_k^{'}\beta_r^{\circ})},$$

where $\beta_i^\circ$ are the maximum likelihood parameters of the multinomial logistic model considering the weights

$$d_k^{\circ} =\left\{\begin{array}{ll} d_k^A & \textrm{if } k \in a\\ \eta d_k^A & \textrm{if } k \in ab\\ (1 - \eta) d_k^B & \textrm{if } k \in ba \\ d_k^B & \textrm{if } k \in b \end{array} \right.$$
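The piecewise weights $d_k^{\circ}$ can be illustrated with a short base R sketch. `dual_frame_weights` is a hypothetical helper that is not part of Frames2; it assumes that `d` already holds the design weight of each sampled unit in the frame it was drawn from ($d_k^A$ for units of $s_A$, $d_k^B$ for units of $s_B$) and that `eta` is chosen by the user.

```r
# Hypothetical helper (not part of Frames2): weights d_k^o by domain.
# d:      design weights in the frame each unit was sampled from
# domain: "a" or "ab" for units of s_A, "ba" or "b" for units of s_B
# eta:    value in (0, 1) splitting the overlap between the two samples
dual_frame_weights <- function(d, domain, eta) {
  stopifnot(eta > 0, eta < 1, all(domain %in% c("a", "ab", "ba", "b")))
  multiplier <- ifelse(domain == "ab", eta,
                       ifelse(domain == "ba", 1 - eta, 1))
  d * multiplier
}

# Toy sample: one unit per domain
d <- c(10, 10, 20, 20)
domain <- c("a", "ab", "ba", "b")
w <- dual_frame_weights(d, domain, eta = 0.5)
print(w)
```

Units outside the overlap keep their design weight, while overlap units are down-weighted so that the two samples together represent the overlap domain only once.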

Value

MLCDW returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

estimates of class frequencies and proportions for the main variable(s).

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

See Also

JackMLCDW

Examples

data(DatMA)
data(DatMB)
data(DatPopM) 

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
N_FrameA <- nrow(DatPopM[DatPopM$Domain == "a" | DatPopM$Domain == "ab",])
N_FrameB <- nrow(DatPopM[DatPopM$Domain == "b" | DatPopM$Domain == "ab",])
N_Domainab <- nrow(DatPopM[DatPopM$Domain == "ab",])
#Let us calculate proportions of categories of variable Prog using MLCDW estimator
#using Read as auxiliary variable
MLCDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain, 
DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, N_FrameA, N_FrameB)

#Now, let us suppose that the overlap domain size is known
MLCDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain, 
DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, N_FrameA, N_FrameB, N_Domainab)

#Let us obtain 95% confidence intervals together with the estimations
MLCDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain, 
DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, N_FrameA, N_FrameB, N_Domainab,
conf_level = 0.95)

Multinomial logistic calibration estimator under single frame approach with auxiliary information from the whole population

Description

Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design using a model calibrated single frame approach with auxiliary information from the whole population. Confidence intervals are also computed, if required.

Usage

MLCSW (ysA, ysB, pik_A, pik_B, pik_ab_B, pik_ba_A, domains_A, domains_B, xsA, xsB,
 x, ind_sam, N_A, N_B, N_ab = NULL, met = "linear", conf_level = NULL)

Arguments

ysA

A data frame containing information about one or more factors, each one of dimension nAn_A, collected from sAs_A.

ysB

A data frame containing information about one or more factors, each one of dimension nBn_B, collected from sBs_B.

pik_A

A numeric vector of length nAn_A containing first order inclusion probabilities for units included in sAs_A.

pik_B

A numeric vector of length nBn_B containing first order inclusion probabilities for units included in sBs_B.

pik_ab_B

A numeric vector of size nAn_A containing first order inclusion probabilities according to sampling design in frame B for units belonging to overlap domain that have been selected in sAs_A.

pik_ba_A

A numeric vector of size nBn_B containing first order inclusion probabilities according to sampling design in frame A for units belonging to overlap domain that have been selected in sBs_B.

domains_A

A character vector of size nAn_A indicating the domain each unit from sAs_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size nBn_B indicating the domain each unit from sBs_B belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length nAn_A or a numeric matrix or data frame of dimensions nAn_A x mm, with mm the number of auxiliary variables, containing auxiliary information in frame A for units included in sAs_A.

xsB

A numeric vector of length nBn_B or a numeric matrix or data frame of dimensions nBn_B x mm, with mm the number of auxiliary variables, containing auxiliary information in frame B for units included in sBs_B.

x

A numeric vector of length $N$ or a numeric matrix or data frame of dimensions $N$ x $m$, with $m$ the number of auxiliary variables, containing auxiliary information for every unit in the population.

ind_sam

A numeric vector of length $n = n_A + n_B$ containing the identifiers of the population units (from 1 to $N$) that belong to $s_A$ or $s_B$.

N_A

A numeric value indicating the size of frame A

N_B

A numeric value indicating the size of frame B

N_ab

(Optional) A numeric value indicating the size of the overlap domain

met

(Optional) A character vector indicating the distance function to be used in the calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

Multinomial logistic calibration estimator in single frame using auxiliary information from the whole population for a proportion is given by

$$\hat{P}_{MLCi}^{SW} = \frac{1}{N} \left(\sum_{k \in s_A \cup s_B} \tilde{w}_k z_{ki}\right), \hspace{0.3cm} i = 1,...,m$$

with $m$ the number of categories of the response variable, $z_i$ the indicator variable for the i-th category of the response variable, and $\tilde{w}$ the calibration weights, which are computed under a set of constraints that depends on the available information. For instance, if $N_A$, $N_B$ and $N_{ab}$ are known, the calibration constraints are

$$\sum_{k \in s_a}\tilde{w}_k = N_a, \quad \sum_{k \in s_{ab} \cup s_{ba}}\tilde{w}_k = N_{ab}, \quad \sum_{k \in s_b}\tilde{w}_k = N_b$$

and

$$\sum_{k \in s_A \cup s_B}\tilde{w}_k \tilde{p}_{ki} = \sum_{k \in U} \tilde{p}_{ki}$$

with

$$\tilde{p}_{ki} = \frac{\exp(x_k^{'}\tilde{\beta_i})}{\sum_{r=1}^m \exp(x_k^{'}\tilde{\beta_r})},$$

where $\tilde{\beta_i}$ are the maximum likelihood parameter estimates of the multinomial logistic model fitted with weights

$$\tilde{d}_k = \left\{\begin{array}{ll} d_k^A & \textrm{if } k \in a\\ (1/d_k^A + 1/d_k^B)^{-1} & \textrm{if } k \in ab \cup ba \\ d_k^B & \textrm{if } k \in b \end{array} \right.$$
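The single frame weights can be illustrated with a few lines of base R. This is a minimal sketch, not the package's internal code: the helper name `single_frame_weights` and the toy probabilities are assumptions made for illustration. It relies only on the fact that the design weight is the inverse of the first order inclusion probability, so the overlap weight reduces to the inverse of the sum of both probabilities.

```r
# Hypothetical helper (not part of Frames2): single frame weights.
# pik:       inclusion probability of each sampled unit in its own frame
# pik_other: inclusion probability in the other frame (ignored outside the overlap)
# domains:   "a", "ab", "ba" or "b"
single_frame_weights <- function(pik, pik_other, domains) {
  overlap <- domains %in% c("ab", "ba")
  # d_k = 1 / pi_k, hence (1/d_k^A + 1/d_k^B)^(-1) = 1 / (pi_k^A + pi_k^B)
  ifelse(overlap, 1 / (pik + pik_other), 1 / pik)
}

# Toy values, chosen only for illustration
w <- single_frame_weights(
  pik       = c(0.10, 0.20, 0.25),
  pik_other = c(0,    0.30, 0.25),
  domains   = c("a",  "ab", "ba")
)
w
```

Units in the overlap domain receive a smaller weight than they would under either frame alone, because they can be selected through both frames.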

Value

MLCSW returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

class frequencies and proportions estimations for main variable(s).

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

See Also

JackMLCSW

Examples

data(DatMA)
data(DatMB)
data(DatPopM) 

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
N_FrameA <- nrow(DatPopM[DatPopM$Domain == "a" | DatPopM$Domain == "ab",])
N_FrameB <- nrow(DatPopM[DatPopM$Domain == "b" | DatPopM$Domain == "ab",])
N_Domainab <- nrow(DatPopM[DatPopM$Domain == "ab",])
#Let us calculate the proportions of the categories of variable Prog using the MLCSW
#estimator, with Read as auxiliary variable
MLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, DatMB$ProbA,
DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, N_FrameA, 
N_FrameB)

#Now, let us suppose that the overlap domain size is known
MLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, DatMB$ProbA,
DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, N_FrameA, 
N_FrameB, N_Domainab)

#Let us obtain 95% confidence intervals together with the estimates
MLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, DatMB$ProbA,
DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, N_FrameA, 
N_FrameB, N_Domainab, conf_level = 0.95)

Multinomial logistic estimator under dual frame approach with auxiliary information from each frame

Description

Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design using a model assisted approach with a possibly different set of auxiliary variables for each frame. Confidence intervals are also computed, if required.

Usage

MLDF (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, xA, xB, ind_samA, 
 ind_samB, ind_domA, ind_domB, N, conf_level = NULL)

Arguments

ysA

A data frame containing information about one or more factors, each one of dimension nAn_A, collected from sAs_A.

ysB

A data frame containing information about one or more factors, each one of dimension nBn_B, collected from sBs_B.

pik_A

A numeric vector of length nAn_A containing first order inclusion probabilities for units included in sAs_A.

pik_B

A numeric vector of length nBn_B containing first order inclusion probabilities for units included in sBs_B.

domains_A

A character vector of size nAn_A indicating the domain each unit from sAs_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size nBn_B indicating the domain each unit from sBs_B belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length nAn_A or a numeric matrix or data frame of dimensions nAn_A x mAm_A, with mAm_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in sAs_A.

xsB

A numeric vector of length nBn_B or a numeric matrix or data frame of dimensions nBn_B x mBm_B, with mBm_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in sBs_B.

xA

A numeric vector of length $N_A$ or a numeric matrix or data frame of dimensions $N_A$ x $m_A$, with $m_A$ the number of auxiliary variables in frame A, containing auxiliary information for the units in frame A.

xB

A numeric vector of length $N_B$ or a numeric matrix or data frame of dimensions $N_B$ x $m_B$, with $m_B$ the number of auxiliary variables in frame B, containing auxiliary information for the units in frame B.

ind_samA

A numeric vector of length $n_A$ containing the identifiers of the units of frame A (from 1 to $N_A$) that belong to $s_A$.

ind_samB

A numeric vector of length $n_B$ containing the identifiers of the units of frame B (from 1 to $N_B$) that belong to $s_B$.

ind_domA

A character vector of length NAN_A indicating the domain each unit from frame A belongs to. Possible values are "a" and "ab".

ind_domB

A character vector of length NBN_B indicating the domain each unit from frame B belongs to. Possible values are "b" and "ba".

N

A numeric value indicating the size of the population.

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

Multinomial logistic estimator in dual frame using auxiliary information from each frame for a proportion is given by

$$\begin{aligned} \hat{P}_{MLi}^{DF} = \frac{1}{N} \Bigg( & \sum_{k \in U_a} p_{ki}^A + \eta \sum_{k \in U_{ab}} p_{ki}^A + (1 - \eta) \sum_{k \in U_{ba}} p_{ki}^B + \sum_{k \in U_b} p_{ki}^B \\ & + \sum_{k \in s_a} d_k^A (z_{ki} - p_{ki}^A) + \eta \sum_{k \in s_{ab}} d_k^A (z_{ki} - p_{ki}^A) \\ & + (1 - \eta) \sum_{k \in s_{ba}} d_k^B (z_{ki} - p_{ki}^B) + \sum_{k \in s_b} d_k^B (z_{ki} - p_{ki}^B) \Bigg), \hspace{0.3cm} i = 1,...,m \end{aligned}$$

with $\eta \in (0,1)$, $m$ the number of categories of the response variable, $z_i$ the indicator variable for the i-th category of the response variable, $d^A$ and $d^B$ the design weights for each frame, defined as the inverse of the first order inclusion probabilities, and

$$p_{ki}^A = \frac{\exp(x_k^{'}\beta_i^A)}{\sum_{r=1}^m \exp(x_k^{'}\beta_r^A)},$$

where $\beta_i^A$ are the maximum likelihood parameter estimates of the multinomial logistic model fitted with weights $d^A$. $p_{ki}^B$ is defined analogously.
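The probabilities $p_{ki}^A$ above are a softmax of the linear predictors. The following base R sketch shows that computation for a given coefficient matrix. The function name `multinomial_probs`, the coefficient values and the use of `q` for the number of columns of the design matrix are assumptions made for illustration; the package estimates the coefficients itself by weighted maximum likelihood.

```r
# Hypothetical helper (not part of Frames2): multinomial logistic probabilities.
# x:    n x q matrix of auxiliary variables (add a column of 1s for an intercept)
# beta: q x m matrix of coefficients, one column per category
multinomial_probs <- function(x, beta) {
  eta <- x %*% beta
  # Subtract the row maximum before exponentiating; the ratios are unchanged
  eta <- eta - apply(eta, 1, max)
  expo <- exp(eta)
  expo / rowSums(expo)
}

# Toy design matrix (intercept plus one auxiliary variable) and made-up coefficients
x    <- cbind(1, c(40, 55, 70))
beta <- cbind(c(0, 0), c(-2, 0.04), c(-5, 0.09))
p    <- multinomial_probs(x, beta)
rowSums(p)
```

Each row of `p` holds the fitted probabilities of the $m$ categories for one unit, so every row sums to one.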

Value

MLDF returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

class frequencies and proportions estimations for main variable(s).

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

Lehtonen, R. and Veijanen, A. (1998) On multinomial logistic generalized regression estimators. Technical report 22, Department of Statistics, University of Jyvaskyla.

See Also

JackMLDF

Examples

data(DatMA)
data(DatMB)
data(DatPopM) 

N <- nrow(DatPopM)
levels(DatPopM$Domain) <- c(levels(DatPopM$Domain), "ba")
DatPopMA <- subset(DatPopM, DatPopM$Domain == "a" | DatPopM$Domain == "ab")
DatPopMB <- subset(DatPopM, DatPopM$Domain == "b" | DatPopM$Domain == "ab")
DatPopMB[DatPopMB$Domain == "ab",]$Domain <- "ba"

#Let us calculate the proportions of the categories of variable Prog using the MLDF
#estimator, with Read as auxiliary variable
MLDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain, 
DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read, DatMA$Id_Frame, DatMB$Id_Frame, 
DatPopMA$Domain, DatPopMB$Domain, N)

#Let us obtain 95% confidence intervals together with the estimates
MLDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain, 
DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read, DatMA$Id_Frame, DatMB$Id_Frame, 
DatPopMA$Domain, DatPopMB$Domain, N, conf_level = 0.95)

Multinomial logistic estimator under dual frame approach with auxiliary information from the whole population

Description

Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design using a dual frame model assisted approach. Confidence intervals are also computed, if required.

Usage

MLDW (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, x, ind_sam, 
 conf_level = NULL)

Arguments

ysA

A data frame containing information about one or more factors, each one of dimension nAn_A, collected from sAs_A.

ysB

A data frame containing information about one or more factors, each one of dimension nBn_B, collected from sBs_B.

pik_A

A numeric vector of length nAn_A containing first order inclusion probabilities for units included in sAs_A.

pik_B

A numeric vector of length nBn_B containing first order inclusion probabilities for units included in sBs_B.

domains_A

A character vector of size nAn_A indicating the domain each unit from sAs_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size nBn_B indicating the domain each unit from sBs_B belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length nAn_A or a numeric matrix or data frame of dimensions nAn_A x mm, with mm the number of auxiliary variables, containing auxiliary information in frame A for units included in sAs_A.

xsB

A numeric vector of length nBn_B or a numeric matrix or data frame of dimensions nBn_B x mm, with mm the number of auxiliary variables, containing auxiliary information in frame B for units included in sBs_B.

x

A numeric vector of length $N$ or a numeric matrix or data frame of dimensions $N$ x $m$, with $m$ the number of auxiliary variables, containing auxiliary information for every unit in the population.

ind_sam

A numeric vector of length $n = n_A + n_B$ containing the identifiers of the population units (from 1 to $N$) that belong to $s_A$ or $s_B$.

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

Multinomial logistic estimator in dual frame using auxiliary information from the whole population for a proportion is given by

$$\hat{P}_{MLi}^{DW} = \frac{1}{N} \left(\sum_{k \in U} p_{ki}^{\circ} + \sum_{k \in s} d_k^{\circ} (z_{ki} - p_{ki}^{\circ})\right), \hspace{0.3cm} i = 1,...,m$$

with $m$ the number of categories of the response variable, $z_i$ the indicator variable for the i-th category of the response variable,

$$d_k^{\circ} = \left\{\begin{array}{ll} d_k^A & \textrm{if } k \in a\\ \eta d_k^A & \textrm{if } k \in ab\\ (1 - \eta) d_k^B & \textrm{if } k \in ba \\ d_k^B & \textrm{if } k \in b \end{array} \right.$$

with $\eta \in (0,1)$ and

$$p_{ki}^\circ = \frac{\exp(x_k^{'}\beta_i^{\circ})}{\sum_{r=1}^m \exp(x_k^{'}\beta_r^{\circ})},$$

where $\beta_i^{\circ}$ are the maximum likelihood parameter estimates of the multinomial logistic model fitted with the weights $d^{\circ}$.
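The weights $d_k^{\circ}$ only rescale the design weights of overlap units by $\eta$ or $1 - \eta$. A minimal base R sketch follows; the helper name `dual_frame_weights`, the default `eta = 0.5` and the toy probabilities are assumptions made for illustration and do not reflect how the package chooses $\eta$.

```r
# Hypothetical helper (not part of Frames2): dual frame weights d_k°.
# pik:     first order inclusion probability of each unit in the frame it was drawn from
# domains: "a", "ab", "ba" or "b"
# eta:     value in (0, 1) splitting the overlap domain between the two samples
dual_frame_weights <- function(pik, domains, eta = 0.5) {
  stopifnot(eta > 0, eta < 1)
  share <- ifelse(domains == "ab", eta,
                  ifelse(domains == "ba", 1 - eta, 1))
  share / pik   # design weight 1 / pik, rescaled in the overlap domain
}

# Toy values, chosen only for illustration
d_circ <- dual_frame_weights(
  pik     = c(0.10, 0.20, 0.25, 0.50),
  domains = c("a",  "ab", "ba", "b"),
  eta     = 0.4
)
d_circ
```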

Value

MLDW returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

class frequencies and proportions estimations for main variable(s).

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

Lehtonen, R. and Veijanen, A. (1998) On multinomial logistic generalized regression estimators. Technical report 22, Department of Statistics, University of Jyvaskyla.

See Also

JackMLDW

Examples

data(DatMA)
data(DatMB)
data(DatPopM) 

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
#Let us calculate the proportions of the categories of variable Prog using the MLDW
#estimator, with Read as auxiliary variable
MLDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain, 
DatMA$Read, DatMB$Read, DatPopM$Read, IndSample)

#Let us obtain 95% confidence intervals together with the estimates
MLDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain, 
DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, conf_level = 0.95)

Multinomial logistic estimator under single frame approach with auxiliary information from the whole population

Description

Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design with the same set of auxiliary variables for the whole population. Confidence intervals are also computed, if required.

Usage

MLSW (ysA, ysB, pik_A, pik_B, pik_ab_B, pik_ba_A, domains_A, domains_B, xsA, xsB, 
x, ind_sam, conf_level = NULL)

Arguments

ysA

A data frame containing information about one or more factors, each one of dimension nAn_A, collected from sAs_A.

ysB

A data frame containing information about one or more factors, each one of dimension nBn_B, collected from sBs_B.

pik_A

A numeric vector of length nAn_A containing first order inclusion probabilities for units included in sAs_A.

pik_B

A numeric vector of length nBn_B containing first order inclusion probabilities for units included in sBs_B.

pik_ab_B

A numeric vector of size nAn_A containing first order inclusion probabilities according to sampling design in frame B for units belonging to overlap domain that have been selected in sAs_A.

pik_ba_A

A numeric vector of size nBn_B containing first order inclusion probabilities according to sampling design in frame A for units belonging to overlap domain that have been selected in sBs_B.

domains_A

A character vector of size nAn_A indicating the domain each unit from sAs_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size nBn_B indicating the domain each unit from sBs_B belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length nAn_A or a numeric matrix or data frame of dimensions nAn_A x mm, with mm the number of auxiliary variables, containing auxiliary information in frame A for units included in sAs_A.

xsB

A numeric vector of length nBn_B or a numeric matrix or data frame of dimensions nBn_B x mm, with mm the number of auxiliary variables, containing auxiliary information in frame B for units included in sBs_B.

x

A numeric vector of length $N$ or a numeric matrix or data frame of dimensions $N$ x $m$, with $m$ the number of auxiliary variables, containing auxiliary information for every unit in the population.

ind_sam

A numeric vector of length $n = n_A + n_B$ containing the identifiers of the population units (from 1 to $N$) that belong to $s_A$ or $s_B$.

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

Multinomial logistic estimator in single frame using auxiliary information from the whole population for a proportion is given by

$$\hat{P}_{MLi}^{SW} = \frac{1}{N} \left(\sum_{k \in U} \tilde{p}_{ki} + \sum_{k \in s} \tilde{d}_k (z_{ki} - \tilde{p}_{ki})\right), \hspace{0.3cm} i = 1,...,m$$

with $m$ the number of categories of the response variable, $z_i$ the indicator variable for the i-th category of the response variable,

$$\tilde{d}_k = \left\{\begin{array}{ll} d_k^A & \textrm{if } k \in a\\ (1/d_k^A + 1/d_k^B)^{-1} & \textrm{if } k \in ab \cup ba \\ d_k^B & \textrm{if } k \in b \end{array} \right.$$

and

$$\tilde{p}_{ki} = \frac{\exp(x_k^{'}\tilde{\beta_i})}{\sum_{r=1}^m \exp(x_k^{'}\tilde{\beta_r})},$$

where $\tilde{\beta_i}$ are the maximum likelihood parameter estimates of the multinomial logistic model fitted with weights $\tilde{d}$.
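The estimator adds a weighted sum of sample residuals to the population sum of fitted probabilities. The arithmetic can be followed on a toy population with the base R sketch below. The function name `ml_proportions` and all numbers are assumptions made for illustration; the fitted probabilities are taken as given rather than estimated.

```r
# Hypothetical helper (not part of Frames2): model assisted proportions.
# p_pop: N x m matrix of fitted probabilities for every population unit
# z_s:   n x m indicator matrix of the observed categories in the sample
# p_s:   n x m matrix of fitted probabilities for the sampled units
# d_s:   vector of n weights
ml_proportions <- function(p_pop, z_s, p_s, d_s) {
  N <- nrow(p_pop)
  # d_s is recycled down the columns, so row k is multiplied by d_s[k]
  (colSums(p_pop) + colSums(d_s * (z_s - p_s))) / N
}

# Toy population of 4 units and 2 categories; units 1 and 3 are sampled
p_pop <- rbind(c(0.7, 0.3), c(0.6, 0.4), c(0.2, 0.8), c(0.5, 0.5))
z_s   <- rbind(c(1, 0), c(0, 1))
est   <- ml_proportions(p_pop, z_s, p_pop[c(1, 3), ], d_s = c(2, 2))
est
```

Because the residuals of each unit sum to zero across categories and each row of fitted probabilities sums to one, the estimated proportions add up to one.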

Value

MLSW returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

class frequencies and proportions estimations for main variable(s).

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

Lehtonen, R. and Veijanen, A. (1998) On multinomial logistic generalized regression estimators. Technical report 22, Department of Statistics, University of Jyvaskyla.

See Also

JackMLSW

Examples

data(DatMA)
data(DatMB)
data(DatPopM) 

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
#Let us calculate the proportions of the categories of variable Prog using the MLSW
#estimator, with Read as auxiliary variable
MLSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, DatMB$ProbA,
DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample)

#Let us obtain 95% confidence intervals together with the estimates
MLSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, DatMB$ProbA,
DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample,
conf_level = 0.95)

Pseudo empirical likelihood estimator

Description

Produces estimates for population totals using the pseudo empirical likelihood estimator from survey data obtained from a dual frame sampling design. Confidence intervals for the population total are also computed, if required.

Usage

PEL(ysA, ysB, pi_A, pi_B, domains_A, domains_B, N_A = NULL, N_B = NULL, 
N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, 
XA = NULL, XB = NULL, conf_level = NULL)

Arguments

ysA

A numeric vector of length nAn_A or a numeric matrix or data frame of dimensions nAn_A x cc containing information about variable(s) of interest from sAs_A.

ysB

A numeric vector of length nBn_B or a numeric matrix or data frame of dimensions nBn_B x cc containing information about variable(s) of interest from sBs_B.

pi_A

A numeric vector of length nAn_A or a square numeric matrix of dimension nAn_A containing first order or first and second order inclusion probabilities for units included in sAs_A.

pi_B

A numeric vector of length nBn_B or a square numeric matrix of dimension nBn_B containing first order or first and second order inclusion probabilities for units included in sBs_B.

domains_A

A character vector of size nAn_A indicating the domain each unit from sAs_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size nBn_B indicating the domain each unit from sBs_B belongs to. Possible values are "b" and "ba".

N_A

(Optional) A numeric value indicating the size of frame A.

N_B

(Optional) A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

xsAFrameA

(Optional) A numeric vector of length nAn_A or a numeric matrix or data frame of dimensions nAn_A x mAm_A, with mAm_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in sAs_A.

xsBFrameA

(Optional) A numeric vector of length nBn_B or a numeric matrix or data frame of dimensions nBn_B x mAm_A, with mAm_A the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in sBs_B. For units in domain bb, these values are 0.

xsAFrameB

(Optional) A numeric vector of length nAn_A or a numeric matrix or data frame of dimensions nAn_A x mBm_B, with mBm_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in sAs_A. For units in domain aa, these values are 0.

xsBFrameB

(Optional) A numeric vector of length nBn_B or a numeric matrix or data frame of dimensions nBn_B x mBm_B, with mBm_B the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in sBs_B.

XA

(Optional) A numeric value or vector of length mAm_A, with mAm_A the number of auxiliary variables in frame A, indicating the population totals for the auxiliary variables considered in frame A.

XB

(Optional) A numeric value or vector of length mBm_B, with mBm_B the number of auxiliary variables in frame B, indicating the population totals for the auxiliary variables considered in frame B.

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

Pseudo empirical likelihood estimator for the population mean is computed as

$$\hat{\bar{Y}}_{PEL} = \frac{N_a}{N}\hat{\bar{Y}}_a + \frac{\eta N_{ab}}{N}\hat{\bar{Y}}_{ab}^A + \frac{(1 - \eta) N_{ab}}{N}\hat{\bar{Y}}_{ab}^B + \frac{N_b}{N}\hat{\bar{Y}}_b$$

where $\hat{\bar{Y}}_a = \sum_{k \in s_a}\hat{p}_{ak}y_k$, $\hat{\bar{Y}}_{ab}^A = \sum_{k \in s_{ab}^A}\hat{p}_{abk}^Ay_k$, $\hat{\bar{Y}}_{ab}^B = \sum_{k \in s_{ab}^B}\hat{p}_{abk}^By_k$ and $\hat{\bar{Y}}_b = \sum_{k \in s_b}\hat{p}_{bk}y_k$, with $\hat{p}_{ak}, \hat{p}_{abk}^A, \hat{p}_{abk}^B$ and $\hat{p}_{bk}$ the weights resulting from applying the pseudo empirical likelihood procedure to a given function under a given set of constraints, depending on the case. Furthermore, $\eta \in (0,1)$. The expression above assumes that $N_A$, $N_B$ and $N_{ab}$ are known and that no additional auxiliary variables are used. This does not always hold, so the function covers the following scenarios:

  • There is no additional auxiliary variable

    • $N_A$, $N_B$ and $N_{ab}$ unknown

    • $N_A$ and $N_B$ known and $N_{ab}$ unknown

    • $N_A$, $N_B$ and $N_{ab}$ known

  • At least one additional auxiliary variable is available

    • $N_A$ and $N_B$ known and $N_{ab}$ unknown

    • $N_A$, $N_B$ and $N_{ab}$ known

An explicit variance of this estimator is not easy to obtain. Instead, confidence intervals can be computed through the bisection method. This method constructs intervals of the form $\{\theta|r_{ns}(\theta) < \chi_1^2(\alpha)\}$, where $\chi_1^2(\alpha)$ is the $1 - \alpha$ quantile of a $\chi^2$ distribution with one degree of freedom and $r_{ns}(\theta)$ represents the so-called pseudo empirical log likelihood ratio statistic, which can be obtained as a difference of two pseudo empirical likelihood functions.
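The bisection search for the interval bounds can be sketched in base R as follows. This is only an illustration of the search itself: `r_toy` is a hypothetical quadratic stand-in for the pseudo empirical log likelihood ratio statistic, and `bisect_bound`, the point estimate 50 and the scale 2 are likewise assumptions, not values or functions taken from the package.

```r
# Hypothetical stand-in for the ratio statistic: zero at the point estimate
# and increasing as theta moves away from it.
r_toy <- function(theta, est = 50, se = 2) ((theta - est) / se)^2

# Bisection between a point inside the interval (r < cutoff) and a point
# outside it (r >= cutoff); returns the location where r crosses the cutoff.
bisect_bound <- function(r, inside, outside, cutoff, tol = 1e-8) {
  while (abs(outside - inside) > tol) {
    mid <- (inside + outside) / 2
    if (r(mid) < cutoff) inside <- mid else outside <- mid
  }
  (inside + outside) / 2
}

alpha  <- 0.10
cutoff <- qchisq(1 - alpha, df = 1)   # 1 - alpha quantile, one degree of freedom
lower  <- bisect_bound(r_toy, inside = 50, outside = 0,   cutoff = cutoff)
upper  <- bisect_bound(r_toy, inside = 50, outside = 100, cutoff = cutoff)
c(lower = lower, upper = upper)
```

With this quadratic stand-in the bounds coincide with the usual normal approximation interval, which gives a simple way to check the search.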

Value

PEL returns an object of class "EstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

total and mean estimation for main variable(s).

VarEst

variance estimation for main variable(s).

If parameter conf_level is different from NULL, object includes component

ConfInt

total and mean estimation and confidence intervals for main variable(s).

In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any). By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All the components of the object can be accessed by using function summary.

References

Rao, J. N. K. and Wu, C. (2010) Pseudo Empirical Likelihood Inference for Multiple Frame Surveys. Journal of the American Statistical Association, 105, 1494 - 1503.

Wu, C. (2005) Algorithms and R codes for the pseudo empirical likelihood methods in survey sampling. Survey Methodology, Vol. 31, 2, pp. 239 - 243.

See Also

JackPEL

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the pseudo empirical likelihood estimator for variable Feeding, without
#considering any auxiliary information
PEL(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)

#Now, let us calculate the pseudo empirical likelihood estimator for variable Clothing when
#the frame sizes and the overlap domain size are known
PEL(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain, 
N_A = 1735, N_B = 1191, N_ab = 601)

#Finally, let us calculate the pseudo empirical likelihood estimator and a 90% confidence
#interval for the population total of variable Feeding, considering Income and Metres2 as
#auxiliary variables and with frame sizes and overlap domain size known.
PEL(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain, 
N_A = 1735, N_B =  1191, N_ab = 601, xsAFrameA = DatA$Inc, xsBFrameA = DatB$Inc, 
xsAFrameB = DatA$M2, xsBFrameB = DatB$M2, XA = 4300260, XB = 176553, 
conf_level = 0.90)

Matrix of inclusion probabilities for units selected in sample from frame A

Description

This dataset consists of a square matrix of dimension 105 with the first and second order inclusion probabilities for the units included in sample $s_A$, which has been drawn from a population of size $N_A = 1735$ according to a stratified random sampling with population strata sizes $N_A^h = (727, 375, 113, 186, 115, 219)$.

Usage

PiklA

See Also

DatA

Examples

data(PiklA)
#Let us choose the submatrix of inclusion probabilities for the first 5 units in sA.
PiklA[1:5, 1:5]
#Now, let us select only the first order inclusion probabilities
diag(PiklA)

Matrix of inclusion probabilities for units selected in sample from frame B

Description

This dataset consists of a square matrix of dimension 135 with the first and second order inclusion probabilities for the units included in $s_B$, which has been drawn from a population of size $N_B = 1191$ according to a simple random sampling without replacement.

Usage

PiklB

See Also

DatB

Examples

data(PiklB)
#Let us choose the submatrix of inclusion probabilities for the first 5 units in sB.
PiklB[1:5, 1:5]
#Now, let us select the first order inclusion probabilities
diag(PiklB)

Pseudo Maximum Likelihood estimator

Description

Produces estimates for population totals and means using PML estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.

Usage

PML(ysA, ysB, pi_A, pi_B, domains_A, domains_B, N_A, N_B, conf_level = NULL)

Arguments

ysA

A numeric vector of length nAn_A or a numeric matrix or data frame of dimensions nAn_A x cc containing information about variable of interest from sAs_A.

ysB

A numeric vector of length nBn_B or a numeric matrix or data frame of dimensions nBn_B x cc containing information about variable of interest from sBs_B.

pi_A

A numeric vector of length nAn_A or a square numeric matrix of dimension nAn_A containing first order or first and second order inclusion probabilities for units included in sAs_A.

pi_B

A numeric vector of length nBn_B or a square numeric matrix of dimension nBn_B containing first order or first and second order inclusion probabilities for units included in sBs_B.

domains_A

A character vector of size nAn_A indicating the domain each unit from sAs_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size nBn_B indicating the domain each unit from sBs_B belongs to. Possible values are "b" and "ba".

N_A

A numeric value indicating the size of frame A

N_B

A numeric value indicating the size of frame B

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

Pseudo Maximum Likelihood estimator of population total is given by

$$\hat{Y}_{PML}(\hat{\theta}) = \frac{N_A - \hat{N}_{ab,PML}}{\hat{N}_a}\hat{Y}_a^A + \frac{N_B - \hat{N}_{ab,PML}}{\hat{N}_b}\hat{Y}_b^B + \frac{\hat{N}_{ab,PML}}{\hat{\theta}\hat{N}_{ab}^A + (1 - \hat{\theta})\hat{N}_{ab}^B}[\hat{\theta}\hat{Y}_{ab}^A + (1 - \hat{\theta})\hat{Y}_{ab}^B]$$

where $\hat{\theta} \in [0, 1]$ and $\hat{N}_{ab,PML}$ is the smaller of the roots of the quadratic equation

$$[\hat{\theta}/N_B + (1 - \hat{\theta})/N_A]x^2 - [1 + \hat{\theta}\hat{N}_{ab}^A/N_B + (1 - \hat{\theta})\hat{N}_{ab}^B/N_A]x + \hat{\theta}\hat{N}_{ab}^A + (1 - \hat{\theta})\hat{N}_{ab}^B=0.$$

The optimal value for $\hat{\theta}$ is $\frac{\hat{N}_aN_B\hat{V}(\hat{N}_{ab}^B)}{\hat{N}_aN_B\hat{V}(\hat{N}_{ab}^B) + \hat{N}_bN_A\hat{V}(\hat{N}_{ab}^A)}$. The variance is estimated according to the following expression

$$\hat{V}(\hat{Y}_{PML}(\hat{\theta})) = \hat{V}(\sum_{i \in s_A}\tilde{z}_i^A) + \hat{V}(\sum_{i \in s_B}\tilde{z}_i^B)$$

where $\tilde{z}_i^A = y_i - \frac{\hat{Y}_a}{\hat{N}_a}$ if $i \in a$ and $\tilde{z}_i^A = \hat{\gamma}_{opt}(y_i - \frac{\hat{Y}_a}{\hat{N}_a}) + \hat{\lambda} \hat{\phi}$ if $i \in ab$, with

$$\hat{\gamma}_{opt} = \frac{\hat{N}_a N_B \hat{V}(\hat{N}_{ab}^B)}{\hat{N}_a N_B \hat{V}(\hat{N}_{ab}^B) + \hat{N}_b N_A \hat{V}(\hat{N}_{ab}^A)}$$

$$\hat{\lambda} = \frac{n_A/N_A \hat{Y}_{ab}^A + n_B/N_B \hat{Y}_{ab}^B}{n_A/N_A \hat{N}_{ab}^A + n_B/N_B \hat{N}_{ab}^B} - \frac{\hat{Y}_a}{\hat{N}_a} - \frac{\hat{Y}_b}{\hat{N}_b}$$

$$\hat{\phi} = \frac{n_A \hat{N}_b}{n_A \hat{N}_b + n_B\hat{N}_a}$$

Similarly, we define $\tilde{z}_i^B = y_i - \frac{\hat{Y}_b}{\hat{N}_b}$ if $i \in b$ and $\tilde{z}_i^B = (1 - \hat{\gamma}_{opt})(y_i - \frac{\hat{Y}_{ba}}{\hat{N}_{ab}}) + \hat{\lambda}(1 - \hat{\phi})$ if $i \in ba$.
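The overlap size estimate is the smaller root of the quadratic equation given in this section, which can be computed directly with the quadratic formula. The base R sketch below is an illustration under stated assumptions: the function name `pml_overlap_size` is hypothetical, the frame sizes 1735 and 1191 are those used in the examples of this manual, and the two overlap size estimates 620 and 590 are made-up values.

```r
# Hypothetical helper (not part of Frames2): smaller root of the PML quadratic.
# theta:        value of theta-hat in [0, 1]
# N_A, N_B:     frame sizes
# Nab_A, Nab_B: overlap size estimates obtained from samples s_A and s_B
pml_overlap_size <- function(theta, N_A, N_B, Nab_A, Nab_B) {
  a  <- theta / N_B + (1 - theta) / N_A
  b  <- -(1 + theta * Nab_A / N_B + (1 - theta) * Nab_B / N_A)
  c0 <- theta * Nab_A + (1 - theta) * Nab_B
  disc <- b^2 - 4 * a * c0
  stopifnot(disc >= 0)          # real roots are required
  (-b - sqrt(disc)) / (2 * a)   # a > 0, so this is the smaller root
}

N_ab_hat <- pml_overlap_size(theta = 0.5, N_A = 1735, N_B = 1191,
                             Nab_A = 620, Nab_B = 590)
N_ab_hat
```

When both samples give the same overlap estimate, the quadratic factorizes and the smaller root is that common value, which is a convenient sanity check.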

Value

PML returns an object of class "EstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

total and mean estimation for main variable(s).

VarEst

variance estimation for main variable(s).

If parameter conf_level is different from NULL, object includes component

ConfInt

total and mean estimation and confidence intervals for main variable(s).

In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any). By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All the components of the object can be accessed by using function summary.

References

Skinner, C. J. and Rao, J. N. K. (1996) Estimation in Dual Frame Surveys with Complex Designs. Journal of the American Statistical Association, Vol. 91, 433, 349 - 356.

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the pseudo maximum likelihood estimator of the population total for variable Clothing
PML(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain, 
N_A = 1735, N_B = 1191)

#Now, let us calculate the pseudo maximum likelihood estimator of the population total for variable
#Feeding, using first order inclusion probabilities only
PML(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$Domain, DatB$Domain, 
N_A = 1735, N_B = 1191)

#Finally, let us calculate the pseudo maximum likelihood estimator and a 90% confidence interval for 
#the population total for variable Leisure
PML(DatA$Lei, DatB$Lei, PiklA, PiklB, DatA$Domain, DatB$Domain, 
N_A = 1735, N_B = 1191, conf_level = 0.90)

Raking ratio estimator

Description

Produces estimates for population total and mean using the raking ratio estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.

Usage

SFRR(ysA, ysB, pi_A, pi_B, pik_ab_B, pik_ba_A, domains_A, domains_B, N_A, N_B, 
conf_level = NULL)

Arguments

ysA

A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A \times c$ containing information about variable(s) of interest from $s_A$.

ysB

A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B \times c$ containing information about variable(s) of interest from $s_B$.

pi_A

A numeric vector of length $n_A$ or a square numeric matrix of dimension $n_A$ containing first order or first and second order inclusion probabilities for units included in $s_A$.

pi_B

A numeric vector of length $n_B$ or a square numeric matrix of dimension $n_B$ containing first order or first and second order inclusion probabilities for units included in $s_B$.

pik_ab_B

A numeric vector of size $n_A$ containing first order inclusion probabilities according to the sampling design in frame B for units belonging to the overlap domain that have been selected in $s_A$.

pik_ba_A

A numeric vector of size $n_B$ containing first order inclusion probabilities according to the sampling design in frame A for units belonging to the overlap domain that have been selected in $s_B$.

domains_A

A character vector of size $n_A$ indicating the domain each unit from $s_A$ belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size $n_B$ indicating the domain each unit from $s_B$ belongs to. Possible values are "b" and "ba".

N_A

A numeric value indicating the size of frame A.

N_B

A numeric value indicating the size of frame B.

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.

Details

Raking ratio estimator of population total is given by

$$\hat{Y}_{SFRR} = \frac{N_A - \hat{N}_{ab,rake}}{\hat{N}_a^A}\hat{Y}_a^A + \frac{N_B - \hat{N}_{ab,rake}}{\hat{N}_b^B}\hat{Y}_b^B + \frac{\hat{N}_{ab,rake}}{\hat{N}_{abS}}\hat{Y}_{abS}$$

where $\hat{Y}_{abS} = \sum_{i \in s_{ab}^A}\tilde{d}_i^A y_i + \sum_{i \in s_{ab}^B}\tilde{d}_i^B y_i$, $\hat{N}_{abS} = \sum_{i \in s_{ab}^A}\tilde{d}_i^A + \sum_{i \in s_{ab}^B}\tilde{d}_i^B$ and $\hat{N}_{ab,rake}$ is the smallest root of the quadratic equation $\hat{N}_{abS}x^2 - [\hat{N}_{abS}(N_A + N_B) + \hat{N}_{aS}\hat{N}_{bS}]x + \hat{N}_{abS}N_AN_B = 0$, with $\hat{N}_{aS} = \sum_{i \in s_a^A}\tilde{d}_i^A$ and $\hat{N}_{bS} = \sum_{i \in s_b^B}\tilde{d}_i^B$. Weights $\tilde{d}_i^A$ and $\tilde{d}_i^B$ are obtained as follows:

$$\tilde{d}_i^A = \left\{\begin{array}{ll} d_i^A & \textrm{if } i \in a\\ (1/d_i^A + 1/d_i^B)^{-1} & \textrm{if } i \in ab \end{array} \right. \qquad \tilde{d}_i^B = \left\{\begin{array}{ll} d_i^B & \textrm{if } i \in b\\ (1/d_i^A + 1/d_i^B)^{-1} & \textrm{if } i \in ba \end{array} \right.$$

where $d_i^A$ and $d_i^B$ are the design weights, obtained as the inverse of the first order inclusion probabilities, that is, $d_i^A = 1/\pi_i^A$ and $d_i^B = 1/\pi_i^B$.
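The adjusted overlap weight and the raked overlap size are both short closed-form computations. The base R lines below are only a sketch: the function names and the numeric inputs are hypothetical, and the quadratic is assumed to have coefficients built from the single-frame overlap estimate $\hat{N}_{abS}$, which is equivalent to requiring $(N_A - x)(N_B - x)\hat{N}_{abS} = x\,\hat{N}_{aS}\hat{N}_{bS}$.

```r
# Hypothetical helpers (not Frames2 functions).

# Adjusted weight of an overlap unit: (1/d_A + 1/d_B)^(-1).
# With d = 1/pi this equals 1 / (pi_A + pi_B).
sf_overlap_weight <- function(pi_A, pi_B) {
  1 / (pi_A + pi_B)
}

# Smallest root of
#   N_abS x^2 - [N_abS (N_A + N_B) + N_aS N_bS] x + N_abS N_A N_B = 0
# (assumed form of the quadratic, see the lead-in).
rake_overlap_size <- function(N_abS, N_aS, N_bS, N_A, N_B) {
  qa <- N_abS
  qb <- -(N_abS * (N_A + N_B) + N_aS * N_bS)
  qc <- N_abS * N_A * N_B
  disc <- qb^2 - 4 * qa * qc          # non-negative because (N_A + N_B)^2 >= 4 N_A N_B
  (-qb - sqrt(disc)) / (2 * qa)       # qa > 0, so this is the smaller root
}

# Made-up single-frame estimates with the frame sizes used in the examples
w_ab <- sf_overlap_weight(pi_A = 0.10, pi_B = 0.15)
N_ab_rake <- rake_overlap_size(N_abS = 600, N_aS = 1100, N_bS = 600,
                               N_A = 1735, N_B = 1191)
```

Taking the smaller root keeps the raked overlap size below both frame sizes, so the implied sizes of domains a and b stay positive.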

To estimate the variance of this estimator, note that the raking ratio estimator coincides with the SF calibration estimator when the frame sizes are known and the "raking" method is used. Deville's expression can therefore be used to estimate the variance of the raking ratio estimator:

$$\hat{V}(\hat{Y}_{SFRR}) = \frac{1}{1-\sum_{k\in s} a_k^2}\sum_{k\in s}(1-\pi_k)\left(\frac{e_k}{\pi_k} - \sum_{l\in s} a_{l} \frac{e_l}{\pi_l}\right)^2$$

where $a_k=(1-\pi_k)/\sum_{l\in s} (1-\pi_l)$ and $e_k$ are the residuals of the regression with auxiliary variables as regressors.
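Deville's expression needs only the residuals and the first order inclusion probabilities. As a minimal sketch in base R (the function name is hypothetical and the residuals are made-up numbers; how Frames2 obtains the residuals internally is not shown here):

```r
# Hypothetical helper (not a Frames2 function): Deville's variance
# approximation from residuals e and first order inclusion probabilities pik.
deville_variance <- function(e, pik) {
  a <- (1 - pik) / sum(1 - pik)        # weights a_k, which sum to 1
  z <- e / pik                         # expanded residuals e_k / pi_k
  centered <- z - sum(a * z)           # subtract the a-weighted mean
  sum((1 - pik) * centered^2) / (1 - sum(a^2))
}

# Toy input: four units with equal inclusion probabilities
v_hat <- deville_variance(e = c(1, -1, 2, -2), pik = rep(0.5, 4))
```

The expression uses no second order inclusion probabilities, which is why a vector of first order probabilities is enough for pi_A and pi_B in this case.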

Value

SFRR returns an object of class "EstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

total and mean estimation for main variable(s).

VarEst

variance estimation for main variable(s).

If parameter conf_level is different from NULL, object includes component

ConfInt

total and mean estimation and confidence intervals for main variable(s).

In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any). By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All the components of the object can be accessed by using function summary.

References

Lohr, S. and Rao, J.N.K. (2000). Inference in Dual Frame Surveys. Journal of the American Statistical Association, Vol. 95, 271 - 280.

Rao, J.N.K. and Skinner, C.J. (1996). Estimation in Dual Frame Surveys with Complex Designs. Proceedings of the Survey Method Section, Statistical Society of Canada, 63 - 68.

Skinner, C.J. and Rao, J.N.K. (1996). Estimation in Dual Frame Surveys with Complex Designs. Journal of the American Statistical Association, Vol. 91, 433, 349 - 356.

Skinner, C.J. (1991). On the Efficiency of Raking Ratio Estimation for Multiple Frame Surveys. Journal of the American Statistical Association, Vol. 86, 779 - 784.

See Also

JackSFRR

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the raking ratio estimator of the population total for variable Clothing
SFRR(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$ProbB, DatB$ProbA, DatA$Domain, 
DatB$Domain, 1735, 1191)

#Now, let us calculate the raking ratio estimator and a 90% confidence interval for 
#the population total for variable Feeding, considering only first order inclusion probabilities
SFRR(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$ProbB, DatB$ProbA, 
DatA$Domain, DatB$Domain, 1735, 1191, 0.90)

Variance estimator of Horvitz - Thompson estimator

Description

Computes the variance estimator of the Horvitz - Thompson estimator of the population total.

Usage

VarHT(y, pikl)

Arguments

y

A numeric vector of size n containing information about variable of interest

pikl

A square numeric matrix of dimension n containing first and second order inclusion probabilities for units included in y

Details

Variance estimator of Horvitz - Thompson estimator of population total is given by

$$\hat{Var}(\hat{Y}_{HT}) = \sum_{k \in s}\frac{y_k^2}{\pi_k^2}(1 - \pi_k) + \sum_{k \in s}\sum_{l \in s, l \neq k} \frac{y_k y_l}{\pi_k \pi_l} \frac{\pi_{kl} - \pi_k \pi_l}{\pi_{kl}}$$
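The double sum above collapses into a single matrix expression, because the diagonal term $(\pi_{kk} - \pi_k^2)/\pi_{kk}$ reduces to $1 - \pi_k$. The base R sketch below illustrates this; the function name is hypothetical, it assumes that the diagonal of pikl holds the first order inclusion probabilities, and it is not necessarily how VarHT is implemented.

```r
# Hypothetical helper (not the Frames2 implementation of VarHT).
# Assumes diag(pikl) contains the first order inclusion probabilities.
var_ht_sketch <- function(y, pikl) {
  pik <- diag(pikl)
  z <- y / pik                                    # expanded values y_k / pi_k
  delta <- (pikl - outer(pik, pik)) / pikl        # (pi_kl - pi_k pi_l) / pi_kl
  sum(outer(z, z) * delta)                        # diagonal gives z_k^2 (1 - pi_k)
}

# Simple random sample of size 2 from a population of size 5:
# pi_k = 2/5 and pi_kl = (2 * 1) / (5 * 4) = 0.1
Ps <- matrix(c(0.4, 0.1, 0.1, 0.4), 2, 2)
v_ht <- var_ht_sketch(c(13, 18), Ps)
```

For this two-unit sample the expression simplifies to $0.6\,(18/0.4 - 13/0.4)^2$.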

Value

A numeric value representing variance estimator of Horvitz - Thompson estimator for population total for considered values

References

Horvitz, D. G. and Thompson, D. J. (1952) A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663 - 685

Sarndal, C. E., Swensson, B. and Wretman, J. (1992) Model Assisted Survey Sampling. Springer-Verlag. New York.

See Also

HT CovHT

Examples

##########   Example 1   ##########
U <- c(13, 18, 20, 14, 9)
#A simple random sample of size 2 without replacement is drawn from population
s <- sample(U, 2)
#Horvitz - Thompson estimator of population total is calculated.
ps <- c(0.4, 0.4)
HT(s, ps)
#Now, we calculate variance estimator of the Horvitz - Thompson estimator.
Ps <- matrix(c(0.4,0.1, 0.1,0.4), 2 ,2)
VarHT(s, Ps)

##########   Example 2   ##########
data(DatA)
data(PiklA)

#Let us calculate the Horvitz - Thompson estimator of the total of variable Clothing in frame A.
HT(DatA$Clo, DatA$ProbA)
#And now, let us compute the variance estimator of the previous estimator
VarHT(DatA$Clo, PiklA)

g-weights for the dual frame calibration estimator

Description

Computes the g-weights for the dual frame calibration estimator.

Usage

WeightsCalDF(ysA, ysB, pi_A, pi_B, domains_A, domains_B, N_A = NULL, N_B = NULL, 
N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, 
xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear")

Arguments

ysA

A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A \times c$ containing information about variable(s) of interest from $s_A$.

ysB

A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B \times c$ containing information about variable(s) of interest from $s_B$.

pi_A

A numeric vector of length $n_A$ or a square numeric matrix of dimension $n_A$ containing first order or first and second order inclusion probabilities for units included in $s_A$.

pi_B

A numeric vector of length $n_B$ or a square numeric matrix of dimension $n_B$ containing first order or first and second order inclusion probabilities for units included in $s_B$.

domains_A

A character vector of length $n_A$ indicating the domain each unit from $s_A$ belongs to. Possible values are "a" and "ab".

domains_B

A character vector of length $n_B$ indicating the domain each unit from $s_B$ belongs to. Possible values are "b" and "ba".

N_A

(Optional) A numeric value indicating the size of frame A.

N_B

(Optional) A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

xsAFrameA

(Optional) A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A \times m_A$, with $m_A$ the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in $s_A$.

xsBFrameA

(Optional) A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B \times m_A$, with $m_A$ the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in $s_B$. For units in domain $b$, these values are 0.

xsAFrameB

(Optional) A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A \times m_B$, with $m_B$ the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in $s_A$. For units in domain $a$, these values are 0.

xsBFrameB

(Optional) A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B \times m_B$, with $m_B$ the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in $s_B$.

xsT

(Optional) A numeric vector of length $n$ or a numeric matrix or data frame of dimensions $n \times m_T$, with $m_T$ the number of auxiliary variables in both frames, containing auxiliary information for all units in the entire sample $s = s_A \cup s_B$.

XA

(Optional) A numeric value or vector of length $m_A$, with $m_A$ the number of auxiliary variables in frame A, indicating the population totals for the auxiliary variables considered in frame A.

XB

(Optional) A numeric value or vector of length $m_B$, with $m_B$ the number of auxiliary variables in frame B, indicating the population totals for the auxiliary variables considered in frame B.

X

(Optional) A numeric value or vector of length $m_T$, with $m_T$ the number of auxiliary variables in both frames, indicating the population totals for the auxiliary variables considered in both frames.

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

Details

The function provides g-weights in the following scenarios:

  • No additional auxiliary variable is available

    • $N_A$, $N_B$ and $N_{ab}$ unknown

    • $N_{ab}$ known and $N_A$ and $N_B$ unknown

    • $N_A$ and $N_B$ known and $N_{ab}$ unknown

    • $N_A$, $N_B$ and $N_{ab}$ known

  • At least one additional auxiliary variable is available

    • $N_{ab}$ known and $N_A$ and $N_B$ unknown

    • $N_A$ and $N_B$ known and $N_{ab}$ unknown

    • $N_A$, $N_B$ and $N_{ab}$ known
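For the "linear" distance, g-weights in the sense of Deville and Särndal (1992) have a closed form: $g_k = 1 + x_k^T (\sum_{j \in s} d_j x_j x_j^T)^{-1}(X - \sum_{j \in s} d_j x_j)$, where $d_k$ are the starting weights and $X$ the known totals. The base R lines below are a generic sketch of that formula with made-up data. The function name is hypothetical, and the way Frames2 builds the auxiliary matrix in each scenario above is not reproduced.

```r
# Hypothetical helper (not a Frames2 function): linear-distance g-weights
# for starting weights d, auxiliary matrix x (n x m) and known totals.
linear_g_weights <- function(d, x, totals) {
  x <- as.matrix(x)
  weighted_totals <- colSums(d * x)          # sum_k d_k x_k
  M <- crossprod(x * d, x)                   # sum_k d_k x_k x_k'
  lambda <- solve(M, totals - weighted_totals)
  as.vector(1 + x %*% lambda)
}

# Toy sample: a column of ones (calibrates to a known size) and one covariate
d <- rep(10, 6)
x <- cbind(1, c(1, 2, 3, 4, 5, 6))
known_totals <- c(66, 240)
g_lin <- linear_g_weights(d, x, known_totals)
calibrated <- colSums(d * g_lin * x)         # reproduces known_totals
```

Including a column of ones, or domain indicators, among the auxiliary variables is the usual way of calibrating to known sizes such as $N_A$, $N_B$ or $N_{ab}$.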

Value

A numeric vector containing the g-weights for the dual frame calibration estimator.

References

Ranalli, M. G., Arcos, A., Rueda, M. and Teodoro, A. (2013) Calibration estimation in dual frame surveys. arXiv:1312.0761 [stat.ME]

Deville, J. C., Särndal, C. E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376 - 382

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the g-weights for the dual frame calibration estimator for variable Feeding, 
#without considering any auxiliary information
WeightsCalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)

#Now, let us calculate the g-weights for the dual frame calibration estimator for variable Clothing 
#when the frame sizes and the overlap domain size are known
WeightsCalDF(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain, 
N_A = 1735, N_B = 1191, N_ab = 601)

#Finally, let us calculate the g-weights for the dual frame calibration estimator
#for variable Feeding, considering Income as auxiliary variable in frame A
#and Metres2 as auxiliary variable in frame B and with frame sizes and overlap 
#domain size known.
WeightsCalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain, 
N_A = 1735, N_B =  1191, N_ab = 601, xsAFrameA = DatA$Inc, xsBFrameA = DatB$Inc, 
xsAFrameB = DatA$M2, xsBFrameB = DatB$M2, XA = 4300260, XB = 176553)

g-weights for the SF calibration estimator

Description

Computes the g-weights for the SF calibration estimator.

Usage

WeightsCalSF(ysA, ysB, pi_A, pi_B, pik_ab_B, pik_ba_A, domains_A, domains_B, 
N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, 
xsAFrameB = NULL, xsBFrameB = NULL, xsT = NULL, XA = NULL, XB = NULL, X = NULL, 
met = "linear")

Arguments

ysA

A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A \times c$ containing information about variable(s) of interest from $s_A$.

ysB

A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B \times c$ containing information about variable(s) of interest from $s_B$.

pi_A

A numeric vector of length $n_A$ or a square numeric matrix of dimension $n_A$ containing first order or first and second order inclusion probabilities for units included in $s_A$.

pi_B

A numeric vector of length $n_B$ or a square numeric matrix of dimension $n_B$ containing first order or first and second order inclusion probabilities for units included in $s_B$.

pik_ab_B

A numeric vector of size $n_A$ containing first order inclusion probabilities according to the sampling design in frame B for units belonging to the overlap domain that have been selected in $s_A$.

pik_ba_A

A numeric vector of size $n_B$ containing first order inclusion probabilities according to the sampling design in frame A for units belonging to the overlap domain that have been selected in $s_B$.

domains_A

A character vector of size $n_A$ indicating the domain each unit from $s_A$ belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size $n_B$ indicating the domain each unit from $s_B$ belongs to. Possible values are "b" and "ba".

N_A

(Optional) A numeric value indicating the size of frame A.

N_B

(Optional) A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

xsAFrameA

(Optional) A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A \times m_A$, with $m_A$ the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in $s_A$.

xsBFrameA

(Optional) A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B \times m_A$, with $m_A$ the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in $s_B$. For units in domain $b$, these values are 0.

xsAFrameB

(Optional) A numeric vector of length $n_A$ or a numeric matrix or data frame of dimensions $n_A \times m_B$, with $m_B$ the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in $s_A$. For units in domain $a$, these values are 0.

xsBFrameB

(Optional) A numeric vector of length $n_B$ or a numeric matrix or data frame of dimensions $n_B \times m_B$, with $m_B$ the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in $s_B$.

xsT

(Optional) A numeric vector of length $n$ or a numeric matrix or data frame of dimensions $n \times m_T$, with $m_T$ the number of auxiliary variables in both frames, containing auxiliary information for all units in the entire sample $s = s_A \cup s_B$.

XA

(Optional) A numeric value or vector of length $m_A$, with $m_A$ the number of auxiliary variables in frame A, indicating the population totals for the auxiliary variables considered in frame A.

XB

(Optional) A numeric value or vector of length $m_B$, with $m_B$ the number of auxiliary variables in frame B, indicating the population totals for the auxiliary variables considered in frame B.

X

(Optional) A numeric value or vector of length $m_T$, with $m_T$ the number of auxiliary variables in both frames, indicating the population totals for the auxiliary variables considered in both frames.

met

(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

Details

The function provides g-weights in the following scenarios:

  • No additional auxiliary variable is available

    • $N_A$, $N_B$ and $N_{ab}$ unknown

    • $N_{ab}$ known and $N_A$ and $N_B$ unknown

    • $N_A$ and $N_B$ known and $N_{ab}$ unknown

    • $N_A$, $N_B$ and $N_{ab}$ known

  • At least one additional auxiliary variable is available

    • $N_{ab}$ known and $N_A$ and $N_B$ unknown

    • $N_A$ and $N_B$ known and $N_{ab}$ unknown

    • $N_A$, $N_B$ and $N_{ab}$ known
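With met = "raking" the g-weights have the multiplicative form $g_k = \exp(x_k^T \lambda)$ and no closed-form solution, so $\lambda$ has to be found iteratively from the calibration equations $\sum_{k \in s} d_k g_k x_k = X$. The base R lines below sketch one possible approach, a plain Newton iteration on made-up data. The function name, the tolerance and the iteration limit are hypothetical choices, and the numerical routine Frames2 actually uses may differ.

```r
# Hypothetical helper (not a Frames2 function): raking-distance g-weights
# g_k = exp(x_k' lambda), with lambda solving sum_k d_k g_k x_k = totals.
raking_g_weights <- function(d, x, totals, tol = 1e-8, max_iter = 50) {
  x <- as.matrix(x)
  lambda <- rep(0, ncol(x))                  # start from g_k = 1
  for (iter in seq_len(max_iter)) {
    g <- as.vector(exp(x %*% lambda))
    resid <- totals - colSums(d * g * x)     # calibration error
    if (max(abs(resid)) < tol) break
    H <- crossprod(x * (d * g), x)           # sum_k d_k g_k x_k x_k'
    lambda <- lambda + solve(H, resid)       # Newton step
  }
  as.vector(exp(x %*% lambda))
}

# Toy sample: a column of ones and one covariate
d <- rep(10, 6)
x <- cbind(1, c(1, 2, 3, 4, 5, 6))
known_totals <- c(66, 240)
g_rak <- raking_g_weights(d, x, known_totals)
calibrated <- colSums(d * g_rak * x)
```

Unlike the linear distance, the exponential form keeps every g-weight strictly positive.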

Value

A numeric vector containing the g-weights for the SF calibration estimator.

References

Ranalli, M. G., Arcos, A., Rueda, M. and Teodoro, A. (2013) Calibration estimation in dual frame surveys. arXiv:1312.0761 [stat.ME]

Deville, J. C., Särndal, C. E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376 - 382

Examples

data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the g-weights for the SF calibration estimator for variable Clothing,
#without considering any auxiliary information
WeightsCalSF(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$ProbB, DatB$ProbA, 
DatA$Domain, DatB$Domain)

#Now, let us calculate the g-weights for the SF calibration estimator for variable Leisure
#when the frame sizes and the overlap domain size are known
WeightsCalSF(DatA$Lei, DatB$Lei, PiklA, PiklB, DatA$ProbB, DatB$ProbA, 
DatA$Domain, DatB$Domain, N_A = 1735, N_B = 1191, N_ab = 601)

#Finally, let us calculate the g-weights for the SF calibration estimator
#for variable Feeding, considering Income and Metres2 as auxiliary 
#variables and with frame sizes and overlap domain size known.
WeightsCalSF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$ProbB, DatB$ProbA, 
DatA$Domain, DatB$Domain, N_A = 1735, N_B =  1191, N_ab = 601, xsAFrameA = DatA$Inc, 
xsBFrameA = DatB$Inc, xsAFrameB = DatA$M2, xsBFrameB = DatB$M2, 
XA = 4300260, XB = 176553)