Title: | Estimation in Dual Frame Surveys |
---|---|
Description: | Point and interval estimation in dual frame surveys. In contrast to classic sampling theory, where only one sampling frame is considered, dual frame methodology assumes that there are two frames available for sampling and that, overall, they cover the entire target population. Then, two probability samples (one from each frame) are drawn and information collected is suitably combined to get estimators of the parameter of interest. |
Authors: | Antonio Arcos <[email protected]>, Maria del Mar Rueda <[email protected]>, Maria Giovanna Ranalli <[email protected]> and David Molina <[email protected]> |
Maintainer: | David Molina <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.1 |
Built: | 2025-02-13 04:12:57 UTC |
Source: | https://github.com/cran/Frames2 |
Produces estimates for population total and mean using the Bankier-Kalton-Anderson estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.
BKA(ysA, ysB, pi_A, pi_B, pik_ab_B, pik_ba_A, domains_A, domains_B, conf_level = NULL)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
pik_ab_B |
A numeric vector of size |
pik_ba_A |
A numeric vector of size |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
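The BKA estimator of the population total is given by

$$\hat{Y}_{BKA} = \sum_{i \in s_A} \tilde{d}_i^A y_i + \sum_{i \in s_B} \tilde{d}_i^B y_i$$

where

$$\tilde{d}_i^A = \begin{cases} d_i^A & \text{if } i \in a \\ (1/d_i^A + 1/d_i^B)^{-1} & \text{if } i \in ab \end{cases}$$

and

$$\tilde{d}_i^B = \begin{cases} d_i^B & \text{if } i \in b \\ (1/d_i^A + 1/d_i^B)^{-1} & \text{if } i \in ba \end{cases}$$

with $d_i^A$ and $d_i^B$ the design weights, obtained as the inverse of the first order inclusion probabilities, that is, $d_i^A = 1/\pi_i^A$ and $d_i^B = 1/\pi_i^B$.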
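To estimate the variance of this estimator, the approach proposed by Rao and Skinner (1996) is used:

$$\hat{V}(\hat{Y}_{BKA}) = \hat{V}\left(\sum_{i \in s_A} \tilde{z}_i^A\right) + \hat{V}\left(\sum_{i \in s_B} \tilde{z}_i^B\right)$$

with $\tilde{z}_i^A = \delta_i(a) y_i + (1 - \delta_i(a)) y_i \pi_i^A / (\pi_i^A + \pi_i^B)$ and $\tilde{z}_i^B = \delta_i(b) y_i + (1 - \delta_i(b)) y_i \pi_i^B / (\pi_i^A + \pi_i^B)$, where $\delta_i(a)$ and $\delta_i(b)$ are the indicator variables for domain $a$ and domain $b$, respectively.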
If both first and second order inclusion probabilities are known, the variances and covariances involved in these calculations are estimated using functions VarHT and CovHT, respectively. If only first order probabilities are known, variances are estimated using Deville's method and covariances are estimated using the following expression

$$\widehat{Cov}(\hat{X}, \hat{Y}) = \frac{\hat{V}(X + Y) - \hat{V}(X) - \hat{V}(Y)}{2}$$
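As a rough illustration of the point estimator only (not of the package internals), the BKA weights can be sketched in base R. The sample values below are hypothetical toy data. In the overlap domain each unit is weighted by the inverse of the sum of its inclusion probabilities under both designs, which equals $(1/d_i^A + 1/d_i^B)^{-1}$.

```r
# Hypothetical toy samples (values invented for illustration only)
ysA       <- c(10, 20, 30)       # variable of interest, sample from frame A
pi_A      <- c(0.2, 0.5, 0.25)   # inclusion probabilities under design A
pik_ab_B  <- c(0, 0.5, 0.25)     # probabilities of the same units under design B
domains_A <- c("a", "ab", "ab")

ysB       <- c(5, 15)            # variable of interest, sample from frame B
pi_B      <- c(0.5, 0.25)        # inclusion probabilities under design B
pik_ba_A  <- c(0, 0.25)          # probabilities of the same units under design A
domains_B <- c("b", "ba")

# Design weight outside the overlap, 1 / (pi_A + pi_B) inside it
wA <- ifelse(domains_A == "a", 1 / pi_A, 1 / (pi_A + pik_ab_B))
wB <- ifelse(domains_B == "b", 1 / pi_B, 1 / (pi_B + pik_ba_A))

# Weighted sum over both samples
total_bka <- sum(wA * ysA) + sum(wB * ysB)
total_bka
```

In practice the function BKA should be preferred, since it also handles variance estimation and confidence intervals.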
BKA returns an object of class "EstimatorDF", which is a list with at least the following components:
Call |
the matched call. |
Est |
total and mean estimation for main variable(s). |
VarEst |
variance estimation for main variable(s). |
If parameter conf_level is different from NULL, the object also includes component
ConfInt |
total and mean estimation and confidence intervals for main variable(s). |
In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any).
By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All the components of the object can be accessed by using function summary.
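The text does not state how the interval in ConfInt is built. A common choice, assumed here purely for illustration, is a normal-approximation interval around the point estimate. The numbers below are hypothetical and are not output of the package.

```r
# Hypothetical point estimate and variance estimate (not package output)
est        <- 1000
var_est    <- 400
conf_level <- 0.90

# Normal quantile for a two-sided interval (assumption: normal approximation)
z  <- qnorm(1 - (1 - conf_level) / 2)
ci <- c(lower = est - z * sqrt(var_est),
        upper = est + z * sqrt(var_est))
ci
```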
Bankier, M. D. (1986) Estimators Based on Several Stratified Samples With Applications to Multiple Frame Surveys. Journal of the American Statistical Association, Vol. 81, 1074 - 1079.
Kalton, G. and Anderson, D. W. (1986) Sampling Rare Populations. Journal of the Royal Statistical Society, Ser. A, Vol. 149, 65 - 82.
Rao, J. N. K. and Skinner, C. J. (1996) Estimation in Dual Frame Surveys with Complex Designs. Proceedings of the Survey Method Section, Statistical Society of Canada, 63 - 68.
Skinner, C. J. and Rao, J. N. K. (1996) Estimation in Dual Frame Surveys with Complex Designs. Journal of the American Statistical Association, Vol. 91, 433, 349 - 356.
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the BKA estimator of the population total for variable Leisure
BKA(DatA$Lei, DatB$Lei, PiklA, PiklB, DatA$ProbB, DatB$ProbA, DatA$Domain, DatB$Domain)

#Now, let us calculate the BKA estimator and a 90% confidence interval for the population
#total of variable Feeding, considering only first order inclusion probabilities
BKA(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$ProbB, DatB$ProbA,
    DatA$Domain, DatB$Domain, 0.90)
Produces estimates for population totals and means using the DF calibration estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.
CalDF(ysA, ysB, pi_A, pi_B, domains_A, domains_B, N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear", conf_level = NULL)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
domains_A |
A character vector of length |
domains_B |
A character vector of length |
N_A |
(Optional) A numeric value indicating the size of frame A. |
N_B |
(Optional) A numeric value indicating the size of frame B. |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain. |
xsAFrameA |
(Optional) A numeric vector of length |
xsBFrameA |
(Optional) A numeric vector of length |
xsAFrameB |
(Optional) A numeric vector of length |
xsBFrameB |
(Optional) A numeric vector of length |
xsT |
(Optional) A numeric vector of length |
XA |
(Optional) A numeric value or vector of length |
XB |
(Optional) A numeric value or vector of length |
X |
(Optional) A numeric value or vector of length |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
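The DF calibration estimator of the population total is given by

$$\hat{Y}_{CalDF} = \hat{Y}_a + \hat{\eta}\hat{Y}_{ab} + \hat{Y}_b + (1 - \hat{\eta})\hat{Y}_{ba}$$

where $\hat{Y}_a = \sum_{i \in s_a}\tilde{d}_i y_i$, $\hat{Y}_{ab} = \sum_{i \in s_{ab}}\tilde{d}_i y_i$, $\hat{Y}_b = \sum_{i \in s_b}\tilde{d}_i y_i$ and $\hat{Y}_{ba} = \sum_{i \in s_{ba}}\tilde{d}_i y_i$, with $\tilde{d}_i$ calibration weights which are calculated taking into account a different set of constraints, depending on the case. For instance, if $N_A, N_B$ and $N_{ab}$ are all known and no other auxiliary information is available, the calibration constraints are

$$\sum_{i \in s_a}\tilde{d}_i = N_a, \quad \sum_{i \in s_{ab}}\tilde{d}_i = N_{ab}, \quad \sum_{i \in s_{ba}}\tilde{d}_i = N_{ab}, \quad \sum_{i \in s_b}\tilde{d}_i = N_b$$

where $N_a = N_A - N_{ab}$ and $N_b = N_B - N_{ab}$ are the sizes of domains $a$ and $b$.
The optimal value of $\hat{\eta}$ to minimize the variance of the estimator is given by $\hat{V}(\hat{Y}_{ba})/(\hat{V}(\hat{Y}_{ab}) + \hat{V}(\hat{Y}_{ba}))$. If both first and second order probabilities are known, variances are estimated using function VarHT.
If only first order probabilities are known, variances are estimated using Deville's method.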
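To see why a variance ratio is the natural choice for the weight given to the two overlap estimators, consider combining two independent estimators of the same total. The short base-R check below uses hypothetical variances and assumes independence between the two samples. It compares the closed-form weight with a grid search over candidate values.

```r
# Hypothetical variances of the two overlap estimators (illustration only)
v_ab <- 4    # variance of the overlap estimator from the frame A sample
v_ba <- 12   # variance of the overlap estimator from the frame B sample

# Variance of eta * Y_ab + (1 - eta) * Y_ba under independence
combined_var <- function(eta) eta^2 * v_ab + (1 - eta)^2 * v_ba

# Closed-form minimiser: the noisier estimator gets the smaller weight
eta_opt <- v_ba / (v_ab + v_ba)

# Numerical check on a grid of candidate weights
grid     <- seq(0, 1, by = 0.01)
eta_grid <- grid[which.min(combined_var(grid))]

c(closed_form = eta_opt, grid_search = eta_grid)
```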
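The function covers the following scenarios:
- There is not any additional auxiliary variable
  - $N_A, N_B$ and $N_{ab}$ unknown
  - $N_A$ and $N_B$ known and $N_{ab}$ unknown
  - $N_{ab}$ known and $N_A$ and $N_B$ unknown
  - $N_A, N_B$ and $N_{ab}$ known
- Information about at least one additional auxiliary variable is available
  - $N_A$ and $N_B$ known and $N_{ab}$ unknown
  - $N_{ab}$ known and $N_A$ and $N_B$ unknown
  - $N_A, N_B$ and $N_{ab}$ known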
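To obtain an estimator of the variance of this estimator, one can use Deville's expression

$$\hat{V}(\hat{Y}_{CalDF}) = \frac{1}{1 - \sum_{k \in s} a_k^2}\sum_{k \in s}(1 - \pi_k)\left(\frac{e_k}{\pi_k} - \sum_{l \in s} a_l \frac{e_l}{\pi_l}\right)^2$$

where $a_k = (1 - \pi_k)/\sum_{l \in s}(1 - \pi_l)$ and $e_k$ are the residuals of the regression with auxiliary variables as regressors.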
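Deville's approximation needs only first order inclusion probabilities. A minimal base-R sketch is given below, assuming the commonly cited form with coefficients $a_k = (1 - \pi_k)/\sum_l (1 - \pi_l)$. The helper name `deville_var` and the data are hypothetical and are not taken from the package. As a sanity check, with equal inclusion probabilities the expression reduces to the usual simple random sampling variance estimator of a total.

```r
# Hypothetical helper: Deville-type variance approximation for a total,
# given residuals e and first order inclusion probabilities pik
deville_var <- function(e, pik) {
  a     <- (1 - pik) / sum(1 - pik)   # coefficients a_k
  e_exp <- e / pik                    # expanded residuals e_k / pi_k
  sum((1 - pik) * (e_exp - sum(a * e_exp))^2) / (1 - sum(a^2))
}

# Equal-probability check with invented numbers
N   <- 50
n   <- 5
e   <- c(2, -1, 0.5, 3, -4.5)
pik <- rep(n / N, n)

v_deville <- deville_var(e, pik)
v_srs     <- N^2 * (1 - n / N) * var(e) / n   # textbook SRSWOR estimator

c(deville = v_deville, srs = v_srs)
```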
CalDF returns an object of class "EstimatorDF", which is a list with at least the following components:
Call |
the matched call. |
Est |
total and mean estimation for main variable(s). |
VarEst |
variance estimation for main variable(s). |
If parameter conf_level is different from NULL, the object also includes component
ConfInt |
total and mean estimation and confidence intervals for main variable(s). |
In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any).
By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All the components of the object can be accessed by using function summary.
Ranalli, M. G., Arcos, A., Rueda, M. and Teodoro, A. (2013) Calibration estimation in dual frame surveys. arXiv:1312.0761 [stat.ME]
Deville, J. C., Sarndal, C. E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376 - 382
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the DF calibration estimator for variable Feeding, without
#considering any auxiliary information
CalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)

#Now, let us calculate the DF calibration estimator for variable Clothing when the frame
#sizes and the overlap domain size are known
CalDF(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain,
      N_A = 1735, N_B = 1191, N_ab = 601)

#Finally, let us calculate the DF calibration estimator and a 90% confidence interval
#for the population total of variable Feeding, considering Income as auxiliary variable in
#frame A and Metres2 as auxiliary variable in frame B, with frame sizes and overlap
#domain size known
CalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain,
      N_A = 1735, N_B = 1191, N_ab = 601, xsAFrameA = DatA$Inc, xsBFrameA = DatB$Inc,
      xsAFrameB = DatA$M2, xsBFrameB = DatB$M2, XA = 4300260, XB = 176553,
      conf_level = 0.90)
Produces estimates for population totals and means using the SF calibration estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.
CalSF(ysA, ysB, pi_A, pi_B, pik_ab_B, pik_ba_A, domains_A, domains_B, N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear", conf_level = NULL)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
pik_ab_B |
A numeric vector of size |
pik_ba_A |
A numeric vector of size |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
N_A |
(Optional) A numeric value indicating the size of frame A |
N_B |
(Optional) A numeric value indicating the size of frame B |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain |
xsAFrameA |
(Optional) A numeric vector of length |
xsBFrameA |
(Optional) A numeric vector of length |
xsAFrameB |
(Optional) A numeric vector of length |
xsBFrameB |
(Optional) A numeric vector of length |
xsT |
(Optional) A numeric vector of length |
XA |
(Optional) A numeric value or vector of length |
XB |
(Optional) A numeric value or vector of length |
X |
(Optional) A numeric value or vector of length |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
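The SF calibration estimator of the population total is given by

$$\hat{Y}_{CalSF} = \hat{Y}_a + \hat{Y}_{ab} + \hat{Y}_{ba} + \hat{Y}_b$$

where $\hat{Y}_a = \sum_{i \in s_a}\tilde{d}_i y_i$, $\hat{Y}_{ab} = \sum_{i \in s_{ab}}\tilde{d}_i y_i$, $\hat{Y}_{ba} = \sum_{i \in s_{ba}}\tilde{d}_i y_i$ and $\hat{Y}_b = \sum_{i \in s_b}\tilde{d}_i y_i$, with $\tilde{d}_i$ calibration weights which are calculated taking into account a different set of constraints, depending on the case. For instance, if $N_A, N_B$ and $N_{ab}$ are known and no other auxiliary information is available, the calibration constraints are

$$\sum_{i \in s_a}\tilde{d}_i = N_a, \quad \sum_{i \in s_{ab} \cup s_{ba}}\tilde{d}_i = N_{ab}, \quad \sum_{i \in s_b}\tilde{d}_i = N_b$$

where $N_a = N_A - N_{ab}$ and $N_b = N_B - N_{ab}$ are the sizes of domains $a$ and $b$.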
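The function covers the following scenarios:
- There is not any additional auxiliary variable
  - $N_A, N_B$ and $N_{ab}$ unknown
  - $N_{ab}$ known and $N_A$ and $N_B$ unknown
  - $N_A$ and $N_B$ known and $N_{ab}$ unknown
  - $N_A, N_B$ and $N_{ab}$ known
- Information about at least one additional auxiliary variable is available
  - $N_{ab}$ known and $N_A$ and $N_B$ unknown
  - $N_A$ and $N_B$ known and $N_{ab}$ unknown
  - $N_A, N_B$ and $N_{ab}$ known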
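To obtain an estimator of the variance of this estimator, one can use Deville's expression

$$\hat{V}(\hat{Y}_{CalSF}) = \frac{1}{1 - \sum_{k \in s} a_k^2}\sum_{k \in s}(1 - \pi_k)\left(\frac{e_k}{\pi_k} - \sum_{l \in s} a_l \frac{e_l}{\pi_l}\right)^2$$

where $a_k = (1 - \pi_k)/\sum_{l \in s}(1 - \pi_l)$ and $e_k$ are the residuals of the regression with auxiliary variables as regressors.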
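The calibration step with the "linear" distance can be illustrated in base R. The sketch below is not the package implementation: `calibrate_linear` is a hypothetical helper, and the pooled sample, starting weights and domain sizes are invented. It solves the standard linear calibration equations so that the calibrated weights reproduce known domain sizes.

```r
# Hypothetical helper: linear (chi-square distance) calibration.
# d      : starting weights
# X_s    : matrix of auxiliary variables in the sample (one row per unit)
# totals : known population totals of the auxiliary variables
calibrate_linear <- function(d, X_s, totals) {
  X_s    <- as.matrix(X_s)
  # Solve (sum_i d_i x_i x_i') lambda = totals - sum_i d_i x_i
  lambda <- solve(t(X_s) %*% (d * X_s), totals - colSums(d * X_s))
  as.vector(d * (1 + X_s %*% lambda))
}

# Invented pooled sample: domain membership and starting weights
domain <- c("a", "a", "ab", "ab", "ab", "b", "b")
d      <- c(10, 12, 4, 5, 6, 8, 9)

# Domain indicators as auxiliary variables, known domain sizes as totals
X_s    <- cbind(a = domain == "a", ab = domain == "ab", b = domain == "b") * 1
totals <- c(a = 25, ab = 18, b = 20)

w <- calibrate_linear(d, X_s, totals)
tapply(w, domain, sum)   # calibrated weights add up to the domain sizes
```

With domain indicators as the only auxiliary variables, linear calibration simply rescales the starting weights within each domain. Additional auxiliary variables would enter as extra columns of `X_s`.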
CalSF returns an object of class "EstimatorDF", which is a list with at least the following components:
Call |
the matched call. |
Est |
total and mean estimation for main variable(s). |
VarEst |
variance estimation for main variable(s). |
If parameter conf_level is different from NULL, the object also includes component
ConfInt |
total and mean estimation and confidence intervals for main variable(s). |
In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any).
By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All the components of the object can be accessed by using function summary.
Ranalli, M. G., Arcos, A., Rueda, M. and Teodoro, A. (2013) Calibration estimation in dual frame surveys. arXiv:1312.0761 [stat.ME]
Deville, J. C., Sarndal, C. E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376 - 382
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate the SF calibration estimator for variable Clothing, without
#considering any auxiliary information
CalSF(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$ProbB, DatB$ProbA,
      DatA$Domain, DatB$Domain)

#Now, let us calculate the SF calibration estimator for variable Leisure when the frame
#sizes and the overlap domain size are known
CalSF(DatA$Lei, DatB$Lei, PiklA, PiklB, DatA$ProbB, DatB$ProbA,
      DatA$Domain, DatB$Domain, N_A = 1735, N_B = 1191, N_ab = 601)

#Finally, let us calculate the SF calibration estimator and a 90% confidence interval
#for the population total of variable Feeding, considering Income and Metres2 as auxiliary
#variables, with frame sizes and overlap domain size known
CalSF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$ProbB, DatB$ProbA,
      DatA$Domain, DatB$Domain, N_A = 1735, N_B = 1191, N_ab = 601,
      xsAFrameA = DatA$Inc, xsBFrameA = DatB$Inc, xsAFrameB = DatA$M2,
      xsBFrameB = DatB$M2, XA = 4300260, XB = 176553, conf_level = 0.90)
Returns all the estimators that can be computed from the information provided.
Compare(ysA, ysB, pi_A, pi_B, domains_A, domains_B, pik_ab_B = NULL, pik_ba_A = NULL, N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, XA = NULL, XB = NULL, met = "linear", conf_level = NULL)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
domains_A |
A character vector of length |
domains_B |
A character vector of length |
pik_ab_B |
(Optional) A numeric vector of size |
pik_ba_A |
(Optional) A numeric vector of size |
N_A |
(Optional) A numeric value indicating the size of frame A. |
N_B |
(Optional) A numeric value indicating the size of frame B. |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain. |
xsAFrameA |
(Optional) A numeric vector of length |
xsBFrameA |
(Optional) A numeric vector of length |
xsAFrameB |
(Optional) A numeric vector of length |
xsBFrameB |
(Optional) A numeric vector of length |
XA |
(Optional) A numeric value or vector of length |
XB |
(Optional) A numeric value or vector of length |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

Compare(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)
Computes the covariance estimator between two Horvitz - Thompson estimators of population total from survey data obtained from a single stage sampling design
CovHT(y, x, pikl)
y |
A numeric vector of size n containing information about first variable of interest in the sample |
x |
A numeric vector of size n containing information about second variable of interest in the sample |
pikl |
A square numeric matrix of dimension n containing first and second order inclusion probabilities for units included in the sample |
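The covariance estimator between two Horvitz - Thompson estimators of the population total is given by

$$\widehat{Cov}(\hat{Y}_{HT}, \hat{X}_{HT}) = \sum_{k \in s}\sum_{l \in s}\frac{\pi_{kl} - \pi_k \pi_l}{\pi_{kl}}\frac{y_k}{\pi_k}\frac{x_l}{\pi_l}$$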
A numeric value representing covariance estimator between two Horvitz - Thompson estimators for population total for considered values
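The double sum over sample units in the Horvitz - Thompson covariance estimator can be written compactly with `outer`. The base-R sketch below uses a hypothetical helper `cov_ht` (not the package function) and toy data. It assumes a matrix with first order probabilities on the diagonal and second order probabilities elsewhere, and checks the result against the textbook formula for simple random sampling without replacement.

```r
# Hypothetical helper: HT covariance estimator from a matrix of first
# (diagonal) and second (off-diagonal) order inclusion probabilities
cov_ht <- function(y, x, pikl) {
  pik   <- diag(pikl)
  delta <- (pikl - outer(pik, pik)) / pikl   # (pi_kl - pi_k pi_l) / pi_kl
  sum(delta * outer(y / pik, x / pik))       # double sum over k and l
}

# Toy case: SRSWOR of n = 2 units from N = 5
N <- 5
n <- 2
pikl <- matrix(n * (n - 1) / (N * (N - 1)), n, n)   # pi_kl = 0.1
diag(pikl) <- n / N                                 # pi_k  = 0.4

y <- c(2, 3.3)
x <- c(13, 14)

c_ht  <- cov_ht(y, x, pikl)
c_srs <- N^2 * (1 - n / N) * cov(y, x) / n   # textbook SRSWOR expression
c(ht = c_ht, srs = c_srs)
```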
Horvitz, D. G. and Thompson, D. J. (1952) A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663 - 685.
Sarndal, C. E., Swensson, B. and Wretman, J. (1992) Model Assisted Survey Sampling. Springer-Verlag. New York.
########## Example 1 ##########
Indicators <- c(1, 2, 3, 4, 5)
X <- c(13, 18, 20, 14, 9)
Y <- c(2, 0.5, 1.2, 3.3, 2)
#Let us draw a simple random sample without replacement of size 2
s <- sample(Indicators, 2)
sX <- X[s]
sY <- Y[s]
#Now, let us build the associated probability matrix with first and
#second order inclusion probabilities. For n = 2 and N = 5 these are
#n/N = 0.4 and n(n-1)/(N(N-1)) = 0.1
Ps <- matrix(c(0.4, 0.1, 0.1, 0.4), 2, 2)
CovHT(sX, sY, Ps)

########## Example 2 ##########
data(DatA)
attach(DatA)
data(PiklA)
#Let us calculate the Horvitz - Thompson estimator for the total of variable Clothing in Frame A.
HT(Clo, ProbA)
#Let us calculate the Horvitz - Thompson estimator for the total of variable Feeding in Frame A.
HT(Feed, ProbA)
#And now, let us compute the covariance between the previous estimators
CovHT(Clo, Feed, PiklA)
This dataset contains some variables coming from a real dual frame survey conducted in 2013 in Andalusia (Spain) by a scientific institute specialized in social topics.
This dataset is intended to show how to properly split a joint dual frame sample into subsamples, so that the functions of Frames2 can be used.
Dat
Indicates whether individual was selected in the landline sample(1) or in the cell phone sample(2).
Indicates the stratum each individual belongs to. For individuals selected in cell phone sample, value of this variable is NA
.
Response of the individual to the question: Do you think that immigrants currently living in Andalusia are quite a lot? 1 represents "yes" and 0 represents "no".
Indicates whether individual has a landline (1) or not (0).
Indicates whether individual has a cell phone(1) or not(0).
First order inclusion probability of reaching the individual by landline.
First order inclusion probability of reaching the individual by cell phone.
Monthly income (in euros) of the individual.
The survey was based on two frames: a landline frame and a cell phone frame. Landline frame was stratified by province and simple random sampling without replacement was considered
in cell phone frame. The size of the whole sample was . Total of the variable Income in the whole population is
.
data(Dat)
attach(Dat)
#We are going to split dataset Dat into two new datasets, each
#one corresponding to a frame: frame containing individuals
#using landline and frame containing individuals using cell phone.
FrameLandline <- Dat[Landline == 1,]
FrameCell <- Dat[Cell == 1,]

#Equally, we can split the original dataset into three new
#datasets, each one corresponding to one domain: first domain containing
#individuals using only landline, second domain containing individuals
#using only cell phone and the third domain containing individuals
#using both landline and cell phone.
DomainLandline <- Dat[Landline == 1 & Cell == 0,]
DomainCell <- Dat[Landline == 0 & Cell == 1,]
DomainBoth <- Dat[Landline == 1 & Cell == 1,]

#From the domain datasets, we can build the frame datasets
FrameLandline <- rbind(DomainLandline, DomainBoth)
FrameCell <- rbind(DomainCell, DomainBoth)
This dataset contains some variables regarding household expenses for a sample of 105 households selected from a list of landline phones (let say, frame A) in a particular city in a specific month.
DatA
A string indicating the domain each household belongs to. Possible values are "a" if household belongs to domain a or "ab" if household belongs to overlap domain.
Feeding expenses (in euros) at the household
Clothing expenses (in euros) at the household
Leisure expenses (in euros) at the household
Household income (in euros). Values for this variable are only available for households included in frame A. For households included in domain b, value of this variable is set to 0.
Household municipal taxes (in euros) paid. Values for this variable are only available for households included in frame A. For households included in domain b, value of this variable is set to 0.
Square meters of the house. Values for this variable are only available for households included in frame B. For households included in domain a, value of this variable is set to 0.
Household size. Values for this variable are only available for households included in frame B. For households included in domain a, value of this variable is set to 0.
First order inclusion probability in frame A. This probability is 0 for households included in domain b.
First order inclusion probability in frame B. This probability is 0 for households included in domain a.
A numeric value indicating the stratum each household belongs to.
The sample, of size $n_A = 105$, has been drawn from a population of $N_A = 1735$ households with landline phone according to a stratified random sampling design. Population units were divided into 6 different strata.
Population sizes of these strata are
.
of the households composing the population have, also, mobile phone. On the other hand, frame totals for auxiliary variables in this frame are
and
.
data(DatA)
attach(DatA)
#Let us perform a brief descriptive analysis of the three main variables
param <- data.frame(Feed, Clo, Lei)
summary(param)
hist(Feed)
hist(Clo)
hist(Lei)
This dataset contains some variables regarding household expenses for a sample of 135 households selected from a list of mobile phones (let say, frame B) in a particular city in a specific month.
DatB
A string indicating the domain each household belongs to. Possible values are "b" if household belongs to domain b or "ba" if household belongs to overlap domain.
Feeding expenses (in euros) at the household
Clothing expenses (in euros) at the household
Leisure expenses (in euros) at the household
Household income (in euros). Values for this variable are only available for households included in frame A. For households included in domain b, value of this variable is set to 0.
Household municipal taxes (in euros) paid. Values for this variable are only available for households included in frame A. For households included in domain b, value of this variable is set to 0.
Square meters of the house. Values for this variable are only available for households included in frame B. For households included in domain a, value of this variable is set to 0.
Household size. Values for this variable are only available for households included in frame B. For households included in domain a, value of this variable is set to 0.
First order inclusion probability in frame A. This probability is 0 for households included in domain b.
First order inclusion probability in frame B. This probability is 0 for households included in domain a.
The sample, of size $n_B = 135$, has been drawn from a population of $N_B = 1191$ households with mobile phone according to a simple random sampling without replacement design.
of these households have, also, landline phone. On the other hand, frame totals for auxiliary variables in this frame are
and
data(DatB)
attach(DatB)
#Let us perform a brief descriptive analysis of the three main variables
param <- data.frame(Feed, Clo, Lei)
summary(param)
hist(Feed)
hist(Clo)
hist(Lei)
This dataset contains some variables regarding the program choice for a sample of 180 students included in the sampling frame A.
DatMA
An integer from 1 to , with
the number of students in the whole population, identifying the student within the population.
An integer from 1 to , with
the number of students in the frame, identifying the student within the frame.
A factor with three categories (academic, general and vocation) indicating the program choice of the student.
An ordinal factor with three categories (low, middle and high) indicating the socio-economical status of the student.
A number indicating the mark of the student in a reading test.
A number indicating the mark of the student in a writing test.
A number indicating the size of the school the student belongs to.
A string indicating the domain each student belongs to. Possible values are "a" if student belongs to domain a or "ab" if student belongs to overlap domain.
First order inclusion probability in frame A.
First order inclusion probability in frame B. This probability is 0 for students included in domain a.
The sample, of size , has been drawn from a population of
students according to a probability proportional to size sampling design, with the size of the school as the size measure. So, students
attending bigger schools have a higher probability of being selected in the sample.
of the students composing the population also belong to frame B.
data(DatMA)
attach(DatMA)
#Let us perform a brief descriptive analysis of the main variable
summary(Prog)
#And let us do the same for the numerical auxiliary variables Read and Write
summary(Read)
summary(Write)
This dataset contains some variables regarding the program choice for a sample of 232 students included in the sampling frame B.
DatMB
An integer from 1 to , with
the number of students in the whole population, identifying the student within the population.
An integer from 1 to , with
the number of students in the frame, identifying the student within the frame.
A factor with three categories (academic, general and vocation) indicating the program choice of the student.
An ordinal factor with three categories (low, middle and high) indicating the socio-economical status of the student.
A number indicating the mark of the student in a reading test.
A number indicating the mark of the student in a writing test.
A number indicating the size of the school the student belongs to.
A string indicating the domain each student belongs to. Possible values are "b" if student belongs to domain b or "ba" if student belongs to overlap domain.
First order inclusion probability in frame A. This probability is 0 for students included in domain b.
First order inclusion probability in frame B.
The sample, of size , has been drawn from a population of
students according to a simple random sampling design.
of the students composing the population also belong to frame A.
data(DatMB)
attach(DatMB)
#Let us perform a brief descriptive analysis of the main variable
summary(Prog)
#And let us do the same for the numerical auxiliary variables Read and Write
summary(Read)
summary(Write)
This dataset contains population information about the auxiliary variables of the population of students
DatPopM
An ordinal factor with three categories (low, middle and high) indicating the socioeconomic status of the student.
A number indicating the mark of the student in a reading test.
A number indicating the mark of the student in a writing test.
A string indicating the domain each student belongs to. Possible values are "a" if student belongs to domain a, "b" if student belongs to domain b or "ab" if student belongs to overlap domain.
The population size is .
data(DatPopM)
attach(DatPopM)
# Brief descriptive analysis of the three auxiliary variables
summary(Ses)
summary(Read)
summary(Write)
Given a main vector, an auxiliary vector and a value of the latter, identifies the positions of the auxiliary vector whose values differ from the given one, and sets the values of the main vector at those positions to zero.
Domains(y, domains, value)
y |
A numeric main vector of size n |
domains |
A numeric/character/logical auxiliary vector of size n |
value |
A value of the auxiliary vector |
A numeric vector, a copy of y with some values set to zero depending on the values of domains and value.
########## Example 1 ##########
U <- c(13, 18, 20, 14, 9)
# Build an auxiliary vector indicating whether values in U are above or below the mean.
aux <- c("Below", "Above", "Above", "Below", "Below")
# Only values below the mean remain; the others are set to zero.
Domains(U, aux, "Below")

########## Example 2 ##########
data(DatA)
attach(DatA)
# Calculate total feeding expenses corresponding to households in domain a.
sum(Domains(Feed, Domain, "a"))
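The behaviour described for Domains can be sketched in a few lines of base R. The function name `domains_sketch` is hypothetical and is not the package source; it only illustrates the "zero everything outside the requested domain" logic, which is what makes `sum(Domains(...))` a domain total.

```r
# Hypothetical re-implementation for illustration only; not the Frames2 source.
domains_sketch <- function(y, domains, value) {
  stopifnot(length(y) == length(domains))
  out <- y
  # Positions whose auxiliary value differs from `value` are set to zero
  out[domains != value] <- 0
  out
}

U   <- c(13, 18, 20, 14, 9)
aux <- c("Below", "Above", "Above", "Below", "Below")
res <- domains_sketch(U, aux, "Below")
print(res)
```

Keeping the vector at its original length, instead of subsetting it, means the result can still be paired element by element with inclusion probabilities of the full sample.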
Produces estimates for population totals and means using the Fuller-Burmeister estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.
FB(ysA, ysB, pi_A, pi_B, domains_A, domains_B, conf_level = NULL)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals. |
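The Fuller-Burmeister estimator of the population total is given by
$$\hat{Y}_{FB} = \hat{Y}_a^A + \hat{\beta}_1 \hat{Y}_{ab}^A + (1 - \hat{\beta}_1)\hat{Y}_{ab}^B + \hat{Y}_b^B + \hat{\beta}_2(\hat{N}_{ab}^A - \hat{N}_{ab}^B)$$
where the optimal values of $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2)'$ that minimize the variance of the estimator are:
$$\begin{pmatrix}\hat{\beta}_1 \\ \hat{\beta}_2\end{pmatrix} = -\begin{pmatrix}\hat{V}(\hat{Y}_{ab}^A - \hat{Y}_{ab}^B) & \widehat{Cov}(\hat{Y}_{ab}^A - \hat{Y}_{ab}^B, \hat{N}_{ab}^A - \hat{N}_{ab}^B) \\ \widehat{Cov}(\hat{Y}_{ab}^A - \hat{Y}_{ab}^B, \hat{N}_{ab}^A - \hat{N}_{ab}^B) & \hat{V}(\hat{N}_{ab}^A - \hat{N}_{ab}^B)\end{pmatrix}^{-1}\begin{pmatrix}\widehat{Cov}(\hat{Y}_a^A + \hat{Y}_b^B + \hat{Y}_{ab}^B, \hat{Y}_{ab}^A - \hat{Y}_{ab}^B) \\ \widehat{Cov}(\hat{Y}_a^A + \hat{Y}_b^B + \hat{Y}_{ab}^B, \hat{N}_{ab}^A - \hat{N}_{ab}^B)\end{pmatrix}$$
Because the Fuller-Burmeister estimator is not defined for estimating population sizes, the mean is estimated as $\hat{\bar{Y}}_{FB} = \hat{Y}_{FB} / \hat{N}_H$, where $\hat{N}_H$ is the estimate of the population size obtained with the Hartley estimator.
The estimated variance of the Fuller-Burmeister estimator can be obtained through the expression
$$\hat{V}(\hat{Y}_{FB}) = \hat{V}(\hat{Y}_a^A) + \hat{V}(\hat{Y}^B) + \hat{\beta}_1\left[\widehat{Cov}(\hat{Y}_a^A, \hat{Y}_{ab}^A) - \widehat{Cov}(\hat{Y}^B, \hat{Y}_{ab}^B)\right] + \hat{\beta}_2\left[\widehat{Cov}(\hat{Y}_a^A, \hat{N}_{ab}^A) - \widehat{Cov}(\hat{Y}^B, \hat{N}_{ab}^B)\right]$$
where $\hat{Y}^B = \hat{Y}_b^B + \hat{Y}_{ab}^B$.
If both first and second order probabilities are known, the variances and covariances involved in the calculation of $\hat{\beta}$ and $\hat{V}(\hat{Y}_{FB})$ are estimated using functions VarHT and CovHT, respectively. If only first order probabilities are known, variances are estimated using Deville's method and covariances are estimated using the following expression
$$\widehat{Cov}(\hat{X}, \hat{Y}) = \frac{\hat{V}(\hat{X} + \hat{Y}) - \hat{V}(\hat{X}) - \hat{V}(\hat{Y})}{2}$$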
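The covariance expression used when only first order probabilities are known follows from the identity V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y). The short R check below is only an illustration on simulated data: it uses the ordinary sample variance `var` as a stand-in for the design-based variance estimators (Deville's method) applied by the package, and the vectors `x` and `y` are hypothetical.

```r
# Illustration only: sample variances stand in for design-based variance estimates.
set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Covariance recovered from three variances
cov_from_vars <- (var(x + y) - var(x) - var(y)) / 2

# Direct covariance, for comparison
cov_direct <- cov(x, y)
print(c(from_variances = cov_from_vars, direct = cov_direct))
```

The practical point is that only a variance estimator is needed: once variances of the two estimators and of their sum are available, the covariance comes for free.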
FB returns an object of class "EstimatorDF", which is a list with, at least, the following components:
Call |
the matched call. |
Est |
total and mean estimation for main variable(s). |
VarEst |
variance estimation for main variable(s). |
If parameter conf_level is different from NULL, the object also includes the component
ConfInt |
total and mean estimation and confidence intervals for main variable(s). |
In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any).
By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All the components of the object can be accessed using function summary.
Fuller, W. A. and Burmeister, L. F. (1972). Estimation for Samples Selected From Two Overlapping Frames. ASA Proceedings of the Social Statistics Sections, 245-249.
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)
# Calculate the Fuller-Burmeister estimator for variable Clothing
FB(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain)
# Now calculate the Fuller-Burmeister estimator and a 90% confidence interval
# for variable Leisure, considering only first order inclusion probabilities
FB(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$Domain, DatB$Domain, 0.90)
Produces estimates for population totals and means using Hartley estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.
Hartley(ysA, ysB, pi_A, pi_B, domains_A, domains_B, conf_level = NULL)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals. |
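The Hartley estimator of the population total is given by
$$\hat{Y}_H = \hat{Y}_a^A + \hat{\theta}\hat{Y}_{ab}^A + (1 - \hat{\theta})\hat{Y}_{ab}^B + \hat{Y}_b^B$$
where $\hat{\theta} \in [0, 1]$. The optimum value of $\hat{\theta}$ that minimizes the variance of the estimator is
$$\hat{\theta}_{opt} = \frac{\hat{V}(\hat{Y}_{ab}^B) + \widehat{Cov}(\hat{Y}_b^B, \hat{Y}_{ab}^B) - \widehat{Cov}(\hat{Y}_a^A, \hat{Y}_{ab}^A)}{\hat{V}(\hat{Y}_{ab}^A) + \hat{V}(\hat{Y}_{ab}^B)}$$
Taking into account the independence between the samples drawn from frame A and frame B, an estimator for the variance of the Hartley estimator can be obtained as follows:
$$\hat{V}(\hat{Y}_H) = \hat{V}(\hat{Y}_a^A + \hat{\theta}\hat{Y}_{ab}^A) + \hat{V}(\hat{Y}_b^B + (1 - \hat{\theta})\hat{Y}_{ab}^B)$$
If both first and second order probabilities are known, the variances and covariances involved in the calculation of $\hat{\theta}_{opt}$ and $\hat{V}(\hat{Y}_H)$ are estimated using functions VarHT and CovHT, respectively. If only first order probabilities are known, variances are estimated using Deville's method and covariances are estimated using the following expression
$$\widehat{Cov}(\hat{X}, \hat{Y}) = \frac{\hat{V}(\hat{X} + \hat{Y}) - \hat{V}(\hat{X}) - \hat{V}(\hat{Y})}{2}$$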
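As a rough illustration of the point estimator, the Hartley total can be read as the sum of Horvitz-Thompson estimates for domains a and b plus a weighted combination, with weight theta, of the two estimates of the overlap domain. The sketch below uses a hypothetical function `hartley_sketch`, a fixed (not optimal) `theta` and invented toy data; it is not the package implementation and it does not estimate the variance.

```r
# Hypothetical sketch of the Hartley point estimator with a user-supplied theta.
hartley_sketch <- function(ysA, ysB, piA, piB, domA, domB, theta) {
  stopifnot(theta >= 0, theta <= 1)
  ht <- function(y, p) sum(y / p)  # Horvitz-Thompson total
  Ya   <- ht(ysA[domA == "a"],  piA[domA == "a"])   # domain a, from sample A
  YabA <- ht(ysA[domA == "ab"], piA[domA == "ab"])  # overlap, from sample A
  Yb   <- ht(ysB[domB == "b"],  piB[domB == "b"])   # domain b, from sample B
  YabB <- ht(ysB[domB == "ba"], piB[domB == "ba"])  # overlap, from sample B
  Ya + theta * YabA + (1 - theta) * YabB + Yb
}

# Invented toy data (assumption): three units from frame A, two from frame B
est <- hartley_sketch(
  ysA = c(10, 20, 30), ysB = c(5, 15),
  piA = c(0.5, 0.5, 0.5), piB = c(0.25, 0.25),
  domA = c("a", "ab", "ab"), domB = c("b", "ba"),
  theta = 0.5
)
print(est)
```

With theta = 1 the overlap is estimated from sample A alone, and with theta = 0 from sample B alone; the optimum value trades off the precision of the two overlap estimates.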
Hartley returns an object of class "EstimatorDF", which is a list with, at least, the following components:
Call |
the matched call. |
Est |
total and mean estimation for main variable(s). |
VarEst |
variance estimation for main variable(s). |
If parameter conf_level is different from NULL, the object also includes the component
ConfInt |
total and mean estimation and confidence intervals for main variable(s). |
In addition, components TotDomEst and MeanDomEst are available when the estimator is based on estimators of the domains. Component Param shows the values of the parameters involved in the calculation of the estimator (if any).
By default, only the Est component (or the ConfInt component, if parameter conf_level is different from NULL) is shown. All the components of the object can be accessed using function summary.
Hartley, H. O. (1962). Multiple Frames Surveys. Proceedings of the American Statistical Association, Social Statistics Sections, 203-206.
Hartley, H. O. (1974). Multiple frame methodology and selected applications. Sankhya C, Vol. 36, 99-118.
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)
# Calculate the Hartley estimator for variable Feeding
Hartley(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)
# Now calculate the Hartley estimator and a 90% confidence interval
# for variable Leisure, considering only first order inclusion probabilities
Hartley(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$Domain, DatB$Domain, 0.90)
Computes the Horvitz-Thompson estimator of a population total
HT(y, pik)
y |
A numeric vector of size n containing information about variable of interest |
pik |
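A numeric vector of size n containing first order inclusion probabilities for units included in the sample |
The Horvitz-Thompson estimator of the population total is given by
$$\hat{Y}_{HT} = \sum_{k \in s} \frac{y_k}{\pi_k}$$
where $s$ is the sample, $y_k$ is the value of the variable of interest for unit $k$ and $\pi_k$ is its first order inclusion probability.
A numeric value representing the Horvitz-Thompson estimator of the population total for the considered values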
Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663-685.
########## Example 1 ##########
U <- c(13, 18, 20, 14, 9)
# A simple random sample of size 2 without replacement is drawn from the population
s <- sample(U, 2)
ps <- c(0.4, 0.4)
HT(s, ps)

########## Example 2 ##########
data(DatA)
attach(DatA)
# Estimate the population total for variable Feeding in frame A
HT(Feed, ProbA)
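The Horvitz-Thompson estimator is simply the sum of the sampled values weighted by the inverse of their first order inclusion probabilities. A minimal sketch, with a hypothetical function name `ht_sketch` and a fixed sample instead of a random draw so that the result is reproducible:

```r
# Hypothetical sketch of the Horvitz-Thompson total; not the package source.
ht_sketch <- function(y, pik) {
  stopifnot(length(y) == length(pik), all(pik > 0 & pik <= 1))
  sum(y / pik)  # each sampled unit represents 1 / pik population units
}

# Two units sampled from a population of 5 by simple random sampling: pik = 2 / 5
s   <- c(13, 20)
ps  <- c(0.4, 0.4)
est <- ht_sketch(s, ps)
print(est)
```

Each sampled value is expanded by a weight of 1 / 0.4 = 2.5, so the two observations stand in for the five units of the population.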
Calculates confidence intervals for the Bankier-Kalton-Anderson estimator using a jackknife procedure
JackBKA(ysA, ysB, piA, piB, pik_ab_B, pik_ba_A, domainsA, domainsB, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
piA |
A numeric vector of length |
piB |
A numeric vector of length |
pik_ab_B |
A numeric vector of size |
pik_ba_A |
A numeric vector of size |
domainsA |
A character vector of size |
domainsB |
A character vector of size |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logic value indicating if a finite population correction factor should be considered in frame A. Default is FALSE. |
fcpB |
(Optional) A logic value indicating if a finite population correction factor should be considered in frame B. Default is FALSE. |
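Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum.
In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by
$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A}\sum_{i \in s_A}\left(\hat{Y}_c^A(i) - \overline{Y}_c^A\right)^2 + \sum_{l=1}^{L}\frac{n_{Bl} - 1}{n_{Bl}}\sum_{j \in s_{Bl}}\left(\hat{Y}_c^B(lj) - \overline{Y}_c^{Bl}\right)^2$$
with $\hat{Y}_c^A(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^A$ the mean of the values $\hat{Y}_c^A(i)$.
Similarly, $\hat{Y}_c^B(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_c^{Bl}$ is the mean of the values $\hat{Y}_c^B(lj)$.
If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^A(i)$ or $\hat{Y}_c^B(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^A(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^B(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A}\pi_i / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B}\pi_j / n_B$.
A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.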
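The non-stratified part of the jackknife procedure (drop one unit, recompute the estimator, measure the spread of the replicates) can be sketched for a single sample as follows. The function `jack_var` and its `estimator` argument are hypothetical names used for illustration, and the sample mean stands in for a dual frame estimator.

```r
# Hypothetical delete-one jackknife for a single, non-stratified sample.
jack_var <- function(y, estimator) {
  n <- length(y)
  # Value of the estimator after dropping each unit in turn
  reps <- vapply(seq_len(n), function(i) estimator(y[-i]), numeric(1))
  # Scaled sum of squared deviations of the replicates around their mean
  (n - 1) / n * sum((reps - mean(reps))^2)
}

y <- c(13, 18, 20, 14, 9)
v <- jack_var(y, mean)
print(v)
```

For the sample mean this reproduces the textbook variance estimate var(y) / n, which is a convenient sanity check before applying the same idea to more complex estimators.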
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
data(DatA)
data(DatB)
# Obtain a 95% jackknife confidence interval for variable Feeding,
# supposing a stratified sampling in frame A and a simple random sampling without
# replacement in frame B with no finite population correction factor in any frame.
JackBKA(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$ProbB, DatB$ProbA,
        DatA$Domain, DatB$Domain, 0.95, "str", "srs", strA = DatA$Stratum)
# Check how the interval estimation varies when a finite
# population correction factor is considered in both frames.
JackBKA(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$ProbB, DatB$ProbA,
        DatA$Domain, DatB$Domain, 0.95, "str", "srs", strA = DatA$Stratum,
        fcpA = TRUE, fcpB = TRUE)
Calculates confidence intervals for the dual frame calibration estimator using a jackknife procedure
JackCalDF(ysA, ysB, piA, piB, domainsA, domainsB, N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear", conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
piA |
A numeric vector of length |
piB |
A numeric vector of length |
domainsA |
A character vector of size |
domainsB |
A character vector of size |
N_A |
(Optional) A numeric value indicating the size of frame A |
N_B |
(Optional) A numeric value indicating the size of frame B |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain |
xsAFrameA |
(Optional) A numeric vector of length |
xsBFrameA |
(Optional) A numeric vector of length |
xsAFrameB |
(Optional) A numeric vector of length |
xsBFrameB |
(Optional) A numeric vector of length |
xsT |
(Optional) A numeric vector of length |
XA |
(Optional) A numeric value or vector of length |
XB |
(Optional) A numeric value or vector of length |
X |
(Optional) A numeric value or vector of length |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logic value indicating if a finite population correction factor should be considered in frame A. Default is FALSE. |
fcpB |
(Optional) A logic value indicating if a finite population correction factor should be considered in frame B. Default is FALSE. |
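Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum.
In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by
$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A}\sum_{i \in s_A}\left(\hat{Y}_c^A(i) - \overline{Y}_c^A\right)^2 + \sum_{l=1}^{L}\frac{n_{Bl} - 1}{n_{Bl}}\sum_{j \in s_{Bl}}\left(\hat{Y}_c^B(lj) - \overline{Y}_c^{Bl}\right)^2$$
with $\hat{Y}_c^A(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^A$ the mean of the values $\hat{Y}_c^A(i)$.
Similarly, $\hat{Y}_c^B(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_c^{Bl}$ is the mean of the values $\hat{Y}_c^B(lj)$.
If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^A(i)$ or $\hat{Y}_c^B(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^A(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^B(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A}\pi_i / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B}\pi_j / n_B$.
A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.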
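The effect of the finite population correction can be illustrated on a single sample. The sketch below assumes the adjustment pulls each jackknife replicate towards the full-sample estimate by a factor sqrt(1 - mean(pik)); all object names are hypothetical, the sample mean stands in for a dual frame estimator, and the inclusion probabilities are invented.

```r
# Illustration only: finite population correction applied to jackknife replicates.
y   <- c(13, 18, 20, 14, 9)
pik <- rep(0.4, length(y))  # assumed first order inclusion probabilities
n   <- length(y)

full_est <- mean(y)
reps <- vapply(seq_len(n), function(i) mean(y[-i]), numeric(1))

# Corrected replicates: shrink deviations from the full-sample estimate
reps_fpc <- full_est + sqrt(1 - mean(pik)) * (reps - full_est)

jack <- function(r) (length(r) - 1) / length(r) * sum((r - mean(r))^2)
v_plain <- jack(reps)
v_fpc   <- jack(reps_fpc)
print(c(without_fpc = v_plain, with_fpc = v_fpc))
```

Because the corrected replicates are an affine transformation of the original ones, the corrected variance equals the uncorrected one multiplied by 1 - mean(pik), so the correction always narrows the resulting interval.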
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
data(DatA)
data(DatB)
# Obtain a 95% jackknife confidence interval for variable Clothing,
# with frame sizes and overlap domain size known, supposing a stratified
# sampling in frame A and a simple random sampling without replacement
# in frame B with no finite population correction factor in any frame.
JackCalDF(DatA$Clo, DatB$Clo, DatA$ProbA, DatB$ProbB, DatA$Domain, DatB$Domain,
          N_A = 1735, N_B = 1191, N_ab = 601, conf_level = 0.95,
          sdA = "str", sdB = "srs", strA = DatA$Stratum)
# Finally, consider a finite population correction factor in both frames.
JackCalDF(DatA$Clo, DatB$Clo, DatA$ProbA, DatB$ProbB, DatA$Domain, DatB$Domain,
          N_A = 1735, N_B = 1191, N_ab = 601, conf_level = 0.95,
          sdA = "str", sdB = "srs", strA = DatA$Stratum, fcpA = TRUE, fcpB = TRUE)
Produces estimates for the variance of the SF calibration estimator using a jackknife procedure
JackCalSF(ysA, ysB, piA, piB, pik_ab_B, pik_ba_A, domainsA, domainsB, N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear", conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
piA |
A numeric vector of length |
piB |
A numeric vector of length |
pik_ab_B |
A numeric vector of size |
pik_ba_A |
A numeric vector of size |
domainsA |
A character vector of size |
domainsB |
A character vector of size |
N_A |
(Optional) A numeric value indicating the size of frame A |
N_B |
(Optional) A numeric value indicating the size of frame B |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain |
xsAFrameA |
(Optional) A numeric vector of length |
xsBFrameA |
(Optional) A numeric vector of length |
xsAFrameB |
(Optional) A numeric vector of length |
xsBFrameB |
(Optional) A numeric vector of length |
xsT |
(Optional) A numeric vector of length |
XA |
(Optional) A numeric value or vector of length |
XB |
(Optional) A numeric value or vector of length |
X |
(Optional) A numeric value or vector of length |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logic value indicating if a finite population correction factor should be considered in frame A. Default is FALSE. |
fcpB |
(Optional) A logic value indicating if a finite population correction factor should be considered in frame B. Default is FALSE. |
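Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum.
In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by
$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A}\sum_{i \in s_A}\left(\hat{Y}_c^A(i) - \overline{Y}_c^A\right)^2 + \sum_{l=1}^{L}\frac{n_{Bl} - 1}{n_{Bl}}\sum_{j \in s_{Bl}}\left(\hat{Y}_c^B(lj) - \overline{Y}_c^{Bl}\right)^2$$
with $\hat{Y}_c^A(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^A$ the mean of the values $\hat{Y}_c^A(i)$.
Similarly, $\hat{Y}_c^B(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_c^{Bl}$ is the mean of the values $\hat{Y}_c^B(lj)$.
If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^A(i)$ or $\hat{Y}_c^B(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^A(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^B(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A}\pi_i / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B}\pi_j / n_B$.
A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.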
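For a stratified sample, the jackknife drops one unit at a time within each stratum and adds up the per-stratum contributions. The sketch below is a hypothetical illustration: `jack_var_strat`, `strat_mean` and the stratum weights `W` are invented names and values, and a stratified mean with known weights stands in for a dual frame estimator.

```r
# Hypothetical stratified delete-one jackknife; not the package source.
jack_var_strat <- function(y, strata, estimator) {
  total <- 0
  for (l in unique(strata)) {
    idx <- which(strata == l)
    n_l <- length(idx)
    # Replicates obtained by dropping each unit of stratum l in turn
    reps <- vapply(idx, function(j) estimator(y[-j], strata[-j]), numeric(1))
    total <- total + (n_l - 1) / n_l * sum((reps - mean(reps))^2)
  }
  total
}

# Assumed known stratum weights (population shares of the two strata)
W <- c("1" = 0.4, "2" = 0.6)
strat_mean <- function(y, strata) {
  m <- tapply(y, strata, mean)
  sum(W[names(m)] * m)
}

y      <- c(13, 18, 20, 14, 9, 11)
strata <- c(1, 1, 1, 2, 2, 2)
v_strat <- jack_var_strat(y, strata, strat_mean)
print(v_strat)
```

Replicates are centred on the mean of their own stratum, which is why dropping a unit in one stratum never contributes to the variance term of another.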
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
data(DatA)
data(DatB)
# Obtain a 95% jackknife confidence interval for variable Clothing,
# with frame sizes and overlap domain size known, supposing a stratified
# sampling in frame A and a simple random sampling without replacement
# in frame B with no finite population correction factor in any frame.
JackCalSF(DatA$Clo, DatB$Clo, DatA$ProbA, DatB$ProbB, DatA$ProbB, DatB$ProbA,
          DatA$Domain, DatB$Domain, N_A = 1735, N_B = 1191, N_ab = 601,
          conf_level = 0.95, sdA = "str", sdB = "srs", strA = DatA$Stratum)
Calculates confidence intervals for the Fuller-Burmeister estimator using a jackknife procedure
JackFB(ysA, ysB, piA, piB, domainsA, domains_B, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
piA |
A numeric vector of length |
piB |
A numeric vector of length |
domainsA |
A character vector of size |
domains_B |
A character vector of size |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logic value indicating if a finite population correction factor should be considered in frame A. Default is FALSE. |
fcpB |
(Optional) A logic value indicating if a finite population correction factor should be considered in frame B. Default is FALSE. |
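Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum.
In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by
$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A}\sum_{i \in s_A}\left(\hat{Y}_c^A(i) - \overline{Y}_c^A\right)^2 + \sum_{l=1}^{L}\frac{n_{Bl} - 1}{n_{Bl}}\sum_{j \in s_{Bl}}\left(\hat{Y}_c^B(lj) - \overline{Y}_c^{Bl}\right)^2$$
with $\hat{Y}_c^A(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^A$ the mean of the values $\hat{Y}_c^A(i)$.
Similarly, $\hat{Y}_c^B(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_c^{Bl}$ is the mean of the values $\hat{Y}_c^B(lj)$.
If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^A(i)$ or $\hat{Y}_c^B(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^A(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^B(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A}\pi_i / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B}\pi_j / n_B$.
A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.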
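Once a point estimate and a jackknife variance are available, the last step is the confidence interval. The sketch below assumes that the pivot is approximately standard normal, so the interval is the estimate plus or minus a normal quantile times the standard error; the function `pivotal_ci` and the input values are hypothetical, and the package may rely on a different reference distribution.

```r
# Hypothetical sketch of a pivotal confidence interval under a normal approximation.
pivotal_ci <- function(est, var_est, conf_level) {
  stopifnot(var_est >= 0, conf_level > 0, conf_level < 1)
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided normal quantile
  half_width <- z * sqrt(var_est)
  c(lower = est - half_width, upper = est + half_width)
}

# Invented inputs: an estimate of 14.8 with jackknife variance 3.74
ci <- pivotal_ci(14.8, 3.74, 0.95)
print(ci)
```

A smaller variance, for instance after a finite population correction, shortens the interval without moving its centre.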
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
data(DatA)
data(DatB)
# Obtain a 95% jackknife confidence interval for variable Clothing,
# supposing a stratified sampling in frame A and a simple random sampling
# without replacement in frame B with no finite population correction factor
# in any frame.
JackFB(DatA$Clo, DatB$Clo, DatA$ProbA, DatB$ProbB, DatA$Domain, DatB$Domain,
       0.95, "str", "srs", strA = DatA$Stratum)
# Check how the interval estimation varies when a finite
# population correction factor is considered in both frames.
JackFB(DatA$Clo, DatB$Clo, DatA$ProbA, DatB$ProbB, DatA$Domain, DatB$Domain,
       0.95, "str", "srs", strA = DatA$Stratum, fcpA = TRUE, fcpB = TRUE)
Calculates confidence intervals for the Hartley estimator using a jackknife procedure
JackHartley(ysA, ysB, piA, piB, domainsA, domainsB, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
piA |
A numeric vector of length |
piB |
A numeric vector of length |
domainsA |
A character vector of size |
domainsB |
A character vector of size |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame A. Default is FALSE. |
fcpB |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame B. Default is FALSE. |
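The design arguments depend on one another: a stratum vector only matters for "str" and "strclu", and a cluster vector only for "clu" and "strclu". As a minimal sketch of that rule, the hypothetical helper `check_design()` below checks the arguments for one frame before a call. It is not part of Frames2 and is not necessarily how the package validates its inputs.

```r
# Hypothetical helper, not part of Frames2: checks that the optional
# stratum / cluster vectors required by a sampling design are supplied.
check_design <- function(sd = "srs", strata = NULL, clusters = NULL) {
  designs <- c("srs", "pps", "str", "clu", "strclu")
  if (!is.character(sd) || length(sd) != 1 || !(sd %in% designs)) {
    stop("sd must be one of: ", paste(designs, collapse = ", "))
  }
  if (sd %in% c("str", "strclu") && is.null(strata)) {
    stop("design '", sd, "' needs a stratum vector")
  }
  if (sd %in% c("clu", "strclu") && is.null(clusters)) {
    stop("design '", sd, "' needs a cluster vector")
  }
  invisible(sd)
}

# Stratified design with its stratum vector: passes silently.
check_design("str", strata = c(1, 1, 2, 2))

# Stratified cluster design without a cluster vector: caught as an error.
ok <- tryCatch({ check_design("strclu", strata = c(1, 1, 2, 2)); TRUE },
               error = function(e) FALSE)
ok
```

Running such a check for frame A (`sdA`, `strA`, `clusA`) and for frame B (`sdB`, `strB`, `clusB`) can surface a missing vector before any estimation is attempted.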
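Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum.

In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A} \sum_{i \in s_A} \left(\hat{Y}_c^{A}(i) - \overline{Y}_c^{A}\right)^2 + \sum_{l=1}^{L} \frac{n_{Bl} - 1}{n_{Bl}} \sum_{j \in s_{Bl}} \left(\hat{Y}_c^{B}(lj) - \overline{Y}_c^{Bl}\right)^2$$

with $\hat{Y}_c^{A}(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^{A}$ the mean of values $\hat{Y}_c^{A}(i)$. Similarly, $\hat{Y}_c^{B}(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_c^{Bl}$ is the mean of values $\hat{Y}_c^{B}(lj)$.

If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^{A}(i)$ or $\hat{Y}_c^{B}(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^{A}(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^{B}(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A} \pi_i^A / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B} \pi_j^B / n_B$.

A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.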
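The frame A part of the jackknife (non-stratified design) amounts to a delete-one loop. The sketch below is a minimal base R illustration, assuming the usual $(n-1)/n$ scaling of the squared deviations of the replicates from their mean. `jack_term` and `est` are hypothetical names, not part of Frames2, and `est` stands in for any estimator that maps a sample to a single number.

```r
# Hypothetical helper: delete-one jackknife term for a non-stratified sample.
# `est` is any function mapping a sample (numeric vector) to one estimate.
jack_term <- function(y, est) {
  n <- length(y)
  # Replicate estimates: recompute the estimator with unit i removed.
  reps <- vapply(seq_len(n), function(i) est(y[-i]), numeric(1))
  (n - 1) / n * sum((reps - mean(reps))^2)
}

y <- c(12, 15, 9, 20, 14)
v <- jack_term(y, mean)

# For the sample mean, the jackknife term reduces to var(y) / n.
all.equal(v, var(y) / length(y))
```

The check at the end is a useful sanity test: for a linear estimator such as the sample mean, the jackknife reproduces the textbook variance estimator exactly.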
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
    data(DatA)
    data(DatB)

    # Obtain a 95% jackknife confidence interval for the variable Feeding,
    # assuming stratified sampling in frame A and simple random sampling
    # without replacement in frame B, with no finite population correction
    # factor in either frame.
    JackHartley(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$Domain,
                DatB$Domain, 0.95, "str", "srs", strA = DatA$Stratum)

    # Check how the interval estimate changes when a finite population
    # correction factor is applied in both frames.
    JackHartley(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$Domain,
                DatB$Domain, 0.95, "str", "srs", strA = DatA$Stratum,
                fcpA = TRUE, fcpB = TRUE)
Calculates confidence intervals for MLCDF estimator using jackknife procedure
JackMLCDF (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, xA, xB, ind_samA, ind_samB, ind_domA, ind_domB, N, N_ab = NULL, met = "linear", conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
xA |
A numeric vector of length |
xB |
A numeric vector of length |
ind_samA |
A numeric vector of length |
ind_samB |
A numeric vector of length |
ind_domA |
A character vector of length |
ind_domB |
A character vector of length |
N |
A numeric value indicating the size of the population. |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame A. Default is FALSE. |
fcpB |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame B. Default is FALSE. |
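Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum.

In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A} \sum_{i \in s_A} \left(\hat{Y}_c^{A}(i) - \overline{Y}_c^{A}\right)^2 + \sum_{l=1}^{L} \frac{n_{Bl} - 1}{n_{Bl}} \sum_{j \in s_{Bl}} \left(\hat{Y}_c^{B}(lj) - \overline{Y}_c^{Bl}\right)^2$$

with $\hat{Y}_c^{A}(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^{A}$ the mean of values $\hat{Y}_c^{A}(i)$. Similarly, $\hat{Y}_c^{B}(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_c^{Bl}$ is the mean of values $\hat{Y}_c^{B}(lj)$.

If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^{A}(i)$ or $\hat{Y}_c^{B}(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^{A}(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^{B}(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A} \pi_i^A / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B} \pi_j^B / n_B$.

A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.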
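For the stratified frame, the jackknife drops one unit at a time within each stratum and adds up the per-stratum terms. The following base R sketch illustrates this, assuming each stratum term is scaled by $(n_{Bl} - 1)/n_{Bl}$. The names `jack_term_str`, `str_total` and `N_l` are hypothetical and the stratum sizes are made up for the example.

```r
# Hypothetical helper: jackknife term for a stratified sample.
# `est` maps (y, strata) to a single estimate.
jack_term_str <- function(y, strata, est) {
  total <- 0
  for (l in unique(strata)) {
    idx <- which(strata == l)
    n_l <- length(idx)
    # Drop unit j of stratum l from the whole sample and re-estimate.
    reps <- vapply(idx, function(j) est(y[-j], strata[-j]), numeric(1))
    total <- total + (n_l - 1) / n_l * sum((reps - mean(reps))^2)
  }
  total
}

# Toy estimator: stratified expansion total with assumed stratum sizes.
N_l <- c(s1 = 50, s2 = 30)
str_total <- function(y, strata) {
  m <- tapply(y, strata, mean)
  sum(N_l[names(m)] * as.numeric(m))
}

y      <- c(10, 12, 14, 20, 26)
strata <- c("s1", "s1", "s1", "s2", "s2")
v_B <- jack_term_str(y, strata, str_total)

# Matches the textbook stratified variance estimator without fpc.
s2 <- tapply(y, strata, var)
n  <- tapply(y, strata, length)
closed_form <- sum(N_l[names(s2)]^2 * as.numeric(s2) / as.numeric(n))
all.equal(v_B, closed_form)
```

Each stratum needs at least two sampled units, otherwise no replicate can be formed within it.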
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
    data(DatMA)
    data(DatMB)
    data(DatPopM)

    N <- nrow(DatPopM)
    levels(DatPopM$Domain) <- c(levels(DatPopM$Domain), "ba")
    DatPopMA <- subset(DatPopM, DatPopM$Domain == "a" | DatPopM$Domain == "ab")
    DatPopMB <- subset(DatPopM, DatPopM$Domain == "b" | DatPopM$Domain == "ab")
    DatPopMB[DatPopMB$Domain == "ab",]$Domain <- "ba"

    # Obtain a 95% jackknife confidence interval for the variable Prog,
    # assuming pps sampling in frame A and simple random sampling
    # without replacement in frame B, with no finite population correction
    # factor in either frame.
    JackMLCDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain,
              DatMB$Domain, DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read,
              DatMA$Id_Frame, DatMB$Id_Frame, DatPopMA$Domain, DatPopMB$Domain,
              N, conf_level = 0.95, sdA = "pps", sdB = "srs")
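The data preparation in this example adds the level "ba" to the `Domain` factor before relabelling the overlap units of frame B. Assigning a value that is not an existing factor level would otherwise produce `NA` with a warning. The toy data frame below (made-up values, not the package data) sketches the same steps on four rows.

```r
# Toy population with a factor Domain; the values are made up.
pop <- data.frame(Domain = factor(c("a", "ab", "b", "ab")),
                  Read   = c(50, 61, 47, 58))

# Register the extra level first, as in the example above.
levels(pop$Domain) <- c(levels(pop$Domain), "ba")

# Frame B keeps domains "b" and "ab"; its overlap units are relabelled "ba".
popB <- subset(pop, Domain == "b" | Domain == "ab")
popB$Domain[popB$Domain == "ab"] <- "ba"

as.character(popB$Domain)
# "ba" "b" "ba"
```

The frame A subset needs no relabelling, since its overlap units keep the label "ab".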
Calculates confidence intervals for MLCDW estimator using jackknife procedure
JackMLCDW (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, x, ind_sam, N_A, N_B, N_ab = NULL, met = "linear", conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
x |
A numeric vector of length |
ind_sam |
A numeric vector of length |
N_A |
A numeric value indicating the size of frame A |
N_B |
A numeric value indicating the size of frame B |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame A. Default is FALSE. |
fcpB |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame B. Default is FALSE. |
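Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum.

In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A} \sum_{i \in s_A} \left(\hat{Y}_c^{A}(i) - \overline{Y}_c^{A}\right)^2 + \sum_{l=1}^{L} \frac{n_{Bl} - 1}{n_{Bl}} \sum_{j \in s_{Bl}} \left(\hat{Y}_c^{B}(lj) - \overline{Y}_c^{Bl}\right)^2$$

with $\hat{Y}_c^{A}(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^{A}$ the mean of values $\hat{Y}_c^{A}(i)$. Similarly, $\hat{Y}_c^{B}(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_c^{Bl}$ is the mean of values $\hat{Y}_c^{B}(lj)$.

If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^{A}(i)$ or $\hat{Y}_c^{B}(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^{A}(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^{B}(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A} \pi_i^A / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B} \pi_j^B / n_B$.

A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.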
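The finite population correction can be read as shrinking each replicate toward the full-sample estimate. The sketch below assumes the common form in which the shrinkage factor is the square root of one minus the mean first order inclusion probability of the frame. `fpc_adjust` is a hypothetical name, and the replicate values are made up.

```r
# Hypothetical helper: shrink jackknife replicates toward the full-sample
# estimate `theta` using the mean inclusion probability of the frame.
fpc_adjust <- function(reps, theta, pik) {
  theta + sqrt(1 - mean(pik)) * (reps - theta)
}

reps  <- c(98, 103, 101)     # replicate estimates (made up)
theta <- 100                 # full-sample estimate (made up)
pik   <- rep(0.36, 3)        # first order inclusion probabilities

fpc_adjust(reps, theta, pik)
# 98.4 102.4 100.8
```

With a mean inclusion probability of 0.36 the factor is 0.8, so every deviation from the full-sample estimate shrinks by 20% and the resulting variance term is multiplied by 0.64.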
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
    data(DatMA)
    data(DatMB)
    data(DatPopM)

    IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
    N_FrameA <- nrow(DatPopM[DatPopM$Domain == "a" | DatPopM$Domain == "ab",])
    N_FrameB <- nrow(DatPopM[DatPopM$Domain == "b" | DatPopM$Domain == "ab",])

    # Obtain a 95% jackknife confidence interval for the variable Prog,
    # assuming pps sampling in frame A and simple random sampling
    # without replacement in frame B, with no finite population correction
    # factor in either frame.
    JackMLCDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain,
              DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample,
              N_FrameA, N_FrameB, conf_level = 0.95, sdA = "pps", sdB = "srs")
Calculates confidence intervals for MLCSW estimator using jackknife procedure
JackMLCSW (ysA, ysB, pik_A, pik_B, pik_ab_B, pik_ba_A, domains_A, domains_B, xsA, xsB, x, ind_sam, N_A, N_B, N_ab = NULL, met = "linear", conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
pik_ab_B |
A numeric vector of size |
pik_ba_A |
A numeric vector of size |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
x |
A numeric vector of length |
ind_sam |
A numeric vector of length |
N_A |
A numeric value indicating the size of frame A |
N_B |
A numeric value indicating the size of frame B |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame A. Default is FALSE. |
fcpB |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame B. Default is FALSE. |
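Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum.

In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A} \sum_{i \in s_A} \left(\hat{Y}_c^{A}(i) - \overline{Y}_c^{A}\right)^2 + \sum_{l=1}^{L} \frac{n_{Bl} - 1}{n_{Bl}} \sum_{j \in s_{Bl}} \left(\hat{Y}_c^{B}(lj) - \overline{Y}_c^{Bl}\right)^2$$

with $\hat{Y}_c^{A}(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^{A}$ the mean of values $\hat{Y}_c^{A}(i)$. Similarly, $\hat{Y}_c^{B}(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_c^{Bl}$ is the mean of values $\hat{Y}_c^{B}(lj)$.

If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^{A}(i)$ or $\hat{Y}_c^{B}(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^{A}(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^{B}(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A} \pi_i^A / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B} \pi_j^B / n_B$.

A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.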
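Once a point estimate and its jackknife variance are available, the pivotal method gives an interval of the form estimate plus or minus a quantile times the standard error. The sketch below assumes a standard normal pivot. `pivotal_ci` is a hypothetical name and the inputs are made up.

```r
# Hypothetical helper: two-sided interval from an estimate and its
# estimated variance, assuming an approximately normal pivot.
pivotal_ci <- function(estimate, variance, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  half_width <- z * sqrt(variance)
  c(lower = estimate - half_width, upper = estimate + half_width)
}

ci <- pivotal_ci(estimate = 1000, variance = 400, conf_level = 0.95)
ci
```

Raising `conf_level` widens the interval, and the interval is always symmetric around the point estimate.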
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
    data(DatMA)
    data(DatMB)
    data(DatPopM)

    IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
    N_FrameA <- nrow(DatPopM[DatPopM$Domain == "a" | DatPopM$Domain == "ab",])
    N_FrameB <- nrow(DatPopM[DatPopM$Domain == "b" | DatPopM$Domain == "ab",])

    # Obtain a 95% jackknife confidence interval for the variable Prog,
    # assuming pps sampling in frame A and simple random sampling
    # without replacement in frame B, with no finite population correction
    # factor in either frame.
    JackMLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB,
              DatMB$ProbA, DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read,
              DatPopM$Read, IndSample, N_FrameA, N_FrameB, conf_level = 0.95,
              sdA = "pps", sdB = "srs")
Calculates confidence intervals for MLDF estimator using jackknife procedure
JackMLDF (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, xA, xB, ind_samA, ind_samB, ind_domA, ind_domB, N, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
xA |
A numeric vector of length |
xB |
A numeric vector of length |
ind_samA |
A numeric vector of length |
ind_samB |
A numeric vector of length |
ind_domA |
A character vector of length |
ind_domB |
A character vector of length |
N |
A numeric value indicating the size of the population. |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame A. Default is FALSE. |
fcpB |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame B. Default is FALSE. |
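Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum.

In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A} \sum_{i \in s_A} \left(\hat{Y}_c^{A}(i) - \overline{Y}_c^{A}\right)^2 + \sum_{l=1}^{L} \frac{n_{Bl} - 1}{n_{Bl}} \sum_{j \in s_{Bl}} \left(\hat{Y}_c^{B}(lj) - \overline{Y}_c^{Bl}\right)^2$$

with $\hat{Y}_c^{A}(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^{A}$ the mean of values $\hat{Y}_c^{A}(i)$. Similarly, $\hat{Y}_c^{B}(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_c^{Bl}$ is the mean of values $\hat{Y}_c^{B}(lj)$.

If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^{A}(i)$ or $\hat{Y}_c^{B}(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^{A}(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^{B}(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A} \pi_i^A / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B} \pi_j^B / n_B$.

A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.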
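To see how the two frame terms add up, the toy sketch below runs a delete-one jackknife over both samples of a dual frame design with simple random sampling in each frame. It rests on two stated assumptions: overlap units receive a fixed multiplicity weight of 0.5 (a simplification, not the weighting of any Frames2 estimator), and the weights of the remaining units in a frame are rescaled by $n/(n-1)$ after a unit is dropped. `dual_total`, `jack_dual` and all data values are hypothetical.

```r
# Toy dual frame total: overlap units ("ab" in s_A, "ba" in s_B) count half.
dual_total <- function(yA, dA, domA, yB, dB, domB) {
  mA <- ifelse(domA == "ab", 0.5, 1)
  mB <- ifelse(domB == "ba", 0.5, 1)
  sum(dA * mA * yA) + sum(dB * mB * yB)
}

# Delete-one jackknife over both samples, summing the two frame terms.
jack_dual <- function(yA, dA, domA, yB, dB, domB) {
  nA <- length(yA)
  nB <- length(yB)
  # Drop unit i from s_A and rescale the remaining frame A weights.
  repsA <- vapply(seq_len(nA), function(i)
    dual_total(yA[-i], dA[-i] * nA / (nA - 1), domA[-i], yB, dB, domB),
    numeric(1))
  # Drop unit j from s_B and rescale the remaining frame B weights.
  repsB <- vapply(seq_len(nB), function(j)
    dual_total(yA, dA, domA, yB[-j], dB[-j] * nB / (nB - 1), domB[-j]),
    numeric(1))
  (nA - 1) / nA * sum((repsA - mean(repsA))^2) +
    (nB - 1) / nB * sum((repsB - mean(repsB))^2)
}

yA <- c(10, 12, 8, 15); domA <- c("a", "ab", "a", "ab"); dA <- rep(25, 4)
yB <- c(7, 9, 11);      domB <- c("ba", "b", "b");       dB <- rep(20, 3)
v <- jack_dual(yA, dA, domA, yB, dB, domB)

# Closed form for this toy case: N^2 * var(z) / n per frame, z = weighted y.
zA <- yA * ifelse(domA == "ab", 0.5, 1)
zB <- yB * ifelse(domB == "ba", 0.5, 1)
all.equal(v, 100^2 * var(zA) / 4 + 60^2 * var(zB) / 3)
```

Because the two samples are drawn independently, the replicates formed by dropping a unit from one frame leave the other frame's contribution unchanged, which is why the variance splits into one term per frame.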
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
    data(DatMA)
    data(DatMB)
    data(DatPopM)

    N <- nrow(DatPopM)
    levels(DatPopM$Domain) <- c(levels(DatPopM$Domain), "ba")
    DatPopMA <- subset(DatPopM, DatPopM$Domain == "a" | DatPopM$Domain == "ab")
    DatPopMB <- subset(DatPopM, DatPopM$Domain == "b" | DatPopM$Domain == "ab")
    DatPopMB[DatPopMB$Domain == "ab",]$Domain <- "ba"

    # Obtain a 95% jackknife confidence interval for the variable Prog,
    # assuming pps sampling in frame A and simple random sampling
    # without replacement in frame B, with no finite population correction
    # factor in either frame.
    JackMLDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain,
             DatMB$Domain, DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read,
             DatMA$Id_Frame, DatMB$Id_Frame, DatPopMA$Domain, DatPopMB$Domain,
             N, 0.95, "pps", "srs")
Calculates confidence intervals for MLDW estimator using jackknife procedure
JackMLDW (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, x, ind_sam, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
x |
A numeric vector of length |
ind_sam |
A numeric vector of length |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame A. Default is FALSE. |
fcpB |
(Optional) A logical value indicating whether a finite population correction factor should be applied in frame B. Default is FALSE. |
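Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where the frame has been divided into $L$ strata and a sample of size $n_{Bl}$ is selected from the $N_{Bl}$ units composing the $l$-th stratum.

In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A} \sum_{i \in s_A} \left(\hat{Y}_c^{A}(i) - \overline{Y}_c^{A}\right)^2 + \sum_{l=1}^{L} \frac{n_{Bl} - 1}{n_{Bl}} \sum_{j \in s_{Bl}} \left(\hat{Y}_c^{B}(lj) - \overline{Y}_c^{Bl}\right)^2$$

with $\hat{Y}_c^{A}(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^{A}$ the mean of values $\hat{Y}_c^{A}(i)$. Similarly, $\hat{Y}_c^{B}(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB and $\overline{Y}_c^{Bl}$ is the mean of values $\hat{Y}_c^{B}(lj)$.

If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^{A}(i)$ or $\hat{Y}_c^{B}(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^{A}(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^{B}(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A} \pi_i^A / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B} \pi_j^B / n_B$.

A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.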
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
    data(DatMA)
    data(DatMB)
    data(DatPopM)

    IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)

    # Obtain a 95% jackknife confidence interval for the variable Prog,
    # assuming pps sampling in frame A and simple random sampling
    # without replacement in frame B, with no finite population correction
    # factor in either frame.
    JackMLDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain,
             DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample,
             0.95, "pps", "srs")
Calculates confidence intervals for MLSW estimator using jackknife procedure
JackMLSW (ysA, ysB, pik_A, pik_B, pik_ab_B, pik_ba_A, domains_A, domains_B, xsA, xsB, x, ind_sam, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
pik_ab_B |
A numeric vector of size |
pik_ba_A |
A numeric vector of size |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
x |
A numeric vector of length |
ind_sam |
A numeric vector of length |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logical value indicating whether a finite population correction factor should be considered in frame A. Default is FALSE. |
fcpB |
(Optional) A logical value indicating whether a finite population correction factor should be considered in frame B. Default is FALSE. |
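Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where frame B has been divided into $L$ strata and a sample of size $n_{Bl}$ from the $N_{Bl}$ units composing the $l$-th stratum is selected. In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A} \sum_{i \in s_A} \left(\hat{Y}_c^A(i) - \overline{Y}_c^A\right)^2 + \sum_{l=1}^{L} \frac{n_{Bl} - 1}{n_{Bl}} \sum_{j \in s_{Bl}} \left(\hat{Y}_c^B(lj) - \overline{Y}_c^{Bl}\right)^2$$

with $\hat{Y}_c^A(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^A$ the mean of the values $\hat{Y}_c^A(i)$. Similarly, $\hat{Y}_c^B(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB, and $\overline{Y}_c^{Bl}$ is the mean of the values $\hat{Y}_c^B(lj)$.

If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^A(i)$ or $\hat{Y}_c^B(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^A(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^B(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A} \pi_{iA} / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B} \pi_{jB} / n_B$.

A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.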
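The delete-one jackknife described above (one pass over the units of an unstratified frame A sample, then one pass per stratum of a stratified frame B sample) can be sketched in a few lines of R. This is only an illustration, not the package's internal code: the function name `jackknife_var_dual`, the data frames `sA` and `sB`, and the user-supplied estimator `est` are hypothetical, and any reweighting of the remaining units after a deletion is left to `est`.

```r
# Illustrative sketch (hypothetical helper, not part of Frames2).
# est:  function(sA, sB) returning a single numeric estimate
# sA:   data frame with the frame A sample (unstratified)
# sB:   data frame with the frame B sample (stratified)
# strB: vector giving the stratum of each row of sB
jackknife_var_dual <- function(est, sA, sB, strB) {
  # Frame A: drop one unit at a time and recompute the estimator
  nA <- nrow(sA)
  thetaA <- vapply(seq_len(nA),
                   function(i) est(sA[-i, , drop = FALSE], sB),
                   numeric(1))
  vA <- (nA - 1) / nA * sum((thetaA - mean(thetaA))^2)

  # Frame B: same idea, but deletions and means are taken within strata
  vB <- 0
  for (l in unique(strB)) {
    idx <- which(strB == l)
    nBl <- length(idx)
    thetaB <- vapply(idx,
                     function(j) est(sA, sB[-j, , drop = FALSE]),
                     numeric(1))
    vB <- vB + (nBl - 1) / nBl * sum((thetaB - mean(thetaB))^2)
  }
  vA + vB
}

# Toy usage with a deliberately simple estimator (sum of two sample means)
sA <- data.frame(y = c(2, 4, 6, 8))
sB <- data.frame(y = c(1, 3, 5, 7, 9, 11))
strB <- c(1, 1, 1, 2, 2, 2)
est <- function(a, b) mean(a$y) + mean(b$y)
v <- jackknife_var_dual(est, sA, sB, strB)
```

For the frame A term, the toy estimator reproduces the familiar result that the jackknife variance of a sample mean equals the sample variance divided by the sample size.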
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
data(DatMA)
data(DatMB)
data(DatPopM)

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)

# Obtain a 95% jackknife confidence interval for variable Prog,
# assuming pps sampling in frame A and simple random sampling
# without replacement in frame B, with no finite population
# correction factor in either frame.
JackMLSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB,
         DatMB$ProbA, DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read,
         DatPopM$Read, IndSample, 0.95, "pps", "srs")
Calculates confidence intervals for the pseudo empirical likelihood estimator using the jackknife procedure.
JackPEL(ysA, ysB, piA, piB, domainsA, domainsB, N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, XA = NULL, XB = NULL, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
piA |
A numeric vector of length |
piB |
A numeric vector of length |
domainsA |
A character vector of size |
domainsB |
A character vector of size |
N_A |
(Optional) A numeric value indicating the size of frame A |
N_B |
(Optional) A numeric value indicating the size of frame B |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain |
xsAFrameA |
(Optional) A numeric vector of length |
xsBFrameA |
(Optional) A numeric vector of length |
xsAFrameB |
(Optional) A numeric vector of length |
xsBFrameB |
(Optional) A numeric vector of length |
XA |
(Optional) A numeric value or vector of length |
XB |
(Optional) A numeric value or vector of length |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logical value indicating whether a finite population correction factor should be considered in frame A. Default is FALSE. |
fcpB |
(Optional) A logical value indicating whether a finite population correction factor should be considered in frame B. Default is FALSE. |
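Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where frame B has been divided into $L$ strata and a sample of size $n_{Bl}$ from the $N_{Bl}$ units composing the $l$-th stratum is selected. In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A} \sum_{i \in s_A} \left(\hat{Y}_c^A(i) - \overline{Y}_c^A\right)^2 + \sum_{l=1}^{L} \frac{n_{Bl} - 1}{n_{Bl}} \sum_{j \in s_{Bl}} \left(\hat{Y}_c^B(lj) - \overline{Y}_c^{Bl}\right)^2$$

with $\hat{Y}_c^A(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^A$ the mean of the values $\hat{Y}_c^A(i)$. Similarly, $\hat{Y}_c^B(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB, and $\overline{Y}_c^{Bl}$ is the mean of the values $\hat{Y}_c^B(lj)$.

If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^A(i)$ or $\hat{Y}_c^B(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^A(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^B(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A} \pi_{iA} / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B} \pi_{jB} / n_B$.

A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.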
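To make the finite population correction step concrete, the sketch below shrinks each delete-one replicate towards the full-sample estimate by the factor sqrt(1 - mean inclusion probability). The form of the correction is an assumption here, since the help page describes the replacement only in words; `fpc_adjust` and its arguments are hypothetical names, not part of the package.

```r
# Illustrative sketch (hypothetical helper, not part of Frames2).
# theta_hat:  full-sample estimate
# theta_drop: vector of delete-one estimates for one frame
# pik:        first order inclusion probabilities of that frame's sample
fpc_adjust <- function(theta_hat, theta_drop, pik) {
  pi_bar <- mean(pik)  # average inclusion probability in the frame
  theta_hat + sqrt(1 - pi_bar) * (theta_drop - theta_hat)
}

# With a large sampling fraction the replicates move closer to theta_hat,
# which reduces the resulting jackknife variance.
adj <- fpc_adjust(10, c(8, 12), pik = c(0.75, 0.75))
```

When all inclusion probabilities are close to zero the replicates are left essentially unchanged, which matches the intuition that the correction only matters for large sampling fractions.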
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
Calculates confidence intervals for the pseudo maximum likelihood estimator using the jackknife procedure.
JackPML(ysA, ysB, piA, piB, domainsA, domainsB, N_A, N_B, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
piA |
A numeric vector of length |
piB |
A numeric vector of length |
domainsA |
A character vector of size |
domainsB |
A character vector of size |
N_A |
A numeric value indicating the size of frame A |
N_B |
A numeric value indicating the size of frame B |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logical value indicating whether a finite population correction factor should be considered in frame A. Default is FALSE. |
fcpB |
(Optional) A logical value indicating whether a finite population correction factor should be considered in frame B. Default is FALSE. |
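Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where frame B has been divided into $L$ strata and a sample of size $n_{Bl}$ from the $N_{Bl}$ units composing the $l$-th stratum is selected. In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A} \sum_{i \in s_A} \left(\hat{Y}_c^A(i) - \overline{Y}_c^A\right)^2 + \sum_{l=1}^{L} \frac{n_{Bl} - 1}{n_{Bl}} \sum_{j \in s_{Bl}} \left(\hat{Y}_c^B(lj) - \overline{Y}_c^{Bl}\right)^2$$

with $\hat{Y}_c^A(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^A$ the mean of the values $\hat{Y}_c^A(i)$. Similarly, $\hat{Y}_c^B(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB, and $\overline{Y}_c^{Bl}$ is the mean of the values $\hat{Y}_c^B(lj)$.

If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^A(i)$ or $\hat{Y}_c^B(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^A(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^B(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A} \pi_{iA} / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B} \pi_{jB} / n_B$.

A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.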
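Once a jackknife variance estimate is available, the pivotal interval is straightforward to compute. The sketch below assumes the pivot is the standardized estimator with an approximately standard normal distribution, which the help page does not state explicitly; `pivotal_ci` is a hypothetical helper name.

```r
# Illustrative sketch (hypothetical helper, not part of Frames2).
# est:        point estimate
# v:          estimated variance of the estimator (e.g. from the jackknife)
# conf_level: confidence level, as in the conf_level argument
pivotal_ci <- function(est, v, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # normal quantile for a two-sided interval
  c(lower = est - z * sqrt(v), upper = est + z * sqrt(v))
}

ci <- pivotal_ci(est = 100, v = 4, conf_level = 0.95)
```

Including a finite population correction factor lowers `v`, so the interval computed this way becomes narrower.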
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
data(DatA)
data(DatB)

# Obtain a 95% jackknife confidence interval for variable Leisure,
# assuming stratified sampling in frame A and simple random sampling
# without replacement in frame B, with no finite population correction
# factor in either frame.
JackPML(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$Domain, DatB$Domain,
        1735, 1191, 0.95, "str", "srs", strA = DatA$Stratum)

# Check how the interval estimate changes when a finite population
# correction factor is considered in both frames.
JackPML(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$Domain, DatB$Domain,
        1735, 1191, 0.95, "str", "srs", strA = DatA$Stratum,
        fcpA = TRUE, fcpB = TRUE)
Calculates confidence intervals for the raking ratio estimator using the jackknife procedure.
JackSFRR(ysA, ysB, piA, piB, pik_ab_B, pik_ba_A, domainsA, domainsB, N_A, N_B, conf_level, sdA = "srs", sdB = "srs", strA = NULL, strB = NULL, clusA = NULL, clusB = NULL, fcpA = FALSE, fcpB = FALSE)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
piA |
A numeric vector of length |
piB |
A numeric vector of length |
pik_ab_B |
A numeric vector of size |
pik_ba_A |
A numeric vector of size |
domainsA |
A character vector of size |
domainsB |
A character vector of size |
N_A |
A numeric value indicating the size of frame A |
N_B |
A numeric value indicating the size of frame B |
conf_level |
A numeric value indicating the confidence level for the confidence intervals. |
sdA |
(Optional) A character vector indicating the sampling design considered in frame A. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
sdB |
(Optional) A character vector indicating the sampling design considered in frame B. Possible values are "srs" (simple random sampling without replacement), "pps" (probabilities proportional to size sampling), "str" (stratified sampling), "clu" (cluster sampling) and "strclu" (stratified cluster sampling). Default is "srs". |
strA |
(Optional) A numeric vector indicating the stratum each unit in frame A belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame A. |
strB |
(Optional) A numeric vector indicating the stratum each unit in frame B belongs to, if a stratified sampling or a stratified cluster sampling has been considered in frame B. |
clusA |
(Optional) A numeric vector indicating the cluster each unit in frame A belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame A. |
clusB |
(Optional) A numeric vector indicating the cluster each unit in frame B belongs to, if a cluster sampling or a stratified cluster sampling has been considered in frame B. |
fcpA |
(Optional) A logical value indicating whether a finite population correction factor should be considered in frame A. Default is FALSE. |
fcpB |
(Optional) A logical value indicating whether a finite population correction factor should be considered in frame B. Default is FALSE. |
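Suppose a non-stratified sampling design in frame A and a stratified sampling design in frame B, where frame B has been divided into $L$ strata and a sample of size $n_{Bl}$ from the $N_{Bl}$ units composing the $l$-th stratum is selected. In this context, the jackknife variance estimator of an estimator $\hat{Y}_c$ is given by

$$v_J(\hat{Y}_c) = \frac{n_A - 1}{n_A} \sum_{i \in s_A} \left(\hat{Y}_c^A(i) - \overline{Y}_c^A\right)^2 + \sum_{l=1}^{L} \frac{n_{Bl} - 1}{n_{Bl}} \sum_{j \in s_{Bl}} \left(\hat{Y}_c^B(lj) - \overline{Y}_c^{Bl}\right)^2$$

with $\hat{Y}_c^A(i)$ the value of estimator $\hat{Y}_c$ after dropping the $i$-th unit from ysA and $\overline{Y}_c^A$ the mean of the values $\hat{Y}_c^A(i)$. Similarly, $\hat{Y}_c^B(lj)$ is the value taken by $\hat{Y}_c$ after dropping the $j$-th unit of the $l$-th stratum from sample ysB, and $\overline{Y}_c^{Bl}$ is the mean of the values $\hat{Y}_c^B(lj)$.

If needed, a finite population correction factor can be included in either frame by replacing $\hat{Y}_c^A(i)$ or $\hat{Y}_c^B(lj)$ with $\hat{Y}_c^{A*}(i) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_A}\,(\hat{Y}_c^A(i) - \hat{Y}_c)$ or $\hat{Y}_c^{B*}(lj) = \hat{Y}_c + \sqrt{1 - \overline{\pi}_B}\,(\hat{Y}_c^B(lj) - \hat{Y}_c)$, where $\overline{\pi}_A = \sum_{i \in s_A} \pi_{iA} / n_A$ and $\overline{\pi}_B = \sum_{j \in s_B} \pi_{jB} / n_B$.

A confidence interval for any parameter of interest, $Y$, can then be calculated using the pivotal method.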
A numeric matrix containing estimations of population total and population mean and their corresponding confidence intervals obtained through jackknife method.
Wolter, K. M. (2007) Introduction to Variance Estimation. 2nd Edition. Springer, Inc., New York.
data(DatA)
data(DatB)

# Obtain a 95% jackknife confidence interval for variable Leisure,
# assuming stratified sampling in frame A and simple random sampling
# without replacement in frame B, with no finite population correction
# factor in either frame.
JackSFRR(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$ProbB, DatB$ProbA,
         DatA$Domain, DatB$Domain, 1735, 1191, 0.95, "str", "srs",
         strA = DatA$Stratum)

# Check how the interval estimate changes when a finite population
# correction factor is considered in both frames.
JackSFRR(DatA$Lei, DatB$Lei, DatA$ProbA, DatB$ProbB, DatA$ProbB, DatB$ProbA,
         DatA$Domain, DatB$Domain, 1735, 1191, 0.95, "str", "srs",
         strA = DatA$Stratum, fcpA = TRUE, fcpB = TRUE)
Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design using a model calibrated dual frame approach with a possibly different set of auxiliary variables for each frame. Confidence intervals are also computed, if required.
MLCDF (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, xA, xB, ind_samA, ind_samB, ind_domA, ind_domB, N, N_ab = NULL, met = "linear", conf_level = NULL)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
xA |
A numeric vector of length |
xB |
A numeric vector of length |
ind_samA |
A numeric vector of length |
ind_samB |
A numeric vector of length |
ind_domA |
A character vector of length |
ind_domB |
A character vector of length |
N |
A numeric value indicating the size of the population. |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
Multinomial logistic calibration estimator in dual frame using auxiliary information from each frame for a proportion is given by
with the number of categories of the response variable,
the indicator variable for the i-th category of the response variable,
and
calibration weights which are calculated taking into account a different set of constraints, depending on the case. For instance, if
and
are known, calibration constraints are
and
with and
being the maximum likelihood parameters of the multinomial logistic model considering original design weights
.
can be defined similarly.
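As an illustration of how category indicator variables and a set of weights combine into class proportions, the sketch below computes a weighted sum of indicators divided by the population size. It assumes the estimator has this usual calibration form and takes the weights as given; `weighted_class_props` is a hypothetical helper, and it does not perform the calibration itself.

```r
# Illustrative sketch (hypothetical helper, not part of Frames2).
# y: factor (or character vector) with the observed category of each unit
# w: weights of the sampled units (already calibrated, by assumption)
# N: population size
weighted_class_props <- function(y, w, N) {
  y <- as.factor(y)
  # n x m matrix of 0/1 indicators, one column per category
  z <- outer(as.character(y), levels(y), "==") * 1
  props <- colSums(w * z) / N
  names(props) <- levels(y)
  props
}

p <- weighted_class_props(c("a", "b", "a"), w = c(2, 3, 5), N = 10)
```

If the weights sum to N, as the calibration constraints on the population or domain sizes are meant to ensure, the estimated proportions sum to one.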
MLCDF
returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:
Call |
the matched call. |
Est |
class frequencies and proportions estimations for main variable(s). |
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
data(DatMA)
data(DatMB)
data(DatPopM)

N <- nrow(DatPopM)
levels(DatPopM$Domain) <- c(levels(DatPopM$Domain), "ba")
DatPopMA <- subset(DatPopM, DatPopM$Domain == "a" | DatPopM$Domain == "ab")
DatPopMB <- subset(DatPopM, DatPopM$Domain == "b" | DatPopM$Domain == "ab")
DatPopMB[DatPopMB$Domain == "ab", ]$Domain <- "ba"

# Calculate proportions of the categories of variable Prog using the MLCDF
# estimator, with Read as auxiliary variable.
MLCDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain,
      DatMB$Domain, DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read,
      DatMA$Id_Frame, DatMB$Id_Frame, DatPopMA$Domain, DatPopMB$Domain, N)

# Obtain 95% confidence intervals together with the estimates.
MLCDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain,
      DatMB$Domain, DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read,
      DatMA$Id_Frame, DatMB$Id_Frame, DatPopMA$Domain, DatPopMB$Domain, N,
      conf_level = 0.95)
Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design using a model calibrated dual frame approach with auxiliary information from the whole population. Confidence intervals are also computed, if required.
MLCDW (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, x, ind_sam, N_A, N_B, N_ab = NULL, met = "linear", conf_level = NULL)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
x |
A numeric vector of length |
ind_sam |
A numeric vector of length |
N_A |
A numeric value indicating the size of frame A |
N_B |
A numeric value indicating the size of frame B |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
Multinomial logistic calibration estimator in dual frame using auxiliary information from the whole population for a proportion is given by
with the number of categories of the response variable,
the indicator variable for the i-th category of the response variable,
and
calibration weights which are calculated taking into account a different set of constraints, depending on the case. For instance, if
and
are known, calibration constraints are
and
with and
being the maximum likelihood parameters of the multinomial logistic model considering weights
.
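For the "linear" distance, calibration weights have a closed form, which makes the idea of the constraints easy to see in code. The sketch below is a generic linear calibration (in the style of Deville and Sarndal), not the package's implementation; `calib_linear`, the starting weights `d`, the matrix `X` of calibration variables and the vector `totals` of known totals are hypothetical names.

```r
# Illustrative sketch (hypothetical helper, not part of Frames2).
# d:      starting (design) weights, one per sampled unit
# X:      n x p matrix of calibration variables for the sampled units
# totals: known population totals of the p calibration variables
calib_linear <- function(d, X, totals) {
  X <- as.matrix(X)
  # Solve sum_k d_k (1 + x_k' lambda) x_k = totals for lambda
  lambda <- solve(crossprod(X, d * X), totals - colSums(d * X))
  as.vector(d * (1 + X %*% lambda))
}

# Toy usage: calibrate to a population size of 60 and an x-total of 200
d <- rep(10, 5)
X <- cbind(1, c(1, 2, 3, 4, 5))
w <- calib_linear(d, X, totals = c(60, 200))
```

In the setting described above, the columns of `X` would hold domain membership indicators and the fitted multinomial probabilities, so that the calibrated weights reproduce the known domain sizes and the population totals of those probabilities. The "raking" and "logit" distances have no closed form and require iteration.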
MLCDW
returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:
Call |
the matched call. |
Est |
class frequencies and proportions estimations for main variable(s). |
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
data(DatMA)
data(DatMB)
data(DatPopM)

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
N_FrameA <- nrow(DatPopM[DatPopM$Domain == "a" | DatPopM$Domain == "ab", ])
N_FrameB <- nrow(DatPopM[DatPopM$Domain == "b" | DatPopM$Domain == "ab", ])
N_Domainab <- nrow(DatPopM[DatPopM$Domain == "ab", ])

# Calculate proportions of the categories of variable Prog using the MLCDW
# estimator, with Read as auxiliary variable.
MLCDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain,
      DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample,
      N_FrameA, N_FrameB)

# Now suppose that the overlap domain size is known.
MLCDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain,
      DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample,
      N_FrameA, N_FrameB, N_Domainab)

# Obtain 95% confidence intervals together with the estimates.
MLCDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain,
      DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample,
      N_FrameA, N_FrameB, N_Domainab, conf_level = 0.95)
Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design using a model calibrated single frame approach with auxiliary information from the whole population. Confidence intervals are also computed, if required.
MLCSW (ysA, ysB, pik_A, pik_B, pik_ab_B, pik_ba_A, domains_A, domains_B, xsA, xsB, x, ind_sam, N_A, N_B, N_ab = NULL, met = "linear", conf_level = NULL)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
pik_ab_B |
A numeric vector of size |
pik_ba_A |
A numeric vector of size |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
x |
A numeric vector of length |
ind_sam |
A numeric vector of length |
N_A |
A numeric value indicating the size of frame A |
N_B |
A numeric value indicating the size of frame B |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
Multinomial logistic calibration estimator in single frame using auxiliary information from the whole population for a proportion is given by
with the number of categories of the response variable,
the indicator variable for the i-th category of the response variable,
and
calibration weights which are calculated taking into account a different set of constraints, depending on the case. For instance, if
and
are known, calibration constraints are
and
with
being the maximum likelihood parameters of the multinomial logistic model considering weights
.
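In a single frame approach, a unit in the overlap domain can be selected through either frame, so its starting weight is based on the sum of its two inclusion probabilities. The sketch below shows this for the frame A sample, in the style of the Bankier-Kalton-Anderson weights. It assumes that `pik_ab_B` holds, for each unit sampled from frame A, its inclusion probability under the frame B design (as the example call passing `DatMA$ProbB` suggests); `single_frame_weights` is a hypothetical helper, not a package function.

```r
# Illustrative sketch (hypothetical helper, not part of Frames2).
# pik_A:     inclusion probabilities under the frame A design
# pik_ab_B:  inclusion probabilities of the same units under the frame B design
# domains_A: domain of each unit, "a" or "ab"
single_frame_weights <- function(pik_A, pik_ab_B, domains_A) {
  ifelse(domains_A == "ab",
         1 / (pik_A + pik_ab_B),  # overlap units: reachable through both frames
         1 / pik_A)               # units only in frame A
}

wA <- single_frame_weights(pik_A = c(0.1, 0.2),
                           pik_ab_B = c(0, 0.3),
                           domains_A = c("a", "ab"))
```

The weights for the frame B sample would be built symmetrically from `pik_B`, `pik_ba_A` and `domains_B`.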
MLCSW
returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:
Call |
the matched call. |
Est |
class frequencies and proportions estimations for main variable(s). |
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
data(DatMA)
data(DatMB)
data(DatPopM)

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
N_FrameA <- nrow(DatPopM[DatPopM$Domain == "a" | DatPopM$Domain == "ab", ])
N_FrameB <- nrow(DatPopM[DatPopM$Domain == "b" | DatPopM$Domain == "ab", ])
N_Domainab <- nrow(DatPopM[DatPopM$Domain == "ab", ])

# Calculate proportions of the categories of variable Prog using the MLCSW
# estimator, with Read as auxiliary variable.
MLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB,
      DatMB$ProbA, DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read,
      DatPopM$Read, IndSample, N_FrameA, N_FrameB)

# Now suppose that the overlap domain size is known.
MLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB,
      DatMB$ProbA, DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read,
      DatPopM$Read, IndSample, N_FrameA, N_FrameB, N_Domainab)

# Obtain 95% confidence intervals together with the estimates.
MLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB,
      DatMB$ProbA, DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read,
      DatPopM$Read, IndSample, N_FrameA, N_FrameB, N_Domainab,
      conf_level = 0.95)
Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design using a model assisted approach with a possibly different set of auxiliary variables for each frame. Confidence intervals are also computed, if required.
MLDF (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, xA, xB, ind_samA, ind_samB, ind_domA, ind_domB, N, conf_level = NULL)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
xA |
A numeric vector of length |
xB |
A numeric vector of length |
ind_samA |
A numeric vector of length |
ind_samB |
A numeric vector of length |
ind_domA |
A character vector of length |
ind_domB |
A character vector of length |
N |
A numeric value indicating the size of the population. |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
Multinomial logistic estimator in dual frame using auxiliary information from each frame for a proportion is given by
with ,
the number of categories of the response variable,
the indicator variable for the i-th category of the response variable,
and
the design weights for each frame, defined as the inverse of the first order inclusion probabilities and
being the maximum likelihood parameters of the multinomial logistic model considering weights
.
can be defined similarly.
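As a rough illustration of the ingredients named above (category indicators, design weights and multinomial logistic probabilities), the following base R sketch computes multinomial logistic probabilities for every population unit and combines them with design-weighted sample residuals in a generic single-frame model-assisted estimator of class proportions. It is not the `MLDF` implementation: the helper `softmax_probs`, the toy data and the fixed coefficient matrix `beta` (used here in place of maximum likelihood estimates) are all assumptions made for the example.

```r
# Hypothetical helper: multinomial logistic probabilities for one auxiliary
# variable x, given a 2 x m coefficient matrix (rows: intercept, slope).
softmax_probs <- function(x, beta) {
  eta <- cbind(1, x) %*% beta          # n x m matrix of linear predictors
  eta <- eta - apply(eta, 1, max)      # stabilise before exponentiating
  exp(eta) / rowSums(exp(eta))         # each row sums to 1
}

# Toy population of N = 6 units and assumed (not estimated) coefficients
x_pop <- c(1.2, 0.5, 2.3, 1.9, 0.7, 1.1)
beta  <- cbind(c(0, 0), c(0.3, -0.5), c(-0.2, 0.4))
p_pop <- softmax_probs(x_pop, beta)

# Toy sample: population positions, design weights and observed categories
s   <- c(2, 5, 6)
d   <- rep(2, 3)
y_s <- factor(c("a", "c", "b"), levels = c("a", "b", "c"))
z_s <- outer(as.integer(y_s), 1:3, "==") * 1   # category indicator matrix

# Sum of predicted probabilities over the population plus the
# design-weighted sample residuals, divided by the population size
est <- (colSums(p_pop) + colSums(d * (z_s - p_pop[s, , drop = FALSE]))) /
  length(x_pop)
est
```

Because both the indicators and the predicted probabilities sum to one within each unit, the estimated proportions in `est` add up to one by construction.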
MLDF returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:
Call |
the matched call. |
Est |
class frequencies and proportions estimations for main variable(s). |
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
Lehtonen, R. and Veijanen, A. (1998) On multinomial logistic generalized regression estimators. Technical report 22, Department of Statistics, University of Jyvaskyla.
data(DatMA)
data(DatMB)
data(DatPopM)

N <- nrow(DatPopM)
levels(DatPopM$Domain) <- c(levels(DatPopM$Domain), "ba")
DatPopMA <- subset(DatPopM, DatPopM$Domain == "a" | DatPopM$Domain == "ab")
DatPopMB <- subset(DatPopM, DatPopM$Domain == "b" | DatPopM$Domain == "ab")
DatPopMB[DatPopMB$Domain == "ab",]$Domain <- "ba"

#Let us calculate proportions of categories of variable Prog using MLDF estimator
#using Read as auxiliary variable
MLDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain,
     DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read, DatMA$Id_Frame, DatMB$Id_Frame,
     DatPopMA$Domain, DatPopMB$Domain, N)

#Let us obtain 95% confidence intervals together with the estimations
MLDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain,
     DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read, DatMA$Id_Frame, DatMB$Id_Frame,
     DatPopMA$Domain, DatPopMB$Domain, N, conf_level = 0.95)
Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design using a dual frame model assisted approach. Confidence intervals are also computed, if required.
MLDW (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, x, ind_sam, conf_level = NULL)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
x |
A numeric vector of length |
ind_sam |
A numeric vector of length |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
Multinomial logistic estimator in dual frame using auxiliary information from the whole population for a proportion is given by
with the number of categories of the response variable,
the indicator variable for the i-th category of the response variable,
with
and
being the maximum likelihood parameters of the multinomial logistic model considering the weights
.
MLDW returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:
Call |
the matched call. |
Est |
class frequencies and proportions estimations for main variable(s). |
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
Lehtonen, R. and Veijanen, A. (1998) On multinomial logistic generalized regression estimators. Technical report 22, Department of Statistics, University of Jyvaskyla.
data(DatMA)
data(DatMB)
data(DatPopM)

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)

#Let us calculate proportions of categories of variable Prog using MLDW estimator
#using Read as auxiliary variable
MLDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain,
     DatMA$Read, DatMB$Read, DatPopM$Read, IndSample)

#Let us obtain 95% confidence intervals together with the estimations
MLDW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain,
     DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, conf_level = 0.95)
Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design with the same set of auxiliary variables for the whole population. Confidence intervals are also computed, if required.
MLSW (ysA, ysB, pik_A, pik_B, pik_ab_B, pik_ba_A, domains_A, domains_B, xsA, xsB, x, ind_sam, conf_level = NULL)
ysA |
A data frame containing information about one or more factors, each one of dimension |
ysB |
A data frame containing information about one or more factors, each one of dimension |
pik_A |
A numeric vector of length |
pik_B |
A numeric vector of length |
pik_ab_B |
A numeric vector of size |
pik_ba_A |
A numeric vector of size |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
xsA |
A numeric vector of length |
xsB |
A numeric vector of length |
x |
A numeric vector of length |
ind_sam |
A numeric vector of length |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
Multinomial logistic estimator in single frame using auxiliary information from the whole population for a proportion is given by
with the number of categories of the response variable,
the indicator variable for the i-th category of the response variable,
and
being the maximum likelihood parameters of the multinomial logistic model considering weights
.
MLSW returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:
Call |
the matched call. |
Est |
class frequencies and proportions estimations for main variable(s). |
Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.
Lehtonen, R. and Veijanen, A. (1998) On multinomial logistic generalized regression estimators. Technical report 22, Department of Statistics, University of Jyvaskyla.
data(DatMA)
data(DatMB)
data(DatPopM)

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)

#Let us calculate proportions of categories of variable Prog using MLSW estimator
#using Read as auxiliary variable
MLSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, DatMB$ProbA,
     DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample)

#Let us obtain 95% confidence intervals together with the estimations
MLSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, DatMB$ProbA,
     DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample,
     conf_level = 0.95)
Produces estimates for population totals using the pseudo empirical likelihood estimator from survey data obtained from a dual frame sampling design. Confidence intervals for the population total are also computed, if required.
PEL(ysA, ysB, pi_A, pi_B, domains_A, domains_B, N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, XA = NULL, XB = NULL, conf_level = NULL)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
N_A |
(Optional) A numeric value indicating the size of frame A. |
N_B |
(Optional) A numeric value indicating the size of frame B. |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain. |
xsAFrameA |
(Optional) A numeric vector of length |
xsBFrameA |
(Optional) A numeric vector of length |
xsAFrameB |
(Optional) A numeric vector of length |
xsBFrameB |
(Optional) A numeric vector of length |
XA |
(Optional) A numeric value or vector of length |
XB |
(Optional) A numeric value or vector of length |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
Pseudo empirical likelihood estimator for the population mean is computed as
where
and
with
and
the weights resulting of applying the pseudo empirical likelihood procedure to a determined function under a determined set of constraints, depending on the case.
Furthermore,
. In this case,
and
have been supposed known and no additional auxiliary variables have been considered. This is not happening in some cases.
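Function covers the following scenarios:
- There is not any additional auxiliary variable
  - $N_A$, $N_B$ and $N_{ab}$ unknown
  - $N_A$ and $N_B$ known and $N_{ab}$ unknown
  - $N_A$, $N_B$ and $N_{ab}$ known
- At least one additional auxiliary variable is available
  - $N_A$ and $N_B$ known and $N_{ab}$ unknown
  - $N_A$, $N_B$ and $N_{ab}$ known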
Explicit variance of this estimator is not easy to obtain. Instead, confidence intervals can be computed through the bisection method. This method constructs intervals of the form $\{\theta \mid r_{ns}(\theta) < \chi^2_1(\alpha)\}$, where $\chi^2_1(\alpha)$ is the $1 - \alpha$ quantile from a $\chi^2$ distribution with one degree of freedom and $r_{ns}(\theta)$ represents the so-called pseudo empirical log likelihood ratio statistic, which can be obtained as a difference of two pseudo empirical likelihood functions.
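The interval construction can be sketched in base R as follows. The sketch finds, by bisection, the two points where a ratio statistic crosses the chi-square cutoff with one degree of freedom. The function `bisect_root` and the quadratic `ratio_stat` are hypothetical stand-ins chosen so the example runs on its own; the actual pseudo empirical log likelihood ratio statistic used by `PEL` is not reproduced here.

```r
# Hypothetical helper: bisection search for a root of f in [lower, upper],
# assuming f changes sign between the two end points.
bisect_root <- function(f, lower, upper, tol = 1e-8, max_iter = 200) {
  f_lower <- f(lower)
  for (i in seq_len(max_iter)) {
    mid <- (lower + upper) / 2
    if (abs(upper - lower) < tol) break
    f_mid <- f(mid)
    if (sign(f_mid) == sign(f_lower)) {
      lower <- mid
      f_lower <- f_mid
    } else {
      upper <- mid
    }
  }
  (lower + upper) / 2
}

# Toy data and a stand-in ratio statistic (NOT the PEL statistic):
# it is zero at the point estimate and grows as theta moves away from it.
y <- c(4.1, 5.3, 6.0, 4.8, 5.6)
theta_hat <- mean(y)
se2 <- var(y) / length(y)
ratio_stat <- function(theta) (theta - theta_hat)^2 / se2

# Cutoff: quantile of a chi-square distribution with one degree of freedom
cutoff <- qchisq(0.95, df = 1)
g <- function(theta) ratio_stat(theta) - cutoff

# One bisection search on each side of the point estimate
lower_end <- bisect_root(g, theta_hat - 10, theta_hat)
upper_end <- bisect_root(g, theta_hat, theta_hat + 10)
c(lower_end, upper_end)
```

With this quadratic stand-in the end points can be checked in closed form, since they equal `theta_hat` minus and plus `sqrt(cutoff * se2)`. For a statistic without a closed-form inverse the same search still applies, provided the starting brackets contain a sign change.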
PEL returns an object of class "EstimatorDF" which is a list with, at least, the following components:
Call |
the matched call. |
Est |
total and mean estimation for main variable(s). |
VarEst |
variance estimation for main variable(s). |
If parameter conf_level is different from NULL, object includes component
ConfInt |
total and mean estimation and confidence intervals for main variable(s). |
In addition, components TotDomEst and MeanDomEst are available when estimator is based on estimators of the domains. Component Param shows value of parameters involved in calculation of the estimator (if any).
By default, only Est component (or ConfInt component, if parameter conf_level is different from NULL) is shown. It is possible to access all the components of the objects by using function summary.
Rao, J. N. K. and Wu, C. (2010) Pseudo Empirical Likelihood Inference for Multiple Frame Surveys. Journal of the American Statistical Association, 105, 1494 - 1503.
Wu, C. (2005) Algorithms and R codes for the pseudo empirical likelihood methods in survey sampling. Survey Methodology, Vol. 31, 2, pp. 239 - 243.
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate pseudo empirical likelihood estimator for variable Feeding, without
#considering any auxiliary information
PEL(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)

#Now, let us calculate pseudo empirical likelihood estimator for variable Clothing when
#the frame sizes and the overlap domain size are known
PEL(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain,
    N_A = 1735, N_B = 1191, N_ab = 601)

#Finally, let us calculate pseudo empirical likelihood estimator and a 90% confidence interval
#for population total for variable Feeding, considering Income and Metres2 as auxiliary
#variables and with frame sizes and overlap domain size known.
PEL(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain,
    N_A = 1735, N_B = 1191, N_ab = 601, xsAFrameA = DatA$Inc, xsBFrameA = DatB$Inc,
    xsAFrameB = DatA$M2, xsBFrameB = DatB$M2, XA = 4300260, XB = 176553,
    conf_level = 0.90)
This dataset consists of a square matrix of dimension 105 with the first and second order inclusion probabilities for the units included in sample $s_A$, which has been drawn from a population of size $N_A = 1735$ according to a
stratified random sampling with population strata sizes
PiklA
data(PiklA)

#Let us choose the submatrix of inclusion probabilities for the first 5 units in sA.
PiklA[1:5, 1:5]

#Now, let us select only the first order inclusion probabilities
diag(PiklA)
This dataset consists of a square matrix of dimension 135 with the first and second order inclusion probabilities for the units included in sample $s_B$, which has been drawn from a population of size $N_B = 1191$ according to a simple random sampling without replacement.
PiklB
data(PiklB)

#Let us choose the submatrix of inclusion probabilities for the first 5 units in sB.
PiklB[1:5, 1:5]

#Now, let us select the first order inclusion probabilities
diag(PiklB)
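To make the structure of such a matrix concrete, here is a minimal base R sketch that builds the matrix of first and second order inclusion probabilities one would expect under simple random sampling without replacement: n/N on the diagonal and n(n - 1)/(N(N - 1)) elsewhere. The helper name `srswor_pikl` is hypothetical, and the sketch does not claim to reproduce how `PiklB` itself was generated.

```r
# Hypothetical helper: inclusion probability matrix for a simple random
# sample of size n drawn without replacement from a population of size N.
srswor_pikl <- function(n, N) {
  # Second order probabilities are identical for every pair of units
  pikl <- matrix(n * (n - 1) / (N * (N - 1)), nrow = n, ncol = n)
  # First order probabilities sit on the diagonal
  diag(pikl) <- n / N
  pikl
}

# Sample of size 2 from a population of size 5
P <- srswor_pikl(2, 5)
P

# Design weights are the inverse of the first order inclusion probabilities
d <- 1 / diag(P)
d
```

For n = 2 and N = 5 this gives 0.4 on the diagonal and 0.1 off the diagonal, so each sampled unit carries a design weight of 2.5.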
Produces estimates for population totals and means using PML estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.
PML(ysA, ysB, pi_A, pi_B, domains_A, domains_B, N_A, N_B, conf_level = NULL)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
N_A |
A numeric value indicating the size of frame A |
N_B |
A numeric value indicating the size of frame B |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
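Pseudo Maximum Likelihood estimator of population total is given by
$$\hat{Y}_{PML}(\hat{\gamma}) = \frac{N_A - \hat{N}_{ab}(\hat{\gamma})}{\hat{N}_a^A}\hat{Y}_a^A + \frac{N_B - \hat{N}_{ab}(\hat{\gamma})}{\hat{N}_b^B}\hat{Y}_b^B + \frac{\hat{N}_{ab}(\hat{\gamma})}{\hat{\gamma}\hat{N}_{ab}^A + (1 - \hat{\gamma})\hat{N}_{ab}^B}\left[\hat{\gamma}\hat{Y}_{ab}^A + (1 - \hat{\gamma})\hat{Y}_{ab}^B\right]$$
where $\hat{\gamma} \in (0, 1)$ and $\hat{N}_{ab}(\hat{\gamma})$ is the smaller of the roots of the quadratic equation
$$\left[\frac{\hat{\gamma}}{N_B} + \frac{1 - \hat{\gamma}}{N_A}\right]x^2 - \left[1 + \frac{\hat{\gamma}\hat{N}_{ab}^A}{N_B} + \frac{(1 - \hat{\gamma})\hat{N}_{ab}^B}{N_A}\right]x + \hat{\gamma}\hat{N}_{ab}^A + (1 - \hat{\gamma})\hat{N}_{ab}^B = 0.$$
Optimal value for $\hat{\gamma}$ is
$$\hat{\gamma}_{opt} = \frac{\hat{N}_a N_B V(\hat{N}_{ab}^B)}{\hat{N}_a N_B V(\hat{N}_{ab}^B) + \hat{N}_b N_A V(\hat{N}_{ab}^A)}.$$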
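The overlap size estimate that enters the PML estimator can be sketched in base R as the smaller root of a quadratic equation. The sketch below assumes the form of that equation given in Skinner and Rao (1996), written out in the code comments; the function name `pml_overlap_size` and its argument names are hypothetical and are not part of the package.

```r
# Hypothetical helper: smaller root of the quadratic
#   [g/N_B + (1 - g)/N_A] x^2
#     - [1 + g*Nab_A/N_B + (1 - g)*Nab_B/N_A] x
#     + g*Nab_A + (1 - g)*Nab_B = 0,
# where g is the weighting parameter in (0, 1) and Nab_A, Nab_B are the
# estimates of the overlap domain size computed from each sample.
pml_overlap_size <- function(gamma, N_A, N_B, Nab_A, Nab_B) {
  a <- gamma / N_B + (1 - gamma) / N_A
  b <- 1 + gamma * Nab_A / N_B + (1 - gamma) * Nab_B / N_A
  cc <- gamma * Nab_A + (1 - gamma) * Nab_B
  disc <- b^2 - 4 * a * cc
  # The equation is a*x^2 - b*x + cc = 0 with a > 0, so the smaller root
  # takes the minus sign in front of the square root
  (b - sqrt(disc)) / (2 * a)
}

# Frame sizes as in the package examples, with assumed overlap estimates
pml_overlap_size(0.5, N_A = 1735, N_B = 1191, Nab_A = 620, Nab_B = 585)
```

As a sanity check, when both samples give the same overlap estimate the quadratic factorises and the smaller root is that common value, so `pml_overlap_size(0.5, 1735, 1191, 601, 601)` returns 601 up to floating point error.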
Variance is estimated according to following expression
where, if
and
if
with
Similarly, we define if
and
if
PML returns an object of class "EstimatorDF" which is a list with, at least, the following components:
Call |
the matched call. |
Est |
total and mean estimation for main variable(s). |
VarEst |
variance estimation for main variable(s). |
If parameter conf_level is different from NULL, object includes component
ConfInt |
total and mean estimation and confidence intervals for main variable(s). |
In addition, components TotDomEst and MeanDomEst are available when estimator is based on estimators of the domains. Component Param shows value of parameters involved in calculation of the estimator (if any).
By default, only Est component (or ConfInt component, if parameter conf_level is different from NULL) is shown. It is possible to access all the components of the objects by using function summary.
Skinner, C. J. and Rao, J. N. K. (1996) Estimation in Dual Frame Surveys with Complex Designs. Journal of the American Statistical Association, Vol. 91, 433, 349 - 356.
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate Pseudo Maximum Likelihood estimator for population total for variable Clothing
PML(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain,
    N_A = 1735, N_B = 1191)

#Now, let us calculate Pseudo Maximum Likelihood estimator for population total for variable
#Feeding, using first order inclusion probabilities
PML(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$Domain, DatB$Domain,
    N_A = 1735, N_B = 1191)

#Finally, let us calculate Pseudo Maximum Likelihood estimator and a 90% confidence interval for
#population total for variable Leisure
PML(DatA$Lei, DatB$Lei, PiklA, PiklB, DatA$Domain, DatB$Domain,
    N_A = 1735, N_B = 1191, conf_level = 0.90)
Produces estimates for population total and mean using the raking ratio estimator from survey data obtained from a dual frame sampling design. Confidence intervals are also computed, if required.
SFRR(ysA, ysB, pi_A, pi_B, pik_ab_B, pik_ba_A, domains_A, domains_B, N_A, N_B, conf_level = NULL)
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
pik_ab_B |
A numeric vector of size |
pik_ba_A |
A numeric vector of size |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
N_A |
A numeric value indicating the size of frame A |
N_B |
A numeric value indicating the size of frame B |
conf_level |
(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired. |
Raking ratio estimator of population total is given by
where and
is the smallest root of the quadratic equation
,
with
and
. Weights
and
are obtained as follows
and
being
and
the design weights, obtained as the inverse of the first order inclusion probabilities, that is
and
.
To obtain an estimator of the variance, note that the raking ratio estimator coincides with the SF calibration estimator when frame sizes are known and the "raking" method is used. So, one can use Deville's expression to calculate an estimator for the variance of the raking ratio estimator
where and
are the residuals of the regression with auxiliary variables as regressors.
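The idea behind raking to known frame sizes can be illustrated with a short base R sketch: starting weights are rescaled alternately so that the weighted count of units in frame A matches N_A and the weighted count of units in frame B matches N_B, until both hold. This is a generic iterative proportional fitting sketch under simplifying assumptions (a pooled sample with a single overlap label "ab" and made-up starting weights); the function name `rake_to_frames` is hypothetical and this is not the code used by `SFRR`.

```r
# Hypothetical helper: alternately rescale weights so that they add up to
# N_A over frame A units and to N_B over frame B units.
rake_to_frames <- function(w, in_A, in_B, N_A, N_B, tol = 1e-10, max_iter = 1000) {
  for (i in seq_len(max_iter)) {
    w[in_A] <- w[in_A] * N_A / sum(w[in_A])   # match frame A size
    w[in_B] <- w[in_B] * N_B / sum(w[in_B])   # match frame B size
    # Stop once the frame A constraint still holds after the frame B step
    if (abs(sum(w[in_A]) - N_A) < tol * N_A) break
  }
  w
}

# Toy pooled sample: domain of each unit and assumed starting weights
domains <- c("a", "a", "ab", "ab", "b", "b")
w0 <- c(500, 600, 300, 350, 280, 280)
in_A <- domains %in% c("a", "ab")   # units belonging to frame A
in_B <- domains %in% c("b", "ab")   # units belonging to frame B

w <- rake_to_frames(w0, in_A, in_B, N_A = 1735, N_B = 1191)
c(sum(w[in_A]), sum(w[in_B]))
```

Overlap units belong to both frames, so each rescaling step disturbs the other constraint slightly; the loop repeats until the two frame totals are matched simultaneously.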
SFRR returns an object of class "EstimatorDF" which is a list with, at least, the following components:
Call |
the matched call. |
Est |
total and mean estimation for main variable(s). |
VarEst |
variance estimation for main variable(s). |
If parameter conf_level is different from NULL, object includes component
ConfInt |
total and mean estimation and confidence intervals for main variable(s). |
In addition, components TotDomEst and MeanDomEst are available when estimator is based on estimators of the domains. Component Param shows value of parameters involved in calculation of the estimator (if any).
By default, only Est component (or ConfInt component, if parameter conf_level is different from NULL) is shown. It is possible to access all the components of the objects by using function summary.
Lohr, S. and Rao, J.N.K. (2000). Inference in Dual Frame Surveys. Journal of the American Statistical Association, Vol. 95, 271 - 280.
Rao, J.N.K. and Skinner, C.J. (1996). Estimation in Dual Frame Surveys with Complex Designs. Proceedings of the Survey Method Section, Statistical Society of Canada, 63 - 68.
Skinner, C.J. and Rao, J.N.K. (1996). Estimation in Dual Frame Surveys with Complex Designs. Journal of the American Statistical Association, Vol. 91, 433, 349 - 356.
Skinner, C.J. (1991). On the Efficiency of Raking Ratio Estimation for Multiple Frame Surveys. Journal of the American Statistical Association, Vol. 86, 779 - 784.
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate raking ratio estimator for population total for variable Clothing
SFRR(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$ProbB, DatB$ProbA, DatA$Domain,
     DatB$Domain, 1735, 1191)

#Now, let us calculate raking ratio estimator and a 90% confidence interval for
#population total for variable Feeding, considering only first order inclusion probabilities
SFRR(DatA$Feed, DatB$Feed, DatA$ProbA, DatB$ProbB, DatA$ProbB, DatB$ProbA,
     DatA$Domain, DatB$Domain, 1735, 1191, conf_level = 0.90)
Computes the variance estimator of Horvitz - Thompson estimator of population total
VarHT(y, pikl)
y |
A numeric vector of size n containing information about variable of interest |
pikl |
A square numeric matrix of dimension n containing first and second order inclusion probabilities for units included in |
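Variance estimator of Horvitz - Thompson estimator of population total is given by
$$\hat{V}(\hat{Y}_{HT}) = \sum_{k \in s}\frac{y_k^2}{\pi_k^2}(1 - \pi_k) + \sum_{k \in s}\sum_{l \in s, l \neq k}\frac{y_k y_l}{\pi_k \pi_l}\frac{\pi_{kl} - \pi_k\pi_l}{\pi_{kl}}$$
where $\pi_k$ and $\pi_{kl}$ are the first and second order inclusion probabilities, respectively.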
A numeric value representing variance estimator of Horvitz - Thompson estimator for population total for considered values
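For readers who want to see the computation spelled out, here is a minimal base R sketch of the usual Horvitz - Thompson variance estimator: a double sum over sample units of (pi_kl - pi_k pi_l) / pi_kl times the two expanded values y_k / pi_k and y_l / pi_l, with pi_kk = pi_k on the diagonal. The function name `var_ht` is hypothetical, and the sketch is an independent illustration rather than the source of `VarHT`.

```r
# Hypothetical helper: Horvitz - Thompson variance estimator computed from
# the sample values y and the matrix pikl of first (diagonal) and second
# (off-diagonal) order inclusion probabilities.
var_ht <- function(y, pikl) {
  pik <- diag(pikl)                          # first order probabilities
  delta <- (pikl - outer(pik, pik)) / pikl   # (pi_kl - pi_k pi_l) / pi_kl
  yexp <- y / pik                            # expanded values y_k / pi_k
  sum(delta * outer(yexp, yexp))             # double sum over the sample
}

# Sample of size 2 drawn by simple random sampling without replacement
# from a population of size 5: pi_k = 0.4 and pi_kl = 0.1
y  <- c(13, 18)
Ps <- matrix(c(0.4, 0.1, 0.1, 0.4), 2, 2)
var_ht(y, Ps)
```

For this sample the result agrees with the textbook expression for simple random sampling, N^2 (1 - n/N) s^2 / n = 25 * 0.6 * 12.5 / 2 = 93.75.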
Horvitz, D. G. and Thompson, D. J. (1952) A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663 - 685
Sarndal, C. E., Swensson, B. and Wretman, J. (1992) Model Assisted Survey Sampling. Springer-Verlag. New York.
########## Example 1 ##########
U <- c(13, 18, 20, 14, 9)

#A simple random sample of size 2 without replacement is drawn from population
s <- sample(U, 2)

#Horvitz - Thompson estimator of population total is calculated.
ps <- c(0.4, 0.4)
HT(s, ps)

#Now, we calculate variance estimator of the Horvitz - Thompson estimator.
Ps <- matrix(c(0.4, 0.1, 0.1, 0.4), 2, 2)
VarHT(s, Ps)

########## Example 2 ##########
data(DatA)
attach(DatA)
data(PiklA)

#Let us calculate Horvitz - Thompson estimator for total of variable Clothing in Frame A.
HT(Clo, ProbA)

#And now, let us compute the variance of the previous estimator
VarHT(Clo, PiklA)
Computes the g-weights for the dual frame calibration estimator.
WeightsCalDF(ysA, ysB, pi_A, pi_B, domains_A, domains_B, N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear")
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
domains_A |
A character vector of length |
domains_B |
A character vector of length |
N_A |
(Optional) A numeric value indicating the size of frame A. |
N_B |
(Optional) A numeric value indicating the size of frame B. |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain. |
xsAFrameA |
(Optional) A numeric vector of length |
xsBFrameA |
(Optional) A numeric vector of length |
xsAFrameB |
(Optional) A numeric vector of length |
xsBFrameB |
(Optional) A numeric vector of length |
xsT |
(Optional) A numeric vector of length |
XA |
(Optional) A numeric value or vector of length |
XB |
(Optional) A numeric value or vector of length |
X |
(Optional) A numeric value or vector of length |
met |
(Optional) A character vector indicating the distance that must be used in calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
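Function provides g-weights in the following scenarios:
- There is not any additional auxiliary variable
  - $N_A$, $N_B$ and $N_{ab}$ unknown
  - $N_{ab}$ known and $N_A$ and $N_B$ unknown
  - $N_A$ and $N_B$ known and $N_{ab}$ unknown
  - $N_A$, $N_B$ and $N_{ab}$ known
- At least one additional auxiliary variable is available
  - $N_{ab}$ known and $N_A$ and $N_B$ unknown
  - $N_A$ and $N_B$ known and $N_{ab}$ unknown
  - $N_A$, $N_B$ and $N_{ab}$ known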
A numeric vector containing the g-weights for the dual frame calibration estimator.
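To give a feel for what g-weights are, the base R sketch below computes them for the generic linear ("chi-square distance") calibration of Deville and Särndal (1992): g_k = 1 + x_k' lambda, with lambda chosen so that the calibrated weights d_k g_k reproduce known totals of the auxiliary variables. The function name `linear_g_weights` and the toy numbers are assumptions for illustration; the dual frame set-up handled by `WeightsCalDF` (domains, frame sizes, other distances) is not reproduced.

```r
# Hypothetical helper: g-weights for linear calibration.
# d: design weights, x: n x p matrix of auxiliary variables,
# totals: known population totals of the p auxiliary variables.
linear_g_weights <- function(d, x, totals) {
  x <- as.matrix(x)
  ht_totals <- colSums(d * x)        # Horvitz - Thompson estimates of totals
  M <- crossprod(x * sqrt(d))        # sum over the sample of d_k x_k x_k'
  lambda <- solve(M, totals - ht_totals)
  as.vector(1 + x %*% lambda)
}

# Toy sample: a constant (calibrates to the population size) and one
# auxiliary variable with an assumed known total
d <- c(10, 12, 8, 15)
x <- cbind(1, c(2.0, 3.5, 1.0, 4.2))
totals <- c(50, 140)

g <- linear_g_weights(d, x, totals)
g

# Calibrated weights reproduce the known totals
colSums(d * g * x)
```

The final line returns the supplied totals, which is the defining property of calibration weights. With the linear distance the g-weights are not constrained to be positive, which is one reason bounded alternatives such as "raking" or "logit" are offered.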
Ranalli, M. G., Arcos, A., Rueda, M. and Teodoro, A. (2013) Calibration estimation in dual frame surveys. arXiv:1312.0761 [stat.ME]
Deville, J. C. and Särndal, C. E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376 - 382
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

#Let us calculate g-weights for the dual frame calibration estimator for variable Feeding,
#without considering any auxiliary information
WeightsCalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain)

#Now, let us calculate g-weights for the dual frame calibration estimator for variable Clothing
#when the frame sizes and the overlap domain size are known
WeightsCalDF(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$Domain, DatB$Domain,
             N_A = 1735, N_B = 1191, N_ab = 601)

#Finally, let us calculate g-weights for the dual frame calibration estimator
#for variable Feeding, considering Income as auxiliary variable in frame A
#and Metres2 as auxiliary variable in frame B and with frame sizes and overlap
#domain size known.
WeightsCalDF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$Domain, DatB$Domain,
             N_A = 1735, N_B = 1191, N_ab = 601, xsAFrameA = DatA$Inc,
             xsBFrameA = DatB$Inc, xsAFrameB = DatA$M2, xsBFrameB = DatB$M2,
             XA = 4300260, XB = 176553)
Computes the g-weights for the SF calibration estimator.
WeightsCalSF(ysA, ysB, pi_A, pi_B, pik_ab_B, pik_ba_A, domains_A, domains_B, N_A = NULL, N_B = NULL, N_ab = NULL, xsAFrameA = NULL, xsBFrameA = NULL, xsAFrameB = NULL, xsBFrameB = NULL, xsT = NULL, XA = NULL, XB = NULL, X = NULL, met = "linear")
ysA |
A numeric vector of length |
ysB |
A numeric vector of length |
pi_A |
A numeric vector of length |
pi_B |
A numeric vector of length |
pik_ab_B |
A numeric vector of size |
pik_ba_A |
A numeric vector of size |
domains_A |
A character vector of size |
domains_B |
A character vector of size |
N_A |
(Optional) A numeric value indicating the size of frame A |
N_B |
(Optional) A numeric value indicating the size of frame B |
N_ab |
(Optional) A numeric value indicating the size of the overlap domain |
xsAFrameA |
(Optional) A numeric vector of length |
xsBFrameA |
(Optional) A numeric vector of length |
xsAFrameB |
(Optional) A numeric vector of length |
xsBFrameB |
(Optional) A numeric vector of length |
xsT |
(Optional) A numeric vector of length |
XA |
(Optional) A numeric value or vector of length |
XB |
(Optional) A numeric value or vector of length |
X |
(Optional) A numeric value or vector of length |
met |
(Optional) A character string indicating the distance function to use in the calibration process. Possible values are "linear", "raking" and "logit". Default is "linear". |
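The three values accepted by `met` correspond to the calibration distances of Deville and Särndal (1992). As a rough illustration, the sketch below writes out the weight-adjustment function implied by each distance. The function names `F_linear`, `F_raking` and `F_logit`, and the bounds `L = 0.5` and `U = 2`, are assumptions chosen for this example; they are not part of the package, and the way the package handles bounds for the logit distance is not shown here.

```r
# Weight-adjustment functions F(u) implied by each calibration distance
# (Deville and Sarndal, 1992); u stands for the linear predictor x'lambda.
F_linear <- function(u) 1 + u
F_raking <- function(u) exp(u)
F_logit <- function(u, L = 0.5, U = 2) {
  # L and U are illustrative bounds with L < 1 < U (hypothetical values)
  A <- (U - L) / ((1 - L) * (U - 1))
  (L * (U - 1) + U * (1 - L) * exp(A * u)) / ((U - 1) + (1 - L) * exp(A * u))
}

u <- c(-5, -0.5, 0, 0.5, 5)
adjustments <- rbind(linear = F_linear(u),
                     raking = F_raking(u),
                     logit  = F_logit(u))
print(round(adjustments, 3))
```

All three functions equal 1 at `u = 0`, so the weights are left unchanged when no adjustment is needed. The linear form can produce negative adjustments, the raking form is always positive, and the logit form stays between the chosen bounds.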
The function provides g-weights in the following scenarios:
- No additional auxiliary variable is available
  - N_A, N_B and N_ab unknown
  - N_ab known and N_A and N_B unknown
  - N_A and N_B known and N_ab unknown
  - N_A, N_B and N_ab known
- At least one additional auxiliary variable is available
  - N_ab known and N_A and N_B unknown
  - N_A and N_B known and N_ab unknown
  - N_A, N_B and N_ab known
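To give a feel for what calibrating to known sizes means, the sketch below covers the simplest case: both frame sizes and the overlap domain size are known, there is no other auxiliary variable, and the distance is linear. Under those assumptions, calibrating on the three disjoint domain indicators reduces to a ratio adjustment within each domain. The sample, the starting weights `d` and the domain labels are invented for illustration, and this is not the package's internal code.

```r
# Toy concatenated sample: domain label and starting design weight of each unit
# (values are made up for illustration)
dom <- c("a", "a", "ab", "ab", "ab", "b", "b")
d   <- c(300, 280, 110, 95, 105, 150, 160)

# Known domain sizes implied by N_A = 1735, N_B = 1191, N_ab = 601
N_A <- 1735; N_B <- 1191; N_ab <- 601
N_known <- c(a = N_A - N_ab, ab = N_ab, b = N_B - N_ab)

# Estimated domain sizes from the starting weights
N_hat <- tapply(d, dom, sum)

# Linear calibration on disjoint domain indicators gives one ratio per domain
g <- as.vector(N_known[dom]) / as.vector(N_hat[dom])

# The calibrated weights d * g reproduce the known domain sizes
print(tapply(d * g, dom, sum))
```

In the other scenarios fewer constraints are available, so the adjustment no longer reduces to one ratio per domain.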
A numeric vector containing the g-weights for the SF calibration estimator.
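As a hedged illustration of how g-weights are typically used, the sketch below computes linear calibration g-weights by hand for a small invented sample and multiplies them by the design weights to obtain calibrated weights. It assumes the usual convention that the final weight is the design weight times the g-weight. The data, the totals and all variable names are hypothetical, and the code does not call the package.

```r
# Toy sample: design weights d, study variable y, auxiliary variable x
d <- c(10, 10, 20, 20, 40)
y <- c(5, 8, 6, 12, 9)
x <- c(1.0, 2.0, 1.5, 3.0, 2.5)

# Calibration variables: a constant (for a known population size) and x
Xs <- cbind(const = 1, x = x)
totals <- c(const = 105, x = 260)  # assumed known population totals

# Linear calibration: g_k = 1 + x_k' lambda, with lambda solving the
# calibration equations sum(d * g * x) = totals
lambda <- solve(crossprod(Xs * d, Xs), totals - colSums(Xs * d))
g <- drop(1 + Xs %*% lambda)

# Calibrated weights and the resulting estimate of the total of y
w <- d * g
total_y <- sum(w * y)

print(colSums(w * Xs))  # matches the assumed known totals
print(total_y)
```

The calibrated weights reproduce the known totals of the calibration variables exactly, which is the defining property of a calibration estimator.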
Ranalli, M. G., Arcos, A., Rueda, M. and Teodoro, A. (2013) Calibration estimation in dual frame surveys. arXiv:1312.0761 [stat.ME]
Deville, J. C. and Särndal, C. E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376-382
data(DatA)
data(DatB)
data(PiklA)
data(PiklB)

# Calculate g-weights for the SF calibration estimator for variable Clothing,
# without considering any auxiliary information
WeightsCalSF(DatA$Clo, DatB$Clo, PiklA, PiklB, DatA$ProbB, DatB$ProbA,
  DatA$Domain, DatB$Domain)

# Now calculate g-weights for the SF calibration estimator for variable Leisure
# when the frame sizes and the overlap domain size are known
WeightsCalSF(DatA$Lei, DatB$Lei, PiklA, PiklB, DatA$ProbB, DatB$ProbA,
  DatA$Domain, DatB$Domain, N_A = 1735, N_B = 1191, N_ab = 601)

# Finally, calculate g-weights for the SF calibration estimator
# for variable Feeding, considering Income and Metres2 as auxiliary
# variables, with frame sizes and overlap domain size known
WeightsCalSF(DatA$Feed, DatB$Feed, PiklA, PiklB, DatA$ProbB, DatB$ProbA,
  DatA$Domain, DatB$Domain, N_A = 1735, N_B = 1191, N_ab = 601,
  xsAFrameA = DatA$Inc, xsBFrameA = DatB$Inc,
  xsAFrameB = DatA$M2, xsBFrameB = DatB$M2,
  XA = 4300260, XB = 176553)