Sample Design Documentation
KEY WORDS: Two-phase sampling; PPS sampling; Petroleum surveys; Telephone surveys

ABSTRACT
The EIA-888 is a survey of diesel fuel outlet prices that produces estimates of national and regional level prices. The EIA-878 is a survey of motor gasoline outlet prices that produces estimates of national and regional level prices, as well as separate estimates for four formulations and three grades of gasoline. Both of these weekly surveys have used a monthly survey as phase 1 of a multi-phase sample, subsampling the sample units of the monthly survey that report the specific outlet sales category. More recently, both weekly surveys have used a combination of two overlapping sample cycles of the monthly survey as phase 1, adjusting the Probability Proportional to Size (PPS) size measures to account for sample units present in both sample cycles.

BACKGROUND
EIA conducts two weekly Computer Assisted Telephone Interview surveys that collect prices at the outlet level. The first is the EIA-888, which collects prices of diesel fuel from truck stops and service stations across the country each Monday morning. The second is the EIA-878, which collects prices of regular, midgrade, and premium motor gasoline by formulation from service stations across the country each Monday morning. Average prices of gasoline and diesel fuel through outlets at the five Petroleum Administration for Defense District (PADD) levels, regions of the country, sub-PADD levels, and the state of California are released by the end of the day through Listserv, the Web, fax, and a telephone hotline.

The diesel fuel prices that are released are used by the trucking industry to make rate adjustments in hauling contracts. Gasoline prices are frequently quoted by the media, particularly during times of rising or falling prices, because of their general interest to the public.
The gasoline prices have been used in analyses of the cost of the Environmental Protection Agency regulations requiring oxygenated and reformulated gasoline in specified non-attainment areas. The prices have also been used by the state of California in helping to understand the high level of prices associated with its distinct market. Most importantly, they have provided national and state level legislators with valuable independent, accurate, and timely information during times of volatile markets and prevented the creation of unnecessary legislation in a free market system.

SAMPLE DESIGNS
The sample designs for these two surveys were based on the need to provide efficient samples with simple estimation, to promote fast turnaround in gathering the data and releasing estimates. Design targets for the standard error were originally set at 1 cent when the surveys began but, as customers required more detailed information, such as sub-PADD, grade, and formulation of gasoline, these targets were allowed to vary for lower level aggregates to provide sample sizes conducive to quick collection with minimal or no increase in survey costs. These targets are re-evaluated when new samples are drawn, at which point historical standard errors are examined and cost-benefit analyses reviewed. The revised targets are shown in Table 1.

Table 1. TARGET STANDARD ERRORS (in cents)
Geographic Area: 1 (1.0); 1 (1.5); 1 (1.5); 1 (1.5); 1 (1.5); 1 (1.5); 1 (1.5); 1 (1.5); 1.5 (2.0[1]/1.5); 1.5 (1.5)
[1] A 2.0 cent standard error was targeted for conventional but 1.5 cents for the other individual formulations.
In addition to the timeliness requirement, the designs were driven by the lack of a frame listing of diesel fuel outlets or service station outlets. Instead, the designs made use of a monthly survey consisting of a census of refiners and a sample of resellers and retailers of petroleum products. This company-level survey collects prices and volumes by state and end use, and in particular for the end-use category of sales through retail outlets. The sample for this survey is rotated roughly every 12 to 18 months. Data from this survey formed the basis of the first phase of sampling for the two weekly surveys. Company-state units (CSUs) in the monthly survey with price and volume data for gasoline or diesel fuel in the sales through retail outlets categories made up the frame for the first phase sample for the weekly surveys.
DIESEL SAMPLE
To determine the allocations, average standard errors across reporting periods for the previous year of weekly diesel fuel survey prices were calculated for each of the cells. An average sample size was then determined for each cell by the formula

n' = (e/t)^2 n,

where t was the targeted standard error, n was the previous sample size for the cell, e was the average of the previous sample's standard errors, and n' was the new sample allocation.

In addition, a second allocation based on proportional representation (proportion of diesel fuel volume sold) within the next larger cell (i.e., the more aggregated cell to which the original cell contributes) was also obtained. For example, the PADD IB cell contributes to the PADD I and the U.S. cells. The maximum of these two allocations for each cell was then designated as the cell allocation.

For the diesel fuel survey, data from cycle 11 of the monthly survey for November 1995
to October 1996 provided 1,536 CSUs from 964 companies. However, because estimates in this most recent sample selection were being targeted at the sub-PADD and California level for the first time, concern was raised that the increases in sample size would result in more cases of multiple outlets per CSU. In addition to increased design effects, there was a possibility that the number of outlets sampled for a CSU would exceed the number of outlets that the CSU had in the particular state. As a result of these concerns, consideration was given to using data for January to June 1994 from the previous monthly survey cycle, cycle 10, thereby providing more CSUs from which to sample. Using two survey cycles of data, two separate, independent samples could be selected, one from each cycle, and outlets could be sampled from the CSUs so as not to overlap. The estimates from the two samples could then be averaged.

However, it is also apparent that there is no need to conceptualize the design as consisting
of two samples. For example, consider a given CSU which is allotted a portion of the
allocations in a sample. The CSU, x, has an expectation of e1(x) outlets. If the method of
selection is a Goodman-Kish approach, where more than one outlet may be selected from
the CSU, then the number of outlets selected will differ from e1(x) by less than one (e.g., if e1(x) = 3.2, then either 3 or 4 outlets will be selected from x). Suppose the expectation for the same CSU from the second sample was e2(x). Then one could assign an expectation of e1(x) + e2(x) to the CSU and combine the two samples into one draw.

With the combined sample cycle approach, in which there is only one sample selection, the CSUs' measures of
size could be normalized to sum to the cell allocations. Therefore, a proportion of the
allocation could be assigned to each cycle, and each proportioned allocation could be
multiplied by the proportion of weighted volume each CSU represented in the cell. Size
measures could be added across cycles and only one sample selected. The results from one sample would be the same as the averaged estimates from two separate samples. The simpler one-sample method was implemented. The volumes of companies that appeared in only one cycle of the monthly survey were multiplied by a factor reflecting the ratio of companies present in both sample cycles. The use of the cycle 10 sample provided 1,693 CSUs from 1,089 companies. The final combined frame counts, the sample for Phase 1 for both gasoline and diesel, are provided in Table 2.

The second phase had two stages. The first stage of the second phase of the sample
design for the diesel fuel weekly survey used as a measure of size for PPS sampling the
CSU's annual state sales volumes from the monthly survey divided by the unit's
probability of selection in the monthly survey. These size measures were normalized by
assigning ½ of the allocation necessary to achieve the target errors in the cell to each cycle
and multiplying this half of the allocation by the proportion of the total weighted volume
in the cell for the cycle represented by the CSU. The allocation procedure described
above yielded a targeted sample size of 350 for the diesel fuel survey. Normalized size
measures for each CSU were determined for each cycle separately, and then the two size
measures were added to form one frame.

Each CSU in the frame, therefore, had a size, and the sizes of the CSUs within each cell added up to the cell's allocation; the allocations are shown in Table 3.

To select the units for the second phase of the sample, the frame CSUs were sorted by
state and randomly ordered within each state. The normalized size measures were then
used to define sampling intervals of 1.0. Using the random order, cumulative size
measures were determined where a CSU's cumulative size was the sum of the sizes of all
CSUs

Table 2. FRAME COUNTS FOR THE DIESEL FUEL AND GASOLINE SAMPLES (PHASE 1)
COMPANIES / CSUs
EARLY CYCLE ONLY: 671, 1070
LATER CYCLE ONLY: 514, 743
BOTH CYCLES: 1022, 964
TOTAL FRAME 150 214 325 868 296 167 44 143 2207
22 27 53 65 35 39 18 23 282
28 34 64 72 36 56 29 31 350

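
The size-measure normalization and the interval-1.0 selection described for the diesel sample can be sketched as follows. This is a minimal illustration, not EIA's implementation: the function names and the toy volumes are hypothetical, the random start in [0, 1) is an assumption (the text does not say how the starting point was chosen), and the adjustment applied to companies that appear in only one cycle is omitted.

```python
import math
import random


def combined_sizes(cell_allocation, weighted_volumes_by_cycle):
    """Normalized size measures for one cell, summed across cycles.

    weighted_volumes_by_cycle: one dict per monthly-survey cycle mapping a
    CSU id to its weighted volume (volume / monthly selection probability).
    Each cycle receives an equal share of the cell allocation (1/2 for two
    cycles), split in proportion to weighted volume.
    """
    share = cell_allocation / len(weighted_volumes_by_cycle)
    sizes = {}
    for volumes in weighted_volumes_by_cycle:
        total = sum(volumes.values())
        for csu, volume in volumes.items():
            sizes[csu] = sizes.get(csu, 0.0) + share * volume / total
    return sizes


def systematic_pps(ordered_sizes, start):
    """Systematic PPS selection with a sampling interval of 1.0.

    ordered_sizes: (csu, size) pairs already in frame order.
    start: assumed random start in [0, 1).
    Returns {csu: outlets to sample}; each count differs from the CSU's
    size by less than one.
    """
    outlets = {}
    cumulative = 0.0
    for csu, size in ordered_sizes:
        lower, upper = cumulative, cumulative + size
        # Count selection points start, start + 1, ... that fall in [lower, upper).
        hits = math.ceil(upper - start) - math.ceil(lower - start)
        if hits > 0:
            outlets[csu] = hits
        cumulative = upper
    return outlets


# Toy cell with an allocation of 4 outlets; CSU "A" reports in both cycles.
cycle_early = {"A": 300.0, "B": 100.0}
cycle_late = {"A": 100.0, "C": 100.0}
sizes = combined_sizes(4, [cycle_early, cycle_late])
frame = sorted(sizes.items())  # stand-in for the state sort and random ordering
sample = systematic_pps(frame, start=random.random())
```

In this toy cell, CSU "A" ends up with a combined size of 2.5, so it contributes either 2 or 3 outlets depending on the start, and the outlet counts always sum to the cell allocation of 4.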
Since allocations were derived at the cell level, cell averages were just simple averages of
the CSU prices (the weights from the first and second phases cancel). The U.S. average
was a weighted average of the cell/PADD averages where the weights were derived by
taking the inverse of the probability proportional to the PADD weighted volumes.

GASOLINE SAMPLE
Similarly, the gasoline sample, selected almost a year after the diesel fuel sample, made use of two frames, based on cycle 11 and cycle 12 of the monthly survey, as Phase 1.
In this survey, standard errors were targeted for PADDs, sub-PADDs, and California, as
well as formulation. The sample sizes within the PADD/formulation (conventional,
oxygenated, reformulated, and oxygenated program reformulated gasoline (OPRG)) cells
were allocated using the maximum of the grades' (regular, midgrade and premium)
median standard error across reporting periods for the previous 6 months of the weekly
gasoline survey prices. The weekly standard errors were obtained using a bootstrap
procedure. A single bootstrap covered all reference weeks, but separate variance
estimates were derived for each week. Similar to the diesel fuel survey allocations, cell
allocations took into account the allocation necessary for that cell itself, as well as the
contribution that cell makes to a more aggregated cell by considering its proportion of
total volume in the larger cell and multiplying that proportion by the allocation of the
larger cell. For example, the PADD IV oxygenated gasoline considered the allocation
required in that cell, as well as the proportional allocation needed in PADD IV total for
all formulations of gasoline and the U.S. total oxygenated gasoline allocation. The
maximum of the allocations was assigned to the smaller cell to satisfy all requirements.
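
The allocation rule just described can be sketched as below. This is only an illustration under stated assumptions: it assumes the diesel formula n' = (e/t)^2 n carries over to each gasoline cell, and the function names, volume shares, and allocations are hypothetical numbers, not survey values.

```python
def required_n(prev_n, observed_se, target_se):
    """Sample size needed for the cell itself: n' = (e/t)^2 * n."""
    return (observed_se / target_se) ** 2 * prev_n


def cell_allocation(own_n, volume_share, larger_cell_allocation):
    """Largest of the cell's own requirement and its proportional share
    of each more aggregated cell it contributes to.

    volume_share: {larger cell: this cell's proportion of its volume}
    larger_cell_allocation: {larger cell: allocation of that cell}
    """
    proportional = [
        volume_share[cell] * larger_cell_allocation[cell]
        for cell in larger_cell_allocation
    ]
    return max([own_n] + proportional)


# Hypothetical PADD IV oxygenated cell.
own = required_n(prev_n=12, observed_se=1.5, target_se=1.5)  # 12.0
allocation = cell_allocation(
    own,
    volume_share={"PADD IV total": 0.25, "U.S. oxygenated": 0.1},
    larger_cell_allocation={"PADD IV total": 60, "U.S. oxygenated": 90},
)
```

Here the cell's own requirement is 12 outlets, but its quarter share of the hypothetical PADD IV total allocation is 15, so 15 is assigned.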
However, because the first stage of the second phase sample yields company-state units,
and companies do not always have available a list of outlets designated by attainment
status (i.e. formulation), the number of outlets originally sampled from each CSU often
had to be larger than the number actually desired in order to satisfy the individual
formulation allocations. Once the attainment status of the outlets was determined during
initiation of the sample, the desired number of outlets could be subsampled to obtain the
targeted sample size.

To produce CSU expectations on the number of outlets required in oversampling to
achieve the desired number of outlets for each formulation, ratios of the formulations were
derived using the monthly survey where possible. Where ratios could not be calculated,
such as ones involving OPRG, which is not collected separately in the monthly survey,
population ratios for the specific attainment status at the state level were used instead. For
each CSU, the CSU's total gasoline volume across the three grades was multiplied by the
expected proportion of gasoline for each of the four formulations to yield an expected
volume by formulation. The CSU's monthly weight was applied to these formulation
volumes and divided by the total weighted volumes for the PADD for that formulation and
then multiplied by the cell's desired allocation to yield the expected number of outlets to
be sampled for each CSU. This was done separately for companies in each of the two
monthly respondent cycles, and the results of the two cycles were added together. Each formulation's expectation was then divided by that formulation's proportion of the CSU's volume. The maximum of
these four results, one per formulation, was the global expectation or the size measure that
was used for each CSU. Sampling then proceeded as in the diesel fuel sample, using
Probability Proportional to Size (PPS) and a sampling interval of 1. Because of the use of
oversampling, second stage sampling was necessary. For each outlet selected, the outlet's
adjusted expectation was divided by the maximum adjusted expectation. If this quotient
was larger than a selected random number between zero and one, the outlet was retained.
If the quotient was smaller, the outlet was dropped.

The second phase sampling produced 507 CSUs from 304 companies. Stage 1 resulted in
1,174 outlets, and stage 2 yielded an expectation of 820 outlets as shown in Table 4.
Final stage 2 expected and actual allocations by formulation are shown in Table 5.
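
The oversampling size measure and the second stage retention rule described above can be sketched as follows. This is a rough illustration rather than the production procedure: the function names and numbers are hypothetical, and it assumes that an outlet's "adjusted expectation" is the expectation for its formulation divided by that formulation's share of the CSU's volume.

```python
import random


def csu_size_measure(expected_outlets, volume_share):
    """Global expectation for a CSU: the largest oversampled outlet count
    needed by any formulation the CSU actually sells."""
    return max(
        expected_outlets[f] / volume_share[f]
        for f in expected_outlets
        if volume_share[f] > 0
    )


def retain_outlet(adjusted_expectation, max_adjusted_expectation, rng=random):
    """Keep the outlet when the quotient exceeds a uniform draw on [0, 1)."""
    return adjusted_expectation / max_adjusted_expectation > rng.random()


# Hypothetical CSU: 2 conventional and 1 reformulated outlet are wanted,
# and 80% of its gasoline volume is conventional.
expected = {"conventional": 2.0, "reformulated": 1.0}
share = {"conventional": 0.8, "reformulated": 0.2}
adjusted = {f: expected[f] / share[f] for f in expected}
size = csu_size_measure(expected, share)

rng = random.Random(1997)  # seeded only to make the sketch repeatable
keep_conventional = retain_outlet(adjusted["conventional"], size, rng)
keep_reformulated = retain_outlet(adjusted["reformulated"], size, rng)
```

With these numbers the CSU's size measure is about 5 outlets, of which roughly 4 would be conventional. Reformulated outlets are always retained (quotient 1), while each conventional outlet is retained with probability 0.5, which brings the expected conventional count back to the desired 2.
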
Table 4. GASOLINE FRAME AND SAMPLE ALLOCATIONS (1ST STAGE AND 2ND STAGE EXPECTATIONS) BY CELL
Table 5. GASOLINE SAMPLE SECOND PHASE SECOND STAGE OUTLET COUNTS: EXPECTED AND ACTUAL BY FORMULATION