Weighting is a vast subject, and MERLIN can handle anything from the simple application of existing factors to the calculation of complex multi-stage weighting.

In this blog we will start from scratch by defining the terms used in weighting and giving some simple examples, then expand on this in future articles.

**Respondent and quantity weighting factors**

A **weighting factor** can be a wvar or an ivar (a variable
containing a number with or without a decimal point) and is used when
incrementing tables, marginal counts, and frequency counts (but we will confine
our discussion to tables). So, if a respondent has a weighting factor of 3,
they will be counted three times instead of once when incrementing a table.

MERLIN distinguishes between **respondent weighting** (WR) and **quantity
weighting** (WQ) and, to illustrate the difference, we will take the example
of an ivar called $CARS containing the number of cars each respondent owns. If
we use this statement…

SELECT WR $CARS,

… all tables following will be weighted by $CARS until we specify SELECT
WR with another variable, or SELECT WR OFF to stop weighting. By default,
MERLIN will show an **unweighted** total row (the number of respondents) and
a **weighted** total row (the number of cars), and all other numbers and
percentages in the table will relate to the *weighted data*. Whenever WR
is used, MERLIN creates two internal tables, one for unweighted and one for
weighted data, so although unweighted figures are usually shown only in the
total row, they can be shown anywhere if appropriate formats are set. If,
however, we specify…

SELECT WQ $CARS,

… only the **unweighted**
total row will be shown, but it will contain the number of cars – in other
words, the table is incremented entirely in terms of cars rather than
respondents (so there is only one internal table). Some may not view this as
weighting in the truest sense, but simply a way of counting the data. When
doing this, it is sometimes necessary to count different parts of a table using
different factors so, to save repeatedly specifying SELECT WQ, MERLIN allows
you to add the relevant variable to the end of the table statement, e.g.

T#1 = $SIDE1 * $TOP + $FACTOR1, !apply quantity weight $FACTOR1

+T#1A = $SIDE2 * $TOP + $FACTOR2, !apply quantity weight $FACTOR2

+T#1B = $SIDE3 * $TOP + $FACTOR3, !apply quantity weight $FACTOR3

…

A factor specified in this way will temporarily replace any factor specified with SELECT WQ.

SELECT WR and SELECT WQ may *both*
be applied to a table, and will generate an unweighted total representing the
number of cars (in our example) and a weighted total representing the number of
cars weighted by some additional factor.
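
To make the difference between the totals concrete, here is a toy model in Python. It is not MERLIN code and does not reproduce MERLIN's internals; the respondent data and the additional weight are invented purely for illustration.

```python
# Toy model of the total rows described above (not MERLIN code).
# Each tuple is (cars owned, additional respondent weight); all values are invented.
respondents = [(1, 1.5), (2, 0.5), (0, 1.0), (3, 2.0)]

# SELECT WR $CARS: the unweighted total counts respondents,
# the weighted total counts cars.
wr_unweighted = len(respondents)
wr_weighted = sum(cars for cars, _ in respondents)

# SELECT WQ $CARS: a single (unweighted) total, counted in cars.
wq_total = sum(cars for cars, _ in respondents)

# WQ $CARS together with WR on an additional factor:
# unweighted = cars, weighted = cars multiplied by the additional factor.
both_unweighted = wq_total
both_weighted = sum(cars * weight for cars, weight in respondents)

print(wr_unweighted, wr_weighted, wq_total, both_unweighted, both_weighted)
```

With these four invented respondents, the respondent count is 4 and the car count is 6, so the same table can legitimately show quite different totals depending on which kind of weighting is selected.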

The usual understanding of the term “weighting” is respondent weighting (WR)
and, unlike quantity weighting, it is unusual for different factors to be required
within a single table, or even within an entire run. Respondent weighting is usually
done to *correct imbalances in the sample*,
e.g. we have failed to interview enough females, so we give more weight to
their answers.

From now on, we will assume that we are using **respondent weighting**, and that it is being used for its usual
purpose of correcting an imbalanced sample.

**The weighting matrix**

We first need to identify the groups to which different weighting
factors will be applied, and will use the simple example of male and female. Together,
these groups constitute a **weighting
matrix** and each group is known as a **cell**.
It is important that every respondent falls into *one cell and one only*, so they each receive one factor.

We don’t usually know the factors when we start the analysis, but should
know the **targets**, i.e. the **ideal**
number of respondents in *each cell* (also known as **universe** or **population**
figures) – so we call this **target weighting**. Targets may be expressed in
various ways such as percentages or population estimates, but it doesn’t really
matter since they are essentially *ratios*, showing the target *proportion*
in each cell. Let’s assume we have them as percentages, as shown in column (a)
below, and the actual number of respondents is shown in column (b). The factor
for each cell is the target divided by the actual sample, so that gives us the
factors in column (c) and, if we apply them, we will arrive at the weighted
sample in column (d).

Our main aim has now been achieved in that the cells are in the correct proportion but, as it stands, the total weighted base will be 100, which probably isn’t what we want – maybe we want a population estimate in thousands, or we want it to be the same as the unweighted. Either way, we can simply apply an overall “grossing-up factor” of the figure required divided by the current weighted total, e.g. 56100 / 100. In other words, we don’t need to convert all our target percentages into numbers.
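
The arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation, not how MERLIN or PTARG.INC does it, and the targets and sample sizes are hypothetical figures chosen for the example.

```python
# Target weighting arithmetic with hypothetical figures (not MERLIN code).
targets_pct = {"male": 49.0, "female": 51.0}   # column (a): target percentages (assumed)
actual = {"male": 450, "female": 550}          # column (b): respondents interviewed (assumed)

# Column (c): factor = target / actual sample, cell by cell.
factors = {cell: targets_pct[cell] / actual[cell] for cell in targets_pct}

# Column (d): weighted sample = factor * actual, which reproduces the targets.
weighted = {cell: factors[cell] * actual[cell] for cell in actual}

# Grossing up: here we assume the weighted total should equal the unweighted total.
required_total = sum(actual.values())
gross = required_total / sum(weighted.values())
final_factors = {cell: factor * gross for cell, factor in factors.items()}
final_weighted = {cell: final_factors[cell] * actual[cell] for cell in actual}

for cell in actual:
    print(cell, round(final_factors[cell], 4), round(final_weighted[cell], 1))
```

Because the grossing-up factor is the same for every cell, it changes the weighted total without disturbing the proportions between cells.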

The good news is that everything can be done in a single MERLIN run which increments the number in each cell, calculates the factor, then applies it to produce weighted tables. New users are sometimes surprised that MERLIN has no specific statements or functions for doing this – but that is because the MANIP stage already provides a powerful tool which enables us to treat a MERLIN table like a spreadsheet where we can reproduce the above table then apply the factors calculated. Item 11.2 of the MERLIN Tips and Examples library shows an insert file (PTARG.INC) which has been developed for this purpose, and can be used in any script where target weighting is required – such as example item 11.4.

**Interlaced matrices and rims**

Let us now suppose the weighting relates to more than one variable so,
as well as the 2 gender groups, we also have 4 age groups and 3 social class
groups. How we proceed depends on whether we have **interlaced** targets (i.e.
2 * 4 * 3 = 24 figures) or only the **totals** for each variable
(i.e. 2 + 4 + 3 = 9 figures).

In the first case, we can create a single variable which interlaces
gender, age and class, so it is another example of **target weighting** discussed
above. The interlaced variable is easily created with this MERLIN statement…

DS $MATRIX = $CLASS.BY.$AGE.BY.$GENDER,

… in which the first variable is the ‘outer loop’, i.e. the cells will be in this order…

$CLASS/1, $AGE/1, $GENDER/1,

$CLASS/1, $AGE/1, $GENDER/2,

$CLASS/1, $AGE/2, $GENDER/1,

… and so on.
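
If it helps to see the full cell order, the same 'outer loop first' ordering can be reproduced in Python with `itertools.product`, whose rightmost input varies fastest. This is an illustration only; the item codes 1 to 3, 1 to 4 and 1 to 2 are assumed from the group counts given above.

```python
from itertools import product

# Item codes assumed from the group counts in the text.
classes = [1, 2, 3]      # 3 social class groups (outer loop)
ages = [1, 2, 3, 4]      # 4 age groups
genders = [1, 2]         # 2 gender groups (inner loop, varies fastest)

# product() cycles its last argument fastest, matching the order shown above.
cells = list(product(classes, ages, genders))
labels = [f"$CLASS/{c}, $AGE/{a}, $GENDER/{g}" for c, a, g in cells]

for number, label in enumerate(labels, start=1):
    print(number, label)
```

The list contains 3 * 4 * 2 = 24 cells, one for each interlaced target.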

The second case described above is called **rim weighting** because
we only know the *rims* (i.e. the totals), and we will discuss this in a future
blog.

**Applying calculated factors**

Once we know the factors to be applied, we can use a ‘data lookup’ statement to specify the factor for each cell, maybe gross it up, then apply it, e.g.

DW $FACTOR = $GENDER (0.0996,1.1059),

DW $FACTOR = $$ * 56100 / 100, !gross total up to 56100

SELECT WR $FACTOR,

The number of factors in brackets must equal the number of items in the matrix variable.
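
As a rough illustration of what the lookup and grossing-up steps do, here is a Python sketch using the factors from the example above. It is not MERLIN code: the function names are hypothetical, and it assumes the items of the matrix variable are coded 1, 2, and so on.

```python
# Sketch of a per-cell factor lookup with grossing-up (not MERLIN code).
FACTORS = [0.0996, 1.1059]   # one factor per item, as in the example above
N_ITEMS = 2                  # number of items in the matrix variable ($GENDER)
GROSS = 56100 / 100          # grossing-up factor from the example above

def build_lookup(factors, n_items):
    """Map item codes to factors, refusing a list of the wrong length."""
    if len(factors) != n_items:
        raise ValueError(f"expected {n_items} factors, got {len(factors)}")
    # Assumption: items are coded 1..n_items.
    return {code: factor for code, factor in enumerate(factors, start=1)}

lookup = build_lookup(FACTORS, N_ITEMS)

def respondent_weight(item_code):
    """Return the grossed-up factor for a respondent in the given cell."""
    return lookup[item_code] * GROSS

print(respondent_weight(1), respondent_weight(2))
```

The length check mirrors the rule that the number of factors must match the number of items, so a respondent can never fall into a cell that has no factor.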

Since MERLIN runs so fast, users often allow it to re-calculate the factors in every run but, if you are doing many runs using the same weighting factors, it makes sense to replace the code that calculates them with the code above.

Any questions? Email support@merlinco.co.uk.