I am deeply sorry to announce that Keith Hughes, a founding director of Merlinco, has died peacefully at the age of seventy-six, following a short battle with cancer.
His career began in 1965 at MAS Research, where he began to focus on computerised survey analysis. After working at ICL, CRC and Libra he became a consultant in 1973 and co-authored the prototype version of MERLIN. In the 1980s he helped found the Tab Shop and Market Research Software and continued developing MERLIN until quite recently when others took over the task. In 2010, Keith was made an Honorary Member of the Association of Survey Computing in recognition of his work developing the Triple-S standard, and earlier this year was made an Honorary Fellow of the Market Research Society for the same reason.
I have already received numerous comments from clients and colleagues to tell me how much Keith was loved and appreciated, especially for his eccentricities! Working with him was never dull – he had strong opinions but was never upset if you disagreed with him as he enjoyed a good discussion.
I first met Keith in 1976 but did not see much of him until 1990 when I became a MERLIN client. When that happened, he delivered a training session to my department and spoke non-stop with great enthusiasm and volume, and I recall taking a headache tablet at the end of the day! In 1991 he phoned me to say he was starting a new company (Merlinco) and asked me to join him, especially to deliver client training as he did not think that was his strength! The very same week another friend (whom I will call Fred) mentioned the possibility of collaborating with him, so I called someone who knew both Keith and Fred to ask for advice, which I will never forget… “If you want to make money, go with Fred. If you want to be happy, go with Keith!”. Based on that advice, I joined Merlinco where I spent many happy years working with Keith (and we made some money as well)!
John Tebboth,
Director
July 2024
MERLIN Toolkit is supplied free with MERLIN and MERLINPLUS, but can also be purchased on its own by non-MERLIN users.
Its main function is to provide a user-friendly bridge between MERLIN and other software packages so – for example – a MERLIN user can import from a web interviewing package (via Triple-S), or export to SPSS. Someone not using MERLIN, however, can get from the web interviewing package to SPSS without doing any MERLIN processing in between, or convert from one Triple-S version to another. For a full list of MERLIN Toolkit imports and exports, refer to the Survey Software section of our website.
This blog, however, concentrates on the lesser known Tools menu, which contains a number of useful utilities.
Join data files vertically / horizontally
These utilities combine separate data files into one, and can be used with ASCII, CSV or “cba” (column binary) data.
Vertical Join simply places all the records from separate files into a single file – whether for convenience, or because you are using software that requires this.
Horizontal Join is used to combine records that belong to the same respondent, but are held in separate files with matching keys – e.g. pre-coded and open-ended data with matching serial numbers can be combined into a single file (typically by adding the open-ended data on the end of the pre-coded records). The user can choose whether or not to include unmatched records from either file.
Join Excel tables files vertically / horizontally
These utilities combine separate Excel files of MERLIN tables into a single volume – the original files having been generated with MERLIN Run Control Parameter XLSTAB set to 1 or 3. The files may be separate because they were generated at different times, or to overcome size limitations, or simply because it was more convenient – but you now wish to put all the tables in a single file
Vertical Join puts all the tables into a single volume without modification (except for renumbering them if required), and generates a new Table of Contents.
Horizontal Join is used with sets of tables that are identical apart from the banner (horizontal axis), and combines these separate banners into one.
Convert CSV/XLSX bitstrings to spreadfields
This utility reads a “csv” or “xlsx” data file in which the multi-coded variables are coded in bitstring format (e.g. 1,0,0,1,1 in separate fields), and outputs a new file of the same type in which these variables are in spreadfield format (e.g. “1,4,5” in a single field). This can greatly reduce the size of the file, overcome limits, and make it faster to process.
The utility works by examining the field names in the header row and looking for names containing at least one underscore character “_” and the last underscore (if more than one) is used to split the name into a prefix and a suffix. If consecutive fields have the same prefix, and all the cells in those fields contain 0, 1, space(s) or blank only, then those columns will be merged into one spreadfield with the prefix as the field name.
If, for example, the input file contains consecutive fields named Q5_A, Q5_B, Q5_C, Q5_D all containing 0, 1, space(s) or blank only, they will be replaced by a single field named Q5.
DMERGE
DMERGE is a utility for combining two or more “idf”s (MERLIN internal data files) into a single “idf”, whether or not they are compatible.
DMERGE is mainly used so MERLIN can analyse data from incompatible “idf”s – especially useful for merging data from several waves of a survey. It is also possible to select variables from each of the input “idf”s, so DMERGE could be used with a single input “idf” simply to output a new one with fewer variables.
The output “idf” will contain all the records from all the input “idf”s, and these new records will (by default) contain all the variables defined in any of the input “idf”s. Variables that are not defined in a particular “idf” will be empty.
Any questions? Email support@merlinco.co.uk.
In blog #10, we described how to generate MERLIN tables in “xlsx” files, and how they can be enhanced by creating your own styles file. In this blog we will look in more detail at possible enhancements, using this example (which was generated directly by MERLIN without any post-processing).
For general information about styles files, refer to blog #10, and for more detailed information about the commands, see the MERLIN Help file > User files > “txt” Excel styles file.
A styles file commences with global commands which control things like the width of the row label and data columns, and whether commas are shown in thousands (e.g. 2,385). In our example, we have changed the following global commands from their default settings:
TOCLINK=ON | create hyperlink from each table back to Contents sheet |
TOCLINK COLUMN=B | column to contain above (see cell B1) |
LOGO=MERLINCO.PNG | file containing logo graphic |
LOGO COLUMN=L | column to contain logo in Tables (see cell L1) |
LOGO TOC COLUMN=D | column to contain logo in Contents sheet |
BANNER COLOUR ROTATION=1 | header level at which colour rotation operates (see row 6) |
FREEZE=ON | freeze panes containing table axes (see columns B-C and rows 1-9) |
GRIDLINES=OFF | do not show Excel cell gridlines |
CHART TYPE=HORIZONTAL BAR | generate horizontal bar chart (see rows 23+) |
The global commands are followed by style definitions, which control things such as the font, cell colours and borders, text wrapping or shrinking. The style definitions form a hierarchy as shown in the following chart (information in brackets relates to our example):
So, BASALL defines the style for the entire table but this can be overwritten by BASTOP for titles at the top, then by the individual components in that section such as LBJ (lines before job header) and JH (job header). See MERLIN Help file for a full list of the individual components.
Each style definition starts with its name in square brackets, e.g. [BASALL], followed by style definition commands.
The following commands may be used within each style definition (but it is only necessary to specify them if non-default):
FONT FACE=fontname |
FONT SIZE=n |
FONT COLOUR=colour |
FONT BOLD=OFF / ON |
FONT ITALIC=OFF / ON |
FONT UNDERLINE=OFF / ON |
TEXT SHRINK=OFF / ON |
TEXT WRAP=OFF / ON |
CELL BORDER STYLE=NONE / DASHED / DOTTED / THICK / MEDIUM / THIN |
CELL BORDER COLOUR=colour |
CELL COLOUR=colour / colour / colour … |
CELL COLOUR ALTERNATIVE= colour / colour / colour … |
JUSTIFICATION=CENTRE |
FIGURES=LABEL |
The CELL COLOUR ALTERNATIVE command may only be used in BASROW and BASFIG and their individual components.
The JUSTIFICATION=CENTRE command may only be used in BASTLC, BASBAN, BASROW and BASFIG and their individual components. Otherwise, all text is left justified.
The FIGURES=LABEL command may only be used in BASFIG and its individual components, signifying that the style definition is to be copied from the corresponding style in BASROW, e.g. if used in UNR_FIG, the definition will be copied from UNR.
Colours may be specified using three different methods which may be mixed freely:
Hexadecimal and RGB values may be found on the Microsoft Word Custom Colours form.
In our example, we have used FONT COLOUR=BLUE in the VT style to show the Vertical Title in blue (row 4).
The CELL COLOUR command (only) may contain two or more colours (using any of the above methods) separated by forward slashes, e.g. CELL COLOUR=RED / GREEN / CYAN / GREY. Multiple cell colours are only relevant in BASROW, BASBAN and BASFIG and their individual components (they will not actually be used in any BASROW styles, but will be picked up if FIGURES=LABEL is used in BASFIG styles). Multiple colours will be used in rotation across the banner as specified by the global command BANNER COLOUR ROTATION. A setting of 0 means it will change on every column, 1 on every header, 2 on every overheader, etc. In our example, the colour changes on every header (row 6) and this effect is continued for all rows beneath (rows 7-20).
The CELL COLOUR ALTERNATIVE command allows you to use different colours on alternate rows of styles BASROW and BASFIG and their individual components. The alternation works within the individual components, so in our example we have included these commands in the RTV style definition…
CELL COLOUR=#5FEB69 |
CELL COLOUR ALTERNATIVE=#C4F8C8 |
… to produce the two shades of green, and these commands in RTV_FIG…
CELL COLOUR=#71E4F3 / #FFFF69 / #FFB343 / #E99BB9 |
CELL COLOUR ALTERNATIVE=#BDF2F9 / #FFFFCD / #FFDDAB / #F9E3EB |
… to produce the two shades of blue / yellow / brown / purple (used in rotation as defined by BANNER COLOUR ROTATION=1).
We have also used CELL COLOUR commands to specify the colours of each of the blank rows, and the Mean Score row. No ALTERNATIVE is needed, since these rows only occur once within each table.MERLIN formats PSV and SCV
If MERLIN formats PSV (Print Score Values) and SCV (Separate Cell for score Values) are used, the score values will appear in column C, and the figures in columns D+.
When this is the case, column C is treated as belonging to BASTLC and BASROW and their component styles, so it will have all the same characteristics as column B. The global FREEZE command will also be adjusted to include column D.
Only one chart type is currently available (HORIZONTAL BAR). If a table contains any column percentages, and you are using RCP XLSTAB=3 (one table per worksheet), this chart will be generated below the table using the figures from column C (or column D if formats PSV/SCV are used). This will usually be the total column but not always – so you may need to delete or change charts that are inappropriate.
We will add other chart options if there is sufficient client demand.
Any questions? Email support@merlinco.co.uk.
MERLIN can generate tables in “xlsx” files so you can open them in Microsoft Excel – whether you simply want to view them, or maybe create charts. MERLIN’s creation of Excel tables is completely independent of Excel, and may be done without Excel being installed. There are numerous options controlling the output and appearance of these tables, which are introduced in this blog.
Tables and worksheets
Output of tables to Excel is controlled by Run Control Parameter (RCP) XLSTAB, which has three possible settings:
XLSTAB=1 all tables in one Excel worksheet called “Tables”, one after another
XLSTAB=2 each table in a separate Excel file named after the MERLIN table ID
XLSTAB=3 each table in a separate Excel worksheet named after the MERLIN table ID
Settings 1 and 3 also generate a worksheet called “Contents” containing a Table of Contents (TOC) with hyperlinks to the tables themselves. No TOC is generated by setting 2.
All settings automatically suppress output of tables in other formats, except you may specify RCP TABSML if you also want tables in TABSML format.
Plain Excel output
By default, MERLIN produces “plain Excel output” like this:
You will notice that the banner labels (including the header “Class”) are left justified, since that is Excel’s default treatment of text, and zero cells are shown as “0” instead of the usual “-”, since the latter would also be treated as text. This plain output is suitable if you just want to view the figures, or maybe create charts, and the appearance of the tables themselves isn’t important – but it is usually preferable (and very easy) to produce enhanced output.
Enhanced Excel output
By setting RCP XLSID, we can produce “enhanced Excel output” like this:
XLSID causes an “identifier flag” to be placed in column A (hidden by default) which enables MERLIN to format specific rows differently to others. So, by default, a different font is used for the job header, table title and banner labels – and the latter are also boxed and shaded. This is all specified in a “styles” file called “xlstab_std_styles.txt”, which is supplied with the software and used by default – but you can create your own styles files as explained in the next section below.
If you always want to use XLSID with Excel tables, simply set it in your global RCP settings file (MERLIN allows users to define default settings that will be used on every project you run).
Creating your own styles
To create your own style, copy the default styles file from the software installation folder to a folder of your choice, and give it a suitable name, e.g. “my_styles_1.txt”. You can then edit this file, following the explanatory comments to change or insert simple commands relating to the various parts of the table, e.g. there is a line that reads “freeze=off” but, if you change it to “freeze=on”, the Excel “freeze panes” option will be applied so both axes of the table remain visible when you are scrolling. Similarly, you can set “toclink=on” so each table contains a hyperlink back to the TOC.
Cells and fonts can be changed to standard colours, e.g. the banner cells to yellow, the raw numbers font to red, and the vertical percentages font to blue – or you can specify any colour using RGB or hexadecimal values – plus there are many other options, fully described in the MERLIN Help File > USER FILES > “txt” Excel styles file.
To apply your own styles file, use RCP XLSTYLES, e.g.
XLSSTYLES= my_styles_1.txt
Here is a table produced using a styles file, with the changes described above:
MERLIN formats
MERLIN itself has hundreds of formats and label controls affecting tabular output – but not all of these are relevant to Excel tables, and some settings will be ignored. These include all that control page breaks, text spacing, indentation and justification – since these are irrelevant, or can be specified in the styles file. See the MERLIN Help File chapters about FORMATS and LABEL CONTROLS for full details of which settings are applicable or not. There are, however, some formats that specifically relate to Excel tables:
SCV creates an extra column for score values, as opposed to including them in the row labels column
BLC includes blank rows in Excel tables (as specified by the relevant MERLIN formats, or by using * in text) – by default all blank rows are suppressed
BTL causes significance flag rows (e.g. generated by format TTS) to be output when there are no flags – giving the table a more uniform appearance.
There is also a label control <PX> which is used only when outputting tables to Excel, and which forces a banner to be split on that item. Excel tables usually use as many columns and rows as are needed to display the entire table, i.e. there are no “page breaks” – but it may sometimes be desirable to split the banner into two or more sections.
Any questions? Email support@merlinco.co.uk.
In blog #8, we discussed weighting in general, the terminology used, and how to do some types of weighting in MERLIN. In this blog about rim weighting, we will assume you are familiar with the terminology but, if not, we advise you to read blog #8 first.
How rim weighting works
As an example, we will use a project with two rims – gender and age – and the targets shown below.
Just to be clear, we cannot use target weighting because we don’t know the targets for the interlaced matrix, e.g. we don’t know the target for males aged 16-34, only the ‘edges’ or ‘rims’ of the interlaced matrix.
Some people have approached this problem by calculating the factor needed to achieve the gender targets then, having applied this factor to the data, calculated the factor needed to achieve the age targets – then multiplied both factors together. Although this will achieve the age targets, it is unlikely to achieve the gender targets – and if we do the whole process the other way round, the gender targets will be achieved but not the age ones. Rim weighting, however, is a way of achieving two or more sets of targets at the same time.
Unlike target weighting, it is virtually impossible to calculate rim weighting without a computer, since it involves trying lots of different factors until we find the ones that achieve all the targets. These attempts to find suitable factors are known as iterations. Sometimes it simply isn’t possible to achieve all the targets, and MERLIN will report this (but see next section below). Another point to understand about rim weighting is that different programs use different methods of calculation, so may end up with different factors. Although weighted figures would be the same when tabulating gender or age, they may not be the same when tabulating other variables.
Run Control Parameters (RCPs) MAXRIMIT and OKRIM
By default, MERLIN will do 500 iterations before abandoning rim weighting calculations – but this can be increased with MAXRIMIT=n, where n is any number up to 32767. In practice, if MERLIN cannot achieve the targets in 500 attempts, it cannot usually achieve them in more – but it is worth a try. There are two main reasons why rim weighting targets cannot be achieved:
RCP OKRIM specifies that MERLIN will continue with the run even if the targets are not achieved. Sometimes in this situation, inspection of weighted tables shows that the results are close enough or, because of rounding, they may even appear to be what is required.
Calculating weighting for two rims
As with target weighting, the calculation is done with MERLIN’s powerful MANIP facilities.
First, in the Data Stage we need to generate a table (#RIMACT) showing the actual (unweighted) figures for gender by age, then three empty tables to be used in MANIP – a ‘total only’ table for each rim variable (#RIMV01 and #RIMV02), and another table of gender by age (#RIMWGT) that will contain the weighting factors calculated.
F=NPTB, !don’t print these tables
T#RIMACT = $GENDER*$AGE,
T#RIMV01 (F=NITB) = $GENDER*,
T#RIMV02 (F=NITB) = $AGE*,
T#RIMWGT (F=NITB) = $GENDER*$AGE,
Then, in the Manip Stage, we specify the target figures for each rim variable in #RIMV01 and #RIMV02. As with target weighting, it doesn’t matter whether these figures are numbers or percentages (and they could even vary between rims) providing the first figure is the total of the others.
To get the weighting factors into table #RIMWGT, we use function %RIM. The first argument is the total weighted base required, the second is the name of the table containing the actual figures, and the remainder are the names of the tables containing the rim targets. If the total weighted base is required to be the same as the unweighted, specify the first argument as U.
MT#RIMV01 = (100.0,42.0,58.0),
MT#RIMV02 = (100.0,31.0,33.0,36.0),
MT#RIMWGT = %RIM(1500.0,#RIMACT,#RIMV01,#RIMV02),
Finally, in the Tables Stage, we extract the weight from the relevant Row and Column of table #RIMWGT, and apply it to all tables following.
DW $WT=#RIMWGT(R$GENDER,C$AGE)
SELECT WR $WT,
A complete example script can be found in item 11.6 of the MERLIN Tips and Examples library.
Using more than two rims
If we have more than two rims, we need to interlace some of them so we still end up with two variables to use in tables #RIMACT and #RIMWGT. If, for example, we have rims for gender, age and area, we could interlace gender and age…
DS $SIDE=$GENDER.BY.$AGE,
… then tabulate $SIDE * $AREA.
If we have four rims, we could interlace the first three, and so on … but might eventually break some limits. The maximum number of rows in a table is 32000 and the maximum number of columns is 1500 so, if interlacing all except the last variable gives more than 32000 cells, we must also interlace some variables to be tabulated across the top, e.g.
DS $SIDE = $GENDER.BY.$AGE.BY.$CLASS.BY.$MARITAL.BY.$HHSIZE,
DS $TOP = $AREA.BY.$CARS,
The maximum number of rims is 100, and the maximum number of cells when interlacing them all is 1,228,800.
Further examples can be found in items 11.7, 11.8 and 11.9 of the Tips and Examples library.
Any questions? Email support@merlinco.co.uk. ls