The intent of this tutorial is not to make you an expert in SAS, but rather simply to use some of the basic functions to generate a main effects experimental design for a conjoint marketing survey. A fabulous reference book has been written by Warren Kuhfeld and is available for free at http://support.sas.com/techsup/tnote/tnote_stat.html. Kuhfeld covers theory of discrete choice modeling, experimental design, and conjoint analysis, and he works out several examples in SAS starting on page 114 that will be good references for you in generating your own survey. Here I describe only the bare minimum of what is necessary to generate the experimental design for your project. If you are interested in utilizing more of the features of SAS, Kuhfeld’s book is the best resource.

This tutorial follows an example where three product characteristics and price are used to define the product for the marketing model. The first characteristic z1 has been discretized into three levels {1,2,3}, the second z2 has been discretized into two levels {1,2}, the third z3 into four levels {1,2,3,4} and price has been discretized into four levels {1,2,3,4}, so that:

• z1 = {1,2,3}
• z2 = {1,2}
• z3 = {1,2,3,4}
• p = {1,2,3,4}

With these levels, set by the team, there are 3×2×4×4 = 96 product profiles. Asking about preferences among all possible combinations of product profiles would lead to too many questions for a survey, but we can generate an efficient choice-based survey design using SAS.
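As a quick check of that count, a short DATA step can enumerate every profile. This is only a sketch: the data set name `profiles` and the column name `n_profiles` are made up for illustration and are not part of Kuhfeld's macros.

```sas
/* Sketch: enumerate every combination of z1, z2, z3, and p.   */
/* The data set name "profiles" is hypothetical.               */
data profiles;
   do z1 = 1 to 3;
      do z2 = 1 to 2;
         do z3 = 1 to 4;
            do p = 1 to 4;
               output;   /* one row per product profile */
            end;
         end;
      end;
   end;
run;

/* Count the rows: 3 x 2 x 4 x 4 = 96 profiles */
proc sql;
   select count(*) as n_profiles from profiles;
quit;
```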

We will cover two different methods for generating choice-based designs. The first method is for labeled products, and the second is for generic products. Kuhfeld reviews the differences on p322.

1. Labeled choice designs are intended for situations where you want to treat one of your product characteristics as special and get more information about that one and about how the others interact with it. The Fabric Softener example in Kuhfeld’s book (p126) uses this approach. The attributes are brand and price. Brand is chosen as the label so that all four brands appear in every choice set, and a linear design is generated from this. If you are not concerned about specific labels and just want to measure people’s preferences for all of your product characteristics and price, it is still possible to use this approach by picking one of the product characteristics as the ‘label’, and the resulting survey design will be conservative (i.e., it will ask more questions than needed, and the extra data will be focused on gaining information about interactions with the label, which you are likely not to use).
2. Generic choice designs are intended for situations where we just want to gain information about consumer preferences for our product attributes. The chair design example in Kuhfeld’s book (p314) uses this approach. Generic choice designs are generated by computer search to find the most efficient design, given an estimate of the beta values at the solution. They will generally result in smaller surveys than the labeled approach, but there is some uncertainty because the most efficient survey design is generated based on a guess of what the answer will be. If the guess is way off, the survey design could be poor.

Kuhfeld states that “the linear model (labeled) approach is very conservative and safe in that it should let you specify a very general model and still produce estimable parameters. The cost is you may be using many more choice sets than you need, particularly for nonbranded generic attributes. If you really have some information about your parameters, you should use them to produce a smaller and better design. However, if you have little or no information about parameters and if you anticipate specifying very general models … then you probably want to use the linear design approach.” It is up to you which approach you wish to use. Read through the examples, try both approaches, and decide for yourself.

SAS screen shot
1. First, you will need to download the newest SAS marketing application macros from the website listed above. Unzip them into a folder. For example, on a PC you can place them in your personal H:/ drive as “H:/sasmacros/<filename>.sas”.
2. Next, open up SAS. A picture of the interface is shown here. The window on the bottom is where you write your code. This code is executed by pressing the icon at the top that looks like a running person. Upon executing your code, any output will be saved to an output file. The output files are listed on the left panel, and clicking on one of the output files opens an output window where you can scroll through the pages of output text.
3. In the lower window you will first want to include the macros that you have downloaded. Type the following text, replacing “H:\sasmacros” with the directory to which you have unzipped the macros.
```
%include 'H:\sasmacros\mktallo.sas';
%include 'H:\sasmacros\mktbal.sas';
%include 'H:\sasmacros\mktblock.sas';
%include 'H:\sasmacros\mktdes.sas';
%include 'H:\sasmacros\mktdups.sas';
%include 'H:\sasmacros\mkteval.sas';
%include 'H:\sasmacros\mktex.sas';
%include 'H:\sasmacros\mktkey.sas';
%include 'H:\sasmacros\mktlab.sas';
%include 'H:\sasmacros\mktmerge.sas';
%include 'H:\sasmacros\mktorth.sas';
%include 'H:\sasmacros\mktroll.sas';
%include 'H:\sasmacros\mktruns.sas';
%include 'H:\sasmacros\choiceff.sas';
%include 'H:\sasmacros\phchoice.sas';
%include 'H:\sasmacros\plotit.sas';
```

# The Linear Approach for Labeled Choice Designs

Generating a labeled choice-based design requires choosing one attribute whose levels will appear in every question, called here the “label attribute”. In principle you can choose any attribute, but you will do best to pick an attribute with 3 or 4 levels and avoid choosing an attribute where some survey respondents are likely to respond exclusively to that attribute while ignoring other attributes (for example, avoid choosing price). In our example, we choose z1 as the label attribute. The remaining attributes z2, z3, and p have 2, 4, and 4 levels respectively. We will tell this information to the %mktruns macro by writing 2 4 4 three times in a row, signifying that for each of the three levels of z1 there are 2 levels of z2, 4 levels of z3, and 4 levels of p. The %mktruns command will tell you how many questions you need on your survey, given the number of attributes and the number of levels for each attribute. We write

```
%mktruns(2 4 4 2 4 4 2 4 4);
```

and press the execute button. Part of the output is listed below

```
                                    Design Summary

Number of
Levels       Frequency

2           3
4           6

Saturated      = 22
Full Factorial = 32,768

Some Reasonable                      Cannot Be
Design Sizes       Violations     Divided By

32 *              0
48 *              0
64 *              0
24               15     16
40               15     16
56               15     16
28               33      8 16
36               33      8 16
44               33      8 16
52               33      8 16

* - 100% Efficient Design can be made with the MktEx Macro.
```

The first part of the output tells us that if we were to ask all possible combinations of questions, we would be asking 32,768 survey questions. In order to generate a manageable survey size, we will need to generate an efficient fractional factorial design that can estimate main effects. The second part of the output suggests some reasonable sizes for which fractional factorial designs exist. We must choose the number of questions for our survey. In this case the output of %mktruns tells us that good main effects designs exist for designs of size 32, 48, and 64. Larger designs contain more data; however, we need to keep our surveys manageable for respondents. The difficulty of the survey task depends not only on the number of questions, but also on the number of alternatives per question and the number of attributes. Use your judgment to pick a reasonable-sized survey. Given this output, we choose that our survey will have 32 questions.
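The two summary numbers can be reproduced by hand. A main-effects model needs one parameter for the intercept plus (levels − 1) parameters per factor, which gives the "Saturated" value, and the full factorial is simply the product of the level counts. The DATA step below is a minimal sketch of that arithmetic; the variable names `saturated` and `full` are chosen here for illustration.

```sas
/* Sketch: reproduce "Saturated" and "Full Factorial" for 2 4 4 2 4 4 2 4 4 */
data _null_;
   array lev[9] _temporary_ (2 4 4 2 4 4 2 4 4);
   saturated = 1;                            /* intercept */
   full = 1;
   do i = 1 to dim(lev);
      saturated = saturated + (lev[i] - 1);  /* k-1 parameters for a k-level factor */
      full = full * lev[i];                  /* product of all level counts */
   end;
   put saturated= full=;                     /* log shows saturated=22 full=32768 */
run;
```

Any usable main-effects design needs at least as many questions as the saturated count, which is why every size suggested by %mktruns is larger than 22.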

Next, we ask SAS to use the %mktex macro to generate an efficient fractional factorial survey design with 32 questions. This macro acts like an optimizer internally (like Excel Solver) and could be sensitive to the starting point, so if you have trouble getting good results, just run it again – SAS picks a random starting point each time. We need to tell the %mktex macro the same info we told %mktruns, plus tell it how many questions our survey should have.

```
%mktex(2 4 4 2 4 4 2 4 4, n=32);
proc print; run;
```

Part of the output is listed below:

```
                                                               Average
                                                            Prediction
Design                                                        Standard
Number     D-Efficiency     A-Efficiency     G-Efficiency       Error
------------------------------------------------------------------------
1       100.0000         100.0000         100.0000          0.8292

Obs    x1    x2    x3    x4    x5    x6    x7    x8    x9

1     1     1     1     1     1     1     1     1     1
2     1     1     2     2     2     3     1     3     4
3     1     1     3     1     2     1     2     4     2
4     1     1     4     2     1     3     2     2     3
5     1     2     1     2     1     4     1     4     3
6     1     2     2     1     2     2     1     2     2
7     1     2     3     2     2     4     2     1     4
8     1     2     4     1     1     2     2     3     1
9     1     3     1     1     4     3     2     2     4
10     1     3     2     2     3     1     2     4     1
11     1     3     3     1     3     3     1     3     3
12     1     3     4     2     4     1     1     1     2
13     1     4     1     2     4     2     2     3     2
14     1     4     2     1     3     4     2     1     3
15     1     4     3     2     3     2     1     2     1
16     1     4     4     1     4     4     1     4     4
17     2     1     1     2     3     4     2     2     2
18     2     1     2     1     4     2     2     4     3
19     2     1     3     2     4     4     1     3     1
20     2     1     4     1     3     2     1     1     4
21     2     2     1     1     3     1     2     3     4
22     2     2     2     2     4     3     2     1     1
23     2     2     3     1     4     1     1     2     3
24     2     2     4     2     3     3     1     4     2
25     2     3     1     2     2     2     1     1     3
26     2     3     2     1     1     4     1     3     2
27     2     3     3     2     1     2     2     4     4
28     2     3     4     1     2     4     2     2     1
29     2     4     1     1     2     3     1     4     1
30     2     4     2     2     1     1     1     2     4
31     2     4     3     1     1     3     2     1     2
32     2     4     4     2     2     1     2     3     3
```

The first part of the output tells us efficiency metrics of the design. Kuhfeld’s book contains more info about what these mean. If you did not get numbers very close to 100%, check that you used a number recommended by %mktruns when you called %mktex. If you did, try rerunning it. The algorithm may have just gotten stuck. If you use a number recommended by %mktruns, you should get an efficient design from %mktex.

The second part of the output is your experimental design. The 32 questions (32 choice sets) are listed as rows, and the columns have the following meaning:

• x1: level for z2 for option #1 in the choice set (z1 at level 1)
• x2: level for z3 for option #1 in the choice set (z1 at level 1)
• x3: level for p for option #1 in the choice set (z1 at level 1)
• x4: level for z2 for option #2 in the choice set (z1 at level 2)
• x5: level for z3 for option #2 in the choice set (z1 at level 2)
• x6: level for p for option #2 in the choice set (z1 at level 2)
• x7: level for z2 for option #3 in the choice set (z1 at level 3)
• x8: level for z3 for option #3 in the choice set (z1 at level 3)
• x9: level for p for option #3 in the choice set (z1 at level 3)
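The column mapping above can be applied mechanically. The following is a minimal sketch, assuming the design is in the data set `design` (the name this tutorial later passes to %mktlab); the output data set `choice_sets` and the variable `question` are names invented for this example. Kuhfeld's examples use the %mktroll macro for this kind of reshaping, so treat this as an illustration rather than his method.

```sas
/* Sketch: turn each 9-column design row into three alternatives */
data choice_sets;
   set design;
   array x[9] x1-x9;
   question = _n_;                    /* row number = question number */
   do z1 = 1 to 3;                    /* label attribute: one alternative per level */
      z2 = x[3*(z1 - 1) + 1];         /* x1, x4, x7 */
      z3 = x[3*(z1 - 1) + 2];         /* x2, x5, x8 */
      p  = x[3*(z1 - 1) + 3];         /* x3, x6, x9 */
      output;
   end;
   keep question z1 z2 z3 p;
run;

proc print data=choice_sets; by question; id question; run;
```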

The design provided by %mktex should be checked to make sure major biases do not exist. The following command outputs detailed information about the experimental design. Kuhfeld explains these starting with p43. Basically, you want to check that the canonical correlation matrix is diagonal and use the rest of the output to check for orthogonality and balance. If it looks good, your design is good. If not, you may need to rerun it or even, if all else fails, choose a different number of questions or number of levels on one of the attributes.

```
%mkteval;
proc print; run;
```
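A rough manual version of the balance and orthogonality checks can also be done with PROC FREQ. This is a sketch only: it assumes the design is still in the data set `design`, and it inspects just two of the column pairs as examples.

```sas
/* Sketch: manual balance and orthogonality checks on the design */
proc freq data=design;
   /* Balance: within each column, every level should appear equally often */
   tables x1-x9 / nocum nopercent;
   /* Orthogonality: within each pair of columns, every cell should have the same count */
   tables x1*x2 x2*x3 / norow nocol nopercent;
run;
```

In a balanced 32-question design, each level of a two-level column appears 16 times and each level of a four-level column appears 8 times.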

Once you’ve checked that the results given by %mktex are good, you will need to translate the encoded numbers in the survey design into actual attributes with actual values for your survey. Kuhfeld offers some macros that help to automate this process, and you are welcome to read up on them (the examples use them, and they are easy to follow); however, it is possible to do this yourself in other software, for example using a “find and replace” option.

Also, the design has been organized so that you can clearly see the patterns. You do NOT want these patterns to be obvious in your survey. You should randomize the rows of the survey (ask the questions out of order). You should also randomize the order of alternatives in each choice set. In the example x1, x2, and x3 refer to alternative 1, with z1 at level 1; x4, x5, and x6 refer to alternative 2, with z1 at level 2; and x7, x8, and x9 refer to alternative 3, with z1 at level 3. In most designs, you do not want alternatives 1, 2, and 3 (with z1 at levels 1, 2, and 3 respectively) to appear in the same order in every question: respondents will see the pattern and it could affect results. So, mix up the order of alternatives in each question. SAS has macros to help you do this, or you can use Excel or other software.
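One way to do both randomizations in base SAS is sketched below. It assumes the design is in the data set `design`; the names `shuffled`, `question`, `sortkey`, `seed`, and `alt1`-`alt3` are invented for this example, and the seed values are arbitrary.

```sas
/* Sketch: shuffle question order and the order of alternatives */
data shuffled;
   set design;
   retain seed 20240;                     /* arbitrary seed, updated by CALL RANPERM */
   question = _n_;                        /* original row number, kept for bookkeeping */
   sortkey = ranuni(12345);               /* random key used to reorder the questions */
   alt1 = 1; alt2 = 2; alt3 = 3;          /* z1 level shown in positions 1, 2, 3 */
   call ranperm(seed, alt1, alt2, alt3);  /* randomly permute the three positions */
run;

proc sort data=shuffled; by sortkey; run;

proc print data=shuffled; var question alt1-alt3 x1-x9; run;
```

After sorting, the rows give the order in which to ask the questions, and `alt1`-`alt3` give the z1 level to show first, second, and third within each question.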

Finally, in your final survey you will want to add an extra option to each question called the “no choice option”. This option has no attributes, but may instead be labeled “I would not choose any of these”. This allows the respondent to opt out of any of the product alternatives in a choice set if they are all undesired, and you will need the resulting estimates based on this choice for calculating demand later.

A question from the final survey might look like this, using question #17 from the SAS design and reordering the three alternatives from 1,2,3 to 1,3,2:

Suppose you are in the market for a <product class> but do not necessarily need to buy one, and the following were your only alternatives. Which would you choose? (Please circle one)

| Alternative 1 | Alternative 2 | Alternative 3 | Alternative 4 |
| --- | --- | --- | --- |
| • Has a <z1> of <level 1> | • Has a <z1> of <level 3> | • Has a <z1> of <level 2> | I would not choose any of these |
| • Has a <z2> of <level 2> | • Has a <z2> of <level 2> | • Has a <z2> of <level 2> | |
| • Has a <z3> of <level 1> | • Has a <z3> of <level 2> | • Has a <z3> of <level 3> | |
| • Has a price of <level 1> | • Has a price of <level 2> | • Has a price of <level 4> | |

Of course, you can make your survey look nicer, making the attributes and levels easy to read. You can even use pictures to describe the attribute levels if appropriate.

# The Computer Search Approach for Generic (Unlabeled) Choice Designs

When generating a generic, unlabeled design, we use the %mktruns macro directly on the full set of attributes. The attributes z1, z2, z3, and p have 3, 2, 4, and 4 levels respectively, so we write:

```
%mktruns(3 2 4 4);
```
```
Number of
Levels       Frequency

2           1
3           1
4           2

Saturated      = 10
Full Factorial = 96
Some Reasonable                      Cannot Be
Design Sizes       Violations     Divided By

48 *              0
96 *              0
24                1     16
72                1     16
12                3      8 16
36                3      8 16
60                3      8 16
84                3      8 16
16                4      3  6 12
32                4      3  6 12

* - 100% Efficient Design can be made with the MktEx Macro.
```

The first part tells us that we could create 3×2×4×4 = 96 different product profiles with these attributes. The second part tells us that efficient designs can be found with either 48 or 96 different profiles.

Next we want to create a linear design with the %mktex macro, which will generate candidate profiles for the survey. It is usually best to use the full factorial (in this case 96) which will allow all possible profiles to be candidates from which the algorithm will later choose when designing the survey.

```
%mktex(3 2 4 4, n=96);
```

Before printing this design, we want to add three columns to the front of it labeled f1, f2, and f3 that are all 1’s. This will be used to tell the macro that we want three product alternatives per choice set (plus the no-choice option, which we will add manually later) and that any of the product profiles in the full factorial can be used for any alternative in any choice set. (Kuhfeld provides more detail on this in the chair example).

```
%mktlab(data=design, int=f1-f3)
proc print; run;
```

Part of the output is shown below:

```
Obs    f1    f2    f3    x1    x2    x3    x4

1     1     1     1     1     1     1     1
2     1     1     1     1     1     1     2
3     1     1     1     1     1     1     3
4     1     1     1     1     1     1     4
5     1     1     1     1     1     2     1
6     1     1     1     1     1     2     2
7     1     1     1     1     1     2     3
8     1     1     1     1     1     2     4
9     1     1     1     1     1     3     1
10     1     1     1     1     1     3     2
11     1     1     1     1     1     3     3
12     1     1     1     1     1     3     4
13     1     1     1     1     1     4     1
14     1     1     1     1     1     4     2
15     1     1     1     1     1     4     3
16     1     1     1     1     1     4     4
17     1     1     1     1     2     1     1
18     1     1     1     1     2     1     2
```

This is simply a full factorial design in x1, x2, x3, and x4 with three columns of 1’s in front of it.

Next we ask the computer to organize these profiles into an efficient choice design. The efficiency of the design depends on the actual values for the beta coefficients; however, we do not know the actual values yet (that’s why we’re doing a survey). If we have a sense for what these beta values should be, we could input guesses. In the absence of this information, we would typically guess zero for all beta coefficients. As stated earlier, if the beta values turn out to be very different from the guess, the resulting design could be poor for estimating these. How much will misspecification of the beta coefficients affect results? This is still an open research question. However, these computer generated designs have been used with much success in many situations (including the dial-readout scale example), and they are sometimes the only practical option when the more conservative labeled design is too big for the survey. Here we will guess (beta=zero).
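To see what the zero guess implies, assume the multinomial logit choice model that Kuhfeld's choice examples are built on. The notation is introduced here only for illustration: $x_j$ is the coded attribute vector of alternative $j$ and $m$ is the number of alternatives in a choice set. With $\beta = 0$, every alternative in a set is assumed equally likely to be chosen:

```latex
P(j) = \frac{\exp(x_j^{\top}\beta)}{\sum_{k=1}^{m} \exp(x_k^{\top}\beta)}
\quad\Longrightarrow\quad
P(j)\,\big|_{\beta = 0} = \frac{1}{m} = \frac{1}{3} \quad \text{for } m = 3 .
```

This is why the Prob column in the %choiceff output reads 0.33333 for every profile.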

Also, we must choose the number of choice sets for our survey. We can ask the macro to generate an arbitrarily small survey size; however, the ability to accurately estimate beta parameters decreases as the number of questions decreases. It is suggested to use either a full factorial of 96/3 = 32 questions (which would be the same size as the linear labeled design in this case) or a fractional factorial, such as the ½ fraction of 48 profiles suggested in the output of %mktruns, producing 48/3 = 16 questions. Here we will use 16 questions to get a smaller survey (nsets=16).

Finally, the macro uses search techniques to find a good survey design, but it is not guaranteed to find one on the first try. The parameter maxiter tells the macro how many times to repeat the search (with random starting points each time) looking for the best survey design. It is a good idea to set this high enough to have some certainty that it has found a good design, but not so high that the runtime is unbearable. Here I have set (maxiter = 20). Kuhfeld’s chair example provides more detail about all of this.

```
%choiceff(data=final, model=class(x1-x4), nsets=16, maxiter=20, flags=f1-f3, beta=zero);
proc print; by set; id set; run;
```

The output is shown below

```
                           Variable                               Standard
n      Name      Label    Variance    DF      Error

1      x11       x1 1      0.37500     1     0.61237
2      x12       x1 2      0.37500     1     0.61237
3      x21       x2 1      0.28236     1     0.53137
4      x31       x3 1      0.56442     1     0.75128
5      x32       x3 2      0.56471     1     0.75147
6      x33       x3 3      0.56334     1     0.75056
7      x41       x4 1      0.56392     1     0.75094
8      x42       x4 2      0.56527     1     0.75184
9      x43       x4 3      0.56551     1     0.75200
==
9
```

```
Set    Design    Efficiency    Index      Prob      n     f1    f2    f3    x1    x2    x3    x4

  1      15        2.52736       72     0.33333    673     1     1     1     3     1     2     4
         15        2.52736       10     0.33333    674     1     1     1     1     1     3     2
         15        2.52736       49     0.33333    675     1     1     1     2     2     1     1

  2      15        2.52736       10     0.33333    676     1     1     1     1     1     3     2
         15        2.52736       83     0.33333    677     1     1     1     3     2     1     3
         15        2.52736       64     0.33333    678     1     1     1     2     2     4     4

  3      15        2.52736       60     0.33333    679     1     1     1     2     2     3     4
         15        2.52736       79     0.33333    680     1     1     1     3     1     4     3
         15        2.52736       17     0.33333    681     1     1     1     1     2     1     1

  4      15        2.52736       14     0.33333    682     1     1     1     1     1     4     2
         15        2.52736       43     0.33333    683     1     1     1     2     1     3     3
         15        2.52736       81     0.33333    684     1     1     1     3     2     1     1

  5      15        2.52736       89     0.33333    685     1     1     1     3     2     3     1
         15        2.52736       38     0.33333    686     1     1     1     2     1     2     2
         15        2.52736       31     0.33333    687     1     1     1     1     2     4     3

  6      15        2.52736       72     0.33333    688     1     1     1     3     1     2     4
         15        2.52736       18     0.33333    689     1     1     1     1     2     1     2
         15        2.52736       45     0.33333    690     1     1     1     2     1     4     1

  7      15        2.52736       28     0.33333    691     1     1     1     1     2     3     4
         15        2.52736       86     0.33333    692     1     1     1     3     2     2     2
         15        2.52736       35     0.33333    693     1     1     1     2     1     1     3

  8      15        2.52736        3     0.33333    694     1     1     1     1     1     1     3
         15        2.52736       73     0.33333    695     1     1     1     3     1     3     1
         15        2.52736       62     0.33333    696     1     1     1     2     2     4     2

  9      15        2.52736       58     0.33333    697     1     1     1     2     2     3     2
         15        2.52736        5     0.33333    698     1     1     1     1     1     2     1
         15        2.52736       95     0.33333    699     1     1     1     3     2     4     3

 10      15        2.52736       86     0.33333    700     1     1     1     3     2     2     2
         15        2.52736       16     0.33333    701     1     1     1     1     1     4     4
         15        2.52736       41     0.33333    702     1     1     1     2     1     3     1

 11      15        2.52736       94     0.33333    703     1     1     1     3     2     4     2
         15        2.52736       27     0.33333    704     1     1     1     1     2     3     3
         15        2.52736       40     0.33333    705     1     1     1     2     1     2     4

 12      15        2.52736       75     0.33333    706     1     1     1     3     1     3     3
         15        2.52736       52     0.33333    707     1     1     1     2     2     1     4
         15        2.52736       21     0.33333    708     1     1     1     1     2     2     1

 13      15        2.52736       68     0.33333    709     1     1     1     3     1     1     4
         15        2.52736       23     0.33333    710     1     1     1     1     2     2     3
         15        2.52736       45     0.33333    711     1     1     1     2     1     4     1

 14      15        2.52736       66     0.33333    712     1     1     1     3     1     1     2
         15        2.52736       55     0.33333    713     1     1     1     2     2     2     3
         15        2.52736       32     0.33333    714     1     1     1     1     2     4     4

 15      15        2.52736       92     0.33333    715     1     1     1     3     2     3     4
         15        2.52736       34     0.33333    716     1     1     1     2     1     1     2
         15        2.52736        5     0.33333    717     1     1     1     1     1     2     1

 16      15        2.52736       55     0.33333    718     1     1     1     2     2     2     3
         15        2.52736       77     0.33333    719     1     1     1     3     1     4     1
         15        2.52736        4     0.33333    720     1     1     1     1     1     1     4

The first part of the output tells us about the variance for estimating each of the beta terms. If the algorithm finds the optimal survey design, the variance terms will be mostly equal for characteristics that have the same number of levels. Also you will want to verify that DF, the degrees of freedom, is 1 for each. Kuhfeld provides more detail.

The second part of the output is the choice design. The “by set; id set;” in the proc print command told it to organize the data this way for better readability. Each choice set has three profiles. For each profile there are levels listed for x1, x2, x3, and x4, which refer to the attributes in the order that we entered them: z1, z2, z3, and p. So, choice set #1 looks like this:

Suppose you are in the market for a <product class> but do not necessarily need to buy one, and the following were your only alternatives. Which would you choose? (Please circle one)

| Alternative 1 | Alternative 2 | Alternative 3 | Alternative 4 |
| --- | --- | --- | --- |
| • Has a <z1> of <level 3> | • Has a <z1> of <level 1> | • Has a <z1> of <level 2> | I would not choose any of these |
| • Has a <z2> of <level 1> | • Has a <z2> of <level 1> | • Has a <z2> of <level 2> | |
| • Has a <z3> of <level 2> | • Has a <z3> of <level 3> | • Has a <z3> of <level 1> | |
| • Has a price of <level 4> | • Has a price of <level 2> | • Has a price of <level 1> | |
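To pull a single choice set out of the design for this kind of translation, something like the following sketch may help. It assumes %choiceff wrote its best design to a data set named `best` (its default output name, as far as I know); the data set `survey_profiles` and the renamed columns are introduced here for illustration.

```sas
/* Sketch: list the three profiles of choice set #1 */
proc print data=best noobs;
   where set = 1;
   var set x1-x4;
run;

/* Sketch: rename the coded columns to the attribute names used in this tutorial */
data survey_profiles;
   set best(keep=set x1-x4 rename=(x1=z1 x2=z2 x3=z3 x4=p));
run;
```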

# Full SAS Code:

```
%include 'H:\sasmacros\mktallo.sas';
%include 'H:\sasmacros\mktbal.sas';
%include 'H:\sasmacros\mktblock.sas';
%include 'H:\sasmacros\mktdes.sas';
%include 'H:\sasmacros\mktdups.sas';
%include 'H:\sasmacros\mkteval.sas';
%include 'H:\sasmacros\mktex.sas';
%include 'H:\sasmacros\mktkey.sas';
%include 'H:\sasmacros\mktlab.sas';
%include 'H:\sasmacros\mktmerge.sas';
%include 'H:\sasmacros\mktorth.sas';
%include 'H:\sasmacros\mktroll.sas';
%include 'H:\sasmacros\mktruns.sas';
%include 'H:\sasmacros\choiceff.sas';
%include 'H:\sasmacros\phchoice.sas';
%include 'H:\sasmacros\plotit.sas';

/* LABELED CHOICE BASED SURVEY DESIGN */
/* ================================== */
%mktruns(2 4 4 2 4 4 2 4 4);
/* Find some reasonable design sizes */
%mktex(2 4 4 2 4 4 2 4 4, n=32);
/* Generate experimental design, given design size n */
proc print; run;
/* Output experimental design to window */
%mkteval;
/* Calculate evaluation of the experimental design */
proc print; run;
/* Output experimental design evaluation to window */

/* GENERIC CHOICE BASED SURVEY DESIGN */
/* ================================== */
%mktruns(3 2 4 4);
/* Find some reasonable design sizes */
%mktex(3 2 4 4, n=96);
/* Generate the full factorial design */
%mktlab(data=design, int=f1-f3)
/* Add three columns of 1's in front of this design labeled f1, f2, f3 */
/* This specifies three product profiles per choice set */
proc print; run;
/* Output to the window */
%choiceff(data=final, model=class(x1-x4), nsets=16, maxiter=20, flags=f1-f3, beta=zero);
/* Search for an efficient choice based design */
proc print; by set; id set; run;
/* Output best found choice design to window */
```