Tutorial for creating a choice based conjoint design in SAS
From DDL Wiki
The intent of this tutorial is not to make you an expert in SAS, but rather simply to use some of the basic functions to generate a main effects experimental design for a conjoint marketing survey. A fabulous reference book has been written by Warren Kuhfeld and is available for free at http://support.sas.com/techsup/tnote/tnote_stat.html. Kuhfeld covers theory of discrete choice modeling, experimental design, and conjoint analysis, and he works out several examples in SAS starting on page 114 that will be good references for you in generating your own survey. Here I describe only the bare minimum of what is necessary to generate the experimental design for your project. If you are interested in utilizing more of the features of SAS, Kuhfeld’s book is the best resource.
This tutorial follows an example where three product characteristics and price are used to define the product for the marketing model. The first characteristic z1 has been discretized into three levels {1,2,3}, the second z2 has been discretized into two levels {1,2}, the third z3 into four levels {1,2,3,4} and price has been discretized into four levels {1,2,3,4}, so that:
- z1 = {1,2,3}
- z2 = {1,2}
- z3 = {1,2,3,4}
- p = {1,2,3,4}
With these levels, set by the team, there are 3×2×4×4 = 96 product profiles. Asking about preferences among all possible combinations of product profiles would lead to too many questions for a survey, but we can generate an efficient choice-based survey design using SAS.
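As a quick sanity check on that count, the full set of profiles can be enumerated directly in a SAS data step. This is only a sketch: the data set name profiles is my own choice and is not used by any of the macros below.

```sas
/* Enumerate every product profile for the example attributes.  */
/* PROFILES is an arbitrary (hypothetical) data set name.       */
data profiles;
  do z1 = 1 to 3;
    do z2 = 1 to 2;
      do z3 = 1 to 4;
        do p = 1 to 4;
          output;          /* one row per profile */
        end;
      end;
    end;
  end;
run;

/* Count the rows; 3 x 2 x 4 x 4 levels should give 96 profiles. */
proc sql;
  select count(*) as n_profiles from profiles;
quit;
```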
We will cover two different methods for generating a choice-based design. The first method is for labeled products, and the second is for generic products. Kuhfeld reviews the differences on p322.
- Labeled choice designs are intended for situations where you want to treat one of your product characteristics as special and get more information about that one and about how the others interact with it. The Fabric Softener example in Kuhfeld’s book (p126) uses this approach. The attributes are brand and price. Brand is chosen as the label so that all four brands appear in every choice set, and a linear design is generated from this. If you are not concerned about specific labels and just want to measure people’s preferences for all of your product characteristics and price, it is still possible to use this approach by picking one of the product characteristics as the ‘label’, and the resulting survey design will be conservative (i.e., it will ask more questions than needed, and the extra data will be focused on gaining information about interactions with the label, which you are likely not to use).
- Generic choice designs are intended for situations where we just want to gain information about consumer preferences for our product attributes. The chair design example in Kuhfeld’s book (p314) uses this approach. Generic choice designs are generated by computer search to find the most efficient design, given an estimate of the beta values at the solution. They will generally result in smaller surveys than the labeled approach, but there is some uncertainty because the most efficient survey design is generated based on a guess of what the answer will be. If the guess is way off, the survey design could be poor.
Kuhfeld states that “the linear model (labeled) approach is very conservative and safe in that it should let you specify a very general model and still produce estimable parameters. The cost is you may be using many more choice sets than you need, particularly for nonbranded generic attributes. If you really have some information about your parameters, you should use them to produce a smaller and better design. However, if you have little or no information about parameters and if you anticipate specifying very general models … then you probably want to use the linear design approach.” It is up to you which approach you wish to use. Read through the examples, try both approaches, and decide for yourself.
- First, you will need to download the newest SAS marketing application macros from the website listed above. Unzip them into a folder. For example, on a PC you can place them in your personal H:/ drive as “H:/sasmacros/<filename>.sas”.
- Next, open up SAS. A picture of the interface is shown here. The window on the bottom is where you write your code. This code is executed by pressing the icon at the top that looks like a running person. Upon executing your code, any output will be saved to an output file. The output files are listed on the left panel, and clicking on one of the output files opens an output window where you can scroll through the pages of output text.
- In the lower window you will first want to include the macros that you have downloaded. Type the following text, replacing “H:\sasmacros” with the directory to which you have unzipped the macros.
%include 'H:\sasmacros\mktallo.sas';
%include 'H:\sasmacros\mktbal.sas';
%include 'H:\sasmacros\mktblock.sas';
%include 'H:\sasmacros\mktdes.sas';
%include 'H:\sasmacros\mktdups.sas';
%include 'H:\sasmacros\mkteval.sas';
%include 'H:\sasmacros\mktex.sas';
%include 'H:\sasmacros\mktkey.sas';
%include 'H:\sasmacros\mktlab.sas';
%include 'H:\sasmacros\mktmerge.sas';
%include 'H:\sasmacros\mktorth.sas';
%include 'H:\sasmacros\mktroll.sas';
%include 'H:\sasmacros\mktruns.sas';
%include 'H:\sasmacros\choiceff.sas';
%include 'H:\sasmacros\phchoice.sas';
%include 'H:\sasmacros\plotit.sas';
The Linear Approach for Labeled Choice Designs
Generating a labeled choice-based design requires choosing one attribute whose levels will appear in every question, called here the “label attribute”. In principle you can choose any attribute, but you will do best to pick an attribute with 3 or 4 levels and avoid choosing an attribute where some survey respondents are likely to respond exclusively to that attribute while ignoring other attributes (for example, avoid choosing price). In our example, we choose z1 as the label attribute. The remaining attributes z2, z3, and p have 2, 4, and 4 levels respectively. We will tell this information to the %mktruns macro by writing 2 4 4 three times in a row, signifying that for each of the three levels of z1 there are 2 levels of z2, 4 levels of z3, and 4 levels of p. The %mktruns command will tell you how many questions you need on your survey, given the number of attributes and the number of levels for each attribute. We write
%mktruns(2 4 4 2 4 4 2 4 4);
and press the execute button. Part of the output is listed below:

Design Summary

Number of Levels | Frequency |
---|---|
2 | 3 |
4 | 6 |

Saturated = 22
Full Factorial = 32,768

Some Reasonable Design Sizes | Violations | Cannot Be Divided By |
---|---|---|
32 * | 0 | |
48 * | 0 | |
64 * | 0 | |
24 | 15 | 16 |
40 | 15 | 16 |
56 | 15 | 16 |
28 | 33 | 8 16 |
36 | 33 | 8 16 |
44 | 33 | 8 16 |
52 | 33 | 8 16 |

* - 100% Efficient Design can be made with the MktEx Macro.
The first part of the output tells us that if we were to ask all possible combinations of questions, we would be asking 32,768 survey questions. In order to generate a manageable survey size, we will need to generate an efficient fractional factorial design that can estimate main effects. The second part of the output suggests some reasonable sizes for which fractional factorial designs exist. We must choose the number of questions for our survey. In this case the output of %mktruns tells us that good main effects designs exist for designs of size 32, 48, and 64. Larger designs contain more data; however, we need to keep our surveys manageable for respondents. The difficulty of the survey task depends not only on the number of questions, but also on the number of alternatives per question and the number of attributes. Use your judgment to pick a reasonable-sized survey. Given this output, we choose that our survey will have 32 questions.
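The two summary numbers in this output can be reproduced by hand. The full factorial is the product of the numbers of levels. The saturated size is, as I understand the %mktruns output, the number of parameters in a main-effects model: one intercept plus, for each factor, one less than its number of levels. It is therefore the smallest number of runs that could estimate all main effects. For the three 2-level and six 4-level factors above:

```latex
\[
\text{Full factorial} = 2^{3} \times 4^{6} = 8 \times 4096 = 32{,}768
\]
\[
\text{Saturated} = 1 + 3\,(2-1) + 6\,(4-1) = 1 + 3 + 18 = 22
\]
```

Any design size you pick must be at least the saturated size, which all of the sizes suggested by %mktruns are.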
Next, we ask SAS to use the %mktex macro to generate an efficient fractional factorial survey design with 32 questions. This macro acts like an optimizer internally (like Excel Solver) and could be sensitive to the starting point, so if you have trouble getting good results, just run it again – SAS picks a random starting point each time. We need to tell the %mktex macro the same info we told %mktruns, plus tell it how many questions our survey should have.
%mktex(2 4 4 2 4 4 2 4 4, n=32); proc print; run;
Part of the output is listed below:
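Design Number | D-Efficiency | A-Efficiency | G-Efficiency | Average Prediction Standard Error |
---|---|---|---|---|
1 | 100.0000 | 100.0000 | 100.0000 | 0.8292 |

Obs | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 2 | 2 | 2 | 3 | 1 | 3 | 4 |
3 | 1 | 1 | 3 | 1 | 2 | 1 | 2 | 4 | 2 |
4 | 1 | 1 | 4 | 2 | 1 | 3 | 2 | 2 | 3 |
5 | 1 | 2 | 1 | 2 | 1 | 4 | 1 | 4 | 3 |
6 | 1 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 2 |
7 | 1 | 2 | 3 | 2 | 2 | 4 | 2 | 1 | 4 |
8 | 1 | 2 | 4 | 1 | 1 | 2 | 2 | 3 | 1 |
9 | 1 | 3 | 1 | 1 | 4 | 3 | 2 | 2 | 4 |
10 | 1 | 3 | 2 | 2 | 3 | 1 | 2 | 4 | 1 |
11 | 1 | 3 | 3 | 1 | 3 | 3 | 1 | 3 | 3 |
12 | 1 | 3 | 4 | 2 | 4 | 1 | 1 | 1 | 2 |
13 | 1 | 4 | 1 | 2 | 4 | 2 | 2 | 3 | 2 |
14 | 1 | 4 | 2 | 1 | 3 | 4 | 2 | 1 | 3 |
15 | 1 | 4 | 3 | 2 | 3 | 2 | 1 | 2 | 1 |
16 | 1 | 4 | 4 | 1 | 4 | 4 | 1 | 4 | 4 |
17 | 2 | 1 | 1 | 2 | 3 | 4 | 2 | 2 | 2 |
18 | 2 | 1 | 2 | 1 | 4 | 2 | 2 | 4 | 3 |
19 | 2 | 1 | 3 | 2 | 4 | 4 | 1 | 3 | 1 |
20 | 2 | 1 | 4 | 1 | 3 | 2 | 1 | 1 | 4 |
21 | 2 | 2 | 1 | 1 | 3 | 1 | 2 | 3 | 4 |
22 | 2 | 2 | 2 | 2 | 4 | 3 | 2 | 1 | 1 |
23 | 2 | 2 | 3 | 1 | 4 | 1 | 1 | 2 | 3 |
24 | 2 | 2 | 4 | 2 | 3 | 3 | 1 | 4 | 2 |
25 | 2 | 3 | 1 | 2 | 2 | 2 | 1 | 1 | 3 |
26 | 2 | 3 | 2 | 1 | 1 | 4 | 1 | 3 | 2 |
27 | 2 | 3 | 3 | 2 | 1 | 2 | 2 | 4 | 4 |
28 | 2 | 3 | 4 | 1 | 2 | 4 | 2 | 2 | 1 |
29 | 2 | 4 | 1 | 1 | 2 | 3 | 1 | 4 | 1 |
30 | 2 | 4 | 2 | 2 | 1 | 1 | 1 | 2 | 4 |
31 | 2 | 4 | 3 | 1 | 1 | 3 | 2 | 1 | 2 |
32 | 2 | 4 | 4 | 2 | 2 | 1 | 2 | 3 | 3 |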
The first part of the output tells us efficiency metrics of the design. Kuhfeld’s book contains more info about what these mean. If you did not get numbers very close to 100%, check that you used a number recommended by %mktruns when you called %mktex. If you did, try rerunning it. The algorithm may have just gotten stuck. If you use a number recommended by %mktruns, you should get an efficient design from %mktex.
The second part of the output is your experimental design. The 32 questions (32 choice sets) are listed as rows, and the columns have the following meaning:
- x1: level for z2 for option #1 in the choice set (z1 at level 1)
- x2: level for z3 for option #1 in the choice set (z1 at level 1)
- x3: level for p for option #1 in the choice set (z1 at level 1)
- x4: level for z2 for option #2 in the choice set (z1 at level 2)
- x5: level for z3 for option #2 in the choice set (z1 at level 2)
- x6: level for p for option #2 in the choice set (z1 at level 2)
- x7: level for z2 for option #3 in the choice set (z1 at level 3)
- x8: level for z3 for option #3 in the choice set (z1 at level 3)
- x9: level for p for option #3 in the choice set (z1 at level 3)
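The column mapping above can be applied mechanically. The following data step is a minimal sketch of one way to do it, assuming that %mktex left its design in a data set named design (the same name the %mktlab call later in this tutorial reads from). The names choice_sets and question are my own and purely illustrative; Kuhfeld's macros offer their own tools for this step.

```sas
/* Turn each 9-column row into three rows, one per alternative. */
data choice_sets;
  set design;                    /* assumed output data set of %mktex */
  array x[9] x1-x9;
  question = _n_;                /* choice set number */
  do z1 = 1 to 3;                /* alternative k carries z1 at level k */
    z2 = x[3*(z1 - 1) + 1];      /* x1, x4, x7 */
    z3 = x[3*(z1 - 1) + 2];      /* x2, x5, x8 */
    p  = x[3*(z1 - 1) + 3];      /* x3, x6, x9 */
    output;
  end;
  keep question z1 z2 z3 p;
run;

/* Print the alternatives grouped by question. */
proc print data=choice_sets;
  by question;
  id question;
run;
```

Each question then appears as a block of three rows, which is closer to how the question will look on the survey.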
The design provided by %mktex should be checked to make sure major biases do not exist. The following command outputs detailed information about the experimental design. Kuhfeld explains these starting with p43. Basically, you want to check that the canonical correlation matrix is diagonal and use the rest of the output to check for orthogonality and balance. If it looks good, your design is good. If not, you may need to rerun it or even, if all else fails, choose a different number of questions or number of levels on one of the attributes.
%mkteval; proc print; run;
Once you’ve checked that the results given by %mktex are good, you will need to translate the encoded numbers in the survey design into actual attributes with actual values for your survey. Kuhfeld offers some macros that help to automate this process, and you are welcome to read up on them (the examples use them, and they are easy to follow); however, it is possible to do this yourself in other software, for example using a “find and replace” option.
Also, the design has been organized so that you can clearly see the patterns. You do NOT want these patterns to be obvious in your survey. You should randomize the rows of the survey (ask the questions out of order). You should also randomize the order of alternatives in each choice set. In the example x1, x2, and x3 refer to alternative 1, with z1 at level 1; x4, x5, and x6 refer to alternative 2 with z1 at level 2; and x7, x8, and x9 refer to alternative 3 with z1 at level 3. In most designs, you do not want alternatives 1, 2, and 3, with z1 at levels 1, 2, and 3 respectively, to appear in the same order in every question: respondents will see the pattern and it could affect results. So, mix up the order of alternatives in each question. SAS has macros to help you do this, or you can use Excel or other software.
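If you prefer to do the translation inside SAS without the marketing macros, user-defined formats are one option. The sketch below decodes only two columns, and every label in it (Plastic, Metal, and the dollar amounts) is a made-up placeholder, as are the names z2fmt, pfmt, and decoded. It assumes the design is in a data set named design.

```sas
/* Hypothetical level labels; replace with your own attribute values. */
proc format;
  value z2fmt 1 = 'Plastic'
              2 = 'Metal';
  value pfmt  1 = '$10'
              2 = '$15'
              3 = '$20'
              4 = '$25';
run;

/* Decode z2 (column x1) and price (column x3) for alternative 1. */
data decoded;
  set design;
  length z2_alt1 p_alt1 $ 12;
  z2_alt1 = put(x1, z2fmt.);
  p_alt1  = put(x3, pfmt.);
run;

proc print data=decoded(obs=5);
  var x1 z2_alt1 x3 p_alt1;
run;
```

The same pattern extends to the remaining columns by adding one format per attribute and one put call per column.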
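One way to do both kinds of shuffling in base SAS is to attach random sort keys. This is a sketch under my own assumptions, not the method used by Kuhfeld's macros: it assumes the design is in a data set named design, the seed 2024 is arbitrary, and the names shuffled, question, row_key, and pos1 to pos3 are hypothetical.

```sas
data shuffled;
  set design;
  if _n_ = 1 then call streaminit(2024);   /* arbitrary seed for repeatability */
  question = _n_;                          /* remember the original row number */
  row_key  = rand('uniform');              /* random key to reorder the questions */
  /* One random key per alternative; ranking the keys gives a random order. */
  k1 = rand('uniform');
  k2 = rand('uniform');
  k3 = rand('uniform');
  pos1 = 1 + (k1 > k2) + (k1 > k3);        /* position of the z1 = 1 alternative */
  pos2 = 1 + (k2 > k1) + (k2 > k3);        /* position of the z1 = 2 alternative */
  pos3 = 1 + (k3 > k1) + (k3 > k2);        /* position of the z1 = 3 alternative */
  drop k1-k3;
run;

/* Sorting by the random key puts the questions in random order. */
proc sort data=shuffled;
  by row_key;
run;

proc print data=shuffled;
  var question pos1-pos3 x1-x9;
run;
```

Within each row, pos1 to pos3 are a random arrangement of 1, 2, 3 telling you where on the page to place each alternative.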
Finally, in your final survey you will want to add an extra option to each question called the “no choice option”. This option has no attributes, but may instead be labeled “I would not choose any of these”. This allows the respondent to opt out of any of the product alternatives in a choice set if they are all undesired, and you will need the resulting estimates based on this choice for calculating demand later.
A question from the final survey might look like this, starting (after reordering the questions) with question #17 from the SAS design and reordering the three alternatives from 1,2,3 to 1,3,2:
Suppose you are in the market for a <product class> but do not necessarily need to buy one and the following were your only alternatives. Which would you choose? (Please circle one)
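Alternative 1 | Alternative 2 | Alternative 3 | Alternative 4 |
---|---|---|---|
z1: level 1 | z1: level 3 | z1: level 2 | I would not choose any of these |
z2: level 2 | z2: level 2 | z2: level 2 | |
z3: level 1 | z3: level 2 | z3: level 3 | |
p: level 1 | p: level 2 | p: level 4 | |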
Of course, you can make your survey look nicer, making the attributes and levels easy to read. You can even use pictures to describe the attribute levels if appropriate.
The Computer Search Approach for Generic (Unlabeled) Choice Designs
When generating a generic, unlabeled design, we use the %mktruns macro directly on the full set of attributes. The attributes z1, z2, z3, and p have 3, 2, 4, and 4 levels respectively, so we write:
%mktruns(3 2 4 4);
Number of Levels | Frequency |
---|---|
2 | 1 |
3 | 1 |
4 | 2 |

Saturated = 10
Full Factorial = 96

Some Reasonable Design Sizes | Violations | Cannot Be Divided By |
---|---|---|
48 * | 0 | |
96 * | 0 | |
24 | 1 | 16 |
72 | 1 | 16 |
12 | 3 | 8 16 |
36 | 3 | 8 16 |
60 | 3 | 8 16 |
84 | 3 | 8 16 |
16 | 4 | 3 6 12 |
32 | 4 | 3 6 12 |

* - 100% Efficient Design can be made with the MktEx Macro.
The first part tells us that we could create 3×2×4×4 = 96 different product profiles with these attributes. The second part tells us that efficient designs can be found with either 48 or 96 different profiles.
Next we want to create a linear design with the %mktex macro, which will generate candidate profiles for the survey. In general, we can always use the full factorial (in this case 96) which will allow the computer to choose among all possible profiles.
%mktex(3 2 4 4, n=96);
Before printing this design, we want to add three columns to the front of it labeled f1, f2, and f3 that are all 1’s. This will be used to tell the macro that we want three product alternatives per choice set (plus the no-choice option, which we will add manually later) and that any of the product profiles in the full factorial can be used for any alternative in any choice set. (Kuhfeld provides more detail on this in the chair example).
%mktlab(data=design, int=f1-f3) proc print; run;
Part of the output is shown below:
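Obs | f1 | f2 | f3 | x1 | x2 | x3 | x4 |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 1 | 1 | 1 | 1 | 2 |
3 | 1 | 1 | 1 | 1 | 1 | 1 | 3 |
4 | 1 | 1 | 1 | 1 | 1 | 1 | 4 |
5 | 1 | 1 | 1 | 1 | 1 | 2 | 1 |
6 | 1 | 1 | 1 | 1 | 1 | 2 | 2 |
7 | 1 | 1 | 1 | 1 | 1 | 2 | 3 |
8 | 1 | 1 | 1 | 1 | 1 | 2 | 4 |
9 | 1 | 1 | 1 | 1 | 1 | 3 | 1 |
10 | 1 | 1 | 1 | 1 | 1 | 3 | 2 |
11 | 1 | 1 | 1 | 1 | 1 | 3 | 3 |
12 | 1 | 1 | 1 | 1 | 1 | 3 | 4 |
13 | 1 | 1 | 1 | 1 | 1 | 4 | 1 |
14 | 1 | 1 | 1 | 1 | 1 | 4 | 2 |
15 | 1 | 1 | 1 | 1 | 1 | 4 | 3 |
16 | 1 | 1 | 1 | 1 | 1 | 4 | 4 |
17 | 1 | 1 | 1 | 1 | 2 | 1 | 1 |
18 | 1 | 1 | 1 | 1 | 2 | 1 | 2 |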
This is simply a full factorial design in x1, x2, x3, and x4 with three columns of 1’s in front of it.
Next we ask the computer to organize these profiles into an efficient choice design. The efficiency of the design depends on the actual values for the beta coefficients; however, we do not know the actual values yet (that’s why we’re doing a survey). If we have a sense for what these beta values should be, we could input guesses. In the absence of this information, we would typically guess zero for all beta coefficients. As stated earlier, if the beta values turn out to be very different from the guess, the resulting design could be poor for estimating these. How much will misspecification of the beta coefficients affect results? This is still an open research question. However, these computer generated designs have been used with much success in many situations (including the dial-readout scale example), and they are sometimes the only practical option when the more conservative labeled design is too big for the survey. Here we will guess (beta=zero).
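To see what the zero guess implies, it helps to write out the choice probability. The formula below assumes the multinomial logit model, which is my reading of the model behind these choice designs rather than something this tutorial states. Here C is a choice set, x_j is the coded attribute vector of alternative j, and beta is the coefficient vector.

```latex
\[
P(j \mid C) = \frac{\exp\!\left(x_j^{\top}\beta\right)}
                   {\sum_{k \in C} \exp\!\left(x_k^{\top}\beta\right)}
\qquad\Longrightarrow\qquad
P(j \mid C)\Big|_{\beta = 0} = \frac{1}{|C|} = \frac{1}{3}
\]
```

With three alternatives per choice set, guessing beta = 0 means every alternative is assumed equally likely to be chosen, which matches the value 0.33333 in the Prob column of the %choiceff output.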
Also, we must choose the number of choice sets for our survey. We can ask the macro to generate an arbitrarily small survey size; however, the ability to accurately estimate beta parameters decreases as the number of questions decreases. It is suggested to use either a full factorial of 96/3 = 32 questions (which would be the same size as the linear labeled design in this case) or a fractional factorial, such as the ½ fraction of 48 profiles suggested in the output of %mktruns, producing 48/3 = 16 questions. Here we will use 16 questions to get a smaller survey (nsets=16).
Finally, the macro uses search techniques to find a good survey design, but it is not guaranteed to find one on the first try. The parameter maxiter tells the macro how many times to repeat the search (with random starting points each time) looking for the best survey design. It is a good idea to set this high enough to have some certainty that it has found a good design, but not so high that the runtime is unbearable. Here I have set (maxiter = 20). Kuhfeld’s chair example provides more detail about all of this.
%choiceff(data=final, model=class(x1-x4), nsets=16, maxiter=20, flags=f1-f3, beta=zero); proc print; by set; id set; run;
The output is shown below
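n | Variable Name | Label | Variance | DF | Standard Error |
---|---|---|---|---|---|
1 | x11 | x1 1 | 0.37500 | 1 | 0.61237 |
2 | x12 | x1 2 | 0.37500 | 1 | 0.61237 |
3 | x21 | x2 1 | 0.28236 | 1 | 0.53137 |
4 | x31 | x3 1 | 0.56442 | 1 | 0.75128 |
5 | x32 | x3 2 | 0.56471 | 1 | 0.75147 |
6 | x33 | x3 3 | 0.56334 | 1 | 0.75056 |
7 | x41 | x4 1 | 0.56392 | 1 | 0.75094 |
8 | x42 | x4 2 | 0.56527 | 1 | 0.75184 |
9 | x43 | x4 3 | 0.56551 | 1 | 0.75200 |
 | | | | == | |
 | | | | 9 | |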
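Set | Design | Efficiency | Index | Prob | n | f1 | f2 | f3 | x1 | x2 | x3 | x4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 15 | 2.52736 | 72 | 0.33333 | 673 | 1 | 1 | 1 | 3 | 1 | 2 | 4 |
1 | 15 | 2.52736 | 10 | 0.33333 | 674 | 1 | 1 | 1 | 1 | 1 | 3 | 2 |
1 | 15 | 2.52736 | 49 | 0.33333 | 675 | 1 | 1 | 1 | 2 | 2 | 1 | 1 |
2 | 15 | 2.52736 | 10 | 0.33333 | 676 | 1 | 1 | 1 | 1 | 1 | 3 | 2 |
2 | 15 | 2.52736 | 83 | 0.33333 | 677 | 1 | 1 | 1 | 3 | 2 | 1 | 3 |
2 | 15 | 2.52736 | 64 | 0.33333 | 678 | 1 | 1 | 1 | 2 | 2 | 4 | 4 |
3 | 15 | 2.52736 | 60 | 0.33333 | 679 | 1 | 1 | 1 | 2 | 2 | 3 | 4 |
3 | 15 | 2.52736 | 79 | 0.33333 | 680 | 1 | 1 | 1 | 3 | 1 | 4 | 3 |
3 | 15 | 2.52736 | 17 | 0.33333 | 681 | 1 | 1 | 1 | 1 | 2 | 1 | 1 |
4 | 15 | 2.52736 | 14 | 0.33333 | 682 | 1 | 1 | 1 | 1 | 1 | 4 | 2 |
4 | 15 | 2.52736 | 43 | 0.33333 | 683 | 1 | 1 | 1 | 2 | 1 | 3 | 3 |
4 | 15 | 2.52736 | 81 | 0.33333 | 684 | 1 | 1 | 1 | 3 | 2 | 1 | 1 |
5 | 15 | 2.52736 | 89 | 0.33333 | 685 | 1 | 1 | 1 | 3 | 2 | 3 | 1 |
5 | 15 | 2.52736 | 38 | 0.33333 | 686 | 1 | 1 | 1 | 2 | 1 | 2 | 2 |
5 | 15 | 2.52736 | 31 | 0.33333 | 687 | 1 | 1 | 1 | 1 | 2 | 4 | 3 |
6 | 15 | 2.52736 | 72 | 0.33333 | 688 | 1 | 1 | 1 | 3 | 1 | 2 | 4 |
6 | 15 | 2.52736 | 18 | 0.33333 | 689 | 1 | 1 | 1 | 1 | 2 | 1 | 2 |
6 | 15 | 2.52736 | 45 | 0.33333 | 690 | 1 | 1 | 1 | 2 | 1 | 4 | 1 |
7 | 15 | 2.52736 | 28 | 0.33333 | 691 | 1 | 1 | 1 | 1 | 2 | 3 | 4 |
7 | 15 | 2.52736 | 86 | 0.33333 | 692 | 1 | 1 | 1 | 3 | 2 | 2 | 2 |
7 | 15 | 2.52736 | 35 | 0.33333 | 693 | 1 | 1 | 1 | 2 | 1 | 1 | 3 |
8 | 15 | 2.52736 | 3 | 0.33333 | 694 | 1 | 1 | 1 | 1 | 1 | 1 | 3 |
8 | 15 | 2.52736 | 73 | 0.33333 | 695 | 1 | 1 | 1 | 3 | 1 | 3 | 1 |
8 | 15 | 2.52736 | 62 | 0.33333 | 696 | 1 | 1 | 1 | 2 | 2 | 4 | 2 |
9 | 15 | 2.52736 | 58 | 0.33333 | 697 | 1 | 1 | 1 | 2 | 2 | 3 | 2 |
9 | 15 | 2.52736 | 5 | 0.33333 | 698 | 1 | 1 | 1 | 1 | 1 | 2 | 1 |
9 | 15 | 2.52736 | 95 | 0.33333 | 699 | 1 | 1 | 1 | 3 | 2 | 4 | 3 |
10 | 15 | 2.52736 | 86 | 0.33333 | 700 | 1 | 1 | 1 | 3 | 2 | 2 | 2 |
10 | 15 | 2.52736 | 16 | 0.33333 | 701 | 1 | 1 | 1 | 1 | 1 | 4 | 4 |
10 | 15 | 2.52736 | 41 | 0.33333 | 702 | 1 | 1 | 1 | 2 | 1 | 3 | 1 |
11 | 15 | 2.52736 | 94 | 0.33333 | 703 | 1 | 1 | 1 | 3 | 2 | 4 | 2 |
11 | 15 | 2.52736 | 27 | 0.33333 | 704 | 1 | 1 | 1 | 1 | 2 | 3 | 3 |
11 | 15 | 2.52736 | 40 | 0.33333 | 705 | 1 | 1 | 1 | 2 | 1 | 2 | 4 |
12 | 15 | 2.52736 | 75 | 0.33333 | 706 | 1 | 1 | 1 | 3 | 1 | 3 | 3 |
12 | 15 | 2.52736 | 52 | 0.33333 | 707 | 1 | 1 | 1 | 2 | 2 | 1 | 4 |
12 | 15 | 2.52736 | 21 | 0.33333 | 708 | 1 | 1 | 1 | 1 | 2 | 2 | 1 |
13 | 15 | 2.52736 | 68 | 0.33333 | 709 | 1 | 1 | 1 | 3 | 1 | 1 | 4 |
13 | 15 | 2.52736 | 23 | 0.33333 | 710 | 1 | 1 | 1 | 1 | 2 | 2 | 3 |
13 | 15 | 2.52736 | 45 | 0.33333 | 711 | 1 | 1 | 1 | 2 | 1 | 4 | 1 |
14 | 15 | 2.52736 | 66 | 0.33333 | 712 | 1 | 1 | 1 | 3 | 1 | 1 | 2 |
14 | 15 | 2.52736 | 55 | 0.33333 | 713 | 1 | 1 | 1 | 2 | 2 | 2 | 3 |
14 | 15 | 2.52736 | 32 | 0.33333 | 714 | 1 | 1 | 1 | 1 | 2 | 4 | 4 |
15 | 15 | 2.52736 | 92 | 0.33333 | 715 | 1 | 1 | 1 | 3 | 2 | 3 | 4 |
15 | 15 | 2.52736 | 34 | 0.33333 | 716 | 1 | 1 | 1 | 2 | 1 | 1 | 2 |
15 | 15 | 2.52736 | 5 | 0.33333 | 717 | 1 | 1 | 1 | 1 | 1 | 2 | 1 |
16 | 15 | 2.52736 | 55 | 0.33333 | 718 | 1 | 1 | 1 | 2 | 2 | 2 | 3 |
16 | 15 | 2.52736 | 77 | 0.33333 | 719 | 1 | 1 | 1 | 3 | 1 | 4 | 1 |
16 | 15 | 2.52736 | 4 | 0.33333 | 720 | 1 | 1 | 1 | 1 | 1 | 1 | 4 |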
The first part of the output tells us about the variance for estimating each of the beta terms. If the algorithm finds the optimal survey design, the variance terms will be mostly equal for characteristics that have the same number of levels. Also you will want to verify that DF, the degrees of freedom, is 1 for each. Kuhfeld provides more detail.
The second part of the output is the choice design. The “by set; id set;” in the proc print command told it to organize the data this way for better readability. Each choice set has three profiles. For each profile there are levels listed for x1, x2, x3, and x4, which refer to the attributes in the order that we entered them: z1, z2, z3, and p. So, choice set #1 looks like this:
Suppose you are in the market for a <product class> but do not necessarily need to buy one and the following were your only alternatives. Which would you choose? (Please circle one)
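Alternative 1 | Alternative 2 | Alternative 3 | Alternative 4 |
---|---|---|---|
z1: level 3 | z1: level 1 | z1: level 2 | I would not choose any of these |
z2: level 1 | z2: level 1 | z2: level 2 | |
z3: level 2 | z3: level 3 | z3: level 1 | |
p: level 4 | p: level 2 | p: level 1 | |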
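To prepare the whole generic design for this kind of layout, the %choiceff result can be relabeled with the attribute names and an alternative number. This is a sketch that assumes the result is in a data set named best, which I believe is the macro's default output name; check your log if it differs. The names generic_sets and alt are my own.

```sas
/* Relabel x1-x4 as z1, z2, z3, p and number the alternatives in each set. */
data generic_sets;
  set best;                      /* assumed default output of %choiceff */
  by set;
  if first.set then alt = 0;
  alt + 1;                       /* 1, 2, 3 within each choice set */
  keep set alt x1-x4;            /* KEEP uses the names before renaming */
  rename x1 = z1 x2 = z2 x3 = z3 x4 = p;
run;

proc print data=generic_sets;
  by set;
  id set;
run;
```

The no-choice option is not part of this data set and still has to be added to each question by hand.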
Full SAS Code:
%include 'H:\sasmacros\mktallo.sas';
%include 'H:\sasmacros\mktbal.sas';
%include 'H:\sasmacros\mktblock.sas';
%include 'H:\sasmacros\mktdes.sas';
%include 'H:\sasmacros\mktdups.sas';
%include 'H:\sasmacros\mkteval.sas';
%include 'H:\sasmacros\mktex.sas';
%include 'H:\sasmacros\mktkey.sas';
%include 'H:\sasmacros\mktlab.sas';
%include 'H:\sasmacros\mktmerge.sas';
%include 'H:\sasmacros\mktorth.sas';
%include 'H:\sasmacros\mktroll.sas';
%include 'H:\sasmacros\mktruns.sas';
%include 'H:\sasmacros\choiceff.sas';
%include 'H:\sasmacros\phchoice.sas';
%include 'H:\sasmacros\plotit.sas';

/* LABELED CHOICE BASED SURVEY DESIGN */
/* ================================== */
%mktruns(2 4 4 2 4 4 2 4 4); /* Find some reasonable design sizes */
%mktex(2 4 4 2 4 4 2 4 4, n=32); /* Generate experimental design, given design size n */
proc print; run; /* Output experimental design to window */
%mkteval; /* Calculate evaluation of the experimental design */
proc print; run; /* Output experimental design evaluation to window */

/* GENERIC CHOICE BASED SURVEY DESIGN */
/* ================================== */
%mktruns(3 2 4 4); /* Find some reasonable design sizes */
%mktex(3 2 4 4, n=96); /* Generate the full factorial design */
%mktlab(data=design, int=f1-f3); /* Add three columns of 1's in front of this design labeled f1, f2, f3 */
/* This specifies three product profiles per choice set */
proc print; run; /* Output to the window */
%choiceff(data=final, model=class(x1-x4), nsets=16, maxiter=20, flags=f1-f3, beta=zero); /* Search for an efficient choice based design */
proc print; by set; id set; run; /* Output best found choice design to window */