Jim Lyttle

THE ETHICS CHALLENGE

The mini-cases used in this experiment were based on The Ethics Challenge, a board game used by Lockheed Martin to persuade its employees to consult company officials on ethical questions.

What do we mean by ethics? Speaking very broadly, ethics can be characterized as the study of morals. Morals are answers to the question "What is right and wrong, or good and bad?" Ethics is the study of the answers people provide and the moral reasoning that led to them. Business ethics can be characterized as the study of what is considered right and wrong, or good and bad, in the context of conducting business.

Ethicists disagree about the extent to which business ethics is different from any other type of applied ethics. Some take the business context to be nothing more than the site of the inquiry, some recognize that business presents its own unique problems, and others consider business transactions and/or decisions to be central to an ethics of business. It is the position of this author that the conduct of business is sufficiently different from other parts of life to justify a separate discipline focusing on business ethics.

Business ethics training of workers has met with mixed success. One factor that sometimes limits the effectiveness of such training is worker cynicism. Workers can be skeptical about moral training sponsored by their employers, and board games are thought to break through this cynicism by making the exercise fun and having managers take themselves less seriously. In 1987, Citicorp began training its workers in ethics with a board game called the Work Ethic. Thirty to forty thousand Citicorp employees have now played the game in at least 54 countries. The game has been translated into Spanish, Portuguese, French, German, Flemish and Japanese. The next year, in 1988, George Sammet of Martin Marietta began developing his own version, which he called Gray Matters.

The original Gray Matters game presented 55 mini-cases and invited participants to select the best of four answers. Each answer carried a score that ranged from -10 to +10. At the end of the game, the scores were tallied and humorous rewards (such as Allowed to stay in the room during really important meetings) and punishments (such as Must start paying for issues of the company newsletter) were given out. Here are the objectives that were listed for that game: (1) To make employees aware of various ethical challenges that can arise in their day-to-day job responsibilities; (2) to stimulate employees' imaginations regarding the ethical consequences of their business decision [sic] and actions; (3) to inform employees regarding the company's values and standards of business ethics and conduct; (4) to strengthen employees' skills in applying company standards to hypothetical situations; (5) to instruct employees regarding the proper procedure(s) for dealing with ethical concerns; and (6) to involve employees in discussion rather than have them listen passively to a speech or read an abstract set of ethical principles.

This game would undergo a major face lift under a new leader who understood both ethics and humor very well.

Norman R. Augustine

Norm Augustine joined Martin Marietta in 1977 after serving as Under Secretary of the Army. By 1987, he had risen to the position of Chief Executive Officer. When the company merged with Lockheed in March of 1995, he served as President of Martin Marietta and then in 1997 took over as Chairman and Chief Executive Officer of the combined corporation with nearly 190,000 employees and revenue of close to 30 billion dollars. Lockheed Martin of Bethesda, Maryland is a highly diversified global enterprise principally engaged in the research, design, development, manufacture and integration of advanced-technology systems, products and services.

Long before Augustine came onto the scene, the previous Lockheed Company had been fined 24.8 million dollars under the Foreign Corrupt Practices Act. It had been planning to pay foreign legislators millions of dollars for help in selling three C-130 transport planes worth 79 million dollars. Lockheed argued that it was protecting the jobs of its workers by doing whatever was necessary to land the contract, but the provisions of the Act are clear. This first major fine generated wide press coverage, and Lockheed's image was tarnished.

Norm Augustine may have been the ideal leader to upgrade Lockheed Martin's reputation. He was a member of the Board of Directors of the Ethics Resource Center in Washington and is now a member of the Benjamin Franklin Society. Augustine brought with him a healthy and humble sense of humor. Of Dilbert cartoons, Augustine says "I'm a big fan . . . I can give a lecture on the philosophy of ethics and two days later there wouldn't be a soul who remembered what I said; with Dilbert everyone talks about it."

Under Augustine's leadership the Gray Matters game was completely redesigned by a task force of Ethics Officers from the various operating companies. The consulting firm of cohen/gebler Associates, which owns the rights to Dilbert, was engaged to combine the Gray Matters game with the irreverent workplace humor of Scott Adams. Adams had approval over the material and rewrote 35 of the 50 Dogbert answers to make them truer to the character. The cohen/gebler firm is now known as Working Values and features such related products as the P&G Challenge for Procter & Gamble.

OPERATION OF THE GAME

Here is how the Ethics Challenge operates. A group of about 30 workers gathers in a conference room for an hour. Six groups of about five members each are formed. The leader (the direct supervisor of these employees) draws a case card to randomly select one of fifty brief mini-cases. Team members turn to that case in their team guide and spend a few minutes debating the best response in their small group. Each mini-case is followed by four possible responses lettered A to D. There is also a fifth answer, which is a joke response. It is a part of the playing materials that are designed to inject a spirit of fun . . . help energize your audience and enrich the experience for everyone.

In many ways, The Ethics Challenge is similar to the popular card game called Scruples. Scruples has no board and no pieces to move, but instead uses many cards, each of which describes a situation and asks a question. For example, one card says "You owe the bank $4,000 for a student loan. You've moved and the bank hasn't traced you. Do you repay the loan?" Another card says "Your mate has a habit of impulse buying. A catalogue arrives from a very expensive store. Do you quietly dispose of it?" The stated (score-earning) goal of the game Scruples is to correctly predict the answer (Yes, No, or Depends) of another player. There is no need to enter into discussion of the correct answer, although people usually do. In almost all of the questions the player is asked to decide whether to make an exception to an accepted social rule (paying your debts in the first example and the privacy of personal mail in the second one).

However, The Ethics Challenge is different in several ways. Mini-case descriptions are longer (although sparse by business school standards). For example, Case File Number 17 says "You work in a purchasing department and have been asked to select a vendor for an upcoming purchase. One of the competing companies is owned by your manager's spouse. Your manager told you that she wants you to make the decision all on your own, and to take care not to give any extra consideration to her husband's bid. In your judgement, the husband's bid has the best value. How are you going to handle this?"

This situation is very specifically focused on the business world. In fact, it is quite plausible that the employees being trained might confront a similar situation in their own careers. The trainees are not asked to predict what someone else will answer, but to decide which is the correct choice according to the Lockheed Martin code of conduct. Four multiple choice answers are offered:

A. Select the husband's bid and make the purchase.
B. Talk to the Legal Department.
C. Tell your manager you're uncomfortable making this decision without first discussing it with the Ethics Office.
D. Select the second best bid and make the purchase.

As with most of the cases, a joke answer is added at the end: Try to break up the marriage!

The joke response is clearly identified as such, so that players would not be embarrassed by selecting that response by mistake. It is at the end, separated from the real answers, and has a drawing of Dogbert instead of a letter. Some of the joke responses, like the one above, suggest outrageous ways of resolving the dilemma. Others jokingly recommend unethical behavior. For example, after finding out that the boss may have broken a rule, Dogbert suggests "This would be a good time to ask for a raise." In all cases, the joke responses are directly related to the topic of discussion. Also, in all cases, the joke answers represent facetious counter-arguments that are easy to refute (a popular tool for inoculating persuadees against future counter-persuasion).

In the case of the purchasing department above, the Leader's Guide reveals that C is the best answer (five points), closely followed by B (four points), and neither A nor D is given any points. The first answer, A, is unacceptable because it creates a conflict of interest, and the last answer, D, is unacceptable because it fails to optimize the business decision. There are no negative scores in this game. Scores are used to move game pieces around the board, enter rooms, draw lucky cards, and other typical board game activities. Among other things, these activities serve as a break between heavy ethical discussions.

In playing this game for one hour, employees are randomly exposed to five or six of the fifty scenarios. This training is conducted once a year with each of the more than 160,000 remaining Lockheed Martin employees, starting from the CEO downward. The goal is not to provide instruction (there is a separate training program, on CD-ROM, concerning the company's code of conduct), but to persuade the workers to follow the advice "if in doubt, ASK!"

RESEARCH DESIGN

In this study the dependent variable was persuasion, made up of behavioral intention, attitude change, and learning. The independent variable was humor, made up of cartoon drawings, ironic wisecracks, or both. Proposed moderators included age, gender, experience, years outside of North America, sense of humor, liking for Dilbert and socially desirable responses.

FIGURE 7 The Research Design
[Diagram: INDEPENDENT VARIABLE Humor (none, cartoons, wisecracks, both) leads to DEPENDENT VARIABLE Persuasion (behavioral intention, attitude, learning). MODERATORS (age, gender, experience, years outside North America, sense of humor, liking for Dilbert, socially desirable responses) act on that relationship.]

The proposed intervening variables have been characterized as moderators instead of mediators. While they are all presumed to have some effect on the relationship between the independent variable and the dependent variable, and while they are all expected to exercise an indirect effect on the dependent variable, they are not expected to be affected by the independent variable. The variable sense of humor might seem like an exception. One might think that the Dilbert comics would change the mood of participants and increase their sense of humor, but this misunderstands the nature of the term sense of humor, which refers to a disposition rather than a mood. The variable liking for Dilbert might seem to be an exception, because it may be affected by the participants' direct experience of Dilbert cartoons during the experiment. This is a more valid objection, but it turns out that there are no significant correlations between the humor treatment and any of these variables. In any event, because the sense of humor tests and the questions about Dilbert both precede the experimental administration, neither of them is a potential mediator in this experiment.

Since the research was meant to test the proposition that humor caused increased persuasion, it was a top priority to assure that legitimate causal inferences could be drawn. Accordingly the prime concern in this design was internal validity. A true experiment was designed using pre-post-tests and a control group.

This design was selected for a variety of reasons. Research in the area of humor has been characterized by atheoretical approaches and naive experimentation. Accordingly, it was a priority of this project to adopt well-established methods and procedures, and only true experimental designs were considered.

Furthermore, the specific research question concerned causal relationships that have long been in dispute. Internal validity (ability to attribute causality) was far more important to this project than the external validity (generalizability). This trade-off will be discussed further under limitations in the discussion chapter.

Researchers sometimes prefer the Solomon four-group design (Design 5), which assesses the impact of the pre-test. However, in the current research, all participants were university students for whom frequent testing is characteristic. Campbell argued that there was no reason to expect any systematic effect on the observations as a result of the existence of a pre-test in a school environment.

In cases where the Solomon design is not necessary, the post-test-only control group design (Design 6) is sometimes preferred. Once randomization has been achieved, the pre-test seems to be an unnecessary complication. However, the pre-post-test control group design allows statistical analyses such as paired t-tests that can detect the small differences expected in this study. This is especially true when the form of the pre-test and post-test are identical, as they are in this design.
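The paired comparison mentioned above can be illustrated with a minimal sketch. It assumes that each participant contributes one pre-test and one post-test score on the same item; the function name `paired_t` and the ratings are hypothetical and are not data from this study.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for a paired-samples t-test."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    # Each participant serves as his or her own control.
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical 5-point ratings for five participants (illustration only).
pre = [3, 2, 4, 3, 3]
post = [4, 3, 4, 4, 5]
t, df = paired_t(pre, post)
print(f"t = {t:.3f}, df = {df}")
```

Because the test works on within-person differences, stable individual differences drop out of the error term, which is why a small shift can still be detected.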

The selected design offers maximum internal validity by ruling out several alternative explanations of observed differences. True randomization minimizes the effects of most selection errors. The effects of history, maturation, testing, instrumentation, and statistical regression should occur equally in all experimental and control groups and tend to cancel each other out. Differences in the dependent variables among treatment groups can confidently be attributed to the experimental manipulation. Also, the pre-post-test aspect of this design allows testing for differential characteristics in the case of mortality (if some participants fail to complete the whole experiment).

Because of its strong focus on internal validity, the pre-post-test control group design is not ideal for generalization. External validity is threatened by the interaction of the experimental effects with other factors. For example, the demonstrated results may only be valid for those who have undergone a pre-test, or for those very similar demographically. Perhaps students with the same demographics and at the same point in the same course of study, but located elsewhere geographically, might respond differently to the stimuli.

In tests of attitude in general, and persuasion specifically, the pre-test has to be carefully crafted to avoid sensitizing participants. The pre-test questions should be mixed in with other questions so they do not stand out as the main feature of the experiment. Also, the offered responses should not lead the participants' thinking in any particular direction.

Asking attitude questions may forewarn thoughtful participants that an attempt is going to be made to persuade them. This has been shown to generate the formation and rehearsal of counter-arguments (especially among university students, a group with a presumed high need for cognition) unless they are distracted with other activities. Thus it was important to keep participants busy with other questions that surrounded the pre-test questions. This was achieved by including them in a packet with several measures of the sense of humor, questions about Dilbert appreciation, and a social desirability scale.

External validity can be improved by making the experimental situation as similar as possible to the situation to which one wants to generalize. For example, the training instrument upon which this experiment was based is usually administered to workers during an hour off from work at their usual workplace, surrounded by their peers and their direct supervisor. This experiment was similar in many ways. It was conducted during an hour off from course work, in the familiar setting of the classroom, with their usual peers and the regular teacher in attendance.

Previous research has sometimes suffered by being held in sterile laboratories. Some researchers have achieved controls at the expense of interfering with the spontaneity that is required for humor to occur. To avoid that, this experiment was conducted and the observations were taken in the field, in the sense that a regular classroom with familiar classmates was used instead of an artificial laboratory setting.

Each experiment was personally administered by the researcher. This was another decision that maximized internal validity at the expense of generalizability. It would have been desirable to have the local teachers administer the experiment because (a) they would be blind to the experimental manipulation, and (b) this would better simulate the actual use of the training instrument by immediate supervisors. However, a detailed written script and rigid controls would have been necessary to achieve consistency across different administrations of the experiment. This would have interfered with the spontaneity that is necessary for humor to operate. Also, the researcher was readily able to answer specific questions about the game and the mini-cases. Preparing local teachers to answer such questions consistently would have required extensive training.

The humor, which had been offered only tangentially in the Ethics Challenge, had become central in this experiment, serving as the treatment variable. The design was a simple factorial version of the classic Design 4 with the following form:

R	O1	X1	O5
R	O2	X2	O6
R	O3	X3	O7
R	O4	X4	O8

The R indicates that all groups were initially formed through random assignment. Then pre-test observations (O1, O2, O3, O4) were taken before the experimental treatments (X1, X2, X3, X4) were administered. Finally post-test observations (O5, O6, O7, O8) were taken.

The four experimental conditions consisted of different versions of the business ethics mini-cases from the Ethics Challenge game described above (versions with more or less humor). This was the independent variable. In the case of X4 participants were exposed to the original version of the mini-cases, adorned with cartoon drawings of Dilbert and an ironic wisecrack from Dogbert as the fifth multiple-choice response. In the case of X3 participants were exposed to mini-cases that had the cartoon drawings of Dilbert and Dogbert removed and replaced with generic graphics. In the case of X2 participants were exposed to a version of the mini-cases that had the ironic wisecracks from Dogbert removed and replaced with the innocuous comment I prefer not to answer. In the case of X1 participants were exposed to a version of the mini-cases that had both the wisecracks and the cartoon drawings removed.
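The four conditions form a 2 x 2 factorial (cartoons present or absent, wisecracks present or absent). The sketch below encodes that mapping and shows one way to assign the students in a single class at random so that every condition appears in every class. The shuffle-and-deal procedure, the function name, and the class size of 22 are assumptions made for illustration; the text does not describe the mechanics of the randomization.

```python
import random

# The four conditions as (cartoons present, wisecracks present).
CONDITIONS = {
    "X1": (False, False),  # both cartoons and wisecracks removed
    "X2": (True, False),   # cartoons kept, wisecracks replaced
    "X3": (False, True),   # wisecracks kept, cartoons replaced
    "X4": (True, True),    # original version with both
}


def assign_within_class(student_ids, rng):
    """Shuffle one class, then deal students to the four conditions in turn."""
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    labels = list(CONDITIONS)
    return {sid: labels[i % len(labels)] for i, sid in enumerate(shuffled)}


rng = random.Random(2024)  # fixed seed only so the sketch is repeatable
assignment = assign_within_class(range(1, 23), rng)  # hypothetical class of 22
counts = {label: list(assignment.values()).count(label) for label in CONDITIONS}
print(counts)
```

Dealing from a shuffled list keeps the group sizes within one of each other in every class, so class-level differences are spread across all four treatments.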

PARTICIPANTS

Participants were drawn from more than 200 students registered in the first year of the undergraduate program in business at a large public Canadian university. They were all full-time students who had recently graduated from high school with a high grade point average.

The experiments were conducted in the fall, during the second month of the required two-semester introductory course entitled Business History and Ethics. This course was selected both for the relevance of the training material to its content (so that the experiment would not be a disruption) and for the fact that all students were required to take this course at the beginning of their curriculum. Classes in this course were small (strictly limited to 25) and were conducted in small seminar rooms. Students sat around a conference table with the teacher at the head. Accordingly, the presentation method in these classes consisted primarily of lecture, interactive discussion, and occasional use of an overhead projector. Students in eight sections of this course participated in the experiment.

Eight of the ten teachers of the course volunteered to make time available to the researcher. Participants were those who were in attendance on the day of the experiment. Students had been informed only that there would be a guest presenter, which was a common event in the course. There is no reason to think that some students stayed home that day, in order to avoid participation in this experiment. No particular extra-curricular events or religious holidays were occurring at that time, and there seemed to be no systematic relation among people who were absent at that time.

There were 173 students, but three did not sign their informed consent forms and their experimental packets were discarded. Thus there were 170 participants in the study.

The teachers who did not volunteer to participate reported that they had already pre-arranged their courses in some detail. They were planning to bring in the topic of business ethics later in the course. Thus there was some systematic variation in sampling, because those students whose teachers were most methodical (or least flexible) in their planning had been excluded from the pool.

Participants were 51.2% female. Fully 61.2% of the participants were 19 years old, with only 21.8% being older than that. Men were slightly older than women (with a significant Pearson correlation of 0.214) and tended to have more work experience (0.161). People from outside of North American culture tended to be older (0.385). Part-time jobs were held by 58.8% of the participants, all of whom were full-time students. Of the participants, 25% had spent more than ten years of their lives outside of North America, 17% had spent ten years or less living outside of North America and 58% reported having spent all of their lives in North America.

MATERIALS
Manipulation

The humor elements that were manipulated were those on the mini-case question and answer pages: cartoon drawings of Dilbert and ironic wisecracks from Dogbert. These elements were chosen from amongst the many humorous elements of the training experience (humorous posters, an introductory videotape, funny board pieces, brightly colored game booklets) for four reasons. First, the ironic wisecracks and cartoon drawings were accessible and easy to manipulate. Second, that manipulation could be obscured from the participants to avoid demand characteristics. Third, they constituted a discrete manipulation of elements that had been designed to elicit a response, and the removal of which could be expected to have a discernible (if small) impact. Fourth, the wisecracks and cartoon drawings were theoretically relevant because they represented both irony and mood-enhancing humor. Furthermore, those two elements together created an example of self-effacing humor, from the point of view of the company.

Since it was only the peripheral humor that was manipulated, teachers could be sure that every participant received some ethics training. This was an essential feature of the design because the event was being offered as a component of the participants' education.

Mini-cases

A panel of three senior students was convened at the technical institute at which the researcher was employed as a part-time instructor. Two of these students were female (age 26 and 29), while the other student and the researcher were male (age 27 and 46). The students and the researcher went through all fifty of the mini-cases paying particular attention to the ironic wisecracks from Dogbert. They were rated for funniness by being given a score of 1 if the rater found it particularly funny and a score of 0 otherwise. Afterwards the researcher went through the cases, noting any for which two answers were considered equally correct by the answer guide. These were eliminated because they would have been difficult to score. The researcher also eliminated any cases in which the ironic wisecracks tended to lead the participants toward the right choice or away from a wrong choice.

There were twelve mini-cases with facetious responses that were found particularly funny by three or four panelists, that were not leading, and that did not have more than one best choice. These were numbers 16, 17, 23, 24, 28, 29, 36, 41, 42, 45, 48, and 50.

From these, selections were made to reflect the best possible balance among the six ethical values that were espoused by Lockheed Martin: honesty (12 mini-cases), integrity (15 mini-cases), respect (three mini-cases), trust (four mini-cases), responsibility (14 mini-cases), and citizenship (two mini-cases). They were described by the company as follows:

HONESTY: to be truthful in all our endeavors; to be honest and forthright with one another and with our customers, communities, suppliers, and shareholders.

INTEGRITY: to say what we mean, to deliver what we promise, and to stand for what is right.

RESPECT: to treat one another with dignity and fairness, appreciating the diversity of our workforce and the uniqueness of each employee.

TRUST: to build confidence through teamwork and open, candid communication.

RESPONSIBILITY: to speak up - without fear of retribution - and report concerns in the work place including violations of laws, regulations and company policies, and seek clarification and guidance whenever there is doubt.

CITIZENSHIP: to obey all the laws of the United States and the foreign countries in which we do business and to do our part to make the communities in which we live a better place to be.

Since there was only one qualified mini-case involving each of the following ethical values (integrity #17, respect #42, trust #24, and citizenship #36), these four mini-cases were automatically selected for inclusion in the questionnaire. Of the three qualified mini-cases involving honesty (numbers 28, 29 and 45), only #28 had been rated particularly funny by all of the panelists so it was selected for inclusion in the questionnaire. Of the five mini-cases involving responsibility, only two (numbers 41 and 50) were rated as particularly funny by all of the panelists. Because the ironic wisecrack in mini-case #41 was very similar to the one in mini-case #24, which had already been selected for inclusion in the questionnaire, mini-case #50 was selected instead. Thus the experimental cases selected were numbers 17, 24, 28, 36, 42, and 50.
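The selection rules just described can be restated as a short sketch. The groupings follow the text; the three remaining responsibility cases (16, 23, and 48) are inferred by elimination from the list of twelve qualified cases. The data structure and function name are hypothetical, and `None` marks single-candidate cases for which the text does not say whether every panelist found the wisecrack funny.

```python
# Qualified mini-cases by ethical value. The flag is True when all panelists
# rated the wisecrack particularly funny, False when not, None when unstated.
QUALIFIED = {
    "integrity": {17: None},
    "respect": {42: None},
    "trust": {24: None},
    "citizenship": {36: None},
    "honesty": {28: True, 29: False, 45: False},
    "responsibility": {16: False, 23: False, 41: True, 48: False, 50: True},
}

# Wisecracks judged too similar to another case's wisecrack (41 resembles 24).
SIMILAR_TO = {41: 24}


def select_cases(qualified, similar_to):
    """Pick one mini-case per ethical value using the rules in the text."""
    selected = []
    # Rule 1: a value with a single qualified case contributes that case.
    for cases in qualified.values():
        if len(cases) == 1:
            selected.extend(cases)
    # Rule 2: otherwise take the first case that every panelist found funny
    # and whose wisecrack does not duplicate one already selected.
    for cases in qualified.values():
        if len(cases) > 1:
            for number, funny_to_all in cases.items():
                if funny_to_all and similar_to.get(number) not in selected:
                    selected.append(number)
                    break
    return sorted(selected)


experimental_cases = select_cases(QUALIFIED, SIMILAR_TO)
print(experimental_cases)
```

Handling the single-candidate values first matters here, because case 24 must already be selected before case 41 can be rejected as a near duplicate.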

One mini-case, number 28, contained an error. The phrase that read "see what is his explanation is" in the Leader's Guide was corrected to read "see what his explanation is." Another mini-case, number 36, referred to a $50 ceiling on gift-giving that was well known to Lockheed Martin employees, but would be unfamiliar to the participants of this experiment. Accordingly, the sentence "Company rules prohibit the exchange of gifts over $50" was added.

Two mini-cases were needed for a pre-test of participants' case solving skills and two more for a post-test, so that within-subject comparisons could be made. For this purpose, the joke responses would not be used. There were four mini-cases that had not been considered funny by any panelist. Two of these (numbers 19 and 30) involved responsibility and the other two (numbers 6 and 35) involved integrity. One from each category (numbers 6 and 19) was selected for inclusion in the pre-test and the others (numbers 30 and 35) were used as the post-test.

One mini-case, number 19, referred to an expensive gyroscope. Because participants in the experiment might not be familiar with that piece of equipment, the following statement was added: "A gyroscope is a sensitive and delicate instrument that helps space vehicles navigate."

Measures

The dependent variable was persuasion, which has been defined above as attitude change. Attitude can be measured in three ways. First, participants can be asked about their attitude toward the object (the direct approach). Second, participants can be asked to indicate the degree to which they agree or disagree with prepared statements about the object (the quasi-direct approach). Finally, other factors such as galvanic skin response or measures of the direction of erroneous estimates can be investigated (the indirect approach). Two methods were used in this study. Participants were asked to indicate their agreement with descriptive adjectives (a quasi-direct approach) and to report behavioral intentions (an indirect approach).

The descriptive adjectives for the quasi-direct approach came from open-ended questions on earlier pilot studies and included: (a) successful, (b) serious, (c) ethical, (d) caring, and (e) fun. Participants were not asked to select one, but rather to circle a number for each choice on a 5-point Likert scale anchored with the terms Does not apply (1) and Applies very well (5).

Behavioral intentions for the indirect approach were assessed with the answers to the question If you worked for Lockheed Martin and had to make a decision on an ethical issue, who would you likely contact for advice? The available choices were those that had been mentioned most often in response to an open-ended question in the earlier pilot studies: (a) your manager, (b) your co-workers, (c) the ethics office, (d) friends or family, (e) the ethics help line, and (f) the legal department. Participants were not asked to select one, but rather to circle a number for each of the choices on a 5-point Likert scale anchored with the terms Very unlikely (1) and Very likely (5).

Three humor scales were selected for inclusion in the questionnaire to operationalize the construct need for levity. Each of them was a six-item scale so that this whole section consisted of 18 questions. The first humor scale was the Sense of Humor Questionnaire first developed in 1974 by Sven Svebak at the University of Trondheim in Norway. His test began with 22 items, was then revised to 21 items on three sub-scales, and now uses just six items to produce a Cronbach alpha of 0.85. This scale has been used widely in humor research, so there is a body of data against which to compare it.
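For readers unfamiliar with the reliability coefficient cited above, the following minimal sketch computes Cronbach's alpha from item-level responses using the standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The function name and the four response rows are hypothetical; they are not Sense of Humor Questionnaire data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per participant)."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    if any(len(row) != k for row in responses):
        raise ValueError("every participant must answer every item")
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of four participants to a six-item scale.
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 2, 3, 2, 2, 3],
    [3, 3, 3, 4, 3, 3],
    [5, 4, 5, 5, 4, 5],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Alpha approaches 1 when the items rise and fall together across participants, which is the sense in which a six-item scale can be called internally consistent.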

The second humor scale to be included was the Coping Humor Scale (CHS), developed in 1983 by Herb Lefcourt (Psychology, University of Waterloo) and Rod Martin (Psychology, University of Western Ontario) to assess the extent to which participants use humor as a method for coping with problems at work. This measure was selected for its relevance to work and its compact size. Correlations of about 0.50 were found with peer responses to "When faced with problems or difficulties, to what extent would you say this person finds something funny about the problem situation?" and correlations of -0.58 to -0.78 with peer responses to "To what extent would you say that this person takes himself/herself too seriously?" High correlations were found between CHS scores and joking and laughing at a dentist's office. A test-retest reliability of 0.80 over a 12-week period has been reported using this test. More recently, William Hampes found Pearson correlations of 0.52 between a measurement of trust and the CHS score.

Although the original instrument contained seven items, it has been found that item number four is interpreted differently by different readers and that reliability is improved if only the other six are used. Accordingly only those six items were included in this questionnaire.

The third humor test was the Humor Initiation Scale (HIS). It was included because it measures humor creation. Willibald Ruch has discovered that self-report scales of this type have more validity (especially convergent validity) for creation of humor than for appreciation of humor.

Three statements about Dilbert cartoons were presented with five-point Likert scales anchored at the extremes as "Agree Strongly" and "Disagree Strongly." The statements were "I have seen Dilbert cartoons," "I find Dilbert cartoons funny," and "I agree with Dilbert's ideas." Those who had not seen Dilbert cartoons were instructed to circle one for the first choice and leave the other two blank. These questions were meant to address the proposition that those who liked Dilbert better, and were familiar with the character, would respond better to the humor in the experiment.

Because people tend to over-estimate their sense of humor, it may be considered the most inaccurately judged characteristic. To assess the strength of this effect, the short form of the Marlowe-Crowne Social Desirability Scale was included. The original scale was a response to the then-popular Edwards' Social Desirability Scale (SDS), which tested participants for answers that were statistical outliers (and not likely true). Marlowe and Crowne proposed a list of 33 statements to be rated as true or false by the participants. The statements described behavior that was socially desirable but unlikely to be true, such as "Before voting, I thoroughly investigate the qualifications of all the candidates." To a large extent these were based on the Lie scale of the Minnesota Multiphasic Personality Inventory (MMPI). Although the items used on that version are still timely, the size was cumbersome. Shorter versions of the scale were tested. The best 13 items were found to be 90% as effective as the full version, and the items could then be presented as a separate part of a survey instead of interspersing them to disguise their intent. Additionally, the shorter version was more likely to be completed carefully by participants, and for that reason it was adopted in this study.

PROCEDURE

At the beginning of the class, the researcher presented a brief introduction to the exercise which included a class discussion on the meaning and value of ethics in business. Because this discussion was interactive, its content varied somewhat from one administration to another. However, an outline was followed by the researcher to ensure that the content was approximately the same, and that all important points were covered. Whatever differences there were from one class group to the next were spread evenly across all of the treatment conditions in that class group, since every treatment condition was used in every class. An analysis of variance would be conducted to identify any systematic differences among the class groups.

Random assignment was achieved surreptitiously. After the introduction, participants took experimental packets from a pile that had been pre-shuffled by the researcher. This concealed the existence of different versions of the packet (the four different experimental treatments). Participants were asked to refrain from talking to one another during the administration of the exercise, a condition they were accustomed to observing during written work, and their compliance with this condition was monitored. If participants happened to glance at another participant's experimental packet they would soon see that it was identical and be discouraged from further curiosity. (With the exception of the small experimental treatment differences discussed above, the experimental packets were absolutely identical.)

Step-by-Step

After signing a consent form (Appendix A) and removing a copy to retain for themselves, participants were asked to fill out several pages of preliminary information (Appendix B). These pages included the pre-tests for intention and attitude and two mini-cases to try for practice. The practice mini-cases helped to establish a baseline of pre-existing competence at solving business ethics cases, and to reassure participants about the nature of the exercises they would be undertaking. The pages also contained the three short humor scales, three questions about Dilbert cartoons, and the short form of the social desirability instrument.

When these were completed, participants turned their experimental packets over to indicate that they were done (and to prevent peeking ahead at one another's mini-cases). When all participants were ready, everyone was instructed to turn their packets back over, read the first mini-case, and decide on their preferred response. Then they were instructed to turn the packets over again. This was ostensibly to prevent others from copying their responses, but also to prevent participants from comparing notes and discovering the slight differences among the various experimental treatments.

Each mini-case took up one whole page (Appendices C, D, E, F) with a small graphic in the top right corner (Dilbert or Zeus, depending on the experimental treatment). The brief mini-case was typed at the top followed by the four multiple-choice responses (each with a corresponding letter). Below that was a fifth response that was either a wisecrack or the statement "I prefer not to answer this question," accompanied by either a small picture of Dogbert or a graphic arrow, depending on the experimental treatment group. There was also a place to record scores and cumulative scores, although participants were assured that these were not required and would not be graded in any way.

After all students were done with a mini-case, the researcher asked how many had chosen A and then B and so forth. Then the official scores (out of five) for each response were announced by reading them (and their justifications) from the Lockheed Martin Leader's Guide.

Participants recorded their scores, sometimes accompanied by groans of disagreement or outbursts of agreement. It was explained that these answers were provided by a panel of ethics experts, but that there was usually room for discussion about ethical issues. Participants were asked to hold their comments and questions for the discussion that would follow the experiment, but if they had burning issues these were noted on the board by the researcher for later discussion. In this way the issues had been acknowledged but also put on hold so that the experiment could continue. Although this technique minimized differences among the class groups, it did not eliminate them.

After working through all six of the selected mini-cases, participants were asked to take a few minutes to fill out the rest of the experimental packet (Appendix G) before entering into a free-for-all discussion of the cases and students' opinions about the best responses. In the packet, they were asked which elements of the training they had found entertaining. This maintained the cover story that they were testing out a new way of teaching business ethics, and provided a manipulation check. Next they were asked who they now thought they would consult on ethical matters and what adjectives they now thought were applicable to the firm (the post-tests). Then they were asked a few brief demographic questions: How many years have you worked (full time, part time)? Are you presently employed (full time, part time)? How many years have you lived in North America? What is your current age? Are you male or female?

Two more mini-cases were provided to assess improvement in case-solving skill. After these distraction tasks, participants were asked to recognize the correct answers to the six mini-cases that had been discussed in the learning activity and were thanked for their participation. Once the experimental packets had been collected and put away, a discussion ensued about the answers that had been provided by the company, whose interests they served, and any other issues students wanted to raise. Participants were debriefed about the experiment and asked not to discuss it with students in other classes. The entire experience took just over an hour and the discussion filled the time until the 90-minute class (or, in two cases, the 90-minute segment of a three-hour class) ended.

RESEARCH HYPOTHESES

Based on the proposition that any humor would create a positive and playful mood that would make participants more susceptible to persuasion, the first hypothesis compared the intact version of the mini-cases with the version that had both elements of humor removed. The null hypothesis was that there would be no difference on any of the six behavioral intention measures, the five descriptive adjective measures, the skill tests, or the ability to recall correct answers.

H1 - Participants with neither cartoon drawings nor ironic wisecracks in their experimental packets will report less improvement in behavioral intention and attitude, demonstrate less increased skill, and recall fewer correct answers than those with both cartoon drawings and ironic wisecracks.

Based on the proposition that cartoon drawings would build rapport with participants (demonstrating similarity) and make them more susceptible to persuasion, the second hypothesis compared the intact version of the mini-cases with the version that had the cartoon drawings removed. The null hypothesis was that there would be no difference on any of the six behavioral intention measures, the five descriptive adjective measures, the skill tests, or the ability to recall correct answers.

H2 - Participants without cartoon drawings in their experimental packets will report less improvement in behavioral intention and attitude, demonstrate less increased skill, and recall fewer correct answers than those with both cartoon drawings and ironic wisecracks.

Based on the proposition that ironic wisecracks would distract participants and make them more susceptible to persuasion, the third hypothesis compared the intact version of the mini-cases with the version that had the ironic wisecracks removed. The null hypothesis was that there would be no difference on any of the six behavioral intention measures, the five descriptive adjective measures, the skill tests, or the ability to recall correct answers.

H3 - Participants without ironic wisecracks in their experimental packets will report less improvement in behavioral intention and attitude, demonstrate less increased skill, and recall fewer correct answers than those with both cartoon drawings and ironic wisecracks.

Based on the proposition that the self-effacing combination of wisecracks in the Dilbert anti-management context would greatly enhance credibility, and make participants more responsive to persuasion, the fourth hypothesis compared the intact version of the mini-cases with the versions that had either the ironic wisecracks or the cartoon drawings removed. The null hypothesis was that there would be no difference on any of the six behavioral intention measures, the five descriptive adjective measures, the skill tests, or the ability to recall correct answers.

H4 - Participants with cartoon drawings or ironic wisecracks but not both in their experimental packets will report less improvement in behavioral intention and attitude, demonstrate less increased skill, and recall fewer correct answers than those with both cartoon drawings and ironic wisecracks.

Based on the proposition that age does not make much difference once adulthood has been achieved, the fifth hypothesis compared the regression coefficients of humor treatment, age, and the interaction of humor treatment with age.

H5 - Age will not interact significantly with the effects of the humor treatment on behavioral intention, attitude, skill, or recall.

Based on the proposition that male participants would enjoy Dilbert humor better and be more comfortable with humor in general, the sixth hypothesis compared the regression coefficients of humor treatment, gender, and the interaction of humor treatment with gender.

H6 - Male gender will significantly amplify the effects of the humor treatment on behavioral intention, attitude, skill, and recall.

Based on the proposition that those with more work experience would identify with Dilbert better and appreciate the humor more, the seventh hypothesis compared the regression coefficients of humor treatment, experience, and the interaction of humor treatment with experience.

H7 - Experience will significantly amplify the effects of the humor treatment on behavioral intention, attitude, skill, and recall.

Based on the proposition that those who have spent less time in North American culture would identify less with the characters, be less comfortable with this use of humor, and not appreciate the humor as well, the eighth hypothesis compared the regression coefficients of humor treatment, time outside of North America, and the interaction of humor treatment with time outside of North America.

H8 - Time spent living outside of North America will significantly attenuate the effects of the humor treatment on behavioral intention, attitude, skill, and recall.

Based on the proposition that those with a stronger sense of humor would appreciate the humor better and therefore be convinced by it, the ninth hypothesis compared the regression coefficients of humor treatment, sense of humor, and the interaction of humor treatment with sense of humor.

H9 - Sense of humor will significantly amplify the effects of the humor treatment on behavioral intention, attitude, skill, and recall.

Based on the proposition that those who express a stronger affinity for Dilbert would appreciate that humor more and therefore be convinced by it, the tenth hypothesis compared the regression coefficients of humor treatment, appreciation of Dilbert, and the interaction of humor treatment with appreciation of Dilbert.

H10 - Appreciation for Dilbert will significantly amplify the effects of the humor treatment on behavioral intention, attitude, skill, and recall.

Based on the proposition that those who are disposed to give socially desirable responses will be less frank and less spontaneous, and respond less strongly to the humor, the final hypothesis compared the regression coefficients of humor treatment, socially desirable responses, and the interaction of humor treatment and socially desirable responses.

H11 - The propensity to provide socially desirable responses will significantly attenuate the effects of the humor treatment on behavioral intention, attitude, skill, and recall.

DATA ANALYSIS

The overall research question for this study was: Does the use of various types of humor in the Ethics Challenge support persuasion?

The dependent variable was persuasion, operationalized as within-subject variation (gain scores) in (1) behavioral intention, (2) attitude, and (3) case-solving skill, and (4) between-subject variation in ability to recall the correct case answers. Simple gain scores have recently been defended as informative even when they are not fully reliable. Ordinal data was obtained in this study with Likert scales but, as is the practice in such cases, was treated as interval data for the purpose of analysis. This treatment is based on the dubious but common assumption that participants see the points on the scale as equidistant.

The independent variable was humor, presented at four discrete levels: intact, without cartoons, without wisecracks, and without either. Other variables were introduced as potential moderators: age, gender, work experience, years in the culture, measures of the sense of humor, preference for Dilbert, and socially desirable response.

Step-by-Step

After each administration of the experiment, packets were checked by the researcher to verify that the informed consent form had been signed. If so, it was detached from the packet and stored separately to preserve complete participant confidentiality. However, if the consent form was not signed, the entire packet was discarded and excluded from the analysis. Although it would have been interesting to have retained those forms to analyze non-responders, the language of the informed consent statement prohibited it.

Data from the experimental packets was then entered directly into a Microsoft Excel spreadsheet. First the date of administration was noted along with the treatment condition. Then the participant's responses were entered. All numbers selected on Likert scales were entered directly. For mini-cases (experimental, pre-test and post-test), the official Lockheed Martin scores that corresponded to the participants' selected answers were entered. On the final test of recall, a one was entered for a correct response and a zero otherwise. Humor scales and the social desirability scale were scored according to their authors' instructions. Demographic information, where it consisted of numbers of years, was entered directly. Gender was entered as one for male and zero for female, and current employment status was entered as one for yes and zero for no.

The spreadsheet generated calculations of gain scores (pre-to-post score changes) in intention and attitude, total scores for the humor and social desirability scales, and differences between pre- and post-experiment case solution scores. A measure of international experience was created by subtracting years in North America from age. A measure of work experience was created by adding years of full-time experience to half of the number of years of part-time experience. Averages and standard deviations were computed for each entry and then the data was re-sorted by experimental treatment.
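The derived measures described above can be sketched as a small Python function. This is only an illustration of the arithmetic: the field names are hypothetical (the original spreadsheet columns are not documented here), and the gain score is assumed to be the post-test value minus the pre-test value.

```python
def derive_measures(row):
    """Compute derived measures for one participant.

    `row` is a dict of raw entries; all key names are assumed for illustration.
    """
    return {
        # Gain scores: assumed to be post-test minus pre-test.
        "intention_gain": row["post_intention"] - row["pre_intention"],
        "case_gain": row["post_case_score"] - row["pre_case_score"],
        # International experience: age minus years lived in North America.
        "international_experience": row["age"] - row["years_in_north_america"],
        # Work experience: full-time years plus half of the part-time years.
        "work_experience": row["years_full_time"] + 0.5 * row["years_part_time"],
    }

# Hypothetical participant.
example = {
    "pre_intention": 3, "post_intention": 5,
    "pre_case_score": 6, "post_case_score": 9,
    "age": 30, "years_in_north_america": 22,
    "years_full_time": 4, "years_part_time": 6,
}
print(derive_measures(example))
```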

Once all of the data had been entered, the information was cut-and-pasted directly into a MiniTab 13 worksheet for further analysis.

In order to assess the success of randomization and to identify any imbalances that should be addressed later, one-way analysis of variance was used to verify that the pre-test measures of the dependent variable did not differ significantly across class groups or by demographic factors.

As a manipulation check, it was verified that the answers to the question "Which elements of this learning activity were entertaining?" differed across treatment groups.

Then, paired t-tests were used to verify that the game (in its intact version) was in fact achieving significant changes in the expected direction for each measure of the dependent variable. Self-reported intention to consult the manager, the ethics office, the ethics help line, and the legal department was expected to increase. Self-reported intention to consult co-workers or family and friends was expected to decrease. Self-reported agreement with each of the descriptive adjectives was expected to increase. Skill at solving cases was expected to increase.
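For readers unfamiliar with the paired t-test, the statistic can be sketched as follows. The study itself used MiniTab; this standard-library Python version, with a hypothetical function name and made-up scores, only shows the calculation on each participant's pre-to-post difference.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Return (t, degrees of freedom) for paired pre/post scores."""
    diffs = [b - a for a, b in zip(pre, post)]   # post minus pre, per participant
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)      # SE of the mean difference
    return mean(diffs) / standard_error, n - 1

# Hypothetical intention ratings from four participants.
t, df = paired_t([1, 2, 3, 4], [2, 4, 5, 5])
print(t, df)
```

A positive t indicates an increase from pre-test to post-test; its significance would be judged against the t distribution with n - 1 degrees of freedom.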

As a first stage of hypothesis testing, one-way analysis of variance was applied across the gain scores. It was expected that the differences among groups due to the humor manipulation would be very small and possibly not visible to ANOVA at the p=.05 level of significance. Because the use of humor is only peripheral, the effects are very small and difficult to assess.
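The one-way analysis of variance across the four treatment groups can be sketched as below. This is a minimal illustration with hypothetical gain scores, not the MiniTab procedure used in the study; it returns the F ratio together with the within-group mean square that later comparisons can reuse.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_within = ss_within / df_within
    f_ratio = (ss_between / df_between) / ms_within
    return f_ratio, df_between, df_within, ms_within

# Hypothetical gain scores for the four treatment conditions.
gains = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(one_way_anova(gains))
```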

Accordingly, specific predictions were made about the expected effects so that planned (a priori) comparisons could be made between pairs of means. As a second stage of hypothesis testing, independent t-tests were conducted for each measure of the dependent variable. Under this multiple t-test or unprotected t procedure, pairs of relationships are contrasted by using the family-wise alpha rate of .05 and adopting the pooled variance from the ANOVA instead of individual sample standard deviations.
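A planned comparison of this kind can be sketched as follows, assuming the usual form in which the pooled within-group variance from the ANOVA replaces the two groups' own standard deviations. The function name and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean

def planned_comparison(groups, i, j):
    """t statistic comparing groups i and j with the ANOVA pooled variance.

    Returns (t, df), where df is the within-group degrees of freedom.
    """
    n_total = sum(len(g) for g in groups)
    df_within = n_total - len(groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_within = ss_within / df_within            # pooled variance estimate
    a, b = groups[i], groups[j]
    standard_error = sqrt(ms_within * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / standard_error, df_within

# Hypothetical gain scores; the last group stands in for the intact version.
gains = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(planned_comparison(gains, 0, 3))
```

Because the variance is pooled over all groups, the comparison has more degrees of freedom than a t-test on the two groups alone.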

This is a very liberal approach, since conducting multiple t-tests on the same data at an alpha level of .05 can double the possibility of Type I error (falsely rejecting the null hypothesis). For that reason, some researchers would recommend using a correction such as the Benjamini and Hochberg procedure for false discovery rate. However, this would tend to discard just the sort of small effects that are being sought in this study. Accordingly, the more liberal approach was adopted, and the limitation was noted in the Discussion chapter.
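The trade-off described here can be made concrete with a short sketch. The first function gives the chance of at least one Type I error across several tests, under the simplifying assumption that the tests are independent (tests on the same data are not, so this is only an approximation). The second applies the Benjamini and Hochberg step-up rule. The p-values are hypothetical.

```python
def familywise_error(alpha, m):
    """Chance of at least one false rejection in m independent tests."""
    return 1 - (1 - alpha) ** m

def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank whose p-value falls under its step-up threshold.
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values from four comparisons.
p = [0.01, 0.04, 0.03, 0.20]
print(familywise_error(0.05, 2))   # two tests at alpha = .05
print(benjamini_hochberg(p))
```

With these made-up values, three comparisons fall below .05 when tested individually, but only the smallest survives the correction, which illustrates how small effects would be discarded.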

After testing the first four hypotheses, interaction effects were assessed. Using linear regression, coefficients for the humor treatment alone were compared against those for the proposed moderator and the interaction term created by multiplying the two together.
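The interaction model can be sketched as an ordinary least squares fit with three predictors: the treatment, the moderator, and their product. This is an illustration only. It assumes the treatment is coded as a single 0/1 indicator (the study's own coding of the four humor levels is not specified here), the function names are hypothetical, and the data are constructed so that the true coefficients are known.

```python
def ols(X, y):
    """Least squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

def moderated_fit(treatment, moderator, outcome):
    """Coefficients [intercept, treatment, moderator, treatment x moderator]."""
    X = [[1.0, t, m, t * m] for t, m in zip(treatment, moderator)]
    return ols(X, outcome)

# Constructed data: outcome = 1 + 2*t + 0.5*m + 3*t*m, with no noise.
treatment = [0, 0, 0, 1, 1, 1]
moderator = [1, 2, 3, 1, 2, 4]
outcome = [1 + 2 * t + 0.5 * m + 3 * t * m for t, m in zip(treatment, moderator)]
print(moderated_fit(treatment, moderator, outcome))
```

A non-zero coefficient on the product term is what indicates moderation: the effect of the treatment changes with the level of the moderator.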

Finally, Pearson product-moment correlations were used to explore further inter-relationships among the data that had not been specifically predicted in the research hypotheses. This exploratory work allowed unanticipated findings to emerge.
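For reference, the Pearson product-moment correlation can be sketched in a few lines. The function name and the values are hypothetical and serve only to show the calculation.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical humor-scale totals and intention gains for three participants.
print(pearson_r([1, 2, 3], [1, 3, 2]))
```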
