This chapter consists of two sections. The first describes tests of the suitability of the data, including tests of randomization, participant predispositions, manipulation checks, and assurance that the intact version of the game was effective. The second reports the results of testing the research hypotheses, looking at the main effects, interaction effects, and then correlations among other variables.
This first section describes tests that were performed on the data to be sure it was robust. It is somewhat technical in nature. The next section reports the results of hypothesis testing.
There were enough participants in the experiment to assign around 30 observations to each cell. Thus normal distributions could be assumed and parametric tests were utilized. Non-pooled t-tests were used throughout to avoid vulnerability to heterogeneity of variance, and an alpha level of .05 was adopted.
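As an illustration of the non-pooled approach, the following is a minimal sketch of an unpooled (Welch) two-sample t statistic with Welch-Satterthwaite degrees of freedom. The function name and the ratings are hypothetical and are not data from this study; the p-value would still be looked up against a t distribution with the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for a two-sample t-test that does not pool variances."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical ratings, invented for illustration only.
with_humor = [5, 6, 7, 6, 5, 7]
without_humor = [4, 5, 4, 6, 3, 5]
t, df = welch_t(with_humor, without_humor)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Because the variances are not pooled, the degrees of freedom are generally fractional and smaller than the pooled value of n1 + n2 - 2.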
Some data was grouped into categories to create cells large enough for analysis. Age was handled by grouping participants into those who were under 19 (n = 29), those who were 19 (n = 103), and those who were over 19 (n = 38). The new variable experience was calculated as the number of years of full-time experience plus one-half the number of years of part-time experience. Participants were grouped into those who had less than one year of experience (n = 39), those with one to one-and-a-half years of experience (n = 66), those with more than one-and-a-half to two-and-a-half years of experience (n = 43), and those with more than two and a half years of experience (n = 22).
Time outside of North America was calculated as Age minus Years Spent Living in North America. Participants were grouped into those who had never lived outside of North America (n = 99), those who had lived ten years or less outside of North America (n = 29), and those who had lived more than ten years outside of North America (n = 42). Sense of humor was calculated by summing the scores on the three humor scales. Participants were grouped into those with a score of 47 or less (n = 41), those with scores from 48 to 51 (n = 41), those with scores from 52 to 57 (n = 43), and those with a score of 58 or higher (n = 45).
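The derived variables and groupings described above can be sketched as follows. The cut points are taken from the text; the function names and group labels are hypothetical, and the handling of exact boundary values (for example, exactly one-and-a-half years) is an assumption.

```python
def age_group(age):
    """Classify age as under 19, exactly 19, or over 19."""
    if age < 19:
        return "under 19"
    return "19" if age == 19 else "over 19"


def experience_years(full_time, part_time):
    """Full-time years plus one-half of part-time years."""
    return full_time + 0.5 * part_time


def experience_group(years):
    if years < 1:
        return "less than 1 year"
    if years <= 1.5:
        return "1 to 1.5 years"
    if years <= 2.5:
        return "over 1.5 to 2.5 years"
    return "over 2.5 years"


def years_outside_na(age, years_in_na):
    """Age minus years spent living in North America."""
    return age - years_in_na


def outside_group(years):
    if years <= 0:
        return "never"
    return "10 years or less" if years <= 10 else "more than 10 years"


def humor_group(total):
    """Group the summed score from the three humor scales."""
    if total <= 47:
        return "47 or less"
    if total <= 51:
        return "48 to 51"
    if total <= 57:
        return "52 to 57"
    return "58 or higher"


# Hypothetical participant: 1 year full-time, 1 year part-time, age 19.
print(experience_group(experience_years(1, 1)))
print(outside_group(years_outside_na(19, 19)))
```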
Random Treatment Groups
To verify the random assignment of participants to groups, an Analysis of Variance was conducted across humor treatment groups (those who saw versions with no humor, cartoons only, wisecracks only, or both). Treatment groups were compared to see if they differed significantly, which would threaten the randomness of the assignment. Each treatment group was measured in terms of the average age classification, gender, level of experience, time outside of North America, and sense of humor. No significant differences among them were indicated by the Analysis of Variance, as illustrated in Table 1. Thus it was accepted that the treatment groups had been assigned randomly.
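For readers unfamiliar with the procedure, a one-way Analysis of Variance across the four treatment groups can be sketched as below. This is a generic textbook computation of the F ratio, not the software actually used for Table 1, and the group values are invented for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_ratio, k - 1, n - k


# Hypothetical scores for the four humor conditions
# (no humor, cartoons only, wisecracks only, both).
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [2, 3, 4]]
f_ratio, df_between, df_within = one_way_anova_f(groups)
print(f_ratio, df_between, df_within)
```

A small F ratio relative to the critical value for the given degrees of freedom is what would be read as no significant difference among groups.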
Differences Among Class Groups
An Analysis of Variance was run to compare the different class groups. The pre-test answers of the different class groups were compared to be sure that they did not differ in a systematic way, which would confound the results. The results are presented in Table 2 and discussed below.
There were two significant differences among the class groups, challenging the assumption that the classes had been randomly constituted. Although this assumption was not crucial to the experiment, since students within each class had been randomly assigned to all four treatment groups, it was important to understand any systematic differences among the groups so that controls could be introduced if necessary. The results of this analysis make up the first column of Table 2.
The first significant variation had to do with the perception that the Lockheed Martin company was successful. Specifically, Tukey's pairwise comparisons (family error rate = 0.05) revealed that later classes were less likely to consider Lockheed Martin a successful company. Indeed there was a downward trend sufficient to generate a Pearson correlation of -.272 (p < .001). This led to fears that students may have been discussing the experiment among themselves (with people in other classes). However, that proposition was disconfirmed when it was found that there were no such correlations over time for any of the other variables.
Speculatively, this downward trend may have been related to the periodic release through the news media of negative information about the company's performance while these experiments were being conducted. It was decided that this pattern did not threaten the suitability of the data for analysis.
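The Pearson correlation used to quantify the downward trend across classes can be sketched as follows. The class order and ratings shown are hypothetical and do not reproduce the reported value of -.272.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: order in which classes were sampled, and a
# rating of agreement that the company was successful.
class_order = [1, 2, 3, 4, 5]
rating = [5, 4, 4, 3, 2]
r = pearson_r(class_order, rating)
print(f"r = {r:.3f}")
```

A negative r indicates that ratings fell as the class order increased, which is the direction of the trend described above.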
Another variation was that the class sampled on December 1 exhibited a strong predilection to consult co-workers, according to Tukey's pairwise comparison. The reason for this was not apparent, and may have been related to the personal style of the local instructor or some incident that had occurred earlier in the course. However, this inflated reliance on co-workers seems to have been counteracted by a smaller increase in intention to consult co-workers during the experiment. By the end of the process, there turned out to be no overall correlation between class group and intention to consult co-workers. Because the two effects offset each other, no further action was taken to control for this difference.
The remaining columns of Table 2 report differences on pre-test scores due to other factors than class group. Each of those is discussed here.
Gender seemed to make very little difference, except that women tended to rate the sponsor as ethical significantly more often than men.
Work experience had a surprising effect. In the categories of work experience, those with more than two and a half years of work experience were significantly less likely to consult co-workers for ethics advice compared to the three classifications with less work experience. It had been expected that those who had more work experience would rely more on their peers for advice.
Those who had spent more time outside of North America were also significantly less likely to rely on co-workers, friends and family for ethics advice. It had been expected that people from outside of North America might have come from collectivist societies that would rely more on personal acquaintances for advice.
At first, there seemed to be a systematic effect for sense of humor. However, on inspection of Tukey's pairwise comparisons, there was no discernible pattern. Those who scored in the first and third quartiles on sense of humor tests rated the sponsor as more successful than those who scored in the second and fourth quartiles.
On investigation, none of the predispositions identified in the various columns of Table 2 seriously threatened the suitability of the data for hypothesis testing, so the next step was taken.
To verify that the ironic wisecracks and cartoon drawings were in fact entertaining for the participants, a manipulation check was performed using t-tests. Participant ratings of the various elements of the game were compared according to whether they had seen cartoons, wisecracks, or neither. The results are presented in Table 3 and discussed below.
The effects of the manipulation were most evident in the ratings of the entertainment value of the graphics. Participants rated them as significantly more entertaining when there was any humor (t66 = 2.81, p = .006), when there were wisecracks (t160 = 2.04, p = .043), and especially when there were cartoon drawings (t165 = 3.30, p = .001). This between-subject result confirmed that the manipulation had been effective.
Efficacy of the Intact Version
Having verified that there were no troublesome variances among the experimental groups before the experiment, and having verified that the manipulation was effective, it remained to verify that the intact version of the board game was still achieving its intended persuasive effects. To suit the demands of the experiment, the game had been altered considerably (removing group interactions and several fun elements), so it could not simply be assumed that it would still be effective in its altered form.
To assess the efficacy of the intact version (as altered for the experiment, but with both elements of humor), paired t-tests were conducted to compare the post-test results with the pre-tests. One-tailed tests were used in this analysis only, to verify that post-tests had changed significantly in the predicted directions.
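A paired t-test of this kind works on the difference between each participant's post-test and pre-test score. The sketch below computes the statistic and its degrees of freedom; the scores are hypothetical. For a one-tailed test, the sign of t must match the predicted direction and its size is compared against a one-sided critical value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a paired t-test on post minus pre differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical pre-test and post-test intention ratings.
pre = [3, 4, 2, 5, 3]
post = [4, 5, 4, 5, 4]
t_paired, df_paired = paired_t(pre, post)
print(f"t = {t_paired:.2f}, df = {df_paired}")
```

A positive t here would support a predicted increase; for variables predicted to decrease, such as intention to consult co-workers, a negative t would be the supporting direction.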
It was expected that the intact version of the game would increase participants' intention to consult their manager, the ethics office, the ethics help line, and the legal department. It was also expected that it would decrease their intention to consult co-workers or friends and family. Furthermore, playing the game was expected to increase participants’ agreement with the five descriptive adjectives about the sponsor (Lockheed Martin) and their skill at solving pre- and post-experimental cases. The results are presented in Table 4 and discussed below.
When it came to changing the reported behavioral intention of the participants, the altered board game was indeed successful. All of the behavioral intention variables moved in the expected direction and, except for intention to consult co-workers, moved a statistically significant distance. Thus the board game was delivering its intended effects, even though the group interaction and board activities had been completely eliminated. This was strong vindication of the game's effectiveness.
However, the adjectives (meant to serve as a more direct measure of attitude) were less revealing. Participants were only impressed that the company was more serious and more ethical after playing the board game. Changes in agreement with the words successful and caring were insignificant, and agreement that the company was fun actually fell slightly after playing the game.
Case-solving skill did not seem to improve with this version of the game, according to raw scores. Most subjects found it harder to solve the post-test cases than the pre-test cases. Although the pilot group did not demonstrate this effect, the participants in this experiment earned negative gain scores across the board. Thus this measure did not support the effectiveness of the Ethics Challenge. However, the relative performance of different treatment groups on these cases was still an important measure.
Not Familiar with Dilbert
This researcher had been surprised that 46% of the participants reported that they had not seen Dilbert cartoons. Accordingly, the effects of this phenomenon were investigated. Fortunately, as seen in Table 5, t-tests between those who had and had not seen Dilbert cartoons revealed no significant difference on any of the measures of the dependent variable.
After the data had been tested and found suitable for analysis, attention turned to testing the hypotheses.
The first four hypotheses investigated the relationship between the independent variable (humor) and the dependent variable (persuasion). Would humor actually improve the effectiveness of the persuasive message? Would the removal of elements of the humor inhibit its effectiveness? First a one-way analysis of variance was conducted on all the measures of the dependent variable to see if there was any significant difference across the four different humor conditions. The results are presented in Table 6 and discussed here.
The first column begins with six possible responses to the question, "Who would you contact with questions about an ethical situation?" It was hypothesized that participants would increase their intention to consult managers, the ethics office, the ethics help line, and the legal department after taking the Ethics Challenge, while they would decrease their intention to consult co-workers or family and friends. It was further hypothesized that removing elements of the humor would interfere with that result. However, only the variable for consulting the ethics office changed significantly across humor treatments.
The next five items are descriptive adjectives that were rated as more or less applicable to the sponsoring company (Lockheed Martin). It was hypothesized that participants would increase their agreement with each of these adjectives after taking the Ethics Challenge, and that removing elements of the humor would interfere with that result. That was only true for the two adjectives successful and serious.
The skill item is the sum of the scores on post-test cases 1 and 2 minus the sum of the scores on pre-test cases 1 and 2. It was hypothesized that this gain score would be reduced by the removal of elements of the humor, but in fact there was no significant difference.
The recall item is the number of case answers (out of six) remembered correctly after a few distraction tasks at the end of the experiment. It was hypothesized that this number would decrease with the removal of elements of the humor, but in fact there was no significant difference.
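The arithmetic behind the skill and recall items can be sketched as follows. The function names, the case scores, and the answer labels are hypothetical; the scoring scale for individual cases is not specified here and is assumed to be numeric.

```python
def skill_gain(pre_case_1, pre_case_2, post_case_1, post_case_2):
    """Sum of the two post-test case scores minus the two pre-test scores."""
    return (post_case_1 + post_case_2) - (pre_case_1 + pre_case_2)


def recall_score(remembered, correct):
    """Count how many of the case answers were remembered correctly."""
    return sum(1 for r, c in zip(remembered, correct) if r == c)


# Hypothetical participant who scored lower on the post-test cases.
gain = skill_gain(3, 4, 2, 3)

# Hypothetical answers to the six cases (labels are illustrative only).
recalled = recall_score(
    ["a", "b", "c", "d", "a", "b"],
    ["a", "b", "d", "d", "c", "b"],
)
print(gain, recalled)
```

A negative gain score, as in this example, corresponds to a participant finding the post-test cases harder than the pre-test cases.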
As expected in light of previous research, analysis of variance identified only a few significant relationships. Intention to contact the ethics office (the main goal of the Ethics Challenge) varied significantly across treatment groups, as did agreement with the adjectives successful and serious. The other variables did not reach significance.
In anticipation of small effects, four specific predictions had been made about the effects so that planned (a priori) comparisons could be made between the intact version of the game and other versions. All of the results are reported in Table 7 for the sake of completeness, but only those variables that had been identified as significant using analysis of variance were taken into consideration in this analysis.
H1 - Removing all of the humor from the persuasive message was expected to reduce its effectiveness. To test this, gain scores using the intact version were contrasted with those using no humor at all. The resulting t-scores are shown in the first column of Table 7.
With no humor at all in the persuasive message, participants were less likely to report increased agreement that the company was successful (the strongest effect) or serious, and less likely to report increased intention to consult the ethics office.
H2 - Removing cartoon drawings from the persuasive message was expected to reduce its effectiveness. To test this, gain scores using the intact version were contrasted with those using only the ironic wisecracks. The resulting t-scores are shown in the second column of Table 7.
The results in this case were inconclusive. Without the cartoon drawings, participants were slightly less likely to report increased intention to consult the ethics office or to agree with the adjective serious, but neither effect reached significance.
H3 - Removing ironic wisecracks from the persuasive message was expected to reduce its effectiveness. To test this, gain scores using the intact version were contrasted with those using only the cartoon drawings. The resulting t-scores are shown in the third column of Table 7.
Without the wisecracks, participants were much less likely to report increased agreement that the company was serious. They were also less likely to report increased intention to consult the ethics office. Both of these effects were stronger when removing the ironic wisecracks than they were in any other case. There was a similar effect on consulting the ethics help line, but it did not reach significance and was therefore excluded.
H4 - Interfering with the self-effacing nature of the persuasive message (by removing either component of the humor) was expected to reduce its effectiveness. To test this, gain scores using the intact version were contrasted with those using one and only one element of the humor. The resulting t-scores are shown in the fourth column of Table 7.
Without the interaction of the two humor elements, participants were less likely to report increased intention to consult the ethics office. They were also less likely to report increased agreement that the company was serious. This effect was very similar to the removal of the wisecracks, but slightly less pronounced. There were similar effects on recall and consulting the ethics help line, but neither of these reached significance so they were excluded.
Overall, removing wisecracks affected intention to consult the ethics office and agreement that the company was serious. Interfering with the self-effacing interaction affected the same variables less strongly. Removing the humor altogether affected intention to consult the ethics office and agreement that the company was successful or serious. Removing the cartoon drawings alone had no significant effect.
The effects of several potential moderators (age, gender, work experience, years outside of North America, sense of humor, finding Dilbert funny, and socially desirable responses) were investigated. Would these variables interact with the treatment variable (humor level) to moderate its effects on the dependent variable (persuasion)? The results of a linear regression comparing the effects of the treatment with interaction effects are presented in Table 8.
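To make the idea of a moderator concrete, the sketch below fits a linear regression with a treatment term, a moderator term, and their product (the interaction term) by ordinary least squares. It is a simplified illustration only: the treatment is reduced to a hypothetical 0/1 indicator rather than the four humor conditions, the data are constructed rather than observed, and the helper names are assumptions, not the procedure used to produce Table 8.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_interaction(treatment, moderator, outcome):
    """Least-squares fit of outcome = b0 + b1*t + b2*z + b3*(t*z)."""
    rows = [[1.0, t, z, t * z] for t, z in zip(treatment, moderator)]
    k = 4
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve(xtx, xty)


# Constructed data with a known interaction coefficient of 1.5.
treatment = [0, 0, 0, 1, 1, 1, 0, 1]
moderator = [1, 2, 3, 1, 2, 3, 4, 4]
outcome = [1 + 2 * t + 0.5 * z + 1.5 * t * z
           for t, z in zip(treatment, moderator)]
b0, b1, b2, b3 = fit_interaction(treatment, moderator, outcome)
print(round(b3, 3))
```

The coefficient on the product term (b3 here) is what indicates moderation: if it differs reliably from zero, the effect of the treatment depends on the level of the moderator.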
H5 - Age was expected to have very little effect, because the literature does not foresee much development in the sense of humor of adults. Although the tightly restricted age range limits the meaning of this test, age did seem to interact with the humor treatment, such that older students were more influenced by the humor to improve their agreement that the company (Lockheed Martin) was fun, caring, and ethical.
H6 - Gender did not significantly interact with the humor treatment.
H7 - Work experience did not significantly interact with the humor treatment.
H8 - Time spent outside of North America did not significantly interact with the humor treatment.
H9 - Sense of humor did interact with the humor treatments. Those with more sense of humor were less influenced to abandon their intention to consult friends and family. Regressing for interaction effects revealed that it was the Humor Initiation Scale (T = 2.36, p = .019) and the Sense of Humor Questionnaire (T = 2.24, p = .026) that were involved in the effect, rather than the Coping Humor Scale (T = 0.48, p = .634).
H10 - Appreciation for Dilbert cartoons interacted with the humor treatment in a specific way. Those who found Dilbert funnier were significantly less influenced by the humor to trust the legal department. Also, those who appreciated Dilbert were less influenced by the humor to increase their agreement that the company was fun.
H11 - Willingness to provide socially desirable responses did not significantly interact with the humor treatment.
Correlations among non-treatment variables (summarized in Table 9) uncovered certain interesting patterns in the results.
The three questions about Dilbert were highly correlated with one another, as would be expected. Many who had seen Dilbert cartoons agreed with his views, and many more found the cartoons funny.
Furthermore, the three tests of the sense of humor were correlated with one another. They were correlated highly enough to suggest that they were measuring the same phenomenon, but not so highly as to be redundant. As expected from previous research, the Humor Initiation Scale was correlated with gender; men seem to be more comfortable initiating humor.
Recall was negatively correlated with age and time spent outside of North America. In other words, students who were younger and had spent less time outside of North America remembered the academic answers better. Presumably they had been at school more recently and were blessed with good study habits.
Solving pre-test cases was negatively correlated with the Marlowe-Crowne social desirability scale, such that people who were willing to give socially desirable responses were less skilled at solving the pre-test cases.
Naturally, age was correlated with work experience and time spent living outside of North America.
To summarize the results of the hypothesis tests, removing ironic wisecracks or self-effacing humor, and to a lesser degree removing all of the humor, affected whether participants were persuaded. There were few interactions, except with sense of humor and appreciation for Dilbert cartoons. Younger participants recalled answers better. In general, there were few significant effects and they were weak.
© 2001, James Bruce Lyttle