Operant Conditioning

Introduction

A well-established behavioural theory is operant conditioning, formulated by B. F. (Burrhus Frederic) Skinner (1904–1990). Beginning in the 1930s, Skinner published a series of papers reporting the results of laboratory studies with animals, in which he identified the components of operant conditioning. He summarised much of this early work in his influential book, *The Behavior of Organisms* (Skinner, 1938).

Skinner applied his ideas to human problems. Early in his career, he became keenly interested in education and developed teaching machines and programmed instruction. *The Technology of Teaching* (Skinner, 1968) addresses instruction, motivation, discipline, and creativity. In 1948, after a difficult period in his life, he published *Walden Two*, which describes how behavioural principles might be applied to create a utopian society. Skinner (1971) addressed the problems of modern life and advocated applying a behavioural technology to the design of cultures in *Beyond Freedom and Dignity*. Skinner and others have since applied operant conditioning principles to such domains as school learning and discipline, child development, language acquisition, social behaviours, mental illness, medical problems, substance abuse, and vocational training (DeGrandpre, 2000; Karoly & Harris, 1986; Morris, 2003).

As a young man, Skinner aspired to be a writer (Skinner, 1970):
I built a small study in the attic and set to work. The results were disastrous. I frittered away my time. I read aimlessly, built model ships, played the piano, listened to the newly-invented radio, contributed to the humorous column of a local paper but wrote almost nothing else, and thought about seeing a psychiatrist. (p. 6)

He became interested in psychology after reading Pavlov’s (1927) *Conditioned Reflexes* and Watson’s (1924) *Behaviorism*. His subsequent career exerted a profound influence upon the psychology of learning.

Despite his admission that “I had failed as a writer because I had had nothing important to say” (Skinner, 1970, p. 7), he was a prolific writer who channelled his literary aspirations into scientific writing spanning six decades (Lattal, 1992). His dedication to his profession is evident in his giving an invited address at the American Psychological Association convention only eight days before his death (Holland, 1992; Skinner, 1990). The association honoured him with a special issue of its monthly journal, *American Psychologist* (American Psychological Association, 1992). Although his theory has been criticised by contemporary learning theorists because it cannot adequately explain higher-order and complex forms of learning (Bargh & Ferguson, 2000), his influence persists: operant conditioning principles are commonly applied to improve student learning and behaviour (Morris, 2003). In the opening scenario, for example, Leo employs operant conditioning principles to manage student misbehaviour, whereas Emily and Shayna argue for the importance of cognitive factors.

Conceptual Framework

This section discusses the assumptions underlying operant conditioning, how the theory reflects a functional analysis of behaviour, and the theory's implications for predicting and controlling behaviour. The theory and principles of operant conditioning are complex (Dragoi & Staddon, 1999); this chapter covers the principles most relevant to human learning.

Scientific Assumptions

Pavlov traced the locus of learning to the nervous system and viewed behaviour as a manifestation of neurological functioning. Skinner (1938) did not deny that neurological functioning accompanies behaviour, but he maintained that a psychology of behaviour can be understood in its own terms, without reference to neurological or other internal events.

Skinner raised similar objections to the unobservable processes and entities proposed by modern cognitive views of learning (Overskeid, 2007). Private events, or internal responses, are accessible only to the individual, and may be studied through individuals' verbal reports, which are forms of behaviour (Skinner, 1953). Skinner did not deny the existence of attitudes, beliefs, opinions, desires, and other forms of self-knowledge (he, after all, possessed them), but rather qualified their role.

Individuals do not experience consciousness or emotions but rather their own bodies, and internal reactions are responses to internal stimuli (Skinner, 1987). A further problem with internal processes is that translating them into language is difficult, because language does not completely capture the dimensions of an internal experience (e.g., pain). Much of what is called knowing involves the use of language (verbal behaviour). Thoughts are types of behaviour that are brought about by other stimuli (environmental or private) and that give rise to responses (overt or covert). When private events are expressed as overt behaviours, their role in a functional analysis can be determined.

Functional Analysis of Behavior

Skinner (1953) referred to his means of examining behaviour as a functional analysis:
The external variables of which behavior is a function provide for what may be called a causal or functional analysis. We undertake to predict and control the behavior of the individual organism. This is our “dependent variable”—the effect for which we are to find the cause. Our “independent variables”—the causes of behavior—are the external conditions of which behavior is a function. Relations between the two—the “cause-and-effect relationships” in behavior—are the laws of a science. A synthesis of these laws expressed in quantitative terms yields a comprehensive picture of the organism as a behaving system. (p. 35)

Learning is “the reassortment of responses in a complex situation”; conditioning refers to “the strengthening of behaviour which results from reinforcement” (Skinner, 1953, p. 65). There are two types of conditioning: Type S and Type R. Type S is Pavlovian conditioning, characterised by the pairing of the reinforcing (unconditioned) stimulus with another (conditioned) stimulus. The S calls attention to the importance of the stimulus in eliciting a response from the organism. The response made to the eliciting stimulus is known as respondent behaviour.

Although Type S conditioning may explicate conditioned emotional reactions, most human behaviours are emitted in the presence of stimuli rather than automatically elicited by them. Responses are governed by their consequences, not by antecedent stimuli. This type of behaviour, which Skinner termed Type R to emphasise the response aspect, is operant behaviour because it operates on the environment to produce an effect.

Skinner (1938, p. 21):
If the occurrence of an operant is followed by presentation of a reinforcing stimulus, the strength is increased. . . . If the occurrence of an operant already strengthened through conditioning is not followed by the reinforcing stimulus, the strength is decreased.

Operant behaviour may be thought of as “learning by doing,” and, indeed, much learning occurs as we perform behaviours (Lesgold, 2001). Unlike respondent behaviour, which prior to conditioning does not occur, the probability of occurrence of an operant is never zero, because the response must be made for reinforcement to be provided. Reinforcement changes the likelihood or rate of occurrence of the response. Operant behaviours act upon their environments and become more or less likely to occur because of reinforcement.

Basic Processes

This section examines the basic processes in operant conditioning: reinforcement, extinction, primary and secondary reinforcers, the Premack Principle, punishment, schedules of reinforcement, generalisation, and discrimination.

Reinforcement

Reinforcement is responsible for response strengthening: it increases the rate of responding or makes responses more likely to occur. A reinforcer (or reinforcing stimulus) is any stimulus or event following a response that leads to response strengthening. Reinforcers (rewards) are defined by their effects, which do not depend on mental processes such as consciousness, intentions, or goals (Schultz, 2006). Because reinforcers are defined by their effects, they cannot be determined in advance.

Skinner (1953, pp. 72–73):
The only way to tell whether or not a given event is reinforcing to a given organism under given conditions is to make a direct test. We observe the frequency of a selected response, then make an event contingent upon it and observe any change in frequency. If there is a change, we classify the event as reinforcing to the organism under the existing conditions.

Reinforcers are situationally specific: They apply to individuals at given times under given conditions. What is reinforcing to a particular student during reading now may not be reinforcing during mathematics now or during reading later. Despite this specificity, stimuli or events that reinforce behaviour can be predicted to some extent (Skinner, 1953). Students typically find such events as teacher praise, free time, privileges, stickers, and high marks reinforcing. Nonetheless, one can never know for certain whether a consequence is reinforcing until it is presented after a response and one observes whether behaviour changes.

The basic operant model of conditioning is the three-term contingency:

S^D → R → S^R

A discriminative stimulus (S^D) sets the occasion for a response (R) to be emitted, which is followed by a reinforcing stimulus (S^R, or reinforcement). The reinforcing stimulus is any stimulus (event, consequence) that increases the probability that the response will be emitted in the future when the discriminative stimulus is present. In more familiar terms, this may be labelled the A-B-C model:

A (antecedent) → B (behaviour) → C (consequence)

Positive reinforcement involves presenting a stimulus, or adding something to a situation, after a response, which increases the future likelihood of that response occurring in that situation. A positive reinforcer is a stimulus that, when presented following a response, increases the future likelihood of the response occurring in that situation. In the opening scenario, Leo employs points as positive reinforcers for good behaviour.

In the table, “T” refers to the teacher and “L” to the learner.

Reinforcement and punishment processes.

| Process | Discriminative Stimulus | Response | Reinforcing (Punishing) Stimulus |
|---|---|---|---|
| Positive reinforcement (present positive reinforcer) | T gives independent study time | L studies | T praises L for good work |
| Negative reinforcement (remove negative reinforcer) | T gives independent study time | L studies | T says L does not have to do homework |
| Punishment (present negative reinforcer) | T gives independent study time | L wastes time | T gives homework |
| Punishment (remove positive reinforcer) | T gives independent study time | L wastes time | T says L will miss free time |

Negative reinforcement involves removing a stimulus, or taking something away from a situation after a response, which increases the future likelihood that the response will occur in that situation. A negative reinforcer is a stimulus that, when removed by a response, increases the future likelihood of the response occurring in that situation. Some stimuli that often function as negative reinforcers are bright lights, loud noises, criticism, annoying people, and low marks, because behaviours that remove them tend to be reinforcing. Positive and negative reinforcement have the same effect: They increase the likelihood that the response will be made in the future in the presence of the stimulus.

To illustrate these processes, assume that a teacher is holding a question-and-answer session with the class. The teacher asks a question (S^D or A), calls on a student volunteer who gives the correct answer (R or B), and praises the student (S^R or C). If volunteering by this student increases or remains at a high level, praise is a positive reinforcer and this is an example of positive reinforcement, because giving the praise increased volunteering. Now assume that after a student gives the correct answer, the teacher tells the student he or she need not do the homework assignment. If volunteering by this student increases or remains at a high level, the homework is a negative reinforcer and this is an example of negative reinforcement, because removing the homework increased volunteering.
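The contingency logic above can be captured in a short sketch. This is an illustrative aid rather than part of the theory's formal apparatus; the function name and labels are invented for this example. It classifies a consequence the way the chapter does: by whether a stimulus was added or removed, and by the observed effect on future responding.

```python
def classify_consequence(stimulus_change, responding):
    """Classify an operant consequence by its observed effect.

    stimulus_change: "added" or "removed" (what followed the response)
    responding: "up" or "down" (observed change in future responding)
    """
    if responding == "up":
        # Both forms of reinforcement strengthen responding.
        if stimulus_change == "added":
            return "positive reinforcement"
        return "negative reinforcement"
    # Both forms of punishment weaken responding.
    if stimulus_change == "added":
        return "punishment (negative reinforcer presented)"
    return "punishment (positive reinforcer removed)"

# The praise example: praise is added and volunteering increases.
print(classify_consequence("added", "up"))    # positive reinforcement
# The homework example: homework is removed and volunteering increases.
print(classify_consequence("removed", "up"))  # negative reinforcement
```

Note that the classification depends only on what was observed, consistent with the point that reinforcers are defined by their effects, not identified in advance.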

Positive and Negative Reinforcement

Teachers can use positive and negative reinforcement to motivate students to master skills and spend more time on task. For example, while teaching concepts in a science unit, a teacher might ask students to complete the questions at the end of the chapter. The teacher also might set up activity centres around the room with hands-on experiments related to the lesson. Students would circulate and complete the experiments contingent on successfully answering the chapter questions (positive reinforcement). This contingency reflects the Premack Principle of providing the opportunity to engage in a more-valued activity (experiments) as a reinforcer for engaging in a less-valued one (completing chapter questions). Students who complete 80% of the questions correctly and who participate in at least two experiments need not complete homework. This would function as negative reinforcement to the extent that students perceive homework as a negative reinforcer.

A middle school counsellor working with a student on improving classroom behaviour could have each of the student's teachers mark “yes” or “no” for that day's class behaviour (acceptable, unacceptable). For each “yes,” the student receives 1 minute in the computer laboratory to play computer games (positive reinforcement for this student). At the end of the week, the student can use the earned computer time after lunch. Further, if the student earns at least 15 minutes in the laboratory, he or she need not take a behaviour note home to be signed by parents (this assumes the student perceives a behaviour note as a negative reinforcer).

Extinction

Extinction involves the decline of response strength due to nonreinforcement. Students who raise their hands in class but never get called on may stop raising their hands. People who send numerous e-mail messages to the same individual but never receive a reply may eventually quit sending messages to that person.

How rapidly extinction occurs depends on the reinforcement history (Skinner, 1953). Extinction occurs quickly if few preceding responses have been reinforced; responding is much more durable with a longer history of reinforcement. Extinction is not the same as forgetting. Responses that extinguish can still be performed but are not, owing to the lack of reinforcement. In the preceding examples, the students still know how to raise their hands and the people still know how to send e-mail messages. Forgetting involves a true loss of conditioning over time during which the opportunities for responding have not been present.

Primary and Secondary Reinforcers

Stimuli such as food, water, and shelter are called primary reinforcers because they are necessary for survival. Secondary reinforcers are stimuli that become conditioned through their association with primary reinforcers. A child's favourite milk glass becomes secondarily reinforcing through its association with milk (a primary reinforcer). A secondary reinforcer that becomes paired with more than one primary reinforcer is a generalised reinforcer. People work long hours to earn money (a generalised reinforcer), which they use to purchase many reinforcers (e.g., food, housing, televisions, vacations).

Operant conditioning explains the development and maintenance of much social behaviour with generalised reinforcers. Children may behave in ways that draw adults' attention. Attention is reinforcing because it is paired with primary reinforcers from adults (e.g., food, water, protection). Important educational generalised reinforcers are teachers' praise, high marks, privileges, honours, and degrees. These reinforcers often are paired with other generalised reinforcers, such as approval (from parents and friends) and money (a college degree leads to a good job).

Premack Principle

Recall that a behavioural consequence is labelled reinforcing only after it is applied and its effect on future behaviour is observed. It is somewhat troubling that one must use common sense or trial and error in choosing reinforcers, because one cannot know for certain in advance whether a consequence will function as a reinforcer.

Premack (1962, 1971) described a means of ordering reinforcers that allows one to predict them. The Premack Principle states that the opportunity to engage in a more valued activity reinforces engaging in a less valued activity, where “value” is defined by the amount of responding or time spent on the activity in the absence of reinforcement. If a contingency is arranged such that the value of the second (contingent) event is higher than the value of the first (instrumental) event, the probability of occurrence of the first event is expected to increase (the reward assumption). If the value of the second event is lower than that of the first event, the likelihood of occurrence of the first event should decrease (the punishment assumption).

Suppose that a child is permitted to choose between working on an art project, going to the media centre, reading a book in the classroom, or working at the computer. Over the course of 10 such choices, the child goes to the media centre 6 times, works at the computer 3 times, works on an art project once, and never reads a book in the classroom. For this child, the opportunity to go to the media centre is valued the most. To apply the Premack Principle, a teacher might say to the child, “After you finish reading this book, you can go to the media centre.” Considerable empirical evidence supports Premack's ideas, especially with respect to the reward assumption (Dunham, 1977).

The Premack Principle offers guidance for selecting effective reinforcers: Observe what people do when they have a choice, and order those behaviours by likelihood. The order is not permanent, since the value of reinforcers can change. Any reinforcer, when applied often, can produce satiation and lead to decreased responding. Teachers who use the Premack Principle need to check students' preferences periodically by observing them and asking what they like to do. Determining in advance which reinforcers are likely to be effective in a situation is critical in planning a programme of behavioural change (Timberlake & Farmer-Dougan, 1991).
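The ordering procedure the Premack Principle implies can be sketched in a few lines. This is a minimal illustration under the reward assumption; the function names are invented, and "value" is operationalised simply as free-choice frequency, as the chapter defines it.

```python
from collections import Counter

def rank_by_value(free_choices):
    """Order activities from most to least valued, where value is the
    frequency of free choice in the absence of reinforcement."""
    return [activity for activity, _ in Counter(free_choices).most_common()]

def candidate_reinforcers(free_choices, target_activity):
    """Return the activities valued more than the target; by the Premack
    Principle, access to any of them can reinforce the target activity."""
    ranking = rank_by_value(free_choices)
    cutoff = (ranking.index(target_activity)
              if target_activity in ranking else len(ranking))
    return ranking[:cutoff]

# The chapter's example: 10 free choices by one child.
choices = ["media centre"] * 6 + ["computer"] * 3 + ["art project"]
print(rank_by_value(choices))   # ['media centre', 'computer', 'art project']
# Reading was never chosen, so every observed activity outranks it.
print(candidate_reinforcers(choices, "reading"))
```

Because preferences shift and satiation occurs, the observation step would need to be repeated periodically rather than run once.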

Punishment

Punishment decreases the future likelihood of responding to a stimulus. Punishment may involve withdrawing a positive reinforcer or presenting a negative reinforcer after a response, as shown in Table 'Reinforcement and punishment processes.' Assume that during a question-and-answer session, a student repeatedly bothers another student when the teacher is not watching (teacher not watching = S^D or A; misbehaviour = R or B). The teacher spots the misbehaviour and says, “Stop bothering him” (S^R or C). If the student quits bothering the other student, the teacher's criticism operates as a negative reinforcer and this is an example of punishment, because giving the criticism decreased misbehaviour. But note that from the teacher's perspective, this is an example of negative reinforcement (misbehaviour = S^D or A; criticism = R or B; end of misbehaviour = S^R or C). Because the teacher was negatively reinforced, the teacher is likely to continue to criticise student misbehaviour.

Instead of criticising the student, assume that the teacher says, “You'll have to stay inside during recess today.” If the student's misbehaviour ceases, recess operates as a positive reinforcer and this is an example of punishment, because the loss of recess stopped the misbehaviour. As before, the cessation of student misbehaviour is negatively reinforcing for the teacher.

Punishment suppresses a response but does not eliminate it; when the threat of punishment is removed, the punished response may return. The effects of punishment are complex. Punishment often brings about responses that are incompatible with the punished behaviour and that are strong enough to suppress it (Skinner, 1953). Spanking a child for misbehaving may produce guilt and fear, which can suppress misbehaviour. If the child misbehaves in the future, the conditioned guilt and fear may reappear and lead the child to quickly stop misbehaving. Punishment also conditions responses that lead one to escape or avoid punishment. Students whose teacher criticises incorrect answers soon learn to avoid volunteering answers. Punishment can condition maladaptive behaviours, because punishment does not teach how to behave more productively. Punishment can further hinder learning by creating a conflict such that the individual vacillates between responding one way or another. If the teacher sometimes criticises students for incorrect answers and sometimes does not, students never know when criticism is forthcoming. Such variable behaviour can have emotional by-products (fear, anger, crying) that interfere with learning.

Punishment is often used in schools to deal with disruptions. Common punishments are loss of privileges, removals from the classroom, in- and out-of-school suspensions, and expulsions (Maag, 2001). Yet there are several alternatives to punishment. One is to change the discriminative stimuli for negative behaviour. For example, a student seated in the back of the room may misbehave often; teachers can change the discriminative stimuli by moving the disruptive student to the front of the class. Another alternative is to allow the unwanted behaviour to continue until the perpetrator becomes satiated, which is similar to Guthrie's fatigue method: a parent may allow a child throwing a tantrum to continue until he or she becomes fatigued. A third alternative is to extinguish an unwanted behaviour by ignoring it. This may work well with minor misbehaviours (e.g., students whispering to one another), but when classrooms become disruptive, teachers need to act in other ways. A fourth alternative is to condition incompatible behaviour with positive reinforcement. Teacher praise for productive work habits helps condition those habits. The primary advantage of this alternative over punishment is that it shows the student how to behave adaptively.

Alternatives to punishment.

| Alternative | Example |
|---|---|
| Change the discriminative stimuli | Move misbehaving student away from other misbehaving students. |
| Allow the unwanted behaviour to continue | Have student who stands when he or she should be sitting continue to stand. |
| Extinguish the unwanted behaviour | Ignore minor misbehaviour so that it is not reinforced by teacher attention. |
| Condition an incompatible behaviour | Reinforce learning progress, which occurs only when student is not misbehaving. |

Schedules of Reinforcement

Schedules refer to when reinforcement is applied (Ferster & Skinner, 1957; Skinner, 1938; Zeiler, 1977). A continuous schedule involves reinforcement for every correct response. This may be desirable while skills are being acquired: Students receive feedback after each response concerning the accuracy of their work. Continuous reinforcement helps to ensure that incorrect responses are not learned.

An intermittent schedule involves reinforcing some but not all correct responses. Intermittent reinforcement is common in classrooms, because usually it is not possible for teachers to reinforce each student for every correct or desirable response. Students are not called on every time they raise their hands, are not praised after working each problem, and are not constantly told they are behaving appropriately.

Intermittent schedules are defined in terms of time or number of responses. An interval schedule involves reinforcing the first correct response after a specific time period. In a fixed-interval (FI) schedule, the time interval is constant from one reinforcement to the next. An FI5 schedule means that reinforcement is delivered for the first response made after 5 minutes. Students who receive 30 minutes of leisure time every Friday (contingent on good behaviour during the week) are operating under a fixed-interval schedule. In a variable-interval (VI) schedule, the time interval varies from occasion to occasion around some average value. A VI5 schedule means that on the average, the first correct response after 5 minutes is reinforced, but the time interval varies (e.g., 2, 3, 7, or 8 minutes). Students who receive 30 minutes of leisure time (contingent on good behaviour) on an average of once a week, but not necessarily on the same day each week, are operating under a variable-interval schedule.

A ratio schedule depends on the number of correct responses or rate of responding. In a fixed-ratio (FR) schedule, every nth correct response is reinforced, where n is constant. An FR10 schedule means that every 10th correct response receives reinforcement. In a variable-ratio (VR) schedule, every nth correct response is reinforced, but the value varies around an average number n. A teacher may give free time after every fifth workbook assignment is completed (FR5) or periodically around an average of five completed assignments (VR5).

Reinforcement schedules produce characteristic patterns of responding. In general, ratio schedules produce higher response rates than interval schedules. A limiting factor in ratio schedules is fatigue due to rapid responding. Fixed-interval schedules produce a scalloped pattern: Responding drops off immediately after reinforcement but picks up toward the end of the interval between reinforcements. The variable-interval schedule produces a steady rate of responding. Unannounced quizzes operate on variable-interval schedules and typically keep students studying regularly. Intermittent schedules are more resistant to extinction than continuous schedules: When reinforcement is discontinued, responding continues longer if reinforcement has been intermittent rather than continuous. The durability of intermittent schedules can be seen in people's persistence at such activities as playing slot machines, fishing, and shopping for bargains.
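The four intermittent schedules can be simulated with a small sketch. This is an illustrative simplification, not a formal model from the schedules literature: names are invented, each call represents one correct response (ratio schedules) or one time unit ending in a correct response (interval schedules), and variable schedules simply draw requirements uniformly around the mean n.

```python
import random

def make_schedule(kind, n, seed=0):
    """Return a function reinforce() -> bool implementing a schedule.

    kind: 'FR'/'FI' (fixed) or 'VR'/'VI' (variable around mean n).
    """
    rng = random.Random(seed)

    def draw():
        # Fixed schedules always require n; variable schedules draw a
        # requirement from 1..2n-1, which averages n.
        return n if kind in ("FR", "FI") else rng.randint(1, 2 * n - 1)

    state = {"target": draw(), "count": 0}

    def reinforce():
        state["count"] += 1
        if state["count"] >= state["target"]:
            state["count"], state["target"] = 0, draw()
            return True   # reinforcement delivered
        return False      # no reinforcement this time

    return reinforce

fr5 = make_schedule("FR", 5)
# Every 5th response is reinforced on FR5.
print([fr5() for _ in range(10)])
```

Running a VR5 schedule the same way delivers reinforcement at irregular points averaging one per five responses, which is what makes extinction slower: the learner cannot tell when reinforcement has stopped.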

Generalisation

Once a certain response occurs regularly to a given stimulus, the response also may occur to other stimuli. This is called generalisation (Skinner, 1953). Generalisation seems troublesome for operant theory, because a response should not be made in a situation in which it has never been reinforced. Skinner explained generalisation by noting that people perform many behaviours that lead to the final (reinforced) response. These component behaviours are often part of the behavioural chains of different tasks and therefore are reinforced in different contexts. When people are in a new situation, they are likely to perform the component behaviours, which produce an accurate response or rapid acquisition of the correct response.

For example, students with good academic habits typically come to class, attend to and participate in the activities, take notes, do the required reading, and keep up with the assignments. These component behaviours produce high achievement and marks. When such students begin a new class, it is not necessary that the content be similar to previous classes in which they have been enrolled. Rather, the component behaviours have received repeated reinforcement and thus are likely to generalise to the new setting.

Generalisation, however, does not occur automatically. O’Leary and Drabman (1971) noted that generalisation “must be programmed like any other behavioural change” (p. 393). One problem with many behaviour modification programmes is that they change behaviours but the new behaviours do not generalise outside the training context. O’Leary and Drabman (1971) offer suggestions on ways to facilitate generalisation.

Discrimination

Discrimination, the complementary process to generalisation, involves responding differently (in intensity or rate) depending on the stimulus or features of a situation (Rilling, 1977). Although teachers want students to generalise what they learn to other situations, they also want them to respond discriminately. In solving mathematical word problems, teachers might want students to adopt a general problem-solving approach with steps such as determining the given and the needed information, drawing a picture, and generating useful formulae. Teachers also want students to learn to discriminate problem types (e.g., area, time-rate-distance, interest rate). Being able to identify the type of problem quickly enhances students' success.

Generalisation

Generalisation can advance skill development across subject areas. Finding main ideas is relevant to language arts, social studies, mathematics (word problems), and other content areas. A language arts teacher might provide students with a strategy for finding main ideas. Once students master this strategy, the teacher explains how to modify its use for other academic subjects and asks students to think of uses. By teaching the strategy well in one domain and facilitating potential applications in other domains, teachers save much time and effort because they do not have to teach the strategy in each content area.

Teaching expected behaviours (e.g., walking in the hall, raising a hand to speak) can also be generalised. For example, if all seventh-grade teachers decide to have students use the same format for the heading on their papers, it could be explained in one class. Students could then be asked to use the same format (with minor alterations) in each of their other classes.

Suggestions for facilitating generalisation.

| Name | Purpose |
|---|---|
| Parental involvement | Involve parents in behavioural change programmes. |
| High expectations | Convey to students that they are capable of performing well. |
| Self-evaluation | Teach students to monitor and evaluate their behaviours. |
| Contingencies | Withdraw artificial contingencies (e.g., points) and replace them with natural ones (privileges). |
| Participation | Allow students to participate in specifying the behaviours to be reinforced and the reinforcement contingencies. |
| Academics | Provide a good academic programme, because many students with behaviour problems have academic deficiencies. |
| Benefits | Show students how behavioural changes will benefit them by linking changes to activities of interest. |
| Reinforcement | Reinforce students in different settings to reduce discrimination between reinforced and nonreinforced situations. |
| Consistency | Prepare teachers in regular classes to continue shaping the behaviours of students from special classes after they are mainstreamed into the regular programme. |

Spence (1936) proposed that to teach discrimination, desired responses should be reinforced and unwanted responses extinguished through nonreinforcement. In school, teachers point out similarities and differences among similar content and provide periodic reviews to ensure that students discriminate properly and apply correct problem-solution methods.

Errors generally are thought to be disruptive and to produce learning of incorrect responses. This suggests that student errors should be kept to a minimum. Whether all errors need to be eliminated is debatable. Motivation research shows that students who learn to deal with errors in an adaptive manner subsequently persist longer on difficult tasks than do students who have experienced errorless learning (Dweck, 1975).

Behavioural Change

Reinforcement can be given for making correct responses only when people know which actions are required. Often, however, operant responses do not appear in their final, polished form. If teachers waited to deliver reinforcement until learners produced the desired responses, many learners would never receive reinforcement because they would never acquire the responses in the first place. We now discuss how behavioural change occurs in operant conditioning, a topic with important implications for learning.

Successive Approximations (Shaping)

The basic operant conditioning method for altering behaviour is shaping, also known as differential reinforcement of successive approximations to the desired form or rate of behaviour (Morse & Kelleher, 1977). To shape behaviour, follow this sequence:

  • Identify what the student can do now (initial behaviour)
  • Identify the desired behaviour
  • Identify potential reinforcers in the student's environment
  • Break the desired behaviour into small substeps to be mastered sequentially
  • Move the student from the initial behaviour to the desired behaviour by successively reinforcing each approximation to the desired behaviour

Shaping is learning through doing with corrective feedback. A natural example of shaping can be seen in a student trying to shoot a basketball from a given spot on the court. The first shot falls short of the basket. The student shoots harder on the second attempt, and the ball hits the backboard. On the third attempt, the student uses less force, and the ball hits the right rim and bounces off. On the fourth attempt, the student uses the same force as on the third but aims farther left; the ball hits the left rim and bounces off. Finally, the student uses the same force while aiming slightly to the right, and the ball goes into the basket. The shot was gradually refined to an accurate form.

Shaping can be applied systematically with a hyperactive student who can concentrate on a task for only a couple of minutes before becoming distracted. The goal is to shape the student's behaviour so that he or she can work uninterrupted for 30 minutes. Initially, the teacher delivers a reinforcer when the student works productively for two minutes. After several successful two-minute intervals, the criterion for reinforcement is raised to three minutes. Assuming the student works uninterrupted for several three-minute periods, the criterion is raised to four minutes. The process continues toward the 30-minute goal, contingent on the student performing reliably at each criterion level. If the student has difficulty at any point, the criterion is lowered to a level at which he or she can perform successfully.
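The criterion-adjustment logic in this example can be sketched as a small simulation. The particular numbers (start at two minutes, raise the criterion after a run of successes, drop it back on failure) follow the scenario above; the function name and the three-successes rule are illustrative assumptions, not part of the theory.

```python
def shape_on_task(trials, start=2, goal=30, successes_needed=3):
    """Simulate raising a reinforcement criterion toward a goal.

    trials: on-task durations (minutes) observed in sequence.
    Returns the criterion in effect at each trial.
    """
    criterion = start
    streak = 0
    history = []
    for minutes in trials:
        history.append(criterion)
        if minutes >= criterion:
            # Criterion met: reinforce, and after enough successes raise the bar
            streak += 1
            if streak >= successes_needed and criterion < goal:
                criterion += 1
                streak = 0
        else:
            # Difficulty: lower the criterion to a level the student can meet
            criterion = max(start, criterion - 1)
            streak = 0
    return history
```

For example, `shape_on_task([2, 2, 2, 3, 3, 3])` holds the criterion at 2 minutes for three successes, then raises it to 3; a failed trial drops it back toward the starting level.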

An academic skill amenable to shaping is teaching a student the multiplication facts for 6. Assume he currently knows only 6 × 0 and 6 × 1. To earn reinforcement, he must correctly recite these two facts, together with 6 × 2. Once he does this reliably, the criterion for reinforcement is raised to include 6 × 3. The process continues until he accurately recites all the facts through 6 × 9.

Chaining

Most human actions are complex, comprising several three-term contingencies (A–B–C) linked in succession. For instance, shooting a basketball involves dribbling, turning, assuming the set position, jumping, and releasing the ball. Each response alters the environment, and this altered condition serves as the stimulus for the next response. Chaining is the process of producing or altering some of the variables that then serve as stimuli for future responses (Skinner, 1953). A chain consists of a series of operants, each of which sets the occasion for further responses.

Chains bear resemblance to Guthrie's acts, whereas individual three-term contingencies are analogous to movements. Certain chains acquire a functional unity, with the chain constituting an integrated sequence, such that successful execution thereof defines a skill. When skills are well-honed, the execution of the chain transpires automatically. The act of riding a bicycle comprises several discrete actions, yet an accomplished rider executes these with negligible or non-existent conscious effort. Such automaticity is frequently present in cognitive skills (e.g., reading, solving mathematical problems). Chaining plays a critical role in the acquisition of skills (Gollub, 1977; Skinner, 1978).
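The point that each response's outcome becomes the discriminative stimulus for the next link can be sketched as a linked sequence. The basketball steps and the tuple representation here are illustrative assumptions, not part of operant theory's formal apparatus:

```python
# Each link is a three-term contingency: (discriminative stimulus, response, outcome).
# The outcome of one link serves as the stimulus for the next, forming a chain.
SHOT_CHAIN = [
    ("ball in hands", "dribble", "ball under control"),
    ("ball under control", "turn and set", "squared to basket"),
    ("squared to basket", "jump and release", "shot taken"),
]

def run_chain(chain, start_state):
    """Execute each link whose stimulus matches the current state."""
    state = start_state
    performed = []
    for stimulus, response, outcome in chain:
        if stimulus != state:
            break  # chain interrupted: the cue for this link is absent
        performed.append(response)
        state = outcome  # the outcome cues the next response
    return performed, state
```

Running `run_chain(SHOT_CHAIN, "ball in hands")` performs all three responses and ends in the "shot taken" state; starting from any other state breaks the chain immediately, mirroring how a missing discriminative stimulus interrupts a behavioural sequence.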

Behaviour Modification

Behaviour modification (or behaviour therapy) refers to the systematic application of behavioural learning principles to facilitate adaptive behaviours (Ullmann & Krasner, 1965). Behaviour modification has been employed with adults and children in such diverse contexts as classrooms, counselling settings, prisons, and mental hospitals. It has been used to treat phobias, dysfunctional language, disruptive behaviours, negative social interactions, poor child rearing, and low self-control (Ayllon & Azrin, 1968; Becker, 1971; Keller & Ribes-Inesta, 1974; Ulrich, Stachnik, & Mabry, 1966). Lovaas (1977) successfully employed behaviour modification to teach language to autistic children.

Behaviour modification for disruptive students is difficult because such students may display few appropriate behaviours to be positively reinforced. A teacher might employ shaping to address a specific annoying behaviour. Mrs. Kathy Stone has been having problems with Erik, who continually pushes and shoves other students when the class lines up to go somewhere in the building. When the class is going only a short distance, Mrs. Stone could tell Erik that if he stays in line without pushing and shoving, he will be the line leader on the return to class, but if he pushes or shoves, he will immediately be removed from the line. This procedure can be repeated until Erik can manage short distances. Mrs. Stone can then allow him to walk with the class for progressively longer distances until he can behave in line for any distance.

Sarah, another child in Mrs. Kathy Stone's class, frequently submits untidy work. Mrs. Stone might employ generalised reinforcers, such as special stickers (exchangeable for various privileges), to assist Sarah, whose work typically is dirty, torn, and barely legible. Sarah is told that if she submits a paper that is clean, she earns one sticker; if it is not torn, another sticker; and if the writing is neat, a third sticker. Once Sarah begins to improve, Mrs. Stone can gradually shift the rewards to other areas of improvement (e.g., correct work, completing work on time).

Techniques

The basic techniques of behaviour modification include reinforcement of desired behaviours and extinction of undesired ones. Punishment is rarely employed, but when used, more often involves removing a positive reinforcer rather than presenting a negative reinforcer.

In deciding upon a program of change, behaviour modifiers typically focus upon the following three issues (Ullmann & Krasner, 1965):

  • Which of the individual’s behaviours are maladaptive, and which should be increased (decreased)?
  • What environmental contingencies currently support the individual’s behaviours (either to maintain undesirable behaviours or to reduce the likelihood of performing more adaptive responses)?
  • What environmental features can be altered to change the individual’s behaviour?

Change is most likely when modifiers and clients agree that a change is needed and jointly determine the desired goals. The first step in establishing a program is to define the problem in behavioural terms. For example, the statement "Keith is out of his seat too often" refers to overt behaviour that can be measured: One can keep a record of how long Keith is out of his seat. General expressions referring to unobservables ("Keith has a bad attitude") do not permit objective problem definition.

The next step is to determine the reinforcers maintaining the undesirable behaviour. Perhaps Keith receives teacher attention only when he is out of his seat and not while he is seated. A straightforward plan is to have the teacher attend to Keith while he is seated and engaged in academic work and ignore him when he is out of his seat. If the frequency with which Keith is out of his seat diminishes, teacher attention is serving as a positive reinforcer.

A behaviour modification program might employ generalised reinforcers such as points that students exchange for backup reinforcers (e.g., tangible rewards, free time, privileges). Having more than one backup helps ensure that at least one will be effective for each student at all times. A behavioural criterion must be established for earning reinforcement. The five-step shaping procedure (discussed previously) can be employed: The criterion is initially defined at the level of the initial behaviour and progresses in small increments toward the desired behaviour. The student earns a point each time the criterion is met. To extinguish Keith's undesirable behaviour, the teacher should not give him excessive attention when he is out of his seat but instead tell him privately that, because he has not met the criterion, he does not earn a point.
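A point system of the kind described can be sketched as follows. The backup reinforcer names and point costs are invented for illustration; any real program would set these with the students involved:

```python
# Hypothetical token economy: points are earned for meeting the behavioural
# criterion and exchanged for backup reinforcers (names and costs illustrative).
BACKUPS = {"free time": 5, "computer privilege": 8, "small prize": 12}

class TokenEconomy:
    def __init__(self):
        self.points = 0

    def record_trial(self, criterion_met):
        """Award one point when the behavioural criterion is satisfied."""
        if criterion_met:
            self.points += 1
        return self.points

    def exchange(self, backup):
        """Trade points for a backup reinforcer, if the student can afford it."""
        cost = BACKUPS[backup]
        if self.points >= cost:
            self.points -= cost
            return True
        return False
```

Offering several backups at different costs reflects the point in the text that no single reinforcer is effective for every student at all times.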

Punishment is used infrequently, but it may be necessary when behaviour becomes so disruptive that it cannot be ignored (e.g., fighting). A common punishment is time-out (from reinforcement), during which the student is removed from the class social context. The student continues to do academic work but without peer social interaction or the opportunity to earn reinforcement. Another punishment is to remove positive reinforcers (e.g., free time, recess, privileges) for misbehaviour.

Critics have argued that behaviour modification shapes quiet and docile behaviours (Winett & Winkler, 1972). Although some degree of quiet is necessary to ensure that learning occurs, some teachers seek a quiet classroom at all times, even when some noise from social interactions would facilitate learning. The use of behaviour modification is inherently neither good nor bad. It can produce a quiet classroom or promote social initiations by withdrawn children (Strain, Kerr, & Ragland, 1981). Like the techniques themselves, the goals of behaviour modification need to be considered carefully by those implementing the procedures.

Cognitive Behaviour Modification

Researchers also have incorporated cognitive elements into behaviour modification procedures. In cognitive behaviour modification, learners’ thoughts (when verbalized) function as discriminative and reinforcing stimuli. Thus, learners may verbally instruct themselves what to do and then perform the appropriate behaviour. Cognitive behaviour modification techniques often are applied with students with handicaps (Hallahan, Kneedler, & Lloyd, 1983), and used to reduce hyperactivity and aggression (Robinson, Smith, Miller, & Brownell, 1999). Meichenbaum’s (1977) self-instructional training is an instance of cognitive behaviour modification.

Self-Regulation

Operant conditioning also addresses self-regulation (Mace, Belfiore, & Hutchinson, 2001; Mace, Belfiore, & Shea, 1989). This perspective is covered in depth in Chapter 9. Operant theory contends that self-regulated behaviour involves choosing among alternative courses of action (Brigham, 1982), typically by deferring an immediate reinforcer in favour of a different, and usually greater, future reinforcer. For example, Trisha stays home on Friday night to study for an examination instead of going out with friends, and Kyle keeps working on an academic task despite taunting peers nearby. They are deferring immediate reinforcement for anticipated future reinforcement, as is John in the next example.

John is having difficulty studying. Despite good intentions, he spends insufficient time studying and is easily distracted. A key to changing his behaviour is to establish discriminative stimuli (cues) for studying. With the assistance of his high school counsellor, John establishes a definite time and place for studying (7 P.M. to 9 P.M. in his room with one 10-minute break). To eliminate distracting cues, John agrees not to use his cell phone, CD player, computer, or TV during this period. For reinforcement, John awards himself one point for each night he successfully accomplishes his routine. When he accumulates 10 points, he can take a night off.

From an operant conditioning perspective, one decides which behaviours to regulate, establishes discriminative stimuli for their occurrence, evaluates performance in terms of whether it matches the standard, and administers reinforcement. As discussed, the three key subprocesses are self-monitoring (deliberately attending to selected aspects of one's behaviour), self-instruction (discriminative stimuli that set the occasion for self-regulatory responses leading to reinforcement), and self-reinforcement (reinforcing oneself for performing a correct response).