Similar articles
20 similar articles found (search time: 31 ms)
1.
Relative reward preference in primate orbitofrontal cortex   Cited by: 33 (self-citations: 0, other citations: 33)
Tremblay L  Schultz W 《Nature》1999,398(6729):704-708
The orbital part of prefrontal cortex appears to be crucially involved in the motivational control of goal-directed behaviour. Patients with lesions of orbitofrontal cortex show impairments in making decisions about the expected outcome of actions. Monkeys with orbitofrontal lesions respond abnormally to changes in reward expectations and show altered reward preferences. As rewards constitute basic goals of behaviour, we investigated here how neurons in the orbitofrontal cortex of monkeys process information about liquid and food rewards in a typical frontal task, spatial delayed responding. The activity of orbitofrontal neurons increases in response to reward-predicting signals, during the expectation of rewards, and after the receipt of rewards. Neurons discriminate between different rewards, mainly irrespective of the spatial and visual features of reward-predicting stimuli and behavioural reactions. Most reward discriminations reflect the animals' relative preference among the available rewards, as expressed by their choice behaviour, rather than physical reward properties. Thus, neurons in the orbitofrontal cortex appear to process the motivational value of rewarding outcomes of voluntary action.

2.
Dopamine responses comply with basic assumptions of formal learning theory.   Cited by: 25 (self-citations: 0, other citations: 25)
P Waelti  A Dickinson  W Schultz 《Nature》2001,412(6842):43-48
According to contemporary learning theories, the discrepancy, or error, between the actual and predicted reward determines whether learning occurs when a stimulus is paired with a reward. The role of prediction errors is directly demonstrated by the observation that learning is blocked when the stimulus is paired with a fully predicted reward. By using this blocking procedure, we show that the responses of dopamine neurons to conditioned stimuli were governed differentially by the occurrence of reward prediction errors rather than stimulus-reward associations alone, as was the learning of behavioural reactions. Both behavioural and neuronal learning occurred predominantly when dopamine neurons registered a reward prediction error at the time of the reward. Our data indicate that the use of analytical tests derived from formal behavioural learning theory provides a powerful approach for studying the role of single neurons in learning.
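The blocking logic summarized above follows directly from the prediction-error rule of formal learning theory (Rescorla-Wagner). The minimal Python sketch below simulates a generic blocking design to illustrate that rule; the learning rate, reward magnitude, stimulus labels and trial counts are arbitrary assumptions, not the authors' parameters.

```python
# Minimal Rescorla-Wagner sketch of a blocking design: stimulus A is pretrained
# with reward, then the compound AX is rewarded. Because A already predicts the
# reward, the prediction error is near zero and X acquires little value, while
# the control stimulus Y (compounded with the untrained B) does acquire value.
alpha = 0.2        # learning rate (assumed)
reward = 1.0       # reward magnitude (assumed)
V = {"A": 0.0, "B": 0.0, "X": 0.0, "Y": 0.0}

def trial(stimuli, r):
    """One conditioning trial: error = actual reward minus summed prediction."""
    error = r - sum(V[s] for s in stimuli)
    for s in stimuli:
        V[s] += alpha * error          # every stimulus present shares the update
    return error

for _ in range(50):                    # phase 1: A+ rewarded, B- unrewarded
    trial(["A"], reward)
    trial(["B"], 0.0)

for _ in range(50):                    # phase 2: compounds AX+ and BY+ rewarded
    trial(["A", "X"], reward)
    trial(["B", "Y"], reward)

print({s: round(v, 2) for s, v in V.items()})   # X stays near 0 (blocked); Y does not
```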

3.
Komura Y  Tamura R  Uwano T  Nishijo H  Kaga K  Ono T 《Nature》2001,412(6846):546-549
Reward is important for shaping goal-directed behaviour. After stimulus-reward associative learning, an organism can assess the motivational value of the incoming stimuli on the basis of past experience (retrospective processing), and predict forthcoming rewarding events (prospective processing). The traditional role of the sensory thalamus is to relay current sensory information to cortex. Here we find that non-primary thalamic neurons respond to reward-related events in two ways. The early, phasic responses occurred shortly after the onset of the stimuli and depended on the sensory modality. Their magnitudes resisted extinction and correlated with the learning experience. The late responses gradually increased during the cue and delay periods, and peaked just before delivery of the reward. These responses were independent of sensory modality and were modulated by the value and timing of the reward. These observations provide new evidence that single thalamic neurons can code for the acquired significance of sensory stimuli in the early responses (retrospective coding) and predict upcoming reward value in the late responses (prospective coding).

4.
Burke KA  Franz TM  Miller DN  Schoenbaum G 《Nature》2008,454(7202):340-344
Cues that reliably predict rewards trigger the thoughts and emotions normally evoked by those rewards. Humans and other animals will work, often quite hard, for these cues. This is termed conditioned reinforcement. The ability to use conditioned reinforcers to guide our behaviour is normally beneficial; however, it can go awry. For example, corporate icons, such as McDonald's Golden Arches, influence consumer behaviour in powerful and sometimes surprising ways, and drug-associated cues trigger relapse to drug seeking in addicts and animals exposed to addictive drugs, even after abstinence or extinction. Yet, despite their prevalence, it is not known how conditioned reinforcers control human or other animal behaviour. One possibility is that they act by evoking the specific rewards they predict; alternatively, they could control behaviour directly by activating emotions that are independent of any specific reward. In other words, the Golden Arches may drive business because they evoke thoughts of hamburgers and fries, or instead, may be effective because they also evoke feelings of hunger or happiness. Moreover, different brain circuits could support conditioned reinforcement mediated by thoughts of specific outcomes versus more general affective information. Here we have attempted to address these questions in rats. Rats were trained to learn that different cues predicted different rewards using specialized conditioning procedures that controlled whether the cues evoked thoughts of specific outcomes or general affective representations common to different outcomes. Subsequently, these rats were given the opportunity to press levers to obtain short and otherwise unrewarded presentations of these cues. We found that rats were willing to work for cues that evoked either outcome-specific or general affective representations. Furthermore, the orbitofrontal cortex, a prefrontal region important for adaptive decision-making, was critical for the former but not for the latter form of conditioned reinforcement.

5.
Environmental stimuli that are reliably associated with the effects of many abused drugs, especially stimulants such as cocaine, can produce craving and relapse in abstinent human substance abusers. In animals, such cues can induce and maintain drug-seeking behaviour and also reinstate drug-seeking after extinction. Reducing the motivational effects of drug-related cues might therefore be useful in the treatment of addiction. Converging pharmacological, human post-mortem and genetic studies implicate the dopamine D3 receptor in drug addiction. Here we have designed BP 897, the first D3-receptor-selective agonist, as assessed in vitro with recombinant receptors and in vivo with mice bearing disrupted D3-receptor genes. BP 897 is a partial agonist in vitro and acts in vivo as either an agonist or an antagonist. We show that BP 897 inhibits cocaine-seeking behaviour that depends upon the presentation of drug-associated cues, without having any intrinsic, primary rewarding effects. Our data indicate that compounds like BP 897 could be used for reducing the drug craving and vulnerability to relapse that are elicited by drug-associated environmental stimuli.

6.
Matsumoto M  Hikosaka O 《Nature》2007,447(7148):1111-1115
Midbrain dopamine neurons are key components of the brain's reward system, which is thought to guide reward-seeking behaviours. Although recent studies have shown how dopamine neurons respond to rewards and sensory stimuli predicting reward, it is unclear which parts of the brain provide dopamine neurons with signals necessary for these actions. Here we show that the primate lateral habenula, part of the structure called the epithalamus, is a major candidate for a source of negative reward-related signals in dopamine neurons. We recorded the activity of habenula neurons and dopamine neurons while rhesus monkeys were performing a visually guided saccade task with positionally biased reward outcomes. Many habenula neurons were excited by a no-reward-predicting target and inhibited by a reward-predicting target. In contrast, dopamine neurons were excited and inhibited by reward-predicting and no-reward-predicting targets, respectively. Each time the rewarded and unrewarded positions were reversed, both habenula and dopamine neurons reversed their responses as the bias in saccade latency reversed. In unrewarded trials, the excitation of habenula neurons started earlier than the inhibition of dopamine neurons. Furthermore, weak electrical stimulation of the lateral habenula elicited strong inhibitions in dopamine neurons. These results suggest that the inhibitory input from the lateral habenula plays an important role in determining the reward-related activity of dopamine neurons.

7.
Subsecond dopamine release promotes cocaine seeking   Cited by: 25 (self-citations: 0, other citations: 25)
Phillips PE  Stuber GD  Heien ML  Wightman RM  Carelli RM 《Nature》2003,422(6932):614-618
The dopamine-containing projection from the ventral tegmental area of the midbrain to the nucleus accumbens is critically involved in mediating the reinforcing properties of cocaine. Although neurons in this area respond to rewards on a subsecond timescale, neurochemical studies have only addressed the role of dopamine in drug addiction by examining changes in the tonic (minute-to-minute) levels of extracellular dopamine. To investigate the role of phasic (subsecond) dopamine signalling, we measured dopamine every 100 ms in the nucleus accumbens using electrochemical technology. Rapid changes in extracellular dopamine concentration were observed at key aspects of drug-taking behaviour in rats. Before lever presses for cocaine, there was an increase in dopamine that coincided with the initiation of drug-seeking behaviours. Notably, these behaviours could be reproduced by electrically evoking dopamine release on this timescale. After lever presses, there were further increases in dopamine concentration at the concurrent presentation of cocaine-related cues. These cues alone also elicited similar, rapid dopamine signalling, but only in animals where they had previously been paired with cocaine delivery. These findings reveal an unprecedented role for dopamine in the regulation of drug taking in real time.

8.
Tye KM  Stuber GD  de Ridder B  Bonci A  Janak PH 《Nature》2008,453(7199):1253-1257
What neural changes underlie individual differences in goal-directed learning? The lateral amygdala (LA) is important for assigning emotional and motivational significance to discrete environmental cues, including those that signal rewarding events. Recognizing that a cue predicts a reward enhances an animal's ability to acquire that reward; however, the cellular and synaptic mechanisms that underlie cue-reward learning are unclear. Here we show that marked changes in both cue-induced neuronal firing and input-specific synaptic strength occur with the successful acquisition of a cue-reward association within a single training session. We performed both in vivo and ex vivo electrophysiological recordings in the LA of rats trained to self-administer sucrose. We observed that reward-learning success increased in proportion to the number of amygdala neurons that responded phasically to a reward-predictive cue. Furthermore, cue-reward learning induced an AMPA (alpha-amino-3-hydroxy-5-methyl-isoxazole propionic acid)-receptor-mediated increase in the strength of thalamic, but not cortical, synapses in the LA that was apparent immediately after the first training session. The level of learning attained by individual subjects was highly correlated with the degree of synaptic strength enhancement. Importantly, intra-LA NMDA (N-methyl-d-aspartate)-receptor blockade impaired reward-learning performance and attenuated the associated increase in synaptic strength. These findings provide evidence of a connection between LA synaptic plasticity and cue-reward learning, potentially representing a key mechanism underlying goal-directed behaviour.

9.
Electrophysiological studies have utilized event-related brain potentials (ERPs) to investigate neural processes related to the evaluation of the outcome of behavioral performance or to the evaluation of external feedback. The feedback-related negativity (FRN) in brain potentials has been shown to be sensitive to information indicating monetary loss or negative feedback. Since monetary loss usually indicates both the consequence of previous performance and the reward value of stimuli, it is controversial whether the FRN reflects the cognitive process of error detection per se and/or the motivational/affective process related to the subjective evaluation of the error. This study manipulated the motivational/affective significance of negative feedback by penalizing errors in a context-dependent way in a line judgment task. Participants could lose more money in the loss incentive condition or win less money in the win incentive condition if their subsequent judgment of line segments was less accurate, whereas they could receive performance feedback but without monetary incentive in the neutral condition. Results showed that the size of the FRN effect as well as the size of the P300 effect, as assessed by comparing brain responses to the error trials with the responses to the correct trials, increased linearly over the loss, neutral, and win conditions, suggesting that the FRN is sensitive to the motivational/affective evaluation of the performance outcome.

11.
S Bao  V T Chan  M M Merzenich 《Nature》2001,412(6842):79-83
Representations of sensory stimuli in the cerebral cortex can undergo progressive remodelling according to the behavioural importance of the stimuli. The cortex receives widespread projections from dopamine neurons in the ventral tegmental area (VTA), which are activated by new stimuli or unpredicted rewards, and are believed to provide a reinforcement signal for such learning-related cortical reorganization. In the primary auditory cortex (AI) dopamine release has been observed during auditory learning that remodels the sound-frequency representations. Furthermore, dopamine modulates long-term potentiation, a putative cellular mechanism underlying plasticity. Here we show that stimulating the VTA together with an auditory stimulus of a particular tone increases the cortical area and selectivity of the neural responses to that sound stimulus in AI. Conversely, the AI representations of nearby sound frequencies are selectively decreased. Strong, sharply tuned responses to the paired tones also emerge in a second cortical area, whereas the same stimuli evoke only poor or non-selective responses in this second cortical field in naive animals. In addition, we found that strong long-range coherence of neuronal discharge emerges between AI and this secondary auditory cortical area.

12.
Murtra P  Sheasby AM  Hunt SP  De Felipe C 《Nature》2000,405(6783):180-183
Modulation of substance P activity offers a radical new approach to the management of depression, anxiety and stress. The substance P receptor is highly expressed in areas of the brain that are implicated in these behaviours, but also in other areas such as the nucleus accumbens which mediate the motivational properties of both natural rewards such as food and of drugs of abuse such as opiates. Here we show a loss of the rewarding properties of morphine in mice with a genetic disruption of the substance P receptor. The loss was specific to morphine, as both groups of mice responded when cocaine or food were used as rewards. The physical response to opiate withdrawal was also reduced in substance P receptor knockout mice. We conclude that substance P has an important and specific role in mediating the motivational aspects of opiates and may represent a new pharmacological route for the control of drug abuse.

13.
Wynne CD 《Nature》2004,428(6979):140; discussion 140
Brosnan and de Waal report that capuchin monkeys show evidence of a sense of fairness or 'inequity aversion' because they rejected a less preferred reward when they saw a partner monkey receive a preferred reward for the same task. However, this does not show that monkeys are averse to inequity, only that they reject a lesser reward when better rewards are available. There are risks inherent in seeking anthropomorphic explanations for non-human behaviour.

14.
Mesolimbic dopamine-releasing neurons appear to be important in the brain reward system. One behavioural paradigm that supports this hypothesis is intracranial self-stimulation (ICS), during which animals repeatedly press a lever to stimulate their own dopamine-releasing neurons electrically. Here we study dopamine release from dopamine terminals in the nucleus accumbens core and shell in the brain by using rapid-responding voltammetric microsensors during electrical stimulation of dopamine cell bodies in the ventral tegmental area/substantia nigra brain regions. In rats in which stimulating electrode placement failed to elicit dopamine release in the nucleus accumbens, ICS behaviour was not learned. In contrast, ICS was acquired when stimulus trains evoked extracellular dopamine in either the core or the shell of the nucleus accumbens. In animals that could learn ICS, experimenter-delivered stimulation always elicited dopamine release. In contrast, extracellular dopamine was rarely observed during ICS itself. Thus, although activation of mesolimbic dopamine-releasing neurons seems to be a necessary condition for ICS, evoked dopamine release is actually diminished during ICS. Dopamine may therefore be a neural substrate for novelty or reward expectation rather than reward itself.

15.
Cohen JY  Haesler S  Vong L  Lowell BB  Uchida N 《Nature》2012,482(7383):85-88
Dopamine has a central role in motivation and reward. Dopaminergic neurons in the ventral tegmental area (VTA) signal the discrepancy between expected and actual rewards (that is, reward prediction error), but how they compute such signals is unknown. We recorded the activity of VTA neurons while mice associated different odour cues with appetitive and aversive outcomes. We found three types of neuron based on responses to odours and outcomes: approximately half of the neurons (type I, 52%) showed phasic excitation after reward-predicting odours and rewards in a manner consistent with reward prediction error coding; the other half of neurons showed persistent activity during the delay between odour and outcome that was modulated positively (type II, 31%) or negatively (type III, 18%) by the value of outcomes. Whereas the activity of type I neurons was sensitive to actual outcomes (that is, when the reward was delivered as expected compared to when it was unexpectedly omitted), the activity of type II and type III neurons was determined predominantly by reward-predicting odours. We 'tagged' dopaminergic and GABAergic neurons with the light-sensitive protein channelrhodopsin-2 and identified them based on their responses to optical stimulation while recording. All identified dopaminergic neurons were of type I and all GABAergic neurons were of type II. These results show that VTA GABAergic neurons signal expected reward, a key variable for dopaminergic neurons to calculate reward prediction error.
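As a toy illustration of the computation proposed here, a dopamine-like prediction error can be read out by subtracting an expected-reward signal (the quantity carried by the type II GABAergic neurons) from the delivered outcome; the numbers below are arbitrary assumptions, not recorded values.

```python
# Toy illustration only: prediction error = actual outcome minus expected reward.
def prediction_error(actual_reward, expected_reward):
    return actual_reward - expected_reward

print(prediction_error(1.0, 0.8))   # predicted reward delivered -> small positive error
print(prediction_error(0.0, 0.8))   # predicted reward omitted   -> negative error
print(prediction_error(1.0, 0.0))   # unexpected reward          -> large positive error
```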

16.
To address the slow convergence of reinforcement learning algorithms and the need for better reward-function design, a new reinforcement learning algorithm is proposed. The new algorithm uses action scores as the basis on which the learning agent selects actions. Action scores are more flexible than traditional state values, so it is easier to design a better-optimized reward function around them and thereby improve learning performance. Building on the action scores, exponential and logarithmic functions are used to dynamically determine the reward value and the discount factor, speeding up the agent's selection of the optimal action. A maze-navigation computer simulation shows that the new algorithm markedly reduces the number of actions the agent executes in the trials before convergence, improving the convergence speed.
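The abstract does not give the exact update rule, so the sketch below is only one plausible reading of the idea: a Q-learning-style agent whose action scores drive selection, with the reward and discount factor shaped dynamically by exponential and logarithmic functions. The specific functional forms, constants and action set are assumptions, not the paper's algorithm.

```python
import math
import random

# Hypothetical sketch: action scores (rather than plain state values) drive
# action selection; exponential / logarithmic functions dynamically shape the
# reward and the discount factor. All functional forms below are assumptions.
ACTIONS = ["up", "down", "left", "right"]
score = {}                                   # (state, action) -> action score

def choose(state, epsilon=0.1):
    """Epsilon-greedy selection on action scores."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: score.get((state, a), 0.0))

def update(state, action, base_reward, next_state, alpha=0.5):
    s = score.get((state, action), 0.0)
    shaped_reward = base_reward * math.exp(-0.1 * max(s, 0.0))   # assumed exponential shaping
    gamma = 0.9 * (1.0 - 1.0 / math.log(max(s, 0.0) + math.e))   # assumed logarithmic discount
    best_next = max(score.get((next_state, a), 0.0) for a in ACTIONS)
    score[(state, action)] = s + alpha * (shaped_reward + gamma * best_next - s)

# Toy usage on hypothetical maze cells "S0" -> "S1"
update("S0", choose("S0"), base_reward=1.0, next_state="S1")
print(score)
```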

17.
An incentive strategy based on a credit model in hybrid P2P environments   Cited by: 2 (self-citations: 0, other citations: 2)
To address the free-riding problem that plagues P2P file-sharing systems, an incentive model based on node credit is constructed, in which every node is a credit entity. A credit-revenue function is introduced so that a node allocates resources according to the credit values of the requesters in a way that maximizes its credit revenue, and a backtracking algorithm is applied to solve this maximization problem. A credit-decay mechanism is introduced to prevent 'inflation' of credit values and to strengthen the incentive effect, and the model specifies how node credit is computed. To remedy a shortcoming of current evaluation criteria for incentive mechanisms, a new evaluation metric, the effective resource utilization rate, is added. Experiments show that the model effectively suppresses free-riding in P2P systems and improves system efficiency.
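To make the two mechanisms concrete, the hypothetical sketch below shows (1) a node choosing, by backtracking, the subset of requests that maximizes its credit revenue under a bandwidth limit, and (2) exponential decay of stored credit to prevent "inflation". The revenue definition, decay rate and capacities are assumptions rather than the paper's exact model.

```python
# Hypothetical sketch: credit-revenue-maximizing allocation via backtracking,
# plus a simple credit-decay rule. Revenue here = requester credit (assumed).
def best_allocation(requests, capacity):
    """requests: list of (requester_credit, bandwidth_cost); returns (revenue, indices)."""
    best = (0.0, [])

    def backtrack(i, used, revenue, chosen):
        nonlocal best
        if revenue > best[0]:
            best = (revenue, list(chosen))
        if i == len(requests):
            return
        credit, cost = requests[i]
        if used + cost <= capacity:                  # branch: serve request i
            chosen.append(i)
            backtrack(i + 1, used + cost, revenue + credit, chosen)
            chosen.pop()
        backtrack(i + 1, used, revenue, chosen)      # branch: skip request i

    backtrack(0, 0.0, 0.0, [])
    return best

def decayed_credit(credit, rounds, rate=0.05):
    """Exponential credit decay per round (rate is an assumed constant)."""
    return credit * (1.0 - rate) ** rounds

print(best_allocation([(0.9, 4), (0.6, 3), (0.3, 2)], capacity=6))   # -> (1.2, [0, 2])
print(round(decayed_credit(0.9, rounds=10), 3))
```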

18.
Pessiglione M  Seymour B  Flandin G  Dolan RJ  Frith CD 《Nature》2006,442(7106):1042-1045
Theories of instrumental learning are centred on understanding how success and failure are used to improve future decisions. These theories highlight a central role for reward prediction errors in updating the values associated with available actions. In animals, substantial evidence indicates that the neurotransmitter dopamine might have a key function in this type of learning, through its ability to modulate cortico-striatal synaptic efficacy. However, no direct evidence links dopamine, striatal activity and behavioural choice in humans. Here we show that, during instrumental learning, the magnitude of reward prediction error expressed in the striatum is modulated by the administration of drugs enhancing (3,4-dihydroxy-L-phenylalanine; L-DOPA) or reducing (haloperidol) dopaminergic function. Accordingly, subjects treated with L-DOPA have a greater propensity to choose the most rewarding action relative to subjects treated with haloperidol. Furthermore, incorporating the magnitude of the prediction errors into a standard action-value learning algorithm accurately reproduced subjects' behavioural choices under the different drug conditions. We conclude that dopamine-dependent modulation of striatal activity can account for how the human brain uses reward prediction errors to improve future decisions.
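A minimal sketch of the kind of standard action-value model referred to here: values are updated with a reward prediction error and choices follow a softmax rule. The gain applied to positive prediction errors is an assumed stand-in for enhanced versus reduced dopaminergic function (L-DOPA versus haloperidol), not the parameterization fitted in the paper.

```python
import math
import random

# Q-values updated with a reward prediction error; softmax action selection.
# "gain" on positive errors is an assumed stand-in for dopaminergic modulation.
def softmax_choice(q, beta=3.0):
    weights = [math.exp(beta * v) for v in q]
    threshold = random.uniform(0.0, sum(weights))
    cumulative = 0.0
    for action, w in enumerate(weights):
        cumulative += w
        if threshold <= cumulative:
            return action
    return len(q) - 1

def update(q, action, reward, alpha=0.3, gain=1.0):
    error = reward - q[action]                # reward prediction error
    if error > 0:
        error *= gain                         # assumed appetitive-learning modulation
    q[action] += alpha * error

def simulate(gain, trials=500):
    """Two-armed bandit: action 0 pays off with p=0.8, action 1 with p=0.2."""
    q = [0.0, 0.0]
    chose_best = 0
    for _ in range(trials):
        a = softmax_choice(q)
        p_win = 0.8 if a == 0 else 0.2
        update(q, a, 1.0 if random.random() < p_win else 0.0, gain=gain)
        chose_best += (a == 0)
    return chose_best / trials

print("high gain:", simulate(1.5), "low gain:", simulate(0.5))
```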

19.
Considering jointly two important subproblems of reinforcement learning, continuous state spaces and linguistic evaluation, a new learning method is proposed: Takagi-Sugeno (T-S) fuzzy reinforcement learning oriented towards linguistic evaluation. The learning agent is built on Q-learning and the Takagi-Sugeno fuzzy inference system, is suited to complex learning tasks over continuous domains, and can also be used to design Takagi-Sugeno fuzzy logic controllers. Taking a double inverted-pendulum control system as an example, simulation studies verify the effectiveness of the learning algorithm.
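A generic fuzzy Q-learning sketch in the spirit of the method described above (not the paper's exact algorithm): each fuzzy rule stores a q-value per discrete action, the global value is the firing-strength-weighted average in T-S fashion, and the temporal-difference error is shared among rules in proportion to their normalized firing strengths. The membership functions, action set and learning rates are assumptions.

```python
import random

# Each rule keeps a q-value per action; global Q is the firing-strength-weighted
# average; the TD error is distributed to rules by normalized firing strength.
CENTRES = [-1.0, 0.0, 1.0]            # triangular membership centres on the state (assumed)
ACTIONS = [-1.0, 0.0, 1.0]            # e.g. push left / no force / push right (assumed)
q = [[0.0 for _ in ACTIONS] for _ in CENTRES]

def memberships(x, width=1.0):
    mu = [max(0.0, 1.0 - abs(x - c) / width) for c in CENTRES]
    total = sum(mu) or 1.0
    return [m / total for m in mu]

def q_value(x, a_idx):
    return sum(mu_i * q[i][a_idx] for i, mu_i in enumerate(memberships(x)))

def act(x, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q_value(x, a))

def learn(x, a_idx, reward, x_next, alpha=0.1, gamma=0.95):
    best_next = max(q_value(x_next, a) for a in range(len(ACTIONS)))
    td_error = reward + gamma * best_next - q_value(x, a_idx)
    for i, mu_i in enumerate(memberships(x)):
        q[i][a_idx] += alpha * td_error * mu_i     # each rule learns its share

learn(0.3, act(0.3), reward=1.0, x_next=0.25)      # toy usage on a continuous state
```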

20.
The ability to use environmental stimuli to predict impending harm is critical for survival. Such predictions should be available as early as they are reliable. In Pavlovian conditioning, chains of successively earlier predictors are studied in terms of higher-order relationships, and have inspired computational theories such as temporal difference learning. However, there is at present no adequate neurobiological account of how this learning occurs. Here, in a functional magnetic resonance imaging (fMRI) study of higher-order aversive conditioning, we describe a key computational strategy that humans use to learn predictions about pain. We show that neural activity in the ventral striatum and the anterior insula displays a marked correspondence to the signals for sequential learning predicted by temporal difference models. This result reveals a flexible aversive learning process ideally suited to the changing and uncertain nature of real-world environments. Taken with existing data on reward learning, our results suggest a critical role for the ventral striatum in integrating complex appetitive and aversive predictions to coordinate behaviour.
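A minimal TD(0) sketch of the higher-order (sequential) learning described here: an early cue acquires an aversive prediction only through the later cue that follows it, which is exactly the backward propagation that temporal-difference models predict. The states, rates and outcome value are illustrative assumptions.

```python
# Minimal TD(0) sketch of higher-order aversive conditioning over a cue chain.
alpha, gamma = 0.2, 0.95
V = {"early_cue": 0.0, "late_cue": 0.0, "outcome": 0.0}

def td_episode(pain=-1.0):
    """One trial: early_cue -> late_cue -> aversive outcome."""
    chain = ["early_cue", "late_cue", "outcome"]
    rewards = [0.0, 0.0, pain]
    errors = []
    for t in range(len(chain) - 1):
        s, s_next = chain[t], chain[t + 1]
        delta = rewards[t + 1] + gamma * V[s_next] - V[s]   # TD prediction error
        V[s] += alpha * delta
        errors.append(delta)
    return errors

for _ in range(100):
    td_episode()
# After training, the earliest predictor carries the aversive prediction too.
print({s: round(v, 2) for s, v in V.items()})
```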
