
    Deep Learning PPT Courseware: Deep Reinforcement Learning.pptx

    • Resource ID: 2125877       File size: 2.88 MB        Pages: 31
    • Format: PPTX


    Introduction to Deep Reinforcement Learning
    Yen-Chen Wu, 2015/12/11

    Outline
      • Reinforcement Learning
      • Markov Decision Process
      • How to Solve MDPs
          • DP
          • MC
          • TD
          • Q-learning (DQN)
      • Paper Review

    Reinforcement Learning

    Branches of Machine Learning

    What makes it different?
      • There is no supervisor, only a reward signal
      • Feedback is delayed, not instantaneous
      • Time really matters (sequential, non-i.i.d. data)
      • The agent's actions affect the subsequent data it receives

    Goal: Maximize Cumulative Reward
      • Actions may have long-term consequences
      • Reward may be delayed
      • It may be better to sacrifice immediate reward to gain more long-term reward

    Agent & Environment
      • Example actions: Defense, Attack, Jump

      • Full observability vs. partial observability
      • Learning and planning
      • Exploration and exploitation
      • Prediction and control

    Markov Decision Process
      • Markov Processes
      • Markov Reward Processes
      • Markov Decision Processes

    Markov Process

    Markov Reward Processes

    Markov Decision Process

    Markov Decision Process (MDP)
      • S: finite set of states (observations)
      • A: finite set of actions
      • P: transition probability
      • R: immediate reward
      • γ: discount factor
      • Goal: choose a policy π that maximizes the expected return

    How to Solve MDP
      • Dynamic Programming
      • Monte-Carlo
      • Temporal-Difference
      • Q-Learning

    Model-based
      • Dynamic Programming
          • Evaluate policy
          • Update policy

    Model Free
      • Unknown transition probability and reward
      • MC vs. TD

    Model Free: Q-learning
      • Instead of a tabular representation
      • Optimal action-value function (Q-learning) = Bellman equation
      • Basic idea: iterative update (lack of generalization)
      • In practice: function approximator
      • Linear? Using a DNN!

    Deep Q-network (DQN)
      • Video: https:/

    Deep Q-Network
      • Computes Q-values for all actions
      • Input: 84x84x4
      • Convolves 32 filters of 8x8 with stride 4
      • Convolves 64 filters of 4x4 with stride 2
      • Convolves 64 filters of 3x3 with stride 1
      • Fully connected layer with 512 nodes
      • Outputs one node for each action

    Update DQN
      • Loss function
      • Gradient

    Two Techniques
      • Experience Replay
          • Experience
          • Pooled memory
          • Data efficiency (bootstrap)
          • Avoids correlation between samples (variance between batches)
          • Off-policy, which suits Q-learning
          • Randomly sampled mini-batch
          • Prioritized sweeping (active learning)
      • Separate Target Network
          • More stable than online learning

    DEMO

    Paper review

    Paper list
      • Massively Parallel Methods for Deep Reinforcement Learning
      • Continuous control with deep reinforcement learning
      • Deep Reinforcement Learning with Double Q-learning
      • Policy Distillation
      • Dueling Network Architectures for Deep Reinforcement Learning
      • Multiagent Cooperation and Competition with Deep Reinforcement Learning

    Massively Parallel Methods for Deep Reinforcement Learning
      • Arun Nair, arXiv:1507.04296

    DDPG (Deep Deterministic Policy Gradient)

    DDAC (Deep Deterministic Actor-Critic)

    Continuous control with deep reinforcement learning
      • Timothy P. Lillicrap, arXiv:1509.02971
      • https://goo.gl/J4PIAz

    Double Q-learning

    Policy Distillation
      • Soft target

    Dueling Network

    Multiagent
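
    The slides' equations did not survive extraction, so the following is only a minimal sketch of the ideas named above (iterative Q-learning update, experience replay with a randomly sampled mini-batch, and a separate target copy), not the presenter's implementation. It assumes the standard Q-learning update, Q(s,a) ← Q(s,a) + α·(r + γ·max Q_target(s',·) − Q(s,a)), and uses a small table in place of the deep network. The chain environment, the constants, and the names `step` and `train` are hypothetical, chosen only for illustration.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed value)
ALPHA = 0.5           # learning rate (assumed value)


def step(state, action):
    """Toy deterministic environment: reward 1 only on reaching the last state."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=300, batch_size=8, sync_every=20, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]  # online estimates
    q_target = [row[:] for row in q]                       # separate target copy
    replay = []                                            # pooled experience
    updates = 0
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 50:
            # Epsilon-greedy behaviour policy; ties are broken at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            replay.append((state, action, reward, next_state, done))
            state = next_state
            steps += 1

            # Experience replay: learn from a randomly sampled mini-batch.
            batch = rng.sample(replay, min(batch_size, len(replay)))
            for s, a, r, s2, d in batch:
                # Bootstrapped target uses the (older) target copy.
                target = r if d else r + GAMMA * max(q_target[s2])
                q[s][a] += ALPHA * (target - q[s][a])

            # Periodically copy the online estimates into the target copy.
            updates += 1
            if updates % sync_every == 0:
                q_target = [row[:] for row in q]
    return q
```

    Calling `q = train()` returns the learned table; reading off `max(ACTIONS, key=lambda a: q[s][a])` for each non-terminal state gives the greedy policy. In a DQN, the table would be replaced by the convolutional network listed above, and the per-sample update by a gradient step on the squared difference between the target and the predicted Q-value.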
