Machine Learning Lecture Slides
Machine Learning and Its Application

Relationships among AI, ML, and DL
- Artificial intelligence: human-like intelligence exhibited by a machine.
- Machine learning: one approach to achieving artificial intelligence.
- Deep learning: one technique for implementing machine learning.

Outline
- Introduction of Machine Learning
- Why Deep?
- How to learn it?
- Application of deep learning

Products
- BaiduEye, 百度识图 (Baidu image search), Google Glass
- Apple Siri, 微软小冰 (Microsoft XiaoIce)

Products (NLP): JD.com's JIMI chatbot
- Scenarios: pre-sales consulting, after-sales service, everyday companion.
- Capabilities: conversation, encyclopedia lookups, weather, horoscopes, jokes, transit directions, restaurant reviews, and more.
- User profiling to provide personalized products and services:
  - Demographics: region, age, gender, education, occupation, income, lifestyle habits, spending habits.
  - Product behavior: product categories, activity frequency, product preferences, purchase drivers, usage habits, product spending.
- Li Chenghua (李成华), chief scientist of JD's DNN Lab: "Use deep learning to handle 80% of customer-service work."

Handwriting Recognition (LeNet-5)
- The CNN prototype proposed by Yann LeCun in 1989, successfully applied to handwritten check recognition in many European countries.

Land Cover Classification

Deep Dream
- Given a photo, the machine adds what it sees.

Deep Style
- Given a photo, make its style resemble a famous painting.
- [Figure: one CNN extracts the content of the photo, another extracts the style of the painting, and the two are combined to generate the stylized image.]

Outline: Introduction of Machine Learning

Machine Learning ≈ Looking for a Function
- Speech recognition: f(audio) = "How are you"
- Image recognition: f(cat image) = "Cat"
- Playing Go: f(board position) = "5-5" (next move)
- Dialogue system: f("Hi", what the user said) = "Hello" (system response)

Framework
- Model: a set of candidate functions {f1, f2, ...}.
  For image recognition, f1(cat image) = "cat" and f1(dog image) = "dog" is a good candidate, while f2(cat image) = "money" and f2(dog image) = "snake" is a bad one.
- Training data: inputs paired with target outputs, e.g. image → "monkey", image → "cat", image → "dog".
- Goodness of a function f: measured on the training data; this is supervised learning. A function that fits the training data well is better.
- Training: pick the "best" function f* from the set. Testing: use f* on new inputs, e.g. f*(image) = "cat".

Three Steps for Deep Learning
- Step 1: define a set of functions (the model).
- Step 2: define the goodness of a function.
- Step 3: pick the best function.
Deep learning is that simple. In deep learning, the function set of Step 1 is a neural network.

Human Brains
- Neural networks are loosely inspired by networks of biological neurons.
- Try one interactively: TensorFlow Playground, http://playground.tensorflow.org/

Neuron
- A neuron is a simple function. Given inputs a1, ..., aK, weights w1, ..., wK, and a bias b:
  z = a1 w1 + ... + ak wk + ... + aK wK + b
  a = σ(z), where σ is the activation function.
- Sigmoid activation function: σ(z) = 1 / (1 + e^(-z)).
- Worked example: inputs (1, -1), weights (1, -2), bias 1:
  z = 1·1 + (-1)·(-2) + 1 = 4, and σ(4) ≈ 0.98.
  (This computation is reproduced in the first code sketch below.)

Fully Connected Feedforward Network
- Different connection patterns lead to different network structures.
- Each neuron has its own values for the weights and bias; together, the weights and biases are the network parameters θ.
- Worked example: the input (1, -1) passes through the first layer (the neuron above plus a second neuron with weights (-1, 1) and bias 0) to give (0.98, 0.12), then through two more layers to give (0.86, 0.11) and finally (0.62, 0.83).
- The same network maps the input (0, 0) to (0.73, 0.50), then (0.72, 0.12), and finally (0.51, 0.85).
- A network is therefore a function: vector in, vector out. Given parameters θ, the structure defines one function; given only the structure, it defines a function set.

Layers
- Input layer (x1, ..., xN) → hidden layers (Layer 1, Layer 2, ..., Layer L) → output layer (y1, ..., yM).
- "Deep" means many hidden layers.

Output Layer (Option): Softmax
- With an ordinary output layer, yi = σ(zi); in general the outputs can be any values and may not be easy to interpret.
- A softmax output layer instead produces a probability-like distribution:
  yi = e^(zi) / Σj e^(zj), so that 0 < yi < 1 and Σi yi = 1.
- Worked example: z = (3, 1, -3) gives e^z ≈ (20, 2.7, 0.05), so y ≈ (0.88, 0.12, 0).
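The slides' single-neuron and first-layer computations can be checked with a few lines of NumPy (Python with NumPy is an assumption, consistent with the deck's "Basic Knowledge" slide; the weights and biases are the ones from the worked example above):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# First-layer parameters from the worked example: neuron 1 has
# weights (1, -2) and bias 1; neuron 2 has weights (-1, 1) and bias 0.
W1 = np.array([[1.0, -2.0],
               [-1.0,  1.0]])   # one row of weights per neuron
b1 = np.array([1.0, 0.0])

x = np.array([1.0, -1.0])       # the input vector from the slide

z = W1 @ x + b1                 # z_k = sum_i w_ki * x_i + b_k
a = sigmoid(z)                  # the neurons' outputs
print(z)   # [ 4. -2.]
print(a)   # approximately [0.98 0.12], matching the slide
```

Stacking more weight matrices and sigmoid applications in the same way gives the full feedforward network.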
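The softmax computation can be checked the same way; this short sketch reproduces the slide's numbers for z = (3, 1, -3):

```python
import numpy as np

def softmax(z):
    """Softmax: y_i = e^(z_i) / sum_j e^(z_j)."""
    e = np.exp(z - np.max(z))   # subtracting the max improves numerical stability
    return e / e.sum()

z = np.array([3.0, 1.0, -3.0])
y = softmax(z)
print(y)         # approximately [0.88 0.12 0.00], matching the slide
print(y.sum())   # 1.0: the outputs form a probability-like distribution
```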
Example Application: Handwriting Digit Recognition
- Input: a 16 × 16 image, i.e. a 256-dimensional vector x1, ..., x256 (ink → 1, no ink → 0).
- Output: a 10-dimensional vector y1, ..., y10, where each dimension represents the confidence that the image is a particular digit (y1 for "1", y2 for "2", ..., y10 for "0").
- Example: an output of 0.1 for "1", 0.7 for "2", and 0.2 for "0" means the machine reads the image as "2".
- What is needed is a function, i.e. a neural network, from the 256-dim input vector to the 10-dim output vector.
- You need to decide the network structure (how the hidden layers connect the input layer to the output layer) to let a good function be contained in your function set.

FAQ
- Q: How many layers? How many neurons for each layer? A: Trial and error, plus intuition.
- Q: Can the structure be automatically determined?

Three Steps for Deep Learning: Step 2, Goodness of Function

Training Data
- Prepare training data: images and their labels, e.g. "1", "3", "4", "1", "0", "2", "5", "9".
- The learning target is defined on the training data.

Learning Target
- Input: the 256 pixel values (ink → 1, no ink → 0); output: y1, ..., y10 through a softmax layer.
- If the input image is a "1", the target is for y1 to have the maximum value; if it is a "2", y2 should have the maximum value; and so on.

Loss
- Given a set of parameters θ, the loss l of an example can be the distance between the network output and the target: for the label "1", y1 should be as close to 1 as possible and the other outputs as close to 0 as possible.
- A good function should make the loss of all examples as small as possible.

Total Loss
- For R training examples x1, ..., xR with per-example losses l1, l2, ..., lR, the total loss is L = l1 + l2 + ... + lR (the sum over all R examples).
- Find the function in the function set, i.e. the network parameters θ*, that minimizes the total loss L; L should be as small as possible.

Three Steps for Deep Learning: Step 3, Pick the Best Function

Gradient Descent
- Goal: find the network parameters θ = {w1, w2, ...} that minimize the total loss L.
- Pick an initial value for each parameter w.
- Compute the derivative ∂L/∂w: if it is negative, increase w; if it is positive, decrease w.
- Update: w ← w − η ∂L/∂w, where η is called the "learning rate".
- Repeat until ∂L/∂w is approximately zero, i.e. when the update becomes negligibly small.
(A toy sketch of this loop follows below.)

Gradient Descent: Difficulty
- Gradient descent never guarantees the global minimum: different initial points can reach different local minima, and hence different results.
- There are some tips to help you avoid poor local minima, but no guarantee.
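As a concrete illustration of the update rule w ← w − η ∂L/∂w, here is a toy sketch for a one-parameter model y = w·x with square-error loss (the data, model, learning rate, and stopping threshold are illustrative assumptions, not from the slides):

```python
import numpy as np

# Toy training data for a one-parameter model y = w * x
# (the "true" parameter that generated the targets is w = 2).
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 6.0])

eta = 0.05   # learning rate
w = 0.0      # pick an initial value for w

for step in range(100):
    # Total loss L = sum over examples r of (y_r - w * x_r)^2.
    # Its derivative is dL/dw = sum of -2 * x_r * (y_r - w * x_r).
    grad = np.sum(-2.0 * xs * (ys - w * xs))
    if abs(grad) < 1e-6:      # stop when the update would be negligible
        break
    # Negative derivative -> increase w; positive -> decrease w.
    w = w - eta * grad

total_loss = np.sum((ys - w * xs) ** 2)
print(w, total_loss)  # w approaches 2.0, where the total loss is minimal
```

Training a real network applies the same update to every weight and bias, with the derivatives computed by backpropagation.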
Outline: Why Deep?

Deeper is Better?
- Empirically, deeper networks transcribe speech more accurately than shallower ones (Seide, Li, and Yu, "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks," Interspeech 2011).

Universality Theorem
- Any continuous function f : R^N → R^M can be realized by a network with one hidden layer, given enough hidden neurons.
- So why a "deep" neural network rather than a "fat" one?

Fat + Short vs. Thin + Tall
- Compare a shallow-and-wide network with a deep-and-narrow network that has the same number of parameters.
- The deep, thin network performs better (Seide, Li, and Yu, Interspeech 2011).

Modularization
- Task: classify an image into four classes: girls with long hair (長髮), boys with long hair, girls with short hair (短髮), boys with short hair (Classifiers 1-4).
- Problem: some classes, such as boys with long hair, have only a few training examples, so their classifiers are weak.

Deep → Modularization
- Instead, first build basic classifiers for the underlying attributes: "boy or girl?" and "long or short hair?".
- Each basic classifier can have sufficient training examples.
- The four fine-grained classifiers then share the basic classifiers as modules, so each can be trained with little data.

Modularization in Deep Networks
- The first layer learns the most basic classifiers; the second layer uses the first layer as modules to build more complex classifiers; later layers use the second layer as modules, and so on.
- This is why deep networks can get by with less training data.
- The modularization is learned automatically from the data.
- Reference: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014 (pp. 818-833).

Outline: How to Learn It?

Basic Knowledge
- Mathematical foundations: linear algebra, matrix analysis, probability theory.
- Programming foundations: Matlab, Python, and a software-engineering mindset.
- Deep learning libraries: TensorFlow, Theano, Keras.
- Tips on using Keras (a minimal sketch follows the closing slide).

Thank you for listening
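To accompany the "Tips on using Keras" pointer above, here is a minimal, hypothetical Keras sketch of the handwriting-digit network from the example slides. The hidden-layer sizes, learning rate, batch size, and epoch count are illustrative assumptions, and x_train / y_train are placeholders for the 256-dim pixel vectors and their one-hot digit labels:

```python
from tensorflow import keras

# Step 1 (function set): 256-dim input (16 x 16 pixels, ink = 1 / no ink = 0),
# sigmoid hidden layers, and a 10-way softmax output layer.
model = keras.Sequential([
    keras.Input(shape=(256,)),
    keras.layers.Dense(64, activation="sigmoid"),   # hidden sizes are illustrative
    keras.layers.Dense(64, activation="sigmoid"),
    keras.layers.Dense(10, activation="softmax"),   # one confidence per digit
])

# Step 2 (goodness of function): cross-entropy loss against the training labels.
# Step 3 (pick the best function): gradient descent with learning rate 0.1.
model.compile(loss="categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=0.1),
              metrics=["accuracy"])

# Training would look like the following, with x_train of shape (R, 256)
# and y_train of shape (R, 10) holding one-hot digit labels:
# model.fit(x_train, y_train, batch_size=100, epochs=20)
```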