Image and Video Processing Techniques with Deep Learning (course slides)
Technology innovation, transforming the future.
Image and Video Processing Techniques with Deep Learning: seeing more clearly, understanding more deeply.

Contents
1. Night-scene image enhancement
2. Video super-resolution
3. Image and video deblurring

1. Night-Scene Image Enhancement

Motivation
Taking photos is easy, but amateur photographers typically produce underexposed photos, so photo enhancement is required.

Existing Photo-Editing Tools
(Figure: input vs. "Auto Enhance" on iPhone vs. "Auto Tone" in Lightroom vs. ours.)

Previous Work
- Retinex-based methods: LIME (TIP 2017), WVM (CVPR 2016), JieP (ICCV 2017)
- Learning-based methods: HDRNet (SIGGRAPH 2017), White-Box (ACM TOG 2018), Distort-and-Recover (CVPR 2018), DPE (CVPR 2018)

Limitations of Previous Methods
(Figure: input vs. WVM, JieP, HDRNet, DPE, White-Box, Distort-and-Recover, and ours.)

Why This Model?
Illumination maps of natural images typically have relatively simple forms with known priors, and modeling illumination explicitly lets us customize the enhancement result by formulating constraints on the illumination. Advantages: effective and efficient learning. (A sketch of this formulation follows below.)
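The deck motivates estimating illumination but does not spell out the image-formation model. Retinex-based enhancement of this kind is commonly written as S = R * L, where S is the observed underexposed photo, R the desired enhanced image, and L a per-pixel illumination map predicted by the network. A minimal NumPy sketch under that assumption (the function name and the eps guard are mine, not from the deck):

```python
import numpy as np

def enhance(image, illumination, eps=1e-3):
    """Recover the enhanced image R = S / L from a predicted illumination map.

    image:        H x W x 3 float array in [0, 1], the underexposed input S.
    illumination: H x W x 3 (or H x W x 1) float array in (0, 1], predicted L.

    Assumed Retinex-style formulation; eps avoids division by zero in
    near-black regions.
    """
    return np.clip(image / np.maximum(illumination, eps), 0.0, 1.0)
```

Because the output is a simple per-pixel division, priors such as smoothness can be imposed directly on L, which is what makes the enhancement customizable.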
Network Architecture
(Figure: the illumination-estimation network.)

Ablation Study
(Figure: input vs. naïve regression vs. expert-retouched.)

Our Dataset
Motivation: the existing benchmark dataset was collected for enhancing general photos rather than underexposed ones, and it contains only a small number of underexposed images covering limited lighting conditions.

Quantitative Comparison
On our dataset and on MIT-Adobe FiveK.

Visual Comparison
(Figures on our dataset and on MIT-Adobe FiveK: input, JieP, HDRNet, DPE, White-Box, Distort-and-Recover, our result, expert-retouched.)

More Comparison Results: User Study
(Figure: input, WVM, JieP, HDRNet, DPE, White-Box, Distort-and-Recover, our result.)

Limitations
Presenter note: Our work still has some limitations. First, it cannot recover regions that are almost black, without any trace of texture (see the top two images). Second, our method does not remove noise in the enhanced result.

More Results
(Several further examples comparing input, WVM, JieP, HDRNet, DPE, White-Box, Distort-and-Recover, our result, and the expert-retouched reference, plus examples against "Auto Enhance" on iPhone and "Auto Tone" in Lightroom.)

2. Video Super-Resolution

Motivation
Presenter note: The target of video super-resolution is to increase the resolution of a video while recovering rich details.
Old and fundamental: studied for several decades, from Huang et al. (1984) to the recent past. Many applications:
- HD video generation from low-res sources.
- Video enhancement with details. Presenter note: in this example, the characters on the roof and the textures of the tree are much clearer in the SR result than in the input.
- Text/object recognition in surveillance videos. Presenter note: in this example, the numbers on the car become recognizable only in the super-resolved result.

Previous Work
Much prior work exists in super-resolution; several representative methods:
- Image SR, traditional: Freeman et al. (2002), Glasner et al. (2009), Yang et al. (2010), etc.
- Image SR, CNN-based: SRCNN (Dong et al., 2014), VDSR (Kim et al., 2016), FSRCNN (Dong et al., 2016), etc.
- Video SR, traditional: 3DSKR (Takeda et al., 2009), BayesSR (Liu et al., 2011), MFSR (Ma et al., 2015), etc.
- Video SR, CNN-based: DESR (Liao et al., 2015), VSRNet (Kappeler et al., 2016), Caballero et al. (2016), etc.

Remaining Challenges
- Effectiveness: how to make good use of multiple frames? Objects in neighboring frames are misaligned, and in extreme cases there is large motion or occlusion, which is very hard to handle (example from Vid4, Ce Liu et al., bicubic ×4). So are multiple frames useful or harmful to super-resolution?
- Are the generated details real? CNN-based SR methods incorporate external data, so even from a single frame they can produce sharp structures, e.g. clear window patterns on a building that turn out to be far from the ground truth. Details drawn from external data may not be true to the input image.
- Model issues: one model per setting. In recent CNN-based methods (VDSR, Kim et al., 2016; ESPCN, Shi et al., 2016; VSRNet, Kappeler et al., 2016), the model parameters are fixed for a certain scale factor and number of frames; changing the scale factor means changing the network configuration and training another model. Most traditional video SR methods additionally require intensive parameter tuning and can be slow. These issues keep them from practical use.

Our Method: Advantages
- Better use of sub-pixel motion between frames, producing high-quality results with real details, promising both visually and quantitatively.
- Fully scalable: arbitrary input size, arbitrary scale factor, arbitrary number of temporal frames.
Presenter note: in this video example (Vid4, Ce Liu et al.), characters, numbers, and textures are hard to recognize in the bicubic result, while our results are much better and clearer.

Our Method: Motion Estimation
The method contains three components. The first is a motion-estimation network, which takes two low-res frames as input and outputs a low-res motion field.

Our Method: Sub-Pixel Motion Compensation (SPMC) Layer
The second module is newly designed: the SPMC layer takes the i-th low-res frame and the estimated motion field as input and outputs a high-res image. Unlike previous methods, it achieves resolution enhancement and motion compensation simultaneously, which better preserves the sub-pixel information in the frames. (A sketch of the idea follows below.)
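The deck describes what the SPMC layer does but not its internals. A minimal NumPy sketch of the idea, assuming bilinear forward splatting onto the high-res grid and an integer scale factor (the function name and the normalization details are mine, not the authors' exact layer):

```python
import numpy as np

def spmc_forward_warp(lr, flow, scale):
    """SPMC sketch: forward-splat a low-res frame onto a zero-initialized
    high-res grid, so motion compensation and resolution increase happen
    in one step and sub-pixel positions are preserved.

    lr:    H x W low-res frame (grayscale for simplicity).
    flow:  H x W x 2 motion field (dx, dy) in LR pixel units.
    scale: integer upscaling factor.
    """
    h, w = lr.shape
    hr = np.zeros((h * scale, w * scale))
    weight = np.zeros_like(hr)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Target sub-pixel positions on the HR grid.
    tx = (xs + flow[..., 0]) * scale
    ty = (ys + flow[..., 1]) * scale
    # Bilinear splatting: distribute each LR pixel over its 4 HR neighbors.
    x0, y0 = np.floor(tx).astype(int), np.floor(ty).astype(int)
    for dy in (0, 1):
        for dx in (0, 1):
            wgt = (1 - np.abs(tx - (x0 + dx))) * (1 - np.abs(ty - (y0 + dy)))
            xi, yi = x0 + dx, y0 + dy
            ok = (xi >= 0) & (xi < w * scale) & (yi >= 0) & (yi < h * scale)
            np.add.at(hr, (yi[ok], xi[ok]), (wgt * lr)[ok])
            np.add.at(weight, (yi[ok], xi[ok]), wgt[ok])
    return hr / np.maximum(weight, 1e-8)
```

Note that the operation has no learnable parameters: the scale factor only changes the size of the target grid, which is what later allows one trained model to serve all scale factors.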
Our Method: Detail Fusion Net
In the last stage, a detail-fusion network combines all frames. It uses an encoder-decoder structure, which has proved very effective in image-regression tasks, with skip connections for better convergence. The important change is a ConvLSTM module inserted inside the network, a natural choice for sequential inputs: it takes in information from the previous time step and passes its hidden state on to the next, so temporal information is exploited.
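The slides name a ConvLSTM inside the detail-fusion network without giving its equations. Below is a minimal PyTorch sketch of a standard ConvLSTM cell and of the sequential fusion loop it enables; the encoder/decoder calls in the trailing comment are hypothetical placeholders, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: the recurrent unit that lets the fusion
    network consume frames one at a time, carrying state across steps."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # One convolution produces all four gates at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)
        self.hid_ch = hid_ch

    def forward(self, x, state=None):
        if state is None:
            b, _, h, w = x.shape
            z = x.new_zeros(b, self.hid_ch, h, w)
            state = (z, z)
        h_prev, c_prev = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h_prev], 1)), 4, 1)
        c = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

# Sequential fusion over any number of frames (hypothetical encoder/decoder):
# state = None
# for frame in frames:            # each frame already SPMC-warped
#     h, state = cell(encoder(frame), state)
#     sr = decoder(h)             # skip connections omitted in this sketch
```

Because the loop runs one frame per step, the same trained cell handles 3, 5, or any number of frames, which is the scalability property discussed next.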
Arbitrary Input Size
Input videos come in different sizes in practice; since the network is fully convolutional, it handles them naturally.

Arbitrary Scale Factors (×2, ×3, ×4)
Previous networks must change their parameters for each scale factor. In our network the resolution increase happens in the SPMC layer, which is parameter-free, so a single model configuration handles all scale factors, including non-integer ones.

Arbitrary Temporal Length (3 frames, 5 frames)
A practical system may want to choose the number of frames at test time to balance quality against efficiency. Because the ConvLSTM handles frames sequentially, the framework accepts arbitrary temporal lengths.

Analysis: Details from Multiple Frames
Are the recovered details real? First we feed three identical frames to the network: the input then carries no more information than a single low-res image, and as expected the output, though sharper, contains no additional detail; the characters and logo remain unrecognizable. Feeding three consecutive frames from the video instead, the network produces much better results, with the characters and logo clearly readable. This experiment shows that the recovered sharp structures come from real information in the inputs rather than from external information stored in the network, so the SR results can be trusted.

Ablation Study: SPMC Layer vs. Baseline
We substitute the SPMC layer with a baseline module, backward warping followed by resizing, which also compensates motion and increases resolution and is widely adopted in previous CNN-based methods. In this example, the roof tiles in the baseline output contain severely false structures due to aliasing; with our SPMC layer, the tile structures in the result are faithful to the ground truth. We believe that only by properly handling motion at sub-pixel precision can good results be recovered. (A sketch of the baseline follows below.)
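For concreteness, here is a sketch of that baseline, assuming bilinear backward warping and bicubic resizing via SciPy; the deck does not state the exact interpolation orders used in the ablation:

```python
import numpy as np
from scipy.ndimage import map_coordinates, zoom

def warp_then_resize(lr, flow, scale):
    """Baseline from the ablation (sketch): bilinear backward warping in LR,
    then bicubic upsampling. Resampling at LR resolution discards sub-pixel
    information, which is why aliased structures (e.g. the roof tiles)
    survive into the output, unlike with the SPMC layer."""
    h, w = lr.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Backward warp: sample the source frame at positions shifted by the flow.
    coords = np.stack([ys + flow[..., 1], xs + flow[..., 0]])
    warped = map_coordinates(lr, coords, order=1, mode="nearest")
    return zoom(warped, scale, order=3)  # bicubic resize to HR
```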
Comparisons (data from Vid4, Ce Liu et al., ×4)
- Bicubic: the windows and glass of the building are severely blurred.
- BayesSR (Liu et al., 2011; Ma et al., 2015): sharp, but the structures are still missing.
- DESR (Liao et al., 2015): recovers a few details, but with artifacts.
- VSRNet (Kappeler et al., 2016): produces a smooth result.
- Ours: visually much better; the edges of the buildings and windows are easy to distinguish. Flipping between the bicubic input and our result, the changes are obvious.

Running Time (scale factor ×4)

  Method                          Frames   Time per frame
  BayesSR (Liu et al., 2011)      31       2 h (as reported in their paper)
  MFSR (Ma et al., 2015)          31       10 min
  DESR (Liao et al., 2015)        31       8 min
  VSRNet (Kappeler et al., 2016)  5        40 s
  Ours                            5        0.19 s
  Ours                            3        0.14 s

Our framework is much faster because all components can be computed efficiently on the GPU: 0.19 s per frame with 5 neighboring frames, accelerating to 0.14 s with 3.

More Results
In the first video, our method works very well, especially on the edges of the building. In the next, the tiles of the temple and the carvings on the lamp are mostly recovered.

Summary
- An end-to-end, fully scalable CNN-based framework for video SR.
- A new SPMC layer that better handles inter-frame motion.
- High-quality results at fast speed.

3. Image and Video Deblurring

The Image Deblurring Problem
(Examples from previous work.)

Previous Work: Different Blur Assumptions
- Uniform blur: Fergus et al. (2006), Shan et al. (2009), Cho et al. (2009), Xu et al. (2010), etc. (See the model sketch below.)
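The deck does not state the blur model, but the uniform-blur methods listed above share the classic spatially invariant formulation B = I * k + n: one kernel k blurs every pixel of the sharp image I, plus noise n. A minimal sketch of the forward model these methods invert (the kernel and noise level here are illustrative, not from the deck):

```python
import numpy as np
from scipy.signal import convolve2d

def uniform_blur(image, kernel, noise_sigma=0.01):
    """Spatially invariant blur model B = I * k + n: a single kernel k is
    shared by every pixel. Deblurring estimates k and I given only B."""
    b = convolve2d(image, kernel, mode="same", boundary="symm")
    return b + np.random.normal(0.0, noise_sigma, b.shape)

# Example: a hypothetical horizontal motion-blur kernel of length 9.
k = np.zeros((9, 9))
k[4, :] = 1.0 / 9.0
```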