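Deep learning is an empirical science: experimental progress has far outpaced theoretical research. This article is therefore organized to combine theory with practice.
A good first step is to browse the related articles under the deep learning topic on Zhuanzhi (專知).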
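For convolutional neural networks (CNNs), the following articles introduce the basic concepts:
An intuitive explanation of how convolutional neural networks work? https://www.zhihu.com/question/39022858
Technical: understanding convolutional neural networks (CNN) in one article: http://dataunion.org/11692.html
Deep learning pioneer Yann LeCun explains convolutional neural networks: https://www.leiphone.com/news/201608/zaB48AcZ1AFm1TaP.html
CNN notes: an accessible introduction to convolutional neural networks: https://www.2cto.com/kf/201607/522441.html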
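The introductions above all describe a kernel sliding over an image. Below is a minimal pure-Python sketch of that sliding-window operation. It assumes a single channel, no padding, and no bias. The name `conv2d` is illustrative and is not any framework's API. Like PyTorch and TensorFlow, it computes cross-correlation, meaning the kernel is not flipped. Real frameworks perform the same operation over many channels and whole batches using optimized kernels.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix, stride: int = 1) -> Matrix:
    """'Valid' 2D convolution (no padding), implemented as cross-correlation.

    Each output value is the dot product of the kernel with the image patch
    currently under it; the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # The same kernel weights are reused at every window position
            for u in range(kh):
                for v in range(kw):
                    acc += image[i * stride + u][j * stride + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    # A 4x4 image whose right half is bright: a vertical edge between columns 1 and 2
    image = [[0, 0, 1, 1] for _ in range(4)]
    # A 2x2 kernel that responds to left-to-right increases in brightness
    edge_kernel = [[-1, 1], [-1, 1]]
    for row in conv2d(image, edge_kernel):
        print(row)  # each row prints [0.0, 2.0, 0.0]
```

The strong response appears only in the output column that straddles the edge. This is the sense in which a learned kernel acts as a local feature detector.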
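Once you understand the basic concepts, you also need an intuitive feel for what a CNN does, and deep learning visualization is an excellent way to get it:
Chinese notes on Visualizing and Understanding Convolutional Networks: http://www.gageet.com/2014/10235.php
The original English paper, for those interested: https://arxiv.org/abs/1311.2901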
Before starting hands-on work, experiment with the TensorFlow Playground at http://playground.tensorflow.org/ (guide: http://f.dataguru.cn/article-9324-1.html).
After that you can start experimenting on your own computer. First of all, a GPU is practically a must:
Installing CUDA: http://blog.csdn.net/u010480194/article/details/54287335
Installing cuDNN: http://blog.csdn.net/lucifer_zzq/article/details/76675239
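Next, choose a framework that suits you:
What is the most popular deep learning framework right now? https://www.zhihu.com/question/52517062?answer_deleted_redirect=true
In Depth | A comparison of mainstream deep learning frameworks: which one suits you best? http://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==&mid=2650719118&idx=2&sn=fad8b7cad70cc6a227f88ae07a89db66#rd
There is also a GitHub project dedicated to ranking frameworks, which is updated fairly often: https://github.com/hunkim/DeepLearningStars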
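If you can't decide, here is a casual recommendation of two frameworks: TensorFlow and PyTorch. TensorFlow has good visualization tools and integrates well with engineering work. PyTorch gives you a lot of freedom in implementation and is pleasant to use.
TensorFlow website: http://www.tensorflow.org/
PyTorch website: http://pytorch.org/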
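Following the installation instructions on the official websites step by step usually works without major problems. If you do run into trouble, search the marvelous Stack Overflow (https://stackoverflow.com/); you can almost always find a solution there.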
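Another important tool to get familiar with is GitHub (https://github.com/). It makes it easy both to manage your own code and to learn from other people's. For a tutorial, see this answer: https://www.zhihu.com/question/20070065
If you would rather not read it, you can let an IDE such as PyCharm (http://www.jetbrains.com/pycharm/) handle version control for you; tutorial: http://blog.csdn.net/u013088062/article/details/50349833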
An interactive visualization tool is also very important; the go-to recommendation here is Jupyter Notebook: http://python.jobbole.com/87527/?repeat=w3tc
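With all this preparation done, you can start on an introductory tutorial. The official tutorials are actually very good, but if reading them entirely in English is hard going, the following tutorials are also worth a look:
TensorFlow
How do I get started with TensorFlow? https://www.zhihu.com/question/49909565
Getting started with TensorFlow: http://hacker.duanshishi.com/?p=1639
Google's official tutorials are quite thorough; if you prefer not to read English, there is this Chinese translation: http://wiki.jikexueyuan.com/project/tensorflow-zh/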
PyTorch
Deep Learning with PyTorch: A 60 Minute Blitz (translation): https://zhuanlan.zhihu.com/p/25572330
How should a beginner get started with PyTorch? https://www.zhihu.com/question/55720139
Super simple! PyTorch tutorial (part 1): Tensor: http://www.jianshu.com/p/5ae644748f21
If you are not familiar with Python, start with these two tutorials: Python 2: http://www.runoob.com/python/python-tutorial.html, Python 3: http://www.runoob.com/python3/python3-tutorial.html
If you are only dabbling and don't want to spend much time on a framework, try Keras:
Keras getting-started tutorial: http://www.360doc.com/content/17/0624/12/1489589_666148811.shtml
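After this introduction, you should have a basic grasp of what a convolutional neural network is and how to implement one. Advanced study likewise has two sides: theory and practice.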
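On the theory side, start with the backpropagation algorithm. You don't need it to get started, because all the common frameworks provide automatic differentiation, but you must understand it if you want to go further. Tutorial: http://blog.csdn.net/u014313009/article/details/51039334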
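To make the chain rule concrete, here is a small pure-Python sketch; it is not taken from the linked tutorial. It backpropagates through a hypothetical two-weight network, y_hat = w2 * tanh(w1 * x), with a squared-error loss. The analytic gradients are then checked against a central-difference numerical gradient. A gradient check like this is also a practical way to debug hand-written backward passes.

```python
import math
from typing import Tuple


def forward(w1: float, w2: float, x: float) -> Tuple[float, float]:
    h = math.tanh(w1 * x)   # hidden activation
    return h, w2 * h        # (hidden value, prediction)


def loss(w1: float, w2: float, x: float, y: float) -> float:
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(w1: float, w2: float, x: float, y: float) -> Tuple[float, float]:
    """Gradients dL/dw1 and dL/dw2, computed backwards with the chain rule."""
    h, y_hat = forward(w1, w2, x)
    d_yhat = y_hat - y               # dL/dy_hat
    d_w2 = d_yhat * h                # dy_hat/dw2 = h
    d_h = d_yhat * w2                # pass the error back to the hidden unit
    d_w1 = d_h * (1 - h ** 2) * x    # tanh'(z) = 1 - tanh(z)^2, dz/dw1 = x
    return d_w1, d_w2


def numerical_grad(w1: float, w2: float, x: float, y: float,
                   eps: float = 1e-6) -> Tuple[float, float]:
    """Central-difference approximation, used only to verify backprop."""
    g1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
    return g1, g2


if __name__ == "__main__":
    print(backprop(0.5, -1.2, 2.0, 1.0))
    print(numerical_grad(0.5, -1.2, 2.0, 1.0))  # should agree closely
```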
Next, get familiar with a few classic CNN models:
Paper: ImageNet Classification with Deep Convolutional Neural Networks: http://ml.informatik.uni-freiburg.de/former/_media/teaching/ws1314/dl/talk_simon_group2.pdf
Explanation: http://blog.csdn.net/u014088052/article/details/50898842
Code: TensorFlow https://github.com/kratzert/finetune_alexnet_with_tensorflow, PyTorch https://github.com/aaron-xichen/pytorch-playground
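When you read the AlexNet paper and code, it helps to check feature-map sizes by hand. The sketch below implements the standard output-size formula, out = floor((W + 2P - K) / S) + 1, for input size W, kernel size K, padding P and stride S. The 227x227 input is an assumption. The paper describes 224x224 inputs, but many reimplementations use 227 because that size makes the first layer produce the 55x55 maps the paper reports.

```python
def conv_output_size(in_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer (floor convention)."""
    return (in_size + 2 * padding - kernel) // stride + 1


if __name__ == "__main__":
    # AlexNet conv1: 11x11 kernels, stride 4, no padding, on an assumed 227x227 input
    print(conv_output_size(227, kernel=11, stride=4))             # 55
    # Overlapping 3x3 max pooling with stride 2, as used in AlexNet
    print(conv_output_size(55, kernel=3, stride=2))               # 27
    # A 3x3, stride-1, padding-1 layer keeps the spatial size unchanged
    print(conv_output_size(32, kernel=3, stride=1, padding=1))    # 32
```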
Paper: Deep Residual Learning for Image Recognition: https://arxiv.org/abs/1512.03385
Explanation: http://blog.csdn.net/wspba/article/details/56019373
Code: TensorFlow https://github.com/ry/tensorflow-resnet, PyTorch https://github.com/isht7/pytorch-deeplab-resnet
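The core idea of ResNet fits in two equations. A block learns a residual function F and adds its input back through a shortcut. The shortcut is the identity when shapes match and a linear projection W_s when they do not.

The third line below is a standard chain-rule consequence of the identity shortcut, written with gradients as row vectors, a loss L, and the identity matrix I. It shows that the gradient reaching x always contains a direct term. This is one common explanation for why very deep residual networks stay trainable; treat it as an interpretation rather than a quote from the paper.

```latex
\begin{aligned}
\mathbf{y} &= \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x} \\
\mathbf{y} &= \mathcal{F}(\mathbf{x}, \{W_i\}) + W_s \mathbf{x} \\
\frac{\partial \mathcal{L}}{\partial \mathbf{x}} &= \frac{\partial \mathcal{L}}{\partial \mathbf{y}} \left( I + \frac{\partial \mathcal{F}}{\partial \mathbf{x}} \right)
\end{aligned}
```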
Paper: Densely Connected Convolutional Networks: https://arxiv.org/pdf/1608.06993.pdf
Explanation: http://blog.csdn.net/u014380165/article/details/75142664
Code: original https://github.com/liuzhuang13/DenseNet, TensorFlow https://github.com/YixuanLi/densenet-tensorflow, PyTorch https://github.com/bamos/densenet.pytorch
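In a DenseNet dense block, layer ℓ receives the concatenation of all earlier feature maps, x_ℓ = H_ℓ([x_0, x_1, ..., x_{ℓ-1}]), and each layer adds k new channels, where k is the growth rate. The sketch below only does the channel bookkeeping this implies; it does not build a network. The example numbers are an assumption modeled on the first block of DenseNet-121: 64 input channels, 6 layers, growth rate 32.

```python
from typing import List, Tuple


def dense_block_channels(k0: int, growth_rate: int, num_layers: int) -> Tuple[List[int], int]:
    """Channel counts inside one dense block.

    Layer l (1-indexed) sees the concatenation [x_0, ..., x_{l-1}]:
    the k0 block-input channels plus growth_rate channels from each earlier layer.
    Returns (input channels of each layer, channels leaving the block).
    """
    layer_inputs = [k0 + growth_rate * (l - 1) for l in range(1, num_layers + 1)]
    block_output = k0 + growth_rate * num_layers
    return layer_inputs, block_output


if __name__ == "__main__":
    inputs, out = dense_block_channels(64, 32, 6)
    print(inputs)  # [64, 96, 128, 160, 192, 224]
    print(out)     # 256
```

Channel counts grow linearly with depth inside a block. This is why DenseNet inserts transition layers between blocks to shrink the maps again.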
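I recommend reading the explanation first and then the source code. This deepens your understanding of the model, and you also pick up new framework techniques from other people's code.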
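Of course, we should not stop at the surface. A well-known book is Deep Learning by Goodfellow, Bengio and Courville; a Chinese translation is available at https://github.com/exacity/deeplearningbook-chinese
More fundamental theoretical research is still largely missing.
If you have the patience, you can also take Stanford's new course: https://www.bilibili.com/video/av9156347/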
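On the practical side there is a lot to learn. Before diving in, it is best to read up on hyperparameter tuning tips:
What tips are there for tuning deep learning models? https://www.zhihu.com/question/25097993
There used to be a tuning bible, Neural Networks: Tricks of the Trade, but it is too old to recommend.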
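Dropout and LRN, once standard modules, are used less and less nowadays, so I won't cover them. For regularization I recommend BatchNorm (https://www.zhihu.com/question/38102762): the idea is simple and it works well.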
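The BatchNorm idea is to normalize each feature using the mini-batch mean and variance, then apply a learnable scale (gamma) and shift (beta). The sketch below shows this in training mode. It is a simplification that assumes a single feature and a plain Python list as the mini-batch. Framework layers such as PyTorch's `torch.nn.BatchNorm2d` compute the statistics per channel over the batch and spatial positions, and they also track running averages.

```python
import math
from typing import List


def batch_norm_train(batch: List[float], gamma: float = 1.0, beta: float = 0.0,
                     eps: float = 1e-5) -> List[float]:
    """Training-mode BatchNorm for one feature over a mini-batch."""
    n = len(batch)
    mean = sum(batch) / n
    # Biased (divide-by-n) variance of the mini-batch
    var = sum((x - mean) ** 2 for x in batch) / n
    # Normalize to roughly zero mean / unit variance, then scale and shift
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


if __name__ == "__main__":
    out = batch_norm_train([1.0, 2.0, 3.0, 4.0])
    print([round(v, 4) for v in out])  # [-1.3416, -0.4472, 0.4472, 1.3416]
```

At inference time the batch statistics are replaced by the running mean and variance collected during training. This keeps a prediction from depending on which other samples happen to be in the batch.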
Although BatchNorm already makes training quite stable, it is still worth learning gradient clipping: http://blog.csdn.net/zyf19930610/article/details/71743291
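A common form of gradient clipping rescales all gradients together whenever their combined L2 norm exceeds a threshold. This keeps the update direction and only caps its length. The sketch below assumes a flat list of gradient values. In practice you would call the framework's built-in version, such as `torch.nn.utils.clip_grad_norm_` in PyTorch or `tf.clip_by_global_norm` in TensorFlow.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[float], max_norm: float) -> List[float]:
    """Scale gradients down so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)          # small enough: leave unchanged
    scale = max_norm / total_norm   # same factor for every entry keeps the direction
    return [g * scale for g in grads]


if __name__ == "__main__":
    print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))  # norm 5 rescaled to norm 1
```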
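The activation function is another important choice, but in CNNs you can use ReLU (http://www.cnblogs.com/neopenx/p/4453161.html) almost without thinking: it is fast to compute, and ReLU + BatchNorm is close to a universal default. Of course, specific tasks still call for specific analysis; GANs, for example, are not well suited to such a blunt activation function.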
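ReLU is cheap because both its forward and backward passes are just comparisons. The sketch below shows both passes, plus LeakyReLU as one commonly used alternative. LeakyReLU keeps a small gradient for negative inputs and often appears in GAN discriminators. The 0.2 slope is an assumed example value, and the gradient at exactly 0 is a convention.

```python
from typing import List


def relu(xs: List[float]) -> List[float]:
    # max(0, x): one comparison per element
    return [max(0.0, x) for x in xs]


def relu_grad(xs: List[float]) -> List[float]:
    # Derivative is 1 for x > 0 and 0 otherwise (0 is used at exactly x = 0)
    return [1.0 if x > 0 else 0.0 for x in xs]


def leaky_relu(xs: List[float], slope: float = 0.2) -> List[float]:
    # Negative inputs are scaled rather than zeroed, so their gradient is slope, not 0
    return [x if x > 0 else slope * x for x in xs]


if __name__ == "__main__":
    xs = [-2.0, 0.0, 3.5]
    print(relu(xs))        # [0.0, 0.0, 3.5]
    print(relu_grad(xs))   # [0.0, 0.0, 1.0]
    print(leaky_relu(xs))  # [-0.4, 0.0, 3.5]
```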
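With the architecture largely settled, the next step is optimization. There are many optimization algorithms; the most common are SGD and Adam.
An overview of optimization algorithms: http://www.mamicode.com/info-detail-1931210.html
A better algorithm can converge faster or reach better results, but in most experiments SGD and Adam are sufficient.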
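To see how the two optimizers differ, the sketch below implements one update step of each. Plain SGD is shown without momentum. Adam uses the update rule and default hyperparameters from the Adam paper: bias-corrected moving averages of the gradient and the squared gradient. Both optimizers then minimize a hypothetical toy loss, f(w) = (w - 3)^2. The learning rate of 0.1 for both is an assumption chosen so the toy run finishes quickly. In real training you would use a framework class such as `torch.optim.SGD` or `torch.optim.Adam`.

```python
import math
from typing import Tuple


def sgd_step(w: float, g: float, lr: float = 0.1) -> float:
    # Move against the gradient by a fixed fraction
    return w - lr * g


def adam_step(w: float, g: float, m: float, v: float, t: int, lr: float = 0.001,
              beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    m = beta1 * m + (1 - beta1) * g          # moving average of gradients
    v = beta2 * v + (1 - beta2) * g * g      # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


def minimize_quadratic(steps: int = 1000) -> Tuple[float, float]:
    """Minimize the toy loss f(w) = (w - 3)^2, whose gradient is 2(w - 3)."""
    w_sgd = 0.0
    w_adam, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        w_sgd = sgd_step(w_sgd, 2 * (w_sgd - 3), lr=0.1)
        w_adam, m, v = adam_step(w_adam, 2 * (w_adam - 3), m, v, t, lr=0.1)
    return w_sgd, w_adam


if __name__ == "__main__":
    print(minimize_quadratic())  # both values end up close to 3
```

Adam divides each step by a running estimate of the gradient's magnitude. Its step size is therefore roughly lr no matter how the gradient is scaled, which is one reason it needs less learning-rate tuning than plain SGD.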
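Experts' experience is also worth reading: 26 deep learning lessons from Yoshua Bengio and others: http://www.csdn.net/article/2015-09-16/2825716
Once you have worked through all of the above, you can move on to actual research projects. Look for papers that interest you in this GitHub list: https://github.com/terryum/awesome-deep-learning-papers. Below are some excellent papers related to convolutional neural networks.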
Distilling the knowledge in a neural network (2015), G. Hinton et al. http://arxiv.org/pdf/1503.02531
Deep neural networks are easily fooled: High confidence predictions for unrecognizable images (2015), A. Nguyen et al. http://arxiv.org/pdf/1412.1897
How transferable are features in deep neural networks? (2014), J. Yosinski et al. http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf
CNN features off-the-shelf: An astounding baseline for recognition (2014), A. Razavian et al. http://www.cv-foundation.org//openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf
Learning and transferring mid-level image representations using convolutional neural networks (2014), M. Oquab et al. http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Oquab_Learning_and_Transferring_2014_CVPR_paper.pdf
Visualizing and understanding convolutional networks (2014), M. Zeiler and R. Fergus. http://arxiv.org/pdf/1311.2901
DeCAF: A deep convolutional activation feature for generic visual recognition (2014), J. Donahue et al. http://arxiv.org/pdf/1310.1531
Training very deep networks (2015), R. Srivastava et al. http://papers.nips.cc/paper/5850-training-very-deep-networks.pdf
Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015), S. Ioffe and C. Szegedy. http://arxiv.org/pdf/1502.03167
Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification (2015), K. He et al. http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf
Dropout: A simple way to prevent neural networks from overfitting (2014), N. Srivastava et al. http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
Adam: A method for stochastic optimization (2014), D. Kingma and J. Ba. http://arxiv.org/pdf/1412.6980
Improving neural networks by preventing co-adaptation of feature detectors (2012), G. Hinton et al. http://arxiv.org/pdf/1207.0580.pdf
Random search for hyper-parameter optimization (2012), J. Bergstra and Y. Bengio. http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a
Rethinking the inception architecture for computer vision (2016), C. Szegedy et al. http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.pdf
Inception-v4, inception-resnet and the impact of residual connections on learning (2016), C. Szegedy et al. http://arxiv.org/pdf/1602.07261
Identity Mappings in Deep Residual Networks (2016), K. He et al. https://arxiv.org/pdf/1603.05027v2.pdf
Deep residual learning for image recognition (2016), K. He et al. http://arxiv.org/pdf/1512.03385
Spatial transformer networks (2015), M. Jaderberg et al. http://papers.nips.cc/paper/5854-spatial-transformer-networks.pdf
Going deeper with convolutions (2015), C. Szegedy et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf
Very deep convolutional networks for large-scale image recognition (2014), K. Simonyan and A. Zisserman. http://arxiv.org/pdf/1409.1556
Return of the devil in the details: delving deep into convolutional nets (2014), K. Chatfield et al. http://arxiv.org/pdf/1405.3531
OverFeat: Integrated recognition, localization and detection using convolutional networks (2013), P. Sermanet et al. http://arxiv.org/pdf/1312.6229
Maxout networks (2013), I. Goodfellow et al. http://arxiv.org/pdf/1302.4389v4
Network in network (2013), M. Lin et al. http://arxiv.org/pdf/1312.4400
ImageNet classification with deep convolutional neural networks (2012), A. Krizhevsky et al. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
You only look once: Unified, real-time object detection (2016), J. Redmon et al. http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf
Fully convolutional networks for semantic segmentation (2015), J. Long et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015), S. Ren et al. http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf
Fast R-CNN (2015), R. Girshick. http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Girshick_Fast_R-CNN_ICCV_2015_paper.pdf
Rich feature hierarchies for accurate object detection and semantic segmentation (2014), R. Girshick et al. http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf
Spatial pyramid pooling in deep convolutional networks for visual recognition (2014), K. He et al. http://arxiv.org/pdf/1406.4729
Semantic image segmentation with deep convolutional nets and fully connected CRFs, L. Chen et al. https://arxiv.org/pdf/1412.7062
Learning hierarchical features for scene labeling (2013), C. Farabet et al. https://hal-enpc.archives-ouvertes.fr/docs/00/74/20/77/PDF/farabet-pami-13.pdf
Image Super-Resolution Using Deep Convolutional Networks (2016), C. Dong et al. https://arxiv.org/pdf/1501.00092v3.pdf
A neural algorithm of artistic style (2015), L. Gatys et al. https://arxiv.org/pdf/1508.06576
Deep visual-semantic alignments for generating image descriptions (2015), A. Karpathy and L. Fei-Fei. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Karpathy_Deep_Visual-Semantic_Alignments_2015_CVPR_paper.pdf
Show, attend and tell: Neural image caption generation with visual attention (2015), K. Xu et al. http://arxiv.org/pdf/1502.03044
Show and tell: A neural image caption generator (2015), O. Vinyals et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Vinyals_Show_and_Tell_2015_CVPR_paper.pdf
Long-term recurrent convolutional networks for visual recognition and description (2015), J. Donahue et al. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Donahue_Long-Term_Recurrent_Convolutional_2015_CVPR_paper.pdf
VQA: Visual question answering (2015), S. Antol et al. http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Antol_VQA_Visual_Question_ICCV_2015_paper.pdf
DeepFace: Closing the gap to human-level performance in face verification (2014), Y. Taigman et al. http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Taigman_DeepFace_Closing_the_2014_CVPR_paper.pdf
Large-scale video classification with convolutional neural networks (2014), A. Karpathy et al. http://vision.stanford.edu/pdf/karpathy14.pdf
Two-stream convolutional networks for action recognition in videos (2014), K. Simonyan et al. http://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf
3D convolutional neural networks for human action recognition (2013), S. Ji et al. http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_JiXYY10.pdf
For more, see Zhuanzhi's other deep learning article, //www.webtourguide.com/topic/2001228999615594/awesome. It covers many more specific subfields and their papers, so I won't repeat them here.
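Abstract: Video super-resolution (VSR) is the process of recovering high-resolution video frames from a given low-resolution video sequence. In recent years, driven by deep learning, VSR has made major breakthroughs. To further advance VSR, this paper categorizes, analyzes, and compares deep-learning-based VSR algorithms. First, existing methods are divided into two categories according to network structure: iterative-network-based VSR and recurrent-network-based VSR. The strengths and weaknesses of the different network models are then compared. Next, VSR datasets are introduced comprehensively, and existing algorithms are summarized and compared on several commonly used public datasets. Finally, key issues in VSR algorithms are analyzed and their application prospects are discussed.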
Active speaker detection (ASD) is a multi-modal task that aims to identify who, if anyone, is speaking from a set of candidates. Current audio-visual approaches for ASD typically rely on visually pre-extracted face tracks (sequences of consecutive face crops) and the respective monaural audio. However, their recall rate is often low as only the visible faces are included in the set of candidates. Monaural audio may successfully detect the presence of speech activity but fails in localizing the speaker due to the lack of spatial cues. Our solution extends the audio front-end using a microphone array. We train an audio convolutional neural network (CNN) in combination with beamforming techniques to regress the speaker's horizontal position directly in the video frames. We propose to generate weak labels using a pre-trained active speaker detector on pre-extracted face tracks. Our pipeline embraces the "student-teacher" paradigm, where a trained "teacher" network is used to produce pseudo-labels visually. The "student" network is an audio network trained to generate the same results. At inference, the student network can independently localize the speaker in the visual frames directly from the audio input. Experimental results on newly collected data prove that our approach significantly outperforms a variety of other baselines as well as the teacher network itself. It results in an excellent speech activity detector too.