A Fast Learning Algorithm for Deep Belief Nets


Content excerpt:

LETTER  Communicated by Yann Le Cun

A Fast Learning Algorithm for Deep Belief Nets

Geoffrey E. Hinton (hinton@cs.toronto.edu)
Simon Osindero (osindero@cs.toronto.edu)
Department of Computer Science, University of Toronto, Toronto, Canada M5S 3G4

Yee-Whye Teh (tehyw@comp.nus.edu.sg)
Department of Computer Science, National University of Singapore, Singapore 117543

Neural Computation 18, 1527-1554 (2006). © 2006 Massachusetts Institute of Technology

We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.

1 Introduction

Learning is difficult in densely connected, directed belief nets that have many hidden layers because it is difficult to infer the conditional distribution of the hidden activities when given a data vector. Variational methods use simple approximations to the true conditional distribution, but the approximations may be poor, especially at the deepest hidden layer, where the prior assumes independence. Also, variational learning still requires all of the parameters to be learned together, and this makes the learning time scale poorly as the number of parameters increases.

We describe a model in which the top two hidden layers form an undirected associative memory (see Figure 1) and the remaining hidden layers form a directed acyclic graph that converts the representations in the associative memory into observable variables such as the pixels of an image.

[Figure 1: The network used to model the joint distribution of digit images and digit labels. Layers, from top to bottom: 2000 top-level units joined to 10 label units (which could be the top level of another sensory pathway), 500 units, 500 units, and a 28 x 28 pixel image. In this letter, each training case consists of an image and an explicit class label, but work in progress has shown that the same learning algorithm can be used if the "labels" are replaced by a multilayer pathway whose inputs are spectrograms from multiple different speakers saying isolated digits. The network then learns to generate pairs that consist of an image and a spectrogram of the same digit class.]

This hybrid model has some attractive features:

- There is a fast, greedy learning algorithm that can find a fairly good set of parameters quickly, even in deep networks with millions of parameters and many hidden layers (a minimal sketch of the greedy procedure follows this excerpt).
- The learning algorithm is unsupervised but can be applied to labeled data by learning a model that generates both the label and the data.
- There is a fine-tuning algorithm that learns an excellent generative model that outperforms discriminative methods on the MNIST database of handwritten digits.
- The generative model makes it easy to interpret the distributed representations in the deep hidden layers.
- The inference required for forming a percept is both fast and accurate.
- The learning algorithm is local: adjustments to a synapse strength depend only on the states of the presynaptic and postsynaptic neurons.
- The communication is simple: neurons need only communicate their stochastic binary states.

Section 2 introduces the idea of a "complementary" prior ...
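To make the greedy, layer-by-layer procedure described above concrete, here is a minimal sketch (not the authors' code) that trains a stack of restricted Boltzmann machines with one-step contrastive divergence (CD-1), using the layer sizes from Figure 1. The learning rate, epoch count, batch size, the random stand-in data, and the simple concatenation of the 10 label units onto the penultimate layer's activities are illustrative assumptions, and the contrastive wake-sleep fine-tuning from the letter is not shown. The weight update in the inner loop is the local rule from the feature list: each weight change depends only on the activities of the two units it connects.

```python
# Sketch of greedy, layer-by-layer pretraining of a deep belief net.
# Each layer is trained as an RBM with CD-1; its hidden activities then
# serve as the "data" for the next layer. Hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.05, batch=100):
    """Train one RBM with CD-1; return (weights, hidden biases, visible biases)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        for i in range(0, len(data), batch):
            v0 = data[i:i + batch]
            # Up pass: sample hidden states given the data.
            p_h0 = sigmoid(v0 @ W + b_h)
            h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
            # Down-up pass: one-step reconstruction.
            p_v1 = sigmoid(h0 @ W.T + b_v)
            p_h1 = sigmoid(p_v1 @ W + b_h)
            # Local update: dW_ij ~ <v_i h_j>_data - <v_i h_j>_reconstruction.
            W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
            b_h += lr * (p_h0 - p_h1).mean(axis=0)
            b_v += lr * (v0 - p_v1).mean(axis=0)
    return W, b_h, b_v

def up_pass(data, W, b_h):
    """Hidden-unit probabilities, used as input for the next layer."""
    return sigmoid(data @ W + b_h)

# Illustrative stand-ins for binarized MNIST images and one-hot labels.
images = rng.random((1000, 28 * 28))
labels = np.eye(10)[rng.integers(0, 10, 1000)]   # 10 label units

# Greedy stack: 784 -> 500 -> 500, then a 2000-unit top-level RBM whose
# visible layer is the penultimate 500 units plus the 10 label units,
# forming the undirected associative memory at the top.
W1, bh1, _ = train_rbm(images, 500)
h1 = up_pass(images, W1, bh1)
W2, bh2, _ = train_rbm(h1, 500)
h2 = up_pass(h1, W2, bh2)
W3, bh3, _ = train_rbm(np.hstack([h2, labels]), 2000)
```

Training each RBM on the hidden activities of the layer below is what lets the procedure scale to deep networks with many parameters: each stage solves a local, tractable learning problem, and the resulting weights initialize the slower fine-tuning described in the abstract.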
