AI Paper Recommendations

[AI Paper Review - 06] Reformer: The Efficient Transformer



Since transformer models were first introduced in 2017, they have been applied with great success to tasks where deep learning handles sequence data, most notably natural language processing.

Self-attention is the critical ingredient: it models dependencies among tokens in a sequence without recurrent connections or convolutional kernels. Large transformer models often achieve state-of-the-art results on a number of tasks, but training them can be prohibitively costly, especially on long sequences. Recently, Kitaev et al. [2020] proposed the Reformer, which makes the vanilla transformer efficient enough to scale to much longer sequences.
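To see where that cost comes from, here is a minimal NumPy sketch of standard scaled dot-product attention (function and variable names are illustrative, not from the paper). The score matrix has shape (T, T), so memory and compute grow as O(T^2); at T = 64K tokens that single matrix already holds over 4 billion entries.

```python
import numpy as np

def dot_product_attention(Q, K, V):
    # Q, K, V: (T, d) arrays for a sequence of length T.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (T, T): the O(T^2) bottleneck
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (T, d)
```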

The Reformer has two key ingredients:

(1) LSH (locality-sensitive hashing) attention in place of dot-product attention, which reduces the complexity from O(T^2) to O(T log T), where T is the sequence length (the hashing step is sketched below);

(2) reversible residual layers, which make it possible to store activations only once during training instead of once per layer (also sketched below).
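For the first ingredient, the paper hashes with random rotations: hash(x) = argmax([xR; -xR]) for a random matrix R, so vectors that are close in angle tend to fall into the same bucket, and attention is then computed only within (chunked) buckets. A minimal sketch of just the hashing step, with illustrative names (bucket sorting and chunked attention are omitted):

```python
import numpy as np

def lsh_bucket(x, n_buckets, rng):
    # x: (T, d) vectors to hash; the Reformer shares queries and keys.
    # Angular LSH via random rotation: hash(x) = argmax([xR; -xR]).
    d = x.shape[-1]
    R = rng.standard_normal((d, n_buckets // 2))
    rotated = x @ R                                  # (T, n_buckets // 2)
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 16))                # 8 tokens of dimension 16
print(lsh_bucket(tokens, n_buckets=4, rng=rng))      # one bucket id per token
```

Restricting each query to positions in its own bucket approximates full softmax attention, since the softmax is dominated by the largest dot products anyway.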
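The second ingredient follows the RevNet construction: with F the attention sublayer and G the feed-forward sublayer, each block computes y1 = x1 + F(x2) and y2 = x2 + G(y1), so its inputs can be recomputed exactly from its outputs during the backward pass. A minimal sketch (F and G stand for arbitrary sublayer functions):

```python
def rev_block_forward(x1, x2, F, G):
    # RevNet-style reversible block; in the Reformer, F is attention
    # and G is the feed-forward sublayer.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_block_inverse(y1, y2, F, G):
    # Recompute the inputs from the outputs during backprop, so
    # activations need not be cached for every layer.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```

The trade-off is recomputing F and G once each in the backward pass, in exchange for activation memory that no longer grows with the number of layers.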


Curious about this paper on the efficient Transformer?

Link to the paper ↓

https://arxiv.org/pdf/2001.04451.pdf


Representative papers by Seungjin Choi

1. Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh (2019),
"Set transformer: A framework for attention-based permutation-invariant neural networks,"
in Proceedings of the Thirty-Sixth International Conference on Machine Learning (ICML-2019),
Long Beach, California, USA, June 9-15, 2019.
(earlier version in preprint arXiv:1810.00825)

2. Juho Lee, Lancelot James, Seungjin Choi, and François Caron (2019),
"A Bayesian model for sparse graphs with flexible degree distribution and overlapping community structure,"
in Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS-2019),
Naha, Okinawa, Japan, April 16-18, 2019. (oral)
(earlier version in preprint arXiv:1810.01778)

3. Yoonho Lee and Seungjin Choi (2018),
"Gradient-based meta-learning with adaptive layerwise metric and subspace,"
in Proceedings of the Thirty-Fifth International Conference on Machine Learning (ICML-2018),
Stockholm, Sweden, July 10-15, 2018.
(earlier version in preprint arXiv:1810.05558)

4. Saehoon Kim, Jungtaek Kim, and Seungjin Choi (2018),
"On the optimal bit complexity of circulant binary embedding,"
in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-2018),
New Orleans, Louisiana, USA, February 2-7, 2018.

5. Juho Lee, Creighton Heaukulani, Zoubin Ghahramani, Lancelot James, and Seungjin Choi (2017),
"Bayesian inference on random simple graphs with power law degree distributions,"
in Proceedings of the International Conference on Machine Learning (ICML-2017),
Sydney, Australia, August 6-11, 2017.
(earlier version in preprint arXiv:1702.08239)

6. Saehoon Kim and Seungjin Choi (2017),
"Binary embedding with additive homogeneous kernels,"
in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-2017),
San Francisco, California, USA, February 4-9, 2017.

7. Juho Lee, Lancelot F. James, and Seungjin Choi (2016),
"Finite-dimensional BFRY priors and variational Bayesian inference for power law models,"
in Advances in Neural Information Processing Systems 29 (NIPS-2016),
Barcelona, Spain, December 5-10, 2016.

8. Suwon Suh and Seungjin Choi (2016),
"Gaussian copula variational autoencoders for mixed data,"
Preprint arXiv:1604.04960, 2016.

9. Yong-Deok Kim, Taewoong Jang, Bohyung Han, and Seungjin Choi (2016),
"Learning to select pre-trained deep representations with Bayesian evidence framework,"
in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR-2016),
Las Vegas, Nevada, USA, June 27-30, 2016. (oral)

10. Juho Lee and Seungjin Choi (2015),
"Tree-guided MCMC inference for normalized random measure mixture models,"
in Advances in Neural Information Processing Systems 28 (NIPS-2015),
Montreal, Canada, December 7-12, 2015.
