https://arxiv.org/abs/2006.03236
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

"With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the much-overlooked redundancy..."

The Transformer..