https://arxiv.org/abs/2305.16843
Randomized Positional Encodings Boost Length Generalization of Transformers

"Transformers have impressive generalization capabilities on tasks with a fixed context length. However, they fail to generalize to sequences of arbitrary length, even for seemingly simple tasks such as duplicating a string. Moreover, simply training on lon..."

A paper about token length at training or inference time.
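
A minimal sketch of the core idea as I read it (the function name `randomized_positions`, the `max_len` parameter, and the NumPy-based sampling are my own illustrative choices, not the paper's code): during training, each sequence is assigned an ordered random subset of position indices drawn from a much larger range, so the positional encodings for long test sequences stay within the range already seen at training time.

```python
import numpy as np

def randomized_positions(seq_len, max_len=2048, rng=None):
    """Sample an ordered subset of position indices from a larger range.

    Instead of the consecutive positions 0..seq_len-1, draw seq_len
    distinct indices uniformly from {0, ..., max_len-1} and sort them.
    max_len should cover the longest sequence expected at inference.
    """
    rng = rng or np.random.default_rng()
    return np.sort(rng.choice(max_len, size=seq_len, replace=False))

# Example: an 8-token training sequence gets positions spread over 0..2047,
# e.g. [  12  143  401  780 1033 1517 1760 2041]
print(randomized_positions(8))
```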