Great article. Quick question:
"Furthermore, when adding positional information to the cache tokens, the approach uses positions inside of the cache rather than the positions in the real text."
What do you mean by adding positional information to the cache tokens?