# Roadmap

TBD, in the meantime, we recommend reading this paper:

> This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents’ most sensible choice in this setting would be to employ a no-regret learning algorithm.

{% embed url="<https://proceedings.neurips.cc/paper/2018/file/47fd3c87f42f55d4b233417d49c34783-Paper.pdf>" %}
