TBD, in the meantime, we recommend reading this paper:

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agentsโ€™ most sensible choice in this setting would be to employ a no-regret learning algorithm.

Last updated