The Epsilon Greedy Algorithm - a Performance Review
(Volume 6, Issue 9, September 2020) OPEN ACCESS
Author(s):
Riti Agarwal
Keywords:
exploration, exploitation, regret, reward function, local maxima
Abstract:
Multi-Armed Bandit (MAB) is a class of reinforcement learning algorithms. A multi-armed bandit implementation has an agent (learner) that chooses between k uncertain actions and receives a reward based on the chosen action. This paper focuses mainly on the Epsilon Greedy Algorithm in comparison to Thompson Sampling and UCB-1 (Upper Confidence Bound). It discusses the benefits of using bandit algorithms over A/B testing and evaluates the effectiveness of these three main solutions. It experimentally identifies the best use cases for the Epsilon Greedy Algorithm: when the experimentation period is longer than that of A/B testing and the goal is to exploit the best-performing variant. It also identifies when the algorithm does not provide statistically sound results: when the sample size on each arm of the experiment is very small.
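To make the abstract's setup concrete, here is a minimal sketch of the epsilon-greedy bandit loop it describes: with probability epsilon the agent explores a random arm, otherwise it exploits the arm with the highest estimated mean reward. The arm count, reward function, and parameter values below are illustrative assumptions, not taken from the paper.

```python
import random

def epsilon_greedy(reward_fn, k, steps, epsilon=0.1):
    """Run an epsilon-greedy agent over k arms for a number of steps.

    reward_fn(arm) returns the observed reward for pulling that arm.
    Returns the per-arm mean-reward estimates and the total reward.
    """
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(k)                         # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit
        reward = reward_fn(arm)
        counts[arm] += 1
        # incremental update of the running mean for the chosen arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

# Illustrative example: 3 Bernoulli arms with hidden success probabilities
probs = [0.2, 0.5, 0.8]
random.seed(0)
est, total = epsilon_greedy(
    lambda a: 1.0 if random.random() < probs[a] else 0.0,
    k=3, steps=5000, epsilon=0.1,
)
```

With enough pulls per arm the estimates converge toward the hidden probabilities; with very few pulls per arm they remain noisy, which is exactly the small-sample failure mode the abstract highlights.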