ISSN:2454-4116

International Journal of New Technology and Research

Impact Factor 3.953

(An ISO 9001:2008 Certified Online Journal)
India | Germany | France | Japan

The Epsilon Greedy Algorithm - a Performance Review

(Volume 6, Issue 9, September 2020) OPEN ACCESS
Author(s):

Riti Agarwal

Keywords:

exploration, exploitation, regret, reward function, local maxima.

Abstract:

Multi-Armed Bandit (MAB) algorithms are a class of reinforcement learning algorithms. In a multi-armed bandit setting, an agent (learner) repeatedly chooses among k different actions with uncertain payoffs and receives a reward based on the chosen action. This paper focuses on the Epsilon Greedy Algorithm in comparison with Thompson Sampling and UCB-1 (Upper Confidence Bound). It discusses the benefits of bandit algorithms over A/B testing and evaluates the effectiveness of these three main approaches. Experiments show that the Epsilon Greedy Algorithm is best suited to cases where the experimentation period is longer than that of an A/B test and the goal is to exploit the best-performing variant. The paper also shows when the algorithm does not produce statistically reliable results: when the sample size on each path of the experiment is very small.
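To make the three approaches concrete, here is a minimal sketch that runs them side by side on a simulated Bernoulli bandit. It assumes 0/1 rewards, such as conversions in an A/B test. The arm probabilities, epsilon = 0.1, the horizon, and every function name are hypothetical choices for illustration, not the paper's experimental setup. UCB-1 uses the standard sqrt(2 ln t / n) exploration bonus, and Thompson Sampling uses a Beta(1, 1) prior on each arm.

```python
import math
import random


def epsilon_greedy(counts, values, epsilon, rng):
    """With probability epsilon explore a random arm; otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    return max(range(len(counts)), key=lambda a: values[a])


def ucb1(counts, values, t):
    """UCB-1: play each arm once, then pick the arm with the highest upper confidence bound."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )


def thompson(successes, failures, rng):
    """Thompson Sampling: draw from each arm's Beta posterior and pick the largest draw."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda a: draws[a])


def run(policy, true_probs, horizon=5000, epsilon=0.1, seed=0):
    """Simulate one policy (hypothetical helper); return (total reward, pulls per arm, regret)."""
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k
    values = [0.0] * k  # running mean reward per arm
    successes = [0] * k
    failures = [0] * k
    total = 0
    for t in range(1, horizon + 1):
        if policy == "epsilon_greedy":
            arm = epsilon_greedy(counts, values, epsilon, rng)
        elif policy == "ucb1":
            arm = ucb1(counts, values, t)
        elif policy == "thompson":
            arm = thompson(successes, failures, rng)
        else:
            raise ValueError(f"unknown policy: {policy}")
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        successes[arm] += reward
        failures[arm] += 1 - reward
        total += reward
    # Expected regret: reward lost versus always playing the best arm (uses true probabilities).
    best = max(true_probs)
    regret = sum(c * (best - p) for c, p in zip(counts, true_probs))
    return total, counts, regret


if __name__ == "__main__":
    probs = [0.05, 0.10, 0.15]  # hypothetical conversion rates of three variants
    for name in ("epsilon_greedy", "ucb1", "thompson"):
        total, counts, regret = run(name, probs)
        print(f"{name:15s} reward={total:5d} regret={regret:7.2f} pulls={counts}")
```

Because epsilon-greedy keeps exploring a fixed fraction epsilon of the time, its regret grows linearly with the horizon unless epsilon is decayed. UCB-1 and Thompson Sampling reduce exploration automatically as evidence accumulates.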

Page No: 01-03
