Invention Grant
US07707131B2 Thompson strategy based online reinforcement learning system for action selection
Expired
- Patent Title: Thompson strategy based online reinforcement learning system for action selection
- Application No.: US11169503
- Application Date: 2005-06-29
- Publication No.: US07707131B2
- Publication Date: 2010-04-27
- Inventor: David M. Chickering, Timothy S. Paek, Eric J. Horvitz
- Applicant: David M. Chickering, Timothy S. Paek, Eric J. Horvitz
- Applicant Address: US WA Redmond
- Assignee: Microsoft Corporation
- Current Assignee: Microsoft Corporation
- Current Assignee Address: US WA Redmond
- Agency: Lee & Hayes, PLLC
- Main IPC: G06N5/04
- IPC: G06N5/04; G06N7/00; G06N7/02

Abstract:
A system and method for online reinforcement learning is provided. In particular, a method for performing the explore-vs.-exploit tradeoff is provided. Although the method is heuristic, it can be applied in a principled manner while simultaneously learning the parameters and/or structure of the model (e.g., Bayesian network model). The system includes a model which receives an input (e.g., from a user) and provides a probability distribution associated with uncertainty regarding parameters of the model to a decision engine. The decision engine can determine whether to exploit the information known to it or to explore to obtain additional information based, at least in part, upon the explore-vs.-exploit tradeoff (e.g., Thompson strategy). A reinforcement learning component can obtain additional information (e.g., feedback from a user) and update parameter(s) and/or the structure of the model. The system can be employed in scenarios in which an influence diagram is used to make repeated decisions and maximization of long-term expected utility is desired.
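The Thompson strategy named in the abstract can be illustrated with a minimal sketch. This is not the patented system: it assumes a much simpler setting (independent actions with binary feedback and a Beta posterior per action) in place of the Bayesian network or influence diagram the abstract describes. The class name, the action names, and the success probabilities are hypothetical and chosen only for illustration.

```python
import random


class BetaBernoulliThompson:
    """Thompson sampling over a fixed set of actions with binary feedback.

    Simplifying assumption: each action has an independent, unknown success
    probability with a Beta posterior. The patent abstract instead describes
    uncertainty over the parameters of a richer model.
    """

    def __init__(self, actions, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) is a uniform prior on each action's success probability.
        self.params = {a: [1.0, 1.0] for a in actions}

    def select_action(self):
        # Sample one plausible success probability per action from its
        # posterior, then act greedily on the sampled values. Uncertain
        # actions occasionally draw high samples, which produces exploration.
        samples = {
            a: self.rng.betavariate(alpha, beta)
            for a, (alpha, beta) in self.params.items()
        }
        return max(samples, key=samples.get)

    def update(self, action, reward):
        # Conjugate update: a success increments alpha, a failure beta.
        if reward:
            self.params[action][0] += 1.0
        else:
            self.params[action][1] += 1.0

    def posterior_mean(self, action):
        alpha, beta = self.params[action]
        return alpha / (alpha + beta)


def simulate(true_probs, steps=2000, seed=0):
    """Run the agent against hypothetical fixed success probabilities."""
    agent = BetaBernoulliThompson(list(true_probs), seed=seed)
    env_rng = random.Random(seed + 1)
    counts = {a: 0 for a in true_probs}
    for _ in range(steps):
        action = agent.select_action()
        reward = env_rng.random() < true_probs[action]  # simulated feedback
        agent.update(action, reward)
        counts[action] += 1
    return agent, counts


if __name__ == "__main__":
    # Hypothetical actions and success rates, for illustration only.
    agent, counts = simulate({"confirm": 0.8, "ask": 0.5, "ignore": 0.3})
    for name, n in counts.items():
        print(name, n, round(agent.posterior_mean(name), 3))
```

In this sketch the explore-vs.-exploit decision is never made explicitly: sampling from the posterior and then acting greedily on the sample explores while uncertainty is high and increasingly exploits as feedback accumulates.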
Public/Granted literature
- US20060224535A1 Action selection for reinforcement learning using influence diagrams (Public/Granted day: 2006-10-05)