Thompson strategy based online reinforcement learning system for action selection

Invention Grant

US07707131B2 Thompson strategy based online reinforcement learning system for action selection (Lapsed)



Patent Title: Thompson strategy based online reinforcement learning system for action selection
Application No.: US11169503

Application Date: 2005-06-29
Publication No.: US07707131B2

Publication Date: 2010-04-27
Inventor: David M. Chickering , Timothy S. Paek , Eric J. Horvitz
Applicant: David M. Chickering , Timothy S. Paek , Eric J. Horvitz
Applicant Address: US WA Redmond
Assignee: Microsoft Corporation
Current Assignee: Microsoft Corporation
Current Assignee Address: US WA Redmond
Agency: Lee & Hayes, PLLC
Main IPC: G06N5/04
IPC: G06N5/04 ; G06N7/00 ; G06N7/02


Abstract:

A system and method for online reinforcement learning is provided. In particular, a method for performing the explore-vs.-exploit tradeoff is provided. Although the method is heuristic, it can be applied in a principled manner while simultaneously learning the parameters and/or structure of the model (e.g., a Bayesian network model). The system includes a model which receives an input (e.g., from a user) and provides a probability distribution associated with uncertainty regarding parameters of the model to a decision engine. The decision engine can determine whether to exploit the information known to it or to explore to obtain additional information based, at least in part, upon the explore-vs.-exploit tradeoff (e.g., the Thompson strategy). A reinforcement learning component can obtain additional information (e.g., feedback from a user) and update the parameter(s) and/or the structure of the model. The system can be employed in scenarios in which an influence diagram is used to make repeated decisions and maximization of long-term expected utility is desired.
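As a rough illustration of the explore-vs.-exploit tradeoff the abstract refers to, the sketch below applies the Thompson strategy to a deliberately simplified setting. It assumes independent actions with binary feedback and a Beta posterior per action, which is not the influence diagram or Bayesian network model of the patent. The names `BetaBernoulliThompson`, `select_action`, `update`, and `simulate` are hypothetical and chosen only for this example.

```python
import random


class BetaBernoulliThompson:
    """Thompson strategy for independent actions with binary rewards.

    Assumption for illustration only: each action's success probability
    has its own Beta(alpha, beta) posterior, starting from a uniform prior.
    """

    def __init__(self, n_actions, seed=None):
        self.alpha = [1.0] * n_actions  # 1 + observed successes per action
        self.beta = [1.0] * n_actions   # 1 + observed failures per action
        self.rng = random.Random(seed)

    def select_action(self):
        # Draw one parameter sample per action from its posterior, then act
        # greedily with respect to the sampled parameters. Uncertain actions
        # sometimes draw high samples, which is what produces exploration.
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.alpha, self.beta)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, reward):
        # Feedback (for example from a user) updates the chosen action's posterior.
        if reward:
            self.alpha[action] += 1.0
        else:
            self.beta[action] += 1.0


def simulate(true_rates, steps, seed=0):
    """Run the strategy against hypothetical true success rates."""
    agent = BetaBernoulliThompson(len(true_rates), seed=seed)
    env = random.Random(seed + 1)
    counts = [0] * len(true_rates)
    for _ in range(steps):
        action = agent.select_action()
        reward = env.random() < true_rates[action]
        agent.update(action, reward)
        counts[action] += 1
    return agent, counts


if __name__ == "__main__":
    agent, counts = simulate([0.1, 0.9], steps=2000)
    print("times each action was chosen:", counts)
```

In this simplified setting the agent chooses each action with the posterior probability that it is the best one, so exploration fades on its own as feedback accumulates and no separate exploration rate needs tuning.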


Public/Granted literature

US20060224535A1 Action selection for reinforcement learning using influence diagrams. Public/Granted day: 2006-10-05


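IPC classification:

G	Physics
G06	Computing; calculating or counting
G06N	Computer systems based on specific computational models
G06N5/00	Computer systems using knowledge-based models
G06N5/04	. Inference methods or devices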