Generating a multi-modal image representation for an image
Abstract:
Technologies for generating a multi-modal representation of an image based on the image content are provided. The disclosed techniques include receiving an image, to be classified, that comprises one or more embedded text characters. The one or more embedded text characters are identified from the image, and a first machine learning model is used to generate a text vector that is a numerical representation of the one or more embedded text characters. A second machine learning model is used to generate an image vector that is a numerical representation of the graphical portion of the image. The text vector and the image vector are used as input to generate a multi-modal vector that contains information from both the text vector and the image vector. The image may then be classified into one of a plurality of image classifications based on the information in the multi-modal vector.
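The pipeline described above can be sketched as follows. The two embedding models are stubbed with hypothetical placeholder functions (`embed_text` and `embed_image` stand in for the first and second machine learning models, which the abstract does not specify), the fusion step uses plain concatenation as one common way to build a multi-modal vector, and classification is illustrated with a simple nearest-centroid rule. This is a minimal sketch under those assumptions, not the patented implementation.

```python
import numpy as np

def embed_text(text: str, dim: int = 4) -> np.ndarray:
    # Hypothetical stand-in for the first ML model: maps the
    # embedded text characters to a fixed-size numerical vector.
    rng = np.random.default_rng(sum(map(ord, text)))
    return rng.standard_normal(dim)

def embed_image(pixels: np.ndarray, dim: int = 4) -> np.ndarray:
    # Hypothetical stand-in for the second ML model: maps the
    # graphical portion of the image to a numerical vector.
    rng = np.random.default_rng(int(pixels.sum()) % (2**32))
    return rng.standard_normal(dim)

def multi_modal_vector(text: str, pixels: np.ndarray) -> np.ndarray:
    # Fuse both modalities; concatenation preserves information
    # from both the text vector and the image vector.
    return np.concatenate([embed_text(text), embed_image(pixels)])

def classify(vec: np.ndarray, centroids: dict) -> str:
    # Assign the image to the class whose centroid is nearest
    # to its multi-modal vector.
    dists = {label: np.linalg.norm(vec - c) for label, c in centroids.items()}
    return min(dists, key=dists.get)

# Example usage with a toy 2x2 image and two illustrative classes.
pixels = np.ones((2, 2))
vec = multi_modal_vector("SALE 50%", pixels)
centroids = {"advertisement": np.zeros(8), "document": np.ones(8)}
label = classify(vec, centroids)
```

In a real system the placeholder functions would be replaced by trained models (e.g. a text encoder applied to OCR output and a convolutional image encoder), but the flow, embed each modality, concatenate, classify, follows the steps in the abstract.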