Efficient tagging of content items using multi-granular embeddings
Abstract:
Techniques for efficient tagging of content items using content embeddings are provided. In one technique, multiple content items are stored and, for each content item, a content embedding is stored. Entity names are also stored, along with an entity name embedding for each entity name. For each content item, (1) multiple content embeddings that are associated with the content item are identified; (2) a subset of the entity names is identified; and (3) for each entity name in the subset, (i) an embedding of the entity name is identified, (ii) similarity measures are generated based on the entity name embedding and the multiple content embeddings, (iii) a distribution of the similarity measures is generated, (iv) feature values are generated based on the distribution, (v) the feature values are input into a machine-learned classifier, and (vi) based on output from the classifier, it is determined whether to associate the entity name with the content item.
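The per-item loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented method. The abstract does not specify the similarity measure, how the subset of entity names is chosen, how the distribution is represented, or what kind of classifier is used. The sketch therefore assumes cosine similarity, a hypothetical shortlist rule (the top_k entity names closest to the centroid of the item's content embeddings), a normalized histogram with hypothetical bin edges as the "distribution", and a logistic-regression stand-in with hypothetical hand-set weights in place of a trained classifier. All function and parameter names are invented for illustration.

```python
import math
from bisect import bisect_right
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Vector = Sequence[float]

# Hypothetical classifier parameters, one weight per feature produced by
# feature_values(): 4 histogram bins, then max, mean and std of similarities.
# A real system would learn these from labeled (content item, entity name) pairs.
HYPOTHETICAL_WEIGHTS = [-2.0, 0.0, 1.0, 2.0, 3.0, 1.0, 0.0]
HYPOTHETICAL_BIAS = -3.0


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, clamped to [-1, 1]; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return max(-1.0, min(1.0, dot / (norm_a * norm_b)))


def similarity_histogram(sims: List[float], edges: Sequence[float] = (0.0, 0.5, 0.8)) -> List[float]:
    """Step (iii), assumed form: normalized histogram over the bins
    (-inf, 0), [0, 0.5), [0.5, 0.8), [0.8, inf)."""
    counts = [0] * (len(edges) + 1)
    for s in sims:
        counts[bisect_right(edges, s)] += 1
    return [c / len(sims) for c in counts]


def feature_values(sims: List[float]) -> List[float]:
    """Step (iv): histogram proportions plus max, mean and population std."""
    return similarity_histogram(sims) + [max(sims), mean(sims), pstdev(sims)]


def classify(features: List[float], weights: Sequence[float], bias: float) -> float:
    """Step (v): logistic-regression stand-in for the machine-learned classifier."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def tag_content_item(
    content_embeddings: List[Vector],
    entity_embeddings: Dict[str, Vector],
    weights: Sequence[float] = HYPOTHETICAL_WEIGHTS,
    bias: float = HYPOTHETICAL_BIAS,
    top_k: int = 5,
    threshold: float = 0.5,
) -> List[str]:
    """Return the entity names to associate with one content item."""
    # Step (2), hypothetical rule: shortlist the top_k entity names whose
    # embedding is most similar to the centroid of the content embeddings.
    centroid = [mean(dim) for dim in zip(*content_embeddings)]
    candidates = sorted(
        entity_embeddings,
        key=lambda name: cosine(centroid, entity_embeddings[name]),
        reverse=True,
    )[:top_k]

    tags = []
    for name in candidates:
        entity_vec = entity_embeddings[name]  # step (i)
        # Step (ii): one similarity per content embedding of this item.
        sims = [cosine(entity_vec, c) for c in content_embeddings]
        probability = classify(feature_values(sims), weights, bias)  # steps (iii)-(v)
        if probability >= threshold:  # step (vi)
            tags.append(name)
    return tags
```

One reason to summarize the similarities as a distribution, rather than using a single score, is that it lets the classifier see both how strongly and how broadly an entity name matches the item's several embeddings. For example, a single very close match and many moderate matches yield different feature vectors. The cheap centroid shortlist is one possible way to keep the per-item cost bounded. The abstract only says that a subset is identified.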