Product listing recognizer
Abstract:
In one embodiment, a method includes extracting a document object model (DOM) for a content page, wherein the DOM comprises a hierarchical tree-based data structure. The method also includes identifying candidate nodes in the DOM based on a context of the nodes, wherein the candidate nodes may correspond to listing items. The method additionally includes, for each of the candidate nodes, locating its parent and child nodes by traversing the DOM from the candidate node, extracting information from the candidate node and its parent and child nodes, and assessing whether the candidate node qualifies as a listing item based on whether the extracted information fulfills a required set of characteristics for a listing item.
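The steps in the abstract can be illustrated with a minimal sketch, which is not the claimed implementation. It uses only the Python standard library. Every name in it (`Node`, `DomBuilder`, `is_candidate`, `extract`, `find_listing_items`) is hypothetical. The "context" test (a repeated `li` or `div` among its siblings) and the required characteristics (a title, a price, and a link) are assumptions chosen for illustration, since the abstract does not specify them.

```python
import re
from html.parser import HTMLParser


class Node:
    """One element of a simplified, hierarchical DOM tree."""
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children, self.text = [], ""


class DomBuilder(HTMLParser):
    """Builds a Node tree; assumes reasonably well-nested markup."""
    VOID = {"img", "br", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:  # void elements never hold children
            self.current = node

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def walk(node):
    """Depth-first traversal of a node and all of its descendants."""
    yield node
    for child in node.children:
        yield from walk(child)


# Assumed price pattern: a currency symbol followed by an amount.
PRICE = re.compile(r"[$€£]\s?\d+(?:\.\d{2})?")
# Assumed set of required characteristics for a listing item.
REQUIRED = ("title", "price", "link")


def is_candidate(node):
    """Context heuristic (assumption): an li/div repeated among its siblings."""
    if node.parent is None or node.tag not in {"li", "div"}:
        return False
    return sum(c.tag == node.tag for c in node.parent.children) >= 2


def extract(node):
    """Collect information from the candidate, its parent, and its descendants."""
    info = {"container": node.parent.tag, "title": None, "price": None, "link": None}
    for d in walk(node):
        if d.tag == "a" and d.attrs.get("href"):
            info["link"] = info["link"] or d.attrs["href"]
        match = PRICE.search(d.text)
        if match:
            info["price"] = info["price"] or match.group()
        elif d.text:
            info["title"] = info["title"] or d.text
    return info


def find_listing_items(html):
    """Return extracted info for each candidate that meets every requirement."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    items = []
    for node in walk(builder.root):
        if is_candidate(node):
            info = extract(node)
            if all(info[key] for key in REQUIRED):
                items.append(info)
    return items


if __name__ == "__main__":
    page = ('<ul><li><a href="/p/1">Red Mug</a><span>$12.50</span></li>'
            '<li><a href="/p/2">Blue Mug</a><span>$9.00</span></li>'
            '<li>About us</li></ul>')
    for item in find_listing_items(page):
        print(item)
```

In this sketch the third `li` is identified as a candidate because of its context, but it is rejected at the assessment step because it has neither a price nor a link. A real recognizer would likely use richer context signals and a more tolerant HTML parser.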