Word Embeddings and Virtual Terms

    Publication No.: US20210027024A1

    Publication date: 2021-01-28

    Application No.: US17060198

    Filing date: 2020-10-01

    Abstract: A computing system receives a collection comprising multiple sets of ordered terms, including a first set. The system generates a dataset indicating an association between each pair of terms within a same set of the collection by generating co-occurrence score(s) for the first set. The system generates computed probabilities based on the co-occurrence score(s) for the first set. The computed probabilities indicate a likelihood that one term in a given pair of terms of the collection appears in a given set of the collection given that another term in the given pair of terms of the collection occurs. The system smoothes the computed probabilities by adding one or more random observations. The system generates one or more association indications for the first set based on the smoothed computed probabilities. The system outputs an indication of the dataset. Additionally, or alternatively, based on association measure(s), the system generates a virtual term.
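The pipeline in this abstract (co-occurrence scores, conditional probabilities, smoothing with random observations) can be illustrated with a minimal sketch. It is not the patented method: the function names are hypothetical, co-occurrence is taken to mean "both terms appear in the same set", and "adding random observations" is interpreted here as appending sets of terms drawn uniformly at random from the vocabulary.

```python
from collections import Counter
from itertools import combinations
import random


def cooccurrence_counts(collection):
    """Count how many sets contain each term and each unordered pair of terms."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in collection:
        unique = sorted(set(terms))  # each term counts once per set
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return term_counts, pair_counts


def add_random_observations(collection, vocabulary, n_random, set_size, seed=0):
    """Assumed smoothing step: append n_random sets of randomly sampled terms."""
    rng = random.Random(seed)
    extra = [rng.sample(vocabulary, set_size) for _ in range(n_random)]
    return list(collection) + extra


def conditional_probability(a, b, term_counts, pair_counts):
    """Estimate P(a appears in a set | b appears in that set)."""
    if term_counts[b] == 0:
        return 0.0
    pair = tuple(sorted((a, b)))  # pairs are stored in sorted order
    return pair_counts[pair] / term_counts[b]


collection = [["data", "mining", "rules"], ["data", "mining"], ["sparse", "matrix"]]
vocabulary = sorted({term for terms in collection for term in terms})

# Unsmoothed estimates come straight from the observed sets.
term_counts, pair_counts = cooccurrence_counts(collection)

# Smoothed estimates also include the random observations.
smoothed = add_random_observations(collection, vocabulary, n_random=2, set_size=2)
s_term_counts, s_pair_counts = cooccurrence_counts(smoothed)
```

Because the random sets can pair terms that never co-occur in the observed data, the smoothed counts reduce the number of pairs whose estimated probability is exactly zero, which is one common motivation for smoothing.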

    Sparse matrix storage in a database

    Publication No.: US10275479B2

    Publication date: 2019-04-30

    Application No.: US14633915

    Filing date: 2015-02-27

    Abstract: Methods, processes and computer-program products are disclosed for use in a parallelized computing system in which representations of large sparse matrices are efficiently encoded and communicated between grid-computing devices. A sparse matrix can be encoded and stored as a collection of character strings wherein each character string is a Base64 encoded string representing the non-zero elements of a single row of the sparse matrix. On a per-row basis, non-zero elements can be identified by column indices and error correction metadata can be included. The resultant row data can be converted to IEEE 754 8-byte representations and then encoded into Base64 characters for storage as strings. These character strings of even very large-dimensional sparse matrices can be efficiently stored in databases or communicated to grid-computing devices.
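A rough sketch of the per-row encoding described above, under stated assumptions: the byte layout (little-endian column index followed by value, both stored as IEEE 754 8-byte doubles) and the function names are illustrative choices, not the format claimed in the patent, and the error correction metadata mentioned in the abstract is omitted.

```python
import base64
import struct


def encode_row(row):
    """Encode the non-zero entries of one row as a Base64 string.

    Assumed layout: (column index, value) pairs, each number stored as a
    little-endian IEEE 754 8-byte double.
    """
    flat = []
    for col, value in enumerate(row):
        if value != 0.0:
            flat.extend((float(col), float(value)))
    raw = struct.pack(f"<{len(flat)}d", *flat)
    return base64.b64encode(raw).decode("ascii")


def decode_row(text, n_cols):
    """Rebuild a dense row of length n_cols from its Base64 string."""
    raw = base64.b64decode(text)
    flat = struct.unpack(f"<{len(raw) // 8}d", raw)
    row = [0.0] * n_cols
    for col, value in zip(flat[0::2], flat[1::2]):
        row[int(col)] = value
    return row


row = [0.0, 1.5, 0.0, 0.0, -2.0]
encoded = encode_row(row)
restored = decode_row(encoded, n_cols=len(row))
```

Each row becomes an ordinary character string whose length depends only on the number of non-zero entries, which is what makes it convenient to store in a text column of a database or to send between nodes.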

    Acceleration of sparse support vector machine training through safe feature screening
    Type: Granted patent (status: in force)

    Publication No.: US09495647B2

    Publication date: 2016-11-15

    Application No.: US14834365

    Filing date: 2015-08-24

    CPC classification number: G06N99/005

    Abstract: A system for machine training can comprise one or more data processors and a non-transitory computer-readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform operations including: accessing a dataset comprising data tracking a plurality of features; determining a series of values for a regularization parameter of a sparse support vector machine model, the series including an initial regularization value and a next regularization value; computing an initial solution to the sparse support vector machine model for the initial regularization value; identifying, using the initial solution, inactive features of the sparse support vector machine model for the next regularization value; and computing a next solution to the sparse support vector machine model for the next regularization value, wherein computing the next solution includes excluding the inactive features.

    SYSTEM AND METHODS FOR INTERACTIVE DISPLAYS BASED ON ASSOCIATIONS FOR MACHINE-GUIDED RULE CREATION
    Type: Patent application (status: in force)

    Publication No.: US20150193523A1

    Publication date: 2015-07-09

    Application No.: US14662443

    Filing date: 2015-03-19

    Abstract: This disclosure provides a computer-program product, system, method and apparatus for accessing a representation of a category or item and accessing a set of multiple transactions. The transactions are processed to identify items found amongst the transactions, and the items are ordered based on an information-gain heuristic. A depth-first search for a group of best association rules is then conducted using a best-first heuristic and constraints that make the search efficient. The best rules found during the search can then be displayed to a user, along with accompanying statistics. The user can then select rules that appear to be most relevant, and further analytics can be applied to the selected rules to obtain further information about the information provided by these rules.

    SYSTEM FOR EFFICIENTLY GENERATING K-MAXIMALLY PREDICTIVE ASSOCIATION RULES WITH A GIVEN CONSEQUENT
    Type: Patent application (status: in force)

    Publication No.: US20140337271A1

    Publication date: 2014-11-13

    Application No.: US14337195

    Filing date: 2014-07-21

    CPC classification number: G06N5/025 G06F17/30 G06F17/30289 G06N5/04 G06Q40/00

    Abstract: This disclosure provides a computer-program product, system, method and apparatus for accessing a representation of a category or item and accessing a set of multiple transactions. The transactions are processed to identify items found amongst the transactions, and the items are ordered based on an information-gain heuristic. A depth-first search for a group of best association rules is then conducted using a best-first heuristic and constraints that make the search efficient. The best rules found during the search can then be displayed to a user, along with accompanying statistics. The user can then select rules that appear to be most relevant, and further analytics can be applied to the selected rules to obtain further information about the information provided by these rules.
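The item-ordering step mentioned in the abstract can be sketched as follows. This is only an illustration of ranking items by information gain with respect to a given consequent; the function names are hypothetical, transactions are assumed to be non-empty sets of item labels, and the depth-first search with the best-first heuristic and pruning constraints is not reproduced here.

```python
import math


def entropy(p):
    """Binary entropy in bits of an event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(transactions, item, consequent):
    """Reduction in entropy of the consequent after splitting on the item."""
    n = len(transactions)
    with_item = [t for t in transactions if item in t]
    without_item = [t for t in transactions if item not in t]

    def rate(group):
        # Fraction of transactions in the group that contain the consequent.
        return sum(consequent in t for t in group) / len(group) if group else 0.0

    base = entropy(rate(transactions))
    conditional = (len(with_item) / n) * entropy(rate(with_item)) + (
        len(without_item) / n
    ) * entropy(rate(without_item))
    return base - conditional


def order_items(transactions, consequent):
    """Sort candidate items by decreasing information gain (ties by name)."""
    items = {i for t in transactions for i in t} - {consequent}
    return sorted(
        items, key=lambda i: (-information_gain(transactions, i, consequent), i)
    )


transactions = [{"bread", "butter"}, {"bread", "butter"}, {"milk"}, {"milk", "bread"}]
ranking = order_items(transactions, consequent="butter")
```

In this toy data "milk" ranks first because its presence fully determines whether "butter" is absent; a search that expands high-gain items first tends to reach strong rules early, which is presumably why the ordering helps pruning.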

    Graphical user interface for visualizing contributing factors to a machine-learning model's output

    Publication No.: US11501084B1

    Publication date: 2022-11-15

    Application No.: US17747139

    Filing date: 2022-05-18

    Abstract: In one example, a system can execute a first machine-learning model to determine an overall classification for a textual dataset. The system can also determine classification scores indicating the level of influence that each token in the textual dataset had on the overall classification. The system can select a first subset of the tokens based on their classification scores. The system can also execute a second machine-learning model to determine probabilities that the textual dataset falls into various categories. The system can determine category scores indicating the level of influence that each token had on a most-likely category determination. The system can select a second subset of the tokens based on their category scores. The system can then generate a first visualization depicting the first subset of tokens color-coded to indicate their classification scores and a second visualization depicting the second subset of tokens color-coded to indicate their category scores.
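The token selection and color-coding steps can be sketched as below. The abstract does not say how scores are computed or how tokens are selected, so this example assumes the influence scores are already available, selects the tokens with the largest absolute score, and uses a hypothetical red/blue color scale; none of this is taken from the patent itself.

```python
def select_top_tokens(tokens, scores, k):
    """Return the k (token, score) pairs with the largest absolute score."""
    ranked = sorted(zip(tokens, scores), key=lambda ts: abs(ts[1]), reverse=True)
    return ranked[:k]


def score_to_hex(score, max_abs):
    """Map a score to a hex color: red for positive, blue for negative.

    Stronger scores give more saturated colors; a zero score maps to white.
    """
    intensity = 0 if max_abs == 0 else round(255 * min(abs(score) / max_abs, 1.0))
    fade = 255 - intensity
    if score >= 0:
        return f"#ff{fade:02x}{fade:02x}"
    return f"#{fade:02x}{fade:02x}ff"


tokens = ["the", "service", "was", "terrible"]
scores = [0.01, 0.3, 0.02, -0.9]  # made-up influence scores for illustration

subset = select_top_tokens(tokens, scores, k=2)
max_abs = max(abs(s) for _, s in subset)
colored = [(tok, score_to_hex(s, max_abs)) for tok, s in subset]
```

The same two helpers could be applied once to the classification scores and once to the category scores to produce the two visualizations the abstract describes.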

    Word embeddings and virtual terms

    Publication No.: US11048884B2

    Publication date: 2021-06-29

    Application No.: US17060198

    Filing date: 2020-10-01

    Abstract: A computing system receives a collection comprising multiple sets of ordered terms, including a first set. The system generates a dataset indicating an association between each pair of terms within a same set of the collection by generating co-occurrence score(s) for the first set. The system generates computed probabilities based on the co-occurrence score(s) for the first set. The computed probabilities indicate a likelihood that one term in a given pair of terms of the collection appears in a given set of the collection given that another term in the given pair of terms of the collection occurs. The system smoothes the computed probabilities by adding one or more random observations. The system generates one or more association indications for the first set based on the smoothed computed probabilities. The system outputs an indication of the dataset. Additionally, or alternatively, based on association measure(s), the system generates a virtual term.
