Abstract:
A computing system receives a collection comprising multiple sets of ordered terms, including a first set. The system generates a dataset indicating an association between each pair of terms within a same set of the collection by generating co-occurrence score(s) for the first set. The system generates computed probabilities based on the co-occurrence score(s) for the first set. The computed probabilities indicate a likelihood that one term in a given pair of terms of the collection appears in a given set of the collection given that another term in the given pair of terms of the collection occurs. The system smooths the computed probabilities by adding one or more random observations. The system generates one or more association indications for the first set based on the smoothed computed probabilities. The system outputs an indication of the dataset. Additionally, or alternatively, based on association measure(s), the system generates a virtual term.
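The co-occurrence and smoothing steps described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the random observations are formed, so the scheme below (each pseudo-observation is a random pair of vocabulary terms) is an assumption, and the function names `cooccurrence_counts` and `smoothed_conditional` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import random


def cooccurrence_counts(collection):
    """Count how many sets contain each term and each unordered pair of terms."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in collection:
        unique = sorted(set(terms))
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return term_counts, pair_counts


def smoothed_conditional(collection, n_random=1, seed=0):
    """Estimate P(a in set | b in set) after adding random pseudo-observations."""
    rng = random.Random(seed)
    vocab = sorted({t for terms in collection for t in terms})
    # Assumed smoothing scheme: each pseudo-observation is a set holding
    # two distinct terms drawn at random from the vocabulary.
    pseudo = [rng.sample(vocab, 2) for _ in range(n_random)]
    term_counts, pair_counts = cooccurrence_counts(list(collection) + pseudo)
    probs = {}
    for (a, b), n_ab in pair_counts.items():
        probs[(a, b)] = n_ab / term_counts[b]  # P(a | b)
        probs[(b, a)] = n_ab / term_counts[a]  # P(b | a)
    return probs


collection = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
unsmoothed = smoothed_conditional(collection, n_random=0)
smoothed = smoothed_conditional(collection, n_random=2)
```

With `n_random=0` the function returns the plain conditional frequencies, which makes it easy to compare against the smoothed estimates.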
Abstract:
Methods, processes and computer-program products are disclosed for use in a parallelized computing system in which representations of large sparse matrices are efficiently encoded and communicated between grid-computing devices. A sparse matrix can be encoded and stored as a collection of character strings wherein each character string is a Base64 encoded string representing the non-zero elements of a single row of the sparse matrix. On a per-row basis, non-zero elements can be identified by column indices and error correction metadata can be included. The resultant row data can be converted to IEEE 754 8-byte representations and then encoded into Base64 characters for storage as strings. These character strings of even very large-dimensional sparse matrices can be efficiently stored in databases or communicated to grid-computing devices.
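A per-row encoding of this kind might look like the following sketch. The byte layout is an assumption, not the layout claimed in the disclosure: the payload starts with an entry count (standing in for the error-correction metadata), followed by (column index, value) pairs, all stored as little-endian IEEE 754 8-byte doubles. The names `encode_row` and `decode_row` are hypothetical.

```python
import base64
import struct


def encode_row(row):
    """Encode the non-zero entries of one row as a Base64 string."""
    nonzero = [(j, v) for j, v in enumerate(row) if v != 0.0]
    # Assumed layout: entry count, then (column index, value) pairs,
    # every number stored as an 8-byte double.
    payload = [float(len(nonzero))]
    for j, v in nonzero:
        payload.extend((float(j), float(v)))
    raw = struct.pack(f"<{len(payload)}d", *payload)
    return base64.b64encode(raw).decode("ascii")


def decode_row(text, n_cols):
    """Rebuild a dense row of length n_cols from its Base64 string."""
    raw = base64.b64decode(text)
    values = struct.unpack(f"<{len(raw) // 8}d", raw)
    count = int(values[0])
    # The stored count acts as a simple consistency check on the payload.
    if len(values) != 1 + 2 * count:
        raise ValueError("entry count does not match payload length")
    row = [0.0] * n_cols
    for k in range(count):
        row[int(values[1 + 2 * k])] = values[2 + 2 * k]
    return row


row = [0.0, 1.5, 0.0, -2.25]
encoded = encode_row(row)
decoded = decode_row(encoded, len(row))
```

Because each row becomes an independent ASCII string, rows can be stored in a text column of a database or sent to worker nodes one at a time.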
Abstract:
Interactive visualizations of a convolutional neural network are provided. For example, a graphical user interface (GUI) can include a matrix having symbols indicating feature-map values that represent likelihoods of particular features being present or absent at various locations in an input to a convolutional neural network. Each column in the matrix can have feature-map values generated by convolving the input to the convolutional neural network with a respective filter for identifying a particular feature in the input. The GUI can detect, via an input device, an interaction indicating that the columns in the matrix are to be combined into a particular number of groups. Based on the interaction, the columns can be clustered into the particular number of groups using a clustering method. The matrix in the GUI can then be updated to visually represent each respective group of columns as a single column of symbols within the matrix.
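The column-grouping step can be sketched as follows. The abstract does not name the clustering method, so plain k-means with the first `k` columns as initial centroids is an assumption, as is representing each group by the mean of its columns. The function name `group_columns` is hypothetical.

```python
def group_columns(matrix, k, iterations=20):
    """Cluster the columns of a matrix into k groups with basic k-means.

    Returns the group label of every column and a merged matrix in which
    each group is represented by one column (the mean of its members).
    """
    columns = [list(col) for col in zip(*matrix)]
    # Assumed initialisation: the first k columns serve as starting centroids.
    centroids = [col[:] for col in columns[:k]]
    labels = [0] * len(columns)
    for _ in range(iterations):
        # Assign each column to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(col, centroids[c])))
            for col in columns
        ]
        # Move each centroid to the mean of the columns assigned to it.
        for c in range(k):
            members = [col for col, lab in zip(columns, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    merged = [[centroids[c][r] for c in range(k)] for r in range(len(matrix))]
    return labels, merged


feature_maps = [
    [0.9, 0.1, 0.8, 0.2],
    [0.8, 0.2, 0.9, 0.1],
]
labels, merged = group_columns(feature_maps, k=2)
```

In a GUI, `k` would come from the user interaction, and `merged` would replace the original matrix in the display.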
Abstract:
A system for machine training can comprise one or more data processors and a non-transitory computer-readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform operations including: accessing a dataset comprising data tracking a plurality of features; determining a series of values for a regularization parameter of a sparse support vector machine model, the series including an initial regularization value and a next regularization value; computing an initial solution to the sparse support vector machine model for the initial regularization value; identifying, using the initial solution, inactive features of the sparse support vector machine model for the next regularization value; and computing a next solution to the sparse support vector machine model for the next regularization value, wherein computing the next solution includes excluding the inactive features.
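The path-wise procedure can be illustrated with a rough sketch. Everything specific here is an assumption rather than the claimed method: the model is an L1-regularized squared-hinge SVM solved by proximal gradient descent, and inactive features are identified with a sequential strong-rule style test, which is a heuristic and would normally be followed by an optimality check. All function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def loss_gradient(X, y, w):
    """Gradient of the mean squared-hinge loss with respect to w."""
    n, d = len(X), len(w)
    grad = [0.0] * d
    for xi, yi in zip(X, y):
        margin = 1.0 - yi * sum(wj * xj for wj, xj in zip(w, xi))
        if margin > 0.0:
            for j in range(d):
                grad[j] -= 2.0 * yi * xi[j] * margin / n
    return grad


def solve(X, y, lam, w_init, active, iterations=500):
    """Proximal gradient descent restricted to the active features."""
    n = len(X)
    # Step size 1/L, with L bounded by (2/n) times the squared Frobenius norm of X.
    step = n / (2.0 * sum(x * x for xi in X for x in xi))
    w = [wj if j in active else 0.0 for j, wj in enumerate(w_init)]
    for _ in range(iterations):
        grad = loss_gradient(X, y, w)
        for j in active:
            w[j] = soft_threshold(w[j] - step * grad[j], step * lam)
    return w


def screen(X, y, w_prev, lam_prev, lam_next):
    """Assumed strong-rule style test: keep features that may become non-zero."""
    grad = loss_gradient(X, y, w_prev)
    bound = 2.0 * lam_next - lam_prev
    return {j for j, g in enumerate(grad) if abs(g) >= bound or w_prev[j] != 0.0}


def regularization_path(X, y, lambdas):
    """Solve for each regularization value, excluding screened-out features."""
    d = len(X[0])
    w = solve(X, y, lambdas[0], [0.0] * d, set(range(d)))
    solutions = [w]
    for lam_prev, lam_next in zip(lambdas, lambdas[1:]):
        active = screen(X, y, w, lam_prev, lam_next)
        w = solve(X, y, lam_next, w, active)
        solutions.append(w)
    return solutions


X = [[1.0, 0.0, 0.0], [2.0, 0.1, 0.0], [-1.0, 0.0, 0.0], [-2.0, -0.1, 0.0]]
y = [1, 1, -1, -1]
solutions = regularization_path(X, y, [0.5, 0.4])
```

Excluding features before each solve shrinks the problem, which is where the savings come from when the number of features is large.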
Abstract:
This disclosure provides a computer-program product, system, method and apparatus for accessing a representation of a category or item and accessing a set of multiple transactions. The transactions are processed to identify items found amongst the transactions, and the items are ordered based on an information-gain heuristic. A depth-first search for a group of best association rules is then conducted using a best-first heuristic and constraints that make the search efficient. The best rules found during the search can then be displayed to a user, along with accompanying statistics. The user can then select rules that appear to be most relevant, and further analytics can be applied to the selected rules to obtain further information about the information provided by these rules.
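A simplified sketch of such a search is shown below. It orders items by information gain with respect to a target item and explores antecedents depth-first under a minimum-support constraint. The best-first bounding mentioned in the abstract is omitted, and the ranking by confidence, the parameter defaults, and the function names are all assumptions.

```python
import math


def entropy(positives, total):
    """Binary entropy of a group with the given number of positives."""
    if total == 0 or positives in (0, total):
        return 0.0
    p = positives / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(transactions, item, target):
    """Reduction in entropy of the target when splitting on an item."""
    n = len(transactions)
    with_item = [t for t in transactions if item in t]
    without_item = [t for t in transactions if item not in t]
    base = entropy(sum(target in t for t in transactions), n)
    conditional = sum(
        len(part) / n * entropy(sum(target in t for t in part), len(part))
        for part in (with_item, without_item)
    )
    return base - conditional


def best_rules(transactions, target, min_support=2, max_len=2, top=3):
    """Depth-first search for rules 'antecedent -> target', ranked by confidence."""
    items = sorted({i for t in transactions for i in t if i != target})
    items.sort(key=lambda i: information_gain(transactions, i, target), reverse=True)
    rules = []

    def search(prefix, start, covered):
        for pos in range(start, len(items)):
            subset = [t for t in covered if items[pos] in t]
            if len(subset) < min_support:
                continue  # the support constraint prunes this branch
            antecedent = prefix + (items[pos],)
            confidence = sum(target in t for t in subset) / len(subset)
            rules.append((confidence, len(subset), antecedent))
            if len(antecedent) < max_len:
                search(antecedent, pos + 1, subset)

    search((), 0, transactions)
    rules.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return rules[:top]


transactions = [
    {"chips", "salsa", "beer"},
    {"chips", "beer"},
    {"chips", "salsa", "beer"},
    {"salsa", "milk"},
    {"milk", "bread"},
    {"chips", "milk"},
]
top_rules = best_rules(transactions, target="beer")
```

Each returned tuple holds the confidence, the support count, and the antecedent, which are the kind of accompanying statistics a user could review before selecting rules.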
Abstract:
In one example, a system can execute a first machine-learning model to determine an overall classification for a textual dataset. The system can also determine classification scores indicating the level of influence that each token in the textual dataset had on the overall classification. The system can select a first subset of the tokens based on their classification scores. The system can also execute a second machine-learning model to determine probabilities that the textual dataset falls into various categories. The system can determine category scores indicating the level of influence that each token had on a most-likely category determination. The system can select a second subset of the tokens based on their category scores. The system can then generate a first visualization depicting the first subset of tokens color-coded to indicate their classification scores and a second visualization depicting the second subset of tokens color-coded to indicate their category scores.
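One way to obtain per-token influence scores is leave-one-out occlusion, sketched below. The abstract does not say how the scores are computed, so this technique is an assumption, and `toy_model` is a hypothetical stand-in for a trained classifier.

```python
def toy_model(tokens):
    """Hypothetical stand-in for a trained model: returns a sentiment score."""
    weights = {"great": 2.0, "terrible": -2.5, "not": -0.5}
    return sum(weights.get(token, 0.0) for token in tokens)


def influence_scores(tokens, model):
    """Score each token by how much the model output changes when it is removed."""
    full = model(tokens)
    return [full - model(tokens[:i] + tokens[i + 1:]) for i in range(len(tokens))]


def top_tokens(tokens, scores, k):
    """Select the k most influential tokens, kept in their original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: abs(scores[i]), reverse=True)
    return [(tokens[i], scores[i]) for i in sorted(ranked[:k])]


tokens = ["the", "food", "was", "great"]
scores = influence_scores(tokens, toy_model)
selected = top_tokens(tokens, scores, k=1)
```

The selected tokens and their scores are what a visualization would color-code; the same routine could be run a second time against a category model.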
Abstract:
Recurrent neural networks (RNNs) can be visualized. For example, a processor can receive vectors indicating values of nodes in a gate of an RNN. The values can result from processing data at the gate during a sequence of time steps. The processor can group the nodes into clusters by applying a clustering method to the values of the nodes. The processor can generate a first graphical element visually indicating how the respective values of the nodes in a cluster changed during the sequence of time steps. The processor can also determine a reference value based on multiple values for multiple nodes in the cluster, and generate a second graphical element visually representing how the respective values of the nodes in the cluster each relate to the reference value. The processor can cause a display to output a graphical user interface having the first graphical element and the second graphical element.
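The reference-value step can be sketched for a single cluster as follows. Using the per-time-step mean across the cluster's nodes as the reference is an assumption, since the abstract only says the reference is based on multiple values for multiple nodes. The function name `cluster_reference` is hypothetical.

```python
def cluster_reference(series):
    """Compute a per-time-step reference and each node's deviation from it.

    `series` holds one list of gate values per node in the cluster,
    all covering the same sequence of time steps.
    """
    steps = len(series[0])
    # Assumed reference: the mean value across the cluster at each time step.
    reference = [sum(node[t] for node in series) / len(series) for t in range(steps)]
    deviations = [[node[t] - reference[t] for t in range(steps)] for node in series]
    return reference, deviations


cluster = [
    [0.0, 0.5, 1.0],
    [0.2, 0.5, 0.8],
]
reference, deviations = cluster_reference(cluster)
```

The raw series would feed the first graphical element, and the deviations would feed the second.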
Abstract:
Training data for training a neural network usable for electronic sentiment analysis can be automatically constructed. For example, an electronic communication usable for training the neural network and including multiple characters can be received. A sentiment dictionary including multiple expressions mapped to multiple sentiment values representing different sentiments can be received. Each expression in the sentiment dictionary can be mapped to a corresponding sentiment value. An overall sentiment for the electronic communication can be determined using the sentiment dictionary. Training data usable for training the neural network can be automatically constructed based on the overall sentiment of the electronic communication. The neural network can be trained using the training data. A second electronic communication including an unknown sentiment can be received. At least one sentiment associated with the second electronic communication can be determined using the neural network.
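The automatic labeling step can be sketched as follows. The scoring rule (summing the dictionary values of every expression found in the message) and the choice to skip messages with a neutral score are assumptions, and the dictionary contents and function names are hypothetical.

```python
def overall_sentiment(text, dictionary):
    """Sum the sentiment values of all dictionary expressions found in the text."""
    lowered = text.lower()
    return sum(value * lowered.count(expression) for expression, value in dictionary.items())


def build_training_data(messages, dictionary):
    """Label each message from its overall sentiment; skip neutral messages."""
    examples = []
    for message in messages:
        score = overall_sentiment(message, dictionary)
        if score > 0:
            examples.append((message, "positive"))
        elif score < 0:
            examples.append((message, "negative"))
    return examples


# Hypothetical dictionary mapping expressions (words and emoticons) to values.
sentiment_dictionary = {":)": 1, "love": 2, "hate": -2, ":(": -1}
messages = ["I love this :)", "I hate waiting :(", "See you at noon"]
training_data = build_training_data(messages, sentiment_dictionary)
```

The resulting (message, label) pairs are the kind of training data that could then be fed to a neural network, which is outside the scope of this sketch.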