Ground truth generation for machine learning based quality assessment of corpora
Abstract:
A mechanism is provided in a computing device configured with instructions executing on a processor of the computing device to implement a ground truth generation system for quality assessment scoring of articles in a corpus. The ground truth generation system receives recommendations of a set of recommended articles from subject matter experts. The ground truth generation system identifies a set of non-recommended articles. A topic clustering component within the ground truth generation system performs topic clustering on a combination of the set of recommended articles and the set of non-recommended articles to form a set of topic clusters, each containing recommended articles and non-recommended articles. The ground truth generation system identifies a first number of recommended articles and a second number of non-recommended articles in each topic cluster of the set to form a quality assessment training set. The mechanism trains a quality assessment machine learning model using the quality assessment training set.
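The per-cluster selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`build_training_set`, `toy_cluster_of`), the parameters `n_pos` and `n_neg` (standing in for the "first number" and "second number"), the random sampling strategy, and the sample article strings are all assumptions made for illustration. In particular, `toy_cluster_of` is a trivial stand-in that treats the first word of an article as its topic; a real system would use an actual topic clustering method.

```python
import random
from collections import defaultdict


def build_training_set(recommended, non_recommended, cluster_of,
                       n_pos, n_neg, seed=0):
    """Return (article, label) pairs, sampled per topic cluster.

    label 1 = recommended by a subject matter expert, 0 = non-recommended.
    n_pos / n_neg are hypothetical names for the per-cluster quotas.
    """
    rng = random.Random(seed)
    # Group the combined articles by topic cluster, keeping labels apart.
    clusters = defaultdict(lambda: {1: [], 0: []})
    for article in recommended:
        clusters[cluster_of(article)][1].append(article)
    for article in non_recommended:
        clusters[cluster_of(article)][0].append(article)

    training_set = []
    for topic in sorted(clusters):
        for label, quota in ((1, n_pos), (0, n_neg)):
            pool = clusters[topic][label]
            # Take at most `quota` articles; fewer if the cluster is small.
            picked = rng.sample(pool, min(quota, len(pool)))
            training_set.extend((article, label) for article in picked)
    return training_set


def toy_cluster_of(article):
    # Assumption: stand-in for real topic clustering (first word = topic).
    return article.split()[0].lower()


recommended = ["cardiology trial A", "cardiology review B", "oncology study C"]
non_recommended = ["cardiology blog D", "oncology forum E", "oncology ad F"]

training_set = build_training_set(
    recommended, non_recommended, toy_cluster_of, n_pos=1, n_neg=1
)
```

The resulting labeled pairs would then serve as the quality assessment training set for whatever machine learning model is chosen. Sampling a fixed quota per cluster keeps any single topic from dominating the training data, which is one plausible reason for clustering before selection.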