Systems and methods for training an information extraction transformer model architecture
Abstract:
Certain aspects of the disclosure provide systems and methods for training an information extraction transformer model architecture directed to pre-training a first multimodal transformer model on an unlabeled dataset, training a second multimodal transformer model on a first labeled dataset to perform a key information extraction task processing the unlabeled dataset with the second multimodal transformer model to generate pseudo-labels for the unlabeled dataset, training the first multimodal transformer model based on a second labeled dataset comprising one or more labels, the pseudo-labels generated, or combinations thereof to generate a third multimodal transformer model, generating updated pseudo-labels based on label completion predictions from the third multimodal transformer model, and training the third multimodal transformer model using a noise-aware loss function and the updated pseudo-labels to generate an updated third multimodal transformer model.
Information query
Patent Agency Ranking
0/0