Split elimination in mapreduce systems

Invention Grant

US10691646B2 Split elimination in mapreduce systems (in force)


Patent Title: Split elimination in mapreduce systems
Application No.: US15912410

Application Date: 2018-03-05
Publication No.: US10691646B2

Publication Date: 2020-06-23
Inventors: Mohamed Eltabakh, Peter J. Haas, Fatma Ozcan, Mir Hamid Pirahesh, John (Yannis) Sismanis, Jan Vondrak
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Applicant Address: US NY Armonk
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee Address: US NY Armonk
Agency: Foley Hoag, LLP
Agents: Erik Huestis; Stephen Kenny
Main IPC: G06F7/00
IPC: G06F7/00; G06F16/182

Abstract:

Embodiments of the present invention relate to elimination of blocks such as splits in distributed processing systems such as MapReduce systems using the Hadoop Distributed File System (HDFS). In one embodiment, a method of and computer program product for optimizing queries in distributed processing systems are provided. A query is received. The query includes at least one predicate. The query refers to data. The data includes a plurality of records. Each record comprises a plurality of values in a plurality of attributes. Each record is located in at least one of a plurality of blocks of a distributed file system. Each block has a unique identifier. For each block of the distributed file system, at least one value cluster is determined for an attribute of the plurality of attributes. Each value cluster has a range. The predicate of the query is compared with the at least one value cluster of each block. The query is executed only against those blocks where the predicate is met by at least one value cluster.
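The pruning test in the abstract can be illustrated with a short sketch: summarize each block's values for one attribute as a few value clusters (ranges), then skip every block none of whose clusters can satisfy the predicate. This is a minimal Python sketch under stated assumptions, not the patented implementation. It assumes a single closed-range predicate `lo <= x <= hi` on one numeric attribute, block contents held in memory, and clusters formed by cutting the sorted distinct values at the largest gaps. The abstract does not say how clusters are computed, so that heuristic and all names (`value_clusters`, `surviving_blocks`, `BLOCKS`, the `blk_*` identifiers) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[float, float]  # closed interval [lo, hi]

# Hypothetical example data: block identifier -> values of one attribute.
BLOCKS: Dict[str, List[float]] = {
    "blk_1": [3, 4, 5, 90, 91],
    "blk_2": [40, 41, 42, 43],
    "blk_3": [1, 2, 95, 99],
}


def value_clusters(values: Sequence[float], k: int) -> List[Range]:
    """Summarize a block's values as at most k ranges.

    Assumed heuristic (not from the patent): sort the distinct values and
    cut at the k-1 largest gaps between neighbours. k=1 gives plain min/max.
    """
    vals = sorted(set(values))
    if not vals:
        return []
    # Gap i lies between vals[i] and vals[i + 1]; rank gaps by width.
    gaps = sorted(range(len(vals) - 1),
                  key=lambda i: vals[i + 1] - vals[i], reverse=True)
    cuts = sorted(gaps[: max(k - 1, 0)])
    clusters: List[Range] = []
    start = 0
    for i in cuts:
        clusters.append((vals[start], vals[i]))
        start = i + 1
    clusters.append((vals[start], vals[-1]))
    return clusters


def overlaps(cluster: Range, pred: Range) -> bool:
    """True if the cluster range intersects the predicate range."""
    return cluster[0] <= pred[1] and pred[0] <= cluster[1]


def surviving_blocks(index: Dict[str, List[Range]], pred: Range) -> List[str]:
    """IDs of blocks with at least one cluster that may satisfy pred;
    only these blocks would be scanned by the query."""
    return [bid for bid, clusters in index.items()
            if any(overlaps(c, pred) for c in clusters)]


if __name__ == "__main__":
    index = {bid: value_clusters(v, k=2) for bid, v in BLOCKS.items()}
    minmax = {bid: value_clusters(v, k=1) for bid, v in BLOCKS.items()}
    print(surviving_blocks(index, (50, 60)))   # two clusters per block
    print(surviving_blocks(minmax, (50, 60)))  # single min/max per block
```

With `k=1` each block collapses to one min/max range, so the predicate `50 <= x <= 60` still keeps `blk_1` and `blk_3`. With two clusters per block, the gaps inside those blocks are exposed and all three blocks can be skipped. That sensitivity to gaps in the value distribution is the reason for keeping more than one range per block.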

Published/granted literature

US20180196828A1 Split elimination in mapreduce systems, publication date: 2018-07-12

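IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models G06N)
G06F7/00	Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits H03K19/00)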