-
Publication No.: US08612696B2
Publication date: 2013-12-17
Application No.: US13592746
Application date: 2012-08-23
Applicant: Ming Benjamin Zhu, Kai Li, R. Hugo Patterson
Inventor: Ming Benjamin Zhu, Kai Li, R. Hugo Patterson
CPC classification number: G06F3/0619, G06F3/0608, G06F3/064, G06F3/0641, G06F3/065, G06F3/0683, G06F3/0689, G06F11/1435, G06F11/1453, G06F11/1464
Abstract: A system and method are disclosed for providing efficient data storage. A plurality of data segments is received in a data stream. The system preliminarily checks in a memory having a relatively low latency whether one of the plurality of data segments may have been stored previously in a data segment repository. The memory having the relatively low latency stores data segment information. In the event that the preliminary check determines that one of the plurality of data segments may have been stored in the data segment repository, a memory having a relatively higher latency is checked to determine whether the data segment has been stored previously in the data segment repository.
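The two-step check in the abstract above can be illustrated with a minimal sketch. It assumes the low-latency memory holds a Bloom filter over segment fingerprints and that a Python dict stands in for the higher-latency index; neither choice is stated in the abstract, and the names `BloomFilter`, `SegmentStore`, and `slow_lookups` are hypothetical.

```python
import hashlib


class BloomFilter:
    """In-memory summary: may give false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class SegmentStore:
    def __init__(self):
        self.summary = BloomFilter()  # assumed low-latency memory
        self.index = {}               # stand-in for the higher-latency memory
        self.slow_lookups = 0         # counts accesses to the slower tier

    def store(self, segment):
        """Return True if the segment was new and stored, False if it was a duplicate."""
        fingerprint = hashlib.sha256(segment).digest()
        # Preliminary check: only a "maybe" triggers the expensive lookup.
        if self.summary.might_contain(fingerprint):
            self.slow_lookups += 1
            if fingerprint in self.index:
                return False
        self.summary.add(fingerprint)
        self.index[fingerprint] = segment
        return True
```

In this sketch a segment that the summary has never seen is stored without touching the slower tier at all, which is the saving the preliminary check is meant to provide.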
-
Publication No.: US08527455B2
Publication date: 2013-09-03
Application No.: US12890688
Application date: 2010-09-26
Applicant: R. Hugo Patterson
Inventor: R. Hugo Patterson
IPC: G06F17/30
CPC classification number: G06F17/30174, G06F17/30159, G06F17/30212
Abstract: Seeding replication is disclosed. One or more but not all files stored on a deduplicated storage system are selected to be replicated. One or more segments referred to by the selected one or more but not all files are determined. A data structure is created that is used to indicate that at least the one or more segments are to be replicated. In the event that an indication based at least in part on the data structure indicates that a candidate segment stored on the deduplicating storage system is to be replicated, the candidate segment is replicated.
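As a rough illustration of the seeding steps in the abstract above, the sketch below assumes each file is described by a list of segment identifiers and uses a plain Python set as the data structure that marks segments for replication. The abstract does not say which structure is used, and `segments_to_replicate`, `seed_replica`, and the dict-based store are hypothetical.

```python
def segments_to_replicate(file_recipes, selected_files):
    """Collect the identifiers of every segment referenced by the selected files."""
    wanted = set()
    for name in selected_files:
        wanted.update(file_recipes[name])
    return wanted


def seed_replica(segment_store, file_recipes, selected_files):
    """Copy only the segments that the marker set indicates should be replicated."""
    wanted = segments_to_replicate(file_recipes, selected_files)
    replica = {}
    # Walk the candidate segments on the source system and consult the set.
    for segment_id, data in segment_store.items():
        if segment_id in wanted:
            replica[segment_id] = data
    return replica
```

Because segments shared by several selected files appear in the set once, each is copied once regardless of how many files refer to it.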
-
Publication No.: US08234413B2
Publication date: 2012-07-31
Application No.: US13152110
Application date: 2011-06-02
Applicant: Kai Li, Umesh Maheshwari, R. Hugo Patterson
Inventor: Kai Li, Umesh Maheshwari, R. Hugo Patterson
IPC: G06F15/16
CPC classification number: G06F17/30156
Abstract: Selecting a segment boundary within block b is disclosed. A first anchor location j|j+1 is identified wherein a value of f(b[j−A+1 ... j+B]) satisfies a constraint and wherein A and B are non-negative integers. A segment boundary location k|k+1 is determined wherein k is greater than a minimum distance from j.
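One possible reading of the anchor and boundary selection above is sketched here. It assumes f is the sum of the bytes in the window, the constraint is divisibility by a chosen divisor, and the boundary is placed at the first position more than `min_distance` past the anchor. These are illustrative assumptions, not the patent's definitions, and indices are 0-based.

```python
def window_value(block, j, A, B):
    """Hypothetical f: sum of the bytes in b[j-A+1 ... j+B] (inclusive)."""
    return sum(block[j - A + 1 : j + B + 1])


def find_anchor(block, A, B, divisor):
    """Return the first j whose window value is divisible by divisor, or None."""
    first = max(0, A - 1)  # the window must not start before the block
    for j in range(first, len(block) - B):
        if window_value(block, j, A, B) % divisor == 0:
            return j
    return None


def select_boundary(block, A, B, divisor, min_distance):
    """Return (j, k) for anchor j|j+1 and boundary k|k+1, or None if none fits."""
    j = find_anchor(block, A, B, divisor)
    if j is None:
        return None
    k = j + min_distance + 1  # smallest k with k - j > min_distance
    if k > len(block) - 2:    # k|k+1 must fall inside the block
        return None
    return j, k
```

Since the anchor depends only on nearby bytes, inserting data earlier in the stream shifts the anchor along with the content, which is the usual motivation for content-defined boundaries.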
-
Publication No.: US20120041957A1
Publication date: 2012-02-16
Application No.: US13280195
Application date: 2011-10-24
Applicant: Windsor W. Hsu, R. Hugo Patterson
Inventor: Windsor W. Hsu, R. Hugo Patterson
IPC: G06F17/30
CPC classification number: G06F17/30964, Y10S707/99956
Abstract: Techniques for efficiently indexing and searching similar data are described herein. According to one embodiment, in response to a query for one or more terms received from a client, a query index is accessed to retrieve a list of one or more super files. Each super file is associated with a group of similar files. Each super file includes terms and/or sequences of terms obtained from the associated group of similar files. Thereafter, the super files representing groups of similar files are presented to the client, where each of the super files includes at least one of the queried terms. Other methods and apparatuses are also described.
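The super-file lookup described above might be sketched as follows. The sketch assumes files have already been grouped by similarity, treats whitespace-separated lowercase words as terms, and models the query index as a term-to-super-file mapping; `build_super_files`, `build_query_index`, and `search` are hypothetical names.

```python
from collections import defaultdict


def build_super_files(groups):
    """groups maps a super-file name to {file name: text}; return its pooled term set."""
    return {
        name: {term for text in files.values() for term in text.lower().split()}
        for name, files in groups.items()
    }


def build_query_index(super_files):
    """Invert the super files into a term -> set of super-file names index."""
    index = defaultdict(set)
    for name, terms in super_files.items():
        for term in terms:
            index[term].add(name)
    return index


def search(index, query_terms):
    """Return super files containing at least one of the queried terms."""
    hits = set()
    for term in query_terms:
        hits |= index.get(term.lower(), set())
    return sorted(hits)
```

Indexing one super file per group, rather than every member file, keeps the index small when many files are near-duplicates of each other.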
-
Publication No.: US08099401B1
Publication date: 2012-01-17
Application No.: US11779486
Application date: 2007-07-18
Applicant: Windsor W. Hsu, R. Hugo Patterson
Inventor: Windsor W. Hsu, R. Hugo Patterson
CPC classification number: G06F17/30964, Y10S707/99956
Abstract: Techniques for efficiently indexing and searching similar data are described herein. According to one embodiment, in response to a query for one or more terms received from a client, a query index is accessed to retrieve a list of one or more super files. Each super file is associated with a group of similar files. Each super file includes terms and/or sequences of terms obtained from the associated group of similar files. Thereafter, the super files representing groups of similar files are presented to the client, where each of the super files includes at least one of the queried terms. Other methods and apparatuses are also described.
-
Publication No.: US20110302326A1
Publication date: 2011-12-08
Application No.: US13152110
Application date: 2011-06-02
Applicant: Kai Li, Umesh Maheshwari, R. Hugo Patterson
Inventor: Kai Li, Umesh Maheshwari, R. Hugo Patterson
IPC: G06F15/16
CPC classification number: G06F17/30156
Abstract: Selecting a segment boundary within block b is disclosed. A first anchor location j|j+1 is identified wherein a value of f(b[j−A+1 ... j+B]) satisfies a constraint and wherein A and B are non-negative integers. A segment boundary location k|k+1 is determined wherein k is greater than a minimum distance from j.
-
Publication No.: US07769967B2
Publication date: 2010-08-03
Application No.: US12079766
Application date: 2008-03-28
Applicant: Ming Benjamin Zhu, Kai Li, R. Hugo Patterson
Inventor: Ming Benjamin Zhu, Kai Li, R. Hugo Patterson
CPC classification number: G06F3/0619, G06F3/0608, G06F3/064, G06F3/0641, G06F3/065, G06F3/0683, G06F3/0689, G06F11/1435, G06F11/1453, G06F11/1464
Abstract: A system and method are disclosed for providing efficient data storage. A plurality of data segments is received in a data stream. The system determines whether a data segment has been stored previously in a low latency memory. In the event that the data segment is determined to have been stored previously, an identifier for the previously stored data segment is returned.
-
Publication No.: US07689633B1
Publication date: 2010-03-30
Application No.: US10942174
Application date: 2004-09-15
Applicant: Kai Li, R. Hugo Patterson, Ming Benjamin Zhu, Allan Bricker, Richard Johnsson, Sazzala Reddy, Jeffery Zabarsky
Inventor: Kai Li, R. Hugo Patterson, Ming Benjamin Zhu, Allan Bricker, Richard Johnsson, Sazzala Reddy, Jeffery Zabarsky
CPC classification number: G06F17/30067, G06F3/0608, G06F3/0641, G06F3/0659, G06F3/067
Abstract: A network file system-based data storage system is disclosed that converts random I/O requests into a piecewise sequential data structure to facilitate the identification and elimination of redundant variable-length data segments. In one embodiment of the invention, a stateless network file system is employed. In one such embodiment, which provides multiple-client access to stored data, multiple writes are buffered and then broken into variable-length data segments. Redundant segments are then eliminated. One embodiment of the invention allows the variable-length data segments to be shared among files.
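The buffering step in the abstract above, in which random writes are turned into piecewise sequential data, can be sketched as follows. The sketch assumes writes arrive as (offset, bytes) pairs and that a later write overrides an earlier one at the same offset; it stops before segmentation and redundancy elimination, and `coalesce_writes` is a hypothetical name.

```python
def coalesce_writes(writes):
    """Merge buffered (offset, data) writes into contiguous runs ordered by offset."""
    image = {}
    # Apply writes in arrival order so that later writes overwrite earlier ones.
    for offset, data in writes:
        for i, byte in enumerate(data):
            image[offset + i] = byte

    runs = []
    for position in sorted(image):
        if runs and runs[-1][0] + len(runs[-1][1]) == position:
            runs[-1][1].append(image[position])  # extends the current run
        else:
            runs.append((position, bytearray([image[position]])))  # gap: new run
    return [(offset, bytes(buffer)) for offset, buffer in runs]
```

Each returned run is one sequential piece that a later step could break into variable-length segments.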
-
Publication No.: US07599932B2
Publication date: 2009-10-06
Application No.: US11584433
Application date: 2006-10-20
Applicant: R. Hugo Patterson
Inventor: R. Hugo Patterson
CPC classification number: G06F17/30153, Y10S707/99936, Y10S707/99952, Y10S707/99953
Abstract: A system and method are disclosed for processing a data stream. A data segment is received. It is determined whether the data segment has been previously stored. In the event that the data segment is determined not to have been previously stored, a unique identifier for specifying the data segment in a representation of the data stream is generated.
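A minimal sketch of the stream processing above is given here. It assumes segments are compared by SHA-256 fingerprint and that identifiers are small integers handed out in order of first appearance; the abstract specifies neither, and `StreamProcessor` with its methods is a hypothetical name.

```python
import hashlib
from itertools import count


class StreamProcessor:
    def __init__(self):
        self._id_by_fingerprint = {}  # fingerprint -> identifier
        self._segment_by_id = {}      # identifier -> stored segment bytes
        self._next_id = count(1)

    def process(self, segments):
        """Return the stream's representation as a list of segment identifiers."""
        representation = []
        for segment in segments:
            fingerprint = hashlib.sha256(segment).digest()
            segment_id = self._id_by_fingerprint.get(fingerprint)
            if segment_id is None:
                # Not previously stored: generate a new identifier and keep the data.
                segment_id = next(self._next_id)
                self._id_by_fingerprint[fingerprint] = segment_id
                self._segment_by_id[segment_id] = segment
            representation.append(segment_id)
        return representation

    def reconstruct(self, representation):
        """Rebuild the original byte stream from its representation."""
        return b"".join(self._segment_by_id[i] for i in representation)
```

A repeated segment contributes only an identifier to the representation, so the stored data grows with the number of distinct segments rather than with the length of the stream.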
-
Publication No.: US07434015B2
Publication date: 2008-10-07
Application No.: US11974961
Application date: 2007-10-16
Applicant: Ming Benjamin Zhu, Kai Li, R. Hugo Patterson
Inventor: Ming Benjamin Zhu, Kai Li, R. Hugo Patterson
IPC: G06F12/00
CPC classification number: G06F11/1453, G06F11/1464, G06F12/0866, Y10S707/99952
Abstract: A system and method are disclosed for providing efficient data storage. A data stream comprising a plurality of data segments is received. The system determines, using a summary in a low latency memory, whether one of the plurality of data segments has been stored previously. In the event that the data segment is determined not to have been stored previously, an identifier is assigned to the data segment.