-
Publication No.: US11297031B2
Publication date: 2022-04-05
Application No.: US16921568
Filing date: 2020-07-06
Applicant: Microsoft Technology Licensing, LLC
Inventor: Georgi M. Chalakov , Shane Kumar Mainali , Thomas Leo Marquardt , Zichen Sun , Maneesh Sah , Esfandiar Manii , Saurabh Pant , Dana Yulian Kaban , Saher B. Ahwal , Jun Chen , Da Zhou , Amit Pratap Singh , Junhua Gu , Shaoyu Zhang , Wei Chen , Jingchao Zhang , Quan Zhang , Arild Einar Skjoldsvold
IPC: H04L29/12 , H04L61/4505 , G06F16/13 , G06F16/178 , G06F16/172 , G06F16/14 , G06F16/185 , H04L67/1097 , H04L67/568 , G06F16/16 , G06F16/957
Abstract: A service enables a command that refers to a file system object using a hierarchical namespace identifier to be executed against the file system object in a flat namespace. The service selectively distributes the command to one of a plurality of name resolution nodes based on a directory name included in the hierarchical namespace identifier. The identified node resolves the directory name to a flat namespace identifier that is used to execute the command against the flat namespace. After communicating with at least one storage node to resolve a directory name, each name resolution node stores a mapping of the directory name to the corresponding flat namespace identifier in a cache, so that subsequent resolutions of that directory name may be performed more efficiently. Cache entries may be invalidated when an operation occurs that impacts the relevant mapping and/or based on system considerations such as cache expiry.
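The resolution flow in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the class `NameResolutionNode`, the helpers `pick_node` and `resolve_path`, the use of a SHA-256 hash to choose a node, and the dictionary standing in for the storage nodes are all assumptions introduced here.

```python
import hashlib


class NameResolutionNode:
    """Hypothetical node that caches directory-name -> flat-ID mappings."""

    def __init__(self, storage):
        self.storage = storage      # stand-in for the storage nodes (assumption)
        self.cache = {}             # directory name -> flat namespace identifier
        self.storage_lookups = 0    # counts round trips to storage

    def resolve(self, directory):
        # Serve from the cache when the mapping is already known.
        if directory in self.cache:
            return self.cache[directory]
        # Otherwise ask storage once and remember the answer.
        self.storage_lookups += 1
        flat_id = self.storage[directory]
        self.cache[directory] = flat_id
        return flat_id

    def invalidate(self, directory):
        # Called when an operation (e.g. a rename) changes the mapping.
        self.cache.pop(directory, None)


def pick_node(nodes, directory):
    # Assumption: the node is chosen by hashing the directory name.
    digest = hashlib.sha256(directory.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def resolve_path(nodes, path):
    # Split the hierarchical identifier into directory and object name.
    directory, _, name = path.rpartition("/")
    node = pick_node(nodes, directory)
    return node.resolve(directory), name


storage = {"/logs/2020": "flat-0001"}
nodes = [NameResolutionNode(storage) for _ in range(3)]
first = resolve_path(nodes, "/logs/2020/app.log")
second = resolve_path(nodes, "/logs/2020/db.log")   # served from the cache
```

Because both paths share a directory, they hash to the same node, so the second call needs no storage round trip.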
-
Publication No.: US10901648B2
Publication date: 2021-01-26
Application No.: US16202283
Filing date: 2018-11-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Shane Kumar Mainali , Quan Zhang , Kaviyarasan Rajendran , Sundar P. Subramani , Andrew Edwards , Maneesh Sah , Krisjan David Fritz , Michael Hauss , Jianhua Yan , Michael Roberson
Abstract: A cloud storage system includes a processor and a non-transitory computer-readable medium to store blob table management instructions for execution by the processor. The blob table management instructions are configured to manage a plurality of storage requests for a blob stored in a storage stamp as snapshots in a blob table and selectively create a user snapshot of at least one of the snapshots in the blob table. When automatic snapshots are enabled, the blob table management instructions are configured to receive a first request to overwrite the blob. If the first request does not further specify a key of the one of the snapshots in the blob table, the blob table management instructions are configured to add a new snapshot to the blob table and maintain storage of a prior snapshot of the blob for a maximum period.
-
Publication No.: US20190394162A1
Publication date: 2019-12-26
Application No.: US16015774
Filing date: 2018-06-22
Applicant: Microsoft Technology Licensing, LLC
Inventor: Georgi M. Chalakov , Shane Kumar Mainali , Thomas Leo Marquardt , Zichen Sun , Maneesh Sah , Esfandiar Manii , Saurabh Pant , Dana Yulian Kaban , Saher B. Ahwal , Jun Chen , Da Zhou , Amit Pratap Singh , Junhua Gu , Shaoyu Zhang , Wei Chen , Jingchao Zhang , Quan Zhang , Arild Einar Skjoldsvold
Abstract: A service enables a command that refers to a file system object using a hierarchical namespace identifier to be executed against the file system object in a flat namespace. The service selectively distributes the command to one of a plurality of name resolution nodes based on a directory name included in the hierarchical namespace identifier. The identified node resolves the directory name to a flat namespace identifier that is used to execute the command against the flat namespace. After communicating with at least one storage node to resolve a directory name, each name resolution node stores a mapping of the directory name to the corresponding flat namespace identifier in a cache, so that subsequent resolutions of that directory name may be performed more efficiently. Cache entries may be invalidated when an operation occurs that impacts the relevant mapping and/or based on system considerations such as cache expiry.
-
Publication No.: US20180307736A1
Publication date: 2018-10-25
Application No.: US15497022
Filing date: 2017-04-25
Applicant: Microsoft Technology Licensing, LLC
Inventor: Venkates Paramasivam Balakrishnan , Krishnan Varadarajan , Maneesh Sah , Jegan Devaraju , Advait Kumar Mishra , Zichen Sun , Shane Kumar Mainali
IPC: G06F17/30
CPC classification number: G06F17/30575 , G06F17/30283 , G06F17/30318 , G06F17/30327 , G06F2201/84
Abstract: A snapshot of data from a table associated with a particular user may be generated. Tree data structures that are distributed across multiple computer systems may be accessed. Each of the tree structures may include data associated with one or more users. At least one tree data structure of the tree data structures that includes data associated with the particular user of the one or more users may be identified. The at least one tree data structure may then be filtered. Filtering may comprise identifying only data that is associated with the particular user. A snapshot of the data associated with the particular user may be generated. Generating the snapshot of the data associated with the particular user comprises generating a data structure that is configured to map to each data page of the at least one tree data structure that includes data associated with the particular user.
-
Publication No.: US10909074B2
Publication date: 2021-02-02
Application No.: US15490741
Filing date: 2017-04-18
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Shane Kumar Mainali , Krishnan Varadarajan , Quan Zhang , Jegan Devaraju , Zichen Sun , Hao Feng , Ju Wang , Manish Chablani
IPC: G06F16/13 , G06F16/901
Abstract: Embodiments provide a method to collect aggregate information or usage data quickly and efficiently with minimal lag. Additionally, the system can use this aggregate information internally for improved load balancing, better data placement, optimization, and enhanced debugging. The system can quickly look at aggregate information across a huge amount of data and drill down cheaply because the aggregate information is generated using existing processes. Aggregated statistics storage and collection may be built on top of an LSM tree used to store a persistent index for a cloud storage system. The statistics may also represent the result of an operation (e.g., max, min, sum, average) on selected parameter(s) or attribute(s) of stored data. Aggregate statistics values may be efficiently maintained during index merge and garbage collection processes or any other index management. As delta LSM trees are merged into a base LSM tree, the aggregates are updated in delta fashion.
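The idea of updating aggregates "in delta fashion" during a merge can be illustrated with a small sketch. Everything here is assumed for illustration: the `Aggregate` fields, the `merge_delta` function, and the representation of each tree as a plain dictionary of key to object size (with `None` marking a delete) are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    """Hypothetical running statistics kept alongside the base index."""
    count: int = 0
    total_bytes: int = 0

    def apply(self, delta):
        self.count += delta.count
        self.total_bytes += delta.total_bytes


def merge_delta(base_rows, base_agg, delta_rows):
    """Merge delta rows into the base and adjust the aggregate incrementally.

    delta_rows maps key -> size in bytes, or None for a delete marker.
    Only the keys touched by the delta are visited; the base is never rescanned.
    """
    change = Aggregate()
    for key, size in delta_rows.items():
        old = base_rows.get(key)
        if old is not None:
            # The old version leaves the index: subtract its contribution.
            change.count -= 1
            change.total_bytes -= old
        if size is None:
            base_rows.pop(key, None)
        else:
            base_rows[key] = size
            change.count += 1
            change.total_bytes += size
    base_agg.apply(change)
    return base_agg


base_rows = {"a": 100, "b": 50}
base_agg = Aggregate(count=2, total_bytes=150)
delta_rows = {"b": 70, "c": 30, "a": None}   # update b, insert c, delete a
merge_delta(base_rows, base_agg, delta_rows)
```

After the merge the aggregate matches what a full rescan of the base would produce, but it was computed from the delta alone.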
-
Publication No.: US10789217B2
Publication date: 2020-09-29
Application No.: US16015823
Filing date: 2018-06-22
Applicant: Microsoft Technology Licensing, LLC
Inventor: Georgi M. Chalakov , Shane Kumar Mainali , Thomas Leo Marquardt , Zichen Sun , Maneesh Sah , Esfandiar Manii , Saurabh Pant , Dana Yulian Kaban , Saher B. Ahwal , Jun Chen , Da Zhou , Amit Pratap Singh , Junhua Gu , Shaoyu Zhang , Wei Chen , Jingchao Zhang , Quan Zhang
IPC: G06F16/188 , G06F16/13 , G06F16/185
Abstract: Methods, systems, and apparatuses are provided for a storage system that implements a hierarchical namespace service. A storage system includes a plurality of physical nodes and a plurality of sets of virtual nodes. Each set of virtual nodes is managed by a corresponding physical node. Each virtual node is configured to manage a respective set of directory blocks. Each directory block is a respective partition of a storage namespace and is managed by a corresponding single virtual node. Each virtual node maintains a directory block map. The directory block map maps file system object names in a hierarchical namespace to entity block identifiers in the flat namespace for entity blocks (files and folders) stored in directories corresponding to the managed set of directory blocks. Load balancing may be performed by moving virtual nodes between physical nodes, and by splitting directory blocks.
-
Publication No.: US20190392053A1
Publication date: 2019-12-26
Application No.: US16015823
Filing date: 2018-06-22
Applicant: Microsoft Technology Licensing, LLC
Inventor: Georgi M. Chalakov , Shane Kumar Mainali , Thomas Leo Marquardt , Zichen Sun , Maneesh Sah , Esfandiar Manii , Saurabh Pant , Dana Yulian Kaban , Saher B. Ahwal , Jun Chen , Da Zhou , Amit Pratap Singh , Junhua Gu , Shaoyu Zhang , Wei Chen , Jingchao Zhang , Quan Zhang
IPC: G06F17/30
Abstract: Methods, systems, and apparatuses are provided for a storage system that implements a hierarchical namespace service. A storage system includes a plurality of physical nodes and a plurality of sets of virtual nodes. Each set of virtual nodes is managed by a corresponding physical node. Each virtual node is configured to manage a respective set of directory blocks. Each directory block is a respective partition of a storage namespace and is managed by a corresponding single virtual node. Each virtual node maintains a directory block map. The directory block map maps file system object names in a hierarchical namespace to entity block identifiers in the flat namespace for entity blocks (files and folders) stored in directories corresponding to the managed set of directory blocks. Load balancing may be performed by moving virtual nodes between physical nodes, and by splitting directory blocks.
-
Publication No.: US10248562B2
Publication date: 2019-04-02
Application No.: US15640349
Filing date: 2017-06-30
Applicant: Microsoft Technology Licensing, LLC
Inventor: Shane Kumar Mainali , Rushi Srinivas Surla , Peter Bodik , Ishai Menache , Yang Lu
Abstract: In an embodiment, a partition cost of one or more of the plurality of partitions and a data block cost for one or more data blocks that may be subjected to a garbage collection operation are determined. The partition cost and the data block cost are combined into an overall reclaim cost by specifying both the partition cost and the data block cost in terms of a computing system latency. A byte constant multiplier that is configured to modify the overall reclaim cost to account for the amount of data objects that may be rewritten during the garbage collection operation may be applied. The one or more partitions and/or one or more data blocks that have the lowest overall reclaim cost while reclaiming an acceptable amount of data block space may be determined and be included in a garbage collection schedule.
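A rough sketch of the scheduling idea follows. The abstract does not say how the byte constant multiplier modifies the overall cost, so this sketch assumes it contributes an additive latency term per byte rewritten; the `Candidate` fields, the `BYTE_CONSTANT_MS` value, and the greedy selection in `schedule` are likewise assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical garbage-collection candidate (a partition or data block)."""
    name: str
    partition_cost_ms: float   # partition cost expressed as latency
    block_cost_ms: float       # data block cost expressed as latency
    bytes_rewritten: int       # live data that GC would have to rewrite
    bytes_reclaimed: int       # space freed if this candidate is collected


# Assumed value: milliseconds of latency charged per byte rewritten.
BYTE_CONSTANT_MS = 1e-6


def reclaim_cost(c):
    # Both costs share a unit (latency), so they can be combined directly.
    return c.partition_cost_ms + c.block_cost_ms + BYTE_CONSTANT_MS * c.bytes_rewritten


def schedule(candidates, target_bytes):
    """Pick the cheapest candidates until the reclaim target is met."""
    chosen, reclaimed = [], 0
    for c in sorted(candidates, key=reclaim_cost):
        if reclaimed >= target_bytes:
            break
        chosen.append(c.name)
        reclaimed += c.bytes_reclaimed
    return chosen


candidates = [
    Candidate("A", 5.0, 2.0, 1_000_000, 300_000),
    Candidate("B", 1.0, 1.0, 500_000, 400_000),
    Candidate("C", 10.0, 10.0, 0, 900_000),
]
plan = schedule(candidates, target_bytes=600_000)
```

Here `B` is cheapest, then `A`; together they reclaim 700,000 bytes, which meets the target, so the expensive candidate `C` is left out of the schedule.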
-
Publication No.: US11055010B2
Publication date: 2021-07-06
Application No.: US16561985
Filing date: 2019-09-05
Applicant: Microsoft Technology Licensing, LLC
Inventor: Rushi Srinivas Surla , Maneesh Sah , Shane Kumar Mainali , Wei Lin , Girish Saini , Arild Einar Skjolsvold
Abstract: One example provides a method of migrating a data partition from a first storage cluster to a second storage cluster, the method including determining that the data partition meets a migration criteria for migrating from the first storage cluster to the second storage cluster, on the first storage cluster, preparing partition metadata to be transferred, the partition metadata describing one or more streams within the data partition and one or more extents within each stream, transferring the partition metadata from the first storage cluster to the second storage cluster, directing new transactions associated with the data partition to the second storage cluster, including while the one or more extents reside at the first storage cluster, on the first storage cluster, changing an access attribute of the one or more extents within the data partition to read-only, and on the second storage cluster, performing new ingress for the data partition.
-
Publication No.: US10817498B2
Publication date: 2020-10-27
Application No.: US16018553
Filing date: 2018-06-26
Applicant: Microsoft Technology Licensing, LLC
Inventor: Georgi Chalakov , Shane Kumar Mainali , Thomas Leo Marquardt , Zichen Sun , Maneesh Sah , Wei Chen , Dana Yulian Kaban , Saher B. Ahwal , Shaoyu Zhang , Jingchao Zhang , Quan Zhang , Jun Chen , Esfandiar Manii , Saurabh Pant , Da Zhou , Amit Pratap Singh , Junhua Gu
IPC: G06F17/30 , G06F16/23 , G06F16/16 , G06F16/182 , G06F16/22
Abstract: Methods, systems, and programs provide for executing distributed transactions in a cloud storage system with a hierarchical namespace. One method includes receiving a request with operations to be executed atomically. Further, nodes are identified for executing the operations, each node having a respective clock and having at least part of a transactions table for controlling updates to entities. Each clock is one of a loosely-synchronized clock, a strictly-synchronized clock, a logical clock, or a physical clock. Additionally, the nodes process the operations, which includes setting a commit timestamp (CS) to a value of the clock in the node if the node is a first node in the processing. One node coordinates the transactions, and may be one of the nodes executing transactions. If the clock in the node is less than a current value of the CS, the node waits for the clock to reach the current value of the CS and the CS is updated. The transactions table is updated based on the value of the CS, the atomic execution is committed utilizing the final value of the CS, and a status is returned.
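The commit-timestamp rule in this abstract can be sketched as below. This is a simplified illustration under several assumptions: clocks are plain integers, "waiting" is simulated by advancing the local clock to the CS, the CS is taken to be the maximum clock value seen so far, and the `Node` class and `commit` function are hypothetical names.

```python
class Node:
    """Hypothetical storage node with a local clock and a transactions table."""

    def __init__(self, name, clock):
        self.name = name
        self.clock = clock   # integer clock value (assumption)
        self.table = {}      # transaction id -> commit timestamp

    def process(self, cs):
        # The first node in the processing seeds the commit timestamp.
        if cs is None:
            return self.clock
        # A node whose clock lags the CS waits for its clock to catch up;
        # the wait is simulated here by advancing the clock directly.
        if self.clock < cs:
            self.clock = cs
        # Assumed update rule: the CS never moves backwards.
        return max(cs, self.clock)


def commit(nodes, txn_id):
    cs = None
    for node in nodes:
        cs = node.process(cs)
    # Every participant records the same final CS for the transaction.
    for node in nodes:
        node.table[txn_id] = cs
    return cs


nodes = [Node("n1", 10), Node("n2", 7), Node("n3", 12)]
final_cs = commit(nodes, "txn-1")
```

With clocks 10, 7, and 12, the second node catches up to 10 and the third raises the CS to 12, which becomes the timestamp recorded on all three nodes.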