Patent Issued for Managing access control of data pipelines configured on a cloud platform (USPTO 11843664): Humana Inc.
2024 JAN 01 (NewsRx) -- By a
The patent’s assignee for patent number 11843664 is
News editors obtained the following quote from the background information supplied by the inventors: “Organizations are increasingly storing and processing large amounts of data. Organizations often store their data in a repository that allows storage of unstructured and structured data, referred to as data lakes. Monolithic data lakes and other large, centralized data storage and access systems complicate the management of resources used for processing data. Large data lakes can also make it difficult for non-technical users to access and interact with relevant data. Organizations often find it difficult to predict the number of resources needed for processing the data stored in the data lakes. Organizations are increasingly using cloud platforms for their infrastructure needs. Cloud platforms provide infrastructure that can be scaled on demand including computing resources, storage resources, networking, software, and so on. Examples of such cloud platforms include MICROSOFT AZURE®, GOOGLE CLOUD PLATFORM (GCP)®,
“Conventional techniques for complex data processing use monolithic architecture that integrates large systems for an organization such as relational databases, extract transform and load (ETL) tools, and data analytics tools. Such monolithic architectures are complex and difficult to manage. Furthermore, users that use or develop the data pipelines typically have access to all the data processed by the data pipeline or large portions of data processed by the data pipeline. As a result, if a user account is compromised, the possible exposure to the data can be very significant. For example, the user account may be compromised by a malicious user thereby exposing large amount of data including sensitive information to the malicious user. Even if the data is not exposed to a malicious user, there may be other issues caused due to users having access to more information than they need. For example, if a developer causes data corruption due to a defect in a program, the amount of data that can get corrupted can be very large. For example, if a defect in a program or a script causes the program or script to overwrite data, to delete data, or to incorrectly modify data, the amount of data that is affected by the defect can be large.”
As a supplement to the background information on this patent, NewsRx correspondents also obtained the inventors’ summary information for this patent: “A system configures and executes data pipelines on cloud platforms. The system performs continuous integration/continuous delivery of updates to the data pipeline based on changes to declarative specifications based on schemas that define the data model output by the data pipeline.
“The system manages access control of a data pipeline deployed on a cloud platform. The system receives a specification of the data pipeline. The specification of the data pipeline specifies a plurality of data pipeline units. At least some of the data pipeline units receive data output by a previous data pipeline unit and provide data as input to a next data pipeline unit. The system identifies a cloud platform for deployment and execution of the data pipeline. The system generates instructions from the specification of the data pipeline for configuring the data pipeline units on the cloud platform. The system creates a connection with the cloud platform. For each of the plurality of data pipeline units, the system creates a runtime system account on the cloud platform. The runtime system account has access to one or more storage units of the data pipeline unit and is used by the system during execution of the data pipeline. The system provisions computing infrastructure on the cloud platform for the data pipeline unit. The system configures the data pipeline on the cloud platform by performing the following steps for each data pipeline unit. The system creates a group of runtime system accounts and adds the following system accounts to the group: (1) the runtime system account created for the data pipeline unit and (2) each runtime system account created for a data pipeline unit receiving as input, data output by the data pipeline unit. The system grants read access to the output data of the data pipeline unit to each system account of the group. The system executes the data pipeline by executing instructions of each data pipeline unit as input data becomes available for the data pipeline unit.
The use of different user groups including different runtime system accounts for different data pipeline units of the data pipeline results in limiting the scope of data access of different system accounts, thereby implementing a least privilege policy for executing the data pipeline.
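As an illustration only, and not part of the patent text, the per-unit grouping described above can be sketched in Python. The account naming scheme (`rt-<unit>`), the function name, and the example unit names are hypothetical; a real deployment would create these accounts and groups through the cloud platform's own identity service.

```python
def build_reader_groups(units, edges):
    """Map each pipeline unit to the runtime accounts allowed to read its output.

    units: list of unit names.
    edges: list of (producer, consumer) pairs between units.
    """
    # Hypothetical naming scheme: one runtime system account per unit.
    account = {u: f"rt-{u}" for u in units}
    # Each group starts with the unit's own runtime account.
    groups = {u: {account[u]} for u in units}
    # Only direct consumers of a unit's output join that unit's group.
    for producer, consumer in edges:
        groups[producer].add(account[consumer])
    return groups


units = ["ingest", "clean", "report"]
edges = [("ingest", "clean"), ("clean", "report")]
groups = build_reader_groups(units, edges)
```

In this sketch the `report` unit's account never appears in the group for `ingest`, which is the least privilege effect the summary describes: a compromised account exposes only the outputs it directly consumes.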
“According to an embodiment, the system creates an infrastructure system account for each data pipeline unit of the plurality of data pipeline units. The infrastructure system account has privileges to configure resources associated with the data pipeline unit. The system uses the infrastructure account for provisioning computing infrastructure on the cloud platform for the data pipeline unit.
“The techniques disclosed herein provide various benefits including distributed execution of the data pipeline, modular upgrades to portions of the data pipeline, selecting re-execution of the data pipeline, decentralized ownership and scaling using autonomous teams and individual data pipeline development, and so on. The infrastructure accounts and the runtime accounts are system accounts for use by system processes, for example, processes that execute the data pipeline on the cloud platform.
“According to an embodiment, the data pipeline unit has a plurality of output ports and the system creates multiple groups of runtime system accounts for a data pipeline unit, each group of runtime system accounts for an output port of the data pipeline unit. For example, the plurality of output ports may include output port O1 and O2, and the system creates a group G1 of runtime system accounts associated with the output port O1 and a group G2 of runtime system accounts for the output port O2. The group G1 of runtime system accounts includes the runtime system account created for the data pipeline unit and each runtime system account created for a data pipeline unit receiving data output using the output port O1. The group G2 of runtime system accounts includes the runtime system account created for the data pipeline unit and each runtime system account created for a data pipeline unit receiving data output using the output port O2. The system grants read access to the data generated by the output port O1 to the runtime system accounts of the group G1 and grants read access to the data generated by the output port O2 to the runtime system accounts of the group G2. A port may also be referred to herein as an interface. Accordingly, an output port is an output interface of the data pipeline unit and an input port is an input interface of the data pipeline unit.”
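The per-port variant in the O1/O2 example can be sketched the same way. This is a minimal illustration under assumed names: the producer unit `claims`, its consumers, and the `rt-` account prefix are hypothetical and do not come from the patent.

```python
def build_port_groups(unit, port_consumers):
    """Build one reader group per output port of a single producer unit.

    port_consumers: dict mapping a port name to the units that read from it.
    """
    owner = f"rt-{unit}"  # hypothetical runtime account of the producing unit
    return {
        # Each port's group holds the owner plus that port's consumers only.
        port: {owner} | {f"rt-{c}" for c in consumers}
        for port, consumers in port_consumers.items()
    }


port_groups = build_port_groups(
    "claims", {"O1": ["billing"], "O2": ["analytics", "audit"]}
)
```

Here the group for O1 plays the role of G1 and the group for O2 the role of G2: the account for `billing` can read what O1 emits but has no access to the data produced on O2.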
The claims supplied by the inventors are:
“1. A computer-implemented method for managing access control of a data pipeline deployed on a cloud platform, the method comprising: receiving a specification of the data pipeline comprising a plurality of data pipeline units, wherein at least some of the data pipeline units receive data output by a previous data pipeline unit and provide data as input to a next data pipeline unit, wherein a data pipeline unit is configured to store data in one or more storage units; identifying a cloud platform for deployment and execution of the data pipeline; generating instructions from the specification of the data pipeline for configuring the plurality of data pipeline units on the cloud platform; creating a connection with the cloud platform; for each of the plurality of data pipeline units: creating a runtime system account on the cloud platform, the runtime system account having access to the one or more storage units of the data pipeline unit, and provisioning computing infrastructure on the cloud platform for the data pipeline unit; configuring the data pipeline on the cloud platform, comprising, for each data pipeline unit: creating a group of runtime system accounts including (1) the runtime system account created for the data pipeline unit and (2) each runtime system account created for a data pipeline unit receiving as input, data output by the data pipeline unit, and granting read access to the output data of the data pipeline unit to each runtime system account in the group of runtime system accounts; executing the data pipeline comprising, executing instructions of each data pipeline unit responsive to data being available as input to the data pipeline unit; receiving a modified specification of the data pipeline unit of the data pipeline, wherein the data pipeline unit provides input to a first set of data pipeline units, wherein the data pipeline unit is associated with a first group of system accounts having read access to output ports of the data pipeline unit; reconfiguring the data pipeline unit to conform to the modified specification of the data pipeline unit, wherein the reconfigured data pipeline unit provides input to a second set of data pipeline units; and modifying the group of system accounts having read access to output ports of the data pipeline unit according to a difference between the second set of data pipeline units and the first set of data pipeline units; wherein modifying the group of system accounts comprises, responsive to determining that the second set of data pipeline units includes a particular data pipeline unit that is absent from the first set of data pipeline units, adding a system account corresponding to the particular data pipeline unit to the group of system accounts.
“2. The computer-implemented method of claim 1, further comprising: for each data pipeline unit of the plurality of data pipeline units, creating an infrastructure system account with privileges to configure resources associated with the data pipeline unit, wherein the infrastructure account is used for provisioning computing infrastructure on the cloud platform for the data pipeline unit.
“3. The computer-implemented method of claim 2, wherein the infrastructure accounts and the runtime accounts are system accounts for use by system processes.
“4. The computer-implemented method of claim 1, wherein the data pipeline unit has a plurality of output ports comprising a first output port and a second output port, wherein the group of runtime system accounts is a first group of runtime system accounts associated with the first output port, wherein the first group of runtime system accounts includes runtime system accounts created for data pipeline units receiving as input, data output by the first output port of the data pipeline unit, the method further comprising: creating a second group of runtime system accounts including (1) the runtime system account created for the data pipeline unit and (2) each runtime system account created for a data pipeline unit receiving as input, data output by the second output port of the data pipeline unit, and granting read access to the output data of the second output port of the data pipeline unit to each system account of the second group.
“5. The computer-implemented method of claim 1, wherein the data pipeline unit outputs a first data set categorized as having a first level of sensitivity and a second data set categorized as having a second level of sensitivity, wherein the group of runtime system accounts is a first group of runtime system accounts that has access to data categorized as having a first level of sensitivity, the method further comprising: creating a second group of runtime system accounts including (1) the runtime system account created for the data pipeline unit and (2) one or more runtime system accounts created for a data pipeline unit receiving as input, data output by the data pipeline unit and categorized as having a second level of sensitivity.
“6. The computer-implemented method of claim 1, further comprising: for each data pipeline unit of at least a subset of the plurality of data pipeline units, creating a group of user accounts with privileges to access the output data generated by the data pipeline unit.
“7. The computer-implemented method of claim 1, wherein generated instructions for the data pipeline comprise instructions for each data pipeline unit, wherein the instructions for a data pipeline unit comprise: a system configuration for the data pipeline unit, the system configuration comprising instructions for configuring: one or more storage units on the cloud platform, a cluster of servers for execution of the data pipeline unit on the cloud platform, and one or more processing engines for executing instructions of the data pipeline unit, and a deployment package comprising: data flow instructions for orchestrating the flow of data across resources of the data pipeline unit, and a transformation processing instructions package for performing the one or more data transformations of the data pipeline unit.
“8. The computer-implemented method of claim 1, wherein an output of the data pipeline is one of: a data stream that provides data elements at various time intervals; or a batch input that provides a data set comprising a plurality of data elements at one point in time.
“9. The computer-implemented method of claim 1, wherein the specification of a data pipeline unit comprises: inputs of the data pipeline unit, outputs of the data pipeline unit, one or more storage units used by the data pipeline unit, and one or more data transformations performed by the data pipeline unit.
“10. The computer-implemented method of claim 1, wherein the plurality of data pipeline units comprises: a set of input data pipeline units configured to receive input data processed by the data pipeline from one or more data sources; a set of output data pipeline units configured to provide output data processed by the data pipeline to one or more consumer systems; and a set of internal data pipeline units, wherein each internal data pipeline unit receives data output by a previous data pipeline unit and provides input to a next data pipeline unit of the data pipeline.”
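The reconfiguration step at the end of claim 1, where the reader group is modified according to the difference between the old and new sets of consumer units, might look roughly like the following Python sketch. The function, the `rt-` account prefix, and the unit names are hypothetical. The quoted claim explicitly recites only adding accounts for newly attached consumers; removing accounts for detached consumers is an assumption made here for illustration.

```python
def update_reader_group(group, owner, old_consumers, new_consumers):
    """Return the reader group after a unit's set of consumers changes."""
    added = set(new_consumers) - set(old_consumers)
    removed = set(old_consumers) - set(new_consumers)

    updated = set(group)
    # Recited in claim 1: add an account for each newly attached consumer.
    updated |= {f"rt-{u}" for u in added}
    # Assumption (not in the quoted claim): revoke access for detached consumers.
    updated -= {f"rt-{u}" for u in removed}
    # The producing unit's own runtime account always stays in the group.
    updated.add(owner)
    return updated


before = {"rt-clean", "rt-report"}
after = update_reader_group(
    before, "rt-clean", old_consumers=["report"], new_consumers=["export"]
)
```

Computing the change as a set difference means unchanged consumers keep their access untouched, so only the accounts affected by the modified specification are added or removed.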
There are additional claims. Please visit the full patent to read further.
For additional information on this patent, see: Lai, Tian. Managing access control of data pipelines configured on a cloud platform.
(Our reports deliver fact-based news of research and discoveries from around the world.)