“Privacy Preserving Generative Mechanism For Industrial Time-Series Data Disclosure” in Patent Application Approval Process (USPTO 20230281427): Tata Consultancy Services Limited

2023 SEP 22 (NewsRx) -- By a News Reporter-Staff News Editor at Insurance Daily News -- A patent application by the inventors RUNKANA, VENKATARAMANA (Pune, IN); SAKHINANA, SAGAR SRINIVAS (Pune, IN); SARKAR, RAJAT KUMAR (Pune, IN), filed on May 31, 2022, was made available online on September 7, 2023, according to news reporting originating from Washington, D.C., by NewsRx correspondents.

This patent application is assigned to Tata Consultancy Services Limited (Mumbai, India).

The following quote was obtained by the news editors from the background information supplied by the inventors: “In the era of rapid advances in Artificial Intelligence (AI) deployment of deep learning techniques in the cloud for production, it has several complexities and risks involved relating to privacy, security, fairness, and accountability of data. Usually, regulators and policymakers across the globe present governance protocols such as the General Data Protection Regulation (GDPR), US Health Insurance Portability and Accountability Act (HIPAA), California Consumer Privacy Law (CCPA), European Commission AI Act, etc., to protect the ownership, and confidentiality of sensitive individual user information. These regulations present a Catch-22 of privacy and are mandatory for tech companies to comply with to avoid lawsuits and penalties. These regulations protect the privacy of individuals, encourage anonymization of the sensitive personal information for data-disclosure to be shared with third-parties.

“The gold standards for security techniques in deep learning include cryptography techniques such as Homomorphic Encryption (HE), Secure Multi-party Computation (SMC), Differential Privacy (DP) & Information-Theoretic Privacy for data disclosure, Federated ML, Ethereum blockchain, and Smart contracts. There’s a growing awareness and interest across several industrial data behemoths such as FMCG, oil & gas, aviation, power, semiconductor engineering, manufacturing etc. to prevent membership inference, model inversion, attribute inference, hyperparameter and parameter inference, and property inference by a third-party (adversary) to access unauthorized process plant operational data, which embeds the trade secrets, the product formulations & simultaneously adopting privacy embedded-AI techniques for digital twins to leverage the big data for process control, optimization, uncertainty quantification, etc.

“There is a need and necessity for a mathematical framework to enhance privacy-preserving, trade-off to preserve utility for data monetization of the large-scale industrial and manufacturing plants multivariate mixed-variable time series data. The existing techniques of enabling privacy-preserving mechanisms for data disclosure have lackluster utility and suffer from inherent drawbacks of preserving the original data characteristics in the private dataset generated for data-disclosure.”

In addition to the background information obtained for this patent application, NewsRx journalists also obtained the inventors’ summary information for this patent application: “Embodiments of the disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method and system for privacy preserving generative mechanism for data-disclosure of industrial data is provided.

“In one aspect, a processor-implemented method for privacy preserving generative mechanism for data-disclosure of industrial data is provided. The method includes one or more steps such as receiving, via an input/output interface, a multivariate mixed-variable time series data of a plurality of sensory observations, the cluster-labels associated with the multivariate mixed-variable time series data and a cluster-independent random noise, pre-processing, via one or more hardware processors, the received multivariate mixed-variable time series data, training, via the one or more hardware processors, a plurality of neural networks of a privacy preserving adversarial neural network architecture in two phases, providing, via the one or more hardware processors, a test data to generate a synthetic private dataset for data disclosure using the trained privacy preserving adversarial neural network architecture, and estimating, via the one or more hardware processors, identifiability of the multivariate mixed-variable time series data from the generated synthetic private dataset.

“In another aspect, a system for privacy preserving generative mechanism for data-disclosure of industrial data is provided. The system includes an input/output interface configured to receive a multivariate mixed-variable time series data of a plurality of sensory observations, the cluster-labels associated with the multivariate mixed-variable time series data and a cluster-independent random noise, one or more hardware processors and at least one memory storing a plurality of instructions, wherein the one or more hardware processors are configured to execute the plurality of instructions stored in the at least one memory.

“Further, the system is configured to pre-process the received multivariate mixed-variable time series data, wherein the pre-process includes normalizing continuous feature variables by bounding heterogeneous measurements between a predefined range through a min-max scaling technique; and transforming discrete feature variables by representing as a sparse binary vector through a one-hot encoding technique. Further, the system is configured to train a plurality of neural networks of a privacy preserving adversarial neural network architecture in two phases, provide a test data to generate a synthetic private dataset for data disclosure using the trained privacy preserving adversarial neural network architecture and estimate an identifiability of the multivariate mixed-variable time series data from the generated synthetic private dataset, wherein the estimation satisfies a predefined process-identifiability criteria.

“In yet another aspect, one or more non-transitory machine-readable information storage mediums are provided comprising one or more instructions, which, when executed by one or more hardware processors, cause a method for privacy preserving generative mechanism for data-disclosure of industrial data to be performed. The method includes one or more steps such as receiving, via an input/output interface, a multivariate mixed-variable time series data of a plurality of sensory observations, the cluster-labels associated with the multivariate mixed-variable time series data and a cluster-independent random noise, pre-processing, via one or more hardware processors, the received multivariate mixed-variable time series data, training, via the one or more hardware processors, a plurality of neural networks of a privacy preserving adversarial neural network architecture in two phases, providing, via the one or more hardware processors, a test data to generate a synthetic private dataset for data disclosure using the trained privacy preserving adversarial neural network architecture, and estimating, via the one or more hardware processors, identifiability of the multivariate mixed-variable time series data from the generated synthetic private dataset.

“It is to be understood that the foregoing general descriptions and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.”
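The pre-processing described in the summary (min-max scaling of continuous variables and one-hot encoding of discrete variables) can be illustrated with a minimal sketch. This is not code from the application: the function names, the default range of 0 to 1, the handling of constant columns, and the sample sensor values are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def min_max_scale(values: Sequence[float],
                  feature_range: Tuple[float, float] = (0.0, 1.0)) -> List[float]:
    """Bound one continuous feature column to feature_range."""
    lo, hi = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant column: map every value to the lower bound
        # (a choice made for this sketch, not taken from the application).
        return [lo for _ in values]
    return [lo + (v - v_min) * (hi - lo) / span for v in values]


def one_hot_encode(values: Sequence[str]) -> Tuple[List[List[int]], List[str]]:
    """Represent one discrete feature column as sparse binary vectors."""
    categories = sorted(set(values))
    index: Dict[str, int] = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one position is set per observation
        encoded.append(row)
    return encoded, categories


# Hypothetical sensor readings over four time steps.
temperature = [300.0, 350.0, 325.0, 400.0]
valve_state = ["open", "closed", "open", "throttled"]

scaled_temperature = min_max_scale(temperature)
encoded_valve, valve_categories = one_hot_encode(valve_state)

# Each pre-processed time step concatenates the scaled and encoded features.
preprocessed = [[t] + row for t, row in zip(scaled_temperature, encoded_valve)]
```

In this sketch each row of `preprocessed` holds one scaled continuous value followed by the binary vector for the discrete value, which is one plausible layout for a mixed-variable time step.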

The claims supplied by the inventors are:

“1. A processor-implemented method comprising steps of: receiving, via an input/output interface, a multivariate mixed-variable time series data of a plurality of sensory observations, cluster-labels associated with the multivariate mixed-variable time series data, and a cluster-independent random noise, wherein multivariate mixed-variable time series data comprises continuous and discrete feature variables; pre-processing, via one or more hardware processors, the received multivariate mixed-variable time series data, wherein the pre-processing comprising steps of: normalizing the continuous feature variables by bounding heterogeneous measurements between a predefined range using a min-max scaling technique; and transforming the discrete feature variables by representing as a sparse binary vector using a one-hot encoding technique; training, via the one or more hardware processors, a plurality of neural networks of a privacy preserving generative adversarial network (ppGAN) in a first phase and a second phase using pre-processed multivariate mixed-variable time series data, wherein the plurality of neural networks including an embedding neural network, a recovery neural network, a generator neural network, a critic neural network and a discriminator neural network; providing, via the one or more hardware processors, a test data to generate a synthetic private dataset for data disclosure using the trained plurality of neural networks of the ppGAN; and estimating, via the one or more hardware processors, an identifiability of the multivariate mixed-variable time series data from the generated synthetic private dataset, wherein the estimation satisfies a predefined process-identifiability criteria.

“2. The processor-implemented method of claim 1, wherein the first phase training of the plurality of neural networks of the ppGAN comprising steps of: training, via the one or more hardware processors, the embedding neural network using a predefined low-dimensional mixed feature training dataset to obtain a high-dimensional mixed feature embeddings; training, via the one or more hardware processors, the recovery neural network using the obtained high-dimensional mixed feature embeddings to reconstruct the low-dimensional mixed feature dataset; training, via the one or more hardware processors, the supervisor neural network using the obtained high-dimensional mixed-feature embeddings for a single step ahead predictions of the high-dimensional mixed-feature embeddings, wherein the supervisor neural network is utilized to model a temporal dynamics of the low-dimensional mixed feature training dataset; and training, via the one or more hardware processors, the critic neural network using the high-dimensional mixed feature embeddings to predict a target high-dimensional feature embedding, wherein the critic neural network is utilized to model the relationship between independent and dependent variables of the low-dimensional mixed feature training dataset.

“3. The processor-implemented method of claim 1, wherein a second phase training of the plurality of neural networks of a ppGAN comprising steps of: transforming, via the one or more hardware processors, cluster-independent random noise using one or more cluster-labels associated with a predefined training dataset to obtain a cluster-dependent random noise; performing, via the one or more hardware processors, a linear transformation on a concatenation of the low-dimensional mixed feature training dataset and the cluster-dependent random noise to obtain a synthetic-private noise; training, via the one or more hardware processors, the generator neural network using the obtained synthetic-private noise to obtain a high-dimensional synthetic-private mixed feature embeddings; training, via the one or more hardware processors, the critic neural network using the high-dimensional synthetic-private mixed feature embeddings to predict the synthetic-private target feature embedding; training, via the one or more hardware processors, the discriminator neural network using the high-dimensional synthetic-private mixed feature embeddings to assign a label, wherein the discriminator neural network classifies the high-dimensional synthetic-private mixed feature embeddings as fake; training, via the one or more hardware processors, the supervisory neural network using the high-dimensional synthetic-private mixed feature embeddings to generate a single-step ahead predictions of the high-dimensional synthetic-private mixed feature embeddings; and training, via the one or more hardware processors, the recovery neural network using the single-step ahead high-dimensional synthetic-private mixed feature embeddings to obtain the low-dimensional synthetic-private mixed feature dataset.

“4. The processor-implemented method of claim 1, wherein the low-dimensional mixed feature validation dataset is utilized for the hyper-parameter tuning of the ppGAN.

“5. A system comprising: an input/output interface to receive a multivariate mixed-variable time series data of a plurality of sensory observations, cluster-labels associated with the multivariate mixed-variable time series data, and a cluster-independent random noise, wherein multivariate mixed-variable time series data comprises continuous and discrete feature variables; a memory in communication with the one or more hardware processors, wherein the one or more hardware processors are configured to execute programmed instructions stored in the memory to: pre-process the received multivariate mixed-variable time series data, wherein the pre-process includes normalizing the continuous feature variables by bounding heterogeneous measurements between a predefined range through a min-max scaling technique, and transforming discrete feature variables by representing as a sparse binary vector through a one-hot encoding technique; train a plurality of neural networks of a privacy preserving generative adversarial network (ppGAN) in a first phase and a second phase using pre-processed multivariate mixed-variable time series data, wherein the plurality of neural networks including an embedding neural network, a recovery neural network, a generator neural network, a critic neural network, and a discriminator neural network; provide a test data to generate a synthetic private dataset for data disclosure using the trained plurality of neural networks of the ppGAN; and estimate an identifiability of the multivariate mixed-variable time series data from the generated synthetic private dataset, wherein the estimation satisfies a predefined process-identifiability criteria.

“6. The system of claim 5, wherein the low-dimensional mixed feature validation dataset is utilized for the hyper-parameter tuning of the ppGAN.

“7. A non-transitory computer readable medium storing one or more instructions which when executed by one or more processors on a system, cause the one or more processors to perform a method comprising steps of: receiving, via an input/output interface, a multivariate mixed-variable time series data of a plurality of sensory observations, cluster-labels associated with the multivariate mixed-variable time series data, and a cluster-independent random noise, wherein multivariate mixed-variable time series data comprises continuous and discrete feature variables; pre-processing, via one or more hardware processors, the received multivariate mixed-variable time series data, wherein the pre-processing comprising steps of: normalizing the continuous feature variables by bounding heterogeneous measurements between a predefined range through a min-max scaling technique; and transforming the discrete feature variables by representing as a sparse binary vector using a one-hot encoding technique; training, via the one or more hardware processors, a plurality of neural networks of a privacy preserving generative adversarial network (ppGAN) in a first phase and a second phase using pre-processed multivariate mixed-variable time series data, wherein the plurality of neural networks including an embedding neural network, a recovery neural network, a generator neural network, a critic neural network, and a discriminator neural network; providing, via the one or more hardware processors, a test data to generate a synthetic private dataset for data disclosure using the trained plurality of neural networks of the ppGAN; and estimating, via the one or more hardware processors, an identifiability of the multivariate mixed-variable time series data from the generated synthetic private dataset, wherein the estimation satisfies a predefined process-identifiability criteria.”
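Claims 2 and 3 refer to single-step-ahead predictions of the feature embeddings. As a rough illustration of what such a training target looks like, the sketch below pairs each time step with the one that follows it and scores a naive baseline. It does not reproduce the claimed supervisor network: the helper names, the 2-dimensional sample embeddings, the mean-squared-error measure, and the persistence baseline are assumptions chosen for the example.

```python
from typing import List, Tuple

Vector = List[float]


def one_step_ahead_pairs(sequence: List[Vector]) -> List[Tuple[Vector, Vector]]:
    """Pair each time step's embedding with the embedding that follows it."""
    return [(sequence[t], sequence[t + 1]) for t in range(len(sequence) - 1)]


def mean_squared_error(predicted: List[Vector], target: List[Vector]) -> float:
    """Average squared difference over all steps and dimensions."""
    total, count = 0.0, 0
    for p_vec, t_vec in zip(predicted, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


# Hypothetical 2-dimensional embeddings for four time steps.
embeddings = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [0.5, 0.5]]
pairs = one_step_ahead_pairs(embeddings)

# A naive "persistence" baseline predicts that the next embedding
# equals the current one; a trained predictor would aim to beat it.
inputs = [x for x, _ in pairs]
targets = [y for _, y in pairs]
baseline_loss = mean_squared_error(inputs, targets)
```

A sequence of T time steps yields T - 1 input and target pairs, so the four hypothetical embeddings above produce three pairs.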

For the URL and more information on this patent application, see: RUNKANA, VENKATARAMANA; SAKHINANA, SAGAR SRINIVAS; SARKAR, RAJAT KUMAR. Privacy Preserving Generative Mechanism For Industrial Time-Series Data Disclosure. U.S. Patent Application Number 20230281427, filed May 31, 2022 and posted September 7, 2023. Patent URL (for desktop use only): https://ppubs.uspto.gov/pubwebapp/external.html?q=(20230281427)&db=US-PGPUB&type=ids

(Our reports deliver fact-based news of research and discoveries from around the world.)
