“Systems And Methods For Computing With Private Healthcare Data” in Patent Application Approval Process (USPTO 20240119176): Nference Inc.

2024 MAY 01 (NewsRx) -- By a News Reporter-Staff News Editor at Insurance Daily News -- A patent application by the inventors ARAVAMUDAN, Murali (Andover, MA, US); ARDHANARI, Sankar (Chapel Hill, NC, US); MURUGADOSS, Karthik (Cambridge, MA, US); RAJASEKHARAN, Ajit (West Windsor, NJ, US), filed on October 19, 2023, was made available online on April 11, 2024, according to news reporting originating from Washington, D.C., by NewsRx correspondents.

This patent application is assigned to Nference Inc. (Cambridge, Massachusetts, United States).

The following quote was obtained by the news editors from the background information supplied by the inventors: “Hospitals, healthcare providers and care givers collect large amounts of data from patients. It is a necessary part of the processes by which healthcare is provided to members of the public. Typically, a patient provides data to the care giver as a part of receiving treatment for his/her ailments. This data is stored by the care giver and may be used later, inter alia, for research purposes. In another typical scenario data may be collected from consumers via one or more devices, e.g., pulse oximeter, glucose monitor, smart watch, fitness bracelet, etc. In such use cases, the collected data is often used to analyze a patient’s health in a continuous manner or over a period of time. Consequently, huge amounts of patient information may be accumulated by service providers.

“Many aspects of patient data collected by care givers and service providers may be subject to privacy regulations. The usefulness and benefit of processing data collected from patients is clear and acknowledged by the public. However, there is a growing concern of maintaining the privacy of user data, particularly when the data can be used to identify the patient. Such concerns are the basis of HIPAA (Health Insurance Portability and Accountability Act) regulations initially passed in 1996 by the US Congress. Many other countries have also promulgated similar regulations and legislations. Generally, HIPAA and other regulations limit the release of personal information that may result in identification of members of the public or details of their physical attributes or biometric data.

“There is thus a need to enable biomedical (and other types of) data to be analyzed by computational processes under the constraint of maintaining the privacy of the individual patient or consumer. Such a system and methods will consequently be of great commercial, social and scientific benefit to society.”

In addition to the background information obtained for this patent application, NewsRx journalists also obtained the inventors’ summary information for this patent application: “In an aspect, a de-identification method is disclosed. The de-identification method includes receiving a plurality of data sets, wherein the plurality of data sets includes a first data set, wherein the first data set includes a labeled data set for one or more entity types and a second data set, wherein the training data set includes an unlabeled data set for the one or more entity types, determining one machine-learning model from a plurality of machine-learning models for each of one or more entity types, fine-tuning the determined machine-learning model for each of the one or more entity types, wherein fine-tuning the determined machine-learning model includes creating a plurality of training data sets, wherein the plurality of training data sets includes a first training data set, wherein the first training data set includes the first data set and a second training data set, wherein the second training data set includes the second data set, training the determined machine-learning model using the first training data set, validating the trained machine-learning model and updating the trained machine-learning model using the second training data set as a function of the validation and obfuscating the second data set using the fine-tuned machine-learning model.

“These and other aspects and features of non-limiting embodiments of the present invention will become apparent to those skilled in the art upon review of the following description of specific non-limiting embodiments of the invention in conjunction with the accompanying drawings.

“The drawings are not necessarily to scale and may be illustrated by phantom lines, diagrammatic representations and fragmentary views. In certain instances, details that are not necessary for an understanding of the embodiments or that render other details difficult to perceive may have been omitted.”

The claims supplied by the inventors are:

“1. A de-identification method comprising: receiving a plurality of data sets, wherein the plurality of data sets comprises: a first data set, wherein the first data set comprises a labeled data set for one or more entity types; and a second data set, wherein the training data set comprises an unlabeled data set for the one or more entity types; determining one machine-learning model from a plurality of machine-learning models for each of one or more entity types; fine-tuning the determined machine-learning model for each of the one or more entity types, wherein fine-tuning the determined machine-learning model comprises: creating a plurality of training data sets, wherein the plurality of training data sets comprises: a first training data set, wherein the first training data set comprises the first data set; and a second training data set, wherein the second training data set comprises the second data set; training the determined machine-learning model using the first training data set; validating the trained machine-learning model; and updating the trained machine-learning model using the second training data set as a function of the validation; and obfuscating the second data set using the fine-tuned machine-learning model.

“2. The de-identification method of claim 1, wherein obfuscating the second data set further comprises replacing two or more entities that refer to a common subject with a common surrogate.

“3. The de-identification method of claim 2, further comprising: selecting the common surrogate based [on] one or more attributes of the two or more entities.

“4. The de-identification method of claim 2, further comprising: selecting the common surrogate based on a gender associated with the two or more entities.

“5. The de-identification method of claim 2, further comprising: selecting the common surrogate based on an ethnicity associated with the two or more entities.

“6. The de-identification method of claim 1, wherein validating the trained machine learning model further comprises: generating a recall score for each entity type of the one or more entity types; comparing the recall score to a threshold for the recall score for each entity type of the one or more entity types.

“7. The de-identification method of claim 1, wherein validating the trained machine learning model further comprises: generating a precision score for each entity type of the one or more entity types; and comparing the precision score to a threshold for the precision score for each entity type of the one or more entity types.

“8. The de-identification method of claim 1, wherein validating the trained machine learning model further comprises: generating a F-score for each entity type of the one or more entity types; and comparing the F-score to a threshold for the F-score for each entity type of the one or more entity types.

“9. The de-identification method of claim 1, further comprising: updating the trained machine-learning model as a function of comparison of an average of F-score, precision score and recall score to a threshold success percentage.

“10. The de-identification method of claim 1, wherein: the one or more entity types comprises two or more personal names; and obfuscating the second data set further comprises replacing each of the two or more personal names with a different surrogate.

“11. The de-identification method of claim 1, wherein obfuscating the second data set further comprises replacing two or more entities that refer to a common person with surrogates that match a gender associated with the common person.

“12. The de-identification method of claim 1, wherein obfuscating the second data set further comprises replacing two or more entity types that refer to a common person with surrogates that match an ethnicity associated with the common person.

“13. The de-identification method of claim 1, wherein: obfuscating the second data set further comprises replacing two or more entities that represent dates with surrogate dates, wherein: the surrogate dates are based on the two or more entity types altered by a random value.

“14. The de-identification method of claim 13, wherein dates associated with a common patient are altered by the same random value.

“15. The de-identification method of claim 1, wherein obfuscating the second data set further comprises scrambling two or more entities that represent numeric identifiers with random values to scramble the numeric identifiers.

“16. The de-identification method of claim 1, wherein the one or more entity types comprises at least a portion of an electronic health record.

“17. The de-identification method of claim 1, further comprising: receiving a text sequence; and tagging one or more entities in the text sequence.

“18. The de-identification method of claim 17, further comprising: aggregating the tagged entities from the text sequence; and passing the aggregated tagged entities through a one or more dreg filters, wherein each of the one or more dreg filters is configured to filter a corresponding entity type based on a rule-based template.

“19. The de-identification method of claim 18, further comprising, creating the rule-based template, wherein creating the rule-based template comprises: mapping each of the one or more portions of the text sequence to a corresponding syntax template; identifying a candidate syntax template based on a machine learning model that infers one or more candidate syntax templates based on the one or more portions of the text sequence; and creating the rule-based template from the candidate syntax template by replacing each of the one or more tagged entities in the portion of the text sequence corresponding to the candidate template with a corresponding syntax token.

“20. The de-identification method of claim 18, wherein each of the one or more dreg filters is further configured to filter the corresponding entity type based on a pattern-based filter.”
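
As a rough illustration of the validation step in claims 6 through 9, the Python sketch below computes precision, recall and F-score for each entity type and flags the types whose model would need updating. It is not taken from the application: the `Counts` structure, the function names, the use of the F1 form of the F-score and the threshold values are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one entity type (hypothetical structure)."""
    tp: int  # entities correctly tagged
    fp: int  # spans tagged that are not entities of this type
    fn: int  # entities of this type the model missed


def scores(c: Counts) -> dict:
    """Return precision, recall and F1 for one entity type."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


def entity_types_to_update(counts_by_type, thresholds, success_threshold):
    """List entity types that fail validation.

    A type fails if any single score is below its threshold (claims 6-8)
    or if the average of the three scores is below the overall success
    threshold (claim 9). Combining the two checks with "or" is an
    assumption of this sketch.
    """
    flagged = []
    for entity_type, counts in counts_by_type.items():
        s = scores(counts)
        below_metric = any(s[name] < thresholds[name] for name in s)
        average = sum(s.values()) / len(s)
        if below_metric or average < success_threshold:
            flagged.append(entity_type)
    return flagged


# Illustrative numbers only; they do not come from the application.
validation_counts = {
    "NAME": Counts(tp=90, fp=10, fn=10),
    "DATE": Counts(tp=50, fp=5, fn=50),
}
metric_thresholds = {"precision": 0.8, "recall": 0.8, "f_score": 0.8}
to_update = entity_types_to_update(validation_counts, metric_thresholds, 0.85)
```

In this made-up example only the date model is flagged, because its recall of 0.5 falls below the 0.8 threshold. In a de-identification setting recall is usually the score to watch most closely, since a missed entity is a leaked identifier.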

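The obfuscation behavior described in claims 2, 10, 13 and 14 can be sketched in a similar way. The example below is an assumption-laden illustration, not the applicant's implementation: the `Obfuscator` class, the surrogate name pool and the 30-day shift window are hypothetical, and the gender and ethnicity matching of claims 4, 5, 11 and 12 is left out for brevity.

```python
import random
from datetime import date, timedelta


class Obfuscator:
    """Keeps surrogates consistent per subject and date shifts consistent per patient."""

    def __init__(self, seed=None, max_shift_days=30):
        self._rng = random.Random(seed)
        self._max_shift_days = max_shift_days  # hypothetical shift window
        self._name_by_subject = {}
        self._shift_by_patient = {}
        # Hypothetical surrogate pool; a real system would need a far larger one.
        self._pool = ["Alex Morgan", "Jordan Lee", "Casey Kim", "Riley Shah"]

    def surrogate_name(self, subject_id):
        """Map every mention of one subject to the same surrogate (claim 2),
        and different subjects to different surrogates (claim 10)."""
        if subject_id not in self._name_by_subject:
            used = set(self._name_by_subject.values())
            unused = [name for name in self._pool if name not in used]
            if not unused:
                raise ValueError("surrogate pool exhausted")
            self._name_by_subject[subject_id] = self._rng.choice(unused)
        return self._name_by_subject[subject_id]

    def shift_date(self, patient_id, original):
        """Shift a date by a random offset (claim 13) that is drawn once
        per patient and then reused (claim 14)."""
        if patient_id not in self._shift_by_patient:
            days = self._rng.randint(1, self._max_shift_days)
            sign = self._rng.choice((-1, 1))
            self._shift_by_patient[patient_id] = sign * days
        return original + timedelta(days=self._shift_by_patient[patient_id])


obfuscator = Obfuscator(seed=0)
admitted = date(2023, 1, 10)
discharged = date(2023, 3, 1)
shifted_admitted = obfuscator.shift_date("patient-1", admitted)
shifted_discharged = obfuscator.shift_date("patient-1", discharged)
```

Because both dates for a patient move by the same offset, the interval between them (here, the length of stay) is unchanged after obfuscation, which is the practical reason for reusing one random value per patient.
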
For the URL and more information on this patent application, see: ARAVAMUDAN, Murali; ARDHANARI, Sankar; MURUGADOSS, Karthik; RAJASEKHARAN, Ajit. Systems And Methods For Computing With Private Healthcare Data. U.S. Patent Application Number 20240119176, filed October 19, 2023 and posted April 11, 2024. Patent URL (for desktop use only): https://ppubs.uspto.gov/pubwebapp/external.html?q=(20240119176)&db=US-PGPUB&type=ids

(Our reports deliver fact-based news of research and discoveries from around the world.)
