Patent Issued for Systems and methods for computing with private healthcare data (USPTO 11487902): nference inc.

2022 NOV 22 (NewsRx) -- By a News Reporter-Staff News Editor at Insurance Daily News -- From Alexandria, Virginia, NewsRx journalists report that a patent by the inventors Aravamudan, Murali (Andover, MA, US), Ardhanari, Sankar (Andover, MA, US), Murugadoss, Karthik (Cambridge, MA, US), Rajasekharan, Ajit (West Windsor, NJ, US), filed on March 4, 2021, was published online on November 1, 2022.

The patent’s assignee for patent number 11487902 is nference inc. (Cambridge, Massachusetts, United States).

News editors obtained the following quote from the background information supplied by the inventors: “Hospitals, healthcare providers and care givers collect large amounts of data from patients. It is a necessary part of the processes by which healthcare is provided to members of the public. Typically, a patient provides data to the care giver as a part of receiving treatment for his/her ailments. This data is stored by the care giver and may be used later, inter alia, for research purposes. In another typical scenario data may be collected from consumers via one or more devices, e.g., pulse oximeter, glucose monitor, smart watch, fitness bracelet, etc. In such use cases, the collected data is often used to analyze a patient’s health in a continuous manner or over a period of time. Consequently, huge amounts of patient information may be accumulated by service providers.

“Many aspects of patient data collected by care givers and service providers may be subject to privacy regulations. The usefulness and benefit of processing data collected from patients is clear and acknowledged by the public. However, there is a growing concern of maintaining the privacy of user data, particularly when the data can be used to identify the patient. Such concerns are the basis of HIPAA (Health Insurance Portability and Accountability Act) regulations initially passed in 1996 by the US Congress. Many other countries have also promulgated similar regulations and legislations. Generally, HIPAA and other regulations limit the release of personal information that may result in identification of members of the public or details of their physical attributes or biometric data.

“There is thus a need to enable biomedical (and other types of) data to be analyzed by computational processes under the constraint of maintaining the privacy of the individual patient or consumer. Such a system and methods will consequently be of great commercial, social and scientific benefit to society.

“Various objectives, features, and advantages of the disclosed subject matter can be more fully appreciated with reference to the following detailed description of the disclosed subject matter when considered in connection with the following drawings, in which like reference numerals identify like elements.”

As a supplement to the background information on this patent, NewsRx correspondents also obtained the inventors’ summary information for this patent: “A truly astonishing amount of information has been collected from patients and consumers pertaining to their health status, habits, environment, surroundings, and homes. Increasingly, this information is being processed by computer programs utilizing machine learning and artificial intelligence models. Such computer programs have shown remarkable progress in analyzing and predicting consumer health status, incidence and treatment of diseases, user behavior, etc. Furthermore, since the collected data may contain patient biometric and other personal identification attributes, there is a growing concern that such computer programs may allow the identities of patients and consumers to be learned. Accordingly, enterprises interested in analyzing healthcare data containing private attributes are concerned with maintaining privacy of individuals and observing the relevant regulations pertaining to private and personal data, such as HIPAA (Health Insurance Portability and Accountability Act 1996) regulations.

“In addition to HIPAA, many other regulations have been enacted in various jurisdictions, such as GDPR (General Data Protection Regulations) in the European Union, PSD2 (Revised Payment Services Directive), CCPA (California Consumer Privacy Act 2018), etc.

“In the following descriptions, the terms “user information,” “personal information,” “personal health information (“PHI”),” “healthcare information data or records,” “identifying information,” and PII (Personally Identifiable Information) may be used interchangeably. Likewise, the terms “electronic health records (“EHR”)” and “data records” may be used interchangeably.

“One approach to handling private data is to encrypt all the records of a dataset. Encrypted text is sometimes referred to as ciphertext; decrypted text is also referred to as plaintext. Encryption may be described, by way of analogy, as putting the records of the dataset in a locked box. Access to the records of the locked box is then controlled by the key to the locked box. The idea is that only authorized entities are allowed access to the (decryption) key.

“Some regulations (e.g., HIPAA) require that healthcare data be stored in encrypted form. This is also sometimes referred to as “encryption at rest.””

The claims supplied by the inventors are:

“1. A de-identification method comprising: receiving a text sequence; providing the text sequence to a plurality of entity tagging models, each of the plurality of entity tagging models being trained to tag one or more portions of the text sequence having a corresponding entity type; tagging one or more entities in the text sequence using the plurality of entity tagging models; aggregating tagged entities from the text sequence identified by the plurality of entity tagging models; passing the aggregated tagged entities through a one or more dreg filters, each of the one or more dreg filters being configured to filter a corresponding entity type based on at least one of a rule-based template or a pattern matching filter, wherein the rule-based template is created by: mapping each of the one or more portions of the text sequence to a corresponding syntax template; identifying a candidate syntax template based on a machine learning model that infers one or more candidate syntax templates based on the one or more portions of the text sequence; and creating the rule-based template from the candidate syntax template by replacing each of the one or more tagged entities in the portion of the text sequence corresponding to the candidate template with a corresponding syntax token; and obfuscating each entity among the one or more tagged entities by replacing the entity with a surrogate, the surrogate being selected based on one or more attributes of the entity and maintaining characteristics similar to the entity being replaced.

“2. The de-identification method of claim 1, wherein obfuscating each entity among the one or more tagged entities comprises replacing two or more entities that refer to a common subject with a common surrogate.

“3. The de-identification method of claim 2, wherein the common surrogate is selected based [on] one or more attributes of the two or more entities.

“4. The de-identification method of claim 2, wherein the common surrogate is selected based on a gender associated with the two or more entities.

“5. The de-identification method of claim 2, wherein the common surrogate is selected based on an ethnicity associated with the two or more entities.

“6. The de-identification method of claim 1, wherein the tagging one or more entities comprises tagging two or more personal names; and the obfuscating each entity among the one or more tagged entities comprises replacing each of the two or more personal names with a different surrogate.

“7. The de-identification method of claim 1, wherein the obfuscating each entity among the one or more tagged entities comprises replacing two or more entities that refer to a common person with surrogates that match a gender associated with the common person.

“8. The de-identification method of claim 1, wherein the obfuscating each entity among the one or more tagged entities comprises replacing two or more entities that refer to a common person with surrogates that match an ethnicity associated with the common person.

“9. The de-identification method of claim 1, wherein the obfuscating each entity among the one or more tagged entities comprises replacing two or more tagged entities that represent dates with surrogate dates, and wherein the surrogate dates are based on the two or more tagged entities altered by a random value.

“10. The de-identification method of claim 9, wherein dates associated with a common patient are altered by the same random value.

“11. The de-identification method of claim 1, wherein the obfuscating each entity among the one or more tagged entities comprises scrambling two or more entities that represent numeric identifiers with random values to scramble the numeric identifiers.

“12. The de-identification method of claim 1, wherein the text sequence comprises at least a portion of an electronic health record.

“13. The de-identification method of claim 1, wherein at least one of the plurality of entity tagging models is trained to tag entities of an entity type, the entity type including at least one of a personal name, an organization name, an age, a date, a time, a phone number, a pager number, a clinical identification number, an email address, an IP address, a web URL, a vehicle number, a physical address, a zip code, a social security number, or a date of birth.

“14. The de-identification method of claim 1, wherein at least one of the plurality of entity tagging models tags entities based on a rule-based algorithm.

“15. The de-identification method of claim 1, wherein at least one of the plurality of entity tagging models includes a machine learning model based on learning from sequences of text.

“16. The de-identification method of claim 1, further comprising whitelisting one or more portions of the text sequence, wherein the one or more whitelisted portions are not provided to the plurality of entity tagging models.

“17. The de-identification method of claim 1, each of the plurality of entity tagging models is trained to tag one or more portions of the text sequence to achieve a performance metric above a predetermined threshold.”
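
As a rough illustration of the obfuscation steps described in claims 2, 6, 9, 10 and 11, the following Python sketch tags entities with several taggers, aggregates the tagged spans, and replaces each entity with a surrogate. It is not the patented implementation. The regular-expression taggers, the surrogate name list, and the function names tag_entities and deidentify are hypothetical, and the dreg filters and machine-learned syntax templates recited in claim 1 are omitted.

```python
import random
import re
from datetime import datetime, timedelta

# Hypothetical rule-based taggers, one per entity type. Group 1 of each
# pattern marks the span to be replaced.
TAGGERS = {
    "DATE": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "ID": re.compile(r"\bMRN (\d+)\b"),
    "NAME": re.compile(r"\b(?:Mr|Ms|Dr)\. ([A-Z][a-z]+)\b"),
}

# Hypothetical surrogate pool; a real system would choose surrogates that
# match attributes of the original entity.
SURROGATE_NAMES = ["Avery", "Blake", "Casey", "Devon"]


def tag_entities(text):
    """Run every tagger and aggregate the tagged spans, sorted by position."""
    spans = []
    for entity_type, pattern in TAGGERS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(1), match.end(1), entity_type))
    return sorted(spans)


def deidentify(text, rng):
    """Replace tagged entities in one patient's text with surrogates."""
    # One random shift for the whole record, so intervals between a
    # patient's dates are preserved.
    day_shift = timedelta(days=rng.randint(1, 365))
    name_map = {}  # the same name always maps to the same surrogate
    unused = SURROGATE_NAMES[:]
    rng.shuffle(unused)

    pieces, cursor = [], 0
    for start, end, entity_type in tag_entities(text):
        if start < cursor:  # skip spans that overlap an earlier one
            continue
        original = text[start:end]
        if entity_type == "DATE":
            shifted = datetime.strptime(original, "%Y-%m-%d") + day_shift
            surrogate = shifted.strftime("%Y-%m-%d")
        elif entity_type == "ID":
            # Scramble numeric identifiers with random digits, same length.
            surrogate = "".join(rng.choice("0123456789") for _ in original)
        else:
            if original not in name_map:
                # Distinct names receive distinct surrogates; this sketch
                # supports only as many names as the pool holds.
                name_map[original] = unused.pop()
            surrogate = name_map[original]
        pieces.append(text[cursor:start])
        pieces.append(surrogate)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Calling deidentify("Dr. Smith saw Mr. Jones on 2021-03-04. MRN 123456.", random.Random()) returns the same sentence with a surrogate for each name, a shifted date and a scrambled record number. Passing a seeded random.Random makes the output repeatable for testing.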

For additional information on this patent, see: Aravamudan, Murali. Systems and methods for computing with private healthcare data. U.S. Patent Number 11487902, filed March 4, 2021, and published online on November 1, 2022. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=11487902.PN.&OS=PN/11487902RS=PN/11487902

(Our reports deliver fact-based news of research and discoveries from around the world.)
