Researchers Submit Patent Application, “Referential Data Grouping And Tokenization For Longitudinal Use Of De-Identified Data”, for Approval (USPTO 20220343021): Patent Application

2022 NOV 16 (NewsRx) -- By a News Reporter-Staff News Editor at Insurance Daily News -- From Washington, D.C., NewsRx journalists report that a patent application by the inventors Diamond, Jeffrey M. (Milton, GA, US); Luber, Jill (Atlanta, GA, US); Mortimer, Emily (Bloomington, MN, US); Morton, Charles Edwards (Austin, TX, US); Mullin, Brian Richard (Budd Lake, NJ, US); Sultan, Jay (Bogart, GA, US); Tavernini, Victor E. (Davie, FL, US); Valluri, Raghunandan (Cumming, GA, US); Wu, Theresa Roseanne (Winter Garden, FL, US), filed on April 20, 2022, was made available online on October 27, 2022.

No assignee has been named for this patent application.

News editors obtained the following quote from the background information supplied by the inventors: “Healthcare professionals utilize electronic medical records (EMR) and certain protected health information (PHI) to provide appropriate care for identified patients and to assess their treatments. PHI can include demographic information, medical histories, lab results, mental health evaluations, insurance information, and other data that can be used to identify an individual. Much of this data is highly protected by numerous regulations and other privacy rules, most notably the Health Insurance Portability and Accountability Act (HIPAA). Researchers and policy makers often need to utilize clinical data (that can include PHI and/or EMR) for clinical trials, pandemic response studies, drug interaction studies, utilization reviews, establishment of guidelines, etc. However, such data is often siloed, isolated, and/or subject to privacy regulations that can collectively inhibit valuable clinical information that could be derived by sharing patient-specific information with other researchers or healthcare businesses.

“HIPAA and the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 regulate collected PHI data so that it is protected and not shared with anyone (except for certain exceptions). Such regulations also limit what organizations can do with the data in terms of marketing. Thus, the widespread use of healthcare data to drive insights and build a safer healthcare system has been severely limited due to the lack of solutions that enable sharing of data without running afoul of privacy rules.

“For over 20 years, healthcare businesses have relied on basic tokenization as a means to work with de-identified data in a manner that protects patient privacy. Clinical and research scientists use anonymized PHI to study health and healthcare trends. Researchers can use PHI that is stripped of identifying features and added anonymously to large databases of patient information for population health management efforts. The identifying parts of the data are typically removed and replaced with a “token” which is often a hash (one-way cryptographic function) of the PHI values. Multiple fields can be hashed together to reduce reidentification risk. If different data sets from different sources are hashed identically, then the data sets can be linked by the common token value.
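As an illustration of the basic tokenization described above, the following Python sketch hashes normalized identity fields into a token and links two de-identified data sets on it. The field values, the normalization rule, and the shared key are assumptions made for this example; the application does not name a hash function, and the use of a keyed hash (HMAC-SHA-256) here is likewise an assumption.

```python
import hashlib
import hmac

# Hypothetical shared secret; both data sources must use the same one.
SHARED_KEY = b"example-key-not-for-production"

def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not change the token."""
    return " ".join(value.lower().split())

def make_token(*fields: str) -> str:
    """Hash several identity fields together into a single one-way token."""
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()

# Two sources that hash the same fields in the same way produce the same
# token, so their de-identified rows can be joined on it.
lab_rows = {make_token("Doe", "12345", "1980-01-02"): {"lab_result": "A1C 6.1"}}
claim_rows = {make_token("DOE ", "12345", "1980-01-02"): {"claim_total": 240.0}}

linked = {
    token: {**lab_rows[token], **claim_rows[token]}
    for token in lab_rows.keys() & claim_rows.keys()
}
```

Any difference in the underlying values, such as a changed zip code, yields an unrelated token, which is the limitation the inventors go on to describe.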

“There are numerous limitations in the current technology that are driven by the limits of data and tokenization or matching methodology being used. Imagine two entities (trading partners) that want to exchange (trade) healthcare data, for example, to understand trends, costs, therapy outcomes, etc. Each trading partner has data to identify the patient in their associated records, but that data has two important limitations: (1) the identifying data elements/fields and methods used by each trading partner may be different; and (2) the values of data elements/fields possessed by each trading partner may not match for the same person.

“Current methods use common data attributes/elements/fields, such as first name, last name, address, zip code, date of birth, gender, etc., to create the tokens used to match data. The use of a social security number (SSN) could be used as an identifier, but SSN information is not always readily available, it can be prone to error (mis-keyed data) and it is often discouraged/prohibited by government regulation. Each trading partner might have different identity data available (i.e., trading partner A has last name and zip code, and trading partner B has last name and date of birth). To enable the broadest ability to combine data between such trading partners, it is necessary to create multiple tokens of different permutations of the identity data. Thus, an individual may have multiple tokens associated with their identity:

“Token 1: first name+last name+address

“Token 2: last name+address+zip code

“Token 3: last name+zip code

“Token 4: last name+zip code+date of birth

“-and so on, for numerous other permutations, hoping for a combination that is available and matches with the data possessed by the other trading partner.
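The permutation approach in the Token 1 to Token 4 list can be sketched as follows. This is a minimal illustration, not the method of any particular vendor: the field names, the plain SHA-256 hash, and the helper functions are all assumptions.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Field combinations taken from the Token 1-4 list above.
TOKEN_RECIPES: Dict[str, Tuple[str, ...]] = {
    "token1": ("first_name", "last_name", "address"),
    "token2": ("last_name", "address", "zip_code"),
    "token3": ("last_name", "zip_code"),
    "token4": ("last_name", "zip_code", "date_of_birth"),
}

def hash_fields(values: Tuple[str, ...]) -> str:
    """Hash the normalized field values together."""
    joined = "|".join(v.strip().lower() for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def build_tokens(record: Dict[str, str]) -> Dict[str, str]:
    """Create every token whose required fields are all present in the record."""
    tokens = {}
    for name, fields in TOKEN_RECIPES.items():
        if all(record.get(f) for f in fields):
            tokens[name] = hash_fields(tuple(record[f] for f in fields))
    return tokens

def first_common_token(a: Dict[str, str], b: Dict[str, str]) -> Optional[str]:
    """Return the name of the first recipe on which two token sets agree."""
    for name in TOKEN_RECIPES:
        if name in a and a.get(name) == b.get(name):
            return name
    return None

# Partner A has last name + zip code; partner B has last name + date of birth.
partner_a = build_tokens({"last_name": "Doe", "zip_code": "12345"})
partner_b = build_tokens({"last_name": "Doe", "date_of_birth": "1980-01-02"})
partner_c = build_tokens(
    {"last_name": "doe", "zip_code": "12345", "date_of_birth": "1980-01-02"}
)
```

In this sketch, partners A and B share no usable recipe even though both hold a record for the same person, while A and C can link only on the last name plus zip code token.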

“Furthermore, in addition to the different field values used, there are different methodologies used to translate (or hash) the values to a token. The same method (often the same software) must be used by both trading partners.

“In a person’s life, the data used to identify them often changes many times. People change their name, through marriage or through use (John S. Doe becomes J. Steven Doe). People move to multiple addresses and multiple zip codes during their life. Current methods can use probabilistic matching to try to address such differences (Jon Doe may be the same as Jonathan Doe), but these methods have severe limits, especially with difficult cases, such as a father and son with the same name at the same address, or twins with similar names. Because it is dangerous to “assume too much” when doing probabilistic matching, the current methods create different token values, for example, John Doe at zip code 12345 and Jonathan Doe at zip code 12345 may have separate associated tokens although they may be the same person in reality. Once created this way, data for Jon Doe across these two addresses cannot be combined since they appear to be two separate people.

“The need to create numerous tokens using multiple methods is a problem that causes inefficiencies and inaccuracies in data matching because it is impractical to create a token for every possible permutation and method. Working with many different permutations leads to reduced coverage in any given combination of data sets, as a given data set may not have values for all the rows. For example, a dataset may have names for all records but addresses for only some. If an address is being used to link to another data set (because it is the only data field in common), then a smaller number of rows will be combined between the two data sets, greatly reducing their value and utility. Another problem that greatly reduces the value of the data exchange occurs when the data is not completely longitudinal.

“The goal of tokenization is to create associations of all records for a single person in the resulting data set. However, this goal is thwarted because current processes separate data about a single person across multiple tokens (due to changes or differences in a person’s identity data) that cannot be combined in the resulting data set.

“A need exists for a token that is patient-centric, that does not depend upon both trading partners having the same identity fields, can solve for differences or changes in a single person’s identity values (such as name or address), and that can enable sharing of data without running afoul of privacy regulations. A need exists for patient-centric token systems and methods that can more efficiently harvest data from medical records, clinical trials, pandemic responses, etc. while protecting personally identifiable information.”

As a supplement to the background information on this patent application, NewsRx correspondents also obtained the inventors’ summary information for this patent application: “The disclosed technology utilizes tokenization as a method of replacing Personally Identifiable Information (PII) and/or identifying Protected Health Information (PHI) and/or non-specific industry information with non-sensitive placeholder tokens. Certain implementations of the disclosed technology may de-identify (or anonymize) such data so that it cannot be re-identified (tied back to a specific, identifiable individual) and so that it is no longer subject to HIPAA or many other privacy restrictions. The de-identified data can then be linked to other de-identified data with a common token. Thus, tokenization allows entities to link data assets together or link external data assets with internal data assets without violating privacy rules.

“Certain exemplary implementations of the disclosed technology may utilize patient-centric tokenization and referential data to more quickly and efficiently harvest valuable information associated with clinical trials and/or post-trial evaluations. Certain exemplary implementations of the disclosed technology may utilize patient-centric tokenization and referential data to more effectively harvest valuable information associated with pandemic responses. Certain exemplary implementations of the disclosed technology may utilize pharmacy data for the prevention of adverse drug interactions. Certain exemplary implementations may use such data to detect and/or prevent prescription fraud. Certain implementations of the disclosed technology may be suitable for health care applications involving Protected Health Information (PHI). Certain implementations of the disclosed technology may be utilized for non-industry-specific applications to protect Personally Identifiable Information (PII).

“In accordance with certain exemplary implementations of the disclosed technology, a computer-implemented method is provided for the creation of a dataset that can be used for detecting and preventing potential adverse reactions of prescription drug combinations. The method can include receiving, at a main tokenizer, and from a trusted 3rd party in communication with one or more pharmacies, one or more corresponding data sets comprising: a subset of a plurality of PII fields corresponding to a patient seeking a prescription drug; and an identifier corresponding to the prescription drug. The method includes resolving, by the main tokenizer in communication with a universal reference database, an identity of the patient based on the subset of the plurality of PII fields, linking a unique patient-centric token (PCT) to the patient based on the resolving, and determining, based on the one or more data sets, one or more prescription drug combinations associated with the resolved identity of the patient. For each of the one or more prescription drug combinations, the method includes comparing the identifiers against safety data, and determining, based on the comparing, a potential adverse reaction associated with the one or more prescription drug combinations. The method includes outputting the PCT and an indication of the potential adverse reaction.
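The comparison step in this method might look like the sketch below, which assumes the patient has already been resolved to a PCT. The placeholder drug identifiers, the pair-based layout of the safety data, and the function name are hypothetical; the application does not describe the format of the safety data.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical safety data: unordered pairs of drug identifiers flagged as
# interacting. The identifiers are placeholders, not real drugs.
SAFETY_DATA: Set[frozenset] = {frozenset({"drug_a", "drug_b"})}

def find_adverse_combinations(
    fills: List[Tuple[str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (pct, drug_id) fills by PCT and report flagged drug pairs per PCT."""
    drugs_by_pct = defaultdict(set)
    for pct, drug_id in fills:
        drugs_by_pct[pct].add(drug_id)

    flagged = {}
    for pct, drugs in drugs_by_pct.items():
        hits = [
            pair
            for pair in combinations(sorted(drugs), 2)
            if frozenset(pair) in SAFETY_DATA
        ]
        if hits:
            flagged[pct] = hits
    return flagged

# Fills reported by different pharmacies, already keyed by PCT rather than PII.
fills = [
    ("pct-1", "drug_a"),
    ("pct-1", "drug_c"),
    ("pct-1", "drug_b"),
    ("pct-2", "drug_a"),
]
alerts = find_adverse_combinations(fills)
```

Because the fills are grouped by PCT, a combination is caught even when the two prescriptions were filled at different pharmacies under different name or address variants.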

“A computer-implemented method is provided for detecting and preventing prescription drug fraud. The method includes receiving, at a main tokenizer, and from a trusted 3rd party in communication with one or more pharmacies, one or more corresponding data sets that can include a subset of a plurality of PII fields corresponding to a patient seeking a prescription drug prescribed by a physician, an identifier corresponding to the prescription drug, and an identifier corresponding to the physician. The method can include resolving, by the main tokenizer in communication with a universal reference database, an identity of the patient based on the subset of the plurality of PII fields, linking a unique patient-centric token (PCT) to the patient based on the resolving, determining, based on the one or more data sets, one or more over-prescription conditions of the prescription drug associated with the resolved identity of the patient, and outputting the PCT and an indication of the over-prescription condition.
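The application does not define what counts as an over-prescription condition. As one possible reading, the sketch below flags a PCT when the same drug has been prescribed to it by more than an assumed number of distinct physicians. The threshold, the record layout, and the function name are illustrative assumptions only.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# Assumed threshold: more than this many distinct prescribers of one drug
# for one PCT is treated as an over-prescription condition.
MAX_PRESCRIBERS = 2

def over_prescription_flags(
    records: Iterable[Tuple[str, str, str]],
    max_prescribers: int = MAX_PRESCRIBERS,
) -> Set[Tuple[str, str]]:
    """Return the (pct, drug_id) pairs whose distinct prescriber count exceeds the threshold."""
    prescribers = defaultdict(set)
    for pct, drug_id, physician_id in records:
        prescribers[(pct, drug_id)].add(physician_id)
    return {
        key for key, physicians in prescribers.items()
        if len(physicians) > max_prescribers
    }

# Records of (pct, drug identifier, physician identifier).
records = [
    ("pct-1", "drug_x", "dr-1"),
    ("pct-1", "drug_x", "dr-2"),
    ("pct-1", "drug_x", "dr-3"),
    ("pct-2", "drug_x", "dr-1"),
    ("pct-2", "drug_x", "dr-1"),
]
flags = over_prescription_flags(records)
```

Other conditions, such as total quantity dispensed in a time window, could be keyed on the PCT in the same way.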

“Another computer-implemented method is provided. The method includes receiving, at a main tokenizer and from a first trading partner, a first data set comprising a first subset of a plurality of PII fields corresponding to an individual, receiving, at the main tokenizer and from a second trading partner, a second data set comprising a second subset of the plurality of PII fields corresponding to the individual, resolving, at the main tokenizer, the individual based on the first subset of the plurality of PII fields and the second subset of the plurality of PII fields, linking a unique patient-centric token (PCT) to the individual based on the resolving, and outputting the PCT for generating a non-PII token linked to the individual, wherein the non-PII token is linked to the universal identifier corresponding to the individual.
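To make the resolving and linking steps concrete, the following toy sketch resolves two different PII subsets against a small in-memory stand-in for the universal reference database and derives the same PCT from both. The database contents, the matching rule (every supplied field must be a known fact for exactly one person), and the hash-based PCT are assumptions for illustration; the summary does not disclose how resolution is performed.

```python
import hashlib
from typing import Dict, FrozenSet, Optional

# Toy stand-in for the universal reference database: each universal
# identifier lists identity facts known for one person over time.
# All identifiers and values are invented for this example.
REFERENCE_DB: Dict[str, FrozenSet[tuple]] = {
    "U-0001": frozenset({
        ("last_name", "doe"), ("first_name", "john"), ("first_name", "jonathan"),
        ("zip_code", "12345"), ("zip_code", "67890"),
        ("date_of_birth", "1980-01-02"),
    }),
    "U-0002": frozenset({
        ("last_name", "doe"), ("first_name", "john"),
        ("zip_code", "12345"), ("date_of_birth", "2010-05-06"),
    }),
}

def resolve(pii: Dict[str, str]) -> Optional[str]:
    """Return a universal identifier only if exactly one person matches every supplied field."""
    facts = {(field, value.strip().lower()) for field, value in pii.items()}
    matches = [uid for uid, known in REFERENCE_DB.items() if facts <= known]
    return matches[0] if len(matches) == 1 else None

def patient_centric_token(universal_id: str) -> str:
    """Derive the PCT from the universal identifier rather than from raw PII."""
    return hashlib.sha256(("pct:" + universal_id).encode("utf-8")).hexdigest()

# Partner A holds last name + a newer zip code; partner B holds a
# first-name variant + date of birth. The subsets do not overlap.
uid_a = resolve({"last_name": "Doe", "zip_code": "67890"})
uid_b = resolve({"first_name": "Jonathan", "date_of_birth": "1980-01-02"})

# A same-name, same-address case stays unresolved rather than being guessed.
ambiguous = resolve({"first_name": "John", "last_name": "Doe", "zip_code": "12345"})
```

Here both partners arrive at the same PCT without sharing a common identity field, and the ambiguous case returns no match instead of merging two people.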

“In accordance with certain exemplary implementations of the disclosed technology, a computer-implemented method is provided for the creation of a dataset that can be used for evaluating post-clinical trials. The method can include receiving, at a main tokenizer, and from a trusted 3rd party, one or more corresponding data sets comprising: a subset of a plurality of PII fields corresponding to a patient in a clinical trial, post-clinical trial health record information, and an identifier corresponding to a treatment. The method includes resolving, by the main tokenizer in communication with a universal reference database, an identity of the patient based on the subset of the plurality of PII fields, linking a unique patient-centric token (PCT) to the patient based on the resolving, and determining, based on the one or more data sets, one or more outcomes associated with the resolved identity of the patient. For each of the one or more outcomes, the method includes determining the efficacy of the clinical trial. The method includes outputting the PCT and an indication of the efficacy.

“In accordance with certain exemplary implementations of the disclosed technology, a computer-implemented method is provided for the creation of a dataset that can be used for evaluating response to a public health issue, such as a pandemic. The method can include receiving, at a main tokenizer, and from a trusted 3rd party, one or more corresponding data sets comprising: a subset of a plurality of PII fields corresponding to a patient who has received treatment, and an identifier corresponding to a treatment. The method includes resolving, by the main tokenizer in communication with a universal reference database, an identity of the patient based on the subset of the plurality of PII fields, linking a unique patient-centric token (PCT) to the patient based on the resolving, and determining, based on the one or more data sets, one or more treatment outcomes associated with the resolved identity of the patient. The method includes outputting the PCT and an indication of the one or more treatment outcomes.

“Other implementations, features, and aspects of the disclosed technology are described in detail herein and are considered a part of the claimed disclosed technology. Other implementations, features, and aspects can be understood with reference to the following detailed description, accompanying drawings, and claims.”

The claims supplied by the inventors are:

“1. A computer-implemented method for detecting and preventing potential adverse reactions of prescription drug combinations, the method comprising: receiving, at a main tokenizer, and from a trusted 3rd party in communication with one or more pharmacies, one or more corresponding data sets comprising: a subset of a plurality of personally identifiable information (PII) fields corresponding to a patient seeking a prescription drug; and an identifier corresponding to the prescription drug; resolving, by the main tokenizer in communication with a universal reference database, an identity of the patient based on the subset of the plurality of PII fields; linking a unique patient-centric token (PCT) to the patient based on the resolving; determining, based on the one or more data sets, one or more prescription drug combinations associated with the resolved identity of the patient; for each of the one or more prescription drug combinations: comparing the identifiers against safety data; determining, based on the comparing, a potential adverse reaction associated with the one or more prescription drug combinations; and outputting the PCT and an indication of the potential adverse reaction.

“2. The method of claim 1, further comprising linking, by the trusted 3rd party, a non-PII token to the PCT corresponding to the patient.

“3. The method of claim 2, wherein the linking comprises: generating a new non-PII token when the patient is determined to be a new patient; and saving the new non-PII token in a repository.

“4. The method of claim 2, wherein the linking comprises retrieving a previously-generated non-PII token when the patient is determined to be a repeat patient.

“5. The method of claim 2, further comprising sending the non-PII token and the indication of the potential adverse reaction to the one or more pharmacies.

“6. The method of claim 1, wherein the identifier corresponding to the prescription drug comprises one or more of: an imprint code; a generic name, a brand name, a chemical name, or a recommended international non-proprietary name.

“7. The method of claim 1, wherein the universal identifier is persistent.

“8. A computer-implemented method for detecting and preventing prescription drug fraud, the method comprising: receiving, at a main tokenizer, and from a trusted 3rd party in communication with one or more pharmacies, one or more corresponding data sets comprising: a subset of a plurality of personally identifiable information (PII) fields corresponding to a patient seeking a prescription drug prescribed by a physician; an identifier corresponding to the prescription drug; and an identifier corresponding to the physician; resolving, by the main tokenizer in communication with a universal reference database, an identity of the patient based on the subset of the plurality of PII fields; linking a unique patient-centric token (PCT) to the patient based on the resolving; determining, based on the one or more data sets, one or more over-prescription conditions of the prescription drug associated with the resolved identity of the patient; and outputting the PCT and an indication of the over-prescription condition.

“9. The method of claim 8, further comprising linking, by the trusted 3rd party, a non-PII token to the PCT corresponding to the patient.

“10. The method of claim 9, wherein the linking comprises: generating a new non-PII token when the patient is determined to be a new patient; and saving the new non-PII token in a repository.

“11. The method of claim 8, wherein the linking comprises retrieving a previously-generated non-PII token when the patient is determined to be a repeat patient.

“12. The method of claim 8, further comprising sending the non-PII token and the indication of the over-prescription condition to the one or more pharmacies.

“13. The method of claim 8, wherein the identifier corresponding to the prescription drug comprises one or more of: an imprint code; a generic name, a brand name, a chemical name, or a recommended international non-proprietary name.

“14. The method of claim 1, wherein the universal identifier is persistent.

“15. A computer-implemented method, comprising: receiving, at a main tokenizer and from a first trading partner, a first data set comprising a first subset of a plurality of personally identifiable information (PII) fields corresponding to an individual; receiving, at the main tokenizer and from a second trading partner, a second data set comprising a second subset of the plurality of PII fields corresponding to the individual; resolving, by the main tokenizer in communication with a universal reference database, the individual based on the first subset of the plurality of PII fields and the second subset of the plurality of PII fields; linking a unique patient-centric token (PCT) to the individual based on the resolving; and outputting the PCT for generating a non-PII token linked to the individual, wherein the non-PII token is linked to the universal identifier corresponding to the individual.

“16. The method of claim 15, further comprising facilitating an exchange of the first subset of the plurality of PII fields and the second subset of the plurality of PII fields between the first trading partner and the second trading partner based on the non-PII token.

“17. The method of claim 16, wherein facilitating the exchange of the first subset of the plurality of PII fields and the second subset of the plurality of PII fields between the first trading partner and the second trading partner is further based on de-identified matched data.

“18. The method of claim 15, wherein the first subset is the same as the second subset.

“19. The method of claim 15, wherein the first subset differs from the second subset.

“20. The method of claim 15, wherein the non-PII token is linked to the universal identifier without requiring an SSN of the individual.

“21. The method of claim 15, wherein one or more variations of PII in the first subset and the second subset is resolved to the same universal identifier.

“22. The method of claim 15, wherein generating the non-PII token is based on one or more key identifying attributes comprising one or more of name, address, date of birth, and gender.

“23. The method of claim 15, wherein the universal identifier is longitudinally consistent and based on comprehensive knowledge of substantially an entire population of people.

“24. The method of claim 15, wherein the universal identifier is persistent.

“25. The method of claim 15, further comprising: receiving, at the main tokenizer and from the first trading partner, the non-PII token and the first subset of the plurality of PII fields corresponding to the individual; receiving, at the main tokenizer and from the second trading partner, the non-PII token and the second subset of the plurality of PII fields corresponding to the individual; generating, at the main tokenizer, and using the universal identifier, aggregated de-identified data corresponding to the individual based on the first subset of the plurality of PII fields, the second subset of the plurality of PII fields, and the non-PII token; and sending, to the first trading partner and the second trading partner, the non-PII token and the aggregated de-identified matched data corresponding to the individual.

“26. The method of claim 15, further comprising: generating de-identified matched data corresponding to the individual; and sending, to the first trading partner and the second trading partner, the non-PII token and the de-identified matched data corresponding to the first individual.”
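The new-patient and repeat-patient branches recited in claims 3, 4, 10, and 11 amount to a get-or-create lookup. The sketch below shows one way such a repository could behave, assuming an in-memory dictionary and a random token; the class name, method name, and token format are hypothetical and are not taken from the application.

```python
import secrets
from typing import Dict, Tuple

class TokenRepository:
    """In-memory stand-in for the repository that stores non-PII tokens by PCT."""

    def __init__(self) -> None:
        self._token_by_pct: Dict[str, str] = {}

    def token_for(self, pct: str) -> Tuple[str, bool]:
        """Return (non_pii_token, is_new_patient) for the given PCT."""
        existing = self._token_by_pct.get(pct)
        if existing is not None:
            # Repeat patient: retrieve the previously generated token.
            return existing, False
        # New patient: generate a random token that carries no PII and save it.
        token = secrets.token_hex(16)
        self._token_by_pct[pct] = token
        return token, True

repo = TokenRepository()
first_token, first_is_new = repo.token_for("pct-1")
second_token, second_is_new = repo.token_for("pct-1")
other_token, other_is_new = repo.token_for("pct-2")
```

Because the token is random rather than derived from the PCT, a party holding only the non-PII token cannot work back to the PCT without access to the repository.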

For additional information on this patent application, see: Diamond, Jeffrey M.; Luber, Jill; Mortimer, Emily; Morton, Charles Edwards; Mullin, Brian Richard; Sultan, Jay; Tavernini, Victor E.; Valluri, Raghunandan; Wu, Theresa Roseanne. Referential Data Grouping And Tokenization For Longitudinal Use Of De-Identified Data. Filed April 20, 2022 and posted October 27, 2022. Patent URL: https://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220220343021%22.PGNR.&OS=DN/20220343021&RS=DN/20220343021

(Our reports deliver fact-based news of research and discoveries from around the world.)
