Researchers Submit Patent Application, “Hospital Matching Of De-Identified Healthcare Databases Without Obvious Quasi-Identifiers”, for Approval (USPTO 20190147988)
2019 JUN 05 (NewsRx) -- By a
The patent’s assignee is
News editors obtained the following quote from the background information supplied by the inventors: “Numerous areas of healthcare research and development leverage healthcare databases containing data on medical patients. Medical histories or other clinical data, patient billing data, administrative records pertaining to matters such as hospital bed occupancy, and so forth are maintained by hospitals or other medical facilities and/or by individual units such as the cardiac care unit (CCU), intensive care unit (ICU), or emergency admittance department. These databases store sensitive patient data that generally must be maintained confidentially under financial and/or medical privacy laws such as (in
“To enable a patient database to be used for data analytics for clinical, hospital administrative, or other purposes while maintaining patient privacy, it is known to anonymize the database by removing patient-identifying information (PII). Information that needs to be anonymized includes patient name and/or medical identification number (suitably replaced by a randomly assigned number or the like), address, and so forth. Other anonymization measures may include removing ‘rare’ patients who might be identifiable by a combination of unusual characteristics; for example, a patient who is 102 years old with a particular illness might be identified on the basis of that information alone.
“In addition to rare patients, a patient might be identifiable based on timestamp information for events recorded in the patient record. For example, if a patient is admitted to the hospital on a certain date with a certain condition, that information may be sufficient to narrow the number of possible patient identifications to a small number. However, longitudinal information, that is, the time sequence of events and the time intervals between various events, is sometimes useful in healthcare data analytics. For example, the time interval between admission and discharge may be useful or even critical for analyzing hospital efficiency and/or effectiveness of a certain treatment. To reduce the potential for using a timestamp to identify an anonymized patient while retaining the longitudinal information potentially of value for the healthcare data analysis, in some anonymized databases the timestamps are shifted by some random amount (generally different for each patient), using a rigid shift for all timestamped events of a given patient. The random rigid time shift in timestamps makes patient identification via timestamp more difficult, while the use particularly of a rigid time shift retains the longitudinal information, i.e. the time interval information between events.”
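The rigid time shift described in the background can be illustrated with a short sketch. It is not taken from the patent application: the function name `rigid_shift`, the `max_days` bound, and the sample admission and discharge events are assumptions chosen for illustration. The point it shows is that applying one random offset to every timestamp of a patient changes the dates but leaves the intervals between events unchanged.

```python
from datetime import datetime, timedelta
import random


def rigid_shift(events, rng, max_days=365):
    """Shift every timestamp of ONE patient by the same random offset.

    `events` is a list of (event_name, timestamp) pairs. The name, the
    signature, and the +/- max_days bound are illustrative assumptions.
    """
    offset = timedelta(days=rng.randint(-max_days, max_days))
    return [(name, ts + offset) for name, ts in events]


# Hypothetical record for a single patient.
events = [
    ("admission", datetime(2018, 3, 1, 9, 0)),
    ("discharge", datetime(2018, 3, 6, 15, 30)),
]
shifted = rigid_shift(events, random.Random(0))

# The absolute dates may move, but the admission-to-discharge interval
# is identical before and after the shift.
original_stay = events[1][1] - events[0][1]
shifted_stay = shifted[1][1] - shifted[0][1]
```

In practice a separate offset would be drawn for each patient, as the background notes, so that dates cannot be aligned across patients.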
As a supplement to the background information on this patent application, NewsRx correspondents also obtained the inventors’ summary information for this patent application: “In one disclosed aspect, an anonymized healthcare data source device comprises at least one electronic processor programmed to integrate N anonymized healthcare databases (10) where N is a positive integer having a value of at least three by performing a database integration process including the operations of: for a pair of databases (i,j) of the N anonymized healthcare databases, identifying a set of features each contained in both databases i and j of the pair of databases (i,j) and generating a conversion table matching patients of the pair of databases based on patient similarity measured by the set of features; repeating the identifying and generating operations for each unique pair of databases of the N anonymized healthcare databases to generate N(N-1)/2 conversion tables. The at least one electronic processor is further programmed to perform a patient data retrieval process including the operation of retrieving patient data for one or more anonymized patients contained in the N anonymized healthcare databases using the N(N-1)/2 conversion tables.
“In another disclosed aspect, an anonymized healthcare data source device comprises at least one electronic processor programmed to integrate a healthcare database i and a healthcare database j by performing a database integration process including the operations of: for the pair of databases (i,j), identifying a set of features each contained in both databases i and j of the pair of databases (i,j) including at least one longitudinal feature defined by a pair of timestamped events separated by a time interval Δt between the timestamps of the events and generating a conversion table matching patients of the pair of databases (i,j) based on patient similarity measured by the set of features including comparison of the time interval Δt for patients in the two databases (i,j). The at least one electronic processor is further programmed to perform a patient data retrieval process including the operation of retrieving patient data for one or more anonymized patients contained in both anonymized healthcare databases (i,j) using the conversion table matching patients of the pair of databases (i,j).
“In another disclosed aspect, a non-transitory storage medium stores instructions readable and executable by a computer to perform an anonymized population image reconstruction method to reconstruct an anonymized population image from N anonymized healthcare databases where N is a positive integer having a value of at least two. The anonymized population image reconstruction method comprises: for a pair of databases (i,j) of the N anonymized healthcare databases, identifying a set of features each contained in both databases i and j of the pair of databases (i,j) and generating a conversion table matching patients of the pair of databases based on patient similarity measured by the set of features. The identifying and generating operations are repeated for each unique pair of databases of the N anonymized healthcare databases to generate the anonymized population image comprising contents of the N anonymized healthcare databases integrated by the N(N-1)/2 conversion tables.
“One advantage resides in providing for integration of two, three, four, or more anonymized healthcare databases to leverage the combined data contained in the databases for healthcare data analytic tasks.
“Another advantage resides in providing for the foregoing in which one or more anonymized healthcare databases is an unstructured healthcare database.
“Another advantage resides in providing the foregoing in which longitudinal information, that is, time intervals between events, is leveraged in matching anonymized patients in different anonymized healthcare databases.
“A given embodiment may provide none, one, two, more, or all of the foregoing advantages, and/or may provide other advantages as will become apparent to one of ordinary skill in the art upon reading and understanding the present disclosure.”
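The pairwise integration summarized above can be sketched as follows. This is a minimal illustration rather than the inventors' method: it substitutes exact equality on the shared features for the patent's unspecified patient similarity measure, and every name in it (`shared_features`, `key_index`, `conversion_table`, the sample databases and their fields) is hypothetical. The field `stay_days` stands in for a longitudinal feature, that is, an interval between two events rather than a timestamp. Each database is assumed to be non-empty.

```python
from itertools import combinations


def shared_features(db_i, db_j):
    """Feature names present in every record of both (non-empty) databases."""
    keys_i = set.intersection(*(set(rec) for rec in db_i.values()))
    keys_j = set.intersection(*(set(rec) for rec in db_j.values()))
    return sorted(keys_i & keys_j)


def key_index(db, feats):
    """Group patient IDs by their tuple of values on the shared features."""
    index = {}
    for pid, rec in db.items():
        index.setdefault(tuple(rec[f] for f in feats), []).append(pid)
    return index


def conversion_table(db_i, db_j):
    """Return an m x 2 table as a list of (id_in_i, id_in_j) pairs.

    Assumption: "similarity" is exact equality on all shared features,
    and a match is kept only when it is unique in both databases.
    """
    feats = shared_features(db_i, db_j)
    idx_i, idx_j = key_index(db_i, feats), key_index(db_j, feats)
    return [
        (ids[0], idx_j[key][0])
        for key, ids in idx_i.items()
        if len(ids) == 1 and len(idx_j.get(key, [])) == 1
    ]


# Three hypothetical anonymized databases, each with its own patient IDs.
databases = {
    "icu": {
        "a1": {"sex": "F", "age_band": "60-69", "stay_days": 5, "apache": 17},
        "a2": {"sex": "M", "age_band": "40-49", "stay_days": 2, "apache": 9},
    },
    "billing": {
        "b7": {"sex": "M", "age_band": "40-49", "stay_days": 2, "charges": 8100},
        "b9": {"sex": "F", "age_band": "60-69", "stay_days": 5, "charges": 23400},
    },
    "lab": {
        "c3": {"sex": "F", "age_band": "60-69", "stay_days": 5, "lactate": 2.1},
    },
}

# One conversion table per unique pair: N(N-1)/2 tables for N databases.
tables = {
    (i, j): conversion_table(databases[i], databases[j])
    for i, j in combinations(databases, 2)
}
```

With N = 3 databases this produces 3 tables, one for each of the pairs (icu, billing), (icu, lab), and (billing, lab). A real matcher would need a tolerance-based similarity score, since exact equality fails as soon as one database records a feature slightly differently.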
The claims supplied by the inventors are:
“1. An anonymized healthcare data source device comprising: at least one electronic processor programmed to integrate N anonymized healthcare databases where N is a positive integer having a value of at least three by performing a database integration process including the operations of: for a pair of databases of the N anonymized healthcare databases, identifying a set of features each contained in both databases i and j of the pair of databases and generating a conversion table matching patients of the pair of databases based on patient similarity measured by the set of features; repeating the identifying and generating operations for each unique pair of databases of the N anonymized healthcare databases to generate N(N-1)/2 conversion tables; and the at least one electronic processor further programmed to perform a patient data retrieval process including the operation of retrieving patient data for one or more anonymized patients contained in the N anonymized healthcare databases using the N(N-1)/2 conversion tables.
“2. The device of claim 1 wherein identifying the set of features for the pair of databases includes identifying features for which a feature accuracy metric satisfies a minimum accuracy for each anonymized healthcare database of the pair of databases.
“3. The device of claim 1 wherein retrieving the patient data contained in the N anonymized healthcare databases includes, for a query feature: if the query feature is contained in only one of the N anonymized healthcare databases then retrieving the query feature from the anonymized healthcare database containing the query feature; and if the query feature is contained in two or more of the N anonymized healthcare databases then generating a retrieved value for the query feature from the values of the query feature in the two or more of the N anonymized healthcare databases containing the query feature based on the feature accuracy metric for the query feature in the respective anonymized healthcare databases containing the query feature.
“4. The device of claim 1 wherein generating the conversion table includes generating an m×2 conversion table where m is the number of patients matched in the pair of databases.
“5. The device of claim 1 wherein the database integration process includes the further operation of refining the N(N-1)/2 conversion tables based on consistency of patient matching between the N(N-1)/2 conversion tables.
“6. The device of claim 5 wherein the refining does not use the identified sets of features.
“7. The device of claim 1 wherein the database integration process includes, for at least one pair of databases of the N anonymized healthcare databases: identifying at least one longitudinal feature defined by a pair of timestamped events separated by a time interval Δt between the timestamps of the events; and generating the conversion table matching patients of the pair of databases based in part on matching of the longitudinal feature including comparison of the time interval Δt for patients in the two databases.
“8. The device of claim 7 wherein generating the conversion table matching patients of the pair of databases based in part on matching of the longitudinal feature does not include comparison of timestamps of events for patients in the two databases.
“9. An anonymized healthcare data source device comprising: at least one electronic processor programmed to integrate a healthcare database i and a healthcare database j by performing a database integration process including the operations of: for the pair of databases, identifying a set of features each contained in both databases i and j of the pair of databases including at least one longitudinal feature defined by a pair of timestamped events separated by a time interval Δt between the timestamps of the events and generating a conversion table matching patients of the pair of databases based on patient similarity measured by the set of features including comparison of the time interval Δt for patients in the two databases; the at least one electronic processor further programmed to perform a patient data retrieval process including the operation of retrieving patient data for one or more anonymized patients contained in both anonymized healthcare databases using the conversion table matching patients of the pair of databases.
“10. The device of claim 9 wherein generating the conversion table matching patients of the pair of databases based on patient similarity does not include comparison of timestamps of events for patients in the two databases.
“11. The device of claim 9 wherein: identifying the set of features includes identifying a set of non-longitudinal features contained in both databases i and j of the pair of databases and, for each patient in each database i and j, generating a universal identifier (UID) for the patient comprising a concatenation of values of the set of non-longitudinal features for the patient; and generating the conversion table includes generating the conversion table matching patients of the pair of databases based on patient similarity measured by the set of features further including comparison of the UIDs for patients in the two databases.
“12. The device of claim 9 wherein: identifying the set of features includes identifying at least one feature in at least one database of the pair of databases by performing natural language processing (NLP) on text content of patient records to extract the feature.
“13. The device of claim 9 wherein identifying the set of features each contained in both databases i and j of the pair of databases includes identifying features for which a feature accuracy metric satisfies a minimum accuracy for both the anonymized healthcare database i and the anonymized healthcare database j.
“14. The device of claim 9 wherein retrieving the patient data contained in both anonymized healthcare databases using the conversion table matching patients of the pair of databases includes, for a query feature: if the query feature is contained in only one database of the pair of anonymized healthcare databases then retrieving the query feature from the anonymized healthcare database containing the query feature; and if the query feature is contained in both databases of the pair of anonymized healthcare databases then generating a retrieved value for the query feature from the values of the query feature in the pair of anonymized healthcare databases based on the feature accuracy metric for the query feature in the respective anonymized healthcare databases containing the query feature.
“15. The device of claim 9 wherein generating the conversion table includes generating an m×2 conversion table where m is the number of patients matched in the pair of databases.
“16. The device of claim 9 wherein: the at least one electronic processor is programmed to integrate N databases including the anonymized healthcare database i, the anonymized healthcare database j, and at least one additional anonymized healthcare database by performing the database integration process including the further operation of repeating the identifying and generating operations for each unique pair of databases of the N anonymized healthcare databases to generate N(N-1)/2 conversion tables; and the at least one electronic processor is further programmed to perform the patient data retrieval process including the operations of receiving a patient ID of a patient in one of the anonymized healthcare databases and retrieving patient data for the patient contained in the N anonymized healthcare databases using the N(N-1)/2 conversion tables.
“17. A non-transitory storage medium storing instructions readable and executable by a computer to perform an anonymized population image reconstruction method to reconstruct an anonymized population image from N anonymized healthcare databases where N is a positive integer having a value of at least two, the anonymized population image reconstruction method comprising: for a pair of databases of the N anonymized healthcare databases, identifying a set of features each contained in both databases i and j of the pair of databases and generating a conversion table matching patients of the pair of databases based on patient similarity measured by the set of features; and repeating the identifying and generating operations for each unique pair of databases of the N anonymized healthcare databases to generate the anonymized population image comprising contents of the N anonymized healthcare databases integrated by the N(N-1)/2 conversion tables.
“18. The non-transitory storage medium of claim 17 wherein the stored instructions are readable and executable by a computer to further perform an anonymized population image data retrieval method including receiving an anonymized population data query and retrieving patient data responsive to the anonymized population data query from the anonymized population image using the N(N-1)/2 conversion tables.
“19. The non-transitory storage medium of claim 17 wherein N is a positive integer having a value of at least three.
“20. The non-transitory storage medium of claim 19 wherein generating the conversion table includes generating an m×2 conversion table where m is the number of patients matched in the pair of databases whereby each of the N(N-1)/2 conversion tables is an m×2 conversion table.
“21. (canceled)
“22. (canceled)
“23. (canceled)
“24. (canceled)”
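The retrieval step recited in claims 3 and 14 can be illustrated with a small sketch. The claims say only that the retrieved value is generated "based on the feature accuracy metric"; the rule used here, taking the value from the database with the highest accuracy for that feature, is an assumption, as are the function name `retrieve`, the accuracy scores, and the sample records.

```python
def retrieve(feature, patient_ids, databases, accuracy):
    """Return one value of `feature` for a single matched patient.

    patient_ids maps database name -> that database's ID for the patient,
    as would be looked up through the conversion tables. accuracy maps
    database name -> {feature: score}. All of this is hypothetical.
    """
    candidates = [
        (accuracy[db][feature], databases[db][pid][feature])
        for db, pid in patient_ids.items()
        if feature in databases[db][pid]
    ]
    if not candidates:
        return None  # no database holds this feature for the patient
    # A single source is returned as-is; with several sources, the value
    # from the database with the highest accuracy score is chosen.
    return max(candidates, key=lambda c: c[0])[1]


# Hypothetical data: the same patient is "a1" in one database and "b9" in another.
databases = {
    "icu": {"a1": {"weight_kg": 71.5, "apache": 17}},
    "billing": {"b9": {"weight_kg": 70.0, "charges": 23400}},
}
accuracy = {
    "icu": {"weight_kg": 0.98, "apache": 0.95},
    "billing": {"weight_kg": 0.80, "charges": 0.99},
}
patient = {"icu": "a1", "billing": "b9"}

weight = retrieve("weight_kg", patient, databases, accuracy)   # in both databases
charges = retrieve("charges", patient, databases, accuracy)    # in one database only
lactate = retrieve("lactate", patient, databases, accuracy)    # in neither
```

Other readings of the claim language are equally plausible, for example an accuracy-weighted average for numeric features; the sketch picks the simplest rule that also works for categorical values.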
For additional information on this patent application, see:
(Our reports deliver fact-based news of research and discoveries from around the world.)


