
June 5, 2019 Newswires

Researchers Submit Patent Application, “Hospital Matching Of De-Identified Healthcare Databases Without Obvious Quasi-Identifiers”, for Approval (USPTO 20190147988)

Hospital & Nursing Home Daily

2019 JUN 05 (NewsRx) -- By a News Reporter-Staff News Editor at Hospital & Nursing Home Daily -- From Washington, D.C., NewsRx journalists report that a patent application by the inventors SHARIFI SEDEH, Reza (Malden, MA); ELGORT, Daniel Robert (New York, NY); TRUYEN, Roel (Turnhout, BE), filed on April 19, 2017, was made available online on May 16, 2019.

The patent’s assignee is Koninklijke Philips N.V. (Eindhoven, Netherlands).

News editors obtained the following quote from the background information supplied by the inventors: “Numerous areas of healthcare research and development leverage healthcare databases containing data on medical patients. Medical histories or other clinical data, patient billing data, administrative records pertaining to matters such as hospital bed occupancy, and so forth are maintained by hospitals or other medical facilities and/or by individual units such as the cardiac care unit (CCU), intensive care unit (ICU), or emergency admittance department. These databases store sensitive patient data that generally must be maintained confidentially under financial and/or medical privacy laws such as (in the United States) the Health Insurance Portability and Accountability Act (HIPAA).

“To enable a patient database to be used for data analytics for clinical, hospital administrative, or other purposes while maintaining patient privacy, it is known to anonymize the database by removing patient-identifying information (PII). Information that needs to be anonymized includes patient name and/or medical identification number (suitably replaced by a randomly assigned number or the like), address, or so forth. Other anonymization measures may include removing ‘rare’ patients who might be identifiable by a combination of unusual characteristics; for example, a patient who is 102 years old with a particular illness might be identified on the basis of that information alone.

“In addition to rare patients, a patient might be identifiable based on timestamp information for events recorded in the patient record. For example, if a patient is admitted to the hospital on a certain date with a certain condition, that information may be sufficient to narrow the number of possible patient identifications to a small number. However, longitudinal information, that is, the time sequence of events and the time intervals between various events, is sometimes useful in healthcare data analytics. For example, the time interval between admission and discharge may be useful or even critical for analyzing hospital efficiency and/or effectiveness of a certain treatment. To reduce the potential for using a timestamp to identify an anonymized patient while retaining the longitudinal information potentially of value for the healthcare data analysis, in some anonymized databases the timestamps are shifted by some random amount (generally different for each patient), using a rigid shift for all timestamped events of a given patient. The random rigid time shift in timestamps makes patient identification via timestamp more difficult, while the use of a rigid time shift in particular retains the longitudinal information, i.e. the time interval information between events.”
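The anonymization measures described in this background can be illustrated with a short sketch. This is a hypothetical Python example, not code from the patent application: the field names (`name`, `mrn`, `address`, `age`, `diagnosis`), the rarity threshold `k` (a simple suppression rule in the spirit of k-anonymity), and the ±365-day shift range are all assumptions. It replaces identifiers with a random pseudonym, suppresses patients whose quasi-identifier combination is rare, and applies one rigid random offset to all of a patient's timestamps, so the interval between any two of that patient's events is preserved.

```python
import random
from collections import Counter
from datetime import datetime, timedelta

PII_FIELDS = ("name", "mrn", "address")  # hypothetical direct identifiers
QUASI_IDS = ("age", "diagnosis")         # hypothetical quasi-identifiers


def anonymize(patients, k=2, max_shift_days=365, rng=None):
    """Pseudonymize, suppress rare patients, and rigidly time-shift each record.

    Each patient is a dict holding PII fields, quasi-identifiers, and an
    'events' list of (event_name, datetime) pairs.
    """
    if rng is None:
        rng = random.Random()
    combos = Counter(tuple(p[q] for q in QUASI_IDS) for p in patients)
    released = []
    for p in patients:
        if combos[tuple(p[q] for q in QUASI_IDS)] < k:
            continue  # rare combination, e.g. a 102-year-old with a particular illness
        record = {f: v for f, v in p.items() if f not in PII_FIELDS}
        record["pid"] = rng.getrandbits(64)  # random stand-in for the medical ID
        # One offset per patient: all events move together, so the intervals
        # between them (the longitudinal information) are unchanged.
        offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
        record["events"] = [(name, ts + offset) for name, ts in p["events"]]
        released.append(record)
    return released


# Usage: the two 70-year-old flu patients are released; the 102-year-old is not.
demo = [
    {"name": "A", "mrn": 1, "address": "x", "age": 70, "diagnosis": "flu",
     "events": [("admit", datetime(2017, 1, 1)), ("discharge", datetime(2017, 1, 5))]},
    {"name": "B", "mrn": 2, "address": "y", "age": 70, "diagnosis": "flu",
     "events": [("admit", datetime(2017, 2, 1)), ("discharge", datetime(2017, 2, 3))]},
    {"name": "C", "mrn": 3, "address": "z", "age": 102, "diagnosis": "rare", "events": []},
]
released = anonymize(demo, rng=random.Random(0))
```

Because each patient draws a separate offset, absolute dates no longer line up across patients, but each patient's admission-to-discharge interval (4 days and 2 days here) survives unchanged.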

As a supplement to the background information on this patent application, NewsRx correspondents also obtained the inventors’ summary information for this patent application: “In one disclosed aspect, an anonymized healthcare data source device comprises at least one electronic processor programmed to integrate N anonymized healthcare databases where N is a positive integer having a value of at least three by performing a database integration process including the operations of: for a pair of databases (i,j) of the N anonymized healthcare databases, identifying a set of features each contained in both databases i and j of the pair of databases (i,j) and generating a conversion table matching patients of the pair of databases based on patient similarity measured by the set of features; repeating the identifying and generating operations for each unique pair of databases of the N anonymized healthcare databases to generate N(N-1)/2 conversion tables. The at least one electronic processor is further programmed to perform a patient data retrieval process including the operation of retrieving patient data for one or more anonymized patients contained in the N anonymized healthcare databases using the N(N-1)/2 conversion tables.

“In another disclosed aspect, an anonymized healthcare data source device comprises at least one electronic processor programmed to integrate a healthcare database i and a healthcare database j by performing a database integration process including the operations of: for the pair of databases (i,j), identifying a set of features each contained in both databases i and j of the pair of databases (i,j) including at least one longitudinal feature defined by a pair of timestamped events separated by a time interval Δt between the timestamps of the events and generating a conversion table matching patients of the pair of databases (i,j) based on patient similarity measured by the set of features including comparison of the time interval Δt for patients in the two databases (i,j). The at least one electronic processor is further programmed to perform a patient data retrieval process including the operation of retrieving patient data for one or more anonymized patients contained in both anonymized healthcare databases (i,j) using the conversion table matching patients of the pair of databases (i,j).

“In another disclosed aspect, a non-transitory storage medium stores instructions readable and executable by a computer to perform an anonymized population image reconstruction method to reconstruct an anonymized population image from N anonymized healthcare databases where N is a positive integer having a value of at least two. The anonymized population image reconstruction method comprises: for a pair of databases (i,j) of the N anonymized healthcare databases, identifying a set of features each contained in both databases i and j of the pair of databases (i,j) and generating a conversion table matching patients of the pair of databases based on patient similarity measured by the set of features. The identifying and generating operations are repeated for each unique pair of databases of the N anonymized healthcare databases to generate the anonymized population image comprising contents of the N anonymized healthcare databases integrated by the N(N-1)/2 conversion tables.

“One advantage resides in providing for integration of two, three, four, or more anonymized healthcare databases to leverage the combined data contained in the databases for healthcare data analytic tasks.

“Another advantage resides in providing for the foregoing in which one or more anonymized healthcare databases is an unstructured healthcare database.

“Another advantage resides in providing the foregoing in which longitudinal information, that is, time intervals between events, is leveraged in matching anonymized patients in different anonymized healthcare databases.

“A given embodiment may provide none, one, two, more, or all of the foregoing advantages, and/or may provide other advantages as will become apparent to one of ordinary skill in the art upon reading and understanding the present disclosure.”
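The pairwise integration described in the summary can be sketched as follows. This is a hypothetical Python illustration, not the disclosed implementation, and it rests on several assumptions: each database is a dict mapping a local patient ID to a feature dict; a longitudinal feature is stored as a precomputed interval Δt in days under a name starting with `dt_`; similarity is the fraction of shared features that agree, with a 1-day tolerance on Δt; and patients are paired greedily, a simpler stand-in for an optimal assignment. Only Δt is compared, never absolute timestamps, so per-patient random time shifts do not interfere. Looping over `itertools.combinations` yields one m×2 conversion table for each of the N(N-1)/2 unique pairs.

```python
from itertools import combinations


def similarity(fa, fb, shared, dt_tolerance=1.0):
    """Fraction of shared features that agree; 'dt_*' features hold intervals in days."""
    if not shared:
        return 0.0
    agree = 0
    for f in shared:
        if f.startswith("dt_"):
            # Longitudinal feature: compare the interval Δt, never absolute timestamps.
            agree += abs(fa[f] - fb[f]) <= dt_tolerance
        else:
            agree += fa[f] == fb[f]
    return agree / len(shared)


def conversion_table(db_i, db_j, threshold=0.75):
    """Greedy one-to-one matching; returns m rows of (id_in_i, id_in_j)."""
    if not db_i or not db_j:
        return []
    # Shared features: those present for every patient in both databases.
    shared = set.intersection(*(set(f) for f in db_i.values()),
                              *(set(f) for f in db_j.values()))
    candidates = sorted(
        ((similarity(fi, fj, shared), pi, pj)
         for pi, fi in db_i.items() for pj, fj in db_j.items()),
        key=lambda c: c[0], reverse=True)
    used_i, used_j, table = set(), set(), []
    for score, pi, pj in candidates:
        if score < threshold:
            break  # remaining candidates are even less similar
        if pi not in used_i and pj not in used_j:
            used_i.add(pi)
            used_j.add(pj)
            table.append((pi, pj))
    return table


def integrate(databases, threshold=0.75):
    """One conversion table per unique pair (i, j), i < j: N(N-1)/2 tables in total."""
    return {(i, j): conversion_table(databases[i], databases[j], threshold)
            for i, j in combinations(range(len(databases)), 2)}


# Usage: three small databases sharing sex, birth year, and a length-of-stay interval.
ccu = {"c1": {"sex": "F", "birth_year": 1948, "dt_stay": 4.0},
       "c2": {"sex": "M", "birth_year": 1960, "dt_stay": 9.0}}
icu = {"i7": {"sex": "M", "birth_year": 1960, "dt_stay": 9.5},
       "i8": {"sex": "F", "birth_year": 1948, "dt_stay": 4.0}}
billing = {"b3": {"sex": "F", "birth_year": 1948, "dt_stay": 4.0, "charge": 1200}}
tables = integrate([ccu, icu, billing])
```

With three databases this produces 3 tables; patients c2 and i7 are matched because their stays (9.0 and 9.5 days) fall within the assumed tolerance, regardless of how their absolute dates were shifted.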

The claims supplied by the inventors are:

“1. An anonymized healthcare data source device comprising: at least one electronic processor programmed to integrate N anonymized healthcare databases where N is a positive integer having a value of at least three by performing a database integration process including the operations of: for a pair of databases of the N anonymized healthcare databases, identifying a set of features each contained in both databases i and j of the pair of databases and generating a conversion table matching patients of the pair of databases based on patient similarity measured by the set of features; repeating the identifying and generating operations for each unique pair of databases of the N anonymized healthcare databases to generate N(N-1)/2 conversion tables; and the at least one electronic processor further programmed to perform a patient data retrieval process including the operation of retrieving patient data for one or more anonymized patients contained in the N anonymized healthcare databases using the N(N-1)/2 conversion tables.

“2. The device of claim 1 wherein identifying the set of features for the pair of databases includes identifying features for which a feature accuracy metric satisfies a minimum accuracy for each anonymized healthcare database of the pair of databases.

“3. The device of claim 1 wherein retrieving the patient data contained in the N anonymized healthcare databases includes, for a query feature: if the query feature is contained in only one of the N anonymized healthcare databases then retrieving the query feature from the anonymized healthcare database containing the query feature; and if the query feature is contained in two or more of the N anonymized healthcare databases then generating a retrieved value for the query feature from the values of the query feature in the two or more of the N anonymized healthcare databases containing the query feature based on the feature accuracy metric for the query feature in the respective anonymized healthcare databases containing the query feature.

“4. The device of claim 1 wherein generating the conversion table includes generating an m×2 conversion table where m is the number of patients matched in the pair of databases.

“5. The device of claim 1 wherein the database integration process includes the further operation of refining the N(N-1)/2 conversion tables based on consistency of patient matching between the N(N-1)/2 conversion tables.

“6. The device of claim 5 wherein the refining does not use the identified sets of features.

“7. The device of claim 1 wherein the database integration process includes, for at least one pair of databases of the N anonymized healthcare databases: identifying at least one longitudinal feature defined by a pair of timestamped events separated by a time interval Δt between the timestamps of the events; and generating the conversion table matching patients of the pair of databases based in part on matching of the longitudinal feature including comparison of the time interval Δt for patients in the two databases.

“8. The device of claim 7 wherein generating the conversion table matching patients of the pair of databases based in part on matching of the longitudinal feature does not include comparison of timestamps of events for patients in the two databases.

“9. An anonymized healthcare data source device comprising: at least one electronic processor programmed to integrate a healthcare database i and a healthcare database j by performing a database integration process including the operations of: for the pair of databases, identifying a set of features each contained in both databases i and j of the pair of databases including at least one longitudinal feature defined by a pair of timestamped events separated by a time interval Δt between the timestamps of the events and generating a conversion table matching patients of the pair of databases based on patient similarity measured by the set of features including comparison of the time interval Δt for patients in the two databases; the at least one electronic processor further programmed to perform a patient data retrieval process including the operation of retrieving patient data for one or more anonymized patients contained in both anonymized healthcare databases using the conversion table matching patients of the pair of databases.

“10. The device of claim 9 wherein generating the conversion table matching patients of the pair of databases based on patient similarity does not include comparison of timestamps of events for patients in the two databases.

“11. The device of claim 9 wherein: identifying the set of features includes identifying a set of non-longitudinal features contained in both databases i and j of the pair of databases and, for each patient in each database i and j, generating a universal identifier (UID) for the patient comprising a concatenation of values of the set of non-longitudinal features for the patient; and generating the conversion table includes generating the conversion table matching patients of the pair of databases based on patient similarity measured by the set of features further including comparison of the UIDs for patients in the two databases.

“12. The device of claim 9 wherein: identifying the set of features includes identifying at least one feature in at least one database of the pair of databases by performing natural language processing (NLP) on text content of patient records to extract the feature.

“13. The device of claim 9 wherein identifying the set of features each contained in both databases i and j of the pair of databases includes identifying features for which a feature accuracy metric satisfies a minimum accuracy for both the anonymized healthcare database i and the anonymized healthcare database j.

“14. The device of claim 9 wherein retrieving the patient data contained in both anonymized healthcare databases using the conversion table matching patients of the pair of databases includes, for a query feature: if the query feature is contained in only one database of the pair of anonymized healthcare databases then retrieving the query feature from the anonymized healthcare database containing the query feature; and if the query feature is contained in both databases of the pair of anonymized healthcare databases then generating a retrieved value for the query feature from the values of the query feature in the pair of anonymized healthcare databases based on the feature accuracy metric for the query feature in the respective anonymized healthcare databases containing the query feature.

“15. The device of claim 9 wherein generating the conversion table includes generating an m×2 conversion table where m is the number of patients matched in the pair of databases.

“16. The device of claim 9 wherein: the at least one electronic processor is programmed to integrate N databases including the anonymized healthcare database i, the anonymized healthcare database j, and at least one additional anonymized healthcare database by performing the database integration process including the further operation of repeating the identifying and generating operations for each unique pair of databases of the N anonymized healthcare databases to generate N(N-1)/2 conversion tables; and the at least one electronic processor is further programmed to perform the patient data retrieval process including the operations of receiving a patient ID of a patient in one of the anonymized healthcare databases and retrieving patient data for the patient contained in the N anonymized healthcare databases using the N(N-1)/2 conversion tables.

“17. A non-transitory storage medium storing instructions readable and executable by a computer to perform an anonymized population image reconstruction method to reconstruct an anonymized population image from N anonymized healthcare databases where N is a positive integer having a value of at least two, the anonymized population image reconstruction method comprising: for a pair of databases of the N anonymized healthcare databases, identifying a set of features each contained in both databases i and j of the pair of databases and generating a conversion table matching patients of the pair of databases based on patient similarity measured by the set of features; and repeating the identifying and generating operations for each unique pair of databases of the N anonymized healthcare databases to generate the anonymized population image comprising contents of the N anonymized healthcare databases integrated by the N(N-1)/2 conversion tables.

“18. The non-transitory storage medium of claim 17 wherein the stored instructions are readable and executable by a computer to further perform an anonymized population image data retrieval method including receiving an anonymized population data query and retrieving patient data responsive to the anonymized population data query from the anonymized population image using the N(N-1)/2 conversion tables.

“19. The non-transitory storage medium of claim 17 wherein N is a positive integer having a value of at least three.

“20. The non-transitory storage medium of claim 19 wherein generating the conversion table includes generating an m×2 conversion table where m is the number of patients matched in the pair of databases whereby each of the N(N-1)/2 conversion tables is an m×2 conversion table.

“21. (canceled)

“22. (canceled)

“23. (canceled)

“24. (canceled)”
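Several dependent claims add concrete steps: accuracy-gated feature selection (claims 2 and 13), UID concatenation (claim 11), refinement of the tables by cross-table consistency without reusing features (claims 5 and 6), and retrieval by patient ID with accuracy-based merging of overlapping values (claims 3 and 16). The claims do not specify algorithms for these steps, so the sketch below is a hypothetical Python illustration only. The 0.9 accuracy threshold, the `|` UID separator, the triangle-consistency rule, the accuracy-weighted mean for numeric values, and all function names are assumptions. Conversion tables are dicts keyed by `(i, j)` with `i < j`, each holding a list of `(id_in_i, id_in_j)` rows.

```python
from collections import deque
from itertools import combinations


def select_features(acc_i, acc_j, min_accuracy=0.9):
    """Claims 2/13: shared features whose accuracy meets the minimum in both databases."""
    return sorted(f for f in acc_i.keys() & acc_j.keys()
                  if acc_i[f] >= min_accuracy and acc_j[f] >= min_accuracy)


def make_uid(record, features, sep="|"):
    """Claim 11: concatenate non-longitudinal feature values in a fixed order."""
    return sep.join(str(record[f]) for f in features)


def refine(tables, n):
    """Claims 5/6: drop matches contradicted by a third database, using only the tables.

    Assumes a table exists for every pair (i, j) with i < j < n.
    """
    maps = {key: dict(rows) for key, rows in tables.items()}
    suspect = {key: set() for key in tables}
    for i, j, k in combinations(range(n), 3):
        ij, jk, ik = maps[(i, j)], maps[(j, k)], maps[(i, k)]
        for a, b in ij.items():
            if b in jk and a in ik and jk[b] != ik[a]:
                # Path a -> b -> jk[b] disagrees with the direct link a -> ik[a].
                suspect[(i, j)].add(a)
                suspect[(j, k)].add(b)
                suspect[(i, k)].add(a)
    # Rows are filtered by their left-hand ID.
    return {key: [(x, y) for x, y in rows if x not in suspect[key]]
            for key, rows in tables.items()}


def merge_value(sources):
    """Claim 3: combine (value, accuracy) pairs from every database holding the feature."""
    if len(sources) == 1:
        return sources[0][0]  # feature held by a single database: return it as-is
    total = sum(acc for _, acc in sources)
    if total > 0 and all(isinstance(v, (int, float)) for v, _ in sources):
        return sum(v * acc for v, acc in sources) / total  # accuracy-weighted mean
    return max(sources, key=lambda s: s[1])[0]  # otherwise trust the most accurate value


def retrieve(start_db, patient_id, tables, databases, accuracy):
    """Claim 16: follow conversion tables from one patient ID, then merge each feature."""
    found, queue = {start_db: patient_id}, deque([start_db])
    while queue:
        db = queue.popleft()
        for (i, j), rows in tables.items():
            for a, b in rows:
                if i == db and a == found[db] and j not in found:
                    found[j] = b
                    queue.append(j)
                elif j == db and b == found[db] and i not in found:
                    found[i] = a
                    queue.append(i)
    pooled = {}
    for db, pid in found.items():
        for feature, value in databases[db][pid].items():
            pooled.setdefault(feature, []).append((value, accuracy[db].get(feature, 0.0)))
    return {feature: merge_value(sources) for feature, sources in pooled.items()}


# Usage: the chain a2 -> b2 -> c2 contradicts the direct link a2 -> c9.
tables = {(0, 1): [("a1", "b1"), ("a2", "b2")],
          (0, 2): [("a1", "c1"), ("a2", "c9")],
          (1, 2): [("b1", "c1"), ("b2", "c2")]}
refined = refine(tables, 3)
databases = [{"a1": {"age": 70, "sex": "F"}},
             {"b1": {"age": 71, "lvef": 35}},
             {"c1": {"sex": "M", "charge": 1200}}]
accuracy = [{"age": 0.75, "sex": 0.6}, {"age": 0.25, "lvef": 0.95},
            {"sex": 0.99, "charge": 1.0}]
record = retrieve(0, "a1", refined, databases, accuracy)
```

In the usage lines, the three rows involved in the inconsistent a2 chain are dropped without consulting any patient features, and retrieving patient a1 then pools values from all three databases: the two age values are combined by accuracy weight (70.25), while the conflicting sex values resolve to the more accurate source.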

For additional information on this patent application, see: SHARIFI SEDEH, Reza; ELGORT, Daniel Robert; TRUYEN, Roel. Hospital Matching Of De-Identified Healthcare Databases Without Obvious Quasi-Identifiers. Filed April 19, 2017 and posted May 16, 2019. Patent URL: http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220190147988%22.PGNR.&OS=DN/20190147988&RS=DN/20190147988

(Our reports deliver fact-based news of research and discoveries from around the world.)
