Patent Issued for System and method for computerized synthesis of simulated health data (USPTO 11205504): Cardinal Health Commercial Technologies LLC
2022 JAN 12 (NewsRx) -- By a
Patent number 11205504 is assigned to
The following quote was obtained by the news editors from the background information supplied by the inventors: “In the field of healthcare, electronic health record (EHR) data refers to digital collections of patient and population health information across time. Depending on the source, EHR may include a variety of metrics such as demographics, medical history, laboratory results, and billing information. EHR data provides an extensive data source for healthcare research and quality improvement. For example, policy researchers may evaluate the spread of an epidemic across a region to identify future emergency response strategies. In other cases, a large hospital may improve patient care by identifying the best practices to treat a particular disease. Many other potential benefits of using EHR data for research exist.
“In practice, EHR data frequently includes or is associated with Protected Health Information (PHI) data. PHI data may include any information that links health status, payment, or treatment to a specific individual. For example, PHI data may include names, geographical identifiers, dates, and health insurance numbers. Due to government regulations such as the Health Insurance Portability and Accountability Act (HIPAA) in
“De-identification methods such as k-anonymization are known and have been widely used to disassociate PHI data by ensuring that data from at least K individuals are undistinguishable. However, poor implementation of anonymization schemes, lack of widespread privacy guidelines, and inadvertent cybersecurity leaks of original PHI data can place organizations using EHR data at great regulatory risk. Moreover, many types of healthcare research, such as epidemiology, may require unaltered PHI data to gain insight onto individual patient outcomes. However, this may cause the difficulty of going through Institutional Review Boards (IRB) and further creates a risk of PHI data leaks.
“In addition to the complexity of PHI regulations and de-identification, the integration of data across multiple EHR sources has many technical challenges. EHR records are not always in consistent format across EHR providers, making data transformation and aggregation difficult. Tracking individual patients that frequently change healthcare providers may be impossible without access to any PHI identification attributes. Without evidence of a robust software system that can mitigate cybersecurity threats, healthcare organizations may be reluctant to share EHR data in fear of PHI data risks.
“Additionally, with the growing field of computer driven health models, EHR data may not provide researchers with sufficient information to analyze complex health outcomes. For example, due to inconsistencies in healthcare visits, EHR data may be time limited and only extend over the period of a few years, thus affecting any models that extend over a patient’s lifetime. Also, the amount of EHR data may be limited for rare diseases or conditions or nonexistent for unknown diseases and comorbidities. In these cases, it may be difficult to rely on EHR data to predict complex events or to extrapolate the health outcomes of patient populations over a lifetime.
“Consequently, a need exists for a system to address the shortcomings of using EHR data in medical research and quality improvement.”
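As an illustration of the k-anonymization property mentioned in the background, the following minimal Python sketch checks whether every combination of quasi-identifier values is shared by at least k records. The field names (`age_band`, `zip3`, `diagnosis`) and the sample records are hypothetical and are not drawn from the patent.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized records (age bands, truncated ZIP codes).
records = [
    {"age_band": "30-39", "zip3": "432", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "432", "diagnosis": "diabetes"},
    {"age_band": "40-49", "zip3": "430", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "430", "diagnosis": "hypertension"},
    {"age_band": "40-49", "zip3": "430", "diagnosis": "asthma"},
]

# The smallest group has two members, so the data is 2-anonymous
# but not 3-anonymous on these quasi-identifiers.
print(is_k_anonymous(records, ["age_band", "zip3"], k=2))  # True
print(is_k_anonymous(records, ["age_band", "zip3"], k=3))  # False
```

A check like this only verifies the grouping property; it does not by itself guard against the implementation and leakage risks the inventors describe.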
In addition to the background information obtained for this patent, NewsRx journalists also obtained the inventors’ summary information for this patent: “In one aspect, a method is disclosed. The method includes a computing device for (1) receiving at least one respective data model constructed at least in part from protected health information (PHI) hosted by a health data provider of a plurality of health data providers, the received data model containing no PHI, (2) creating at least one state transition machine from the at least one received data model, where a state of the at least one state transition machine represents a health related event of a synthetic life and where a probability of transitioning from a first state to a second state is based on at least one health attribute and at least one disease prevalence statistic, (3) generating a synthetic health data set comprised of a synthetic electronic health record (EHR) for each synthetic person in a synthetic population, where the synthetic EHR for each synthetic person is generated by calculating one or more health related events for each time step in the synthetic life and where the one or more health related events are determined in part by the health attributes of each synthetic person at each time step, (4) calculating a similarity score by comparing the generated synthetic health data set to an actual health data set, where the actual health data set includes at least one health related event and associated time information for that at least one health related event, the at least one health related event and associated time information derived from at least a portion of the PHI hosted by the health data provider, and (5) adjusting the at least one received data model based on the score indicating similarity falling below a threshold similarity, thereby creating at least one adjusted data model.
“In another aspect, a computing system is disclosed. The system includes at least one health data repository constructed from protected health information (PHI) hosted by a health data provider of a plurality of health data providers. The system further includes a computing device containing at least one received data model constructed from the at least one health data repository, where the at least one received data model contains no PHI. The system also includes at least one state transition machine that is (1) constructed from the at least one received data model, where a state of the at least one state transition machine represents a health related event of a synthetic life and where a probability of transitioning from a first state to a second state is based on at least one demographic and at least one health attribute and at least one disease prevalence statistic, and (2) configured to generate a synthetic health data set comprised of a synthetic electronic health record (EHR) for each synthetic person in a synthetic population, where the synthetic EHR for each synthetic person is generated by calculating a one or more health related events for each time step in the synthetic life, where the one or more health related events are determined in part by the health attributes of each synthetic person at each time step. The system also includes a processor configured to (1) calculate a similarity score by comparing the synthetic health data set to an actual health data set, wherein the actual health data set includes at least one health related event and associated time information for that at least one health related event, the at least one health related event and associated time information derived from at least a portion of the PHI hosted by the health data provider, and (2) adjust the at least one received data model based on the score indicating similarity falling below a threshold similarity, thereby creating at least one adjusted data model.
“In yet another aspect, a non-transitory computer readable medium is provided. The non-transitory computer readable medium has stored thereon instructions, that when executed by one or more processors of a computing device, cause the computing device to perform functions. The functions include: (1) receiving at least one respective data model constructed at least in part from protected health information (PHI) hosted by a health data provider of a plurality of health data providers, the received data model containing no PHI, (2) creating at least one state transition machine from the at least one received data model, where a state of the at least one state transition machine represents a health related event of a synthetic life and where a probability of transitioning from a first state to a second state is based on at least one health attribute and at least one disease prevalence statistic, (3) generating a synthetic health data set comprised of a synthetic electronic health record (EHR) for each synthetic person in a synthetic population, where the synthetic EHR for each synthetic person is generated by calculating one or more health related events for each time step in the synthetic life and where the one or more health related events are determined in part by the health attributes of each synthetic person at each time step, (4) calculating a similarity score by comparing the generated synthetic health data set to an actual health data set, where the actual health data set includes at least one health related event and associated time information for that at least one health related event, the at least one health related event and associated time information derived from at least a portion of the PHI hosted by the health data provider, and (5) adjusting the at least one received data model based on the score indicating similarity falling below a threshold similarity, where adjusting the at least one received data model comprises representing the at least one received data model as at least one probabilistic graphical model and determining probabilities of the at least one probabilistic graphical model based on at least one health related event from the actual health data set, thereby creating at least one adjusted data model.
“The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the figures and the following detailed description and the accompanying drawings.”
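The state transition machine described in the summary can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the patented implementation: the three states, the base probabilities, the `smoker` health attribute, the way the prevalence statistic scales the diagnosis probability, and the 20 percent smoker rate are all hypothetical choices made for the example.

```python
import random

# Hypothetical states and base per-year transition probabilities.
BASE_TRANSITIONS = {
    "healthy": {"healthy": 0.90, "diagnosed": 0.08, "deceased": 0.02},
    "diagnosed": {"diagnosed": 0.70, "healthy": 0.20, "deceased": 0.10},
    "deceased": {"deceased": 1.0},
}


def transition_probs(state, smoker, prevalence):
    """Adjust the base probabilities with one health attribute (smoker)
    and one disease prevalence statistic, then renormalize to sum to 1."""
    probs = dict(BASE_TRANSITIONS[state])
    if state == "healthy":
        # Assumed rule: higher prevalence and smoking raise diagnosis risk.
        weight = prevalence * (2.0 if smoker else 1.0)
        probs["diagnosed"] *= 1.0 + weight
    total = sum(probs.values())
    return {s: p / total for s, p in probs.items()}


def simulate_life(rng, smoker, prevalence, max_years=80):
    """Generate one synthetic record as a list of (year, state) events,
    one event per time step, stopping once the life ends."""
    state = "healthy"
    record = []
    for year in range(max_years):
        probs = transition_probs(state, smoker, prevalence)
        states = list(probs)
        state = rng.choices(states, weights=[probs[s] for s in states])[0]
        record.append((year, state))
        if state == "deceased":
            break
    return record


def generate_population(size, prevalence, seed=0):
    """Generate a synthetic data set: one synthetic record per person."""
    rng = random.Random(seed)
    return [
        simulate_life(rng, smoker=rng.random() < 0.2, prevalence=prevalence)
        for _ in range(size)
    ]


population = generate_population(size=3, prevalence=0.1, seed=42)
for person_id, record in enumerate(population):
    print(person_id, record[:5])
```

Seeding the random generator keeps the synthetic population reproducible, which makes it easier to compare one run against another after the model is adjusted.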
The claims supplied by the inventors are:
“1. A computerized method, comprising: receiving, by a computing device, a data model constructed, at least in part, from protected health information (PHI) hosted by a corresponding health data provider, the received data model containing no PHI, wherein the data model includes clinical pathways representing sequences of health related events; creating, by the computing device, a state transition machine from the received data model, wherein a state of the state transition machine represents a health related event of a synthetic life, and wherein a probability of transitioning from a first state to a second state is based on a health attribute and a disease prevalence statistic; generating, by the computing device, a synthetic health data set comprised of a synthetic electronic health record (EHR) for each synthetic person in a synthetic population, wherein the synthetic EHR for each synthetic person is generated by calculating health related events for each time step in the synthetic life, and wherein the health related events are determined in part by the health attributes of each synthetic person at each time step; calculating, by the computing device, a similarity score by comparing the generated synthetic health data set to an actual health data set including an actual EHR, wherein the actual health data set includes a health related event and associated time information for the health related event, where the health related event and associated time information are derived from the PHI hosted by the health data provider; and tuning, by the computing device, the received data model based on the score indicating similarity falling below a threshold similarity, thereby creating a tuned model by: representing the received data model as a probabilistic graphical model; determining probabilities of probabilistic graphical model based on the health related event from the actual health data set to infer a probability that a particular synthetic life will transition within the graph; and selecting a distribution probability based on interpreting the data model from the inferred graphical model probability to create updated clinical pathways in the tuned data model for use in the state transition machine.
“2. The computerized method of claim 1, further comprising transmitting, to a second computing device, the received data model based on the score indicating similarity falling at or above the threshold similarity.
“3. The computerized method of claim 2, wherein the second computing device is configured to aggregate received data models into an aggregate data model, and generate a synthetic health data set using the aggregate data model.
“4. The computerized method of claim 1, wherein calculating the similarity score comprises applying a predetermined weighting scheme to a plurality of univariate analysis of the synthetic health data set and the actual health data.
“5. The computerized method of claim 1, wherein creating the state transition machine includes supplementing the received data model with publicly available health information.
“6. The computerized method of claim 1, wherein the probability of transitioning from the first state to the second state in the state transition machine is further based on a current calendar date of the synthetic life.
“7. The computerized method of claim 1, wherein the transitions between states of the state transition machine comprise a probability the synthetic person will transition from a current health related event to a different health related event.
“8. The computerized method of claim 1, wherein the synthetic life of the state transition machine begins at a randomly initiated first time within a predetermined range and ends at a second time, wherein the second time is based on a calculated life duration for the synthetic person.
“9. The computerized method of claim 1, wherein determining probabilities of probabilistic graphical model further comprises pruning low probabilities to prevent the graphical model from overfitting.
“10. The computerized method of claim 1, wherein tuning, by the computing device, the received data model based on the score indicating similarity falling below a threshold similarity, thereby creating a tuned data model further comprises mapping the actual EHR data to align health related events with those used by the clinical pathways in state machine to resolve conflicts in different data codification systems.
“11. A computing system comprising: a health data repository constructed from protected health information (PHI) hosted by a health data provider; a computing device containing a received data model constructed from the health data repository, wherein the received data model contains no PHI and the data model includes clinical pathways representing sequences of health related events; a state transition machine that is (1) constructed from the received data model, wherein a state of the state transition machine represents a health related event of a synthetic life, wherein a probability of transitioning from a first state to a second state is based on a demographic, a health attribute, and a disease prevalence statistic, and (2) configured to generate a synthetic health data set comprised of a synthetic electronic health record (EHR) for each synthetic person in a synthetic population, wherein the synthetic EHR for each synthetic person is generated by calculating health related events for each time step in the synthetic life, wherein the health related events are determined in part by the health attributes of each synthetic person at each time step; and a processor configured to (1) calculate a similarity score by comparing the synthetic health data set to an actual health data set including an actual EHR, wherein the actual health data set includes a health related event and associated time information for the health related event, where the health related event and associated time information are derived from the PHI hosted by the health data provider, and (2) tune the received data model based on the score indicating similarity falling below a threshold similarity, thereby creating a tuned data model by: representing the received data model as a probabilistic graphical model; determining probabilities of probabilistic graphical model based on the health related event from the actual health data set to infer a probability that a particular synthetic life will transition within the graph; and selecting a distribution probability based on interpreting the data model from the inferred graphical model probability to create updated clinical pathways in the tuned data model for use in the state transition machine.
“12. The computing system of claim 11, wherein the processor is further configured to transmit to a second computing device the received data model based on the score indicating similarity falling at or above the threshold similarity.
“13. The computing system of claim 12, where the second server device is configured to aggregate received data models into an aggregate data model, and generate a synthetic health data set using the aggregate data model.
“14. The computing system of claim 11, wherein calculating the similarity score comprises applying a predetermined weighting scheme to a plurality of univariate analysis of the synthetic health data set and the actual health data.
“15. The computing system of claim 11, wherein the construction of the state transition machine includes supplementing the received data model with publicly available health information.
“16. The computing system of claim 11, wherein the probability of transitioning from the first state to the second state in the state transition machine is further based on a current calendar date of the synthetic life.
“17. The computing system of claim 11, wherein the transitions between states of the state transition machine comprise a probability the synthetic person will transition from a current health related event to a different health related event.
“18. The computing system of claim 11, wherein determining probabilities of probabilistic graphical model further comprises pruning low probabilities to prevent the graphical model from overfitting.
“19. The computing system of claim 11, wherein tuning the received data model based on the score indicating similarity falling below a threshold similarity, thereby creating a tuned data model further comprises mapping the actual EHR data to align health related events with those used by the clinical pathways in state machine to resolve conflicts in different data codification systems.”
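The scoring and tuning steps named in the claims (a weighted combination of univariate comparisons, a threshold decision, and pruning of low probabilities) can be sketched in Python as follows. This is a hedged illustration only: the claims do not specify the univariate statistic, the weights, the threshold, or the pruning floor, so the mean-based comparison, the variable names, and every numeric value below are assumptions.

```python
def univariate_similarity(synthetic, actual):
    """Assumed univariate comparison: 1 minus the relative difference
    between the two sample means, clipped to the range [0, 1]."""
    mean_s = sum(synthetic) / len(synthetic)
    mean_a = sum(actual) / len(actual)
    scale = max(abs(mean_s), abs(mean_a), 1e-12)
    return max(0.0, 1.0 - abs(mean_s - mean_a) / scale)


def similarity_score(synthetic, actual, weights):
    """Combine per-variable similarities with a predetermined weighting."""
    total = sum(weights.values())
    return sum(
        w * univariate_similarity(synthetic[name], actual[name])
        for name, w in weights.items()
    ) / total


def next_action(score, threshold):
    """Tune the model when the score falls below the threshold;
    otherwise it can be transmitted onward (as in claims 2 and 12)."""
    return "tune" if score < threshold else "transmit"


def prune_and_renormalize(probs, floor):
    """Drop transitions whose probability is below the floor, then
    renormalize the remaining probabilities so they sum to 1."""
    kept = {state: p for state, p in probs.items() if p >= floor}
    if not kept:
        raise ValueError("floor removes every transition")
    total = sum(kept.values())
    return {state: p / total for state, p in kept.items()}


# Hypothetical summary variables for a synthetic and an actual data set.
synthetic = {"age_at_diagnosis": [50, 60], "visits_per_year": [2, 4]}
actual = {"age_at_diagnosis": [55, 55], "visits_per_year": [3, 5]}
weights = {"age_at_diagnosis": 0.5, "visits_per_year": 0.5}

score = similarity_score(synthetic, actual, weights)
print(score, next_action(score, threshold=0.9))
print(prune_and_renormalize({"a": 0.60, "b": 0.38, "c": 0.02}, floor=0.05))
```

In this example the ages match exactly while the visit counts differ, so the combined score falls below the assumed 0.9 threshold and the sketch selects tuning rather than transmission.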
There are additional claims. Please visit the full patent to read further.
For the URL and more information on this patent, see: Graham,
(Our reports deliver fact-based news of research and discoveries from around the world.)