Patent Issued for System and method for computerized synthesis of simulated health data (USPTO 11705231): Cardinal Health Commercial Technologies LLC
2023 AUG 07 (NewsRx) -- By a
The patent’s inventors are Graham,
This patent was filed on
From the background information supplied by the inventors, news correspondents obtained the following quote: “In the field of healthcare, electronic health record (EHR) data refers to digital collections of patient and population health information across time. Depending on the source, EHR may include a variety of metrics such as demographics, medical history, laboratory results, and billing information. EHR data provides an extensive data source for healthcare research and quality improvement. For example, policy researchers may evaluate the spread of an epidemic across a region to identify future emergency response strategies. In other cases, a large hospital may improve patient care by identifying the best practices to treat a particular disease. Many other potential benefits of using EHR data for research exist.
“In practice, EHR data frequently includes or is associated with Protected Health Information (PHI) data. PHI data may include any information that links health status, payment, or treatment to a specific individual. For example, PHI data may include names, geographical identifiers, dates, and health insurance numbers. Due to government regulations such as the Health Insurance Portability and Accountability Act (HIPAA) in
“De-identification methods such as k-anonymization are known and have been widely used to disassociate PHI data by ensuring that data from at least K individuals are undistinguishable. However, poor implementation of anonymization schemes, lack of widespread privacy guidelines, and inadvertent cybersecurity leaks of original PHI data can place organizations using EHR data at great regulatory risk. Moreover, many types of healthcare research, such as epidemiology, may require unaltered PHI data to gain insight onto individual patient outcomes. However, this may cause the difficulty of going through Institutional Review Boards (IRB) and further creates a risk of PHI data leaks.
“In addition to the complexity of PHI regulations and de-identification, the integration of data across multiple EHR sources has many technical challenges. EHR records are not always in consistent format across EHR providers, making data transformation and aggregation difficult. Tracking individual patients that frequently change healthcare providers may be impossible without access to any PHI identification attributes. Without evidence of a robust software system that can mitigate cybersecurity threats, healthcare organizations may be reluctant to share EHR data in fear of PHI data risks.
“Additionally, with the growing field of computer driven health models, EHR data may not provide researchers with sufficient information to analyze complex health outcomes. For example, due to inconsistencies in healthcare visits, EHR data may be time limited and only extend over the period of a few years, thus affecting any models that extend over a patient’s lifetime. Also, the amount of EHR data may be limited for rare diseases or conditions or nonexistent for unknown diseases and comorbidities. In these cases, it may be difficult to rely on EHR data to predict complex events or to extrapolate the health outcomes of patient populations over a lifetime.
“Consequently, a need exists for a system to address the shortcomings of using EHR data in medical research and quality improvement.”
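As a rough illustration of the k-anonymization idea mentioned in the background, the following Python sketch checks whether every combination of quasi-identifier values is shared by at least k records. The records, the field names, and the helper `is_k_anonymous` are hypothetical and do not come from the patent.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination appears in at least k records."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())

# Hypothetical, already-generalized records (age bands and truncated ZIP codes).
records = [
    {"age_band": "30-39", "zip3": "432", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "432", "diagnosis": "diabetes"},
    {"age_band": "40-49", "zip3": "432", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "432", "diagnosis": "hypertension"},
]

print(is_k_anonymous(records, ["age_band", "zip3"], k=2))  # each group has 2 records
print(is_k_anonymous(records, ["age_band", "zip3"], k=3))  # no group has 3 records
```

A data set that passes this check can still leak information in other ways, which is consistent with the inventors' point that de-identification alone leaves regulatory risk.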
Supplementing the background information on this patent, NewsRx reporters also obtained the inventors’ summary information for this patent: “In one aspect, a method is disclosed. The method includes a computing device for (1) receiving at least one respective data model constructed at least in part from protected health information (PHI) hosted by a health data provider of a plurality of health data providers, the received data model containing no PHI, (2) creating at least one state transition machine from the at least one received data model, where a state of the at least one state transition machine represents a health related event of a synthetic life and where a probability of transitioning from a first state to a second state is based on at least one health attribute and at least one disease prevalence statistic, (3) generating a synthetic health data set comprised of a synthetic electronic health record (EHR) for each synthetic person in a synthetic population, where the synthetic EHR for each synthetic person is generated by calculating one or more health related events for each time step in the synthetic life and where the one or more health related events are determined in part by the health attributes of each synthetic person at each time step, (4) calculating a similarity score by comparing the generated synthetic health data set to an actual health data set, where the actual health data set includes at least one health related event and associated time information for that at least one health related event, the at least one health related event and associated time information derived from at least a portion of the PHI hosted by the health data provider, and (5) adjusting the at least one received data model based on the score indicating similarity falling below a threshold similarity, thereby creating at least one adjusted data model.
“In another aspect, a computing system is disclosed. The system includes at least one health data repository constructed from protected health information (PHI) hosted by a health data provider of a plurality of health data providers. The system further includes a computing device containing at least one received data model constructed from the at least one health data repository, where the at least one received data model contains no PHI. The system also includes at least one state transition machine that is (1) constructed from the at least one received data model, where a state of the at least one state transition machine represents a health related event of a synthetic life and where a probability of transitioning from a first state to a second state is based on at least one demographic and at least one health attribute and at least one disease prevalence statistic, and (2) configured to generate a synthetic health data set comprised of a synthetic electronic health record (EHR) for each synthetic person in a synthetic population, where the synthetic EHR for each synthetic person is generated by calculating a one or more health related events for each time step in the synthetic life, where the one or more health related events are determined in part by the health attributes of each synthetic person at each time step. The system also includes a processor configured to (1) calculate a similarity score by comparing the synthetic health data set to an actual health data set, wherein the actual health data set includes at least one health related event and associated time information for that at least one health related event, the at least one health related event and associated time information derived from at least a portion of the PHI hosted by the health data provider, and (2) adjust the at least one received data model based on the score indicating similarity falling below a threshold similarity, thereby creating at least one adjusted data model.
“In yet another aspect, a non-transitory computer readable medium is provided. The non-transitory computer readable medium has stored thereon instructions, that when executed by one or more processors of a computing device, cause the computing device to perform functions. The functions include: (1) receiving at least one respective data model constructed at least in part from protected health information (PHI) hosted by a health data provider of a plurality of health data providers, the received data model containing no PHI, (2) creating at least one state transition machine from the at least one received data model, where a state of the at least one state transition machine represents a health related event of a synthetic life and where a probability of transitioning from a first state to a second state is based on at least one health attribute and at least one disease prevalence statistic, (3) generating a synthetic health data set comprised of a synthetic electronic health record (EHR) for each synthetic person in a synthetic population, where the synthetic EHR for each synthetic person is generated by calculating one or more health related events for each time step in the synthetic life and where the one or more health related events are determined in part by the health attributes of each synthetic person at each time step, (4) calculating a similarity score by comparing the generated synthetic health data set to an actual health data set, where the actual health data set includes at least one health related event and associated time information for that at least one health related event, the at least one health related event and associated time information derived from at least a portion of the PHI hosted by the health data provider, and (5) adjusting the at least one received data model based on the score indicating similarity falling below a threshold similarity, where adjusting the at least one received data model comprises representing the at least one received data model as at least one probabilistic graphical model and determining probabilities of the at least one probabilistic graphical model based on at least one health related event from the actual health data set, thereby creating at least one adjusted data model.
“The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the figures and the following detailed description and the accompanying drawings.”
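The state transition machine described in the summary can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the three states, the `bmi` health attribute, the `BASELINE_ONSET` prevalence figure, and the risk multiplier are all hypothetical values chosen for the example, not details from the patent.

```python
import random

# Hypothetical disease prevalence statistic (annual baseline onset probability).
BASELINE_ONSET = 0.02

def onset_probability(person):
    """Probability of moving from 'healthy' to 'diagnosed', scaled by a health attribute."""
    risk_multiplier = 2.0 if person["bmi"] >= 30 else 1.0  # assumed effect size
    return min(1.0, BASELINE_ONSET * risk_multiplier)

def next_state(state, person, rng):
    """Advance the state machine by one time step (one simulated year)."""
    if state == "healthy":
        return "diagnosed" if rng.random() < onset_probability(person) else "healthy"
    if state == "diagnosed":
        return "treated"  # assumption: treatment follows diagnosis at the next step
    return state  # 'treated' is an absorbing state in this sketch

def simulate_life(person, years, rng):
    """Build a synthetic record as a list of (year, event) entries for each state change."""
    state, record = "healthy", []
    for year in range(years):
        new_state = next_state(state, person, rng)
        if new_state != state:
            record.append((year, new_state))
        state = new_state
    return record

def generate_population(size, years, seed=0):
    """Create synthetic people with a random health attribute and simulate each life."""
    rng = random.Random(seed)
    population = []
    for person_id in range(size):
        person = {"id": person_id, "bmi": rng.uniform(18, 40)}
        population.append({"person": person, "events": simulate_life(person, years, rng)})
    return population

synthetic = generate_population(size=1000, years=50)
diagnosed = sum(1 for entry in synthetic if entry["events"])
print(f"{diagnosed} of {len(synthetic)} synthetic people were diagnosed")
```

Each state stands for a health related event, and the transition probability is recomputed from the synthetic person's attributes at every time step, which mirrors the structure of steps (2) and (3) in the summary.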
The claims supplied by the inventors are:
“1. A computerized method, comprising: creating, by a computing device, a state transition machine based upon data representing a clinical pathway that represents a sequence of health related events, wherein: a state of the state transition machine represents a health related event; and a probability of transitioning from a first state to a second state is based on a health attribute; generating a synthetic health data set comprised of a synthetic electronic health record (EHR) for each synthetic person in a synthetic population, wherein a select synthetic EHR for an associated synthetic person is generated by calculating health related events that follow corresponding states of the state transition machine for each time step in a synthetic life of that associated synthetic person, and wherein the health related events are determined at least in part by health attributes associated with that synthetic person at each time step; calculating, by the computing device, a similarity score by comparing the generated synthetic health data set to an actual health data set including an actual EHR; and tuning, by the computing device, at least one of the data representing the clinical pathway or the state transition machine, based on the computed similarity score indicating similarity falling below a threshold similarity, to create updated clinical pathways for use in the state transition machine.
“2. The computerized method of claim 1, wherein calculating, by the computing device, the similarity score, comprises performing a statistical comparison of the synthetic health data set and the actual health data set; further comprising: performing where the similarity score meets or exceeds the threshold similarity: indicating that the synthetic health data set is representative of the actual health data; and outputting a discovered model that can be used to generate additional synthetic health data.
“3. The computerized method of claim 1, wherein tuning, by the computing device, at least one of the data representing the clinical pathway or the state transition machine, based on the computed similarity score indicating similarity falling below a threshold similarity, comprises: determining probabilities based on the health related event from the actual health data set to infer a probability that a particular synthetic life will transition according to a corresponding clinical pathway; and selecting a distribution probability based on interpreting the data representing the clinical pathway from the inferred probability to create the updated clinical pathways for use in the state transition machine.
“4. The computerized method of claim 3 further comprising: representing at least one clinical pathway of the state transition machine in a probabilistic graphical model format that utilizes inference parameters as evidence variables.
“5. The computerized method of claim 4 further comprising: pruning probabilities below a predetermined value within the distribution of probabilities to prevent the graphical model from overfitting.
“6. The computerized method of claim 3, wherein determining probabilities comprises inferring a probability that a particular patient makes a single transition utilizing a probabilistic inference method on a Bayesian network to compute posterior distributions of variables given evidence variables.
“7. The computerized method of claim 1, wherein: creating, by the computing device, the state transition machine based upon data representing the clinical pathway comprises: extracting the clinical pathway from a knowledge model, where the knowledge model stores clinical pathways representing known sequences of health related events that a patient may experience for a given health condition; and tuning, by the computing device, at least one of the data representing the clinical pathway or the state transition machine comprises: adjusting the knowledge model.
“8. The computerized method of claim 1 further comprising: receiving, by the computing device, a data model constructed, at least in part, from protected health information (PHI) hosted by a corresponding health data provider, the received data model containing no PHI, wherein the data model includes the data representing the clinical pathway corresponding to a sequence of health related events; wherein: calculating, by the computing device, the similarity score further comprises deriving for the actual health data set, a health related event and associated time information for the health related event, from the PHI hosted by the health data provider, which is associated with the data model.
“9. The computerized method of claim 1, wherein calculating the similarity score comprises applying a predetermined weighting scheme to a plurality of univariate analysis of the synthetic health data set and the actual health data.
“10. The computerized method of claim 1, wherein creating, by the computing device, the state transition machine includes at least one of: computing the probability of transitioning from the first state to the second state based on a current calendar date of an associated synthetic person; or computing the probability of transitioning from the first state to the second state further based upon a probability that an associated synthetic person will transition from a current health related event to a different health related event.
“11. The computerized method of claim 1, wherein the state machine simulates one or more synthetic lives using at least one machine learning model selected from the group consisting of a neural network, a Bayesian network, a hidden Markov model, a Markov decision process, and a set/graph theory model.
“12. A computer-implemented system comprising: a processor coupled to memory, where the processor reads out instructions in the memory to implement a model discovery agent, the model discovery agent comprising: a state transition machine that is based upon data representing a clinical pathway that represents a sequence of health related events, wherein: a state of the state transition machine represents a health related event; and a probability of transitioning from a first state to a second state is based on a health attribute; and a model evaluator, the model evaluator programmed to: generate a synthetic health data set comprised of a synthetic electronic health record (EHR) for each synthetic person in a synthetic population, wherein a select synthetic EHR for an associated synthetic person is generated by calculating health related events that follow corresponding states of the state transition machine for each time step in a synthetic life of that associated synthetic person, and wherein the health related events are determined at least in part by health attributes associated with that synthetic person at each time step; calculate a similarity score by comparing the generated synthetic health data set to an actual health data set including an actual EHR; and tune at least one of the data model or the state transition machine, based on the computed similarity score indicating similarity falling below a threshold similarity, by: determining probabilities based on the health related event from the actual health data set to infer a probability that a particular synthetic life will transition according to a corresponding clinical pathway; and selecting a distribution probability based on interpreting the data model from the inferred probability to create updated clinical pathways for use in the state transition machine.
“13. A computerized method, comprising: repeatedly performing, until a similarity score indicating similarity satisfies a threshold similarity: creating, by a computing device, a state transition machine based upon data representing a clinical pathway that represents a sequence of health related events, wherein: a state of the state transition machine represents a health related event; and a probability of transitioning from a first state to a second state is based on a health attribute; generating a synthetic health data set comprised of a synthetic electronic health record (EHR) for each synthetic person in a synthetic population, wherein a select synthetic EHR for an associated synthetic person is generated by calculating health related events that follow corresponding states of the state transition machine for each time step in a synthetic life of that associated synthetic person, and wherein the health related events are determined at least in part by health attributes associated with that synthetic person at each time step; calculating, by the computing device, a new similarity score by comparing the generated synthetic health data set to an actual health data set including an actual EHR; and tuning, by the computing device, at least one of the data model or the state transition machine, based on the newly computed similarity score indicating similarity falling below the threshold similarity, to create updated clinical pathways for use in the state transition machine.
“14. The computerized method of claim 13, wherein tuning, by the computing device, at least one of the data representing the clinical pathway or the state transition machine, based on the computed similarity score indicating similarity falling below a threshold similarity, comprises: determining probabilities based on the health related event from the actual health data set to infer a probability that a particular synthetic life will transition according to a corresponding clinical pathway; and selecting a distribution probability based on interpreting the data representing the clinical pathway from the inferred probability to create the updated clinical pathways for use in the state transition machine.”
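Claim 9 describes a similarity score built from a predetermined weighting scheme applied to several univariate analyses. One way to picture this is the Python sketch below, which assumes categorical variables and uses one minus the total variation distance as the per-variable measure. The measure, the toy data, the weights, and the threshold are all assumptions made for illustration; the patent does not specify them here.

```python
from collections import Counter

def category_similarity(synthetic_values, actual_values):
    """1 minus the total variation distance between two empirical distributions."""
    synth, actual = Counter(synthetic_values), Counter(actual_values)
    categories = set(synth) | set(actual)
    distance = 0.5 * sum(
        abs(synth[c] / len(synthetic_values) - actual[c] / len(actual_values))
        for c in categories
    )
    return 1.0 - distance

def similarity_score(synthetic, actual, weights):
    """Weighted sum of per-variable similarities; weights are assumed to sum to 1."""
    return sum(
        weight * category_similarity(synthetic[name], actual[name])
        for name, weight in weights.items()
    )

# Hypothetical toy data: one list of observed values per variable.
actual = {"diagnosis": ["asthma", "asthma", "diabetes", "none"],
          "sex": ["F", "M", "F", "M"]}
synthetic = {"diagnosis": ["asthma", "diabetes", "diabetes", "none"],
             "sex": ["F", "F", "F", "M"]}
weights = {"diagnosis": 0.7, "sex": 0.3}  # hypothetical predetermined weights

THRESHOLD = 0.9  # hypothetical threshold similarity
score = similarity_score(synthetic, actual, weights)
print(f"similarity = {score:.3f}, needs tuning = {score < THRESHOLD}")
```

In this toy case both variables differ from the actual data by the same amount, so the weighted score falls below the assumed threshold and the model would be sent back for tuning.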
There are additional claims. Please visit the full patent to read further.
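The repeated generate, score, and tune cycle in claim 13 can be sketched in Python as follows. This is a simplified stand-in, not the patented method: the model is reduced to a single hypothetical onset probability, the similarity measure is one minus the gap between diagnosis rates, and the tuning step is a plain proportional update rather than the probabilistic graphical model inference described in the claims.

```python
import random

def generate(onset_probability, size, rng):
    """Synthetic data set: True where a synthetic person reaches the 'diagnosed' state."""
    return [rng.random() < onset_probability for _ in range(size)]

def similarity(synthetic, actual_rate):
    """1 minus the absolute gap between synthetic and actual diagnosis rates."""
    synthetic_rate = sum(synthetic) / len(synthetic)
    return 1.0 - abs(synthetic_rate - actual_rate)

def discover_model(actual_rate, threshold=0.99, size=20_000, max_rounds=50, seed=0):
    """Repeat generate/score/tune until the similarity score satisfies the threshold."""
    rng = random.Random(seed)
    onset_probability = 0.5  # deliberately poor starting model
    for round_number in range(1, max_rounds + 1):
        synthetic = generate(onset_probability, size, rng)
        score = similarity(synthetic, actual_rate)
        if score >= threshold:
            return onset_probability, score, round_number
        # Tuning step (assumed): move the parameter halfway toward closing the gap.
        synthetic_rate = sum(synthetic) / size
        onset_probability += 0.5 * (actual_rate - synthetic_rate)
        onset_probability = min(1.0, max(0.0, onset_probability))
    raise RuntimeError("similarity threshold not reached")

# Hypothetical actual diagnosis rate derived from real records.
probability, score, rounds = discover_model(actual_rate=0.08)
print(f"onset probability {probability:.3f}, similarity {score:.3f}, rounds {rounds}")
```

The loop stops as soon as the synthetic data is judged representative, at which point the tuned parameter plays the role of the "discovered model" that claim 2 says can be used to generate additional synthetic data.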
For the URL and additional information on this patent, see: Graham,
(Our reports deliver fact-based news of research and discoveries from around the world.)