“Training Machine Learning Algorithms With Temporally Variant Personal Data, And Applications Thereof” in Patent Application Approval Process (USPTO 20230409966): VEDA Data Solutions Inc.
2024 JAN 04 (NewsRx) -- By a
This patent application is assigned to
The following quote was obtained by the news editors from the background information supplied by the inventors:
“Field
“This field is generally related to processing information.
“Background
“As technology advances, an ever increasing amount of personal data is becoming digitized, and as a result, more and more personal data is becoming lawfully accessible. The increased accessibility of personal data has spawned new industries focused on lawfully mining personal data.
“A personal data record may include a number of properties. A data record representing an individual may include properties such as the name of the individual, his or her city, state, and ZIP code. In addition to demographic information, data records can include information about a person’s behavior. Data records from different sources may comprise different properties. Systems exist for collecting information describing characteristics or behavior of separate individuals. Collecting such personal information has many applications, including in national security, law enforcement, marketing, healthcare and insurance.
“In healthcare for example, a healthcare provider may have inconsistent personal information, such as address information, from a variety of data sources, including the national provider identifier registration,
“As records receive more updates from different sources, they also have a greater risk of inconsistency and errors associated with data entry. In these ways, data records all describing the same individual can be incongruous, inconsistent, and erroneous in their content. From these various sources, a single healthcare provider can have many addresses, perhaps as many as 200 addresses. The sources may disagree about what the right address is. Some healthcare providers have multiple correct addresses. For this reason, the fact that a provider may have a more recent address does not mean that older addresses are incorrect.
“Some health and dental insurance companies have staff tasked with manually calling healthcare providers in an effort to determine their correct address. However, this manual updating is expensive because a healthcare provider’s address information may change frequently. In addition to address information, similar issues are present with other demographic information relating to a healthcare provider, such as its phone number.
“In addition, fraudulent claims are enormous problems in healthcare. By some estimates, fraudulent claims may steal in excess of
“Data-directed algorithms, known as machine learning algorithms, are available to make predictions and conduct certain data analysis. Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that can be used for prediction and estimation.
“To develop such models, they first must be trained. Generally, the training involves inputting a set of parameters, called features, and known correct or incorrect values for the input features. After the model is trained, it may be applied to new features for which the appropriate solution is unknown. By applying the model in this way, the model predicts, or estimates, the solution for other cases that are unknown. These models may uncover hidden insights through learning from historical relationships and trends in the database. The quality of these machine learning models may depend on the quality and quantity of the underlying training data.
“Systems and methods are needed to improve identification and forecasting of the correct personal information, such as a healthcare provider’s demographic information and propensity for fraud, or a data source.”
In addition to the background information obtained for this patent application, NewsRx journalists also obtained the inventor’s summary information for this patent application: “In an embodiment, a computer-implemented method trains a machine learning algorithm with temporally variant personal data. At a plurality of times, a data source is monitored to determine whether data relating to a person has updated. When data for the person has been updated, the updated data is stored in a database such that the database includes a running log specifying how the person’s data has changed over time. The person’s data includes values for a plurality of properties relating to the person. An indication is received that a value for the particular property in the person’s data was verified as accurate or inaccurate at a particular time. From the database based on the particular time, the person’s data is retrieved, including values for the plurality of properties that were up-to-date at the particular time. Using the retrieved data and the indication, a model is trained such that the model can predict whether another person’s value for the particular property is accurate. In this way, having the retrieved data be current to the particular time maintains the retrieved data’s significance in training the model.
“System and computer program product embodiments are also disclosed.
“Further embodiments, features, and advantages of the invention, as well as the structure and operation of the various embodiments, are described in detail below with reference to accompanying drawings.
“The drawing in which an element first appears is typically indicated by the leftmost digit or digits in the corresponding reference number. In the drawings, like reference numbers may indicate identical or functionally similar elements.”
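The point-in-time retrieval described in the summary can be illustrated with a short sketch. This is not the applicant's implementation; it only shows one plausible way to keep a running log of property values and to read back the values that were current when a property was verified. The class name ChangeLog, the function training_example, and the sample person, properties, and dates are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


class ChangeLog:
    """Hypothetical running log of how each person's property values changed."""

    def __init__(self):
        # (person_id, property) -> parallel lists kept sorted by timestamp
        self._times = {}
        self._values = {}

    def record(self, person_id, prop, value, seen_at):
        """Store an updated value together with the time it was observed."""
        key = (person_id, prop)
        times = self._times.setdefault(key, [])
        values = self._values.setdefault(key, [])
        idx = bisect_right(times, seen_at)  # keep timestamp order
        times.insert(idx, seen_at)
        values.insert(idx, value)

    def as_of(self, person_id, props, when):
        """Return the value of each property that was up to date at `when`."""
        snapshot = {}
        for prop in props:
            key = (person_id, prop)
            times = self._times.get(key, [])
            idx = bisect_right(times, when)
            snapshot[prop] = self._values[key][idx - 1] if idx else None
        return snapshot


def training_example(log, person_id, props, verified_at, is_accurate):
    """Pair the snapshot at verification time with the verification result."""
    return log.as_of(person_id, props, verified_at), is_accurate


# Made-up data: the address changes after the verification date.
log = ChangeLog()
log.record("p1", "address", "12 Oak St", datetime(2022, 1, 5))
log.record("p1", "phone", "555-0100", datetime(2022, 2, 1))
log.record("p1", "address", "90 Elm Ave", datetime(2022, 6, 20))

features, label = training_example(
    log, "p1", ["address", "phone"], datetime(2022, 3, 15), True
)
print(features, label)
```

Because the snapshot is taken as of the verification date, the later address change does not leak into the training row, which is the property the summary says keeps the retrieved data significant for training.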
The claims supplied by the inventors are:
“1-22. (canceled)
“23. A computer-implemented method for linking ingested data, the method comprising: accessing, by one or more computing devices, data records of individuals; parsing, by the one or more computing devices, to locate each individual’s demographic information; assigning, by the one or more computing devices, the demographic information into predetermined categories; comparing, by the one or more computing devices, each record against all other categorized records using a pair-wise function to determine if they are the same; calculating, by the one or more computing devices, a similarity score for each pair of records, wherein the similarity score is a ratio based on how many categories match for each data pair of records; determining, by the one or more computing devices, whether the similarity score meets or exceeds a similarity score threshold; based on determining the similarity score meets or exceeds the similarity score threshold, linking, by the one or more computing devices, the data pair in a group; determining, by the one or more computing devices, a most prevalent identity within the group; and modifying, by the one or more computing devices, the data records to match the most prevalent identity within the group.
“24. The method of claim 23, further comprising normalizing, by the one or more computing devices, the data records to be consistent with a predetermined format.
“25. The method of claim 23, wherein the comparing is performed using regular expression matching or fuzzy matching.
“26. The method of claim 25, wherein the regular expression matching determines whether two values match when they both satisfy the same regular expression.
“27. The method of claim 25, wherein the fuzzy matching determines whether two values match when two strings match a pattern approximately.
“28. The method of claim 23, further comprising: assigning, by the one or more computing devices, a weight to each of the predetermined categories; and wherein the similarity score is determined based on the weight for each of the predetermined categories.
“29. The method of claim 23, wherein modifying the data records to match the most prevalent identity within the group comprises standardizing, by the one or more computing devices, each individual’s demographic information within the group to match the most prevalent identity within the group.
“30. A non-transitory computer readable medium having instructions stored thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations for linking ingested data, the operations comprising: accessing, by one or more computing devices, data records of individuals; parsing, by the one or more computing devices, to locate each individual’s demographic information; assigning, by the one or more computing devices, the demographic information into predetermined categories; comparing, by the one or more computing devices, each record against all other categorized records using a pair-wise function to determine if they are the same; calculating, by the one or more computing devices, a similarity score for each pair of records, wherein the similarity score is a ratio based on how many categories match for each data pair of records; determining, by the one or more computing devices, whether the similarity score meets or exceeds a similarity score threshold; based on determining the similarity score meets or exceeds the similarity score threshold, linking, by the one or more computing devices, the data pair in a group; determining, by the one or more computing devices, a most prevalent identity within the group; and modifying, by the one or more computing devices, the data records to match the most prevalent identity within the group.
“31. The non-transitory computer readable medium of claim 30, wherein the operations further comprise normalizing, by the one or more computing devices, the data records to be consistent with a predetermined format.
“32. The non-transitory computer readable medium of claim 30, wherein the comparing is performed using regular expression matching or fuzzy matching.
“33. The non-transitory computer readable medium of claim 32, wherein the regular expression matching determines whether two values match when they both satisfy the same regular expression.
“34. The non-transitory computer readable medium of claim 32, wherein the fuzzy matching determines whether two values match when two strings match a pattern approximately.
“35. The non-transitory computer readable medium of claim 30, wherein the operations further comprise: assigning, by the one or more computing devices, a weight to each of the predetermined categories; and wherein the similarity score is determined based on the weight for each of the predetermined categories.
“36. The non-transitory computer readable medium of claim 30, wherein modifying the data records to match the most prevalent identity within the group comprises standardizing, by the one or more computing devices, each individual’s demographic information within the group to match the most prevalent identity within the group.
“37. A system for linking ingested data comprising: a memory storing instructions; a computing device, coupled to the memory, configured to process the stored instructions to: access data records of individuals; parse to locate each individual’s demographic information; assign the demographic information into predetermined categories; compare each record against all other categorized records using a pair-wise function to determine if they are the same; calculate a similarity score for each pair of records, wherein the similarity score is a ratio based on how many categories match for each data pair of records; determine whether the similarity score meets or exceeds a similarity score threshold; based on determining the similarity score meets or exceeds the similarity score threshold, link the data pair in a group; determine a most prevalent identity within the group; and modify the data records to match the most prevalent identity within the group.
“38. The system of claim 37, wherein the computing device is further configured to normalize the data records to be consistent with a predetermined format.
“39. The system of claim 37, wherein the comparing is performed using regular expression matching or fuzzy matching.
“40. The system of claim 39, wherein the regular expression matching determines whether two values match when they both satisfy the same regular expression.
“41. The system of claim 39, wherein the fuzzy matching determines whether two values match when two strings match a pattern approximately.
“42. The system of claim 37, wherein the computing device is further configured to: assign a weight to each of the predetermined categories, wherein the similarity score is determined based on the weight for each of the predetermined categories; and modify the data records to match the most prevalent identity within the group comprises standardizing each individual’s demographic information within the group to match the most prevalent identity within the group.”
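The linking steps recited in the independent claims (categorize, compare pair-wise, score, threshold, group, standardize) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the category names, the 0.75 threshold, the use of union-find for grouping, the use of difflib for fuzzy matching, and the reading of "most prevalent identity" as the most common value per category are all assumptions made for the example.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

CATEGORIES = ("name", "city", "state", "zip")  # hypothetical categories


def normalize(record):
    """Lowercase and collapse whitespace so records share one format."""
    return {c: " ".join(str(record.get(c, "")).lower().split()) for c in CATEGORIES}


def exact_match(a, b):
    return a == b


def fuzzy_match(a, b, min_ratio=0.85):
    """Approximate string match; the 0.85 cutoff is an arbitrary assumption."""
    return SequenceMatcher(None, a, b).ratio() >= min_ratio


def similarity(a, b, weights=None, match=exact_match):
    """Weighted ratio of matching categories, between 0 and 1."""
    weights = weights or {c: 1.0 for c in CATEGORIES}
    matched = sum(
        weights[c] for c in CATEGORIES if a[c] and b[c] and match(a[c], b[c])
    )
    return matched / sum(weights.values())


def link(records, threshold=0.75, weights=None, match=exact_match):
    """Group record indices whose pair-wise score meets the threshold."""
    norm = [normalize(r) for r in records]
    parent = list(range(len(records)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(norm[i], norm[j], weights, match) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return norm, list(groups.values())


def standardize(norm, groups):
    """Set every record in a group to the group's most common value per category."""
    out = [dict(r) for r in norm]
    for members in groups:
        for c in CATEGORIES:
            top = Counter(norm[i][c] for i in members).most_common(1)[0][0]
            for i in members:
                out[i][c] = top
    return out


# Made-up records: the first three describe the same provider.
records = [
    {"name": "Dr. Ann Lee", "city": "Austin", "state": "TX", "zip": "78701"},
    {"name": "dr. ann  lee", "city": "Austin", "state": "TX", "zip": "78701"},
    {"name": "Dr. Ann Lee", "city": "Austin", "state": "TX", "zip": "78702"},
    {"name": "Bob Ray", "city": "Denver", "state": "CO", "zip": "80202"},
]
norm, groups = link(records)
clean = standardize(norm, groups)
print(groups)
print(clean[2])
```

Passing match=fuzzy_match, or a weights dictionary that favors some categories, corresponds loosely to the dependent claims on fuzzy matching and weighted categories. Comparing every pair is quadratic in the number of records, so a sketch like this suits small batches only.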
For the URL and more information on this patent application, see: LINDER,
(Our reports deliver fact-based news of research and discoveries from around the world.)