Patent Issued for Systems, devices, and methods for data analytics (USPTO 11669538): Massachusetts Mutual Life Insurance Company
2023 JUN 23 (NewsRx) -- By a
Patent number 11669538 is assigned to
The following quote was obtained by the news editors from the background information supplied by the inventors: “A data file (e.g., spreadsheet, portable document format (PDF) file, text file, delimited file, flat file) can store a numerical value in a machine-readable format. As such, the numerical value can be copied based on the machine-readable format and then inserted into a data process (e.g., business intelligence algorithm, analytic algorithm) for generating a result based on the numerical value.
“If the numerical value is updated in the data file and the data process is not updated based on the numerical value as updated, then the result may be stale. Although the data process can be programmed to monitor the numerical value in the data file in real-time, this form of monitoring becomes logically complex and computationally intensive as the data process is scaled up in terms of data dimensionality, data volume, data types, data content, data source quantity, and data source speed. For example, if the numerical value conflicts in format, type, or content with other data used by the data process, then, as the data process is scaled up, such conflict can be logically complex and computationally intensive to deal with, especially while trying to remain compliant with atomicity, consistency, isolation, and durability (ACID) principles. Likewise, if the data process is distributed over a network, then such configuration even further complicates data availability, data conflict management, and compliance with the ACID principles. Similarly, if the numerical value is hardcoded into the data process in order to avoid usage of the data file, then modifying the numerical value requires a set of specialized computer programming knowledge, which is often not readily available. Moreover, if the numerical value is hardcoded into the data process in a plurality of locations within the data process or if the numerical value depends on some source code internal or external to the data process or if some source code internal or external to the data process depends on the numerical value, then such configuration even further complicates data availability, data conflict management, and compliance with the ACID principles.”
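The stale-result problem described in the background can be illustrated with a minimal Python sketch. It is not taken from the patent: the one-column file layout, the "rate" header, and the function names are all assumptions made for illustration. The sketch caches a result alongside a content hash of the data file, so a later change to the file can be detected.

```python
import csv
import hashlib
import io


def read_rate(csv_text: str) -> float:
    """Read the numerical value from a delimited file (hypothetical one-column layout)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return float(rows[0]["rate"])


def fingerprint(csv_text: str) -> str:
    """Content hash used to notice that the data file has changed."""
    return hashlib.sha256(csv_text.encode("utf-8")).hexdigest()


def compute_result(principal: float, rate: float) -> float:
    """Stand-in for the data process: any calculation that depends on the value."""
    return round(principal * rate, 2)


def is_stale(cache: dict, current_text: str) -> bool:
    """The cached result is stale when the file content no longer matches."""
    return cache["fingerprint"] != fingerprint(current_text)


# Illustrative file contents before and after the value is updated.
original = "rate\n0.05\n"
updated = "rate\n0.07\n"

cached = {
    "fingerprint": fingerprint(original),
    "result": compute_result(1000.0, read_rate(original)),
}

if is_stale(cached, updated):
    # Recompute from the file instead of relying on a hardcoded value.
    cached = {
        "fingerprint": fingerprint(updated),
        "result": compute_result(1000.0, read_rate(updated)),
    }
```

Because the value is read from the file rather than hardcoded, changing it requires editing only the file. The hash comparison is one simple way to detect the change and is an assumption here, not the monitoring approach the inventors describe.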
In addition to the background information obtained for this patent, NewsRx journalists also obtained the inventors’ summary information for this patent: “Various systems and methods use a value in a data file for a data process, as the data process is scaled up in terms of dataset dimensionality, data volume, data types, data content, data source quantity, and data source speed, while remaining compliant with the ACID principles. As such, these technologies provide for sourcing of data from various data sources, where the data includes the data file storing the value. The data is cleansed and fused, which enables a report to be generated. Subsequently, in response to the value in the data file being modified, the data, inclusive of the data file storing the value, is again cleansed and fused based on the value being modified. This processing cascades and in-turn enables the report to be modified based on the value being modified, which can be in real-time. For example, because the value in the data file can be manually modified by a non-programmer, without being manually hardcoded into a software application by a programmer in various source code locations of the software application, the value in the data file can be manually modified when a change in the value is desired (e.g., based on external factors). Therefore, this form of modification can be less technical for non-programmers and more user friendly for non-programmers.
“In an embodiment, a method comprises: sending, by a processor, a multidimensional dataset from a massively parallel processing (MPP) database of a first cloud computing platform (CCP) to an extract, transform, and load (ETL) application of a second CCP; sending, by the processor, a first set of datasets and a second set of datasets to the ETL application, wherein the first set of datasets is different from each other in format and content and is sourced from a set of data files external to the first CCP and the second CCP, wherein the set of data files includes a spreadsheet with a cell value, wherein the second set of datasets is sourced from a set of schematically different databases external to the first CCP and the second CCP; performing, by the processor, a data cleanse on the multidimensional dataset, the first set of datasets inclusive of the cell value, and the second set of datasets within the ETL application such that a cleansed dataset is generated within the ETL application; performing, by the processor, a data fusion on the cleansed dataset within the ETL application such that a fused dataset is generated within the ETL application; populating, by the processor, a MPP column-oriented database with the fused dataset sourced from the ETL application; generating, by the processor, a report based on the fused dataset sourced from the MPP column-oriented database; modifying, by the processor, the cell value in the spreadsheet after the report is generated without modifying the first set of datasets and the second set of datasets; updating, by the processor, the fused dataset in the MPP column-oriented database in real-time based on the cell value being modified; and updating, by the processor, the report in real-time based on the fused dataset in the MPP column-oriented database being updated.
“In an embodiment, a system comprises: a processor programmed to: send a multidimensional dataset from a massively parallel processing (MPP) database of a first cloud computing platform (CCP) to an extract, transform, and load (ETL) application of a second CCP; send a first set of datasets and a second set of datasets to the ETL application, wherein the first set of datasets is different from each other in format and content and is sourced from a set of data files external to the first CCP and the second CCP, wherein the set of data files includes a spreadsheet with a cell value, wherein the second set of datasets is sourced from a set of schematically different databases external to the first CCP and the second CCP; perform a data cleanse on the multidimensional dataset, the first set of datasets inclusive of the cell value, and the second set of datasets within the ETL application such that a cleansed dataset is generated within the ETL application; perform a data fusion on the cleansed dataset within the ETL application such that a fused dataset is generated within the ETL application; populate a MPP column-oriented database with the fused dataset sourced from the ETL application; generate a report based on the fused dataset sourced from the MPP column-oriented database; modify the cell value in the spreadsheet after the report is generated without modifying the first set of datasets and the second set of datasets; update the fused dataset in the MPP column-oriented database in real-time based on the cell value being modified; and update the report in real-time based on the fused dataset in the MPP column-oriented database being updated.”
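The sequence in the summary (cleanse, fuse, report, then update after a cell value changes) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the patented implementation. The field names ("policy_id", "premium", "region"), the cell address "B2", and the report formula are hypothetical. The spreadsheet is modeled as a dictionary. The MPP databases, cloud platforms, and ETL application are not modeled at all.

```python
from typing import Dict, List

Row = Dict[str, object]


def cleanse(rows: List[Row]) -> List[Row]:
    """Drop rows with a missing key and normalize the key's format."""
    cleaned = []
    for row in rows:
        key = row.get("policy_id")
        if key is None:
            continue
        cleaned.append({**row, "policy_id": str(key).strip().upper()})
    return cleaned


def fuse(*sources: List[Row]) -> Dict[str, Row]:
    """Merge rows that share a policy_id into a single record."""
    fused: Dict[str, Row] = {}
    for source in sources:
        for row in source:
            fused.setdefault(str(row["policy_id"]), {}).update(row)
    return fused


def build_report(fused: Dict[str, Row], rate: float) -> Dict[str, float]:
    """Hypothetical report: total premium projected forward by the cell value."""
    total = sum(float(r.get("premium", 0)) for r in fused.values())
    return {"policies": len(fused), "projected_total": round(total * (1 + rate), 2)}


def run_pipeline(warehouse, files, databases, spreadsheet) -> Dict[str, float]:
    """Cleanse every source, fuse the results, then build the report."""
    cleansed = [cleanse(rows) for rows in (warehouse, files, databases)]
    return build_report(fuse(*cleansed), float(spreadsheet["B2"]))


# Illustrative inputs standing in for the three kinds of sources.
warehouse = [{"policy_id": " p1 ", "region": "NE"}, {"policy_id": None, "region": "??"}]
files = [{"policy_id": "P1", "premium": "100.0"}, {"policy_id": "p2", "premium": "200.0"}]
databases = [{"policy_id": "P2", "region": "SW"}]
spreadsheet = {"B2": "0.10"}  # the cell value a non-programmer can edit

first_report = run_pipeline(warehouse, files, databases, spreadsheet)

# Only the cell value changes; the other sources are left untouched.
spreadsheet["B2"] = "0.20"
updated_report = run_pipeline(warehouse, files, databases, spreadsheet)
```

The sketch simply re-runs the whole pipeline when the cell changes. The real-time, cascading update described by the inventors would need change detection and incremental processing that are outside the scope of this example.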
The claims supplied by the inventors are:
“1. A method comprising: performing, by a processor, a data cleanse on a first set of datasets, a second set of datasets, and a dataset such that a cleansed dataset is generated within an extract, transform, and load (ETL) application, wherein the first set of datasets is sourced from a set of data files, and wherein the first set of datasets and the second set of datasets are from different databases; performing, by the processor, a data fusion on the cleansed dataset within the ETL application such that a fused dataset is generated within the ETL application, the fused dataset integrating the first set of datasets, the second set of datasets, and the dataset; generating, by the processor, a report based on the fused dataset; modifying, by the processor, a cell value in the set of data files after the report is generated without modifying the first set of datasets and the second set of datasets; updating, by the processor, the fused dataset based on the modified cell value; and updating, by the processor, the report in real-time based on the updated fused dataset.
“2. The method of claim 1, further comprising: populating, by the processor, a column oriented database with the fused dataset sourced from the ETL application, wherein the report is generated, by the processor, based on the fused dataset sourced from the column oriented database.
“3. The method of claim 1, wherein performing, by the processor, the data cleanse on the first set of datasets, the second set of datasets, and the dataset includes: modifying, by the processor, an inaccurate data from the first set of datasets, the second set of datasets, and the dataset.
“4. The method of claim 1, wherein performing, by the processor, the data cleanse on the first set of datasets, the second set of datasets, and the dataset includes: removing, by the processor, an inaccurate data from the first set of datasets, the second set of datasets, and the dataset.
“5. The method of claim 1, wherein performing, by the processor, the data fusion on the cleansed dataset within the ETL application includes: resolving, by the processor, a semantic conflict in the cleansed dataset.
“6. The method of claim 1, wherein the dataset is a multidimensional dataset from a massively parallel processing (MPP) database of a first cloud computing platform (CCP), wherein the MPP database stores a set of records populated from a set of data sources external to the MPP database, wherein the set of records and the set of data sources are not associated with the first CCP, wherein the multidimensional dataset is generated from the set of records within the first CCP.
“7. The method of claim 6, wherein the multidimensional dataset is not associated with the first CCP and a second CCP associated with the ETL application.
“8. The method of claim 7, wherein the first set of datasets is not associated with the first CCP and the second CCP.
“9. The method of claim 7, wherein the second set of datasets is not associated with the first CCP and the second CCP.
“10. The method of claim 1, wherein the fused dataset is updated in a massively parallel processing (MPP) column-oriented database in real-time based on the data cleanse being performed on the dataset, the first set of datasets inclusive of the cell value as modified, and the second set of datasets within the ETL application after the report is generated and in real-time response to the cell value being modified after the report is generated.
“11. The method of claim 10, wherein the fused dataset is updated in the MPP column-oriented database in real-time based on the data fusion being performed on the dataset, the first set of datasets inclusive of the cell value as modified, and the second set of datasets within the ETL application after the report is generated and in real-time response to the data cleanse being performed after the report is generated.
“12. A system comprising: a processor configured to: perform a data cleanse on a first set of datasets, a second set of datasets, and a dataset such that a cleansed dataset is generated within an extract, transform, and load (ETL) application, wherein the first set of datasets is sourced from a set of data files, and wherein the first set of datasets and the second set of datasets are from different databases, perform a data fusion on the cleansed dataset within the ETL application such that a fused dataset is generated within the ETL application, the fused dataset integrating the first set of datasets, the second set of datasets, and the dataset, generate a report based on the fused dataset, modify a cell value in the set of data files after the report is generated without modifying the first set of datasets and the second set of datasets, update the fused dataset based on the modified cell value, and update the report in real-time based on the updated fused dataset.
“13. The system of claim 12, wherein the processor is configured to: populate a column oriented database with the fused dataset sourced from the ETL application, wherein the processor is configured to generate the report, based on the fused dataset sourced from the column oriented database.
“14. The system of claim 12, wherein the processor is configured to perform the data cleanse on the first set of datasets, the second set of datasets, and the dataset by at least one of: modifying an inaccurate data from the first set of datasets, the second set of datasets, and the dataset or removing an inaccurate data from the first set of datasets, the second set of datasets, and the dataset.
“15. The system of claim 12, wherein the processor is configured to perform the data fusion on the cleansed dataset within the ETL application by: resolving a semantic conflict in the cleansed dataset.
“16. The system of claim 12, wherein the dataset is a multidimensional dataset from a massively parallel processing (MPP) database of a first cloud computing platform (CCP), wherein the MPP database stores a set of records populated from a set of data sources external to the MPP database, wherein the set of records and the set of data sources are not associated with the first CCP, wherein the multidimensional dataset is generated from the set of records within the first CCP.
“17. The system of claim 16, wherein the multidimensional dataset is not associated with the first CCP and a second CCP associated with the ETL application.
“18. The system of claim 17, wherein the first set of datasets is not associated with the first CCP and the second CCP, and wherein the second set of datasets is not associated with the first CCP and the second CCP.
“19. The system of claim 12, wherein the fused dataset is updated in a massively parallel processing (MPP) column-oriented database in real-time based on the data cleanse being performed on the dataset, the first set of datasets inclusive of the cell value as modified, and the second set of datasets within the ETL application after the report is generated and in real-time response to the cell value being modified after the report is generated.
“20. The system of claim 19, wherein the fused dataset is updated in the MPP column-oriented database in real-time based on the data fusion being performed on the dataset, the first set of datasets inclusive of the cell value as modified, and the second set of datasets within the ETL application after the report is generated and in real-time response to the data cleanse being performed after the report is generated.”
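The cleanse and fusion variants named in claims 3 through 5 (modifying inaccurate data, removing inaccurate data, and resolving a semantic conflict) can be illustrated with a short Python sketch. The claims do not specify any rules, so everything here is hypothetical: the field names, the age threshold, and the choice to treat monthly and annual premium fields as the semantic conflict.

```python
from typing import Dict, List

Row = Dict[str, object]

MAX_PLAUSIBLE_AGE = 120  # hypothetical threshold for an unrepairable value


def modify_inaccurate(rows: List[Row]) -> List[Row]:
    """Repair values that can be corrected (here: malformed state codes)."""
    return [{**r, "state": str(r["state"]).strip().upper()} for r in rows]


def remove_inaccurate(rows: List[Row]) -> List[Row]:
    """Drop rows whose values cannot be repaired (here: implausible ages)."""
    return [r for r in rows if 0 <= int(r["age"]) <= MAX_PLAUSIBLE_AGE]


def resolve_semantic_conflict(row: Row) -> Row:
    """Express monthly and annual premium fields as one annual field."""
    resolved = {
        k: v for k, v in row.items()
        if k not in ("monthly_premium", "annual_premium")
    }
    if "annual_premium" in row:
        resolved["annual_premium"] = float(row["annual_premium"])
    elif "monthly_premium" in row:
        resolved["annual_premium"] = float(row["monthly_premium"]) * 12
    return resolved


# Illustrative rows from sources that disagree on format and meaning.
rows = [
    {"id": 1, "state": " ma ", "age": 41, "monthly_premium": 100},
    {"id": 2, "state": "CT", "age": 410, "annual_premium": 900},
    {"id": 3, "state": "ny", "age": 35, "annual_premium": 1500},
]

cleansed = remove_inaccurate(modify_inaccurate(rows))
fused = [resolve_semantic_conflict(r) for r in cleansed]
```

In this sketch the second row is removed because its age cannot be repaired, while the first row's monthly premium is converted so that both remaining rows carry the same annual field.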
For more information on this patent, see: Sommers, Timothy. Systems, devices, and methods for data analytics.