Patent Issued for System, Method And Computer Program For Data Scraping Using Script Engine (USPTO 10,635,488)
2020 MAY 07 (NewsRx) -- By a
The patent’s inventors are Jeong, Jae Won (Gyeonggi-do, KR); Back,
This patent was filed on
From the background information supplied by the inventors, news correspondents obtained the following quote: “In general, screen scraping is implemented by a software that extracts only necessary data from data displayed on the screen, and is technology that automatically accesses a system, displays data on the screen, and extracts and fetches only necessary data. Because it extracts information from a web site and stores it in another site or database, it is also called web scraping. Because of storing data, it is possible to see at any time when needed, and the stored data may be processed for the purpose of use as comparison analysis data. In particular, it is an essential program for Internet banking and is being actively run by each financial institution, and can be used at any place where users can obtain information by clicking, for example, reward programs such as mileages of hotels, airline companies, rent cars and oil stations, e-mail integration check, news, chat, weather, etc.
“The screen scraping technology has been widespread in
“The screen scraping technology is largely classified into client side and server side, and is highly useful in account integration services, or personal financial management or business financial management programs, but client side screen scraping technology overwhelmingly predominates over server side due to the domestic security policy requiring the end-to-end policy enforcement.
“FIG. 1 is an architecture diagram of a conventional account integration service system using client side screen scraping.
“As shown in FIG. 1, when a user 10 accesses web service programs 41 registered in first to n.sup.th institutions 40-1 to 40-n through an account integration service program 20 via an Internet network 30, the conventional account integration service system performs a service according to communication security policies of the first to n.sup.th institutions 40-1 to 40-n. In this instance, the screen scraping is performed using the account integration service program 20.
“As shown in FIG. 1, upon screen scraping of a finance related web service, the conventional account integration service system using client side screen scraping is configured to execute a plurality of screen scraping tasks in a sequential order and receive the results due to service stability or technical limitations. For example, upon scraping to see transaction details of many bank accounts, there is a problem with serious performance degradation.
“To solve this problem, technology for parallel screen scraping by a plurality of scraping machines has been developed. However, conventionally, for scraping machines to work in different operating systems, it is necessary to separately develop scraping modules suitable for each operating system to conform to the security policies required by the financial institutions. For example, there is a need to develop each separate scraping modules for Windows operating system based PCs as well as Linux or OS X based computers, or mobile operating systems such as iOS and android, and as the type of users’ devices and operating systems becomes varied, the scale and cost of equipment used for development exponentially increase.”
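The contrast the inventors draw between sequential and parallel scraping can be illustrated with a minimal Python sketch. It is not taken from the patent: `scrape_account` is a hypothetical stand-in that only simulates latency, and the thread pool is just one possible way to run several scraping tasks at once.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def scrape_account(account_id: str) -> dict:
    """Hypothetical stand-in for one screen-scraping task (no real network access)."""
    time.sleep(0.05)  # simulated round trip to an institution
    return {"account": account_id, "transactions": []}


def scrape_sequential(account_ids):
    """Run the tasks one after another, as in the conventional system."""
    return [scrape_account(a) for a in account_ids]


def scrape_parallel(account_ids, workers=4):
    """Run the same tasks on a pool of workers; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scrape_account, account_ids))


accounts = [f"acct-{i}" for i in range(8)]
sequential_results = scrape_sequential(accounts)
parallel_results = scrape_parallel(accounts)
```

Both functions return the same data; the parallel version simply overlaps the waiting time of the individual tasks, which is where the sequential approach tends to lose performance as the number of accounts grows.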
Supplementing the background information on this patent, NewsRx reporters also obtained the inventors’ summary information for this patent: “According to an aspect of the present disclosure, there is provided a system, method and computer program for data scraping, in which a script engine in which environment information of scraping is stored allows the processing of scraping from various operating systems in one scraping module, and generates a communication session conforming to a communication method of a scraping target institution and transmits an authentication value obtained by processing user information according to an authentication method of the target institution, thereby enabling operation in various platforms and collecting scraping information from various institutions without constraints by the operating system.
“A data scraping system according to an embodiment includes a transmitting/receiving unit having a service program that calls inquiry or execution requiring scraping and configured to receive a scraping request including user information for scraping from a user device and transmit scraped data to the user device, and at least one data scraping information collection unit configured to scrape data from at least one institution using the user information received in the transmitting/receiving unit.
“Each of the at least one data scraping information collection unit includes a scraping engine unit in platform independent script, the scraping engine unit configured to store identification information of the scraping target institution and environment information including authentication information and communication information corresponding to the scraping target institution, and scrape data from the institution based on the environment information.
“In an embodiment, the scraping engine unit includes a scraping environment management unit configured to store the environment information, a session management unit configured to generate a communication session between the institution and the scraping engine unit based on the communication information, a communication management unit configured to process the user information based on the authentication information, and a script engine configured to transmit an authentication value obtained by processing the user information to the institution and scrape data from the institution.
“In an embodiment, the user information includes a user’s biometric authentication information.
“In an embodiment, each of the at least one data scraping information collection unit further includes a meta database to designate a data item to be scraped. In this instance, the scraping engine unit is further configured to extract data to scrape based on the meta database from a data set provided by the institution.
“In an embodiment, each of the at least one data scraping information collection unit further includes a task management unit to allocate the user information for scraping and a task based on the user information to the scraping engine unit using an internal scheduling algorithm.
“In an embodiment, each of the at least one data scraping information collection unit further includes a platform management unit to monitor if the task management unit normally operates, when an error occurs, execute the task management unit again, and store, in the meta database, identification information for identifying a location in which the data item to be scraped is positioned in the data set.
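As an editorial illustration (not part of the inventors' summary), the meta database and restart behavior described above might look roughly like the following Python sketch. The names `META_DB`, `extract_items` and `run_with_restart`, the path-based location format and the sample data set are all assumptions made for this example.

```python
# Hypothetical meta database: maps each data item to be scraped to the
# location (a key/index path) where it sits in the institution's data set.
META_DB = {
    "account_balance": ["account", "balance"],
    "last_transaction": ["account", "transactions", -1],
}


def extract_items(data_set, meta_db):
    """Pick out only the designated items from the full data set."""
    result = {}
    for item, path in meta_db.items():
        node = data_set
        for key in path:
            node = node[key]  # walk one step down the stored location
        result[item] = node
    return result


def run_with_restart(task, max_restarts=2):
    """Execute a task again when an error occurs, up to max_restarts times."""
    attempts = 0
    while True:
        try:
            return task()
        except Exception:
            attempts += 1
            if attempts > max_restarts:
                raise


# Made-up data set standing in for what an institution might return.
sample = {
    "account": {
        "owner": "example user",
        "balance": 1200,
        "transactions": [{"amount": -30}, {"amount": 75}],
    }
}
extracted = run_with_restart(lambda: extract_items(sample, META_DB))
```

Keeping the item locations in a separate table means a change in the institution's data layout would only require updating the table, not the extraction code.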
“In an embodiment, each of the at least one data scraping information collection unit further includes a verification unit configured to verify the scraping request by inspecting if the service program of the transmitting/receiving unit and the scraping engine unit are cross-authenticated.
“In an embodiment, each of the at least one data scraping information collection unit further includes a certificate distribution unit configured to store one or multiple users’ certificates that are cross-authenticated with the data scraping information collection unit, and install a necessary certificate in the scraping engine unit based on a scraping task allocated to the data scraping information collection unit.
“In an embodiment, each of the at least one data scraping information collection unit further includes a module update unit configured to update a certificate installed in the scraping engine unit using a certificate received from the user device, when the certificate installed in the user device and the certificate installed in the scraping engine unit are different.
“In an embodiment, each of the at least one data scraping information collection unit further includes a scraping error modification unit to, when an error occurs while the user device directly performs scraping, analyze an error log from scraping input information inputted in the user device at the time of error occurrence, and update a scraping module in the user device based on the analyzed error log.
“A data scraping method according to an embodiment includes receiving, by a transmitting/receiving unit of a data scraping system having a service program that calls inquiry or execution requiring scraping, a scraping request including information of inquiry or execution called by a user and user information from a user device, allocating, by the data scraping system, the user information and a task based on the user information to a scraping engine unit of the data scraping system, wherein the scraping engine unit is in platform independent script and stores identification information of the scraping target institution and environment information including authentication information and communication information corresponding to the scraping target institution, scraping, by the scraping engine unit, data from the institution based on the environment information, and transmitting, by the transmitting/receiving unit, the scraped data to the user device.
“In an embodiment, the scraping of data from the institution includes generating, by the scraping engine unit, a communication session between the institution and the scraping engine unit based on the communication information, processing, by the scraping engine unit, the user information based on the authentication information, and transmitting, by the scraping engine unit, an authentication value obtained by processing the user information to the institution and scraping data from the institution.
“The data scraping method according to an embodiment further includes, before the scraping of data from the institution, storing, by the data scraping system, a data item to be scraped in a meta database. In this instance, the scraping of data from the institution includes extracting data to scrape based on the meta database from a data set provided by the institution.
“The data scraping method according to an embodiment further includes, before the scraping of data from the institution, allocating, by a task management unit of the data scraping system, the user information and the task to the scraping engine unit using an internal scheduling algorithm.
“The data scraping method according to an embodiment further includes monitoring, by the data scraping system, if the task management unit normally operates and when an error occurs, executing the task management unit again, and storing, by the data scraping system, identification information in the meta database, the identification information for identifying a location in which the data item to be scraped is positioned in the data set.
“The data scraping method according to an embodiment further includes, before the scraping of data from the institution, verifying, by the data scraping system, the scraping request by inspecting if the service program of the transmitting/receiving unit and the scraping engine unit are cross-authenticated.
“In an embodiment, the data scraping system stores one or multiple users’ certificates that are authenticated with the data scraping system. In this instance, the data scraping method according to an embodiment further includes, before the scraping of data from the institution, installing, by the data scraping system, a necessary certificate in the scraping engine unit based on the scraping task allocated to the scraping engine unit.
“The data scraping method according to an embodiment further includes, before the scraping of data from the institution, updating, by the data scraping system, a certificate installed in the scraping engine unit using a certificate received from the user device, when the certificate installed in the user device and the certificate installed in the scraping engine unit are different.
“The data scraping method according to an embodiment further includes, when an error occurs while the user device directly performs scraping, analyzing, by the data scraping system, an error log from scraping input information inputted in the user device at the time of error occurrence, and updating, by the data scraping system, a scraping module in the user device based on the analyzed error log.
“A computer program according to an embodiment may be stored in a computer-readable medium to perform the data scraping method according to the above-described embodiments in combination with hardware.
“According to the system and method for data scraping in accordance with an aspect of the present disclosure, the script engine in which environment information of scraping is stored is mounted in the scraping module, allowing the processing of scraping from various operating systems in one scraping module without any need to separately develop scraping modules for each operating system, thereby significantly reducing costs and efforts incurred in developing scraping systems in keeping up with the current trend towards the diversity of devices and operating systems used.
“According to the system and method for data scraping in accordance with an aspect of the present disclosure, the script engine generates a communication session with financial institution such as banks, stock brokerages and card companies, public organization, or any other institution that provides property information in conformity with a communication method required by the corresponding institution, and transmits an authentication value obtained by processing user information according to an authentication method of the target institution, in order to scrape desired data, for example, financial information such as account balances, account transaction details, card acceptance details, card statements, card limits, stock balances and insurance details, or public information such as
The claims supplied by the inventors are:
“What is claimed is:
“1. A data scraping system, comprising: a transmitting/receiving unit having a service program that calls inquiry or execution requiring scraping, and configured to receive a scraping request including user information for scraping from a user device and transmit scraped data to the user device; and at least one data scraping information collection unit configured to scrape data from at least one institution using the user information received in the transmitting/receiving unit, wherein each of the at least one data scraping information collection unit comprises a scraping engine unit in platform independent script, the scraping engine unit configured to store identification information of the scraping target institution and environment information including authentication information and communication information corresponding to the scraping target institution, and scrape data from the institution based on the environment information, wherein the communication information comprises at least one of a type of communication encryption protocol or session maintenance time, wherein the scraping engine unit comprises: a scraping environment management unit configured to store the environment information; a session management unit configured to generate a communication session between the institution and the scraping engine unit based on the communication information; a communication management unit configured to process the user information based on the authentication information; and a script engine configured to transmit an authentication value obtained by processing the user information to the institution, and scrape data from the institution, wherein each of the at least one data scraping information collection unit further comprises a meta database to designate a data item to be scraped, and the scraping engine unit is further configured to extract data to scrape based on the meta database from a data set provided by the institution, wherein each of the at least one data scraping information collection unit further comprises a task management unit to allocate the user information for scraping and a task based on the user information to the scraping engine unit using an internal scheduling algorithm, and wherein each of the at least one data scraping information collection unit further comprises a platform management unit to monitor if the task management unit normally operates, when an error occurs, execute the task management unit again, and store, in the meta database, identification information for identifying a location in which the data item to be scraped is positioned in the data set.
“2. The data scraping system according to claim 1, wherein the user information includes a user’s biometric authentication information.
“3. The data scraping system according to claim 1, wherein each of the at least one data scraping information collection unit further comprises a verification unit configured to verify the scraping request by inspecting if the service program of the transmitting/receiving unit and the scraping engine unit are cross-authenticated.
“4. The data scraping system according to claim 1, wherein each of the at least one data scraping information collection unit further comprises a certificate distribution unit configured to store one or multiple users’ certificates that are cross-authenticated with the data scraping information collection unit, and install a necessary certificate in the scraping engine unit based on a scraping task allocated to the data scraping information collection unit.
“5. The data scraping system according to claim 1, wherein each of the at least one data scraping information collection unit further comprises a scraping error modification unit to, when an error occurs while the user device directly performs scraping, analyze an error log from scraping input information inputted in the user device at the time of error occurrence, and update a scraping module in the user device based on the analyzed error log.
“6. A data scraping system, comprising: a transmitting/receiving unit having a service program that calls inquiry or execution requiring scraping, and configured to receive a scraping request including user information for scraping from a user device and transmit scraped data to the user device; and at least one data scraping information collection unit configured to scrape data from at least one institution using the user information received in the transmitting/receiving unit, wherein each of the at least one data scraping information collection unit comprises a scraping engine unit in platform independent script, the scraping engine unit configured to store identification information of the scraping target institution and environment information including authentication information and communication information corresponding to the scraping target institution, and scrape data from the institution based on the environment information, wherein each of the at least one data scraping information collection unit further comprises a certificate distribution unit configured to store one or multiple users’ certificates that are cross-authenticated with the data scraping information collection unit, and install a necessary certificate in the scraping engine unit based on a scraping task allocated to the data scraping information collection unit, and a module update unit configured to update a certificate installed in the scraping engine unit using a certificate received from the user device, when the certificate installed in the user device and the certificate installed in the scraping engine unit are different.
“7. A data scraping method, comprising: receiving, by a transmitting/receiving unit of a data scraping system having a service program that calls inquiry or execution requiring scraping, a scraping request including information of inquiry or execution called by a user and user information from a user device; allocating, by the data scraping system, the user information and a task based on the user information to a scraping engine unit of the data scraping system, wherein the scraping engine unit is in platform independent script and stores identification information of the scraping target institution and environment information including authentication information and communication information corresponding to the scraping target institution; storing, by the data scraping system, a data item to be scraped in a meta database, wherein the scraping of data from the institution comprises extracting data to scrape based on the meta database from a data set provided by the institution; allocating, by a task management unit of the data scraping system, the user information and the task to the scraping engine unit using an internal scheduling algorithm; scraping, by the scraping engine unit, data from the institution based on the environment information; and transmitting, by the transmitting/receiving unit, the scraped data to the user device; monitoring, by the data scraping system, if the task management unit normally operates, and when an error occurs, executing the task management unit again; and storing, by the data scraping system, identification information in the meta database, the identification information for identifying a location in which the data item to be scraped is positioned in the data set, wherein the communication information comprises at least one of a type of communication encryption protocol or session maintenance time, wherein the scraping of data from the institution comprises: generating, by the scraping engine unit, a communication session between the institution and the scraping engine unit based on the communication information; processing, by the scraping engine unit, the user information based on the authentication information; and transmitting, by the scraping engine unit, an authentication value obtained by processing the user information to the institution, and scraping data from the institution.
“8. The data scraping method according to claim 7, wherein the user information includes the user’s biometric authentication information.
“9. The data scraping method according to claim 7, before the scraping of data from the institution, further comprising: verifying, by the data scraping system, the scraping request by inspecting if the service program of the transmitting/receiving unit and the scraping engine unit are cross-authenticated.
“10. The data scraping method according to claim 7, wherein the data scraping system stores one or multiple users’ certificates that are authenticated with the data scraping system, and the data scraping method further comprises: before the scraping of data from the institution, installing, by the data scraping system, a necessary certificate in the scraping engine unit based on the scraping task allocated to the scraping engine unit.
“11. The data scraping method according to claim 10, further comprising: when an error occurs while the user device directly performs scraping, analyzing, by the data scraping system, an error log from scraping input information inputted in the user device at the time of error occurrence; and updating, by the data scraping system, a scraping module in the user device based on the analyzed error log.
“12. A computer program stored in a medium to perform the data scraping method according to claim 7 in combination with hardware.
“13. A data scraping method, comprising: receiving, by a transmitting/receiving unit of a data scraping system having a service program that calls inquiry or execution requiring scraping, a scraping request including information of inquiry or execution called by a user and user information from a user device; allocating, by the data scraping system, the user information and a task based on the user information to a scraping engine unit of the data scraping system, wherein the scraping engine unit is in platform independent script and stores identification information of the scraping target institution and environment information including authentication information and communication information corresponding to the scraping target institution; installing, by the data scraping system, a necessary certificate in the scraping engine unit based on the scraping task allocated to the scraping engine unit, wherein the data scraping system stores one or multiple users’ certificates that are authenticated with the data scraping system; updating, by the data scraping system, a certificate installed in the scraping engine unit using a certificate received from the user device, when the certificate installed in the user device and the certificate installed in the scraping engine unit are different; scraping, by the scraping engine unit, data from the institution based on the environment information; and transmitting, by the transmitting/receiving unit, the scraped data to the user device, wherein the communication information comprises at least one of a type of communication encryption protocol or session maintenance time.”
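The claims repeatedly refer to per-institution environment information (an encryption protocol type, a session maintenance time and an authentication method) and to an authentication value derived from user information. A minimal Python sketch of how such a record could drive a session and an authentication value follows. It is an assumption-laden illustration, not the patented implementation: the institution "bank-a", the field names, the HMAC-SHA256 scheme and the demo secret are all hypothetical, and no network connection is made.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentInfo:
    institution_id: str
    encryption_protocol: str  # "type of communication encryption protocol"
    session_seconds: int      # "session maintenance time"
    auth_method: str          # hypothetical label for the authentication method


# Hypothetical store of environment information, keyed by institution.
ENVIRONMENTS = {
    "bank-a": EnvironmentInfo("bank-a", "TLSv1.2", 300, "hmac-sha256"),
}


@dataclass
class Session:
    env: EnvironmentInfo
    opened_at: float

    def is_alive(self, now: float) -> bool:
        """A session expires once its maintenance time has elapsed."""
        return now - self.opened_at < self.env.session_seconds


def open_session(institution_id: str, now=None) -> Session:
    """Create a session object from the stored communication information."""
    env = ENVIRONMENTS[institution_id]
    return Session(env, time.time() if now is None else now)


def authentication_value(env: EnvironmentInfo, user_info: bytes, secret: bytes) -> str:
    """Process user information according to the institution's (assumed) method."""
    if env.auth_method == "hmac-sha256":
        return hmac.new(secret, user_info, hashlib.sha256).hexdigest()
    raise ValueError(f"unsupported auth method: {env.auth_method}")


session = open_session("bank-a", now=1000.0)
token = authentication_value(session.env, b"user-123", b"demo-secret")
```

Because everything institution-specific lives in the `EnvironmentInfo` record, the same script could in principle serve any institution on any operating system that runs the interpreter, which is the portability argument the patent makes for a script-based engine.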
For the URL and additional information on this patent, see: Jeong, Jae Won; Back,
(Our reports deliver fact-based news of research and discoveries from around the world.)

