Patent Issued for Sensitive data identification in real time for data streaming (USPTO 11757837): International Business Machines Corporation
2023 OCT 03 (NewsRx) -- By a
The patent’s assignee for patent number 11757837 is International Business Machines Corporation.
News editors obtained the following quote from the background information supplied by the inventors: “Identifying and protecting sensitive data is critical for data protection and for meeting regulation requirements (general data protection regulation (GDPR), the
“The data firewall typically captures or sniffs data accesses to a database (e.g., requests and responses) in real-time and analyzes the data according to policy rules to identify sensitive data. The data firewall may include a data activity monitor (DAM) and/or file activity monitor (FAM). The requests and responses sniffed by the data firewall may include data packets that may include a query, e.g., a structured query language (SQL) request, or a response, and associated header information. The header may include metadata such as machine information, network information, user information, client information, etc.
“The classification of data may be performed by parsing the captured data packets, extracting the mapping between the metadata and data (e.g., field name for every value), running a rule engine against the metadata and then scanning the data itself to identify sensitive data. Currently, DAM and FAM products are classifying the captured data offline due to the complexity and performance requirements of the classification process. However, using the classifier in offline mode may be too late for preventing data breach or data tampering.
“Therefore, a method for online classification and identification of sensitive data for data streaming is required.”
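The offline classification the inventors describe (parse the captured packets, map each value to its field name, run a rule engine against the metadata, then scan the data itself) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a captured response arrives as a JSON result set. Real DAM and FAM products parse database wire protocols. The rule patterns and the `classify_offline` name are hypothetical and do not come from the patent.

```python
import json
import re
from typing import List, Tuple

# Hypothetical rules: a field-name pattern (metadata rule) paired with a value
# pattern (data scan). Neither the patent nor this article publishes real rules.
RULES = [
    (re.compile(r"ssn|social", re.I), re.compile(r"^\d{3}-\d{2}-\d{4}$")),
    (re.compile(r"mail", re.I), re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
]

def classify_offline(raw: bytes) -> List[Tuple[str, str]]:
    """Return (field name, value) pairs judged sensitive in one captured response."""
    # 1. Parse the packet (a JSON result set is an illustrative assumption).
    msg = json.loads(raw)
    # 2. Extract the metadata-to-data mapping: the field name for every value.
    pairs = [(name, str(value))
             for row in msg["rows"]
             for name, value in zip(msg["columns"], row)]
    findings = []
    for name, value in pairs:
        for name_rx, value_rx in RULES:
            # 3. Apply the rule to the metadata (field name), then 4. scan the value itself.
            if name_rx.search(name) and value_rx.match(value):
                findings.append((name, value))
    return findings
```

Each step needs the whole response in hand before it can run. That dependency is one plausible reading of why the background calls this approach too slow to stop a breach in real time.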
As a supplement to the background information on this patent, NewsRx correspondents also obtained the inventors’ summary information for this patent: “According to embodiments of the invention, a system and method for classifying data in real-time may include: capturing a plurality of data packets flowing between a data source machine and a data client; searching at least one of the data packets for tokens associated with sensitive information; if tokens associated with sensitive information are not found in a data packet: allowing the data packet to flow between the data source machine and the data client; and sending the data packet to a comprehensive security analysis, and if tokens associated with sensitive information are found in the data packet: preventing the data packet from flowing between the data source machine and the data client; and sending the data packet to a comprehensive security analysis.
“Furthermore, if tokens associated with sensitive information are found in the data packet, embodiments of the invention may include continuing to prevent the data packet from flowing between the data source machine and the data client if the comprehensive security analysis finds security issues; and allowing the data packet to flow between the data source machine and the data client if the comprehensive security analysis finds no security issues.
“According to embodiments of the invention, the data source machine may be selected from: a database server, a file server, a combination of a proxy and a database server, a combination of a proxy and a file server, a combination of a network gate and a database server, and a combination of a network gate and a file server.
“According to embodiments of the invention, the data packet may be one of: a query sent from the data client to the data source machine, and a response sent from the data source machine to the data client.
“According to embodiments of the invention, capturing and searching may be performed by a software agent that is installed on the data source machine.”
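The summary's real-time flow can be modeled as a small state machine. A packet with no sensitive token is forwarded at once. A packet with a token is held. Every packet is queued for the comprehensive analysis, and that analysis later either releases or blocks each held packet. The following is a toy Python sketch, not IBM's implementation. The token list, the `Packet` and `RealTimeGate` classes and their method names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical tokens; the patent text quoted here does not list its search terms.
SENSITIVE_TOKENS = (b"ssn", b"credit_card", b"passport")

@dataclass
class Packet:
    payload: bytes

@dataclass
class RealTimeGate:
    """Toy model of the summary's flow: forward, hold, or block captured packets."""
    forwarded: List[Packet] = field(default_factory=list)
    held: List[Packet] = field(default_factory=list)
    blocked: List[Packet] = field(default_factory=list)
    analysis_queue: List[Packet] = field(default_factory=list)

    def on_packet(self, packet: Packet) -> None:
        # Every packet is queued for the comprehensive security analysis.
        self.analysis_queue.append(packet)
        if any(tok in packet.payload.lower() for tok in SENSITIVE_TOKENS):
            self.held.append(packet)       # token found: stop the packet for now
        else:
            self.forwarded.append(packet)  # no token: let it flow immediately

    def on_analysis_result(self, packet: Packet, has_issues: bool) -> None:
        # Only held packets wait on the analysis verdict.
        if packet not in self.held:
            return
        self.held.remove(packet)
        # Issues found: keep preventing the flow; otherwise release the packet.
        (self.blocked if has_issues else self.forwarded).append(packet)
```

In this model, a query such as `SELECT name FROM users` passes with no delay. `SELECT ssn FROM users` waits until the analysis reports back. Only traffic that matches a cheap token test pays the cost of the slower analysis.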
The claims supplied by the inventors are:
“1. A method for classifying data in real-time, the method comprising: capturing a plurality of data packets flowing between a data source machine and a data client; searching a header of at least one of the data packets for metadata to determine whether the data packet should be allowed or should be further analyzed, wherein the metadata includes at least one of machine information, network information, user information, and client information; if the search of the header indicates that the at least one data packet should be further analyzed, searching raw data of a payload of the at least one of the data packets for tokens, values, expressions, words or phrases associated with sensitive information streaming in or out of a database in real-time without parsing the data packets or knowing which values in the payload fit into each field; if, during the searching of the raw data of the payload, the tokens, values, expressions, words or phrases associated with sensitive information are not found in the payload of a data packet: allowing the data packet to flow between the data source machine and the data client and sending a copy of the data packet to an offline comprehensive security analysis; if, during the searching of the raw data of the payload, tokens, values, expressions, words or phrases associated with sensitive information are found in the data packet: performing a wildcard search, a dictionary search, and a regular expression search of the payload in parallel for identified terms; and if identified terms are detected, preventing the data packet from flowing between the data source machine and the data client and sending the data packet or a copy of the data packet along with results from the searching of the raw data of the payload, to the offline comprehensive security analysis.
“2. The method of claim 1, wherein if tokens, values, expressions, words or phrases associated with sensitive information are found in the data packet during the searching of the raw data of the payload the method further comprises: permanently blocking the data packet from flowing between the data source machine and the data client or discarding the data packet, if the offline comprehensive security analysis finds security issues; and allowing the data packet to flow between the data source machine and the data client if the offline comprehensive security analysis finds no security issues.
“3. The method of claim 1, further comprising: enhancing or adjusting classification rules used during the searches of the raw data of the payload of the at least one data packets based on the offline comprehensive security analysis wherein: if during the searching of the raw data tokens, values, expressions, words or phrases associated with the sensitive information was found and the offline comprehensive security analysis did not find the at least one packet to contain the sensitive information, then removing search terms from the classification rules that invoked during the searching of the raw data identification of the tokens, values, expressions, words or phrases as being associated with the sensitive information; and if during the searching of the raw data tokens, values, expressions, words or phrases associated with sensitive information was not found, but the offline comprehensive security analysis did identify the sensitive information within the at least one data packets, then adding search terms to the classification rules that invoked the offline comprehensive security analysis to identify the sensitive information.
“4. The method of claim 1, wherein the data packet is one of: a query sent from the data client to the data source machine, and a response sent from the data source machine to the data client.
“5. The method of claim 1, wherein capturing and searching are performed by a software agent that is installed on the data source machine.
“6. The method of claim 5, wherein performing the offline comprehensive security analysis is performed by a dedicated security server, and wherein the data packet is sent to the dedicated security server for performing the offline comprehensive security analysis.
“7. The method of claim 1, wherein searching the raw data of a payload of the at least one of the data packets for tokens, values, expressions, words or phrases associated with sensitive information streaming in or out of the database in real-time without parsing the data packets or knowing which values in the payload fit into each field further comprises: calculating a security score of the at least one of the data packets as a combination of findings from a regular expression search and a dictionary search, wherein the regular expression search and the dictionary search are associated with a weight and the security score is calculated as a function of the weights and if the security score is above a threshold, the data packet is identified as having sensitive information.
“8. The method of claim 1, wherein the offline comprehensive security analysis comprises: parsing the data packet; mapping metadata to data; building hierarchy of the data; and processing policy rules.
“9. The method of claim 1, comprising: issuing a security alert if tokens, values, expressions, words or phrases associated with sensitive information are found in the data packet and the offline comprehensive security analysis finds security issues.
“10. The method of claim 1, comprising: after capturing, decrypting the plurality of data packets to obtain a header of each packet; analyzing the headers to determine security status of packets associated with the headers; and selecting the at least one data packet based on the security status.
“11. A system for classifying data in real-time, the system comprising: a memory; and a processor configured to: capture a plurality of data packets flowing between a data source machine and a data client; search a header of at least one of the data packets for metadata to determine whether the data packet should be allowed or should be further analyzed, wherein the metadata includes at least one of machine information, network information, user information, and client information; if the search of the header indicates that the at least one data packet should be further analyzed, search raw data of a payload of the at least one of the data packets for tokens associated with sensitive information streaming in or out of a database in real-time without parsing the data packets or knowing which values in the payload fit into each field; if, during the searching of the raw data of the payload, the tokens, values, expressions, words or phrases associated with sensitive information are not found in the payload of a data packet: allow the data packet to flow between the data source machine and the data client and send a copy of the data packet to an offline comprehensive security analysis; if, during the searching of the raw data of the payload, tokens, values, expressions, words or phrases associated with sensitive information are found in the data packet: perform a wildcard search, a dictionary search, and a regular expression search of the payload in parallel for identified terms; and if identified terms are detected, prevent the data packet from flowing between the data source machine and the data client and send the data packet or a copy of the data packet, along with results from the searching of the raw data of the payload, to the offline comprehensive security analysis.
“12. The system of claim 11, wherein if tokens, values, expressions, words or phrases associated with sensitive information are found in the data packet during the searching of the raw data of the payload, the processor is configured to: permanently block the data packet from flowing between the data source machine and the data client or discard the data packet, if the offline comprehensive security analysis finds security issues; and allow the data packet to flow between the data source machine and the data client if the offline comprehensive security analysis finds no security issues.
“13. The system of claim 11, wherein the processor is further configured to: enhance or adjust classification rules used during the searches of the raw data of the payload of the at least one data packets based on the offline comprehensive security analysis wherein: if during the search of the raw data tokens, values, expressions, words or phrases associated with the sensitive information was found and the offline comprehensive security analysis did not find the at least one packet to contain the sensitive information, then removing search terms from the classification rules that invoked during the searching of the raw data identification of the tokens, values, expressions, words or phrases as being associated with the sensitive information; and if during the search of the raw data tokens, values, expressions, words or phrases associated with sensitive information was not found but the offline comprehensive security analysis did identify the sensitive information within the at least one data packets, then adding search terms to the classification rules that invoked the offline comprehensive security analysis to identify the sensitive information.
“14. The system of claim 11, wherein the data packet is one of: a query sent from the data client to the data source machine, and a response sent from the data source machine to the data client.
“15. The system of claim 11, wherein the processor is installed on the data source machine, and wherein performing the offline comprehensive security analysis is performed by a dedicated security server, and wherein the processor is configured to send the data packet to the dedicated security server for performing the offline comprehensive security analysis.”
There are additional claims. Please visit full patent to read further.
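The sequence in claim 1 can be illustrated with a minimal Python sketch. First a header check, then a raw-payload token scan that does no parsing, then wildcard, dictionary and regular-expression searches run in parallel. Every rule set and function name below is hypothetical. The claim names the three search types but not their contents. The sketch also assumes that a packet which trips a trigger token but yields no identified terms is allowed, because the claim does not say what happens in that case. Sending copies to the offline analysis is left to the caller.

```python
import fnmatch
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# All rule sets are hypothetical examples, not taken from the patent.
TRUSTED_USERS = {"backup_svc"}                      # header metadata that skips the payload scan
TRIGGER_TOKENS = ("ssn", "card", "dob")             # cheap first-pass tokens
WILDCARDS = ("*card_number*", "*social_security*")
DICTIONARY = {"ssn", "passport", "iban"}
REGEXES = (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),)   # SSN-shaped strings

def wildcard_search(text: str) -> List[str]:
    return [p for p in WILDCARDS if fnmatch.fnmatchcase(text, p)]

def dictionary_search(text: str) -> List[str]:
    return sorted(set(re.findall(r"[a-z_]+", text)) & DICTIONARY)

def regex_search(text: str) -> List[str]:
    return [m.group(0) for rx in REGEXES for m in rx.finditer(text)]

def classify_packet(header: dict, payload: bytes) -> Tuple[str, List[str]]:
    """Return ("allow" | "block", findings) for one captured packet."""
    # Step 1: header metadata decides whether the payload needs a closer look.
    if header.get("user") in TRUSTED_USERS:
        return "allow", []
    # Step 2: scan the raw payload for trigger tokens without parsing it into fields.
    text = payload.decode("utf-8", errors="replace").lower()
    if not any(tok in text for tok in TRIGGER_TOKENS):
        return "allow", []  # the caller still sends a copy to the offline analysis
    # Step 3: run the three searches in parallel and pool their hits.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(search, text)
                   for search in (wildcard_search, dictionary_search, regex_search)]
        findings = [hit for fut in futures for hit in fut.result()]
    # Step 4: block on any identified term; the caller forwards packet and findings offline.
    return ("block", findings) if findings else ("allow", [])
```

For example, `classify_packet({"user": "alice"}, b"SELECT ssn FROM t WHERE ssn='123-45-6789'")` blocks the packet. It reports the dictionary hit `ssn` and the SSN-shaped value found by the regular expression.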
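Claim 3's feedback loop compares the real-time verdict with the offline verdict and adjusts the real-time search terms accordingly. A hypothetical Python sketch follows. The function name and its set-based representation of the classification rules are assumptions made for illustration. A real rule store would be richer than a set of strings.

```python
from typing import Set

def adjust_rules(rules: Set[str], realtime_hits: Set[str],
                 offline_found: bool, offline_terms: Set[str]) -> Set[str]:
    """Return an updated copy of the real-time search terms (hypothetical API)."""
    updated = set(rules)
    if realtime_hits and not offline_found:
        # False positive: drop the terms that fired during the real-time search.
        updated -= realtime_hits
    elif not realtime_hits and offline_found:
        # False negative: adopt the terms the offline analysis used to find the data.
        updated |= offline_terms
    # When both stages agree, the rules are left unchanged.
    return updated
```

Returning a copy keeps the rule set in use untouched until the caller decides to swap in the updated version.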
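Claim 7 combines regular-expression and dictionary findings into a weighted security score and flags the packet when the score is above a threshold. The claim only says the score is "a function of the weights". The weighted sum below is one simple choice. The weights, threshold, email pattern and dictionary are all hypothetical values chosen for this Python sketch.

```python
import re

# Hypothetical weights, patterns and threshold; a weighted sum is an assumed scoring function.
REGEX_WEIGHT = 0.7
DICTIONARY_WEIGHT = 0.3
THRESHOLD = 1.0
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
DICTIONARY = {"salary", "diagnosis", "passport"}

def security_score(payload: str) -> float:
    regex_hits = len(EMAIL.findall(payload))
    dictionary_hits = sum(1 for word in re.findall(r"[a-z]+", payload.lower())
                          if word in DICTIONARY)
    return REGEX_WEIGHT * regex_hits + DICTIONARY_WEIGHT * dictionary_hits

def has_sensitive_information(payload: str) -> bool:
    # The packet is flagged only when the score is strictly above the threshold.
    return security_score(payload) > THRESHOLD
```

With these numbers, the single word `salary` scores 0.3 and is not flagged. `alice@example.com salary passport` combines one email match with two dictionary words, which pushes it past the threshold.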
For additional information on this patent, see: Biller,
(Our reports deliver fact-based news of research and discoveries from around the world.)