Patent Issued for Mapping of personally-identifiable information to a person based on natural language coreference resolution (USPTO 11741163): Box Inc.
2023 SEP 14 (NewsRx) -- By a
Patent number 11741163 is assigned to Box Inc.
The following quote was obtained by the news editors from the background information supplied by the inventors: “Cloud-based content management services and systems have impacted the way personal and enterprise computer-readable content objects (e.g., files, documents, spreadsheets, images, programming code files, etc.) are stored, and have also impacted the way such personal and enterprise content objects are shared and managed. Content management systems provide the ability to securely share large volumes of content objects among trusted users (e.g., collaborators) on a variety of user devices, such as mobile phones, tablets, laptop computers, desktop computers, and/or other devices. Modern content management systems can host many thousands or, in some cases, millions of files for a particular enterprise that are shared by hundreds or thousands of users. To further promote collaboration over the users and content objects, content management systems often provide various user communication tools, such as instant messaging or “chat” services. These communications may also be saved to create additional content objects managed by the systems.
“The foregoing content objects managed by the content management systems may include personally identifiable information (PII). The PII may be included in some content objects (e.g., social security numbers in tax forms) or may be extemporaneously embedded in other content objects (e.g., a contact phone number entered in a chat conversation). In many cases, neither the person nor even the candidate persons that are potentially associated with the PII in the content objects are necessarily known a priori. For example, the person associated with an instance of PII in a content object may or may not be a user of the content management system that manages the content object. Even with this as a backdrop, stewards of large volumes of electronic or computer-readable content objects (e.g., content management systems) must comply with the various laws, regulations, guidelines, and other types of governance that have been established to monitor and control the use and dissemination of personally identifiable information (PII) contained in the content objects.
“In the United States, for example, the federal statute known as the Security Rule of the Health Insurance Portability and Accountability Act (HIPAA) was established to protect a patient’s PII while still allowing digital health ecosystem participants access to needed protected health information (PHI). As another example, the California Consumer Privacy Act (CCPA) is a state statute intended to enhance privacy rights and consumer protection to
“Unfortunately, there are no known techniques for identifying and controlling personally identifiable information embedded in large volumes of content objects. While certain approaches exist for identifying instances of PII in content objects, such approaches are limited in their ability to correlate that PII to specific people. Specifically, when the context surrounding an instance of PII does not explicitly identify a person associated with the PII, existing approaches are deficient in determining, with an acceptable level of confidence, who owns or is associated with the PII. What is needed are ways to confidently and securely associate a particular instance of PII to a particular person. Furthermore, what is needed are techniques that address ongoing management of personally identifiable information that is embedded across arbitrary corpora of content objects.
In addition to the background information obtained for this patent, NewsRx journalists also obtained the inventors’ summary information for this patent: “This summary is provided to introduce a selection of concepts that are further described elsewhere in the written description and in the figures. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter. Moreover, the individual embodiments of this disclosure each have several innovative aspects, no single one of which is solely responsible for any particular desirable attribute or end result.
“Disclosed herein are various techniques for determining that a particular set of personally identifiable information (PII) belongs to a subject entity, even when the PII is not explicitly or directly associated with the subject entity’s name. Several of the disclosed techniques serve to identify aliases that are deemed to, at least potentially, refer to the subject entity. The PII of any alias deemed to refer to the subject entity can thus be deemed to be PII of the subject entity.
“Identification of aliases of a subject entity can be carried out by processing a corpus of content objects (1) to identify a first set of personally identifiable information associated with a name or alias and (2) to identify a second set of personally identifiable information associated with another name or alias. The first set of personally identifiable information associated with the name or alias is codified in a first portion of a graph. Similarly, the second set of personally identifiable information is codified in a second portion of the graph. Upon a determination that the identified names and/or aliases refer to the same person, the first portion of the graph and the second portion of the graph are deemed to be associated with each other. Since those two portions of the graph refer to the same person (e.g., the subject entity), the graph can be queried and traversed so as to recognize that both the first set of personally identifiable information and the second set of personally identifiable information belong to the same person.
“As can be seen, the second set of personally identifiable information can be deemed to be PII of the subject entity, even though the information that is used to form the second portion of the graph does not explicitly identify by name the person associated with the PII. In some embodiments, the determination that a first identified name or alias and a second identified alias refer to the same person can be made on the basis that the first identified name or alias and the second identified alias share PII in common. For example, given the phrase, “Johnathan Smith has a social security number of 123-45-6789”, and given the phrase, “John’s social security number is 123-45-6789”, then “Johnathan Smith” and that occurrence of the alias “John’s” can be deemed to refer to the same person.
“In some embodiments, the determination that a name and an alias refer to the same person can be made on the basis of linguistic analysis (e.g., by identifying and analyzing pronominal anaphoric references) to determine that the alias is referring to the same person who is identified by name.”
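The graph-based association described in the summary can be illustrated with a minimal Python sketch. It is not Box's implementation: the class `PIIGraph`, its methods, and all sample names and values are hypothetical. PII nodes are stored per alias so that each alias's portion of the graph starts out disconnected, as the claims describe; the only alias-matching signal modeled is the shared-PII test from the “Johnathan Smith” / “John’s” example, and a match is recorded by adding an edge between the two entity nodes. Queries then traverse the graph breadth-first and collect every PII node they reach.

```python
from collections import defaultdict, deque
from itertools import combinations


class PIIGraph:
    """Hypothetical sketch: entity (alias) nodes linked to their PII nodes."""

    def __init__(self):
        self.adj = defaultdict(set)   # node -> set of neighbouring nodes
        self.pii = defaultdict(set)   # alias -> PII values seen for that alias

    def _connect(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def add_portion(self, alias, pii_values):
        """Form one portion: an entity node wired to its own PII nodes."""
        entity = ("entity", alias)
        self.adj[entity]  # create the node even if it has no PII yet
        for value in pii_values:
            # PII nodes are per alias, so portions begin disconnected.
            self._connect(entity, ("pii", alias, value))
            self.pii[alias].add(value)

    def link_aliases(self, alias_a, alias_b):
        """Augment the graph with an edge between two entity nodes."""
        self._connect(("entity", alias_a), ("entity", alias_b))

    def link_by_shared_pii(self):
        """Assumed matching rule: aliases sharing any PII value are one person."""
        for a, b in combinations(list(self.pii), 2):
            if self.pii[a] & self.pii[b]:
                self.link_aliases(a, b)

    def pii_of(self, alias):
        """Breadth-first traversal from an alias, collecting reachable PII."""
        start = ("entity", alias)
        seen, queue, found = {start}, deque([start]), set()
        while queue:
            node = queue.popleft()
            if node[0] == "pii":
                found.add(node[2])
            for nxt in self.adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return found


g = PIIGraph()
g.add_portion("Johnathan Smith", {"123-45-6789", "jsmith@example.com"})
g.add_portion("John", {"123-45-6789", "555-867-5309"})
g.add_portion("Mary Jones", {"987-65-4321"})
print(sorted(g.pii_of("Johnathan Smith")))  # ['123-45-6789', 'jsmith@example.com']
g.link_by_shared_pii()
print(sorted(g.pii_of("John")))  # ['123-45-6789', '555-867-5309', 'jsmith@example.com']
```

Querying by the full name or by the alias returns the same merged set once the edge exists. A determination made another way, such as linguistic analysis or a directory lookup, would call `link_aliases` directly instead of `link_by_shared_pii`.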
The claims supplied by the inventors are:
“1. A method for associating personally identifiable (PII) information to an entity and the entity’s aliases, the method comprising: processing one or more content objects to identify a first set of personally identifiable information associated with a first alias; further processing the one or more content objects to identify a second set of personally identifiable information associated with a second alias; forming a first portion of a graph at least by populating the first portion with first PII nodes and a first entity node and further by connecting the first PII nodes with the first entity node, wherein the first PII nodes and the first entity node are determined from the one or more content objects to respectively represent the first set of personally identifiable information and the first alias; forming a second portion of the graph at least by populating the second portion with one or more second PII nodes and a second entity node and further by connecting the second PII nodes with the second entity node, wherein the second PII nodes and the second entity node are determined from the one or more content objects to respectively represent the second set of personally identifiable information and the second alias, and the first portion and the second portion are disconnected from each other in the graph; augmenting the graph at least by adding one or more edges that connect at least the first entity node or a first PII node in the first portion of the graph to the second entity node in the second portion of the graph; and processing a query to find personally identifiable information of the entity by traversing over at least one of the one or more edges added to the graph.
“2. The method of claim 1, further comprising: querying the graph, with a name of a person, wherein results of the query comprises (1) a first item of personally identifiable information of the person, and (2) a different item of personally identifiable information of an alias correlated with the person.
“3. The method of claim 1, further comprising: querying the graph, with a person of interest’s alias, wherein results of the query comprises (1) a first item of personally identifiable information corresponding to the person of interest’s alias, and (2) a different item of personally identifiable information of the person of interest.
“4. The method of claim 1, further comprising querying the graph to determine a subject content object that includes at least one of (1) a first item of the first set of personally identifiable information of a person, and (2) a second item of the second set of personally identifiable information of an alias correlated with the person.
“5. The method of claim 4, further comprising modifying the subject content object to eliminate occurrences of (1) the first item of the first set of personally identifiable information of the person, and (2) the second item of the second set of personally identifiable information of the alias.
“6. The method of claim 1, wherein at least one item of the first set of personally identifiable information is identified, based at least in part on a PII detection rule.
“7. The method of claim 1, wherein at least one item of the first set of personally identifiable information is identified, based at least in part on identification of pronominal anaphoric references.
“8. The method of claim 7, wherein the pronominal anaphoric references are taken from context of a PII instance.
“9. The method of claim 1 wherein a database is accessed to determine that the first alias and the second alias refer to the same person.
“10. The method of claim 9 wherein the database is accessed using a lightweight directory access protocol (LDAP).
“11. A non-transitory computer readable medium having stored thereon a sequence of instructions which, when stored in memory and executed by one or more processors causes the one or more processors to perform a set of acts for associating personally identifiable (PII) information to an entity and the entity’s aliases, the set of acts comprising: processing one or more content objects to identify a first set of personally identifiable information associated with a first alias; further processing the one or more content objects to identify a second set of personally identifiable information associated with a second alias; forming a first portion of a graph at least by populating the first portion with first PII nodes and a first entity node and further by connecting the first PII nodes with the first entity node, wherein the first PII nodes and the first entity node are determined from the one or more content objects to respectively represent the first set of personally identifiable information and the first alias; forming a second portion of the graph at least by populating the second portion with one or more second PII nodes and a second entity node and further by connecting the second PII nodes with the second entity node, wherein the second PII nodes and the second entity node are determined from the one or more content objects to respectively represent the second set of personally identifiable information and the second alias, and the first portion and the second portion are disconnected from each other in the graph; augmenting the graph at least by adding one or more edges that connect at least the first entity node or a first PII node in the first portion of the graph to the second entity node in the second portion of the graph; and processing a query to find personally identifiable information of the entity by traversing over at least one of the one or more edges added to the graph.
“12. The non-transitory computer readable medium of claim 11, further comprising instructions which, when stored in memory and executed by the one or more processors causes the one or more processors to perform acts of: querying the graph, with a name of a person, wherein results of the query comprises (1) a first item of personally identifiable information of the person, and (2) a different item of personally identifiable information of an alias correlated with the person.
“13. The non-transitory computer readable medium of claim 11, further comprising instructions which, when stored in memory and executed by the one or more processors causes the one or more processors to perform acts of: querying the graph, with a person of interest’s alias, wherein results of the query comprises (1) a first item of personally identifiable information corresponding to the person of interest’s alias, and (2) a different item of personally identifiable information of the person of interest.
“14. The non-transitory computer readable medium of claim 11, further comprising instructions which, when stored in memory and executed by the one or more processors causes the one or more processors to perform acts of querying the graph to determine a subject content object that includes at least one of (1) a first item of the first set of personally identifiable information of a person, and (2) a second item of the second set of personally identifiable information of an alias correlated with the person.
“15. The non-transitory computer readable medium of claim 14, further comprising instructions which, when stored in memory and executed by the one or more processors causes the one or more processors to perform acts of modifying the subject content object to eliminate occurrences of (1) the first item of the first set of personally identifiable information of the person, and (2) the second item of the second set of personally identifiable information of an alias referring to the person.
“16. The non-transitory computer readable medium of claim 11, wherein at least one item of the first set of personally identifiable information is identified, based at least in part on a PII detection rule.
“17. The non-transitory computer readable medium of claim 11, wherein at least one item of the first set of personally identifiable information is identified, based at least in part on identification of pronominal anaphoric references.
“18. The non-transitory computer readable medium of claim 17, wherein the pronominal anaphoric references are taken from context of a PII instance.”
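The detection, attribution, and redaction steps in claims 5 through 8 can be illustrated with a deliberately crude Python sketch. The claims name “a PII detection rule” and “pronominal anaphoric references” without specifying either, so everything here is an assumption: the regular expressions in `PII_RULES`, the two-capitalized-words `NAME` pattern, the pronoun list, and the functions `detect_pii` and `redact` are hypothetical. The pronoun handling simply resolves “he”, “his”, “her”, and similar words to the most recently named person; it is a toy stand-in, not real coreference resolution.

```python
import re

# Hypothetical PII detection rules: each PII type maps to a regular expression.
PII_RULES = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}
# Crude stand-ins for NLP: a "name" is two capitalised words in a row, and
# these pronouns are resolved to the most recently named person.
NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")
PRONOUNS = {"he", "she", "him", "his", "her", "hers"}


def detect_pii(text):
    """Return (owner, pii_type, value) triples; owner is None if unknown."""
    found, last_name = [], None
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        names = NAME.findall(sentence)
        words = {w.lower() for w in re.findall(r"[A-Za-z]+", sentence)}
        if names:
            owner = names[0]        # PII stated next to an explicit name
        elif words & PRONOUNS:
            owner = last_name       # pronoun resolved to the prior name
        else:
            owner = None            # no usable context in this sentence
        for pii_type, rule in PII_RULES.items():
            for value in rule.findall(sentence):
                found.append((owner, pii_type, value))
        if names:
            last_name = names[-1]
    return found


def redact(text, values):
    """Replace every occurrence of the given PII values with a marker."""
    for value in values:
        text = text.replace(value, "[REDACTED]")
    return text


doc = ("Johnathan Smith joined the project. His phone number is "
       "555-867-5309. The backup line is 555-000-1111.")
hits = detect_pii(doc)
print(hits)
print(redact(doc, [v for owner, _, v in hits if owner == "Johnathan Smith"]))
```

Here “His phone number” is attributed to Johnathan Smith only through the pronoun, while the backup line stays unattributed and is left in place by the redaction step.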
There are additional claims. Please visit the full patent to read further.
For the URL and more information on this patent, see: Ojha, Alok. Mapping of personally-identifiable information to a person based on natural language coreference resolution.
(Our reports deliver fact-based news of research and discoveries from around the world.)