‘Dark data’ muddies picture, holds promise for insurance industry
The term “dark data” sounds rather ominous, says Franklin Manchester, self-described “insurance super nerd” and principal global insurance advisor at SAS. “It's not as nefarious as it sounds,” says Manchester. “It’s just trash data.”
“Dark data” is data that's collected either manually or electronically that is stored and not utilized, he explains. “And this is a growing problem. So I use ‘trash’ in my description because I do think of a landfill. If you're a Star Wars fan, which I clearly am,” says Manchester, motioning to Star Wars posters on his wall, “then I think of the galactic space dump of Raxus Prime and the image of Star Destroyers coming in, just dropping a whole bunch of garbage on the planet surface.”
“It's kind of a tongue-in-cheek analogy, but it is what's going on because the rate at which carriers are collecting data is growing, especially when you think about IoT and telematics devices. Everything's throwing off a signal these days,” he says.
“By some estimates, carriers use as little as 12% of their data in what they're doing, whether it's pricing policies, creating customer journeys or settling claims, and other estimates place the inaccessibility rate of the remaining data somewhere between 60% and 80%,” he explains.
IBM studies have shown that as much as 90 percent of data collected is not being utilized.
According to a post on developer.ibm.com, “The proportion of dark data to usable data tends to be huge … IBM estimates that 90 percent of all sensor data collected from Internet of Things devices is never used. This dark data is valuable, however, because it's data that isn't available in any other format. Therefore, organizations continue to pay the cost of collecting and storing it for compliance purposes in the hopes of exploiting it in the future.”
Within the insurance industry, telematics is a rapidly growing data collection area that will likely only continue to grow, adding even more data.
According to a 2022 LexisNexis study, two-thirds (67%) of drivers were aware that their vehicles can capture and transmit telematic data, either through consumers’ mobile phones or apps in the vehicle. While only 22% used their data to get insurance discounts, 71% of those who weren’t using the data for discounts said that they would be interested in doing that.
Dark data, or information collected and stored during day-to-day business but left unused by insurers, could bring insights and value – and potentially bolster revenue.
The car insurance app on your phone can earn you a safe-driving discount by monitoring your braking, night driving and more. It also captures how often you look at your phone while you drive. Insurers don’t use this information or factor it into pricing, and consumers don’t know about it, says Manchester, but with customer opt-in and AI combing through it, it could be used to incentivize even safer driving.
'Dark data' could help price for climate change
Remarkably, dark data could also help insurers monitor and price for climate change, he says.
Manchester, whose background prior to SAS includes nearly 17 years at Nationwide, describes the existence of dark data as “a twofold problem.”
If companies are collecting data they’re not utilizing, they are essentially paying for that data. And if the unutilized data is being stored, he says, the company is paying for the storage as well. To make matters worse, he says, “after you've collected it, you can't actually get to it.”
Manchester uses telematics as an example.
“So, one of the interesting [data] fields that’s collected is how often you look at your phone. Many carriers have that capability, yet they're not utilizing it,” he explains. “They're utilizing the rating associated with hard braking, fast acceleration, nighttime driving, idle time, those type of things. And then developed the rating off of it.”
“Well, an interesting thing that occurred to me when I first learned of this capability is the connection from distracted driving to accidents. And not just an accident, a more severe accident. And everyone's looking at their phone. I mean, every government agency that has ties to motor insurance has been talking about ‘putting the phone down’ for years. So, the idea of pivoting from indemnification to prevention starts creeping in.”
“How can you use the instances of distracted driving when you're tracking telematics information anyway to prevent that accident from happening? And that has a downstream effect. Not only are you protecting the life and the property, but the individual that's looking at the phone.”
Using this data, Manchester explains, the driver might receive a text alert: “We see that you've looked at your phone an average of ‘X’ times per trip. Customers with your profile, on average, experience a 10X likelihood of an accident.”
“I have not seen anyone in the industry perform that type of analysis,” says Manchester, “but they are collecting that data.”
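As a rough illustration of the alert Manchester describes, the sketch below averages phone-look counts across trips and drafts the text message. Everything in it is hypothetical: the `Trip` record, the `phone_looks` field, the alert threshold and the risk multiple are placeholders, not any carrier's actual data model or a measured statistic.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Trip:
    """One telematics trip record (hypothetical schema)."""
    trip_id: str
    phone_looks: int  # times the driver looked at the phone during the trip


def draft_alert(trips: List[Trip], threshold: float, risk_multiple: float) -> Optional[str]:
    """Return an alert message if average phone looks per trip exceed the threshold."""
    if not trips:
        return None
    avg_looks = mean(t.phone_looks for t in trips)
    if avg_looks <= threshold:
        return None
    return (
        f"We see that you've looked at your phone an average of {avg_looks:.1f} "
        f"times per trip. Customers with your profile, on average, experience "
        f"a {risk_multiple:g}X likelihood of an accident."
    )


# Made-up trips for illustration only; the 10X multiple mirrors the
# hypothetical figure in the quote and is not a measured risk factor.
trips = [Trip("t1", 4), Trip("t2", 6), Trip("t3", 5)]
message = draft_alert(trips, threshold=3.0, risk_multiple=10)
print(message)
```

In practice the risk multiple would have to come from an insurer's own analysis of distracted-driving data, which Manchester says he has not yet seen anyone perform.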
As long as the app is running in the background, he explains, the consumer doesn’t have to touch their phone. “The data is being collected based on position and it's becoming very crisp, like incredibly crisp.”
“So the downstream of that is every claim you prevent takes productivity strain off of your workforce, off the insurer.”
“You've got claims adjusters, you've got technology that's helping route those claims, and you have to do some basic administrative work associated with the claim handling process,” says Manchester.
Creating capacity
“If you remove one out of a pile sitting in front of an adjuster, it creates capacity elsewhere. And not just within an insurer, but also the service providers downstream from that as well. Think about the body shop, think about the rental car agency. And everything's getting more expensive. I mean, you cannot open your newsfeed without hearing about inflation and the impact on the economy – getting car parts, finding a rental car. And that problem's not going to go away anytime soon. So, if you prevent a loss, it has a compounding effect in the value chain.”
“I want to talk about ACORD forms,” says Manchester. “It’s a very common form in the industry. And there are a lot of fields under the ACORD form that are not utilized. So if you think about the major carriers who have been around for 40, 50, a hundred years, they will have collected information from ACORD forms for decades. And a lot of them still have mainframe systems. And in the event that they have acquired another organization, they've merged with an organization, bought a book of business, they will have migrated that data from maybe paper applications to the mainframe and then that mainframe to some of their cloud capabilities.”
Inaccurate information
Manchester says that errors in entering or converting previously unused data can lead to incorrect information. He described the case of one company he previously worked for that was transferring data and populating a “years in business” field.
“The company acquired another company. They had their own mainframe system that needed to be integrated with our current mainframe system. So, we went through an effort where we converted that information from the prior mainframe ACORD forms and populated it into our new mainframe. One of those fields was ‘years in business,’ which at the time was not a rating variable, but we collected it. And that information was stored as data and that storage cost us money, etc.”
“For a decade or more, I believe, between when we did the [data] conversion and when we went to our new platform, we were paying for the representation of dark data. No one was using that field to do anything,” he says.
To compound the issue, Manchester said that once the ACORD data was integrated into a new system and the company planned to start using it, some of it turned out to be incorrect, because it had not been correctly verified.
“Fast-forward, we started using the ‘years in business’ field. That kind of sounds like a success story.” That data was flawed, however. At one point in the company’s history, says Manchester, it turns out those doing the data entry were told, “If you don't know the years in business, just make it up.”
“Inaccuracy of ‘dark data’ exacerbates the problem,” he explains.
Not being able to access dark data is also an issue, Manchester explains.
“The technology used to capture it could be proprietary, was built homegrown by a COBOL programmer, or something similar. And the persons who did that work eventually retire, or they move on to other jobs … You have the proprietary technology with the people who know how to access the data who are no longer there. That creates inaccessibility.”
“When we talk about inaccessibility, the industry's systemic issue is the age and knowledge of the technology,” he says.
Business unit data 'not being connected'
“With property and casualty insurers specifically, there are relationships within the business units that are not being connected.”
“Within the four walls of insurers, what do you hear? Cross sell. Upsell. I started my career as an insurance agent. It was lesson one: ‘What products do you have with us? Let's get at a share of wallet. We have auto, home, business, life, retirement, long-term care.’ It's in the playbook of any insurer globally. What else can we sell you? It makes the customer more sticky. Retention goes up, cost to manage the account goes down.”
Within the verticals of an insurance organization, departments often are responsible for only one line of insurance, says Manchester, and they become siloed. This disconnection between departments also creates a form of dark data, in which data within a single company is “disconnected,” he explains.
Manchester uses his own situation as an insurance customer as an example.
“For my life insurance policy, my agent wrote it as ‘Frank J. Manchester’ and he wrote it when I was living in a different state.”
“Then you have my current information, it says ‘Franklin J. Manchester’ with a different address and a different state, and a different customer identification number. And because the insurers are standing up these vertical leaders, there is dissonance that's being created between these data sets,” he says.
“If you could just tether those back together, well, you would have a better full, clear picture of who you have insured. And that, again, has implications downstream.”
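One way to picture the “tethering” Manchester describes is a simple record-matching pass. The sketch below is a hedged illustration using Python's standard-library `difflib`. The record fields, customer IDs, state labels, the 0.8 threshold and the use of a birth date as a confirming field are all assumptions made for the example, not details from any insurer's system.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class CustomerRecord:
    customer_id: str  # hypothetical ID issued separately by each business unit
    name: str
    state: str
    birth_date: str   # hypothetical confirming field, ISO format


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case and periods."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().replace(".", "").split())
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same_person(r1: CustomerRecord, r2: CustomerRecord,
                       threshold: float = 0.8) -> bool:
    """Flag a probable match: same birth date plus sufficiently similar names.

    State and customer ID are deliberately not compared, because customers
    move and each line of business may issue its own ID.
    """
    return (r1.birth_date == r2.birth_date
            and name_similarity(r1.name, r2.name) >= threshold)


# Placeholder values throughout; the birth date and states are not real data.
life = CustomerRecord("LIFE-001", "Frank J. Manchester", "State A", "1900-01-01")
auto = CustomerRecord("AUTO-002", "Franklin J. Manchester", "State B", "1900-01-01")
print(likely_same_person(life, auto))
```

A production system would typically use dedicated entity-resolution tooling and more confirming fields. The point here is only that two records differing in name spelling, address and ID can still be linked.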
“Say, you buy a different insurer, or you merge with an insurer that I just happen to have a product with because you didn't offer something at the time as an example. Maybe I decide I'm going to get my next insurance policy through one of the third parties in an open and embedded insurance ecosystem. Think about Toyota Insurance Management Solutions, which offers a white label insurance product. How do they understand that I may already be insured with you or offer their products for another carrier?”
“The most useful thing they could do is just federate their data internally, figure out how to tether it all and break down those silos because it'll increase their speed, it'll increase their efficiency, and they'll have a better understanding of their customer base.”
Manchester says one study on disconnected data found that, on average, “insurers had 50 different data sources internally that they were using to make decisions and build models.”
“And again, from my personal experience, that data was likely collected differently by different people using different technology or manual processes over a continuum.”
“If you're looking at my customer profile, it has changed drastically over the last 20 years. When I started in insurance 20 years ago, I was a single individual in my twenties having just graduated college, living in a small rural town in North Carolina. Fast-forward. Today, I'm married, with two kids, a retirement plan, etc. My needs have changed over time. So how that data has been collected has also evolved. So I think that when you look at how prevalent it is and how pervasive and how invasive it is with insurers, it's a real problem and it's a global problem.”
“What's more concerning about dark data for me is not necessarily the cost associated with the data that's been collected, but more how it's deployed. And that's maybe twofold… If you're deploying models with a minimal amount of information, like 12%, it’s laughable.”
“Then of that 12%, how much is accurate? So, if half of it is accurate, it's really 6% that is actionable and the other 6% is harming you. It's really a concern.”
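Manchester's back-of-the-envelope arithmetic can be written out directly. The sketch below simply multiplies the two figures he cites. The 50% accuracy rate is his hypothetical rather than a measured value, and the function and variable names are illustrative.

```python
def actionable_share(share_used: float, share_accurate: float) -> float:
    """Fraction of all collected data that is both used and accurate."""
    return share_used * share_accurate


used = 0.12      # share of data carriers use, per the estimate cited earlier
accurate = 0.50  # hypothetical accuracy rate from the quote
actionable = actionable_share(used, accurate)
inaccurate = used - actionable  # used in models but wrong

print(f"actionable: {actionable:.0%}, used but inaccurate: {inaccurate:.0%}")
```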
John Forcucci is InsuranceNewsNet editor-in-chief. He has had a long career in daily and weekly journalism. Contact him at [email protected]. Follow him on Twitter @INNJohnF.
© Entire contents copyright 2023 by InsuranceNewsNet.com Inc. All rights reserved. No part of this article may be reprinted without the expressed written consent from InsuranceNewsNet.com.