“It’s kind of a black box.” We hear this frequently from small business lenders about the data they’re using and what they know — and more often don’t know — about how this data comes to exist.
While opacity may be the norm for incumbent data providers, at Enigma, transparency is an operating principle. We believe you should have access to information about the data you’re evaluating and the processes through which this data was developed. This visibility is essential for understanding the strengths and weaknesses of a data provider and their approach to development.
In the spirit of transparency, here is a step-by-step overview of our process so that you can see everything we do to transform raw source data into data attributes you can access instantly.
In layman’s terms, a data attribute is a piece of information about a company; it’s one piece of the puzzle.
A data attribute is a broad category and may have multiple data points beneath it. For example, “industry” is a data attribute that includes NAICS codes, an industry text label, and flags for whether the business engages in certain activities, such as ecommerce.
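To make the idea concrete, here is a minimal sketch of how one attribute can group several data points. The class name, field names, and sample values are illustrative assumptions, not Enigma's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndustryAttribute:
    """Hypothetical shape of an "industry" attribute (names are illustrative)."""
    naics_codes: List[str]                     # one or more NAICS codes
    label: str                                 # human-readable industry text label
    activity_flags: Dict[str, bool] = field(default_factory=dict)  # e.g. ecommerce


# Made-up sample record for a single business
example = IndustryAttribute(
    naics_codes=["722511"],
    label="Full-Service Restaurants",
    activity_flags={"ecommerce": False},
)
```

One attribute, several data points: a consumer can read only the flag it needs without touching the rest.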
Ideas for new attributes often surface from conversations with current customers.
“Customers are key in our process. Ongoing inbound attribute requests from customers help us build & prioritize our roadmap,” said product manager Nick Hershey. “We have a working relationship with every single one of our customers.”
In addition to customer requests, Enigma constantly explores new data sources that will help financial institutions gain a clearer picture of the small businesses they serve.
Developing new data attributes isn’t just about understanding a small business today; it’s also about helping customers predict where the small business will be tomorrow. This is where our data science edge really becomes evident.
“The raw amount of data science horsepower we have here is very special,” said fellow product manager Jordan Dominguez. “We have people who have PhDs who are focused on attribute forecasting. I think that that's unique to Enigma.”
Unlike traditional data providers, we always explore a variety of data sources to determine how to deliver the highest-quality attribute.
Pam Wu, Head Data Scientist at Enigma, explained our distinctive approach: “For us, data sourcing is key. The incumbent providers tend to use self-reported data and more standard sources. We’re able to be more creative, and only offer data that we can verify.”
Enigma’s DNA is in public data, so we’re skilled at getting value from complex government sources. We also have an appetite for alternative data, leveraging online data sources.
“Incumbent providers had access to credit histories and utility bills, but in the current climate, those data sources are too out of date for most clients,” Wu added.
Enigma’s approach to data sources allows us to provide fresher, more relevant data than many competitors.
Dominguez added, “We want to make sure that we're exploring all available resources for an attribute, and that we're vetting that data quality from the very beginning. Sometimes it's not so much a case of finding the one perfect source...you have to triangulate across two or three sources.”
After the team has identified the best sources for an attribute, we begin a rigorous quality assurance process. The QA process exists to deliver clear, valuable information that customers can use.
To achieve this, Enigma focuses special attention on entity resolution and accuracy.
Entity resolution is all about making sure that the data attributes are connected to the right small businesses. One of Enigma’s key value-adds, according to Hershey, is our ability to tie all the different pieces of data together to a company in a persistent way.
“We persist entities through the entire lifecycle of the business, even as aspects of them (like industry or acquisitions) might change,” Hershey explained.
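As a rough illustration of what a persistent entity means in practice, the sketch below keeps one stable ID for a business while its attributes change. The matching rule (normalized name plus postal code), the class, and the sample records are assumptions made for this example; real entity resolution is far more involved and this is not a description of Enigma's method.

```python
import re
import uuid


def match_key(name: str, postal_code: str) -> str:
    """Normalize a record into a simple match key (illustrative rule only)."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())        # drop punctuation
    cleaned = re.sub(r"\b(llc|inc|co|corp)\b", "", cleaned)  # drop legal suffixes
    return " ".join(cleaned.split()) + "|" + postal_code.strip()


class EntityRegistry:
    """Hypothetical registry mapping incoming records to persistent entity IDs."""

    def __init__(self):
        self._ids = {}        # match key -> persistent entity ID
        self.attributes = {}  # entity ID -> latest attribute values

    def resolve(self, name: str, postal_code: str) -> str:
        key = match_key(name, postal_code)
        if key not in self._ids:
            self._ids[key] = str(uuid.uuid4())  # assigned once, then reused
        return self._ids[key]

    def update(self, name: str, postal_code: str, **attrs) -> str:
        entity_id = self.resolve(name, postal_code)
        self.attributes.setdefault(entity_id, {}).update(attrs)
        return entity_id


registry = EntityRegistry()
first = registry.update("Joe's Pizza, LLC", "10001", industry="Restaurants")
second = registry.update("JOES PIZZA", "10001", industry="Catering")
```

Both records resolve to the same ID even though the name is spelled differently and the industry value has changed, which is the behavior the quote describes.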
Accuracy is essential, and we consider accuracy from many different perspectives. For example, how often is the data attribute correct? What aspects of the data might be missing?
“We don’t just rely on self-reported data,” Wu also explained. “We collect our own data, run our own tests, and we are very harsh self-critics. As a result, we see 80 percent or higher precision on our data attributes.”
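For readers less familiar with the term, precision here can be read as the share of delivered attribute values that turn out to be correct when checked against a verified sample. The sketch below shows that arithmetic; the function name, the business keys, and the sample values are made up for illustration, and the source does not specify how Enigma runs its own tests.

```python
def precision(predicted: dict, verified: dict) -> float:
    """Share of delivered values that match the verified value.

    Only businesses present in the verified sample are scored; values that
    were never delivered affect coverage, not precision.
    """
    checked = [(biz, value) for biz, value in predicted.items() if biz in verified]
    if not checked:
        raise ValueError("no overlap between predicted and verified records")
    correct = sum(1 for biz, value in checked if verified[biz] == value)
    return correct / len(checked)


# Made-up sample: five delivered values, one of which (business "d") is wrong
predicted = {"a": "722511", "b": "445110", "c": "811111", "d": "722511", "e": "541110"}
verified = {"a": "722511", "b": "445110", "c": "811111", "d": "722513", "e": "541110"}

score = precision(predicted, verified)  # 4 correct out of 5 checked
```

In this made-up sample, four of the five checked values are correct, so the score is 0.8, which corresponds to the 80 percent level mentioned in the quote.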
Some data attributes are developed in as few as two weeks, but most take one to three months. Compared with incumbent providers, who can take years to release a new attribute, this is remarkably fast.
When the data attribute is first ready, Enigma releases a public beta version and engages with interested customers to test it.
After launch, the data is refreshed regularly, as often as daily for some attributes, and delivered through our publicly accessible API. The API lets customers access the data instantly and integrate it into their databases, processes, and statistical models.
Each data attribute is ultimately offered a la carte to ensure that customers get the exact attributes they need. As Hershey explained, “Most data providers charge tens of dollars for a report with a lot of information some customers don’t end up using; Enigma charges just cents [per API call] for a data attribute so each customer can get exactly what they want.”
At Enigma, developing data attributes is a dynamic, iterative process. This process is designed to not only improve the currently available attributes, but also expand the ways we tell the story of a business, in terms of both quality and breadth of data available.
Our culture of rich customer involvement fuels the improvement of existing data attributes. Feedback from customers enables us to understand how attributes are performing in real-world processes, and how we can make the data more valuable.
The result is fresh, reliable data about small businesses that financial institutions can integrate into their risk management, monitoring, and readiness processes.
As COVID-19 continues to challenge the small business economy, our ongoing data attribute development, especially our credit risk data, is making it easier for financial institutions to serve small businesses.
“This customer focus, and the speed with which we can spin up data attributes, is more relevant than ever right now with COVID-19’s effect on small businesses,” Hershey said. “Our goal is to build a story around a small business - the full, complete story.”