For Enigma’s sales and marketing customers, we are always looking for ways to use the wide range of data in our product to better segment businesses for targeting and outreach.
Enigma has more than 40 million websites in our data. For a while we’ve been thinking about how to use this data to learn even more about the businesses in our product, and ultimately to help our customers better target their sales and marketing campaigns.
So, I am excited to share more about our newly launched website scraping system. We are now able to scrape the content of all the business websites stored in our database. If you think about all the information we as humans can glean about a business by looking at its website, this opens up huge possibilities!
Two areas we decided to focus on initially, both important to our sales and marketing customers, were industry classification (NAICS codes) and identifying businesses with ecommerce capabilities.
Where we started: Enigma provides a NAICS code for >90% of marketable businesses in our product, with >90% precision. We provide a 6-digit NAICS code for ~60% of all marketable businesses. We wanted to use the website scrape data to push our NAICS code coverage of marketable businesses even closer to 100%, and to ensure that even more of these were 6-digit codes, without sacrificing our 90%+ industry precision.
We already use machine learning models to predict NAICS codes, and while the business name and other attributes are important features in these models, we realized that the content of a business’s website is far more valuable (it is the first place I would go to decide which industry a business operates in!).
As a human, if I look at a website like apple.com, it is clear to me this business is a technology company, engaged in manufacturing and selling phones, computers, watches and many other products and services (Apple TV, AppleCare etc.). An example NAICS code for this business is NAICS Code: 334210 - Telephone Apparatus Manufacturing, along with other NAICS codes related to the company’s other product and business lines.
If I try to build a simple machine learning model to predict the industry of a business named Apple, with website “apple.com”, the result can be very different. A model that has not been trained on this example, and has no context for what Apple does in the real world, could reasonably guess that this business belongs in NAICS Code: 111331 - Apple Orchards.
Today, if I ask ChatGPT which NAICS code a business called “Apple” with website “apple.com” is in, it immediately assigns NAICS Code: 334210 - Telephone Apparatus Manufacturing - with the caveat that the business also operates in other NAICS codes (for computer manufacturing, software publishing etc.).
However, I don’t necessarily need to go to ChatGPT to get this result. In fact, doing so would be unnecessarily expensive and slow on a per-record basis. Our data scientists found they could achieve results on par with LLMs (models like ChatGPT) with a two-step approach. First, they used an LLM to predict NAICS codes for a small set of website scrapes. Then they used these LLM-labeled scrapes as training data to fine-tune a pre-trained BERT model, and applied the fine-tuned BERT model to the entire population.
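The label-a-small-sample-then-train-a-cheaper-model pattern described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not our production pipeline: `llm_label` is a hypothetical stand-in for a real LLM call (here it is just keyword rules so the sketch runs offline), a tiny naive Bayes classifier stands in for the fine-tuned BERT model, and the scrape texts and `.test` domains are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def llm_label(scrape_text):
    """Hypothetical stand-in for an LLM call that returns a NAICS code.

    A real system would send the scrape to an LLM; keyword rules are used
    here purely so the sketch is self-contained.
    """
    tokens = set(tokenize(scrape_text))
    if tokens & {"smartphone", "phones"}:
        return "334210"  # Telephone Apparatus Manufacturing
    return "111331"  # Apple Orchards


class NaiveBayesStudent:
    """Cheap 'student' model (assumption: stands in for fine-tuned BERT)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.token_counts[label]
            # Laplace smoothing so unseen tokens do not zero out a class.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Step 1: label a small seed set of scrapes with the (expensive) teacher.
seed_scrapes = [
    "Buy the new smartphone, tablets and laptops from our online store",
    "Our phones and wireless earbuds ship with a one year warranty",
    "Pick your own apples at our family orchard every autumn",
    "Fresh cider and apples grown in our orchard since 1950",
]
seed_labels = [llm_label(text) for text in seed_scrapes]

# Step 2: train the cheap student on the teacher's labels.
student = NaiveBayesStudent().fit(seed_scrapes, seed_labels)

# Step 3: apply the student to the full population, with no LLM calls.
population = {
    "example-phones.test": "Wireless phones and laptops with free shipping",
    "example-orchard.test": "Visit the orchard for apples and cider tasting",
}
predictions = {domain: student.predict(text) for domain, text in population.items()}
```

The design point is that the expensive model is only called on the seed set, while every other record is scored by the cheaper student. In practice the student would be a far stronger model than this toy classifier, and the seed set far larger.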
For our customers, this means an even higher proportion of our businesses have 6-digit NAICS codes (rather than, say, 2- or 3-digit NAICS codes), as our website scrapes allow for higher-confidence predictions of granular codes. It also means we now have an industry prediction for some businesses that have card revenue and a website but previously had no industry prediction - so customers have more leads with accurate revenues, in their target industries.
Another focus area for many of our sales and marketing customers is identifying businesses with ecommerce capabilities (i.e., those that accept online payments for goods or services).
Our existing ecommerce model identified around 700,000 ecommerce businesses in the US, but we know there are many more and wanted to expand this coverage to better serve our customers who want to target these businesses - whether they offer online payment processing, online buy-now-pay-later or point of sale financing, or offer auxiliary services to online retailers such as fulfillment/delivery of online orders.
Again, the first place I would look when trying to establish whether a business has ecommerce capabilities is its website, to figure out if I can buy anything there! Feeding our scraped website data to our ecommerce model was therefore a natural way to identify additional ecommerce businesses: the model can now access the content of the website when making its prediction, in the same way you or I can.
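To make the idea of “looking at the website” concrete, here is a minimal sketch of how scraped HTML could be turned into simple ecommerce signals for a model. The feature names, the term list and the sample pages are all hypothetical and chosen for illustration; they are not the features our ecommerce model actually uses.

```python
from html.parser import HTMLParser

# Assumed phrases that often appear on pages where you can buy something.
CART_TERMS = ("add to cart", "checkout", "buy now", "add to bag")


class EcommerceSignalParser(HTMLParser):
    """Collects visible text and counts links/forms that point at a cart or checkout."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.link_hits = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "form"):
            attr_map = dict(attrs)
            # href for links, action for forms; either may be missing or None.
            target = (attr_map.get("href") or attr_map.get("action") or "").lower()
            if "cart" in target or "checkout" in target:
                self.link_hits += 1

    def handle_data(self, data):
        self.text_parts.append(data.lower())


def ecommerce_features(html):
    """Return a small dict of hypothetical ecommerce signals for one page."""
    parser = EcommerceSignalParser()
    parser.feed(html)
    # Collapse whitespace so multi-word terms match across line breaks.
    text = " ".join(" ".join(parser.text_parts).split())
    return {
        "cart_term_hits": sum(text.count(term) for term in CART_TERMS),
        "cart_link_hits": parser.link_hits,
    }


shop_html = (
    "<html><body><h1>Handmade mugs</h1>"
    '<a href="/cart">Cart</a><button>Add to cart</button>'
    '<form action="/checkout"></form></body></html>'
)
brochure_html = (
    "<html><body><h1>Smith Plumbing</h1><p>Call us for a quote.</p>"
    '<a href="/contact">Contact</a></body></html>'
)

shop_features = ecommerce_features(shop_html)
brochure_features = ecommerce_features(brochure_html)
```

Signals like these would be inputs to a classifier rather than a decision rule on their own, since a page can mention a checkout without actually accepting online payments.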
With the inclusion of our scraped data, we are now able to identify more than 1.5 million US businesses that accept online payments. This is a huge win for customers who care about online payment processing, consumer financing of online purchases, or fulfillment of online orders, as we can now provide even larger prospecting lists within their ideal customer profile (ICP).
Current customers can reach out to their CS representatives with questions and feedback. If you’re new to Enigma and interested in our KYB products, please get in touch.