This article was updated in January 2019 to reflect the latest product changes.
Take a moment and google Lemonade, the startup working to disrupt the boring old insurance industry. This company has been making news lately, so you should be able to find it pretty easily, right?
Here’s what I got:
- A recipe for how to make the perfect cup of lemonade.
- Information about the Beyoncé album.
- And yes, after the company spent an incredible amount of money on SEO, also results about Lemonade, the Insurtech company.
I had to wade through a lot of junk to finally find what I was looking for!
Now, imagine you’re an analyst, and that you have to search for Lemonade in hundreds of articles, blogs, and other unstructured data. Imagine all the irrelevant stuff you’ll have to filter out. Now consider doing that for millions of articles, posts and so on. You simply can’t do it; no human can.
Sounds like a good place for machines to step in and do the work for us, right? Surprisingly though, figuring out which “Lemonade” is which is VERY difficult for machines as well.
This problem, teaching a machine to tell which “Lemonade” is which, is known as “Entity Recognition.”
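To make the difficulty concrete, here is a toy Python sketch of the naive approach: guess which “Lemonade” a sentence refers to by counting hand-picked cue words around it. This is an illustration only, not the production approach discussed in this article. The cue-word lists and the `guess_sense` function are hypothetical, invented for this example.

```python
import re

# Hypothetical cue words, hand-picked for this illustration only.
SENSE_CUES = {
    "company": {"insurance", "insurtech", "startup", "funding", "investors", "ceo"},
    "drink": {"recipe", "sugar", "lemons", "cup", "ice", "squeeze"},
    "album": {"beyoncé", "album", "track", "song", "grammy"},
}


def guess_sense(text, sense_cues=SENSE_CUES):
    """Return the sense whose cue words overlap most with the text.

    Returns None when there is no evidence at all or when two senses tie.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    scores = {sense: len(tokens & cues) for sense, cues in sense_cues.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None  # not enough context to decide
    return winners[0]


if __name__ == "__main__":
    print(guess_sense("Lemonade raised new funding from investors for its insurance business"))
    print(guess_sense("A recipe for lemonade: squeeze six lemons, add sugar and ice"))
    print(guess_sense("Lemonade is great"))
```

A sentence like “Lemonade is great” gives this sketch nothing to work with, and a hand-written word list cannot keep up with millions of articles. That is where the false positives come from.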
Solving Entity Recognition
Three years ago, when we first tried to tackle this, we had to deal with an unimaginable number of false positives. Correctly flagging company names in unstructured text, seemingly a straightforward problem, proved to be one of our most difficult tasks.
After more than three years of development, I can say we’ve tamed this beast.
We built a machine that knows, within any unstructured data source, when it sees Lemonade the company, and knows to disregard the drink, the album, and other irrelevant results.
So, do you need to know about compelling events such as funding, executive management changes, partnerships, product launches, legal problems and more? We can process millions of articles and find them for you. And we won’t bombard you with recipes or album reviews in the process.
Using this technology, we can also map competitors and highlight similarities and differences, all based on detecting news mentions and their context. This part builds on the initial Entity Recognition and is a separate, complex NLP problem in its own right.
Truth be told, I’m not an engineer, and I manage high-tech companies without a deep tech background. But I know a hard engineering problem when I see one, and I do know we identified a hard problem that required a deep solution, and that we built the best technology team to solve it.
My point? In order to be an entrepreneur, you don’t have to be a technology wizard. You need to identify a real problem and solve it.
That holds even if the problem seems simple, or doesn’t even seem like a problem when you start out. And you have to be ready for it to be far more complex to solve than you originally thought or planned.
We built four subsystems which help us put together a holistic solution:
- A proprietary Entity Recognition system (ER), focused on automatically detecting companies in unstructured text.
- A proprietary Event Detection process, focused on solving the complex problem of knowing when companies encounter or achieve critical milestones or setbacks.
- A classification and tagging model, focused on classifying companies according to the Global Industry Classification Standard (GICS).
- A similarity model, which identifies types and strength of semantic relationships between companies.
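As a rough illustration of the last item, one common way to score how similar two companies look is cosine similarity over the terms that appear around their news mentions. The sketch below is an assumption for illustration, not a description of the proprietary model. The company names and term counts are made up.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term-count profiles (0.0 to 1.0)."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[term] * b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


# Made-up term counts, standing in for words seen near each company's mentions.
profiles = {
    "InsurerA": Counter({"insurance": 5, "claims": 3, "app": 2}),
    "InsurerB": Counter({"insurance": 4, "claims": 2, "renters": 1}),
    "DrinkCo": Counter({"beverage": 6, "retail": 2}),
}

if __name__ == "__main__":
    for other in ("InsurerB", "DrinkCo"):
        score = cosine_similarity(profiles["InsurerA"], profiles[other])
        print(f"InsurerA vs {other}: {score:.2f}")
```

Two insurers that are written about in similar terms score close to 1, while a company with no shared context scores 0. A real similarity model would also need to capture the type of relationship, which plain word counts cannot do.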
Our models are trained and retrained daily.
The result is a fully automated company snapshot like this one:
See the full version: https://www.zirra.com/auto-memo/lemonade
Currently we have over 5M companies, private and public, in our database, and more than 2.5 years of history for them!
Other companies power their analysis with people.
We power it with Artificial Intelligence: Natural Language Processing and Machine Learning.
As a result, we can offer you the use of our database for FREE, and we are happy to do so.
Use Zirra.com to:
- Analyze companies for investment purposes
- Learn about your competitors
- Understand the company where you’re considering applying for a job
- Any other reason you like!
Our output might not be perfect; the machine is always learning – but it gets better the more you use it.
Don’t take my word for it; try it for yourself.
This article originally appeared on Medium on November 7, 2018.