We crawl millions of sources every day, from press releases and company blogs to job postings.
Using classification models, we categorize useful content and identify meaningful texts.
Our proprietary models extract various entities (organizations, divisions, persons, financing types, products, etc.) and the relationships between them.
Organization entities are linked to unique IDs (with domains) in our database for further processing.
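As a rough illustration of how an organization mention might be resolved to a unique ID through its domain, here is a minimal sketch. The lookup table `ORG_IDS_BY_DOMAIN`, the ID format, and the helper names `extract_domain` and `link_organization` are all hypothetical; the actual database and linking logic are not described here.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical in-memory stand-in for the organization database.
ORG_IDS_BY_DOMAIN = {
    "acme.com": "org-001",
    "globex.io": "org-002",
}


def extract_domain(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    # urlparse only fills in the host when a scheme or '//' is present.
    parsed = urlparse(url if "//" in url else "//" + url)
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def link_organization(url: str) -> Optional[str]:
    """Look up the organization ID for a URL, or None if the domain is unknown."""
    return ORG_IDS_BY_DOMAIN.get(extract_domain(url))
```

Keying on the domain rather than the organization name means that spelling variants of the same company resolve to the same ID, as long as they point at the same website.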
Entities are normalized with a set of rule-based approaches and then sent to the deduplication system.
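To make the normalization and deduplication step more concrete, the sketch below shows one way rule-based normalization could feed a simple duplicate grouping. The suffix list and the functions `normalize_org_name` and `group_duplicates` are assumptions made for illustration, not the rules or the deduplication system actually used.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes; the real rule set is not described in the text.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def normalize_org_name(name: str) -> str:
    """Lowercase the name, strip punctuation, and drop trailing legal suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    tokens = cleaned.split()
    # Keep at least one token so a name like "Co" is not reduced to nothing.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_duplicates(names):
    """Group raw names that share the same normalized form."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_org_name(name)].append(name)
    return dict(groups)
```

For example, `group_duplicates(["Acme, Inc.", "ACME Corp", "Globex LLC"])` places the two Acme variants in one group and Globex in another. Exact matching on a normalized key is only the simplest form of deduplication; a production system would likely add fuzzy matching or additional signals.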
We employ multiple data analysts who monitor and verify the data daily to ensure it is of the highest quality.