LeadGenius delivers the highest lead generation value in the marketplace, enabling businesses to connect with their prospective customers.

LeadGenius wanted to improve its data processing pipeline by automating manual processes and making the platform ready to scale.

Squadex built a data processing and storage solution that gives customers access to the data and uses purpose-built algorithms to clean and enrich it, solving entity resolution problems.

As a result of the implementation, LeadGenius gained an automated data pipeline that performs faster and delivers quality data, which allowed the company to roll out marketing and sales activities ahead of schedule.

Time to market: 2 months

Data volume: 500 GB


30 billion

35 billion


Squadex Big Data Services

Data Warehouse & Business Intelligence

Big Data Platform & Next Generation BI

Advanced Analytics & Data Science

About LeadGenius

LeadGenius is the most efficient way to equip sales and marketing teams with custom B2B lead data at scale. LeadGenius data experts conduct ongoing research to automatically supply teams with accurate contact information and hard-to-acquire data points for their most valuable buyer personas. By combining a remote global workforce, custom data, and machine learning, LeadGenius delivers the highest lead generation value in the marketplace.


LeadGenius’ data pipeline processes were performed manually and did not ensure data quality and consistency. Data parsed from various sources had to be verified, incorporated into a single large dataset of leads, and then delivered to customers.

Squadex was briefed to build an automated and scalable data processing pipeline. We were required to design and implement a fault-tolerant solution that runs on demand and keeps working when individual components fail. The architecture had to be highly scalable, built from elastic applications, and optimized to run on AWS.


Squadex designed and implemented a data parsing and processing pipeline on Apache Spark, managed by Amazon EMR, and a data storage solution using Amazon S3, Amazon RDS (PostgreSQL), Amazon Redshift and Elasticsearch. Running Apache Spark on EMR let us hit the ground running, since EMR makes it fast and easy to process vast amounts of data.

Amazon S3 was chosen for cloud object storage because of its reliability and its integration with other systems. Amazon RDS and Amazon Redshift provide scalable, fault-tolerant and manageable data storage with low latency and flexible scaling options. Finally, to give customers timely, unhindered access to the data, we used Elasticsearch, a highly scalable open-source search engine.
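To make the cleaning and entity resolution step more concrete, here is a minimal single-machine sketch in Python. It is not Squadex's implementation: the `Lead` schema, the `clean`, `match_key`, `merge` and `resolve` helpers, and the matching rule (exact match on normalized email, falling back to normalized name plus company) are all hypothetical. In a Spark-on-EMR pipeline like the one described above, equivalent logic would run as distributed jobs that group records by the match key rather than using an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Lead:
    # Hypothetical lead schema; the real LeadGenius schema is not described.
    name: str
    company: str
    email: Optional[str] = None
    title: Optional[str] = None
    sources: List[str] = field(default_factory=list)


def clean(lead: Lead) -> Lead:
    """Collapse whitespace and normalize the email so duplicates compare equal."""
    email = (lead.email or "").strip().lower() or None
    return Lead(
        name=" ".join(lead.name.split()),
        company=" ".join(lead.company.split()),
        email=email,
        title=(lead.title or "").strip() or None,
        sources=list(lead.sources),
    )


def match_key(lead: Lead) -> str:
    """Hypothetical deterministic rule: email if present, else name + company."""
    if lead.email:
        return "email:" + lead.email
    return f"name:{lead.name.lower()}|{lead.company.lower()}"


def merge(a: Lead, b: Lead) -> Lead:
    """Combine two records of the same entity, keeping the first non-empty value."""
    return Lead(
        name=a.name or b.name,
        company=a.company or b.company,
        email=a.email or b.email,
        title=a.title or b.title,
        sources=a.sources + [s for s in b.sources if s not in a.sources],
    )


def resolve(raw_leads: List[Lead]) -> List[Lead]:
    """Clean every record, group by match key, and merge each group into one lead."""
    resolved: Dict[str, Lead] = {}
    for raw in raw_leads:
        lead = clean(raw)
        key = match_key(lead)
        resolved[key] = merge(resolved[key], lead) if key in resolved else lead
    return list(resolved.values())


if __name__ == "__main__":
    raw = [
        Lead("Ada Lovelace", "Analytical Engines", "ADA@example.com ", sources=["crawler"]),
        Lead("Ada  Lovelace", "Analytical Engines", "ada@example.com", "CTO", sources=["crm"]),
        Lead("Charles Babbage", "Analytical Engines", sources=["crawler"]),
    ]
    for lead in resolve(raw):
        print(lead.name, lead.email, lead.title, lead.sources)
    # Ada Lovelace ada@example.com CTO ['crawler', 'crm']
    # Charles Babbage None None ['crawler']
```

Exact-key matching is the simplest form of entity resolution. It misses pairs where one record has an email and the other does not, which is why production systems usually add fuzzy comparison of names and company domains on top of a rule like this.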



The resulting pipeline automated data gathering and cleaning and expedited the delivery of quality, enriched data to customers, which in turn helps them speed up their sales and marketing activities.


“We collect huge volumes of raw data every day, and the solutions we had in place could not keep up with this deluge of data. Squadex built a Big Data solution that eliminated a lot of manual work and increased the quality of the data we provide to our customers. We achieved a 2X reduction in processing time and a 3X optimization of data processing spending, while increasing the speed of delivery and the accuracy of the data for our users. Now we are looking to optimize data processing even further and to connect more data sources.”

Prayag Narula, CEO at LeadGenius
