Figure Eight is the essential human-in-the-loop AI platform for data science and machine learning teams.

Figure Eight uses crowd workers to label the data sets used to train machine learning and deep learning models. The platform supports data annotation and categorization, content moderation, and hundreds of other use cases. The challenge was to coordinate thousands of distributed workers so that they operate as a single unit.

Squadex solved the operational challenge of managing the client’s distributed workforce while maintaining high annotation quality. An AI-based operational intelligence component was used to provide the necessary redundancy across workers, to measure how closely workers’ judgments agree, and to identify low-performing workers.
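One common way to measure worker agreement under redundancy is to compare each worker's label with the majority label for the same item. The sketch below illustrates that idea only; it is not the component Squadex built. The function names, the tuple layout, and the 0.6 threshold are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def worker_agreement(judgments):
    """Return {worker_id: share of the worker's labels that match the majority}.

    `judgments` is a list of (item_id, worker_id, label) tuples in which each
    item has been labeled by several workers (the redundancy).
    """
    labels_by_item = defaultdict(list)
    for item_id, _worker_id, label in judgments:
        labels_by_item[item_id].append(label)

    # Majority label per item; on a tie, the label seen first wins.
    majority = {
        item_id: Counter(labels).most_common(1)[0][0]
        for item_id, labels in labels_by_item.items()
    }

    matched = defaultdict(int)
    total = defaultdict(int)
    for item_id, worker_id, label in judgments:
        total[worker_id] += 1
        matched[worker_id] += int(label == majority[item_id])

    return {worker: matched[worker] / total[worker] for worker in total}


def low_performers(agreement, threshold=0.6):
    """Workers whose agreement rate is below a hypothetical threshold."""
    return sorted(w for w, rate in agreement.items() if rate < threshold)
```

Majority voting is the simplest baseline. A production system would likely weight votes by each worker's track record and handle items with too few judgments separately.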

The newly built fraud detection platform, powered by machine learning, allowed Figure Eight’s quality assurance department to weed out scammers by giving the team a comprehensive scoring system based on workers’ performance.

Squadex Big Data Services

Big Data Platform & Next Generation BI

Advanced Analytics & Data Science

About Figure Eight

Figure Eight is a data mining and crowdsourcing platform. It helps customers generate high-quality, customized training data for their machine learning initiatives, or automate a business process with easy-to-deploy models and the integrated human-in-the-loop workflows that data science teams depend on.
The Figure Eight platform supports a wide range of use cases including self-driving cars, intelligent personal assistants, medical image labeling, content categorization, customer support ticket classification, social data insight, CRM data enrichment, product categorization, and search relevance.

PROBLEM

Figure Eight uses crowd workers to label the data sets used to train machine learning and deep learning models. The company annotates millions of images to train image recognition neural networks. The challenge was to coordinate thousands of distributed workers on such a complex undertaking.

Before the ML platform was implemented, fraud detection was done manually using SQL and Python scripts. This approach was slow, did not scale, and involved a lot of redundant manual work.

Squadex was briefed to build an automated ML-based platform for fraud detection. The requirements were to have a scalable SaaS architecture and GUI for optimal user experience.

SOLUTION

The new ML platform for fraud detection was built from scratch. After preliminary research, we created a proof of concept, which showed positive results. We then built the necessary architecture and infrastructure and implemented the machine learning algorithms, also from scratch.

The ML part was implemented in Python using TensorFlow, Pandas, and other PyData libraries. Models were packaged in Docker containers. Using several kinds of features, we analyzed worker behaviour and computed a fraud probability score for every worker.
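To make the idea of a per-worker fraud probability score concrete, here is a minimal sketch that maps a few behavioural features to a value between 0 and 1 with a logistic function. It is a hand-weighted stand-in, not the TensorFlow models described above. The feature names, weights, and bias are all hypothetical, and a real model would learn its parameters from labeled data.

```python
import math

# Hypothetical feature weights and bias, chosen by hand for illustration.
WEIGHTS = {
    "disagreement_rate": 4.0,  # share of judgments that differ from the majority
    "fast_answer_rate": 3.0,   # share of judgments submitted implausibly quickly
    "same_answer_rate": 2.0,   # share of judgments that repeat a single label
}
BIAS = -5.0


def fraud_probability(features):
    """Map behavioural features (each in [0, 1]) to a score in (0, 1)."""
    # Missing features count as 0.0, so they do not raise the score.
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    typical = {"disagreement_rate": 0.1, "fast_answer_rate": 0.1, "same_answer_rate": 0.1}
    suspicious = {"disagreement_rate": 0.9, "fast_answer_rate": 0.9, "same_answer_rate": 0.9}
    print(f"typical:    {fraud_probability(typical):.3f}")
    print(f"suspicious: {fraud_probability(suspicious):.3f}")
```

Keeping the score a probability rather than a hard label lets analysts rank workers and choose their own review cutoff.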

A PostgreSQL database on Amazon RDS was used to store the backend state of the system, the results from the ML models, and the results of all actions.

A Java backend was used for processing API requests and running all ML models. Swagger was used for documenting the RESTful API.

Amazon S3 was used for data caching. The Parquet format was chosen for storing the data to get the best performance.

Amazon Redshift served as the data warehouse at CrowdFlower (Figure Eight’s former name). The fraud detection platform was integrated with Redshift and extracted all required input data from it.

The GUI was developed using the React library. In the GUI, a Trust & Safety Analyst can view the list of workers sorted by fraud probability, review the top-scoring suspected scammers, and take action by issuing a ban or a rejection.
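The review workflow can be pictured as a queue ordered by score plus a small set of analyst actions. The following sketch uses hypothetical names (`WorkerScore`, `review_queue`, `take_action`) and hypothetical status values. It only models the ordering and the two actions mentioned in the text, not the actual backend API.

```python
from dataclasses import dataclass


@dataclass
class WorkerScore:
    worker_id: str
    fraud_probability: float
    status: str = "active"  # hypothetical states: active, banned, rejected


def review_queue(scores, limit=10):
    """Return the `limit` workers with the highest fraud probability, highest first."""
    return sorted(scores, key=lambda s: s.fraud_probability, reverse=True)[:limit]


def take_action(worker, action):
    """Apply an analyst's decision; only 'ban' and 'reject' are modeled here."""
    new_status = {"ban": "banned", "reject": "rejected"}.get(action)
    if new_status is None:
        raise ValueError(f"unknown action: {action}")
    worker.status = new_status
    return worker
```

For example, `review_queue(scores, limit=2)` returns the two highest-scoring workers, and `take_action(queue[0], "ban")` marks the first of them as banned.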

Quartz was used for job scheduling: running ML models, loading updated data from Redshift, and refreshing the cache.

The monitoring module collects custom metrics from all the applications. It consists of Micrometer (a facade over monitoring clients), Prometheus (which scrapes and stores time series metrics data), and Grafana (analytics and visualization).

A separate module was used for log processing and analysis. It consists of Logstash, Elasticsearch, and Kibana. Logstash was used for ingesting, filtering, and normalizing logs, and for loading them into Elasticsearch storage. Kibana was used for log analytics and visualization.

AWS Products Used

Results

Figure Eight received a scalable, ML-based fraud detection platform that improved its machine learning capabilities by weeding out scammers and eliminating the associated costs. It enabled greater worker productivity, increased the quality of machine learning training data, and significantly cut operational costs.

Testimonials

Cameron Befus, VP of Engineering at CrowdFlower
