PySpark Data Pipeline for Serasa Experian's Credit Analysis
At Dextra Digital, I faced the challenge of processing 10M+ daily records for Serasa Experian's credit analysis system. Their existing solution couldn't handle the growing data volume, causing significant processing delays and affecting business decisions.
I designed and implemented ETL data pipelines using PySpark and Hadoop that efficiently processed transaction data across distributed clusters. I focused on optimizing join operations and implementing custom partitioning strategies to handle skewed data distributions.
The new data pipeline reduced processing time by 60%, allowing Serasa to analyze credit transactions in near real time. This improved their ability to detect fraudulent patterns and provide more accurate credit assessments to financial institutions across Brazil.
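One common way to handle the skewed joins mentioned above is key salting. The sketch below is a plain-Python illustration of the idea, not the production PySpark code: the partition count, salt count, key names, and the 90% skew are all assumptions chosen for the example, and `zlib.crc32` merely stands in for a cluster's hash partitioner.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical number of partitions
SALT_BUCKETS = 8    # hypothetical number of salt values per key


def partition_for(key: str) -> int:
    """Deterministic hash partitioner (crc32 stands in for the cluster's hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def salted_key(key: str, rng: random.Random) -> str:
    """Append a random salt so records of one hot key land on several partitions."""
    return f"{key}#{rng.randrange(SALT_BUCKETS)}"


def partition_sizes(keys) -> Counter:
    """Count how many records each partition would receive."""
    return Counter(partition_for(k) for k in keys)


rng = random.Random(42)
# Hypothetical skew: a single key accounts for 90% of all records.
keys = ["merchant_hot"] * 9000 + [f"merchant_{i}" for i in range(1000)]

plain = partition_sizes(keys)
salted = partition_sizes(salted_key(k, rng) for k in keys)

print("largest partition without salting:", max(plain.values()))
print("largest partition with salting:", max(salted.values()))
```

In a real join, the other side of the join has to be replicated once per salt value so that every salted key still finds its match. That extra replication is the price paid for the more even partition sizes.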
Wildfire Detection Data Processing at Sintecsys
At Sintecsys, I needed to build a reliable system to process and analyze 100K+ images daily from remote cameras and detect early-stage wildfires in real time, a critical environmental and safety application.
I developed an image processing pipeline that extracted key visual features and fed them into a machine learning detection system. The solution included automated data validation to handle corrupted images and varying lighting conditions that could trigger false positives.
The system successfully detected 500+ early-stage fires, significantly reducing response time and environmental damage. The pipeline's 98% accuracy rate provided reliable alerts that forestry agencies could trust, and false positives were reduced by 75%.
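To give a flavor of the automated validation step for corrupted images, here is a minimal sketch that rejects obviously broken frames before they reach a detection model. It assumes the cameras upload JPEG files, which the description above does not state. The size threshold and the function name `validate_frame` are hypothetical.

```python
JPEG_SOI = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker
MIN_BYTES = 1024        # hypothetical lower bound for a usable camera frame


def validate_frame(data: bytes) -> str:
    """Return 'ok' or a short reason why the frame should be quarantined."""
    if len(data) < MIN_BYTES:
        return "too_small"
    if not data.startswith(JPEG_SOI):
        return "bad_header"
    # Interrupted uploads typically lose the end-of-image marker.
    # Trailing zero padding is tolerated; other trailing bytes are not.
    if not data.rstrip(b"\x00").endswith(JPEG_EOI):
        return "truncated"
    return "ok"


# Synthetic byte strings standing in for uploaded frames.
good = JPEG_SOI + b"\x11" * 2000 + JPEG_EOI
cut_off = JPEG_SOI + b"\x11" * 2000
print(validate_frame(good))
print(validate_frame(cut_off))
```

A structural check like this only catches damaged files. Lighting-related false positives would need content-level checks on the decoded image, which are out of scope for this sketch.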
Vehicle Recognition System at Multiway
At Multiway, I needed to build a backend system capable of processing and analyzing 1M+ vehicles daily with high detection accuracy for smart city applications. The existing Java-based system was struggling with performance issues and high maintenance costs.
I led the migration from the legacy Java stack to a Python-based solution using TensorFlow for vehicle recognition algorithms. I designed a highly efficient data processing pipeline that handled multiple camera streams simultaneously, implemented license plate recognition algorithms, and created a real-time database indexing system to enable fast querying of vehicle history.
The redesigned system achieved 98% detection accuracy while processing over 1 million vehicles daily. We realized a 70% performance improvement compared to the previous solution and achieved full SOC 2 compliance for data handling. This enabled the expansion of the smart city platform to additional municipalities.
Real-time Network Monitoring at GPr Sistemas
GPr Sistemas needed a scalable system to monitor 10,000+ ATM devices across various banks in real time, with strict requirements for alert response times and uptime monitoring.
I developed an SNMP-based monitoring backend that continuously collected performance metrics and operational status from banking network devices. The solution included a specialized time-series database for storing historical data, intelligent anomaly detection for preemptive alerts, and automated failover mechanisms.
The system achieved sub-1-second alert response time for critical device failures, enabling immediate intervention before customers were affected. We maintained 99.99% network monitoring uptime, providing banking clients with comprehensive real-time dashboards that significantly improved their operational visibility and reduced mean time to repair for ATM issues.
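The anomaly detection mentioned above can take many forms. One simple option, shown here as a sketch rather than as the method actually deployed, is a rolling z-score over recent metric samples. The class name, window size, threshold, and sample latencies are all illustrative assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flags a sample that deviates strongly from the recent window."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 5):
        self.values = deque(maxlen=window)  # most recent samples only
        self.threshold = threshold          # z-score above which we alert
        self.min_samples = min_samples      # warm-up before any alert

    def observe(self, value: float) -> bool:
        """Record a sample and return True if it looks anomalous."""
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            # A perfectly flat window (sigma == 0) never alerts in this sketch.
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        self.values.append(value)
        return is_anomaly


# Hypothetical response-time samples (ms) polled from one device.
detector = RollingAnomalyDetector()
samples = [20, 21, 19, 20, 22, 21, 20, 19, 95, 21]
flags = [detector.observe(s) for s in samples]
print(flags)
```

One design caveat: the outlier itself enters the window after being flagged, which inflates the standard deviation for a while. A production system might exclude flagged samples or use a more robust statistic such as the median absolute deviation.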
Machine Learning Data Infrastructure for Security Ratings
At SecurityScorecard, I needed to create resilient data ingestion pipelines to handle diverse security data sources with inconsistent formats and reliability issues, all while maintaining data accuracy for cybersecurity rating calculations.
I built scalable ETL pipelines that standardized heterogeneous security data into consistent formats for analysis. The system included automated data validation, anomaly detection, and reconciliation processes to ensure data quality. I implemented backfill mechanisms to handle source system outages and recovery.
The reliable data infrastructure enabled SecurityScorecard to rate 3x more third-party vendors per customer while maintaining data accuracy. This directly supported business growth and improved customer satisfaction as clients could evaluate more of their supply chain partners for security risks.
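As a small illustration of standardizing heterogeneous sources into one format with validation, the sketch below maps two made-up record shapes onto a single schema and quarantines anything that fails validation. The field names, the `Finding` schema, and the severity mapping are hypothetical and are not SecurityScorecard's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Finding:
    """Hypothetical common schema for one security observation."""
    domain: str
    observed_at: datetime
    severity: int  # 1 (low) to 5 (high)


SEVERITY_NAMES = {"low": 1, "medium": 3, "high": 5}  # hypothetical mapping


def parse_time(raw) -> datetime:
    """Accept epoch seconds or ISO 8601 text and return an aware UTC datetime."""
    if isinstance(raw, (int, float)):
        return datetime.fromtimestamp(raw, tz=timezone.utc)
    parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc)


def normalize(record: dict) -> Finding:
    """Map one source record onto the common schema or raise ValueError."""
    domain = (record.get("domain") or record.get("host") or "").strip().lower()
    if not domain:
        raise ValueError("missing domain")
    raw_time = record.get("observed_at", record.get("ts"))
    if raw_time is None:
        raise ValueError("missing timestamp")
    raw_sev = record.get("severity")
    severity = SEVERITY_NAMES.get(raw_sev.lower()) if isinstance(raw_sev, str) else raw_sev
    if not isinstance(severity, int) or not 1 <= severity <= 5:
        raise ValueError(f"bad severity: {raw_sev!r}")
    return Finding(domain, parse_time(raw_time), severity)


def run(records):
    """Split records into normalized findings and rejected (record, reason) pairs."""
    clean, rejected = [], []
    for record in records:
        try:
            clean.append(normalize(record))
        except (ValueError, TypeError) as exc:
            rejected.append((record, str(exc)))
    return clean, rejected


clean, rejected = run([
    {"domain": "Example.com", "observed_at": "2024-01-01T00:00:00Z", "severity": "high"},
    {"host": "example.org", "ts": 1704067200, "severity": 2},
    {"host": "", "ts": 1704067200, "severity": 2},
])
print(len(clean), "clean,", len(rejected), "rejected")
```

Keeping rejected records together with the rejection reason, rather than dropping them, is what makes later reconciliation and backfill possible once a source is fixed.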
Need similar solutions for your business?
I help companies build high-performance, scalable systems that solve real business problems. Let's discuss how I can bring my expertise to your project.