πŸ‘‹ Hi, I'm Isaac

Data Analyst & Pipeline Engineer

I Work with Data, but I Talk Like a Human

βœ” 130K+ records processed for client pricing tools
βœ” 292% revenue growth through dashboard analytics
βœ” 80% time saved with automated pipelines

"Have collaborated with Isaac in an ongoing python project and analysis for the freelance economy. Isaac was pivotal in the brainstorming and executing the process and script to achieve our goal. He is open to feedback and a great collaborator to have on a project."
- Antonio Lopez Jr

Featured Projects

Freelancer Market Pricing Analysis

For: Antonio Lopez Jr / Novae LLC

Built a pricing tool that helps freelancers price services competitively. Automated data collection from 3 platforms, processed 130K+ records, and cut manual effort by 80%.
Result: 50+ market segments identified, real-time pricing insights

Read Full Case Study β†’

DCJ Analytics Dashboard

For: Data Career Jumpstart Bootcamp

Replaced manual reporting with automated dashboards. Leadership now makes decisions backed by real-time data.
Result: 292% revenue increase + 85% time saved on reporting

Read Full Case Study β†’

BookLens Recommendation Engine

For: Open Source / Data Community

Built a smart book finder that understands meaning, not just keywords. Trained NLP model on 250+ books and 2,900+ reviews to recommend books based on natural language searches.
Result: Semantic search that finds books traditional search misses

Read Full Case Study β†’

Superconductor Material Prediction

For: Academic Project

Built ML models that predict which materials become superconductors and their critical temperatures. Helps researchers screen thousands of materials in seconds instead of years in a lab.
Result: 97% classification accuracy + 0.87 RΒ² for temperature prediction

Read Full Case Study β†’

U.S. Small Business Loan Analysis

For: Personal Project / Medium Publication

Analyzed 545K+ SBA loan records to measure program effectiveness. Identified lending patterns, default rates, and economic impact across states and industries.
Result: Program supported 5.85M jobs (avg 10.7 jobs per loan)

Read Full Case Study β†’

Superconductor Material Prediction

β–ŒProject Overview

For: Physics Final Year Project | B.Sc. Thesis

Type: Physics Undergraduate Research Project
Institution: Federal University of Agriculture, Abeokuta

16K Material Records Β· 0.87 Regression RΒ² Β· 97% Classification Accuracy Β· Streamlit Deployed Interface

β–ŒThe Problem

Discovering new superconducting materials is traditionally a slow, expensive process. Experimental methods for testing materials can take months and cost thousands of dollars per sample. This bottleneck limits scientific progress in a field with transformative potential for:

β—‹ Lossless electrical power transmission
β—‹ Magnetic resonance imaging (MRI) technology
β—‹ Quantum computing applications
β—‹ Magnetic levitation transportation
β—‹ Energy storage systems

The critical challenge: Can we predict whether a material will exhibit superconductivity, and at what critical temperature (Tc), based solely on its chemical composition?

β–ŒThe Solution

I developed a machine learning pipeline that predicts superconductivity and critical temperature using a dual-phase approach: classification followed by regression.

1. Feature Engineering (Domain-Specific)

This was the key innovation. I used two feature extraction methods:

● Element Property (EP) Features:

β—‹ Atomic radius: Size of atoms affects lattice structure
β—‹ Electronegativity: Influences electron bonding
β—‹ Ionization energy: Affects electronic configuration
β—‹ Electron affinity: Impacts material stability

● Element Fraction (EF) Features:

β—‹ Stoichiometric ratios of each element
β—‹ Proportion-based representations
β—‹ Captures composition relationships
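
To make the EF idea concrete, here is a minimal sketch of stoichiometric-fraction extraction. The parser and the example composition (YBa2Cu3O7) are illustrative assumptions, not code from the thesis.

```python
import re
from collections import defaultdict

def element_fractions(formula: str) -> dict:
    """Parse a formula like 'YBa2Cu3O7' into stoichiometric fractions."""
    counts = defaultdict(float)
    for elem, num in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[elem] += float(num) if num else 1.0
    total = sum(counts.values())
    return {elem: n / total for elem, n in counts.items()}

print(element_fractions("YBa2Cu3O7"))
# {'Y': 0.077, 'Ba': 0.154, 'Cu': 0.231, 'O': 0.538} (rounded)
```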

2. Two-Phase Model Architecture

Phase 1: Classification (Is it a superconductor?)

β—‹ Trained Random Forest Classifier
β—‹ Binary prediction
β—‹ Used Voting Classifier ensemble combining EF, EP, and combined models
β—‹ Achieved 97% accuracy

Phase 2: Regression (What's the critical temperature?)

β—‹ Trained Random Forest Regressor on identified superconductors
β—‹ Predicted exact Tc value
β—‹ Achieved RΒ² of 0.87 (explained 87% of variance)
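
A minimal scikit-learn sketch of the dual-phase idea, on synthetic stand-in data; the real pipeline used the EP/EF features and a Voting Classifier ensemble, which this sketch omits for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real inputs were the EP/EF feature vectors.
rng = np.random.default_rng(0)
X = rng.random((1000, 20))           # feature matrix
y_super = rng.integers(0, 2, 1000)   # 1 = superconductor, 0 = not
y_tc = rng.random(1000) * 100        # critical temperature (K)

X_tr, X_te, ys_tr, ys_te, yt_tr, yt_te = train_test_split(
    X, y_super, y_tc, test_size=0.2, random_state=42
)

# Phase 1: does the material superconduct at all?
clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_tr, ys_tr)

# Phase 2: regress Tc, trained only on known superconductors.
mask = ys_tr == 1
reg = RandomForestRegressor(n_estimators=200, random_state=42)
reg.fit(X_tr[mask], yt_tr[mask])

# At inference time, only materials flagged by Phase 1 get a Tc estimate.
is_super = clf.predict(X_te).astype(bool)
if is_super.any():
    tc_pred = reg.predict(X_te[is_super])
```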

3. Hyperparameter Tuning

β—‹ Grid search with cross-validation
β—‹ Optimized number of trees, max depth, and related tree parameters
β—‹ Balanced accuracy with computational efficiency
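
The tuning step might look like the GridSearchCV sketch below; the grid values and stand-in data are assumptions rather than the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X, y = rng.random((500, 20)), rng.integers(0, 2, 500)  # stand-in data

param_grid = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 10, 20],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",
    n_jobs=-1,           # parallelize to balance accuracy against runtime
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```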

4. Streamlit Deployment

β—‹ User-friendly web interface for predictions
β—‹ Upload chemical composition β†’ Get instant prediction
β—‹ Visualizations: ROC curves, confusion matrices, residual plots
β—‹ Makes advanced ML accessible to non-experts
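
A minimal Streamlit sketch of the upload-and-predict flow; the file name, CSV format, and placeholder model are all illustrative.

```python
# app.py (run with: streamlit run app.py)
import pandas as pd
import streamlit as st

st.title("Superconductor Tc Predictor")

uploaded = st.file_uploader("Upload a CSV of chemical compositions", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    st.dataframe(df.head())
    # A trained two-phase model (see above) would be loaded and applied here;
    # this placeholder just confirms the round trip works.
    st.success(f"Received {len(df)} compositions; predictions would appear here.")
```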

Why This Matters:
This dual-phase approach can screen thousands of potential materials in seconds (work that would take years in a lab). Researchers can now prioritize which materials to synthesize and test experimentally.

Interested in ML for Scientific Discovery?

I can apply machine learning to accelerate research in physics, chemistry, and materials science.

U.S. Small Business Loan Analysis

β–ŒProject Overview

For: Personal Project / Medium Publication

Type: Financial Data Analysis Project
Data Source: SBA FOIA Dataset

545K+ Loan Records Analyzed Β· 5.8M Jobs Supported

β–ŒThe Problem

Small businesses are the backbone of the U.S. economy, but accessing financing remains a major challenge. The Small Business Administration (SBA) guarantees loans to help businesses that might otherwise struggle to secure funding. But how effective is this program?

Key questions that needed answers:

β—‹ What percentage of SBA-guaranteed loans are successfully repaid?
β—‹ Which states and industries benefit most from SBA loans?
β—‹ How much economic impact do these loans generate (jobs created)?
β—‹ What factors correlate with successful loan repayment?
β—‹ Are there patterns in loan amounts, interest rates, and terms?

β–ŒThe Solution

I conducted a comprehensive analysis of 545,714 SBA loan records spanning 2010-2019, using Python for data cleaning, PostgreSQL for analysis, and Tableau for visualization.

1. Data Cleaning

β—‹ Handled missing values
β—‹ Created binary indicators
β—‹ Imputed categorical data
β—‹ Fixed data quality issues
β—‹ Converted date fields
β—‹ Validated data types
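
A short pandas sketch of these steps; the file and column names (approval_date, loan_status, and so on) are hypothetical stand-ins for the actual SBA field names.

```python
import pandas as pd

# Hypothetical file and columns; the real SBA extract uses different headers.
df = pd.read_csv("sba_loans.csv", low_memory=False)

df["approval_date"] = pd.to_datetime(df["approval_date"], errors="coerce")   # date conversion
df["gross_approval"] = pd.to_numeric(df["gross_approval"], errors="coerce")  # type validation
df["is_default"] = (df["loan_status"] == "CHGOFF").astype(int)               # binary indicator
df["naics_code"] = df["naics_code"].fillna("Unknown")                        # impute categorical
df = df.drop_duplicates()
```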

2. PostgreSQL Analysis

I used SQL to extract key insights:

β—‹ Loan status distribution: Calculated percentages for Paid In Full, Active, Defaulted
β—‹ Top lending banks: Ranked banks by approved loan volume
β—‹ Geographic analysis: Median loan amounts by state
β—‹ Industry patterns: Analyzed interest rates and approval amounts by business type
β—‹ Time trends: Tracked loan approvals year-over-year
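
As one example, the loan status distribution reduces to a single window-function query; the connection string and table name below are hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:pass@localhost:5432/sba")  # hypothetical

query = """
SELECT loan_status,
       COUNT(*) AS loans,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct
FROM sba_loans
GROUP BY loan_status
ORDER BY loans DESC;
"""
print(pd.read_sql(query, engine))
```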

3. Tableau Visualizations

Created interactive dashboards featuring:

β—‹ Loan Status Distribution
β—‹ Top Banks with Approved Loans
β—‹ Median Loan Amount by State
β—‹ Loan Approval Trend Over Years
β—‹ Jobs Supported by Industry

Key Finding:
The SBA program supported 5.85 million jobs over the 10-year period, averaging 10.7 jobs per loan. This demonstrates significant economic impact beyond just financial assistance.

β–ŒRead The Full Analysis

I published a detailed write-up of this analysis on Medium, including additional visualizations and insights.

Need Financial Data Analysis?

I can help you analyze large financial datasets and extract actionable insights.

BookLens: Book Recommendation Engine

β–ŒProject Overview

For: Personal Project / Medium Publication

Focus: Natural Language Processing and Recommendation Systems
Publication: Featured article on Medium

250+ Books Scraped Β· 2,900+ Reviews Collected Β· 166 Books in Analysis Β· Word2Vec NLP Algorithm

β–ŒThe Problem

Finding relevant books on Amazon is frustrating when you're looking for something specific. Generic search results don't understand semantic similarity. If you search for "data analysis," you might miss great books about "statistical programming" or "business intelligence."

Traditional keyword search has limitations:

β—‹ Only matches exact words, missing semantically related terms
β—‹ Can't understand context or relationships between concepts
β—‹ Ignores rich information in book descriptions and reviews
β—‹ Doesn't learn from how readers actually describe books

I wanted to build a smarter system that understands meaning, not just keywords. A system that could answer: "Show me books similar to Thinking, Fast and Slow" or "Find books about machine learning for beginners."

β–ŒThe Solution

I built BookLens, a book recommendation engine that uses Word2Vec to capture semantic relationships. The system learns from book descriptions and reviews to understand which books are genuinely similar.

1. Comprehensive Data Collection

I built a Scrapy-based scraper that collected extensive book data from Amazon:

β—‹ Book metadata
β—‹ Detailed information
β—‹ Book descriptions
β—‹ Review data
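
A minimal Scrapy spider showing the collection pattern; the URL and CSS selectors are placeholders, since the real page structure differs.

```python
import scrapy

class BookSpider(scrapy.Spider):
    """Sketch only: the selectors below are placeholders, not Amazon's markup."""
    name = "books"
    start_urls = ["https://example.com/books"]  # placeholder URL

    def parse(self, response):
        for card in response.css("div.book-card"):
            yield {
                "title": card.css("h2::text").get(),
                "rating": card.css("span.rating::text").get(),
                "description": card.css("p.desc::text").get(),
            }
        # Follow pagination until there is no next page.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```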

2. PostgreSQL Database Design

Structured storage with relational integrity:

β—‹ Books table: Stored all book metadata
β—‹ Reviews table: Stored reviews with a foreign key to books
β—‹ Data integrity: Foreign key constraints ensured reviews linked properly to books
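
A sketch of that two-table design; the column names and connection string are assumptions.

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost:5432/booklens")  # hypothetical

ddl = """
CREATE TABLE IF NOT EXISTS books (
    book_id     SERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    rating      NUMERIC(2, 1),
    description TEXT
);
CREATE TABLE IF NOT EXISTS reviews (
    review_id SERIAL PRIMARY KEY,
    book_id   INTEGER NOT NULL REFERENCES books (book_id),  -- enforces the link
    body      TEXT
);
"""
with engine.begin() as conn:
    conn.execute(text(ddl))
```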

3. Word2Vec Model Training

The core recommendation engine:

β—‹ Text preprocessing
β—‹ Tokenization: Split text into individual words
β—‹ Model training
β—‹ Vector embeddings
β—‹ Similarity calculation
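
Roughly, the training step in gensim; the two-document corpus and hyperparameters here are illustrative only.

```python
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Toy corpus standing in for book descriptions and reviews.
docs = [
    "a fast paced introduction to data analysis with python",
    "statistical programming and business intelligence for analysts",
]
tokens = [simple_preprocess(d) for d in docs]  # lowercase + tokenize

model = Word2Vec(
    sentences=tokens,
    vector_size=100,  # embedding dimension
    window=5,
    min_count=1,      # keep rare words in this tiny example
    workers=4,
)
print(model.wv.most_similar("data", topn=3))
```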

4. Intelligent Recommendation Algorithm

Multi-factor scoring system:

β—‹ Semantic similarity: Cosine distance between query and book vectors
β—‹ Quality weighting: Combined similarity with rating/5 for final score
β—‹ Top-N results: Returns 5 most relevant books
β—‹ Review matching: Also finds most relevant reviews for each book
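
A minimal sketch of that scoring logic: cosine similarity to the query embedding, weighted by rating/5. The function and its toy inputs are hypothetical.

```python
import numpy as np

def recommend(query_vec, book_vecs, ratings, top_n=5):
    """Rank books by cosine similarity to the query, weighted by rating/5."""
    book_vecs = np.asarray(book_vecs, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    sims = book_vecs @ query_vec / (
        np.linalg.norm(book_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    scores = sims * (np.asarray(ratings) / 5.0)  # quality weighting
    return np.argsort(scores)[::-1][:top_n]      # indices of the top books

# Toy example: three 2-d "embeddings" and their average ratings.
books = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
print(recommend([0.2, 0.8], books, ratings=[4.5, 3.0, 5.0]))
```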

β–ŒThe Outcome

BookLens successfully demonstrates the power of NLP for recommendations:

β—‹ Semantic understanding: Finds conceptually similar books even without shared keywords
β—‹ Natural language queries: Users ask in plain English
β—‹ Review-informed: Model learns from how readers describe books
β—‹ Quality-aware: Weights recommendations by book ratings

This project shows how NLP can power smarter recommendations. The model successfully finds books based on meaning, not just matching keywords.

Next steps: Scale this to a larger dataset across multiple sources for even better recommendations.

β–ŒRead The Full Story

I wrote a detailed Medium article explaining how BookLens works, including code snippets and technical decisions.

Interested in NLP or Recommendation Systems?

I can build intelligent systems that understand and learn from text data.

Freelancer Market Pricing Analysis

β–ŒProject Overview

For: Antonio Lopez Jr / Novae LLC

Role: Lead Data Analyst & Engineer (Contract)
Duration: June 2024 - December 2024

130K+ Job Postings Processed Β· 50+ Market Segments Identified Β· 3 Platforms Scraped Β· 80% Time Saved vs Manual

β–ŒThe Problem

Freelancers face a common challenge: pricing their services competitively. Without access to comprehensive market data, they struggle to understand what they should charge based on their role, experience level, and location.

Antonio Lopez Jr was building a platform to help freelancers and small agencies solve this problem. He needed a robust dataset that could power intelligent pricing recommendations. But the data he needed was scattered across multiple freelance marketplaces, in different formats, currencies, and structures.

The challenge was threefold:

β—‹ Scale: Collect data from three major platforms without manual effort
β—‹ Quality: Clean and standardize messy, unstructured data into a usable format
β—‹ Intelligence: Identify meaningful patterns that could inform pricing recommendations

β–ŒThe Solution

I built an end-to-end automated data pipeline that collected, cleaned, and analyzed freelance job data at scale. The system was designed to run continuously, adding new data to an ever-growing dataset.

1. Automated Web Scraping

I developed custom scrapers for each platform using Scrapy and Playwright:

β—‹ Freelancer.com
β—‹ Upwork
β—‹ Salary Transparent Street (STS)

2. Data Processing Pipeline

Built a 12-step cleaning system to make scraped data consistent and reliable. It handled messy currency formats, standardized job titles, mapped experience levels, cleaned location info, removed duplicates, filled missing values, and converted all pricing types into a single comparable structure. One of those steps, currency normalization, is sketched below.
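
A sketch of the currency step, assuming a small static symbol-to-rate table; the production pipeline used live conversion rates across 15+ currencies.

```python
import re

# Hypothetical static rates; the real pipeline pulled live rates.
USD_RATES = {"$": 1.0, "Β£": 1.27, "€": 1.08, "β‚Ή": 0.012}

def to_usd(price: str) -> float | None:
    """Parse strings like '€50' or '£25.50/hr' into a USD amount."""
    m = re.search(r"([$£€₹])\s*(\d+(?:\.\d+)?)", price)
    if not m:
        return None
    symbol, amount = m.group(1), float(m.group(2))
    return round(amount * USD_RATES[symbol], 2)

print(to_usd("€50"))  # 54.0
```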

3. Clustering Analysis

Used DBSCAN to uncover pricing patterns and freelancer segments. Encoded key features, scaled data to manage outliers, separated hourly and fixed-rate jobs, and extracted insights like median prices, key roles, and top keywords for each cluster.
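
A compact scikit-learn sketch of the segmentation step; the three stand-in features and the eps/min_samples values are illustrative, not the production settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import RobustScaler

# Hypothetical features: encoded role, experience level, hourly rate.
rng = np.random.default_rng(1)
X = rng.random((500, 3)) * [10, 3, 150]

X_scaled = RobustScaler().fit_transform(X)  # robust scaling tames rate outliers
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(X_scaled)

# Label -1 marks noise; every other label is a candidate market segment.
for seg in sorted(set(labels) - {-1}):
    seg_rates = X[labels == seg, 2]
    print(f"segment {seg}: {len(seg_rates)} jobs, median rate {np.median(seg_rates):.0f}")
```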

4. Unified Control System

Developed a single Python script to automate the full process. It handled scraping, cleaning, merging data, running clustering, and generating reports. The workflow became fully repeatable with one command.
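
The one-command workflow could be as simple as a stage runner like this; the stage commands and script names are placeholders for the real modules.

```python
# run_pipeline.py: one command runs the whole workflow (illustrative sketch).
import subprocess

STAGES = [
    ["scrapy", "crawl", "freelancer_jobs"],  # 1. scrape
    ["python", "clean.py"],                  # 2. clean + merge
    ["python", "cluster.py"],                # 3. DBSCAN segmentation
    ["python", "report.py"],                 # 4. generate reports
]

for cmd in STAGES:
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # stop the pipeline on the first failure
```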

β–ŒThe Outcome

The project delivered a production-ready data pipeline and a rich dataset that powers an intelligent freelancer pricing system.

β—‹ 130,000+ cleaned job records across multiple roles, experience levels, and regions
β—‹ 50+ unique market segments identified with detailed pricing profiles
β—‹ 80% reduction in manual effort, turning weeks of work into an automated process
β—‹ Real-time pricing insights that help freelancers understand and benchmark rates instantly
β—‹ A scalable foundation for continuous data collection and trend analysis over time

β–ŒTechnical Details

Technology Stack

β—‹ Python: Core language for all data processing
β—‹ Scrapy: High-performance web scraping framework
β—‹ Playwright: Browser automation for JavaScript-heavy sites
β—‹ Pandas: Data manipulation and cleaning
β—‹ scikit-learn: DBSCAN clustering and feature scaling
β—‹ BeautifulSoup: HTML parsing

Key Challenges Solved

β—‹ Anti-scraping measures: Implemented user agent rotation, request delays, and error recovery
β—‹ Currency normalization: Handled 15+ currencies with real-time conversion rates
β—‹ Data inconsistency: Built robust cleaning logic to handle varied formats
β—‹ Scale: Optimized pipeline to process 100K+ records efficiently
β—‹ Authentication: Managed login flows for platforms requiring credentials
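
To illustrate the anti-scraping tactics, here is a requests-based sketch of user agent rotation with delays and retry backoff; the production scrapers implemented this inside Scrapy and Playwright middleware, and the agent strings are truncated examples.

```python
import random
import time
import requests

# Small hypothetical pool; the real scraper rotated many more agents.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

def polite_get(url: str, retries: int = 3) -> requests.Response:
    """GET with a rotating user agent and exponential backoff on errors."""
    for attempt in range(retries):
        try:
            resp = requests.get(
                url,
                headers={"User-Agent": random.choice(USER_AGENTS)},
                timeout=10,
            )
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            time.sleep(2 ** attempt)  # back off before retrying
    raise RuntimeError(f"failed to fetch {url} after {retries} attempts")
```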

Interested in Similar Work?

Need help with web scraping, data pipelines, or market analysis? Let's talk about your project.

Data Career Jumpstart Analytics Suite

β–ŒProject Overview

For: Data Career Jumpstart

Role: Data Analyst & Dashboard Developer
Duration: January 2024 - Present (Ongoing)

292% Revenue Increase Tracked Β· 2 Interactive Dashboards Β· 85% Time Saved on Reporting

β–ŒThe Problem

Data Career Jumpstart needed comprehensive analytics to track their two primary marketing channels: webinars and podcasts. However, data was scattered across multiple platforms with no centralized view of performance.

The challenges they faced:

β—‹ Manual reporting: Hours spent compiling weekly reports from spreadsheets
β—‹ Limited visibility: No real-time view of campaign performance
β—‹ Attribution gaps: Unclear which marketing channels drove conversions
β—‹ Disconnected metrics: Webinar and podcast data lived in separate systems
β—‹ Strategic blindspots: Difficult to identify trends and optimize timing

β–ŒThe Solution: Two Integrated Dashboards

I built two interactive Streamlit dashboards that became the single source of truth for marketing performance.

Dashboard 1: eWebinar Analytics

Purpose: Track webinar registrations, attendance, and conversions

β—‹ Key metrics: Registration count, show-up rate, conversion rate
β—‹ Sankey diagram: Device flow (mobile β†’ desktop transitions); see the sketch after this list
β—‹ Time-series chart: Registration trends over time
β—‹ Geographic analysis: Top countries for attendees
β—‹ Advanced filtering: By UTM source, channel, purchase status
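
The device-flow Sankey can be built with Plotly inside Streamlit; the flow counts below are invented for illustration.

```python
import plotly.graph_objects as go
import streamlit as st

# Hypothetical counts: registration device -> attendance device.
labels = ["Mobile (reg)", "Desktop (reg)", "Mobile (attend)", "Desktop (attend)"]
fig = go.Figure(go.Sankey(
    node=dict(label=labels, pad=20),
    link=dict(
        source=[0, 0, 1, 1],       # indices into labels
        target=[2, 3, 2, 3],
        value=[120, 80, 30, 200],  # illustrative flow sizes
    ),
))
st.plotly_chart(fig)
```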

Dashboard 2: Podcast Analytics

Purpose: Track podcast plays and engagement patterns

β—‹ Time-series line chart: Plays over time with custom date ranges
β—‹ Calendar heatmap: Visual pattern recognition across days
β—‹ Flexible aggregation: Daily, weekly, or monthly views
β—‹ Preset filters: Last 7/30/90 days or all-time
β—‹ Custom date selection: Analyze specific time periods
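
A minimal Streamlit sketch of the preset-range and aggregation logic; the plays.csv export and its columns are hypothetical.

```python
# podcast_dashboard.py (run with: streamlit run podcast_dashboard.py)
import pandas as pd
import streamlit as st

df = pd.read_csv("plays.csv", parse_dates=["date"])  # hypothetical export

preset = st.selectbox("Range", ["Last 7 days", "Last 30 days", "Last 90 days", "All time"])
days = {"Last 7 days": 7, "Last 30 days": 30, "Last 90 days": 90}.get(preset)
if days:
    df = df[df["date"] >= df["date"].max() - pd.Timedelta(days=days)]

view = st.radio("Aggregate", ["Daily", "Weekly", "Monthly"], horizontal=True)
freq = {"Daily": "D", "Weekly": "W", "Monthly": "M"}[view]
st.line_chart(df.set_index("date")["plays"].resample(freq).sum())
```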

β–ŒThe Outcome

These dashboards transformed decision-making at Data Career Jumpstart:

β—‹ 292% revenue increase: Weekly data-driven adjustments drove significant growth
β—‹ 85% time saved: Automated reporting eliminated manual work
β—‹ Strategic confidence: Leadership makes decisions backed by real data

Need Custom Analytics Dashboards?

I build dashboard suites that turn scattered data into actionable insights.

About Me

Data analyst, pipeline engineer, and educator

Who I am


I'm Isaac Oresanya. I work with data, but I don't talk like a robot.

I turn messy spreadsheets into clear answers. I build tools that save people hours of manual work. And I help teams make better decisions by showing them what their data actually means.

My background is physics (yeah, really), but I found my way to data because I love solving puzzles. Whether it's scraping 130K job postings for a client or building dashboards that track revenue, I'm all about making data work for people, not the other way around.

Beyond client work, I'm passionate about education and community building. I founded Explain the Data, a platform where I break down complex data concepts and share real-world projects. I'm also active in data bootcamp communities, helping aspiring analysts build their skills and portfolios.

What I'm Currently Doing


Data Analyst at Data Career Jumpstart

January 2024 - Present

I work with a data bootcamp, building tools that help them grow. My job is to take scattered data from webinars, podcasts, and marketing campaigns and turn it into something leadership can actually use.

The coolest part? I built a dashboard that helped them grow revenue by 292% in one year. Every week, they check the numbers and adjust their strategy. That's the kind of impact I'm here for.

Founder of Explain the Data

April 2025 - Present

I run a technical blog where I publish detailed case studies, tutorials, and walkthroughs of real data projects. My goal is to make data analytics accessible and practical for people who are just starting out or looking to level up their skills.

Topics I cover include web scraping, data cleaning, machine learning, SQL analytics, and dashboard building. Everything I write is based on actual projects I've completed, so readers get real-world context instead of theoretical examples.