Eashwar Subramanian | Data Analyst | Python, SQL, R, BI & Data Quality

About Me

I’m a Master of Data Science graduate from RMIT, based in Melbourne. My strongest area is turning messy or unclear data problems into structured, validated and well-documented workflows that others can trust and reuse.

My work spans data ingestion, cleaning, validation, dashboarding, SQL reporting, public survey/population data assessment, healthcare data workflows, and monitoring-style analytics. I focus on practical delivery: define the required output, inspect source structure, profile the data, document assumptions, validate the results, and communicate findings clearly.

I’m especially interested in roles across data analytics, data processing, research data operations, BI reporting, healthcare analytics and data quality — where accuracy, traceability and clear reporting matter.

Data wrangling • Validation checks • Reporting-ready tables • Dashboards • Documentation • Stakeholder support

What I’m strong at

Data processing: cleaning, reshaping, joining, standardising and exporting datasets.
Data quality: profiling, deduplication, validation rules, QA sampling and audit-style outputs.
Analytics delivery: dashboards, reporting views, stakeholder summaries and clear written insights.
Reproducibility: documented assumptions, version control, reusable workflow steps and handover notes.
Learning quickly: comfortable ramping into new data domains, platforms and reporting standards.

Professional Experience

Research Assistant (Data Analytics)

Cultural Infusion • Melbourne, Australia

Mar 2026 – Present

Develop and refine an analytical framework for a proposed study on cultural and demographic predictors of disinformation vulnerability, with emphasis on defensible methodology and stakeholder-ready outputs.
Scope and assess public survey/population datasets such as ESS, Eurobarometer and Ofcom for comparability, missingness, field definitions and suitability before analysis.
Translate ambiguous research questions into structured data requirements, documented assumptions, validation checkpoints and practical processing steps.
Define validation checkpoints across data collection, clustering, narrative mapping and interpretation to reduce over-claiming and improve reliability.
Draft reporting artefacts including cluster cards, scorecard concepts, matrix/heatmap visuals and concise methodology notes for stakeholder review.

Research data • Survey datasets • Methodology • Validation • Stakeholder reporting

Data Science Intern — Project Atlas

Cultural Infusion • Melbourne, Australia

Jan 2026 – Mar 2026

Delivered a monitoring-style data workflow that converted messy multi-source public information into structured, decision-ready outputs with direct source traceability.
Orchestrated ingestion and evidence extraction workflows spanning GDELT news retrieval, public/social narrative retrieval, article text extraction and normalised topic-level storage.
Applied staged relevance judgement and story-level clustering to reduce noise, consolidate overlapping records and support more stable downstream analysis.
Enabled delta-aware reporting so recurring runs surfaced net-new developments rather than repeating previously seen coverage.
Configured scheduled watchlist monitoring with Teams alerts triggered only when new items were detected, reducing repeat noise and supporting timely follow-up.
Identified data quality issues across recurring runs, including duplicate coverage, inconsistent fields and out-of-window records, then documented workflow controls to improve reliability.

Python workflows • Multi-source ingestion • Monitoring • Deduplication • Teams alerts

Data Science Intern

Solara Health • Melbourne, Australia

Jul 2025 – Nov 2025

Built ingestion pipelines for a healthcare content library covering PDF and HTML sources, applying metadata governance and SHA-256 deduplication to improve reliability and traceability.
Automated download, parsing and transformation workflows with retries, content-type handling and standardised text outputs to stabilise downstream datasets.
Designed PostgreSQL data structures for healthcare content, metadata and embeddings, and exposed curated datasets through a FastAPI service with basic automated tests for reliability.
Implemented tenant-aware access controls using PostgreSQL Row Level Security to support segregation across multi-hospital deployment scenarios.
Developed QA and audit tooling including flag logging, CSV exports, citation audit checks, labelled evaluation datasets and repeatable evaluation reporting using RAGAS and custom checks.
Documented assumptions, processing logic and validation issues so outputs could be reviewed, reused and improved by project team members.

Healthcare data • FastAPI • PostgreSQL • RAGAS • Audit outputs • RLS

Data Analyst Intern

PrepInsta Pvt Ltd • Remote, India

Dec 2023 – Feb 2024

Built and delivered stakeholder dashboards in Tableau and Excel, validating source tables, transformations and dashboard calculations to improve reporting consistency.
Optimised multi-table SQL queries for recurring executive reporting and ad hoc reporting requests.
Automated recurring reporting inputs in Python using BeautifulSoup-based extraction, improving repeatability and reducing manual effort in reporting workflows.

Tableau • Excel • SQL • Python automation

Featured Projects

Selected work showing data processing, quality checks, BI delivery and applied analytics. Project links point to GitHub or a live dashboard where available.

Australian Retail Customer Segmentation

Python • SQL • RFM analysis • K-Means • Power BI • data quality repair

Built an end-to-end segmentation workflow covering transaction-integrity checks, cleaning, customer-level feature engineering, RFM analysis, clustering and dashboard-ready outputs. Profiled the transaction data and found widespread reuse of order numbers that would have distorted customer metrics, then added validation rules and consistency checks to repair the workflow. Segmented 788 customers and identified a 106-customer high-value cohort contributing about 15.9% of revenue.

Python • pandas • RFM • K-Means • Power BI • Data Quality

View GitHub
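
For readers unfamiliar with RFM, the customer-level step in the segmentation project above can be sketched roughly as follows. This is a minimal, standard-library illustration and not the project's code: the function name `rfm_table`, the tuple layout and the sample rows are all hypothetical. Counting distinct order IDs per customer is an assumption about how frequency is defined.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, order_id, order_date, amount).
    as_of: reference date used to measure recency.
    """
    last_seen = {}
    orders = defaultdict(set)   # distinct order IDs per customer
    spend = defaultdict(float)  # total spend per customer

    for customer_id, order_id, order_date, amount in transactions:
        if customer_id not in last_seen or order_date > last_seen[customer_id]:
            last_seen[customer_id] = order_date
        orders[customer_id].add(order_id)
        spend[customer_id] += amount

    return {
        c: ((as_of - last_seen[c]).days, len(orders[c]), round(spend[c], 2))
        for c in last_seen
    }


# Hypothetical sample rows, not data from the project.
sample = [
    ("C1", "O1", date(2024, 1, 5), 120.0),
    ("C1", "O2", date(2024, 3, 1), 80.0),
    ("C2", "O3", date(2024, 2, 10), 45.5),
]
rfm = rfm_table(sample, as_of=date(2024, 3, 31))
```

In a sketch like this, a reused order number would inflate or deflate the frequency count, which is why integrity checks need to run before the RFM step.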

Healthcare Content Ingestion & QA Workflow

Healthcare RAG system • ingestion • QA/audit outputs • evaluation

Contributed to a team-built healthcare RAG platform covering document ingestion, retrieval, evaluation, safety and tenant-scoped access control. Worked on PDF/HTML ingestion, metadata governance, SHA-256 deduplication, QA exports and repeatable evaluation reporting. The final report recorded faithfulness of 69.4%, answer relevancy of 54.3%, context precision of 92.7% and context recall of 91.2%.

Python • FastAPI • PostgreSQL • pgvector • RAGAS • Docker

View Sanitised Case Study
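
The SHA-256 deduplication mentioned above follows a common pattern: hash each document's content and keep only the first document seen for each digest. The sketch below is illustrative only and is not the platform's implementation. The function names, the `text` and `sha256` keys, and the whitespace/case normalisation step are assumptions made for the example.

```python
import hashlib


def content_hash(text):
    """Return the SHA-256 hex digest of lightly normalised text."""
    # Assumption: collapse whitespace and lowercase so trivially
    # different copies of the same document produce the same digest.
    normalised = " ".join(text.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def deduplicate(documents):
    """Keep the first document for each content hash, preserving order."""
    seen = set()
    unique = []
    for doc in documents:
        digest = content_hash(doc["text"])
        if digest in seen:
            continue  # duplicate content, skip it
        seen.add(digest)
        # Store the digest with the record so it can be traced later.
        unique.append({**doc, "sha256": digest})
    return unique


# Hypothetical records for illustration.
docs = [
    {"id": 1, "text": "Hello  World"},
    {"id": 2, "text": "hello world"},
    {"id": 3, "text": "Other"},
]
unique_docs = deduplicate(docs)
```

Storing the digest alongside each record is one way to support traceability, since any later copy of the same content can be matched back to the retained document.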

Sales & Customer Performance Dashboards

Tableau • KPI reporting • parameters • drilldowns • dashboard navigation

Built two interactive Tableau dashboards covering sales performance and customer performance, including current-year vs previous-year KPIs, monthly trends, sub-category performance, weekly views, customer profitability and navigation controls. Structured the project with mockups, technical notes, data dictionary documentation and packaged Tableau assets.

Tableau • Calculated fields • Parameters • Dashboard UX

View GitHub • Live Dashboard

Sales Analytics Dashboard

Power BI • MySQL • reporting-ready views • metric definitions

Built a Power BI dashboard connected to a MySQL dataset to track performance across time, region and product category. Created reporting-ready views and consistent metric definitions to support recurring performance reporting and stakeholder review.

Power BI • MySQL • KPI reporting • Data modelling

View GitHub

Climate-Driven Urban Growth Analysis

R • dplyr • tidyr • ggplot2 • data integration

Integrated meteorological and city-level population datasets to examine relationships between rainfall, temperature, humidity and population growth across Australian cities. Performed missing value handling, outlier treatment, location matching, correlation analysis and visual exploration using reproducible R workflows.

R • dplyr • tidyr • ggplot2 • Data cleaning

View GitHub

Climate Policy Forecasting Dashboard

Python • Flask • SARIMAX • Folium • validation metrics

Built a Flask dashboard that generates on-demand weekly SARIMAX forecasts for Australian locations and returns forecast outputs through JSON responses. Added weekly resampling, interpolation, MAE/RMSE validation and a temporal Folium map for geographic exploration.

Python • Flask • SARIMAX • Folium • MAE/RMSE

View GitHub
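
The MAE/RMSE validation named above uses two standard error metrics: mean absolute error and root mean squared error between held-out actuals and forecasts. A minimal standard-library sketch is shown below. The function names and sample values are hypothetical, and the dashboard itself may compute these differently (for example through a library).

```python
import math


def _check(actual, predicted):
    # Both series must be non-empty and aligned one-to-one.
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")


def mae(actual, predicted):
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalises large misses more than MAE."""
    _check(actual, predicted)
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical weekly values, not results from the project.
actual = [10, 12, 14, 16]
forecast = [11, 12, 13, 18]
mae_value = mae(actual, forecast)
rmse_value = rmse(actual, forecast)
```

Reporting both is useful because RMSE reacts more strongly to occasional large misses, while MAE reflects the typical error size.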

Real Estate Data Cleaning in SQL

MySQL • standardisation • address parsing • duplicate handling

Built a reusable SQL cleaning workflow covering date standardisation, missing-address filling, address splitting, categorical normalisation, duplicate identification and column cleanup. Documented transformation logic step by step for repeatability and review.

MySQL • SQL cleaning • Standardisation • Deduplication

View GitHub

Technical Expertise

Programming for data processing: Python, pandas, NumPy; R, dplyr, tidyr, ggplot2; repeatable cleaning, transformation, validation and structured exports.

SQL & databases: PostgreSQL, MySQL; joins, CTEs, views, window functions, reporting-ready tables, source-to-output validation.

Dashboards & reporting: Power BI, Tableau, Excel; KPI reporting, drilldowns, trend views, dashboard usability and stakeholder-ready summaries.

Data quality & governance: Profiling, validation rules, standardisation, deduplication, anomaly checks, QA spot-checks, audit logs and documentation.

Analytics & modelling: RFM analysis, K-Means clustering, cohort-style comparisons, trend analysis, forecasting with SARIMAX, MAE/RMSE validation.

Data systems & delivery: FastAPI, Flask, Docker, Git, Pytest, Jira, Confluence, Bitbucket; basic automated tests and handover documentation.

AI/RAG evaluation exposure: LlamaIndex, Sentence-Transformers, pgvector, RAGAS, labelled evaluation datasets, citation audit checks and quality reporting.

Research data exposure: Public survey/population datasets, variable definition checks, comparability assessment, missingness review and methodology notes.

Stakeholder communication: Requirements clarification, plain-English issue explanation, written updates, reporting commentary and practical documentation.

Let’s Connect

Get in touch

Open to full-time and contract opportunities across data analytics, data processing, research data operations, BI reporting, healthcare analytics and data quality.

eashwars2001@gmail.com

+61 450 332 782

linkedin.com/in/eashwar-subramanian

github.com/Eashwar-Subramanian
