Core Capabilities
- Build and maintain automated pipelines to collect and structure publicly available data (APIs + website/RSS sources) into analysis-ready datasets.
- Apply data quality controls (deduplication, normalisation, timestamp validation, and QA flags) and document rules/assumptions to keep outputs reliable.
- Develop text-processing workflows to convert unstructured content into consistent fields for trend and theme analysis over time.
- Produce stakeholder-friendly summaries of “what changed / why it matters / what to do next”, and iterate based on feedback to improve signal-to-noise.
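The ingestion and quality-control steps above can be sketched as a minimal normalisation pass. This is an illustrative outline only; the field names (`title`, `body`, `published`) and flag labels are assumptions, not the actual pipeline schema:

```python
import hashlib
from datetime import datetime

def normalise(records):
    """Deduplicate records, validate timestamps, and attach QA flags.

    `records` is a list of dicts with illustrative keys
    'title', 'body', and 'published' (an ISO-8601 string).
    """
    seen = set()
    out = []
    for rec in records:
        flags = []
        # Deduplicate on a content hash of title + body.
        key = hashlib.sha256(
            (rec.get("title", "") + rec.get("body", "")).encode("utf-8")
        ).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        # Timestamp validation: flag bad values rather than silently dropping rows.
        try:
            ts = datetime.fromisoformat(rec["published"])
        except (KeyError, ValueError):
            ts = None
            flags.append("invalid_timestamp")
        out.append({**rec, "dedup_key": key, "parsed_ts": ts, "qa_flags": flags})
    return out
```

Flagging instead of dropping keeps the output auditable: every questionable row survives with a machine-readable reason attached.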
Data Analytics & Data Science — Reliable Data, Clear Insights
Master of Data Science graduate (RMIT, Dec 2025) with internship experience across data ingestion, cleaning/validation, and stakeholder reporting. I build analysis-ready datasets and dashboards, and apply ML when it improves decisions — with a strong focus on data quality, reproducibility, and practical delivery.
About Me
I’m a Master of Data Science graduate from RMIT (Dec 2025), based in Melbourne.
I work across the full analytics lifecycle: collect data → clean/validate → structure it into report-ready tables → analyse trends/patterns → communicate insights clearly to stakeholders. I care a lot about accuracy, traceability, and repeatable outputs.
Recent work includes healthcare document ingestion (PDF/HTML), metadata governance and deduplication, database design in PostgreSQL, and building QA/audit checks to improve trust in reporting.
I’m comfortable collaborating with both technical and non-technical teams, clarifying requirements, and documenting assumptions so the data stays usable over time.
Professional Experience
- Built ingestion pipelines for a healthcare content library (PDF/HTML), including metadata governance and SHA-256 deduplication to improve dataset reliability.
- Developed automated download and parsing workflows with retries and content-type handling to improve consistency for downstream analysis and retrieval.
- Designed PostgreSQL data structures for content (metadata + embeddings) with indexing patterns, and exposed curated datasets via a FastAPI service with basic automated tests (Pytest).
- Implemented tenant-aware access controls using PostgreSQL Row-Level Security (RLS) to enforce segregation across multi-hospital deployments.
- Built QA and audit tooling (flag logging, CSV exports, citation audit harness, labelled evaluation dataset) and tracked quality trends using reproducible metrics.
- Delivered 5+ stakeholder dashboards (Tableau, Excel) and performed source-to-dashboard cross-checks to improve metric consistency.
- Optimised SQL query performance, reducing data retrieval time by ~25% for executive reporting.
- Automated reporting workflows with Python (BeautifulSoup), cutting manual effort by ~30%.
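The content-hash deduplication mentioned above can be illustrated with a short sketch. The function names and the in-memory `dict` of documents are hypothetical; the real pipeline worked over downloaded PDF/HTML files, but the hashing idea is the same:

```python
import hashlib

def sha256_digest(data: bytes, chunk_size: int = 1 << 16) -> str:
    """Stream a byte payload through SHA-256 in chunks, as you would for large files."""
    h = hashlib.sha256()
    for i in range(0, len(data), chunk_size):
        h.update(data[i:i + chunk_size])
    return h.hexdigest()

def dedupe(documents: dict[str, bytes]) -> dict[str, str]:
    """Map each digest to the first filename that produced it; byte-identical repeats are skipped."""
    seen: dict[str, str] = {}
    for name, payload in documents.items():
        seen.setdefault(sha256_digest(payload), name)
    return seen
```

Hashing the content rather than comparing filenames catches the common case where the same document is republished under different names or URLs.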
Featured Projects
Selected projects demonstrating applied analytics, data engineering, and evaluation-focused ML systems. Full code and additional work are on my GitHub.
Solara Healthcare RAG System
Built a healthcare knowledge system with strong data governance (metadata tracking, deduplication, tenant isolation) and audit-friendly evaluation. Implemented reproducible monitoring runs and tracked quality using evaluation metrics over time.
Customer Segmentation & RFM Analysis
Segmented Australian retail customers and identified a high-value segment representing ~15% of revenue. Resolved data quality issues (invalid order IDs) to enable accurate RFM analysis and reporting.
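The RFM aggregation at the heart of this project can be sketched as follows. The `(customer_id, order_date, amount)` tuple shape is an illustrative assumption, not the project's actual schema:

```python
from datetime import date

def rfm(orders, today):
    """Aggregate raw Recency/Frequency/Monetary values per customer.

    `orders` is a list of (customer_id, order_date, amount) tuples.
    Returns {customer_id: (days_since_last_order, order_count, total_spend)}.
    """
    out = {}
    for cust, order_date, amount in orders:
        recency = (today - order_date).days
        r, f, m = out.get(cust, (recency, 0, 0.0))
        # Keep the smallest recency (most recent order), count orders, sum spend.
        out[cust] = (min(r, recency), f + 1, m + amount)
    return out
```

In practice each raw value is then bucketed into quantile scores before segmenting, but the per-customer aggregation above is the part that invalid order IDs would have corrupted.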
Climate Policy Forecasting Dashboard
Built time-series forecasting and an interactive dashboard with geospatial exploration to support stakeholder understanding of trends and uncertainty.
Cloud Music Subscription Platform
Built a cloud-based platform prototype using Java and AWS services, focusing on data storage, basic APIs, and cloud-native architecture concepts.
Real Estate Data Quality Pipeline
Developed an automated data cleaning workflow, implementing parsing, duplicate detection, and schema validation to improve data consistency.
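The parsing, duplicate detection, and schema validation steps can be sketched together in one pass. The schema (`address`, `price`, `bedrooms`) is a hypothetical example of the pattern, not the project's real column set:

```python
EXPECTED = {"address": str, "price": float, "bedrooms": int}  # illustrative schema

def validate_rows(raw_rows):
    """Parse string rows against a simple schema; return (clean_rows, error_records).

    Duplicate detection uses the full parsed value tuple as the key, so
    byte-for-byte repeat listings are rejected after type coercion.
    """
    clean, errors, seen = [], [], set()
    for i, row in enumerate(raw_rows):
        try:
            parsed = {col: caster(row[col]) for col, caster in EXPECTED.items()}
        except (KeyError, ValueError) as exc:
            errors.append((i, repr(exc)))  # keep row index + reason for auditing
            continue
        key = tuple(parsed.values())
        if key in seen:
            errors.append((i, "duplicate"))
            continue
        seen.add(key)
        clean.append(parsed)
    return clean, errors
```

Returning the error records alongside the clean rows (rather than discarding them) makes the cleaning step traceable, which matches the audit-focused approach described above.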
Technical Expertise
Let’s Connect
Open to full-time and contract roles in Data Analytics / Data Science, with a focus on reliable data systems, data quality, and decision-ready reporting.