Eashwar Subramanian

Data Analyst / Data Scientist • SQL • Python • Data Quality • Reporting

About Me

I’m a Master of Data Science graduate from RMIT (Dec 2025), based in Melbourne.

I work across the full analytics lifecycle: collect data → clean/validate → structure it into report-ready tables → analyse trends/patterns → communicate insights clearly to stakeholders. I prioritise accuracy, traceability, and repeatable outputs.

Recent work includes healthcare document ingestion (PDF/HTML), metadata governance and deduplication, PostgreSQL database design, and QA/audit checks that improve trust in reporting.

I’m comfortable collaborating with both technical and non-technical teams, clarifying requirements, and documenting assumptions so the data stays usable over time.

Melbourne, Australia
Open to Full-time & Contract
SQL • Python • Power BI/Tableau
Education
Master of Data Science
RMIT University — Dec 2025
Bachelor of Engineering
Electronics & Communication Engineering — 2023
Work Authorization
Details available upon request
What I’m Strong At
Data Quality: validation rules, deduplication, audit outputs
Data Systems: PostgreSQL, schema design, indexing
Automation: Python workflows, reliable pipelines
Analytics: dashboards, segmentation, forecasting
Evaluation: reproducible testing/monitoring for quality over time

Professional Experience

Data Science Intern
Cultural Infusion • Melbourne, Australia
Jan 2026 – Present
  • Build and maintain automated pipelines to collect and structure publicly available data (APIs + website/RSS sources) into analysis-ready datasets.
  • Apply data quality controls (deduplication, normalisation, timestamp validation, and QA flags) and document rules/assumptions to keep outputs reliable.
  • Develop text-processing workflows to convert unstructured content into consistent fields for trend and theme analysis over time.
  • Produce stakeholder-friendly summaries of “what changed / why it matters / what to do next”, and iterate based on feedback to improve signal-to-noise.
Data Science Intern
Solara Health • Melbourne, Australia
Jul 2025 – Nov 2025
  • Built ingestion pipelines for a healthcare content library (PDF/HTML), including metadata governance and SHA-256 deduplication to improve dataset reliability.
  • Developed automated download and parsing workflows with retries and content-type handling to improve consistency for downstream analysis and retrieval.
  • Designed PostgreSQL data structures for content (metadata + embeddings) with indexing patterns, and exposed curated datasets via a FastAPI service with basic automated tests (Pytest).
  • Implemented tenant-aware access controls using PostgreSQL Row-Level Security (RLS) to enforce segregation across multi-hospital deployments.
  • Built QA and audit tooling (flag logging, CSV exports, citation audit harness, labelled evaluation dataset) and tracked quality trends using reproducible metrics.
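
As an illustration of the SHA-256 deduplication mentioned above, here is a minimal sketch using only the Python standard library. It is not the production code: the function names (`sha256_of`, `deduplicate`) and the `(name, content)` input shape are assumptions made for this example.

```python
import hashlib


def sha256_of(content: bytes) -> str:
    """Return the hex SHA-256 digest of a document's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def deduplicate(documents):
    """Split (name, content) pairs into unique documents and duplicates.

    A document is a duplicate when its bytes hash to a digest already seen.
    Returns (unique, duplicates), where unique holds (name, digest) pairs and
    duplicates holds (duplicate_name, first_seen_name) pairs.
    """
    first_seen = {}  # digest -> name of the first document with that digest
    unique, duplicates = [], []
    for name, content in documents:
        digest = sha256_of(content)
        if digest in first_seen:
            duplicates.append((name, first_seen[digest]))
        else:
            first_seen[digest] = name
            unique.append((name, digest))
    return unique, duplicates
```

Hashing the raw bytes only catches exact duplicates; near-duplicates (for example, the same article re-exported with a different footer) would need normalisation before hashing.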
Data Analyst Intern
PrepInsta Pvt Ltd • Remote (India)
Dec 2023 – Feb 2024
  • Delivered 5+ stakeholder dashboards (Tableau, Excel) and performed source-to-dashboard cross-checks to improve metric consistency.
  • Optimised SQL query performance, reducing data retrieval time by ~25% for executive reporting.
  • Automated reporting workflows with Python (BeautifulSoup), cutting manual effort by ~30%.

Featured Projects

Selected projects demonstrating applied analytics, data engineering, and evaluation-focused ML systems. Full code and additional work are on my GitHub.

Customer Segmentation & RFM Analysis

K-Means • RFM • Power BI/Tableau Reporting

Segmented Australian retail customers and identified a high-value segment representing ~15% of revenue. Resolved data quality issues (invalid order IDs) so that RFM analysis and reporting were accurate.

Python scikit-learn RFM Power BI
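
For readers unfamiliar with RFM, the sketch below shows how recency, frequency, and monetary values can be derived per customer. It is a simplified standard-library illustration rather than the project code; the function name `rfm_table` and the `(customer_id, order_date, amount)` input shape are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def rfm_table(orders, as_of):
    """Compute recency (days), frequency (order count), and monetary (total spend).

    orders: iterable of (customer_id, order_date, amount) tuples.
    as_of:  reference date used to measure recency.
    """
    stats = defaultdict(lambda: {"last": None, "frequency": 0, "monetary": 0.0})
    for customer_id, order_date, amount in orders:
        s = stats[customer_id]
        # Track the most recent order date seen for this customer.
        s["last"] = order_date if s["last"] is None else max(s["last"], order_date)
        s["frequency"] += 1
        s["monetary"] += amount
    return {
        customer_id: {
            "recency_days": (as_of - s["last"]).days,
            "frequency": s["frequency"],
            "monetary": round(s["monetary"], 2),
        }
        for customer_id, s in stats.items()
    }
```

In a segmentation workflow, these three values would typically be scaled and then passed to a clustering step such as K-Means.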

Climate Policy Forecasting Dashboard

SARIMA • Flask • Folium Mapping

Built time-series forecasting and an interactive dashboard with geospatial exploration to support stakeholder understanding of trends and uncertainty.

SARIMA Flask Folium Statsmodels

Cloud Music Subscription Platform

AWS • DynamoDB • Java

Built a cloud-based platform prototype using Java and AWS services, focusing on data storage, basic APIs, and cloud-native architecture concepts.

Java AWS DynamoDB

Real Estate Data Quality Pipeline

Data Cleaning • Duplicate Detection • ETL

Developed an automated data cleaning workflow, implementing parsing, duplicate detection, and schema validation to improve data consistency.

Python MySQL ETL Validation
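
The validation and duplicate-detection steps can be sketched roughly as follows. This is an illustrative outline, not the project code: the field names in `REQUIRED_FIELDS`, the address normalisation rule, and the function names are all hypothetical.

```python
# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"address": str, "price": (int, float), "bedrooms": int}


def validate(record):
    """Return a list of schema errors for one record (empty if valid)."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record or record[field] is None:
            errors.append(f"missing {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"bad type for {field}")
    if not errors and record["price"] <= 0:
        errors.append("price must be positive")
    return errors


def normalise_address(address):
    """Lowercase, drop commas, and collapse whitespace to build a match key."""
    return " ".join(address.lower().replace(",", " ").split())


def clean(records):
    """Split records into accepted rows and rejected (record, reasons) pairs."""
    seen, accepted, rejected = set(), [], []
    for record in records:
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
            continue
        key = normalise_address(record["address"])
        if key in seen:
            rejected.append((record, ["duplicate address"]))
            continue
        seen.add(key)
        accepted.append(record)
    return accepted, rejected
```

Keeping the rejection reasons alongside each rejected record means the output doubles as an audit trail rather than silently dropping rows.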

Technical Expertise

Data & Analytics
SQL querying • Data profiling • Dashboarding (Power BI/Tableau) • Stakeholder reporting • Exploratory analysis
Data Quality & Governance
Validation rules • Deduplication • Audit outputs • Metadata management • Tenant isolation (RLS concepts)
Engineering
Python automation • ETL workflows • PostgreSQL schema design • Indexing • APIs (FastAPI) • Testing (Pytest)
ML & Evaluation
Retrieval systems (RAG) • Embeddings • Monitoring/evaluation runs • Forecasting (SARIMA) • Clustering (K-Means)
Tools
Python (Pandas, NumPy) • SQL • Power BI/Tableau • Git • PostgreSQL • AWS (S3/EC2/Lambda exposure)

Let’s Connect

Get in Touch

Open to full-time and contract roles in Data Analytics / Data Science, with a focus on reliable data systems, data quality, and decision-ready reporting.
