Core Capabilities
- Build and maintain automated pipelines to collect and structure publicly available data (APIs + website/RSS sources) into analysis-ready datasets.
- Apply data quality controls (deduplication, normalisation, timestamp validation, and QA flags) and document rules/assumptions to keep outputs reliable.
- Develop text-processing workflows to convert unstructured content into consistent fields for trend and theme analysis over time.
- Produce stakeholder-friendly summaries of “what changed / why it matters / what to do next”, and iterate based on feedback to improve signal-to-noise.
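The ingestion and quality-control steps above can be sketched as a minimal normalisation pass. This is an illustrative outline only; the field names (`title`, `body`, `published`) and flag labels are assumptions, not the actual pipeline schema:

```python
import hashlib
from datetime import datetime

def normalise(records):
    """Deduplicate records, validate timestamps, and attach QA flags.

    `records` is a list of dicts with illustrative keys
    'title', 'body', and 'published' (an ISO-8601 string).
    """
    seen = set()
    out = []
    for rec in records:
        flags = []
        # Deduplicate on a content hash of title + body.
        key = hashlib.sha256(
            (rec.get("title", "") + rec.get("body", "")).encode("utf-8")
        ).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        # Timestamp validation: flag bad values rather than silently dropping rows.
        try:
            ts = datetime.fromisoformat(rec["published"])
        except (KeyError, ValueError):
            ts = None
            flags.append("invalid_timestamp")
        out.append({**rec, "dedup_key": key, "parsed_ts": ts, "qa_flags": flags})
    return out
```

Flagging instead of dropping keeps the output auditable: every questionable row survives with a machine-readable reason attached.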
Data Analytics & Data Science — Reliable Data, Clear Insights
Master of Data Science graduate (RMIT, Dec 2025) with internship experience across data ingestion, cleaning/validation, and stakeholder reporting. I build analysis-ready datasets and dashboards, and apply ML when it improves decisions — with a strong focus on data quality, reproducibility, and practical delivery.
About Me
I’m a Master of Data Science graduate from RMIT (Dec 2025), based in Melbourne.
I work across the full analytics lifecycle: collect data → clean/validate → structure it into report-ready tables → analyse trends/patterns → communicate insights clearly to stakeholders. I care a lot about accuracy, traceability, and repeatable outputs.
Recent work includes healthcare document ingestion (PDF/HTML), metadata governance and deduplication, database design in PostgreSQL, and building QA/audit checks to improve trust in reporting.
I’m comfortable collaborating with both technical and non-technical teams, clarifying requirements, and documenting assumptions so the data stays usable over time.
Professional Experience
- Built ingestion pipelines for a healthcare content library (PDF/HTML), including metadata governance and SHA-256 deduplication to improve dataset reliability.
- Developed automated download and parsing workflows with retries and content-type handling to improve consistency for downstream analysis and retrieval.
- Designed PostgreSQL data structures for content (metadata + embeddings) with indexing patterns, and exposed curated datasets via a FastAPI service with basic automated tests (Pytest).
- Implemented tenant-aware access controls using PostgreSQL Row-Level Security (RLS) to enforce segregation across multi-hospital deployments.
- Built QA and audit tooling (flag logging, CSV exports, citation audit harness, labelled evaluation dataset) and tracked quality trends using reproducible metrics.
- Delivered 5+ stakeholder dashboards (Tableau, Excel) and performed source-to-dashboard cross-checks to improve metric consistency.
- Optimised SQL query performance, reducing data retrieval time by ~25% for executive reporting.
- Automated reporting workflows with Python (BeautifulSoup), cutting manual effort by ~30%.
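The content-hash deduplication mentioned above can be illustrated with a short sketch. The function names and the in-memory `dict` of documents are hypothetical; the real pipeline worked over downloaded PDF/HTML files, but the hashing idea is the same:

```python
import hashlib

def sha256_digest(data: bytes, chunk_size: int = 1 << 16) -> str:
    """Stream a byte payload through SHA-256 in chunks, as you would for large files."""
    h = hashlib.sha256()
    for i in range(0, len(data), chunk_size):
        h.update(data[i:i + chunk_size])
    return h.hexdigest()

def dedupe(documents: dict[str, bytes]) -> dict[str, str]:
    """Map each digest to the first filename that produced it; byte-identical repeats are skipped."""
    seen: dict[str, str] = {}
    for name, payload in documents.items():
        seen.setdefault(sha256_digest(payload), name)
    return seen
```

Hashing the content rather than comparing filenames catches the common case where the same document is republished under different names or URLs.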
Featured Projects
Selected projects demonstrating applied analytics, data engineering, and evaluation-focused ML systems. Full code and additional work are on my GitHub.
Solara Healthcare RAG System
Built a healthcare knowledge system with strong data governance (metadata tracking, deduplication, tenant isolation) and audit-friendly evaluation. Implemented reproducible monitoring runs and tracked quality using evaluation metrics over time.
Customer Segmentation & RFM Analysis
Segmented Australian retail customers and identified a high-value segment representing ~15% of revenue. Resolved data quality issues (invalid order IDs) to enable accurate RFM analysis and reporting.
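The RFM aggregation at the heart of this project can be sketched as follows. The `(customer_id, order_date, amount)` tuple shape is an illustrative assumption, not the project's actual schema:

```python
from datetime import date

def rfm(orders, today):
    """Aggregate raw Recency/Frequency/Monetary values per customer.

    `orders` is a list of (customer_id, order_date, amount) tuples.
    Returns {customer_id: (days_since_last_order, order_count, total_spend)}.
    """
    out = {}
    for cust, order_date, amount in orders:
        recency = (today - order_date).days
        r, f, m = out.get(cust, (recency, 0, 0.0))
        # Keep the smallest recency (most recent order), count orders, sum spend.
        out[cust] = (min(r, recency), f + 1, m + amount)
    return out
```

In practice each raw value is then bucketed into quantile scores before segmenting, but the per-customer aggregation above is the part that invalid order IDs would have corrupted.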
Climate Policy Forecasting Dashboard
Built time-series forecasting and an interactive dashboard with geospatial exploration to support stakeholder understanding of trends and uncertainty.
Cloud Music Subscription Platform
Built a cloud-based platform prototype using Java and AWS services, focusing on data storage, basic APIs, and cloud-native architecture concepts.
Real Estate Data Quality Pipeline
Developed an automated data cleaning workflow, implementing parsing, duplicate detection, and schema validation to improve data consistency.
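The parsing, duplicate detection, and schema validation steps can be sketched together in one pass. The schema (`address`, `price`, `bedrooms`) is a hypothetical example of the pattern, not the project's real column set:

```python
EXPECTED = {"address": str, "price": float, "bedrooms": int}  # illustrative schema

def validate_rows(raw_rows):
    """Parse string rows against a simple schema; return (clean_rows, error_records).

    Duplicate detection uses the full parsed value tuple as the key, so
    byte-for-byte repeat listings are rejected after type coercion.
    """
    clean, errors, seen = [], [], set()
    for i, row in enumerate(raw_rows):
        try:
            parsed = {col: caster(row[col]) for col, caster in EXPECTED.items()}
        except (KeyError, ValueError) as exc:
            errors.append((i, repr(exc)))  # keep row index + reason for auditing
            continue
        key = tuple(parsed.values())
        if key in seen:
            errors.append((i, "duplicate"))
            continue
        seen.add(key)
        clean.append(parsed)
    return clean, errors
```

Returning the error records alongside the clean rows (rather than discarding them) makes the cleaning step traceable, which matches the audit-focused approach described above.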
Technical Expertise
Let’s Connect
Open to full-time and contract roles in Data Analytics / Data Science, with a focus on reliable data systems, data quality, and decision-ready reporting.