*Location:* UK (Manchester)
*Remote:* Yes (preferred)
*Willing to relocate:* UK/EU considered for the right role
*Resume:* josh-gree.github.io/cv
*Email:* [email protected]
*Technologies:* Python, SQL, R; Airflow, Prefect, Dagster; Kafka; Docker/Kubernetes; Terraform; GCP/AWS; Postgres, PostGIS, Snowflake, Redshift; Zarr/Parquet; ML/Deep Learning; HPC; React/Flask.
*Summary:* Senior Software/Data Engineer with a strong background in mathematics and computational modelling. I build high-reliability data systems, complex ETL/ELT pipelines, and ML-ready data platforms, especially where datasets are large, irregular, hierarchical, or scientifically complex.
Most recently, I’ve been designing and operating large-scale data infrastructure for high-dimensional biological datasets (100k+ samples), unifying heterogeneous storage formats into lineage-aware catalogues, creating ontologies for hierarchical labels, building QC pipelines in Dagster, developing synthetic single-cell data generators, and working closely with domain scientists to formalise and scale experimental and computational workflows.
Previously: large-scale mobile-network analytics for humanitarian agencies; climate and energy data engineering; ad-tech pipelines; and HPC-driven modelling from my computational research background.
I’m looking for roles where difficult data problems, scientific or ML-adjacent pipelines, or complex modelling workflows need to be made robust, reproducible, and scalable. I prefer small teams, high ownership, and work with real impact.
*What I offer:*
– Architecture & implementation of reliable data/ML platforms
– Workflow orchestration, data governance, and reproducibility
– Scientific/ML pipeline design (Bayesian modelling, synthetic data, QC/validation)
– Cloud infra/IaC and cost-efficient storage design
– Ability to collaborate deeply with domain experts and formalise messy processes