Requires a Master's degree in Statistics or Biostatistics and 2 years of experience in each of the following: curating healthcare datasets of more than 20 million electronic health record (EHR) encounters and transforming raw EHR data into research-grade datasets; implementing natural language processing techniques and large language models on unstructured EHR text; applying transformer architectures adapted for biomedical applications; and using Python, SQL, PySpark, the AWS cloud computing environment, Databricks, Git-based workflows, R, and SAS.