Data collection, retrieval, and management (35%)
You'll be responsible for working with IRIS Data Scientists to prepare and manage IRIS's research datasets. Specifically, you will:
- Write scripts to download, ingest, and update large-scale publicly available datasets in a variety of formats (JSON, XML, flat files, API calls) from multiple sources
- Extract, transform, and load data into the IRIS secure data warehouse
- Perform quality assurance checks on datasets, and describe and document them
- Prepare data for integration and for record linkage
- Clean, harmonize, and de-identify IRIS data for research use
- Build and maintain datasets for research use
Data analysis and visualization (35%)
You'll be engaged in hands-on data analytical work. Specifically, you will:
- Write, maintain, and revise programs for data analysis
- Demonstrate creativity in visualizing results, using Python, Tableau, or any other data visualization tools
- Create rapid-turnaround data summaries in response to ad hoc requests
- Structure data and engineer features for machine learning model development, training, and testing, using parallel processing in a high-performance computing (HPC) environment
- Implement and evaluate shallow and deep machine learning models and classifiers using a range of supervised and semi-supervised techniques
- Perform quality assurance and validation checks and collaborate with researchers
Researcher support (30%)
You'll deliver high-quality research support to external researchers using IRIS data. Specifically, you will:
- Contribute to data documentation and technical reports for the IRIS research community
- Provide technical assistance to researchers working in the IRIS virtual data enclave
- Respond to research data questions
- Assist researchers with file import into and export out of the IRIS enclave
- Collaborate with IRIS technical and research staff to improve data quality and address research community needs