Protege

Applied Data Scientist

Remote

Published: 4 days ago


Job Description:

Company Overview:

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data, starting in the healthcare industry.

Solving AI's data problem is a generational opportunity. The company that succeeds will be one of the largest in AI — and in tech.

Summary

The Applied Data Scientist bridges the gap between our data assets and our customers' needs in our healthcare vertical. They play a key role in ensuring our datasets are well-matched to the AI models our customers are building and well-understood by those customers. This role requires healthcare data expertise, extensive experience with statistical analysis, and some customer collaboration.

We are open to hiring for this role on a part-time, temp-to-hire, or full-time basis. Part-time would require at least 20 hours per week.

Responsibilities

  • Data Analysis: Conduct feasibility analyses by querying healthcare datasets to assess patient cohort availability based on complex inclusion/exclusion criteria (e.g., procedures, diagnoses, diversity, longitudinal completeness, regulatory constraints).

  • Trade-off Assessments: Assess privacy-preservation techniques to maximize dataset utility.

  • Customer Collaboration: Work directly with prospective customers to understand their data requirements and help curate the best data assets for their use cases.

  • Data Strategy: Identify gaps in our data offerings and provide insights to our partnerships team on the highest-priority data acquisitions.

  • Data Quality Assurance: Evaluate potential data partnerships, ensuring the data is high-quality, well-documented, and commercially viable.

Technical Skill Set

  • Data Expertise: Experience working with healthcare/medical datasets (some combination of imaging, EHR, genomic, claims, and pathology data), as well as comfort with SQL, R, and/or Python for data analysis. The bigger the datasets you have worked with, the better!

  • Longitudinal & Cohort Analysis: Ability to evaluate datasets for completeness over time, ensuring sufficient patient follow-up and retention for model training.

  • Diversity & Bias Mitigation: Knowledge of techniques to assess and improve dataset diversity across demographics, geographies, and clinical subpopulations.

  • Privacy-Preserving Technologies: Familiarity with de-identification techniques such as Safe Harbor and Expert Determination.

Qualifications

  • 2+ years of experience in a health data role (e.g., biomedical informatics, computational biology, AI/ML in healthcare) or equivalent experience, e.g., a Ph.D. or Master's in healthcare economics, statistics, or data science with a healthcare focus.

  • Excellent communication skills with the ability to translate complex data concepts.

  • Proficiency in Snowflake and a stats coding language (SQL, R, Python), including writing complex queries and working with large datasets.

  • Experience in a customer-facing role preferred.

Protege, United States