Superfluid is developing the first high-performance, predictive blood-based test for Alzheimer’s Disease (AD) and related dementias that directly assays mRNA transcripts from the brain via its platform technology of cell-free messenger RNA (cf-mRNA) analysis and machine learning. This next-generation liquid biopsy technology enables non-invasive measurement of the dynamic biology of organs throughout the body, including the brain. Our precise understanding of the underlying pathways of disease will transform AD care and treatment.
Superfluid has a small but mighty team including Founder Steve Quake (Stanford Professor and Head of Science at CZI) and CEO Gajus Worthington (Former Founder/CEO of Fluidigm). Superfluid has published extensively in multiple peer-reviewed journals. Superfluid is well funded by notable investors including Brook Byers and Reid Hoffman and is also supported by the NIH and Alzheimer’s Drug Discovery Foundation.
We are seeking a highly experienced and motivated Senior Data Scientist to join our cf-mRNA data science team. This role is pivotal in driving innovation in secondary and tertiary analyses of cf-RNA sequencing, with a focus on developing predictive and prognostic models in AD research. The successful candidate will play a key role in advancing analysis techniques such as normalization, batch correction, differential gene expression analysis, pathway and enrichment analysis, feature selection/engineering, and model development optimized for generalizable validation.
We are looking for someone with a deep understanding of gene expression analysis, statistics, data science, and machine learning modeling, coupled with a passion for producing rigorous, translational results. This role requires a strong background in first-principles data exploration, excellent communication skills, a collaborative, data-driven mindset, and a commitment to best-practice, clinical-grade implementation. This role is on-site at our South San Francisco office.
Key Responsibilities:
- Lead innovation in secondary and tertiary analysis of cf-RNA sequencing data, focusing on delivering rigorous and reproducible results
- Develop and implement advanced methods for differential gene expression analysis, pathway analysis, and enrichment analysis, optimizing for accuracy and biological insights
- Build, train, test, and validate predictive models, including logistic regression, random forests, and neural networks, as well as leverage existing RNA-seq large language models (LLMs) for inference and analysis
- Design and build scalable, efficient data analysis pipelines
- Engage in hypothesis-driven research, rigorously testing and validating new methods and models
- Critically evaluate results, ensuring robust models that are applicable in real-world clinical contexts beyond academic publications
- Visualize complex datasets and create compelling narratives to communicate findings to both scientific and executive audiences
- Collaborate with cross-functional teams, contributing to the company’s overall scientific and technical strategy
Qualifications:
- PhD in a quantitative field with a strong focus on biological sciences (e.g., Applied Statistics, Biophysics, Computational Biology)
- Postdoctoral experience is highly desirable
- 5+ years of biotech industry experience with a proven track record of leading successful projects
- Expertise in gene expression data analysis, including count table filtering, normalization strategies, noise quantification, differential expression analysis, and dimensionality reduction
- Strong foundation in statistical principles and their rigorous application, including but not limited to hypothesis testing, p-value corrections, Bayesian approaches, bootstrapping, and permutation testing
- Extensive experience in building, training, testing, and validating machine learning and deep learning models, including model selection based on comparative analysis and performance metrics. Proficient in feature set development (selection, engineering, etc.) and skilled in updating and performing inference with RNA-seq-specific large language models (LLMs)
- Ability to innovate both in applying library methods and developing algorithms from scratch
- Experience with common data science infrastructure, including pipelines, clusters, databases, and feature stores. Direct experience with cloud platforms (AWS preferred) for scaling, deploying, and managing data workflows is a strong advantage
- Proficient in Python and Unix/Linux environments; additional proficiency in other languages (e.g. R, Julia, Rust) is a strong plus
- Strong coding skills across the software development lifecycle
- Deep scientific curiosity and a solid grasp of the scientific method, hypothesis testing, and model validation
- Passion for building predictive and prognostic models that perform effectively in real-world applications
- Independent research capabilities, with the ability to drive projects with minimal supervision
- Exceptional data visualization skills and the ability to translate complex datasets into actionable insights
- Excellent communication skills, with the ability to communicate with both technical and executive-level audiences
The estimated salary range for this role is $160,000 to $190,000 and is based on a number of factors including experience and qualifications. This role also includes meaningful pre-IPO equity and benefits. Please apply by emailing your resume and the position you are interested in to careers@superfluiddx.com