Senior Data Scientist, Clinical Data Wrangler

Boston, Massachusetts

  Data Science



Boston | New York | San Francisco | Hybrid

Our client is a technology company that is applying human and machine intelligence to accelerate the creation of life-changing medical treatments. They are building a best-in-class platform focused on developing and advancing new medicines. As part of a wider data team, you will be working with top-level data scientists and engineers. You will develop solutions covering the gamut of diverse data, including medical records, clinical trial data, imaging, genomics, etc., developed from our own labs. Our platform encompasses an end-to-end, integrated drug discovery and development engine that is being built from the ground up.


We are looking for a senior-level Data Scientist / Data Wrangler to join the Clinical Data Science, Real World Evidence team to advance our preclinical programs and drive development of our drug-discovery platform. You will execute complex initiatives and lead projects aimed at leveraging and maximizing the potential of real-world data (RWD) within our broader data ecosystem.


Day to Day

  • Execute complex initiatives and lead projects aimed at leveraging and maximizing the potential of real-world data (RWD) within our broader data ecosystem.
  • Define the data harmonization standards and curation strategy for RWE and clinical data within the company and in partnership with various stakeholders from engineering to therapeutics.
  • Partner with the Data Science Clinical Support Insights team to design pipelines that create fit-for-purpose data products from traditional RWD sources, unstructured/semi-structured EMR data, and research-grade data to support downstream clinical programs, insights generation, and algorithmic development.
  • Collaborate with software and data engineers to create reusable tools/packages, to generalize data processes, and to productionize and automate data transformation.



  • MS/PhD in computational science, data science, engineering, or a related field, with 4+ years of industry experience (including post-doc) in a collaborative setting, unraveling complex biological problems and communicating domain knowledge to non-computational stakeholders and colleagues.
  • Knowledge of medical coding ontologies used in the US and globally (ICD, ATC, LOINC, SNOMED, MedDRA, etc.).
  • Expertise in strategic data search, data harmonization, quality and capability evaluation.
  • Fluency in Python (required) and SQL.
  • Familiarity with at least one distributed computing library (such as Spark or Dask) and hands-on experience with pipeline development.
  • Basic understanding of and experience with cloud computing (AWS), Linux environments, and shell scripting.
  • Familiarity and/or experience with real-world evidence (RWE) studies or RWE-informed clinical trial design.
  • Familiarity and/or experience with cardiovascular, metabolic, and renal disease areas.
  • Familiarity and/or experience with data formats and standards used in EMR systems and clinical trials, such as HL7, CDISC, CDASH, ADaM, and OMOP.
  • Familiarity and/or experience with the drug development process.
  • Familiarity and/or experience with designing Python libraries or packages in other languages.