Job Description
About us
Dimensions, part of the Digital Science family, is the world’s largest linked research information dataset, covering millions of research publications and connected by more than 1.3 billion citations. We are shaping the future of research and are looking for a Data Scientist to join the team.
As part of a dynamic team environment, you will support our global customers through the development of new analytic approaches and capabilities leveraging our scientometric data sets and emerging knowledge graph ecosystem. You will help our customers, including the largest funding and research organizations in the U.S. Federal government and beyond, to more effectively manage their multi-billion-dollar research portfolios by providing delivery excellence that delights our customers, fuels word-of-mouth growth, and drives very high renewal rates. You will leverage our data and platforms, including Dimensions and the rest of Digital Science’s portfolio, to support research assessment, portfolio management/analysis, strategic planning and more.
What you’ll be doing
- Conduct large-scale, quantitative data analysis (millions of records), potentially including custom indexing, data linking, data collection and other data wrangling, using Dimensions’ in-house data assets and external or customer data sets as required.
- Leverage Large Language Models and other AI technologies to address customer analytical needs, identifying opportunities to incorporate these tools into analytic workflows and customer-facing applications.
- Plan, design, maintain and document data integrations, pipelines, internal-use utilities, tools and software packages to support our advanced analytic capabilities.
- Build machine-learning models that operate on large, text-based documents (tens to hundreds of millions of documents) for a variety of applications including named entity resolution, relationship extraction, document clustering and topic modeling.
- Create and deploy visualizations and interactive web-based dashboards, using tools such as Plotly, Dash, and React.
What you’ll bring to the role
- You will have a good understanding of the S&T ecosystem – funders, research organizations, scientific publishing and related experience working with bibliometric/scientometric datasets such as scientific publications, grants, and patents.
- You will have familiarity with knowledge graphs (including technologies such as RDF and SPARQL). Ideally, you will have experience building and querying knowledge graphs in support of analytic workloads leveraging bibliometric/scientometric data sets.
- You will have experience in Python, including relevant Python libraries and modules such as pandas, scikit-learn, gensim, transformers, PyTorch and Dash.
- You will have familiarity with commercial AI models like GPT, Bard or PaLM and, ideally, experience working with LLM support toolkits such as LangChain, Guidance, and Haystack.
- You’ll be experienced in Natural Language Processing and machine learning methods with bibliometric/scientometric datasets.
- You will have experience with data visualization tools (Plotly, D3, matplotlib, etc.).
- You will thrive in an environment where you can work independently and remotely.
- You will have previous experience of working globally and across multiple teams.
- You will be a strong communicator and able to communicate your findings to a varied audience through written and verbal presentation.
- You will have 3-5 years of experience delivering customer solutions.
Additional Information
Living our Values
We invest in, nurture and support innovative businesses and technologies that make all parts of the research process more open, efficient and effective.
The talent we secure is fundamental to us achieving our vision and our growth plans. The values we live by are:
We are brave in the pursuit of better
We’re an equal opportunity employer. All applicants will be considered for employment without attention to race, colour, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status.
About Digital Science
We invest in, nurture and support innovative businesses and technologies that make all parts of the research process more open and effective.
Our portfolio includes admired brands such as Altmetric, Dimensions, Figshare, ReadCube, Symplectic, IFI Claims, Writefull, and Overleaf.
We believe that together, we can help researchers make a difference.