We are seeking to recruit a machine learning scientist with experience in text mining and natural language processing (NLP) to work on the EMERALD (Enriching MEtagenomics Results using Artificial intelligence and Literature Data) project. The position will be located within the Literature Services Team at the European Bioinformatics Institute (EMBL-EBI), and the post holder will work in close collaboration with the metagenomics team (MGnify), also based at EMBL-EBI.
Metagenomics is a rapidly expanding field in which the depth and breadth of data are constantly increasing. Consequently, the number of published research articles associated with the field is growing. The overall goal of the EMERALD project is to use machine learning to make metagenomics datasets more reusable, and then to analyse these data to discover and identify novel secondary metabolite biosynthetic gene clusters (SMGCs). The project will develop methods to identify full text publications on metagenomics and extract useful experimental concepts (such as biome descriptions and experimental techniques). These key concepts can then be integrated into the public datasets in MGnify, making the datasets more readily cross-comparable. In the second element of this project, we will work collaboratively with the MGnify team to search research papers for evidence relating to novel SMGCs identified through re-analysis across multiple metagenomics datasets (for example, co-occurrence of sets of gene names and secondary metabolites, and inferred relationships between these concepts).
The successful candidate will be responsible for researching and developing machine learning and related methods to extract concepts from research publications pertaining to the above goals, and for sharing the results with the MGnify team. You will work within the multi-disciplinary Literature Services team, which includes full stack developers, ontology experts, data scientists, and biologists as well as text mining and machine learning scientists. You will not be starting from scratch: the Europe PMC team runs basic text mining workflows on all incoming Europe PMC content, which will provide a foundation for this project. This is a great opportunity for someone who wants to make an impact with their text and data mining skills in an open research data infrastructure.