Photo by Ellen Qin on Unsplash

Orion: An open-source tool for the science of science

Kostas Stathoulopoulos
3 min read · Feb 9, 2020


Imagine you are a policymaker with a pot of money to invest in research. How do you ensure your investment strategy does not perpetuate the concentration of resources in a few well-known places? How do you spot and compare countries with similar research profiles?

Now imagine you are a researcher ready to embark on a new project. How many databases would you have to search for a literature review? How easy would it be to find collaborators from your discipline who are not in your existing network?

Quite often, knowledge and meta-knowledge are fragmented across databases, making it difficult to spot gaps and opportunities in research, track emerging topics and find collaborators.

As a Mozilla Open Science Fellow, I am trying to change that. My first stab at it is called Orion.

Update: We’ve launched Orion! Have a look at the web interface and the open-source code!

What is Orion?

Orion is an open-source tool to monitor and measure progress in science. It is built on a flexible data collection, enrichment and analysis system that enables users to create and explore research databases. In more detail, researchers can choose an academic journal, conference or thematic topic and collect all the relevant documents from Microsoft Academic Graph. Along with every document, Orion retrieves its DOI, citations, publication year, publisher, title and abstract, fields of study, authors and their affiliations. This collection is enriched with metadata from other sources: we geocode institutional affiliations with the Google Places API and infer authors’ gender with GenderAPI. Lastly, Orion measures the research specialisation, interdisciplinarity and gender diversity of countries and institutions.
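To make "research specialisation" a little more concrete, here is a minimal sketch of one common way to measure it: a location quotient, which compares a field's share of a country's papers with that field's share of all papers. This post does not say which metric Orion uses, so treat the choice of metric, the function name `specialisation_index` and the toy data as assumptions for illustration only.

```python
from collections import Counter


def specialisation_index(papers):
    """Return {(country, field): location quotient}.

    `papers` is an iterable of (country, field) pairs, one per paper.
    A value above 1 means the country publishes relatively more in that
    field than the collection as a whole does.
    """
    papers = list(papers)
    total = len(papers)
    by_country = Counter(country for country, _ in papers)
    by_field = Counter(field for _, field in papers)
    by_pair = Counter(papers)

    index = {}
    for (country, field), count in by_pair.items():
        share_in_country = count / by_country[country]
        share_overall = by_field[field] / total
        index[(country, field)] = share_in_country / share_overall
    return index


# Hypothetical toy collection: six papers across two countries and two fields.
toy = [
    ("UK", "genomics"), ("UK", "genomics"), ("UK", "ecology"),
    ("GR", "ecology"), ("GR", "ecology"), ("GR", "genomics"),
]
lq = specialisation_index(toy)
```

In this toy collection the UK scores above 1 for genomics and below 1 for ecology, and Greece shows the opposite pattern.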

Orion also has a semantic search engine that enables researchers to retrieve relevant cuts of the rich, content-specific database they created. Users can query Orion with anything from one or two words (for example, gene editing) to a blog post they read online. Orion uses modern machine learning methods to find a numerical representation of the user’s query and search for its closest matches in a high-dimensional academic publication space. This flexibility can be powerful: researchers can query Orion with an abstract of their previous work, while policymakers could use a news article or the executive summary of a white paper.
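The idea of "closest matches" can be sketched with a few lines of Python. The snippet below ranks papers by cosine similarity between a query vector and each paper's vector. It is an illustration under loud assumptions: the three-dimensional vectors are made up by hand, the paper titles are hypothetical, and the model and index Orion actually uses are not described in this post.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_vec, paper_vecs, k=2):
    """Return the titles of the k papers most similar to the query."""
    scored = [(cosine(query_vec, vec), title) for title, vec in paper_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]


# Hypothetical embeddings; a real system would get these from a text model.
papers = {
    "CRISPR screening in yeast": [0.9, 0.1, 0.0],
    "Base editing in mice": [0.8, 0.2, 0.1],
    "Bird migration patterns": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # stands in for the vector of "gene editing"
results = search(query, papers, k=2)
```

Because the query and the documents live in the same vector space, the same function works whether the query started as two words or as a full abstract.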

Orion makes the database and the search results available through interactive data visualisations. Our tool offers two visualisation modes. The first mode compresses the high-dimensional academic publication space to 2D so that users can observe groups of similar papers and find more information about them. The second mode enables users to explore the taxonomy of research and understand how disciplines are connected. Users can then choose a topic and find out how countries and institutions perform on it. We want the visualisations to provide users with multiple entry points to the underlying data, and we want both modes to promote visual exploration of the research landscape. We believe that communities without a shared vocabulary will be able to explore the research landscape, observe trends and discuss findings by using the visual dictionary we are developing.
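For readers curious about what "compressing to 2D" can look like, here is a rough sketch using principal component analysis computed by power iteration. This post does not name the dimensionality reduction method Orion uses, so PCA is a stand-in chosen for simplicity, and the function names and the four-dimensional toy vectors are assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def leading_direction(rows, iterations=200, seed=0):
    """Approximate the top principal direction of centred rows by power iteration."""
    rng = random.Random(seed)
    vec = [rng.gauss(0, 1) for _ in rows[0]]  # random start avoids unlucky symmetry
    start_norm = math.sqrt(dot(vec, vec))
    vec = [x / start_norm for x in vec]
    for _ in range(iterations):
        # Apply X^T X to vec without building the covariance matrix.
        scores = [dot(row, vec) for row in rows]
        new = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(len(vec))]
        norm = math.sqrt(dot(new, new))
        if norm == 0:
            break  # no variance left in the data
        vec = [x / norm for x in new]
    return vec


def project_2d(points):
    """Project points onto their first two principal directions."""
    n, dim = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(dim)]
    rows = [[p[j] - means[j] for j in range(dim)] for p in points]
    first = leading_direction(rows)
    xs = [dot(row, first) for row in rows]
    # Remove the first direction from the data, then repeat for the second.
    residual = [[row[j] - x * first[j] for j in range(dim)] for row, x in zip(rows, xs)]
    second = leading_direction(residual)
    ys = [dot(row, second) for row in residual]
    return list(zip(xs, ys))


# Hypothetical paper vectors: the first two are similar, as are the last two.
points = [
    [1.0, 0.0, 0.0, 0.1],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.1],
    [0.1, 0.2, 0.8, 0.0],
]
coords = project_2d(points)
```

Plotting `coords` would place the two pairs of similar papers on opposite sides of the horizontal axis, which is the kind of grouping the first visualisation mode is meant to reveal.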

Next steps

Orion is still in development, but many important components are already in place. We are currently aiming to have a fully working prototype by the end of July. Orion’s first use case will be bioRxiv, the largest repository of online preprints in the life sciences. In the meantime, we plan to release a series of blog posts, starting with a descriptive analysis of the enriched bioRxiv data and a technical description of Orion’s backend.

Acknowledgements

Zac Ioannidis is my technical collaborator and leads Orion’s front-end and data visualisation components.

This project is funded by the Mozilla Foundation. It builds on the work we are doing at Nesta where we use data science methods to build tools and inform policy.

If you have any questions about our work or would like to collaborate, send me an email at kostas [at] mozillafoundation [dot] org.

Want to learn more about Orion’s backend? Read the second post in the series.
