A software developer in Oxfordshire has found a way to help scientific researchers get to grips with managing the many files they have to handle in their projects. Old Reliable Tech (ORT) has built Git for Scientists, an online training course on the version control system Git, designed for researchers who write code as part of their jobs. The idea emerged from the firm's work with earth observation and climate change mitigation organisations.
ORT founder Daniel Tipping says researchers he has worked with have struggled to access different versions of files in their projects and to share them with others. Git lets users track changes across multiple versions of their files and coordinate work on those versions within a team.
Tipping explains that while Git is standard practice in software development for managing and versioning source code, it is not the norm in areas such as scientific research. He suggests that introducing users in these fields to Git would help them organise their code and other files, such as drafts of reports, and collaborate on shared work.
In his experience, people are using methods they are familiar with but that aren't necessarily up to the job: "A large company I know said their scientists do version control by writing a log of their code changes in a text file which is saved next to their code," he says, adding that industry colleagues he mentioned this to reacted in horror at the time this would waste.
The idea for the course emerged from a workshop for earth observation scientists, which included, he says, "an excellent talk about how using Git is the first step towards reproducibility. There was a discussion around when you write research code to model something, how to enable someone else to reproduce what you found in your research." Because recording exactly which version of the code produced a result is what Git is built to do, Tipping admits he was surprised to find people who didn't use it every day.
But Git's complexity can be a barrier, he says: "It's a massive, complicated tool which people are trying to learn 'on the job'. When they hit a problem, it's likely that a software developer will get involved and give complicated instructions, and no one has time for that."
So the ORT team designed a minimal, self-contained set of the practical Git skills that scientists need. There are plenty of online resources, but ORT's course differs in letting users work through practical exercises on their own, creating a 'safe place' to experiment and learn from their mistakes: "One of the main problems with Git is that it's quite easy for things to go wrong and for people to panic," says Tipping. "So on the course we deliberately drop them into broken Git repos (codebases which are version-controlled with Git) and make them fix it. This really helps to build confidence."
Knowing how to build a good-quality version history is important, especially in collaborative projects, and the course takes users through best practice for recording changes to code in a way that is easy to understand later. "If you forget why you made a certain change," he explains, "you have your own description as to why, and so do your colleagues." The course also goes into detail on 'pull requests', which help collaboration: "When you request a code owner to pull your change into their code base, the course explains how to do this properly and collaboratively, so colleagues can wade in and make suggestions."
For scientists who share and collaborate on multiple file versions, Git for Scientists covers workflows that apply to GitHub, the code hosting and collaboration platform used by software developers worldwide. "GitHub is where the powerful collaborations lie," says Tipping. "So many scientists working on the same code can work together without stepping on each other's toes."
Tipping and his team built Git for Scientists during the 2020 lockdown. “Anyone who writes code, no matter how small the amount, should be using Git,” he says.
Git for Scientists is at gitscientist.com