Managing & Publishing Research Code

Several journals now request that data and/or code be made openly available in a permanent repository accessible via a digital object identifier (DOI), which is – in my opinion – generally a really good thing. However, there are associated challenges. First, because the expectation that code and data are made openly available is quite new (still nowhere near ubiquitous), many authors do not know of an appropriate workflow for managing and publishing their code. If code and data have been developed on a local machine, there is work involved in making sure the same code works when transferred to another computer, where paths, dependencies and software setup may differ, and in providing documentation. Neglecting this is usually no barrier to publication, so there has traditionally been little incentive to put time and effort into it. Many have made great efforts to provide code to others via FTP sites, personal webpages or over email by request; however, this relies on those researchers maintaining their sites and responding to requests.

I thought I would share some of my experiences with curating and publishing research code using Git, because it is actually really easy and feeds back into better code development too. The ethical and pragmatic arguments in favour of adopting a proper version control system and publishing open code are clear – it enables collaborative coding and makes research safer, more traceable and more transparent. However, the workflow isn’t always easy to decipher to begin with. Hopefully this post will help a few people to get off the ground…

Version Control

Version control is a way to manage code in active development. It is a way to avoid having hundreds of files with names like “model_code_for” in a folder on a computer, and a way to avoid confusion when copying between machines and users. The basic idea is that the user has an online (‘remote’) repository that acts as a master copy where the up-to-date code is held, along with a historical log of previous versions. This remote repository is cloned on the user’s machine (the ‘local’ repository). The user then works on code in their local repository and the version control software (VCS) syncs the two. This can happen with many local repositories all linked to one remote repository, either to enable one user to sync across different machines or to have many users working on the same code.
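For example, creating a local copy of an existing remote repository is a single clone operation. The sketch below scripts it from Python purely for illustration – the repository URL and folder name are placeholders, and in practice you would usually just type the equivalent git command in a terminal.

```python
import subprocess

# Clone a (hypothetical) remote repository into a new local folder.
# Replace the URL with your own remote repository.
subprocess.run(
    ["git", "clone", "https://github.com/your-username/melt-model.git", "melt-model"],
    check=True,  # stop with an error if the git command fails
)
```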

Changes made to code in a local repository are called ‘modifications’. If the user is happy with the modifications, they can be ‘staged’. Staging adds a flag to the modified code, telling the VCS that the code should be considered as a new version to eventually add to the remote repository. Once the user has staged some code, the changes must be ‘committed’. Committing saves the staged modifications safely in the local repository. Since the local repository is synced to the remote repository by the VCS, I think of making a commit as “committing to update the remote repository later”. Each time the user ‘commits’ they also write a ‘commit message’ detailing the modifications and the reasons they were made. Importantly, a commit is only a local change. Staging and committing modifications can be done offline – to actually send the changes to the remote repository, the user ‘pushes’ them.
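To make the modify → stage → commit → push sequence concrete, here is a minimal sketch that scripts the equivalent git commands from Python (most people simply type them in a terminal). The file name, commit message, remote name (‘origin’) and branch name (‘master’) are all illustrative assumptions.

```python
import subprocess

def git(*args):
    """Run a git command inside the local repository and stop on any error."""
    subprocess.run(["git", *args], check=True)

git("add", "analysis.py")                                 # stage a modified file
git("commit", "-m", "Fix off-by-one error in melt loop")  # commit: save the staged change locally
git("push", "origin", "master")                           # push: send local commits to the remote repository
```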


Sometimes the user might want to try out a new idea or change without endangering the main code. This can be achieved by ‘branching’ the repository. Branching creates a parallel line of development that splits off from the main ‘master’ code but is kept separate, so the master code is not updated by commits to the new branch. A branch can later be ‘merged’ back onto the master branch if the experiments on it were successful.
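A minimal sketch of that branch-and-merge cycle, again scripting the git commands from Python for illustration (the branch name ‘new-idea’ is a placeholder):

```python
import subprocess

def git(*args):
    """Run a git command inside the local repository and stop on any error."""
    subprocess.run(["git", *args], check=True)

git("checkout", "-b", "new-idea")   # create an experimental branch and switch to it
# ...edit, stage and commit on the branch as usual...
git("checkout", "master")           # switch back to the master branch
git("merge", "new-idea")            # merge the experiment back in if it worked out
```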

These simple operations keep code easy to manage and traceable. Many people can work on a piece of code, see changes made by others and, assuming the group is pushing to the remote repository regularly, be confident they are working on the latest version. New users can ‘clone’ the existing remote repository, meaning they create a local version and can then push changes up into the main code from their own machine. If a local repository is lagging behind the remote repository, local changes cannot be pushed until the user pulls the changes down from the remote repository, then pushes their new commits. This enables the VCS and the users to keep track of changes.
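That pull-before-push step looks like this in practice – a sketch, with ‘origin’ and ‘master’ as the conventional remote and branch names (yours may differ):

```python
import subprocess

# Bring the local repository up to date with the remote, then push your own commits.
subprocess.run(["git", "pull", "origin", "master"], check=True)
subprocess.run(["git", "push", "origin", "master"], check=True)
```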


To make the code usable for others outside of a research group, a good README should be included in the repository: a clear and comprehensive explanation of the concept behind the code, the choices made in developing it, and a description of how to use and modify it. This is also where any permissions or restrictions on usage should be communicated, along with any citation or author contact information. Data accompanying the code can also be pushed to the remote repository to ensure that when someone clones it, they receive everything they need to use the code.


One great thing about Git is that almost all operations are local – if you are unable to connect to the internet you can still work with version control in Git, including making commits, and then push the changes up to the remote repository later. This is one of many reasons why Git is the most popular VCS. Git itself is the tool used to manage changes to code, whereas GitHub is an online hosting service for Git repositories. With Git, versions are saved as snapshots of the repository at the time of a commit; in contrast, many other VCSs log changes to individual files.

There are many other nuances and features that are very useful for collaborative research coding, but these basic concepts are sufficient for getting up and running. It is also worth mentioning Bitbucket – many research groups use this platform instead of GitHub because repositories can be kept private without subscribing to a payment plan, whereas GitHub repositories are public unless paid for.

Publishing Code

To publish code, a version of the entire repository should be made immutable and kept separate from the active repository, so that readers and reviewers can always see the precise code that was used to support a particular paper. This is achieved by minting a DOI for a snapshot of a repository hosted on GitHub, which requires exporting it to an archiving service such as Zenodo.

Zenodo will make a copy of the repository and mint a DOI for it. This DOI can then be provided to a journal and will always link to that snapshot of the repository. This means the users can continue to push changes to and branch the original repository, secure in the knowledge that the published version is safe and available. This is a great way to make research code transparent and permanent: other users can access and use it, and the authors can stop managing files for old papers on their machines and hard drives and providing code and data over email ‘by request’. It also means the authors are not responsible for maintaining a repository indefinitely post-publication, as all the relevant code is safely stored at the DOI, even if the repository is closed down.
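In practice, Zenodo’s GitHub integration archives releases, and a GitHub release is built on a git tag, so a typical first step is tagging the exact commit that accompanies the paper. A sketch (the tag name and message are placeholders; the release itself is then created through the GitHub web interface, with the Zenodo integration enabled for the repository beforehand):

```python
import subprocess

# Create an annotated tag marking the exact commit that accompanies the paper,
# then publish the tag to the remote repository on GitHub.
subprocess.run(
    ["git", "tag", "-a", "v1.0", "-m", "Code snapshot accompanying the published paper"],
    check=True,
)
subprocess.run(["git", "push", "origin", "v1.0"], check=True)
```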


Smartphone Spectrometry

The ubiquitous smartphone contains millions of times more computing power than was used to send the Apollo spacecraft to the moon. Increasingly, scientists are repurposing some of that processing power to create low-cost, convenient scientific instruments. In doing so, they are bringing such measurements closer to being feasible for citizen scientists and under-funded professionals, democratizing robust scientific observations. In our new paper in the journal ‘Sensors’, led by Andrew McGonigle (University of Sheffield), we review the development of smartphone spectrometry.

Abstract: McGonigle et al. 2018: Smartphone Spectrometers

Smartphones are playing an increasing role in the sciences, owing to the ubiquitous proliferation of these devices, their relatively low cost, increasing processing power and their suitability for integrated data acquisition and processing in a ‘lab in a phone’ capacity. There is furthermore the potential to deploy these units as nodes within Internet of Things architectures, enabling massive networked data capture. Hitherto, considerable attention has been focused on imaging applications of these devices. However, within just the last few years, another possibility has emerged: to use smartphones as a means of capturing spectra, mostly by coupling various classes of fore-optics to these units with data capture achieved using the smartphone camera. These highly novel approaches have the potential to become widely adopted across a broad range of scientific e.g., biomedical, chemical and agricultural application areas. In this review, we detail the exciting recent development of smartphone spectrometer hardware, in addition to covering applications to which these units have been deployed, hitherto. The paper also points forward to the potentially highly influential impacts that such units could have on the sciences in the coming decades.

New spec-tech! A smartphone-based UV spectrometer

In our new paper we report on some novel tech that uses the sensor in a smartphone for ultraviolet spectroscopy. It is low cost and based entirely on off-the-shelf components plus a 3D-printed case. The system was designed with volcanology in mind – specifically the detection of atmospheric sulphur dioxide – but may also have applications for supraglacial spectroscopy. As far as we know, this is the first nanometre-resolution UV spectrometer based on smartphone sensor technology, and the framework can easily be adapted to cover other wavelengths.

This follows on from a Raspberry Pi-based UV camera reported in Sensors last year, which was recently adapted to sense in the visible and near-infrared wavelengths for use on ice. The plan now is to compare the images from the Pi-cam system to those made using an off-the-shelf multispectral imaging camera that detects the same wavelengths. A report of testing this camera system for detecting volcanic gases is available on Tom Pering’s blog here.

Raspberry Pi- and smartphone-based spectroscopy could make obtaining high-spectral-resolution data a real possibility for hobbyists and for scientists lacking the funds to purchase an expensive field spectrometer. The system is also small and light, making it more convenient for some field applications than the heavy and cumbersome field spectrometers available commercially, and it can easily be mounted on a UAV.

Frontiers Paper: Albedo products from drones

A new paper, led by Johnny Ryan, shows that a consumer-grade digital camera mounted on a drone can be used to estimate the albedo of ice surfaces with an accuracy of ±5%. This is important because albedo measurements are fundamental to predicting melt, but satellite albedo data are limited in their spatial and temporal resolution and ground measurements can only cover small areas. Methods employing UAV technology can therefore bridge the gap between these two scales of measurement. The work demonstrates that this is achievable using a relatively simple workflow and low-cost equipment.

The fixed-wing UAV setup (Figure 1 in the paper)

The full workflow is detailed in the paper, and involves processing, correcting and calibrating the raw digital images using a white reference target together with upward and downward shortwave radiation measurements from broadband silicon pyranometers. The method was applied on the SW Greenland Ice Sheet, providing albedo maps covering 280 km² at a ground resolution of 20 cm.
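For readers who want a feel for the underlying arithmetic, below is a deliberately simplified sketch. It is not the paper’s workflow (which includes further geometric and radiometric corrections); it only illustrates the two basic ingredients mentioned above – normalising image digital numbers against a white reference of known reflectance, and computing broadband albedo as the ratio of upwelling to downwelling shortwave radiation. All variable names and numbers are invented for illustration.

```python
import numpy as np

# Illustrative inputs only -- none of these values come from the paper.
dn_image = np.random.rand(200, 200) * 4000   # raw digital numbers from the camera
dn_white = 3500.0                            # mean digital number over the white reference target
r_white = 0.99                               # known reflectance of the white reference target

# Per-pixel reflectance factor relative to the calibrated white reference
reflectance = (dn_image / dn_white) * r_white

# Broadband albedo measured independently by the paired pyranometers (W m^-2)
sw_down = 620.0   # downwelling shortwave
sw_up = 410.0     # upwelling shortwave
broadband_albedo = sw_up / sw_down

# One simple way to tie the image to the pyranometer measurement:
# scale the reflectance map so its scene mean matches the broadband albedo.
albedo_map = reflectance * (broadband_albedo / reflectance.mean())
print(f"Scene-mean albedo: {albedo_map.mean():.2f}")
```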

An example of a UAV-derived albedo map from the SW Greenland Ice Sheet (Figure 3 in the paper)

This study shows that albedo mapping from UAVs can provide useful data and, as drone technology advances, it will likely offer a low-cost, convenient method for distinguishing surface contaminants and informing energy balance models.

Final UAV mods

After testing the UAV performance in Svalbard in March, I realised the original ‘tripod’ landing assembly was not going to cut it for work in the Arctic. To prevent damage from landing in cryoconite holes and to spread the drone’s weight when landing on snow, I have added some skis modified from off-the-shelf landing gear for RC helicopters. This has the added advantage that if one attachment point fails, the UAV is still landable, which is not the case for the tripod design.


As well as the skis, I have now added the Red-Edge camera’s downwelling light sensor to the top of the casing. This will automatically correct the images for changes in the ambient light field in each wavelength.


New AGU paper: Microbes change the colour and chemistry of Antarctic snow

In recent decades there has been a significant increase in snow melt on the Antarctic Peninsula, and therefore more ‘wet snow’ containing liquid water. This wet snow is a microbial habitat. In our new paper, we show that distance from the sea controls microbial abundance and diversity. Near the coast, rock debris and marine fauna fertilize the snow with nutrients, allowing striking red and green algal blooms to develop that alter the absorption of visible light in the snowpack. This happens to a lesser extent further inland, where there is less fertilization.

Figure showing the location of the field sites on the Antarctic Peninsula at two scales (A/B), plus close up views of the red snow algal patches (C/D).

A particularly interesting finding is that the absorption of visible light by carotenoid pigments has the greatest influence at the surface of the snowpack, whereas chlorophyll is most influential beneath the surface. Higher concentrations of dissolved inorganic carbon and carbon dioxide were measured in interstitial air near the coast compared to inland, and a close association was found between chlorophyll and dissolved organic carbon. These observations suggest in situ production of carbon that can support more diverse microbial life, including species originating in nearby terrestrial and marine habitats.

Reflected light from clean snow, snow with green algae and snow with red algae.


These observations will help to predict microbial processes, including carbon exchange between snow, atmosphere, ocean and soils, in the fastest-warming part of the Antarctic, where snowmelt has already doubled since the mid-twentieth century and is expected to double again by 2050.