Upernavik Field Work 2018

2018 saw the Black & Bloom postdocs exploring a new field site in the north-western sector of the Greenland Ice Sheet. After two seasons working in the south-west near Kangerlussuaq, the team migrated north to investigate dark ice where the melt season is shorter and the temperatures are lower.

Beautiful Upernavik, viewed from the airport (ph J Cook)

We soon learned that there were additional challenges to working up here beyond the colder weather. Upernavik itself is on a small island in an archipelago near where the ice sheet flows and calves into the sea. While this produces spectacular icebergs, it also means access to the ice sheet is possible only by helicopter. The same helicopter serves local communities elsewhere in the archipelago with food, transport and other essential services. While we were in Upernavik, an enormous iceberg floated into the harbour at nearby Innaarsuit, threatening the settlement with an iceberg-induced tsunami. The maritime Arctic weather also played havoc with the flight schedules, and resupplying local communities (rightly) took priority over science charters.

Iceberg near the harbour in Upernavik (ph. J Cook)

These factors combined to keep us in Upernavik for 3.5 weeks, and it began to seem like we would never make it onto the ice. Eventually, though, a weather window coincided with helicopter and pilot availability. With the difficulty of getting onto the ice weighing on our minds, we also had to consider the risk of similar difficulties getting back out, so we repacked to ensure we carried several weeks of emergency supplies and would not be flying into a potential search and rescue situation.

Once on the ice, we quickly built a camp and began recording measurements. The albedo measurements and paired drone flights went very smoothly, using methods refined over the past two seasons. However, we saw exposed glacier ice for only 1.5 days; continuous snowfall kept it buried for the rest of the season.

Air Greenland’s Bell 212 sling loading our field kit (ph. J Cook)

Overall it was an interesting site. Most importantly, we can confirm that the algal bloom we studied in the south-west is also present in the northern part of the ice sheet, is composed of the same species and also darkens the ice. We also sampled the mineral dusts to see how they compare with those at the more southerly site.


ASD spectra processing with Linux & Python

I'm sharing my workflow for processing and analysing spectra obtained using the ASD Field Spec Pro, partly as a resource and partly to see whether others have refinements or suggestions for improving the protocols. I specifically use Python rather than any proprietary software to keep everything open source and transparent, and to retain control over every stage of the processing.

Working with .asd files

By default the files are saved with the extension .asd, which can be read by the ASD software 'ViewSpec'. The software allows the user to export the files as ASCII using the "export as ascii" option in the dropdown menus. My procedure is to use this option to resave the files as .asd.txt. I usually keep the metadata by selecting the header and footer options; however, I deselect the option to output the x-axis because it is common to all the files and easier to add once later on. I delimit the data using a comma to enable the use of Pandas' 'read_csv' function later.

To process and analyse the files I generally use the Pandas package in Python 3. To read the files into Pandas I first rename the files using a batch rename command in the Linux terminal:

cd /path/folder/

rename -v "s/\.asd\.txt$/.txt/" *.asd.txt

Then I open a Python editor – my preference is to use the Spyder IDE that comes as standard with an Anaconda distribution. The pandas read_csv function can then be used to read the .txt files into a dataframe. Put this in a loop to add all the files as separate columns in the dataframe…

import os
import pandas as pd

spectra = pd.DataFrame()
filelist = os.listdir('/path/folder/')

for file in filelist:
    # each exported .txt file becomes a column named after the file
    spectra[file] = pd.read_csv(os.path.join('/path/folder/', file),
                                header=None, skiprows=0).iloc[:, 0]

If you chose to include any header information in the files exported from ViewSpec, you can skip it by setting the appropriate number of rows in read_csv's 'skiprows' keyword argument.

Usually each acquisition comprises numerous individual replicate spectra. I usually take 20 replicates as a minimum and then average them for each sample site. Each individual replicate has its own filename with a sequentially increasing number (site1…00001, site1…00002, site1…00003 etc). My way of averaging these is to cut the extension and ID number from the end of the filenames, so that the replicates from each sample site are identically named. Then the pandas 'groupby' function can be used to identify all the columns with equal names and replace them with a single column containing the mean of all the replicates.

filenames = []
for file in spectra.columns:
    # cut the replicate number and extension (e.g. '.00001.txt' = 10 characters)
    filenames.append(str(file)[:-10])

# rename dataframe columns so replicates from the same site share a name
spectra.columns = filenames

# average spectra from each site by grouping identically named columns
spectra = spectra.transpose().groupby(level=0).mean().transpose()

I then plot the dataset to check for any errors or anomalies, and save the dataframe as one master file organised by sample location.

spectra.plot(figsize=(15, 15), ylim=(0, 1.2))

spectra.to_csv('/media/joe/FDB2-2F9B/2016_end_season_HCRF.csv')

Common issues and workarounds…

Accidentally misnamed files

During a long field season I sometimes forget to change the date in the ASD software for the first few acquisitions, and then realise I have a few hundred files to rename to reflect the actual date. This is a total pain, so here is a Linux terminal command to batch rename the ASD files to correct the date at the beginning of the filename.

e.g. to rename all files in the folder that were collected on 24_7_2016 but accidentally saved with the previous day's date, run the following command…

cd /path/folder/

rename -v "s/23_7/24_7/g" *

Interpolating over noisy data and artefacts

On ice and snow there are known wavelengths that are particularly susceptible to noise due to water vapour absorption (e.g. near 1800 nm), and there may also be noise at the upper and lower extremes of the spectral range measured by the spectrometer. Also, where a randomising filter has not been used to collect spectra, there can be a step feature in the data at the crossover point between the internal arrays of the spectrometer (especially at 1000 nm). This is due to the spatial arrangement of fibres inside the fibre optic bundle: each fibre measures specific wavelengths, meaning that if the surface is not uniform, certain wavelengths are oversampled and others undersampled for different areas of the ice surface. The step feature is usually corrected by raising the NIR (>1000 nm) section to meet the VIS section (see Painter, 2011). The noise in the spectrum is usually removed and replaced with interpolated values. I do this in Pandas using the following code…

import numpy as np

for i in spectra.columns:
    # correction factor raises the NIR section to meet the VIS section
    # (see Painter, 2011); with the x-axis omitted, row 650 corresponds to 1000 nm
    corr = spectra.loc[650, i] - spectra.loc[649, i]
    spectra.loc[650:2149, i] = spectra.loc[650:2149, i] - corr

    # interpolate over instabilities near the ~1800 nm water vapour feature,
    # then smooth the replacement values with a rolling mean
    spectra.loc[1400:1650, i] = np.nan
    spectra[i] = spectra[i].interpolate()
    spectra.loc[1400:1600, i] = spectra.loc[1400:1600, i].rolling(window=50, center=False).mean()
    spectra[i] = spectra[i].interpolate()

The script is here for anyone interested… https://github.com/jmcook1186/SpectraProcessing

CASPA at EGU 2018

The EGU annual meeting in Vienna is one of the major events in the earth science calendar, where the latest ideas are aired and discussed and new collaborations forged. My talk this year was in the “Remote Sensing of the Cryosphere” session. Here’s an overview:

Albedo is a primary driver of snow melt. For clean snow and snow with black carbon, radiative transfer models do an excellent job of simulating albedo, yet there remain aspects of snow albedo that are poorly understood. In particular, current models (with the exception of our 1-D BioSNICAR model) do not take into account the algal cells that grow on and dramatically discolour snow and ice in some places, and few account for changes in albedo over space and time.

This led me to wonder about using cellular automata as a mechanism for distributing radiative transfer albedo modelling over three spatial dimensions and time, and also for introducing a degree of stochasticity into the modelling (which is certainly present in natural systems).

Cellular automata are models built on a grid of individual cells. These cells update as the model progresses through time according to some function – usually a function of the values of the neighbouring cells. Cellular automata have been used extensively to study biological and physical systems: for example, Conway's Game of Life, Lovelock's DaisyWorld and Bak's Sandpile Model not only gave insight into particular processes, but arguably changed the way we think about nature at the most fundamental level. Those three models were epoch-defining for the concepts of complexity and chaos theory.

An implementation of Conway's Game of Life by Jakub Konka, showing the grid updating in a complex fashion, driven by simple rules.
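As a concrete illustration of the update-rule idea, here is a minimal sketch of a single Game of Life timestep using NumPy. It is a generic illustration rather than anything from CASPA: each cell counts its eight neighbours and switches on or off accordingly.

import numpy as np

def game_of_life_step(grid):
    """Advance a binary grid (1 = alive, 0 = dead) by one timestep."""
    # count the eight neighbours of every cell by summing shifted copies of the grid
    neighbours = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0))
    # a live cell survives with 2 or 3 neighbours; a dead cell is born with exactly 3
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)

grid = np.random.randint(0, 2, (50, 50))   # random initial state
for _ in range(100):
    grid = game_of_life_step(grid)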

For the snowpack, I developed a model I am calling CASPA – an acronym for Cellular Automaton for SnowPack Albedo. CASPA draws on a cellular automaton approach with a degree of stochasticity to predict changes in snowpack biophysical properties over time.

At each timestep the model updates the biomass of each cell according to a growth model (an initial inoculum doubles in biomass). This biomass has a user-defined probability of growing in situ (darkening that cell) or spreading to a randomly selected adjacent cell. Once this has occurred, the radiative transfer model BioSNICAR is called and used to predict the albedo and the energy absorbed per vertical layer. The subsurface light field is visualised as the planar intensity per vertical layer, per cell. The energy absorbed per layer is also used to define a temperature gradient, which drives a grain evolution model. In the grain evolution model, wet and dry grain growth can occur, along with melting, percolation and refreezing of interstitial water, consistent with the grain evolution model in the Community Land Model. The new grain sizes are fed back into SNICAR ready for the albedo calculation at the next timestep.
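To make the stochastic growth/spread step concrete, here is a minimal sketch of how one biomass update might look. It is not the actual CASPA code: the doubling growth rate, the spread rule and the run_biosnicar call are placeholders for illustration only.

import numpy as np

def update_biomass(biomass, p_spread=0.3, rng=None):
    """One CASPA-style update: biomass doubles, then each occupied cell either
    keeps its growth in situ or spreads it to a randomly chosen neighbour."""
    rng = rng or np.random.default_rng()
    new = biomass * 2.0                      # growth model: inoculum doubles each timestep
    rows, cols = biomass.shape
    for r in range(rows):
        for c in range(cols):
            if biomass[r, c] > 0 and rng.random() < p_spread:
                # move this cell's new growth to a random adjacent cell (clipped at edges)
                dr, dc = rng.integers(-1, 2), rng.integers(-1, 2)
                rr = min(max(r + dr, 0), rows - 1)
                cc = min(max(c + dc, 0), cols - 1)
                new[rr, cc] += biomass[r, c]
                new[r, c] -= biomass[r, c]
    return new

# the updated biomass grid would then be passed to the radiative transfer scheme,
# e.g. albedo = run_biosnicar(new_biomass, grain_sizes)   # placeholder call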

At the same time, inorganic impurities such as dust and soot can be incorporated into the model. These can be held constant throughout the model run, or can vary according to a user-defined scavenging or deposition rate. They can also melt out from beneath, with the inorganic impurities rising up through successive vertical layers at each timestep.
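The melt-out mechanism can be expressed very simply: shift the impurity concentration profile up by one layer per timestep. A minimal sketch, assuming an array layout with layer 0 at the surface (my assumption, not the CASPA data structure):

import numpy as np

def melt_out(impurity_profile):
    """Shift impurity concentrations up one vertical layer per timestep
    (index 0 = surface), leaving the bottom layer empty."""
    shifted = np.roll(impurity_profile, -1)
    shifted[-1] = 0.0   # nothing rises into the lowest layer from below
    return shifted

dust = np.array([0.0, 0.0, 5.0, 20.0])   # illustrative concentrations per layer, surface first
dust = melt_out(dust)                    # -> [0., 5., 20., 0.]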

The 2D albedo map output by CASPA, showing the albedo decline due to an algal bloom growing on the snowpack.

In this way, the albedo of a snowpack can be predicted in three spatial dimensions plus time. Taking the incoming irradiance into account, the radiative forcing can be calculated at each vertical depth in each cell per timestep. Furthermore, the energy available as photosynthetically active radiation in each layer can be quantified. Ultimately, these values can feed back into the growth model. Coupling the CASPA scheme with a sophisticated ecological model could therefore be quite powerful.

By default the model outputs a 2D albedo map and a plot of biomass against albedo. It is interesting to see that the subtle probabilistic elements of the cellular model can lead to drastically different outcomes for the biomass and albedo of the snowpack, even with identical initial conditions. This is also true of natural systems, and the idea that an evolving snowpack can be predicted using a purely deterministic model seems, to me, erroneous. There are interesting observations to make about the spatial ecology of the system: even this simplified system can run away into dramatic albedo decline, or show almost none. It makes me wonder about natural snowpacks and the Greenland dark zone – how much of the interannual variation emerges from internal stochasticity rather than being a deterministic function of meteorology or glaciology?

A plot of albedo and biomass against time for CASPA. Each individual run is presented as a dashed line; the mean of all runs is represented as the solid line. The divergence in evolutionary trajectory between individual runs is astonishing since these were all run with identical initial conditions – a result of emergent complexity and subtle imbalances in the probabilistic functions in the model.

In terms of quantifying biological effects on snow albedo, CASPA can be run with the grain evolution and inorganic impurity scavenging models turned ON or OFF. Comparing the albedo reduction when the physical evolution of the snow is taken into account with that when the snow physics are held constant provides an estimate of the indirect albedo feedbacks versus the direct albedo reduction due to the algal cells, as sketched below.
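In code, that comparison reduces to differencing the albedo series from the two configurations. A minimal sketch with placeholder numbers (not real CASPA output):

import numpy as np

# illustrative broadband albedo time series from two runs (placeholder values only)
albedo_static_physics = np.array([0.85, 0.80, 0.74, 0.68])   # algae ON, snow physics held constant
albedo_full_physics = np.array([0.85, 0.78, 0.69, 0.60])     # algae ON, grain evolution + scavenging ON

# direct effect of the cells: albedo decline within the static-physics run itself
direct_effect = albedo_static_physics[0] - albedo_static_physics

# indirect feedbacks: extra decline when the snow physics are allowed to evolve
indirect_feedbacks = albedo_static_physics - albedo_full_physics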

This modelling approach opens up an interesting opportunity space for remote sensing in the cryosphere. In parallel to this modelling I have been working on a supervised classification scheme for identifying various biological and non-biological ice surface types in UAV and satellite remote sensing products. Coupling this scheme with CASPA offers an opportunity to upsample remote sensing imagery in space and time, or to set the initial conditions for CASPA using real aerial data and then experiment with various future scenarios. At the moment I lack UAV data for snow with algal patches to implement the workflow, but the approach has been demonstrated using multispectral UAV data from bare ice on the Greenland Ice Sheet. Once I obtain multispectral data for snow with algal blooms, the entire pipeline can be automated: load the image, classify it using a supervised classifier, convert it into an n-dimensional array that becomes an initial state for the CASPA cellular automaton, and then tweak the conditions to experiment with various environmental scenarios (sketched below).
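A skeleton of that pipeline might look something like the following. This is only a sketch: the class labels, the mapping from surface class to initial biomass, and the load_multispectral and run_caspa calls are all assumptions, and the classifier here is a generic scikit-learn random forest rather than the actual trained model.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# 1. load a multispectral image as (rows, cols, bands)
# image = load_multispectral('uav_scene.tif')        # placeholder loader
image = np.random.rand(100, 100, 5)                   # stand-in data for illustration

# 2. supervised classification of each pixel
clf = RandomForestClassifier(n_estimators=100)
X_train = np.random.rand(200, 5)                      # stand-in training spectra
y_train = np.random.randint(0, 3, 200)                # stand-in labels (0=snow, 1=light algae, 2=heavy algae)
clf.fit(X_train, y_train)
classified = clf.predict(image.reshape(-1, 5)).reshape(100, 100)

# 3. convert the classified map into an initial biomass grid for CASPA
class_to_biomass = {0: 0.0, 1: 50.0, 2: 500.0}        # assumed biomass per class
initial_biomass = np.vectorize(class_to_biomass.get)(classified)

# 4. hand the grid to the cellular automaton
# albedo_maps = run_caspa(initial_biomass, timesteps=20)   # placeholder call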

Therefore, the limiting factor for CASPA at the moment is the availability of multispectral aerial data and field spectroscopy to provide training data for algal blooms on snow. In the spirit of open science, and to try to stimulate a development community, I have made the code 100% open and annotated despite it being currently unpublished, and I'd be delighted to receive some pull requests!

In summary, I suggest that coupling radiative transfer with cellular automata, and potentially with remote sensing imagery, is a promising way to extend albedo modelling to spatial and temporal variation, and an interesting way to build a degree of stochasticity into our albedo forecasting and ecological modelling.

Managing & Publishing Research Code

Several journals now request that data and/or code be made openly available in a permanent repository accessible via a digital object identifier (doi), which is – in my opinion – generally a really good thing. However, there are associated challenges. First, because the expectation that code and data are made openly available is quite new (still nowhere near ubiquitous), many authors do not know of an appropriate workflow for managing and publishing their code. If code and data have been developed on a local machine, there is work involved in making sure the same code works when transferred to another computer where paths, dependencies and software setup may differ, and in providing documentation. Neglecting this is usually no barrier to publication, so there has traditionally been little incentive to put time and effort into it. Many have made great efforts to provide code to others via ftp sites, personal webpages or over email by request; however, this relies on those researchers maintaining their sites and responding to requests.

I thought I would share some of my experiences with curating and publishing research code using Git, because actually it is really easy and feeds back into better code development too. The ethical and pragmatic arguments in favour of adopting a proper version control system and publishing open code are clear – it enables collaborative coding, it is safer, more tractable and transparent. However, the workflow isn’t always easy to decipher to begin with. Hopefully this post will help a few people to get off the ground…

Version Control:

Version control is a way to manage code in active development. It is a way to avoid having hundreds of files with names like "model_code_for_TC_paper_v0134_test.py" in a folder on a computer, and to avoid confusion when copying between machines and users. The basic idea is that the user has an online ('remote') repository that acts as a master where the up-to-date code is held, along with a historical log of previous versions. This remote repository is cloned on the user's machine (the 'local' repository). The user then works on code in their local repository and the version control software (VCS) syncs the two. This can happen with many local repositories all linked to one remote repository, either to enable one user to sync across different machines or to have many users working on the same code.

Changes made to code in a local repository are called ‘modifications’. If the user is happy with the modifications, they can be ‘staged’. Staging adds a flag to the modified code, telling the VCS that the code should be considered as a new version to eventually add to the remote repository. Once the user has staged some code, the changes must be ‘committed’. Committing is saving the staged modifications safely in the local repository. Since the local repository is synced to the remote repository by the VCS, I think of making a commit as “committing to update the remote repository later”. Each time the user ‘commits’ they also submit a ‘commit message’ which details the modifications and the reasons they were made. Importantly, a commit is only a local change. Staging and committing modifications can be done offline – to actually send the changes to the remote repository the user ‘pushes’ it.

Git workflow diagram (adapted from graphicsbuzz.com)
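For reference, the modify–stage–commit–push cycle described above corresponds to just a handful of terminal commands (the filename and commit message here are placeholders):

git status                          # see which files have been modified
git add model.py                    # stage the modified file
git commit -m "Fix albedo bug"      # save the staged modifications locally, with a message
git push origin master              # send the new commits up to the remote repository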

Sometimes the user might want to try out a new idea or change without endangering the main code. This can be achieved by ‘branching’ the repository. This creates a new workflow that is joined to the main ‘master’ code but kept separate so the master code is not updated by commits to the new branch. These branches can later be ‘merged’ back onto the master branch if the experiments on the branch were successful.

These simple operations keep code easy to manage and tractable. Many people can work on a piece of code, see changes made by others and, assuming the group is pushing to the remote repository regularly, be confident they are working on the latest version. New users can ‘clone’ the existing remote repository, meaning they create a local version and can then push changes up into the main code from their own machine. If a local repository is lagging behind the remote repository, local changes cannot be pushed until the user pulls the changes down from the remote repository, then pushes their new commits. This enables the VCS and the users to keep track of changes.
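Again, the corresponding commands are short (the branch name and repository URL are placeholders):

git checkout -b new-idea            # create and switch to an experimental branch
git commit -am "Try new idea"       # commit work on the branch
git checkout master                 # switch back to the master branch
git merge new-idea                  # merge the experiment back in if it worked

git clone https://github.com/user/repo.git   # a new collaborator copies the remote repository
git pull                            # fetch and merge the latest remote changes before pushing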

 

To make the code usable for others outside of a research group, a good README should be included in the repository: a clear and comprehensive explanation of the concept behind the code, the choices made in developing it and a description of how to use and modify it. This is also where any permissions or restrictions on usage should be communicated, along with citation and author contact information. Data accompanying the code can also be pushed to the remote repository to ensure that when someone clones it, they receive everything they need to use the code.


One great thing about Git is that almost all operations are local – if you are unable to connect to the internet you can still work with version control in Git, including making commits, and then push the changes up to the remote repository later. This is one of many reasons why Git is the most popular VCS. Note that Git is the tool used to manage changes to code, whereas GitHub is an online hosting service for Git repositories. With Git, versions are saved as snapshots of the repository at the time of each commit; in contrast, many other VCSs log changes to individual files.

There are many other nuances and features that are very useful for collaborative research coding, but these basic concepts are sufficient for getting up and running. It is also worth mentioning Bitbucket – many research groups use this platform instead of GitHub because repositories can be kept private without subscribing to a payment plan, whereas GitHub repositories are public unless paid for.

Publishing Code

To publish code, a version of the entire repository should be made immutable and separate from the active repository, so that readers and reviewers can always see the precise code that was used to support a particular paper. This is achieved by minting a doi (digital object identifier) for a repository that exists in GitHub. This requires exporting to a service such as Zenodo.

Zenodo will make a copy of the repository and mint a doi for it. This doi can then be provided to a journal and will always link to that snapshot of the repository. This means the authors can continue to push changes and branch the original repository, safe in the knowledge that the published version remains available. It is a great way to make research code transparent and permanent: other users can access and use it, and the authors no longer need to manage files for old papers on their machines and hard drives or provide their code and data over email 'by request'. It also means the authors are not responsible for maintaining a repository indefinitely post-publication, as all the relevant code is safely stored at the doi even if the repository is closed down.

Smartphone Spectrometry

The ubiquitous smartphone contains millions of times more computing power than was used to send the Apollo spacecraft to the Moon. Increasingly, scientists are repurposing some of that processing power to create low-cost, convenient scientific instruments. In doing so, these measurements are edging closer to being feasible for citizen scientists and under-funded professionals, democratizing robust scientific observations. In our new paper in the journal 'Sensors', led by Andrew McGonigle (University of Sheffield), we review the development of smartphone spectrometry.

Image created by Natanaelginting – Freepik.com
Abstract: McGonigle et al. 2018: Smartphone Spectrometers

Smartphones are playing an increasing role in the sciences, owing to the ubiquitous proliferation of these devices, their relatively low cost, increasing processing power and their suitability for integrated data acquisition and processing in a ‘lab in a phone’ capacity. There is furthermore the potential to deploy these units as nodes within Internet of Things architectures, enabling massive networked data capture. Hitherto, considerable attention has been focused on imaging applications of these devices. However, within just the last few years, another possibility has emerged: to use smartphones as a means of capturing spectra, mostly by coupling various classes of fore-optics to these units with data capture achieved using the smartphone camera. These highly novel approaches have the potential to become widely adopted across a broad range of scientific e.g., biomedical, chemical and agricultural application areas. In this review, we detail the exciting recent development of smartphone spectrometer hardware, in addition to covering applications to which these units have been deployed, hitherto. The paper also points forward to the potentially highly influential impacts that such units could have on the sciences in the coming decades

New spec-tech! A smartphone-based UV spectrometer

In our new paper we report on some novel tech that uses the sensor in a smartphone for ultraviolet spectroscopy. It is low cost and based entirely on off-the-shelf components plus a 3D-printed case. The system was designed with volcanology in mind – specifically the detection of atmospheric sulphur dioxide – but may also have applications for supraglacial spectroscopy. As far as we know this is the first nanometre-resolution UV spectrometer based on smartphone sensor technology, and the framework can easily be adapted to cover other wavelengths.

This follows on from a Raspberry Pi-based UV camera reported in Sensors last year, which was recently adapted to sense in the visible and near-infrared wavelengths for use on ice. The plan now is to compare the images from the Pi-cam system with those made using an off-the-shelf multispectral imaging camera that detects the same wavelengths. A report of testing this camera system for detecting volcanic gases is available at Tom Pering's blog here.

Raspberry Pi and smartphone-based spectroscopy could make obtaining high spectral resolution data a real possibility for hobbyists and for scientists lacking the funds to purchase an expensive field spectrometer. The system is also small and light, making it more convenient for some field applications than the heavy and cumbersome field spectrometers available commercially, and it can easily be mounted on a UAV.