AI Adventures in Azure: Accessing the VM via terminal or remote desktop

Accessing the Data Science Virtual Machine

Once the virtual machine is set up and started (by clicking “start” on the appropriate VM in the Azure portal) there are several ways to interface with it. The first is via the terminal (I am running Ubuntu 16.04 on both my local machine and the virtual machine). To connect to the virtual machine from the terminal, we can use secure shell, or SSH. This requires a set of keys which are used for encryption and decryption and keep the connection between the local and virtual machine secure. These keys are unique to your system, and they need to be generated. This can be done using the command line.

Generating ssh keys:

Option 1 is to use the terminal on your local machine. In Ubuntu, the following command will generate an RSA key pair (RSA is a method of encryption named after Rivest, Shamir and Adleman who first proposed it) with a length of 2048 bits:

ssh-keygen -t rsa -b 2048

Alternatively, the Azure command line interface (Azure CLI) can be used. The Azure CLI is a command line tool that can be installed to run from the existing terminal, or run in a web browser, and is used to send commands directly to the virtual machine in an Azure-friendly syntax. To create the ssh key pair in the Azure CLI:

az vm create --name VMname --resource-group RGname --generate-ssh-keys

Regardless of the method used to generate them, ssh key pairs are stored by default in

~/.ssh

and to view the public key the following bash command can be used:

cat ~/.ssh/id_rsa.pub

The key values displayed by this command should be stored somewhere secure for later use. The ssh keys enable access to the VM through the command line (local terminal or Azure CLI). Alternatively, the virtual machine can be configured with a desktop that can be accessed using a remote desktop client. This requires some further VM configuration:

 

To set up remote desktop

The ssh keys created earlier can be used to access the VM through the terminal. Then, the terminal can be used to install a desktop GUI to the VM. I chose the lightweight GUI LXDE to run on my Ubuntu VM. To install LXDE use the command:

sudo apt-get install lxde -y

To install the remote desktop support for LXDE:

sudo apt-get install xrdp -y

Then start XRDP running on the VM:

sudo /etc/init.d/xrdp start

Then the VM needs to be configured to enable remote desktop. This can be done via the Azure portal (portal.azure.com). Log in using your Azure username and password and start the VM by clicking “start” on the dashboard. Then navigate to the inbound security rules:

resource group > network security > inbound security rules > add >

A list of configuration options is then available; they should be updated to the following settings:

source: any

source port ranges: *

Destination: any

destination port ranges: 3389

protocol: TCP

Action: Allow

 

Finally, a remote desktop client is required on the local machine. I chose to use the X2Go client, which is available from the Ubuntu software centre or can be installed in the terminal using apt-get. Once the remote desktop client is installed, the system is ready for remote access to the VM using a desktop GUI.

Remote Access to VM using Desktop GUI:

  1. The VM must first be started – this can be done via the Azure portal after logging in with the usual Azure credentials (username and password) and clicking “start” on the dashboard. Copy the VM IP address to the clipboard.
  2. Open X2Go Client and configure a new session:
    1. Host = VM ip address
    2. Login = Azure login name
    3. SSH port: 22
    4. Session Type = LXDE
  3. These credentials can be saved under a named session so logging in subsequently just requires clicking on the session icon in X2Go (although the ip address for the VM is dynamic by default so will need updating each time).
  4. An LXDE GUI will open!

 

Remember that closing the remote desktop does not stop the Azure VM – the VM must be stopped by clicking “stop” on the dashboard on the Azure portal.


Ice Alive: Grants!

Ice Alive has a life of its own – no longer just a film, it is now an organization that exists to promote emerging scientists and communicators working on Earth’s changing ice and snow. Our website (icealive.org) is almost ready to launch, and we have just announced our inaugural Ice Alive grant scheme!


The grant will support 2-4 individuals or teams that have a novel idea for communicating cryospheric science on the broad theme of “Ice Alive”. We hope to see applications from artists, performers, musicians, writers, educators, journalists, scientists – anyone who has a great idea for spreading cryospheric science to new audiences in exciting ways.

All the details are HERE – please spread the word and/or apply yourself before 31st July 2018.

Bio-co-albedo?

At EGU I had the pleasure of talking about BioSNICAR and biological albedo reduction with two of the big-names in albedo research. A very interesting point they raised was that the term ‘bioalbedo’ does not precisely describe the concept that it is attached to. This is true. The term bioalbedo was not coined by spectroscopy or remote sensing experts, but by microbiologists and glaciologists, and is now well-baked into the literature. I will outline here the reasons why we should be cautious of this terminology.

Albedo is the survival probability of a photon entering a medium. Light incident upon a material partly reflects from the upper surface, the remainder enters the medium and can scatter anywhere there is a change in the refractive index (e.g. a boundary between air and ice, or ice and water, etc). Where there are opportunities for scattering, light bounces around in the medium, sometimes preferentially in a certain direction depending upon the optical properties of the medium (ice is forward-scattering) but always changing direction to some extent each time it scatters, until it is either absorbed or it escapes back out of the medium travelling in a skywards direction. The albedo of the material is the likelihood that the down-welling light entering the medium exits again later as up-welling light. The more strongly absorbing the material, the more likely the light is to be absorbed before exiting. Ice is very weakly absorbing in blue wavelengths (~400 nm), becoming generally more strongly absorbing at longer wavelengths into the near infra-red (hence ice often appearing blue). Solar energy is mostly concentrated within the wavelength range 300 – 5000 nm and the term albedo concerns the survival probability of all photons with wavelengths within this range either at a particular wavelength (spectral albedo) or integrated over the entire solar spectrum (broadband albedo).
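The distinction between spectral and broadband albedo can be sketched numerically: broadband albedo is the spectral albedo weighted by the solar energy available at each wavelength. The spectra below are made-up illustrative shapes (an ice-like albedo curve and a visible-peaked irradiance weighting), not measured values:

```python
import numpy as np

# Illustrative wavelength grid (nm) covering most of the solar spectrum
wavelengths = np.arange(350, 2501)

# Made-up spectral albedo for a bright ice-like surface: high in the
# blue, decreasing towards the near infra-red (illustrative shape only)
spectral_albedo = np.clip(0.95 - 0.0003 * (wavelengths - 350), 0.05, 1.0)

# Made-up down-welling irradiance weighting, peaking in the visible
irradiance = np.exp(-((wavelengths - 500) / 600.0) ** 2)

# Broadband albedo: spectral albedo weighted by the energy at each wavelength
broadband = np.average(spectral_albedo, weights=irradiance)
```

Because most of the energy in this toy weighting sits in the visible, where the surface is bright, the broadband value lands much nearer the blue albedo than the near infra-red one.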

This means that a photon entering a material with a broadband albedo of 0.8 has an 80% chance of exiting again. Therefore, when a material is bombarded with billions of photons, 80% of them are returned skywards and 20% are absorbed, and the surface appears bright. A lower albedo therefore means less likelihood of photon survival.

For a single material, its absorbing and scattering efficiencies are described using the scattering and absorption coefficients. The single scattering albedo (SSA), defined as the ratio of the scattering coefficient to the total extinction coefficient (scattering plus absorption), is a crucial term for radiative transfer. A higher SSA is associated with a greater likelihood of a particle scattering a photon rather than absorbing it. A particle with SSA = 1 is non-absorbing.

Therefore, with these definitions we can see why the term bio-albedo is not semantically perfect. The term bio-albedo implies that the relevant measurement is the light reflected from biological cells, which is really the inverse of the measurement of interest. Algal cells are strongly absorbing and their effect on snow and ice albedo is to increase the likelihood of a photon being absorbed rather than scattered back out of the medium. For this reason, the better term to use would be bio-co-albedo, where co-albedo describes the fraction of incident energy absorbed by the particles (i.e. 1-SSA).
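The relationship between the coefficients, the SSA and the co-albedo can be made concrete with a minimal numeric sketch (the coefficient values are illustrative only, not measured algal properties):

```python
# Hypothetical scattering and absorption coefficients for a single
# particle (illustrative values in arbitrary units)
k_scat = 9.0
k_abs = 1.0

# Single scattering albedo: scattering as a fraction of total extinction
ssa = k_scat / (k_scat + k_abs)

# Co-albedo: the absorbed fraction, i.e. 1 - SSA
co_albedo = 1.0 - ssa
```

Here the particle scatters nine times out of ten (SSA = 0.9) and absorbs one time in ten (co-albedo = 0.1); a strongly absorbing particle like an algal cell would have the balance pushed the other way.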

Bio-co-albedo is more technically correct terminology, but it is also quite a subtle distinction, and arguably if we have calculated the single scattering albedo we have by default calculated the co-albedo (co-albedo = 1 - single scattering albedo), and the outcome is the same. The meaning of the term ‘bio-co-albedo’ is not obvious to those outside of the spectroscopy and remote sensing communities, which I think is a major issue since the topic is so broadly interdisciplinary. The more aesthetic and simpler ‘bio-albedo’ is justified in most cases, especially because it is already well-used in the literature and more widely accessible. From a utilitarian perspective, bio-albedo wins out.

As an aside, it reminds me that I have often wondered whether ‘evolution’ is really an acceptable word for cryosphere scientists to use to describe the temporal development of – for example – a snowpack or ice surface. Evolution implies changes resulting from inherited characteristics passed through successive generations plus random mutations that are selected for or against based on goodness of fit for the specific environment. A melting snowpack cannot ‘evolve’ as there are no ancestors, no selection, no inheritance, no generations. People also age over time, influenced by external factors, but we do not describe individuals as evolving – the same applies to a snowpack or glacier. Overall, I suspect splitting hairs over terms like bio-co-albedo does more to dissuade non-specialists from joining the conversation than to improve understanding of the processes involved.

ASD spectra processing with Linux & Python

I’m sharing my workflow for processing and analysing spectra obtained using the ASD Field Spec Pro, partly as a resource and partly to see whether others have refinements or suggestions for improving the protocols. I’m specifically using Python rather than any proprietary software to keep it all open source and transparent, and to keep control over every stage of the processing.

Working with .asd files

By default the files are saved with the extension .asd, which can be read by the ASD software ‘ViewSpec’. The software allows the user to export the files as ascii using the “export as ascii” option in the dropdown menus. My procedure is to use this option to resave the files as .asd.txt. I usually keep the metadata by selecting the header and footer options; however, I deselect the option to output the x-axis because it is common to all the files and easier to add once later on. I choose to delimit the data using a comma to enable the use of Pandas’ ‘read_csv’ function later.

To process and analyse the files I generally use the Pandas package in Python 3. To read the files into Pandas I first rename the files using a batch rename command in the Linux terminal:

cd /path/folder/

rename "s/\.asd\.txt/.txt/g" ** -v

Then I open a Python editor – my preference is to use the Spyder IDE that comes as standard with an Anaconda distribution. The pandas read_csv function can then be used to read the .txt files into a dataframe. Put this in a loop to add all the files as separate columns in the dataframe…

import os

import pandas as pd

spectra = pd.DataFrame()

filelist = os.listdir('/path/folder/')

for file in filelist:

    # read each replicate spectrum into its own column, named after the file
    spectra[file] = pd.read_csv('/path/folder/' + file, header=None, skiprows=0)[0]

If you chose to add any header information to the files exported from ViewSpec, you can ignore it by skipping the appropriate number of rows using the read_csv keyword argument ‘skiprows’.

Usually each acquisition comprises numerous individual replicate spectra. I usually have 20 replicates as a minimum and then average them for each sample site. Each individual replicate has its own filename with a sequentially increasing number (site1…00001, site1…00002, site1…00003 etc). My way of averaging these is to cut the extension and ID number from the end of the filenames, so that the replicates from each sample site are identically named. Then the pandas function ‘groupby’ can be used to identify all the columns with equal names and replace them with a single column containing the mean of all the replicates.

import numbers

filenames = []

for file in filelist:

    # trim the replicate ID number and extension so replicates share a name
    filenames.append(str(file)[:-10])

# rename dataframe columns according to the trimmed filenames

spectra.columns = filenames

# average the replicate spectra from each sample site

spectra2 = spectra.transpose()

spectra2 = spectra2.groupby(by=spectra2.index, axis=0).apply(lambda g: g.mean() if isinstance(g.iloc[0, 0], numbers.Number) else g.iloc[0])

spectra = spectra2.transpose()

Then I plot the dataset to check for errors or anomalies, and save the dataframe as one master file organised by sample location:

import matplotlib.pyplot as plt

spectra.plot(figsize=(15, 15))

plt.ylim(0, 1.2)

spectra.to_csv('/media/joe/FDB2-2F9B/2016_end_season_HCRF.csv')

Common issues and workarounds…

Accidentally misnamed files

During a long field season I sometimes forget to change the date in the ASD software for the first few acquisitions and then realise I have a few hundred files to rename to reflect the actual date. This is a total pain, so here is a Linux terminal command to batch rename the ASD files to correct the date at the beginning of the filename.

e.g. to rename all files in folder from 24_7_2016 accidentally saved with the previous day’s date, run the following command…

cd /path/folder/

rename "s/23_7/24_7/g" ** -v
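If the Perl rename utility isn’t available, the same fix can be sketched in Python using pathlib. The function name and the date strings are just placeholders matching the example above:

```python
from pathlib import Path

def fix_date_in_filenames(folder, wrong_date, right_date):
    """Rename every file in folder whose name contains wrong_date,
    replacing it with right_date (e.g. '23_7' -> '24_7')."""
    for f in Path(folder).glob(f'*{wrong_date}*'):
        f.rename(f.with_name(f.name.replace(wrong_date, right_date)))

# Example usage (path is a placeholder):
# fix_date_in_filenames('/path/folder', '23_7', '24_7')
```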

Interpolating over noisy data and artefacts

On ice and snow there are known wavelengths that are particularly susceptible to noise due to water vapour absorption (e.g. near 1800 nm) and there may also be noise at the upper and lower extremes of the spectral range measured by the spectrometer. Also, where a randomising filter has not been used to collect spectra, there can be a step feature present in the data at the crossover point between the internal arrays of the spectrometer (especially at 1000 nm). This is due to the spatial arrangement of fibres inside the fibre optic bundle. Each fibre measures specific wavelengths, meaning that if the surface is not uniform, certain wavelengths are oversampled and others undersampled for different areas of the ice surface. The step feature is usually corrected by raising the NIR (>1000 nm) section to meet the VIS section (see Painter, 2011). The noise in the spectrum is usually removed and replaced with interpolated values. I do this in Pandas using the following code…

import numpy as np

for i in spectra.columns:

    # calculate correction factor (raises NIR to meet VIS - see Painter 2011)
    corr = spectra.loc[650, i] - spectra.loc[649, i]

    spectra.loc[650:2149, i] = spectra.loc[650:2149, i] - corr

    # interpolate over instabilities at ~1800 nm
    spectra.loc[1400:1650, i] = np.nan

    spectra[i] = spectra[i].interpolate()

    # smooth the interpolated section with a rolling mean, then re-interpolate
    spectra.loc[1400:1600, i] = spectra.loc[1400:1600, i].rolling(window=50, center=False).mean()

    spectra[i] = spectra[i].interpolate()

The script is here for anyone interested… https://github.com/jmcook1186/SpectraProcessing

Ice Alive: Uncovering the secrets of Earth’s Ice

In collaboration with Rolex Awards for Enterprise, Proudfoot Media and I have produced a documentary film explaining the latest research into the surprising hidden biology shaping Earth’s ice. The story is told by young UK Arctic scientists with contributions from guests including astronaut Chris Hadfield and biologist Jim Al-Khalili. We went to great lengths to make this a visually striking film that we hope is a pleasure to watch and communicates the otherworldly beauty and incredible complexity of the Arctic glacial landscape. We aim to educate, entertain and inspire others into exploring and protecting this most sensitive part of our planet in their own ways.

We think the film is equally suited to the general public as to school and university students, and we are delighted to make this a free-to-all teaching resource. Please watch, share and use!

 

Alongside this film, I also collaborated with musician Hannah Peel on an audiovisual piece designed to communicate the complexity of processes occurring on the Greenland Ice Sheet through sound. View the piece (good headphones recommended!) and read the write-up here