AI Adventures in Azure: Blob storage

My AI for Earth project is quite storage intensive, so I have been learning about ways to move data storage off the local disk and into the cloud while still maintaining on-the-fly access to crucial files from my virtual or local machine. My classification problem started off requiring just a few GB of Sentinel-2 images, but the challenge now is to use Azure to scale up to observing the entire western coast of the Greenland Ice Sheet over time, so the amount of storage required has increased dramatically and it is no longer feasible to store everything locally. The best solution I have found so far is Azure blob storage.

Blobs are repositories of unstructured data held in the cloud that can be accessed using a simple Python API, either from the command line or in-script. I have been using the sentinelsat Python API to batch download Sentinel-2 images for a specific range of tiles and dates, processing them from the L1C product into the L2A product using ESA's Sen2Cor, uploading the L2A product to blob storage and then deleting the files stored locally. My virtual machine has enough disk space to hold one month's worth of imagery for one tile (bearing in mind that the space required is much greater than that of the final product, since the zipped and unzipped L1C and the processed L2A products all exist for a while), meaning the script can download a full month's images before sending them to the blob store, flushing the local hard disk and beginning to download the images for the subsequent month.
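The download–process–upload–flush loop works on one calendar month at a time. Here is a minimal sketch of the month iteration I mean (the sentinelsat and Sen2Cor steps are shown only as comments, since they require Copernicus credentials and a local Sen2Cor install; `month_ranges` is just an illustrative helper name):

```python
from datetime import date

def month_ranges(start, end):
    """Yield (first_day, first_day_of_next_month) pairs covering
    start..end, so each download batch spans one calendar month."""
    year, month = start.year, start.month
    while date(year, month, 1) <= end:
        if month == 12:
            nxt = date(year + 1, 1, 1)
        else:
            nxt = date(year, month + 1, 1)
        yield date(year, month, 1), nxt
        year, month = nxt.year, nxt.month

# The pipeline then loops over the months, e.g. (sketch only):
# for first, nxt in month_ranges(date(2019, 5, 1), date(2019, 8, 31)):
#     products = api.query(date=(first, nxt), platformname='Sentinel-2',
#                          producttype='S2MSI1C')   # sentinelsat SentinelAPI
#     api.download_all(products)                    # fetch the L1C products
#     # ...run Sen2Cor, upload the L2A output to blob storage,
#     # then delete the local copies before the next month...
```

Batching by month keeps the peak disk usage bounded regardless of how long the overall time series becomes.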

One difficulty I have come across is maintaining file structures when sending entire folders to blob storage programmatically. This is trivial in the Azure Storage Explorer, where the file structure is maintained automatically simply by dragging and dropping the folder into the appropriate blob container. However, the Python API does not allow direct upload of an entire folder to a blob container; the files must be uploaded individually, without being arranged into parent folders. To preserve the structure programmatically, virtual folders need to be invoked: the folder and file paths are both provided in the call to the blob service, rather than the filename alone. This requires iterating through a list of folders, then iterating through the files in each folder in a sub-loop, each time appending the filename to the folder name and using the full path as the blob store destination.
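A minimal sketch of the idea, assuming the azure-storage-blob package (v12) is installed; the helper names `blob_names` and `upload_folder` are invented here for illustration:

```python
import os

def blob_names(local_dir):
    """Walk local_dir and yield (local_path, blob_name) pairs, where
    blob_name preserves the folder structure using '/' separators so
    that the blob container displays the files in virtual folders."""
    for root, _dirs, files in os.walk(local_dir):
        for fname in files:
            local_path = os.path.join(root, fname)
            rel = os.path.relpath(local_path, local_dir)
            yield local_path, rel.replace(os.sep, "/")

def upload_folder(container_client, local_dir):
    """Upload every file under local_dir to an azure.storage.blob
    ContainerClient, using the relative path as the blob name."""
    for local_path, blob_name in blob_names(local_dir):
        with open(local_path, "rb") as fh:
            container_client.upload_blob(name=blob_name, data=fh,
                                         overwrite=True)
```

Because the blob name contains the `/`-separated path rather than the bare filename, Storage Explorer and the portal render the hierarchy as folders even though blob storage itself is flat.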

I posted my solution to this on the Azure community forum.


Heliguy Blog: Drones for Climate

UK drone company Heliguy recently ran a blog article about my work with drones in the Arctic including on my Microsoft/National Geographic AI for Earth grant.

Drones have been increasingly important in my work on Arctic climate change, especially in mapping melting over glacier surfaces and as a way to link ground measurements with satellite remote sensing. I have recently passed the UK CAA Permissions for Commercial Operations assessments, so please reach out with projects and collaboration ideas related to drone photography or remote sensing.

Image taken from a quadcopter while mapping the ice surface, near point 660, Greenland Ice Sheet.





Eyes in the Sky 2: Airspace

Just like the land and oceans, the sky is divided into regulated regions. This makes sense, as it prevents unauthorised flights over sensitive and/or dangerous areas like airports, military zones, power stations, private land etc. Knowing the airspace classification is a fundamental prerequisite for making safe and legal flights with an unmanned aerial system (UAS).

In the UK, the Civil Aviation Authority defines airspace classes from A to G. There are specific permissions and restrictions associated with each class and they are mapped on VFR charts, for example here.

Class A airspace is the most heavily restricted and is the least relevant for small UAS operators, because only aircraft operating under IFR (instrument flight rules) are permitted to fly – limiting users mainly to commercial and private jets. Generally, Class A starts from 18,000 feet above mean sea level.

There is no Class B airspace in the UK, but it is commonly used to restrict airspace around large airports in the US.

Class C airspace usually extends vertically from 19,500 feet to 60,000 feet. Flight under both instrument and visual flight rules (IFR and VFR) is permitted, but clearance from air traffic control is necessary to enter. It is unlikely that a small UAS operator could end up in Class C airspace for several reasons, but especially because it would be very difficult to climb to 19,500 feet!

Class D airspace is also available for VFR and IFR flights with clearance from air traffic control, at a speed of less than 250 knots when flying below 10,000 feet. Typically, the airspace around aerodromes (any location where flight operations occur) is Class D.

Class E airspace is also available for IFR and VFR use. Aircraft flying under VFR do not need clearance or two-way radio communication with air traffic control to enter but the pilot must comply with instructions from air traffic control.

Class G airspace is uncontrolled, meaning UAS pilots can fly as they please as long as the flight is within visual line of sight (up to a maximum of 400 ft vertically and 500 m horizontally from the pilot in command) and in accordance with the regulations set out in CAP 393 Articles 94 and 95 and CAP 722 – for example, keeping at least 50 m from any person, obstacle or vessel not under the direct control of the pilot, and at least 150 m from congested areas or open-air gatherings of more than 1000 people.



Eyes in the Sky 1: METAR

I’m currently studying for my CAA Permission for Commercial Operations (PfCO) – what is commonly thought of as the UK drone pilot’s licence. Flying small unmanned aerial systems (SUAS) is an increasingly common part of field science, especially in polar science, where a) scaling in-field observations over space is critical, b) we rely heavily on satellite observations that require sub-pixel validation, and c) it is often hazardous to manually survey areas that can easily be surveyed using a UAS. We can achieve so much more when we have eyes in the sky as well as feet on the ground. Legislation covering SUAS users is also changing rapidly and is likely to (rightly) become much stricter in the near future. I plan to write about several aspects of the PfCO that are relevant to UAS flights in polar regions, partly for interest and partly as a revision tool for myself in preparation for the PfCO assessment!

One of the most important aspects of flying anywhere, and especially in polar regions, is up to date and accurate information about the weather. Airports and many weather stations report current and forecast weather conditions in a condensed format known as METAR, or the slightly more in-depth TAF. METAR stands for Meteorological Aerodrome Report, and TAF stands for Terminal Aerodrome Forecast. As well as being a standard aeronautical system, I think using METAR symbology would make an excellent way to log detailed meteorological observations in metadata for field scientists.

Below is a METAR report from 27th March 2019 for the airport in Longyearbyen, Svalbard:

ENSB 270850Z 13020KT 9999 FEW025 SCT080 M05/M12 Q0994 NOSIG RMK WIND 1400FT 12017KT


The METAR starts with a four-letter location code: ENSB is the code for Longyearbyen airport. Then comes the date and time the report was posted, starting with the day of the month and the time in HHMM format, followed by ‘Z’. The Z indicates that the report is in Greenwich Mean Time (or simply “Zulu”), so this report was posted on 27th March at 0850 GMT. As pointed out by @arwynedwards on Twitter, this deviates from the NATO standard format (270850Z MAR19) by omitting the month and year information. I suspect this is because the high frequency of METAR updates makes this information largely redundant. For recording field meteorology it might be more useful to use the NATO standard – while the month and year of the fieldwork may usually be obvious on an individual project basis, it could be crucial info when collating data from a range of sources.

The third block of characters describes the wind, with the first three digits showing the direction the wind is coming from (in this case 130 degrees, or approximately south-east). The next two digits are the speed (20), and KT shows that the speed is measured in knots.

The next set of four digits shows the visibility in metres. In this case the visibility is greater than the maximum shown on a METAR, so it is recorded as 9999, which can be interpreted as 10 km or more.

Next come the cloud conditions, described using a set of three-letter codes offering a qualitative description of the cloud cover, each followed by the cloud height in hundreds of feet. Here FEW025 means there are a few clouds at 2500 ft (025 × 100), and SCT080 means there are scattered clouds at 8000 ft (080 × 100).

The temperature and dew point are described using M for minus, so here the temperature is -5 °C and the dew point is -12 °C. When these values are similar (say, within 3 °C) we need to worry about mist, fog or precipitation.

Q0994 shows that the pressure is 994 hPa. NOSIG is a flag indicating that no significant changes to these conditions are expected over the forecast period.

The rest of the information in the METAR is classified as “remarks”, as signified by the code RMK. Here the remark is that the wind aloft at 1400 ft is stronger and coming from a slightly different direction to the wind at the surface: 17 knots from 120 degrees.
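The fixed structure of these codes also makes them easy to handle programmatically. As a rough illustration, here is a sketch that pulls the fields described above out of the example string with regular expressions – it handles only the groups present in this particular METAR, not the full standard:

```python
import re

def parse_metar(metar):
    """Extract a few basic fields from a simple METAR string.
    Covers only the groups in the worked example, not the full spec."""
    out = {"station": metar.split()[0]}
    # day of month + time, e.g. 270850Z
    m = re.search(r"\b(\d{2})(\d{4})Z\b", metar)
    out["day"], out["time"] = int(m.group(1)), m.group(2)
    # wind direction + speed in knots, e.g. 13020KT
    m = re.search(r"\b(\d{3})(\d{2,3})KT\b", metar)
    out["wind_dir_deg"], out["wind_kt"] = int(m.group(1)), int(m.group(2))
    # visibility in metres: the first standalone four-digit group
    m = re.search(r"\b(\d{4})\b(?!KT)", metar)
    out["visibility_m"] = int(m.group(1))
    # cloud layers: qualitative code + height in hundreds of feet
    out["clouds"] = [(c, int(h) * 100)
                     for c, h in re.findall(r"\b(FEW|SCT|BKN|OVC)(\d{3})\b",
                                            metar)]
    # temperature / dew point, with M meaning minus
    m = re.search(r"\b(M?\d{2})/(M?\d{2})\b", metar)
    to_c = lambda s: -int(s[1:]) if s.startswith("M") else int(s)
    out["temp_c"], out["dewpoint_c"] = to_c(m.group(1)), to_c(m.group(2))
    # pressure in hPa, e.g. Q0994
    m = re.search(r"\bQ(\d{4})\b", metar)
    out["pressure_hpa"] = int(m.group(1))
    return out
```

A parser like this would make it straightforward to log METAR-style observations as structured metadata alongside field measurements.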

Overall, this looks like a good day to fly in terms of high visibility and low chance of precipitation, but the low temperatures will reduce the battery life and risk icing, and the wind speed is just outside of the safe flight envelope for many small UASs including the DJI Mavic and Phantom series. For those reasons I’d call a NO-GO for a small quadcopter flight.

The power of the METAR is that all of this information can be conveyed in a short string of unambiguous characters. METARs are frequently updated to reflect changing conditions and forecasts, and this one was sourced from a provider offering up-to-date METARs for over 4000 stations.


AI Adventures in Azure: Ice Surface Classifiers

For this post I will introduce what I am actually trying to achieve with the AI for Earth grant and how it will help us to understand glacier and ice sheet dynamics in a warming world.

The Earth is heating up – and that’s a problem for the parts of it made of ice. Over a billion people rely directly upon glacier-fed water for drinking, washing, farming or hydropower. The sea level rise resulting from the melting of glaciers and ice sheets is one of the primary species-level existential risks we face as humans in the 21st century, threatening lives, homes, infrastructure, economies, jobs, cultures and traditions. It has been projected that $14 trillion could be wiped off the global economy annually by 2100 due to sea level rise. The major contributing factors are thermal expansion of the oceans and melting of glaciers and ice sheets, the latter of which is primarily controlled by the ice albedo, or reflectivity. However, our understanding of albedo for glaciers and ice sheets is still fairly basic. Our models make drastic assumptions about how the albedo of glaciers behaves: some assign it a constant value, some assume it varies as a simple function of exposure time in the summer, and the more sophisticated models use radiative transfer, but on the assumption that ice behaves in the same way as snow (i.e. that it can be adequately represented as a collection of tiny spheres). Our remote sensing products also struggle to resolve the complexity of the ice surface and fail to detect the albedo-reducing processes operating there – for example the accumulation of particles, the growth of algae on the ice surface, and the changing structure of the ice itself. This limits our ability to observe the ice surface changing over time and to attribute melting to specific processes, which would enable us to make better predictions of melting – and therefore sea level rise – into the future.

Aerial view of a field camp on the Greenland Ice Sheet in July 2016. The incredible complexity of this environment is clear – there are areas of bright ice, standing water, melt streams, biological aggregates known as cryoconites and areas of intense contamination with biological growth, mineral dust and soots – none of which is resolved by our current models or remote sensing but all of which affect the rate of glacier melting.

I hope to contribute to tackling this problem with AI for Earth. My idea is to use a form of machine learning known as supervised classification to map ice surfaces from drone images and then at the scale of entire glaciers and ice sheets using multispectral data from the European Space Agency’s Sentinel-2 satellite. The training data will come from spectral measurements made on the ice surface that match the wavelengths of the UAV and Sentinel sensors. I’ll be writing the necessary code in Python and processing the imagery in the cloud using Microsoft Azure, with the aim of gaining new insights into glacier and ice sheet melting and developing an accessible API to host on the AI for Earth API hub. I have been working on this problem for a while and the code (in active development) is being regularly updated on my Github repository. A publication is currently under review.
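As a toy illustration of the supervised classification idea (this is not my actual pipeline – the class names, band values and classifier choice are invented for the example), a classifier can be trained on labelled spectra and then applied pixel-by-pixel to an image:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Invented training data: 5-band reflectance spectra per surface class.
rng = np.random.default_rng(0)
bright_ice = rng.normal(0.8, 0.05, (50, 5))  # high reflectance in all bands
algal_ice = rng.normal(0.4, 0.05, (50, 5))   # darker, algae-laden ice
water = rng.normal(0.1, 0.03, (50, 5))       # very dark standing water

X = np.vstack([bright_ice, algal_ice, water])
y = np.array(["bright_ice"] * 50 + ["algal_ice"] * 50 + ["water"] * 50)

# Fit a random forest on the labelled spectra.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Classify a (rows, cols, bands) image by flattening it to (pixels, bands)
# and reshaping the predicted labels back into a classified map.
image = rng.normal(0.8, 0.05, (10, 10, 5))   # a fake patch of bright ice
labels = clf.predict(image.reshape(-1, 5)).reshape(10, 10)
```

In the real workflow the training spectra come from field measurements resampled to the UAV and Sentinel-2 band wavelengths, and the image is a multispectral scene rather than random numbers, but the flatten–predict–reshape pattern is the same.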

I have already posted about my Azure setup and some ways to start programming in Python on Azure virtual machines, and from here on in the posts will be more about coding specifically for this project.

National Geographic Explorers Festival London

A few weeks ago I had the pleasure of presenting at the National Geographic Explorers Festival in London. This was an amazing opportunity to meet the inspirational Explorers and listen to them talk about AI solutions to conservation problems around the world. In the afternoon I spoke about my work on machine learning and remote sensing for monitoring glacier and ice sheet melting, and then participated in a panel discussion about the challenges of applying AI to environmental problems. The event was livestreamed and is now archived here (my part starts at 1:48).

The work I presented is supported by Microsoft and National Geographic through their AI for Earth scheme.


AI Adventures in Azure: Ways to Program in Python on the DSVM

Having introduced the set up and configuration of a new virtual machine and the ways to interact with it, I will now show some ways to use it to start programming in Python. This post will assume that the VM is allocated and that the user is accessing the VM using a remote desktop client.

1. Using the terminal

I am running an Ubuntu virtual machine, so the command line interface is referred to as the terminal. The language used to make commands is (usually) “bash”. Since the package manager Anaconda is already installed on the data science VM, it is very easy to start building environments and running Python code in the terminal. Here is an example where I’m creating a new environment called “AzurePythonEnv” that includes some popular packages:

>> conda create -n AzurePythonEnv python=3.6 numpy matplotlib scikit-learn pandas

Now this environment can be activated any time via the terminal:

>> source activate AzurePythonEnv

Now, with the environment activated, Python code can be typed directly into the terminal, or scripts can be saved as .py files (e.g. using the pre-installed text editors Atom or Vim) and called from the terminal:

>> python /data/home/tothepoles/Desktop/script.py


2. Using an IDE

The data science VM includes several IDEs that can be used for developing Python code. My preferred option at the moment is PyCharm, but Visual Studio Code is also excellent and I can envisage using it as my primary IDE later on. IDEs are available under Applications > Development in the desktop toolbar, or accessible via the command line. IDEs for other languages, including R-Studio, are also pre-installed on the Linux DSVM. Simply open your preferred IDE and start programming. In PyCharm, the bottom frame in the default view can be toggled between the terminal and the Python console, meaning new packages can be installed into your environment, and new environments created and removed, from within the IDE, along with all the other functions associated with the command line. The basic workflow for programming in the IDE is to start a new project, link it to your chosen development environment, write scripts in the editor window, then run them (optionally in the console so that variables and datasets remain accessible after the script has finished running).

Development in the PyCharm IDE

3. Using Jupyter Notebooks

Jupyter notebooks are applications that allow live code to be run in a web browser, with the outputs displayed interactively in the same window. They are a great way to make code accessible to other users. The code is written almost identically to a normal Python script, except that it is divided into individual executable cells. Jupyter notebooks can be run in the cloud using Azure Notebooks, making it easy to access Azure data storage, configure custom environments, deploy scripts and present the result as an accessible resource hosted in the cloud. I will be writing more about this later as I develop my own APIs on Azure. For now, the Azure Notebooks documentation is here. On the DSVM, JupyterLab and Jupyter Notebook are preinstalled and accessed simply by typing the command

>> jupyter notebook
A Jupyter notebook running in a web browser