ASD spectra processing with Linux & Python

I’m sharing my workflow for processing and analysing spectra obtained using the ASD Field Spec Pro, partly as a resource and partly to see whether others have refinements or suggestions for improving the protocols. I specifically use Python rather than any proprietary software to keep it all open source and transparent, and to keep control over every stage of the processing.

Working with .asd files

By default the files are saved with the extension .asd, which can be read by the ASD software ‘ViewSpec’. The software allows the user to export the files as ASCII using the “export as ascii” option in the dropdown menus. My procedure is to use this option to resave the files as .asd.txt. I usually keep the metadata by selecting the header and footer options; however, I deselect the option to output the x-axis because it is common to all the files and easier to add once later on. I choose to delimit the data using a comma to enable the use of the Pandas ‘read_csv’ function later.
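Re-adding the x-axis is then a one-liner once the spectra are sitting in a dataframe. A minimal sketch, assuming the FieldSpec Pro’s full 350-2500 nm range sampled at 1 nm intervals, so that row n of the exported data corresponds to 350 + n nm (note that the correction code further down indexes the dataframe by row number, so I leave this step until last):

import numpy as np

# label each row with its wavelength in nm
spectra.index = np.arange(350, 350 + len(spectra))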

To process and analyse the files I generally use the Pandas package in Python 3. To read the files into Pandas I first rename the files using a batch rename command in the Linux terminal:

cd /path/folder/

rename "s/.asd.txt/.txt/g" ** -v
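Note that this is the Perl-flavoured rename shipped with Debian/Ubuntu; other distros bundle a different rename with incompatible syntax. The -v flag just prints each rename as it happens. Strictly speaking the unescaped dots in the pattern match any character, so a more careful version of the same command is:

rename "s/\.asd\.txt$/.txt/" ** -v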

Then I open a Python editor – my preference is to use the Spyder IDE that comes as standard with an Anaconda distribution. The pandas read_csv function can then be used to read the .txt files into a dataframe. Put this in a loop to add all the files as separate columns in the dataframe…

import pandas as pd
import os

path = '/path/folder/'

spectra = pd.DataFrame()
filelist = os.listdir(path)

for file in filelist:
    # each exported file is a single column of values; name the column after the file
    spectra[file] = pd.read_csv(path + file, header=None, skiprows=0)[0]

If you chose to add any header information to the file exported from ViewSpec, you can ignore it by skipping the appropriate number of rows with the ‘skiprows’ keyword argument to read_csv.
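For example, if ViewSpec wrote three lines of metadata above the data (an illustrative count, so check one of your own exported files), the read inside the loop becomes:

spectra[file] = pd.read_csv(path + file, header=None, skiprows=3)[0]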

Usually each acquisition comprises numerous individual replicate spectra. I usually have 20 replicates as a minimum and then average them for each sample site. Each individual replicate has its own filename with a sequentially increasing number (site1…00001, site1…00002, site1…00003 etc.). My way of averaging these is to cut the extension and ID number from the end of the filenames, so that the replicates from each sample site are identically named. Then the pandas function ‘groupby’ can be used to identify all the columns with equal names and replace them with a single column containing the mean of all the replicates.

import numbers

# cut the replicate number and extension from each filename
# ('.00001.txt' etc. is the final 10 characters), so that
# replicates from the same sample site share a name
filenames = []
for file in filelist:
    filenames.append(str(file)[:-10])

# rename dataframe columns according to the trimmed filenames
DF = spectra.copy()
DF.columns = filenames

# average spectra from each site: group columns with equal names
# and collapse each group to its mean
DF = DF.transpose()
DF = DF.groupby(by=DF.index).apply(
    lambda g: g.mean() if isinstance(g.iloc[0, 0], numbers.Number) else g.iloc[0])
DF = DF.transpose()

Then I plot the dataset to check for any errors or anomalies, and save the dataframe as one master file organised by sample location.

import matplotlib.pyplot as plt

DF.plot(figsize=(15, 15))
plt.ylim(0, 1.2)

DF.to_csv('/media/joe/FDB2-2F9B/2016_end_season_HCRF.csv')
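The master file then reads straight back in for later sessions; passing index_col=0 restores the band index:

DF = pd.read_csv('/media/joe/FDB2-2F9B/2016_end_season_HCRF.csv', index_col=0)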

Common issues and workarounds…

Accidentally misnamed files

During a long field season I sometimes forget to change the date in the ASD software for the first few acquisitions and then realise I have a few hundred files to rename to reflect the actual date. This is a total pain, so here is a Linux terminal command to batch rename the ASD files to correct the date at the beginning of the filename.

e.g. to rename all the files in a folder from 24_7_2016 that were accidentally saved with the previous day’s date, run the following command…

cd /path/folder/

rename "s/23_7/24_7/g" ** -v
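As with the earlier rename, adding the -n flag first does a dry run, printing what would be renamed without actually touching anything:

rename -n "s/23_7/24_7/g" **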

Interpolating over noisy data and artefacts

On ice and snow there are known wavelengths that are particularly susceptible to noise due to water vapour absorption (e.g. near 1800 nm), and there may also be noise at the upper and lower extremes of the spectral range measured by the spectrometer. Also, where a randomising filter has not been used to collect spectra, there can be a step feature in the data at the crossover point between the internal arrays of the spectrometer (especially at 1000 nm). This is due to the spatial arrangement of fibres inside the fibre optic bundle: each fibre measures specific wavelengths, so if the surface is not uniform, certain wavelengths are oversampled and others undersampled for different areas of the ice surface. The step feature is usually corrected by raising the NIR (>1000 nm) section to meet the VIS section (see Painter, 2011). The noise in the spectrum is usually removed and replaced with interpolated values. I do this in Pandas using the following code…

import numpy as np

# NB: because the x-axis was not exported, the dataframe index is row
# number, where row n corresponds to wavelength 350 + n nm
for i in DF.columns:
    # calculate correction factor (raises NIR to meet VIS - see Painter, 2011)
    corr = DF.loc[650, i] - DF.loc[649, i]
    DF.loc[650:2149, i] = DF.loc[650:2149, i] - corr

    # interpolate over instabilities at ~1800 nm (rows 1400-1650)
    DF.loc[1400:1650, i] = np.nan
    DF[i] = DF[i].interpolate()

    # smooth the interpolated section, then re-interpolate over the NaNs
    # the rolling window introduces at its leading edge
    DF.loc[1400:1600, i] = DF.loc[1400:1600, i].rolling(window=50, center=False).mean()
    DF[i] = DF[i].interpolate()
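The same pattern generalises to any noisy window. Here is a small helper along those lines; the function name and the window list are mine and purely illustrative, with windows given as row positions (i.e. wavelength minus 350 nm):

import numpy as np

def interpolate_windows(df, windows):
    # blank out each (start, stop) row window, then interpolate across the gaps
    for col in df.columns:
        for start, stop in windows:
            df.loc[start:stop, col] = np.nan
        df[col] = df[col].interpolate()
    return df

# e.g. mask the water vapour bands near 1400 nm and 1800 nm
DF = interpolate_windows(DF, [(1000, 1110), (1400, 1650)])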

The script is here for anyone interested… https://github.com/jmcook1186/SpectraProcessing
