Joining several data files with CDO and IsoFuse

From gfi
Revision as of 10:25, 14 April 2020

CDO mergetime command

Instruments often process their data on a daily or hourly basis. Longer analyses then require that many individual data files are merged into one longer time series. With daily data files covering several years, this quickly becomes difficult to handle.

It is then often easier to join several daily files into monthly or yearly data files. The command line tool CDO with command "mergetime" can help with that.

module load CDO
cdo mergetime <inputfiles.nc> <outputfile.nc>

The mergetime command creates one new output file with a common time axis. All input files must therefore have the same kind of time axis and the same variables. In essence, they must have the same netCDF file structure.

Here is an example of how to create a monthly file from the daily TPS-3100 (Hotplate) data files:

cd /Data/gfi/scratch/metdata/hotplate/netcdf/2019/01
cdo mergetime TPS_201901* ../TPS_201901.nc
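
When many months need to be merged, the per-month call above can be scripted. The following is a minimal Python sketch, assuming the daily files are named TPS_YYYYMMDD.nc (inferred from the TPS_201901* pattern above, not confirmed). The helper name monthly_merge_commands is hypothetical. It only builds the "cdo mergetime" command lines; running them, for example with subprocess.run, is left to the caller.

```python
from collections import defaultdict
from pathlib import Path


def monthly_merge_commands(daily_files, prefix="TPS_"):
    """Group daily files by month and build one 'cdo mergetime' call per month.

    Assumes file names of the form <prefix>YYYYMMDD.nc (an assumption).
    Files that do not match this pattern are skipped.
    """
    by_month = defaultdict(list)
    for name in daily_files:
        stem = Path(name).stem                 # e.g. "TPS_20190101"
        date = stem[len(prefix):]              # e.g. "20190101"
        if not (stem.startswith(prefix) and len(date) == 8 and date.isdigit()):
            continue                           # not a daily file, skip it
        by_month[date[:6]].append(str(name))   # key is YYYYMM

    commands = []
    for month in sorted(by_month):
        inputs = sorted(by_month[month])       # chronological for this naming scheme
        output = f"{prefix}{month}.nc"
        commands.append(["cdo", "mergetime", *inputs, output])
    return commands


if __name__ == "__main__":
    files = ["TPS_20190102.nc", "TPS_20190101.nc", "TPS_20190201.nc"]
    for cmd in monthly_merge_commands(files):
        print(" ".join(cmd))
```

In this sketch the output file name carries no directory, so the monthly file would be written next to the daily files; the length check on the date part keeps an existing monthly file from being picked up as input.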

Joining data files from different instruments with common time averaging

While CDO gets the job done when working with many data files from the same instrument, it is more challenging to combine variables from several instruments in one file. Time intervals may differ, and there may be duplicate variable names, different units, and so on. However, data analysis becomes much simpler and more powerful once several instruments are on a common averaging time scale.

At GFI, the Python tool IsoFuse is available to help with this task. IsoFuse, together with some example usage scripts, can be found at

/Data/gfi/scratch/metdata/scripts/IsoFuse

Check out fuse_Husavik_IGP2018.sh for an example shell script that also includes the necessary module load statements. The project itself is now hosted on the UiB GitLab at <url>https://git.app.uib.no/uib-gfi/isofuse</url>.

Purpose of IsoFuse

IsoFuse is a tool to join calibrated native-resolution vapour isotope data files with any existing auxiliary netCDF data files. These can include temperature and wind measurements, or other time series data. IsoFuse is a Python script that is run twice, with a manual editing step by the user in between.

Functionality

Internally, IsoFuse creates a set of time series with a pre-defined time interval. All selected variables of the provided input files are then averaged or interpolated, centered on each time interval. For variables denoted as flags (discrete variables), the median rather than the mean is calculated. If there is insufficient data to average for a time interval, the variable will be interpolated using linear interpolation (continuous variables) or a nearest-neighbor method (discrete variables).
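
The averaging logic described above can be illustrated with a short, self-contained Python sketch. This is not IsoFuse's actual code. The function names are hypothetical, and the sketch assumes that "insufficient data" means an empty window, that windows are half-open and centered on the output time, and that interpolation uses the original samples. Times are plain numbers in seconds.

```python
from math import isnan
from statistics import mean, median


def interpolate_linear(samples, x):
    """Linear interpolation in time-sorted (time, value) pairs, clamped at the ends."""
    if x <= samples[0][0]:
        return samples[0][1]
    if x >= samples[-1][0]:
        return samples[-1][1]
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t0 <= x < t1:
            return v0 + (v1 - v0) * (x - t0) / (t1 - t0)


def fuse_variable(times, values, grid, time_delta, flag=False):
    """Average one variable onto the output times in `grid`.

    Continuous variables: mean per window, linear interpolation if the window is empty.
    Flags (discrete variables): median per window, nearest neighbor if empty.
    """
    # Drop missing values and sort by time
    samples = sorted((t, v) for t, v in zip(times, values) if not isnan(v))
    if not samples:
        return [float("nan")] * len(grid)

    half = time_delta / 2.0
    fused = []
    for centre in grid:
        window = [v for t, v in samples if centre - half <= t < centre + half]
        if window:
            fused.append(median(window) if flag else mean(window))
        elif flag:
            nearest = min(samples, key=lambda s: abs(s[0] - centre))
            fused.append(nearest[1])
        else:
            fused.append(interpolate_linear(samples, centre))
    return fused


if __name__ == "__main__":
    t = [0, 10, 20, 100, 110]
    print(fuse_variable(t, [1.0, 2.0, 3.0, 10.0, 12.0], [15, 45, 75, 105], 30.0))
    print(fuse_variable(t, [0, 0, 1, 2, 2], [15, 45, 75, 105], 30.0, flag=True))
```

Using the median and nearest-neighbor methods for flags keeps the output close to the discrete values present in the input, whereas a mean or a linear interpolation would produce intermediate values with no meaning as a flag.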

IsoFuse provides some basic cross-correlation analysis using a moving window for selected variables. However, it is recommended to work iteratively with the fused output file and to identify where time shifts are needed using custom data analysis procedures.
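
As an illustration of such a custom procedure, the following Python sketch estimates the time shift between two fused variables from a lagged Pearson correlation, evaluated either over the whole series or in consecutive windows. It is a generic sketch, not the method implemented in IsoFuse. All function names are hypothetical, both series are assumed to be on the same regular time axis with no missing values, and the sign convention (positive lag means the second series is delayed) is a choice made here that may differ from the sign of the shift parameter in IsoFuse.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


def best_lag(x, y, max_lag):
    """Return (lag, r) with the highest correlation for lags in [-max_lag, max_lag].

    The lag is in samples; a positive lag means y is delayed relative to x.
    Returns (None, None) if no lag gives a defined correlation.
    """
    best = (None, None)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        if len(a) < 3:
            continue  # too few overlapping points
        r = pearson(a, b)
        if r is not None and (best[1] is None or r > best[1]):
            best = (lag, r)
    return best


def windowed_lags(x, y, window, max_lag):
    """Best lag in each consecutive, non-overlapping window of `window` samples."""
    return [best_lag(x[s:s + window], y[s:s + window], max_lag)[0]
            for s in range(0, len(x) - window + 1, window)]


if __name__ == "__main__":
    x = [0.0] * 20 + [1.0, 3.0, 7.0, 3.0, 1.0] + [0.0] * 20
    y = [0.0] * 3 + x[:-3]          # y is x delayed by 3 samples
    print(best_lag(x, y, max_lag=5))
    print(windowed_lags(x, y, window=15, max_lag=5))
```

A lag in samples converts to seconds by multiplying with the averaging interval time_delta. Windows in which a variable does not change give no estimate, which is why the windowed result can contain None.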

Usage

For testing purposes, it is quicker to run the procedure on a single day first, before proceeding to the entire time series. Similarly, testing is substantially quicker with larger averaging time intervals.

The specific steps to fuse data are:

1. Calibrate Picarro vapour isotope data using FaVaCal to produce calibrated native-resolution netCDF files

2. Run "cdo mergetime <isofiles>*.nc joint_iso.nc" to produce a single file covering the entire period.

3. Run "cdo mergetime <auxfiles>*.nc joint_met.nc" to produce a joint auxiliary netCDF file. Any number of netCDF files from different instruments can be used.

4. Create a text file <files.txt> (ASCII format) that lists the full path and name of each input file to be joined, one file per line.

Example:

/path/to/file1.nc
/path/to/file2.nc
/path/to/file3.nc
etc.
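
The file list from step 4 can also be written by a small script. Below is a minimal Python sketch; the helper name write_file_list and the input file names in the usage example are placeholders. Writing with ASCII encoding makes the script fail early if a path contains non-ASCII characters.

```python
from pathlib import Path


def write_file_list(input_files, list_path="files.txt"):
    """Write the absolute path of each input file on its own line."""
    lines = [str(Path(f).resolve()) for f in input_files]
    # ASCII encoding raises an error for paths with non-ASCII characters
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="ascii")
    return lines


if __name__ == "__main__":
    for line in write_file_list(["joint_iso.nc", "joint_met.nc"]):
        print(line)
```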

5. Run IsoFuse in scan mode to produce the configuration file <config.ini> for the data files listed in <files.txt>. The optional -v flag gives additional output. Example:

python IsoFuse.py scan [-v] files.txt config.ini

6. Edit the settings in file config.ini according to user needs. The file is organized in sections denoted by a pair of square brackets [], and parameter keywords within sections that are assigned values. Section names in square brackets must be unique within the file.

Example:

[SETTINGS]
time_delta = 30.0

7. Settings to be edited include:

   - Section [SETTINGS]: overall fusing settings
       - time_delta: averaging time interval [s]
       - start_date: start of fusing time interval [yyyy-mm-dd HH:MM:SS]
       - end_date: end of fusing time interval [yyyy-mm-dd HH:MM:SS]
       - write_standard_deviation: True/False to include standard deviation from averaging intervals (ignoring NaN values)
   - Section [NCFILE_00]: file-specific settings for first file
       - file_id: file identifier, auto-generated, do not modify
       - filename: name of input file, auto-generated
       - start_date: start of fusing time interval for file 00
       - end_date: end of fusing time interval for file 00
       - average_frequency: calculated average data frequency [Hz], auto-generated, do not modify
   - Section [VARIABLE_00-00]: variable-specific settings for first variable on first file
       - file_id: corresponding file identifier, auto-generated
       - var_id: global variable identifier, auto-generated
       - short_name: variable name on file, do not modify
       - short_rename: variable name on output file
       - long_name: long variable name on output file
       - standard_name: CF standard name on output file
       - units: variable units
       - offset: offset to be applied
       - scale: scaling factor to be applied
       - shift: time shift to be applied [s]
       - flag: True/False to denote non-continuous variable
   - Section [CORRELATE_01]: variable-window cross-correlation analysis for variables to identify time shifts
       - title: title of correlation plot
       - var_id_x: global variable identifier for x axis
       - var_id_y: global variable identifier for y axis

Excerpt from an example configuration file:

# IsoFuse data fusing configuration file
[SETTINGS]
time_delta = 30.0
start_date = 2019-06-12 00:00:00
end_date = 2019-06-22 23:59:59
write_standard_deviation = True
# 
[NCFILE_00]
file_id = 0 # do not modify
filename = /Users/hso039/Library/Mobile Documents/com~apple~CloudDocs/projects/GFI_python_tools/iMet/iMet-XQ2-61124/2019/06/iMet-XQ2-61124_LEMON2019.nc
start_date = 2019-06-13 10:55:55
end_date = 2019-06-22 08:43:49
average_frequency = 0.251 # (Hz) do not modify
# 
[VARIABLE_00-00]
file_id = 0 # do not modify
var_id = 0  # do not modify
short_name = time # do not modify
short_rename = time
long_name =  time
standard_name = time
units = days since 1970-01-01 00:00:00.0
offset = 0
scale = 1
shift = 0
flag = 0
#
[CORRELATE_01]
# specify correlation analysis
title = Pressure correlation
var_id_x = 2
var_id_y = 14
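
Before running the fuse step, the edited configuration can be sanity-checked. The sketch below reads a shortened version of the example above with Python's standard configparser module. Whether IsoFuse itself uses configparser is not stated here, so the parser options are assumptions chosen to match the example file: inline "#" comments are stripped and value interpolation is switched off. The function names are hypothetical.

```python
import configparser
from datetime import datetime

EXAMPLE = """\
# IsoFuse data fusing configuration file
[SETTINGS]
time_delta = 30.0
start_date = 2019-06-12 00:00:00
end_date = 2019-06-22 23:59:59
#
[NCFILE_00]
file_id = 0 # do not modify
average_frequency = 0.251 # (Hz) do not modify
#
[VARIABLE_00-00]
file_id = 0 # do not modify
var_id = 0  # do not modify
short_name = time # do not modify
short_rename = time
shift = 0
"""


def read_config(text):
    """Parse configuration text; duplicate section names raise an error."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # strip trailing "# do not modify" comments
        interpolation=None,              # treat "%" in file names literally
    )
    parser.read_string(text)
    return parser


def check_settings(cfg):
    """Return time_delta [s] after checking that start_date precedes end_date."""
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(cfg.get("SETTINGS", "start_date"), fmt)
    end = datetime.strptime(cfg.get("SETTINGS", "end_date"), fmt)
    if start >= end:
        raise ValueError("start_date must be earlier than end_date")
    return cfg.getfloat("SETTINGS", "time_delta")


if __name__ == "__main__":
    cfg = read_config(EXAMPLE)
    print("time_delta [s]:", check_settings(cfg))
    print("variables:", [s for s in cfg.sections() if s.startswith("VARIABLE_")])
```

With its default strict mode, configparser rejects a file in which a section name occurs twice, which matches the requirement that section names must be unique.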

8. Run IsoFuse in fuse mode to produce the fused output file <fused_output.nc> in netCDF format, where the option -v provides additional output during the operation. Example:

python IsoFuse.py fuse [-v] config.ini fused_output.nc

9. Check output using ncdump/ncview.

10. Done.
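
The two IsoFuse calls from steps 5 and 8 can be wrapped in a small helper when the procedure is repeated for several campaigns. This is a minimal Python sketch that only assembles the command lines exactly as shown in the steps above. The helper name isofuse_command is hypothetical, and the sketch assumes that IsoFuse.py is in the current working directory.

```python
def isofuse_command(mode, first, second, verbose=False, script="IsoFuse.py"):
    """Build the command line for one IsoFuse call.

    scan: first = file list (files.txt), second = configuration file (config.ini)
    fuse: first = configuration file (config.ini), second = output netCDF file
    """
    if mode not in ("scan", "fuse"):
        raise ValueError("mode must be 'scan' or 'fuse'")
    cmd = ["python", script, mode]
    if verbose:
        cmd.append("-v")
    cmd += [first, second]
    return cmd


if __name__ == "__main__":
    print(" ".join(isofuse_command("scan", "files.txt", "config.ini", verbose=True)))
    # config.ini must be edited by hand (steps 6 and 7) before the fuse call
    print(" ".join(isofuse_command("fuse", "config.ini", "fused_output.nc")))
```

Each returned list can be passed to subprocess.run(cmd, check=True). The manual editing of config.ini between the two calls remains necessary.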