Initializing a new data release#

Configuration repository#

The glamod-marine-config repository (glamod/glamod-marine-processing) serves as a container for the configuration used to create the different data releases for C3S. The Observations Suite configuration files are stored in obs-suite/release-update/dataset directories within this repository.

Currently, the following configuration sets are available:

| Data release | Path in repo/obs-suite | Marine code version |
|---|---|---|
| r092019 | r092019-000000/ICOADS_R3.0.0T | v1.0 |
| release_2.0 | release_2.0-000000/ICOADS_R3.0.0T | v1.1 |
| Demo release | release_demo-000000/ICOADS_R3.0.0T | v1.1 |
| Previous release | release_7.0/000000/ICOADS_R3.0.2T | v7.0.1 |
| Stable release | release_8.0/000000/ICOADS_R3.0.0T | v8.0.0 |
| | release_8.0/000000/ICOADS_R3.0.2T | v8.0.0 |
| | release_8.0/000000/C-RAID_1.2 | v8.0.0 |

Up until v1.1 (release_2.0), the configuration files were maintained in the code repository rather than in the configuration repository. They have now been included in the configuration repository for traceability. Note also that some changes have been made to the configuration files after v1.1: the format used in the Demo release files must be applied when running the observations suite.

Create the configuration files for the release and dataset#

Every data release is identified in the file system with the following tags:

  • release: release name (e.g. release_7.0)

  • update: update tag (e.g. 000000)

  • dataset: dataset name (e.g. ICOADS_R3.0.2T)

Create a new directory release-update/dataset/ in the obs-suite configuration directory (config_directory) of the configuration repository (note the hyphen as field separator between release and update). We will now refer to this directory as release_config_dir.
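The directory naming rule above can be sketched in Python as follows. This is only an illustration: the helper name release_config_dir and the "obs-suite" location are assumptions, not part of the observations suite.

```python
from pathlib import Path


def release_config_dir(config_directory, release, update, dataset):
    """Return <config_directory>/<release>-<update>/<dataset>.

    The hyphen joins the release and update tags, as described above.
    """
    return Path(config_directory) / f"{release}-{update}" / dataset


# "obs-suite" is a hypothetical location of the configuration directory.
path = release_config_dir("obs-suite", "release_7.0", "000000", "ICOADS_R3.0.2T")
print(path.as_posix())  # obs-suite/release_7.0-000000/ICOADS_R3.0.2T
```

Calling path.mkdir(parents=True, exist_ok=True) on the result would then create the directory.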

The files described in the following sections need to be created. The Release periods file and the Process list file are required when the new data release is set up; the remaining files can be created as the processing reaches the corresponding level.

The sample files in the following sections can be found in the release_demo directory of the configuration repository.

Release periods file#

Create file release_config_dir/source_deck_periods.json

This is a JSON file listing each of the source-deck partitions to be included in the release and the associated periods (at year resolution) to process.

The figure below shows a sample of this file:

{
  "year_init": 2014,
  "year_end": 2024
}
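As a minimal sketch, the years covered by a periods file like the sample above could be expanded as shown here. It assumes that year_init and year_end are both inclusive, which the sample does not state explicitly; the function name years_to_process is hypothetical.

```python
import json

# Content of the sample release periods file shown above.
SAMPLE = '{"year_init": 2014, "year_end": 2024}'


def years_to_process(periods):
    """Return the list of years from year_init to year_end (assumed inclusive)."""
    year_init = int(periods["year_init"])
    year_end = int(periods["year_end"])
    if year_init > year_end:
        raise ValueError("year_init must not be later than year_end")
    return list(range(year_init, year_end + 1))


years = years_to_process(json.loads(SAMPLE))
print(years[0], years[-1], len(years))  # 2014 2024 11
```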

Process list file#

Create file release_config_dir/source_deck_list.txt

This is a simple ASCII file with the list of source-deck partitions to process. Create the master list from the keys of the file source_deck_periods.json. This file can later be subsetted if a given process is to be run in batches.

The figure below shows a sample of this file:

103-792
103-793
103-794
103-795
103-797
114-992
114-993
114-994
114-995
172-798
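One possible way to derive the master list from a periods mapping is sketched below. It assumes a periods mapping keyed by source-deck partition, as the description suggests, and that partition keys follow the three-digit "sid-dck" pattern seen in the sample list; both the pattern and the helper names are assumptions made for illustration.

```python
import re

# Assumed pattern for source-deck keys such as "103-792".
SID_DCK = re.compile(r"^\d{3}-\d{3}$")


def master_list(periods):
    """Return the sorted source-deck keys of a periods mapping."""
    return sorted(key for key in periods if SID_DCK.match(key))


def write_process_list(path, partitions):
    """Write one partition per line to a plain-text file."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write("\n".join(partitions) + "\n")


# Hypothetical periods mapping keyed by source-deck partition.
periods = {
    "114-992": {"year_init": 2014, "year_end": 2024},
    "103-792": {"year_init": 2014, "year_end": 2024},
    "year_init": 2014,  # non-partition keys are ignored
}
print(master_list(periods))  # ['103-792', '114-992']
```

A batch run would then pass a slice of the returned list to write_process_list.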

Post process list#

Create file release_config_dir/source_deck_list_post.txt

This is a simple ASCII file with a single source-deck partition to process. Both source_deck_list and source_deck_list_post are used in merge_suite or split_suite, either to merge multiple source-deck partitions into a single source-deck partition or to split a single source-deck partition into multiple source-deck partitions.

PT2

Level 1a configuration file#

Create file release_config_dir/level1a.json.

This file includes information on the data model(s) of the initial dataset files, the filters used to select reports, and the mapping applied to convert the data to the CDM.

The figure below shows a sample of this file:

{
  "process_list_file": "source_deck_list.txt",
  "release_periods_file": "source_deck_periods.json",
  "job_memo_mb": 16000,
  "job_time_hr": "01",
  "job_time_min": "30",
  "read_sections": [
    "core",
    "c1",
    "c98"
  ],
  "filter_reports_by": {
    "c1.PT": [
      "0",
      "1",
      "2",
      "3",
      "4",
      "5"
    ]
  },
  "103-792": {
    "data_model": "icoads_r302_d792"
  },
  "103-793": {
    "data_model": "icoads_r302_d793"
  },
  "103-794": {
    "data_model": "icoads_r302_d794"
  },
  "103-795": {
    "data_model": "icoads_r302_d795"
  },
  "103-797": {
    "data_model": "icoads_r302_d797"
  },
  "114-992": {
    "data_model": "icoads_r302_d992"
  },
  "114-993": {
    "data_model": "icoads_r302_d993"
  },
  "114-994": {
    "data_model": "icoads_r302_d994"
  },
  "114-995": {
    "data_model": "icoads_r302_d995"
  },
  "172-798": {
    "data_model": "icoads_r302_d798"
  },
  "blacklisting": {
    "header": {
      "func": "do_blacklist",
      "params": {
        "id": [
          "core",
          "ID"
        ],
        "deck": [
          "c1",
          "DCK"
        ],
        "year": [
          "core",
          "YR"
        ],
        "month": [
          "core",
          "MO"
        ],
        "latitude": [
          "core",
          "LAT"
        ],
        "longitude": [
          "core",
          "LON"
        ],
        "platform_type": [
          "c1",
          "PT"
        ]
      }
    },
    "observations-dpt": {
      "func": "do_humidity_blacklist",
      "params": {
        "platform_type": [
          "c1",
          "PT"
        ]
      }
    },
    "observations-at": {
      "func": "do_mat_blacklist",
      "params": {
        "deck": [
          "c1",
          "DCK"
        ],
        "year": [
          "core",
          "YR"
        ],
        "latitude": [
          "core",
          "LAT"
        ],
        "longitude": [
          "core",
          "LON"
        ],
        "platform_type": [
          "c1",
          "PT"
        ]
      }
    },
    "observations-ws": {
      "func": "do_wind_blacklist",
      "params": {
        "deck": [
          "c1",
          "DCK"
        ]
      }
    }
  },
  "generic_ids": {
    "params": {
      "inid": [
        "core",
        "ID"
      ],
      "inyear": [
        "core",
        "YR"
      ]
    }
  }
}

This file has its default configuration parameters in the outer keys. Source-deck specific configuration can be applied by specifying a configuration parameter under a sid-dck key. In the sample given, all source-deck partitions are processed with the default configuration, but each uses its own partition-specific data_model for mapping. Optionally, specific reports or observations can be placed on a blacklist or on a list of generic IDs. This is important for quality control in level1e.

Configuration parameters job* are only used by the slurm launchers, while the rest are used by the corresponding level1a.py script.
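The default-plus-override behaviour described above can be sketched as a shallow merge. This is an illustration under stated assumptions, not the logic of level1a.py: the helper effective_config is hypothetical, and the list of partitions is assumed to come from the process list file.

```python
def effective_config(config, sid_dck, partitions):
    """Merge the outer (default) keys with the overrides for one partition.

    Keys that name a partition are treated as override blocks; every other
    outer key is a default. Overrides replace defaults key by key (shallow).
    """
    defaults = {k: v for k, v in config.items() if k not in partitions}
    overrides = config.get(sid_dck, {})
    return {**defaults, **overrides}


partitions = ["103-792", "114-992"]
config = {
    "read_sections": ["core", "c1", "c98"],
    "103-792": {"data_model": "icoads_r302_d792"},
    "114-992": {"data_model": "icoads_r302_d992"},
}
merged = effective_config(config, "103-792", partitions)
print(merged["data_model"])     # icoads_r302_d792
print(merged["read_sections"])  # ['core', 'c1', 'c98']
```

A shallow merge is the simplest choice; nested blocks such as "blacklisting" would need a recursive merge if partial overrides were required.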

Level 1b configuration file#

Create file release_config_dir/level1b.json.

This file contains information on the NOC corrections version to be used and on the correspondence between the CDM table fields to which the corrections are applied and the subdirectories where these corrections can be found. The CDM history stamp for every correction is also configured in this file. Alternatively, you can use the duplicate checker from the cdm_reader_mapper module.

The figure below shows a sample of this file:

{
  "process_list_file": "source_deck_list_post.txt",
  "release_periods_file": "source_deck_periods.json",
  "job_memo_mb": 4000,
  "job_time_hr": "01",
  "job_time_min": "00",
  "correction_version": "null",
  "delete_no_obs": true,
  "duplicates": {
    "ignore_entries": {
      "primary_station_id": [
        "SHIP",
        "MASKSTID"
      ],
      "station_speed": "null",
      "station_course": "null"
    }
  }
}

This file has its default configuration parameters in the outer keys. Source-deck specific configuration can be applied by specifying a configuration parameter under a sid-dck key. In the sample above, only the default configuration is applied.

Configuration parameters job* are only used by the slurm launchers, while the rest are used by the corresponding level1b.py script.
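To illustrate how the job* parameters might map to Slurm resources, the sketch below builds sbatch --mem and --time options from them. How the actual launchers translate these keys is not shown here, so the function sbatch_options is purely hypothetical.

```python
def sbatch_options(config):
    """Build sbatch memory and wall-time options from the job_* keys."""
    memory_mb = int(config["job_memo_mb"])
    hours = int(config["job_time_hr"])
    minutes = int(config["job_time_min"])
    # sbatch reads --mem in megabytes by default and --time as HH:MM:SS.
    return [f"--mem={memory_mb}", f"--time={hours:02d}:{minutes:02d}:00"]


job = {"job_memo_mb": 4000, "job_time_hr": "01", "job_time_min": "00"}
print(sbatch_options(job))  # ['--mem=4000', '--time=01:00:00']
```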

Level 1c configuration file#

Create file release_config_dir/level1c.json.

This file contains information on the NOC corrections version to be used.

The figure below shows a sample of this file:

{
  "process_list_file": "source_deck_list_post.txt",
  "release_periods_file": "source_deck_periods.json",
  "job_memo_mb": 4000,
  "job_time_hr": "00",
  "job_time_min": "30",
  "noc_version": "v2025"
}

This file has its default configuration parameters in the outer keys. Source-deck specific configuration can be applied by specifying a configuration parameter under a sid-dck key. In the sample above, only the default configuration is applied.

Level 1d configuration file#

Create file release_config_dir/level1d.json.

This file contains information on the metadata sources that are merged into the level1c data. Currently, the only metadata (MD) source is the Pub47 files, and the full process is essentially tailored to Pub47 as pre-processed at NOC.

The file specifies the subdirectory in the release data directory where the metadata can be found (“md_subdir”) and the name of the mapping within the Common Data Model mapper module used to map Pub47 to the CDM (“md_model”).

The level1d process will fail if it doesn’t find a metadata file for a month partition. To account for periods where metadata are not available, the following optional keys can be used:

  • “md_not_avail”: true tells the process that no metadata are available for the full release period. Defaults to false.

  • “md_first_yr_avail”: indicates the first year for which metadata files should be available in the release period. Defaults to first year in the release period.

  • “md_last_yr_avail”: indicates the last year for which metadata files should be available in the release period. Defaults to last year in the release period.

With the above keys, the process is instructed to pass data files safely on to the next processing level without merging any metadata when none are available.
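The availability rules above can be summarised in a short sketch. The function metadata_expected is hypothetical and only mirrors the documented defaults; it is not taken from the level1d script.

```python
def metadata_expected(config, year, release_init, release_end):
    """Return True if a metadata file should exist for the given year."""
    if config.get("md_not_avail", False):
        # No metadata at all for the full release period.
        return False
    first = config.get("md_first_yr_avail", release_init)
    last = config.get("md_last_yr_avail", release_end)
    return first <= year <= last


config = {"md_first_yr_avail": 1956, "md_last_yr_avail": 2025}
print(metadata_expected(config, 1950, 1900, 2024))  # False
print(metadata_expected(config, 2000, 1900, 2024))  # True
```

For years where the function returns False, the data files would be passed on without a metadata merge instead of causing a failure.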

The figure below shows a sample of this file:

{
  "process_list_file": "source_deck_list_post.txt",
  "release_periods_file": "source_deck_periods.json",
  "job_memo_mb": 4000,
  "job_time_hr": "00",
  "job_time_min": "30",
  "md_model": "pub47",
  "md_version": "v202501",
  "md_subdir": "Pub47",
  "md_first_yr_avail": 1956,
  "md_last_yr_avail": 2025
}

This file has its default configuration parameters in the outer keys. Source-deck specific configuration can be applied by specifying a configuration parameter under a sid-dck key. In the sample above, only the default configuration is applied.

Level 1e configuration file#

Create file release_config_dir/level1e.json.

The level1e specific quality control parameters included in this file are:

  • “history_explain” : text added to the header file history field when flags are merged.

  • “qc_settings”: Settings used by marine_qc.

    • “copies”: Skip quality control for the observations table named by each key and instead reuse the quality flags of the observations table named by its value.

    • “individual_reports”: Settings applied on individual reports

      • “preprocessing”: Define external climatology files and read them as background climatologies

      • “header”: QC functions applied on header files

      • “observations”: QC functions applied on observations files

      • “combined”: Combined QC functions applied on two observation files

    • “sequential_reports”: Settings applied on tracks of primary_station_ids.

      • “header”: QC functions applied on header files

      • “observations”: QC functions applied on observations files

      • “combined”: Combined QC functions applied on two observation files

    • “grouped_reports”: Settings applied on grouped observations.

      • “preprocessing”: Define external climatology files and read them as background climatologies

      • “observations”: QC functions applied on observations files

The figure below shows a sample of this file:

{
  "process_list_file": "source_deck_list_post.txt",
  "release_periods_file": "source_deck_periods.json",
  "job_memo_mb": 4000,
  "job_time_hr": "00",
  "job_time_min": "59",
  "history_explain": "Position, tracking and parameter QC flags added",
  "qc_settings": {
    "copies": {
      "observations-wbt": "observations-dpt"
    },
    "individual_reports": {
      "return_method": "failed",
      "preprocessing": {
        "observations-at": {
          "climatology": {
            "func": "get_climatological_value",
            "names": {
              "lat": "latitude",
              "lon": "longitude",
              "date": "date_time"
            },
            "inputs": {
              "file_name": "ERAClimatologies/t2m_pentad_1by1marine_ERA-Interim_data_19792015.nc",
              "clim_name": "t2m_clims",
              "time_axis": "pentad_time",
              "target_units": "K",
              "source_units": "degC"
            }
          },
          "standard_deviation": {
            "func": "get_climatological_value",
            "names": {
              "lat": "latitude",
              "lon": "longitude",
              "date": "date_time"
            },
            "inputs": {
              "file_name": "ERAClimatologies/t2m_pentad_1by1marine_ERA-Interim_data_19792015.nc",
              "clim_name": "t2m_stdevs",
              "time_axis": "pentad_time"
            }
          }
        },
        "observations-dpt": {
          "climatology": {
            "func": "get_climatological_value",
            "names": {
              "lat": "latitude",
              "lon": "longitude",
              "date": "date_time"
            },
            "inputs": {
              "file_name": "ERAClimatologies/td2m_pentad_1by1marine_ERA-Interim_data_19792015.nc",
              "clim_name": "td2m_clims",
              "time_axis": "pentad_time",
              "target_units": "K",
              "source_units": "degC"
            }
          },
          "standard_deviation": {
            "func": "get_climatological_value",
            "names": {
              "lat": "latitude",
              "lon": "longitude",
              "date": "date_time"
            },
            "inputs": {
              "file_name": "ERAClimatologies/td2m_pentad_1by1marine_ERA-Interim_data_19792015.nc",
              "clim_name": "td2m_stdevs",
              "time_axis": "pentad_time"
            }
          }
        },
        "observations-sst": {
          "climatology": {
            "func": "get_climatological_value",
            "names": {
              "lat": "latitude",
              "lon": "longitude",
              "date": "date_time"
            },
            "inputs": {
              "file_name": "SST/HadSST2_daily_1x1_climatology.nc",
              "clim_name": "sst",
              "target_units": "K",
              "source_units": "degC"
            }
          },
          "standard_deviation": {
            "func": "get_climatological_value",
            "names": {
              "lat": "latitude",
              "lon": "longitude",
              "date": "date_time"
            },
            "inputs": {
              "file_name": "SST/HadSST2_pentad_stdev_climatology.nc",
              "clim_name": "sst"
            }
          }
        },
        "observations-slp": {
          "climatology": {
            "func": "get_climatological_value",
            "names": {
              "lat": "latitude",
              "lon": "longitude",
              "date": "date_time"
            },
            "inputs": {
              "file_name": "SLP/SLP_pentad_climatology.nc",
              "clim_name": "slp",
              "target_units": "Pa",
              "source_units": "hPa"
            }
          },
          "standard_deviation": {
            "func": "get_climatological_value",
            "names": {
              "lat": "latitude",
              "lon": "longitude",
              "date": "date_time"
            },
            "inputs": {
              "file_name": "SLP/SLP_pentad_stdev_climatology.nc",
              "clim_name": "slp",
              "target_units": "Pa",
              "source_units": "hPa"
            }
          }
        }
      },
      "header": {
        "position_check": {
          "POS": {
            "func": "do_position_check",
            "names": {
              "lat": "latitude",
              "lon": "longitude"
            }
          }
        },
        "time_check": {
          "DATE": {
            "func": "do_date_check",
            "names": {
              "date": "report_timestamp"
            }
          },
          "TIME": {
            "func": "do_time_check",
            "names": {
              "date": "report_timestamp"
            }
          }
        }
      },
      "observations": {
        "missing_values": {
          "MISSVAL": {
            "func": "do_missing_value_check",
            "names": {
              "value": "observation_value"
            }
          }
        },
        "observations-at": {
          "HLIMITS": {
            "func": "do_hard_limit_check",
            "names": {
              "value": "observation_value"
            },
            "arguments": {
              "limits": [
                193.15,
                338.15
              ]
            }
          },
          "CLIM1": {
            "func": "do_climatology_check",
            "names": {
              "value": "observation_value"
            },
            "arguments": {
              "climatology": "__preprocessed__",
              "maximum_anomaly": 10.0
            }
          },
          "CLIM2": {
            "func": "do_climatology_check",
            "names": {
              "value": "observation_value"
            },
            "arguments": {
              "climatology": "__preprocessed__",
              "standard_deviation": "__preprocessed__",
              "standard_deviation_limits": [
                1.0,
                4.0
              ],
              "maximum_anomaly": 5.5
            }
          }
        },
        "observations-dpt": {
          "CLIM2": {
            "func": "do_climatology_check",
            "names": {
              "value": "observation_value"
            },
            "arguments": {
              "climatology": "__preprocessed__",
              "standard_deviation": "__preprocessed__",
              "standard_deviation_limits": [
                1.0,
                4.0
              ],
              "maximum_anomaly": 5.5
            }
          }
        },
        "observations-slp": {
          "CLIM3": {
            "func": "do_climatology_check",
            "names": {
              "value": "observation_value"
            },
            "arguments": {
              "climatology": "__preprocessed__",
              "standard_deviation": "__preprocessed__",
              "maximum_anomaly": 3.0,
              "lowbar": 1000.0
            }
          }
        },
        "observations-sst": {
          "HLIMITS": {
            "func": "do_hard_limit_check",
            "names": {
              "value": "observation_value"
            },
            "arguments": {
              "limits": [
                268.15,
                318.15
              ]
            }
          },
          "FREEZE": {
            "func": "do_sst_freeze_check",
            "names": {
              "sst": "observation_value"
            },
            "arguments": {
              "freezing_point": 271.35,
              "freeze_check_n_sigma": 2.0
            }
          },
          "CLIM1": {
            "func": "do_climatology_check",
            "names": {
              "value": "observation_value"
            },
            "arguments": {
              "climatology": "__preprocessed__",
              "maximum_anomaly": 8.0
            }
          }
        },
        "observations-wd": {
          "HLIMITS": {
            "func": "do_hard_limit_check",
            "names": {
              "value": "observation_value"
            },
            "arguments": {
              "limits": [
                0,
                360
              ]
            }
          }
        },
        "observations-ws": {
          "HLIMITS": {
            "func": "do_hard_limit_check",
            "names": {
              "value": "observation_value"
            },
            "arguments": {
              "limits": [
                0.0,
                50.0
              ]
            }
          }
        },
        "combined": {
          "SUPERSAT": {
            "func": "do_supersaturation_check",
            "tables": {
              "dpt": "observations-dpt",
              "at2": "observations-at"
            },
            "names": {
              "dpt": "observation_value",
              "at2": "observation_value"
            },
            "get_flagged": [
              "observations-dpt"
            ]
          },
          "WIND": {
            "func": "do_wind_consistency_check",
            "tables": {
              "wind_speed": "observations-ws",
              "wind_direction": "observations-wd"
            },
            "names": {
              "wind_speed": "observation_value",
              "wind_direction": "observation_value"
            }
          }
        }
      }
    },
    "sequential_reports": {
      "header": {
        "TRACK": {
          "func": "do_track_check",
          "names": {
            "vsi": "station_speed",
            "dsi": "station_course",
            "lat": "latitude",
            "lon": "longitude",
            "date": "report_timestamp"
          },
          "arguments": {
            "max_direction_change": 60.0,
            "max_speed_change": 10.0,
            "max_absolute_speed": 40.0,
            "max_midpoint_discrepancy": 150.0
          }
        },
        "IQUAM": {
          "func": "do_iquam_track_check",
          "names": {
            "lat": "latitude",
            "lon": "longitude",
            "date": "report_timestamp"
          },
          "arguments": {
            "speed_limit": 60.0,
            "delta_d": 1.11,
            "delta_t": 0.01,
            "n_neighbours": 5
          }
        }
      },
      "observations": {
        "SPIKE": {
          "tables": [
            "observations-sst"
          ],
          "func": "do_spike_check",
          "names": {
            "value": "observation_value",
            "lat": "latitude",
            "lon": "longitude",
            "date": "date_time"
          },
          "arguments": {
            "max_gradient_space": 0.5,
            "max_gradient_time": 1.0,
            "delta_t": 2.0,
            "n_neighbours": 5
          }
        }
      },
      "combined": {
        "SAT": {
          "func": "find_saturated_runs",
          "tables": {
            "at": "observations-at",
            "dpt": "observations-dpt",
            "lat": "observations-at",
            "lon": "observations-at",
            "date": "observations-at"
          },
          "names": {
            "at": "observation_value",
            "dpt": "observation_value",
            "lat": "latitude",
            "lon": "longitude",
            "date": "date_time"
          },
          "arguments": {
            "min_time_threshold": 48.0,
            "shortest_run": 4
          },
          "get_flagged": [
            "observations-dpt"
          ]
        }
      }
    },
    "grouped_reports": {
      "buoy_dataset": "C-RAID_1.2",
      "buoy_dck": "202412",
      "preprocessing": {
        "observations-at": {
          "climatology": "__individual_reports__",
          "standard_deviation": "__individual_reports__",
          "stdev1": {
            "inputs": {
              "file_name": "SST/OSTIA_compare_1x1x5box_to_buddy_average.nc",
              "clim_name": "sst"
            }
          },
          "stdev2": {
            "inputs": {
              "file_name": "SST/OSTIA_compare_one_ob_to_1x1x5box.nc",
              "clim_name": "sst"
            }
          },
          "stdev3": {
            "inputs": {
              "file_name": "SST/OSTIA_buddy_range_sampling_error.nc",
              "clim_name": "sst"
            }
          }
        },
        "observations-sst": {
          "climatology": "__individual_reports__",
          "standard_deviation": "__individual_reports__",
          "stdev1": {
            "inputs": {
              "file_name": "SST/OSTIA_compare_1x1x5box_to_buddy_average.nc",
              "clim_name": "sst"
            }
          },
          "stdev2": {
            "inputs": {
              "file_name": "SST/OSTIA_compare_one_ob_to_1x1x5box.nc",
              "clim_name": "sst"
            }
          },
          "stdev3": {
            "inputs": {
              "file_name": "SST/OSTIA_buddy_range_sampling_error.nc",
              "clim_name": "sst"
            }
          }
        },
        "observations-dpt": {
          "climatology": "__individual_reports__",
          "standard_deviation": "__individual_reports__"
        },
        "observations-slp": {
          "climatology": "__individual_reports__",
          "standard_deviation": "__individual_reports__"
        }
      },
      "observations": {
        "BAYESIAN": {
          "tables": [
            "observations-at",
            "observations-sst"
          ],
          "func": "do_bayesian_buddy_check",
          "names": {
            "lat": "latitude",
            "lon": "longitude",
            "date": "date_time",
            "value": "observation_value"
          },
          "arguments": {
            "climatology": "__preprocessed__",
            "stdev1": "__preprocessed__",
            "stdev2": "__preprocessed__",
            "stdev3": "__preprocessed__",
            "prior_probability_of_gross_error": 0.05,
            "quantization_interval": 0.1,
            "one_sigma_measurement_uncertainty": 1.0,
            "limits": [
              2,
              2,
              4
            ],
            "noise_scaling": 3.0,
            "maximum_anomaly": 8.0,
            "fail_probability": 0.3
          }
        },
        "MDS": {
          "tables": [
            "observations-at",
            "observations-dpt",
            "observations-slp",
            "observations-sst"
          ],
          "func": "do_mds_buddy_check",
          "names": {
            "lat": "latitude",
            "lon": "longitude",
            "date": "date_time",
            "value": "observation_value"
          },
          "arguments": {
            "climatology": "__preprocessed__",
            "standard_deviation": "__preprocessed__",
            "limits": [
              [
                1,
                1,
                2
              ],
              [
                2,
                2,
                2
              ],
              [
                1,
                1,
                4
              ],
              [
                2,
                2,
                4
              ]
            ],
            "number_of_obs_thresholds": [
              [
                0,
                5,
                15,
                100
              ],
              [
                0
              ],
              [
                0,
                5,
                15,
                100
              ],
              [
                0
              ]
            ],
            "multipliers": [
              [
                4.0,
                3.5,
                3.0,
                2.5
              ],
              [
                4.0
              ],
              [
                4.0,
                3.5,
                3.0,
                2.5
              ],
              [
                4.0
              ]
            ]
          }
        }
      },
      "observations-slp": {
        "MDS": {
          "limits": [
            [
              1,
              1,
              0
            ],
            [
              2,
              2,
              0
            ],
            [
              3,
              3,
              0
            ],
            [
              4,
              4,
              0
            ]
          ]
        }
      }
    }
  }
}

This file has its default configuration parameters in the outer keys. Source-deck specific configuration can be applied by specifying a configuration parameter under a sid-dck key. In the sample above, only the default configuration is applied.
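Because the qc_settings block is deeply nested, it can help to list which QC functions a configuration actually enables. The sketch below walks a qc_settings mapping and reports every "func" entry with its location. The helper collect_funcs is hypothetical, and the mapping is a small excerpt of the sample above rather than the full file.

```python
def collect_funcs(node, path=()):
    """Yield (location, function name) for every "func" entry below node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "func" and isinstance(value, str):
                yield "/".join(path), value
            else:
                # Descend into nested settings; non-dict values are ignored.
                yield from collect_funcs(value, path + (key,))


# Small excerpt of the sample qc_settings shown above.
qc_settings = {
    "individual_reports": {
        "header": {
            "position_check": {"POS": {"func": "do_position_check"}},
        },
    },
    "sequential_reports": {
        "header": {"TRACK": {"func": "do_track_check"}},
    },
}
for where, func in collect_funcs(qc_settings):
    print(where, "->", func)
```

Running the same loop over the "qc_settings" value loaded from level1e.json gives a quick overview of the checks configured for each report group.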