Initial Dataset
The GPP measurements collected by flux towers are shared voluntarily. Hence, even though there are hundreds of flux towers around the world, only around 260 are available in the GPP dataset, as shown in the map here. Most of them are located in North America and Europe.
The observation period varies by site, from decades to less than a month. Many records have large gaps caused by maintenance, equipment failures, or simply a lack of resources to collect the carbon flux data.
Data Processing
Due to limited time and compute power, this study focuses on data from 2010 to 2015, the 5-year period with the most active flux towers. To limit the impact of gap-filling and to retain year-long seasonality in the dataset, flux towers with less than a year of available data and/or more than 20% of records missing over their observation period are removed, leaving 192 flux tower sites in the dataset.
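The site filter described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the study: the function name `keep_site`, the daily time step, and the dict-based record layout are all hypothetical, and the exact window boundaries are left as parameters because the text does not say whether the endpoints are inclusive.

```python
from datetime import date, timedelta


def keep_site(daily_gpp, start, end, min_days=365, max_missing=0.20):
    """Decide whether one site passes the coverage filter.

    daily_gpp: dict mapping datetime.date -> GPP value, or None when the
    record is missing (hypothetical layout, assuming daily records).
    """
    # Restrict the site's records to the study window.
    in_window = {d: v for d, v in daily_gpp.items() if start <= d <= end}
    if not in_window:
        return False

    # Observation duration spans the first to the last record in the window.
    first, last = min(in_window), max(in_window)
    duration = (last - first).days + 1
    if duration < min_days:
        return False  # less than a year of data

    # Days absent from the dict count as missing, same as explicit None.
    observed = sum(1 for v in in_window.values() if v is not None)
    missing_fraction = (duration - observed) / duration
    return missing_fraction <= max_missing


def make_site(n_days, n_missing, first_day=date(2011, 1, 1)):
    """Build a synthetic site: the first n_missing days are gaps."""
    return {
        first_day + timedelta(days=i): (None if i < n_missing else 1.0)
        for i in range(n_days)
    }
```

A site is kept when `keep_site(records, start, end)` returns `True`. Measuring the missing fraction against the site's own observation duration, rather than against the full study window, follows the wording "in its observation duration" above.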
Stratified Train/Test Split
With so few flux towers in the dataset, it is crucial to maintain enough diversity in the training, validation, and test sets for the model to work well globally. Hence, the train-validation-test split is stratified by the generic IGBP grouping shown in the table below, leaving 78 sites for training, 26 sites for validation, and 25 sites for testing.
Category | Covered IGBP | Number of Sites |
Evergreen Needleleaf Forests | Evergreen Needleleaf Forests (ENF) | 24 |
Evergreen Broadleaf Forests | Evergreen Broadleaf Forests (EBF) | 6 |
Deciduous Broadleaf Forests | Deciduous Broadleaf Forests (DBF) | 20 |
Mixed Forests | Mixed Forests (MF) | 7 |
Shrublands | Closed Shrublands (CSH), Open Shrublands (OSH) | 12 |
Savannas | Woody Savannas (WSA), Savannas (SAV) | 12 |
Grasslands | Grasslands (GRA) | 24 |
Croplands | Croplands (CRO) | 15 |
Wetlands | Permanent Wetlands (WET) | 9 |
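A stratified split over the groups in the table can be sketched as below. This is a minimal illustration, not the study's actual procedure: the function name `stratified_split`, the 60/20/20 fractions, the rounding rule, and the fixed seed are all assumptions, so the per-split counts it produces need not match the ones reported above.

```python
import random
from collections import defaultdict


def stratified_split(site_groups, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sites into train/val/test while preserving group proportions.

    site_groups: dict mapping site id -> group label (e.g. "Grasslands").
    fractions: assumed (train, val, test) shares; not taken from the study.
    """
    # Collect sites per group in a deterministic order.
    by_group = defaultdict(list)
    for site, group in sorted(site_groups.items()):
        by_group[group].append(site)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for group in sorted(by_group):
        sites = by_group[group]
        rng.shuffle(sites)
        # Size the validation and test shares per group; train takes the rest.
        n_val = round(len(sites) * fractions[1])
        n_test = round(len(sites) * fractions[2])
        splits["val"] += sites[:n_val]
        splits["test"] += sites[n_val:n_val + n_test]
        splits["train"] += sites[n_val + n_test:]
    return splits


# Group sizes from the table above; the site ids are synthetic placeholders.
GROUP_COUNTS = {
    "Evergreen Needleleaf Forests": 24, "Evergreen Broadleaf Forests": 6,
    "Deciduous Broadleaf Forests": 20, "Mixed Forests": 7, "Shrublands": 12,
    "Savannas": 12, "Grasslands": 24, "Croplands": 15, "Wetlands": 9,
}
SITE_GROUPS = {
    f"{group}_{i:02d}": group
    for group, count in GROUP_COUNTS.items()
    for i in range(count)
}
```

Calling `stratified_split(SITE_GROUPS)` assigns every site to exactly one split. Splitting within each group, instead of shuffling all sites together, keeps small groups such as Evergreen Broadleaf Forests represented in all three sets.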
The locations and the timelines of the sites in each dataset can be seen below.