Sub-pixel area estimation of European forests using NOAA-AVHRR data

Pamela Kennedy and Sten Folving

Joint Research Centre, Space Applications Institute

Tuomas Häme, Kaj Andersson and Seppo Väätäinen

VTT Automation, Finland

Pauline Stenberg

University of Helsinki, Department of Forest Ecology, Finland

Janne Sarkeala

Enso Forest Development Ltd., Finland


A map of European forests has been produced for the pan-European area. In the database, the forest area within each NOAA-AVHRR (National Oceanic and Atmospheric Administration-Advanced Very High Resolution Radiometer) pixel has been estimated. A new approach, presented in another paper at this conference, was used in the estimation procedure. The method takes into account both the uncertainty of a pixel's membership in a specific ground class and the mixed ground contents of a spectral class.

The image interpretation was carried out using an AVHRR image mosaic of the entire pan-European area, compiled from the red and near infra-red channels. The mosaic was composited from forty-nine AVHRR images acquired in late summer 1996. Atmospheric correction was applied to the individual images using the SMAC (Simplified Method for Atmospheric Corrections) procedure, and each image was additionally corrected for BRDF (Bi-directional Reflectance Distribution Function) effects. The reflectance mosaic was computed as the weighted mean of the reflectance values of cloud-free pixels. The CORINE Land Cover database was used to represent the ground information.

This study fulfils some of the key objectives of the FIRS (Forest Information from Remote Sensing) Project of the Space Applications Institute (SAI) at the Joint Research Centre, Ispra, of the European Commission. In particular, the work is an attempt to improve the reliability of existing mapped information for Europe. Firstly, the method results in a continuous variable of forest probability, which represents an estimate of the forest area in each pixel. Secondly, it reduces the likelihood of under-estimating forest cover where the forest is fragmented, or of over-estimating it where the cover is uniform and homogeneous. Thirdly, the resulting image database could also be used to estimate other forest characteristics, if relevant ground data are available.

Forest area statistics derived from the probability database for Italy and France were compared with area estimates taken from EUROSTAT's (Statistical Office of the European Communities) database for 1995. The correlation coefficients were 0.89 for Italy and 0.85 for France.

  • Introduction
    The forested land in Europe covers around 195 million ha (Kuusela, 1994). This means that forest and other wooded land cover about 27% of Europe's land surface and constitute one of the most important habitat types for a wide variety of species of flora and fauna.

    Over the past twenty years, most of the data compiled and published on European forests have been statistical in nature. Despite the efforts of EUROSTAT (Statistical Office of the European Communities), the archived data are of varying quality, rarely directly comparable, irregularly updated and often confined to state-owned and other public land. This information is derived from national inquiries carried out at different times, using different methods of data collection and with pronounced differences in the underlying definitions. The Temperate and Boreal Forest Resources Assessment (TBFRA) of the UN-ECE/FAO (United Nations Economic Commission for Europe/Food and Agriculture Organization) is carried out once every ten years as part of the global Forest Resources Assessment (FRA). TBFRA-2000 is the latest in a series of surveys of the temperate and boreal industrialized countries; it covers fifty-five countries in four main regions: North America, Europe, the Commonwealth of Independent States and Asia-Pacific. The statistical data are compiled from questionnaires sent to national correspondents.

    A methodology for producing satellite-derived maps of land cover in Europe was first developed within the framework of the Commission of the European Communities (CEC) CORINE Land Cover Project (EEA Task Force, 1992). The database contains three broad forest classes (coniferous, deciduous and mixed forest), plus agro-forestry. Forest areas smaller than 25 ha are not mapped because they fall below the pre-defined minimum size of the land use units.

    Other studies using satellite data to produce a European forest / non-forest database include the work completed for the International Space Year (1992). The product, derived from AVHRR data, has a reported classification accuracy of 82.5% (Häusler et al., 1993). This pan-European database was subsequently used as a source of ground reference data for the production of another forest / non-forest database for Europe, carried out within the framework of the FIRS Project (Kennedy and Folving, 1994). In that work, satellite-derived surface temperature (Ts) was investigated in addition to the normalized difference vegetation index (NDVI) to improve the discrimination of forest from non-forest. The classification was stratified so as to respect intrinsic variations in forest growing conditions caused by different ecological and climatic regimes (Roy et al., 1997; Roy, 1997).
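    For reference, the NDVI mentioned above is conventionally computed from the red (AVHRR Channel 1) and near infra-red (AVHRR Channel 2) reflectances as (NIR - red) / (NIR + red). The minimal sketch below shows only that standard formula; the function name is our own, and it is not the maximum-value compositing procedure examined by Roy (1997).

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized difference vegetation index (standard definition).

    red: reflectance in the red band (AVHRR Channel 1)
    nir: reflectance in the near infra-red band (AVHRR Channel 2)
    """
    total = nir + red
    if total == 0:
        raise ValueError("red + nir must be non-zero")
    return (nir - red) / total


# Dense vegetation reflects strongly in the near infra-red, giving a high NDVI.
print(round(ndvi(0.05, 0.35), 2))  # 0.75
```

    The present study does not use NDVI; it clusters the red and near infra-red reflectances directly.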

    The objective of the work reported here and in Häme et al. (1999b) was to develop a method for producing a probability-based database of European forests, and to evaluate its performance. The novel aspect of the approach is that, unlike more traditional classification techniques, a single pixel is described as the sum of its contributions to one or more forest cover classes, as previously defined using ground data. The final result is a continuous variable of forest probability representing an estimate of forest area for each pixel. This contrasts with the more classical binary result, which indicates only whether or not a pixel belongs to one land cover class, i.e., forest in this case (e.g., Häusler et al., 1993; Roy et al., 1997).
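    The idea of describing a pixel as the sum of its contributions to several classes can be illustrated with a minimal sketch. It rests on our own assumptions, not on the exact procedure of Häme et al. (1999b): each spectral cluster is modelled as a Gaussian with diagonal covariance in (red, NIR) space, all clusters have equal prior probabilities, and each cluster carries a mean forest share derived from ground data. The forest estimate for a pixel is then the posterior-weighted sum of the cluster forest shares. The class `SpectralCluster` and the helper functions are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SpectralCluster:  # hypothetical container, not from the original study
    mean: Sequence[float]  # mean (red, NIR) reflectance of the cluster
    var: Sequence[float]   # per-channel variance (diagonal covariance assumed)
    forest_share: float    # mean forest share (0..1) from ground data, e.g. CORINE


def log_likelihood(x: Sequence[float], c: SpectralCluster) -> float:
    """Log density of a diagonal-covariance Gaussian at pixel value x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, c.mean, c.var)
    )


def posteriors(x: Sequence[float], clusters: List[SpectralCluster]) -> List[float]:
    """Cluster membership probabilities; with equal priors they are proportional to the likelihoods."""
    logs = [log_likelihood(x, c) for c in clusters]
    top = max(logs)  # subtract the maximum to avoid underflow in exp()
    weights = [math.exp(lg - top) for lg in logs]
    total = sum(weights)
    return [w / total for w in weights]


def forest_estimate(x: Sequence[float], clusters: List[SpectralCluster]) -> float:
    """Soft forest share of one pixel: sum over clusters of P(cluster | x) * forest share."""
    return sum(p * c.forest_share for p, c in zip(posteriors(x, clusters), clusters))


clusters = [
    SpectralCluster(mean=(0.03, 0.30), var=(1e-4, 1e-4), forest_share=0.9),
    SpectralCluster(mean=(0.08, 0.20), var=(1e-4, 1e-4), forest_share=0.1),
]
print(round(forest_estimate((0.055, 0.25), clusters), 3))  # midway between the clusters: 0.5
```

    A pixel lying on a cluster mean receives (almost exactly) that cluster's forest share, while a spectrally ambiguous pixel receives a blend, which is what yields a continuous rather than a binary result.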

    The work not only contributes directly to the objectives of the FIRS Project, but also describes a technique for estimating forest area (and other forest characteristics) at a continental scale that is an alternative to the more common approach of using spectral vegetation indices derived from time series of NOAA-AVHRR data. Continental-scale estimation has direct relevance to global programmes such as the IGBP's (International Geosphere-Biosphere Programme) Data and Information System (IGBP-DIS) 1 km Land Cover Project (Belward and Loveland, 1995), and the CEOS (Committee on Earth Observation Satellites) Global Observations of Forest Cover Project, where satellite-derived information is paramount to the provision of global data sets.

    The work reported in this paper and in Häme et al. (1999b) was carried out under a contract from the Space Applications Institute of the Joint Research Centre, Ispra, Italy. The contract was launched in June 1998 and completed in March 1999. The lead contractor was VTT Automation, Finland, in association with StoraEnso Forest Development, Finland, and the University of Helsinki.

  • Data
    An AVHRR mosaic was compiled from forty-nine relatively cloud-free images acquired by NOAA-14, one of the NOAA series of satellites. The pre-processing and mosaicking procedures are described in the final report of the contract (Häme et al., 1999a). Forty-eight scenes were acquired between 30 June 1996 and 3 September 1996, and one scene in August 1997. This period of the year was chosen because it is the most stable in terms of the development phase of forest ecosystems. The mosaic, compiled from AVHRR Channel 1 (red) and Channel 2 (near infra-red), covers Europe from Portugal in the south-west to the Ural Mountains in the east, and from Nordkapp in northern Norway to the island of Crete in the Mediterranean Basin. This is the geographical study area of the FIRS Project.
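    The abstract states that the reflectance mosaic was computed as the weighted mean of the reflectance values of cloud-free pixels. The weighting scheme is not given in this paper, so the minimal sketch below simply takes the weights as an input (they could, for example, reflect viewing geometry, but that is an assumption). The function name and the example values are hypothetical.

```python
from typing import Optional, Sequence


def composite_pixel(
    reflectances: Sequence[float],
    cloud_free: Sequence[bool],
    weights: Sequence[float],
) -> Optional[float]:
    """Weighted mean of the cloud-free observations of one pixel across the input scenes.

    Hypothetical helper. Returns None when no cloud-free observation with a positive
    weight exists, i.e. the pixel remains a gap in the mosaic.
    """
    if not (len(reflectances) == len(cloud_free) == len(weights)):
        raise ValueError("inputs must have one entry per scene")
    num = 0.0
    den = 0.0
    for r, clear, w in zip(reflectances, cloud_free, weights):
        if clear and w > 0:  # cloudy or zero-weight observations are ignored
            num += w * r
            den += w
    return num / den if den > 0 else None


# Three overpasses of one pixel; the second is cloudy and is ignored.
print(composite_pixel([0.04, 0.30, 0.06], [True, False, True], [1.0, 1.0, 3.0]))
```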

    The other two data sets used in the image interpretation phase were the CORINE Land Cover database (EEA Task Force, 1992) and the European Forest Ecosystems stratification database produced for the FIRS Project (European Commission, 1995). The EEA CORINE Land Cover definition of forested land is ‘areas occupied by forests and woodlands with a vegetation pattern composed of native or exotic coniferous and / or deciduous trees, and which can be used for the production of timber or other forest products. The forest trees are, under normal conditions, higher than 5m with a canopy closure of 30% at least’. The vector CORINE Land Cover database was used to represent conditions on the ground. Data covering central and southern Europe were available for this study; they are the product of computer-assisted photo-interpretation of Landsat images into forty-four land cover classes (EEA Task Force, 1992).

    The regionalization and stratification of European forest ecosystems was initially undertaken for the FIRS Project by a consortium of forest experts led by two major partners, SCOT CONSEIL (France) and GAF mbH (Germany). This study was completed in January 1995 (European Commission, 1995). A second study, which used forest management regimes as the prime delineating factor, was completed in January 1996. The rationale behind stratifying Europe’s forest ecosystems was to respect the differences in climatic regimes, vegetation cover types, vegetation phenology, soils, cultures and social customs across the pan-European area. A two-tiered approach was taken: first, Europe was regionalized into major forest ecosystems using bio-climatological criteria; second, these were sub-divided into homogeneous forest strata using criteria directly related to management regimes and forest characteristics. A total of nineteen regions were delineated and validated by regional experts: six core regions, twelve transitional regions and one Orobiom region, which describes the mountainous areas. A total of 119 strata were delineated and characterized in detail in the second study. The minimum size of a single stratum was set at 25,000 km² (European Commission, 1995).

  • Methodology
    The procedures used to compile the mosaic, including all pre-processing steps (i.e., cloud masking, radiometric correction, geo-coding, mapping to a geographic co-ordinate system and mosaicking), are described in Häme et al. (1999a). The image clustering and classification routines used to select the ground samples are described in Häme et al. (1998, 1999b).

    In brief, an unsupervised clustering program (Häme et al., 1998) was run to select training samples automatically. Each sample was a block of 2 x 2 pixels representing a homogeneous ground target in the red / near infra-red mosaic. The number of clusters was pre-set to fifty, and homogeneity was defined by comparing the standard deviation of the reflectance values within each 2 x 2 pixel block with that of the entire image. Pixels representing water were excluded.
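    The homogeneity screen described above can be sketched as follows. This covers only the block selection, not the clustering program of Häme et al. (1998). We assume non-overlapping 2 x 2 blocks, population standard deviations, a land-only reference standard deviation for the whole image, and a single threshold `limit_fraction`; the actual limit values used in the study are not recoverable from Table 1, so that parameter and the function name are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]


def homogeneous_blocks(
    red: Grid,
    nir: Grid,
    water: Sequence[Sequence[bool]],
    limit_fraction: float,  # hypothetical threshold, e.g. 0.5 = half the image std dev
) -> List[Tuple[int, int]]:
    """Top-left (row, col) of each non-overlapping 2 x 2 block whose standard deviation
    is at most limit_fraction times the whole-image standard deviation in both the red
    and the near infra-red band. Blocks containing water pixels are skipped."""
    rows, cols = len(red), len(red[0])
    land = [(r, c) for r in range(rows) for c in range(cols) if not water[r][c]]
    bands = (red, nir)
    # Whole-image (land-only) standard deviation per band, used as the reference.
    image_sd = [statistics.pstdev([b[r][c] for r, c in land]) for b in bands]
    selected = []
    for r in range(0, rows - 1, 2):
        for c in range(0, cols - 1, 2):
            cells = [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
            if any(water[i][j] for i, j in cells):
                continue
            if all(
                statistics.pstdev([b[i][j] for i, j in cells]) <= limit_fraction * sd
                for b, sd in zip(bands, image_sd)
            ):
                selected.append((r, c))
    return selected


red = [[0.05, 0.05, 0.02, 0.20], [0.05, 0.05, 0.20, 0.02],
       [0.04, 0.04, 0.10, 0.10], [0.04, 0.04, 0.10, 0.10]]
nir = [[0.30, 0.30, 0.10, 0.50], [0.30, 0.30, 0.50, 0.10],
       [0.25, 0.25, 0.35, 0.35], [0.25, 0.25, 0.35, 0.35]]
water = [[False] * 4, [False] * 4, [False] * 4, [True, False, False, False]]
print(homogeneous_blocks(red, nir, water, limit_fraction=0.5))  # [(0, 0), (2, 2)]
```

    The top-right block is rejected as too heterogeneous, and the bottom-left block is skipped because it contains a water pixel.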

    A maximum likelihood classification with equal a priori probabilities was used to purify the 2 x 2 pixel blocks in each cluster, minimizing the influence of members whose spectral characteristics fell too far from the class mean. Twenty percent of the original samples were excluded in this way. The remaining samples in each cluster were considered to represent the ‘centre’ of the cluster. For each 2 x 2 pixel block in each cluster, the probability of forest was then computed as the area of forest within the block, as extracted from the forest classes of the CORINE Land Cover database. The final forest cover value assigned to each cluster was the mean of the values of its sample blocks.
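    The purification and labelling steps can be sketched as below. This is a simplified stand-in, not the exact rule of the study: each cluster is fitted with a diagonal Gaussian, every block is scored by its log-likelihood under its own cluster, and the lowest-scoring 20% of all blocks are dropped. Because each block is scored only against its own cluster, the equal-prior assumption plays no role in this simplified version. Each cluster's forest value is then the mean CORINE forest share of its remaining blocks. All names are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (cluster id, (red, NIR) block reflectance, CORINE forest share 0..1)
Sample = Tuple[int, Sequence[float], float]


def _log_density(x, mean, var):
    """Diagonal-covariance Gaussian log density."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def purify_and_label(samples: List[Sample], drop_fraction: float = 0.2) -> Dict[int, float]:
    """Hypothetical likelihood-trimming stand-in for the maximum likelihood purification."""
    # 1. Fit a diagonal Gaussian to the blocks of each cluster.
    by_cluster = defaultdict(list)
    for cid, x, _ in samples:
        by_cluster[cid].append(x)
    models = {}
    for cid, xs in by_cluster.items():
        channels = list(zip(*xs))
        mean = [statistics.fmean(ch) for ch in channels]
        var = [max(statistics.pvariance(ch), 1e-12) for ch in channels]  # guard against zero variance
        models[cid] = (mean, var)
    # 2. Drop the drop_fraction of all blocks that are least likely under their own cluster.
    ranked = sorted(samples, key=lambda s: _log_density(s[1], *models[s[0]]))
    kept = ranked[int(round(drop_fraction * len(samples))):]
    # 3. Cluster forest value = mean CORINE forest share of the remaining blocks.
    shares = defaultdict(list)
    for cid, _, share in kept:
        shares[cid].append(share)
    return {cid: statistics.fmean(v) for cid, v in shares.items()}


samples = [
    (0, (0.030, 0.300), 1.0), (0, (0.031, 0.301), 0.8), (0, (0.029, 0.299), 0.9),
    (0, (0.030, 0.300), 1.0), (0, (0.060, 0.200), 0.0),  # last one: spectral outlier
    (1, (0.080, 0.200), 0.1), (1, (0.081, 0.201), 0.2), (1, (0.079, 0.199), 0.0),
    (1, (0.080, 0.200), 0.1), (1, (0.030, 0.350), 1.0),  # last one: spectral outlier
]
print(purify_and_label(samples))
```

    In this toy example the two spectral outliers are the blocks removed, so they no longer distort the cluster forest values.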

  • Stratification of the target area
    Part of the evaluation was to compare the results obtained by treating the pan-European area as a single entity with those obtained by treating the Mediterranean and non-Mediterranean regions as independent areas. The five combinations of clustering and sampling shown in Figure 1 were tested. The crucial question was whether stratification before clustering was effective. The border between the Mediterranean and non-Mediterranean regions followed the boundary between core region IV (Mediterranean) and core region V (Warm Temperate) in the FIRS Project’s database (European Commission, 1995).
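    The difference between clustering the whole mosaic once and clustering each region independently (the pre-stratified options) can be sketched as follows. Plain k-means with a deterministic initialisation is used purely as a stand-in for the clustering program of Häme et al. (1998), which is not reproduced here; the function names and the stratum labels "med" and "tb" are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (red, NIR) reflectance of a 2 x 2 block


def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means, a stand-in for the study's (unspecified here) clustering program."""
    ordered = sorted(points, key=lambda p: p[1])
    # Deterministic initialisation: k points spread evenly through the NIR-sorted list.
    centres = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels


def stratified_clusters(
    points: Sequence[Point], strata: Sequence[str], k: int
) -> List[Optional[Tuple[str, int]]]:
    """Cluster each stratum independently (cf. combinations C2 : S2 and C3 : S3).
    Labels are (stratum, cluster) pairs, so clusters from different strata never merge."""
    result: List[Optional[Tuple[str, int]]] = [None] * len(points)
    for s in sorted(set(strata)):
        idx = [i for i, st in enumerate(strata) if st == s]
        for i, lab in zip(idx, kmeans([points[i] for i in idx], k)):
            result[i] = (s, lab)
    return result


points = [(0.05, 0.20), (0.051, 0.21), (0.10, 0.40), (0.11, 0.41),   # "med" blocks
          (0.02, 0.30), (0.021, 0.31), (0.06, 0.15), (0.061, 0.16)]  # "tb" blocks
strata = ["med"] * 4 + ["tb"] * 4
print(stratified_clusters(points, strata, k=2))
```

    In the study each region received fifty clusters; k=2 here only keeps the example small.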

  • Computation of forest probability using the CORINE database
    The input data for extracting forest area estimates from the CORINE database were the co-ordinates of the 2 x 2 pixel sample blocks and their cluster identifiers. The sampling unit applied to the CORINE database was 1 km x 1 km. This size was chosen so that the unit would fit within the 2 x 2 pixel block even allowing for likely geo-rectification errors. The 1 km² squares were used to extract data from the CORINE Land Cover polygons. Three CORINE classes were used to represent forest cover: Broadleaved forest (Code 3.1.1), Coniferous forest (Code 3.1.2) and Mixed forest (Code 3.1.3). Figure 2 illustrates this sampling strategy, which was repeated five times, once for each of the five approaches tested (§ 3.1, Stratification of the target area).
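    The extraction of a forest share for one 1 km x 1 km square can be sketched as follows. The study intersected the squares with the vector CORINE polygons; purely for illustration, we assume instead that the polygons have been rasterised to square cells, `cells_per_km` to a side, so that the forest share is the fraction of cells carrying one of the three forest codes. The rasterisation, the grid and the function name are hypothetical.

```python
from typing import Sequence

# CORINE Land Cover codes treated as forest in this study.
FOREST_CODES = {"3.1.1", "3.1.2", "3.1.3"}  # broadleaved, coniferous, mixed forest


def forest_share(codes: Sequence[Sequence[str]], row0: int, col0: int, cells_per_km: int) -> float:
    """Fraction of a 1 km x 1 km window (top-left cell row0, col0) covered by forest codes.

    Hypothetical helper: assumes CORINE has been rasterised to square cells,
    cells_per_km to a side; the original study worked with vector polygons.
    """
    rows = codes[row0:row0 + cells_per_km]
    if len(rows) != cells_per_km or any(len(r) < col0 + cells_per_km for r in rows):
        raise ValueError("window extends beyond the grid")
    window = [code for r in rows for code in r[col0:col0 + cells_per_km]]
    return sum(code in FOREST_CODES for code in window) / len(window)


codes = [
    ["2.1.1", "2.1.1", "2.1.1"],
    ["2.1.1", "3.1.1", "3.1.2"],
    ["2.1.1", "3.1.3", "5.1.2"],
]
print(forest_share(codes, 1, 1, cells_per_km=2))  # 0.75
```

    A finer raster (larger `cells_per_km`) approximates the polygon intersection more closely.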

  • Results
    It was concluded that the best result could be obtained by combining the results of options four and five in Figure 1. In other words, treating the Mediterranean and non-Mediterranean regions independently, i.e., carrying out the clustering procedure and the CORINE sampling separately for each region rather than for the mosaic as a whole, produced a more reliable database. The justification for this choice is given below.




  • Option 1 (C1 : S1): clustering and sampling from the entire mosaic
  • Option 2 (C1 : S2): clustering from the entire mosaic and sampling from the Mediterranean region
  • Option 3 (C1 : S3): clustering from the entire mosaic and sampling from the temperate and boreal region
  • Option 4 (C2 : S2): clustering and sampling from the Mediterranean region
  • Option 5 (C3 : S3): clustering and sampling from the temperate and boreal region

    Figure 1: Clustering and sampling combinations tested to find the best approach for the final forest database

  • Cluster and sample properties
    Table 1 shows the number of observations (2 x 2 pixel blocks) belonging to the fifty clusters, and the corresponding number of samples extracted from the CORINE database, for the three clustering classifications and five sampling combinations. The relatively low number of samples extracted from the CORINE database (last column) reflects the fact that the database does not cover the entire pan-European area. Table 1 also shows that the 2 x 2 pixel blocks cover almost 10% of the Mediterranean region, whereas the blocks in the fifty clusters cover less than 3% of the much larger non-Mediterranean region. This suggests that, to keep the sampled proportion of each region of the same order of magnitude, more than fifty clusters may be required for the larger temperate and boreal region, or that this region should be further divided into separate temperate and boreal regions. In addition, ground data (missing from the CORINE coverage) are required for this region.

    Figure 2: Sampling strategy for extraction of forest area for each unit of 1 km x 1 km.

    Polygons represent the CORINE Land Cover classes; light area within the squares = forest, dark area = non-forest.

    Table 1. Summary statistics for the three clustering classifications. Numbers in parentheses show the sample size in the post-clustering stratification.

    [Table values not recoverable. Surviving column headings: clustering region; clustering characteristics (obs. = 2 x 2 pixel block): no. of clusters, std. dev. limit, no. of obs. in clusters, % of whole image, no. of obs. in largest cluster; size of CORINE sample (1 km x 1 km). Surviving row label: Temperate & Boreal.]

    Because a large proportion of the boreal region is forested but not covered by the CORINE database, many of the 2 x 2 pixel blocks originally falling in this region could not be sampled. This is clearly illustrated in Figure 3. For the Mediterranean region with the pre-stratified approach, the proportion of original 2 x 2 pixel blocks and the proportion of 1 km x 1 km squares sampled from the CORINE database correspond well for each of the 50 clusters (Figure 3A). For the non-Mediterranean region, there is a large disparity between the two curves, with CORINE samples concentrated in the higher-numbered clusters, which represent areas of low to moderate forest cover (Figure 3B). Table 1 shows that 72% of the original pixel blocks were sampled in the CORINE database for the Mediterranean region, compared with only 13% for the temperate and boreal region. It was for this reason that the pre-stratification approach, i.e., combinations 4 and 5 in Figure 1, was selected for the final production of the probability database.
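    The two curves of Figure 3 and the overall sampled share quoted above (72% and 13%) can be computed as sketched below. We read the caption as follows, which is our interpretation: the solid line is the area of a cluster's 2 x 2 blocks (four pixels each) as a percentage of the number of mosaic pixels assigned to that cluster, and the dashed line is the cluster's share of all CORINE samples. The function and argument names are hypothetical, and the example numbers are not study data.

```python
from collections import Counter
from typing import Dict, Mapping, Sequence, Tuple


def figure3_curves(
    block_clusters: Sequence[int],
    sampled: Sequence[bool],
    mosaic_pixels: Mapping[int, int],
) -> Tuple[Dict[int, float], Dict[int, float], float]:
    """Hypothetical helper returning (solid curve, dashed curve, % of blocks sampled).

    block_clusters: cluster id of each 2 x 2 observation block
    sampled: whether that block received a CORINE sample (fell inside CORINE coverage)
    mosaic_pixels: number of mosaic pixels assigned to each cluster
    """
    blocks = Counter(block_clusters)
    samples = Counter(c for c, s in zip(block_clusters, sampled) if s)
    n_samples = sum(samples.values())
    # Solid line: area of the blocks (4 pixels each) as % of the cluster's area in the mosaic.
    solid = {k: 100.0 * 4 * blocks[k] / n for k, n in mosaic_pixels.items()}
    # Dashed line: the cluster's share of all CORINE samples, in %.
    dashed = {k: (100.0 * samples[k] / n_samples if n_samples else 0.0) for k in mosaic_pixels}
    # Overall share of blocks that could be sampled in CORINE.
    sampled_pct = 100.0 * n_samples / len(block_clusters)
    return solid, dashed, sampled_pct


print(figure3_curves([1, 1, 1, 2], [True, False, True, True], {1: 40, 2: 20}))
```

    Where CORINE coverage is complete the two curves track each other; gaps in coverage that fall mainly on particular clusters pull them apart, as in Figure 3B.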

    The mean and standard deviation of the percentage forest cover computed for the first fifteen clusters in the Mediterranean and non-Mediterranean regions are given in Table 2. The standard deviations are high for every cluster in both regions. In the Mediterranean region they all exceed 30 percentage points; in the temperate and boreal region they range from about 15 to 38 percentage points and are lowest for the clusters with the highest mean forest cover.

    Figure 3: Proportion of area covered by the 2 x 2 pixel observation blocks in each cluster with respect to the area of that cluster in the entire mosaic (solid line), and proportion of samples taken from the CORINE database as a percentage of the total number of samples (dashed line). A: Mediterranean region; B: temperate and boreal region.

    Table 2: Number of observations (1 km x 1 km squares) used to calculate the percentage forest cover in each cluster, mean percentage forest cover and standard deviation

             Mediterranean region        Temperate and boreal region
    Cluster  No. obs.  Mean %  St. dev.  No. obs.  Mean %  St. dev.
    1        1585      44.64   36.21     209       89.14   22.09
    2        1372      51.38   37.27     265       87.06   22.63
    3        1499      54.68   35.18     116       93.20   14.63
    4        584       68.01   34.79     384       84.54   26.29
    5        938       65.62   32.14     512       83.58   28.62
    6        2196      43.75   37.32     1113      80.20   29.29
    7        262       53.09   38.52     199       83.17   27.17
    8        2432      49.20   34.54     421       90.70   19.75
    9        3048      38.63   35.47     225       75.31   33.34
    10       2418      42.39   36.46     1384      66.04   33.49
    11       1280      39.06   33.80     176       73.49   33.88
    12       2872      37.21   32.79     247       61.01   35.95
    13       1418      42.62   36.55     561       60.89   36.40
    14       2570      31.66   30.89     264       80.58   29.59
    15       3271      31.19   34.69     185       56.60   37.71

    The relatively high standard deviations show that within each spectral cluster there are large variations in the forest cover estimates extracted from the CORINE database. This, in turn, suggests that the same percentage forest cover can manifest itself as different spectral responses at the pixel level, depending on factors such as the spatial distribution of the forest, the fragmentation of the cover and its juxtaposition with other land cover types. Conversely, different spectral signals may result in similar or identical estimates of percentage forest cover. Moreover, the CORINE definition of forested land (Section 2) specifies values for tree height and canopy closure that cannot be assessed (or assessed with any degree of confidence) from a visual interpretation of Landsat imagery. It is therefore hardly surprising that the percentage forest cover estimated for the 1 km x 1 km samples within each cluster varies greatly. This is a typical result of applying a ‘classical’ definition of forest (i.e., one designed for field surveys, in which canopy closure and tree height are precisely specified) to the interpretation of spectral data derived from satellite imagery. As a point of interest, the 30% canopy closure threshold in the CORINE definition of forest would lead to the ‘loss’ of approximately two thirds of the ‘forested’ land in Finland. However, because Finland was not among the countries covered by the original CORINE Land Cover exercise, the country has not yet been mapped using the CORINE methodology or nomenclature.

  • Regional forest maps and statistics
    The final forest probability map is shown in Figure 4. It contains obvious errors in Iceland, and forest area is likely to be over-estimated in general at higher latitudes. This most probably reflects the absence of CORINE data for higher latitudes, as a result of which the temperate and boreal region was under-sampled.

    A rough comparison was made with the NUTS (Nomenclature of Territorial Units for Statistics) level-2 land cover statistics collected by EUROSTAT (European Communities, 1998). The relationship between the forest area within each NUTS-2 administrative region derived from the probability database and that recorded in the EUROSTAT archives for 1995 is shown in Figures 5 and 6 for France and Italy, respectively.
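    Turning the per-pixel forest estimates into a regional figure amounts to summing, over the pixels of each region, the pixel's forest share multiplied by the pixel area. The minimal sketch below assumes each pixel has already been assigned to a NUTS-2 region; the pixel area is left as an input because the mosaic's pixel size is not stated in this section. The function name and the region codes "R1" and "R2" are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def regional_forest_area_ha(
    forest_share: Sequence[Optional[float]],
    region: Sequence[Optional[str]],
    pixel_area_km2: float,
) -> Dict[str, float]:
    """Hypothetical helper: sum of (forest share x pixel area) per region, in hectares.

    Pixels with no estimate (e.g. cloud gaps) or outside any region are skipped.
    """
    totals: Dict[str, float] = defaultdict(float)
    for share, reg in zip(forest_share, region):
        if share is None or reg is None:
            continue
        totals[reg] += share * pixel_area_km2 * 100.0  # 1 km² = 100 ha
    return dict(totals)


print(regional_forest_area_ha([0.5, 1.0, None, 0.25, 0.8],
                              ["R1", "R1", "R1", "R2", None], pixel_area_km2=1.0))
# {'R1': 150.0, 'R2': 25.0}
```

    Skipped cloud-gap pixels lower the regional total, which is one reason the comparison with EUROSTAT is described as rough.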

    Despite the one-year difference in reporting period, some missing data (due to clouds) in the satellite-derived database, and the differences in the definition of forested land between the two databases (in the EUROSTAT assessment, forested land has more than 20% tree crown cover), the correlation coefficients were 0.85 for France and 0.89 for Italy. It should also be noted that the wooded land depicted by the spectral response of an AVHRR pixel does not strictly correspond to the forest area estimated using the CORINE database, as the high standard deviations of the forest cover estimates for each cluster show (Table 2). Nevertheless, the relationship suggests that such an approach could be adopted to provide information on the spatial distribution of forest cover within NUTS-2 level units, and thus help to establish the much-needed link between forest statistics and geo-referenced forest databases.
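    The paper does not state which correlation coefficient was used; assuming the usual Pearson product-moment coefficient, the comparison pairs, for each NUTS-2 region, the forest area from the probability database with the EUROSTAT figure. The minimal sketch below computes that coefficient; the function name is hypothetical and the example numbers are illustrative, not the French or Italian data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired regional forest areas (assumed coefficient)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two pairs")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only: forest area per region from the map vs. from statistics.
print(round(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]), 3))  # 0.8
```

    A high coefficient shows that the two sources rank and scale regions consistently; it does not by itself show that the absolute areas agree.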

  • Benefits and drawbacks of the probability approach
    Despite the rather high variation of the forest cover estimates obtained for each spectral cluster from the CORINE database, the approach appears more appropriate for forest cover estimation than traditional binary classification methods, in which a single pixel is assigned to a single spectral class and labelled accordingly (e.g., Häusler et al., 1993; Roy, 1997).

    One of the most appealing aspects of this approach, but at the same time perhaps the main factor limiting its successful application, is that the method could be applied using higher spatial resolution satellite data together with ground data (or a data set representing ground conditions) describing another characteristic of the forest. The limitation is that the accuracy of the output depends on the quality and reliability of the so-called ground data. This is inherent to all methodologies (including the use of spectral vegetation indices and image classification) that depend on formulating an empirical relationship between the remotely sensed signal and the scene components. The ground data substitute used in this exercise, the CORINE Land Cover database, was less than ideal. Firstly, the database covered only part of the target area and did not represent conditions in the boreal region to the north or in eastern Europe well. Secondly, its definition of forest may not be the most appropriate when applied to the 1 km x 1 km pixels of the AVHRR. Furthermore, the reliability of the CORINE database is not known and could vary from country to country.

    Figure 5: FRANCE: Comparison of forest area statistics derived from the forest probability database and EUROSTAT’S statistical database (NUTS-Level-II) for 1995

    The sampling procedure used to derive the most homogeneous pixel clusters may have introduced some bias. Because the CORINE database was, by design, sampled only in spectrally homogeneous areas, areas of low percentage forest cover (i.e., those with fragmented, sparse cover) may have been under-estimated, and those with higher forest cover over-estimated. This may be why forest areas in the extreme north appear to have been over-estimated.

    The high standard deviations of the forest area estimates computed for each cluster could perhaps be reduced by increasing the number of strata and samples and / or by using a smaller sampling unit to extract cover estimates from the CORINE database. A smaller unit would reduce the errors arising from image geo-rectification, which may have caused the area actually sampled from the CORINE database to fall outside the original 2 x 2 pixel block. Increasing the number of regions (i.e., beyond the Mediterranean and non-Mediterranean regions) would, however, increase computation and processing time.


    Figure 6: ITALY: Comparison of forest area statistics derived from the forest probability database and EUROSTAT’S statistical database (NUTS-Level-II) for 1995

    Unlike conventional image classification, this approach allows a pixel to be composed of several scene components (classes) and allows the proportion of each component to be estimated. In addition, the final estimates of percentage forest cover are presented as a continuous variable, rather than as a limited number of pre-defined classes, which is the typical result of automatic classification.

    The method presented here thus offers a flexible approach to the interpretation of AVHRR data, allowing information to be extracted at the sub-pixel level. Clearly, the applicability of the method and the accuracy of the derived information depend strongly on the spectral and spatial resolution of the satellite data and on the quality and reliability of the ground data. The method (Häme et al., 1999b) is reproducible with other satellite data and could be applied, for example, to the newly available imagery from the SPOT-VEGETATION instrument, whose geometric and radiometric qualities are reported to be substantially better than those of the AVHRR. Furthermore, the mid-infra-red channel of SPOT-VEGETATION could improve forest type discrimination, not least because it is less susceptible to atmospheric conditions.


    Belward, A. and Loveland, T.R. 1995. The IGBP 1 km Land Cover Project. Proceedings of the 21st Annual Conference of the Remote Sensing Society, Southampton, UK, pp. 1099-1106.

    EEA Task Force. 1992. CORINE Land Cover. Brochure prepared as contribution to European Conference of the International Space Year. Munich. 30 Mar. - 4 Apr., 1992. 22p.

    European Commission. 1995. Regionalization and Stratification of European Forest Ecosystems. Internal Special Publication of the Joint Research Centre of the European Commission. S.P.I.95.44.

    European Communities. 1998. Forestry Statistics 1992-1996. Luxembourg: Office for Official Publications of the European Communities. ISBN 92-828-3684-3.

    Häme, T., Heiler, I. and San-Miguel Ayanz, J. 1998. An unsupervised change detection and recognition system for forestry. International Journal of Remote Sensing, 19(6): 1079-1099.

    Häme, T., Andersson, K., Stenberg, P., Sarkeala, J., Rauste, Y., Väätäinen, S. and Lohi, A. 1999a. AVHRR-based forest probability map of the pan-European area. Joint Research Centre/Space Applications Institute, Contract No. 13911-1998-04 F1ED ISP FI.

    Häme, T., Stenberg, P. and Rauste, Y. 1999b. A new method to estimate forest values at a sub-pixel level. Proceedings of the IUFRO Conference on Remote Sensing and Forest Monitoring, Rogow, Poland, 1-3 June 1999.

    Häusler, T., Saradeth, S. and Amitai, Y. 1993. NOAA-AVHRR forest map of Europe. Proceedings of the International Symposium Operationalization of Remote Sensing, 19-23 April, ITC Enschede, The Netherlands, pp. 37-48.

    Kennedy, P. and Folving, S. 1994. FIRS Project Description - 1994/1999. S.P.I.94.54.

    Kuusela, K. 1994. Forest Resources in Europe 1950-1990. European Forest Institute Research Report 1. Cambridge University Press.

    Roy, D.P. 1997. Investigation of the maximum normalised difference vegetation index (NDVI) and the maximum surface temperature (Ts) AVHRR compositing procedures for the extraction of NDVI and Ts over forest. International Journal of Remote Sensing, 18: 2383-2401.

    Roy, D., Kennedy, P.J. and Folving, S. 1997. Combination of the Normalized Difference Vegetation Index and surface temperature for regional scale European Forest cover mapping using AVHRR data. International Journal of Remote Sensing, 18: 1189-1195.


    You can ask about the hardcopy version: Tomasz Zawiła Niedźwiedzki.
    Last modification of this page: 03.01.2010.