Machine learning benchmarking for land cover map production

Land cover map validation is a complex task. If you read French, you can check this post by Vincent Thierion, which shows how the 2016 LC map of France produced by CESBIO compares with data sources independent of those used for its production. But this is only one aspect of the validation. A land cover map is a map, and therefore there are other issues than checking whether individual points belong to the correct class. By the way, being sure that the correct class is known is not so easy either.


In this epoch of machine learning hype [1], it is easy to fall into the trap of thinking that optimising a single metric accounts for all issues in map validation. Typical approaches used in machine learning contests are far from enough for this complex task. Let's have a look at how we proceed at CESBIO when we assess the quality of a LC map produced by classification.


Supervised classification, training, testing, etc.

The iota2 processing chain is highly configurable in terms of how the images are pre-processed, how the reference data is prepared, how the classifiers are parameterised, etc. We also continuously add new approaches. During this development work, we need to assess whether a change to the workflow performs better than the previous approaches. In order to do this, we use standard metrics for classification derived from the confusion matrix (Overall Accuracy, κ coefficient, F-Score). The confusion matrix is of course computed using samples which are not used for the training, but we go a little further than that by splitting the train and test sets at the polygon level. Indeed, our reference data is made of polygons which correspond to agricultural plots, forests, urban settlements, etc. Since images have a strong local correlation, pixels belonging to the same polygon have a high likelihood of being very similar. Therefore, allowing a polygon to provide pixels for both the training and test sets yields optimistic performance estimates.
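The polygon-level split described above can be sketched in a few lines. This is only an illustration and not the iota2 implementation: the function name `split_by_polygon`, the polygon identifiers and the test fraction are all made up for the example.

```python
import random
from collections import defaultdict


def split_by_polygon(polygon_ids, test_fraction=0.3, seed=0):
    """Split sample indices so that each polygon falls entirely in one set."""
    # Group the pixel indices by the polygon they were drawn from.
    by_polygon = defaultdict(list)
    for index, polygon in enumerate(polygon_ids):
        by_polygon[polygon].append(index)

    # Shuffle the polygons (not the pixels) and cut the list in two.
    polygons = sorted(by_polygon)
    random.Random(seed).shuffle(polygons)
    n_test = max(1, round(test_fraction * len(polygons)))

    test = [i for p in polygons[:n_test] for i in by_polygon[p]]
    train = [i for p in polygons[n_test:] for i in by_polygon[p]]
    return train, test


# Hypothetical example: 10 pixels drawn from 5 reference polygons.
polygon_ids = ["p1", "p1", "p2", "p2", "p2", "p3", "p4", "p4", "p5", "p5"]
train_idx, test_idx = split_by_polygon(polygon_ids, test_fraction=0.4)

train_polygons = {polygon_ids[i] for i in train_idx}
test_polygons = {polygon_ids[i] for i in test_idx}
print("polygons shared by both sets:", train_polygons & test_polygons)
```

Because the shuffle is applied to polygons rather than to pixels, two neighbouring pixels of the same plot can never end up on opposite sides of the split.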


Most of our tests are performed over very large areas (at least 25% of metropolitan France, often more than that), which means that, using reference data from Corine Land Cover, we have many more samples than we can deal with. Even in this situation, we perform several runs of training and testing by drawing different polygons for each run, which allows us to estimate confidence intervals for all our metrics and therefore assess the significance of the differences in performance between different parameter settings.
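As a rough sketch of this procedure, the snippet below computes Overall Accuracy and κ from a confusion matrix, then summarises several runs with a mean and an interval. The accuracy values are invented for the example, and the interval uses a normal approximation (z = 1.96); with only a handful of runs a Student t quantile would be more appropriate.

```python
import statistics


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def kappa(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(col) for col in zip(*cm)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total**2
    return (observed - expected) / (1 - expected)


def mean_and_half_width(values, z=1.96):
    """Mean of the runs and half width of an approximate 95% interval."""
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / len(values) ** 0.5
    return mean, half_width


# Hypothetical two-class confusion matrix (rows: reference, columns: map).
cm = [[45, 5], [5, 45]]
print("OA:", overall_accuracy(cm), "kappa:", kappa(cm))

# Hypothetical OA values from 5 runs of two parameter settings,
# each run drawing different polygons for training and testing.
runs_a = [0.861, 0.858, 0.864, 0.860, 0.862]
runs_b = [0.871, 0.874, 0.869, 0.872, 0.873]
mean_a, hw_a = mean_and_half_width(runs_a)
mean_b, hw_b = mean_and_half_width(runs_b)

# If the intervals do not overlap, the difference is unlikely to be noise.
print("intervals overlap:", mean_a + hw_a >= mean_b - hw_b)
```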


All this is well and good, but it is not enough for assessing the quality of the results of a particular algorithm.


Beyond point-wise validation

The data we feed to the classifier are images, and they are pre-processed so that application-agnostic machine learning approaches can deal with them. In iota2, we perform eco-climatic stratification, which can introduce artifacts around strata boundaries. We also perform temporal gapfilling followed by a temporal resampling of all data, so that all the pixels have the same number of features regardless of the number of available clear acquisitions. After that, we sometimes compute contextual features which take into account the neighbourhood of the pixels; in Convolutional Neural Networks, a patch size is defined, etc.
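To make the gapfilling and resampling step concrete, here is a minimal sketch for a single pixel. It assumes linear interpolation between clear dates, which the text above does not specify; the function name, the dates (in days) and the values are hypothetical.

```python
def gapfill_and_resample(dates, values, clear, target_dates):
    """Linearly interpolate the clear observations onto target_dates.

    dates must be sorted; clear[i] is False for cloudy acquisitions.
    """
    obs = [(d, v) for d, v, ok in zip(dates, values, clear) if ok]
    if not obs:
        raise ValueError("no clear acquisition for this pixel")
    result = []
    for t in target_dates:
        before = [(d, v) for d, v in obs if d <= t]
        after = [(d, v) for d, v in obs if d >= t]
        if not before:        # earlier than the first clear date
            result.append(after[0][1])
        elif not after:       # later than the last clear date
            result.append(before[-1][1])
        else:
            (d0, v0), (d1, v1) = before[-1], after[0]
            if d1 == d0:      # a clear acquisition exactly on the target date
                result.append(v0)
            else:
                result.append(v0 + (v1 - v0) * (t - d0) / (d1 - d0))
    return result


target_dates = [0, 15, 30]

# Pixel A: 4 acquisitions, the second one is cloudy and is ignored.
pixel_a = gapfill_and_resample(
    [0, 10, 20, 30], [0.2, 0.9, 0.4, 0.6], [True, False, True, True], target_dates
)
# Pixel B: only 2 acquisitions, on different dates.
pixel_b = gapfill_and_resample([5, 25], [0.3, 0.5], [True, True], target_dates)

# Both pixels now have the same number of features.
print(len(pixel_a), len(pixel_b))
```

Whatever the interpolation actually used, the point is the same: after this step every pixel is described on a common temporal grid, which is what a generic classifier expects.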


All these pre-processing steps have an influence on the final result, but most of the time, their effect can't be observed on the global statistics computed from the confusion matrix. For instance, contextual features may produce a smeared out image, but since most of the validation pixels are inside polygons and not on their edges, the affected pixels will not be used for the validation. In our case, the reference data polygons are eroded in order to compensate for possible misregistrations between the reference data and the images. Therefore, we have no pixels on the boundaries of the objects.
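The effect of eroding the reference polygons can be illustrated on a toy raster. The sketch below uses a one-pixel erosion with a 4-connected neighbourhood on a made-up 6x6 mask; the actual erosion applied to our polygons may differ.

```python
def erode(mask):
    """One-pixel binary erosion with a 4-connected neighbourhood."""
    rows, cols = len(mask), len(mask[0])

    def inside(r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return [
        [
            1
            if mask[r][c] == 1
            and all(inside(r + dr, c + dc) for dr, dc in neighbours)
            else 0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical rasterised polygon: a 4x4 square inside a 6x6 image.
mask = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)] for r in range(6)]
eroded = erode(mask)

n_before = sum(map(sum, mask))
n_after = sum(map(sum, eroded))
print("validation pixels before erosion:", n_before)
print("validation pixels after erosion:", n_after)
```

Every pixel that touches the polygon boundary disappears from the validation set, so a classifier that blurs object edges is never penalised for it.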


In our paper describing the iota2 methodology, we presented some analysis of the spatial artifacts caused by image tiling and stratification, but we lack a metric for that. The same happens when using contextual features or CNNs. The global point-wise metrics increase when the size of the neighbourhoods increases, but the maps produced are not acceptable from the user's point of view. The 2 images below (produced by D. Derksen, a CESBIO PhD candidate) illustrate this kind of issue. The image on the right has higher values for the classical point-wise metrics (OA, κ, etc.), but the lack of spatial accuracy is unacceptable for most users.


Even if we had an exhaustive reference data set (labels for all the pixels), the pixels affected by the over-smoothing are a small percentage of the whole image and would weigh very little in the global metrics. We are working on the development of quantitative tools to measure these effects, but we don't have a satisfactory solution yet.
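A back-of-the-envelope calculation shows why. With an exhaustive reference, the overall accuracy is a pixel-weighted mean of the accuracy inside the objects and the accuracy near their boundaries. All the numbers below are invented for the example.

```python
def overall_accuracy(boundary_fraction, interior_accuracy, boundary_accuracy):
    """Overall accuracy as a pixel-weighted mean of interior and boundary accuracy."""
    return (
        (1 - boundary_fraction) * interior_accuracy
        + boundary_fraction * boundary_accuracy
    )


# Hypothetical image in which 3% of the pixels lie close to an object boundary.
boundary_fraction = 0.03

# Small neighbourhood: slightly noisier interiors, sharp boundaries.
oa_small = overall_accuracy(boundary_fraction, 0.90, 0.85)
# Large neighbourhood: cleaner interiors, heavily smoothed boundaries.
oa_large = overall_accuracy(boundary_fraction, 0.93, 0.40)

print(f"small neighbourhood OA: {oa_small:.4f}")
print(f"large neighbourhood OA: {oa_large:.4f}")
```

In this made-up case the over-smoothed map gets more than half of its boundary pixels wrong and still wins on overall accuracy, because the boundaries carry only 3% of the weight.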


How good is your reference data?

Everything said above ignores the quality of the reference data. At CESBIO, we have learned many things over the years about the different kinds of impact of the quality of reference data, both in the classifier training and in the map validation step. We have people here who collect data in the field every year on hundreds of agricultural plots. We also have a bit of experience using off-the-shelf reference data. The quality of the results is much better when we use the data collected by our colleagues, and we have a rather good understanding of what happens during training and validation. Ch. Pelletier recently defended her PhD and most of her work dealt with this issue. For instance, she analysed the impact of mislabelled reference data on the classifier training and showed that Random Forests are much more robust than SVM. She also developed techniques for detecting errors in the reference.

We also use simple ways to clean the reference data. For instance, when using Corine Land Cover polygons, which have a minimum mapping unit (MMU) of 25 hectares, we use information coming from other databases, as described from slide 34 onwards in this presentation. An illustration of the results is shown below.

There can be many reasons for label noise in the reference data, but the 2 main ones we face are the MMU and the changes that have occurred since the collection of the reference data.


For our 2016 map, we used Corine Land Cover 2012, and therefore, we may assume that more than 5% of the samples are wrong because of the changes. Therefore, when validating with this data, if for some classes we have accuracies higher than 95%, we must be doing something wrong. If we add the MMU issue to that, for the classes for which we don't perform the cleansing procedure illustrated above, accuracies higher than 90% should trigger an alarm.


Our ML friends like to play with data sets to improve their algorithms. Making domain-specific data available is a very good idea, since ML folks get something to compete on (this is why they work for free for Kaggle!) and they provide us with state-of-the-art approaches to choose from. This is the idea of D. Ienco and R. Gaetano with the TiSeLaC contest: they used iota2 to produce gapfilled Landsat image time series and reference data like the ones we use at CESBIO to produce our maps (a mix of Corine Land Cover and the French Land Parcel Information System, RPG) and provided something that the ML community can easily use: CSV files with labelled pixels for training and validation.


The test site is the Reunion Island, which is more difficult to deal with than metropolitan France mainly due to the cloud cover. Even with the impressive (ahem …) temporal gapfilling from CESBIO that they used, the task is difficult. Add to that the quality of the reference data set which is based on CLC 2012 for a 2014 image time series, and the result is a daunting task.


Even with all these difficulties, several teams achieved F-scores higher than 94% and 2 of them were above 99%. It seems that Deep Learning can generalise better than other approaches, and I guess that the winners use this kind of technique, so I will assume that these algorithms achieve perfect learning and generalisation. In this case, the map they produce is perfect. The issue is that the data used for validation is not perfect, which means that an algorithm which achieves nearly 100% accuracy not only has the same amount of error as the validation data, but also makes its errors on exactly the same samples!
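The argument can be checked with a toy simulation. The labels below are synthetic, and the 5% noise rate is only an assumption in line with the change rate mentioned earlier for Corine Land Cover 2012.

```python
def accuracy(predicted, labels):
    """Fraction of samples for which the prediction matches the labels."""
    return sum(p == l for p, l in zip(predicted, labels)) / len(labels)


n = 1000
# Hypothetical true classes (4 classes, unknown in practice).
truth = [i % 4 for i in range(n)]
# Reference data: every 20th label is wrong, i.e. 5% label noise.
reference = [(c + 1) % 4 if i % 20 == 0 else c for i, c in enumerate(truth)]

perfect_map = truth          # a classifier that is always right
memorising_map = reference   # a classifier that reproduces the reference errors

print("perfect map vs reference:   ", accuracy(perfect_map, reference))
print("memorising map vs reference:", accuracy(memorising_map, reference))
print("memorising map vs truth:    ", accuracy(memorising_map, truth))
```

A perfect map cannot score above 95% against this reference. The only way to reach 100% is to be wrong on exactly the samples where the reference is wrong.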


I don't have the full details on how the data was generated and, from the contest web site, I can't know how the different algorithms work [2], but I can speculate on how an algorithm can achieve 99% accuracy in this case. One reason is over-fitting [3], of course. If the validation and training sets are too similar, the validation does not measure generalisation capabilities, but rather gives the same results as the training set. Several years ago, when we were still working on small areas, we had this kind of behaviour due to a correlation between the spatial distribution of the samples and local cloud patterns: although training and test pixels came from different polygons, for some classes they were close to each other and were cloudy on the same dates, and the classifier was learning the gapfilling artifacts rather than the class behaviour. We made this mistake because we were not looking at the maps, but only optimising the accuracy metrics. Once we looked at the classified images, we understood the issue.



In this era of kaggleification of data analysis, we must be careful and make sure that the metrics we optimise are not too simplistic. It is not an easy task, and for some of the problems we address, we don't have the perfect reference data. In other situations, we don't even have the metrics to measure the quality.

The solutions we use to solve mapping problems need an additional validation beyond the standard machine learning metrics.



[2] Please correct me if my assumptions are wrong!


[3] Deep neural networks are able to fit random labels by memorising the complete data set.

First validations of the OSO land cover map

In 2017, the OSO (Occupation du SOl, i.e. land cover) Scientific Expertise Centre (CES), through CESBIO, produced a land cover map for the year 2016 covering metropolitan France, including Corsica. We call it the OSO land cover map! This map is the result of massive automatic processing of time series of Sentinel-2 optical satellite images. Like the Sentinel-2 images, the map has a spatial resolution of 10 m, which corresponds to a minimum mapping unit (MMU) of 0.01 ha. Land cover is described with 8 classes at the first level and 17 classes at the second level of detail, defined according to what Sentinel-2 imagery is able to detect and to the needs expressed by end users. These classes cover the main land cover themes (artificial, agricultural and semi-natural surfaces).

Its main advantage over other existing land cover maps (far be it from us to criticise them) is its exhaustive coverage of the territory and, above all, its freshness! A land cover map covering the whole national territory, available in the first quarter of the following year: that is what OSO offers!

What thematic detail?

The classes detected by remote sensing are those of the second level; the first-level classes are obtained by aggregating the second-level classes:

  • Annual crops
    • Winter crops
    • Summer crops
  • Perennial crops
    • Grassland
    • Orchards
    • Vineyards
  • Forest
    • Broad-leaved forest
    • Coniferous forest
  • Low natural vegetation
    • Natural grassland
    • Woody moorland
  • Urban
    • Dense urban
    • Diffuse urban
    • Industrial and commercial areas
    • Road surfaces / asphalt
  • Mineral surfaces
    • Mineral surfaces
    • Beaches and dunes
  • Water
    • Water
  • Glaciers and permanent snow
    • Glaciers and permanent snow

With what quality?

Validating a land cover map is not a simple procedure. It requires thinking about:

  • the specification of the classes
  • the validation scale
  • the validation data set

In any case, an exhaustive validation over a whole territory is rarely possible. Typically, a statistical validation gives only a partial picture of the accuracy of the map, and does not reveal all the thematic confusions and geometric classification errors.

The rest of this article attempts to characterise the accuracy of the 2016 OSO land cover map using data sets from partners of the OSO CES. A first validation, intrinsic to the classification process, has been carried out. The statistical results can be seen here.

The land cover sample data set was produced from national databases such as BD Topo, the Registre Parcellaire Graphique (RPG) and Corine Land Cover. 70% of these samples were used for training and 30% for the a posteriori validation shown in the figure below. This validation, although relevant, relies on samples generated with the same procedure as the training samples, which somewhat compromises the independence of the validation.

Validation of the OSO land cover map with 30% of the samples extracted from the 3 data sets used for the classification (BD Topo, Registre Parcellaire Graphique and Corine Land Cover)

Moreover, we could not validate the two annual crop classes of the classification. The RPG was unavailable for 2016 and 2015 (and still is at the time of writing), which led us to develop a training method based on domain adaptation, using samples from the 2014 RPG. This method is very well explained here. In any case, we could not validate the classification of the 2016 summer and winter crops with these data; only samples collected in the field allowed us to do so, and here is the proof!


Venµs in the spotlight in Haute-Garonne and Ariège in 2018

The long-awaited French-Israeli satellite Venµs was launched on 2 August 2017. 110 sites around the world will be observed in 2018 and 2019 at 10 m resolution and with 12 spectral bands. While most sites only cover the footprint of a single Venµs scene (27 to 32 km wide east-west by 27 km north-south), the 'Toulouse' site covers a 168 km transect from the north of Haute-Garonne (Grenade) to Spain, across the Ariège Pyrenees (including Mont Vallier), extended by a second, 157 km long transect in Spain down to the mouth of the Ebro (online map).


The point of choosing such a long transect is the great diversity, over a fairly small number of kilometres, of soil and climate conditions due to the varied relief of the area, of crop and vegetation types, and of human management practices (type of farming, livestock...). This Venµs transect will thus make it possible to study many different agro-ecosystems.

The Venµs transect, from Toulouse to Spain.

The major asset of the Venµs scientific mission is its very frequent revisit: each site will be observed every 2 days. By combining Venµs data with those of Landsat 8 and Sentinel-2, the revisit will be almost daily. On the scientific side, the aim is to prepare future operational space missions and to demonstrate the value of a very high temporal frequency. On the thematic side, the two years 2018 and 2019 will make it possible to closely monitor fast-changing natural phenomena such as variations of the snow cover, crop growth, the phenological stages of the various vegetation types (forests, grasslands, crops, other natural environments), etc. To be fully exploited, these topics will require good-quality field observations over these two years. We are therefore spreading the word, and even calling for volunteers, to collect relevant field data. Below, we list the main topics already planned or possible for each of the 2 main geographic areas of the transect, along with the main stakeholders already identified.



Building a global cropland mask is not an easy task

Criticizing is easy and doing is hard, especially when trying to create a global map of croplands. Some colleagues from CESBIO have worked on that subject within the Sen2Agri project and obtained good results, but only at the local or country scale. Finding a method that works everywhere must clearly be much harder.

These days, I have received a lot of emails, tweets and posts about a new global cropland product at 30 m resolution, published by USGS. I have no doubt it was serious work from a serious team, done with appropriate field data and methods, validation, and of course a tremendous amount of data processing.



But there it is: I checked it over a lot of places that I know very well, and it seems to me that the cropland mask, at least in South West France, is clearly overestimated. Is it the same in your region? Here are some examples:


The Delta of the Ebro seen by Venμs


The Delta of Ebro

Delta of the Ebro observed by Venμs on August 18, 2017 (Copyright CNES 2017)

The Ebro (Iberus in Latin) flows into the Mediterranean in Catalonia after a journey of more than 900 km from its source in Cantabria. The current morphology of the delta results from the significant increase in sediment inputs caused in the 15th century by the intensive deforestation of the river catchment: the conquest of the Americas required the construction of ships. It was at this time that the land gained on the sea, turning the delta into a wide plain of which only 10% rises above 2 metres. The construction of 187 upstream dams since 1930 has reduced the sediment flow (from 28 Mt/year to about 0.1 Mt/year), which partly explains the erosion currently observed in some parts of the delta.


80% of the area of the delta is devoted to crops, mostly rice, which accounts for one-third of the country's production. The rest, protected by the Natural Park, consists of lagoons, reed and rush beds, brackish marshes and sandbanks.


The Ebro Delta is included in one of the sites that will be observed by Venμs every two days for two and a half years. It is part of ClimaDat, a long-term climate research network. Different issues will be addressed: land use, phenology, greenhouse gas fluxes (CO2, CH4, N2O), vegetation productivity, role of saline inputs.


Complement: the evolution of the Ebro delta from the 4th century:



The Ebro Delta seen by Venµs


Ebro Delta

Ebro Delta observed by Venµs on 18 August 2017 (Copyright CNES 2017)

The Ebro (Iberus in Latin) flows into the Mediterranean in Catalonia after a journey of more than 900 km from its source in Cantabria. The current morphology of the delta results from the significant increase in sediment inputs caused in the 15th century by the intensive deforestation of the river catchment: the conquest of the Americas required the construction of ships. It was at this time that the land gained on the sea, turning the delta into a wide plain of which only 10% rises above 2 metres. The construction of 187 upstream dams since 1930 has reduced the sediment flow (from 28 Mt/year to about 0.1 Mt/year), which partly explains the erosion currently observed in some parts of the delta.


80% of the area of the delta is devoted to crops, mostly rice growing, which accounts for one-third of the country's production. The rest, protected by the Natural Park, consists of lagoons, reed beds, rush beds, brackish marshes and sandbanks.


The Ebro Delta is included in one of the sites that will be observed by Venµs every two days for two and a half years. It is part of ClimaDat, a long-term climate research network. Different topics will be addressed: land cover, phenology, greenhouse gas fluxes (CO2, CH4, N2O), vegetation productivity, role of saline inputs.


Complement: the evolution of the Ebro delta since the 4th century:



NASA selected CESBIO's land cover map of France as "image of the day"


It is a well-deserved recognition for the land cover product developed at CESBIO: NASA's Earth Observatory blog dedicated its "image of the day" post for the 15th of August to this product.


This work was done in the framework of Theia, using the iota2 free software, also developed at CESBIO. Thanks to the information conveyed by time series, iota2 is a fully automatic processor.

Jordi Inglada's team first published in 2015 a land cover map based on LANDSAT 8 data acquired in 2014, with a 30 m resolution. This production was followed, in 2017, by a new map based on Sentinel-2 data acquired in 2016, at 10 m resolution.


NASA picks CESBIO's OSO land cover product as "image of the day"


It is a well-deserved recognition for the OSO land cover product developed and produced at CESBIO: NASA's "earthobservatory" blog devoted its "image of the day" article of 15 August to this product.


This work was carried out within the framework of Theia, with the iota2 free software developed at CESBIO. By using all the information contained in image time series, the iota2 chain delivers a high-quality result while being fully automatic.


Jordi Inglada's team published a first map in 2015, based on LANDSAT 8 data acquired in 2015, with a resolution of 30 m. This production was followed, in early 2017, by a map based on Sentinel-2 data acquired in 2016, this time at a resolution of 10 m.


Sen2Agri system released

After 3 years of development, we are very happy to share the news of the release of the Sen2Agri system. The Sen2Agri system is a fully automatic production system that produces agricultural information from Sentinel-2 data, with a focus on food security applications. For this reason, the final user meeting was held in Rome at the Food and Agriculture Organization and the World Food Programme. The Sen2Agri project was funded and managed by ESA, and developed by a consortium led by Université Catholique de Louvain, with CESBIO, CS France and CS-Romania.

A very attentive audience at the Final User Meeting, in the impressive World Food Programme conference room

The system manages the following operations:

  • Sentinel-2 and LANDSAT 8 data download,
  • L2A processing with MACCS/MAJA software (developed by CNES and CESBIO)
  • Monthly Synthesis product generation (with a method developed at CESBIO)
  • Generation of LAI products (based on a method developed at INRA, France, and updated, integrated to Orfeo Toolbox by CESBIO)
  • A crop mask (issued several times per year), with two different methods:
    • without in situ data (method developed at UCL)
    • with in-situ data (method developed at CESBIO)
  • A crop type product (with a method developed at CESBIO, an early version of iota2 processor)

The scientific work behind the methods was described in a special issue of the MDPI journal Remote Sensing.


2016/17: a record for pastoral burning (écobuage) in the Pyrenees?

Image of 10 December 2016
SENTINEL2 image of 30 November 2016, "SWIR"
So what is happening on 10 December 2016 at noon above the village of Villelongue (15 kilometres south of Lourdes)? One of the finest pastoral burns of the Hautes-Pyrénées this winter 2016/17! A very wide band of active fire is moving south. According to contacts in the field, this fire started the day before. In 2 days, a large area has thus already been burnt. The revisit frequency of SENTINEL2 makes it possible to observe the same scene a few days before and after this fire.