Machine learning for land cover map production - Follow-up on the TiSeLaC challenge

In a previous post, I discussed some important aspects to take into account when validating land cover maps. In that same post, I insisted on the fact that building machine learning pipelines by blindly optimising accuracy metrics can lead to unrealistic expectations about the land cover maps produced with these approaches.


I cited as an example the TiSeLaC challenge, where 2 of the participating teams achieved FScores above 99%, which is an accuracy higher than the one we can expect from the reference data used for the validation.


I assumed that these unrealistic performances were due to over-fitting and to the use of a validation set too similar to the training set. I recently asked the challenge organisers about the procedure used for splitting the reference data into train and test sets, and they confirmed that the split was done at the pixel level and not at the polygon level. Therefore, nearly identical pixels coming from the same polygon could be used for both training and validation.


Looking at the challenge results, one could therefore expect all the teams to have obtained similarly high performances. Since this was not the case, I asked for references to the methods used. Two of the methods are published; I am assuming that these are the 2 winning methods.


One of the methods uses spatial nearest neighbour classification to decide the labels: the class of a pixel is decided using the labels of the nearest pixels in the training set. Here, "nearest" means the closest in the image, using a Euclidean distance on the spatial coordinates of the pixel. Indeed, the pixel coordinates were provided as a separate record, but I don't think they were intended to be used as features. And, yes, the best results are obtained if only pixel coordinates are used (no reflectances, no NDVI, nothing!). And a single neighbour works better than 2-NN or 10-NN.
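This effect is easy to reproduce on synthetic data. The sketch below is my own toy example (not the challenge data): compact pixel clusters stand in for polygons, the split is done at the pixel level, and a 1-NN classifier sees only the spatial coordinates.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)

# Synthetic "polygons": 200 compact pixel clusters, one class label each.
coords, labels = [], []
for poly_id in range(200):
    center = rng.uniform(0, 1000, size=2)
    n_pix = int(rng.integers(20, 50))
    coords.append(center + rng.normal(0, 2.0, size=(n_pix, 2)))  # tight cluster
    labels.append(np.full(n_pix, poly_id % 8))                   # 8 classes
X = np.vstack(coords)
y = np.concatenate(labels)

# Pixel-level split: pixels of the same polygon land on both sides.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 1-NN on spatial coordinates only: the nearest training pixel almost
# always belongs to the same polygon, hence the same class.
acc = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr).score(X_te, y_te)
print(f"1-NN on coordinates alone: {acc:.3f}")
```

With this leaky split, the coordinates-only classifier scores close to 100%, which is exactly the pathology observed in the challenge.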


This shows that neighbouring pixels were indeed present in both the training and test sets, and that the less information used (just the closest pixel), the better the result obtained.


To quickly check this, I ran a simple, out-of-the-box Random Forest classifier using the coordinates as features and got 97.90% accuracy on the test set, while using the image features gives about 90%.
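The mechanism behind those numbers can be reproduced on synthetic data (the percentages above come from the actual challenge data; this toy example only illustrates why, under a pixel-level split, a Random Forest fed with the two spatial coordinates beats one fed with noisy "spectral" features):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Clustered pixels: the class is tied to the "polygon" the pixel comes from.
coords, spectra, labels = [], [], []
for poly_id in range(200):
    center = rng.uniform(0, 1000, size=2)
    cls = poly_id % 8
    n_pix = int(rng.integers(20, 50))
    coords.append(center + rng.normal(0, 2.0, size=(n_pix, 2)))
    spectra.append(rng.normal(cls, 1.5, size=(n_pix, 10)))  # noisy "reflectances"
    labels.append(np.full(n_pix, cls))
X_coord, X_spec, y = np.vstack(coords), np.vstack(spectra), np.concatenate(labels)

idx_tr, idx_te = train_test_split(np.arange(len(y)), test_size=0.3, random_state=0)

acc_coord = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X_coord[idx_tr], y[idx_tr]).score(X_coord[idx_te], y[idx_te])
acc_spec = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X_spec[idx_tr], y[idx_tr]).score(X_spec[idx_te], y[idx_te])
print(f"coordinates: {acc_coord:.3f}  spectral features: {acc_spec:.3f}")
```

The coordinates win not because they are good predictors of land cover, but because the split leaks polygon membership into the test set.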


The second of the 2 winning methods (actually the first, with an FScore of 99.29, while the method above obtains 99.03) uses 3 deep neural networks, 2 of which apply temporal convolutions to each pixel. The third network is a multi-layer perceptron whose input features are statistics computed on all the pixels found in a spatial neighbourhood of the pixel to be classified. Different neighbourhood sizes between 1 and 17 are used. This is much more complex than using only the label of the closest pixel, but actually contains the same information. Adding the information from the first 2 networks may allow the few pixels that the previous method got wrong to be correctly classified. The performance difference between the 2 methods is less than 0.3%, which probably falls within typical confidence intervals.
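The published description gives few implementation details, but per-pixel neighbourhood statistics of the kind the third network consumes can be sketched with box filters (the window sizes and the mean/variance choice below are my assumptions, not the authors' exact features):

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(1)
band = rng.random((64, 64))  # toy single-band image

# Per-pixel mean and variance over square windows of increasing size,
# using var(x) = E[x^2] - E[x]^2 computed with two box filters.
features = []
for size in (3, 5, 9, 17):
    mean = uniform_filter(band, size=size)
    mean_sq = uniform_filter(band ** 2, size=size)
    var = np.clip(mean_sq - mean ** 2, 0.0, None)
    features.extend([mean, var])
stack = np.stack(features, axis=-1)  # (64, 64, 8): 2 stats x 4 window sizes
print(stack.shape)
```

On training pixels drawn from the same polygons as the test pixels, such features encode the local label distribution, which is why they carry the same information as the closest pixel's label.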


What can we learn from these results?


First of all, blind metric optimisation without domain knowledge can produce misleading results. Any remote sensing scientist knows that pixel coordinates alone are not good predictors for producing a map. Otherwise, one could just spatially interpolate the reference data. Even when applying kriging, other variables are usually used!


Second, when organising this kind of contest, realistic data sets have to be used. The split between training and validation data has to follow strict rules in order to avoid neighbouring pixels appearing in both data sets.
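One simple way to enforce such a rule, assuming each reference pixel carries the identifier of the polygon it comes from, is a group-aware split; here is a sketch using scikit-learn's GroupShuffleSplit on toy data:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
polygon_id = rng.integers(0, 200, size=5000)  # polygon id of each reference pixel
y = polygon_id % 8                             # toy class labels

# Split at the polygon level: all pixels of a polygon fall on the same side.
gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(gss.split(np.zeros((5000, 1)), y, groups=polygon_id))

assert set(polygon_id[train_idx]).isdisjoint(polygon_id[test_idx])
print("no polygon contributes pixels to both sets")
```

A spatial buffer between training and test polygons would be even stricter, but the group split already removes the within-polygon leakage discussed above.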


Third, map validation has to have a spatial component: are the shapes of the objects preserved, is there image noise in the generated map, etc.? This is a tricky question which requires either dense reference data in some places or specific metrics able to measure distortions without reference data. Obtaining dense reference data is very costly and can even be impossible if some of the classes can't be identified by image interpretation (we are not tagging images of cats or road signs!). Developing specific metrics for spatial quality which don't need reference data is an open problem. Some solutions have been developed for the assessment of pan-sharpening algorithms, but the problem is rather different.


Finally, I hope that this critical analysis of the TiSeLaC contest will be useful for future contests, because I think they can be very useful for bringing together the remote sensing and machine learning communities.

Another validation of CESBIO's 2016 France land-cover map

In this post, a validation of the land-cover map of France produced by CESBIO for the 2016 period was presented. This validation used independent data (that is, data collected by different teams and with different procedures than the data used for classifier training), but the validation procedure consisted in applying classical machine learning metrics which, as described in this other post, have some limitations.

A fully independent validation following a sound protocol is costly and needs very specific skills and expertise. SIRS is a company specialised in the production of geographic data from satellite or aerial images. Among other things, they are the producers of Corine Land Cover for France, and they are also responsible for quality control and validation of other Copernicus Land products.

SIRS has recently performed a validation of the 2016 France land-cover map. The executive summary of the report reads as follows:

This report provides the evaluation results of the CESBIO OSO 2016 10m layer and the CESBIO OSO 2016 20m layer.

The thematic accuracy assessment was conducted in a two-stage process:

  1. An initial blind interpretation, in which the validation team did not have knowledge of the product's thematic classes.
  2. A plausibility analysis, performed on all sample units in disagreement with the production data, considering the following cases:
  • Uncertain code: both producer and operator codes are plausible. The final validation code used is the producer code.
  • Error from the first validation interpretation: the final validation code used is the producer code.
  • Error from the producer: the final validation code used is from the first validation interpretation.
  • Producer and operator both wrong: the final validation code used is a new code from this second interpretation.
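For clarity, the decision rules above can be written as a small, purely illustrative function; the argument names and the ordering of the tests are my reading of the report, not SIRS code:

```python
def final_validation_code(producer, blind, producer_plausible, blind_plausible,
                          second_interpretation):
    """Illustrative encoding of the report's plausibility rules."""
    if producer_plausible and blind_plausible:  # uncertain: both codes plausible
        return producer
    if producer_plausible:                      # blind interpretation in error
        return producer
    if blind_plausible:                         # producer in error
        return blind
    return second_interpretation                # both wrong: new code

print(final_validation_code("forest", "shrubland", False, True, "moors"))
```

Note that three of the four cases resolve in favour of the producer code or a corrected code, which explains why the plausibility analysis scores higher than the blind one.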

As a result of this two-stage approach, it should be noted that the plausibility analysis exhibits better results than the blind analysis.

The thematic accuracy assessment was carried out over 1,428 sample units covering France and Corsica.
The final results show that the CESBIO OSO product meets the usually accepted thematic validation requirement, i.e. 85%, in both the blind interpretation and the plausibility analysis. Indeed, the overall accuracies obtained are 81.4 +/- 3.68% for the blind analysis and 91.7 +/- 1.25% for the plausibility analysis on the CESBIO OSO 10m layer. The analysis of the 20m layer shows that the overall accuracy is 81.1 +/- 3.65% for the blind approach and 88.2 +/- 3.15% for the plausibility approach.
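As an aside, confidence interval half-widths depend on the sampling design. Under a naive simple-random-sampling assumption, the normal-approximation 95% interval for the blind 10m result would be computed as below; the report's wider interval presumably reflects its actual, more conservative sampling design:

```python
import math

def ci_half_width(p, n, z=1.96):
    """95% normal-approximation half-width for a proportion (simple random sampling)."""
    return z * math.sqrt(p * (1 - p) / n)

# Overall accuracy of the blind analysis on the 10m layer, n = 1428 sample units.
hw = ci_half_width(0.814, 1428)
print(f"+/- {100 * hw:.2f} points")
```

This gives about +/- 2 points, so the +/- 3.68% reported is not the textbook binomial interval but a design-aware one.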
Quality checks of the validation points were made by French experts. It should be noted that, for the blind analysis, the methodology of control was based mostly on Google Earth imagery; no additional thematic source of information that could provide further context was used, such as forest stand maps, peatland maps, etc.

These results are very good news for us and for our users. The report also contains interesting recommendations that will help us to improve our algorithms. The full report is available for download.

Machine learning benchmarking for land cover map production

Land cover map validation is a complex task. If you read French, you can check this post by Vincent Thierion, which shows how the 2016 LC map of France produced by CESBIO stands with respect to data sources independent from those used for its production. But this is only one aspect of the validation. A land cover map is a map, and therefore there are other issues than checking whether individual points belong to the correct class. By the way, being sure that the correct class is known is not so easy either.


In this epoch of machine learning hype, it is easy to fall into the trap of thinking that optimising a single metric accounts for all the issues in map validation. The typical approaches used in machine learning contests are far from enough for this complex task. Let's have a look at how we proceed at CESBIO when we assess the quality of a LC map produced by classification.

The land cover classification for France in 2016 is available



Just this once, we are ahead of time. Well, nearly. We had promised the 2016 land cover map of France before the end of the first term of 2017. It exists and is available here. Its resolution is 10m, with the same 17-class nomenclature that we used for the Landsat land cover map of 2014.

The map is mainly based on the Sentinel-2 data acquired from the end of 2015 to the end of 2016, but we have also processed the LANDSAT 8 data. We provide some details below.


The first 2016 land cover map of France with Sentinel-2


(article copied from the OSO blog)


Just this once, we are ahead of time. Well, nearly. We had promised a 2016 land cover map of metropolitan France before the end of the first quarter of 2017. It exists and is available here. It is a map at 10 m resolution, with the same nomenclature as the one used for the latest 17-class Landsat prototype products.

The map is mainly based on Sentinel-2 data spanning from the end of 2015 to the end of 2016, but we have also used Landsat-8 data. We give the details of the production procedure below.


The iota2 Land cover processor has processed some Sentinel-2 data


You have already heard about the iota2 processor, and you must know that it can process LANDSAT 8 time series and deliver land cover maps for whole countries. These last days, Arthur Vincent completed the code that allows processing Sentinel-2 time series. Even if atmospherically corrected Sentinel-2 data are not yet available over the whole of France, we used the demonstration products delivered by Theia to test our processor.


Everything seems to work fine, and the 10 m resolution of Sentinel-2 seems to allow much more detail to be seen. The attached images show two extracts near Avignon, in Provence, which illustrate the differences between Landsat 8 and Sentinel-2. Please look only at the level of detail, and not at the differences in terms of classes: the two maps were produced using different time periods (a period limited to winter and the beginning of spring for Sentinel-2), and the learning database is also different. Please don't draw conclusions too fast about the thematic quality of the maps.


The first extract shows a natural vegetation zone, with some farmland (top: LANDSAT8, bottom: Sentinel-2).



The iota2 land cover processing chain can now handle Sentinel-2


You already know the iota2 processing chain, and you know that it can process Landsat8 time series and generate land cover maps. In recent days, Arthur Vincent completed the code allowing the use of Sentinel-2 time series. Even though we do not yet have Sentinel-2 series over the whole of France (they should arrive soon), we used demonstration products provided by THEIA to validate the processing chain.


Everything seems to work fine, and the 10 m resolution of Sentinel-2 provides much more detail in the produced maps. Here are 2 extracts (near Avignon) which show the difference between Landsat8 (top) and Sentinel-2 (bottom). Beware: the comparison only makes sense in terms of spatial detail. The maps do not correspond to the same acquisition periods, only a few months of Sentinel-2 data were used (without the summer period), and the reference data are slightly different. So no conclusions should be drawn about the thematic quality of these maps.


The first extract shows a natural vegetation zone with a bit of agriculture (top: LANDSAT8, bottom: Sentinel-2).



New version of fully automatic land cover map of France for 2014 from LANDSAT8


Over the last months, we have worked a lot on our method for land cover map production. Three main topics (1) were studied with Arthur Vincent and David Morin at CESBIO:

  1. porting and validating the iota2 processor on the CNES High Performance Computing facilities (HPC);
  2. enhancing the method for reference data preparation (reference data are used both for training and validation);
  3. developing a stratification method which allows classifiers to be trained and applied per eco-climatic area, for instance.
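The stratification idea can be sketched as training one classifier per stratum and applying each one only inside its own zone. This is a toy illustration with Random Forests and synthetic data; the actual iota2 implementation details differ:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((3000, 6))                        # toy pixel features
strata = rng.integers(0, 4, size=3000)           # toy eco-climatic zone per pixel
y = (X[:, 0] + 0.1 * strata > 0.5).astype(int)   # toy labels, zone-dependent rule

# One classifier per stratum, trained and applied only inside its own zone.
models = {
    s: RandomForestClassifier(n_estimators=50, random_state=0).fit(
        X[strata == s], y[strata == s])
    for s in np.unique(strata)
}

pred = np.empty_like(y)
for s, model in models.items():
    pred[strata == s] = model.predict(X[strata == s])
print((pred == y).mean())
```

Each local model only has to learn the decision rule of its own zone, which is the point of the stratification: class signatures differ between eco-climatic areas.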

Using all these new features, we produced a lot (really a lot!) of maps of continental France. We have just released the 4 following examples, produced using all the available LANDSAT8 data in 2014:

  • regarding the reference data:
    1. including 4 classes of artificial surfaces: continuous urban, discontinuous urban, road surfaces, and commercial and industrial areas (2);
    2. only one artificial class that gathers the 4 above (3);
  • regarding the stratification method:
    1. using eco-climatic areas (4);
    2. without stratification, but using a fusion of several classifiers trained over different sets of tiles.
The pink urban spot in the center of the brown zone is the village of Chateauneuf du Pape, which is famous for its wine; the brown color is the vineyard class. Validated!


New version of the OSO land cover products for France in 2014


We have worked a lot on the land cover map production procedure over the last months. Three main topics (1) were addressed by Arthur Vincent and David Morin at CESBIO:

  1. Porting and validating the iota2 processing chain on the CNES High Performance Computing (HPC) infrastructure.
  2. Improving the preparation procedure for the reference data used for classifier training and for the validation of the produced maps.
  3. Developing the stratification which allows the classification algorithms to be specialised per eco-climatic zone, for instance.

Using all these new features, we produced a lot (really a lot!) of maps over metropolitan France. We have just put online some examples for the year 2014, using all the available Landsat8 data. We chose to show you the 4 cases corresponding to the following combinations:

  • regarding the reference data:
    1. use of 4 classes of artificial surfaces (loosely called "built-up"): continuous urban, discontinuous urban, "road" surfaces, and industrial and commercial areas (2);
    2. a posteriori grouping of these 4 classes (3);
  • regarding the stratification mode:
    1. with stratification by eco-climatic zone (4);
    2. without stratification, but with a fusion of several (10) classifiers trained on different image tiles.

The pink village at the center of the brown zone is the village of Chateauneuf du Pape, and the brown zone around the village is vineyards! No ground truth is needed to verify that, but we would be happy to go and check anyway.

Arthur has put together a rather handy interface for visualising and comparing the different maps. You can access it here. The icon at the top right allows you to select the maps to be displayed. On the left, below the buttons that control the zoom level, you can select 2 of the maps for which the quality statistics (FScore per class (5)) will be displayed below the visualisation area. This lets you appreciate the differences between the approaches.


To the 4 new maps, we have added the version we published at the beginning of the year, whose quality is lower. If you look at the Overall Accuracy of that map, you will see that it is actually higher than that of the new maps. This is because, in that older version, we used many water pixels for the validation, and water is very easy to classify. The main problem of that older version is the over-classification of urban areas at the expense of natural mineral surfaces and orchards. This has been improved thanks to the work on reference data preparation.


To compare maps, it is useful to look at the per-class FScores. You will see that the eco-climatic stratification brings significant improvements in the mean values and in the confidence intervals.


If you want to download the full GeoTiff files (beware, they are large!), you can use the following links:

Do not hesitate to send us feedback. We are continuing to work on improving the methods.


(1) Many other tasks were carried out, including the preparation of the ingestion of Sentinel-2 data, for example.

(2) These 4 classes correspond to the Corine Land Cover nomenclature, whose 2012 vintage polygons were refined using a procedure developed by David and Marcela and described in this presentation (from slide 33 onwards).

(3) The training and the classification are still done with the 4 separate classes, which are merged at the end; this increases the accuracy of the map in exchange for a loss of thematic detail. But the 30 m Landsat pixels do not allow us to be very precise for these classes.

(4) We used the map published by Joly et al.

(5) We use this metric because it combines omission and commission errors.
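For reference, the link between the FScore and omission/commission errors: recall is 1 minus the omission error, precision is 1 minus the commission error, and the FScore is their harmonic mean.

```python
# FScore from omission/commission errors: recall = 1 - omission,
# precision = 1 - commission, F1 is the harmonic mean of the two.
def f_score(omission, commission):
    recall = 1.0 - omission
    precision = 1.0 - commission
    return 2 * precision * recall / (precision + recall)

# A class missing 10% of its true pixels and over-claiming 20% of its mapped ones:
print(f_score(0.10, 0.20))
```

Because it is a harmonic mean, a class cannot hide a large commission error behind a small omission error, or vice versa.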


Land cover maps quickly obtained using SPOT4 (Take5) data for the Sudmipy site


At CESBIO, we are developing land cover map production techniques for high resolution image time series similar to those which will soon be provided by Venµs and Sentinel-2. As soon as the SPOT4 (Take5) data became available over our study area (the Sudmipy site in South-West France), we decided to assess our processing chains on those data sets. The first results were quickly presented during the Take5 users' meeting held last October.

1. Experiments

In this post, we describe the work carried out to produce these first land cover classifications with the SPOT4 (Take5) Sudmipy images (East and West areas), and we compare the results obtained over the region common to these two areas.


Prior to the work presented here, we organised a field data collection campaign synchronous with the satellite acquisitions. These data are needed to train the classifier and to validate the classification. The field work was conducted in 3 study areas (figure 1) which were visited 6 times between February and September 2013, covering a total of 2000 agricultural plots. This allowed us to monitor the cultural cycle of winter crops, summer crops (and their irrigation attribute), grasslands, forests and built-up areas. The final nomenclature consists of 16 land cover classes.


The goal was to assess the results of a classification using field data that are limited in quantity but also in spatial spread. We also wanted to check whether the East and West SPOT4 (Take5) tracks could be merged. To this end, we used the field data collected on the area common to the two tracks (in pink on the figure) and 5 level 2A images for each track, acquired with a one-day shift.


2. Results

The first results of the supervised SVM classification (using the ORFEO Toolbox) can be considered very promising, since they yield more than 90% correctly classified pixels for both the East and West tracks, and since the continuity between the two swaths is excellent. Some confusion can be observed between bare soils or mineral surfaces and summer crops, but these errors should be reduced by using LANDSAT 8 images acquired during the summer, when the summer crops will have developed.

Merging of the land cover maps obtained on the East and West Sudmipy tracks (the cloudy areas were cropped out). The comparison against the ground truth (the black dots on the map, to the South-West of Toulouse) results in a kappa coefficient of 0.89 for the West and 0.92 for the East.
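For readers unfamiliar with it, the kappa coefficient compares the observed agreement with the agreement expected by chance from the confusion matrix marginals. A minimal implementation on toy labels (not our data):

```python
import numpy as np

def cohen_kappa(y_true, y_pred, n_classes):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    cm /= cm.sum()
    po = np.trace(cm)                    # observed agreement
    pe = (cm.sum(0) * cm.sum(1)).sum()   # chance agreement from the marginals
    return (po - pe) / (1 - pe)

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 0, 1, 1, 2, 2, 1, 1])
print(round(cohen_kappa(y_true, y_pred, 3), 3))
```

Unlike overall accuracy, kappa discounts the agreement a random classifier would get from the class proportions alone, which matters when classes are very unbalanced (water, for instance, is both abundant in reference data and easy to classify).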



This zoom compares the results obtained on the area common to the two tracks (West on the left, East on the right). The two classifications were obtained independently, using the same method and the same training data, but with images acquired at different dates and with different viewing angles. The main errors are maize plots labeled as bare soil, which is not surprising since this crop was just emerging when the last image was acquired. There are also confusions between wheat and barley, but even in the field, one has to be a specialist to tell them apart.

3. Feedback and retrospective

After performing these experiments, we were very satisfied with the operational maturity of our tools. Given the data volume to be processed (about 10 GB of images), we could have expected very long computation times or memory limitations in the software used (after all, we are just scientists in a lab!). You will not be surprised to learn that our processing chains are based on the Orfeo Toolbox. More precisely, the core of the chain uses the applications provided with OTB for supervised training and image classification. One just has to build a multi-channel image where each channel is a classification feature (reflectances, NDVI, etc.) and provide vector data (a shapefile, for instance) containing the training (and validation) data. Then, a command line for the training (see the end of this post) and another one for the classification (idem) are enough.

Computation times are very interesting: several minutes for the training and several tens of minutes for the classification. One big advantage of OTB applications is that they automatically use all the available processors (our server has 24 cores, but any off-the-shelf PC has between 4 and 12 cores nowadays!).

We are going to continue using these data, since we have other field data which are better spread over the area. This should allow us to obtain even better results. We will also use the Summer LANDSAT 8 images in order to avoid the above-mentioned errors on Summer crops.

4. Command line examples

We start by building a multi-channel image with the SPOT4 (Take5) data (not accounting for the cloud masks in this example):

otbcli_ConcatenateImages -il SPOT4_HRVIR_XS_20130217_N1_TUILE_CSudmipyE.TIF
SPOT4_HRVIR_XS_20130607_N1_TUILE_CSudmipyE.TIF -out
otbConcatImg_Spot4_Take5_5dat2013.tif

We compute the statistics of the images in order to normalize the features :

otbcli_ComputeImagesStatistics -il otbConcatImg_Spot4_Take5_5dat2013.tif -out
EstimateImageStatistics_Take5_5dat2013.xml

We train a SVM with an RBF (Gaussian) kernel :

otbcli_TrainSVMImagesClassifier -io.il otbConcatImg_Spot4_Take5_5dat2013.tif
-io.vd DT2013_Take5_CNES_1002_Erod_Perm_Dissolve16cl.shp -sample.vfn "Class"
-io.imstat EstimateImageStatistics_Take5_5dat2013.xml -svm.opt 1 -svm.k rbf
-io.out svmModel_Take5Est_5dat2013_train6.svm

And voilà! We perform the classification:

otbcli_ImageSVMClassifier -in otbConcatImg_Spot4_Take5_5dat2013.tif -mask
EmpriseTake5_CnesAll.tif -imstat EstimateImageStatistics_Take5_5dat2013.xml
-svm svmModel_Take5Est_5dat2013_train_6.svm -out ClasSVMTake5_5dat_16cl_6.tif