Sentinel-2 sees a persistent degradation of forests near the Gorges du Verdon after the 2017 drought


On the way back from our stay in the Alps, we spent a day near the Verdon Gorges, the European Grand Canyon (I know, the scale is not quite that of the American one). Anyway, the landscape was gorgeous, as you can see in the panorama below.

Panorama from "Sublime point" (and modest)



But don't worry, I will not tell you about all my holiday adventures. I keep those for my close colleagues at lunch time, and it seems they have had enough of them over the past two days.


Let's get to the point: I also noticed quite a number of brown pine trees, which surprised me, as the spring and summer had been rather wet for the region so far. I asked some locals, who told me it was due to the severe drought that hit Provence last year, during the summer and fall of 2017, and that mainly damaged the pine forests, particularly where the root-zone soil is thin because of the presence of rock.


How does the vectorization of the OSO product work?

The OSO 2017 vector product is finally out! After several weeks of processing, the vectors for each département are available here. Production requires a large amount of computing resources and a somewhat unusual processing strategy. We wanted to explain how this information layer is produced.

Example of the initial raster (10 m), the regularized raster (20 m) and the vectorized result

In principle, the simplest approach would be to take the raster layer produced by the iota² processing chain, load it into our favourite GIS software and press the "Vectorize" button! But things are not that simple: certain constraints and needs force us to resort to a few tricks:


Machine learning for land cover map production - Follow-up on the TiSeLaC challenge

I discussed some important aspects to take into account when validating land cover maps in a previous post. In that same post I insisted on the fact that building machine learning pipelines through blind optimisation of accuracy metrics can lead to unrealistic expectations about the land cover maps produced with these approaches.


I cited as an example the TiSeLaC challenge, where 2 of the participating teams achieved FScores above 99%, which is an accuracy higher than the one we can expect from the reference data used for the validation.


I assumed that these unrealistic performances were due to over-fitting and to the use of a validation set too similar to the training set. I recently asked the challenge organisers about the procedure for splitting the reference data into train and test sets, and they confirmed that the split was done at the pixel level and not at the polygon level. Therefore, nearly identical pixels coming from the same polygon could be used for both training and validation.
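To make the difference concrete, here is a minimal sketch of a polygon-level split, in which all the pixels of a given polygon end up on the same side. The sample layout `(polygon_id, features, label)`, the function name and the 30% test fraction are assumptions made for illustration; this is not the procedure used by the challenge organisers.

```python
import random
from collections import defaultdict


def split_by_polygon(pixels, test_fraction=0.3, seed=0):
    """Split pixel samples so that a polygon never contributes to both sets.

    `pixels` is a list of (polygon_id, features, label) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Group the pixel samples by the polygon they were extracted from.
    by_polygon = defaultdict(list)
    for sample in pixels:
        by_polygon[sample[0]].append(sample)

    # Shuffle the polygon identifiers, not the individual pixels.
    polygon_ids = sorted(by_polygon)
    random.Random(seed).shuffle(polygon_ids)
    n_test = max(1, round(test_fraction * len(polygon_ids)))

    test = [s for pid in polygon_ids[:n_test] for s in by_polygon[pid]]
    train = [s for pid in polygon_ids[n_test:] for s in by_polygon[pid]]
    return train, test


# Toy data: 10 polygons with 5 pixels each.
pixels = [(pid, (pid * 10 + k,), pid % 2) for pid in range(10) for k in range(5)]
train, test = split_by_polygon(pixels)
train_polygons = {s[0] for s in train}
test_polygons = {s[0] for s in test}
print(train_polygons.isdisjoint(test_polygons))
```

With a pixel-level split, the same check would almost always fail, because pixels of one polygon would be scattered across both sets.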


Given this, looking at the challenge results, one would expect all the teams to have reached similarly high performances. Since this was not the case, I asked for references to the methods used. Two of the methods are published. I am assuming that these are the 2 winning methods.


One of the methods uses spatial nearest neighbour classification to decide the labels, that is, the class for a pixel is decided using the labels of the nearest pixels of the training set. Here, "nearest" means the closest in the image using a Euclidean distance on the spatial coordinates of the pixel. Indeed, the pixel coordinates were provided as a separate record, but I don't think they were intended to be used as features. And, yes, the best results are obtained if only pixel coordinates are used (no reflectances, no NDVI, nothing!). And 1 single neighbour works better than 2-NN or 10-NN.
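A minimal sketch of this kind of classifier, using only the pixel coordinates, could look as follows. The toy coordinates, class names and function name are made up for illustration and have nothing to do with the actual challenge data or the published implementation.

```python
import math


def nearest_neighbour_label(train, x, y):
    """Return the label of the training pixel closest to (x, y).

    `train` is a list of (x, y, label) tuples; only the image
    coordinates are used, no reflectances and no NDVI.
    """
    closest = min(train, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    return closest[2]


# Toy example: training and test pixels come from the same two "polygons".
train = [(0, 0, "forest"), (0, 2, "forest"), (10, 0, "water"), (10, 2, "water")]
test = [(0, 1, "forest"), (1, 1, "forest"), (10, 1, "water"), (9, 1, "water")]

predictions = [nearest_neighbour_label(train, x, y) for x, y, _ in test]
accuracy = sum(p == t[2] for p, t in zip(predictions, test)) / len(test)
print(accuracy)
```

Because the test pixels sit right next to training pixels of the same polygon, the classifier is perfect on this toy case without looking at a single spectral value.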


This shows that neighbouring pixels were indeed present in both the training and test sets, and that the less information is used (just the closest pixel), the better the result.


To quickly check this, I ran a simple, out-of-the-box, Random Forest classifier using the coordinates as features and got 97.90% accuracy on the test set, while using the image features gives about 90%.


The second of the 2 winning methods (which is actually the first, with an FScore of 99.29, while the method above obtains 99.03) uses 3 deep neural networks, 2 of which use temporal convolutions for each pixel. The third network is a multi-layer perceptron where the input features are statistics computed on all the pixels found in a spatial neighbourhood of the pixel to be classified. Different sizes of neighbourhoods between 1 and 17 are used. This is much more complex than using only the label of the closest pixel but, actually, it contains the same information. Adding the information of the first 2 networks may make it possible to correctly classify the few pixels that the previous method got wrong. The performance difference between the 2 methods is less than 0.3%, which may well fall within typical confidence intervals.


What can we learn from these results?


First of all, blind metric optimisation without domain knowledge can produce misleading results. Any remote sensing scientist knows that pixel coordinates alone are not good predictors for producing a map. Otherwise, one could just spatially interpolate the reference data. Even when applying kriging, other variables are usually used!


Second, when organising this kind of contest, realistic data sets have to be used. The split between training and validation has to follow strict rules in order to avoid neighbouring pixels appearing in both data sets.


Third, map validation has to have a spatial component: are the shapes of the objects preserved, is there image noise in the generated map, etc. This is a tricky question which requires either dense reference data in some places or specific metrics which are able to measure distortions without reference data. Obtaining dense reference data is very costly and can even be impossible if some of the classes can't be identified by image interpretation (we are not tagging images of cats or road signs!). Developing specific metrics for spatial quality which don't need reference data is an open problem. Some solutions have been developed for the assessment of pan-sharpening algorithms, but the problem is rather different.


Finally, I hope that this critical analysis of the TiSeLaC contest will be useful for future contests, because I think that they may be very useful to bring together the remote sensing and the machine learning communities.

Another validation of CESBIO's 2016 France land-cover map

In this post, a validation of the land-cover map of France produced by CESBIO for the 2016 period was presented. This validation used independent data (that is data collected by different teams and using different procedures than the data used for the classifier training), but the validation procedure consisted in applying classical machine learning metrics which, as described in this other post, have some limitations.

A fully independent validation following a sound protocol is costly and needs skills and expertise that are very specific. SIRS is a company which is specialised in the production of geographic data from satellite or aerial images. Among other things, they are the producers of Corine Land Cover for France and they are also responsible for quality control and validation of other Copernicus Land products.

SIRS has recently performed a validation of the 2016 France land-cover map. The executive summary of the report reads as follows:

This report provides the evaluation results of the CESBIO OSO 2016 10m layer and the CESBIO OSO 2016 20m layer.

The thematic accuracy assessment was conducted in a two-stage process:

  1. An initial blind interpretation in which the validation team did not have knowledge of the product’s thematic classes.
  2. A plausibility analysis was performed on all sample units in disagreement with the production data to consider the following cases:
    • Uncertain code, both producer and operator codes are plausible. Final validation code used is producer code.
    • Error from first validation interpretation. Final validation used is producer code
    • Error from producer. Final validation code used is from first validation interpretation
    • Producer and operator are both wrong. Final Validation code used is a new code from this second interpretation.

Resulting to this two-stage approach, it should be noticed that the plausibility analysis exhibit better results than the blind analysis.

The thematic accuracy assessment was carried out over 1,428 sample units covering France and Corsica.
The final results show that the CESBIO OSO product meet the usually accepted thematic validation requirement, i.e. 85 % in both blind interpretation and plausibility analysis. Indeed, the overall accuracies obtained are 81.4 +/- 3.68% for the blind analysis and 91.7 +/- 1.25% for the plausibility analysis on the CESBIO OSO 10m layer. The analysis on the 20m layer shows us that the overall accuracy for the blind approach is 81.1 +/-3.65% and 88.2 +/-3.15% for the plausibility approach.
Quality checks of the validation points have been made by French experts. It should be noticed that for the blind analysis, the methodology of control was based mostly on Google Earth imagery, no additional thematic source of information that could provide further context was used such as forest stand maps, peatland maps, etc.

These results are very good news for us and for our users. The report also contains interesting recommendations that will help us to improve our algorithms. The full report is available for download.

Machine learning benchmarking for land cover map production

Land cover map validation is a complex task. If you read French, you can check this post by Vincent Thierion which shows how the 2016 LC map of France produced by CESBIO stands with respect to data sources independent from those used for its production. But this is only one aspect of the validation. A land cover map is a map, and therefore, there are other issues than checking if individual points belong to the correct class. By the way, being sure that the correct class is known is not so easy either.


In this epoch of machine learning hype, it is easy to fall into the trap of thinking that optimising a single metric accounts for all issues in map validation. Typical approaches used in machine learning contests are far from enough for this complex task. Let's have a look at how we proceed at CESBIO when we assess the quality of a LC map produced by classification.

First validations of the OSO land cover map

In 2017, the OSO (Occupation du SOl, i.e. land cover) Scientific Expertise Centre, through CESBIO, produced a land cover map for the year 2016 covering metropolitan France, including Corsica. We call it the OSO land cover map! This map is the result of massive automatic processing of time series of Sentinel-2 optical satellite images. Like the Sentinel-2 images, the map has a spatial resolution of 10 m, which corresponds to a minimum mapping unit (MMU) of 0.01 ha. Land cover is described by 8 classes at the first level and 17 classes at the second level of detail, defined according to the detection capabilities of Sentinel-2 imagery and the needs expressed by end users. These classes cover the main land cover themes (artificial, agricultural and semi-natural surfaces).

Its main advantage compared with other existing land cover maps (far be it from us to criticise them) is its complete territorial coverage and, above all, its freshness! A land cover map covering the whole national territory, available in the first quarter of the following year: that is what OSO offers you!

How thematically rich is it?

The classes detected by remote sensing are those of the second level; the first-level classes are obtained by aggregating the second-level classes:

  • Annual crop
    • Winter crop
    • Summer crop
  • Perennial crop
    • Grassland
    • Orchard
    • Vineyard
  • Forest
    • Broad-leaved forest
    • Coniferous forest
  • Low natural vegetation
    • Natural grassland
    • Woody moorland
  • Urban
    • Dense urban
    • Diffuse urban
    • Industrial and commercial area
    • Road surface / asphalt
  • Mineral surface
    • Mineral surfaces
    • Beaches and dunes
  • Water
    • Water
  • Glaciers and perpetual snow
    • Glaciers and perpetual snow
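The aggregation from the second level to the first level can be sketched as a simple lookup table. The dictionary below uses English renderings of the class names listed above; the variable and function names are made up for this illustration and are not taken from the iota² chain.

```python
from collections import Counter

# Second-level class -> first-level class, following the list above.
LEVEL2_TO_LEVEL1 = {
    "winter crop": "annual crop",
    "summer crop": "annual crop",
    "grassland": "perennial crop",
    "orchard": "perennial crop",
    "vineyard": "perennial crop",
    "broad-leaved forest": "forest",
    "coniferous forest": "forest",
    "natural grassland": "low natural vegetation",
    "woody moorland": "low natural vegetation",
    "dense urban": "urban",
    "diffuse urban": "urban",
    "industrial and commercial area": "urban",
    "road surface / asphalt": "urban",
    "mineral surfaces": "mineral surface",
    "beaches and dunes": "mineral surface",
    "water": "water",
    "glaciers and perpetual snow": "glaciers and perpetual snow",
}


def aggregate(level2_labels):
    """Map a sequence of second-level labels to first-level labels."""
    return [LEVEL2_TO_LEVEL1[label] for label in level2_labels]


# Toy example: five classified pixels.
pixels = ["vineyard", "orchard", "winter crop", "dense urban", "water"]
level1 = aggregate(pixels)
counts = Counter(level1)
print(counts)
```

The table has 17 entries that collapse into 8 first-level classes, matching the figures given in the text.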

How accurate is it?

Validating a land cover map is not a simple procedure. It requires thinking about:

  • the specification of the classes
  • the validation scale
  • the validation data set

In any case, it is rarely possible to carry out an exhaustive validation over an entire territory. Typically, a statistical validation gives only a partial picture of the accuracy of the resulting map, and cannot identify all the thematic confusions and geometric classification errors.

The rest of this article attempts to assess the accuracy of the 2016 OSO land cover map using data sets provided by partners of the CES OSO. A first validation, intrinsic to the classification process, has been carried out. The statistical results can be seen here.

The land cover sample data set was produced from national databases such as BD Topo, the Registre Parcellaire Graphique (RPG, the French land parcel register) and Corine Land Cover. 70% of these samples were used for training and 30% for the a posteriori validation shown in the figure below. This validation, although relevant, relies on samples generated with the same procedure as the training samples, which somewhat biases the independence of the validation.

Validation of the OSO land cover map with 30% of the samples extracted from the 3 data sets used for the classification (BD Topo, Registre Parcellaire Graphique and Corine Land Cover)

Moreover, we were unable to validate the two annual crop classes of the classification. Indeed, the unavailability of the RPG for 2016 and 2015 (still unavailable at the time of writing) led us to develop a training method based on the principle of domain adaptation, using samples from the 2014 RPG. This method is very well explained here. In any case, it was impossible for us to validate the classification of the 2016 summer and winter crops; only field samples allowed us to do so, and here is the proof!


Venµs in the spotlight in Haute-Garonne and Ariège in 2018

The long-awaited French-Israeli satellite Venµs was launched on 2 August 2017. 110 sites around the world will be observed in 2018 and 2019 at 10 m resolution and in 12 spectral bands. While most sites only correspond to the footprint of a single Venµs scene (27 to 32 km wide east-west by 27 km north-south), the 'Toulouse' site covers a 168 km transect from the north of Haute-Garonne (Grenade) to Spain, through the Ariège Pyrenees (including Mont Valier), extended by a second, 157 km long transect in Spain down to the mouth of the Ebro (online map).


The benefit of choosing such a long transect is the great diversity of soil and climate conditions due to the varied relief of the area, of crop and vegetation types, and of human management practices (type of farming, livestock rearing, etc.), all within a fairly limited number of kilometres. This Venµs transect will thus make it possible to study many different agro-ecosystems.

The Venµs transect, from Toulouse to Spain.

The major asset of the Venµs scientific mission is its very high temporal revisit: each site will be observed every 2 days. By combining Venµs data with those of Landsat 8 and Sentinel-2, the revisit will be almost daily. On the scientific side, the aim is to prepare future operational space missions and to demonstrate the value of a very high temporal frequency. On the thematic side, the two years 2018 and 2019 will make it possible to closely monitor rapidly evolving natural phenomena such as variations in snow cover, crop growth, the phenological stages of the various vegetation types (forests, grasslands, crops, other natural environments), etc. To be fully exploited, these topics will require high-quality field observations over these two years. We are therefore spreading the word, and even calling for volunteers, to collect relevant field data. Below, we list the main topics already planned or potential for each of the 2 large geographical areas of the transect, as well as the main participants identified so far.



Building a global cropland mask is not an easy task

Criticizing is easy, and doing is hard, especially when trying to create a global map of croplands. Some colleagues from CESBIO have worked on that subject within the Sen2Agri project, and obtained good results, but only at the local or country scale. Finding a method that works everywhere must clearly be much harder.

Lately, I have received a lot of emails, tweets and posts about a new global cropland product at 30 m resolution, published by USGS. I have no doubt it was serious work from a serious team, done with appropriate field data and methods, validation, and of course a tremendous amount of data processing.



But there it is, I checked it over a lot of places that I know very well, and it seems to me that the cropland mask, at least in South West France, is clearly overestimated. Is it the same in your region? Here are some examples:


The Delta of the Ebro seen by Venμs


The Delta of Ebro

Delta of the Ebro observed by Venμs on August 18, 2017 (Copyright CNES 2017)

The Ebro (Iberus in Latin) flows into the Mediterranean in Catalonia after a journey of more than 900 km from its source in Cantabria. The current morphology of the delta results from the significant increase in sediment inputs caused in the 15th century by the intensive deforestation of the river catchment: the conquest of the Americas required the construction of ships. It was at this time that the land gained ground on the sea, turning the delta into a wide plain of which only 10% of the land rises above 2 meters in elevation. The construction of 187 upstream dams since 1930 has reduced the sediment flow (from 28 Mt/year to about 0.1 Mt/year), which partly explains the erosion currently observed in some parts of the delta.


80% of the area of the delta is devoted to crops, mostly rice, which accounts for one-third of the country's production. The rest, protected by the Natural Park, consists of lagoons, reed and rush beds, brackish marshes and sandbanks.


The Ebro Delta is included in one of the sites that will be observed by Venμs every two days for two and a half years. It is part of ClimaDat, a long-term climate research network. Different issues will be addressed: land use, phenology, greenhouse gas fluxes (CO2, CH4, N2O), vegetation productivity, role of saline inputs.


Complement: the evolution of the Ebro delta from the 4th century:



The Ebro Delta seen by Venµs (translated from the French version of the post)


The Ebro Delta

The Ebro Delta observed by Venµs on 18 August 2017 (Copyright CNES 2017)

The Ebro (Iberus in Latin) flows into the Mediterranean in Catalonia after a course of more than 900 km from its source in Cantabria. The current morphology of the delta results from the significant increase in sediment inputs caused in the 15th century by the intensive deforestation of the river's catchment: the conquest of the Americas required the construction of ships. It was at this time that the emerged land gained ground on the sea, turning the delta into a wide plain of which only 10% of the land rises above 2 metres in elevation. The construction of 187 upstream dams since 1930 has reduced the sediment flow (from 28 Mt/year to about 0.1 Mt/year), which partly explains the erosion currently observed in some parts of the delta.


80% of the delta's area is devoted to crops, mostly rice, which accounts for one-third of the country's production. The rest, protected by the Natural Park, consists of lagoons, reed beds, rush beds, brackish marshes and sandbanks.


The Ebro Delta is included in one of the sites that will be observed by Venµs every two days for two and a half years. It is part of ClimaDat, a long-term climate research network. Different topics will be addressed: land cover, phenology, greenhouse gas fluxes (CO2, CH4, N2O), vegetation productivity, role of saline inputs.


Complement: the evolution of the Ebro delta since the 4th century: