[MUSCATE News] Production stalled

Sorry to those of you waiting for our real-time products: MUSCATE production is stalled these days. The teams are working hard to put it back into production.


Machine learning benchmarking for land cover map production

Land cover map validation is a complex task. If you read French, you can check this post by Vincent Thierion, which shows how the 2016 LC map of France produced by CESBIO stands with respect to data sources independent from those used for its production. But this is only one aspect of the validation. A land cover map is a map, and therefore there are other issues than checking whether individual points belong to the correct class. By the way, being sure that the correct class is known is not so easy either.


In this epoch of machine learning hype [1], it is easy to fall into the trap of thinking that optimising a single metric accounts for all the issues in map validation. The typical approaches used in machine learning contests are far from enough for this complex task. Let's have a look at how we proceed at CESBIO when we assess the quality of a LC map produced by classification.


Supervised classification, training, testing, etc.

The iota2 processing chain is highly configurable in terms of how the images are pre-processed, how the reference data is prepared, how the classifiers are parameterised, etc. We also continuously add new approaches. During this development work, we need to assess whether a change to the workflow performs better than the previous approaches. To do this, we use standard classification metrics derived from the confusion matrix (Overall Accuracy, κ coefficient, F-Score). The confusion matrix is of course computed using samples which are not used for training, but we go a little further than that by splitting the train and test sets at the polygon level. Indeed, our reference data is made of polygons which correspond to agricultural plots, forests, urban settlements, etc. Since images have a strong local correlation, pixels belonging to the same polygon have a high likelihood of being very similar. Therefore, allowing a polygon to provide pixels for both the training and the test sets yields optimistic performance estimations.
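As a minimal sketch (variable names and sizes are illustrative, not the iota2 API), the polygon-level split can be done by assigning whole polygons, rather than individual pixels, to the train or test set:

```python
import numpy as np

# Toy data: each pixel sample carries the id of the reference polygon
# it was drawn from (names and sizes are made up for illustration).
rng = np.random.default_rng(42)
n_pixels = 1000
polygon_id = rng.integers(0, 50, size=n_pixels)  # polygon of each pixel

# Assign whole polygons to the train or the test set.
unique_ids = np.unique(polygon_id)
rng.shuffle(unique_ids)
n_train_polys = int(0.7 * unique_ids.size)
train_polys = unique_ids[:n_train_polys]

train_mask = np.isin(polygon_id, train_polys)
test_mask = ~train_mask

# No polygon contributes pixels to both sets, so the strong local
# correlation within a polygon cannot inflate the test scores.
assert set(polygon_id[train_mask]).isdisjoint(polygon_id[test_mask])
```

Splitting on polygon ids rather than on pixels is what prevents neighbouring, nearly identical pixels from ending up on both sides of the split.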


Most of our tests are performed over very large areas (at least 25% of metropolitan France, often more than that), which means that, using reference data from Corine Land Cover, we have many more samples than we can deal with. Even in this situation, we perform several runs of training and testing by drawing different polygons for each run, which allows us to estimate confidence intervals for all our metrics and therefore assess the significance of the differences in performance between different parameter settings.
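As a toy illustration of the multi-run procedure, a confidence interval for the Overall Accuracy can be computed from the scores of the individual runs (the values below are made up; the Student's t value for 4 degrees of freedom at 95% is rounded):

```python
import numpy as np

# Overall Accuracy of five independent runs, each trained and tested on a
# different random draw of polygons (these scores are invented).
oa_runs = np.array([0.861, 0.874, 0.868, 0.859, 0.871])

mean = oa_runs.mean()
# Standard error of the mean; with 5 runs, the 95% Student's t value for
# 4 degrees of freedom is about 2.776.
sem = oa_runs.std(ddof=1) / np.sqrt(oa_runs.size)
half_width = 2.776 * sem
ci = (mean - half_width, mean + half_width)

# Two parameter settings are declared significantly different only when
# their confidence intervals do not overlap.
```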


All this is well and good, but it is not enough to assess the quality of the results of a particular algorithm.


Beyond point-wise validation

The data we feed to the classifier are images, and they are pre-processed so that application-agnostic machine learning approaches can deal with them. In iota2, we perform an eco-climatic stratification, which can introduce artifacts around strata boundaries. We also perform temporal gapfilling followed by a temporal resampling of all the data, so that all pixels have the same number of features regardless of the number of available clear acquisitions. After that, we sometimes compute contextual features which take into account the neighbourhood of each pixel; in Convolutional Neural Networks, a patch size is defined; etc.
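For a single pixel, the gapfilling and resampling steps can be sketched as a simple linear interpolation (the dates and values are made up; the actual iota2 implementation works per band over whole tiles):

```python
import numpy as np

# One pixel's NDVI time series; NaN marks a cloudy acquisition
# (dates and values are invented for illustration).
dates = np.array([10.0, 25.0, 50.0, 90.0, 120.0])  # day of year
ndvi = np.array([0.20, 0.35, np.nan, 0.70, 0.50])

# 1. Gapfilling: interpolate the cloudy dates from the valid neighbours.
valid = ~np.isnan(ndvi)
filled = np.interp(dates, dates[valid], ndvi[valid])

# 2. Temporal resampling to a fixed grid, so that every pixel ends up
#    with the same number of features whatever its clear acquisitions.
grid = np.arange(10.0, 121.0, 10.0)
resampled = np.interp(grid, dates, filled)
```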


All these pre-processing steps have an influence on the final result, but most of the time their effect can't be observed in the global statistics computed from the confusion matrix. For instance, contextual features may produce a smeared-out image, but since most of the validation pixels are inside polygons and not on their edges, the affected pixels will not be used for the validation. In our case, the reference data polygons are eroded in order to compensate for possible misregistrations between the reference data and the images. Therefore, we have no pixels on the boundaries of the objects.
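A minimal sketch of this erosion step, assuming the reference polygons have been rasterised to the image grid (the structuring element size is illustrative; in practice it depends on the expected misregistration):

```python
import numpy as np
from scipy.ndimage import binary_erosion

# A rasterised reference polygon (a 5x5 square inside a 7x7 window).
polygon_raster = np.zeros((7, 7), dtype=bool)
polygon_raster[1:6, 1:6] = True

# Eroding with a 3x3 structuring element removes the one-pixel ring at
# the polygon boundary, so no edge pixel reaches training or validation.
eroded = binary_erosion(polygon_raster, structure=np.ones((3, 3)))

assert eroded.sum() < polygon_raster.sum()
```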


In our paper describing the iota2 methodology, we presented some analysis of the spatial artifacts caused by image tiling and stratification, but we lack a metric for them. The same happens when using contextual features or CNNs. The global point-wise metrics increase as the size of the neighbourhood increases, but the maps produced are not acceptable from the user's point of view. The two images below (produced by D. Derksen, a CESBIO PhD candidate) illustrate this kind of issue. The image on the right has higher values for the classical point-wise metrics (OA, κ, etc.), but its lack of spatial accuracy is unacceptable for most users.


Even if we had an exhaustive reference data set (labels for all the pixels), the pixels affected by the over-smoothing are a small percentage of the whole image and would weigh only a little in the global metrics. We are working on the development of quantitative tools to measure these effects, but we don't have a satisfactory solution yet.


How good is your reference data?

All that has been said above does not consider the quality of the reference data. At CESBIO, we have learned many things over the years about the different impacts of reference data quality, both on classifier training and on the map validation step. We have people here who collect data in the field every year on hundreds of agricultural plots. We also have some experience using off-the-shelf reference data. The quality of the results is much better when we use the data collected by our colleagues, and we have a rather good understanding of what happens during training and validation. Ch. Pelletier recently defended her PhD, and most of her work dealt with this issue. For instance, she analysed the impact of mislabelled reference data on classifier training and showed that Random Forests are much more robust than SVMs. She also developed techniques for detecting errors in the reference data.

We also use simple ways to clean the reference data. For instance, when using Corine Land Cover polygons, which have a minimum mapping unit (MMU) of 25 hectares, we use information coming from other databases, as described from slide 34 of this presentation. An illustration of the results is shown below.

There can be many reasons for label noise in the reference data, but the two main ones we face are the MMU and the changes that have occurred since the reference data was collected.


For our 2016 map, we used Corine Land Cover 2012, and we may therefore assume that more than 5% of the samples are wrong because of land cover changes. Hence, when validating with this data, if for some classes we get accuracies higher than 95%, we must be doing something wrong. If we add the MMU issue, then for the classes for which we don't perform the cleansing procedure illustrated above, accuracies higher than 90% should trigger an alarm.
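This sanity check can be made slightly more precise with a back-of-the-envelope model: assuming the map errors and the reference label noise are independent, and that when both are wrong they only agree by chance among the remaining classes, the measured accuracy is bounded as sketched below (a simplification for illustration, not the validation procedure we actually use; the class count is made up):

```python
# Expected agreement between a map with error rate e and a validation set
# with label-noise rate p, assuming independent errors and k equiprobable
# classes (k = 17 is illustrative). A simplification, not a proof.
def expected_agreement(e: float, p: float, k: int = 17) -> float:
    both_right = (1.0 - e) * (1.0 - p)
    errors_coincide = e * p / (k - 1)  # both wrong, and on the same label
    return both_right + errors_coincide

# A perfect map (e = 0) scored against a reference with 5% label noise
# cannot measure above 95%.
assert abs(expected_agreement(0.0, 0.05) - 0.95) < 1e-9
```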


Our ML friends like to play with data sets to improve their algorithms. Making domain-specific data available is a very good idea, since ML folks get something to compete on (this is why they work for free for Kaggle!) and they provide us with state-of-the-art approaches to choose from. This is the idea behind the TiSeLaC contest by D. Ienco and R. Gaetano: they used iota2 to produce gapfilled Landsat image time series, took reference data like the ones we use at CESBIO to produce our maps (a mix of Corine Land Cover and the French Land Parcel Information System, RPG), and provided something the ML community can easily use: CSV files with labelled pixels for training and validation.


The test site is Réunion Island, which is more difficult to deal with than metropolitan France, mainly because of the cloud cover. Even with the impressive (ahem…) temporal gapfilling from CESBIO that they used, the task is difficult. Add to that the quality of the reference data set, which is based on CLC 2012 for a 2014 image time series, and the result is a daunting task.


Even with all these difficulties, several teams achieved F-Scores higher than 94%, and two of them were above 99%. It seems that Deep Learning can generalise better than other approaches, and I guess that the winners used this kind of technique, so I will assume that these algorithms achieve perfect learning and generalisation. In that case, the map they produce is perfect. The issue is that the data used for validation is not perfect, which means that an algorithm achieving nearly 100% accuracy not only has the same amount of error as the validation data, but also that its errors fall on exactly the same samples!


I don't have the full details on how the data was generated and, from the contest web site, I can't know how the different algorithms work [2], but I can speculate on how an algorithm can achieve 99% accuracy in this case. One reason is of course over-fitting [3]. If the validation and training sets are too similar, the validation does not measure generalisation capabilities, but rather reproduces the results obtained on the training set. Several years ago, when we were still working on small areas, we observed this kind of behaviour due to a correlation between the spatial distribution of the samples and local cloud patterns: although training and test pixels came from different polygons, for some classes they were close to each other and cloudy on the same dates, so the classifier was learning the gapfilling artifacts rather than the class behaviour. We made this mistake because we were not looking at the maps, but only optimising the accuracy metrics. Once we looked at the classified images, we understood the issue.



In this era of kaggleification of data analysis, we must be careful and make sure that the metrics we optimise are not too simplistic. It is not an easy task, and for some of the problems we address, we don't have perfect reference data. In other situations, we don't even have the metrics to measure the quality.

The solutions we use to solve mapping problems need an additional validation beyond the standard machine learning metrics.



[2] Please correct me if my assumptions are wrong!


[3] Deep neural networks are able to fit random labels by memorising the complete data set.

[MUSCATE News] A difficult start of 2018 for our production center

As you have probably noticed, our production rate has been very low these days and we are more than 10 days late in our delivery of L2A and snow cover products.

This seems to be due to an intervention on the CNES cluster at the end of December to add new nodes and disk space. MUSCATE sometimes loses communication with the platform that handles the databases and crashes. As this also happened while CNES was closed for Christmas, we really lost a lot of time. All the teams are on deck trying to solve this issue and catch up on the delay. We are very sorry for the inconvenience.


Revised spectral bands for Sentinel-2A

The Sentinel-2 mission status document, edited by ESA, is a very interesting read. In its last edition of 2017, ESA announced very discreetly that the spectral bands of Sentinel-2 had been revised, following a review of the pre-flight measurements. Very few details are provided on the nature of the error in the previous version, or on the validation of the new one. Still, a new version of the spectral response functions has been available here since the 19th of December 2017. The site provides an Excel file with the spectral response functions.

All the visible and near-infrared bands have changed a little, even if only three bands show significant changes, B1, B2 and B8: the equivalent wavelength of B2 changes by 4 nm, that of B1 by 1 nm, and that of B8 by 2 nm. The SWIR bands did not change.
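For reference, the equivalent wavelength quoted above is the mean wavelength weighted by the spectral response function; a sketch with a made-up, Gaussian-shaped response (not the real Sentinel-2 SRF) looks like this:

```python
import numpy as np

# Invented Gaussian response centred at 492 nm, roughly B2-like.
wavelengths = np.arange(440.0, 541.0, 1.0)  # nm, 1 nm step
srf = np.exp(-0.5 * ((wavelengths - 492.0) / 20.0) ** 2)

# Equivalent wavelength = SRF-weighted mean wavelength; comparing the old
# and new SRFs in this way yields the few-nanometre shifts quoted above.
equivalent_wl = np.sum(wavelengths * srf) / np.sum(srf)
```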

Old and new versions of five VNIR S2A spectral bands, together with that of S2B.

Most users should not use bands B1 and B2, as they are affected by atmospheric effects, so I do not think many of you will have to change the coefficients in your methods. But for us, who take charge of the atmospheric correction and heavily rely on B1 and B2, it probably has an effect, and we are updating our look-up tables to account for it. Stay tuned for the results.

Best wishes for 2018!


May this new year bring you happiness, and not only related to image time series!


As usual, this beginning of the year brings the opportunity to look back on 2017. Here is what I would single out in our field of interest:

  • the consecration of the Copernicus programme and of the Sentinel satellites. Since 2015, more than 110,000 people have registered to access the data! In my opinion, this success is due to the combination of several factors: the data are free and easy to access, the observations are repetitive, regular and frequent worldwide, and the data are of high quality. Congratulations to ESA and the EU, not to mention the contribution of CNES to the quality of Sentinel-2 images and the calibration of Sentinel-3.


Venµs mission status, 5 months after launch

Here is some news about the status of the Venµs mission.

Venµs was successfully launched from the Kourou space port on August 1st by a VEGA launcher:




The first images were acquired by mid-August (see below for examples). The commissioning phase is still running. This phase consists in checking the whole system, including the satellite, the camera, the download of images and data to the Kiruna receiving station, the ground processing chains, and the geometric and radiometric calibrations. All these components are in good health and working well. However, given the very demanding requirements in terms of multi-temporal registration, more work than anticipated is needed to fine-tune the AOCS (Attitude and Orbital Control Subsystem) and the processing algorithms. In addition, the first part of the in-flight demonstration of the electric engine developed by RAFAEL (called IHET) will take place from mid-December to mid-January.


For these reasons, CNES and ISA plan to resume the systematic acquisitions of the scientific sites in early 2018. Preparing the reference images for every site and checking the quality of the time series will also take some weeks. We anticipate delivering the Level 1 products (top-of-atmosphere reflectances, orthorectified) by April 2018, but all the data acquired from the beginning of systematic acquisitions will be processed and made available on the Theia web site:



Level 2 data (top-of-canopy reflectances) might be available slightly later, since the method we use requires a time series of data.

You will find some examples of images by following the links below:



and here:






The Moon is also acquired for calibration monitoring purposes:



We thank you for your patience. We are doing our best to provide you with quality products.

Happy New Year to you all

Our blog's audience in 2017


A sixth year begins for the "Séries temporelles" blog, and as usual, it is an opportunity to review its audience and indulge in a little self-satisfaction.

The blog keeps receiving more visits, even if the annual growth rate is much lower than in previous years; it is still 20%, though. French visitors now constitute only 35% of the visits. The United States ranks second, followed by Morocco, which is probably an effect of CESBIO's long presence in Marrakesh. Then come the countries neighbouring France.


                          2013    2014    2015    2016    2017
Number of visits         13985   22928   34723   47773   57692
Number of viewed pages   30922   46940   66947   89555  105846



Data upload on MUSCATE distribution server is slow

It seems that the upload of data to the MUSCATE distribution server has been quite slow these days. This results in a delay in the provision of data. As CNES is closed between Christmas and New Year, we will have to wait a little to have it repaired. We are sorry for the inconvenience. But you can use this opportunity to rest and spend time with your families without having to process our scenes :) . We wish you all a happy Christmas week.
