The importance of a good cloud mask for the operational use of Sentinel-2 data

=>

Top of Atmosphere reflectance in the 4 high resolution channels of Sentinel-2

The above plot shows the TOA reflectance time series gathered by Sentinel-2 over a pixel chosen randomly in a tile in the centre of France (tile 31TDK, pixel 3000-7000), from L1C products.  Looking at the time series, it is rather difficult to tell what kind of surface was observed, even if a vegetation cycle seems to be present.  As we will see below, most of the observed noise is due to the presence of clouds or cloud shadows.

Continue reading

Patagonian skies are not cloudy anymore

"The most usual weather in these latitudes is a fresh wind between north west and south west with a cloudy overcast sky" - Phillip Parker King, Sailing Directions for the Coasts of Eastern and Western Patagonia (1832).

 

Patagonia is a beautiful place to visit but campers know that the weather is extremely variable and the sky is often cloudy. This can be a problem for glaciologists, too, since they rely on optical satellite imagery to study glacier area changes over the last decades (mainly Landsat). Clear-sky optical images can also be used to determine glacier velocity, albedo, front variations, etc.
Continue reading

MACCS/MAJA, how it works

=> 


MACCS (Multi-sensor Atmospheric Correction and Cloud Screening) is a level 2A processor, which detects the clouds and their shadows, and estimates aerosol optical thickness (AOT), water vapour and corrects for the atmospheric effects. The processor was jointly developed by CESBIO and CNES. CESBIO developed the methods and a prototype, while CNES funded the operational version of the processor, with a strong support from CESBIO for the validation.

 

More recently CNES+CESBIO and DLR joined their efforts to develop a joint processor named MAJA, for MACCS-ATCOR Joint Algorithm. MAJA is an evolution  of MACCS, in which a couple of methods inspired by ATCOR software have been added. MAJA V1_0 could have been called MACCS V6.0, but we wanted to celebrate the association of both entities with a new name.

MAJA runs at CNES within the Muscate ground segment of Theia (data are available here) and within the Venµs ground segment. And finally, MAJA is freely available in binary version for non commercial use.

 

MAJA's distinctive feature is its dedication to high resolution time series and its wide use of multi-temporal methods. For this reason, MAJA can only be applied to the optical missions which observe the earth under constant viewing angles. However, MAJA was already applied to several satellites :

 

MAJA is briefly described in the joined figure.

Atmajaspheric absorption

In the case of Sentinel-2 and Venµs, which include a water vapour channel at 940 nm (resp. 910nm) in a strong water vapour absorption band, a first step consists in estimating the atmospheric water vapour content. For the other satellites, weather analysis data can be used. After that, the processor can correct for the gaseous absorption using the SMAC model.

 

Composite imajage

The next steps deeply involve multi-temporal methods. Of course, to do that, a time series must be processed in chronological order. After each processing, a composite image is updated with with the unclouded pixels from the processed date. This composite image is used as a reference for the cloud detection and the AOT estimate.

 

The cloud majasks

Our cloud detection method is based on a large number of tests, the most efficient of which are :

  • a test based on the  cirrus band (at 1380 nm), available on Landsat 8 and Sentinel-2, which detects very well the high clouds (above 2000m)
  • a multi-temporal test, which detects a steep increase of the blue surface reflectance, which is the sign of presence of a cloud.
  • and finally, to avoid over detections of clouds, for each potential detected by one of the previous tests, a last test measures the correlation of the pixel neighbourhood with the previous images. As it is unlikely that two different clouds at the same location on successive dates have the same shape, if a large correlation is observed, the pixel is finally not declared as a cloud.

Having detected the clouds, we can follow with the detection of  cloud shadows, water, and snow.

Aerosol optical thickness estimajate

The aerosol optical thickness (AOT) estimate combines several criteria in the computation of a global cost function, which is then inverted using non linear least mean squares inversion.

  • A multi-temporal criterion : after atmospheric correction, two successive observations of the same neighbourhood should provide nearly the same surface reflectances. The squared residuals after atmospheric correction are inserted in the cost function.
  • A multi-spectral criterion : above vegetation, and also above many bare soils, the surface reflectance in the blue is close to half the reflectance in the red. The squared residuals to this relation after atmospheric correction are also added to the cost function.
  • Optical Thickness minimum and maximum : AOT cannot be negative, and should not get higher that the one measured using the dark pixel method. When the AOT values are above maximum or under minimum, a high cost is added to the cost function.

 

The cost function evaluation is evaluated using neighbourhoods of coarse resolution pixels (240m), spreading over 2 kilometres. The obtained AOT images are then smoothed, the gaps are filled to obtain finally an AOT map with a 5 km resolution. The aerosol type is not estimated, it is a processing parameter which can be fixed per geographic region.

Atmajaspheric correction

One of he quicklooks we produce with each image for visual verification, here for Chiapas site in Mexico, with the TOA reflectance, top left, the AOT and cloud mask, bottom left, surface reflectance after adjacency effect correction, top right, and the same with slope correction, bottom right.

 

Once the AOT is known, we can retrieve the surface reflectances. To do that, we are using look-up tables (LUT) which are computed using the SOS radiative transfer code (Successive Orders of Scattering, Lenoble, 2007). These LUT are also used in the AOT estimation.

 

The surface reflectance of the cloud free pixels obtained there are used to update the composite image, which will be used for the processing of the next image in the time series.

 

Before editing the output product, we still need to correct for two other points, already described in this blog : the adjacency effects and the effects of terrain slopes on the illumination.

 

MAJA development started in 2005, and the contributor list is starting to be quite long :

  • at CESBIO : H.Tromp, V. Debaecker, M. Huc, P.Gely, Bastien Rouquié and O.Hagolle,
  • at CNES : B. Petrucci, D.Villa-Pascual, Camille Desjardins, Pierre Lassalle
  • at DLR : A. Makarau and R.Richter
  • at CS-SI : T.Feuvrier, C.Ruffel, A.Bricier and many others
  • at CAP GEMINI : M.Farges, G. Rochais, E.Durand
  • at Magellium : E. Hillairet

For more details, we have published 4 papers about MACCS methods and validation :

  • A multi-temporal method for cloud detection, applied to FORMOSAT-2, VENµS, LANDSAT and SENTINEL-2 images, O Hagolle, M Huc, D. Villa Pascual, G Dedieu, Remote Sensing of Environment 114 (8), 1747-1755
  • Correction of aerosol effects on multi-temporal images acquired with constant viewing angles: Application to Formosat-2 images, O Hagolle, G Dedieu, B Mougenot, V Debaecker, B Duchemin, A Meygret, Remote Sensing of Environment 112 (4), 1689-1701

 

A dark cloud over Kiev on the 9th of June

=>

These days, Mireille Huc is spending a lot of time to enhance the cloud shadow detection method applied to time series. Our MACCS method tends to forget some shadows when they are partly hidden under the cloud. We will explain in a future article the defects of the present method and how we will mitigate them.

 

When checking our results, we found out a very particular case, on June the 9th in 2015, on the time series acquired near Kiev with SPOT5 (Take5).  The images are shown below :

 

May 25th

June 9th

June 14th

 

Quicklook de l'image du 9 juin avec nuages entourés en vert et ombres entourées en noir

The dark zone in the image center was not classified as a cloud shadow, as shown by the quicklook. Because it is not a cloud shadow, the sun is in the South-East direction, and the shadows are cast to the North-West. It does not look like the footprint of a flooding, or of a forest fire, and there was no solar eclipse on that day...

 


In fact, a close up on the SWIR image, which is sensitive to the thermal emission by high temperatures, shows that it is a black smoke cloud, due to a fire at the North East of the cloud.  Duckduckgo gave us the answer, it was the fire of a fuel depot (which caused some casualties).

 

Our multi-temporal methods pour cloud detection and aerosol estimates is disturbed by this dark cloud. The surface reflectance drops and then increases again, the drop is not detected as a shadow, but the subsequent increase is interpreted as a cloud. The aerosols are also inaccurately estimated, since usually, an increase of the aerosol quantity causes an increase of the reflectance, but here, the aerosol are so absorbing that the reflectance decreases.

 

USGS releases LANDSAT 8 data corrected from atmospheric effects

=>

Since Christmas (almost, December 23rd), USGS distributes LANDSAT 8 data corrected from atmospheric effects, as they already did for LANDSAT 5 and 7. These data have been long awaited and will be very useful.

 

Of course, USGS processing addresses the whole world. To access these products, you should use the earthexplorer.server. You will find the data by clicking on LANDSAT_CDR, as shown in the image on the right. You will have to order for their processing before being able to download them. Usually this processing is very quick.

 

We did not yet compare the USGS products developed by NASA (E.Vermote) with our Level 2A LANDAT 8 data acquired over France and produced and distributed by THEIA with CESBIO processor, but we will work into it shortly.

 

LANDSAT 5 & 7 acquired above France since 2009 soon released by THEIA

=>

As for LANDSAT 8 a few weeks ago, we just produced the level 2A products for the LANDSAT 5 et LANDSAT 7 data acquired above France from 2009 to 2011. This data set will be released in a few days, when its transfer to the distribution server has ended. The MUSCATE team took charge of the processing for the THEIA Land data center, using CNES computing center. The data will be available on the following site :

http://spirit.cnes.fr/resto/Landsat/

Example of a Level 2A product from LANDSAT 5 over the Atlantic coast of France. The clouds are circled in green, the water mask in blue, and the snow in pink. Sometimes, the water turbidity provides a signal similar to snow in the infra-red, which turns the now flag on...

The processing methods and the data format are similar to the LANDSAT 8 data set described here. However there are also a few differences which are detailed below :

 

Starting point.

The starting point is not the same for LANDSAT 5 and LANDSAT 7 :

  • For LANDSAT 7, as for LANDSAT 8, we start from Level 1T products provided by USGS. These products have a huge defect, with black stripes appearing away from the center axis of the image. These stripes are due to the breakdown of a mirror in 2003. The origin of this defect is described here. In our case, we decided to use only the central part of the images, doing a slight interpolation when the stripes are thin, and removing the data when the lack stripes are too large.
  • LANDSAT 5 data acquired above France are not yet available at USGS. ESA owns these data and agreed to provide them  (Thanks to Bianca Hoersh and Riccardo Biasutti from ESA, and to the SERCO company who processed the data). As a result, this data set is a unique data set, only available online here and nowhere else ! These data are provided at level 1G, for which the data have not yet been ortho-rectified.  We had to ortho-rectify them at Theia, using CNES's SIGMA tool, as for SPOT4 (Take5).
  • Having a different approach for both sensors has a drawback. The grounc control point data base used at USGS seems to have some errors in France, and for instance, the location errors near Toulouse have a bias of about one pixel southward. It is not the case for the LANDSAT5 data ortho-rectified by THEIA, and therefore, one may observe registration errors in a time series involving LANDSAT 5 and LANDSAT 7 images. ESA's data also have some defects, which are presented at the end of this post.
Resampling to Lambert'93 projection

Level 1T data are provided with the UTM projection. This projections uses three different zones over France, for which the registration of data is not direct. We decided to resample the data on a Lambert'93 projection, which is the official French projection. Of course, the LANDSAT5 have been directly projected in Lambert 93.

Tiling of products

We chose to tile the data in 110*110 km tile s spaced with a 100 km interval, as it will be done for Sentinel-2. The (1,1) tile is in the SouthWest corner of France. The tile of Toulouse is the 5th to the West, and the 2nd to the North. It is named D0005H0002 (D for "droite", H for "Haut"). For Corsica, a different tiling made of 2 tiles was defined.

 

For each tile, we provide the whole set of dates for which a LANDSAT 5 or 7 image intersects the tile. A few date may be missing, for several reasons, in general related to the cloud cover :

  • The image was not acquired by LANDSAT 5 or 7 (when a 100% cloud cover is forecast, the image is not acquired).
  • The image was acquired but not processed to L1T by LANDSAT7 at USGS, or to L1C at CNES, because the cloud cover prevented from using a sufficient number of ground control points
  • The Level 2A processing rejects images with more than 90% of cloud cover.

 

Level 2A processing (atmospheric correction and cloud screening)

First of all, we would like to outline that our processor does not process the themal bands of LANDSAT

 

For the visible, near and short wave infrared bands, we use the same method as for SPOT4(Take5). It involves also the MACCS processor, developed and maintained by Mireille Huc at CESBIO. It is based on multi-temporal methods for cloud screening, cloud shadow detection, water detection as well as for the estimation of the aerosol optical thickness.

 

However, thanks to LANDSAT spectral bands, our processing was enriched compared to SPOT4 (Take5). Thanks to the blue band, we have an additional criterion to detect the aerosols, thanks to the quasi constant relationship between the surface reflectances in the blue and in the red above vegetation. The precision gain due to this criterion compensates for the precision loss due the lower repetitivity of  LANDSAT images. Finally, as there is no 1.38 µm band on LANDSAT 5 and 7, the detection of high clouds is much less easy than for LANDSAT 8.

 

Images of one of the Atlantic Coast tiles, coming from different LANDSAT Paths (left and middle, tracks 201 and 200). The viewing angles are slightly different as the left image was observed from the West and the rmiddle image from the East. On the right, a Landsat7 image from track 201 reduced to its central part.

To increase repetitivity of observation which is essential in our multi-temporal method, we decided to use time series that merge LANDSAT 5 and LANDSAT 7 data as well as LANDSAT 5 data coming from adjacent tracks. As these data are not observed under the same viewing angle (+/- 7 degrees), but the angle difference is small enough to increase precision on the overlap zones, even if it may cause the appearance of artefacts in the AOT images.

 

Data Format

For LANDSAT 5 and 7, we used the same format as for SPOT4 (Take5).

 

Known issues :

Here is a little list of known defects for THEIA's LANDSAT 5 and 7 L2A products :

Example of LANDSATV5 "afterglow" issue near a large cloud. This electronic issue takes the appearance of whiter stripes above vegetation.

- reference data for ortho-rectification at USGS may be biased by more than 30 m (38 m in Toulouse). The Landsat 8 data could be misregistered with the LANDSAT 5 data ortho-rectified at CNES using a national geographical reference.

- LANDSAT 5 TM instrument electronics have an "afterglow" issue, that causes the appearance of whiter stripes perpendicularly to the satellite track near very bright zones such as a large cloud.

- ESA's LANDSAT 5 products have some random bright spots that appear as colored spots in color composites.

Bright color spots observed on some ESA LANDSAT 5 images.

- in LANDSAT products, the "nodata" value that tells if a pixel is outside the image is 0, which is also a value observed within the image. Sometimes pixels may be identified as nodata when the are in fact within the image. It happens mainly over sea, where the medium infrared reflectance is often equal to zero. In this case, all the bands have the nodata value which, in our products is -10 000, to avoid the same difficulties for subsequent users.

Completion of the processing of LANDSAT 8 Level 2A products taken above France in 2013.

=>

That was fast ! The processing of all the LANDSAT 8 images taken above France in 2013 took less than 15 days. The first LANDSAT 8 images were taken in April 2013. The MUSCATE team processed the data for the THEIA land data center, using CNES computing center.

 

A few more days will be necessary to upload the data on the THEIA website and to check that the data are correct. Finally, the longest part in the processing is the downloading of the input Level 1T products from USGS earthexplorer website (equivalent to the Level 1C in THEIA's nomenclature).

 

The Level 2A data quality is quite good, as may be seen on the browse products on the right, as shown by images on the right, which come from the times series obtained on the tile of Paris. As usual on this blog, the clouds are circled in green, the shadows in black, the snow in pink and the water in blue. A few clouds are sometimes missed by our multi-temporal method, when the repetitivity of cloud free acquisition is too low, as in the image on the right which was acquired during a cloudy spring. The following images in the time series are not affected by this kind of defect.

 

This paper aims at describing the main steps of the processing.

 

Starting point : Level 1T

We download the input data from the earthexplorer website, using an enhanced version of the script described here. These products are ortho-rectified by USGS, using a global data base of ground control points.

 

The location requirement for LANDSAT 8 is 50 m, which seems to be met by the L1T products. We found location errors around 1.5 pixels near Toulouse, but most regions seem to have better performances. USGS confirmed a 38m bias Southward near Toulouse and will try to correct them.  Our processing does not correct for these small errors, and the next version of the USGS LANDSAT 8 processing only wil lcorrect for this bias.

 

Regarding LANDSAT 8 absolute radiometric calibration, we use the coefficient values recommended by LANDSAT 8 and provided with the L1T products.

 

Resampling to Lambert'93 projection


Level 1T data are provided with the UTM projection. This projections uses three different zones over France, for which the registration of data is not direct. We decided to resample the data on a Lambert'93 projection, which is the official French projection.

Tiling of products

We chose to tile the data in 110*110 km tiles spaced with a 100 km interval, as it will be done for Sentinel-2. The (1,1) tile is in the SouthWest corner of France. The tile of Toulouse is the 5th to the West, and the 2nd to the North. It is named D0005H0002 (D for "droite", H for "Haut")

 

For Corsica, a different tiling made of 2 tiles was defined.

 

For each tile, we provide the whole set of dates for which a LANDSAT 8 image intersects the tile. A few date may be missing, for several reasons, in general related to the cloud cover :

  • The image was not acquired by LANDSAT 8 (when a 100% cloud cover is forecast, the image is not acquired).
  • The image was acquired but not processed to level 1T by LANDSAT8, because the cloud cover prevented from using a sufficient number of ground control points
  • The Level 2A processing rejects images with more than 90% of cloud cover.

 

Level 2A processing (atmospheric correction and cloud screening)

First of all, we would like to outline that our processor does not process the themal bands of LANDSAT 8.

For the visible, near and short wave infrared bands, we use the same method as for SPOT4(Take5). It involves also the MACCS processor, developed and maintained by Mireille Huc at CESBIO. It is based on multi-temporal methods for cloud screening, cloud shadow detection, water detection as well as for the estimation of the aerosol optical thickness.

 

However, thanks to LANDSAT 8 spectral bands, our processing was enriched compared to SPOT4 (Take5) : LANDSAT8's 1.38µm band enables an enhanced detection of high and thin clouds. And thanks to the blue band, we have an additional criterion to detect the aerosols, thanks to the quasi constant relationship between the surface reflectances in the blue and in the red above vegetation. The precision gain due to this criterion compensates for the precision loss due the lower repetitivity of  LANDSAT8 images.

Level 2A images from Paris's tile, from 3 different LANDSAT 8 tracks (From left to right, tracks 200, 199, 198). The viewing angle differs as the image is from the west on the left image, at nadir in the center and from the east for the right image.

 

To enhance the cloud screening accuracy, we decided to use the data from adjacent satellite tracks within the same time series. These data are not acquired under exactly the same angle (+/- 7 degrees), which is the assumed by the multi-temporal method, but the difference is small enough to allow a large accuracy gain due to the enhanced repetitivity. However, because of this approximation, a few artefacts may be observed.

 

For a greater enhancement, we might also use LANDSAT 7 and LANDSAT 8 data in the same time series, but we will implement that later on...

 

Data Format

For LANDSAT 8, we used the same format as for SPOT4 (Take5), excepted a few details, that I will describe soon...

 

 

 

SPOT4(Take5) : Cloud statistics after one month

=>

We have now received all the L1A images of the SPOT4(Take5) experiment taken between January the 31st and March the 10th, for which at least some part of the surface is visible. We ortho-rectify these images to obtain level 1C products, but sometimes, the cloud cover is still too high to process the image. We can use all these productions to derive some statistics about cloud cover.

 

Proportion of images processed at Level 1A and Level 1C for the sites selected by each agency.
Institution Images acquired L1A processed L1C processed % L1A % L1C
CNES 324 184 157 56 % 49 %
JRC 54 29 27 53 % 50 %
ESA 84 41 34 49 % 40 %
NASA 48 26 26 54% 54%
CCRS 6 1 1 17 % 17 %

 

Between 40% and 50% of the images taken are sufficiently clear so that the ortho-rectification is feasible. When the production of all cloud masks (level2A) is finished, we will be able to compute the number of cloud free observations for each pixel.

After having looked at all the images in Europe or North Africa, we can confirm that all the pixels of these sites have been observed at least once without clouds, except for 3 sites : CAlsace, EBelgium and CTunisia (!). For the site in Alsace, we had to wait until the 4th of March, and until the 10th of March for the site in Tunisia. And up to now, only a little part of the site in Belgium has been observed, on the 8th of March.

 

Number of images acquired in February,
as a function of their cloud cover
Site Clouds < 10% 10% < Clouds < 50% 50% < Clouds < 80% 80% < Clouds
Alpes 2 0 2 2
Alsace 0 0 0 6
Ardèche 1 1 0 4
Loire 1 0 3 2
Bretagne 1 0 1 4
Languedoc 0 2 2 2
Provence 2 3 1 0
SudmipyO 1 1 1 3
SudmipyE 1 1 1 3
VersaillesE 2 0 1 3

In France, despite a very cloudy month of February, the 5 days repetitivity enabled to observe nearly each site at least once. But if SPOT4 had only imaged one out of two overpasses, only the sites in Versailles, Provence and the Alps would have been observed in any case.

 

This result confirms that it is absolutely necessary to launch both Sentinel-2 satellites with a short time interval, so enable the numerous operational applications that need to rely on a monthly clear observation. And it would be a pity if the recent GMES/Copernicus budget cuts resulted in delaying the Sentinel-2B satellite, reducing the repetitivity to only 10 days for several long years.

SPOT4(Take5) first cloud masks

=>

Now that you know almost everything on our cloud detection method and on our shadow detection method, we can show you the first results obtained by Mireille Huc (CESBIO) with SPOT4(Take5) time series. As the method is multi-temporal, it needs an initialisation phase, and we had to wait until we had a sufficient number of images to produce the masks. These first results are not (yet) perfect, but are already quite presentable.

 

The images shown below are a series of 6 Level 1C images, expressed in Top of Atmosphere reflectance, with the contours of several masks orverlayed : the clouds are circled in green, their shadows in black, the water and snow mask are respectively circled in blue and pink. You may click twice on the images to see the details of the masks. These images were acquired in Provence (France), each of them is made from 4 (60x60 km²) SPOT Images obtained on the same day, ortho-rectified, then merged.

 

Most clouds are detected, including very thin clouds, while the number of false cloud detections is very low. Most large cloud shadow are also detected, even if a few of them were missed. The water mask is also quite accurate with nearly no false detections, taking into account it is produced at 200m resolution. The snow is well classified when the snow cover is high, but often, pixels with a moderate snow cover are classified as clouds. This is a classical difficulty with snow masks.

 

However, we know that your sharp eyes will have noticed some very thin clouds partly missed by our classification in the North East of the first image, a few false cloud detections on the 3rd and the 5th images (the ground dries and becomes brighter and whiter), some missed cloud shadows for some small clouds once in a while (we know why, it is an initialisation problem, but quite long to explain...). The cloud detection threshold for water pixels (the method is different from the cloud detection above land), is maybe a little to low, as some bright Camargue Lakes are wrongly classified as cloudy. But after all, for a first run, the result is not bad, and we will refine all the parameters when we have a sufficient number of images.

On the Fourth Image, only two of the 4 (60*60 km²) images are available, because the two others are too cloudy to be ortho-rectified, as we need to see the surface to take ground control points. In fact, the ortho-rectification step is the first of our cloud masking steps.

 

The clouds are circled in green, their shadows in black, the water and snow mask are respectively circled in blue and pink. You may click twice on the images to see the details of the masks.