3. Image classification
Objective
Supervised classification is arguably one of the most important classical machine learning techniques in remote sensing. Applications range from generating Land Use/Land Cover maps to change detection. Earth Engine is uniquely suited to performing classifications at scale. The goal of today's session is to provide an overview of land cover classification routines in the Earth Engine environment. Our aim is to provide examples of typical workflows that you can customize and implement depending on your individual needs.
Supervised classification
The increasing availability and accessibility of Earth observation imagery provides significant opportunities to assess the status of and monitor changes in land cover. To unlock this capability, supervised and unsupervised classification methods can be applied. Here, we will focus on supervised classification, as it is the most commonly implemented approach.
Classification workflow
The Classifier package handles supervised classification by traditional ML algorithms running in Earth Engine. These classifiers include CART, RandomForest, NaiveBayes and SVM. The general workflow for classification is:
Collect training data. Assemble features which have a property that stores the known class label and properties storing numeric values for the predictors.
Instantiate a classifier. Set its parameters if necessary.
Train the classifier using the training data.
Classify an image or feature collection.
Estimate classification error with independent validation data.
The training data is a FeatureCollection with a property storing the class label and properties storing predictor variables. Class labels should be consecutive integers starting from 0. If necessary, use remap() to convert class values to consecutive integers. The predictors should be numeric. The training data can be points or polygons representing homogeneous regions. For polygons, every pixel in each polygon is a training point.
// Make a cloud-free Landsat 8 TOA composite (from raw imagery).
var l8 = ee.ImageCollection('LANDSAT/LC08/C02/T1');
var image = ee.Algorithms.Landsat.simpleComposite({
collection: l8.filterDate('2018-01-01', '2018-12-31'),
asFloat: true
});
// Use these bands for prediction.
var bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B10', 'B11'];
Map.setCenter(-62.9136, -9.1308, 12);
Map.addLayer(image, {bands: ['B7', 'B5', 'B3'], min: 0.05, max: [0.15, 0.35, 0.30], gamma: 2}, 'Landsat composite');
/* HERE YOU NEED TO TAKE ACTION:
1. Use the geometry tool (points or polygons) to define three classes/geometries,
named "Forest", "Bare", and "Water". Draw at least three polygons (per class)
that are representative for each land cover class on top of the Landsat composite.
2. Import each geometry as a "Feature" (edit layer properties menu)
3. Add to each feature a property called "class" and assign them values:
Forest = 0, Bare = 1, Water = 2
*/
// Combine your features to one FeatureCollection
var polygons = ee.FeatureCollection([Forest, Bare, Water]);
// 1. Collect training data.
// Get the values for all pixels in each polygon in the training.
var training = image.sampleRegions({
// Get the sample from the polygons FeatureCollection.
collection: polygons,
// Keep this list of properties from the polygons.
properties: ['class'],
// Set the scale to get Landsat pixels in the polygons.
scale: 30
});
// Investigate the training FeatureCollection
print('Training spectra per class', training.limit(11));
// 2. Instantiate a classifier.
// Create a Support Vector Machine (SVM) classifier with custom parameters.
var classifier_svm = ee.Classifier.libsvm({
kernelType: 'RBF',
gamma: 0.5,
cost: 10
});
// 3. Train the classifier.
var trained_svm = classifier_svm.train({
features: training,
classProperty: 'class',
inputProperties: bands
});
// 4. Classify the image.
var classified_svm = image.classify(trained_svm);
// Display the classification result.
Map.addLayer(classified_svm,
{min: 0, max: 2, palette: ['green','red', 'blue']},
'classification svm');

In this example, your training data stores the land cover class label. Note that the training property ('class') stores consecutive integers starting at 0 (use remap() on your table to turn your class labels into consecutive integers starting at zero if necessary). Also note the use of image.sampleRegions() to get the predictors into the table and create a training dataset. To train the classifier, specify the name of the class label property and a list of properties in the training table that the classifier should use as predictors. The number and order of the bands in the image to be classified must exactly match the order of the properties list provided to classifier.train(). Use image.select() to ensure that the classifier schema matches the image.
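For illustration, here is a minimal sketch of those last two points. The table name ('table') and its original class codes (10, 20, 30) are hypothetical placeholders; FeatureCollection.remap() rewrites them as 0, 1, 2, and image.select() restricts the composite to the predictor bands so its schema matches the trained classifier.

// Hypothetical table with class codes 10, 20, 30 stored in a 'class' property.
// Remap them to consecutive integers starting at 0.
var remappedTable = table.remap([10, 20, 30], [0, 1, 2], 'class');

// Restrict the image to the predictor bands so that the band names and order
// match the inputProperties the classifier was trained with.
var imageMatched = image.select(bands);
var classifiedMatched = imageMatched.classify(trained_svm);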

The example above uses a Support Vector Machine (SVM) classifier (Burges 1998). Note that the SVM is specified with a set of custom parameters. Without a priori information about the physical nature of the prediction problem, the optimal parameters are unknown. See Hsu et al. (2003) for a rough guide to choosing SVM parameters.
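One practical (and here purely illustrative) way to narrow down the parameters is to hold out part of your samples and compare the validation accuracy of a few candidate settings. The split fraction, the candidate gamma values, and the fixed cost below are assumptions, not recommendations.

// Illustrative parameter check: split the training table so that accuracy is
// not estimated on the same samples the classifier was fitted to.
var withRandom = training.randomColumn('random');
var trainSet = withRandom.filter(ee.Filter.lt('random', 0.7));
var testSet = withRandom.filter(ee.Filter.gte('random', 0.7));

// Train one SVM per candidate gamma value and print its validation accuracy.
[0.1, 0.5, 1].forEach(function(g) {
  var clf = ee.Classifier.libsvm({kernelType: 'RBF', gamma: g, cost: 10})
    .train({features: trainSet, classProperty: 'class', inputProperties: bands});
  var accuracy = testSet.classify(clf)
    .errorMatrix('class', 'classification').accuracy();
  print('Validation accuracy for gamma = ' + g, accuracy);
});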
Playtime
Task: Compare your SVM classification with a Random Forest classifier (ee.Classifier.smileRandomForest). Which classifier performs better (in your opinion)?
Use the Code Editor (start from the SVM classification script you used above); a minimal sketch of the Random Forest variant is shown below. After you have finished the comparison, zoom out to see how the classifier performs outside your training area.
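As a starting point, here is a minimal sketch of the Random Forest variant; the number of trees (here 50) is an arbitrary choice that you should experiment with.

// Random Forest variant of the classification above (tree count is illustrative).
var classifier_rf = ee.Classifier.smileRandomForest(50);

// Train on the same training table and predictor bands as the SVM.
var trained_rf = classifier_rf.train({
  features: training,
  classProperty: 'class',
  inputProperties: bands
});

// Classify the image and display the result next to the SVM map.
var classified_rf = image.classify(trained_rf);
Map.addLayer(classified_rf,
  {min: 0, max: 2, palette: ['green', 'red', 'blue']},
  'classification rf');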
Accuracy assessment
It is very important to get a quantitative estimate of the accuracy of the classification; otherwise, the results are of little value. A common strategy is to divide your training samples into two random fractions: one used for training the model and the other for validating the predictions. Once a classifier is trained, it can be used to classify the entire image. We can then compare the classified values with those in the validation fraction. We can use the ConfusionMatrix (Stehman 1997) to assess the accuracy of a given classifier. Classification results are evaluated based on the following metrics:
Overall Accuracy: the proportion of all samples that were classified correctly.
Producer’s Accuracy (recall): how well the classification captures each reference class.
Consumer’s Accuracy (precision): how reliable the predictions are within each mapped class.
Kappa Coefficient: how well the classification performed compared to a random assignment of classes.

The following example uses sample() to generate training and validation data from a Copernicus classification reference image and compares confusion matrices representing training and validation accuracy:
This example uses a random forest (Breiman 2001) classifier with 10 trees to downscale a classification image (100 m) to Landsat resolution (30 m). The sample() method generates two random samples from the classification image: one for training and one for validation. The training sample is used to train the classifier. You can get the resubstitution accuracy on the training data from classifier.confusionMatrix(). To get the validation accuracy, classify the validation data. This adds a classification property to the validation FeatureCollection. Call errorMatrix() on the classified FeatureCollection to get a confusion matrix representing validation (expected) accuracy.
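In case you are not working from the linked script, a minimal sketch of the described workflow is shown below. The Copernicus Global Land Cover asset ID, the region of interest, and the sample sizes are assumptions; adapt them to your own study area.

// Assumed reference map: Copernicus Global Land Cover (100 m), 2019 epoch.
var lc = ee.Image('COPERNICUS/Landcover/100m/Proba-V-C3/Global/2019')
  .select('discrete_classification');

// Hypothetical region of interest around the map centre used above.
var region = ee.Geometry.Rectangle([-63.2, -9.4, -62.6, -8.9]);

// Draw two independent random samples (different seeds): training and validation.
var trainingSample = image.addBands(lc).sample({
  region: region, scale: 30, numPixels: 5000, seed: 0, geometries: true
});
var validationSample = image.addBands(lc).sample({
  region: region, scale: 30, numPixels: 5000, seed: 1, geometries: true
}).filter(ee.Filter.notNull(bands));

// Train a Random Forest with 10 trees on the training sample.
var rf = ee.Classifier.smileRandomForest(10).train({
  features: trainingSample,
  classProperty: 'discrete_classification',
  inputProperties: bands
});

// Resubstitution (training) accuracy.
var trainAccuracy = rf.confusionMatrix();
print('Training error matrix', trainAccuracy);
print('Training overall accuracy', trainAccuracy.accuracy());

// Validation accuracy: classify the held-out sample and compare with the reference.
var validated = validationSample.classify(rf);
var testAccuracy = validated.errorMatrix('discrete_classification', 'classification');
print('Validation error matrix', testAccuracy);
print('Validation overall accuracy', testAccuracy.accuracy());
print('Validation kappa', testAccuracy.kappa());
print("Producer's accuracy", testAccuracy.producersAccuracy());
print("Consumer's accuracy", testAccuracy.consumersAccuracy());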
Task: Replace the sampling approach with the "stratifiedSample" method to sample and validate all classes more evenly (see the sketch below).
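Here is a hedged sketch of the stratified variant, reusing lc and region from the sketch above; the number of points per class is illustrative.

// stratifiedSample() draws the same number of points per class, so rare
// classes are represented as well as common ones.
var stratified = image.addBands(lc).stratifiedSample({
  numPoints: 500,  // points per class (illustrative value)
  classBand: 'discrete_classification',
  region: region,
  scale: 30,
  seed: 0,
  geometries: true
});
print('Stratified sample size per class',
  stratified.aggregate_histogram('discrete_classification'));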
Inspect the output to see that the overall accuracy estimated from the training data is much higher than the one estimated from the validation data. The accuracy estimated from the training data is an overestimate because the random forest is “fit” to the training data. The expected accuracy on unknown data is lower, as indicated by the estimate from the validation data.
You can also take a single sample and partition it with the randomColumn() method on feature collections. You may also want to ensure that the training samples are uncorrelated with the evaluation samples; such correlation can result from spatial autocorrelation of the phenomenon being predicted. One way to exclude samples that might be correlated in this manner is to remove samples that are within some distance of any other sample(s). This can be accomplished with a spatial join. Both approaches (sample partitioning and the spatial autocorrelation check) are included in the previous example; a minimal sketch of the two ideas follows.
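The following sketch reuses the stratified sample from above; the split fraction (80/20) and the 1000 m distance threshold are illustrative assumptions.

// Partition one sample into training and validation fractions with randomColumn().
var withSplit = stratified.randomColumn('random', 0);
var trainingPartition = withSplit.filter(ee.Filter.lt('random', 0.8));
var validationPartition = withSplit.filter(ee.Filter.gte('random', 0.8));

// Discard validation points lying within 1000 m of any training point to
// reduce spatial autocorrelation between the two sets (inverted spatial join).
var distFilter = ee.Filter.withinDistance({
  distance: 1000,
  leftField: '.geo',
  rightField: '.geo',
  maxError: 10
});
var spatialJoin = ee.Join.inverted();
var validationFiltered = spatialJoin.apply(validationPartition, trainingPartition, distFilter);
print('Validation points after spatial filtering', validationFiltered.size());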
Crop type classification
Reliable and accurate crop classification maps are an important data source for agricultural monitoring and food security assessments. Crops such as rice, wheat, corn, and barley are major food resources in many parts of the world, so information on their spatial distribution and condition is highly important at regional, national, and even global levels. To acquire such information on croplands over large agricultural regions, satellite data is an essential data source.

In the Central Valley, also known as the Great Valley of California, 250 different crops are grown with an estimated value of $17 billion per year. The predominant crop types are cereal grains, hay, cotton, tomatoes, vegetables, citrus, tree fruits, nuts, table grapes, and wine grapes. Our aim is to classify some of these crop types using Sentinel-2, Landsat and Sentinel-1 data. Our training data is based on the Cropland Data Layer (CDL), a crop-specific land cover data layer created annually for the continental United States using moderate resolution satellite imagery and extensive agricultural ground truth. The linked scripts below cover several sensor combinations; a minimal sketch follows them.
Open in Code Editor (Classification using Landsat 8 single image)
Open in Code Editor (Classification using Landsat 8 time series)
Open in Code Editor (Classification using Sentinel-2 time series)
Open in Code Editor (Classification using Sentinel-1 time series)
Open in Code Editor (Classification using Sentinel-1 and Sentinel-2 combined time series)
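If you would like to build a stripped-down version of these scripts yourself, the sketch below is one possible starting point. The CDL and Sentinel-2 asset IDs are taken from the Earth Engine data catalog, while the region, year, band selection, and classifier settings are illustrative assumptions.

// Reference labels: the 2019 Cropland Data Layer ('cropland' band).
var cdl = ee.Image('USDA/NASS/CDL/2019').select('cropland');

// Illustrative study area in the Central Valley.
var centralValley = ee.Geometry.Rectangle([-121.0, 36.0, -120.0, 37.0]);

// Predictors: a cloud-filtered Sentinel-2 median composite over the growing season.
var s2 = ee.ImageCollection('COPERNICUS/S2_SR')
  .filterBounds(centralValley)
  .filterDate('2019-04-01', '2019-10-01')
  .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
  .median();
var s2Bands = ['B2', 'B3', 'B4', 'B8', 'B11', 'B12'];

// Sample CDL labels together with the Sentinel-2 predictors and train a classifier.
var cropTraining = s2.select(s2Bands).addBands(cdl).sample({
  region: centralValley, scale: 30, numPixels: 5000, seed: 0
});
var cropClassifier = ee.Classifier.smileRandomForest(50).train({
  features: cropTraining,
  classProperty: 'cropland',
  inputProperties: s2Bands
});

// Classify and display with a random colour per class.
var cropMap = s2.select(s2Bands).classify(cropClassifier);
Map.centerObject(centralValley, 10);
Map.addLayer(cropMap.randomVisualizer(), {}, 'crop types (sketch)');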
Another classification example can be found in the Earth Engine JavaScript tutorial: