

KML, KMZ and Featureclasses

Converting data from one structure to another is a very common need. Whether the data is stored as a simple list, CSV, JSON, or XML, it will likely need to be restructured before it can be consumed or analyzed. In this section, we will look at extracting and translating data from a KML and reading Featureclass data into a dictionary.

KML and KMZ

A friendly competitor, or companion, to the shapefile format is KML, the Keyhole Markup Language, which is used to display geographic data in several different GIS software suites. Some sample KML and KMZ files [1] can be downloaded. KML organizes features into placemarks, paths, and polygons, grouping like features into these categories. When you import a KML into GIS, you will see these three feature types, even if the file contains different features with different attributes. For example, if you have a KML of hotel and soccer field locations, both would be combined within the placemark features. Esri provides some conversion of KMLs, but its tools do not offer the ability to also split on an attribute (separating hotels from soccer fields) during the import. This becomes a multistep process if you want your data compartmentalized by feature type.

There are Python packages that provide tools to access a feature's data and parse it into familiar featureclass or geojson structures, and they are worth knowing about. One such package is kml2geojson, which extracts the KML data into a geojson format. The steps are much the same as with the Esri tools covered below, but involve writing the JSON to a file so that Esri's JSON To Features (arcpy.conversion.JSONToFeatures) tool can read it.

from zipfile import ZipFile
import kml2geojson
import os
import json

outDir = r'C:\NGA\kml kmz data\kml kmz data'
kmz = os.path.join(outDir, r'CurtGowdyStatePark.kmz')

# extract the kml file
with ZipFile(kmz) as kmzFile:
    # extract files
    kmzFile.extractall(outDir)

# convert the doc.kml to a geojson object
kml_json = kml2geojson.main.convert(fr'{outDir}\doc.kml', r'curt_gowdy')

From here, you have a geojson object that you can convert to a Featureclass via JSON To Features, import into other analytical tools like geopandas/pandas dataframes, or use in APIs like the ArcGIS API for Python. Having the data in this portable format also helps with extracting data out of the KML. For example, if you were tasked with getting a list of all places within the KML, you could use a mapping program like Google Earth, QGIS, or ArcGIS Pro to open the dataset and copy out each requested attribute, or, now that you know how to parse the KML, you could employ Python and a few other packages to do the work for you. For example, getting the place name of each feature using the geopandas package:

import kml2geojson
import os
import geopandas as gpd

outDir = r'C:\NGA\kml kmz data\kml kmz data'

# read in the kml and convert to geojson.
kml_json = kml2geojson.main.convert(fr'{outDir}\CurtGowdyArchery.kml', r'curt_gowdy_archery')

# ETL to geopandas for manipulation/ data extraction
gdf = gpd.GeoDataFrame.from_features(kml_json[0]["features"])

place_names = list(set(gdf['name']))

print(place_names)

The features from kml2geojson are read into the geopandas GeoDataFrame, where you can then use pandas functions to access the values. You could also drill down into kml_json yourself and loop over the list of features, as shown below, though geopandas does it for us in less code.
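If you want to see that longer route, the same list can be pulled straight from the geojson structure. A minimal sketch, assuming the kml_json object created above:

# collect the unique "name" property of each feature in the first FeatureCollection
place_names = []
for feature in kml_json[0]["features"]:
    name = feature.get("properties", {}).get("name")
    if name and name not in place_names:
        place_names.append(name)

print(place_names)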

Writing the geojson to a file can be done with the json.dump() method by adding it to our script:

...
# write the converted items to a file, using the name property of the kml_json object
# as the output filename.
with open(fr'{outDir}\{kml_json[0]["name"]}.geojson', 'w') as file:
    # dump the first FeatureCollection so the file is a valid geojson FeatureCollection
    json.dump(kml_json[0], file)

...
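With the geojson on disk, the conversion to a Featureclass mentioned earlier can be done with the JSON To Features geoprocessing tool. A minimal sketch, assuming arcpy is available and you have an existing file geodatabase to write into; the output geodatabase and featureclass names here are illustrative, not part of the sample data:

import arcpy

# convert the geojson file written above into a point featureclass.
# out_gdb is a placeholder; point it at any existing file geodatabase.
geojson_file = fr'{outDir}\{kml_json[0]["name"]}.geojson'
out_gdb = r'C:\NGA\kml kmz data\output.gdb'
arcpy.conversion.JSONToFeatures(geojson_file, fr'{out_gdb}\curt_gowdy_points', geometry_type='POINT')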

ESRI Tools

Esri provides several geoprocessing tools that assist in the ETL of KMLs. KML To Layer [2] converts a .kml or .kmz file into Featureclasses and a symbology layer file. The tool creates a file geodatabase and parses the KML features into the respective point, polyline, and polygon featureclasses within a Feature Dataset. It uses a layer file to maintain the symbology that is included within the KMZ. The example below takes it a step further and splits the resulting featureclasses into individual featureclasses based on their attributes.

import os
import arcpy
from zipfile import ZipFile

outDir = r'C:\NGA\kml kmz data'
file_name = 'CurtGowdyStatePark'
kmz = os.path.join(outDir, f'{file_name}.kmz')

# extract the kml file
with ZipFile(kmz) as kmzFile:
    # extract files
    kmzFile.extractall(outDir)

# convert the kml/kmz to featureclasses in a gdb.
arcpy.conversion.KMLToLayer(fr"{outDir}\doc.kml", fr"{outDir}\{file_name}", file_name)

# Change the workspace to the gdb created by KMLToLayer. The tool creates a file geodatabase named {file_name}.gdb in the output folder.
arcpy.env.workspace = fr'{outDir}\{file_name}\{file_name}.gdb'

# get the featureclasses created by the KMLToLayer method
fcs = arcpy.ListFeatureClasses(feature_dataset='Placemarks')

# Set the fields that will be used for splitting the geometry into named featureclasses
# Multiple fields can be used, ie ['Name', 'SymbolId']
splitDict = {'Points': 'SymbolID',
             'Polylines': 'Name',
             'Polygons': 'Name'}

# iterate over featureclasses and execute the split by attributes.
for fc in fcs:
    split_fields = splitDict.get(fc)
    # skip any featureclass (e.g. Multipatches) that is not mapped to a split field
    if split_fields:
        arcpy.SplitByAttributes_analysis(fc, arcpy.env.workspace, split_fields)
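As a quick check, you can list what SplitByAttributes wrote to the root of the geodatabase; the names you see will depend on the attribute values in your KMZ:

# the split featureclasses land in the current workspace (the gdb root)
print(arcpy.ListFeatureClasses())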

Featureclass to Dictionary

When working with Featureclasses and Tables, it is sometimes necessary to compare one dataset to another and make updates. You can do this with nested cursors, but that can get confusing, circular, and costly in terms of speed and performance. It is better to read one dataset into an organized structure and operate against it than to nest cursors. I'll provide some methods of creating dictionaries and then demonstrate how you could use them.

A simple way to avoid nested cursors is to load all of the data into a dictionary using a search cursor and then use dictionary lookups to retrieve the values you need. For example, the code below creates a dictionary of attribute values from a search cursor.

import arcpy

fc = r'C:\NGA 489\USA.gdb\Cities'

# Create a dictionary of fields and values, using the objectid as key
# Get a list of field names from the Featureclass.
fields = [fld.name for fld in arcpy.ListFields(fc)] 

# Create empty dictionary
fc_dict = {} 

# Start the search cursor
with arcpy.da.SearchCursor(fc, fields) as sCur: 
    for row in sCur: 
        # Add the attribute values from the cursor to fc_dict, using the OBJECTID as the key
        fc_dict[row[fields.index('OBJECTID')]] = {k: v for k, v in zip(sCur.fields, row)} 
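Once the dictionary is built, a lookup by OBJECTID takes the place of what would otherwise be a nested cursor. A short sketch; the CITY_NAME field is illustrative and depends on the fields in your data:

# look up the attribute values of the feature with OBJECTID 1
city = fc_dict.get(1)
if city:
    print(city['OBJECTID'], city.get('CITY_NAME'))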

This code converts all features into a dictionary using a field (TYPE) as the key and appends each feature's UIDENT value to a list, creating a group of UIDENT values for each TYPE.

fc = r'C:\NGA 489\USA.gdb\Hydro' 
fields = ['OBJECTID', 'TYPE', 'HYDRO_', 'UIDENT'] 

fc_dct = {} 

with arcpy.da.SearchCursor(fc, fields) as sCur: 
    for row in sCur: 
        if fc_dct.get(row[fields.index('TYPE')]):  # uses the index of 'TYPE' in the fields list as the index into row
            # Append the UIDENT value to the list
            fc_dct[row[fields.index('TYPE')]].append(row[3]) 
        else: 
            # Create the list for this TYPE and add the first UIDENT value.
            fc_dct[row[fields.index('TYPE')]] = [row[3]]
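The get/append/else pattern above can also be written more compactly with dict.setdefault, which returns the list already stored for a key or adds an empty one. A sketch using the same fc and fields as above:

fc_dct = {}

with arcpy.da.SearchCursor(fc, fields) as sCur:
    for row in sCur:
        # setdefault returns the existing list for this TYPE, or an empty list it just created
        fc_dct.setdefault(row[fields.index('TYPE')], []).append(row[fields.index('UIDENT')])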

This example creates a dictionary of dictionaries, using the OBJECTID as the key and {field: value, ...} for all features, in a dictionary comprehension. We will discuss the list and dictionary comprehension constructs in more detail in Lesson 3.

fc = r'C:\NGA 489\USA.gdb\Hydro'
fields = [f.name for f in arcpy.ListFields(fc)]

# Create a dictionary of fields and values, using the objectid as key.
# zip is a built-in function that pairs each field name with the value in the same
# position of the row, field:value, one to one, like a zipper.
with arcpy.da.SearchCursor(fc, fields) as sCur:
    fc_dct = {row[fields.index('OBJECTID')]: {k: v for k, v in zip(sCur.fields, row)} for row in sCur}

Once the data is in the dictionary structure, you can iterate over the dictionary within an Update or Insert cursor to save the new data. For example:

# Create InsertCursor to add new features to a Featureclass

fc = r'C:\NGA 489\USA.gdb\Hydro'
# filter out the objectid field since it is autogenerated
fields = [f.name for f in arcpy.ListFields(fc) if f.name not in ['OBJECTID']] 
with arcpy.da.InsertCursor(fc, fields) as iCur: 
    # Iterate over the dictionary items.  Note that the dictionary may be in the { objectid : {field: value, ...}, ...} 
    # format so there is a need to get the inner dictionary of fields and values 
    for k, v in fc_dct.items(): 
        # k = objectid
        # v = {field: value, ...}
        # get list of values in the order of the cursor fields.
        iRow = [v.get(f) for f in iCur.fields]
        iCur.insertRow(iRow) 

or as an update, iterating over the features from the cursor and matching each row to its key in the dictionary by OBJECTID:

with arcpy.da.UpdateCursor(fc, ['OBJECTID', 'TYPE', 'UIDENT']) as uCur:
    for row in uCur: 
        vals = fc_dct.get(row[0]) 
        if vals: 
            row[1] = vals['TYPE'] 
            row[2] = vals['UIDENT'] 
            uCur.updateRow(row)


Links
[1] https://www.e-education.psu.edu/ngapython/sites/www.e-education.psu.edu.ngapython/files/Lesson_02/Files/kml%20kmz%20data.zip
[2] https://pro.arcgis.com/en/pro-app/latest/tool-reference/conversion/kml-to-layer.htm