Error parsing KML with accented characters non-local files #864

Open
gmaclennan opened this issue Oct 31, 2011 · 4 comments
@gmaclennan

I get the following error:

Error: OGR Plugin: XML parsing of KML file failed : not well-formed (invalid token) at line 19, column 26

from this KML:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Folder id="Fusiontables">
<name>Fusiontables folder</name>
<Style id="BasicStyle">
<BalloonStyle>
<text>$[description]</text>
</BalloonStyle>
<IconStyle>
<color>FFFFFFFF</color>
<scale>1.1</scale>
<hotSpot x="0.5" y="0" xunits="fraction" yunits="fraction"/><Icon>
<href>http://maps.google.com/mapfiles/kml/paddle/red-blank_maps.png</href>
</Icon>
</IconStyle>
</Style>
<Placemark>
<name><![CDATA[Sargento Puño Camp]]></name>
<styleUrl>#BasicStyle</styleUrl>
<description>
<![CDATA[<div class="googft-info-window" style="font-family:sans-serif">
<b>Desc:</b> Sargento Puño Camp<br>
<b>Status:</b> ACTIVE<br>
<b>Type:</b> Camp<br>
<b>X:</b> 210595<br>
<b>Y:</b> 9643410<br>
<b>Lat:</b> -3.22280072631399<br>
<b>Long:</b> -77.6040046294179<br>
<b>Source:</b> Oxy EIA
</div>]]>
</description>
<Point>
<coordinates>
-77.6040046294179,-3.22280072631399,0
</coordinates>
</Point>

When loading from this link:

https://www.google.com/fusiontables/exporttable?query=select+col5+from+1612141+&o=kml&g=col5

However if I load this file locally there is no error.

The character at line 19, column 26 is the ñ in Puño. Changing this to a non-accented n allows network import with no errors.
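
A minimal Python sketch of why that one character can matter. It is not part of the original report, and it assumes the network response carries the ñ as a single ISO-8859-1 byte (the raw bytes are not shown here). In ISO-8859-1 the ñ is the byte 0xF1, which a strict UTF-8 decoder rejects; in UTF-8 it is the two bytes 0xC3 0xB1. The helper name `is_valid_utf8` is made up for illustration.

```python
name = "Sargento Puño Camp"

latin1_bytes = name.encode("iso-8859-1")  # ñ becomes the single byte 0xF1
utf8_bytes = name.encode("utf-8")         # ñ becomes the two bytes 0xC3 0xB1


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(utf8_bytes))    # the UTF-8 form decodes
print(is_valid_utf8(latin1_bytes))  # the ISO-8859-1 form does not
```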

@tmcw
Contributor

tmcw commented Oct 31, 2011

@springmeyer this looks like an upstream issue. Since you're an OGR dev, maybe take a look?

@springmeyer
Member

The problem appears to be that the file is not utf8 encoded. In fact it is iso-8859-1 (according to the detection of the unix file command):

$ file -I "Talisman installations - Points.kml" 
Talisman installations - Points.kml: application/xml; charset=iso-8859-1

So, ogr trusts the declared character encoding at the top of the KML (encoding="UTF-8") and fails to decode the character.

$ ogrinfo "Talisman installations - Points.kml" 
ERROR 1: XML parsing of KML file failed : not well-formed (invalid token) at line 19, column 26
FAILURE:
Unable to open datasource `Talisman installations - Points.kml' with the following drivers.
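
The same failure can probably be reproduced without OGR. The sketch below assumes the KML driver hands the file to Expat (the error wording matches Expat's) and uses Python's bundled `xml.parsers.expat` on a one-element document, not the real KML. The `parse_error` helper is a made-up name for illustration.

```python
import xml.parsers.expat


def parse_error(data: bytes):
    """Parse data with Expat; return the error message, or None on success."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


# Both documents declare UTF-8; only the second one is actually UTF-8.
header = b'<?xml version="1.0" encoding="UTF-8"?>\n'
element = "<name>Sargento Puño Camp</name>"
bad = header + element.encode("iso-8859-1")
good = header + element.encode("utf-8")

print(parse_error(bad))   # an Expat error message for the mis-encoded bytes
print(parse_error(good))  # None
```

The parser has no reason to doubt the declaration, so the stray 0xF1 byte is treated as malformed UTF-8 instead of as a Latin-1 ñ.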

If I reencode the file as utf8 then it works fine:

$ iconv -f iso-8859-1 -t utf-8 "Talisman installations - Points.kml" > talisman_utf8.kml
$ ogrinfo talisman_utf8.kml
Had to open data source read-only.
INFO: Open of `talisman_utf8.kml'
      using driver `KML' successful.
1: Fusiontables folder (3D Point)

@springmeyer
Member

This is a very common problem, and I've solved it in the past by forcing utf-8 encoding after detecting the likely encoding using http://pypi.python.org/pypi/chardet. We'll have to look for something similar in nodejs.

Until then, the only way I can think of to handle this is to go back to the Fusion Tables source and change the character to its UTF-8 encoded value. If you don't know how to do this, perhaps contact Fusion Tables support.
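
For reference, a rough sketch of the "detect, then force UTF-8" idea. This is not the chardet-based approach mentioned above: it swaps in a much cruder check using only the Python standard library, keeping the bytes if they are already valid UTF-8 and otherwise assuming ISO-8859-1. The function name `to_utf8` and the fixed fallback are assumptions for illustration.

```python
def to_utf8(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return data as UTF-8 bytes.

    Bytes that already decode as strict UTF-8 are returned unchanged.
    Anything else is assumed to be in the fallback encoding and re-encoded.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode(fallback).encode("utf-8")


raw = "<name>Sargento Puño Camp</name>".encode("iso-8859-1")
fixed = to_utf8(raw)
print(fixed.decode("utf-8"))
```

Every byte is valid ISO-8859-1, so the fallback never raises. If the real encoding is something else, the result is silently wrong text, which is why a statistical detector is the safer choice for arbitrary input.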

@tfmorris

I'm investigating a similar problem right now with the KMZ containing the World Heritage sites. I think one reason the problem is not getting fixed at the source is that Google Earth appears to ignore the declared XML encoding and interpret the characters as Latin-1. I'm not sure whether it's hardwired for Latin-1 or uses some type of sniffer to guess.
