Essential Go XML

Pull (streaming) XML parsing

Parsing into a struct is convenient, but the whole decoded document has to be held in memory.

In some cases XML files are so large that it’s not possible to decode the whole file into memory. For example, XML dumps of Wikipedia content are several gigabytes in size.

Pull parsing reads the document one token at a time, so it needs far less memory, but the API is harder to use.

package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

var xmlStr = `
<people>
	<person age="34">
		<first-name>John</first-name>
		<address>
			<city>San Francisco</city>
			<state>CA</state>
		</address>
	</person>
	<!-- sample comment -->
	<person age="23">
		<first-name>Julia</first-name>
	</person>
</people>`

func main() {
	decoder := xml.NewDecoder(strings.NewReader(xmlStr))
	// inCityElement is true while we are between <city> and </city>
	inCityElement := false
	for {
		t, err := decoder.Token()
		if err == io.EOF {
			// io.EOF is a successful end
			break
		}
		if err != nil {
			fmt.Printf("decoder.Token() failed with '%s'\n", err)
			break
		}

		switch v := t.(type) {

		case xml.StartElement:
			if v.Name.Local == "person" {
				for _, attr := range v.Attr {
					if attr.Name.Local == "age" {
						fmt.Printf("Element: '<person>', attribute 'age' has value '%s'\n", attr.Value)
					}
				}
			} else if v.Name.Local == "city" {
				inCityElement = true
			}

		case xml.EndElement:
			if v.Name.Local == "city" {
				inCityElement = false
			}

		case xml.CharData:
			// text between tags; only print it when inside <city>
			if inCityElement {
				fmt.Printf("City: %s\n", string(v))
			}

		case xml.Comment:
			fmt.Printf("Comment: %s\n", string(v))

		case xml.ProcInst:
			// handle XML processing instruction like <?target inst?>

		case xml.Directive:
			// handle XML directive like <!text>
		}
	}
}
Output:

Element: '<person>', attribute 'age' has value '34'
City: San Francisco
Comment:  sample comment 
Element: '<person>', attribute 'age' has value '23'

Pull parsing requests the next token from a stream of XML tokens by calling decoder.Token().

For a start tag like <person> we get an xml.StartElement token.

For an end tag like </person> we get an xml.EndElement token.

For data inside an element, like <person>data</person>, we get an xml.CharData token.

When the decoder reaches the end of the input, it returns the error io.EOF.
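
To make the token sequence concrete, here is a minimal sketch that lists the tokens produced for a one-element document. The helper name tokenKinds is made up for this illustration and is not part of encoding/xml.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tokenKinds (a hypothetical helper) returns a short description of every
// start, end and character-data token the decoder produces for doc, in order.
func tokenKinds(doc string) ([]string, error) {
	decoder := xml.NewDecoder(strings.NewReader(doc))
	var kinds []string
	for {
		t, err := decoder.Token()
		if err == io.EOF {
			// end of input reached without errors
			return kinds, nil
		}
		if err != nil {
			return kinds, err
		}
		switch v := t.(type) {
		case xml.StartElement:
			kinds = append(kinds, "StartElement "+v.Name.Local)
		case xml.EndElement:
			kinds = append(kinds, "EndElement "+v.Name.Local)
		case xml.CharData:
			// string(v) copies the bytes; the token's data is only
			// valid until the next call to Token.
			kinds = append(kinds, fmt.Sprintf("CharData %q", string(v)))
		}
	}
}

func main() {
	kinds, err := tokenKinds("<person>data</person>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, k := range kinds {
		fmt.Println(k)
	}
}
```

For this input the program prints one StartElement, one CharData and one EndElement line, in that order. If you need to keep a token itself (not just a string made from it) past the next call to Token, copy it with xml.CopyToken.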

In the above example we print the age attribute of each <person> element and the character data inside each <city> element.

This is a very basic example. In real programs you might need to remember more state.

For example, if your XML is:

<foo>
  <bar>
    <foo></foo>
  </bar>
</foo>

If you look just at the xml.StartElement token, you can’t tell whether foo is the top-level element or a child of the <bar> element. To distinguish the two, you have to track which elements are currently open.
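
One possible way to remember that state is to keep a stack of the names of the open elements: push on every start tag, pop on every end tag. The sketch below is an illustration under that assumption, not the only approach; elementPaths is a hypothetical helper name.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// elementPaths (a hypothetical helper) returns the slash-separated path of
// every start tag in doc, e.g. "foo/bar/foo" for a <foo> nested in <bar>
// nested in <foo>.
func elementPaths(doc string) ([]string, error) {
	decoder := xml.NewDecoder(strings.NewReader(doc))
	var stack []string // names of the currently open elements
	var paths []string
	for {
		t, err := decoder.Token()
		if err == io.EOF {
			return paths, nil
		}
		if err != nil {
			return paths, err
		}
		switch v := t.(type) {
		case xml.StartElement:
			// entering an element: push its name and record the full path
			stack = append(stack, v.Name.Local)
			paths = append(paths, strings.Join(stack, "/"))
		case xml.EndElement:
			// leaving an element: pop its name; the decoder reports
			// mismatched tags as errors, so the stack is non-empty here
			stack = stack[:len(stack)-1]
		}
	}
}

func main() {
	doc := `<foo>
  <bar>
    <foo></foo>
  </bar>
</foo>`
	paths, err := elementPaths(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

With the path available, the top-level element shows up as foo and the nested one as foo/bar/foo, so the two can be handled differently.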
