Essential Go: range statement

range over a string

Iterate over bytes

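To iterate over every byte in a string, loop over the byte indexes and index the string directly. A range loop over a string steps over whole UTF-8 code points, so it would skip bytes here:

s := "Hey 世界"
// len(s) is the length in bytes, so this visits all 10 bytes.
for idx := 0; idx < len(s); idx++ {
	b := s[idx] // indexing a string yields a byte (uint8)
	fmt.Printf("idx: %d, byte: %d\n", idx, b)
}
idx: 0, byte: 72
idx: 1, byte: 101
idx: 2, byte: 121
idx: 3, byte: 32
idx: 4, byte: 228
idx: 5, byte: 184
idx: 6, byte: 150
idx: 7, byte: 231
idx: 8, byte: 149
idx: 9, byte: 140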
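Alternatively, converting the string to a []byte makes range step one byte at a time instead of one code point at a time. A minimal sketch; collectBytes is a hypothetical helper added only for illustration:

```go
package main

import "fmt"

// collectBytes (hypothetical helper) ranges over []byte(s),
// so each iteration yields a single byte rather than a code point.
func collectBytes(s string) []byte {
	var out []byte
	for _, b := range []byte(s) {
		out = append(out, b)
	}
	return out
}

func main() {
	s := "Hey 世界"
	// Every byte is visited, including all 3 bytes of each Chinese character.
	for idx, b := range []byte(s) {
		fmt.Printf("idx: %d, byte: %d\n", idx, b)
	}
	fmt.Println(len(collectBytes(s))) // 10
}
```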

Iterate over runes

Things are more complicated when you want to iterate over Unicode code points (runes) in a string:

s := "Hey 世界"
// Name the variable r to avoid shadowing the predeclared rune type.
for idx, r := range s {
	fmt.Printf("idx: %d, rune: %d\n", idx, r)
}
idx: 0, rune: 72
idx: 1, rune: 101
idx: 2, rune: 121
idx: 3, rune: 32
idx: 4, rune: 19990
idx: 7, rune: 30028
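If you need positions counted in code points rather than in bytes, one option is to convert the string to a []rune first; the loop index then runs 0, 1, 2, ... over code points. A sketch; runePositions is a hypothetical helper, and the conversion may allocate a new slice:

```go
package main

import "fmt"

// runePositions (hypothetical helper) converts s to []rune, so the
// loop index counts code points instead of byte offsets.
func runePositions(s string) []int {
	var idxs []int
	for i := range []rune(s) {
		idxs = append(idxs, i)
	}
	return idxs
}

func main() {
	s := "Hey 世界"
	for i, r := range []rune(s) {
		// %c prints the rune as a character, %d as its integer value.
		fmt.Printf("pos: %d, rune: %c (%d)\n", i, r, r)
	}
	fmt.Println(runePositions(s)) // [0 1 2 3 4 5]
}
```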

In Go, strings are immutable sequences of bytes. Think of a string as a read-only []byte slice.

Each byte holds a value in the range 0 to 255.

The world’s alphabets contain far more than 256 characters, so a single byte cannot represent them all.

The Unicode standard defines a unique value for every known character. Unicode calls these values code points; they are integers that fit in 32 bits (the largest code point is U+10FFFF).

To represent Unicode code points, Go has a rune type. It is an alias for int32.
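Because rune is an alias rather than a distinct type, a rune value can be used wherever an int32 is expected without any conversion. A small sketch; codePoint is a hypothetical helper:

```go
package main

import "fmt"

// codePoint (hypothetical helper) returns its rune argument as an int32.
// This compiles with no conversion only because rune is an alias for int32.
func codePoint(r rune) int32 {
	return r
}

func main() {
	r := '世' // a rune literal; r has type rune
	fmt.Printf("%d %c %U\n", r, r, r)  // 19990 世 U+4E16
	fmt.Printf("%T\n", r)              // int32
	fmt.Println(codePoint(r) == 19990) // true
}
```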

Literal strings in Go source code are UTF-8 encoded.

In UTF-8, every Unicode code point is encoded with 1 to 4 bytes.
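The standard unicode/utf8 package can show how many bytes a given code point needs. This sketch encodes a few sample characters (chosen here for illustration) with utf8.EncodeRune; encode is a hypothetical helper:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode (hypothetical helper) returns the UTF-8 bytes of r.
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest UTF-8 encoding
	n := utf8.EncodeRune(buf, r)     // n is the number of bytes written
	return buf[:n]
}

func main() {
	// ASCII, Latin accented letter, Chinese character, emoji.
	for _, r := range []rune{'H', 'é', '世', '😀'} {
		b := encode(r)
		fmt.Printf("%c %U: %d byte(s): % x\n", r, r, len(b), b)
	}
}
```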

In this form of iteration, Go assumes that the string is UTF-8 encoded. range decodes each code point as UTF-8 and returns the decoded rune together with the byte index at which it starts.
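To make the decoding step concrete, here is a sketch of a loop that walks a string the way range does, using utf8.DecodeRuneInString from the standard library. decodeAll is a hypothetical name; range itself is implemented by the compiler and runtime, so treat this as an equivalent formulation, not the actual implementation:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll (hypothetical helper) decodes one code point at a time,
// records the byte index where it starts, then advances by its size.
func decodeAll(s string) (idxs []int, runes []rune) {
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		idxs = append(idxs, i)
		runes = append(runes, r)
		i += size // 1 for ASCII, 3 for 世 and 界
	}
	return idxs, runes
}

func main() {
	idxs, runes := decodeAll("Hey 世界")
	for k := range idxs {
		fmt.Printf("idx: %d, rune: %d\n", idxs[k], runes[k])
	}
}
```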

Notice that the byte index of the last code point jumps by 3 (from 4 to 7): the code point before it, 世, is a Chinese character that requires 3 bytes in UTF-8.
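The same arithmetic explains the total length: four 1-byte ASCII characters plus two 3-byte Chinese characters give 4 + 2 × 3 = 10 bytes, but only 6 code points. len counts bytes, while utf8.RuneCountInString counts code points. A sketch; byteAndRuneCounts is a hypothetical wrapper:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts (hypothetical helper) returns the number of bytes
// and the number of UTF-8 code points in s.
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := byteAndRuneCounts("Hey 世界")
	fmt.Println(b, r) // 10 6
}
```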

Aside on strings and UTF-8

Go strings are just sequences of bytes. You can put whatever data you want in them.

How the data is interpreted is up to your code.

Go doesn’t require a string to be a valid UTF-8 sequence, and it doesn’t validate strings.
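A string can therefore hold bytes that are not valid UTF-8. range does not fail on them: per the Go specification, an invalid byte decodes as U+FFFD (utf8.RuneError) and iteration advances by a single byte. A minimal sketch; rangeRunes is a hypothetical helper:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// rangeRunes (hypothetical helper) collects the (index, rune) pairs that
// range produces. An invalid byte shows up as U+FFFD (utf8.RuneError).
func rangeRunes(s string) (idxs []int, runes []rune) {
	for i, r := range s {
		idxs = append(idxs, i)
		runes = append(runes, r)
	}
	return
}

func main() {
	// "\xff" can never appear in valid UTF-8, yet the string stores it fine.
	s := "a\xffb"
	fmt.Println(len(s), utf8.ValidString(s)) // 3 false
	idxs, runes := rangeRunes(s)
	for k := range runes {
		fmt.Printf("idx: %d, rune: %U\n", idxs[k], runes[k])
	}
}
```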

That being said, Go does make it easy to work with UTF-8 encoded strings.

The behavior of range is one example of that.

Go also mandates that Go source code files are valid UTF-8.
