range over a string

Iterate over bytes

You can iterate over the bytes in a string with an index-based loop:

s := "a 世"
for i := 0; i < len(s); i++ {
	b := s[i]
	fmt.Printf("idx: %d, byte: %d\n", i, b)
}
idx: 0, byte: 97
idx: 1, byte: 32
idx: 2, byte: 228
idx: 3, byte: 184
idx: 4, byte: 150

Iterate over runes

Things are more complicated when you want to iterate over logical characters (runes) in a string:

s := "Hey 世界"
for i, r := range s {
	fmt.Printf("idx: %d, rune: %d\n", i, r)
}
idx: 0, rune: 72
idx: 1, rune: 101
idx: 2, rune: 121
idx: 3, rune: 32
idx: 4, rune: 19990
idx: 7, rune: 30028

In Go, a string is an immutable sequence of bytes. Think of it as a read-only []byte slice.

Each byte holds a value from 0 to 255.

The world’s writing systems contain far more characters than that.

The Unicode standard defines a unique value for every known character. Unicode calls these values code points; they are integers that fit in 32 bits.

To represent Unicode code points, Go has a rune type. It is an alias for int32.

Literal strings in Go source code are UTF-8 encoded.

In UTF-8, every Unicode code point is encoded with 1 to 4 bytes.

In this form of iteration, Go assumes that the string is UTF-8 encoded. range decodes one code point at a time and returns the decoded rune together with the byte index at which it starts.

You can see that the byte index of the last code point jumped by 3, from 4 to 7, because the code point before it is a Chinese character that takes 3 bytes in UTF-8.

Strings and UTF-8

Go strings are sequences of bytes, so you can put arbitrary binary data in them.

How the bytes are interpreted is up to your code.

Most of the time a string holds Unicode text in UTF-8 encoding, but outside of string literals in Go source code, Go doesn't check or ensure that the string data forms a valid UTF-8 sequence.

That being said, Go provides functionality for working with UTF-8 encoded data.

The behavior of range is one example of that.
