How do I use the bufio.ScanWords and bufio.ScanLines functions to count words and lines?
I tried:
fmt.Println(bufio.ScanWords([]byte("Good day everyone"), false))
Prints:
5 [103 111 111 100] <nil>
I'm not sure what that output means.
To count words:
input := "Spicy jalapeno pastrami ut ham turducken.\n Lorem sed ullamco, leberkas sint short loin strip steak ut shoulder shankle porchetta venison prosciutto turducken swine.\n Deserunt kevin frankfurter tongue aliqua incididunt tri-tip shank nostrud.\n"
scanner := bufio.NewScanner(strings.NewReader(input))
// Set the split function for the scanning operation.
scanner.Split(bufio.ScanWords)
// Count the words.
count := 0
for scanner.Scan() {
	count++
}
if err := scanner.Err(); err != nil {
	fmt.Fprintln(os.Stderr, "reading input:", err)
}
fmt.Printf("%d\n", count)
To count lines:
input := "Spicy jalapeno pastrami ut ham turducken.\n Lorem sed ullamco, leberkas sint short loin strip steak ut shoulder shankle porchetta venison prosciutto turducken swine.\n Deserunt kevin frankfurter tongue aliqua incididunt tri-tip shank nostrud.\n"
scanner := bufio.NewScanner(strings.NewReader(input))
// Set the split function for the scanning operation.
scanner.Split(bufio.ScanLines)
// Count the lines.
count := 0
for scanner.Scan() {
	count++
}
if err := scanner.Err(); err != nil {
	fmt.Fprintln(os.Stderr, "reading input:", err)
}
fmt.Printf("%d\n", count)
This is Exercise 7.1 in the book The Go Programming Language.
This is an extension of @repler's solution:
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

type byteCounter int
type wordCounter int
type lineCounter int

func main() {
	var c byteCounter
	c.Write([]byte("Hello This is a line"))
	fmt.Println("Byte Counter", c)

	var w wordCounter
	w.Write([]byte("Hello This is a line"))
	fmt.Println("Word Counter", w)

	var l lineCounter
	l.Write([]byte("Hello \nThis \n is \na line\n.\n.\n"))
	fmt.Println("Line Counter", l)
}

func (c *byteCounter) Write(p []byte) (int, error) {
	*c += byteCounter(len(p))
	return len(p), nil
}

// Write adds the number of words in p to the counter. It returns len(p),
// the number of bytes consumed, as the io.Writer contract requires.
func (w *wordCounter) Write(p []byte) (int, error) {
	count := retCount(p, bufio.ScanWords)
	*w += wordCounter(count)
	return len(p), nil
}

// Write adds the number of lines in p to the counter.
func (l *lineCounter) Write(p []byte) (int, error) {
	count := retCount(p, bufio.ScanLines)
	*l += lineCounter(count)
	return len(p), nil
}

// retCount returns the number of tokens that the split function fn finds in p.
func retCount(p []byte, fn bufio.SplitFunc) (count int) {
	s := string(p)
	scanner := bufio.NewScanner(strings.NewReader(s))
	scanner.Split(fn)
	count = 0
	for scanner.Scan() {
		count++
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
	}
	return
}
This is Exercise 7.1 in the book The Go Programming Language.
This is my solution:
package main

import (
	"bufio"
	"fmt"
)

// WordCounter counts words.
type WordCounter int

// LineCounter counts lines.
type LineCounter int

type scanFunc func(p []byte, EOF bool) (advance int, token []byte, err error)

// scanBytes returns the number of tokens that fn finds in p.
// A non-nil token counts even when it is empty (for example, a blank line).
func scanBytes(p []byte, fn scanFunc) (cnt int) {
	for len(p) > 0 {
		advance, token, err := fn(p, true)
		if err != nil || advance == 0 {
			break
		}
		if token != nil {
			cnt++
		}
		p = p[advance:]
	}
	return cnt
}

// Write returns len(p), the number of bytes consumed, as io.Writer requires.
func (c *WordCounter) Write(p []byte) (int, error) {
	cnt := scanBytes(p, bufio.ScanWords)
	*c += WordCounter(cnt)
	return len(p), nil
}

func (c WordCounter) String() string {
	return fmt.Sprintf("contains %d words", c)
}

func (c *LineCounter) Write(p []byte) (int, error) {
	cnt := scanBytes(p, bufio.ScanLines)
	*c += LineCounter(cnt)
	return len(p), nil
}

func (c LineCounter) String() string {
	return fmt.Sprintf("contains %d lines", c)
}

func main() {
	var c WordCounter
	fmt.Println(c)
	fmt.Fprintf(&c, "This is a sentence.")
	fmt.Println(c)
	c = 0
	fmt.Fprintf(&c, "This")
	fmt.Println(c)

	var l LineCounter
	fmt.Println(l)
	fmt.Fprintf(&l, `This is another
line`)
	fmt.Println(l)
	l = 0
	fmt.Fprintf(&l, "This is another\nline")
	fmt.Println(l)
	fmt.Fprintf(&l, "This is one line")
	fmt.Println(l)
}
bufio.ScanWords and bufio.ScanLines (as well as bufio.ScanBytes and bufio.ScanRunes) are split functions: they provide a bufio.Scanner with the strategy to tokenize its input data, that is, how the scanning process should split the data. The split function for a bufio.Scanner is bufio.ScanLines by default but can be changed through the method bufio.Scanner.Split.
These split functions are of type SplitFunc:
type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
Usually, you won't need to call any of these functions directly; instead, bufio.Scanner will. However, you might need to create your own split function to implement a custom tokenization strategy. So, let's have a look at its parameters and return values:
data: remaining data not processed yet.
atEOF: whether or not the caller has reached EOF and therefore has no more new data to provide in the next call.
advance: number of bytes the caller must advance the input data for the next call.
token: the token to return to the caller as a result of the splitting performed.
To gain further understanding, let's look at the implementation of bufio.ScanBytes:
func ScanBytes(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	return 1, data[0:1], nil
}
As long as data isn't empty, it returns a one-byte token to the caller (data[0:1]) and tells the caller to advance the input data by one byte.
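To explain the output of bufio.ScanWords: the three printed values are its return values advance, token, and err. In the question's example, 5 is the number of bytes consumed ("Good" plus the space after it), [103 111 111 100] is the token "Good" as a byte slice, and <nil> is the error.
Here is a simple program to count the words, using this information: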
package main

import (
	"bufio"
	"fmt"
)

func main() {
	data := []byte("hello there, how are ya.. \n And bye")
	numWords := 0
	start := 0
	for {
		// advance is the number of bytes consumed; token is the word found (nil if none).
		advance, token, err := bufio.ScanWords(data[start:], true)
		if err != nil {
			break
		}
		if token != nil {
			numWords++
			fmt.Println(string(token))
		}
		start += advance
		if start >= len(data) {
			break
		}
	}
	fmt.Println("The number of words is", numWords)
}
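The program prints each word on its own line, followed by the total. The second argument, atEOF, does not tell ScanWords where to stop; it tells it that no more data will follow, so a final word without trailing whitespace can be returned as a token. With the second argument set to false, ScanWords cannot know whether the last word "bye" is complete, so it returns an advance of 0 and a nil token to request more data. The loop then never terminates, unless we also check that the returned advance (the first return value) is greater than 0.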
I hope this was helpful.
I also had trouble deciphering the docstring for this function. Upon closer inspection though, I found a helpful example directly in the documentation:
The Scan* functions of bufio are not meant to be invoked directly. They are instead designed for use as arguments to bufio.Scanner.Split. – Propylaeum