How to use bufio.ScanWords

6

13

How do I use the bufio.ScanWords and bufio.ScanLines functions to count words and lines?

I tried:

fmt.Println(bufio.ScanWords([]byte("Good day everyone"), false))

Prints:

5 [71 111 111 100] <nil>

Not sure what that means?

Powerless answered 17/4, 2017 at 10:57 Comment(1)
The Scan* functions of bufio are not meant to be invoked directly. They are instead designed for use as arguments to bufio.Scanner.Split. – Propylaeum
19

To count words:

input := "Spicy jalapeno pastrami ut ham turducken.\n Lorem sed ullamco, leberkas sint short loin strip steak ut shoulder shankle porchetta venison prosciutto turducken swine.\n Deserunt kevin frankfurter tongue aliqua incididunt tri-tip shank nostrud.\n"
scanner := bufio.NewScanner(strings.NewReader(input))
// Set the split function for the scanning operation.
scanner.Split(bufio.ScanWords)
// Count the words.
count := 0
for scanner.Scan() {
    count++
}
if err := scanner.Err(); err != nil {
    fmt.Fprintln(os.Stderr, "reading input:", err)
}
fmt.Printf("%d\n", count)

To count lines:

input := "Spicy jalapeno pastrami ut ham turducken.\n Lorem sed ullamco, leberkas sint short loin strip steak ut shoulder shankle porchetta venison prosciutto turducken swine.\n Deserunt kevin frankfurter tongue aliqua incididunt tri-tip shank nostrud.\n"

scanner := bufio.NewScanner(strings.NewReader(input))
// Set the split function for the scanning operation.
scanner.Split(bufio.ScanLines)
// Count the lines.
count := 0
for scanner.Scan() {
    count++
}
if err := scanner.Err(); err != nil {
    fmt.Fprintln(os.Stderr, "reading input:", err)
}
fmt.Printf("%d\n", count)
Protrude answered 17/4, 2017 at 11:40 Comment(0)
4

This is Exercise 7.1 in the book The Go Programming Language.

This is an extension of @repler's solution:

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

type byteCounter int
type wordCounter int
type lineCounter int

func main() {
    var c byteCounter
    c.Write([]byte("Hello This is a line"))
    fmt.Println("Byte Counter ", c)

    var w wordCounter
    w.Write([]byte("Hello This is a line"))
    fmt.Println("Word Counter ", w)

    var l lineCounter
    l.Write([]byte("Hello \nThis \n is \na line\n.\n.\n"))
    fmt.Println("Line Counter ", l)

}

func (c *byteCounter) Write(p []byte) (int, error) {
    *c += byteCounter(len(p))
    return len(p), nil
}

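func (w *wordCounter) Write(p []byte) (int, error) {
    count := retCount(p, bufio.ScanWords)
    *w += wordCounter(count)
    // An io.Writer must report how many bytes of p it consumed,
    // not the number of words it found.
    return len(p), nil
}

func (l *lineCounter) Write(p []byte) (int, error) {
    count := retCount(p, bufio.ScanLines)
    *l += lineCounter(count)
    // Same contract as above: return the number of bytes consumed.
    return len(p), nil
}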

func retCount(p []byte, fn bufio.SplitFunc) (count int) {
    s := string(p)
    scanner := bufio.NewScanner(strings.NewReader(s))
    scanner.Split(fn)
    count = 0
    for scanner.Scan() {
        count++
    }
    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "reading input:", err)
    }
    return
}
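Because the counters implement io.Writer, they can be fed by anything that writes to a writer, for example io.Copy. The sketch below is illustrative only: the names wordWriter and countWords are hypothetical. It assumes each Write call receives whole words, so a word split across two writes would be counted twice.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// wordWriter is a hypothetical io.Writer that counts the words written to it.
type wordWriter int

func (w *wordWriter) Write(p []byte) (int, error) {
	scanner := bufio.NewScanner(bytes.NewReader(p))
	scanner.Split(bufio.ScanWords)
	for scanner.Scan() {
		*w++
	}
	// Report all of p as consumed; io.Copy treats a smaller value
	// with a nil error as a short write.
	return len(p), scanner.Err()
}

// countWords copies s into a wordWriter and returns the resulting count.
func countWords(s string) (int, error) {
	var w wordWriter
	_, err := io.Copy(&w, strings.NewReader(s))
	return int(w), err
}

func main() {
	n, err := countWords("Hello This is a line")
	fmt.Println(n, err)
}
```

Returning len(p) rather than the word count is what lets the writer work with helpers such as io.Copy.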
Lessielessing answered 10/4, 2020 at 9:10 Comment(0)
1

This is Exercise 7.1 in the book The Go Programming Language.

This is my solution:

package main

import (
    "bufio"
    "fmt"
)

// WordCounter counts the words written to it.
type WordCounter int

// LineCounter counts the lines written to it.
type LineCounter int

type scanFunc func(p []byte, EOF bool) (advance int, token []byte, err error)

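// scanBytes applies fn repeatedly to p and counts the tokens it yields.
func scanBytes(p []byte, fn scanFunc) (cnt int) {
    for len(p) > 0 {
        // atEOF is true because p already holds all the data for this call.
        advance, token, err := fn(p, true)
        if err != nil || advance == 0 {
            break
        }
        // A blank line is an empty but non-nil token and still counts;
        // a nil token means only trailing white space was left.
        if token != nil {
            cnt++
        }
        p = p[advance:]
    }
    return cnt
}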

func (c *WordCounter) Write(p []byte) (int, error) {
    cnt := scanBytes(p, bufio.ScanWords)
    *c += WordCounter(cnt)
    // Return the number of bytes consumed, as io.Writer requires.
    return len(p), nil
}

func (c WordCounter) String() string {
    return fmt.Sprintf("contains %d words", c)
}

func (c *LineCounter) Write(p []byte) (int, error) {
    cnt := scanBytes(p, bufio.ScanLines)
    *c += LineCounter(cnt)
    // Return the number of bytes consumed, as io.Writer requires.
    return len(p), nil
}

func (c LineCounter) String() string {
    return fmt.Sprintf("contains %d lines", c)
}

func main() {
    var c WordCounter
    fmt.Println(c)

    fmt.Fprintf(&c, "This is a sentence.")
    fmt.Println(c)

    c = 0
    fmt.Fprintf(&c, "This")
    fmt.Println(c)

    var l LineCounter
    fmt.Println(l)

    fmt.Fprintf(&l, `This is another
line`)
    fmt.Println(l)

    l = 0
    fmt.Fprintf(&l, "This is another\nline")
    fmt.Println(l)

    fmt.Fprintf(&l, "This is one line")
    fmt.Println(l)
}
Paraphrase answered 10/4, 2020 at 2:14 Comment(0)
0

bufio.ScanWords and bufio.ScanLines (as well as bufio.ScanBytes and bufio.ScanRunes) are split functions: they provide a bufio.Scanner with the strategy to tokenize its input data – how the process of scanning should split the data. The split function for a bufio.Scanner is bufio.ScanLines by default but can be changed through the method bufio.Scanner.Split.

These split functions are of type SplitFunc:

type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)

Usually, you won't need to call any of these functions directly; bufio.Scanner calls them for you. However, you might need to write your own split function to implement a custom tokenization strategy. So, let's have a look at its parameters and return values:

  • data: the remaining data that has not been processed yet.
  • atEOF: whether the caller has reached EOF and therefore has no more new data to provide in the next call.
  • advance: the number of bytes by which the caller must advance the input data for the next call.
  • token: the token to return to the caller as a result of the splitting performed; nil if no complete token is available yet.
  • err: an error, if any; a non-nil error stops the scan.

To gain further understanding, let's look at the implementation of bufio.ScanBytes:

func ScanBytes(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }
    return 1, data[0:1], nil
}

As long as data isn't empty, it returns a token byte to the caller (data[0:1]) and tells the caller to advance the input data by one byte.
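A custom split function follows the same pattern. As a hedged sketch, here is one that splits on commas; scanCommas and splitAll are hypothetical names introduced for this illustration and are not part of the bufio package.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanCommas is a hypothetical split function that yields comma-separated fields.
func scanCommas(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		// Nothing left to tokenize.
		return 0, nil, nil
	}
	if i := bytes.IndexByte(data, ','); i >= 0 {
		// Skip past the comma, but leave it out of the token.
		return i + 1, data[:i], nil
	}
	if atEOF {
		// Final field without a trailing comma.
		return len(data), data, nil
	}
	// No comma yet and more data may come: request more.
	return 0, nil, nil
}

// splitAll runs a scanner over s with the given split function.
func splitAll(s string, split bufio.SplitFunc) []string {
	scanner := bufio.NewScanner(strings.NewReader(s))
	scanner.Split(split)
	var out []string
	for scanner.Scan() {
		out = append(out, scanner.Text())
	}
	return out
}

func main() {
	fmt.Printf("%q\n", splitAll("a,b,c", scanCommas))
}
```

The three return paths mirror the parameter descriptions above: return a token and an advance when a field is complete, return the remainder at EOF, and otherwise return an advance of 0 with a nil token to ask for more data.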

Zootechnics answered 30/12, 2022 at 10:58 Comment(0)
0

To explain the output of bufio.ScanWords:

  • The first return value is the number of bytes consumed: any leading spaces, the word itself, and the single space character that ends it. Adding it to the current index gives the position from which to scan for the next word.
  • The second return value holds the bytes of the word, with the surrounding spaces removed.
  • The third return value is the error.

Here is a simple program that counts the words using this information:

package main

import (
    "bufio"
    "fmt"
)

func main() {
    data := []byte("hello there,       how are ya.. \n And bye")

    numWords := 0
    for len(data) > 0 {
        // atEOF is true because data already holds the complete input.
        advance, token, err := bufio.ScanWords(data, true)
        // Stop on an error, or when no progress can be made.
        if err != nil || advance == 0 {
            break
        }
        // A nil token means only trailing white space was left.
        if token != nil {
            numWords++
            fmt.Printf("%s\n", token)
        }
        // Move past the bytes that were just consumed.
        data = data[advance:]
    }

    fmt.Println("The number of words is", numWords)
}

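And here is the corresponding output: [image: output of the code above]

The second argument (atEOF) tells ScanWords whether the data it was given is the end of the input. Here is the output with the second argument set to false: [image: output with the second argument set to false]

With atEOF set to false, ScanWords cannot tell whether the last word is complete, so for the final word it returns an advance of 0 and a nil token to request more data. A loop that does not check that the first return value is greater than 0 therefore never stops.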

I hope this was helpful.

Lalo answered 29/4, 2023 at 8:26 Comment(0)
0

I also had trouble deciphering the docstring for this function. Upon closer inspection though, I found a helpful example directly in the documentation:

https://pkg.go.dev/bufio#example-Scanner-Words

Sterling answered 11/8, 2024 at 16:13 Comment(0)
