remove null character from string
I want to check whether a string is empty and, if it is not, parse it into a time.

Here is my code:

valueStr = strings.Replace(string(valueStr), " ", "", -1)
valueStr = strings.Replace(string(valueStr), "\t", "", -1)
valueStr = strings.Replace(string(valueStr), "\n", "", -1)
valueStr = strings.Replace(string(valueStr), "\r", "", -1)
var re = regexp.MustCompile(`\s`)
valueStr = re.ReplaceAllString(valueStr, "")

if valueStr != "" {
    fmt.Printf("-------- valueStr %c: \n", valueStr)         // o/p =>  -------- valueStr %!c(string= ):
    fmt.Printf("-------- valueStr %#v: \n", valueStr)        // o/p => -------- valueStr "\x00":
    fmt.Printf("-------- valueStr %x: \n", valueStr)         // o/p =>  -------- valueStr 00:
    fmt.Println("-------- valueStr length: ", len(valueStr)) // o/p => -------- valueStr length:  1

    // considering valueStr is not empty, parse string to time

    time, err := time.Parse(TIME_FORMAT, strings.TrimSpace(valueStr))
    if err != nil {
        fmt.Println("-------- Error converting time: ", err) // o/p => -------- Error converting time:  parsing time " " as "15:04:05": cannot parse " " as "15"
        return
    }
} else {
    // another code
}

How can I remove this null character (\x00) from the string, or at least check whether the string contains it?

Ezana answered 21/1, 2019 at 7:40 Comment(0)

You can remove \x00 runes from a string the same way you can remove any other runes:

valueStr = strings.Replace(valueStr, "\x00", "", -1)

Example:

s := "a\x00b"
fmt.Printf("%q\n", s)
s = strings.Replace(s, "\x00", "", -1)
fmt.Printf("%q\n", s)

Output (try it on the Go Playground):

"a\x00b"
"ab"
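The question also asks how to detect the null character and how to skip parsing when nothing useful is left. A minimal sketch follows, assuming the layout is "15:04:05" (inferred from the error message in the question). The names clockLayout, hasNUL and parseClock are hypothetical helpers, not part of the original code:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// clockLayout is an assumption taken from the error message in the question.
const clockLayout = "15:04:05"

// hasNUL reports whether s contains at least one null byte.
func hasNUL(s string) bool {
	return strings.IndexByte(s, 0) >= 0
}

// parseClock removes null bytes and surrounding white space before parsing.
// ok is false when nothing is left to parse or when parsing fails.
func parseClock(s string) (t time.Time, ok bool, err error) {
	s = strings.TrimSpace(strings.Replace(s, "\x00", "", -1))
	if s == "" {
		return time.Time{}, false, nil
	}
	t, err = time.Parse(clockLayout, s)
	return t, err == nil, err
}

func main() {
	fmt.Println(hasNUL("\x00"))     // true
	fmt.Println(hasNUL("12:30:00")) // false

	_, ok, err := parseClock("\x00")
	fmt.Println(ok, err) // false <nil>

	t, ok, err := parseClock(" 12:30:00\x00")
	fmt.Println(t.Format(clockLayout), ok, err) // 12:30:00 true <nil>
}
```

Note that strings.TrimSpace alone does not help here, because \x00 is not white space.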

Using strings.Replacer

Also note that you can replace the multiple strings.Replace() calls with a single operation by using strings.Replacer. It will also be more efficient, because it iterates over the input only once and allocates only one string for the result, no matter how many substrings you want to replace.

For example:

s := " \t\n\rabc\x00"
fmt.Printf("%q\n", s)

r := strings.NewReplacer(" ", "", "\t", "", "\n", "", "\r", "", "\x00", "")
s = r.Replace(s)
fmt.Printf("%q\n", s)

Output (try it on the Go Playground):

" \t\n\rabc\x00"
"abc"

Also note that it's enough to create a strings.Replacer once: you can store it in a (global) variable and reuse it, and it is even safe to use concurrently from multiple goroutines.
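As a rough illustration of that reuse, the sketch below shares one package-level replacer between several goroutines. The names cleaner and cleanAll are made up for this example:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// cleaner is built once and shared by all goroutines.
var cleaner = strings.NewReplacer(" ", "", "\t", "", "\n", "", "\r", "", "\x00", "")

// cleanAll cleans every input in its own goroutine using the shared cleaner.
func cleanAll(inputs []string) []string {
	out := make([]string, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			out[i] = cleaner.Replace(in) // each goroutine writes only to its own slot
		}(i, in)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Printf("%q\n", cleanAll([]string{" a\x00", "\tb\n", "c\r\x00"})) // ["a" "b" "c"]
}
```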

Using strings.Map()

Also note that if you only want to replace (remove) single runes and not multi-rune (or multi-byte) substrings, you can also use strings.Map() which might be even more efficient than strings.Replacer.

First define a function that tells which runes to replace (or remove if you return a negative value):

func remove(r rune) rune {
    switch r {
    case ' ', '\t', '\n', '\r', 0:
        return -1
    }
    return r
}

And then use it:

s := " \t\n\rabc\x00"
fmt.Printf("%q\n", s)

s = strings.Map(remove, s)
fmt.Printf("%q\n", s)

Output (try it on the Go Playground):

" \t\n\rabc\x00"
"abc"

Benchmarks

We might think strings.Map() will be superior, as it only has to deal with runes, which are just int32 numbers, while strings.Replacer has to deal with string values, which are headers (length + data pointer) plus a series of bytes.

But string values are stored in memory as UTF-8 byte sequences, which means strings.Map() has to decode the runes from the UTF-8 byte sequence (and encode them back to UTF-8 at the end), while strings.Replacer does not: it may simply look for byte sequence matches without decoding the runes. And strings.Replacer is highly optimized to take advantage of such "tricks".
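To make the byte versus rune distinction concrete, here is a small sketch (the helper name byteAndRuneLen is made up) showing that a string's length is counted in bytes, while ranging over it decodes one rune per iteration:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneLen returns the number of bytes and the number of runes in s.
func byteAndRuneLen(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "\u00e9\x00" // U+00E9 takes two bytes in UTF-8, NUL takes one

	b, r := byteAndRuneLen(s)
	fmt.Println(b, r) // 3 2

	// Ranging over a string decodes UTF-8; i is the byte offset of each rune.
	for i, c := range s {
		fmt.Printf("byte offset %d: %U\n", i, c)
	}
}
```

The loop reports offsets 0 and 2, because the first rune occupies two bytes. A byte-oriented search such as the one strings.Replacer can perform never needs this decoding step.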

So let's create a benchmark to compare them, using these definitions:

var r = strings.NewReplacer(" ", "", "\t", "", "\n", "", "\r", "", "\x00", "")

func remove(r rune) rune {
    switch r {
    case ' ', '\t', '\n', '\r', 0:
        return -1
    }
    return r
}

And we run benchmarks on different input strings:

func BenchmarkReplaces(b *testing.B) {
    cases := []struct {
        title string
        input string
    }{
        {
            title: "None",
            input: "abc",
        },
        {
            title: "Normal",
            input: " \t\n\rabc\x00",
        },
        {
            title: "Long",
            input: "adsfWR \t\rab\nc\x00 \t\n\rabc\x00asdfWER\n\r",
        },
    }

    for _, c := range cases {
        b.Run("Replacer-"+c.title, func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                r.Replace(c.input)
            }
        })
        b.Run("Map-"+c.title, func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                strings.Map(remove, c.input)
            }
        })
    }

}

And now let's see the benchmark results:

BenchmarkReplaces/Replacer-None-4    100000000   12.3 ns/op    0 B/op  0 allocs/op
BenchmarkReplaces/Map-None-4         100000000   16.1 ns/op    0 B/op  0 allocs/op
BenchmarkReplaces/Replacer-Normal-4  20000000    92.7 ns/op    6 B/op  2 allocs/op
BenchmarkReplaces/Map-Normal-4       20000000    92.4 ns/op   16 B/op  2 allocs/op
BenchmarkReplaces/Replacer-Long-4     5000000   234 ns/op     64 B/op  2 allocs/op
BenchmarkReplaces/Map-Long-4          5000000   235 ns/op     80 B/op  2 allocs/op

Despite expectations, strings.Replacer performs quite well, just as well as strings.Map(), because it does not have to decode and encode runes.

Lated answered 21/1, 2019 at 8:25 Comment(7)
@Ezana Added another solution using strings.Map(), see edited answer. - Lated
Why is strings.Map() more efficient than strings.Replacer? Is it, because it looks only at specific runes instead of whole substrings? - Toussaint
@JonasTepe Yes, checking single runes (which are just int32 numbers) is always more efficient than checking strings, which is a header (length+data pointer) plus a series of bytes. - Lated
@Lated Makes sense. Thanks for the explanation. - Toussaint
@JonasTepe I added another section comparing performances, strings.Replacer is highly optimized and performs just as good as strings.Map() (due to it not having to decode and encode runes from the UTF-8 byte sequences). - Lated
Seems like, both allocate, if there are matches to be replaced. For the new string to be returned. - Toussaint
@JonasTepe Yes, they have to. The only case when they could avoid allocation is if the result would be a substring of the input, in which case the input could be sliced and returned. But this check is not built into them (and it's not worth the complexity). - Lated

I don't know if this is your situation, but in my case I was receiving uint16 slices from Windows syscalls. That data is also terminated by a null element. To deal with it, you can use the windows package:

package main

import (
   "fmt"
   "golang.org/x/sys/windows"
)

func main() {
   a := []uint16{77, 97, 114, 99, 104, 0}
   s := windows.UTF16ToString(a)
   fmt.Printf("%q\n", s) // "March"
}

https://pkg.go.dev/golang.org/x/sys/windows#UTF16ToString
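On other platforms, where golang.org/x/sys/windows may not be available, something similar can be approximated with the standard library's unicode/utf16 package. This is only a sketch: utf16ToString is a hypothetical helper written for this example, not the implementation from the windows package.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf16ToString decodes s up to (but not including) the first NUL element.
// If there is no NUL element, the whole slice is decoded.
func utf16ToString(s []uint16) string {
	for i, v := range s {
		if v == 0 {
			s = s[:i]
			break
		}
	}
	return string(utf16.Decode(s))
}

func main() {
	a := []uint16{77, 97, 114, 99, 104, 0}
	fmt.Printf("%q\n", utf16ToString(a)) // "March"
}
```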

Elwandaelwee answered 12/1, 2021 at 15:51 Comment(0)

In current Python (as of November 2021) on Windows 10, this piece of code worked for me:

s = str.replace(s, "\x00", "", -1)
Injustice answered 27/11, 2021 at 14:38 Comment(2)
Don't post comments in the answer section. - Dovev
This is a question about Go :) - Kosse
