Convert human readable to bytes in bash

So I am trying to analyze very large log files in Linux. I have seen plenty of solutions for the reverse of this, but the program that records the data doesn't allow any output formatting, so it only outputs sizes in human-readable format (I know, what a pain). So the question is: how can I convert human-readable sizes to bytes using something like awk?

So converting this:

937
1.43K
120.3M

to:

937
1464
126143693

Some rounding error is acceptable, and I expect it.

Thanks in advance.

P.S. It doesn't have to be awk, as long as it can do the conversions in-line.

I found this, but the awk command given doesn't appear to work correctly: it outputs something like 534K"0".

I also found a solution using sed and bc, but because it relies on bc it is limited: it only works on one column at a time, and every value has to be a valid bc expression, or else it fails.

sed -e 's/K/\*1024/g' -e 's/M/\*1048576/g' -e 's/G/\*1073741824/g' | bc

Mendymene answered 29/10, 2014 at 2:1 Comment(3)
Check out this answer: https://mcmap.net/q/537832/-unformat-disk-size-strings – Cayla
@amdn, thanks, I actually found something similar to that and added it in an edit. The only problem with that solution is that it uses bc, so it can't really analyze a full log file well; it only works on a single column of data that is all the same type. – Mendymene
At the bottom of that answer there is a "one-liner" that doesn't use bc. – Cayla
$ cat dehumanise 
937
1.43K
120.3M

$ awk '/[0-9]$/{print $1;next};/[mM]$/{printf "%u\n", $1*(1024*1024);next};/[kK]$/{printf "%u\n", $1*1024;next}' dehumanise
937
1464
126143692
Sapling answered 29/10, 2014 at 2:25 Comment(5)
Thanks! This also only works on one column, but it seems much more reliable than the bc method. – Mendymene
@Devon: Heh. Present some actual data and you may get an actual solution? :) – Sapling
I accepted this since it works well; I had to test it a little further before accepting it. All I had to do was add awk '{print $2}' | (depending on the column) beforehand to work on a different column, and for my analysis, handling one column at a time works fine. – Mendymene
You do not need ; after }. The last next is also not needed, since it's already in the last part of the code. The other next may be removed too for this simple code, so this should do: awk '/[0-9]$/{print $1} /[mM]$/{printf "%u\n", $1*(1024*1024)} /[kK]$/{printf "%u\n", $1*1024}' file – Evelynneven
This doesn't do gigabytes. – Bathometer
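The comments above note that this converts only one column at a time (after a separate awk '{print $2}' step) and has no G rule. Below is a minimal sketch of converting a chosen column in place, assuming binary (1024-based) K/M/G/T suffixes in either case and no trailing B. The function name dehumanise_col and its default column of 2 are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert the human-readable size in column COL (default 2)
# of every input line to bytes, leaving the other columns untouched.
# Assumption: binary K/M/G/T suffixes (1K = 1024), upper- or lowercase, no "B".
dehumanise_col() {
  local col="${1:-2}"
  awk -v col="$col" '
    function tobytes(s,    p) {
      # Position of the last character in "KMGT": 1 for K ... 4 for T, 0 if none.
      p = index("KMGT", toupper(substr(s, length(s), 1)))
      if (p == 0) return s                  # no recognised suffix: leave as-is
      # Strip the suffix, scale, and round to the nearest byte.
      return sprintf("%.0f", substr(s, 1, length(s) - 1) * 1024 ^ p)
    }
    NF >= col { $col = tobytes($col) }      # note: rebuilds $0 with single spaces
    { print }'
}
```

For example, printf 'a 1.43K x\n' | dehumanise_col 2 should print a 1464 x. Assigning to $col makes awk rejoin the line with single spaces, so runs of whitespace in the original log are collapsed; %.0f rounds rather than truncates, which gives 126143693 for 120.3M as in the question.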

Use numfmt --from=iec from GNU coreutils.
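A short usage sketch, assuming GNU coreutils with numfmt installed. With --from=iec, K, M, G, ... are powers of 1024. With --from=si, they are powers of 1000. With --from=iec-i, numfmt expects Ki, Mi, Gi, ... suffixes. The --field option restricts the conversion to one column of each input line. The sample values are illustrative only.

```shell
#!/usr/bin/env bash
# Binary (IEC) prefixes: 1K = 1024. Each result is printed on its own line.
numfmt --from=iec 937 1K 120M
# Decimal (SI) prefixes: 1K = 1000.
numfmt --from=si 2K
# IEC prefixes written with an explicit "i", e.g. 2Gi.
numfmt --from=iec-i 2Gi
# In a log, convert only the second whitespace-separated field of each line.
printf 'a 1K\nb 2M\n' | numfmt --from=iec --field=2
```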

Parity answered 28/1, 2019 at 15:14 Comment(4)
The best answer. – Heddi
It would be the best answer, but numfmt sadly doesn't handle float-formatted inputs: e.g. numfmt --to iec 1.43K gives numfmt: invalid suffix in input: ‘1.43K’ (tested with coreutils 9.0 on macOS 11.6.4 20G417 x86_64). – Dipteral
Don't you have a mistake there, using --to instead of --from? – Parity
Note: if you convert from Gib, Mib, and so on, use --from=iec-i (e.g. 1.62Gi). – Veda

Here's a function that understands binary and decimal prefixes and is easily extendable for large units should there be a need:

dehumanise() {
  for v in "${@:-$(</dev/stdin)}"
  do  
    echo "$v" | awk \
      'BEGIN{IGNORECASE = 1}
       function printpower(n,b,p) {printf "%u\n", n*b^p; next}
       /[0-9]$/{print $1;next};
       /K(iB)?$/{printpower($1,  2, 10)};
       /M(iB)?$/{printpower($1,  2, 20)};
       /G(iB)?$/{printpower($1,  2, 30)};
       /T(iB)?$/{printpower($1,  2, 40)};
       /KB$/{    printpower($1, 10,  3)};
       /MB$/{    printpower($1, 10,  6)};
       /GB$/{    printpower($1, 10,  9)};
       /TB$/{    printpower($1, 10, 12)}'
  done
} 

example:

$ dehumanise 2K 2k 2KiB 2KB 
2048
2048
2048
2000

$ dehumanise 2G 2g 2GiB 2GB 
2147483648
2147483648
2147483648
2000000000

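The suffixes are case-insensitive because the function sets IGNORECASE. IGNORECASE is a GNU awk (gawk) extension, so other awk implementations match the suffixes case-sensitively.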
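For awk implementations other than GNU awk, which ignore IGNORECASE, here is a sketch of a portable variant, assuming any POSIX awk. It normalises case with toupper() instead. It treats K, Ki and KiB as 1024-based and KB as 1000-based (likewise for M, G, T), and it reads standard input when called without arguments. The name dehumanise_posix is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical portable variant of dehumanise for any POSIX awk (no IGNORECASE).
# K, Ki, KiB -> powers of 1024; KB -> powers of 1000; same for M, G, T.
dehumanise_posix() {
  local prog='
    NF == 0 { next }                         # skip blank lines
    {
      s = toupper($1)                        # case-insensitive by normalising
      if (s ~ /^[0-9.]+$/) { print s; next } # already a plain byte count
      n = s; sub(/[A-Z]+$/, "", n)           # numeric part, e.g. "1.43"
      u = substr(s, length(n) + 1)           # unit part, e.g. "K", "KIB", "KB"
      p = index("KMGT", substr(u, 1, 1))     # 1 for K ... 4 for T
      if (u ~ /^[KMGT](I|IB)?$/)  printf "%.0f\n", n * 1024 ^ p
      else if (u ~ /^[KMGT]B$/)   printf "%.0f\n", n * 1000 ^ p
      else print "unrecognised size: " $1 > "/dev/stderr"
    }'
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$@" | awk "$prog"         # sizes given as arguments
  else
    awk "$prog"                              # sizes read from stdin
  fi
}
```

With this sketch, dehumanise_posix 2K 2k 2KiB 2KB should print 2048, 2048, 2048 and 2000. One design difference from the function above: %.0f rounds to the nearest byte instead of truncating like %u, so 120.3M becomes 126143693, as in the question.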

Allocution answered 25/7, 2015 at 9:55 Comment(1)
Adapted this to JavaScript: gist.github.com/lanqy/5193417#gistcomment-3253220 – Berna

Python tools exist for this too, e.g. the humanfriendly package:

$ pip install humanfriendly  # Also available as a --user install in ~/.local/bin

$ humanfriendly --parse-size="2 KB"
2000
$ humanfriendly --parse-size="2 KiB"
2048
Myasthenia answered 22/9, 2017 at 21:30 Comment(0)

awk 'function pp(p){printf "%u\n",$0*1024^p} /[0-9]$/{print $0}/K$/{pp(1)}/M$/{pp(2)}/G$/{pp(3)}/T$/{pp(4)}/[^0-9KMGT]$/{print 0}'

This is a modification of @starfry's answer.


Let's break it down:

function pp(p) { printf "%u\n", $0 * 1024^p }

Define a function named pp that takes a single parameter p and prints $0 multiplied by 1024 raised to the p-th power. The %u format prints the result as an unsigned decimal integer, discarding any fractional part.
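A quick check in awk shows what this means for the 120.3M example. %u truncates the fractional part, while %.0f rounds to the nearest integer, which matches the 126143693 given in the question.

```shell
#!/usr/bin/env bash
# 120.3M in bytes is 120.3 * 1024^2, about 126143692.8 in floating point.
awk 'BEGIN {
  x = 120.3 * 1024 ^ 2
  printf "%u\n", x     # truncates: 126143692
  printf "%.0f\n", x   # rounds:    126143693
}'
```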

/[0-9]$/ { print $0 }

Match lines that end with a digit (the $ anchors the match to the end of the line), then run the code inside the { and }: print the entire line ($0).

/K$/ { pp(1) }

Match lines that end with the capital letter K, then call pp() with 1 (p == 1). NOTE: when $0 (e.g. "1.43K") is used in arithmetic, awk uses only its leading numeric part (i.e. "1.43"). Example with $0 = "1.43K":

$0 * 1024^p  ->  1.43 * 1024^1 = 1.43 * 1024 = 1464.32, which %u prints as 1464
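This string-to-number rule is standard awk behaviour. Like C's atof(), awk converts a string using its leading numeric prefix, and this is easy to check directly:

```shell
#!/usr/bin/env bash
# awk converts "1.43K" to a number using only its leading numeric part.
echo 1.43K | awk '{ print $0 + 0 }'                # 1.43
echo 1.43K | awk '{ printf "%u\n", $0 * 1024 }'    # 1464
```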

/M$/ { pp(2) }

Match lines that end with the capital letter M, call the function pp() and pass 2 to it (p == 2). Example with $0 == "120.3M"

$0 * 1024^p  ->  120.3 * 1024^2 = 120.3 * 1048576 = 126143692.8, which %u prints as 126143692

The G and T rules work the same way, with p = 3 and p = 4.

/[^0-9KMGT]$/ { print 0 }

Lines that do not end with a digit or the capital letters K, M, G, or T print "0".


Example:

$ cat dehumanise
937
1.43K
120.3M
5G
933G
12.2T
bad
<>

Results:

$ awk 'function pp(p){printf "%u\n",$0*1024^p} /[0-9]$/{print $0}/K$/{pp(1)}/M$/{pp(2)}/G$/{pp(3)}/T$/{pp(4)}/[^0-9KMGT]$/{print 0}' dehumanise
937
1464
126143692
5368709120
1001801121792
13414041858867
0
0
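One caveat: the rules are case-sensitive. A lowercase suffix such as 1.5k therefore falls through to the last rule and prints 0, and an empty line prints nothing at all. Each line here holds a single value, so one minimal workaround is to upper-case the input before it reaches awk. This assumes the suffixes are plain K/M/G/T letters without a trailing B.

```shell
#!/usr/bin/env bash
# Same one-liner as above, fed input whose suffixes are upper-cased first.
printf '1.5k\n2K\n3m\n' |
  tr '[:lower:]' '[:upper:]' |
  awk 'function pp(p){printf "%u\n",$0*1024^p} /[0-9]$/{print $0}/K$/{pp(1)}/M$/{pp(2)}/G$/{pp(3)}/T$/{pp(4)}/[^0-9KMGT]$/{print 0}'
```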
Martingale answered 14/10, 2018 at 4:34 Comment(0)
