Integer division in awk

Asked 13/2, 2013 at 16:27 Answered 30/5, 2024 at 20:55

I want to divide two numbers in awk using integer division, i.e. truncating the result. For example:

k = 3 / 2
print k

should print 1

According to the manual,

Division; because all numbers in awk are floating-point numbers, the result is not rounded to an integer

Is there any workaround to get an integer value?

The reason is that I want to get the middle element of an array with integer indexes [0 to num-1].

Yamashita answered 13/2, 2013 at 16:27 Comment(0)

Use the int function to get the integer part of the result, truncated toward 0. This produces the nearest integer to the result, located between the result and 0. For example, int(3/2) is 1, int(-3/2) is -1.

Source: The AWK Manual - Numeric Functions
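For the use case in the question (the middle index of an array indexed 0 to num-1), a minimal sketch might look like this. The array a, its contents, and the size 7 are made up for illustration. Since num is non-negative here, the truncation is toward zero even under the stricter POSIX wording.

```shell
awk 'BEGIN {
    num = 7                                   # hypothetical array size
    for (i = 0; i < num; i++) a[i] = "item" i # fill a[0] .. a[num-1]
    mid = int(num / 2)                        # 7 / 2 = 3.5, truncated to 3
    print mid, a[mid]                         # prints: 3 item3
}'
```

For an even num this picks the upper of the two middle elements (for example, index 3 when num is 6).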

Singlehearted answered 13/2, 2013 at 16:28 Comment(1)

This doesn't count as integer division, integer division has a strict guarantee based on remainder. This is just "truncation" – Killifish 10/11, 2019 at 0:6

In simple cases, you can safely use int() which truncates towards zero¹:

awk 'BEGIN { print int(3 / 2) }'    # prints 1
gawk 'BEGIN { print int(-3 / 2) }'  # prints -1; not guaranteed in POSIX awk

Keep in mind that awk always uses double-precision floating point numbers² and floating-point arithmetic³, though. The only way you can get integers and integer arithmetic is to use strings and either roll your own integer arithmetic (see another answer), or call external tools, e.g. the standard expr utility:

awk 'BEGIN { "expr 3 / 2" | getline result; print result; }'    # prints 1

This is really awkward, long, slow, … but safe and portable.

¹ In POSIX awk, truncation to zero is guaranteed only for positive arguments: "int(x): Return the argument truncated to an integer. Truncation shall be toward 0 when x>0." GNU awk (gawk) uses truncation toward zero even for negative numbers: "int(x): Return the nearest integer to x, located between x and zero and truncated toward zero. For example, int(3) is 3, int(3.9) is 3, int(-3.9) is -3, and int(-3) is -3 as well."
² Numeric expressions are specified as double-precision floats in Expressions in awk in POSIX.
³ "All arithmetic shall follow the semantics of floating-point arithmetic as specified by the ISO C standard (see Concepts Derived from the ISO C Standard)." (POSIX awk: Arithmetic functions)

If you choose to use floats, you should know about their quirks and be ready to spot them and avoid related bugs. Several scary examples:

Unrepresentable numbers:

awk 'BEGIN { x = 0.875; y = 0.425; printf("%0.17g, %0.17g\n", x, y) }'
# prints 0.875, 0.42499999999999999

Accumulation of round-off errors:

awk 'BEGIN{s=0; for(i=1;i<=100000;i++)s+=0.3; printf("%.10f, %d\n",s,int(s))}'
# prints 29999.9999999506, 29999

Round-off errors ruin comparisons:

awk 'BEGIN { print (0.1 + 12.2 == 12.3) }'    # prints 0

Precision decreases with magnitude, causing infinite loops:

awk 'BEGIN { for (i=10^16; i<10^16+5; i++) printf("%d\n", i) }'
# prints 10000000000000000 infinitely many times
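For the comparison example above, one common mitigation is to compare against a tolerance instead of testing for exact equality. This is only a sketch: the variable names and the tolerance 1e-9 are assumptions, and a suitable tolerance depends on the magnitude of the numbers involved.

```shell
awk 'BEGIN {
    eps = 1e-9                   # hypothetical tolerance; pick one that fits your data
    diff = 0.1 + 12.2 - 12.3     # tiny but nonzero because of round-off
    if (diff < 0) diff = -diff   # awk has no built-in abs()
    print (diff < eps)           # prints 1
}'
```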

Read more on how floats work:

Stack Overflow tags floating-point wiki
Wikipedia article Floating point
GNU awk arbitrary precision arithmetic – contains both info on the specific implementation and general knowledge

Poole answered 19/2, 2014 at 0:12 Comment(9)

I guess a way to overcome the floating errors (for relatively small numbers) would be to do int(3/2+0.25). – Yamashita 25/3, 2014 at 8:32

@Yamashita Adding a constant does not solve the problem, it actually adds a new one. awk 'BEGIN {print int(7/8), int (7/8 + 0.25)}' produces 0 1. – Poole 25/3, 2014 at 12:42

Yes it would have to be less than 1/(d/2), where d is the denominator. As long as this value is larger than the floating error, it should work. – Yamashita 25/3, 2014 at 13:12

@Poole : The only way you can get integers and integer arithmetic is to use external tools, e.g. the standard expr utility: this statement is very much false - see my answer below. – Daimyo 30/5, 2024 at 21:2

@RAREKpopManifesto, I omitted that possibility because it takes lots of error-prone code and awk is usually used for simple one-shot scripts. I included the link to your answer, though, as the possibility is actually there. – Poole 31/5, 2024 at 16:30

@Poole : did you spot any bugs in my 3 functions ? Most of that extra code is really to safely handle division by zero (plus short circuit handling for dividing by ONE, since there's no point to actually divide), which gawk gives fatal errors instead of simply returning NaN or inf. And the upside of my functions is that the only times it calls int(…) are up front pre-truncation of operands. Nowhere along the actual division code does it require external funcs (trunc, floor, or ceiling), since everything is performed strictly in integer realm. – Daimyo 3/6, 2024 at 17:31

@Poole : another advantage of those functions is that the inputs can literally come in as ANY shape or form - integer, floating point, or numeric strings, and the same code handles them all seamlessly. Literally the only requirement for numeric strings is that the digits must be ASCII, that's all. On awks that support it, you can even directly feed it hex strings and it'll divide them just fine. – Daimyo 3/6, 2024 at 17:38

@Poole : not to mention that in expr, division/modulo by zero are still fatal errors instead of properly returning INF / NAN, and that expr cannot auto integer-truncate operands on your behalf. – Daimyo 3/6, 2024 at 17:53

My point is that I do not want to go through the code, @RAREKpopManifesto. :-) Infinities and NaN may be useful, but they are usually used only with floats, not integers. – Poole 4/6, 2024 at 19:1

Safe and quick awk integer division can be done with:

q=(n-n%d)/d+(n<0)

Forearm answered 4/10, 2014 at 18:41 Comment(4)

+1. Clever trick, I'll give you that. I wonder if it safe for precision errors, as @Poole explains... – Yamashita 5/10, 2014 at 15:34

+1 This implements a ceil type of rounding (toward +inf). Other mathematically correct mod concepts and consequently rounding methods do exist. Some other method may be preferable. – Ulceration 22/11, 2016 at 20:17

@sorontar At least in my system, this is not equivalent to a ceil function – Bioluminescence 11/10, 2017 at 6:36

Hm: Using Gawk 5.1 gawk -v n=-1 -v d=1 'BEGIN{print(n-n%d)/d+(n<0)}' gives 0, bash -c 'echo $((-1/1))' of course prints -1. – Gatto 21/6, 2022 at 16:28
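The disagreement in these comments can be made concrete. Assuming % behaves like C's fmod (the remainder takes the sign of the dividend, as it does in gawk), (n - n % d) / d is already the truncated quotient. A floored quotient then needs a correction only when the remainder is nonzero and the operands have opposite signs. The function name floordiv below is made up for this sketch; it is not taken from any answer here.

```shell
awk '
function floordiv(n, d,    q) {
    q = (n - n % d) / d                         # truncated quotient (toward zero)
    if (n % d != 0 && (n < 0) != (d < 0)) q--   # step down only for inexact, opposite-sign cases
    return q
}
BEGIN {
    print floordiv(7, 2), floordiv(-7, 2), floordiv(7, -2), floordiv(-7, -2), floordiv(-1, 1)
    # prints: 3 -4 -4 3 -1
}'
```

Like everything else in awk, this still runs on doubles, so it is exact only while the operands are integers below 2^53 in magnitude.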

For what it's worth, I have these 3 functions in my personal library, one each for

Truncated (BAU awk approach),
Floored, and
Euclidean division

The 3 functions are cross-dependent, so as to avoid re-inventing the wheel 3 times. Inputs can be in any format: integer, floating point, or numeric strings.

Both dividend and divisor are pre-truncated before any division occurs.

With gawk -M (bigint via GMP), these functions offer UNLIMITED division precision without needing to set the PREC parameter. Without GMP, precision offered is the standard 53-bits of double precision FP underlying all of awk.

function divmod_trunc(___, _, __) {

    return \
    (__ = _ = int(_)) == (_ = !!_) \
        ? (+(__ = "=%_/=") < _ \
            ? ((!_)__) ___ \
            : ERRNO = (_ = "NAN")__ (! (___ = +int(substr(___,
                   _ = (__ = _)^(_ < _), -(_++) + _^_^_) ".")) \
                       ? __ : substr("-INF", _ - (___ < !_)))) \
        : (_ = (___ = \
            int(___)) % __) "=%_/=" (___ - _) /__
}
function divmod_floor(__, _, ___) {

    return \
    ((__ = int(__)) < !!__) == ((_ = int(_)) < !!_) || !_ \
        ? divmod_trunc(__, _) \
        : (__ - _ * (___ = (\
           __ - (___ = __ % _)) / _ - !!_)) ("=%_/=")___
}
function divmod_euclid(__, _, ___) {

    return \
    (___ = (_ = int(_)) == !!_ ||
         -(__ = int(__)) < __) || __*_ < !_ \
        ? (___  ? divmod_trunc(__, _) \
                : divmod_floor(__, _)) \
        : ((___ = (__ - (\
                   __ %= _)) / _)^!_ * __ - _) "=%_/=" (++___)
}

Since awk lacks tuples as a return type, these functions attempt to emulate that effect by simultaneously returning both remainder and quotient as a "string-connected tuple", in the format

REMAINDER=%_/=QUOTIENT

All zeros are treated as unsigned. Division by zero returns unsigned NAN as the remainder, and one of NAN, INF, or -INF as its quotient.
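A caller has to split that string back into its two parts. One way to do so, sketched here with a hard-coded sample value rather than a call to the functions above, is to locate the separator with index() and cut around it with substr(). The variable names pair, sep, rem, and quo are assumptions made for this example.

```shell
awk 'BEGIN {
    pair = "2701014=%_/=19989475"         # sample value in REMAINDER=%_/=QUOTIENT form
    sep  = "=%_/="
    p    = index(pair, sep)               # 1-based position where the separator starts
    rem  = substr(pair, 1, p - 1)         # everything before the separator
    quo  = substr(pair, p + length(sep))  # everything after the separator
    print quo, rem                        # prints: 19989475 2701014
}'
```

Using index() rather than split() treats the separator as a literal string, so its characters are never interpreted as a regular expression.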

This pair of primes showcases their differences:

 468888899996789 /  23456789

TRUNC ::   2701014=%_/=19989475
FLOOR ::   2701014=%_/=19989475
ECLID ::   2701014=%_/=19989475

-468888899996789 /  23456789

TRUNC ::  -2701014=%_/=-19989475
FLOOR ::  20755775=%_/=-19989476
ECLID ::  20755775=%_/=-19989476

 468888899996789 / -23456789

TRUNC ::   2701014=%_/=-19989475
FLOOR :: -20755775=%_/=-19989476
ECLID ::   2701014=%_/=-19989475

-468888899996789 / -23456789

TRUNC ::  -2701014=%_/=19989475
FLOOR ::  -2701014=%_/=19989475
ECLID ::  20755775=%_/=19989476

These functions are fully POSIX-compliant and work on all awks. No numbers are hard-coded in these functions, since all numeric constants and offsets they require are generated on the fly as part of input cleansing.

Daimyo answered 30/5, 2024 at 20:55 Comment(0)
