Perl printf to use commas as thousands-separator
L

15

Using awk, I can print a number with commas as thousands separators
(after running export LC_ALL=en_US.UTF-8 beforehand).

awk 'BEGIN{printf("%\047d\n", 24500)}'

24,500

I expected the same format to work with Perl, but it does not:

perl -e 'printf("%\047d\n", 24500)'

%'d

The Perl Cookbook offers this solution:

sub commify {
    my $text = reverse $_[0];
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}
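
As a quick check of the Cookbook routine, here is a minimal sketch that runs it on a few inputs. The sample values are chosen purely for illustration. The negative lookahead (?!\d*\.) skips any digit run that still has a decimal point ahead of it in the reversed string, which is why the fractional part is left alone.

```perl
use strict;
use warnings;

# commify() as given in the Perl Cookbook
sub commify {
    my $text = reverse $_[0];
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}

# Sample inputs chosen for illustration
print commify(24500), "\n";            # 24,500
print commify('1234567.891'), "\n";    # 1,234,567.891
print commify('1234.56789'), "\n";     # 1,234.56789
print commify(-90120), "\n";           # -90,120
```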

However I am assuming that since the printf option works in awk, it should also work in Perl.

Lengthwise answered 30/10, 2015 at 18:1 Comment(3)
More specifically: perldoc.perl.org/…Lengthwise
However I'm still surprised that the printf ' option doesn't workLengthwise
@ChrisKoknat: See my answer below regarding thatGeometric
G
13

The apostrophe format modifier is a POSIX extension that is not part of standard C. The documentation for Perl's printf has this to say about such extensions:

Perl does its own "sprintf" formatting: it emulates the C function sprintf(3), but doesn't use it except for floating-point numbers, and even then only standard modifiers are allowed. Non-standard extensions in your local sprintf(3) are therefore unavailable from Perl.

The Number::Format module will do this for you, and it takes its default settings from the locale, so it is as portable as it can be:

use strict;
use warnings 'all';
use v5.10.1;

use Number::Format 'format_number';

say format_number(24500);

output

24,500
Geometric answered 30/10, 2015 at 18:13 Comment(12)
But is not part of the standard distribution, I think.Bloodroot
@syck: That shouldn't be an obstacleGeometric
Depends on how portable you want to be.Bloodroot
@syck: It's not a portability issue if someone would prefer not to install a module, only if that module won't run on a given platformGeometric
The Number::Format includes, "require 5.010;" which has been EOL'd about 10 years ago. I'm not sure there's an advantage to using an outdated CPAN module.Rhinoscopy
@Hold: That's a very strange conclusion! There is no end of lining in Perl 5: the team go to great lengths to ensure as much backward-compatibility as possible with every release. Sadly, because of this, there are still several installations of v5.8 out there. The use of require 5.010 bears no relation to how "up to date" the module might be, and if every module had require 5.026 there would be an outcry. It is in the author's interests to put as few restrictions as possible on the version of perl it requires. It is foolish to suggest that any module using require 5.010 is "outdated".Geometric
cpan.org/src "5.10 5.10.1 End of life 2009-08-23" (this is listed among the issue tickets on the CPAN page itself for that module)Rhinoscopy
@Hold: It would be as well to understand what you're reading before you regurgitate it. Here, "End of life" means simply that v5.10.1 is the last iteration of v5.10 to be released. 23-Aug-2009 is the date of that release. It would be very odd to kill off a version on the same day as releasing a new version, and this happens to every release! Regardless, there is still no value in your assertion that Number::Format is "outdated" because it requires Perl v5.10 or later to run. What you are saying is nonsense.Geometric
What I mean is -- The documentation says (your link) [Requires] "Perl, version 5.8 or higher," but the code has "require 5.010." This change must be at least ten years old (otherwise why not use something more modern?); but it hasn't been updated. (That is the advantage of using other libraries, but it's not an advantage if it's not updated. This problem isn't just your listed lib, but quite a few on CPAN.)Rhinoscopy
@Hold: How does a documentation error make a module out of date? If you're saying that the source can't have been modified for ten years because there is a require 5.010 in there then that's ridiculous. As I tried to explain, ideally there would be no such requirement at all, and if it must be there then the lower the version the better. "otherwise why not use something more modern?" makes no sense at all. Do you know anything about Perl at all?Geometric
@Rhinoscopy require 5.010 means "the following code requires Perl 5.10.0 or newer". If the module author were to update that line to something newer, developers who are stuck on older versions of Perl would not be able to use the module, even though there is no good reason for that limitation. The latest release of Number::Format is from 2015 - but even "old" code may still work perfectly today! The (CPAN Testers Matrix)[matrix.cpantesters.org/?dist=Number-Format+1.75] shows that the module runs just fine on Perl 5.10 through 5.28, so there is no need to worry.Kingsize
Correct link: CPAN Testers MatrixKingsize
B
10

A more perl-ish solution:

$a = 12345678;                 # no comment
$b = reverse $a;               # $b = '87654321';
@c = unpack("(A3)*", $b);      # @c = ('876', '543', '21');
$d = join ',', @c;             # $d = '876,543,21';
$e = reverse $d;               # $e = '12,345,678';
print $e;

outputs 12,345,678.
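
The steps above group every character of the string, so a decimal point ends up inside a group. One possible way around that, sketched here under the assumption that the input is a plain decimal string (optional sign, digits, optional fraction), is to group only the integer digits and reattach the rest. The name commify_unpack is hypothetical.

```perl
use v5.10;    # for the // operator
use strict;
use warnings;

# Hypothetical helper: group only the integer digits, then reattach
# the sign and the fractional part unchanged.
sub commify_unpack {
    my ($number) = @_;
    my ($sign, $int, $frac) = $number =~ /^([-+]?)(\d+)(\.\d*)?$/
        or return $number;    # not a plain decimal number: leave it alone
    my $grouped = scalar reverse join ',',
        unpack '(A3)*', scalar reverse $int;
    return $sign . $grouped . ($frac // '');
}

say commify_unpack('12345678');       # 12,345,678
say commify_unpack('12345678.98');    # 12,345,678.98
say commify_unpack('-90120');         # -90,120
```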

Bloodroot answered 30/10, 2015 at 18:7 Comment(4)
Tested, agreed and removed. perldoc is a bit misleading on that.Bloodroot
@syck: You should raise a bug report if you think soGeometric
It is not wrong, only easy to misunderstand when read on the fly.Bloodroot
This code completely breaks with floating point values. Example: 12345678.98 becomes 12,345,678,.98.Honora
T
8

I realize this question was from almost 4 years ago, but since it comes up in searches, I'll add an elegant native Perl solution I came up with. I was originally searching for a way to do it with sprintf, but everything I've found indicates that it can't be done. Then since everyone is rolling their own, I thought I'd give it a go, and this is my solution.

$num = 12345678912345; # however many digits you want
while($num =~ s/(\d+)(\d\d\d)/$1\,$2/){};
print $num;

Results in:

12,345,678,912,345

Explanation: The Regex does a maximal digit search for all leading digits. The minimum number of digits in a row it'll act on is 4 (1 plus 3). Then it adds a comma between the two. Next loop if there are still 4 digits at the end (before the comma), it'll add another comma and so on until the pattern doesn't match.

If you need something safe for use with more than 3 digits after the decimal, use this modification: (Note: This won't work if your number has no decimal)

while($num =~ s/(\d+)(\d\d\d)([.,])/$1\,$2$3/){};

This ensures that it only matches a run of digits that ends in a comma (added on a previous loop) or a decimal point.
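
To make the difference between the two loops concrete, here is a small sketch that wraps each one in a function and runs it on a few inputs. The function names and sample values are hypothetical and only serve the comparison.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two loops from this answer
sub group_simple {
    my ($num) = @_;
    1 while $num =~ s/(\d+)(\d\d\d)/$1,$2/;
    return $num;
}

sub group_before_separator {
    my ($num) = @_;
    1 while $num =~ s/(\d+)(\d\d\d)([.,])/$1,$2$3/;
    return $num;
}

print group_simple('12345678912345'), "\n";            # 12,345,678,912,345
print group_simple('1234.56789'), "\n";                # 1,234.56,789 (fraction is grouped too)
print group_before_separator('1234567.891234'), "\n";  # 1,234,567.891234
print group_before_separator('1234567'), "\n";         # 1234567 (no decimal point, no match)
```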

Throat answered 25/9, 2019 at 18:50 Comment(1)
I tweaked yours a bit to work for both in a single regex (and gave you a +1, of course!): https://mcmap.net/q/757764/-perl-printf-to-use-commas-as-thousands-separatorClosehauled
L
7

Most of these answers assume that the format is universal. It isn't. CLDR uses Unicode information to figure it out. There's a long thread in How to properly localize numbers?.

CPAN has the CLDR::Number module:

#!perl
use v5.10;
use CLDR::Number;
use open qw(:std :utf8);

my $locale = $ARGV[0] // 'en';

my @numbers = qw(
    123
    12345
    1234.56
    -90120
    );

my $cldr = CLDR::Number->new( locale => $locale );

my $decf = $cldr->decimal_formatter;

foreach my $n ( @numbers ) {
    say $decf->format($n);
    }

Here are a few runs:

$ perl comma.pl
123
12,345
1,234.56
-90,120

$ perl comma.pl es
123
12.345
1234,56
-90.120

$ perl comma.pl bn
১২৩
১২,৩৪৫
১,২৩৪.৫৬
-৯০,১২০

It seems heavyweight, but the output is correct and you don't have to allow the user to change the locale you want to use. However, when it's time to change the locale, you are ready to go. I also prefer this to Number::Format because I can use a locale that's different from my local settings for my terminal or session, or even use multiple locales:

#!perl
use v5.10;
use CLDR::Number;
use open qw(:std :utf8);

my @locales = qw( en pt bn );

my @numbers = qw(
    123
    12345
    1234.56
    -90120
    );


my @formatters = map {
    my $cldr = CLDR::Number->new( locale => $_ );
    my $decf = $cldr->decimal_formatter;
    [ $_, $cldr, $decf ];
    } @locales;

printf "%10s %10s %10s\n" . '=' x 32 . "\n", @locales;

foreach my $n ( @numbers ) {
    printf "%10s %10s %10s\n",
        map { $_->[-1]->format($n) } @formatters;
    }

The output has three locales at once:

        en         pt         bn
================================
       123        123        ১২৩
    12,345     12.345     ১২,৩৪৫
  1,234.56   1.234,56   ১,২৩৪.৫৬
   -90,120    -90.120    -৯০,১২০
Longevous answered 30/1, 2020 at 11:56 Comment(0)
L
5

1 liner: Use a little loop with a regex:

while ($number =~ s/^(\d+)(\d{3})/$1,$2/) {}

Example:

use strict;
use warnings;

my @numbers = (
    12321,
    12.12,
    122222.3334,
    '1234abc',
    '1.1',
    '1222333444555,666.77',
);
for (@numbers) {
    my $number = $_;
    while ($number =~ s/^(\d+)(\d{3})/$1,$2/) {}
    print "$_  ->  $number\n";
}

Output:

12321  ->  12,321
12.12  ->  12.12
122222.3334  ->  122,222.3334
1234abc  ->  1,234abc
1.1  ->  1.1
1222333444555,666.77  ->  1,222,333,444,555,666.77


Pattern:

^(\d+)(\d{3})
    -> Group 1 takes all leading digits except the last 3
    -> Group 2 takes those last 3 digits of the leading run
    -> Everything after the leading run of digits is ignored

Substitution

$1,$2
    -> Put a separator sign (,) between group 1 and 2
    -> The rest remains unchanged

So if you have 12345.67, the digits the regex uses are 12345. The '.' and everything after it are ignored.

1. run (12345.67):
  -> matches: 12345
  -> group 1: 12
     group 2: 345
  -> substitute 12,345
  -> result: 12,345.67
2. run (12,345.67):
  -> does not match!
  -> while breaks.
Libido answered 27/9, 2021 at 13:16 Comment(0)
D
5

Here's an elegant Perl solution I've been using for over 20 years :)

1 while $text =~ s/(.*\d)(\d\d\d)/$1\.$2/g;

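And if you want two decimal places, apply sprintf first, before inserting the separators (a string that already contains separators no longer converts back to the original number):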

$text = sprintf("%0.2f", $text);
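
A note on ordering: sprintf("%0.2f", ...) treats its argument as a number, so it has to see the value before any separators are inserted. The following is a minimal sketch, not the answer's exact code: it assumes "." as the thousands separator and "," as the decimal mark, and the name format_eu is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: round first, then swap the decimal mark, then group.
sub format_eu {
    my ($number) = @_;
    my $text = sprintf '%.2f', $number;    # e.g. "1234567.89"
    $text =~ tr/./,/;                      # decimal point becomes a comma
    # Insert "." between groups of three digits in the integer part
    1 while $text =~ s/^(-?\d+)(\d{3})/$1.$2/;
    return $text;
}

print format_eu(1234567.891), "\n";    # 1.234.567,89
print format_eu(1000), "\n";           # 1.000,00
print format_eu(-90120), "\n";         # -90.120,00
```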
Derwon answered 16/10, 2021 at 15:13 Comment(1)
One issue with the leading .* in s/(.*\d)(\d\d\d)/$1\.$2/g is that it would format matching non-numbers such as astronomical objects, but if you stay down on Earth that is not a problem. And one minor flaw detrimental to the elegance of the solution is that the backslash in the replacement \. is not necessary.Saltillo
C
2

Starting from @Laura's answer, I tweaked the pure Perl, regex-only solution to work for numbers with decimals too:

while ($formatted_number =~ s/^(-?\d+)(\d{3}(?:,\d{3})*(?:\.\d+)*)$/$1,$2/) {};

Of course this assumes a "," as thousands separator and a "." as decimal separator, but it should be trivial to use variables to account for that for your given locale(s).
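
One way to use variables for the separators could look like the sketch below. It takes a different route from the single regex above: it splits off the sign and the fraction first and groups only the integer part. The function name group_digits and its parameter order are assumptions made for illustration.

```perl
use v5.10;    # for the //= operator
use strict;
use warnings;

# Hypothetical helper with configurable separators
sub group_digits {
    my ($number, $thousands_sep, $decimal_point) = @_;
    $thousands_sep //= ',';
    $decimal_point //= '.';

    # Input is assumed to use "." as its decimal point
    my ($sign, $int, $frac) = $number =~ /^(-?)(\d+)(?:\.(\d+))?$/
        or return $number;    # not a plain decimal number: leave it alone

    1 while $int =~ s/^(\d+)(\d{3})/$1$thousands_sep$2/;
    return $sign . $int . (defined $frac ? $decimal_point . $frac : '');
}

say group_digits('1234567.891');              # 1,234,567.891
say group_digits('1234567.891', '.', ',');    # 1.234.567,891
say group_digits('-90120', ' ');              # -90 120
```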

Closehauled answered 14/2, 2021 at 7:2 Comment(0)
C
1

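I used the following, but it does not work as of perl v5.26.1 when the function is called in list context (for example as an argument to print), because reverse then reverses its argument list instead of the string: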

sub format_int
{
        my $num = shift;
        return reverse(join(",",unpack("(A3)*", reverse int($num))));
}

The form that worked for me was:

sub format_int
{
        my $num = shift;
        return scalar reverse(join(",",unpack("(A3)*", reverse int($num))));
}
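
The difference between the two forms comes down to the context in which reverse is evaluated. A minimal sketch, with variable names made up for the demonstration:

```perl
use strict;
use warnings;

my @in_list   = reverse 'abc';    # list context: a one-element list, reversed
my $in_scalar = reverse 'abc';    # scalar context: the characters, reversed

print "@in_list\n";               # abc
print "$in_scalar\n";             # cba

# print supplies list context, so scalar is needed to reverse the string
print scalar reverse('005,42'), "\n";    # 24,500
```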

But to use negative numbers the code must be:

sub format_int
{
    my $val = shift;
    if ( $val >= 0 ) {
        return scalar reverse join ",", unpack( "(A3)*", reverse int($val) );
    } else {
        return "-" . scalar reverse join ",", unpack( "(A3)*", reverse int(-$val) );
    }
}
Consumedly answered 6/1, 2020 at 16:49 Comment(0)
M
1

Did somebody say Perl?

perl -pe '1while s/(\d+)(\d{3})/$1,$2/'

This works for any integer.

Malleus answered 23/2, 2021 at 4:20 Comment(0)
L
1

Most of these solutions fail with real numbers that have a decimal fraction or one that is longer than three decimal digits; or they are overly complex. Here are two solutions, both use the /e perl regular expression modifier in s/PATTERN/CODE/e syntax. The idea is to extract with grouping to $1 the integer component leaving the decimal fraction untouched, and then use either format_number() from Number::Format, or a regex in the CODE part of s///, to wit:

use v5.14;
use strict;
use Number::Format 'format_number';

my $nbr = 'This is a real number, 834569.334656';
(my $res = $nbr) =~ s{ (\d+) }{ format_number($1) }xe;
say $res;

or:

(my $res = $nbr) =~ s{ (\d+) }{
    $1 =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg; }ex;
say $res;

Running either of these fragments yields:

This is a real number, 834,569.334656

/xrg is: x = ignore white-space; r = return substitution and leave original string untouched, necessary because $1 is immutable; g = replace globally.
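
As a small illustration of the r modifier on its own, the sketch below applies the same lookaround pattern to a plain variable and shows that the original string is left untouched. The variable names are made up for the example.

```perl
use v5.14;    # the /r modifier needs Perl 5.14 or later
use strict;
use warnings;

my $digits  = '834569';
# Insert a comma wherever the remaining digits form whole groups of three
my $grouped = $digits =~ s/ (?<=\d) (?= (?:\d{3})+ (?!\d) ) /,/xgr;

say $digits;     # 834569 (unchanged)
say $grouped;    # 834,569
```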

Lewison answered 12/9, 2023 at 22:55 Comment(0)
G
0
# turning above answer into a function

sub format_float
# returns number with commas..... and 2 digit decimal
# so format_float(12345.667) returns "12,345.67"
{
        my $num = shift;
        return reverse(join(",",unpack("(A3)*", reverse int($num)))) . sprintf(".%02d",int(100*(.005+($num - int($num)))));
}

sub format_int
# returns number with commas.....
# so format_int(12345.667) returns "12,345"
{
        my $num = shift;
        return scalar reverse(join(",",unpack("(A3)*", reverse int($num))));
}
Guinness answered 15/2, 2019 at 19:25 Comment(1)
Welcome to StackOverflow! A code-only answer is not recommended, so to make your answer more relevant I recommend you to explain what your code does.Guanine
P
0

With modern Perls:

$commafied = scalar reverse (reverse($number) =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/gr);

s/.../.../r is "non destructive" substitution, returning the modified string as the result.

Prop answered 2/9, 2023 at 19:12 Comment(0)
L
-1

I wanted to print numbers in a currency format. If the amount turned out to be a whole number, I still wanted a .00 at the end. I used the previous example (ty) and diddled with it a bit more to get this.

    sub format_number {
            my $num = shift;
            my $result;
            my $formatted_num = ""; 
            my @temp_array = (); 
            my $mantissa = ""; 
            if ( $num =~ /\./ ) { 
                    $num = sprintf("%0.02f",$num);
                    ($num,$mantissa) = split(/\./,$num);
                    $formatted_num = reverse $num;
                    @temp_array = unpack("(A3)*" , $formatted_num);
                    $formatted_num = reverse (join ',', @temp_array);
                    $result = $formatted_num . '.'. $mantissa;
            } else {
                    $formatted_num = reverse $num;
                    @temp_array = unpack("(A3)*" , $formatted_num);
                    $formatted_num = reverse (join ',', @temp_array);
                    $result = $formatted_num . '.00';
            }   
            return $result;
    }
    # Example call
    # ...
    printf("some amount = %s\n",format_number $some_amount);

I didn't have the Number library on my default mac OS X perl, and I didn't want to mess with that version or go off installing my own perl on this machine. I guess I would have used the formatter module otherwise.

I still don't actually like the solution all that much, but it does work.

Ludicrous answered 8/11, 2017 at 14:41 Comment(0)
E
-1

This is good for money, just keep adding lines if you handle hundreds of millions.

sub commify{
    my $var = $_[0];
    #print "COMMIFY got $var\n"; #DEBUG
    $var =~ s/(^\d{1,3})(\d{3})(\.\d\d)$/$1,$2$3/;
    $var =~ s/(^\d{1,3})(\d{3})(\d{3})(\.\d\d)$/$1,$2,$3$4/;
    $var =~ s/(^\d{1,3})(\d{3})(\d{3})(\d{3})(\.\d\d)$/$1,$2,$3,$4$5/;
    $var =~ s/(^\d{1,3})(\d{3})(\d{3})(\d{3})(\d{3})(\.\d\d)$/$1,$2,$3,$4,$5$6/;
    #print "COMMIFY made $var\n"; #DEBUG
    return $var;
}
Entropy answered 1/5, 2018 at 1:38 Comment(0)
C
-2

A solution that produces a localized output:

# First part - Localization
my ( $thousands_sep, $decimal_point, $negative_sign );
BEGIN {
        my ( $l );
        use POSIX qw(locale_h);
        $l = localeconv();

        $thousands_sep = $l->{ 'thousands_sep' };
        $decimal_point = $l->{ 'decimal_point' };
        $negative_sign = $l->{ 'negative_sign' };
}

# Second part - Number transformation
sub readable_number {
        my $val = shift;

        #my $thousands_sep = ".";
        #my $decimal_point = ",";
        #my $negative_sign = "-";

        sub _readable_int {
                my $val = shift;
                # a pinch of PERL magic
                return scalar reverse join $thousands_sep, unpack( "(A3)*", reverse $val );
        }

        my ( $i, $d, $r );
        $i = int( $val );
        if ( $val >= 0 ) {
                $r =  _readable_int( $i );
        } else {
                $r = $negative_sign . _readable_int( -$i );
        }
        # If there is decimal part append it to the integer result
        if ( $val != $i ) {
                ( undef, $d ) = ( $val =~ /(\d*)\.(\d*)/ );
                $r = $r . $decimal_point . $d;
        }

        return $r;
}

The first part gets the symbols used in the current locale to be used on the second part.
The BEGIN block is used to calculate the symbols only once at the beginning.
If for some reason you need to avoid the POSIX locale, you can omit the first part and uncomment the variables in the second part to hardcode the symbols to be used ($thousands_sep, $decimal_point and $negative_sign).

Consumedly answered 7/1, 2020 at 19:10 Comment(1)
BEGIN blocks are not for executing something "only once at the beginning". They are meant to be executed at compile time. If you don't need the block to be executed at compile time, it's not necessary at all (like in this case).Honora
