"Pathname can't be converted from UTF-8 to current locale" warning with Libarchive::Read module - McMap

Asked 18/1, 2023 at 4:53; answered 18/1, 2023 at 17:26

Tags: locale, raku, libarchive

I'm getting the file listings of tar.gz files using the Libarchive::Read module. When a file inside the tarball has UTF-8 characters in its name, I get the following warning, which is generated by the libarchive C library:

Pathname can't be converted from UTF-8 to current locale.

in block at /Users/steve/.rakubrew/versions/moar-2022.12/share/perl6/site/sources/42AF7739DF41B2DA0C4BF2069157E2EF165CE93E (Libarchive::Read) line 228

The warning is triggered by this Raku code:

my $r := Libarchive::Read.new($newest_file);
my $needs_update = False;
for $r -> $entry {  # WARNING THROWN HERE for each file in tarball listing
    $entry.pathname;
    $needs_update = True if $entry.is-file && $entry.pathname && $entry.pathname ~~ / ( \.t || \.pm || \.pm6 ) $ / ;
    last if $needs_update;
}
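The file-type test in the loop above can be pulled out into a small sub and checked without opening any archive. This is a minimal sketch: wants-update is a hypothetical name, not part of Libarchive::Read, and treating a trailing '/' as the marker of a directory entry is an assumption (the answer below relies on the same convention).

```raku
# Hypothetical helper, not part of Libarchive::Read or Archive::Libarchive:
# decide from a pathname alone whether an entry is a Raku module or test file.
sub wants-update(Str $path --> Bool) {
    # Skip empty names and directory entries (assumed to end in '/').
    return False if !$path || $path.ends-with('/');
    # Match a trailing .t, .pm or .pm6 extension.
    so $path ~~ / '.' [ 't' | 'pm' | 'pm6' ] $ /;
}

for <lib/Foo.pm6 t/01-basic.t lib/ README.md lib/Ünïcödé.pm> -> $p {
    say "$p: {wants-update($p)}";
}
```

Because the check works on ordinary Raku strings, non-ASCII names such as lib/Ünïcödé.pm are handled like any other once the pathname reaches Raku intact.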

I'm on a Mac, and the locale command reports the following:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

This seems to be a widely reported problem with the libarchive C library: https://github.com/libarchive/libarchive/issues/587.

Is there any way to tell Raku (or the module) which locale is in use, so that I can list tarballs containing files with UTF-8 characters in their names?
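One thing that may be worth trying, untested here: by the C standard, every program starts in the "C" locale, and the LANG/LC_* environment variables only take effect once the program calls setlocale(LC_ALL, ""). Command-line tools such as bsdtar make that call at startup; whether the Raku runtime does is not confirmed here. Assuming it does not, the sketch below makes the call through NativeCall before the archive is opened. The numeric value of LC_ALL is platform-specific, and the values used (6 on Linux, 0 on macOS/BSD) are assumptions to check against your system's locale.h.

```raku
use NativeCall;

# setlocale(3) from the C library. With no library name, "is native"
# resolves the symbol in the running process, where libc is already loaded.
sub setlocale(int32 $category, Str $locale --> Str) is native { * }

# ASSUMPTION: LC_ALL is 6 on Linux (glibc/musl) and 0 on macOS/BSD libc.
my int32 $LC_ALL = $*KERNEL.name eq 'linux' ?? 6 !! 0;

# Passing the Str type object sends NULL, which only queries the current locale.
say 'before: ', setlocale($LC_ALL, Str);

# Passing "" adopts the locale named by LANG / LC_* in the environment.
my $now = setlocale($LC_ALL, '');
say $now.defined ?? "after: $now" !! 'setlocale failed; the locale may not be installed';

# ...then create the Libarchive::Read object and loop over the entries as before.
```

If the warning persists after this call, the process locale is probably not the cause, and switching modules as in the answer below remains the practical route.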

Asked by Briefing, 18/1, 2023 at 4:53. Comments (6):

The issue discussion looks diligent, intelligent, extensive. It remains open but it looks like a major directly relevant PR has been merged: Fix unpacking of filenames with contains UTF-8 characters. Maybe it would help if you reviewed that and edited your Q or comment to indicate how that does or doesn't help for your use case. – Misguide 18/1, 2023 at 15:10

See also libarchive's wiki page Filenames, with sections such as "The Problem" (in particular, "It is also possible that the filename happens to be encoded in the same encoding as the local user's preference but again, there is no way that we can reliably detect this ... The proposed long-term solution below currently punts this to the client software; clients must be able to handle both UTF-8 and arbitrary byte sequence filenames.") and then the sections "Proposed Long-term Solution" and "Proposed Interim Solution". – Misguide 18/1, 2023 at 15:47

OK, I edited the question to make it clearer that the C library is generating the error. – Briefing 18/1, 2023 at 15:48

I have the locales set to "en_US.UTF-8". I'm not having any luck setting them to "C.UTF-8" on my Mac, except for the LANG environment variable, and I'm not even sure it's worth the effort. Is there any important difference between "en_US.UTF-8" and "C.UTF-8"? – Briefing 18/1, 2023 at 16:1

Yeah, so the "client" in this case would be the Raku module, right? So I have to somehow tell it to recognize the UTF-8 characters? – Briefing 18/1, 2023 at 16:5

Does it make sense to accept your answer or is there something significant left hanging? – Misguide 31/3, 2023 at 20:12

To work around this problem, I switched to a more recent Raku module, Archive::Libarchive. This code works without any warnings:

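use Archive::Libarchive;
use Archive::Libarchive::Constants;   # provides LibarchiveRead

my Archive::Libarchive $a .= new: operation => LibarchiveRead, file => $newest_file.Str;
my Archive::Libarchive::Entry $entry .= new;

my $needs_update = False;
while $a.next-header($entry) {
    $a.data-skip;   # only the header is needed, so skip the entry's data
    # Test for a non-empty pathname first: .substr(*-1) on an empty string fails.
    # Directory entries end in '/', so they are excluded.
    $needs_update = True if $entry.pathname && $entry.pathname.substr(*-1) ne '/' && $entry.pathname ~~ / ( \.t || \.pm || \.pm6 ) $ /;
    last if $needs_update;
}
$a.close;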

This code also uses the libarchive C library, but presumably in a way that handles UTF-8 pathnames correctly.

Answered by Briefing, 18/1, 2023 at 17:26

© 2022 - 2024 — McMap. All rights reserved.