"No mapping for the Unicode character exists in the target multi-byte code page" error

I have a bug report showing an EEncodingError. The log points to TFile.AppendAllText. I call TFile.AppendAllText in this procedure of mine:

procedure WriteToFile(CONST FileName: string; CONST uString: string; CONST WriteOp: WriteOpperation; ForceFolder: Boolean= FALSE);     // Works with UNC paths
begin
 if NOT ForceFolder
 OR (ForceFolder AND ForceDirectoriesMsg(ExtractFilePath(FileName))) then
   if WriteOp= (woOverwrite)
   then IOUtils.TFile.WriteAllText (FileName, uString)
   else IOUtils.TFile.AppendAllText(FileName, uString);
end;

The EurekaLog report was attached as two screenshots, which are not reproduced here.

What can cause this to happen?

Augmentative answered 29/2, 2016 at 20:18 Comment(4)
similar issue with Delphi 10.2 - Enameling
Yeah... & years later and the bug is still here! - Augmentative
related #26061332 - Augmentative
bug report: quality.embarcadero.com/browse/RSP-41439 - Augmentative

This program reproduces the error that you report:

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils;

var
  FileName: string;

begin
  try
    FileName := TPath.GetTempFileName;
    TFile.WriteAllText(FileName, 'é', TEncoding.ANSI);
    TFile.AppendAllText(FileName, 'é');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
end.

Here I have written the original file as ANSI and then called AppendAllText, which will try to write as UTF-8. We end up in this function:

class procedure TFile.AppendAllText(const Path, Contents: string);
var
  LFileStream: TFileStream;
  LFileEncoding: TEncoding; // encoding of the file
  Buff: TBytes;
  Preamble: TBytes;
  UTFStr: TBytes;
  UTF8Str: TBytes;
begin
  CheckAppendAllTextParameters(Path, nil, False);

  LFileStream := nil;
  try
    try
      LFileStream := DoCreateOpenFile(Path);
      // detect the file encoding
      LFileEncoding := GetEncoding(LFileStream);

      // file is written is ASCII (default ANSI code page)
      if LFileEncoding = TEncoding.ANSI then
      begin
        // Contents can be represented as ASCII;
        // append the contents in ASCII

        UTFStr := TEncoding.ANSI.GetBytes(Contents);
        UTF8Str := TEncoding.UTF8.GetBytes(Contents);

        if TEncoding.UTF8.GetString(UTFStr) = TEncoding.UTF8.GetString(UTF8Str) then
        begin
          LFileStream.Seek(0, TSeekOrigin.soEnd);
          Buff := TEncoding.ANSI.GetBytes(Contents);
        end
        // Contents can be represented only in UTF-8;
        // convert file and Contents encodings to UTF-8
        else
        begin
          // convert file contents to UTF-8
          LFileStream.Seek(0, TSeekOrigin.soBeginning);
          SetLength(Buff, LFileStream.Size);
          LFileStream.ReadBuffer(Buff, Length(Buff));
          Buff := TEncoding.Convert(LFileEncoding, TEncoding.UTF8, Buff);

          // prepare the stream to rewrite the converted file contents
          LFileStream.Size := Length(Buff);
          LFileStream.Seek(0, TSeekOrigin.soBeginning);
          Preamble := TEncoding.UTF8.GetPreamble;
          LFileStream.WriteBuffer(Preamble, Length(Preamble));
          LFileStream.WriteBuffer(Buff, Length(Buff));

          // convert Contents in UTF-8
          Buff := TEncoding.UTF8.GetBytes(Contents);
        end;
      end
      // file is written either in UTF-8 or Unicode (BE or LE);
      // append Contents encoded in UTF-8 to the file
      else
      begin
        LFileStream.Seek(0, TSeekOrigin.soEnd);
        Buff := TEncoding.UTF8.GetBytes(Contents);
      end;

      // write Contents to the stream
      LFileStream.WriteBuffer(Buff, Length(Buff));
    except
      on E: EFileStreamError do
        raise EInOutError.Create(E.Message);
    end;
  finally
    LFileStream.Free;
  end;
end;

The error stems from this line:

if TEncoding.UTF8.GetString(UTFStr) = TEncoding.UTF8.GetString(UTF8Str) then

The problem is that UTFStr is not in fact valid UTF-8, so TEncoding.UTF8.GetString(UTFStr) raises the exception.

This is a defect in TFile.AppendAllText. Given that it knows perfectly well that UTFStr is ANSI encoded, it makes no sense at all for it to call TEncoding.UTF8.GetString on it.
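To make the byte-level picture concrete, here is a minimal sketch. It assumes a Windows system whose ANSI code page is Windows-1252, where 'é' is the single byte $E9. In UTF-8, $E9 is a lead byte that must be followed by continuation bytes, so on its own it is not a valid sequence. FitsInAnsi is a hypothetical helper, not part of the RTL. It shows one way the "can Contents be represented as ANSI?" question could be asked without going through UTF-8 at all. It is not the fix Embarcadero ships.

```pascal
program AnsiVsUtf8Bytes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper (not in the RTL): True when Contents survives a
// round trip through the system ANSI code page unchanged.
function FitsInAnsi(const Contents: string): Boolean;
begin
  Result := TEncoding.ANSI.GetString(TEncoding.ANSI.GetBytes(Contents)) = Contents;
end;

var
  S: string;

begin
  // 'é' written as a code point so the source file encoding does not matter
  S := #$00E9;

  // Assumption: with code page 1252 the ANSI form is one byte ($E9);
  // the UTF-8 form is always two bytes ($C3 $A9).
  Writeln('ANSI byte count:  ', Length(TEncoding.ANSI.GetBytes(S)));
  Writeln('UTF-8 byte count: ', Length(TEncoding.UTF8.GetBytes(S)));

  // Decoding and comparing with the same encoding never feeds ANSI bytes
  // to the UTF-8 decoder, so no EEncodingError can come from this check.
  Writeln('Fits in ANSI: ', FitsInAnsi(S));
end.
```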

You should submit a bug report to Embarcadero for this defect, which still exists in Delphi 10 Seattle. In the meantime, you should not use TFile.AppendAllText.
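If you already know which encoding the file uses, one possible stopgap is to append the bytes yourself and skip the detection logic entirely. The sketch below assumes exactly that: AppendTextAs is a hypothetical helper, not an RTL routine, and it does not look at the existing contents. Passing the wrong encoding will therefore produce a file with mixed encodings.

```pascal
program AppendWithKnownEncoding;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.IOUtils;

// Hypothetical helper (not in the RTL): appends Text to FileName using the
// encoding named by the caller. No preamble is written and the existing
// file contents are not inspected.
procedure AppendTextAs(const FileName, Text: string; Encoding: TEncoding);
var
  Stream: TFileStream;
  Bytes: TBytes;
begin
  Bytes := Encoding.GetBytes(Text);

  if FileExists(FileName) then
    Stream := TFileStream.Create(FileName, fmOpenReadWrite or fmShareDenyWrite)
  else
    Stream := TFileStream.Create(FileName, fmCreate);
  try
    Stream.Position := Stream.Size;   // go to the end of the file
    if Length(Bytes) > 0 then
      Stream.WriteBuffer(Bytes[0], Length(Bytes));
  finally
    Stream.Free;
  end;
end;

var
  FileName: string;

begin
  FileName := TPath.GetTempFileName;
  try
    // Same scenario as the reproduction: an ANSI file containing 'é' (#$00E9)
    TFile.WriteAllText(FileName, #$00E9, TEncoding.ANSI);
    AppendTextAs(FileName, #$00E9, TEncoding.ANSI);
    Writeln('Characters in file: ',
      Length(TFile.ReadAllText(FileName, TEncoding.ANSI)));
  finally
    TFile.Delete(FileName);
  end;
end.
```

The caller takes responsibility for the encoding here, which is the trade-off for avoiding the detection code path.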

Upali answered 29/2, 2016 at 20:35 Comment(7)
What about TStreamReader? Seems a decent alternative and it is not based on IOUtils. - Augmentative
Perf is a bit dodgy. I don't want to advise without knowledge of the file's lifetime and who else modifies it. - Upali
This defect still exists in Delphi 10.4 too, and it affects other functions such as DecodeStream for decoding Base64. - Ochre
A somewhat similar bug report was closed as "not a bug": quality.embarcadero.com/browse/RSP-41439 - Augmentative
Trying one more time: quality.embarcadero.com/browse/RSP-41533 - Augmentative
Why did you take my complete program, a console app, and turn it into a partial program that cannot be run directly? Do you know about minimal reproducible examples? I was about to submit that program as a report anyway ... - Upali
@DavidHeffernan - Hi David. I already changed it back to a minimal reproducible example. - Augmentative

It works if the file is written as UTF-8 in the first place:

TFile.WriteAllText(FileName, 'é', TEncoding.UTF8);
Socrates answered 18/5, 2023 at 18:29 Comment(0)

Proper function to write Unicode strings to a UTF8 file. FileName must be a full path. If the path does not exist, it is created. It can also write a preamble.

TYPE
  TWriteOperation= (woAppend, woOverwrite);

procedure StringToFile(CONST FileName: string; CONST aString: String; CONST WriteOp: TWriteOperation= woOverwrite; WritePreamble: Boolean= FALSE);
VAR
   Stream: TFileStream;
   Preamble: TBytes;
   sUTF8: RawByteString;
   aMode: Integer;
begin
 ForceDirectories(ExtractFilePath(FileName));

 if (WriteOp= woAppend) AND FileExists(FileName)
 then aMode := fmOpenReadWrite
 else aMode := fmCreate;

 Stream := TFileStream.Create(FileName, aMode OR fmShareDenyWrite);   { The share flag belongs in Mode (a third argument would be Rights). Allow others to read while we write }
 TRY
  sUTF8 := Utf8Encode(aString);                                     { UTF16 to UTF8 encoding conversion }

  if (aMode = fmCreate) AND WritePreamble then
   begin
    preamble := TEncoding.UTF8.GetPreamble;
    Stream.WriteBuffer( PAnsiChar(preamble)^, Length(preamble));
   end;

  if aMode = fmOpenReadWrite
  then Stream.Position:= Stream.Size;                               { Go to the end }

  Stream.WriteBuffer( PAnsiChar(sUTF8)^, Length(sUTF8) );
 FINALLY
   FreeAndNil(Stream);
 END;
end;


{ Tries to auto-determine the file type (ANSI, UTF8, UTF16, etc). Works with UNC paths.
  If the file does not exist, it raises an error unless IgnoreExists is True.

  If it cannot detect the correct encoding automatically, we can force the encoding by setting the third parameter.
      Example: System.SysUtils.TEncoding.UTF8
      However, this is buggy! It will raise an exception if the file is ANSI, but it contains high characters such as ½ (#189)  }

function StringFromFile(CONST FileName: string; IgnoreExists: Boolean= FALSE; Enc: TEncoding= NIL): String;
begin
  if IgnoreExists AND NOT FileExists(FileName)
  then EXIT('');

  if Enc= NIL
  then Result:= System.IOUtils.TFile.ReadAllText(FileName)
  else Result:= System.IOUtils.TFile.ReadAllText(FileName, Enc);
end;


{ Read a WHOLE file and return its content as AnsiString.
  The function will not try to auto-determine the file's type.
  It will simply read the file as ANSI }
function StringFromFileA(CONST FileName: string): AnsiString;
VAR Stream: TFileStream;
begin
 Result:= '';

 Stream:= TFileStream.Create(FileName, fmOpenRead OR fmShareDenyNone);
 TRY
   if Stream.Size>= High(Longint)
   then RAISE Exception.Create('File is larger than 2GB! Only files below 2GB are supported.'+ sLineBreak+ FileName);

   SetString(Result, NIL, Stream.Size);
   Stream.ReadBuffer(Pointer(Result)^, Stream.Size);
 FINALLY
   FreeAndNil(Stream);
 END;
end;

This code was extracted from the LightSaber Delphi library.

Augmentative answered 19/7, 2024 at 11:54 Comment(0)
