Handling of Unicode Characters using Delphi 6

I have a polling application developed in Delphi 6. It reads a file, parses it according to a specification, performs validation, and uploads the data into a database (SQL Server 2008 Express Edition).

We had to provide support for operating systems with Double Byte Character Sets (DBCS), e.g. a Japanese OS. So we changed the database fields in SQL Server from varchar to nvarchar.

Polling works fine on operating systems with DBCS. It also works on non-DBCS operating systems if the system locale is set to Japanese/Chinese/Korean and the operating system has the respective language pack installed. But if the locale is set to English, the database contains junk characters in place of the double-byte characters.

I performed a few tests but failed to identify the solution.

For example, if I read a UTF-8 file using a TStringList and save it to another file, the Unicode data is preserved. But if I use the contents of the file to run an update query through a TADOQuery component, junk characters are shown, and the database also contains the junk characters.

Please find the sample code below:

var
    stlTemp : TStringList;
    qry : TADOQuery;
    stQuery : string;
begin
    stlTemp := TStringList.Create;
    qry := TADOQuery.Create(nil);
    stlTemp.LoadFromFile('D:\DelphiUnicode\unicode.txt');
    //stlTemp.SaveToFile('D:\DelphiUnicode\1.txt'); // This works. Even though 
    //the stlTemp.Strings[0] contains junk characters if seen in watch

    stQuery := 'UPDATE dbo.receivers SET company = ' + QuotedStr(stlTemp.Strings[0]) +
        ' WHERE receiver_cd = N' + QuotedStr('Receiver'); 
    //company is a nvarchar field in the  database
    qry.Connection := ADOConnection1;
    with qry do
    begin
        Close;
        SQL.Clear;
        SQL.Add(stQuery);
        ExecSQL;
    end;
    qry.Free;
    stlTemp.Free
end;

The above code works fine in a DBCS Operating system.

I have tried playing with string, WideString and UTF8String, but this does not work on an English OS if the locale is set to English.

Please provide any pointers for this issue.

Topdress answered 10/1, 2013 at 3:42 Comment(9)
What do you mean by "D:\DelphiUnicode\unicode.txt"? There are many Unicode formats, such as UTF-8, UTF-7 and a few variants of UTF-16. Try using TWideStringList to load a UTF-16 file. If your Delphi does not have it, try the WideStringList from the Jedi Code Library. - Lymanlymann
Since Delphi 6 is a rather legacy release (and quite buggy unless ALL the updates are applied), have you considered migrating to a modern version of either Delphi or Lazarus/CodeTyphon? - Lymanlymann
Actually your code is fragile. Read about SQL injection and use parameterized queries: bobby-tables.com. That might also let you specify the WideString type for parameters. - Lymanlymann
@ArnaudBouchez The RDBMS was reported to be some "SQL Server", probably some version of Microsoft SQL Server. - Lymanlymann
@Arioch'The: The file is UTF-8. The database is SQL Server Express. This is a test application, hence I did not use a parameterized query. Also, I missed one thing: if I modify the varchar data type to nvarchar, then database insertion is not an issue. - Topdress
Then it is a pity you still use D6; in modern Delphi you could specify UTF-8 in TStringList.LoadFromFile. But TWideStrings from JclUnicode.pas (jcl.sf.net) should be able to load it in D6 as well. - Lymanlymann
Actually I never could understand why one would use MS SQL Express when there is the small, free Firebird. I once tried to install MSDE 2000 on a notebook: never again. - Lymanlymann
@Arioch'The: The polling utility is a module of a huge application. It has been in production since 2001. The code base is quite large. The client is fine with changes as long as there is no impact on functionality. The entire code size, including third-party tools, is around 600 MB. Also, SQL Express is used because the size of the database is approx. 300 MB - 1 GB. We have suggested that the client upgrade Delphi, but they are reluctant because of cost and time issues. Anyway, requirements like these might encourage their decision making. - Topdress
Well, such a database size is okay for many different servers. Maybe MS SQL 2008 is more optimised than 2000 and installs and works faster, I don't know. /// If on your side the support prices are the same for D6 and for newer versions of Lazarus/Delphi, then your client surely has no reason to upgrade. - Lymanlymann

In non-Unicode Delphi versions, the basic rule is that you need to work with WideStrings (Unicode) instead of Strings (Ansi).

Forget about TADOQuery.SQL (TStrings), and work with TADODataSet.CommandText or TADOCommand.CommandText (WideString), or typecast the TADOQuery as a TADODataSet, e.g.:

stlTemp: TWideStringList; // <- Unicode strings - TNT or other Unicode lib
qry: TADOQuery;
stQuery: WideString; // <- Unicode string

TADODataSet(qry).CommandText := stQuery;
RowsAffected := qry.ExecSQL;

You can also use TADOConnection.Execute(stQuery) to execute queries directly.
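
As a minimal sketch of that route, assuming the input file is UTF-8 (as the asker states in the comments) and using only the Delphi 6 RTL: the raw bytes are read into a string, an optional BOM is dropped, and UTF8Decode converts them to a WideString before the statement goes to TADOConnection.Execute. The names LoadUtf8FileAsWide and ExecWide are hypothetical helpers, not part of any library.

```pascal
uses
  Classes, SysUtils, ADODB;

// Hypothetical helper: reads a whole UTF-8 file and returns it as UTF-16.
function LoadUtf8FileAsWide(const FileName: string): WideString;
var
  fs: TFileStream;
  raw: string;   // holds raw UTF-8 bytes here, not displayable text
  len: Integer;
begin
  raw := '';
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    len := Integer(fs.Size);
    SetLength(raw, len);
    if len > 0 then
      fs.ReadBuffer(raw[1], len);
  finally
    fs.Free;
  end;
  // Drop the UTF-8 byte order mark (EF BB BF) if the file starts with one.
  if (Length(raw) >= 3) and (raw[1] = #$EF) and (raw[2] = #$BB) and
     (raw[3] = #$BF) then
    Delete(raw, 1, 3);
  Result := UTF8Decode(raw);
end;

// Hypothetical helper: passes a WideString statement straight to ADO,
// bypassing the Ansi-based TADOQuery.SQL string list.
procedure ExecWide(Conn: TADOConnection; const Sql: WideString);
var
  affected: Integer;
begin
  Conn.Execute(Sql, affected);
end;
```

The returned WideString holds the entire file, so it still has to be split on line breaks before a single line is used as a value. Any value embedded in the statement text also needs the N'...' literal prefix and Unicode-safe quoting.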


Be extra careful with parameterized queries: ADODB.TParameters.ParseSQL is Ansi. If ParamCheck is True (the default), TADOCommand.SetCommandText -> AssignCommandText will cause problems if your query is Unicode (InitParameters is Ansi).

(Note that you can use the ADO Command.Parameters collection directly, using ? characters as placeholders for the parameters instead of Delphi's :param_name convention.)
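
A rough sketch of that approach, assuming the raw ADO interfaces imported in Delphi's ADOInt unit: it creates an ADO Command object, binds it to the connection behind a TADOConnection, and appends adVarWChar parameters for the ? placeholders. UpdateCompany is a hypothetical name, the table and column names come from the question, and the exact method signatures should be checked against the ADOInt.pas that ships with your Delphi 6 installation.

```pascal
uses
  ADODB, ADOInt, Variants;

// Hypothetical helper: runs the UPDATE from the question with Unicode
// parameters, without going through Delphi's Ansi-based parameter parsing.
procedure UpdateCompany(Conn: TADOConnection; const Company: WideString);
var
  cmd: _Command;
  affected: OleVariant;
  size: Integer;
begin
  cmd := CoCommand.Create;
  cmd.Set_ActiveConnection(Conn.ConnectionObject);
  cmd.CommandType := adCmdText;
  cmd.CommandText :=
    'UPDATE dbo.receivers SET company = ? WHERE receiver_cd = ?';

  // ADO expects a non-zero size for variable-length string parameters.
  size := Length(Company);
  if size = 0 then
    size := 1;

  // Parameters are matched to the ? placeholders by position, not by name.
  cmd.Parameters.Append(
    cmd.CreateParameter('company', adVarWChar, adParamInput, size, Company));
  cmd.Parameters.Append(
    cmd.CreateParameter('receiver_cd', adVarWChar, adParamInput, 8,
      WideString('Receiver')));

  cmd.Execute(affected, EmptyParam, adExecuteNoRecords);
end;
```

Because the values travel as adVarWChar parameters, no quoting or N prefix is needed in the statement text.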


QuotedStr returns an Ansi string. You need a Wide version of this function (TNT).
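
If pulling in TNT is not an option, a hand-rolled equivalent is short. The sketch below is an assumption about what such a helper could look like, not the TNT implementation: SqlQuoteWide is a hypothetical name, and it also adds the N prefix that SQL Server needs to treat the literal as nvarchar.

```pascal
// Hypothetical helper: returns S as a SQL Server Unicode literal, N'...',
// doubling any embedded single quotes. Works on WideString throughout,
// so no Ansi conversion takes place.
function SqlQuoteWide(const S: WideString): WideString;
var
  i: Integer;
begin
  Result := 'N''';
  for i := 1 to Length(S) do
  begin
    if S[i] = '''' then
      Result := Result + ''''''   // one quote becomes two
    else
      Result := Result + S[i];
  end;
  Result := Result + '''';
end;
```

With a WideString variable Company holding the value, the statement from the question could then be built as stQuery := 'UPDATE dbo.receivers SET company = ' + SqlQuoteWide(Company) + ' WHERE receiver_cd = ' + SqlQuoteWide('Receiver'); where stQuery is declared as WideString.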


Also, as @Arioch 'The mentioned, the TNT Unicode Controls suite is your best friend for making a Delphi Unicode application. It has all the controls and classes you need to successfully manage Unicode tasks in your application.

In short, you need to think Wide :)

Infantry answered 10/1, 2013 at 11:56 Comment(4)
TWideStringList - was it there back in D6? - Lymanlymann
@Arioch'The, nope. You need TNT or another Unicode library for TWideStringList. - Infantry
Was it capable of loading UTF-8 files, or would he have to pass the script file through a memory stream and re-code it to UTF-16 himself? While not impossible, it seems beyond the topic starter's current capabilities. Speaking of JCL, I was surprised that TJclWideStrings does not support UTF-8 loading, yet TWideStringList can. Perhaps someone will merge them one day... - Lymanlymann
@Infantry: Thanks for your inputs. I will try out the suggestions. - Topdress
  1. You did not specify the database server, so this investigation remains on your part. You should check how your database server supports Unicode. That means how to specify a Unicode charset for the database and for the tables/columns/indices/collations/etc. inside it. You have to ensure that the whole DB is pervasively Unicode-enabled in every detail, to avoid data loss.

  2. Generally you should also check that your database connection (using the database access library of your choice) is Unicode-enabled. Generally Microsoft ADO, just like OLE, should be Unicode-enabled. But still check your database server manual for how to specify a Unicode codepage or charset in the connection string. A non-Unicode connection may also result in data loss.

  3. When you say you read some Unicode file, that is ambiguous. What is a Unicode file? Is it UTF-8? Or one of the four flavours of UTF-16? Or UTF-7? Or some other Unicode Transformation Format? The usual Windows WideChar roughly corresponds to legacy UCS-2 and is expected to be the BOM-stripped Intel-endian (little-endian) flavour of UTF-16. http://msdn.microsoft.com/en-us/library/windows/desktop/ms221069.aspx

  4. If the file is surely that flavour of UTF-16, then you can load it using Delphi TWideStringList or the Jedi Code Library TJclWideStringList. Review your code to make sure you never handle your data through string variables; use WideString everywhere to avoid data loss.
    Since D6 was one of the buggiest releases, I'd prefer to ensure EVERY update to Delphi is installed and then install and use JCL. JCL also provides codepage conversion functions, which might be more flexible than the plain AnsiStringVar := WideStringVar approach.
    A UTF-8 file can be loaded by the TWideStringList class of JCL (but not by TJclWideStringList).

  5. When debugging, load the lines of the list into a WideString variable and check that their content is preserved.

  6. Don't write queries like that. See http://bobby-tables.com/ Even if you do not expect a malicious cracker, you can make errors yourself or meet unexpected data. Use parameterized queries, everywhere, every time! EVER!
    See an example of such: http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/ADODB_TADOQuery_Parameters.html
    Check that every SQL VARCHAR parameter is ftWideString, so that it can contain Unicode, and not ftString. Check the same for fields (columns).

  7. Consider whether legacy technologies can be cast aside, since supporting them will only get harder over time.

    7.1. Since Microsoft ADO is deprecated (for example, newer versions of Microsoft SQL Server would not support it), consider switching to 'live' data access libraries, like AnyDAC, UniDAC, ZeosDB or some other library. Torry.net may point you to some.

    7.2. Since the Delphi 6 RTL and VCL are not Unicode-ready, consider migrating your application to TNT Unicode Components, if you manage to find their free version or purchase them. Or migrate to a newer Delphi release.

    7.3. Since Delphi 6 is very old and long unsupported, and since it was one of the buggiest Delphi releases, consider migrating to newer Delphi versions or to free tools like CodeTyphon or Lazarus. As a bonus, Lazarus started moving to Unicode in its recent beta builds, and it is possible that by the end of the migration your application would be Unicode-ready.

    7.4. Migration might be an excuse and a stimulus for refactoring your application and getting rid of legacy spaghetti.

Chekiang answered 10/1, 2013 at 8:54 Comment(8)
@whosrdaddy, parameterized queries in non-Unicode Delphi with ADO have a big catch, since they can't handle Unicode correctly. See my answer. - Infantry
@Kobik: this fact is already a good reason to upgrade to a Delphi version that is Unicode-aware :) - Amadeus
@whosrdaddy, if you have the money and the time to upgrade, that's the best way to escape from this world of pain :) - Infantry
@Amadeus or change ADO to a more living DB access library. - Lymanlymann
@Arioch'The, the problem is NOT ADO, but the non-Unicode Delphi implementation of it. If the implementation uses Ansi strings instead of WideStrings/BSTR (as intended by ADO), dead or alive, it won't matter much. - Infantry
@Infantry how does this matter practically? Do you imply the topic starter would rewrite that stock ADO implementation? While it is not impossible, judging by the experience shown in this question I'd say that is rather improbable. - Lymanlymann
@Amadeus ADO itself is not dead yet, but ADO support will be dying, especially for Delphi ADO. If the topic starter hits some error in AnyDAC/UniDAC/ZeosDB, he can ask for support and expect a workaround or even a fixed update. Now, how can he ask EMB to issue him a fixed ADO for D6? - Lymanlymann
@Arioch'The: Thanks for your inputs. I will try out the suggestions. - Topdress
