Unicode sample text file for testing for Unicode related problems? [closed]

I am looking for a sample Unicode text file (UTF-8) that can be used to test various problems related to text encoding and decoding, including:

  • low ASCII characters, such as the first 32 codes (control characters)
  • characters outside BMP
  • NFC related issues
  • XML encoding/decoding issues

Mainly, I want to copy the text to the clipboard, paste it into an HTML textarea in the application, and be able to retrieve it from a page afterwards.

This would make it possible to identify Unicode-related problems that could occur during decoding, during encoding, or even at the database level.
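
To make the four categories above concrete, here is a minimal Python sketch of how such a sample could be generated. The names `build_sample` and `write_sample` and the exact choice of characters are illustrative assumptions, not an established test suite.

```python
def build_sample() -> str:
    """Return a small test string with one line per category (hypothetical layout)."""
    parts = [
        # Low ASCII: tab (U+0009), U+0001 and U+001F control characters.
        "controls:[\t][\x01][\x1f]",
        # Outside the BMP: GOTHIC LETTER AHSA (U+10330) and an emoji (U+1F600).
        "non-BMP:[\U00010330][\U0001F600]",
        # Normalization: precomposed u-circumflex vs. u + combining circumflex.
        "nfc:[\u00fb] nfd:[u\u0302]",
        # XML-sensitive characters: angle brackets, ampersand, quotes, CDATA end.
        "xml:[<tag attr=\"v\"> & ' ]]>]",
    ]
    return "\n".join(parts)


def write_sample(path: str) -> None:
    """Write the sample as UTF-8 bytes, without a BOM."""
    with open(path, "wb") as handle:
        handle.write(build_sample().encode("utf-8"))


sample = build_sample()
# Sanity check: encoding and decoding as UTF-8 must not change the text.
round_tripped = sample.encode("utf-8").decode("utf-8")
```

After pasting the text into the application and reading it back, comparing the retrieved string with `sample` code point by code point should show where characters were dropped or altered. Control characters in particular may be stripped by the clipboard or the browser before they ever reach the server.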

Ardie answered 13/5, 2013 at 10:28 Comment(2)
Canonical representation: comparison of equal but different strings: "û" = u-circumflex or "û" = letter-u + combining-diacritical-circumflex. XML 1.1 with special chars in tags. – Ashelman
At the moment I need to provide a test file for some people so they can test that what they paste reaches the database and later the browser too, so Unicode comparisons are outside the scope of the question. – Ardie

This page has been used to test web browsers, with texts in several scripts: https://www.kermitproject.org/utf8.html

The Gothic entry for "I can eat glass" in particular is outside of BMP: 𐌼𐌰𐌲 𐌲𐌻𐌴𐍃 𐌹̈𐍄𐌰𐌽, 𐌽𐌹 𐌼𐌹𐍃 𐍅𐌿 𐌽𐌳𐌰𐌽 𐌱𐍂𐌹𐌲𐌲𐌹𐌸.
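
As a rough way to confirm that a string really contains characters outside the BMP, and to see why they tend to break things, the following Python sketch counts code points, UTF-16 code units and UTF-8 bytes for the first Gothic word. The helper names `non_bmp_chars` and `utf16_units` are made up for this example.

```python
# First word of the Gothic sentence, written with escapes so the source
# file's own encoding cannot damage it: U+1033C, U+10330, U+10332.
word = "\U0001033C\U00010330\U00010332"


def non_bmp_chars(text: str) -> list[str]:
    """Return the characters whose code point lies above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def utf16_units(text: str) -> int:
    """Count UTF-16 code units; each non-BMP character needs a surrogate pair."""
    return len(text.encode("utf-16-le")) // 2


code_points = len(word)                # 3 code points
units = utf16_units(word)              # 6 UTF-16 code units
utf8_bytes = len(word.encode("utf-8")) # 12 bytes, 4 per character
```

Each of these characters takes four bytes in UTF-8, so a storage layer limited to three bytes per character (for example MySQL's legacy `utf8` charset, as opposed to `utf8mb4`) cannot store them.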

Normalization forms and XML processing are usually not problematic when moving data around, so there are no common samples that test those two in particular.
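
For anyone who still wants to check those two cases by hand, here is a minimal Python sketch using only the standard library. The helper names `canonically_equal` and `xml_round_trip` are assumptions for illustration, and the XML part only covers basic escaping of `&`, `<` and `>`, not a full parser round trip.

```python
import unicodedata
from xml.sax.saxutils import escape, unescape

composed = "\u00fb"     # LATIN SMALL LETTER U WITH CIRCUMFLEX
decomposed = "u\u0302"  # "u" followed by COMBINING CIRCUMFLEX ACCENT


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def xml_round_trip(text: str) -> str:
    """Escape the XML-sensitive characters, then undo the escaping."""
    return unescape(escape(text))


# The two spellings differ as raw strings but are canonically equivalent.
raw_equal = composed == decomposed
nfc_equal = canonically_equal(composed, decomposed)
```

If the stored text comes back in a different normalization form than it went in, `raw_equal`-style comparisons fail while `canonically_equal` still succeeds, which is one way to tell normalization changes apart from real data loss.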

Battleship answered 13/5, 2013 at 12:21 Comment(0)
