How to hack GHCi (or Hugs) so that it prints Unicode chars unescaped?

Asked 4/4, 2011 at 7:5 Answered 10/12, 2023 at 14:47

Solved unicode haskell formatting locale ghci

Look at the problem: normally, in the interactive Haskell environment, non-Latin Unicode characters that are part of a result are printed escaped, even if the locale allows such characters (as opposed to direct output through putStrLn or putChar, which looks fine and readable). The examples below show GHCi and Hugs98:

$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/  :? for help
Prelude> "hello: привет"
"hello: \1087\1088\1080\1074\1077\1090"
Prelude> 'Я'
'\1071'
Prelude> putStrLn "hello: привет"
hello: привет
Prelude> :q
Leaving GHCi.
$ hugs -98
__   __ __  __  ____   ___      _________________________________________
||   || ||  || ||  || ||__      Hugs 98: Based on the Haskell 98 standard
||___|| ||__|| ||__||  __||     Copyright (c) 1994-2005
||---||         ___||           World Wide Web: http://haskell.org/hugs
||   ||                         Bugs: http://hackage.haskell.org/trac/hugs
||   || Version: September 2006 _________________________________________

Hugs mode: Restart with command line option +98 for Haskell 98 mode

Type :? for help
Hugs> "hello: привет"
"hello: \1087\1088\1080\1074\1077\1090"
Hugs> 'Я'
'\1071'
Hugs> putStrLn "hello: привет"
hello: привет

Hugs> :q
[Leaving Hugs]
$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$

We can guess that this is because print and show are used to format the result, and these functions do their best to format the data in a canonical, maximally portable way, so they prefer to escape unusual characters (perhaps this is even spelled out in the Haskell standard):

$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/  :? for help
Prelude> show 'Я'
"'\\1071'"
Prelude> :q
Leaving GHCi.
$ hugs -98
Type :? for help
Hugs> show 'Я'
"'\\1071'"
Hugs> :q
[Leaving Hugs]
$
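To see the escaping rule in isolation, here is a small sketch that uses only the Prelude (the names escapedChar, escapedString and escapedTricky are made up for illustration). It shows that show writes every character above '\DEL' as a decimal escape, and that it inserts the empty escape \& when the next character is a digit, so the code cannot be misread:

```haskell
-- show uses decimal escapes for every character above '\DEL' (code 127),
-- even when the terminal could display it; putStrLn writes it unchanged.
escapedChar :: String
escapedChar = show 'Я'            -- the characters '\1071' (Я is U+042F = 1071)

escapedString :: String
escapedString = show "hello: привет"

-- When an escape is followed by a digit, show inserts the empty escape \&
-- so that Я followed by '1' is not read back as the single code \10711.
escapedTricky :: String
escapedTricky = show "Я1"         -- the characters "\1071\&1"
```

In GHCi, putStrLn escapedString prints the escaped form, while putStrLn "hello: привет" prints the Cyrillic text directly.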

But it would still be nice to know how to hack GHCi or Hugs to print these characters in a pretty, human-readable way, i.e. directly, unescaped. This would be appreciated when using the interactive Haskell environment for educational purposes, for a tutorial or demonstration of Haskell in front of a non-English audience to whom you want to show some Haskell working on data in their own language.

Actually, it's useful not only for educational purposes but for debugging as well, when you have functions defined on strings that represent words of other languages, with non-ASCII characters. If the program is language-specific, only words of another language make sense as the data, and your functions are defined only on such words, then for debugging in GHCi it's important to be able to read this data.

To sum up my question: What ways to hack the existing interactive Haskell environments for a friendlier printing of Unicode in the results are there? ("Friendlier" means even "simpler" in my case: I'd like print in GHCi or Hugs to show non-Latin characters the simple direct way as done by putChar, putStrLn, i.e. unescaped.)

(Perhaps, besides GHCi and Hugs98, I'll also have a look at existing Emacs modes for interacting with Haskell to see if they can present the results in the pretty, unescaped fashion.)

Schreibe asked 4/4, 2011 at 7:5 Comment(8)

You probably mean non-(printable ASCII) instead of "non-Latin". – Nave 12/4, 2011 at 1:42

@tc, why is your comment valuable? I simply don't understand how this change of terminology could help. It might even be misleading, because I'm used to thinking that the non-Latin characters I care about here (Cyrillic) are directly printable (in appropriate locales, such as mine). As the tests probably show, in this case "non-Latin" is a subset of "non-(printable ASCII)", because I try to get a result with such characters printed, and they are escaped. I don't care about "non-printable" characters other than "letters" (which I assume are directly printable in my locale). – Schreibe 12/4, 2011 at 7:19

I see you've out-pedanted me. – Nave 12/4, 2011 at 20:51

@imz: Even the non-ASCII Latin characters are not printed: "ä" ->> "\228" in GHCi and Hugs... – Steven 27/10, 2012 at 20:19

@false, I see, you are continuing the line of that correcting comment; so it would be more precise to say something like "non-ASCII letters" (= "non-English letters") in my question. The main point of this correction should have been ASCII membership, rather than "printable" in some sense. Then I'd agree these are the right words for this problem. – Schreibe 28/10, 2012 at 22:46

@imz: ASCII does not even cover English, think of naïve, rôle, preëmption, œuvre. It's just ASCII. – Steven 28/10, 2012 at 23:5

@Steven (Thanks for the link to Definite Clause Grammars! I think it would be more useful in your question about DCGs on CSTheory, so that we could see more of the background for your asking such a question. -- cstheory.stackexchange.com/questions/14006/… ) – Schreibe 28/10, 2012 at 23:11

@Steven Well, I meant: letters of the "English alphabet". The list of letters that the dictionaries of English have. The list of letters that is given in the Wikipedia article. Well, you understand which list of letters I mean. – Schreibe 28/10, 2012 at 23:14

Option 1 (bad):

Modify this line of code:

https://github.com/ghc/packages-base/blob/ba98712/GHC/Show.lhs#L356

showLitChar c s | c > '\DEL' =  showChar '\\' (protectEsc isDec (shows (ord c)) s)

And recompile ghc.
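The effect of that patch can be tried without recompiling GHC. The sketch below is a standalone imitation of the Option 1 rule, not code from GHC: the names showLitCharU, showStringU and showCharU are hypothetical, and it assumes "printable" should mean Data.Char.isPrint. Everything else is delegated to the real showLitChar, so control characters and the \& protection keep working:

```haskell
import Data.Char (isPrint, showLitChar)

-- Option 1 rule as a standalone function: printable characters above '\DEL'
-- are emitted as-is; everything else is escaped exactly as showLitChar does.
showLitCharU :: Char -> ShowS
showLitCharU c
  | c > '\DEL' && isPrint c = showChar c
  | otherwise               = showLitChar c

-- String literal in the style of show. The double quote needs its own case,
-- because showLitChar leaves '"' alone (it is only special inside strings).
showStringU :: String -> ShowS
showStringU cs = showChar '"' . foldr step id cs . showChar '"'
  where
    step '"' rest = showString "\\\"" . rest
    step c   rest = showLitCharU c . rest

-- Character literal in the style of show.
showCharU :: Char -> ShowS
showCharU '\'' = showString "'\\''"
showCharU c    = showChar '\'' . showLitCharU c . showChar '\''
```

Because each step is a ShowS applied to the already-rendered rest, showLitChar still sees the following character and can insert \& where needed. For ASCII-only strings the result is identical to show.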

Option 2 (lots of work):

When GHCi type checks a parsed statement it ends up in tcRnStmt which relies on mkPlan (both in https://github.com/ghc/ghc/blob/master/compiler/typecheck/TcRnDriver.lhs). This attempts to type check several variants of the statement that was typed in including:

let it = expr in print it >> return [coerce HVal it]

Specifically:

print_it  = L loc $ ExprStmt (nlHsApp (nlHsVar printName) (nlHsVar fresh_it))
                                      (HsVar thenIOName) placeHolderType

All that might need to change here is printName (which binds to System.IO.print). If it instead bound to something like printGhci which was implemented like:

class ShowGhci a where
    showGhci :: a -> String
    ...

-- Bunch of instances?

instance ShowGhci Char where
    ...  -- The instance we want to be different.

printGhci :: ShowGhci a => a -> IO ()
printGhci = putStrLn . showGhci

Ghci could then change what is printed by bringing different instances into context.
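The class above is pseudocode. Here is a minimal runnable sketch of the same idea; the showGhciList method, the helper lit and the chosen instances are my own assumptions, modelled on how the Prelude's Show class special-cases [Char] through showList (which avoids overlapping instances):

```haskell
import Data.Char (isPrint, showLitChar)
import Data.List (intercalate)

class ShowGhci a where
  showGhci :: a -> String
  -- Like Show's showList: lets [Char] render as a string literal.
  showGhciList :: [a] -> String
  showGhciList xs = "[" ++ intercalate "," (map showGhci xs) ++ "]"

-- The instance we want to be different: printable non-ASCII stays as-is.
instance ShowGhci Char where
  showGhci '\'' = "'\\''"
  showGhci c    = '\'' : lit c "'"
  showGhciList cs = '"' : foldr litQ "\"" cs
    where
      litQ '"' rest = '\\' : '"' : rest
      litQ c   rest = lit c rest

-- Escape like showLitChar, except for printable characters above '\DEL'.
lit :: Char -> String -> String
lit c rest
  | c > '\DEL' && isPrint c = c : rest
  | otherwise               = showLitChar c rest

instance ShowGhci a => ShowGhci [a] where
  showGhci = showGhciList

instance (ShowGhci a, ShowGhci b) => ShowGhci (a, b) where
  showGhci (a, b) = "(" ++ showGhci a ++ "," ++ showGhci b ++ ")"

instance ShowGhci Int     where showGhci = show
instance ShowGhci Integer where showGhci = show
instance ShowGhci Bool    where showGhci = show

printGhci :: ShowGhci a => a -> IO ()
printGhci = putStrLn . showGhci
```

For example, printGhci [("Я", 1 :: Int)] prints [("Я",1)]. The catch is that every type you want printed needs its own ShowGhci instance.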

Phionna answered 11/4, 2011 at 12:26 Comment(5)

Changing the semantics of show isn't very Haskelly though. Better to write a new show class or define a new command in ghci to do this kind of "showing". – Cloudcapped 11/4, 2011 at 23:15

Thanks a lot for digging into the code! So now a patch is due from us (from me). – Schreibe 22/4, 2011 at 0:8

@Don Could switching to another version of show with a GHC option be considered "Haskelly"? Or rather not, because options shouldn't change the semantics of functions, but rather only the capabilities of the compiler? But are there legitimate ways to switch to alternative implementations of standard functions? (Such showing (the wanted variant) isn't really a big departure from the intended semantics of show, just adds more human-friendliness to it.) – Schreibe 22/4, 2011 at 0:14

@Don To save work defining the instances for an alternative Show class (like ShowGhci), one might be tempted to use the existing instances of Show by default and only redefine the instance for String and Char. But that won't work, because if you use showGhci = show, then for any complex data containing strings, show is "hard-compiled" to call the old show to show the string. This situation calls for the ability to pass different dictionaries implementing the same class interface to functions that use this interface (show would pass it down to the sub-shows). Are there any GHC extensions for this? – Schreibe 8/3, 2015 at 11:39

Perhaps it might be possible to hide the instances we want to modify when importing, and define our own instances. Will then our own instances be used throughout all the program (or module)?.. – Schreibe 18/6, 2015 at 22:25

One way to hack this is to wrap GHCi into a shell wrapper that reads its stdout and unescapes Unicode characters. This is not the Haskell way of course, but it does the job :)

For example, this is a wrapper ghci-esc that uses sh and python3 (3 is important here):

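#!/bin/sh

ghci "$@" | python3 -c '
import re
import signal
import sys

# Let ghci handle Ctrl-C; otherwise the filter dies and ghci gets a broken pipe.
signal.signal(signal.SIGINT, signal.SIG_IGN)

def tr(match):
    s = match.group(1)
    try:
        return chr(int(s))
    except ValueError:
        # out-of-range code point: keep the escape unchanged
        return match.group(0)

for line in sys.stdin:
    # \NNNN may be followed by the empty escape \& (e.g. "\1071\&1")
    sys.stdout.write(re.sub(r"\\([0-9]+)(\\&)?", tr, line))
    sys.stdout.flush()
'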

Usage of ghci-esc:

$ ./ghci-esc
GHCi, version 7.0.2: http://www.haskell.org/ghc/  :? for help
> "hello"
"hello"
> "привет"
"привет"
> 'Я'
'Я'
> show 'Я'
"'\Я'"
> :q
Leaving GHCi.

Note that not all unescaping above is done correctly, but this is a fast way to show Unicode output to your audience.
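A rough Haskell counterpart of the Python filter might look like the sketch below (unescapeNumeric is a hypothetical name). Unlike the script above, it leaves escaped backslashes and non-printable codes alone, and it drops the \& that show inserts before a digit:

```haskell
import Data.Char (chr, isDigit, isPrint)

-- Replace numeric escapes such as \1071 (plus an optional following \&)
-- with the character itself, but only for printable non-ASCII code points.
unescapeNumeric :: String -> String
unescapeNumeric ('\\' : '\\' : rest) =
  '\\' : '\\' : unescapeNumeric rest   -- escaped backslash: leave it alone
unescapeNumeric ('\\' : rest@(d : _))
  | isDigit d
  , (digits, rest') <- span isDigit rest
  , let n = read digits :: Integer
  , n > 127 && n <= 0x10FFFF
  , isPrint (chr (fromInteger n))
  = chr (fromInteger n) : unescapeNumeric (dropAmp rest')
  where
    dropAmp ('\\' : '&' : more) = more
    dropAmp more                = more
unescapeNumeric (c : rest) = c : unescapeNumeric rest
unescapeNumeric []         = []
```

Compiled into a small program with main = interact unescapeNumeric (say, a binary called unescape), it could replace the Python part: ghci | ./unescape. Like any text filter, it is a heuristic: it does not know whether a backslash is inside a string literal.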

Unicuspid answered 10/4, 2011 at 23:16 Comment(5)

Thanks for the code and for the idea! I'll probably plainly try to write a similar wrapper in Haskell, since I don't have python3 installed at the moment. – Schreibe 22/4, 2011 at 0:16

I replaced r"\\([0-9]{4})" with r"\\([0-9]+)", so that it also works when the code has more than 4 digits. – Guanidine 18/1, 2015 at 11:17

It has a problem, though: once you press Ctrl-C, it stops printing anything except an error about a broken pipe until you restart ghci. – Publius 29/3, 2015 at 14:55

@Publius You can wrap for line in sys.stdin into try: ... except KeyboardInterrupt: pass – Unicuspid 29/3, 2015 at 17:40

You can find other wrappers around GHCi at wiki.haskell.org/GHCi_in_colour , so they could be re-used not for colorizing, but for recoding. (In theory.) – Schreibe 8/4, 2015 at 10:49

There has been some progress on this issue, thanks to bravit (Vitaly Bragilevsky)!

work in progress: "Даёшь кириллицу в GHCi! 2" ("We want Cyrillic in GHCi!", part 2), about the related ticket;
the result of the work: "Даёшь кириллицу в GHCi! 3" (part 3), with the patch and another one for the docs by bravit (Vitaly Bragilevsky). These enhancements have been committed: 1 and 2.

This was incorporated into GHC 7.6.1: the GHC manual marks -interactive-print as new in that version.

How to make it print Cyrillic now:

The parameter passed to GHCi should be a function that can print Cyrillic. No such function has been found on Hackage, so for now we have to write a simple wrapper (Text.PrettyPrint.Leijen comes from the wl-pprint package):

module UPPrinter where
import System.IO
import Text.PrettyPrint.Leijen

upprint :: Pretty a => a -> IO ()
upprint a = (hPutDoc stdout . pretty) a >> putStrLn ""

And run ghci this way: ghci -interactive-print=UPPrinter.upprint UPPrinter

Of course, this can be written down once and for all into .ghci.

Practical problem: coming up with an alternative nice `Show`

So now there is a practical problem: what to use as a substitute for the standard Show, which escapes the characters we want to see?

Using others' work: other pretty-printers

Above, Text.PrettyPrint.Leijen is suggested, probably because it is known not to escape such characters in strings.

Our own Show based on Show -- attractive, but not practical

What about writing our own Show, say, ShowGhci as was suggested in an answer here. Is it practical?..

To save work defining the instances for an alternative Show class (like ShowGhci), one might be tempted to use the existing instances of Show by default, only re-define the instance for String and Char. But that won't work, because if you use showGhci = show, then for any complex data containing strings show is "hard-compiled" to call old show to show the string. This situation asks for the ability to pass different dictionaries implementing the same class interface to functions which use this interface (show would pass it down to subshows). Any GHC extensions for this?

Building on Show while redefining only the instances for Char and String is not very practical if you want the result to be as "universal" (widely applicable) as Show.
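To make the pitfall concrete, here is a small sketch of the tempting approach: reuse Show for everything and override only String. The class is the hypothetical ShowGhci again, and the per-instance OVERLAPPING/OVERLAPPABLE pragmas need GHC 7.10 or later:

```haskell
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}

class ShowGhci a where
  showGhci :: a -> String

-- The override we want (naive: no escaping at all, just for the demo).
instance {-# OVERLAPPING #-} ShowGhci String where
  showGhci s = "\"" ++ s ++ "\""

-- Everything else falls back to Show.
instance {-# OVERLAPPABLE #-} Show a => ShowGhci a where
  showGhci = show
```

A top-level string now prints unescaped, but the same string inside a list, tuple or Maybe is still escaped: for [("Я", 1 :: Int)] the fallback calls show, and the Show instances for lists and tuples call show on their elements, never showGhci.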

Re-parsing `show`

A more practical (and short) solution is in another answer here: parse the output of show to detect character and string literals, and re-format them. Although it seems a bit ugly semantically, the solution is short and safe in most cases, namely as long as show does not use quotes for other purposes. That should not be the case for standard types, because the idea of show is to produce more-or-less correct, parsable Haskell.

Semantic types in your programs

And one more remark.

Actually, if we care about debugging in GHCi (and not simply demonstrating Haskell and wanting to have a pretty output), the need for showing non-ASCII letters must come from some inherent presence of these characters in your program (otherwise, for debugging, you could substitute them with Latin characters or not care much about being shown the codes). In other words, there is some MEANING in these characters or strings from the point of view of the problem domain. (For example, I've been recently engaged with grammatical analysis of Russian, and the Russian words as part of an example dictionary were "inherently" present in my program. Its work would make sense only with these specific words. So I needed to read them when debugging.)

But look: if the strings have some MEANING, then they are not plain strings any more; they are data of a meaningful type. The program would probably become even better and safer if you declared a special type for this kind of meaning.

And then, hooray!, you simply define your instance of Show for this type. And you are OK with debugging your program in GHCi.

As an example, in my program for grammatical analysis, I have done:

import Data.String (IsString (..))  -- provides IsString and fromString

newtype Vocable = Vocable2 { ortho :: String } deriving (Eq,Ord)
instance IsString Vocable -- to simplify typing the values (with OverloadedStrings)
    where fromString = Vocable2 . fromString

and

newtype Lexeme = Lexeme2 { lemma :: String } deriving (Eq,Ord)
instance IsString Lexeme -- to simplify typing the values (with OverloadedStrings)
    where fromString = Lexeme2 . fromString

(the extra fromString here is because I might switch the internal representation from String to ByteString or whatever)

Apart from being able to show them nicely, I got safer because I wouldn't be able to mix different types of words when composing my code.
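A minimal sketch of such a Show instance for Vocable follows. The rendering (a bare string literal, which is itself valid source for a Vocable when OverloadedStrings is on) is my choice, and it assumes words never contain '"' or '\\', since those are not escaped:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.String (IsString (..))

newtype Vocable = Vocable2 { ortho :: String } deriving (Eq, Ord)

instance IsString Vocable where
  fromString = Vocable2 . fromString

-- Print a Vocable as a plain string literal without escaping its letters.
-- Assumption: words contain no '"' or '\\'.
instance Show Vocable where
  showsPrec _ (Vocable2 w) = showChar '"' . showString w . showChar '"'
```

With this instance, GHCi shows ["кот", "пёс"] :: [Vocable] as ["кот","пёс"], while plain Strings elsewhere in the program keep the standard escaping.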

Schreibe answered 28/10, 2012 at 23:0 Comment(1)

It's great and satisfying :) that this request for a feature resulted in a feature that is appreciated by quite a few programmers. Examples of its usage: logging from GHCi, colorizing GHCi output. – Schreibe 8/4, 2015 at 10:45

Things change with version 7.6.1 of GHCi, which supplies a new option: -interactive-print. Here is the relevant part of the GHC manual (followed by the myShow and myPrint I wrote):

2.4.8. Using a custom interactive printing function

[New in version 7.6.1] By default, GHCi prints the result of expressions typed at the prompt using the function System.IO.print. Its type signature is Show a => a -> IO (), and it works by converting the value to String using show.

This is not ideal in certain cases, like when the output is long, or contains strings with non-ascii characters.

The -interactive-print flag allows you to specify any function of type C a => a -> IO (), for some constraint C, as the function for printing evaluated expressions. The function can reside in any loaded module or any registered package.

As an example, suppose we have following special printing module:

     module SpecPrinter where
     import System.IO

     sprint a = putStrLn $ show a ++ "!"

The sprint function adds an exclamation mark at the end of any printed value. Running GHCi with the command:

     ghci -interactive-print=SpecPrinter.sprint SpecPrinter

will start an interactive session where values will be printed using sprint:

     *SpecPrinter> [1,2,3]
     [1,2,3]!
     *SpecPrinter> 42
     42!

A custom pretty printing function can be used, for example, to format tree-like and nested structures in a more readable way.

The -interactive-print flag can also be used when running GHC in -e mode:

     % ghc -e "[1,2,3]" -interactive-print=SpecPrinter.sprint SpecPrinter
     [1,2,3]!


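module MyPrint (myPrint, myShow) where

import Data.Char (isPrint, showLitChar)

-- preparing for GHC 7.6.1: ghci -interactive-print=MyPrint.myPrint MyPrint
myPrint :: Show a => a -> IO ()
myPrint = putStrLn . myShow

-- Re-parse the output of show: each string or character literal is read back
-- and rendered again, leaving printable non-ASCII characters unescaped.
myShow :: Show a => a -> String
myShow = con . show
  where
    con :: String -> String
    con [] = []
    con li@(c:cs)
      | c == '"', ((str, rest) : _) <- (reads li :: [(String, String)])
      = '"' : foldr (lit '"') ('"' : con rest) str
      | c == '\'', ((ch, rest) : _) <- (reads li :: [(Char, String)])
      = '\'' : lit '\'' ch ('\'' : con rest)
      | otherwise = c : con cs   -- not a literal (e.g. a primed name like x')
    -- Escape like show does (quote char included), except printable non-ASCII.
    lit :: Char -> Char -> String -> String
    lit q x rest
      | x == q                  = '\\' : x : rest
      | x > '\DEL' && isPrint x = x : rest
      | otherwise               = showLitChar x rest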

And they work well:

*MyPrint> myPrint "asf萨芬速读法"
"asf萨芬速读法"
*MyPrint> myPrint "asdffasdfd"
"asdffasdfd"
*MyPrint> myPrint "asdffa撒旦发"
"asdffa撒旦发"
*MyPrint> myPrint '此'
'此'
*MyPrint> myShow '此'
"'\27492'"
*MyPrint> myPrint '此'
'此'

Guanidine answered 22/1, 2013 at 15:20 Comment(3)

Thanks for your answer! Yes, that's a great small feature! I've kept an eye on this and have already summarized this new feature of 7.6.1 here. – Schreibe 16/4, 2013 at 21:54

Reparsing works! To save work defining the instances for a new Show class (like ShowGhci), one might be tempted to use the existing instances of Show by default and only redefine the instance for String and Char. But that won't work, because if you use myShow = show, then for any complex data containing strings, show is "hard-compiled" to call the old show to show the string. This situation calls for the ability to pass different dictionaries implementing the same class interface to functions that use this interface (show would pass it down to the sub-shows). Are there any GHC extensions for this? – Schreibe 8/3, 2015 at 11:53

Thanks, it works, but I had to add type annotations here: (str,rest):_ = reads li :: [(String, String)] and here: (char,rest'):_ = reads li :: [(Char, String)]. – Sobersided 6/11, 2015 at 20:0


You could switch to using the 'text' package for IO. E.g.

Prelude> :set -XOverloadedStrings
Prelude> Data.Text.IO.putStrLn "hello: привет"
hello: привет

The package is part of the standard Haskell distribution, the Haskell Platform, and provides an efficient packed, immutable Unicode text type with IO operations. Many encodings are supported.

Using a .ghci file, you could turn on -XOverloadedStrings by default and write a :def macro to introduce a :text command that shows a value via text only. That would work.
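GHCi's :def takes an expression of type String -> IO String: it receives the rest of the command line and returns a command for GHCi to run. A sketch of the body such a :text macro could use (textMacro is a hypothetical name):

```haskell
-- Turn ":text <expr>" into a command that prints <expr> with
-- Data.Text.IO.putStrLn, so the text is written out directly, unescaped.
textMacro :: String -> IO String
textMacro expr = return ("Data.Text.IO.putStrLn (" ++ expr ++ ")")
```

Inlined into a .ghci file, it would look roughly like this:

    :set -XOverloadedStrings
    :def text \e -> return ("Data.Text.IO.putStrLn (" ++ e ++ ")")

after which :text "hello: привет" should print hello: привет (with OverloadedStrings on, the literal is a Text value).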

Cloudcapped answered 11/4, 2011 at 0:20 Comment(5)

It seems like putStrLn is working fine for the OP. I thought the question was how to override show. – Sailboat 11/4, 2011 at 2:23

Good point! It really is specifically about 'show' not escaping. Perhaps we should modify the title. – Cloudcapped 11/4, 2011 at 2:37

Yes, the question is about overriding show and print so that they do not escape non-Latin characters. I don't understand why you would say it's not about escaping... It's about "switching off" escaping in print or show, and both these words are in the title: "print", "unescaped". Which title would be better? – Schreibe 11/4, 2011 at 7:10

@rampion, you are correct: putStrLn is working fine for me, but for a haskell "tutorial", I'd like that people don't have to use putStrLn in GHCi, and still be able to see their Russian strings (processed by our toy functions for a Haskell tutorial). – Schreibe 11/4, 2011 at 7:12

I always get this: hello: *** Exception: <stdout>: hPutChar: invalid argument (invalid character). I ran it both through Cygwin and with the Windows Haskell Platform. – Radley 5/5, 2015 at 12:26

Now that I know about ghci's -interactive-print: it is a great feature. Many thanks for writing the question and answers! By the way, the existing pretty printers I could find on the web have some corner cases, and writing a good Unicode show turned out to be more complicated than it seems.

Therefore, I decided to write a Haskell package, unicode-show, for this purpose, which (hopefully) prints corner-case strings and compound types well.

I hope this package is useful to people who find this Q&A :)

Blayze answered 4/2, 2016 at 4:53 Comment(0)

What would be ideal is a patch to ghci allowing the user to :set a function to use for displaying results other than show. No such feature currently exists. However, Don's suggestion for a :def macro (with or without the text package) isn't bad at all.

Tempa answered 11/4, 2011 at 19:58 Comment(3)

Does the "macro" suggestion mean that we'll have to have Prelude> :text "hello: привет" on the screen in GHCi, i.e., the macro name explicitly typed? This is superfluous, because the prompt already prompts to type an expression that GHCi will evaluate and print the result. It'd be nice if this could be made the "default" macro (for printing results) or a macro "with a null surface name" (one doesn't have to type anything for it to get used by GHCi). – Schreibe 12/4, 2011 at 7:28

@imz -- Yes a default macro would be great. I think it would be a worthwhile feature request/patch, but I don't think it would be possible now. Bear in mind, though, that a macro can do anything (its type is String -> IO String) so it can "postprocess" any output, not just strings or text. Also bear in mind that it can be as short as :p, so the syntactic overhead, while irritating, can be pretty minimal. – Tempa 12/4, 2011 at 14:7

Just to clarify my words (not that I disagree with anything in your comment): by "default" I meant having no surface representation, i.e., invisible, the one that is used by default for postprocessing/printing the results, with no "syntactic overhead" at all. For my purposes (an introductory demonstration of Haskell), it's important that few extra concepts are involved, i.e., I wouldn't like the audience to see an unclear macro and wonder about it (and equally I wouldn't like the audience to see the escape sequences and wonder about the concept of escaping, the specific form used here, etc.). – Schreibe 12/4, 2011 at 14:56

One possible good solution is:

Install pretty-simple, for example with cabal:

cabal install --lib pretty-simple

Add to ~/.ghci:

import qualified Text.Pretty.Simple
:set -interactive-print=Text.Pretty.Simple.pPrint

The pretty-simple library provides additional benefits when printing various types of data.

Etheleneethelin answered 10/12, 2023 at 14:47 Comment(1)

Thanks for mentioning pretty-simple! The essential part of your answer is based on the -interactive-print feature, which ;-) appeared after this question was posed (and probably even was inspired by it). https://mcmap.net/q/488754/-how-to-hack-ghci-or-hugs-so-that-it-prints-unicode-chars-unescaped – Schreibe 14/1 at 22:8
