F#, char seq -> strings

A quick question that may be more of a rant (but I hope to be enlightened instead).

In F# a string is compatible with Seq such that "abcd" |> Seq.map f will work on a string.

This is a brilliant facility for working with strings, for example to take the first 5 chars from a string:

"abcdef01234567" |> Seq.take 5

Or removing duplicate characters:

"abcdeeeeeee" |> Seq.distinct

The problem is that once you have the char seq result, converting it back to a string is extremely awkward. String.concat "" requires that the members are strings, so I end up doing this a lot:

"abcdef01234567" 
|> Seq.take 5
|> Seq.map string
|> String.concat ""

So much so that I have a function I use in 90% of my projects:

let toString : char seq -> string = Seq.map string >> String.concat ""

I feel this is over the top, but everywhere I look to find an alternative I am met with heinous things like StringBuilder or inlining a lambda and using new:

"abcdef01234567" 
|> Seq.take 5
|> Seq.toArray 
|> fun cs -> new string (cs) (* note you cannot just |> string *)

My (perhaps crazy) expectation is that when a Seq function is used on a string, the resulting expression should have the type string -> string. That is, what goes in is what comes out: "abcd" |> Seq.take 3 = "abc".

Is there a reason my expectation of high-level string manipulation is mistaken in this case?

Does anyone have a recommendation for approaching this in a nice manner? I feel like I must be missing something.

Mistassini answered 2/2, 2013 at 23:52 Comment(4)
One minor improvement: you can do System.String("aa" |> Seq.take 1 |> Seq.toArray), which is slightly better, since using System.String gets an implicit new for free. - Modest
Nice, but I really don't like breaking the workflow and placing the last expression in a function at the start. |> fun cs -> new ... feels like the only possible compromise (because I can't do let take n = Seq.take n >> Seq.toArray >> string >:( ). - Mistassini
Why not just "abcdef01234567".Substring(0, 5)? I would imagine that the F# String module lacks the take function precisely because this instance method exists in the framework. - Hostetter
Unfortunately I don't get the composition I want with Substring; it often requires what I perceive to be unnecessary type signatures and use of tuples and dots. It just feels unclean. - Mistassini

I was just researching this myself. I found that System.String.Concat works pretty well, e.g.

"abcdef01234567" |> Seq.take 5 |> String.Concat;;

assuming that you've opened System.
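
As a minimal sketch of how this might be wrapped for use in pipelines, the call can be given a name and an explicit type. The names ofChars and takeStr are hypothetical helpers, not part of FSharp.Core, and the explicit <char> type argument is only there to pin down the generic overload of String.Concat:

```fsharp
open System

// Hypothetical helper: joins a sequence of chars into a single string.
// The <char> type argument selects String.Concat<'T>(IEnumerable<'T>).
let ofChars (cs: seq<char>) : string = String.Concat<char>(cs)

// Hypothetical string -> string wrapper around Seq.take.
let takeStr (n: int) (s: string) : string = s |> Seq.take n |> ofChars

let firstFive = "abcdef01234567" |> takeStr 5
let noDupes = "abcdeeeeeee" |> Seq.distinct |> ofChars
```

Both firstFive and noDupes should evaluate to "abcde". Like Seq.take itself, takeStr will throw if the string is shorter than n.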

Interfile answered 9/5, 2014 at 19:22 Comment(2)
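Excellent, I completely looked past this one! Sneaky! - Mistassini
Just to make it really clear: there are two similarly named functions, and the names are case sensitive. String.Concat is the .NET static method on System.String; String.concat is the F# String module function, which takes a separator and a sequence of strings. - Inimical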

The functions in the Seq module only deal with sequences -- i.e., when you call them with a string, they only "see" a Seq<char> and operate on it accordingly. Even if they made a special check to see if the argument was a string and took some special action (e.g., an optimized version of the function just for strings), they'd still have to return it as a Seq<char> to appease the F# type system -- in which case, you'd need to check the return value everywhere to see if it was actually a string.
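
A small sketch of what this looks like in practice (the binding names are only for illustration): the static type of the result is seq<char>, so getting a string back always needs an explicit conversion step.

```fsharp
// The result is statically a seq<char>, even though the input was a string.
let taken : seq<char> = "abcd" |> Seq.take 3

// This line would not compile, because seq<char> is not a string:
// let bad : string = "abcd" |> Seq.take 3

// An explicit conversion is needed to get back to a string.
let converted : string = taken |> Seq.toArray |> System.String
```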

The good news is that F# has built-in shortcuts for some of the code you're writing. For example:

"abcdef01234567" |> Seq.take 5

can be shortened to:

"abcdef01234567".[..4]  // Returns the first _5_ characters (indices 0-4).

For some of the others you'll still have to use Seq, though, or write your own optimized implementation that operates on strings.

Here's a function to get the distinct characters in a string:

open System.Collections.Generic

let distinctChars str =
    let chars = HashSet ()
    let len = String.length str
    for i = 0 to len - 1 do
        chars.Add str.[i] |> ignore
    chars
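
Note that the function above returns a HashSet<char> rather than a string. If a string result is wanted, one possible variation is sketched below. The name distinctString is hypothetical, and the sketch assumes String.filter calls its predicate once per character, in order:

```fsharp
open System.Collections.Generic

// Hypothetical variation: returns a string that keeps the first
// occurrence of each character, in the original order.
let distinctString (str: string) : string =
    let seen = HashSet<char>()
    // HashSet.Add returns true only the first time a value is added,
    // so the filter keeps first occurrences and drops repeats.
    str |> String.filter (fun c -> seen.Add c)
```

With this, "abcdeeeeeee" |> distinctString should give "abcde".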
Chauvinism answered 3/2, 2013 at 0:13 Comment(4)
Your answer makes sense, and I do like the slicing operation .[..4], but unfortunately it lacks smooth composition and usually requires a type annotation to say it is either a string or an array. I think you are right in saying that the best option is to create a specific function for the use, and perhaps module String = let distinct s = s |> Seq.distinct |> toString might make for a nice extension, although one has to wonder why it isn't already included! - Mistassini
@DavidK Many functional languages tend to lean towards minimalism in their library design since it's easy enough to stitch some little helper functions together as you need them. The idea is to keep the libraries simple and fast instead of providing a built-in function for every possible scenario. - Chauvinism
That does make sense, and for an ML language I can understand the F# way of doing this, but I guess the whole situation makes me really jealous of how Haskell handles it (type classes, I guess). take 4 "test string" will return "test". I won't argue that F# should have this feature though; that can be a discussion for the language designers to have! - Mistassini
I would prefer Seq.take 5 over the .[..4] in terms of readability. - Zinovievsk

F# has a String module which contains some of the Seq module functionality specialised for strings.
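
A few examples of what that module offers, as a rough sketch (the binding names are illustrative). As far as I know it has no take or distinct, so those cases still need Seq or a small helper:

```fsharp
// Each of these takes a string and returns a string directly,
// with no char seq in between.
let upper = "abc" |> String.map (fun c -> System.Char.ToUpper c)
let digits = "a1b2c3" |> String.filter (fun c -> System.Char.IsDigit c)
let dashed = "abc" |> String.collect (fun c -> string c + "-")
```

These should evaluate to "ABC", "123" and "a-b-c-" respectively.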

Bignoniaceous answered 3/2, 2013 at 0:37 Comment(0)

F# has gained the ability to use constructors as functions since this question was asked 5 years ago. I would use String(Char[]) to convert characters to a string. You can convert to and from an F# sequence or an F# list, but I'd probably just use the String.ToCharArray method together with the F# Array module (with System opened, so that String refers to the constructor):

printfn "%s" ("abcdef01234567".ToCharArray() |> Array.take 5 |> String)

If you really wanted to use a char seq then you can pipe it to a String like so:

printfn "%s" ("abcdef01234567" |> Seq.take 5 |> Array.ofSeq |> String)
Safir answered 13/3, 2018 at 16:56 Comment(0)
