Kotlin: split a UTF string into single-length substrings using code points
I'm just starting Kotlin, so I'm sure there is an easy way to do this, but I don't see it. I want to split a string into single-length substrings using code points. In Java 8, this works:

public class UtfSplit {
    static String [] utf8Split (String str) {
        int [] codepoints = str.codePoints().toArray();
        String [] rv = new String[codepoints.length];
        for (int i = 0; i < codepoints.length; i++)
            rv[i] = new String(codepoints, i, 1);
        return rv;
    }
    public static void main(String [] args) {
        String test = "こんにちは皆さん";
        System.out.println("Test string:" + test);
        StringBuilder sb = new StringBuilder("Result:");
        for(String s : utf8Split(test))
            sb.append(s).append(", ");
        System.out.println(sb.toString());
    }
}

Output is:

Test string:こんにちは皆さん
Result:こ, ん, に, ち, は, 皆, さ, ん, 

How would I do this in Kotlin? I can get to the code points, although it's clumsy and I'm sure I'm doing it wrong, but I can't get from the code points back to strings. The whole string/character interface seems different to me and I'm just not getting it.

Thanks, Steve S.
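The step the question gets stuck on, turning a single code point back into a String, can be sketched with two standard java.lang methods that Kotlin/JVM can call directly: Character.toChars and StringBuilder.appendCodePoint. The helper names are hypothetical, chosen for illustration. A code point outside the Basic Multilingual Plane (for example an emoji) comes back as a two-char String.

```kotlin
// Hypothetical helper names, for illustration only.

/** Converts one code point to a String; supplementary code points yield two UTF-16 chars. */
fun codePointToString(codePoint: Int): String =
    String(Character.toChars(codePoint))

/** Same result, built with java.lang.StringBuilder.appendCodePoint. */
fun codePointToStringViaBuilder(codePoint: Int): String =
    StringBuilder().appendCodePoint(codePoint).toString()
```

Combined with the code points you already have, "こんにちは".codePoints().toArray().map(::codePointToString) gives a List<String> with one entry per character.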

Slap asked 16/12, 2018 at 3:12

Kotlin on the JVM uses the same runtime as Java, so you can call the same codePoints() method and the code does essentially the same thing. The Kotlin version is shorter, though, and needs no class, although you could group utility functions into an object. Here is a version using top-level functions:

fun splitByCodePoint(str: String): Array<String> {
    val codepoints = str.codePoints().toArray()
    return Array(codepoints.size) { index ->
        String(codepoints, index, 1)
    }
}

fun main(args: Array<String>) {
    val input = "こんにちは皆さん"
    val result = splitByCodePoint(input)

    println("Test string: ${input}")
    println("Result:      ${result.joinToString(", ")}")
}

Output:

Test string: こんにちは皆さん
Result:      こ, ん, に, ち, は, 皆, さ, ん

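Note: I renamed the function because the encoding doesn't really matter. You are splitting by code points, and a Java/Kotlin String is a sequence of UTF-16 chars no matter which encoding the text originally came in.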
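Splitting by code point rather than by Char only makes a visible difference for characters outside the Basic Multilingual Plane, such as emoji, which a String stores as two UTF-16 chars (a surrogate pair). The sketch below reuses the answer's splitByCodePoint and adds a hypothetical splitByChar for comparison. For "a😀b", length is 4, splitByChar returns 4 pieces with the emoji broken into two lone surrogates, and splitByCodePoint returns 3.

```kotlin
// The answer's function: one String per code point.
fun splitByCodePoint(str: String): Array<String> {
    val codepoints = str.codePoints().toArray()
    return Array(codepoints.size) { index -> String(codepoints, index, 1) }
}

// Hypothetical comparison helper: one String per UTF-16 Char,
// which splits a surrogate pair into two unusable halves.
fun splitByChar(str: String): List<String> = str.map { it.toString() }
```

For the Japanese test string both approaches agree, because every character there fits in a single Char.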

Some might write this without the local variable:

fun splitByCodePoint(str: String): Array<String> {
    return str.codePoints().toArray().let { codepoints ->
        Array(codepoints.size) { index -> String(codepoints, index, 1) }
    }
}
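Since the answer mentions grouping utility functions into an object, here is a sketch of that, with one variation: instead of the IntStream from codePoints(), it walks the string with codePointAt and Character.charCount. That may be useful where the Java 8 stream API is not available. The object name CodePointUtils is hypothetical, and the result is a List<String> rather than an Array<String>.

```kotlin
// Hypothetical utility object name; groups the helper as the answer suggests.
object CodePointUtils {
    /** Splits [str] into one String per code point without using streams. */
    fun split(str: String): List<String> {
        val result = ArrayList<String>()
        var i = 0
        while (i < str.length) {
            val cp = str.codePointAt(i)
            // charCount is 2 for supplementary code points (surrogate pairs), otherwise 1.
            val width = Character.charCount(cp)
            result.add(str.substring(i, i + width))
            i += width
        }
        return result
    }
}
```

Usage: CodePointUtils.split("こんにちは皆さん") yields the same eight single-character strings as the array version.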


Phototype answered 16/12, 2018 at 3:42
I'm still too green; too much magic. Could you please explain what's going on in the last line after the return? index -> ... is a lambda, right? Where is the iteration happening? - Slap
@stevensmith Kotlin has an Array constructor that takes a size and an initializer lambda. The lambda receives the index of the element being set (the index into the array) and returns the value to store at that position. kotlinlang.org/api/latest/jvm/stdlib/kotlin/-array/-init-.html - Phototype
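To make "where is the iteration happening?" concrete: the loop lives inside the Array constructor, which calls the lambda once for each index from 0 until size and stores each result. The sketch below puts the initializer form next to a hand-written loop that builds the same elements; both function names are hypothetical.

```kotlin
// Array(size) { index -> ... } calls the lambda for index = 0, 1, ..., size - 1
// and stores each returned value at that index.
fun withInitializer(codepoints: IntArray): Array<String> =
    Array(codepoints.size) { index -> String(codepoints, index, 1) }

// Hypothetical hand-written equivalent: the same iteration, spelled out.
fun withExplicitLoop(codepoints: IntArray): List<String> {
    val result = ArrayList<String>(codepoints.size)
    for (index in codepoints.indices) {
        result.add(String(codepoints, index, 1))
    }
    return result
}
```

A simpler case of the same idea: Array(3) { it * 10 } holds 0, 10 and 20.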
I used that to avoid an array with possibly null values. Otherwise you would be wrestling with an Array<String?> when you want an Array<String>. - Phototype
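For comparison, here is a sketch of the Java-style approach, where you create the array first and fill it afterwards. In Kotlin, arrayOfNulls gives an Array<String?>, so the element type stays nullable even after every slot is filled, and you need an extra step such as requireNoNulls() to get an Array<String>. The function names are hypothetical.

```kotlin
// Java-style: allocate, then fill. arrayOfNulls<String> has type Array<String?>.
fun splitNullable(codepoints: IntArray): Array<String?> {
    val rv = arrayOfNulls<String>(codepoints.size)
    for (i in codepoints.indices) rv[i] = String(codepoints, i, 1)
    return rv
}

// Getting back to Array<String> needs a check; requireNoNulls() throws if any element is null.
fun splitNonNull(codepoints: IntArray): Array<String> =
    splitNullable(codepoints).requireNoNulls()
```

The Array(size) { ... } constructor avoids both the nullable type and the conversion, because every element is produced before the array is handed back.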
You should read all of the stdlib docs. They aren't too large, and you won't get far without knowing this key part of Kotlin programming. Take a look; there's a lot of good stuff in there! - Phototype
OK, next stupid Java-type question. I just plugged in the code and the IDE is complaining "can't find or load class..." That makes perfect sense, but where do you put the code if not in a class? Learning curve... Augh! Thanks for the pointers. - Slap
This code just goes into a Kotlin file. If you have further questions, you'll need to do some research or ask other questions. The code above is my complete file CodepointSplitExample.kt (minus a package statement), which I ran as the class CodepointSplitExampleKt; how to run a top-level Kotlin main() is covered in other Stack Overflow questions. Also, since Kotlin 1.3, main() does not need any parameters. - Phototype
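As a sketch of the setup described in that comment, assuming Kotlin 1.3 or later: the whole program is one file with top-level functions and a parameterless main(). For a file named CodepointSplitExample.kt, the compiler places these functions in a generated class called CodepointSplitExampleKt, which is the class to run.

```kotlin
// CodepointSplitExample.kt
// No enclosing class is needed; top-level functions are compiled into
// a facade class named after the file (here CodepointSplitExampleKt).

fun splitByCodePoint(str: String): Array<String> {
    val codepoints = str.codePoints().toArray()
    return Array(codepoints.size) { index -> String(codepoints, index, 1) }
}

// Since Kotlin 1.3, main() may be declared without the args parameter.
fun main() {
    val input = "こんにちは皆さん"
    println("Test string: $input")
    println("Result:      ${splitByCodePoint(input).joinToString(", ")}")
}
```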
@jason-minard, just wanted to say thanks, especially for the second code example and for reminding me of Java/Kotlin interoperability. I couldn't find codePoints() or the String-from-code-points constructor in the Kotlin docs and had completely forgotten about interoperability. I could follow the second example, and that will eventually take me through the first. - Slap
