How do I get a substring of a string in Julia?

Asked 26/6, 2021 at 9:34 Answered 31/7, 2023 at 12:44

Is there a way in Julia to extract the part of a string between two character positions? For example, I want characters 3 through 9 of the variable s="Hello, world".

# output = 'llo, wo'

Armilla answered 26/6, 2021 at 9:34 Comment(1)

In Julia your output will be "llo, wo" as the single quote is reserved for Char only. – Guddle 31/7, 2023 at 11:54

The other solution works only for ASCII-only strings. Julia uses byte indexing, not character indexing, in getindex syntax, as I discussed on my blog some time ago. If you want to use character indexing (which I assume you do from the wording of your question), here you have a solution using macros.

In general (without using the solution linked above) the functions to use are: chop, first, last, or for index manipulation prevind, nextind, and length.

For example, to get characters 3 through 9, safe options include the following (just showing several combinations):

julia> str = "😄 Hello! 👋"
"😄 Hello! 👋"

julia> last(first(str, 9), 7)
"Hello! "

julia> chop(str, head=2, tail=length(str)-9)
"Hello! "

julia> chop(first(str, 9), head=2, tail=0)
"Hello! "

julia> str[(:)(nextind.(str, 0, (3, 9))...)]
"Hello! "

Note though that the following is incorrect:

julia> str[3:9]
ERROR: StringIndexError: invalid index [3], valid nearby indices [1]=>'😄', [5]=>' '

There is an open issue to make chop more flexible which would simplify your specific indexing case.
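
As a minimal sketch, the nextind approach above can be wrapped in a small helper. The name charslice is hypothetical (it is not part of Base), and the sketch assumes 1 ≤ i ≤ j ≤ length(s); out-of-range positions are not handled gracefully.

```julia
# Hypothetical helper, not part of Base: character-indexed substring.
# Assumes 1 <= i <= j <= length(s).
function charslice(s::AbstractString, i::Integer, j::Integer)
    lo = nextind(s, 0, i)  # byte index where the i-th character starts
    hi = nextind(s, 0, j)  # byte index where the j-th character starts
    return SubString(s, lo, hi)  # a view into s, no copy of the data
end

charslice("😄 Hello! 👋", 3, 9)
charslice("Milchmädchen", 8, 12)
```

Because SubString is a view, this avoids allocating an array of characters; wrap the result in String(...) if an independent copy is needed.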

Natalia answered 26/6, 2021 at 10:35 Comment(0)

You can use the following method:

s="Hello, world"

s[3:9]
# output: llo, wo

s[3:end]
# output: llo, world

Armilla answered 26/6, 2021 at 9:34 Comment(1)

This is not a general solution: "Milchmädchen"[8:end] gives a StringIndexError, as noted in the other answer. Unfortunately, SubString is not a solution for the given range either. – Guddle 31/7, 2023 at 11:38

As was noted by Bogumił Kamiński, Julia uses byte indexing, which kind of gets in the way when one wants to have something similar to the behavior I get in Wolfram Mathematica:

StringTake["Milchmädchen", {8, 12}]
(* "dchen" *)

Coming from a high-level language like the Wolfram Language, the behavior in Julia, even for a higher-level function like SubString, is confusing:

julia> SubString("Milchmädchen", 9, 13)
"dchen"
    
julia> length("Milchmädchen")
12
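
The mismatch arises because 'ä' occupies two bytes in UTF-8, so byte positions and character positions diverge after it. A short sketch using only standard Base functions makes this visible (it assumes the string literal contains the precomposed 'ä', as the outputs above suggest):

```julia
s = "Milchmädchen"

length(s)       # number of characters
ncodeunits(s)   # number of bytes (UTF-8 code units)
isvalid(s, 7)   # byte 7 is the first byte of 'ä'
isvalid(s, 8)   # byte 8 is the continuation byte of 'ä', not a character start
collect(eachindex(s))  # the byte indices at which characters start
```

Index 8 is missing from the list of valid indices, which is why the character-based range 8 to 12 corresponds to the byte range 9 to 13.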

So, an immediately reasonable approach might be to work with a collection of characters and concatenate the result of any extractions:

"Works like SubString, but is character indexed"
function stringtake(s::AbstractString, i::Integer, j::Integer=length(s))
    characters = collect(s)
    ind_i, ind_j = max(1, i), min(j, length(s))
    return join(characters[ind_i:ind_j])
end

While this is straightforward, it may be expensive for large strings because it creates an array of all characters. Prof. Kamiński has shown other approaches, e.g., using indexing functions. As of v1.9 (thanks @DNF for pointing this out), we may use the graphemes function from the Unicode module in the Julia Standard Library, which iterates over the graphemes of any string and also accepts a range of grapheme positions. Note that a grapheme (a user-perceived character) can consist of several code points, so the number of graphemes can differ from length(s):

import Unicode

"Works like SubString, but is character indexed"
function stringtake(s::AbstractString, i::Integer, j::Integer=length(s))
    ind_i, ind_j = max(i, 1), min(j, length(s))
    return Unicode.graphemes(s, ind_i:ind_j)
end

With one of these implementations in place we can do the following in a REPL:

julia> stringtake("Milchmädchen", 8, 12)
"dchen"

julia> stringtake("Hello, world", 3, 9)
"llo, wo"

julia> stringtake("😄 Hello! 👋", 3)
"Hello! 👋"

Guddle answered 31/7, 2023 at 12:44 Comment(1)

In newer versions of Julia you can use graphemes from the Unicode stdlib for this. And, BTW, in your stringtake function, there is a typo: the last input argument should be j. – Earflap 31/7, 2023 at 21:20
