Is there a way in Julia to extract the part of a string from one particular character to another? For example, given the variable s="Hello, world", I want characters 3 to 9.
# output = 'llo, wo'
The other solution works only for ASCII strings. Julia uses byte indexing, not character indexing, in getindex syntax, as I discussed on my blog some time ago. If you want to use character indexing (which I assume you do from the wording of your question), here is a solution using macros.
In general (without using the solution linked above) the functions to use are: chop, first, last, or for index manipulation prevind, nextind, and length.
For example, to get characters 3 to 9, safe approaches include the following (just showing several combinations):
julia> str = "😄 Hello! 👋"
"😄 Hello! 👋"
julia> last(first(str, 9), 7)
"Hello! "
julia> chop(str, head=2, tail=length(str)-9)
"Hello! "
julia> chop(first(str, 9), head=2, tail=0)
"Hello! "
julia> str[(:)(nextind.(str, 0, (3, 9))...)]
"Hello! "
Note though that the following is incorrect:
julia> str[3:9]
ERROR: StringIndexError: invalid index [3], valid nearby indices [1]=>'😄', [5]=>' '
There is an open issue to make chop more flexible, which would simplify your specific indexing case.
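The index-manipulation route can also be wrapped in a small helper. This is only a sketch: charslice is a hypothetical name, not something in Base, and it assumes 1 ≤ i ≤ j ≤ length(s) with no bounds handling.

```julia
# Hypothetical helper (not part of Base): characters i through j of s.
# Assumes 1 <= i <= j <= length(s); no bounds handling is attempted.
function charslice(s::AbstractString, i::Integer, j::Integer)
    # nextind(s, 0, n) is the byte index at which the n-th character starts
    start = nextind(s, 0, i)
    stop = nextind(s, 0, j)
    # SubString takes byte indices and returns a view, so nothing is copied
    return SubString(s, start, stop)
end

charslice("😄 Hello! 👋", 3, 9)    # "Hello! "
charslice("Hello, world", 3, 9)   # "llo, wo"
```

Returning a SubString avoids copying; wrap the result in String(...) if an independent copy is needed.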
You can use the following method:
s="Hello, world"
s[3:9]
# output: llo, wo
s[3:end]
# output: llo, world
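Plain range indexing like this is safe when every character occupies one byte, which isascii can confirm. A minimal sketch of guarding on that, with a character-based fallback (slice_chars is a hypothetical name, and 1 ≤ i ≤ j ≤ length(s) is assumed):

```julia
# Hypothetical helper: use byte indexing only when it is known to be safe.
function slice_chars(s::AbstractString, i::Integer, j::Integer)
    if isascii(s)
        return s[i:j]    # one byte per character, so byte index == character index
    end
    # Character-based fallback: keep the first j characters, drop the first i - 1
    return String(chop(first(s, j), head = i - 1, tail = 0))
end

slice_chars("Hello, world", 3, 9)     # "llo, wo"
slice_chars("Milchmädchen", 8, 12)    # "dchen"
```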
"Milchmädchen"[8:end]
gives a StringIndexError, as noted in the other answer. Unfortunately, SubString is also no solution for the given range. – Guddle
As was noted by Bogumił Kamiński, Julia uses byte indexing, which kind of gets in the way when one wants to have something similar to the behavior I get in Wolfram Mathematica:
StringTake["Milchmädchen", {8, 12}]
(* "dchen" *)
Coming from a high-level language like the Wolfram Language, the behavior in Julia is confusing, even for a higher-level function like SubString:
julia> SubString("Milchmädchen", 9, 13)
"dchen"
julia> length("Milchmädchen")
12
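The mismatch between 12 characters and the index 13 comes from 'ä' taking two bytes in UTF-8. A short illustration using only Base functions:

```julia
s = "Milchmädchen"

length(s)                # 12 characters
ncodeunits(s)            # 13 bytes, because 'ä' occupies two of them
collect(eachindex(s))    # [1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13]
isvalid(s, 8)            # false: byte 8 is the second byte of 'ä'

# The 8th character starts at byte index 9
s[nextind(s, 0, 8):end]  # "dchen"
```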
So, an immediately reasonable approach might be to work with a collection of characters and concatenate the result of any extractions:
"Works like SubString, but is character indexed"
function stringtake(s::AbstractString, i::Integer, j::Integer=length(s))
    characters = collect(s)
    ind_i, ind_j = max(1, i), min(j, length(s))
    return join(characters[ind_i:ind_j])
end
While this is straightforward, it may be expensive for large strings, as we need to create an array of all characters. Prof. Kamiński has shown other approaches, e.g., using indexing functions. As of v1.9 (thanks @DNF for pointing this out), we may use the graphemes function from the Unicode module in the Julia Standard Library, which will iterate over graphemes in any string:
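import Unicode
"Works like SubString, but is grapheme indexed (requires Julia 1.9 or later)"
function stringtake(s::AbstractString, i::Integer, j::Integer=length(Unicode.graphemes(s)))
    # Clamp against the grapheme count, which can be smaller than length(s)
    # when the string contains combining characters.
    ind_i, ind_j = max(i, 1), min(j, length(Unicode.graphemes(s)))
    return Unicode.graphemes(s, ind_i:ind_j)
end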
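If allocating an array of all characters is a concern and Julia 1.9 is not available, a lazy variant is another option. This is a sketch under the same conventions as above; stringtake_lazy is a hypothetical name, and it indexes by characters (code points), not graphemes.

```julia
# Hypothetical variant: walks the string lazily instead of collecting it.
function stringtake_lazy(s::AbstractString, i::Integer, j::Integer)
    start = max(i, 1)
    # Number of characters to keep; never negative
    count = max(j - start + 1, 0)
    # Skip the first start - 1 characters, then take the next count of them
    return join(Iterators.take(Iterators.drop(s, start - 1), count))
end

stringtake_lazy("Milchmädchen", 8, 12)   # "dchen"
```

Iterators.take simply stops at the end of the string, so an upper bound past the last character needs no extra clamping.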
With one of these implementations in place we can do the following in a REPL:
julia> stringtake("Milchmädchen", 8, 12)
"dchen"
julia> stringtake("Hello, world", 3, 9)
"llo, wo"
julia> stringtake("😄 Hello! 👋", 3)
"Hello! 👋"
graphemes from the Unicode stdlib for this. And, BTW, in your stringtake function, there is a typo: the last input argument should be j. – Earflap
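Graphemes and characters are not always the same thing, which matters when choosing between length/first and Unicode.graphemes. A small illustration, assuming Julia 1.9 or later for the range form of graphemes (the example string is my own):

```julia
import Unicode

# "café" written with a plain 'e' followed by a combining acute accent
s = "cafe\u0301"

length(s)                      # 5 code points
length(Unicode.graphemes(s))   # 4 user-perceived characters

first(s, 4)                    # "cafe": the combining accent is cut off
Unicode.graphemes(s, 1:4)      # the whole string, accent included
```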
"llo, wo"
as the single quote is reserved for Char only. – Guddle