Modifying a character in a string in Lua

Is there any way to replace a character at position N in a string in Lua?

This is what I've come up with so far:

function replace_char(pos, str, r)
    return str:sub(pos, pos - 1) .. r .. str:sub(pos + 1, str:len())
end

str = replace_char(2, "aaaaaa", "X")
print(str)

I can't use gsub either as that would replace every capture, not just the capture at position N.

Devries answered 9/3, 2011 at 17:21 Comment(0)

Strings in Lua are immutable. That means any solution that replaces text in a string must construct a new string with the desired content. For the specific case of replacing a single character with some other content, you need to split the original string into a prefix part and a postfix part, and concatenate them back together around the new content.

This variation on your code:

function replace_char(pos, str, r)
    return str:sub(1, pos-1) .. r .. str:sub(pos+1)
end

is the most direct translation to straightforward Lua. It is probably fast enough for most purposes. I've fixed the bug that the prefix should be the first pos-1 chars, and taken advantage of the fact that if the last argument to string.sub is missing it is assumed to be -1 which is equivalent to the end of the string.
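As a quick check of the corrected function, here is a minimal sketch. It assumes 1-based positions and a pos that lies within the string; the sample strings are made up for illustration.

```lua
-- Same function as above, repeated so the snippet is self-contained.
local function replace_char(pos, str, r)
    return str:sub(1, pos - 1) .. r .. str:sub(pos + 1)
end

print(replace_char(2, "aaaaaa", "X"))  --> aXaaaa
print(replace_char(1, "hello", "J"))   --> Jello
print(replace_char(5, "hello", "y"))   --> helly
-- r does not have to be a single character:
print(replace_char(3, "abc", "XYZ"))   --> abXYZ
```

At pos = 1 the prefix str:sub(1, 0) is the empty string, and at the last position the postfix is empty, so both ends work without special cases.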

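But do note that it creates a number of temporary strings that will hang around in the string store until garbage collection eats them. The temporaries for the prefix and postfix can't be avoided in any solution. Whether the chained .. operators also need an intermediate string for the first .. to feed the second depends on the implementation; the reference interpreter compiles a chain like a .. b .. c into a single CONCAT instruction, so it should not need one.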

It is possible that one of two alternate approaches could be faster. The first is the solution offered by Paŭlo Ebermann, but with one small tweak:

function replace_char2(pos, str, r)
    return ("%s%s%s"):format(str:sub(1,pos-1), r, str:sub(pos+1))
end

This uses string.format to do the assembly of the result in the hopes that it can guess the final buffer size without needing extra temporary objects.

But do beware that string.format is likely to have issues with any \0 characters in any string that it passes through its %s format. Specifically, since it is implemented in terms of standard C's sprintf() function, it would be reasonable to expect it to terminate the substituted string at the first occurrence of \0. (Noted by user Delusional Logic in a comment.)
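For comparison, the plain .. version does not have this problem, since Lua strings carry an explicit length. The sketch below only exercises the .. version; how %s treats embedded zeros seems to vary between Lua versions, so nothing is asserted about string.format here. The sample strings are arbitrary.

```lua
local function replace_char(pos, str, r)
    return str:sub(1, pos - 1) .. r .. str:sub(pos + 1)
end

local s = "ab\0cd"                  -- 5 bytes, with a NUL in the middle
local t = replace_char(2, s, "X")
print(#s, #t)                       --> 5   5
print(t == "aX\0cd")                --> true

-- Writing a NUL into the string works the same way.
local u = replace_char(3, "abcde", "\0")
print(#u, u:byte(3))                --> 5   0
```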

A third alternative that comes to mind is this:

function replace_char3(pos, str, r)
    return table.concat{str:sub(1,pos-1), r, str:sub(pos+1)}
end

table.concat efficiently concatenates a list of strings into a final result. It has an optional second argument which is text to insert between the strings, which defaults to "" which suits our purpose here.
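To illustrate the separator argument, and a case where table.concat is more likely to pay off, here is a small sketch. replace_chars is a hypothetical helper (not part of the question or of Lua's standard library) that applies several single-character replacements and joins the pieces once at the end.

```lua
-- The separator defaults to the empty string.
print(table.concat({"a", "b", "c"}))        --> abc
print(table.concat({"a", "b", "c"}, ", "))  --> a, b, c

-- Hypothetical helper: apply many single-character replacements at once.
-- `changes` maps 1-based positions to replacement strings.
local function replace_chars(str, changes)
    local parts = {}
    for i = 1, #str do
        parts[i] = changes[i] or str:sub(i, i)
    end
    return table.concat(parts)
end

print(replace_chars("aaaaaa", {[2] = "X", [5] = "Y"}))  --> aXaaYa
```

Building one table and concatenating once avoids creating a new full-length string for every replacement, which is the situation where repeated .. tends to hurt.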

My guess is that unless your strings are huge and you do this substitution frequently, you won't see any practical performance differences between these methods. However, I've been surprised before, so profile your application to verify there is a bottleneck, and benchmark potential solutions carefully.
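A rough way to compare the three variants is sketched below. bench is a hypothetical helper, and the string length and iteration count are arbitrary assumptions; os.clock measures CPU time, and the numbers will differ between machines and Lua implementations, so treat the output as a hint rather than a result.

```lua
local function replace_char(pos, str, r)
    return str:sub(1, pos - 1) .. r .. str:sub(pos + 1)
end

local function replace_char2(pos, str, r)
    return ("%s%s%s"):format(str:sub(1, pos - 1), r, str:sub(pos + 1))
end

local function replace_char3(pos, str, r)
    return table.concat{str:sub(1, pos - 1), r, str:sub(pos + 1)}
end

-- Hypothetical helper: run f n times and return the elapsed CPU seconds.
local function bench(f, n, pos, str, r)
    local start = os.clock()
    for _ = 1, n do
        f(pos, str, r)
    end
    return os.clock() - start
end

local subject = ("a"):rep(1000)  -- arbitrary 1000-character test string
local n = 100000                 -- arbitrary iteration count

local candidates = {
    {"..",            replace_char},
    {"string.format", replace_char2},
    {"table.concat",  replace_char3},
}
for _, c in ipairs(candidates) do
    print(c[1], bench(c[2], n, 500, subject, "X"))
end
```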

Edrick answered 10/3, 2011 at 20:10 Comment(3)
Thanks for the in-depth explanation. - Devries
This is old, but I just finished tracking down a minor bug in some code I wrote. It turns out that the replace_char2 method doesn't insert null (\0) chars. - Monson
@DelusionalLogic Good point. string.format is based solidly on standard C's sprintf() function, and is likely to have issues with embedded NUL bytes. - Edrick

You should use pos inside your function instead of literal 1 and 3, but apart from this it looks good. Since Lua strings are immutable you can't really do much better than this.

Maybe

 "%s%s%s":format(str:sub(1,pos-1), r, str:sub(pos+1, str:len()))

is more efficient than the .. operator, but I doubt it - if it turns out to be a bottleneck, measure it (and then decide to implement this replacement function in C).

Proscribe answered 9/3, 2011 at 18:16 Comment(10)
Yes, the .. operator is the slowest way to concatenate strings, since a new string is created for every .. operator. Faster methods include string.format and table.concat. This shouldn't cause any noticeable effects, though, unless you are working with very large strings or many concatenation operations. For example, I had a script using over 500MB of memory to process a file of less than 1MB by using around 5 .. operators per line of input while sorting and reconstructing the input as output. Changing it to store strings in a table and call table.concat at the end made it so fast I didn't even bother measuring. - Trample
@Arrowmaster: Do you know that in a .. b .. c there are two (instead of only one) new strings created, or do you simply assume this? In principle this could be optimized by the compiler/interpreter to create only one new string, as is done in Java for the + operator. Your example is another case, since there you really have to create new strings with every statement. - Weingartner
@Paŭlo Ebermann Yeah, I just copied the code and forgot to remove the literals. @Trample @Paŭlo Ebermann I'll compare the .. operator to the format method. Thanks for the insight. - Devries
@Paŭlo: The reference version of Lua does not have much, if any, compiler optimization. I'm not sure about other implementations such as LuaJIT. - Trample
You need parens around "%s%s%s" here. - Birgitbirgitta
About optimizations: as far as I remember, standard Lua does try to transform all .. concatenations in a single expression into a single VM instruction (up to a point). So a .. b .. c does not create an intermediate string. (But a .. (b .. c) should create one.) - Birgitbirgitta
And usually table.concat (and the table creation it requires) is worth it only in loops. If you have a single expression, go for the .. operator. (And, anyway, you should not try to optimize prematurely; write it in the most concise way first, then profile and optimize later.) - Birgitbirgitta
Lua has a special opcode "CONCAT" which does not create intermediate strings. Use of parenthesis does cause intermediate strings to be created either. - Photoactinic
Does or doesn't? The 'either' is throwing me. - Harbinger
@RossCharette: I suppose it's "doesn't". But you should include an @sylvanaar in your message so he gets a notification about it. - Weingartner

With LuaJIT, you can use the FFI library to cast the string to a pointer to unsigned chars:

local ffi = require 'ffi'
txt = 'test'
ptr = ffi.cast('uint8_t*', txt)
ptr[1] = string.byte('o')
Nuncle answered 18/9, 2018 at 6:17 Comment(0)
