Bitwise OR on strings for large strings in c#
Asked Answered
E

5

5

I have two strings(with 1's and 0's) of equal lengths(<=500) and would like to apply Logical OR on these strings.

How should i approach on this. I'm working with c#.

When i consider the obvious solution, reading each char and applying OR | on them, I have to deal with apx, 250000 strings each with 500 length. this would kill my performance.

Performance is my main concern.

Thanks in advance!

Eulogia answered 6/3, 2016 at 12:0 Comment(11)
What have you tried? What's wrong with the most obvious solution (read each character, OR it, and write it to a new string)?Murrumbidgee
You can use StringBuilder and loop. See my answer.Hosiery
How do you end up with strings in first place? If you're seeking performance, it would be good to consider whether you can use another data structureMarillin
@Rotem, Yes, i did the most obvious thing but my concern was performance.Eulogia
@KooKiz, i read it from a source file as string.Eulogia
@Reddy For future reference, that is the kind of info you need to put in your questions. It would save currently 3 people from wasting time on giving you an answer you already have.Murrumbidgee
@Reddy Also, if the question is about performance and you're reading ~250MB from file, that code may also be relevant. It's possible the bottleneck is not where you assume it is.Murrumbidgee
@Murrumbidgee Sure, I'll consider your suggestion.Eulogia
It might be faster to read and manipulate the files as binary. Which character set and encoding do they use? UTF-8? Do they have characters other than 0 and 1? CR LR perhaps? How do you read them now? Where does your result go? A similar file?Bueschel
@TomBlodget Yes, UTF-8 is used. Only 0 and 1 are expected. read from a text file and write to a text file.Eulogia
You have, for example, 250000 files with 500 bytes each. In an uncontrolled test, it takes over 1/2 hour just to read such files from one directory!Bueschel
H
4

This is fastest way:

string x="";
string y="";
StringBuilder sb = new StringBuilder(x.Length);
for (int i = 0; i < x.Length;i++ )
{
    sb.Append(x[i] == '1' || y[i] == '1' ? '1' : '0');
}
string result = sb.ToString();
Hosiery answered 6/3, 2016 at 12:6 Comment(15)
Thank you for the code snippet. Can i use "|" logical OR by any chance?Eulogia
@Reddy logical or || is defined for booleans not for charactersHung
@Reddy | use only for bolean and bitwise number, not for character and string.Hosiery
@Reddy then you can convert to number or bool first then apply bitwise, but it is much slower than above code.Hosiery
I’m confused. The logic in this answer does implement a logical OR on characters. You cannot use the | operator here, but assuming that those characters are either 0 or 1, the logic reflects it properly.Carcinogen
@Reddy if you have strings and you cant change that i believe this is the fastest way possibleMonomolecular
@M.kazemAkhgary Aside from this answer + parallel processing of the strings.Murrumbidgee
@M.kazemAkhgary I just change from && to ||, make it even faster.Hosiery
@Murrumbidgee I just change from && to ||, make it even faster.Hosiery
@Sakura That depends if the input contains more 0 or more 1. If both strings are all zeroes, using || would be slower.Murrumbidgee
@Sakura Also, from my tests, using a char[] is faster than using a StringBuilder in this case.Murrumbidgee
@Murrumbidgee that info suprise me! Can you give a good link about that?Hosiery
@Sakura blogs.msdn.microsoft.com/cisg/2008/09/09/…Murrumbidgee
@Murrumbidgee sorry, I mean the info: That depends if the input contains more 0 or more 1. If both strings are all zeroes, using || would be slowerHosiery
Let us continue this discussion in chat.Murrumbidgee
V
3

Since it was mentioned that speed is a big factor, it would be best to use bit-wise operations.

Take a look at an ASCII table:

  • The character '0' is 0x30, or 00110000 in binary.
  • The character '1' is 0x31, or 00110001 in binary.

Only the last bit of the character is different. As such - we can safely say that performing a bitwise OR on the characters themselves will produce the correct character.

Another important thing we can do is do to optimize speed is to use a StringBuilder, initialized to the initial capacity of our string. Or even better: we can reuse our StringBuilder for multiple operations, although we have to ensure the StringBuilder has enough capacity.

With those optimizations considered, we can make this method:

string BinaryStringBitwiseOR(string a, string b, StringBuilder stringBuilder = null)
{
    if (a.Length != b.Length)
    {
        throw new ArgumentException("The length of given string parameters didn't match");
    }

    if (stringBuilder == null)
    {
        stringBuilder = new StringBuilder(a.Length);
    }
    else
    {
        stringBuilder.Clear().EnsureCapacity(a.Length);
    }

    for (int i = 0; i < a.Length; i++)
    {
        stringBuilder.Append((char)(a[i] | b[i]));
    }
    return stringBuilder.ToString();
}

Note that this will work for all bit-wise operations you would like to perform on your strings, you only have to modify the | operator.

Version answered 6/3, 2016 at 13:3 Comment(1)
A string is a counted sequence of Unicode/UTF-16 code units (char) so you'd better look at a Unicode table and understand the UTF-16 encoding instead of looking at an ASCII table. In this case, the Unicode codepoints "0" and "1" each have one UTF-16 code unit, U+0030 -> 0x0030 and U+0031 -> 0x0031.Bueschel
H
2

I have two strings(with 1's and 0's) of equal lengths(<=500) and would like to apply Logical OR on these strings.

You can write a custom logical OR operator or function which takes two characters as input and produces result (e.g. if at least one of input character is '1' return '1' - otherwise return '0'). Apply this function to each character in your strings.

You can also look at this approach. You'd first need to convert each character to boolean (e.g. '1' corresponds to true), perform OR operation between two boolean values, convert back result to character '0' or '1' - depending if result of logical OR was false or true respectively. Then just append each result of this operation to each other.

Hung answered 6/3, 2016 at 12:2 Comment(3)
I have to deal with apx, 250000 strings each with 500 length. this would kill my performance.Eulogia
@Reddy You should edit your original question and include that information - It's very important to know that you need performance and you're going to run this 250,000 times instead of just one.Version
@Reddy can you change your data structure? if you can and want performance dont use string. use int array with bit-mask algorithm or use BitArrayMonomolecular
M
2

I've found this to be faster than all proposed solutions. It combines elements from @Gediminas and @Sakura's answers, but uses a pre-initialized char[] rather than a StringBuilder.

While StringBuilder is efficient at memory management, each Append operation requires some bookkeeping of the marker, and performs more actions than only an index into an array.

string x = ...
string y = ...
char[] c = new char[x.Length];
for (int i = 0; i < x.Length; i++)
{
    c[i] = (char)(x[i] | y[i]);
}
string result = new string(c);
Murrumbidgee answered 6/3, 2016 at 13:19 Comment(3)
Yes, this is the best. What about unsafe code? can it be faster?Hosiery
From my test, unsafe code can be faster if we don't need to convert back to string in the end. Otherwise it is slower, with the unamanged -> managed conversion to string taking up half the time.Murrumbidgee
From one of asker's comment, he read string from file. For specific case, he may read directly to byte array and perfom bitwise on them.Hosiery
O
1

You can use a Linq query to zip and then aggregate the results:

var a = "110010";    
var b = "001110";  
var result = a.Zip(b, (i, j) => i == '1' || j == '1' ? '1' : '0')
              .Select(i => i + "").Aggregate((i, j) => i + j);

Basically, the Zip extension method, takes two sequences and apply an action on each corresponding elements of the two sequences. Then I use Select to cast from char to String and finally I aggregate the results from a sequence of strings (of "0" and "1") to a String.

Obstreperous answered 6/3, 2016 at 12:20 Comment(4)
Bitwise need speed. This way is slow.Hosiery
@Sakura Do you need bitwise operations on the bytes level of the chars? Well, I'm not sure it will be straight forward since you have to deal with Unicode/Ascii/UTF-8, UTF-16, etc... Are you using a specific formatting/culture?Obstreperous
Do we have some kind of bit wise operation on string ??Eulogia
@Reddy well, we can't perform bit wise operations on a whole string, but we can on each character of a string. Each character is represented by one or more bytes (depending on the encoding - ASCII/Unicode and then whether it is UTF-8, UTF-16, etc...). So in theory we could perform bit wise operations on the character's bytes. But then we should know the exact encoding so we could mask only the relevant bits that represents the character (the others are for the encoding).Obstreperous

© 2022 - 2024 — McMap. All rights reserved.