Mysterious leading "empty" character at the beginning of a string that came from a CSV file

While reading a CSV file into an Array, I noticed that the very first array element, which is a string, contains a leading "".

For example:

str = contacts[0][0]
p str

gives me...

"SalesRepName"

Then by sheer chance I happened to try:

str = contacts[0][0].split(//)
p str

and that gave me...

["", "S", "a", "l", "e", "s", "R", "e", "p", "N", "a", "m", "e"]

I've checked every other element in the array, and this is the only one whose string contains a leading "".

Broadcaster answered 8/11, 2015 at 9:18 Comment(2)
I honestly don't agree with this being closed as a duplicate. The issue in the referenced article is not at all the same as this one. If I had come across it during my research, I would have disregarded it because it doesn't explain the problem I was having. By down-voting this question you're disincentivising me from posting valuable information that could potentially help other people who encounter this same problem. The way I described the issue/answer focuses on the symptom. The least you could do is post a competing answer that explains what's going on. - Broadcaster
There are not many answers on the topic of ZERO WIDTH SPACE: verkltas.club/questions/tagged/… I am not a fan of the Zero Width Space because of what I see as its non-uniform handling by email clients, web browsers and word processors. This topic should not be closed. - Lipcombe

Now, before I could post this question, I stumbled upon the answer. Apparently, the act of writing up the question gave me the idea of checking the code point of this "" character (it turned out to be far outside the ASCII range).

str = contacts[0][0].split(//)
p str[0].codepoints

gave me

[65279]
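
A minimal sketch of the same check on a stand-alone string, in case you want to reproduce it without the CSV file. The `cell` value and the `first_char_codepoints` helper are made up for illustration; they assume the field is UTF-8 and starts with U+FEFF, as mine did.

```ruby
# Hypothetical helper: code points of the first character of str ([] for "").
def first_char_codepoints(str)
  str.empty? ? [] : str[0].codepoints
end

# Hypothetical stand-in for contacts[0][0]: a header cell that begins with U+FEFF.
cell = "\uFEFFSalesRepName"

p first_char_codepoints(cell)              # => [65279]
p first_char_codepoints(cell) == [0xFEFF]  # 65279 decimal is FEFF hexadecimal
p cell.length                              # 13: twelve visible letters plus the hidden one
```

The length check is a quick way to spot the problem even when the character does not show up in the output of `p`.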

Upon looking up character 65279 (U+FEFF), I found this article: https://mcmap.net/q/279677/-what-is-this-char-65279-39-39

According to SLaks:

It's a zero-width no-break space. It's more commonly used as a byte-order mark (BOM).
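
If the string is already in memory, the BOM can also be removed after the fact. This is only a sketch, not what I ended up using: `strip_bom` is a hypothetical helper, and it assumes the string is UTF-8 encoded.

```ruby
# U+FEFF, the character that doubles as the byte-order mark.
BOM = "\uFEFF"

# Hypothetical helper: returns str without a leading BOM.
# Strings that do not start with a BOM come back unchanged.
def strip_bom(str)
  str.sub(/\A\uFEFF/, "")
end

# In UTF-8 the BOM is the three-byte sequence EF BB BF.
p BOM.bytes.map { |b| format("%02X", b) }  # => ["EF", "BB", "BF"]

p strip_bom("\uFEFFSalesRepName")          # => "SalesRepName"
p strip_bom("SalesRepName")                # => "SalesRepName"
```

Anchoring the pattern with `\A` means only a BOM at the very start is removed; a U+FEFF elsewhere in the text is left alone.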

This, in turn, led me to the solution here: https://mcmap.net/q/206483/-how-to-avoid-tripping-over-utf-8-bom-when-reading-files
In this response, knut provided an elegant solution, which looked like this:

File.open('file.txt', "r:bom|utf-8"){|file|
  text_without_bom = file.read
}

The key element I was looking for was "r:bom|utf-8". So I adapted it to my code, which became this:

CSV.foreach($csv_path + $csv_file, "r:bom|utf-8") do |row|
  contacts << row
end
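
For anyone who wants to see the difference side by side, here is a self-contained sketch. The sample data, the temp file, and the `read_rows` helper are all made up for illustration, and it assumes a version of the csv library whose `CSV.foreach` accepts a mode string as its second argument, as in my code above.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: reads every row of the CSV at path,
# opening the file with the given mode string.
def read_rows(path, mode)
  rows = []
  CSV.foreach(path, mode) { |row| rows << row }
  rows
end

# Hypothetical sample file: a UTF-8 BOM followed by two CSV lines.
sample = Tempfile.new(["contacts", ".csv"])
sample.binmode
sample.write("\xEF\xBB\xBFSalesRepName,Region\nAlice,West\n")
sample.close

with_bom    = read_rows(sample.path, "r:utf-8")      # BOM kept as part of the first field
without_bom = read_rows(sample.path, "r:bom|utf-8")  # BOM skipped when the file is opened

p with_bom[0][0].codepoints.first  # => 65279
p without_bom[0][0]                # => "SalesRepName"

sample.unlink
```

Only the first field of the first row is affected, which matches what I saw: every other element in the array was clean.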

I spent hours on this stupid problem. Hopefully, this will save you some time!

Broadcaster answered 8/11, 2015 at 9:18 Comment(3)
According to this page, I am using the CSV library to parse the file: ruby-doc.org/stdlib-2.2.3/libdoc/csv/rdoc/CSV.html I'm not understanding your issue with my original question and subsequent answer. - Broadcaster
Thank you. I don't know if I would have ever found that zero-width space - converted at some point in my process to a normal space. And where did it come from? - Buford
@AnitaGraham I do not know where it came from. I would like to know myself. - Broadcaster
