Unable to parse as integer

Asked 4/1, 2011 at 22:19 Answered 20/8, 2013 at 21:17

Alright...I have this .txt file (UTF-8)

4661,SOMETHING,3858884120607,24,24.09
4659,SOMETHING1,3858884120621,24,15.95
4660,SOMETHING2,3858884120614,24,19.58

And this code

FileInputStream fis = new FileInputStream(new File("someTextFile.txt"));
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
BufferedReader in = new BufferedReader(isr);

int i = 0;
String line;
while((line = in.readLine()) != null) {
    Pattern p = Pattern.compile(",");
    String[] article = p.split(line);

    // I don't know why but when a first line starts with
    // an integer - article[0] (which in .txt file is 4661)
    // becomes someWeirdCharacter4661 so I need to trim it
    // *weird character is like |=>|

    if (i == 0) {
        StringBuffer articleCode = new StringBuffer(article[0]);
        articleCode.deleteCharAt(0);
        article[0] = articleCode.toString();
    }

    SomeArticle**.addOrChange(mContext, Integer.parseInt(article[0]), article[1], article[2], Integer.parseInt(article[3]), Double.parseDouble(article[4]));

    i++;
}

On the emulator it works fine, but on a real device (HTC Desire) I get this (weird) error:

E/AndroidRuntime(16422): java.lang.NumberFormatException: unable to parse '4661' as integer

What's the problem?

** It's just a class of mine that takes those parameters as input (Context, int, String, String, int, double).

Ranunculaceous answered 4/1, 2011 at 22:19 Comment(0)

It could be that your file is not UTF-8, or something along those lines.

However, if you just want a quick hack because you are interested in a solution rather than in the cause :), strip out anything that isn't a digit or a decimal point.

String[] article = p.split(line);
Integer i = Integer.parseInt(article[0].replaceAll("[^0-9.]",""));

The regular expression isn't perfect (it would affect ...999.... for example) but it will do for you.
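For what it's worth, here is a small sketch of what that replaceAll call does to a few inputs. The class and method names are made up for illustration. Note that the same pattern also removes a minus sign, so it only suits non-negative values.

```java
class DigitFilterDemo {
    // Hypothetical helper: keeps only ASCII digits and '.', and drops
    // everything else, including an invisible character at the start.
    static String keepDigitsAndDots(String s) {
        return s.replaceAll("[^0-9.]", "");
    }

    public static void main(String[] args) {
        // A first field with a stray leading character (U+FEFF here).
        String dirty = "\uFEFF4661";
        System.out.println(Integer.parseInt(keepDigitsAndDots(dirty))); // prints 4661

        // Caveat: the sign is stripped too, so "-12" becomes "12".
        System.out.println(keepDigitsAndDots("-12")); // prints 12

        // Caveat: stray dots survive, so this would still fail in parseInt.
        System.out.println(keepDigitsAndDots("...999....")); // prints ...999....
    }
}
```

If negative numbers can appear in the file, removing only the one offending leading character is probably the safer route.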

EDIT:

I did not read the question properly, it seems. If it is only at the start of the file, then it is very likely that what you have is a byte order mark, which is used to tell you whether the file is Unicode and, in UTF-16/32, whether it is little endian or big endian. You don't tend to see it used very often.

http://unicode.org/faq/utf_bom.html#bom10
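If it is a byte order mark, a cleaner fix than deleting the first character unconditionally is to drop it only when it is actually there. After UTF-8 decoding the mark arrives as the single character U+FEFF, which has no visible width (that would also explain why the error message appears to show only '4661'). Below is a minimal sketch of that check. The class and method names are hypothetical, and a StringReader stands in for the InputStreamReader from the question so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class BomStripDemo {
    // The byte order mark as it appears after decoding.
    static final char BOM = '\uFEFF';

    // Hypothetical helper: removes one leading BOM character if present.
    static String stripBom(String line) {
        if (line != null && line.length() > 0 && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new BufferedReader(new InputStreamReader(fis, "UTF-8")).
        String data = "\uFEFF4661,SOMETHING,3858884120607,24,24.09\n"
                    + "4659,SOMETHING1,3858884120621,24,15.95\n";
        BufferedReader in = new BufferedReader(new StringReader(data));

        boolean first = true;
        String line;
        while ((line = in.readLine()) != null) {
            if (first) {
                // Only the first line of a file can carry the mark.
                line = stripBom(line);
                first = false;
            }
            String[] article = line.split(",");
            System.out.println(Integer.parseInt(article[0]));
        }
    }
}
```

Unlike deleteCharAt(0), this leaves the first line alone when the file was saved without a BOM.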

Saldana answered 4/1, 2011 at 22:33 Comment(2)

Yes, it works! :) But, why is it showing only digits '4661'? Where is the other "problematic" part of the string? – Ranunculaceous 4/1, 2011 at 22:49

No idea without seeing your file. It is likely your original file either has an odd encoding or is corrupt. – Saldana 5/1, 2011 at 4:35

I was going to add this as a comment but decided to include an image as well. It seems the problem is not that the file isn't UTF-8 but in fact the opposite is true - it seems it IS UTF-8 but it isn't being read correctly.

The image is from a hex editor looking at a UTF-8 file I created containing the first line. Note the 3 characters preceding 4661...

[Image: hex editor view of the UTF-8 file]

If I save the file in ANSI format, those characters aren't there.

Persuasion answered 4/1, 2011 at 23:9 Comment(0)

You can use Notepad++: open your text file, choose the menu Encoding-->"Encoding in UTF-8 without BOM", and save with this option. The BOM bytes (EF BB BF) will be removed, so your code can parse the string as an integer without any problem.
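If you cannot control how the file is saved, the same three bytes can be skipped in code before decoding. This is only a sketch under my own naming (Utf8BomSkipper and its methods are not from the question); it uses java.io.PushbackInputStream to put the bytes back when they are not a BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class Utf8BomSkipper {
    // Returns a stream positioned after a leading EF BB BF, if present.
    static InputStream skipUtf8Bom(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until
        // three bytes are read or the stream ends.
        while (n < 3) {
            int r = in.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            in.unread(head, 0, n); // not a BOM: give the bytes back
        }
        return in;
    }

    // Small helper for trying it out on an in-memory byte array.
    static String decodeWithoutBom(byte[] data) {
        try {
            InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the code from the question, the returned stream would be wrapped as new InputStreamReader(Utf8BomSkipper.skipUtf8Bom(fis), "UTF-8"), and the first-line special case would no longer be needed.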

Hope this helps.

Checkrow answered 7/5, 2013 at 15:49 Comment(0)

I converted the input file to ASCII format, and after that it was read correctly in a similar application.

Piero answered 20/8, 2013 at 21:17 Comment(0)
