.Net multipart/form-data form enctype and UTF-8 "special" characters => � (MVC w/ HttpPostedFileBase)

Goal:

Upload / post a CSV file with UTF-8 characters to an MVC action, read the data, and store it in a database table.

Problem:

Only the plain ASCII characters make it through. UTF-8 "special" characters like á do not come through correctly: both in code and in the database they render as this character => �.

More:

I'm convinced that this isn't a problem with my C# code, although I've included the important parts below.

I thought the problem was that the uploaded file is sent with a plain-text ("text/plain") MIME type, but changing the file extension to .html changed the MIME type without fixing the characters.

Summary:

How do you get a form with an enctype attribute set to "multipart/form-data" to correctly interpret UTF-8 characters in a posted file?

Research:

From my research, this appears to be a common problem without a common and clear solution.

I've also found more solutions for Java and PHP than for .NET.


  • csvFile variable is of type HttpPostedFileBase

  • this is the MVC action signature

[HttpPost]

public ActionResult LoadFromCsv(HttpPostedFileBase csvFile)


Things I've tried:

1)

using (Stream inputStream = csvFile.InputStream)
{
    byte[] bytes = ReadFully(inputStream);
    string bytesConverted = new UTF8Encoding().GetString(bytes);
}

2)

using (Stream inputStream = csvFile.InputStream)
{
    using (StreamReader readStream = new StreamReader(inputStream, Encoding.UTF8, true))
    {
        while (!readStream.EndOfStream)
        {
            string csvLine = readStream.ReadLine();
            // string csvLine = new UTF8Encoding().GetString(new UTF8Encoding().GetBytes(readStream.ReadLine())); // stupid... this can not be the way!
        }
    }
}

3)

<form method="post" enctype="multipart/form-data" accept-charset="UTF-8">

4)

<input type="file" id="csvFile" name="csvFile" accept="UTF-8" />

<input type="file" id="csvFile" name="csvFile" accept="text/html" />

5)

When the file has a .txt extension, the ContentType property of the HttpPostedFileBase is "text/plain"

When I change the file extension from .txt to .csv the ContentType property of the HttpPostedFileBase is "application/vnd.ms-excel"

When I change the file extension to .html, the ContentType property of the HttpPostedFileBase is "text/html" - I thought this was going to be a winner, but it wasn't.
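
The ReadFully helper used in attempt 1 is not shown in the question. A minimal sketch of what such a helper might look like, assuming it simply buffers the whole upload into memory (the class name StreamHelpers is made up for illustration):

```csharp
using System.IO;

public static class StreamHelpers
{
    // Assumed behavior: copy everything from the input stream into a byte array.
    public static byte[] ReadFully(Stream input)
    {
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);   // Stream.CopyTo is available from .NET 4.0 onward
            return buffer.ToArray();
        }
    }
}
```

Buffering like this returns the bytes exactly as they were posted, so any � that shows up afterwards comes from the decoding step rather than from reading the stream.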


In my soul I have to believe there is an easy solution to this problem. It surprises me that I haven't been able to figure this one out on my own; uploading a file containing UTF-8 characters is a common task! Why am I failing here?!?!

Perhaps I have to adjust mime types in IIS for the website?

Perhaps I need different DOCTYPE / html tag / meta tags?


@Gabe -

Here is what my post looks like in Fiddler. This is really interesting because the � is plain as day, right there in the posted value.

POST http://localhost/AwesomeGeography/GeoBytesCities/LoadFromCsv?adsf HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://localhost/AwesomeGeography/GeoBytesCities/LoadFromCsv?adsf
Content-Type: multipart/form-data; boundary=---------------------------199122566726299
Content-Length: 354

-----------------------------199122566726299
Content-Disposition: form-data; name="csvFile"; filename="cities_test.html"
Content-Type: text/html

"CityId","CountryID","RegionID","City","Latitude","Longitude","TimeZone","DmaId","Code"
3344,10,1063,"Luj�n de Cuyo","-33.05","-68.867","-03:00",0,"LDCU"
-----------------------------199122566726299--
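
A viewer that shows � only says that the bytes are not valid in the encoding the viewer assumed. One way to see what actually arrived is to dump the non-ASCII bytes of the upload as hex. This is a rough sketch; the ByteInspector class and NonAsciiHex method are hypothetical names, not part of any framework:

```csharp
using System;
using System.Linq;

public static class ByteInspector
{
    // Returns every byte >= 0x80 in the input as space-separated, two-digit hex.
    public static string NonAsciiHex(byte[] bytes)
    {
        return string.Join(" ", bytes.Where(b => b >= 0x80).Select(b => b.ToString("X2")));
    }
}
```

For á, a UTF-8 encoded file would show C3 A1, while a single E1 would point to ISO-8859-1 or Windows-1252.
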
Belding answered 3/6, 2012 at 16:34 Comment(8)
Are you using a SQL Server database? Check its collation. You can learn more about it here. - Kbp
#1 is what I would think would work. If it doesn't, I would check a network sniffer (or maybe Fiddler) to verify that the right bytes are making it up to the server. - Avicenna
@Kbp - when I use the MS SQL Server import wizard, the UTF-8 characters make it into the database, so it's not the database. The ? character is present in the C# values, so it's there before the DB insert. - Belding
@Avicenna - Here is what my post looks like in Fiddler. - Belding
(Fiddler post data with � character added to question) - Belding
I wonder if I have to mess with the server Accept-Encoding header or somehow alter the "Content-Disposition: form-data;" bit of the posted data / file. - Belding
Well, if it's there in Fiddler, it doesn't sound like a server-side issue. - Avicenna
Right. I'm amazed that this has been such a hard problem to solve. I can't upload a file with UTF-8 characters in .NET? No way! - Belding

I had the same problem. You can use

StreamReader reader = new StreamReader(archivo_origen.InputStream, Encoding.GetEncoding("iso-8859-1"));

and it works. "iso-8859-1" covers Latin-derived (Western European) languages such as Spanish, German, and French.
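
Why this works can be seen by decoding the same bytes both ways. The following is a minimal sketch, assuming the file was saved as ISO-8859-1 (or Windows-1252), so that á is the single byte 0xE1; the DecodeDemo class and its members are made up for illustration:

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // "Luján" encoded as ISO-8859-1: á is the single byte 0xE1.
    public static readonly byte[] Latin1Bytes = { 0x4C, 0x75, 0x6A, 0xE1, 0x6E };

    // Decoded as UTF-8, 0xE1 announces a three-byte sequence, but 0x6E is not a
    // continuation byte, so the decoder substitutes U+FFFD (the � character).
    public static string AsUtf8()
    {
        return Encoding.UTF8.GetString(Latin1Bytes);
    }

    // Decoded as ISO-8859-1, every byte maps directly to one character.
    public static string AsLatin1()
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(Latin1Bytes);
    }
}
```

AsUtf8() yields the text with � in place of á, while AsLatin1() recovers á. Once the replacement character is in the string the original byte is lost, which is why re-encoding the string afterwards cannot repair it.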

Microspore answered 20/11, 2012 at 19:30 Comment(1)
I.e. it is not a UTF-8 encoded file as the OP expected. - Greyson

Based on the information given, I would guess that the problem is with the file encoding itself - not with your code.

I ran a simple test to demonstrate this:

  1. I exported a simple CSV file from Excel containing special characters.

  2. Then, I uploaded it through the following form and action method.

Form

<form method="post" action="@Url.Action("UploadFile", "Home")" enctype="multipart/form-data">
    <input type="file" id="file" name="file" />
    <input type="submit" />
</form>

Action method

[HttpPost]
public ActionResult UploadFile(HttpPostedFileBase file)
{
    using (StreamReader reader = new StreamReader(file.InputStream, System.Text.Encoding.UTF8))
    {
        string text = reader.ReadToEnd();
    }

    return RedirectToAction("Index");
}

I had the same problem as you in this case - the special characters were replaced with �.

I opened the file in Notepad and the special characters were displayed correctly there, so it seemed that it couldn't be a file problem, but when I opened the "Save As" dialog, the selected encoding was "ANSI". I switched it to UTF-8 and saved it, ran it through the uploader, and it all worked fine.
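
If re-saving every file by hand is not practical, the server could try a strict UTF-8 decode first and fall back to a single-byte encoding only when that fails. This is a sketch under stated assumptions, not a definitive implementation: UploadDecoder and DecodeText are hypothetical names, and ISO-8859-1 as the fallback is a guess about what non-UTF-8 uploads contain.

```csharp
using System;
using System.Text;

public static class UploadDecoder
{
    // Hypothetical helper: strict UTF-8 first, ISO-8859-1 as an assumed fallback.
    public static string DecodeText(byte[] bytes)
    {
        // throwOnInvalidBytes: true raises an exception on invalid sequences
        // instead of silently replacing them with U+FFFD.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            // GetString does not strip a UTF-8 byte order mark, so skip it here.
            int offset = HasUtf8Bom(bytes) ? 3 : 0;
            return strictUtf8.GetString(bytes, offset, bytes.Length - offset);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume a Western European single-byte encoding.
            return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
        }
    }

    private static bool HasUtf8Bom(byte[] b)
    {
        return b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF;
    }
}
```

In the action method above, this would be called on the bytes read from file.InputStream. ISO-8859-1 accepts any byte sequence, so the fallback never fails even when the guess is wrong. Notepad's "ANSI" on a Western European Windows system is Windows-1252, which agrees with ISO-8859-1 for characters like á but differs in the 0x80 to 0x9F range.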

Sulfite answered 23/9, 2012 at 17:28 Comment(1)
You can also use Google Docs to transform the file to UTF-8: #4221676 - Greyson
