Why does Mercurial think my SQL files are binary?

I just scripted out my SQL Server stored procs, table definitions, etc. using SQL Server Management Studio and tried to add them to my Mercurial source control repository. They got added just fine, but now when I change and diff them, Mercurial calls them "binary files" and doesn't give me a proper unified diff.

I thought the encoding might be the problem, so I tried regenerating the scripts and specifying ANSI for the text file output, but I get the same behavior. I can view them just fine in Notepad without any odd-looking characters showing up. Why does Mercurial think these files are binary?

Otherwise, if someone can recommend a good tool for scripting out a SQL Server database that doesn't cause this issue, that might work too.

Anglo asked 2/3, 2010 at 20:26

I've run into this problem too. It happens because SQL Server Management Studio saves the files as "Unicode", which in Windows terminology means UTF-16. In most cases the first two bytes of such a file are a byte order mark (BOM) that identifies the encoding. Most newer text editors (e.g. Notepad) handle this transparently.

The first two bytes are the giveaway: they may show up as ÿþ, or FF FE in hex.
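If you want to confirm this without a hex editor, a few lines of Python will show the leading bytes. This is a minimal sketch under my own assumptions (the names sniff_bom and hex_prefix are made up for illustration): it only recognizes the standard byte order marks, and it returns the Python codec name that strips the mark when decoding.

```python
import codecs
from typing import Optional

# Checked in this order on purpose: the UTF-32 LE mark (FF FE 00 00) starts
# with the UTF-16 LE mark (FF FE), so the longer marks must be tried first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a Python codec name for a leading BOM, or None if there is none.

    Each returned codec consumes the BOM when decoding, so
    data.decode(sniff_bom(data)) yields the text without the mark.
    """
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return None


def hex_prefix(data: bytes, n: int = 2) -> str:
    """Format the first n bytes as hex, e.g. 'FF FE'."""
    return " ".join(f"{b:02X}" for b in data[:n])


if __name__ == "__main__":
    # Simulate what an SSMS "Unicode" script looks like on disk.
    sample = codecs.BOM_UTF16_LE + "SELECT 1;".encode("utf-16-le")
    print(hex_prefix(sample))  # FF FE
    print(sniff_bom(sample))   # utf-16
    # Read as Latin-1 / Windows-1252, the bytes FF FE display as the two characters from the answer above.
```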

The "Save" button in the Save dialog has a drop-down arrow. Choose "Save with Encoding..." and select "US-ASCII - Codepage 20127". I believe this setting is sticky and will remain for future saves.

Townsville answered 2/3, 2010 at 21:49
To be clear, it's not Unicode as such that's the issue. It's UTF-16, which has embedded null bytes. UTF-8 does not, unless you actually use U+0000 (which a SQL file generally would not). - Cantillate
It is good to know why hg thinks the file is binary, but it would be better to find a way to make Mercurial change its mind. Re-saving all the scripts is an ugly workaround. The problem is in Mercurial, not in the files. - Infinity
The answer worked for me, but I used "Unicode (UTF-8 without signature) - Codepage 65001" instead of ASCII. - Ceramal
And this is NOT a sticky setting, at least not in SSMS 2012. It is, however, an enormous pain in the ass. - Incommunicado

According to the docs, a file is considered binary if and only if it contains null bytes. SQL files shouldn't contain any, so I would check for them first (try opening the file in a hex editor). You probably already know this, but you can also force diff to treat the file as text with "hg diff --text" (or "hg diff -a").
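To see why that rule catches SSMS output, here is a minimal Python sketch of the null-byte test as described above. It is an illustration, not Mercurial's actual source, and looks_binary is a hypothetical name; it just shows that the same statement trips the test when encoded as UTF-16 but not when encoded as UTF-8.

```python
def looks_binary(data: bytes) -> bool:
    """Null-byte heuristic: non-empty data containing a NUL byte counts as binary."""
    return bool(data) and b"\0" in data


if __name__ == "__main__":
    sql = "SELECT name FROM sys.tables;"
    # UTF-16 LE stores each ASCII character as its byte followed by 0x00.
    print(looks_binary(sql.encode("utf-16-le")))  # True
    # UTF-8 stores each ASCII character as a single non-zero byte.
    print(looks_binary(sql.encode("utf-8")))      # False
```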

Cantillate answered 2/3, 2010 at 20:36

Andrew is right; there's a NUL byte somewhere (my guess would be a byte order mark at the start, inserted by a rude editor tool). Don't worry about it, though: unlike SVN or CVS, Mercurial doesn't handle binary and text files differently at all. It displays them differently when you run 'hg log', but otherwise they're handled exactly the same way.

Upcoming Mercurial releases special-case BOMs so that they don't trigger the "user probably doesn't want to see a diff of this on the console" behavior.

Mamey answered 2/3, 2010 at 20:52
We actually came to the conclusion that we cannot handle UTF-16 or UTF-32 in a consistent way that will work under Windows. Please see mercurial.markmail.org/thread/lsoj7dj47mx6xoyx for the discussion. The patch format just cannot handle non-ASCII characters :-/ Suggestions are welcome (on the mailing list, please). - Steady

I ran into this with Git when editing a file of SQL Server stored procedures on Linux. Git thought it was a binary file because the file from SQL Server was UTF-16 and therefore contained NUL bytes. My fix was Emacs, which lets you change the file's encoding to UTF-8 (C-x RET f, enter utf-8, then save).
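If you'd rather script the conversion than do it in an editor, the re-encoding itself is short. Here is a minimal Python sketch (utf16_to_utf8 is a hypothetical helper, not part of any tool mentioned here), assuming the input starts with a UTF-16 byte order mark, as SSMS's "Unicode" output normally does, and producing UTF-8 without a BOM:

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode BOM-prefixed UTF-16 bytes as UTF-8 without a BOM."""
    if not data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        raise ValueError("input does not start with a UTF-16 byte order mark")
    # The "utf-16" codec uses the BOM to pick the byte order and drops it.
    text = data.decode("utf-16")
    # Plain "utf-8" (not "utf-8-sig") writes no BOM, i.e. "UTF-8 without signature".
    return text.encode("utf-8")
```

To apply it to one file, something like p.write_bytes(utf16_to_utf8(p.read_bytes())) with p a pathlib.Path would do. Input without a UTF-16 BOM raises ValueError instead of being silently mangled.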

Underpinning answered 2/3, 2010 at 21:34

I know it's a bit late, but I came up with a script to batch-convert the *.sql files to UTF-8.

The full answer is posted in another thread on Stack Overflow, so I'll just post the link here: https://mcmap.net/q/356814/-files-with-sql-extension-identified-as-binary-in-mercurial-duplicate
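The linked script isn't reproduced here. As a rough stand-in for the same idea, here is a Python sketch (convert_tree is a hypothetical name, and this is not the linked script): it walks a folder, rewrites every *.sql file that starts with a UTF-16 byte order mark as UTF-8 without a BOM, and leaves everything else untouched. Commit or back up the folder before running it, since it overwrites files in place.

```python
import codecs
from pathlib import Path
from typing import List

# UTF-16 byte order marks (little- and big-endian).
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def convert_tree(root: Path, pattern: str = "*.sql") -> List[Path]:
    """Rewrite UTF-16 files under root as BOM-less UTF-8; return the changed paths."""
    converted = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue  # skip directories whose names happen to match the pattern
        data = path.read_bytes()
        if not data.startswith(UTF16_BOMS):
            continue  # not UTF-16 with a BOM (e.g. already UTF-8 or ANSI): leave it
        # "utf-16" picks the byte order from the BOM and strips it;
        # CRLF line endings survive the round trip unchanged.
        path.write_bytes(data.decode("utf-16").encode("utf-8"))
        converted.append(path)
    return converted
```

For example, convert_tree(Path("scripts")) returns the list of files it rewrote, where "scripts" stands for whatever folder SSMS wrote the scripts to.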

Agist answered 16/3, 2012 at 19:30

I had a similar problem and solved it with SMOscript, a tool available at http://www.devio.at/index.php/smoscript. I automated SMOscript by putting the following in a .cmd file:

rd /s /q [the scripts folder]
"C:\Program Files\devio IT Services\SMOscript\smoscript.exe" -s [server] -d [database] -F [the scripts folder] -U

Deleting the old folder first means that any objects dropped from the database are also removed from source control. SMOscript also saves the files as UTF-8 without any date/time stamps, so they work well under version control.

Calculous answered 22/6, 2012 at 17:26

An alternative for SQL Server Management Studio is to change the default SQL template file to UTF-8 (or whatever encoding you want), which will affect all future saves through SSMS.

  • Open in Notepad (as administrator): C:\Program Files (x86)\<ssms-version>\Common7\IDE\SqlWorkbenchProjectItems\Sql\SQLFile.sql
  • File > Save As
  • Change "Encoding" to UTF-8 or similar
  • Overwrite the original file

Credit goes to https://joehanna.com/sql-server/changing-the-default-encoding-of-sql-files-in-ssms/

Humphreys answered 24/3, 2020 at 14:30
