Convert text in PascalCase
Asked Answered
V

5

8

Is it possible to convert text from a table column in SQL Server to PascalCase only using a proper SQL code?

TABLE DEFINITION
----------------------
ID  int
CITTA   varchar(50)
PROV    varchar(50)
CAP varchar(50)
COD varchar(50)

The field that contains text to convert is CITTA. It contains all uppercase values like "ABANO TERME", "ROMA", and so on. The words are delimited by a space.

EDIT

I forgot to mention that some words have an accent character in it '. This character can be found either at the end of the word or in the middle.

EDIT 2:

Some quirks found on results:

  • If I have a name like "ISOLA BALBA" this name get translated to "IsolaBalba" (correct case but missed space)
  • If I have a name like "ISOLA D'ASTI" this get converted to "IsolaD'asti" (missed space as before and incorrect case. In this case the correct result is "Isola D'Asti"

could you please give me some advice on this small problem?

Vibratory answered 8/12, 2010 at 0:14 Comment(11)
Probably. Not sure SQL is the best environment to do this in but if you give example input and output I'll have a go! Are words currently delimited with a space? i.e. Does it need to convert pascal case to PascalCase?Laconia
@Martin: thanks for your quick answer. have a look at my question edit pleaseVibratory
@Vibratory - do UDFs count as proper SQL code for your purposes?Vaginitis
RE: Your comment on Accents what is the expected input -> output for these then?Laconia
@Martin: the character shall remain unchanged in the same positionVibratory
That doesn't change anything then does it? I think the original rules would still work?Laconia
@Martin: I am trying just now. I'll let you knowVibratory
@Vibratory - typically Pascal Case is used for identifiers in code - which is why both answers have removed spaces (spaces are usually not legal in identifiers). Hadn't appreciated that spaces were required to be kept in.Vaginitis
@Vibratory - See edit. I've also replaced the CTE with an indexed #temp table in the hope this will be quicker.Laconia
@Will A: You're right. These are not identifiers anyway. It's a list of town :)Vibratory
@Martin: thanks for the update. Di you mean quicker? It now takes just 5 seconds!!!! :OVibratory
L
5
DECLARE @T TABLE
(
ID  INT PRIMARY KEY,
CITTA   VARCHAR(50)
)
INSERT INTO @T
SELECT 1, 'ABANO TERME' UNION ALL SELECT 2, 'ROMA' UNION ALL SELECT 3, 'ISOLA D''ASTI';

IF OBJECT_ID('tempdb..#HolderTable') IS NOT NULL
    DROP TABLE #HolderTable

CREATE TABLE #HolderTable
(
Idx INT IDENTITY(1,1) PRIMARY KEY,
ID INT,
Word  VARCHAR(50)
)

CREATE NONCLUSTERED INDEX ix ON #HolderTable(ID)
;

WITH T1 AS
(
SELECT ID, CAST(N'<root><r>' + REPLACE(REPLACE(CITTA, '''', '''</r><r>'), ' ', ' </r><r>') + '</r></root>' AS XML) AS xl
FROM @T
)
INSERT INTO #HolderTable
SELECT ID, 
       r.value('.','NVARCHAR(MAX)') AS Item
 FROM T1
 CROSS APPLY
xl.nodes('//root/r') AS RECORDS(r)

SELECT 
      ID, 
      (SELECT STUFF(LOWER(Word),1,1,UPPER(LEFT(Word,1))) FROM #HolderTable WHERE [@T].ID =  #HolderTable.ID ORDER BY Idx FOR XML PATH('') )
FROM @T [@T]
Laconia answered 8/12, 2010 at 0:43 Comment(5)
Neat stuff - watch out for issues with characters that require XML escaping, tho', 'HAM & EGGS' isn't treated well by the above.Vaginitis
The query is working very good even if it's a bit slow: 3 minutes for 400 records. And I have about 10k to convert :) No the data have no characters that need to be escaped. Many thanks!!! :)Vibratory
That is indeed pretty slow. Is some of the text quite long? I guess this is just a one off task anyway?Laconia
Yes it is. I dont mind for the time needed. The longest text is 34 chars.Vibratory
i have found some small quirks after a deep analysys of the results. Could you please have a look at my latest edit on the question? BTW the query took 88 minutes to completely execute. Thanks!Vibratory
W
7

I encourage you to try the code I posted in a blog a while ago. I suspect it will accommodate your requirements very well, and also perform better than many of the other methods.

SQL Server Proper Case Function

CREATE FUNCTION dbo.Proper(@DATA VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
  DECLARE @Position INT

  SELECT @DATA = STUFF(LOWER(@DATA), 1, 1, UPPER(LEFT(@DATA, 1))),
         @Position = PATINDEX('%[^a-zA-Z][a-z]%', @DATA COLLATE Latin1_General_Bin)

  WHILE @Position > 0
    SELECT @DATA = STUFF(@DATA, @Position, 2, UPPER(SUBSTRING(@DATA, @Position, 2))),
           @Position = PATINDEX('%[^a-zA-Z][a-z]%', @DATA COLLATE Latin1_General_Bin)

  RETURN @DATA
END

This function is a bit faster than most because it only loops once for each word that requires a capital letter.

Wine answered 8/12, 2010 at 14:1 Comment(3)
@G Maastros: thanks for your help! I have tried the function and it performs very well even if I am not sure that handle correctly the accent character '. Anyway this task was a one off task and the latest edit by @Martin works very fast (only 5 seconds to do the job). RegardsVibratory
The question asks for PascalCase, not Proper Case. They are not the same thing.Hanrahan
@Hanrahan Please note that it says PascalCase in the title, but further down, in the body of the question, it says "The words are delimited by a space."Wine
L
5
DECLARE @T TABLE
(
ID  INT PRIMARY KEY,
CITTA   VARCHAR(50)
)
INSERT INTO @T
SELECT 1, 'ABANO TERME' UNION ALL SELECT 2, 'ROMA' UNION ALL SELECT 3, 'ISOLA D''ASTI';

IF OBJECT_ID('tempdb..#HolderTable') IS NOT NULL
    DROP TABLE #HolderTable

CREATE TABLE #HolderTable
(
Idx INT IDENTITY(1,1) PRIMARY KEY,
ID INT,
Word  VARCHAR(50)
)

CREATE NONCLUSTERED INDEX ix ON #HolderTable(ID)
;

WITH T1 AS
(
SELECT ID, CAST(N'<root><r>' + REPLACE(REPLACE(CITTA, '''', '''</r><r>'), ' ', ' </r><r>') + '</r></root>' AS XML) AS xl
FROM @T
)
INSERT INTO #HolderTable
SELECT ID, 
       r.value('.','NVARCHAR(MAX)') AS Item
 FROM T1
 CROSS APPLY
xl.nodes('//root/r') AS RECORDS(r)

SELECT 
      ID, 
      (SELECT STUFF(LOWER(Word),1,1,UPPER(LEFT(Word,1))) FROM #HolderTable WHERE [@T].ID =  #HolderTable.ID ORDER BY Idx FOR XML PATH('') )
FROM @T [@T]
Laconia answered 8/12, 2010 at 0:43 Comment(5)
Neat stuff - watch out for issues with characters that require XML escaping, tho', 'HAM & EGGS' isn't treated well by the above.Vaginitis
The query is working very good even if it's a bit slow: 3 minutes for 400 records. And I have about 10k to convert :) No the data have no characters that need to be escaped. Many thanks!!! :)Vibratory
That is indeed pretty slow. Is some of the text quite long? I guess this is just a one off task anyway?Laconia
Yes it is. I dont mind for the time needed. The longest text is 34 chars.Vibratory
i have found some small quirks after a deep analysys of the results. Could you please have a look at my latest edit on the question? BTW the query took 88 minutes to completely execute. Thanks!Vibratory
V
3

Try the below function (adjust string type as appropriate). Just don't use this in a WHERE clause - and consider the performance ramifications elsewhere. The 12345678 is just some arbitrarily large value that you might want to replace with something more appropriate!

CREATE FUNCTION dbo.ufn_PascalCase(@str AS VARCHAR(MAX)) RETURNS VARCHAR(MAX)
BEGIN
    SET @str = LOWER(@str)

    DECLARE @result VARCHAR(MAX) = ''

    DECLARE @spaceIndex INTEGER = CHARINDEX(' ', @str)
    WHILE @spaceIndex > 0
    BEGIN
        SET @result += UPPER(SUBSTRING(@str, 1, 1)) + SUBSTRING(@str, 2, @spaceIndex - 2)
        SET @str = SUBSTRING(@str, @spaceIndex + 1, 12345678)
        SET @spaceIndex = CHARINDEX(' ', @str)
    END

    SET @result += UPPER(SUBSTRING(@str, 1, 1)) + SUBSTRING(@str, 2, 12345678)

    RETURN @result
END
Vaginitis answered 8/12, 2010 at 0:46 Comment(5)
@Thanks, I am going to try it! Did you see my last edit on question?Vibratory
@Vibratory - my pleasure. This is the simple approach (and doesn't cope with ' ' as input - oops!). It's probably not brilliant in terms of execution time. Martin's approach may well work better - just watch out for any potential problems with characters that need XML escaping.Vaginitis
@Loernzo - wouldn't expect accents to pose a problem - this is just splitting on spaces and UPPERing the first character after each space - should work fine w/ accents.Vaginitis
Thanks a lot for your contribution. As you expected is posing problems with accents. Luckily, the other answer is working good even if it's a bit slow... please see my latest comment on Martin answerVibratory
This doesn't work for underscores, or for multiple spaces in a row.Hanrahan
H
1

As of SQL 2017, you can do this pretty elegantly. Unlike some of the other answers, this gives PascalCase, as asked for, rather than Proper Case. It also works for underscores or spaces (and multiple ones in a row).

CREATE OR ALTER FUNCTION dbo.fnPascalCase(@word NVARCHAR(MAX)) RETURNS NVARCHAR(MAX)
BEGIN
    DECLARE @t1 TABLE (Val NVARCHAR(MAX));
    INSERT INTO @t1 (Val)
    SELECT Val = UPPER(SUBSTRING(LTRIM(RTRIM(value)), 1, 1)) + LOWER(SUBSTRING(LTRIM(RTRIM(value)), 2, 2000000000)) FROM STRING_SPLIT(REPLACE(@word, ' ', '_'), '_')

    DECLARE @result NVARCHAR(MAX);

    SELECT @result = STRING_AGG(Val, '') FROM @t1
    RETURN @result
END

Usage:

SELECT dbo.fnPascalCase('taco___tuesday is   today') --Returns 'TacoTuesdayIsToday`
SELECT dbo.fnPascalCase('HELLO MY fRiEnD') --Returns 'HelloMyFriend`
Hanrahan answered 19/3, 2020 at 20:57 Comment(0)
I
0

I've modied @George Mastro's response to also work with Common Accented Characters

CREATE FUNCTION dbo.PascalCase(@DATA VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
  DECLARE @Position INT

  SELECT @DATA = STUFF(LOWER(@DATA), 1, 1, UPPER(LEFT(@DATA, 1))),
         @Position = PATINDEX('%[^a-zA-ZÂÀÇÈÉËÊÎÏÔÛÙÜâàçèéëêîïôûùü][a-zâàçèéëêîïôûùü]%', @DATA COLLATE Latin1_General_Bin)

  WHILE @Position > 0
    SELECT @DATA = STUFF(@DATA, @Position, 2, UPPER(SUBSTRING(@DATA, @Position, 2))),
           @Position = PATINDEX('%[^a-zA-ZÂÀÇÈÉËÊÎÏÔÛÙÜâàçèéëêîïôûùü][a-zâàçèéëêîïôûùü]%', @DATA COLLATE Latin1_General_Bin)

  RETURN @DATA
END
Inhospitality answered 21/11, 2023 at 21:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.