MySQL: trim *both* whitespace and newline characters
Asked Answered
D

5

5

I've got a text column whose content has a mix of both newline and whitespace characters in the front and rear of the string. I'm trying to write a SELECT statement which shows me the content without the leading and trailing junk.

The following query trims whitespaces:

SELECT TRIM(column)
FROM table;

While this one trims newlines:

SELECT TRIM('\n' FROM column)
FROM table;

I've also tried this answer but it doesn't work:

SELECT TRIM(BOTH '\t' OR '\n' FROM TRIM(BOTH '\n' OR '\t' FROM TRIM(column)))
FROM table;

Is there a way to remove the mix of both types of leading and trailing characters?

UPDATE: I can't REPLACE because I still want the whitespace/newline characters to be present when they occur inside the content.

I have some fields with a whitespace followed by a bunch of newlines which are subsequently followed by more whitespace/newlines. Hence the dilemma.

Durazzo answered 6/2, 2015 at 15:11 Comment(3)
Not sure why you bothered trying that answer when the comments indicate that it is broken.Mattison
So to be absolutely clear, your field looks like: (1) a preface of a bunch of white space and new lines, followed by (2) actual content which may have inside of it more white space and newlines, followed by (3) trailing white space and new lines? And you want to remove (1) and (3)>Cluck
That is exactly right.Durazzo
D
0

Ended up doing it in Ruby:

def process_text(text)
  if text
    text.force_encoding('UTF-8')
    .gsub(/((\.\s)|\.|\u2026){4,}/, '...') # shorten multiple dots (or ellipsis) to 3 dots
    .gsub(/((\-\s)|\-){4,}/, '---') # shorten multiple dashes to 3 dashes
    .gsub(/((\_\s)|\_){4,}/, '___') # shorten multiple underscores to 3 underscores
    .gsub(/\p{Sm}|\u2022|\u2023/, ' ') # remove bullets
    .gsub(/[^[[:alnum:]] | \p{Sc} | \p{Cc} | [[:punct:]]]/, ' ') # preserve accented letters and currency signs
    .gsub(/^(\v|\f|\n|\t|\s|\u2029){3,}/, "\n")
    .strip
  end
end
Durazzo answered 19/2, 2016 at 11:32 Comment(0)
R
5

Unfortunately TRIM(BOTH '\n' OR '\r' FROM myfield) is not working. So following code only trims one newline (\r\n) statement, and not multiple:

SELECT TRIM(TRIM(BOTH '\n' FROM(TRIM(BOTH '\r' FROM myfield)))) FROM mytable

Note that the statement above does not trim <space>\r\n mytext to mytext, because \r\n is not at the beginning.

Solution for MariaDB

If you are using MariaDB (https://mariadb.com/kb/en/mariadb/pcre/) you could use regular expression to solve your problem. Following expression removes all white space prefix from the text:

SELECT REGEXP_REPLACE(myfield, '^[\r\n ]*(.*)', '\\1') FROM mytable
Ridden answered 25/4, 2017 at 9:51 Comment(0)
D
2

An alternative way is to replace the '\n' with empty string, and then trim it. Here is the code:

SELECT TRIM(REPLACE(column, '\n','' )) from table;

Or, you can call trim twice if you like.

SELECT TRIM(TRIM('\n' FROM column)) from table;
Doubletime answered 6/2, 2015 at 16:9 Comment(1)
I can't REPLACE because I still want the whitespace/newline characters to be present when they occur inside the content. SELECT TRIM(TRIM('\n' FROM column)) from table; won't work because I have some fields with a whitespace followed by a bunch of newlines which are subsequently followed by more whitespace. Hence the dilemma.Durazzo
K
1

Below query will help to remove the carriage return(new line character) in mysql column data

Using the ASCII value of the new line character I'm able to replace it with nothing

select replace(col_name, CHAR(13),'' ) from table;
Keratosis answered 17/12, 2019 at 10:17 Comment(0)
D
0

Ended up doing it in Ruby:

def process_text(text)
  if text
    text.force_encoding('UTF-8')
    .gsub(/((\.\s)|\.|\u2026){4,}/, '...') # shorten multiple dots (or ellipsis) to 3 dots
    .gsub(/((\-\s)|\-){4,}/, '---') # shorten multiple dashes to 3 dashes
    .gsub(/((\_\s)|\_){4,}/, '___') # shorten multiple underscores to 3 underscores
    .gsub(/\p{Sm}|\u2022|\u2023/, ' ') # remove bullets
    .gsub(/[^[[:alnum:]] | \p{Sc} | \p{Cc} | [[:punct:]]]/, ' ') # preserve accented letters and currency signs
    .gsub(/^(\v|\f|\n|\t|\s|\u2029){3,}/, "\n")
    .strip
  end
end
Durazzo answered 19/2, 2016 at 11:32 Comment(0)
C
0

You can use REGEXP_REPLACE to trim both ends, this is what I use to trim newline characters \r and \n, indent character \t and spaces.

SELECT REGEXP_REPLACE(REGEXP_REPLACE(mycolumn, '^[\r\n\t ]+', ''), '[\r\n\t ]+$', '') FROM mytable
Cotter answered 12/4, 2023 at 8:28 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.