Is it possible to remove white spaces from the CSV files header name in NiFi?
Asked Answered
M

3

7

I have a CSV file in which some column name have white spaces in it and some column names are without the white space between characters. I want to remove the white spaces from all the header names that has white space in it. Please help. Thank you!

Attaching screenshot for reference. Example: 'First Name' I want 'FirstName'

I am using ReplaceText processor in which under Search value I have passes \s to search just the header row white spaces and replacement value as Empty string. Also my evaluation mode is 'Line-by-Line'. so now the ouput file is showing as FirstName,LastNameshraddha,srivastavsanstuti,srivastav So it's showing everything in one line. I want white spaces only to be removed from header row and do not touch/merge the data values at all.Thanks

Please tell me how to remove them. Thank you!

enter image description here

Magnific answered 20/9, 2018 at 15:25 Comment(0)
C
3

@Shu is on the right direction, but the problem is that all spaces in the text will be replaced. In order to replace spaces ONLY in header row, in Shu's solution change:

  1. Search Value:

(?s)(^[^\n]*)(.*$)

  1. Replacement Value:

${'$1':replace(" ","")}$2

Chericheria answered 22/9, 2018 at 1:18 Comment(0)
D
3

Try with below ReplaceText configs:

Search Value

(.*)

Replacement Value

${'$1':replace(" ","")} //we are applying NiFi expression language replace function on the captured group.

Character Set

UTF-8

Maximum Buffer Size

1 MB

Replacement Strategy

Regex Replace

Evaluation Mode

Entire text //works with Line-By-Line mode also

enter image description here

Refer to this link for more details regards to NiFi expression language.

Input flowfile:

First Name,Last Name
shraddha,srivastav
sanstuti,srivastav

Output flowfile:

FirstName,LastName
shraddha,srivastav
sanstuti,srivastav
Donoghue answered 20/9, 2018 at 19:44 Comment(0)
C
3

@Shu is on the right direction, but the problem is that all spaces in the text will be replaced. In order to replace spaces ONLY in header row, in Shu's solution change:

  1. Search Value:

(?s)(^[^\n]*)(.*$)

  1. Replacement Value:

${'$1':replace(" ","")}$2

Chericheria answered 22/9, 2018 at 1:18 Comment(0)
P
1

You should be able to do the following match:

/[\s]/g

It matches all white Space. Then simply replace with an empty string. Edit:

Try this instead:

/(?<!\n.+)[ ]/g

It will match a Space only if there's no Newline before it. In other Words, only the first line will match. Again, replace with an empty string.

Pettifogger answered 20/9, 2018 at 16:27 Comment(4)
Thank you for replying. I tried your Regex in search value and set the replacement value as empty string but it didnt ermove the white space in the output file. Any reason why?Magnific
May be a stupid question: Do you save the file, after you have replaced?Pettifogger
So my flow is GetFile->ReplaceText->PutFile. So after replacement the file is there in the putfile directory. Can you tell me what I am doing wrong? My replaceText has Evaluation Mode as 'Line-by-Line'. Please help!Magnific
I tried passing \s in search value and replacement value as Empty string but now the ouput file is showing as FirstName,LastNameshraddha,srivastavsanstuti,srivastav So it's showing everything in one line. Please helpMagnific

© 2022 - 2024 — McMap. All rights reserved.