Removing all fraction symbols like “¼” and “½” from a string

Asked 12/4, 2017 at 2:41 Answered 12/4, 2017 at 2:55

I need to modify strings similar to "¼ cups of sugar" to "cups of sugar", meaning replacing all fraction symbols with "".

I have referred to this post and managed to remove ¼ using this line:

itemName = itemName.replaceAll("\u00BC", "");

but how do I replace every possible fraction symbol out there?

Diarrhea asked 12/4, 2017 at 2:41 Comment(8)

What about removing all non-alphanumeric characters except spaces, using: itemName.replaceAll("[^A-Za-z0-9 ]", ""); – Hoover 12/4, 2017 at 2:53

Java is not Android – Archaeopteryx 12/4, 2017 at 2:57

@Archaeopteryx got it. tag removed. – Diarrhea 12/4, 2017 at 3:3

Perhaps I spend too long on cooking.se but I wonder why you're doing this (as opposed to replacing "¼ cups of sugar" with " 1/4 cups of sugar"). – Britishism 12/4, 2017 at 10:35

May I ask why you would want to completely remove things that will change the semantic meaning of the string? I'm curious. – Carden 12/4, 2017 at 13:45

@ChrisH and Matti - I'm building an app for recipes and shopping lists - and I'm using an API which returns a JSON with ingredients combined with their quantity needed. I am still keeping the original string, but giving the user an option to see items grouped by their 'clean names' (so they only see one item) instead of seeing 5 rows of different quantities of garlic. Did I explain that right? Sorry, I'm a total novice. – Diarrhea 12/4, 2017 at 23:12

@Diarrhea that sounds reasonable, if tricky to get just right (I could imagine a recipe calling for "1 cup of sugar" as well as "sugar (for dusting)", so the grouping could be a challenge). Good luck – Britishism 13/4, 2017 at 5:54

If it's for a cooking app I'd suggest just hard coding the replacements for a limited number of fractions, maybe 1/2 to 1/10. I've never seen a recipe which called for 1/1076... – Voe 19/4, 2017 at 18:16
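For the alternative raised in the comments above (turning "¼" into "1/4" instead of deleting it), here is a minimal sketch. It assumes that Unicode compatibility decomposition (NFKD) is acceptable for the matched characters: NFKD turns ¼ into the digits 1 and 4 joined by U+2044 FRACTION SLASH, which the code then swaps for an ordinary "/". The class and method names are made up for illustration.

```java
import java.text.Normalizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FractionExpander {
    // Precomposed vulgar fractions: U+00BC..U+00BE, U+2150..U+215E and U+2189.
    private static final Pattern VULGAR_FRACTION =
            Pattern.compile("[\u00BC-\u00BE\u2150-\u215E\u2189]");

    // Replaces each precomposed fraction with its ASCII form, e.g. "1/4".
    static String expandFractions(String text) {
        Matcher m = VULGAR_FRACTION.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // NFKD yields numerator, U+2044 FRACTION SLASH, denominator.
            String decomposed = Normalizer.normalize(m.group(), Normalizer.Form.NFKD);
            String ascii = decomposed.replace('\u2044', '/');
            m.appendReplacement(out, Matcher.quoteReplacement(ascii));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandFractions("\u00BC cups of sugar"));
    }
}
```

Only the matched characters are normalized, so other compatibility characters elsewhere in the string (ligatures, superscripts and so on) are left untouched.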

Fraction symbols like ¼ and ½ belong to Unicode Category Number, Other [No]. If you are ok with eliminating all 676 characters in that group, you can use the following regular expression:

itemName = itemName.replaceAll("\\p{No}+", "");
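To get a feel for how broad that category is, the following sketch runs the same pattern over a fraction and over two other characters that also belong to Number, Other: SUPERSCRIPT TWO (U+00B2) and CIRCLED DIGIT ONE (U+2460). The class and method names are hypothetical.

```java
class NumberOtherDemo {
    // Removes every run of characters in Unicode category No (Number, Other).
    static String stripNumberOther(String text) {
        return text.replaceAll("\\p{No}+", "");
    }

    public static void main(String[] args) {
        // The fraction U+00BC is removed, as intended.
        System.out.println(stripNumberOther("\u00BC cups of sugar"));
        // The superscript U+00B2 and the circled digit U+2460 are removed as well.
        System.out.println(stripNumberOther("10 m\u00B2, step \u2460"));
    }
}
```

Whether losing superscripts and circled digits matters depends on the input; for ingredient strings it may well be harmless.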

If that is too broad, you can list the fraction characters explicitly:

// As characters (requires UTF-8 source file encoding)
itemName = itemName.replaceAll("[¼½¾⅐⅑⅒⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞↉]+", "");

// As ranges using unicode escapes
itemName = itemName.replaceAll("[\u00BC-\u00BE\u2150-\u215E\u2189]+", "");
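As a possible way to package the explicit list, the sketch below precompiles the escaped-range pattern once and trims the result, since removing "¼" from "¼ cups of sugar" leaves a leading space behind. The class name, method name, and the decision to trim are assumptions for illustration, not part of the answer above.

```java
import java.util.regex.Pattern;

class FractionStripper {
    // Same ranges as above: U+00BC..U+00BE, U+2150..U+215E and U+2189.
    private static final Pattern FRACTIONS =
            Pattern.compile("[\u00BC-\u00BE\u2150-\u215E\u2189]+");

    // Removes precomposed fraction characters and surrounding whitespace.
    static String cleanName(String itemName) {
        return FRACTIONS.matcher(itemName).replaceAll("").trim();
    }

    public static void main(String[] args) {
        System.out.println(cleanName("\u00BC cups of sugar"));
    }
}
```

Compiling the pattern once avoids recompiling it on every call, which String.replaceAll would otherwise do.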

Davila answered 12/4, 2017 at 2:49 Comment(7)

Note that fonts may render any sequence like 23/12 as fractions, thus enabling any fraction to be shown like that, not just the pre-composed ones. If that happens you may need to remove a lot more than just a list of characters. – Counterstamp 12/4, 2017 at 6:16

Why the + in the regexes? Can't you simply leave it out, or does it do anything for efficiency? – Dutton 12/4, 2017 at 10:1

@Dutton In this case the + operator causes the character set ([...]) to repeat multiple times. See this answer for more details: https://mcmap.net/q/303176/-what-is-the-meaning-of-in-a-regex – Bonnibelle 12/4, 2017 at 11:39

@Dutton yes they aren't necessary, and yes they should improve efficiency. One should probably not draw conclusions from it, but if you add a + at the end of the expression in this regex101 sample execution time will go down from 1 to 0ms and the number of steps will fall from 32 to 14. On an input without any repeats it only adds one step – Anticatalyst 12/4, 2017 at 12:24

@Anticatalyst I would refute that conclusion with regex101.com/r/9Md35x/1, the change seems marginal and I would attribute it to the javascript implementation potentially and maybe flow prediction – Dutton 12/4, 2017 at 12:30

@Dutton heh? Testing it on my side, it seems to behave marginally better with +, going down from 148305 steps to 139377 and from ~375ms to ~350ms. Thanks for taking the time to make a good data set in any case ! You're right that it probably depends on regex engines specifics – Anticatalyst 12/4, 2017 at 12:31

I tested it with a larger sample and it's a 3% increase, but I would expect it to be dependent on the language and the code. JavaScript is a slow scripting language, so the prediction that another match might follow could boost it by a larger percentage than in C or Java. Would be interesting to test though. – Dutton 12/4, 2017 at 12:36

You can use the regex below to replace the fractions ¼, ½ and ¾ (U+00BC to U+00BE) with an empty string. It does not cover other fraction symbols such as ⅓.

str = str.replaceAll("(([\\xbc-\\xbe])?)", "");
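The following sketch compares that pattern with a plain character class over the same range. It is meant to show two things: the groups and the optional quantifier do not change the result, and the range \xbc-\xbe leaves fractions outside Latin-1, such as ⅓ (U+2153), in place. The class and method names are hypothetical.

```java
class Latin1FractionDemo {
    // The pattern as posted, with capturing groups and an optional match.
    static String stripWithPostedPattern(String str) {
        return str.replaceAll("(([\\xbc-\\xbe])?)", "");
    }

    // Equivalent simplified form: a bare character class.
    static String stripWithCharClass(String str) {
        return str.replaceAll("[\\xbc-\\xbe]", "");
    }

    public static void main(String[] args) {
        // U+2153 (one third) lies outside the range and survives;
        // U+00BD (one half) is removed.
        String input = "\u2153 cup and \u00BD cup";
        System.out.println(stripWithPostedPattern(input));
        System.out.println(stripWithCharClass(input));
    }
}
```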

Dissonancy answered 12/4, 2017 at 2:55 Comment(2)

Why the additional capturing groups () and the optional ? match? – Regality 12/4, 2017 at 8:48

You know, just in case, you wanted to replace "" with "" – Dutton 12/4, 2017 at 10:0
