How to find and standardize the spelling of chapter and page references (Adobe InDesign)?

I want to find digits followed by "f", "ff", "f." or "ff." in order to standardize the spelling according to given conventions/rules.

I already tried several regular expressions, but unfortunately I have not found a single expression that catches all of the cases above (f, ff, f., ff.).

In plain words it seems easy:

  • find digits
  • followed by an optional whitespace
  • then followed by f, ff, f. or ff.
  • only whitespace or non-word characters are allowed directly before and after the expression

The beginning of the regex is quite easy, but I can’t figure out how to handle the different "f" cases and the non-word boundary that has to follow them.


My best guess so far is:

(?<=\b)(\d+(\h|\b)?f{1,2})\.?

but strings followed by a word character are still found.
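To see why, the attempt can be tried outside InDesign. The sketch below uses Python's re module purely as a stand-in for InDesign's GREP engine (results there may differ in edge cases). Python has no \h, so it substitutes [ \t], and it writes (?<=\b) as plain \b, which checks the same zero-width position. The helper name first_match is hypothetical.

```python
import re
from typing import Optional

# First attempt from the question, transcribed for Python's re module:
# (?<=\b) is written as \b (same zero-width check) and \h becomes [ \t]
# because Python's re has no \h escape.
ATTEMPT_1 = re.compile(r"\b(\d+([ \t]|\b)?f{1,2})\.?")

def first_match(text: str) -> Optional[str]:
    """Return the text of the first match, or None if there is none."""
    m = ATTEMPT_1.search(text)
    return m.group(0) if m else None

# A "-" line: nothing stops "f" from being followed by more letters,
# so the pattern still finds "00 f" inside "00 faaa".
print(first_match("00 faaa -"))     # 00 f
print(first_match("00 ff. aaa +"))  # 00 ff.
```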


When I extend the regex to:

(?<=\b)(\d+(\h|\b)?f{1,2})\.?(\W)

the number of false positives decreases, but it is still not the solution.
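Transcribed the same way (Python's re as a stand-in, [ \t] for \h, \b for (?<=\b)), the extended pattern shows two remaining problems: the optional dot can be given back so that (\W) matches the dot itself, and (\W) has to consume a character, so a reference with nothing after it is missed. The helper name first_match is again hypothetical.

```python
import re
from typing import Optional

# Second attempt from the question, transcribed for Python's re module:
# \h becomes [ \t] and (?<=\b) is written as \b.
ATTEMPT_2 = re.compile(r"\b(\d+([ \t]|\b)?f{1,2})\.?(\W)")

def first_match(text: str) -> Optional[str]:
    """Return the text of the first match, or None if there is none."""
    m = ATTEMPT_2.search(text)
    return m.group(0) if m else None

# False positive: \.? gives the dot back and (\W) matches the dot itself.
print(first_match("00 f.aaa -"))  # 00 f.
# Missed: (\W) must consume a character, but nothing follows "00ff".
print(first_match("+ aaa 00ff"))  # None
```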


I prepared lines for testing. Lines containing a plus "+" should be found, while lines containing a minus "-" should not be found.

00f aaa +
00f. aaa +
00ff aaa +
00ff. aaa +
00 f aaa + 
00 f. aaa +
00 ff aaa +
00 ff. aaa +
+ aaa 00f aaa +
+ aaa 00f. aaa +
+ aaa 00ff aaa +
+ aaa 00ff. aaa +
+ aaa 00 f aaa + 
+ aaa 00 f. aaa +
+ aaa 00 ff aaa +
+ aaa 00 ff. aaa +
+ aaa 00f
+ aaa 00f.
+ aaa 00ff
+ aaa 00ff.
+ aaa 00 f 
+ aaa 00 f.
+ aaa 00 ff
+ aaa 00 ff.

00 faaa -
00 f.aaa -
00 ffaaa -
00 ff.aaa -
00af aaa - 
00af. aaa -
00aff aaa -
00aff. aaa -
- aaa 00 faaa -
- aaa 00 f.aaa -
- aaa 00 ffaaa -
- aaa 00 ff.aaa -
- aaa 00af aaa - 
- aaa 00af. aaa -
- aaa 00aff aaa -
- aaa 00aff. aaa -
- aaa00f
- aaa00f.
- aaa00ff
- aaa00ff.
- aaa 00af 
- aaa 00af.
- aaa 00aff
- aaa 00aff.

00faaa -
00f.aaa -
00ffaaa -
00ff.aaa -
00af aaa - 
00af. aaa -
00aff aaa -
00aff. aaa -
- aaa00 faaa -
- aaa00 f.aaa -
- aaa00 ffaaa -
- aaa00 ff.aaa -
- aaa00af aaa - 
- aaa00af. aaa -
- aaa00aff aaa -
- aaa00aff. aaa -
- aaa00af 
- aaa00af.
- aaa00aff
- aaa00aff.
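Checking a candidate pattern against these lines by hand is tedious. A minimal harness could look like this, assuming each line is tested on its own and that a line containing "+" should match while any other line should not. It uses Python's re only as a stand-in for InDesign's engine; wrong_lines is a hypothetical helper name, and SAMPLE is a short excerpt of the list above.

```python
import re
from typing import List

# Excerpt of the test lines; "+" marks lines that should contain a match,
# "-" marks lines that should not.
SAMPLE = """\
00f aaa +
00 ff. aaa +
+ aaa 00 f.
00 faaa -
00 ff.aaa -
00af aaa -
- aaa00ff
"""

def wrong_lines(pattern: str, text: str) -> List[str]:
    """Return every line whose search result disagrees with its marker."""
    regex = re.compile(pattern)
    wrong = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        expected = "+" in line
        actual = regex.search(line) is not None
        if expected != actual:
            wrong.append(line)
    return wrong

# First attempt from the question, with \h replaced by [ \t] because
# Python's re has no \h, and (?<=\b) written as the equivalent \b.
print(wrong_lines(r"\b(\d+([ \t]|\b)?f{1,2})\.?", SAMPLE))
```

With the first attempt, the two lines that slip through are "00 faaa -" and "00 ff.aaa -".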

Furthermore, the digits and the "f" cases should be captured in groups so that they can be used in a replacement expression to standardize the spelling to one of these forms:

  • 123 ff. (with whitespace, with dot)
  • 123 ff (with whitespace, without dot)
  • 123ff. (without whitespace, with dot)
  • 123ff (without whitespace, without dot)
Crazy answered 20/8, 2019 at 21:1 Comment(1)
It would be helpful if you added the language you are working with to your tags. – Knavish

I suggest

\b(\d+)(\s?)(f{1,2})(?:(\.)\B|\b(?!\.))

Details

  • \b - word boundary
  • (\d+) - Group 1: 1+ digits
  • (\s?) - Group 2: an optional whitespace
  • (f{1,2}) - Group 3: 1 or 2 fs
  • (?:(\.)\B|\b(?!\.)) - either of the two:
    • (\.)\B - a dot captured in Group 4, provided it is not followed by a word char
    • | - or
    • \b(?!\.) - a word boundary not followed by a dot.
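As a rough cross-check outside InDesign, the same pattern runs unchanged in Python's re module (a stand-in; InDesign's engine may behave differently in edge cases). The groups come back as digits, whitespace, f/ff and the dot, with None in the last position when no dot was present. The last sample shows that any non-word character after the dot, such as "!", is accepted, because \B only requires that no word character follows.

```python
import re

# The pattern from this answer, unchanged; Python's re supports \b, \B,
# \s, {1,2} and negative lookahead as used here.
PAGE_REF = re.compile(r"\b(\d+)(\s?)(f{1,2})(?:(\.)\B|\b(?!\.))")

samples = [
    "+ aaa 00 ff. aaa +",  # wanted: dot followed by a space
    "+ aaa 00ff",          # wanted: reference at the end, no dot
    "00 ff.aaa -",         # unwanted: dot followed by a word character
    "- aaa00f",            # unwanted: digits glued to a word
    "00 f.!aaa aaa",       # dot followed by a non-word character
]

for line in samples:
    m = PAGE_REF.search(line)
    # groups(): (digits, optional whitespace, f or ff, dot or None)
    print(repr(line), m.groups() if m else None)
```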

Then, replacing is easy with:

  • 123 ff.: $1 $3.
  • 123 ff : $1 $3
  • 123ff. : $1$3.
  • 123ff : $1$3
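The four replacements can be tried in Python as well, where InDesign's $1 and $3 are written \g<1> and \g<3>. The sample sentence and the STYLES table are made up for illustration. Note that f and ff are kept as found; only the space and the dot are normalized.

```python
import re

PAGE_REF = re.compile(r"\b(\d+)(\s?)(f{1,2})(?:(\.)\B|\b(?!\.))")

# The four target spellings; InDesign's $1/$3 are written \g<1>/\g<3>
# in Python replacement strings.
STYLES = {
    "123 ff.": r"\g<1> \g<3>.",
    "123 ff": r"\g<1> \g<3>",
    "123ff.": r"\g<1>\g<3>.",
    "123ff": r"\g<1>\g<3>",
}

text = "see 12f and 34 ff. or 56ff, but not 78 faaa"  # made-up sample
for style, repl in STYLES.items():
    print(f"{style:8} -> {PAGE_REF.sub(repl, text)}")
```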

If you do not need the whitespace and the dot in the replacement patterns, remove those capturing groups and adjust the group numbers in the replacement backreferences.
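For example, if neither the whitespace nor the dot needs to be captured, both can become plain (non-capturing) parts of the pattern, and the f/ff part moves to group 2. A Python sketch of that variant, with \g<n> standing in for InDesign's $n and a made-up sample sentence:

```python
import re

# Whitespace and dot are matched but no longer captured, so the digits are
# group 1 and f/ff is group 2 (i.e. $1 and $2 in InDesign).
PAGE_REF_SLIM = re.compile(r"\b(\d+)\s?(f{1,2})(?:\.\B|\b(?!\.))")

text = "see 12f and 34 ff. or 56ff, but not 78 faaa"  # made-up sample
print(PAGE_REF_SLIM.sub(r"\g<1> \g<2>.", text))
```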

Her answered 20/8, 2019 at 21:16 Comment(5)
Hi, while checking your solution against my own: e.g. 00 f.!aaa aaa + would still match, but this doesn't meet the requirements, since only whitespace or a boundary is allowed at the end...? – Knavish
@Knavish No idea why you think so. ff. is followed by a non-word boundary, and it does meet the requirements. – Sybille
@WiktorStribiżew - Interestingly, running your regexp through Adobe InDesign's Find/Change feature yields exactly the same matches as your regexp with the flavor set to ECMAScript, i.e. several wanted matches are not found. – Alejandroalejo
@Alejandroalejo I wrote: (\h?) - Group 2: an optional horizontal whitespace (use \s if any whitespace is allowed here). My \b(\d+)(\s?)(f{1,2})(?:(\.)\B|\b(?!\.)) regex works. I have edited the answer to get rid of \h. – Sybille
@WiktorStribiżew - Yes, using \s instead does fix the issue. – Alejandroalejo

What about something like this?

\b\d+\s?(?:ff|f)+\.?(?=\s)

\b          start with word boundary
\d+         match one or more digits
\s?         match optional whitespace
(?:ff|f)+   non-capturing group, match ff or f (the + also allows repeats such as fff)
\.?         match optional dot (basically checking for ff. or ff or f. or f)
(?=\s)      match if followed by whitespace, without making the whitespace part of the match
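Running this pattern through Python's re module (a stand-in for InDesign's engine; the helper name found is hypothetical) shows its behaviour on a few test lines and two side effects of its construction: the + after the group also admits runs like fff, and (?=\s) requires a whitespace character after the reference, so a reference with nothing after it is not found.

```python
import re
from typing import Optional

# The pattern from this answer, unchanged.
ANSWER_2 = re.compile(r"\b\d+\s?(?:ff|f)+\.?(?=\s)")

def found(line: str) -> Optional[str]:
    """Return the first match in the line, or None."""
    m = ANSWER_2.search(line)
    return m.group(0) if m else None

print(found("+ aaa 00 ff. aaa +"))  # 00 ff.
print(found("00 ff.aaa -"))         # None
# Side effects: "fff" is accepted, and a reference at the very end
# of the text has no whitespace after it for (?=\s) to see.
print(found("00fff aaa"))           # 00fff
print(found("+ aaa 00ff."))         # None
```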

With capturing groups, the same expression looks like:

\b(\d+)\s?((?:ff|f)+\.?)(?=\s)

Replacements can then be built from different combinations of $1 (the digits) and $2 (the f/ff part, including the dot if present).
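A Python sketch of the replacement step with this grouped version, with \g<1> and \g<2> standing in for $1 and $2 and a made-up sample sentence. Because the optional dot is captured inside group 2, combining $1 and $2 controls the space but keeps whatever dot the original reference had.

```python
import re

# The grouped pattern from this answer: group 1 = digits,
# group 2 = f/ff together with the optional dot.
ANSWER_2_GROUPED = re.compile(r"\b(\d+)\s?((?:ff|f)+\.?)(?=\s)")

text = "see 12f and 34 ff. pages"  # made-up sample
print(ANSWER_2_GROUPED.sub(r"\g<1> \g<2>", text))  # see 12 f and 34 ff. pages
print(ANSWER_2_GROUPED.sub(r"\g<1>\g<2>", text))   # see 12f and 34ff. pages
```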

Knavish answered 20/8, 2019 at 21:34 Comment(0)
