Find and Remove Repeated Words Using GREP
How often do we come across repeated (duplicate) words in a document? We may inadvertently type these repeated words ourselves in InDesign or bring in a document containing these repeated words. Is there a way to get rid of these repeated words in InDesign correctly? Let us look at an example and a couple of alternatives to find the repeated words.
Here is an example. The valid repeated words have been underlined with Red. The words or parts of words that should not be considered repeated are underlined with Green.
One possible alternative is to use the Dynamic Spelling feature. And Dynamic Spelling does help to a certain extent in highlighting the repeated words. It shows an underline for the repeated words in the chosen color.
But the limitation of Dynamic Spelling is that it highlights only those words that are in the Dictionary of the Language that is chosen. If words are not in the dictionary, those words do not get underlined and Check Spelling does not help us in removing them. Even when the repeated words are found, it is a long and tedious process to find each instance of repeated word in the document and change it.
Let us see if we can do this using GREP Find Change:
We may start writing the GREP code like this:
Find what: \b(\w+)\b \1
Change to: $1
When we run this query, we see that it does help us in finding the valid repeated words. And we may be tempted to go with this query. But on a slightly deeper analysis, we find that it also finds words which, in fact, are not repeated. Also, it fails to find some repeated words. In the first sentence of the example shown above, the query finds valid repeated words ‘the’ (underlined with red) but in the second sentence, it also incorrectly flags the word ‘the’ and ‘the’ in thesis (underlined with green). Also, we see that in the last sentence, the query fails to find repeated words on two separate lines.
So let us refine our GREP Find query to:
Find what: \b(\w+)\s+\1\b
Change to: $1
Here’s what it means:
Look for a word boundary…
followed by any word character one or more times…
followed by any white space one or more times…
followed by the back reference to the previously found result…
followed by a word boundary.
The last word boundary is important to include as it ensures that we find only complete words that are repeated and not just parts of subsequent words.
This Find Change query successfully finds all the correct instances of repeated words only. We can now confidently click the Change All to remove all repeated words in the document.
Hope you find this GREP Find Change query useful in your workflows. Do try it out and suggest any further tweaks to make it even better.