How to Easily Count Characters When the Source Text Contains Mixed Writing Systems

This article offers a quick way to count the characters in one writing system when the source text mixes several. Matches, repetitions and fuzzy matches are ignored for the purpose of this count, so the result is only a rough, back-of-the-envelope starting point for issuing a quotation or evaluating a purchase order.

Let us take the scenario where you need to rid your document of all letters of the Latin alphabet. The following works provided Microsoft Word can open your document in full. First save a copy of the document from which you want to remove Latin characters and Arabic numerals (0-9), and work on that copy. So how do you get to a text containing one writing system only when the initial text you are provided mixes several?

Well, in Microsoft Word it could not be easier. Open Find and Replace with the CTRL+H shortcut, click More >> and check the “Use wildcards” box. In the Find what field type [a-zA-Z0-9] and leave the Replace with field blank. Click Replace All, and your copy should now be free of Latin letters and Arabic numerals. The cleaned-up file can then serve as a back-of-the-envelope character count for a source text in Japanese, Chinese or any other language that does not use the Latin alphabet: open Review > Word Count and read the Characters (no spaces) figure. Bear in mind that punctuation and other symbols are still included in that figure. The same approach works for other scripts; for example, replacing the pattern [а-яА-Я0-9] with the letter f turns every Cyrillic letter and digit in a fragment of Russian text into an f.
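For readers who prefer a script to Word, the same strip-and-count idea can be sketched in Python. This is a minimal sketch, assuming the text is already available as a plain Unicode string (extracting it from a .docx file is not shown). The function names are hypothetical, and Python's re module stands in for Word's wildcard engine, which has its own syntax; the character classes used here happen to mean the same ranges in both.

```python
import re

# Basic Latin letters and ASCII digits, the same class as Word's [a-zA-Z0-9].
# Note: full-width forms such as "Ａ" or "１" are NOT matched by this pattern.
LATIN_AND_DIGITS = re.compile(r"[a-zA-Z0-9]")

# Cyrillic letters and digits, as in the Russian example.
# Note: ё (U+0451) and Ё (U+0401) lie outside the а-я / А-Я ranges,
# so add them explicitly ([а-яА-ЯёЁ0-9]) if they must be matched too.
CYRILLIC_AND_DIGITS = re.compile(r"[а-яА-Я0-9]")


def strip_latin_and_digits(text: str) -> str:
    """Delete Latin letters and digits, like Replace All with an empty Replace field."""
    return LATIN_AND_DIGITS.sub("", text)


def count_no_spaces(text: str) -> int:
    """Count the remaining characters, skipping whitespace (a rough 'no spaces' count)."""
    return sum(1 for ch in text if not ch.isspace())


if __name__ == "__main__":
    sample = "Order No. 42: 東京タワー (Tokyo Tower)"
    cleaned = strip_latin_and_digits(sample)
    print(repr(cleaned))            # ' . : 東京タワー ( )'
    print(count_no_spaces(cleaned))  # 9

    # Replace every Cyrillic letter and digit with "f".
    print(CYRILLIC_AND_DIGITS.sub("f", "Привет, мир 2024"))  # ffffff, fff ffff
```

As with the Word method, punctuation such as the period, colon and parentheses survives the stripping and is counted, which is why the total of 9 should be read as an estimate rather than an exact count of Japanese characters.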

Regular expressions are useful in many scenarios. Further examples of using them directly in SDL Trados Studio can be found on Multifarious:

Regular Expressions – Part 1

Regex… and “economy of accuracy” (Regular Expressions – Part 2)

Search and replace with Regex in Studio – Regular Expressions Part 3

DOGS and CATS… Regular Expressions Part 4!

A competitive edge…