How to clean text

hope, you, are, having, a, great, day

These are some ways I have cleaned and formatted text when coding.

Remove “curly” or ‘smart’ quotes

A situation that can happen when copying and pasting a list of items from a text document or PDF file into your code is that it may contain curly/smart quotes. Software like Microsoft Word and PowerPoint can use smart quotes “ ” ‘ ’, which leads to these quotes appearing in text. Most programming languages use straight quotes for making strings, like "hi". I am not aware of a popular programming language that uses curly or smart quotes for strings. This is why it's necessary to convert curly quotes into straight quotes when writing code.

Example : “a”, “b”, “c”, “d”, “e”
num_list = [“a”, “b”, “c”, “d”, “e”]

If I tried to make this list in Python and run my code, I would get this error:
SyntaxError: invalid character '“' (U+201C)

You can clean this text by applying the "special character fixes" tool on cleanertext.com to it. You simply check off the boxes, and the text with straight quotes is shown.

num_list = ["a", "b", "c", "d", "e"]
Python would appreciate parsing that list.
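If you would rather do the same fix in code, here is a minimal Python sketch. The function name straighten_quotes and the sample string are just illustrative, and it only covers the four most common curly quote characters, so other quote-like characters would need to be added to the mapping.

```python
# Map the four common curly/smart quotes to straight quotes.
# Assumption: only these four characters need fixing.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight quotes."""
    return text.translate(SMART_QUOTES)


# Sample text as it might look after pasting from a document.
pasted = "num_list = [\u201ca\u201d, \u201cb\u201d, \u201cc\u201d]"
print(straighten_quotes(pasted))  # num_list = ["a", "b", "c"]
```

Using str.translate keeps it to one pass over the text instead of chaining several replace calls.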

Remove duplicate spaces

You may encounter text that has extra spaces or unnecessary line breaks, making it difficult to read.

hope, you, are, having, a, great, day

There aren't many out-of-the-box tools for the average person to clean text with extra spaces. One way to remove extra spaces is to use a regular expression like \s+ to find runs of spaces, tabs, line breaks, and other whitespace characters, then replace each run with a single space character. Some text editors support regular expressions, but for many people regexes can be confusing or difficult to understand.
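For anyone comfortable with a little code, the \s+ approach looks roughly like this in Python. The function name collapse_whitespace and the messy sample string are made up for illustration. Note that \s+ also matches line breaks, so this sketch flattens the text onto one line.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with one space and trim the ends."""
    # \s+ matches one or more spaces, tabs, line breaks, and similar characters.
    return re.sub(r"\s+", " ", text).strip()


# Illustrative input with extra spaces, a tab, and blank lines.
messy = "hope,   you,  are,\n\nhaving,\ta,    great,   day"
print(collapse_whitespace(messy))  # hope, you, are, having, a, great, day
```

If you want to keep line breaks and only squeeze spaces and tabs, a pattern like [ \t]+ in place of \s+ should do that.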

There is an easy-to-use space cleaner tool on cleanertext.com where you simply check off "remove duplicate spaces" and then press clean text, and voila, your text no longer has unnecessary spaces.

Fix math symbols

If you are copying a basic math formula from a text document or PDF file, it may use special characters that aren't useful when writing code. For example:

6×3 = 18
↓
6*3 = 18

6÷3 = 2
↓
6/3 = 2

The symbol × is what most people know as the multiplication symbol, but programmers use * instead when doing math. For division, / is used instead of ÷. I have personally run into this situation many times when copying formulas to use in my code and then needing to change the character to the correct math operator.

Under the "special character fixes" section on cleanertext.com, check off " × → * " and " ÷ → / ", then press clean and your text will be cleaned! You can also use a text editor to clean this text, but it requires copying and pasting the multiplication and division symbols and then specifying which characters you want to find and replace. You can also hand delete each occurrence of these symbols and replace it with its corresponding math operator. With my tool you simply check off which characters you want to change and then apply the changes.
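The same find and replace can also be scripted. This is a small Python sketch, with fix_math_symbols as a made-up name, that assumes × and ÷ are the only symbols that need converting.

```python
# Map the typographic math symbols to the operators used in code.
# Assumption: only the multiplication and division signs need converting.
MATH_SYMBOLS = str.maketrans({
    "\u00d7": "*",  # multiplication sign
    "\u00f7": "/",  # division sign
})


def fix_math_symbols(text: str) -> str:
    """Return text with the multiplication and division signs replaced."""
    return text.translate(MATH_SYMBOLS)


print(fix_math_symbols("6\u00d73 = 18"))  # 6*3 = 18
print(fix_math_symbols("6\u00f73 = 2"))   # 6/3 = 2
```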