Text is something we work with often in programming-related problems. Most programming languages provide several methods and functions for manipulating text. In this article, I’ll dive into the fundamentals of working with text in Python, to give you a good understanding of how it works.
To start, we first need to discuss how Python treats String variables. It may appear that Strings are like integers in the sense that they store a single value. In fact, Python treats a String as a sequence of characters, much like a list, and many other programming languages do something similar. So, for example, if you define a String variable in Python as variable1 = "Hello", you can think of it as the characters "H", "e", "l", "l", "o" stored in order, which is exactly what list(variable1) gives you: ["H", "e", "l", "l", "o"]. The String is not literally a list, though: comparing variable1 to that list with == gives False, and unlike a list, a String cannot be changed in place. Still, this means that we can treat Strings similarly to lists and apply many of the ideas of lists to Strings as well.
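As a small sketch of the idea above, the snippet below shows where a String behaves like a list and where it does not. The variable names other than variable1 are only for illustration.

```python
variable1 = "Hello"

# Converting the string to a list exposes its individual characters.
letters = list(variable1)

# Like a list, a string has a length.
length = len(variable1)

# Unlike a list, a string is immutable: assigning to a position
# raises a TypeError instead of changing the character.
try:
    variable1[0] = "J"
    changed = True
except TypeError:
    changed = False

print(letters, length, changed)
```

If you need a modified version of a String, you build a new one instead, for example "J" + variable1[1:].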
The first feature that comes from treating Strings as lists is the ability to access specific characters based on their location in the String. As an example, suppose you wanted to check if a String started with the letter a. To access the first letter of a String, you simply add an index in square brackets to the end of it, like variable1[0]. Indexes start at 0, so this gives you the first character in the String.
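Here is a minimal sketch of that check. The sample value "apple" is just an assumption for the example.

```python
variable1 = "apple"

# Index 0 is the first character; negative indexes count from the end.
first = variable1[0]
last = variable1[-1]

# Check whether the string starts with the letter a.
starts_with_a = first == "a"

# The built-in startswith method performs the same check.
same_check = variable1.startswith("a")

print(first, last, starts_with_a, same_check)
```

One difference worth knowing: variable1[0] raises an IndexError on an empty String, while startswith simply returns False.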
If you aren’t sure where a character is in a String, you can search for it using the index method. For instance, variable1.index("e") will tell you the position of the letter e in the String. If the letter isn’t there at all, index raises a ValueError. It’s important to note that index returns the position of the first instance of the letter you are searching for. In a similar manner, variable1.rfind("e") returns the position of the rightmost instance of the letter e, or -1 if there is none.
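The following sketch shows both searches on a sample String (the value "Hello there" is only an assumption for the example), including what happens when the letter is missing.

```python
variable1 = "Hello there"

# index returns the position of the first match.
first_e = variable1.index("e")

# rfind returns the position of the last match.
last_e = variable1.rfind("e")

# rfind returns -1 when nothing is found...
missing = variable1.rfind("z")

# ...while index raises a ValueError instead.
try:
    variable1.index("z")
    index_failed = False
except ValueError:
    index_failed = True

print(first_e, last_e, missing, index_failed)
```

Python also offers find, which searches from the left like index but returns -1 on a miss, and rindex, which searches from the right like rfind but raises a ValueError.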
You can also iterate over a String to check all of its characters for a certain condition. Using a for loop is typically a good way to do this, as it works exactly the same as it does for a normal list. There are a lot of uses for iterating over a String, mostly relating to the limitations of the index and rfind methods, which each report only a single position.
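For example, a single search call only gives you one position, but a loop can collect every position of a letter or count characters that meet a condition. This is a sketch with an assumed sample String.

```python
variable1 = "Hello there"

# Collect every position where the letter e appears.
positions = []
for i, ch in enumerate(variable1):
    if ch == "e":
        positions.append(i)

# Count the characters that satisfy a condition (here: vowels).
vowel_count = 0
for ch in variable1:
    if ch.lower() in "aeiou":
        vowel_count += 1

print(positions, vowel_count)
```

The enumerate function is used here because it hands back both the index and the character on each pass through the loop.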
Another useful thing you can do with Strings is create substrings. A substring is a String of text that is contained in another one. For example, the text "Hello World" has a substring "Hello". "H" is also a substring, as is "He", "el", and any other unbroken run of one or more characters in the String. You can create substrings by providing Python with a start and end point, using the syntax variable1[start:end]. Furthermore, you can split a String into pieces using a delimiter. For instance, if you have the String "First,Second,Third", you can divide it at the commas into a list, giving you ["First", "Second", "Third"]. You can do this using the split method: variable1.split(","). With these two techniques together, you can easily partition text in any way you need to.
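Both techniques can be sketched in a few lines, using the same sample values as the paragraph above. The variable names text, hello, world, and parts are only for illustration.

```python
text = "Hello World"

# Slice out substrings with [start:end]; the end index is not included.
hello = text[0:5]
world = text[6:11]

# The in operator checks whether one string is a substring of another.
has_el = "el" in text

# Split a delimited string into a list of its pieces.
variable1 = "First,Second,Third"
parts = variable1.split(",")

print(hello, world, has_el, parts)
```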
Partitioning text is useful for situations where the text is in a defined format. For example, we may have a product code that contains details in each section of it. Suppose we have an 11-character product code that has the region in the first three characters, the product identifier in the next five, and the store code in the last three. We can easily parse these three pieces of information using substrings.
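A minimal sketch of that parsing might look like the following. The sample code "USAWIDGT042" and the variable names are made up for illustration; only the 3-5-3 layout comes from the description above.

```python
# Hypothetical product code: 3-character region, 5-character product,
# 3-character store.
product_code = "USAWIDGT042"

region = product_code[0:3]   # characters at indexes 0, 1, 2
product = product_code[3:8]  # characters at indexes 3 through 7
store = product_code[8:]     # index 8 through the end of the string

print(region, product, store)
```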
There are a few things to note about how we split the text. Notice that when we slice with [0:3], we get three characters, even though indexes 0 through 3 cover four positions. This is because Python stops one before the end index, so we start at index 0 and end at index 2. The second thing to note is that [8:] starts at index 8 and gets all of the characters from there to the end. If we don’t specify a starting index, Python assumes 0, and if we don’t specify an end index, Python assumes the end of the String.
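These rules are easy to confirm with a quick experiment. The sketch below reuses a made-up 11-character code; the value itself is an assumption.

```python
code = "USAWIDGT042"  # hypothetical sample value

# The end index is excluded, so a slice holds end - start characters.
start, end = 0, 3
piece = code[start:end]
piece_length = len(piece)

# Omitting the start means "from index 0"; omitting the end means
# "through the end of the string".
head = code[:3]
tail = code[8:]

# Because the end index is excluded, two slices that share a boundary
# fit back together with no overlap and no gap.
rejoined = code[:8] + code[8:]

print(piece_length, head, tail, rejoined == code)
```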
In terms of splitting Strings, common examples include reading CSV files or parsing URLs. These sorts of Strings are in a very consistent format, and their parts are often deliberately separated using a special character. Reading CSV files tends to be a very common operation, so let’s look at how we can read one using the split method. Suppose we have a CSV with two entries, formatted as name
Regarding the file read, we read all of the lines of the file into a variable called lines. We can then iterate over that variable to get at each line individually. Regarding splitting the Strings, we use the split method to split each line on the comma. Once this is done, we can access the entries individually, using their indexes in the resulting list.
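The steps just described can be sketched as follows. The file contents are hypothetical: the second column (city) and the sample rows are assumptions made so the example can run on its own, and a temporary file stands in for a real CSV.

```python
import os
import tempfile

# Hypothetical sample data: a name and a city on each line.
sample = "Alice,Toronto\nBob,Vancouver\n"

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write(sample)
    path = f.name

# Read all of the lines of the file into a variable called lines.
with open(path) as f:
    lines = f.readlines()

rows = []
for line in lines:
    # Remove the trailing newline, then split the line on the comma.
    fields = line.strip().split(",")
    name = fields[0]
    city = fields[1]
    rows.append((name, city))

os.remove(path)  # clean up the temporary file
print(rows)
```

Splitting on commas works when no field contains a comma itself. For files with quoted fields, Python’s built-in csv module handles those cases for you.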