Python is a versatile programming language with a rich set of libraries. In the visualization series we introduced you to different libraries used for data visualization. Now, we introduce you to the Regex (regular expression) library for handling textual data, available in Python as the built-in re module.
To perform pattern matching on textual data in Python, the Regex library provides a range of methods that return the desired results when used with the right pattern. For example, if you want to change the spelling of colour to color throughout your text, you can do so with a single method call, provided that you form the pattern correctly.
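As a minimal sketch of the colour-to-color example, the snippet below uses re.sub from Python's built-in re module. The sample sentence and the variable names are only illustrative assumptions, not part of the original tutorial.

```python
import re

# Sample sentence invented for illustration.
text = "The colour of the sky and the colours of the sea."

# "colou?r" makes the "u" optional, so it matches both "colour" and "color".
# re.sub replaces every match in the text with the string "color".
result = re.sub(r"colou?r", "color", text)

print(result)
```

Because the pattern makes the u optional, text that already uses the spelling color is left unchanged.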
Types of textual data in Regex
Literals:- Literals are characters or words that keep their original meaning. The pattern dog, for example, matches exactly the text dog and nothing else; there is no hidden meaning behind it.
Meta-characters:- These are characters or character sequences that hold a special meaning. For example, \n means a new line and \t means a tab.
Given below are a few of the meta-characters used in Python with their meanings:-
\d – Matches a single digit, e.g. \d = 1, \d\d = 23, \d\d\d = 345
\w – Matches a single alphanumeric character or underscore, e.g. \w = 1, \w = a, \w\w = a1
\W – Matches a single character that is not alphanumeric or an underscore, e.g. \W = %
Dog[ogn] – Matches any one of the characters inside the square brackets, e.g. Dogo, Dogg, Dogn
Dog(ogn) – Groups the characters inside the parentheses and matches them as a whole, e.g. Dogogn
Dog(ogn|aaa) – Matches either ogn or aaa, e.g. Dogogn or Dogaaa
* – Matches 0 or more repetitions of the preceding character, e.g. tre* = tr, tre* = tree, tre* = treeeeee
? – Matches 0 or 1 occurrence of the preceding character, e.g. colou?r = color, colou?r = colour
+ – Matches 1 or more repetitions of the preceding character, e.g. tre+ = tre, tre+ = tree, tre+ ≠ tr
. – Matches any single character (letter, digit or special character) except a new line, e.g. tre. = tree, tre. = tre#, tre. = tre1, tre. ≠ tre#1
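The meta-characters listed above can be checked with a short script. The sketch below uses re.fullmatch from Python's built-in re module, which succeeds only when the whole string fits the pattern; the examples list and its expected values are assumptions chosen for illustration.

```python
import re

# Each entry: (pattern, text, whether the whole text is expected to match).
examples = [
    (r"\d\d\d", "345", True),         # three digits
    (r"\w\w", "a1", True),            # two alphanumeric characters
    (r"\W", "%", True),               # one non-alphanumeric character
    (r"Dog[ogn]", "Dogg", True),      # one character from the set o, g, n
    (r"Dog(ogn|aaa)", "Dogaaa", True),  # either ogn or aaa
    (r"tre*", "tr", True),            # zero repetitions of "e"
    (r"tre*", "treeeeee", True),      # many repetitions of "e"
    (r"colou?r", "color", True),      # optional "u" is absent
    (r"tre+", "tr", False),           # "+" needs at least one "e"
    (r"tre.", "tre#", True),          # exactly one extra character
    (r"tre.", "tre#1", False),        # two extra characters are too many
]

# re.fullmatch returns a match object on success and None otherwise.
results = [(p, t, re.fullmatch(p, t) is not None) for p, t, _ in examples]

for pattern, text, matched in results:
    print(f"{pattern!r} against {text!r}: {matched}")
```

re.fullmatch is used here instead of re.search so that a pattern such as tre. is not reported as matching tre#1 merely because the first four characters fit.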
The above meta-characters, alone or in combination, are used to form patterns, which are then used for text mining. For example, tre.* means tre followed by any character repeated 0 or more times, so it matches tre#1 as well as plain tre.
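To illustrate how combining meta-characters widens a pattern, the following sketch tests tre.* against a few strings with re.fullmatch from Python's built-in re module. The word list is an assumption made up for this example.

```python
import re

# "." matches any single character and "*" repeats it 0 or more times,
# so "tre.*" means "tre" followed by anything (or nothing).
pattern = r"tre.*"

# Illustrative inputs; "tr" is included to show a string that does not match.
words = ["tre", "tre#1", "tree", "tr"]

matches = {word: re.fullmatch(pattern, word) is not None for word in words}

for word, matched in matches.items():
    print(f"{word!r}: {matched}")
```

Only tr fails here, because the literal part tre must still be present before the .* portion can match anything.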
Watch the video tutorial attached below to learn more about the fundamentals of this library.
Hopefully you found this discussion of the Regex library helpful and are now familiar with the way it works. To learn more about Python for data analysis, keep exploring the Dexlab Analytics blog, where you will always find informative posts.