In this tutorial, we will write our own Python script to extract all the email IDs from the given text file. Using this script, you don’t need any external tool to extract emails.
First, make sure you have Python installed on your system.
To make it simple, divide the problem into multiple tasks.
Read each line from the text file.
    fileToRead = 'readText.txt'
    file = open(fileToRead, 'r')
    listLine = file.readlines()
Related read: Python code to check whether a file exists or not
Read each word from the line and save it into the list.
We can use the Python split function to get the words from the text line.
    fileToRead = 'readText.txt'
    delimiterInFile = [',', ';']

    file = open(fileToRead, 'r')
    listLine = file.readlines()

    for itemLine in listLine:
        item = str(itemLine)
        for delimeter in delimiterInFile:
            item = item.replace(str(delimeter), ' ')
Note: If you use the replace() string method, you have to save the result in a new string. Replacing characters in place is not possible because the string is an immutable data type in Python.
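To illustrate the note above, here is a small sketch on a made-up line of text (the sample string and the names `line`, `cleaned`, and `words` are assumptions for illustration). It shows that replace() returns a new string and leaves the original untouched, after which split() breaks the cleaned line into words.

```python
# Hypothetical sample line; the addresses are placeholders.
line = "alice@example.com,bob@example.org;carol"

# replace() returns a new string; 'line' itself is not modified.
cleaned = line.replace(",", " ").replace(";", " ")

# split() with no arguments splits on any run of whitespace.
words = cleaned.split()

print(line)   # still contains ',' and ';'
print(words)
```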
Python to Validate / Verify Email ID:
Using the re Python module for pattern matching makes our job easy. Check each string to see whether it is a valid email ID.
    import re

    def validateEmail(strEmail):
        # \. matches a literal dot; an unescaped . would match any character.
        if re.match(r"(.*)@(.*)\.(.*)", strEmail):
            return True
        return False
You can learn more about regular expressions in Python.
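The pattern used in this tutorial is deliberately loose. As a hedged sketch, a somewhat stricter check could look like the following. The `is_probably_email` name and the pattern are illustrative assumptions, not a full implementation of the email address standard, and re.fullmatch() assumes Python 3.4 or later.

```python
import re

# Illustrative pattern: one '@', no whitespace, and a dot in the domain part.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def is_probably_email(text):
    """Return True if the whole string looks like an email address."""
    return EMAIL_PATTERN.fullmatch(text) is not None

print(is_probably_email("user@example.com"))
print(is_probably_email("user@localhost"))
print(is_probably_email("two@@example.com"))
```

Anchoring the match to the whole string avoids accepting words that merely contain an address-like fragment.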
Save all the extracted email IDs in the file.
After validation, save all the valid email IDs into the list listEmail. Check whether all the list items are unique (unique email IDs) and remove any duplicate email IDs from the list. Save the list into the file emailExtracted.txt.
If there is no email in the text file, listEmail will be empty. Print “No email found.”
For instance, if you found 40 emails in the file, print “40 emails collected!”.
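One common way to remove duplicates is set(), which does not keep the original order. If order matters, a possible alternative, sketched here with made-up data, is dict.fromkeys(), which keeps the first occurrence of each item (insertion order is guaranteed from Python 3.7).

```python
# Hypothetical list of validated addresses, including a duplicate.
listEmail = ["a@example.com", "b@example.com", "a@example.com"]

# dict keys are unique and (in Python 3.7+) keep insertion order.
uniqEmail = list(dict.fromkeys(listEmail))

if uniqEmail:
    print(len(uniqEmail), "emails collected!")
else:
    print("No email found.")
```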
You can run this code with both Python 2 and Python 3. (Under Python 2, print shows the count message as a tuple.)
Here is the complete code:
    import re

    fileToRead = 'readText.txt'
    fileToWrite = 'emailExtracted.txt'
    delimiterInFile = [',', ';']

    def validateEmail(strEmail):
        # .* Zero or more characters of any type.
        # \. A literal dot (an unescaped . would match any character).
        if re.match(r"(.*)@(.*)\.(.*)", strEmail):
            return True
        return False

    def writeFile(listData):
        file = open(fileToWrite, 'w+')
        strData = ""
        for item in listData:
            strData = strData+item+'\n'
        file.write(strData)

    listEmail = []
    file = open(fileToRead, 'r')
    listLine = file.readlines()

    for itemLine in listLine:
        item = str(itemLine)
        # Replace every delimiter with a space so split() separates the words.
        for delimeter in delimiterInFile:
            item = item.replace(str(delimeter), ' ')

        wordList = item.split()
        for word in wordList:
            if validateEmail(word):
                listEmail.append(word)

    if listEmail:
        uniqEmail = set(listEmail)
        print(len(uniqEmail), "emails collected!")
        writeFile(uniqEmail)
    else:
        print("No email found.")
Most of the code in this Python script is self-explanatory. If you still have doubts, you can ask in the comment section.
Using Python as a scripting language has its perks.
Automate Email Marketing: You can use this Python script to extract emails from the text file. Many times we need to read all the emails for marketing.
You are ready to automate your email-extracting job with this simple Python script.
Extracting emails from web pages is also simple. Get the page's source code using the browser; you can simply use the view-source feature, for example view-source:http://example.com/. Open it in the browser, then copy and paste the source code into the file readText.txt. Running this script will give you all the email IDs present on the web page.
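Page source rarely separates addresses with spaces, so another option is to search the raw text directly with re.findall() instead of splitting it into words. The sketch below uses an inline HTML string as a stand-in for the copied page source; the pattern and the `extract_emails` helper are illustrative assumptions.

```python
import re

# Illustrative pattern for address-like substrings; not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_emails(text):
    """Return the unique address-like substrings, in order of appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))

# Stand-in for page source copied from the browser.
html = (
    '<p>Contact <a href="mailto:info@example.com">info@example.com</a> '
    "or sales@example.org.</p>"
)
print(extract_emails(html))
```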
You can also use a CSV file rather than a text file to extract email IDs and save them. Using the CSV file in Python is pretty simple.
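As a rough sketch of the CSV variant, the standard csv module can read each row as a list of cells. The sample data, the `emails_from_csv` name, and the simple '@' check are assumptions for illustration; an in-memory string stands in for a real file.

```python
import csv
import io

def emails_from_csv(file_obj):
    """Collect cells that contain '@' from every row of a CSV file object."""
    found = []
    for row in csv.reader(file_obj):
        for cell in row:
            cell = cell.strip()
            if "@" in cell:
                found.append(cell)
    return found

# In-memory stand-in for a CSV file opened with open(path, newline="").
sample = io.StringIO("name,email\nAsha,asha@example.com\nRavi,ravi@example.org\n")
print(emails_from_csv(sample))
```

In a real script, the '@' check could be replaced by the validateEmail() function from this tutorial.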
Automation: I use this script to extract the email IDs of the students subscribed to my Python channel, so that I can import them into the email server and send a programming newsletter. It saves me a lot of time compared with adding each email ID by hand.
That’s all for this Python script to extract emails from a file.
Kindly share what you have automated using Python. I would like to hear from you.
Nice work!!
Thanks! Glad you like it.
TypeError: ‘str’ object is not callable
What Python version are you using?
What should be the input? Please tell.
The input should be two text files: ‘readText.txt’ and ‘emailExtracted.txt’. Keep both files in the directory from which you run this program. ‘readText.txt’ is the input text file from which you want to extract the emails. After the script runs, all the extracted emails are saved in ‘emailExtracted.txt’.
thank you so much. By the way, how can you write such complex codes like this? Please share your idea with me.
Follow these steps to solve any complex programming problem.
> break down the problem into small tasks.
> write code for each small task
> integrate these tasks
For example: In the above program, I followed the below steps.
> Breaking down the problem into multiple tasks like reading file data, writing a regular expression for email…
> Write functions for each task
> Integrate each task. Read each line from the file and extract emails from each line.
Hello! Thank you so much. One question, please: how can I extract phone numbers this way?
Hi Sai,
Everything will be the same. Only, you have to write different regular expressions to extract phone numbers based on the rules (like in India, the phone number is 10 digits).
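As a hedged illustration, assuming plain 10-digit numbers with no country code or separators, the pattern might look like this (the `find_phone_numbers` name is hypothetical):

```python
import re

# Exactly 10 digits, not part of a longer run of digits.
PHONE_RE = re.compile(r"(?<!\d)\d{10}(?!\d)")

def find_phone_numbers(text):
    """Return all standalone 10-digit sequences found in the text."""
    return PHONE_RE.findall(text)

print(find_phone_numbers("Call 9876543210 or 12345, not 123456789012."))
```

The lookbehind and lookahead keep the pattern from picking 10 digits out of a longer number.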
What about closing the file?
Why are you using such a complex method of writing to the file?
That would be more efficient and take less memory.
Hi Vladimir,
You are performing a write operation for each item in the list. A file write operation is comparatively expensive. It’s always better to replace many write operations with a single one.
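Both points in this exchange can be combined. The following is only a sketch, not the tutorial's original code: a with block closes the file automatically, and join() builds the output so that write() is called once. The `write_emails` name, its parameters, and the temporary demo path are assumptions.

```python
import os
import tempfile

def write_emails(path, emails):
    """Write one email per line using a single write call."""
    # The with block closes the file even if an error occurs.
    with open(path, "w") as out_file:
        out_file.write("".join(email + "\n" for email in emails))

# Usage with a temporary directory so the working directory is not touched.
demo_path = os.path.join(tempfile.mkdtemp(), "emailExtracted.txt")
write_emails(demo_path, ["a@example.com", "b@example.com"])
with open(demo_path) as check:
    print(check.read())
```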
Hi,
I was writing the code and got the error below. Please help me out.
Hi Anand,
If you are trying to run the above program, don’t try it in the Python interpreter console. Instead, save the above code in your .py file and run it.
I have a list of companies with company names; all these companies are newly registered. Now my task is to get email IDs for all these companies. Please suggest an approach.
This is not so straightforward. You have to find their official websites and then their about-us or contact pages. These pages will mostly have a contact email. You have to scrape the web page to read it and get the email ID.
If you want to do it without programming, there is a Chrome plugin by hunter.io. Using it, you can find the email ID for any given company.
Is this your college assignment?
What if I want to extract emails from all university personnel in a specific country?
In that case, you can modify the regular expression to match only email addresses from the university. I assume the university's email addresses share a specific domain (for example, university-name.edu). In the validateEmail() function, change the regular expression so that “(.*)@” is followed by that domain. That’s all.
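For example, a sketch that keeps only addresses from one domain could look like this. The domain `university-name.edu` is the placeholder from the reply above, and `isUniversityEmail` is a hypothetical name; re.escape() ensures the dots in the domain are matched literally.

```python
import re

def isUniversityEmail(strEmail, domain="university-name.edu"):
    """Return True if the address ends with '@<domain>'."""
    pattern = r"(.+)@" + re.escape(domain) + r"$"
    return re.match(pattern, strEmail) is not None

print(isUniversityEmail("prof@university-name.edu"))
print(isUniversityEmail("someone@gmail.com"))
```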
Thank you so much for this program. I need your help i.e. what if one wanted to extract specific animals image from the big dataset which contains multiple images of different animals? please help me with this.
One simple solution is to name the images accordingly, e.g. cat-01, cat-02, dog-01. From the name, you can identify the images. But this requires that you have access to the dataset. Otherwise, if you want to automate this, you can use the Pillow Python package for reading images and any efficient open-source image recognition package to identify the animals in a given image.
I have hundreds of files with emails in them. I need to write a Python script to hide the user part of the emails in those files. Do you have something for a similar approach?
You can read each file separately and then you can perform required operations on the emails in that file.
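One possible way to do the masking step is sketched below with re.sub(). The pattern, the `mask_user_part` name, and the choice of `***` as the mask are assumptions for illustration; in practice the function would be applied to the text read from each file.

```python
import re

# Captures the domain so it can be kept while the user part is replaced.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+(@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")

def mask_user_part(text, mask="***"):
    """Replace the part before '@' in every address-like substring."""
    return EMAIL_RE.sub(lambda m: mask + m.group(1), text)

print(mask_user_part("Write to jane.doe@example.com today."))
```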
Great Explanation. Can we do this email finding without using re? I am using the core Python language. Can you give a hint in this regard, Thanks a lot.
Thanks Amma!
Then there is a lot of stuff to verify.
For a string to be an email ID, it should contain the special characters ‘@’ and ‘.’ in that order, and it should have exactly one ‘@’ character. You have to do this verification manually using Python string methods.
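A minimal sketch of those checks using only string methods follows. The `looks_like_email` name is hypothetical, and the rules are the simple ones described above, not a complete validator.

```python
def looks_like_email(text):
    """Check for exactly one '@' and a '.' somewhere after it."""
    if text.count("@") != 1:
        return False
    user, domain = text.split("@")
    if not user or "." not in domain:
        return False
    # The dot must not be the first or last character of the domain.
    return not domain.startswith(".") and not domain.endswith(".")

print(looks_like_email("user@example.com"))
print(looks_like_email("user.name@example"))
print(looks_like_email("a@b@c.com"))
```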
Hello, thanks for this code. I have a question, please.
What if I want to exclude some emails from the file? For example, I don’t want to print the “hotmail.com” emails.
You can simply update the validateEmail() function in the program mentioned in this tutorial. Add the following line of code inside the validateEmail() function. Let me know if it solves your problem.
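The line of code referred to in this reply is not shown. One possible version, offered only as a sketch and not necessarily what the author had in mind, rejects addresses from unwanted domains before applying the pattern. The `excludedDomains` list is a hypothetical addition.

```python
import re

# Hypothetical list of domains to skip.
excludedDomains = ["hotmail.com"]

def validateEmail(strEmail):
    # Reject addresses that end with an excluded domain.
    for domain in excludedDomains:
        if strEmail.lower().endswith("@" + domain):
            return False
    if re.match(r"(.*)@(.*)\.(.*)", strEmail):
        return True
    return False

print(validateEmail("someone@hotmail.com"))
print(validateEmail("someone@example.com"))
```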