This article provides some background on the Internet protocols, with particular emphasis on SMTP, the protocol used to send mail. It grew out of my requirement that visitors leaving a comment on my blog provide at least a valid email address. The result is an email address validator tool which validates addresses but falls short of verifying them. Verifying an address requires probing ports 25 or 587, which mail servers do not welcome (I came very close to being blacklisted). I found an easy solution: subscribing to an e-mail validator.
I maintain a web site that I have programmed with HTML, PHP and Javascript. On this site, comments are invited via e-mail, so there is no problem verifying the visitors' e-mail addresses: I have received their e-mail. The situation is different on my blog. There, visitors can post comments directly, and the lowest level of identification required from them is a valid (and if possible verifiable) e-mail address. This prompted me to search for ways of validating/verifying their e-mail address before their post hits the blog. The real question is: is it real or fake?
Generalities
For any user, sending an email is a very simple process. When you send mail to someone (say to user@domain), your mail typically goes from your e-mail client to an SMTP server. The SMTP server then looks up the MX record of the domain in the e-mail address. The MX record contains a host name, so the SMTP server then gets the A record (the IP address) of that host name and connects to the mail server. Once connected, the receiving mail host searches for the user among its mail customers and either delivers the message or bounces it if the user is not found.
MX records tell other servers how to route mail for a domain. They were introduced because SMTP traffic needed to be routed differently from other traffic for that domain. Their main purpose is to designate the machine(s) that handle mail for the entire domain, and also to specify backup mail servers (with different priorities).
This description may appear obscure, which is why I will develop some basics of the Internet protocols in the section that follows.
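To make the priority mechanism concrete, here is a minimal PHP sketch. The function name orderByPreference() and the example.com host names are hypothetical; the two parallel arrays mirror the shape that getmxrr() fills (host names and their weights). A lower preference value means the server is tried first; higher values act as backups.

```php
<?php
// Order mail exchangers by MX preference: lower value = tried first.
// $hosts and $weights are parallel arrays (same shape as getmxrr() output).
// Hypothetical helper, for illustration only.
function orderByPreference(array $hosts, array $weights): array
{
    $pairs = array_combine($hosts, $weights); // host => preference
    asort($pairs, SORT_NUMERIC);              // ascending preference, keys kept
    return array_keys($pairs);
}

// Hypothetical records: a primary server (10) and a backup (20).
$ordered = orderByPreference(
    ['backup.example.com', 'mail.example.com'],
    [20, 10]
);
print_r($ordered); // mail.example.com first, then backup.example.com
```

A sending server would try mail.example.com first and fall back to backup.example.com only if the first one cannot be reached.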
Internet protocols
E-mail is a complex system, involving not only your computer, but also an "incoming mail server" (which holds your incoming e-mail until you ask for it), and an "outgoing mail server," which receives outbound mail you send, and routes it to its destination. This "outgoing mail server" is also known as an "SMTP" (Simple Mail Transfer Protocol) server. This having been said, let's explore the foundation of the process.
TCP/IP (Transmission Control Protocol/Internet Protocol) is the backbone of the Internet. In 1982 TCP/IP was standardized and a world-wide network of fully interconnected TCP/IP networks called the Internet was introduced. It took some time before it could carry commercial traffic: the Internet was not commercialized until 1995.
TCP/IP works as two layers:
- The higher layer, TCP, manages the assembling of a message or file into smaller packets that are transmitted over the Internet and received by a TCP layer that reassembles the packets into the original message; and
- The lower layer, IP, handles the address part of each packet so that it gets to the right destination.
Nowadays, every computer operating system provides TCP/IP.
I will not elaborate on TCP or on IP [more on TCP/IP (Transmission Control Protocol/Internet Protocol) and Wikipedia: History of the Internet] except for what concerns me here: e-mailing and its associated protocol, SMTP (Simple Mail Transfer Protocol).
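The split-and-reassemble idea in the list above can be illustrated with a toy PHP sketch. It is only an analogy: the helper names toPackets() and reassemble() are made up, and real TCP segments a byte stream inside the operating system, with acknowledgements and retransmission that this example ignores. What it does show is why each piece carries a sequence number: pieces may arrive out of order, and the receiver uses the numbers to put them back together.

```php
<?php
// Toy illustration only: real TCP works on byte streams in the kernel.
// Split a message into numbered "packets" of at most $size bytes.
function toPackets(string $message, int $size): array
{
    $packets = [];
    foreach (str_split($message, $size) as $seq => $chunk) {
        $packets[] = ['seq' => $seq, 'data' => $chunk];
    }
    return $packets;
}

// Reassemble packets that may arrive out of order, using their sequence numbers.
function reassemble(array $packets): string
{
    usort($packets, fn($a, $b) => $a['seq'] <=> $b['seq']);
    return implode('', array_column($packets, 'data'));
}

$packets = toPackets('Hello, SMTP world!', 5);
shuffle($packets);                // simulate out-of-order arrival
echo reassemble($packets), "\n";  // Hello, SMTP world!
```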
Email - SMTP protocol
Email is a fundamental part of the way we communicate today, carrying everything from personal day-to-day communications to important financial communications. SMTP is the technology by which servers handle and send email commands over the Internet. It goes hand in hand with POP (Post Office Protocol) which is used when receiving emails.
SMTP uses TCP as its transport protocol, which in turn relies on IP for routing. It is simple in design and acts as an electronic post office, enabling emails to be passed from one system to another. It defines a set of numeric status (reply) codes that the server uses to report specific conditions to the client. Email is submitted by a mail client (MUA, mail user agent) to a mail server (MSA, mail submission agent) using SMTP on TCP port 587. Most mailbox providers still allow submission on the traditional port 25 [more on Wikipedia: Simple Mail Transfer Protocol].
In fact, port 587 is intended for users sending out email and requires sender authentication, whereas port 25 is used by servers to relay messages to one another and is also used by many spammers and bots to relay spam. This way, ISPs can block outgoing connections to port 25 on their networks while still allowing their users to send email to any mail server through port 587 [more on MostlyGeek – Benson Wong’s Blog].
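SMTP replies start with a three-digit code whose first digit carries the general meaning (RFC 5321, section 4.2.1). The sketch below uses hypothetical helpers, not part of any library: classifyReply() classifies a reply line by that first digit, and isLastLine() detects multi-line replies, where a hyphen right after the code (as in "250-") signals that more lines follow.

```php
<?php
// Classify an SMTP reply line by its first digit (RFC 5321, section 4.2.1).
// Hypothetical helper, for illustration only.
function classifyReply(string $line): string
{
    if (!preg_match('/^([2-5])\d\d(?:[ -]|$)/', $line, $m)) {
        return 'malformed';
    }
    switch ($m[1]) {
        case '2': return 'positive completion';   // e.g. 250 OK
        case '3': return 'positive intermediate'; // e.g. 354 start mail input
        case '4': return 'transient failure';     // e.g. 421, try again later
        default:  return 'permanent failure';     // e.g. 550 mailbox unavailable
    }
}

// A hyphen after the code ("250-") means more lines of the same reply follow.
function isLastLine(string $line): bool
{
    return strlen($line) < 4 || $line[3] !== '-';
}

echo classifyReply('250 OK'), "\n";                 // positive completion
echo classifyReply('550 5.1.1 User unknown'), "\n"; // permanent failure
```

A server that does not know a recipient typically answers the RCPT TO command with a 550 reply, which this helper reports as a permanent failure.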
Internet addresses
An IP address (Internet Protocol address) is a numerical label assigned to each device (e.g., computer, printer) participating in a computer network that uses the Internet Protocol for communication. IPv4 (Internet Protocol version 4) addresses are 32-bit binary numbers, canonically displayed as human-readable strings of the form xxx.xxx.xxx.xxx where each xxx is a number between 0 and 255. There are therefore 2^32 (4,294,967,296) possible IPv4 addresses.
The rapid exhaustion of the IPv4 address space, despite conservation techniques, prompted the development of IPv6 (Internet Protocol version 6), whose addresses are 128-bit binary numbers.
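The difference between the two address families can be checked with PHP's built-in functions. This is a minimal sketch (the helper ipVersion() is my own name) using filter_var() with FILTER_VALIDATE_IP, plus ip2long() to show that a dotted quad is just a 32-bit number and inet_pton() to show that an IPv6 address packs into 16 bytes. The ip2long() value shown assumes 64-bit PHP; on 32-bit builds it can come out negative.

```php
<?php
// Report which IP version an address string belongs to, using PHP's
// built-in validation filters; returns null if it is not a valid IP.
function ipVersion(string $ip): ?int
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return 4;   // 32-bit address
    }
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false) {
        return 6;   // 128-bit address
    }
    return null;
}

// A dotted quad is a 32-bit number written byte by byte:
// 192.0.2.1 = 192*2^24 + 0*2^16 + 2*2^8 + 1 = 3221225985
echo ip2long('192.0.2.1'), "\n";             // 3221225985 on 64-bit PHP
echo strlen(inet_pton('2001:db8::1')), "\n"; // 16 bytes = 128 bits
```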
The Domain name system
The DNS (Domain Name System) is in place to organize and identify domains. It is the phone book of the Web: it translates domain names, which are much easier to remember, into IP addresses. The correspondence is not one-to-one, however: several domain names can share one IP address (as with shared web hosting), and a single domain name can resolve to several IP addresses.
Where does your computer's IP address come from? It probably comes from a DHCP (Dynamic Host Configuration Protocol) server on your network. The job of a DHCP server is to make sure your computer has the IP address and other network configuration it needs whenever you're online. Because this is "dynamic," the IP address for your computer will probably change from time to time. Web servers and other computers that need a consistent point of contact use static IP addresses. This means that the same IP address is always assigned to that system's network interface when it's online.
Computers and other network devices on the Internet use an IP address to route your request to the site you're trying to reach. This is similar to dialling a phone number to connect to the person you're trying to call. Thanks to DNS, though, you don't have to keep your own address book of IP addresses. Instead, you just connect through a domain name server, also called a DNS server or name server, which manages a massive database that maps domain names to IP addresses.
DNS records
As noted above, the DNS holds a set of records for each domain name [List of DNS record types]. As far as this article is concerned, the most important are the "A" record and the "MX" record:
- The "A" record returns the IPv4 address of the host; whereas
- The "MX" record maps a domain name to a list of message transfer agents for that domain.
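PHP's dns_get_record() returns DNS records as arrays in which A records carry an 'ip' key and MX records carry 'pri' (preference) and 'target' (host name). The sketch below assumes that shape and works on a hand-made sample so it runs offline; the sample values and the helper name summarizeRecords() are hypothetical. The commented line shows where the live lookup would go.

```php
<?php
// Summarize records in the shape returned by dns_get_record():
// A records carry an 'ip' key, MX records carry 'pri' and 'target'.
function summarizeRecords(array $records): array
{
    $summary = ['A' => [], 'MX' => []];
    foreach ($records as $r) {
        if ($r['type'] === 'A') {
            $summary['A'][] = $r['ip'];
        } elseif ($r['type'] === 'MX') {
            $summary['MX'][$r['target']] = $r['pri'];
        }
    }
    asort($summary['MX']); // lowest preference value first
    return $summary;
}

// Live use (needs network access):
// $records = dns_get_record('example.com', DNS_A | DNS_MX);
// Offline sample with hypothetical values:
$records = [
    ['host' => 'example.com', 'type' => 'A',  'ip' => '192.0.2.10'],
    ['host' => 'example.com', 'type' => 'MX', 'pri' => 20, 'target' => 'backup.example.com'],
    ['host' => 'example.com', 'type' => 'MX', 'pri' => 10, 'target' => 'mail.example.com'],
];
print_r(summarizeRecords($records));
```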
The validation process
First I will explain how e-mail addresses are formatted. Their general format user@domain consists of two parts: the part before the @ sign is the local-part of the address, often the username of the recipient, and the part after the @ sign is a domain name to which the email message will be sent. The local-part has no significance other than the final mailbox destination in the mail host whereas the domain name is an identification string that points to the IP address of the host domain through the DNS (Domain Name System).
There are very specific rules defining the syntax. The local-part may be up to 64 characters long and the domain name may have a maximum of 253 characters, but the entire email address is restricted to no more than 254 characters. The formal definitions are in RFC 5322 (sections 3.2.3 and 3.4.1) and RFC 5321, with a more readable form given in the informational RFC 3696 and the associated errata [Wikipedia].
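The length limits are easy to check before anything else. This is a minimal sketch with a hypothetical helper, withinLengthLimits(); it counts bytes with strlen(), which matches the limits for ASCII addresses, and it splits on the last "@" because a quoted local-part may itself contain one. It checks lengths only, not the rest of the syntax.

```php
<?php
// Check the length limits: local-part <= 64, domain <= 253,
// whole address <= 254 characters. Splits on the LAST '@', because a
// quoted local-part may itself contain '@'. Hypothetical helper.
function withinLengthLimits(string $email): bool
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return false;                 // no '@' at all
    }
    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);
    return strlen($local) >= 1 && strlen($local) <= 64
        && strlen($domain) >= 1 && strlen($domain) <= 253
        && strlen($email) <= 254;
}
```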
Given this, the e-mail validation process can be divided into 3 steps:
- Syntax validation - Does the email address have a valid format?
- Domain name verification - Does the domain exist (does it have a record in the DNS) and is it up and running?
- Username verification - Is the username really registered as a mail recipient with the domain?
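The three steps can be chained so that the first failure is reported. In this sketch (the function validationStage() and the fake resolver are hypothetical), step 2 is passed in as a callable so the flow can be exercised without network access; in production it could be fn($d) => checkdnsrr($d, 'A'). Step 3 is deliberately left out, since it requires an SMTP dialogue with the mail server.

```php
<?php
// Skeleton of the three-step process. Step 2 is injected as a callable so the
// flow can be tested offline. Step 3 (username verification) is not attempted.
function validationStage(string $email, callable $domainExists): string
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return 'failed: syntax';                        // step 1
    }
    $domain = substr($email, strrpos($email, '@') + 1);
    if (!$domainExists($domain)) {
        return 'failed: domain';                        // step 2
    }
    return 'valid (username not verified)';            // step 3 skipped
}

// Example with a fake resolver that only "knows" example.com:
$fakeDns = fn(string $d): bool => $d === 'example.com';
echo validationStage('john@example.com', $fakeDns), "\n";     // valid (username not verified)
echo validationStage('john@nowhere.invalid', $fakeDns), "\n"; // failed: domain
echo validationStage('john@@example.com', $fakeDns), "\n";    // failed: syntax
```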
Syntax validation
Syntax validation is generally done with regular expressions, but the PHP function filter_var() can be used instead, with FILTER_VALIDATE_EMAIL as its second argument. The regular expression used in the PHP 5.3.3 filter code is based on Michael Rushton's blog about Email Address Validation. I have used this solution.
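A quick way to see filter_var() at work is to run it over a few candidates. It returns the filtered string when the address passes and false when it does not; it is a syntax check only and says nothing about whether the domain or mailbox exists. The sample addresses are made up.

```php
<?php
// filter_var() returns the address itself when it is valid, false otherwise.
$candidates = [
    'john.doe@example.com',   // valid
    'john.doe@@example.com',  // two '@' signs
    'john doe@example.com',   // unquoted space
    'john.doe.example.com',   // no '@' at all
];
foreach ($candidates as $email) {
    $ok = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    echo str_pad($email, 24), $ok ? 'valid' : 'invalid', "\n";
}
```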
Domain name validation
Domain name validation is generally done with PHP functions that check the DNS (checkdnsrr()) and check whether the domain is up and running (fsockopen() querying port 80; ports 25 or 587 should be queried instead, but such queries time out on most domains), as shown below. Note that an answer on port 80 only shows that a web server is running on the domain, not a mail server:
function checkEmail($email)
{
    // 1. Syntax: filter_var() returns false for a malformed address
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false)
        return false;
    // 2. Domain name: everything after the last '@'
    $domain = substr($email, strrpos($email, '@') + 1);
    // checks if there is an A record in the DNS (added 27 Aug 2012)
    if (!checkdnsrr($domain, 'A'))
        return false;
    // 3. Attempts a socket connection to the domain on port 80 (30 s timeout);
    //    '@' suppresses the warning fsockopen() emits on failure
    $sock = @fsockopen($domain, 80, $errno, $errstr, 30);
    if ($sock === false)
        return false;
    fclose($sock);
    return true;
}
Following an email dated 25 August 2012 from Henry Timmes, I replaced the test on "MX" records by a test on "A" records: he pointed out that, according to IETF (Internet Engineering Task Force) standards, if no "MX" record exists, the host named by the "A" record acts as the mail server. The previous code would have produced false negatives for real domains that have no "MX" record.
Username verification
This is as far as I could get in terms of email address validation. The code checks that the email address is formatted according to the specification, checks that the domain has an A record in the DNS and verifies that the domain is up and running. Period! This should not be confused with verification, which is a check that the username really exists within the domain.
In order to get into verification, I found a piece of code that does this on PHP SMTP Email Validation and I used it (personalizing the sender address to my email address). It uses a sequence of instructions like this to find out if the username is verifiable:
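if ($sock = @fsockopen($domain, 25, $errno, $errstr, 30)) // attempts a socket connection to the mail server
{ // probing the domain name
fread($sock, 2082); // reads the server's 220 greeting
fwrite($sock, "HELO <my domain name>\r\n"); // HELO takes the client's host/domain name
$reply = fread($sock, 2082);
fwrite($sock, "MAIL FROM:<my email address>\r\n"); // the sender: my own address
$reply = fread($sock, 2082);
fwrite($sock, "RCPT TO:<".$username.'@'.$domain.">\r\n"); // the address being probed
$reply = fread($sock, 2082); // 250 = recipient accepted, 550 = no such user
...
fclose($sock);
}
else
{
// connection failed or timed out; $errstr describes why
}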
That sort of code resulted in time-outs on all domain names except on "gtro.com" (my own domain). Another way of probing was therefore necessary.
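For reference, the order of commands in such a probe, as defined by SMTP, is HELO (or EHLO), MAIL FROM, RCPT TO and finally QUIT, each terminated by CRLF. The hypothetical helper probeCommands() below only builds those lines, so it can be inspected without opening a socket; the host and addresses are placeholders. The interesting reply is the one to RCPT TO: a 250 code means the server accepted the recipient, while a 550 code usually means the mailbox does not exist. Even a 250 is not proof, since some servers accept every recipient (catch-all) and others answer unknown senders with a temporary 4xx code (greylisting).

```php
<?php
// Build the SMTP command lines used to probe a mailbox, in protocol order.
// Every command must end with CRLF ("\r\n"). Hypothetical helper; the
// arguments are placeholders supplied by the caller.
function probeCommands(string $heloHost, string $from, string $rcpt): array
{
    return [
        "HELO {$heloHost}\r\n",    // identify the client host (a host name, not an address)
        "MAIL FROM:<{$from}>\r\n", // the sender: your own address
        "RCPT TO:<{$rcpt}>\r\n",   // the address being checked
        "QUIT\r\n",                // close politely without sending DATA
    ];
}

foreach (probeCommands('mail.example.org', 'me@example.org', 'john@example.com') as $line) {
    echo rtrim($line), "\n";
}
```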
First, I found that it is not the domain name itself that should be probed but the mail servers associated with it. One easy way to find these mail servers is the PHP function getmxrr(), whose signature is as follows:
bool getmxrr ( string hostname, array &mxhosts [, array &weight ] );
It searches the DNS (Domain Name System) for MX records (mail exchanger records, defined in RFC 1035), each pointing to an email server or mail transfer agent (MTA) that is configured to process mail for that domain. It returns true if any records are found and false if no records are found or if an error occurs. The host names found are placed into the array mxhosts and, if the weight array is given, it is filled with the corresponding weight (preference) values.
If you look up videotron.ca, you will get mx.videotron.ca as its mail server. I tried to open a socket to this host with fsockopen() on port 25 and on port 587, and both attempts timed out. On the other hand, when I tried to open a socket to the SMTP server that videotron.ca provides to its customers (it is not listed in the MX record), it opened. I don't know how to go any further.
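Putting getmxrr() to work, here is a sketch of a helper (mailExchangers() is my own name) that returns a domain's mail servers, lowest weight first, and falls back to the domain itself when no MX record exists. That fallback is the "implicit MX" rule of RFC 5321 (section 5.1), which is why an A-record check is a sensible substitute for an MX check. I sort explicitly rather than rely on the order of the returned array. The $lookup parameter defaults to getmxrr() but can be replaced by a stub for offline testing.

```php
<?php
// List mail servers for a domain, lowest weight first. If the domain has no
// MX records, fall back to the domain itself (implicit MX, RFC 5321 sec. 5.1).
// $lookup defaults to getmxrr(); a stub with the same by-reference
// signature can be passed instead. Hypothetical helper.
function mailExchangers(string $domain, ?callable $lookup = null): array
{
    $lookup = $lookup ?? 'getmxrr';
    $hosts = [];
    $weights = [];
    if (!$lookup($domain, $hosts, $weights) || count($hosts) === 0) {
        return [$domain];                  // implicit MX: use the domain's A record
    }
    // Sort $weights ascending and reorder $hosts the same way.
    array_multisort($weights, SORT_ASC, SORT_NUMERIC, $hosts);
    return $hosts;
}

// Live use (needs network access):
// print_r(mailExchangers('example.com'));
```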
The easy solution
I did not go any further: I subscribed to an email verification service, "Free Email Verifier". Just try it!
Conclusion
The main reason why I gave up before succeeding in the verification (rather than validation) of email addresses is that, during the experimentation, I was not able to open ports 25 or 587 on any of the SMTP mail servers listed in the MX records of several domains. I succeeded in validating email addresses but not in verifying them.
The email validator tool I developed validates the syntax of an email address and the existence of its domain. It does the following: it verifies the syntax, makes sure that the domain has an A record in the DNS, and opens port 80 to make sure that the domain is up and running (this port is open on any domain that hosts a web site).
I failed to do what email validation tools on the Web succeed in doing. I used several pieces of code borrowed from the Web without success, because I always failed to open ports 25 or 587 in order to probe the domain and its mail servers for the given username. How Web-based tools like Thuenhuis Networking do it remains a mystery to me. Still, I have a solution and I can move forward!
References
When I started this article, I was a TCP/IP illiterate. Obviously, I had to search quite a bit for references on the subject and articles dealing with related PHP code. Bits of code were taken from several sites. Indebtedness is hereby acknowledged.
Web-based tools
- MX Toolbox - This test will list MX records for a domain in priority order. The MX lookup is done directly against the domain's authoritative name server, so changes to MX records should show up instantly. You can click Diagnostics, which will connect to the mail server, verify reverse DNS records, perform a simple open relay check and measure response time performance. You may also check each MX record (IP address) against 118 DNS-based blacklists (commonly called RBLs or DNSBLs).
- Thuenhuis Networking - A Web-based tool to verify an email address.
- Free email verification - A Web-based tool that can be used to verify whether an email address is real or fake.
- PHP Email address validation with Verify probe - Another Web-based tool that can be used to verify whether an email address is real or fake.
- MX Lookup Tool - A Web-based tool to display the MX records associated with a domain.
- TCP/IP Portscan (Firewall Test) - A Web-based tool to identify open or closed ports given an IP address.
- DNSBL.info - SPAM Database Lookup - A Web-based tool allowing you to find out whether your IP address is blacklisted somewhere. DNSBL Information provides a single place where you can check the status of your mail server's IP address on more than 100 DNS-based blacklists.
- IP Blacklist Checker - Another tool of the same kind.
- Check Google's Domain Black List to see whether or not your domain is blacklisted.
- What is my IP provides a tool that allows you to find your public IP address.
Internet protocols
- SMTP Command References - A client computer communicates with an SMTP server (e-mail server) by using SMTP commands. There is a core list of SMTP commands that all SMTP servers support, and these are referred to as basic SMTP commands in this document. All basic SMTP commands specified by the SMTP protocol are described, as well as those of Extended SMTP (ESMTP).
- TCP/IP (Transmission Control Protocol/Internet Protocol) - TCP/IP (Transmission Control Protocol/Internet Protocol) is the basic communication language or protocol of the Internet. It can also be used as a communications protocol in a private network (either an intranet or an extranet).
- Wikipedia: History of the Internet - The history of the Internet began with the development of electronic computers in the 1950s. This began with point-to-point communication between mainframe computers and terminals, expanded to point-to-point connections between computers and then early research into packet switching. Packet switched networks such as ARPANET... In 1982 the Internet Protocol Suite (TCP/IP) was standardized and the concept of a world-wide network of fully interconnected TCP/IP networks called the Internet was introduced. Access to the ARPANET was expanded in 1981 when the National Science Foundation (NSF) developed the Computer Science Network (CSNET) and again in 1986 when NSFNET provided access to supercomputer sites in the United States from research and education organizations. Commercial internet service providers (ISPs) began to emerge in the late 1980s and 1990s. The ARPANET was decommissioned in 1990. The Internet was commercialized in 1995 when NSFNET was decommissioned, removing the last restrictions on the use of the Internet to carry commercial traffic.
- MX record - Wikipedia, the free encyclopedia
- Send mail() with php and google apps mx record on 000webhost, and some questions
- Sender Policy Framework - Wikipedia, the free encyclopedia
SMTP Authentication
- Email Authentication - Email authentication is the effort to equip messages of the email transport system with enough verifiable information, so that recipients can recognize the nature of each incoming message automatically. A very important effort in stopping spam.
- Smtp Auth Email Script - A very simple script that allows sending emails through SMTP with authentication. Bits were taken from all over the Internet.
- SMTP Authentication [Tutorial] - SMTP Authentication is a scheme which was introduced in 1999 by J. Myers of Netscape Communications and finally released as RFC 2554 ("SMTP Service Extension for Authentication"). It is partly based on the SMTP Service Extensions as defined in RFC 1869. Most modern SMTP implementations support SMTP Authentication.
- Use TCP Port 587 For Mail Submission - The conventional way for a mail client program to send e-mail is using TCP port 25, which is also the port used by mail servers to talk to each other. But port 25 is widely abused by malware to spread worms and spam. As a result, many ISPs are restricting its use.
- How to Avoid Spam Filters with PHP mail() Emails - Just about everyone who uses PHP has encountered the popular PHP mail() function which enables email to be sent from a server. This function is preferred to other methods of sending email, such as sending mail with SMTP Authentication, because its implementation is quick and easy. Unfortunately, when using the mail() function, your emails are more likely to be marked as spam.
PHP code
- PHP SMTP Email Validation - Provides a PHP class that encapsulates the SMTP transaction with the remote domain, as well as the DNS lookup for the Mail Transfer Agent (MTA) responsible for that domain, and returns whether the email addresses are valid or not.
- Validate an E-Mail Address with PHP, the Right Way - teaches how to develop a working PHP function to validate e-mail addresses.
- Validation d'adresse e-mail en PHP - (in French) This tutorial shows how to validate an email address with PHP.
- How to Validate Email Addresses in a PHP Script - PHP (5 and later) comes with a handy set of functions and filters that make testing for email address validity a snap. It uses the FILTER_VALIDATE_EMAIL PHP email validation filter.
- PHP email validation with filter_var - There's no longer any need in PHP to create your own regular expressions to try to validate an email address; simply use filter_var() instead. This is available from PHP 5.2.0.
- PHP Validate Email Address: Email Validator - When working with forms and interacting with your visitors, you may want to verify if your visitor entered an email address in the right field, and if it is valid or not