Properly validating e-mail addresses

tux0r - Aug 30 '18 - Dev Community

If you are a developer of "web applications", you have probably written (or copy-pasted) some code that tries to validate an entered e-mail address. If you have, and your solution relies on a long, hard-to-read regular expression (as most of those seen on Stack Overflow do), I have bad news for you: your code is wrong.

Did you know that parentheses and spaces can be part of a valid e-mail address (for example inside a quoted local part such as "john doe"@example.com), that according to RFC 6531 even emojis can be, and that whether Unicode is supported at all is merely an implementation detail?

Did you know that the part after the "@" may be an IPv6 address literal or the name of a host in your intranet, so that requiring a TLD (xxxx.yy) is entirely wrong?
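To make these two questions concrete, here is a small Python check. The pattern is a hypothetical example of the kind of copy-pasted expression this post is about (ASCII-only local part, a dot and a TLD required after the "@"), not a quote from any specific answer. The four addresses are ones the RFCs permit; the last one only where the SMTPUTF8 extension (RFC 6531) is supported.

```python
import re

# Hypothetical "typical" pattern: ASCII-only local part and a mandatory
# ".tld" of two or more letters after the "@".
NAIVE_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def naive_is_valid(address: str) -> bool:
    return NAIVE_PATTERN.fullmatch(address) is not None


# Addresses the RFCs allow, all of which the pattern above rejects.
RFC_VALID = [
    '"john doe"@example.com',   # space inside a quoted local part
    "user@[IPv6:2001:db8::1]",  # IPv6 address literal (RFC 5321)
    "admin@intranet",           # dotless intranet host, no TLD
    "\U0001F600@example.com",   # emoji local part, needs SMTPUTF8 (RFC 6531)
]

if __name__ == "__main__":
    for addr in RFC_VALID:
        print(f"{addr!r}: naive check says {naive_is_valid(addr)}")
```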

Neither, it seems, do the existing validators I've come across (at least the ones more complex than "is there an @ character?"). The RFCs are much more flexible than any regular expression can be, down to special cases such as characters that are only allowed inside a quoted string, where the quoted string itself must not stand between two periods. Ah, the joy of standards.
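To give a feel for the kind of rule involved, here is a deliberately simplified sketch of the RFC 5322 local part (the part before the "@"): either a dot-atom, whose atoms may only contain the "atext" characters and must be separated by single dots, or one quoted string, inside which spaces, "@", parentheses and backslash-escaped quotes are fine. It ignores comments, folding whitespace, the obsolete syntax (where quoted and unquoted parts may be mixed) and RFC 6531's UTF-8, so it is an illustration rather than a validator; the function names are made up for this example.

```python
import string

# atext from RFC 5322, section 3.2.3: ASCII letters, digits and these symbols.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_dot_atom(local: str) -> bool:
    """One or more atext runs separated by single dots."""
    # split() yields an empty atom for a leading, trailing or doubled dot.
    return all(atom and set(atom) <= ATEXT for atom in local.split("."))


def _printable_or_wsp(ch: str) -> bool:
    """VCHAR (0x21-0x7E) or WSP (space, tab)."""
    return 0x21 <= ord(ch) <= 0x7E or ch in " \t"


def is_quoted_string(local: str) -> bool:
    """Quoted string without folding (CRLF) whitespace, ASCII only."""
    if len(local) < 2 or local[0] != '"' or local[-1] != '"':
        return False
    body, i = local[1:-1], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # quoted-pair: a backslash followed by VCHAR or WSP
            if i + 1 == len(body) or not _printable_or_wsp(body[i + 1]):
                return False
            i += 2
        elif ch == '"' or not _printable_or_wsp(ch):
            # an unescaped quote would end the string early; anything else
            # must be qtext (VCHAR minus '"' and '\') or whitespace
            return False
        else:
            i += 1
    return True


def is_valid_local_part(local: str) -> bool:
    return is_dot_atom(local) or is_quoted_string(local)


if __name__ == "__main__":
    for local in ["john.doe", "john..doe", "john doe", '"john doe"']:
        print(f"{local!r}: {is_valid_local_part(local)}")
```

Note how "john..doe" and "john doe" fail as a dot-atom but become acceptable once quoted, which is exactly the kind of context-dependent rule that makes one-line regular expressions fall short.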

Of course you can still catch the majority of invalid e-mail addresses with a sufficiently complex regular expression, but I strongly recommend aiming for more than "almost bug-free" code. So here are a few suggestions:

  1. Send validation e-mails to all e-mail addresses you want to validate.

    If the user confirms having received it, the address is definitely valid. This is not entirely reliable, though: nobody is forced to open the e-mail and click its link, and a user who does not do so has not proven the address invalid. This suggestion only makes sense for things like web forums, where the confirmation e-mail can be announced in advance and clicking it can be made a requirement.

  2. Read the RFCs.

    You probably do not want to do that. :-) But don't worry, I did it for you:

  3. Use libvldmail.

    libvldmail is a small, no-frills library written in C (with no special dependencies, not even regex.h) which I developed to solve this very problem once and for all. I provide language bindings via SWIG as well, so if your web application runs on Python, Ruby or whatever else, you should be able to use it safely. If you need that, the SWIG website has a tutorial on generating the bindings from the supplied template file.

You are invited to read the README and the example implementation for details on how to use it. Don't worry, I made it easy: there is only one function, and it takes a single parameter.
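If you go with the first suggestion instead, the core of a confirmation flow is an unguessable, single-use token that ends up in the link you mail out. The following is a minimal in-memory sketch using Python's `secrets` module; `PENDING`, `start_verification`, `confirm`, the example URL and the one-day expiry are hypothetical choices for illustration, and actually sending the e-mail is left out.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store: token -> (address, expiry timestamp).
PENDING: Dict[str, Tuple[str, float]] = {}
TOKEN_TTL_SECONDS = 24 * 60 * 60  # assumption: links expire after one day


def start_verification(address: str, now: Optional[float] = None) -> str:
    """Create a random one-time token to embed in the confirmation link."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe base64
    issued = time.time() if now is None else now
    PENDING[token] = (address, issued + TOKEN_TTL_SECONDS)
    return token


def confirm(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the confirmed address, or None for unknown/used/expired tokens."""
    entry = PENDING.pop(token, None)  # pop, so every token works only once
    if entry is None:
        return None
    address, expires = entry
    current = time.time() if now is None else now
    return address if current <= expires else None


if __name__ == "__main__":
    token = start_verification("user@example.com")
    # Mail a link such as https://example.com/confirm?token=<token> here.
    print(confirm(token))  # prints user@example.com
    print(confirm(token))  # prints None: the token was already used
```

A real application would keep the pending entries in its database rather than a module-level dict, and, as noted above, should treat an unconfirmed address as "unknown" rather than "invalid".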

I just thought this might come in handy for some of you.
