The term greylisting describes a general method of blocking spam based on the behavior of the sending server, rather than the content of the messages. It does not refer to any particular implementation of this method. Consequently, there is no single greylisting product. Instead, there are many products that incorporate some or all of the methods described here.
Greylisting relies on the fact that most spam sources do not behave in the same way as “normal” mail systems. Although it is currently very effective by itself, it performs best when used in conjunction with other forms of spam prevention.
How does Greylisting Work?
Typically, a server employing greylisting will record the three pieces of data known as a “triplet” for each incoming mail message:
The IP address of the connecting host
The envelope sender address
The envelope recipient address(es)
The triplet is checked against the mail server’s internal database. If it has not been seen before (within some configurable period), the email is greylisted for a short time and refused with a temporary rejection, that is, an SMTP 4xx error code. The assumption is that, since temporary failures are defined in the SMTP-related RFCs, a legitimate server will try again to deliver the email.
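The triplet check described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not any particular product's implementation: the class name Greylist, the parameters min_delay and max_age, their default values, and the reply texts are all assumptions made for the example.

```python
import time
from typing import Dict, Optional, Tuple

# (connecting IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]

# Illustrative reply strings; the exact 4xx code and text vary by implementation.
TEMPFAIL = "451 Greylisted, please try again later"
ACCEPT = "250 OK"


class Greylist:
    """Hypothetical in-memory greylist keyed on the exact triplet."""

    def __init__(self, min_delay: float = 300.0, max_age: float = 86400.0) -> None:
        self.min_delay = min_delay  # seconds a new triplet must wait before a retry passes
        self.max_age = max_age      # seconds after which a recorded triplet is forgotten
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        key = (ip, sender, recipient)
        seen = self.first_seen.get(key)
        if seen is None or now - seen > self.max_age:
            # Unknown (or expired) triplet: record it and refuse temporarily.
            self.first_seen[key] = now
            return TEMPFAIL
        if now - seen < self.min_delay:
            # Retry arrived too soon: keep refusing.
            return TEMPFAIL
        # A retry after the waiting period: the sender behaves like a real MTA.
        return ACCEPT
```

A production system would also persist the table and usually refresh or whitelist triplets that have passed, so that established correspondents are not greylisted again.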
The temporary rejection can be issued at different stages of the SMTP dialogue, allowing an implementation to store more or less data about the incoming message. The tradeoff is more work and bandwidth for more exact matching of retries with original messages. Rejecting a message after its content has been received allows the server to store a choice of headers and/or a hash of the message body.
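As an illustration of rejecting after the content has been received, the sketch below derives a fingerprint from a few headers plus a hash of the body, so that a retry can be matched to the original attempt. The function name message_fingerprint and the choice of headers are assumptions made for this example; real implementations choose their own fields.

```python
import hashlib
import re
from email import message_from_bytes
from typing import Sequence


def message_fingerprint(
    raw: bytes,
    headers: Sequence[str] = ("From", "To", "Subject", "Message-ID"),
) -> str:
    """Return a hex digest built from selected headers and the message body.

    Headers that change between delivery attempts (such as Received) are
    deliberately left out so that a retry produces the same fingerprint.
    """
    msg = message_from_bytes(raw)
    digest = hashlib.sha256()
    for name in headers:
        value = str(msg.get(name, ""))
        digest.update(f"{name}:{value}\n".encode("utf-8", "replace"))
    # The body is everything after the first blank line.
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    body = parts[1] if len(parts) > 1 else b""
    digest.update(hashlib.sha256(body).digest())
    return digest.hexdigest()
```

Storing only the digest keeps the database small, at the cost of having received the whole message before refusing it.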
In practice, most greylisting systems do not require an exact match on the IP address and the sender address. Because large senders often have a pool of machines that can send (and resend) email, IP addresses whose most-significant 24 bits are the same are treated as equivalent, or in some cases SPF records are used to determine the sending pool. Similarly, some e-mail systems use unique per-message return-paths, for example Variable Envelope Return Path (VERP) for mailing lists, Sender Rewriting Scheme for forwarded e-mail, and Bounce Address Tag Validation for backscatter protection. If an exact match on the sender address is required, every e-mail from such systems will be delayed. Some greylisting systems try to avoid this delay by eliminating the variable parts of the VERP, using only the sender domain and the beginning of the local-part of the sender address.
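The looser matching described above might look like the following sketch. Both helper names are hypothetical, and the rule for cutting the local-part at the first "+" or "=" is only one possible heuristic for dropping per-message tags; it is an assumption, not a rule taken from any specific product. IPv6 addresses are left unchanged here because the text only discusses 24-bit IPv4 prefixes.

```python
import ipaddress
import re


def network_key(ip: str) -> str:
    """Map an IPv4 address to its /24 network; leave other addresses as-is."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        return str(ipaddress.ip_network(f"{ip}/24", strict=False))
    return str(addr)


def sender_key(sender: str) -> str:
    """Keep the sender domain and the start of the local-part only.

    Assumed heuristic: everything from the first '+' or '=' in the
    local-part onward is treated as a per-message tag and dropped.
    """
    if "@" not in sender:
        return sender  # null sender or malformed address: leave unchanged
    local, _, domain = sender.rpartition("@")
    prefix = re.split(r"[+=]", local, maxsplit=1)[0]
    return f"{prefix}@{domain.lower()}"
```

For example, network_key("192.0.2.57") returns "192.0.2.0/24", and sender_key("list-bounces+alice=example.net@lists.example.org") returns "list-bounces@lists.example.org". The normalized values would then replace the raw IP and sender in the stored triplet.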
Why it works
Greylisting is effective because many mass email tools used by spammers will not bother to retry a failed delivery, so the spam is never delivered. A spam sender may retry with a different sender, and possibly a different message, because it has a queue of victims rather than the proper queue of messages that regular mail servers maintain.
In addition, if a spammer does retry a delivery after the waiting period has expired, any one of a number of automated spamtraps will have had a good chance of identifying the spam source and listing both the source and the particular message in their databases. Thus, these subsequent attempts are more likely to be detected as spam by other mechanisms than they were before the greylisting delay.
Advantages of Greylisting
The main advantage from the users’ point of view is that greylisting requires no additional configuration on their end. If the server using greylisting is configured appropriately, the end user will only notice a delay on the first message from a given sender, so long as the sending email server is identified as belonging to the same whitelisted group as earlier messages. If mail from the same sender is repeatedly greylisted, it may be worth contacting the mail system administrator with detailed headers of the delayed mail.
From a mail administrator’s point of view the benefit is twofold. First, greylisting takes minimal configuration to get up and running, with only occasional modifications to any local whitelists. Second, rejecting email with a temporary error such as 451 (the actual error code is implementation dependent) is very cheap in system resources, whereas most spam filtering tools are intensive users of CPU and memory. By stopping spam before it reaches the filtering processes, far fewer system resources are used. This allows more layers of spam filtering or higher throughput, since greylisting can easily be configured as a first line of defense, with SpamAssassin or similar tools handling the messages that get through.
Some greylisting packages support an SQL backend which allows for a distributed multiple-server frontend to be deployed with the same greylisting data on all frontends.
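A shared SQL backend can be sketched as follows. The example uses Python's built-in sqlite3 module purely as a stand-in for whatever networked database a real deployment would use; the table layout, the Frontend class, and the min_delay parameter are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS greylist (
    ip         TEXT NOT NULL,
    sender     TEXT NOT NULL,
    recipient  TEXT NOT NULL,
    first_seen REAL NOT NULL,
    PRIMARY KEY (ip, sender, recipient)
)
"""


class Frontend:
    """Hypothetical mail frontend that keeps its greylist in a shared database."""

    def __init__(self, db: sqlite3.Connection, min_delay: float = 300.0) -> None:
        self.db = db
        self.min_delay = min_delay
        self.db.execute(SCHEMA)

    def accepts(self, ip: str, sender: str, recipient: str, now: float) -> bool:
        # Record the triplet only if no frontend has seen it yet.
        self.db.execute(
            "INSERT OR IGNORE INTO greylist VALUES (?, ?, ?, ?)",
            (ip, sender, recipient, now),
        )
        self.db.commit()
        row = self.db.execute(
            "SELECT first_seen FROM greylist "
            "WHERE ip = ? AND sender = ? AND recipient = ?",
            (ip, sender, recipient),
        ).fetchone()
        # Accept only once the waiting period since the first attempt has passed.
        return now - row[0] >= self.min_delay
```

Because every frontend reads and writes the same table, a retry that lands on a different frontend than the first attempt is still recognized.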
Disadvantages of Greylisting
Perhaps the most significant disadvantage of greylisting is that, like some other spam mitigation techniques, it destroys the near-instantaneous nature of email that people have come to expect. A customer of a greylisting ISP cannot always rely on receiving every email within a predetermined amount of time. However, the original specification for email states that it is neither a guaranteed nor an instantaneous delivery mechanism, so greylisting is a perfectly legitimate process that does not break any protocols or rules. Greylisting is also very good at flushing out poorly configured mail servers that cannot maintain state, queue email correctly, or retry delivery within a reasonably short time. Mail servers that are properly configured and fully conform to SMTP generally have no problems with greylisting, and the delays they experience are usually too small to matter.
If mail from a particular frequent sender is sent from any of several mail servers, mail may be delayed unless the greylisting server recognizes the different servers as belonging to the same whitelisted group.
Greylisting delays much of the mail from non-whitelisted mail servers – not just spam – until typical patterns of communication are recorded by the greylisting system. For best results, whitelisting should be used extensively.
Also, legitimate mail might not get delivered if the retry does not come within the time window the greylisting software uses, or if the retry comes from a different IP address than the original attempt. When the source of an email is a server farm, or the email goes out through an anti-spam mail relay service, it is likely that a server other than the original one will make the next attempt. Since the IP addresses differ, the recipient’s server will fail to recognize that the two attempts are related and will refuse the latest connection as well. If the number of servers is large enough, this can continue until the message ages out of the queue. Such server farming techniques can be construed as breaking the SMTP-related RFCs mentioned earlier, since the original sending machine has absolved itself of the responsibility for mail delivery by tossing the message back into the pool, which breaks the state of the mail delivery process. This problem can be partially bypassed by identifying and whitelisting such server farms in advance. However, on a distributed network the size of the Internet, it is not possible to maintain a complete list of all such dedicated server farms.
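The effect of a rotating server pool can be shown with a small simulation. It assumes a strict round-robin pool and fixed retry intervals, which real senders may not follow; the function name, its parameters, and the example addresses are all hypothetical.

```python
import ipaddress
from typing import Callable, Dict, Optional, Sequence


def attempts_until_accepted(
    pool: Sequence[str],
    retry_interval: float,
    min_delay: float,
    queue_lifetime: float,
    key: Callable[[str], str] = lambda ip: ip,
) -> Optional[int]:
    """Count delivery attempts until the greylist accepts, or return None
    if the message ages out of the sender's queue first.

    Each attempt comes from the next host in the pool (round-robin). The
    key function controls how the greylist matches hosts: the default
    requires the exact IP address.
    """
    first_seen: Dict[str, float] = {}
    attempt = 0
    now = 0.0
    while now <= queue_lifetime:
        host_key = key(pool[attempt % len(pool)])
        attempt += 1
        seen = first_seen.setdefault(host_key, now)
        if now - seen >= min_delay:
            return attempt
        now += retry_interval
    return None


def slash24(ip: str) -> str:
    """Treat IPv4 hosts in the same /24 network as equivalent."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))
```

With a pool of ten hosts in one /24, a 15-minute retry interval and a 5-minute greylisting delay, exact IP matching needs eleven attempts, whereas matching on the /24 accepts the second attempt. If the queue lifetime is shorter than one full rotation through the pool, the function returns None, which corresponds to the message never being delivered.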
Greylisting can be a particular nuisance with websites that require users to create an account and confirm their email address before they can begin using the site. If the site’s sending MTA is poorly configured, greylisting may delay the initial email containing the signup confirmation link, introducing a waiting period even though the website attempted to send the confirmation immediately. Almost all stock-configured Sendmail MTAs (Sendmail being the most widely deployed MTA on the Internet) will retry after a few minutes, leading to typical delays of under 10 minutes in most cases, depending on the greylisting configuration. Greylisting is often particularly effective at weeding out misconfigured MTAs, and is gaining in popularity as an anti-spam tool. MTAs that do not handle greylisting correctly will, by their very nature, become less numerous over time.