OP5: Anti-spam service

Pierre-Julien Bringer

Spam detection should occur at the email server level. However, in accordance with the End-to-End principle, the server should pass all email on to clients. Each email should be tagged with a header containing a number between 0 and 1 that expresses the server's confidence that the email is spam. Clients can then use that value to decide how to display the email. On the other hand, while collaboration between email administrators can improve spam detection, asking the network to eliminate spam further upstream is open to abuse and not very efficient.
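A minimal sketch of the tagging step, in Python. The header name X-Spam-Probability is a hypothetical choice (the text only asks for "a header"). The server first strips any copy already present, so a sender cannot pre-label its own message, then adds its score; the client reads the value back and treats a missing or malformed header as "no opinion".

```python
from email.message import EmailMessage
from typing import Optional

# Hypothetical header name; the scheme only requires a header carrying a 0..1 score.
SPAM_HEADER = "X-Spam-Probability"


def tag_message(msg: EmailMessage, score: float) -> None:
    """Server side: record the spam score on the message instead of dropping it."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("spam score must lie in [0, 1]")
    del msg[SPAM_HEADER]  # remove any value forged by the sender or an earlier hop
    msg[SPAM_HEADER] = f"{score:.3f}"


def read_score(msg: EmailMessage) -> Optional[float]:
    """Client side: return the server's score, or None if absent or malformed."""
    raw = msg.get(SPAM_HEADER)
    if raw is None:
        return None
    try:
        value = float(raw)
    except ValueError:
        return None
    return value if 0.0 <= value <= 1.0 else None


msg = EmailMessage()
msg["Subject"] = "Cheap watches"
msg[SPAM_HEADER] = "0.000"  # forged by the sender
tag_message(msg, 0.87)
print(read_score(msg))  # 0.87
```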

Learning which emails are spam requires humans to classify them. For this learning to be effective, it must draw on a large volume of emails and a large number of human classifications, far more than a single user's mailbox provides. This is why spam detection has to be performed at the email server rather than at the client.
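One way to pool classifications, sketched in Python: every user's "spam" or "not spam" flag trains a single server-wide model, whose output can serve as the 0 to 1 score in the header. The text does not name a learning method; the multinomial naive Bayes classifier with Laplace smoothing and whitespace tokenization below is a swapped-in, deliberately simple choice.

```python
import math
from collections import Counter


class PooledSpamModel:
    """Multinomial naive Bayes trained on flags pooled from all users of a server."""

    def __init__(self) -> None:
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> list:
        return text.lower().split()

    def learn(self, text: str, is_spam: bool) -> None:
        """Record one human classification (from any user of the server)."""
        label = "spam" if is_spam else "ham"
        self.token_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def score(self, text: str) -> float:
        """Return P(spam | text) in [0, 1], usable as the value of the spam header."""
        vocab_size = len(set(self.token_counts["spam"]) | set(self.token_counts["ham"]))
        total_messages = sum(self.message_counts.values())
        log_prob = {}
        for label in ("spam", "ham"):
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # Laplace-smoothed class prior.
            lp = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            for token in self.tokenize(text):
                # Laplace-smoothed likelihood; the +1 reserves mass for unseen tokens.
                lp += math.log((counts[token] + 1) / (total_tokens + vocab_size + 1))
            log_prob[label] = lp
        # Numerically stable logistic of the log-odds.
        diff = log_prob["ham"] - log_prob["spam"]
        if diff >= 0:
            z = math.exp(-diff)
            return z / (1.0 + z)
        return 1.0 / (1.0 + math.exp(diff))


model = PooledSpamModel()
model.learn("cheap pills buy now", is_spam=True)    # flagged by one user
model.learn("win money now", is_spam=True)          # flagged by another user
model.learn("meeting notes attached", is_spam=False)
model.learn("lunch tomorrow at noon", is_spam=False)
print(round(model.score("buy cheap pills"), 2))  # 0.89
```

The point of pooling is visible even here: each flag from any user shifts the token counts that every other user's mail is scored against.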

How much risk of losing wanted email a user will accept is a personal preference. The user's client should therefore download all emails and display them, taking the server's evaluations into account. The client can give the user access to emails considered to be spam, so that she can check them and flag false positives.
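A sketch of the client side in Python, assuming a per-user threshold (spam_threshold, a hypothetical name and default) chosen by the user. Nothing is discarded: high-scoring messages are only filed separately, and a message the user rescues is recorded as a false positive so that it can be fed back to the server's learning.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mailbox:
    # Hypothetical per-user preference: messages scoring at or above this
    # are filed in the spam folder rather than the inbox (never deleted).
    spam_threshold: float = 0.9
    inbox: List[str] = field(default_factory=list)
    spam_folder: List[str] = field(default_factory=list)
    # Message ids the user rescued; to be reported back to the server.
    false_positives: List[str] = field(default_factory=list)

    def file(self, msg_id: str, score: float) -> None:
        folder = self.spam_folder if score >= self.spam_threshold else self.inbox
        folder.append(msg_id)

    def flag_not_spam(self, msg_id: str) -> None:
        self.spam_folder.remove(msg_id)  # raises ValueError if not in the spam folder
        self.inbox.append(msg_id)
        self.false_positives.append(msg_id)


box = Mailbox(spam_threshold=0.8)
box.file("msg-1", 0.95)
box.file("msg-2", 0.20)
box.flag_not_spam("msg-1")
print(box.inbox, box.false_positives)  # ['msg-2', 'msg-1'] ['msg-1']
```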

A user’s client might choose not to download some emails for performance reasons: bandwidth is saved by not downloading emails that the server considers clearly spam. For uncertain cases, the default might be to fetch only the headers of the email and let the user request the rest if she wants. These thresholds should be configurable, with reasonable defaults.
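The download policy reduces to two configurable thresholds. In this Python sketch the parameter names and default values (skip_at = 0.98, headers_only_at = 0.6) are illustrative assumptions, not values given in the text; a message without a server score is downloaded normally.

```python
from enum import Enum
from typing import Optional


class Fetch(Enum):
    SKIP = "skip"             # clearly spam: do not download at all
    HEADERS_ONLY = "headers"  # uncertain: fetch headers, body on user request
    FULL = "full"             # likely legitimate: download everything


# Hypothetical defaults; the text only asks for configurable, reasonable values.
DEFAULT_SKIP_AT = 0.98
DEFAULT_HEADERS_ONLY_AT = 0.6


def fetch_decision(score: Optional[float],
                   skip_at: float = DEFAULT_SKIP_AT,
                   headers_only_at: float = DEFAULT_HEADERS_ONLY_AT) -> Fetch:
    if not 0.0 <= headers_only_at <= skip_at <= 1.0:
        raise ValueError("need 0 <= headers_only_at <= skip_at <= 1")
    if score is None:
        return Fetch.FULL  # no server opinion: behave like an ordinary client
    if score >= skip_at:
        return Fetch.SKIP
    if score >= headers_only_at:
        return Fetch.HEADERS_ONLY
    return Fetch.FULL


print(fetch_decision(0.7))  # Fetch.HEADERS_ONLY
```

With IMAP, for example, a headers-only download could be done by fetching BODY.PEEK[HEADER] instead of the full message.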

To avoid wasting bandwidth, servers could drop obvious spam before it is even fully received. The SMTP protocol allows a server to reject an email before the message data is sent. This can only be done after checking the recipient’s preferences, so, unlike in the client-side scheme above, user preferences must be stored on the server. The main benefit is protection against attacks on email servers; the bandwidth saved is modest, since spam emails are usually small.
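A sketch of that server-side check in Python, assuming the decision is made when the sending host issues RCPT TO, that is, after the recipient is known but before DATA. The preference table, the pre_data_score input and how it is computed are hypothetical. A 550 reply with enhanced status code 5.7.1 is a standard permanent refusal, and 250 accepts the recipient.

```python
from typing import Mapping

# Hypothetical per-recipient preferences stored on the server:
# refuse mail whose pre-DATA spam estimate reaches this value.
REJECT_AT: Mapping[str, float] = {"alice@example.org": 0.99}


def rcpt_reply(recipient: str, pre_data_score: float,
               reject_at: Mapping[str, float] = REJECT_AT) -> str:
    """Choose the SMTP reply to RCPT TO, before any message data is transferred.

    pre_data_score can only use what is known at this point of the session
    (e.g. the connecting address and the envelope sender), not the body.
    """
    threshold = reject_at.get(recipient)
    if threshold is not None and pre_data_score >= threshold:
        return "550 5.7.1 Message refused as spam"
    # Recipients without a stored preference keep the end-to-end default.
    return "250 2.1.5 OK"


print(rcpt_reply("alice@example.org", 0.995))  # 550 5.7.1 Message refused as spam
print(rcpt_reply("bob@example.org", 0.995))    # 250 2.1.5 OK
```

Recipients who never stored a preference are never refused, so only users who opted in give up the guarantee of receiving every email.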

To improve spam detection, mail service providers could collaborate and exchange information about spam. This requires trusting the administrators of some other domains. This is realistic: many email service providers already use outsourced blacklists. It would be particularly useful for administrators of small email services, such as most corporate email systems, which see too little email to learn accurately on their own.
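How shared reports are combined is left open in the text. The Python sketch below uses a simple trust-weighted average, one possible choice among many; the domain names and weights are made up for illustration. Reports from domains the local administrator does not trust are ignored, and with no trusted opinion the function returns None so that the server falls back on its own detection.

```python
from typing import Mapping, Optional


def shared_reputation(reports: Mapping[str, float],
                      trust: Mapping[str, float]) -> Optional[float]:
    """Combine peer domains' spam scores for one sender into a single score.

    reports: peer domain -> that peer's 0..1 spam score for the sender.
    trust:   peer domain -> weight given to that peer's administrators.
    Reports from domains with no positive trust weight are ignored.
    """
    weighted = [(trust[peer], score) for peer, score in reports.items()
                if trust.get(peer, 0.0) > 0.0]
    total_weight = sum(w for w, _ in weighted)
    if total_weight == 0.0:
        return None  # no trusted opinion: fall back to local detection only
    return sum(w * s for w, s in weighted) / total_weight


trust = {"big-provider.example": 2.0, "partner.example": 1.0}
reports = {"big-provider.example": 0.9, "partner.example": 0.6,
           "unknown.example": 0.0}  # untrusted report, ignored
print(shared_reputation(reports, trust))  # roughly 0.8
```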

Finally, it would be desirable to cut off spam senders that generate large amounts of traffic. Locally, this could be done using firewall techniques, which requires maintaining some state about senders. Blocking traffic before it crosses the Internet would reduce the global resource cost of spam, but it would require passing that state to the spam sender’s ISP, and it has the potential to disrupt many legitimate users through denial of service. For compromised home PCs, the ISP might have an incentive to contact its customer, but that route is slow. For rented servers, the provider has little incentive to disconnect a client. This approach seems disproportionately dangerous with regard to the benefits it brings, and should not be relied upon.
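The local state mentioned above can be as small as a recent-activity record per sender. The Python sketch below stands in for firewall rate limiting with a sliding-window counter keyed by sender IP; the class name, limits and addresses are hypothetical, and a real deployment would enforce the decision in the firewall itself. Refused attempts are not counted, so a blocked sender regains access once its earlier traffic ages out of the window.

```python
from collections import defaultdict, deque


class SenderRateLimiter:
    """Sliding-window limit on events (e.g. SMTP connections) per sender IP."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window = window_seconds
        self.events = defaultdict(deque)  # ip -> timestamps inside the window

    def allow(self, ip: str, now: float) -> bool:
        recent = self.events[ip]
        # Forget events that have left the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_events:
            return False  # over the limit: refuse, and do not record the attempt
        recent.append(now)
        return True


limiter = SenderRateLimiter(max_events=3, window_seconds=60)
for t in (0, 1, 2, 3):
    print(t, limiter.allow("203.0.113.5", t))  # the fourth attempt is refused
```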