Notes from SPAM guru session

As promised, here are my notes on the [[anti] SPAM guru session held Thursday, December 7, from 9 AM to 10:30 AM. This document is member-editable in wiki format; add stuff as needed. Put both onsite and offsite hyperlinks in square brackets. Any link starting with http: is automatically hyperlinked; in square brackets it’s an offsite link.
==== examples: ====
[[onsite link]
[[offsite link|http://offsite.link]

see also [filter tips|http://lopsa.org/filter/tips]

== technology breakdown ==

* whitelists
* blacklists
* reputation
* bayesian
* spf and other sender verification
* greylisting (greylisting.org)
* dnsbl (GOOD! sbl-xbl http://open-systems.ufl.edu/services/smtp-relay/sblxbl.html [http://www.spamhaus.org/])
* challenge/response (aka captcha – Hate hate hate)
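
For anyone who has not set up a dnsbl before, the lookup itself is just a DNS query: reverse the octets of the connecting IP, append the list's zone, and see whether a record comes back. A minimal sketch follows. The default zone name is an assumption based on the "sbl-xbl" list named above, and the helper names are made up for illustration; check the list operator's documentation for the zone to use.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="sbl-xbl.spamhaus.org"):
    """Build the DNS name to look up for an IPv4 address.

    The default zone is an assumption taken from the list named in these notes.
    """
    octets = ipaddress.IPv4Address(ip).exploded.split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="sbl-xbl.spamhaus.org"):
    """Return True if the lookup resolves, i.e. the address is on the list.

    Any resolution failure (including having no network) counts as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.99"))
```

In practice the MTA or a milter does this for you; the sketch only shows what goes over the wire.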

== commercial product sampling ==

* sophos
* ironport
* ironmail proofpoint
* postini
* ciphertrust
* tumbleweed.com
* brightmail (Symantec)
* mxlogic
* mailfrontier.com
* barracuda
* mirapoint
* sendmail (uses cloudmark in mail switch engine as add-on (mcafee av, fprot))
* cloudmark:
** “98% accuracy before changing”
** harness the powers of millions of users
** reputation system with multiple dimension hashes
** free 15 day trial

=== how it works ([CloudMark|http://cloudmark.com]) ===

* email arrives at server
* Cloudmark Server Edition (CSE) computes a unique fingerprint for each message received (fingerprints do not reflect content and cannot be decoded)
* multiple fingerprints are taken on each message, each a number in the range of 14-20 bytes
* CSE queries the Cloudmark service with the fingerprints
* the Cloudmark service identifies each fingerprint as spam or legit
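
The steps above can be sketched roughly as follows. This is only an illustration of the flow, not Cloudmark's implementation: the real fingerprint algorithms are proprietary, so a salted BLAKE2 hash stands in for them here, and `FingerprintService`, `fingerprints` and `handle_incoming` are hypothetical names invented for this example.

```python
import hashlib


def fingerprints(message, count=3, size=16):
    """Return `count` opaque fingerprints of `size` bytes each.

    Assumption: a salted BLAKE2 hash stands in for the real algorithms.
    """
    if not 14 <= size <= 20:
        raise ValueError("the notes describe fingerprints of 14-20 bytes")
    # A different one-byte salt per index gives several distinct digests
    # of the same message.
    return [
        hashlib.blake2b(message, digest_size=size, salt=bytes([i])).digest()
        for i in range(count)
    ]


class FingerprintService:
    """In-memory stand-in for the remote classification service."""

    def __init__(self):
        self._spam = set()

    def report_spam(self, message):
        # Remember every fingerprint of a message reported as spam.
        self._spam.update(fingerprints(message))

    def classify(self, prints):
        # One verdict per fingerprint, as in the last step of the list.
        return ["spam" if p in self._spam else "legit" for p in prints]


def handle_incoming(message, service):
    """Fingerprint a message, query the service, return a single verdict."""
    verdicts = service.classify(fingerprints(message))
    return "spam" if "spam" in verdicts else "legit"


if __name__ == "__main__":
    service = FingerprintService()
    service.report_spam(b"Buy pills now")
    print(handle_incoming(b"Buy pills now", service))
    print(handle_incoming(b"Lunch at noon?", service))
```

Only the fingerprints leave the mail server in this sketch, which matches the point that the service never sees message content.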

=== terminology ===

==== multiplicity ====

* multiplicity: how well a single signature classifies mutations of a single spam
* unbiased multiplicity: how close an experimental fingerprint algorithm gets to generating only a single metric per mutation class
* a fingerprint with high multiplicity is capable of covering multiple variations of the same spam

==== cross-class collision ====

* the potential rate at which the signature could cause false positives

==== fingerprint algorithms ====

* unique: generate a unique hash for every unique message
* oracle: where all spam messages in the same mutation class generate the same signature
* target: optimally acts as an oracle – developed in house
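
To make the terminology concrete, here is a small sketch that counts distinct signatures per mutation class (fewer is better; an oracle yields exactly one) and counts signatures shared between classes (cross-class collisions). The toy corpus, the function names and the "letters only" normalization are all assumptions made for illustration; they are not the algorithms discussed in the session.

```python
import hashlib
import re
from collections import defaultdict


def unique_signature(message):
    """'unique' algorithm: a different hash for every distinct message."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()


def normalized_signature(message):
    """Toy mutation-tolerant signature: hash only the lowercased letters."""
    letters = re.sub(r"[^a-z]", "", message.lower())
    return hashlib.sha1(letters.encode("utf-8")).hexdigest()


def signatures_per_class(corpus, signature):
    """Map each mutation class to the number of distinct signatures it produced."""
    seen = defaultdict(set)
    for mutation_class, message in corpus:
        seen[mutation_class].add(signature(message))
    return {cls: len(sigs) for cls, sigs in seen.items()}


def cross_class_collisions(corpus, signature):
    """Count signatures that show up in more than one mutation class."""
    classes = defaultdict(set)
    for mutation_class, message in corpus:
        classes[signature(message)].add(mutation_class)
    return sum(1 for found in classes.values() if len(found) > 1)


# Hypothetical corpus: (mutation class, message)
CORPUS = [
    ("pills", "Buy PILLS now!!!"),
    ("pills", "buy pills now 4821"),
    ("pills", "B.u.y p.i.l.l.s n.o.w"),
    ("lunch", "Lunch at noon?"),
]

if __name__ == "__main__":
    print(signatures_per_class(CORPUS, unique_signature))
    print(signatures_per_class(CORPUS, normalized_signature))
    print(cross_class_collisions(CORPUS, normalized_signature))
```

On this toy corpus the normalized signature happens to behave like an oracle; on real mail it would not, and a more aggressive normalization would start to produce cross-class collisions.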

=== Sender Verification ===

* SPF
** http://www.openspf.org/ – why SPF is good
*** http://freshmeat.net/projects/libspf/
*** qmail spf http://www.saout.de/misc/spf/
*** thunderbird http://taubz.for.net/code/spf/
** others argue that it’s worse than useless (says nothing about spamminess)
** counter-counter point:
*** http://www.meyerweb.com/eric/comment/chech.html
* ESV (Email Sender Verification)
* Sender ID (http://www.microsoft.com/mscorp/safety/technologies/senderid/overview.mspx)
* Sender Address Verification http://www.netsrv.admin.bton.ac.uk/publib/senderverify.html
* DomainKeys http://antispam.yahoo.com/domainkeys
* BlueBottle (http://bluebottle.com)

== Other Stuff ==

* [Death2Spam|http://death2spam.com]
** recommended by Paul Graham

* [Vipul’s Razor|http://razor.sourceforge.net]
:: User input is validated through reputation assignments, which are based on consensus among report and revoke assertions; these reputations are in turn used to compute confidence values for individual signatures. (evolved into the CloudMark reputation engine)
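
:: The report/revoke idea described for Razor can be sketched roughly as below. This is not Razor's actual algorithm: the weighting formula, the `step` parameter and both function names are assumptions chosen only to show how consensus can feed both a signature's confidence and each user's reputation.

```python
def signature_confidence(reports, revokes, trust):
    """Return a confidence in [-1.0, 1.0] that a signature is spam.

    reports / revokes: user ids that reported or revoked the signature.
    trust: dict mapping user id to a non-negative reputation weight.
    """
    spam_weight = sum(trust.get(user, 0.0) for user in reports)
    legit_weight = sum(trust.get(user, 0.0) for user in revokes)
    total = spam_weight + legit_weight
    if total == 0:
        return 0.0  # nobody with any reputation has spoken
    return (spam_weight - legit_weight) / total


def update_trust(reports, revokes, trust, step=0.1):
    """Return new reputations: reward agreement with consensus, penalise dissent."""
    confidence = signature_confidence(reports, revokes, trust)
    updated = dict(trust)
    if confidence == 0:
        return updated  # no consensus, nothing to learn
    agree, disagree = (reports, revokes) if confidence > 0 else (revokes, reports)
    for user in agree:
        updated[user] = updated.get(user, 0.0) + step
    for user in disagree:
        updated[user] = max(0.0, updated.get(user, 0.0) - step)
    return updated


if __name__ == "__main__":
    trust = {"alice": 1.0, "bob": 1.0, "mallory": 0.5}
    print(signature_confidence(["alice", "bob"], ["mallory"], trust))
    print(update_trust(["alice", "bob"], ["mallory"], trust))
```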

* [0SPAM.com|http://0spam.com]
** compatible with POP accounts and services like gmail. filters your mail for you.

* http://www.openspf.org/srs.html

* John Graham-Cumming’s newsletter
:: http://www.jgc.org/ (once a month, 2 pages)

* [maia mailguard|http://renaissoft.com/maia/] (works with [amavisd|http://www.amavis.org])

* [openrbl.org|http://www.openrbl.org]
** http://noc.bit.nl/dnsbl/ascc/ (country code mapping by IP)

* Doug Hughes’s LISA 2004 anti-spam mini-symposium invited talk slides are attached at the bottom in openoffice format (.sxi)

== Organizations ==

* CAUCE – http://www.cauce.org/
** Coalition Against Unsolicited Commercial Email (lobbying, political action, etc)
* [ASRG|http://asrg.sp.am] – Anti-spam research group

== Bayesian ==

* http://www.paulgraham.com/antispam.html – 2002
** (generally no whitelists, no anti-virus)
** bayesian filters compared: http://home.dataparty.no/kristian/reviews/bayesian/
* [CRM114|http://crm114.sourceforge.net] (one of the most highly regarded in some ways, but also see comparison for accuracy above)
* SpamBayes – http://spambayes.sf.net (python)
* popfile – http://popfile.sf.net (perl) (John Graham-Cumming)
* http://spamprobe.sourceforge.net (C++)
* http://dspam.nuclearelephant.com (C)
* http://bogofilter.sourceforge.net (Eric Raymond, then David Relson; uses Fisher-Robinson chi-square)
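
As a rough idea of what these filters compute, here is a simplified sketch in the spirit of the Paul Graham essay linked above: estimate a spam probability per token from training counts, then combine the per-token probabilities into one score. It leaves out most of the essay's refinements, it is not the Fisher-Robinson chi-square method, and the neutral value of 0.5 for unseen tokens and the function names are assumptions for this example.

```python
from math import prod


def token_probability(spam_count, ham_count, n_spam, n_ham):
    """Estimate P(spam | token) from how often the token appears in each corpus.

    n_spam / n_ham are the numbers of spam and legit messages trained on.
    """
    spam_freq = spam_count / n_spam
    ham_freq = ham_count / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # assumption: unseen tokens are treated as neutral
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so a single token can never force a verdict of exactly 0 or 1.
    return min(0.99, max(0.01, p))


def combine(probabilities):
    """Naive Bayes combination of per-token spam probabilities."""
    spam = prod(probabilities)
    ham = prod(1.0 - p for p in probabilities)
    return spam / (spam + ham)


if __name__ == "__main__":
    tokens = [
        token_probability(9, 1, 10, 10),   # mostly seen in spam
        token_probability(10, 0, 10, 10),  # only seen in spam
        token_probability(2, 8, 10, 10),   # mostly seen in legit mail
    ]
    print(combine(tokens))
```

The clamping is what keeps `combine` from dividing by zero when one token says "certainly spam" and another says "certainly legit".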

== reputation based ==

* [DKIM|http://mipassoc.org/dkim]
* [DCC|http://www.rhyolite.com/anti-spam/dcc/] (distributed checksum clearinghouse)
* [pyzor|http://pyzor.sourceforge.net]
* [SIQ from ASRG|http://www.snert.com/Software/drafts/draft-irtf-asrg-iar-howe-siq-01.txt]

== free plugins for mail programs (a/s, a/v) ==

* [mailscanner|http://freshmeat.net/projects/mailscanner/] and http://mailscanner.biz (professional services)
* [MIMEDefang|http://www.mimedefang.org]
* [SpamAssassin|http://spamassassin.apache.org]
** [rules du jour|http://www.exit0.us/index.php?pagename=RulesDuJour]

== Links to other articles and websites about spam ==

* [sage-members thread on spam control|http://www.sage.org/mailarchive/sage-members-archive/members-only/2005/msg00647.html]