
Re: [microsound] Re: {SpamScore: ssss} Re: [microsound] OT: those programs spammers use to generate random text?



Brilliant research Frank, as usual.  I knew you were a Pd guru, but
not a spam-ologist!

~Kyle

On 10/10/06, Frank Barknecht <fbar@xxxxxxxxxxx> wrote:
Hello,
Joe Milutis wrote:

> They are generated by Chomsky bots or Bayesian Poisoning, from what I can
> gather.  But I'd be interested in knowing the technicalities of either . . .

Well, we can probably only guess what spammers use, but it's
interesting to search Google for complete phrases taken from spam
mails.

For example, I chose a random sentence from the latest entry on
www.spoetry.com, "Colonels countess", and searched for its source:
"We may as well be frank up to the point at which we should lose
money". If you put this in quotes, you don't get many hits; in fact I
only got one: http://gutenberg.net.au/ebooks03/0300591h.html

It turned out to be the text: "TOO TRUE TO BE GOOD - A Political
Extravaganza" by George Bernard Shaw. Cool, huh?

There are more GBS quotes scattered through the spam mail, for
example: "My son was called Popsy in his infancy." In fact, the whole
mail seems to be an abbreviated version of the Shaw text, but I didn't
analyze it any further.
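
A further analysis could be sketched roughly like this: split the
mail into sentences and count how many of them occur verbatim in the
suspected source. This is only a minimal sketch; the function names
and the sample "Buy cheap watches now" sentence are made up for
illustration, and splitting on full stops is as crude as it looks.

```python
def split_sentences(text):
    """Split on full stops, collapse whitespace, drop empty pieces."""
    pieces = (" ".join(p.split()) for p in text.split("."))
    return [p for p in pieces if p]


def verbatim_fraction(mail_text, source_text):
    """Return the share of mail sentences found verbatim in the source."""
    # Normalise whitespace so line wrapping does not hide a match.
    source = " ".join(source_text.split())
    sentences = split_sentences(mail_text)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if s in source)
    return hits / len(sentences)


# Two sentences quoted above stand in for the full Shaw text.
source = ("My son was called Popsy in his infancy. "
          "We may as well be frank up to the point at which "
          "we should lose money.")
# Hypothetical mail: one Shaw sentence plus one invented sentence.
mail = "My son was called Popsy in his infancy. Buy cheap watches now."

print(verbatim_fraction(mail, source))
```

A value close to 1.0 would suggest that the mail is little more than
a cut-down copy of the source.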

Condensing such a text into spam could be done with this little
Python script:

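import random

# Split the text on full stops and drop empty fragments.
sentences = [s.strip() for s in open("shaw.txt").read().split(".") if s.strip()]
# Print five sentences picked at random (repeats are possible).
for i in range(5):
    print(random.choice(sentences) + ".")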

Here are two haikus I made from Shaw using the above script:

  And a pious man should not marry a
  pious woman: two of a trade never agree.
  Let me sell it for you.
  Modern colossal fortunes have demonstrated its
  vanity.
  THE BURGLAR.
  THE COUNTESS.


  D.
  A little below par: that is all.
  Hallo! [She climbs to the St Pauls platform and
  peers into the cell].
  Just for the romance of it, you know: he meant no harm.
  My father, in fact.

Similar techniques are probably used in mailing list spam, where the
text is built from past posts and thus can pass the Bayes filters of
the subscribers.
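
Why padding with list text could help a spammer can be shown with a
toy word-based Bayes score. This is only a sketch under loud
assumptions: the four training messages are invented, priors are taken
as equal, add-one smoothing is used, and real filters are certainly
more elaborate than this.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences over a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(message.lower().split())
    return counts


def spam_probability(text, spam_counts, ham_counts):
    """Naive Bayes spam score with add-one smoothing and equal priors."""
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    log_odds = 0.0
    for word in text.lower().split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        p_spam = (spam_counts[word] + 1) / spam_total
        p_ham = (ham_counts[word] + 1) / ham_total
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))


# Invented training data: two spam mails, two list posts.
spam_counts = train(["buy cheap pills now", "cheap pills cheap watches"])
ham_counts = train(["granular synthesis patch in pd",
                    "pd patch for granular delay"])

plain = "buy cheap pills"
padded = plain + " granular synthesis patch in pd"  # padded with list text

plain_score = spam_probability(plain, spam_counts, ham_counts)
padded_score = spam_probability(padded, spam_counts, ham_counts)
print(plain_score, padded_score)
```

With this toy data the padded mail scores clearly lower than the
plain one, because every borrowed list word pulls the odds back
towards "not spam".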

Ciao
--
 Frank Barknecht                 _ ______footils.org_ __goto10.org__

---------------------------------------------------------------------
To unsubscribe, e-mail: microsound-unsubscribe@xxxxxxxxxxxxx
For additional commands, e-mail: microsound-help@xxxxxxxxxxxxx
website: http://www.microsound.org




--

http://theradioproject.com
http://perhapsidid.blogspot.com

(((())))(()()((((((((()())))()(((((((())()()())())))
(())))))(()))))))))))))(((((((((((()()))))))))((())))
))(((((((((((())))())))))))))))))))__________
_____())))))(((((((((((((()))))))))))_______
((((((())))))))))))((((((((000)))oOOOOOO
