What and Why?

Eros Errors is an archive of pseudo-erotic texts/stories generated by a multi-layer recurrent neural network (in this case, I’m using char-rnn. Check it out, it’s rad!). The network runs on various machines and is trained on various datasets. I don’t know much about RNNs and I don’t really know what I’m doing, but I’ve always been interested in algorithmically generated content.

The basic premise: the computer is fed a large amount of text, studies it, and eventually learns to spit out similar text. The whole process is loosely modeled on the way a brain works (a cluster of neurons or something). Training takes quite a bit of time, and the computer is constantly checking its “understanding” of the data by scoring its guesses at the next character against the source dataset.
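To make that concrete: char-rnn itself is written in Lua/Torch, but here is a minimal sketch of the same feed/study/compare loop in Python with PyTorch. The filename, network size, and training length are made up for illustration; this is the idea in miniature, not the actual setup.

```python
# A minimal sketch (not char-rnn itself) of char-level training in PyTorch.
# "smut.txt" is a hypothetical stand-in for one of the datasets below.
import torch
import torch.nn as nn

text = open("smut.txt", encoding="utf-8").read()
chars = sorted(set(text))                      # the network's whole "alphabet"
stoi = {c: i for i, c in enumerate(chars)}     # character -> integer id
data = torch.tensor([stoi[c] for c in text])

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=2e-3)
loss_fn = nn.CrossEntropyLoss()
seq_len = 100

for step in range(1000):                       # real training runs far longer
    i = torch.randint(0, len(data) - seq_len - 1, (1,)).item()
    x = data[i:i + seq_len].unsqueeze(0)           # a chunk of the source text
    y = data[i + 1:i + seq_len + 1].unsqueeze(0)   # the same chunk, shifted by one
    logits, _ = model(x)
    # The "checking its understanding" step: score the network's guess
    # at each next character against what the source actually says.
    loss = loss_fn(logits.view(-1, len(chars)), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```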

When the computer is first fed the data, it has no idea what letters are, let alone words or grammar. It learns to recognize patterns. Patterns that will hopefully give you an erection.
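And the spitting-out half, as a sketch that reuses the `model` and `stoi` from the snippet above: generation happens one character at a time, which is exactly why the network never needs to be told what letters or words are. The prime text and temperature here are arbitrary.

```python
# Sampling sketch; assumes `model` and `stoi` from the training snippet above.
import torch

def sample(model, stoi, prime="She ", length=200, temperature=0.8):
    itos = {i: c for c, i in stoi.items()}
    x = torch.tensor([[stoi[c] for c in prime]])
    out, state = list(prime), None
    model.eval()
    with torch.no_grad():
        for _ in range(length):
            logits, state = model(x, state)
            # Turn the last step's scores into a probability distribution
            # over the alphabet, then roll the dice for the next character.
            probs = torch.softmax(logits[0, -1] / temperature, dim=0)
            idx = torch.multinomial(probs, 1).item()
            out.append(itos[idx])
            x = torch.tensor([[idx]])          # feed the choice back in
    return "".join(out)

print(sample(model, stoi))
```

Lower temperatures make the output more conservative and repetitive; higher ones make it weirder.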


The Parts:

Machines One/Two: I have the RNN running on virtual machines/Linux installs on two separate computers. These are the authors; I want to see what their tendencies are.

Dataset One: (4.68 MB/4,912,058 characters of smut) This is the data for the initial tests. It is well-formed and consistent. The results were intriguing and prompted me to create a larger dataset to train on. So far, I’ve found the stories generated from this dataset to be the most hilarious and filthy.

Note: Revisiting this two and a half years later, I must say that all of the datasets have generated quality content.

Dataset Two: (120 MB/126,560,965 characters of smut) This version of the dataset attempts to fix some of the formatting issues present in Dataset Two (Unsanitized). In doing so, paragraph and line breaks have (unfortunately) been all but eliminated.
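I won’t pretend to reconstruct the exact cleanup pass that ate the line breaks, but a blunt whitespace normalization like the hypothetical sketch below would do it: it smooths over inconsistent source formatting and takes the paragraph breaks with it.

```python
# Hypothetical sketch of the kind of cleanup that flattens line breaks.
import re

def sanitize(raw: str) -> str:
    raw = raw.replace("\r\n", "\n")          # normalize Windows line endings
    raw = re.sub(r"[ \t]+", " ", raw)        # squash runs of spaces/tabs
    raw = re.sub(r"\s*\n\s*", " ", raw)      # ...and every newline with them
    return raw.strip()

# "dataset_two_unsanitized.txt" is a made-up filename for illustration.
with open("dataset_two_unsanitized.txt", encoding="utf-8") as f:
    print(sanitize(f.read())[:500])
```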

Dataset Two (Unsanitized): (127 MB/133,713,539 characters of smut) This is a proto-version of Dataset Two from before I attempted to apply some uniform formatting to the text. The results are unruly.

Dataset Three: (13.9 MB/14,618,634 characters of smut) After reviewing the output from Dataset Two, I re-prioritized readability. This dataset is very consistent. The content is allegedly “higher brow” than the previous datasets as well.

Dataset Four: (4.91 MB/5,150,096 characters of smut) This dataset features Victorian-era erotica written by a single author (and told from the author’s perspective). The hope is that the generated content will be more coherent across paragraphs, etc. Plus it’s from the Victorian era, so… classy, right?

Dataset Five: (2.18 MB/2,294,619 characters of smut) Although small, this dataset consists of a collection of shorter amateur/confessions-style stories mined from a popular website. The writing styles and formatting vary; the content is still disgusting.

Dataset Friends: (1.31 MB/1,377,193 characters of smut) While preparing another dataset, I stumbled upon a cache of erotic fanfiction surrounding the sitcom F•R•I•E•N•D•S (Friends). The serendipity of the situation forced my hand: how could I not feed the network swaths of laugh tracks wrapped in comedic camaraderie? The data here is sourced from various internet locales that cater to this sort of thing.


FAQ:

  • Question: Can I get off on this stuff?
    • Yes.
  • Question: Do you get off on this stuff?
    • No.
  • Question: Where do the names of the stories come from?
    • I select a section of generated text from the story that tickles my fancy.
  • Question: I know a lot about this kind of stuff; can I contribute?