Title: t0txt
Date: 2022-05-15
Category: Projects
Summary: Minimal command line pastebin. Allows you to upload text notes from a bash pipe or web browser.
Short: t

[t0txt](https://txt.t0.vc) is a minimalist pastebin. You can upload text notes from the command line with a bash alias, or from a browser through the web form.

You can find the [source code](https://github.com/tannercollin/t0txt) on GitHub.

The pastes you upload take the form of [txt.t0.vc/IMLV](https://txt.t0.vc/IMLV), where each one is identified by a unique ID of four capital letters. This makes the URL easy to memorize while moving it between devices.
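As a rough illustration of how such IDs could be generated, here is a minimal Python sketch. It is not taken from the t0txt source: the function name `new_paste_id` and the `existing_ids` set are hypothetical, and the real service may pick and store IDs differently.

```python
import secrets
import string

ID_LENGTH = 4  # four capital letters, as in txt.t0.vc/IMLV


def new_paste_id(existing_ids):
    """Return a random uppercase ID that is not already in existing_ids.

    Assumption: IDs are drawn uniformly at random and retried on collision.
    The loop would never finish if every possible ID were already taken.
    """
    while True:
        candidate = "".join(
            secrets.choice(string.ascii_uppercase) for _ in range(ID_LENGTH)
        )
        if candidate not in existing_ids:
            return candidate


if __name__ == "__main__":
    print(new_paste_id({"IMLV"}))
```

Four capital letters give 26^4 = 456,976 possible IDs, so collisions become more likely as the database grows and the retry loop does real work.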

I wrote t0txt in July 2019 and plan to continue hosting it indefinitely. I use it quite often for sysadmin and automation work, so I'm committed to keeping it alive. Here's an example use case:

```
$ echo "hello world!" | txt
https://txt.t0.vc/IMLV

$ curl https://txt.t0.vc/IMLV
hello world!
```

## Spam Issue
After running t0txt for a while, I noticed a large number of pastes containing nothing but random links. The service was being hit by backlink spam bots that submit any web form they find, hoping to spread links around. They do this to try to improve the search ranking of their clients' websites. I added a simple CAPTCHA with the question "Who owns this site?" that checks the answer for the substring "tanner". This seems to have eliminated most spam.
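The substring check described above can be sketched in a few lines of Python. This is only an illustration: the function name `passes_captcha` is hypothetical, and ignoring case and surrounding whitespace is my assumption rather than something confirmed by the t0txt source.

```python
def passes_captcha(answer: str) -> bool:
    """Accept any answer to "Who owns this site?" that contains "tanner".

    Assumption: the comparison ignores case and surrounding whitespace.
    """
    return "tanner" in answer.strip().lower()


if __name__ == "__main__":
    print(passes_captcha("Tanner Collin"))
    print(passes_captcha("buy cheap links"))
```

A check this loose would not stop a targeted attacker, but generic form-filling bots have no reason to know the answer.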

I found a lot of txt.t0.vc links around the internet pointing to pastes containing all sorts of spam: cheap pharmaceuticals, blogs, online casinos, porn, and, surprisingly, lots of essay writing services. I wanted to clean these up because I didn't want the URL to be tarnished. I wrote a simple [script](https://github.com/tannercollin/t0txt/blob/master/misc/clean.py) that deletes pastes based on spam words. It deleted 22,500 pastes out of the total 33,000 in the database.

The spam cleaning script iterates over all pastes. If a paste contains a word from the whitelist, it's skipped because it's probably related to one of my projects. If it contains a word from the banned list, it's marked for deletion. If it contains at least two words commonly associated with spam, it's also marked for deletion. A large percentage of the spam was in languages I can't read, so I picked words at random from those pastes, hoping not to get many false positives. Finally, the script counts the occurrences of "http" and compares that count to the number of lines. If both the count and the ratio are above a threshold, the paste is marked for deletion.
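The rules above can be sketched roughly as follows. This is not the linked `clean.py`; it is a simplified illustration in which every word list, both threshold values, and the name `is_spam` are placeholders I made up.

```python
import re

# Hypothetical word lists and thresholds, for illustration only.
WHITELIST = {"t0txt"}
BANNED = {"casino", "viagra"}
SUSPICIOUS = {"cheap", "essay", "pharmacy", "bonus"}
MIN_LINKS = 3             # assumed minimum number of "http" occurrences
MIN_LINKS_PER_LINE = 0.5  # assumed minimum ratio of links to lines


def is_spam(text: str) -> bool:
    """Apply the whitelist, banned-word, spam-word and link-ratio rules in order."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z0-9']+", lowered))

    # Whitelisted pastes are kept no matter what else they contain.
    if words & WHITELIST:
        return False
    # A single banned word is enough to delete.
    if words & BANNED:
        return True
    # Two or more distinct spam-associated words are also enough.
    if len(words & SUSPICIOUS) >= 2:
        return True
    # Otherwise, look at how link-heavy the paste is.
    links = lowered.count("http")
    lines = max(len(text.splitlines()), 1)
    return links >= MIN_LINKS and links / lines >= MIN_LINKS_PER_LINE


if __name__ == "__main__":
    print(is_spam("hello world!"))
    print(is_spam("best online casino"))
```

Checking the whitelist first means a false positive in the later rules can never delete a paste that mentions one of the protected words.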

## Don't Advertise your Pastebin
A pastebin is one of those projects that's not worth advertising. You should keep it within your circle of friends and let it grow by word of mouth and by people seeing links to your pastes. Additional users don't really get you anything except a larger database you have to back up. Advertising it will just bring spam, which tarnishes the reputation of your domain and any subdomains on it.

I regret advertising t0txt, but the cat is already out of the bag, so it doesn't really matter going forward. If you want to keep a pastebin somewhat hidden, it would be interesting to make one where the pastes are served from a different domain than the one used for submission.