IRC Bot Science

Carriage return in title:

https://irc-bot-science.clsr.net/test
Correct responses:

test QUIT :Look at me, I'm an IRC bot with security holes! (ignore this if it's all in one line)
testQUIT :Look at me, I'm an IRC bot with security holes!(ignore this if it's all in one line)

Incorrect responses:

(quitting with the quit message Look at me, I'm an IRC bot with security holes!)

Solution: strip carriage return and newline characters from the title before printing it
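
A minimal sketch of that cleanup in Python. The function name sanitize_title is hypothetical, and collapsing all whitespace runs (not only CR and LF) into single spaces is an assumption that goes slightly beyond the solution above:

```python
def sanitize_title(title: str) -> str:
    """Return the title on a single line, safe to embed in one IRC message."""
    # str.split() with no argument splits on any run of whitespace,
    # which includes "\r" and "\n"; joining with a space keeps words apart.
    return " ".join(title.split())
```

With this, a title containing "test\r\nQUIT :..." is sent as one harmless line instead of being interpreted as a second IRC command.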

Valid but uncommon tag formats:

https://irc-bot-science.clsr.net/tags
Correct responses:

this is a site <title>

Incorrect responses:

(same as in a page without a <title> tag)
this is a site &lt;title&gt; (HTML entities left undecoded)

Solution: use a proper HTML parser or a more robust regex (e.g. something like <title[^>]*>([^<]*)</title\s*> in case-insensitive mode), then decode HTML entities in the result; also see hard mode, which probably requires an HTML parser
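
One possible parser-based approach, sketched with Python's standard html.parser module. The names TitleParser and extract_title are hypothetical, and this is an illustration under those assumptions rather than a reference implementation:

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self) -> None:
        # convert_charrefs=True makes the parser decode entities such as &lt;
        super().__init__(convert_charrefs=True)
        self.in_title = False
        self.done = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE lang="en"> matches too.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html_text: str) -> Optional[str]:
    """Return the decoded title, or None when the page has no usable title."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or None
```

Returning None instead of raising also leaves the caller free to stay silent or print a placeholder when no title is found.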

No <title> tag:

https://irc-bot-science.clsr.net/notitle
Correct responses:

(no response)
no title here
[no title] (or equivalent)

Incorrect responses:

(error message or crashing)

Solution: handle the case where a title tag cannot be found

Long title messages:

https://irc-bot-science.clsr.net/long
Correct responses:

testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttest... (or other truncated length)

Incorrect responses:

(getting flood kicked)

Solution: always truncate the result before sending it in a message (the maximum IRC message size, including source, command, arguments and the trailing CRLF, is 512 bytes, so the maximum usable title length is somewhere around 450 bytes)
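
A sketch of byte-aware truncation in Python. The function name truncate_for_irc and the 400-byte default are assumptions chosen to leave headroom below the limit; the point is that the limit is in bytes, so multi-byte UTF-8 characters must not be cut in half:

```python
def truncate_for_irc(text: str, max_bytes: int = 400, ellipsis: str = "...") -> str:
    """Shorten text so its UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    budget = max_bytes - len(ellipsis.encode("utf-8"))
    # errors="ignore" drops a multi-byte character that was split at the cut.
    return encoded[:budget].decode("utf-8", errors="ignore") + ellipsis
```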

Large file size:

https://irc-bot-science.clsr.net/internet.gz
Correct responses:

(no response)
(any variant of representing the number 1263157894736842240 bytes, filetype application/x-gzip or the filename the-internet.gz)

Incorrect responses:

(stalling forever opening the page)
(saying that the filename is internet.gz)

Solution: if handling non-HTML pages, store the file size in an integer of at least 64 bits, take the filename from the Content-Disposition header, and don't read the page content
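
A sketch of describing a file from response headers alone, using Python's standard email.message module to parse Content-Disposition. The function name describe_file and the example header values are assumptions (the real server's exact headers are not shown here), and how the headers are fetched is left out:

```python
from email.message import Message
from typing import Mapping


def describe_file(headers: Mapping[str, str]) -> str:
    """Summarise a non-HTML response without reading its body."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    # get_filename() reads the filename parameter of Content-Disposition.
    filename = msg.get_filename() or "(unnamed file)"
    raw_length = msg.get("Content-Length", "")
    details = [msg.get_content_type()]
    if raw_length.isdigit():
        # Python integers are arbitrary precision, so huge sizes do not overflow.
        details.append(f"{int(raw_length)} bytes")
    return f"{filename} ({', '.join(details)})"
```

In languages with fixed-width integers the size field would need to be an explicit 64-bit type.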

IP address:

https://irc-bot-science.clsr.net/ip
Correct responses:

IP address: (the bot's IP (v4 or v6) address)
IP address: (something that masks the address)

Incorrect responses:

(none, this is just to show that the bot IP will be publicly accessible)

Solution: there is no viable way to detect any representation of the IP (e.g. if the string from the above link is obscured, use this link instead and base64-decode the result)

CTCP messages:

https://irc-bot-science.clsr.net/ctcp
Correct responses:

ACTION is a shit bot

Incorrect responses:

(performing the CTCP action: * BotNick is a shit bot)

Solution: strip ASCII SOH (byte 0x01) from the start and end of the message or prefix the title with some string; note that CTCP can be abused for more disruptive actions, such as sending VERSION to a channel
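
A sketch of the stripping variant in Python; the function name strip_ctcp is hypothetical:

```python
def strip_ctcp(message: str) -> str:
    """Remove the SOH (0x01) delimiters that mark a message as CTCP."""
    # A message is only treated as CTCP if it starts with 0x01,
    # so removing leading and trailing SOH bytes is enough to defuse it.
    return message.strip("\x01")
```

Prefixing every reply with a fixed string (for example a "Title: " label) achieves the same effect, because the message then no longer begins with 0x01.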

Infinite redirect:

https://irc-bot-science.clsr.net/redirect
Correct responses:

(no response)
[too many redirects] (or equivalent)

Incorrect responses:

(following the redirects forever)

Solution: limit the number of redirects followed; also handle redirects that lead to a different URL each time, not just loops back to the same URL
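
A sketch of a redirect counter in Python. To keep it runnable without network access, the HTTP request is abstracted as a hypothetical fetch callable that returns a status code and the Location header (or None); the name resolve_redirects and the default limit of 5 are also assumptions:

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def resolve_redirects(
    url: str,
    fetch: Callable[[str], Tuple[int, Optional[str]]],
    max_redirects: int = 5,
) -> Optional[str]:
    """Follow redirects up to a fixed count; return None if the limit is hit."""
    for _ in range(max_redirects + 1):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return None
```

Counting hops rather than remembering visited URLs also covers servers that redirect to a fresh URL on every request.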

1 GiB HTML page (but HEAD returns Content-Length: 42):

https://irc-bot-science.clsr.net/fakelength
Correct responses:

(no response)
[page too large] (or equivalent)

Incorrect responses:

(getting OOM killed)
congratulations, didn't OOM (should have stopped reading sooner)

Solution: only read some of the page (e.g. 16 KiB), since most sane pages will have the title at the beginning; also have a timeout in case the page loads too slowly
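
A sketch of a bounded read in Python, written against any binary file-like object so it can be tried without a network. The function name read_limited and the default values are assumptions. Note that the deadline here is only checked between reads, so a real client would also need a socket-level timeout to interrupt a read that blocks:

```python
import time
from typing import BinaryIO, List


def read_limited(
    stream: BinaryIO,
    max_bytes: int = 16 * 1024,
    chunk_size: int = 4096,
    deadline_seconds: float = 10.0,
) -> bytes:
    """Read at most max_bytes from stream, giving up once the deadline passes."""
    deadline = time.monotonic() + deadline_seconds
    chunks: List[bytes] = []
    remaining = max_bytes
    while remaining > 0 and time.monotonic() < deadline:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # end of stream
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The limit is enforced on bytes actually received, so a misleading Content-Length has no effect on it.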

Page with title at the beginning, followed by a gigabyte of data:

https://irc-bot-science.clsr.net/large
Correct responses:

If this title is printed, it works correctly.

Incorrect responses:

(same as in the 1 GiB page; there is a title within the first 81 bytes, no need to read the whole page)

Solution: only read the start of the page (e.g. 16 KiB) and try to find the <title> tag in that, even if it wasn't the whole page
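
A sketch of searching a truncated prefix in Python, using a regex of the kind suggested under the uncommon tag formats test. The function name title_from_prefix is hypothetical, and decoding as UTF-8 with replacement characters is an assumption made so that a multi-byte character cut off at the end of the prefix cannot raise an error:

```python
import html
import re
from typing import Optional

TITLE_RE = re.compile(r"<title[^>]*>([^<]*)</title\s*>", re.IGNORECASE)


def title_from_prefix(prefix: bytes) -> Optional[str]:
    """Find a title in the first bytes of a page, even if the page was cut short."""
    text = prefix.decode("utf-8", errors="replace")
    match = TITLE_RE.search(text)
    if match is None:
        return None
    # Decode entities such as &amp; and trim surrounding whitespace.
    return html.unescape(match.group(1)).strip()
```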

1 GiB of small headers:

https://irc-bot-science.clsr.net/longheaders
Correct responses:

(no response)
[page too large] (or equivalent)

Incorrect responses:

(getting OOM killed)
Reading a gigabyte of headers surely seems like a waste... (should have stopped reading sooner)

Solution: include headers in your timeout and/or read size limit

Extremely long header:

https://irc-bot-science.clsr.net/bigheader
Correct responses:

(no response)
[page too large] (or equivalent)

Incorrect responses:

(getting OOM killed)
Congratulations, you just read a billion digits of pi in a header. (should have stopped reading sooner)

Solution: set your limits on actual data read, not just number of headers
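
A sketch in Python of a byte budget that covers both header cases (many small headers and one huge header). It reads from any binary file-like object; the function name read_headers_limited and the 64 KiB default are assumptions, and real HTTP libraries may already enforce limits of their own:

```python
from typing import BinaryIO, List


def read_headers_limited(stream: BinaryIO, max_header_bytes: int = 64 * 1024) -> List[bytes]:
    """Read header lines until the blank line, refusing to exceed a byte budget."""
    total = 0
    headers: List[bytes] = []
    while True:
        # readline(limit) returns at most `limit` bytes, so a single
        # enormous header line is never loaded into memory in full.
        line = stream.readline(max_header_bytes - total + 1)
        total += len(line)
        if total > max_header_bytes:
            raise ValueError("headers too large")
        if line in (b"\r\n", b"\n", b""):
            return headers
        headers.append(line.rstrip(b"\r\n"))
```

Because the budget counts bytes rather than lines, a limit on the number of headers alone is not needed to stop either attack.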
