Common IRC link title bot vulnerabilities
Carriage return in title:
https://irc-bot-science.clsr.net/test
Correct responses:
test QUIT :Look at me, I'm an IRC bot with security holes! (ignore this if it's all in one line)
testQUIT :Look at me, I'm an IRC bot with security holes!(ignore this if it's all in one line)
Incorrect responses:
- (quitting with the quit message "Look at me, I'm an IRC bot with security holes!")
Solution: strip carriage return and newline characters from the title before printing it
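A minimal Python sketch of that fix. The helper name strip_line_breaks is hypothetical. Replacing CR and LF with a space gives the first correct response above, and deleting them outright would give the second. NUL is included because IRC messages cannot contain it either.

```python
def strip_line_breaks(title: str) -> str:
    """Make a page title safe to place on a single IRC line."""
    # Replace (rather than delete) CR and LF so words on either side of the
    # break don't run together; NUL is also not allowed in IRC messages.
    for ch in ("\r", "\n", "\0"):
        title = title.replace(ch, " ")
    # Collapse the whitespace runs this can leave behind.
    return " ".join(title.split())
```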
Valid but uncommon tag formats:
https://irc-bot-science.clsr.net/tags
Correct responses:
Incorrect responses:
- (same as in a page without a <title> tag)
this is a site <title>
Solution: use a proper HTML parser or a more robust regex (e.g. something like <title[^>]*>([^<]*)</title\s*> in case-insensitive mode), then decode HTML entities in the extracted text; also see hard mode, which probably requires an HTML parser
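A minimal Python sketch of the regex route, using the pattern suggested above with the standard library's re and html modules. The function name extract_title is hypothetical, and as noted, a regex like this will not cover the hard mode cases.

```python
import html
import re
from typing import Optional

# The suggested pattern: allow attributes on <title> and whitespace before
# the closing '>', matched case-insensitively.
TITLE_RE = re.compile(r"<title[^>]*>([^<]*)</title\s*>", re.IGNORECASE)

def extract_title(page: str) -> Optional[str]:
    match = TITLE_RE.search(page)
    if match is None:
        return None
    # Decode entities such as &amp; only after extraction, so an encoded
    # "&lt;" inside the title cannot end the match early.
    return html.unescape(match.group(1)).strip()
```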
No <title> tag:
https://irc-bot-science.clsr.net/notitle
Correct responses:
- (no response)
no title here
[no title]
(or equivalent)
Incorrect responses:
- (error message or crashing)
Solution: handle the case where a title tag cannot be found
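A minimal Python sketch, assuming a regex-based extractor like the one suggested for uncommon tag formats. The names reply_for and silent are hypothetical; both return values correspond to correct responses listed above.

```python
import re
from typing import Optional

TITLE_RE = re.compile(r"<title[^>]*>([^<]*)</title\s*>", re.IGNORECASE)

def reply_for(page: str, silent: bool = False) -> Optional[str]:
    # Writing TITLE_RE.search(page).group(1) directly is a common crash:
    # search() returns None when there is no match, so .group raises
    # AttributeError. Check the match before using it.
    match = TITLE_RE.search(page)
    title = match.group(1).strip() if match else ""
    if not title:
        # Staying quiet and saying "[no title]" are both acceptable.
        return None if silent else "[no title]"
    return title
```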
Long title messages:
https://irc-bot-science.clsr.net/long
Correct responses:
testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttest...
(or other truncated length)
Incorrect responses:
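Solution: always truncate the result before sending it in a message (the maximum IRC message size, including the source, command, arguments and trailing CR LF, is 512 bytes, so the maximum title length is somewhere around 450 bytes); the limit is in bytes, not characters, so multi-byte UTF-8 titles need to be measured after encoding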
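A minimal Python sketch of byte-based truncation, assuming the bot sends UTF-8 and using the rough 450-byte figure above. The names truncate_utf8 and MAX_TITLE_BYTES are hypothetical; a bot with a long nick, hostname or channel name should budget less.

```python
MAX_TITLE_BYTES = 450  # rough figure from the text; lower it for long prefixes

def truncate_utf8(text: str, max_bytes: int = MAX_TITLE_BYTES, ellipsis: str = "...") -> str:
    """Shorten text so its UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    budget = max(0, max_bytes - len(ellipsis.encode("utf-8")))
    # Slicing bytes can split a multi-byte character; errors="ignore" drops
    # the incomplete sequence left at the cut.
    return encoded[:budget].decode("utf-8", errors="ignore") + ellipsis
```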
Large file size:
https://irc-bot-science.clsr.net/internet.gz
Correct responses:
- (no response)
- (any variant of representing the number 1263157894736842240 bytes, filetype application/x-gzip or the filename the-internet.gz)
Incorrect responses:
- (stalling forever opening the page)
- (saying that the filename is internet.gz)
Solution: if handling non-HTML pages, use at least 64-bit integers to store the file size, take the filename from the Content-Disposition header rather than the URL, and don't read the page content
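A minimal Python sketch that builds a reply from the response headers alone. It relies on the standard email.message.Message API (get_filename reads the filename parameter of Content-Disposition). The helper names describe_file and human_size, and the output format, are hypothetical.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit

def human_size(n: int) -> str:
    """Format a byte count using binary units."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB"):
        if value < 1024:
            return f"{n} B" if unit == "B" else f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} EiB"

def describe_file(headers: Message, url: str) -> str:
    """Summarise a non-HTML response from its headers, without reading the body."""
    # The server's name from Content-Disposition wins; the URL path is only a
    # fallback (for the test link it would wrongly give "internet.gz").
    name = headers.get_filename() or posixpath.basename(unquote(urlsplit(url).path))
    ctype = headers.get_content_type()
    length = headers.get("Content-Length", "")
    # Python ints never overflow; in C, Java or Go use a 64-bit type, since
    # the test file's size does not fit in 32 bits.
    size = human_size(int(length)) if length.isdigit() else "unknown size"
    return f"{name} ({ctype}, {size})"
```

With urllib.request, the response's .headers attribute is an http.client.HTTPMessage, which subclasses email.message.Message, so it can be passed in directly and the response closed without calling .read().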
IP address:
https://irc-bot-science.clsr.net/ip
Correct responses:
IP address: (the bot's IP (v4 or v6) address)
IP address: (something that masks the address)
Incorrect responses:
- (none, this is just to show that the bot IP will be publicly accessible)
Solution: there is no viable way to detect every representation of the IP (e.g. if the string from the above link is obscured, use this link instead and base64-decode the result)
CTCP messages:
https://irc-bot-science.clsr.net/ctcp
Correct responses:
Incorrect responses:
- performing the CTCP action
* BotNick is a shit bot
Solution: strip ASCII SOH (byte 0x01) from the start and end of the message or prefix the title with some string; note that CTCP can be abused for more disruptive actions, such as sending VERSION to a channel
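A minimal Python sketch that applies both suggested defences at once. It removes every SOH byte rather than only the leading and trailing ones, which is a deliberately stricter choice. The function name neutralize_ctcp and the "Title: " prefix are hypothetical.

```python
SOH = "\x01"  # CTCP delimiter byte

def neutralize_ctcp(title: str, prefix: str = "Title: ") -> str:
    # Drop every SOH, not just at the ends, so no part of the title can be
    # read as a CTCP request...
    cleaned = title.replace(SOH, "")
    # ...and never let raw page text start the message.
    return prefix + cleaned
```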
1 GiB HTML page (but HEAD returns Content-Length: 42):
https://irc-bot-science.clsr.net/fakelength
Correct responses:
- (no response)
[page too large]
(or equivalent)
Incorrect responses:
- (getting OOM killed)
congratulations, didn't OOM
(should have stopped reading sooner)
Solution: only read some of the page (e.g. 16 KiB), since most sane pages will have the title at the beginning; also have a timeout in case the page loads too slowly
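A minimal Python sketch using urllib.request. The 16 KiB cap comes from the text and the 5-second timeout is an assumed value. The names read_prefix and fetch_prefix are hypothetical. The loop ignores Content-Length entirely, so a lying HEAD response makes no difference.

```python
import urllib.request
from typing import BinaryIO

MAX_BYTES = 16 * 1024  # read at most 16 KiB
TIMEOUT = 5.0          # seconds; an assumed value

def read_prefix(stream: BinaryIO, limit: int = MAX_BYTES) -> bytes:
    """Read at most `limit` bytes, whatever the server claims the length is."""
    chunks = []
    remaining = limit
    while remaining > 0:
        chunk = stream.read(min(remaining, 4096))
        if not chunk:  # end of body
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def fetch_prefix(url: str) -> bytes:
    # timeout= applies to each blocking socket operation (connect, each
    # read), not to the whole download, so a server trickling bytes slowly
    # can still hold the bot; an overall deadline may also be needed.
    with urllib.request.urlopen(url, timeout=TIMEOUT) as resp:
        return read_prefix(resp)
```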
Page with title at the beginning, followed by a gigabyte of data:
https://irc-bot-science.clsr.net/large
Correct responses:
If this title is printed, it works correctly.
Incorrect responses:
- (same as in the 1 GiB page; there is a title within the first 81 bytes, no need to read the whole page)
Solution: only read the start of the page (e.g. 16 KiB) and try to find the <title> tag in that, even if it wasn't the whole page
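A minimal Python sketch that searches only the bytes already read, assuming the page is UTF-8 and reusing the regex suggested for uncommon tag formats, here as a bytes pattern. The name title_from_prefix is hypothetical. A truncated page is treated as normal, not as an error.

```python
import html
import re
from typing import Optional

TITLE_RE = re.compile(rb"<title[^>]*>([^<]*)</title\s*>", re.IGNORECASE)

def title_from_prefix(prefix: bytes) -> Optional[str]:
    """Find the title in the first few KiB of a page, if it is there."""
    match = TITLE_RE.search(prefix)
    if match is None:
        return None  # no title in the part we read; don't fetch more
    # Assume UTF-8; errors="replace" keeps a stray byte from raising.
    text = match.group(1).decode("utf-8", errors="replace")
    return html.unescape(text).strip()
```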
1 GiB of small headers:
https://irc-bot-science.clsr.net/longheaders
Correct responses:
- (no response)
[page too large]
(or equivalent)
Incorrect responses:
- (getting OOM killed)
Reading a gigabyte of headers surely seems like a waste...
(should have stopped reading sooner)
Solution: include headers in your timeout and/or read size limit
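A minimal Python sketch for a bot that reads raw HTTP from a socket. It counts the status line and headers against the same byte budget and wall-clock deadline as the body. The names read_head and TooLarge and both limits are hypothetical. A blocking read can still overrun the deadline, so the socket should also have a timeout (socket.settimeout).

```python
import time
from typing import BinaryIO, Tuple

class TooLarge(Exception):
    """Raised when the response head exceeds the byte or time budget."""

def read_head(stream: BinaryIO, max_bytes: int = 16 * 1024,
              deadline_s: float = 5.0) -> Tuple[bytes, bytes]:
    """Return (headers, start of body), enforcing one budget for both."""
    start = time.monotonic()
    buf = bytearray()
    while b"\r\n\r\n" not in buf:
        if len(buf) >= max_bytes:
            raise TooLarge("headers exceed byte budget")
        if time.monotonic() - start > deadline_s:
            raise TooLarge("headers took too long")
        chunk = stream.read(min(1024, max_bytes - len(buf)))
        if not chunk:
            break  # connection closed before the blank line
    # (loop body continues below)
        buf += chunk
    head, _, rest = bytes(buf).partition(b"\r\n\r\n")
    return head, rest
```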
Extremely long header:
https://irc-bot-science.clsr.net/bigheader
Correct responses:
- (no response)
[page too large]
(or equivalent)
Incorrect responses:
- (getting OOM killed)
Congratulations, you just read a billion digits of pi in a header.
(should have stopped reading sooner)
Solution: set your limits on actual data read, not just number of headers
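A minimal Python sketch of limits on bytes rather than header count. readline(size) stops after size bytes even without a newline, so one enormous header is never buffered whole. The name read_header_lines and both caps are hypothetical. Some HTTP libraries apply similar limits internally, but a bot that parses the response itself needs its own.

```python
from typing import BinaryIO, List

MAX_LINE = 8 * 1024    # assumed cap on a single header line
MAX_TOTAL = 16 * 1024  # assumed cap on the whole header block

def read_header_lines(stream: BinaryIO) -> List[bytes]:
    lines: List[bytes] = []
    total = 0
    while True:
        # Ask for one byte more than allowed so an over-long line is detectable.
        line = stream.readline(MAX_LINE + 1)
        if len(line) > MAX_LINE:
            raise ValueError("header line too long")
        total += len(line)
        if total > MAX_TOTAL:
            raise ValueError("header block too large")
        if line in (b"\r\n", b"\n", b""):
            return lines  # blank line (or EOF) ends the headers
        lines.append(line.rstrip(b"\r\n"))
```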
Compose a <title> message: