Common IRC link title bot vulnerabilities

Carriage return in title:

https://irc-bot-science.clsr.net/test
Solution: strip carriage return and newline characters from the title before printing it. IRC messages are delimited by CR LF, so anything after a line break in the title would be sent to the server as a separate command.
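A minimal sketch of that stripping step, assuming the title is already a decoded Python string. The function name sanitize_title is made up for illustration; it replaces each run of CR/LF with a single space so that words on either side do not get glued together.

```python
import re

# One or more carriage return / line feed characters in a row.
_LINE_BREAKS = re.compile(r"[\r\n]+")


def sanitize_title(title: str) -> str:
    """Replace CR/LF runs with a space so the title stays on one IRC line."""
    return _LINE_BREAKS.sub(" ", title).strip()
```

With this, a title such as "Hello\r\nQUIT :bye" is sent as the single line "Hello QUIT :bye" instead of a message followed by a QUIT command.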

Valid but uncommon tag formats:

https://irc-bot-science.clsr.net/tags
Solution: use a proper HTML parser or a more robust regex (e.g. something like <title[^>]*>([^<]*)</title\s*> in case-insensitive mode), then decode the HTML entities in the result; also see hard mode, which probably requires an HTML parser
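The regex route could look roughly like the sketch below. It uses the pattern quoted above together with html.unescape from the Python standard library; the function name extract_title is hypothetical. It returns None when nothing matches, so the caller has to decide what to do when there is no title. It is not a substitute for a real HTML parser.

```python
import html
import re
from typing import Optional

# The pattern suggested above, compiled in case-insensitive mode.
_TITLE_RE = re.compile(r"<title[^>]*>([^<]*)</title\s*>", re.IGNORECASE)


def extract_title(page: str) -> Optional[str]:
    """Return the decoded, whitespace-normalised title, or None if absent."""
    match = _TITLE_RE.search(page)
    if match is None:
        return None
    # Decode entities such as &amp; and collapse runs of whitespace.
    title = " ".join(html.unescape(match.group(1)).split())
    return title or None
```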

No <title> tag:

https://irc-bot-science.clsr.net/notitle
Solution: handle the case where no title tag can be found

Long title messages:

https://irc-bot-science.clsr.net/long
Solution: always truncate the result before sending it in a message (the maximum IRC message size, including source, command and arguments, is 512 bytes, so the maximum title length is somewhere around 450 bytes)
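A sketch of byte-based truncation, assuming the bot sends UTF-8. The name truncate_utf8 and the default budget of 400 bytes are assumptions chosen to leave room for the source prefix and the command; the right figure depends on the bot's nick, host and target channel.

```python
def truncate_utf8(text: str, max_bytes: int = 400, ellipsis: str = "...") -> str:
    """Cut text so its UTF-8 encoding fits in max_bytes, marking the cut."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Reserve room for the ellipsis (assumes max_bytes is larger than it).
    budget = max_bytes - len(ellipsis.encode("utf-8"))
    # errors="ignore" drops a multi-byte character that was cut in half.
    return encoded[:budget].decode("utf-8", errors="ignore") + ellipsis
```

Counting bytes rather than characters matters because the 512-byte limit applies to the encoded message, and a title made of multi-byte characters can be several times longer in bytes than in characters.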

Large file size:

https://irc-bot-science.clsr.net/internet.gz
Solution: if handling non-HTML pages, store the file size in an integer of at least 64 bits, get the filename from the Content-Disposition header, and don't read the page content
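As a sketch, a file description can be built from the response headers alone. Python integers are arbitrary precision, so the 64-bit concern does not arise here; in a language with fixed-width integers the size would need a 64-bit type. The names describe_file and format_size are hypothetical, and the headers are assumed to arrive as a plain name-to-value mapping. Header parsing is borrowed from email.message.Message in the standard library.

```python
from email.message import Message
from typing import Mapping


def format_size(num_bytes: int) -> str:
    """Render a byte count with binary units, e.g. 1.0 TiB."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} {units[-1]}"  # not reached; keeps type checkers happy


def describe_file(headers: Mapping[str, str]) -> str:
    """Describe a non-HTML response using only its headers, never its body."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    # get_filename() reads the filename parameter of Content-Disposition.
    filename = msg.get_filename() or "unnamed file"
    try:
        size = int(msg.get("Content-Length", ""))
    except ValueError:
        size = -1
    size_text = format_size(size) if size >= 0 else "unknown size"
    return f"{filename} ({size_text})"
```

The filename comes from the server, so it should go through the same sanitising and truncation as a title before being sent to a channel.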

IP address:

https://irc-bot-science.clsr.net/ip
Solution: there is no viable way to detect every possible representation of the IP (e.g. if the string from the above link is obscured, use this link instead and base64-decode the result)

CTCP messages:

https://irc-bot-science.clsr.net/ctcp
Solution: strip ASCII SOH (byte 0x01) from the start and end of the message, or prefix the title with some string; note that CTCP can be abused for more disruptive actions, such as sending VERSION to a channel
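Both mitigations can be combined, as in the sketch below. The function name make_reply and the prefix "Title: " are made up for illustration. It removes every 0x01 byte rather than only the leading and trailing ones, which is slightly stricter than the suggestion above.

```python
CTCP_DELIM = "\x01"  # ASCII SOH, the CTCP message delimiter


def make_reply(title: str, prefix: str = "Title: ") -> str:
    """Build the outgoing text so it can never start with a CTCP delimiter."""
    return prefix + title.replace(CTCP_DELIM, "")
```

Because the reply always starts with the prefix, clients will not interpret it as a CTCP request even if some other control character slips through.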

Infinite redirect:

https://irc-bot-science.clsr.net/redirect
Solution: have a limit on the number of redirects followed; also handle redirects that go to a different URL each time, since checking for a repeated URL alone will not stop them
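A sketch of redirect handling with both a hop limit and loop detection. To keep it independent of any particular HTTP library, the actual request is delegated to a caller-supplied function get_location, which is an assumption of this example: it takes a URL and returns the Location header value, or None when the response is not a redirect. The names resolve_redirects and TooManyRedirects, and the limit of 5, are also hypothetical.

```python
from typing import Callable, Optional
from urllib.parse import urljoin


class TooManyRedirects(Exception):
    """Raised when the hop limit is exceeded or a loop is detected."""


def resolve_redirects(
    url: str,
    get_location: Callable[[str], Optional[str]],
    max_redirects: int = 5,
) -> str:
    """Follow redirects from url and return the final, non-redirecting URL."""
    seen = {url}
    hops = 0
    while True:
        location = get_location(url)
        if location is None:
            return url
        hops += 1
        if hops > max_redirects:
            raise TooManyRedirects(f"more than {max_redirects} redirects")
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        if url in seen:
            raise TooManyRedirects(f"redirect loop at {url}")
        seen.add(url)
```

The seen set catches short loops early, while the hop counter is what stops a server that generates a fresh URL on every hop.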

1 GiB HTML page (but HEAD returns Content-Length: 42):

https://irc-bot-science.clsr.net/fakelength
Solution: only read part of the page (e.g. 16 KiB) regardless of the advertised Content-Length, since most sane pages have the title near the beginning; also set a timeout in case the page loads too slowly
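A sketch of a capped read, assuming the response body is available as a binary file-like object (for example a socket file or a streaming HTTP response). The name read_prefix and the 5 second default are assumptions. The deadline is only checked between reads, so a per-read socket timeout is still needed to bound a single stalled read.

```python
import time
from typing import BinaryIO


def read_prefix(
    stream: BinaryIO,
    max_bytes: int = 16 * 1024,
    timeout: float = 5.0,
    chunk_size: int = 4096,
) -> bytes:
    """Read at most max_bytes from stream, giving up once timeout has passed."""
    deadline = time.monotonic() + timeout
    buf = bytearray()
    while len(buf) < max_bytes and time.monotonic() < deadline:
        # Never ask for more than the remaining budget.
        chunk = stream.read(min(chunk_size, max_bytes - len(buf)))
        if not chunk:
            break  # end of stream
        buf += chunk
    return bytes(buf)
```

The function never consults Content-Length, so a server that lies about the length cannot make the bot read more than the cap.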

Page with title at the beginning, followed by a gigabyte of data:

https://irc-bot-science.clsr.net/large
Solution: only read the start of the page (e.g. 16 KiB) and try to find the <title> tag in that, even if it is not the whole page

1 GiB of small headers:

https://irc-bot-science.clsr.net/longheaders
Solution: include headers in your timeout and/or read size limit

Extremely long header:

https://irc-bot-science.clsr.net/bigheader
Solution: set your limits on the actual amount of data read, not just on the number of headers
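For bots that parse responses themselves, a byte budget over the whole header block covers both the many-small-headers case and the single-huge-header case. The sketch below assumes a binary file-like stream positioned at the start of the response; the names read_header_lines and HeadersTooLarge and the 32 KiB budget are made up for illustration. It does not enforce a time limit, which would still need a socket timeout.

```python
from typing import BinaryIO, List


class HeadersTooLarge(Exception):
    """Raised when the status line plus headers exceed the byte budget."""


def read_header_lines(stream: BinaryIO, max_total_bytes: int = 32 * 1024) -> List[bytes]:
    """Read the status line and headers up to the blank line, within a budget."""
    lines: List[bytes] = []
    remaining = max_total_bytes
    while True:
        # readline(limit) returns at most limit bytes, so a single enormous
        # header is cut off instead of being buffered in full.
        line = stream.readline(remaining + 1)
        remaining -= len(line)
        if remaining < 0:
            raise HeadersTooLarge(f"headers exceed {max_total_bytes} bytes")
        if line in (b"\r\n", b"\n", b""):
            return lines  # blank line or end of stream ends the header block
        lines.append(line.rstrip(b"\r\n"))
```

Counting bytes instead of header lines means the limit is hit at the same point whether the server sends one 1 GiB header or millions of tiny ones.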

Compose a <title> message: