I’ve spent the last 3 months building a crawler to index the public parts of Telegram (https://telehunt.org). The native search is essentially a black box that favors the top 0.1% of bots, leaving the rest almost invisible.
The Tech: I had to deal with rate limits and the lack of a global 'sitemap'. I’m currently using a hybrid approach of metadata scraping to keep the index fresh.
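For anyone curious what dealing with rate limits tends to look like in a crawler, here is a rough sketch of a retry loop with exponential backoff. It is not the actual TeleHunt code: `fetch_with_backoff`, `RateLimited`, and the `retry_after` field are hypothetical names, and the `fetch` callable stands in for whatever client really talks to the network.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a fetcher raises when the server says to slow down."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        # Seconds the server asked us to wait, if it told us (assumed field).
        self.retry_after = retry_after


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch(url), retrying with backoff whenever it raises RateLimited.

    `fetch` and `sleep` are injected so the loop can be exercised offline.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller decide what to do
            if err.retry_after is not None:
                # Prefer the server's own hint when one is given.
                delay = err.retry_after
            else:
                # Otherwise double the wait each attempt, capped, plus jitter
                # so parallel workers don't all retry at the same instant.
                delay = min(base_delay * 2 ** attempt, max_delay)
                delay += random.uniform(0, delay / 2)
            sleep(delay)
```

Passing `sleep` in as a parameter is only a convenience for testing; a real crawler would likely also persist per-endpoint cooldowns so that restarts don't immediately hit the limit again.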
The Goal: It’s an experiment in making 'un-indexable' bot data discoverable.
You may be overestimating the number of bots that meaningfully exist. The vast majority of bots (and public channels) on the platform are nonfunctional and/or spam.