Thu. Jul 10th, 2025


Deleting robots.txt slowed down my computer - Help

4 Posts
2 Users
2 Reactions
6 Views
Bramble Bunny
(@bramble-bunny)
Posts: 2
New Member
Topic starter
 

I'd like some help setting up my robots.txt file. I accidentally deleted mine, and for some reason my system started running so slowly that it became unusable. Can deleting the robots.txt file do that?

 
Posted : 08/07/2025 8:18 pm
rjohnson reacted
(@rjohnson)
Posts: 4
Member Admin
 

Ah yes, this one is a classic. It would help to know more about your system. What version of Ubuntu are you running? I'll assume you're running an Apache 2 web server; please let us know if that's not the case. Is this your machine at home? A VPS maybe? God forbid, a work machine?

Either way, the answer is yes. If you remove your robots.txt, all of a sudden web spiders and crawlers assume they have unrestricted access to crawl your entire site. Legitimate indexers will respect a robots.txt, but these days there's also an explosion of AI bots rampant on the Internet, plus bad actors attempting to hoover up any bit of information they can.

When you removed your robots.txt, nothing was left to restrain the spiders/crawlers indexing your site. A lot of the automated crawlers will send spiders from multiple machines all over the Internet, plus there are bad actors who don't care about robots.txt attempting to "index" your system anyway.

Depending on the size of the system in question, it's possible the flood of traffic can slow it down with high system load as bots furiously attempt to locate your data. This is more of a problem on small VPSes with minimal cores and memory.

For example, if you have a single vCPU (or CPU) and 1 gigabyte of memory, it's very possible, even likely, that the bots will overwhelm your system with network traffic to the point it becomes very slow to unusable.

The good news is that once they've got their hands on all the information they can get, they will back off and return to a reasonable polling period, which will bring things back to normal.

If you have a larger system, say 4 vCPU cores and 8 gigabytes of memory, chances are it will see a bump in system load but should handle it easily. Eventually the storm of bot traffic will calm down and return to a reasonable polling period.
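If you want to confirm that bot traffic is actually what's loading the box, here's a quick sketch. It assumes Ubuntu's default Apache log location (/var/log/apache2/access.log); adjust the path if your setup differs:

```shell
# Compare the load averages against the number of cores:
# sustained load above the core count means the box is saturated
uptime
nproc

# Tally requests per client IP in the Apache access log, busiest first
# (default Ubuntu path assumed; adjust if yours differs)
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head
```

If the top of that list is a handful of IPs each with thousands of hits, you've found your bots.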

If you really want to see traffic on the order of a DDoS (Distributed Denial of Service) attack, create a robots.txt that flat out disallows ALL crawlers:

# robots.txt
# disallow everyone
User-agent: *
Disallow: /

This tells all crawlers that they are not welcome. This is akin to completely removing the robots.txt file.

So, what can you do about it? That really depends. If you're serving your site from a home system behind a decent firewall/router, you have nothing to worry about unless your system is exposed directly to the Internet. More likely than not, the firewall will take the brunt of the connection traffic and silently drop those connections, or they'll eventually time out.

If you have a VPS directly connected to the Internet with a public IP address, get as many vCPUs (or CPUs, for that matter) and as much memory as you can to minimize the effect of the traffic storm spiders/crawlers create. Another option is to create a virtual interface on your Ubuntu system, set it to a non-routable address (like the 192.168.0.x or 10.0.x.x ranges), and then forward your Internet traffic through the public interface.

Unfortunately, we can't be more specific at the moment, as we'd need to know more about your system: the hardware, the type of Internet connection, and what software you're running that might attract the traffic, like a web server.

Tell us a little more about your setup and we can give you specific commands, procedures, and a sane robots.txt file. There's not much that can be done with micro VPS instances or old computers, but quite a bit can be done if your system is decently sized.

If you really want to see those spiders go nuts, find an Internet list with all the IP addresses of the crawlers/spiders and firewall them; then you'll really see them go crazy with traffic. Some of them will throw hundreds of crawlers/spiders at your site, which is SURE to cause problems.
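For reference, firewalling a crawler's address range with ufw looks like this. The 203.0.113.0/24 range here is just a documentation example; substitute the real range from the list you found:

```shell
# Block a whole crawler address range (203.0.113.0/24 is a placeholder example)
sudo ufw deny from 203.0.113.0/24

# Review the rules currently in effect
sudo ufw status numbered
```

But as noted above, expect the blocked bots to come back harder from other addresses.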

Let us know more about your setup and thanks for the question.

 
Posted : 08/07/2025 8:49 pm
Bramble Bunny
(@bramble-bunny)
Posts: 2
New Member
Topic starter
 

Thanks! I did some further research and the information I found was just about the same as your reply. Here are my system details:

VPS with 2 vCPU cores and 4 gigabytes of RAM
80 gigabyte hard drive space
Apache 2.4 and WordPress

I run the Apache 2.x web server, SSH, and SMTP on ports 25 and 587, plus a few other ports. The other ports are blocked by a hardware firewall and only accessible from my home computer. Is this enough information to do something?

 
Posted : 08/07/2025 9:10 pm
(@rjohnson)
Posts: 4
Member Admin
 

Okay I can work with that. I have similar setups myself and will show you what I use.

robots.txt

User-agent: *
Disallow: /wp-admin/
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
Disallow: /*.php$
Allow: /wp-admin/admin-ajax.php
Crawl-delay: 30
Sitemap: https://zettabytes.org/sitemap.xml

I basically allow whoever wants to spider/crawl my site, but I blacklist a few directories and the PHP code, as those are useless for indexing purposes. The Crawl-delay is more of a suggestion, as spiders/crawlers can ignore it. The admin-ajax.php file is allowed because it provides essential WordPress functionality: dynamic updates without a browser refresh, handling of logged-in as well as anonymous users, auto-saving, post locking, and much more. It's already publicly available, so listing it presents no additional security disadvantage.

The sitemap.xml provides an index so the spiders/crawlers can do their thing efficiently.
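Once your robots.txt is in place, it's worth sanity-checking what crawlers actually see by fetching it the same way they do (substitute your own domain for mine):

```shell
# Fetch the live robots.txt exactly as a crawler would (substitute your domain)
curl -s https://zettabytes.org/robots.txt
```

If that prints anything other than the file you wrote (a 404 page, a redirect to HTML, etc.), the crawlers aren't seeing your rules either.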

If there's a particular spider/crawler you want to block:

User-agent: FacebookBot
Disallow: /

Just add that pair of lines for each bot you want to block. Keep in mind, however, that malicious bots can simply ignore robots.txt. You'll need to know the bot's User-agent name, which can be found easily with a Google search.
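You can also pull the User-agent names straight out of your own access log rather than guessing. Assuming Apache's combined log format (the Ubuntu default), the agent string is the sixth quote-delimited field:

```shell
# List the most frequent User-agent strings in the access log, busiest first
# (combined log format assumed; the agent is the 6th "-delimited field)
awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head
```

Anything near the top of that list that isn't a browser is a candidate for a Disallow block.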

Are you actually receiving mail over the SMTP and submission ports? If you're only sending mail, those ports do not need to be open. If you're using ufw (Ubuntu's Uncomplicated Firewall), you can block them with:

sudo ufw deny 25/tcp
sudo ufw deny 587/tcp

Or you can add the equivalent rules to your hardware firewall if you're not using ufw.
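Either way, you can confirm whether anything is still listening on those ports afterwards. A quick check using ss (part of iproute2, installed by default on Ubuntu):

```shell
# Show any TCP listener bound to port 25 or 587;
# no output means nothing is listening on those ports
ss -tln | awk '$4 ~ /:(25|587)$/'
```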

You can get a free sitemap.xml from https://www.xml-sitemaps.com/

 
Posted : 09/07/2025 7:33 am