So you are getting an error 404 or, perhaps even worse, a 500!
The error in your Apache2 logs looks something like this:
***.162.245.*** - - [03/Apr/2020:12:49:50 +0000] "GET /robots.txt HTTP/1.1" 404 89670 "-" "Mozilla/5.0 (compatible; SomeUserAgent/2.1; +https://example.com)"
In a perfect world, you’d have only a single site/domain on this host, so you would know that the robots.txt file should reside in the Apache root serving directory.
However, I just happen to have (as you might too) a ton of VirtualHosts on this machine, so I am not sure which site's robots.txt file is missing. The log line above does not say which VirtualHost handled the request.
First steps to see
The very first thing you should do is check the output of apachectl -S.
This lists every configured VirtualHost along with the config file that defines it, and shows how things are set up in general.
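On a machine with many VirtualHosts that output gets long, so it can help to trim it down to just the site names and their config files. The following is a minimal sketch: list_vhosts is a hypothetical helper name, and it assumes the usual "port 80 namevhost example.com (/path/to/site.conf:1)" layout of the name-based entries. Hosts that are listed in a different layout (for example a single VirtualHost on its own address) will not be picked up.

```shell
# Hypothetical helper: reads `apachectl -S` output on stdin and prints
# "<server name> (<config file>:<line>)" for every name-based VirtualHost.
# Assumes lines shaped like: port 80 namevhost example.com (/path/site.conf:1)
list_vhosts() {
  awk '/namevhost/ { print $4, $5 }'
}
```

Run it as: sudo apachectl -S 2>&1 | list_vhosts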
Adding debugging
Create a file:
/etc/apache2/conf-available/temp_debug.conf
Put this config in it:
LogLevel trace4
GlobalLog ${APACHE_LOG_DIR}/debug.log "%v:%p %h %l %u %t \"%r\" %>s %O file=%f"
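In this format, %v is the name of the VirtualHost that served the request and %f is the file path Apache resolved for it, which are the two fields that identify the missing file.
You can read more about Apache log formatting in the mod_log_config documentation if you need more or different output.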
Enabling and Disabling
Now you just need to enable it, so run this from the command line:
sudo a2enconf temp_debug && sudo apachectl graceful
You can now watch the new log entries as they arrive with:
tail -f -n100 /var/log/apache2/debug.log
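With trace4 enabled the log fills up quickly, so filtering for the request you care about saves some scrolling. Below is a rough sketch that assumes the exact GlobalLog format shown above and file paths without spaces; find_robots is a hypothetical helper name, not something Apache provides.

```shell
# Hypothetical helper: reads the debug log on stdin and prints
# "<vhost:port> <status> file=<resolved path>" for each robots.txt request.
# Field positions assume the format:
#   %v:%p %h %l %u %t "%r" %>s %O file=%f
# so $1 is the vhost, $8 the request path, $10 the status, $NF the file= field.
find_robots() {
  awk '$8 ~ /^\/robots\.txt/ { print $1, $10, $NF }'
}
```

Run it as: find_robots < /var/log/apache2/debug.log
The first column tells you which VirtualHost answered, and the last one shows where Apache looked for the file.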
When you’re done and have resolved the problem, you can disable the debug logging again (LogLevel trace4 is very verbose) by doing the following:
sudo a2disconf temp_debug && sudo apachectl graceful
Reloading Apache
Remember to reload Apache after both enabling and disabling the configuration. The apachectl graceful in the commands above already does this; if you skipped it, you can reload with:
systemctl reload apache2