#6124 closed enhancement (wontfix)
Trac should ship with a default robots.txt file
Reported by: | freddie@… | Owned by: | Jonas Borgström
---|---|---|---
Priority: | normal | Milestone: |
Component: | general | Version: |
Severity: | normal | Keywords: | robots crawler robots.txt
Cc: | ilias@… | Branch: |
Release Notes: | | |
API Changes: | | |
Internal Changes: | | |
Description
It would be convenient if Trac shipped with a robots.txt file out of the box, designed to stop search engines from indexing every possible page/revision/log combination. Googlebot, for example, will on its first pass attempt to view and index every possible page on a site, which, due to the GET-query nature of Trac, means it can easily make 40,000+ requests while attempting to index a single site.
Therefore, to save administrators the hassle of, firstly, fielding many thousands of (mostly unnecessary) bot requests and, secondly, formulating their own robots.txt file, it would be a wise move to ship one that prevents bots from fetching diffs and old source revisions (which are unlikely to ever make it into the index anyway).
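For illustration only, such a default file might look something like the sketch below; the disallowed prefixes assume Trac's standard URL layout at the root of the site, and are a guess at what is meant here rather than any attached or agreed-upon file:
```
User-agent: *
Disallow: /changeset
Disallow: /log
Disallow: /browser
Disallow: /search
Disallow: /newticket
```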
Attachments (1)
Change History (16)
comment:1 by , 17 years ago
comment:2 by , 17 years ago
Cc: | ilias@… added
---|---
comment:3 by , 17 years ago
I've uploaded an example robots.txt file; naturally it wouldn't be enabled by default (hence the .default suffix). If users want a robots.txt they would at least know what the format is, and can simply rename the file to the correct name in the correct folder.
I think the better way would be to add a rule in the conf/trac.ini file for each component. For example:
```
[ticket]
default_component = unassigned
indexing = disabled
```
would prevent /newticket and /ticket/ from being indexed by search engines.
The downside would be that the user couldn't specify which search engines may index each folder (but arguably, anyone who wants that much granularity will know how to use robots.txt).
comment:4 by , 17 years ago
Keywords: | crawler robots.txt added
---|---
I would love that feature too. I have many public projects, so Googlebot kills my performance because it is constantly requesting zip files of changes.
It would be nice if such an option in the INI file would add a
```
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
```
line to the HEAD of the page, so bots would ignore it.
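For what it's worth, here is a minimal sketch of how such an option could be implemented as a Trac 0.11 plugin via the ITemplateStreamFilter extension point. Nothing like this exists in Trac today; the component below is purely illustrative:
```python
# Hypothetical sketch only -- this "no index" behaviour is not an existing
# Trac option; the component and its name are invented for illustration.
from genshi.builder import tag
from genshi.filters.transform import Transformer

from trac.core import Component, implements
from trac.web.api import ITemplateStreamFilter


class NoIndexFilter(Component):
    """Injects <meta name="ROBOTS" content="NOINDEX, NOFOLLOW"> into
    every rendered page so that well-behaved crawlers skip it."""

    implements(ITemplateStreamFilter)

    def filter_stream(self, req, method, filename, stream, data):
        # Prepend the meta tag inside the <head> of the outgoing page.
        return stream | Transformer('//head').prepend(
            tag.meta(name='ROBOTS', content='NOINDEX, NOFOLLOW'))
```
A real implementation would presumably gate this on an option read from trac.ini, as proposed in comment:3.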
As a workaround I created a robots.txt file with some wildcards in it; with many public projects, adding a few lines per project is not really practical. Although not part of the robots.txt specification as documented, wildcard entries seem to be accepted at least by Google.
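For example, wildcard entries along these lines (the paths are illustrative; since wildcards are nonstandard, only crawlers such as Googlebot will honour them):
```
User-agent: Googlebot
Disallow: /*/changeset/
Disallow: /*/log/
Disallow: /*format=zip
```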
In case you use mod_python, bug #5584 shows how to get robots.txt to work.
follow-up: 6 comment:5 by , 17 years ago
stupid question, but where do I stick this robots.txt?
I've tried htdocs/
no go
comment:6 by , 17 years ago
Replying to anonymous:
stupid question, but where do I stick this robots.txt?
I've tried htdocs/
no go
I have it in htdocs/ and it works for me (Trac 0.10.4). Are the permissions set correctly, so that Trac can read the file?
follow-up: 8 comment:7 by , 16 years ago
Resolution: | → wontfix
---|---
Status: | new → closed
As explained in comment:1, the robots.txt file must be placed at the root of your web site, and this is highly dependent on the specific installation. So Trac cannot install a default file by itself.
If Trac is at the root of the web site, the th:RobotsTxtPlugin can be used to serve the robots.txt file. Please ask the author nicely if he can update the plugin for 0.11 :-)
follow-up: 9 comment:8 by , 16 years ago
Replying to rblank:
As explained in comment:1, the robots.txt file must be placed at the root of your web site, and this is highly dependent on the specific installation. So Trac cannot install a default file by itself.
Even if a default robots.txt is not possible, you should include an example robots.txt file with some further information, in order to make the installation easier.
Maybe you should just change the title, instead of closing as "wontfix".
follow-up: 10 comment:9 by , 16 years ago
Replying to ilias@…:
Even if a default robots.txt is not possible, you should include an example robots.txt file with some further information, in order to make the installation easier.
Feel free to add a section in TracInstall with an example robots.txt.
comment:10 by , 16 years ago
Replying to rblank:
Replying to ilias@…:
Even if a default robots.txt is not possible, you should include an example robots.txt file with some further information, in order to make the installation easier.
Feel free to add a section in TracInstall with an example robots.txt.
Feel free to listen to your user base, and to rational change suggestions like this one from "Reported by: freddie@…".
Or feel free to ignore them, like hundreds of others.
follow-up: 12 comment:11 by , 16 years ago
I don't see why it's Trac's job to explain to users how to use something as specific as robots.txt. There's already a fair amount of documentation for basic Apache configuration, but at least that is directly related to getting Trac up and running.
I mean, it can't hurt for someone to add a sample to the wiki page, but it's hardly a priority. If it were Trac's job to help configuring robots.txt, why stop there? Maybe some users need help with their resolv.conf, or their /etc/network/interfaces (or whatever the RedHat equivalent is), or their main.cf for postfix. The Trac team can't be responsible for helping users with every single aspect of their system configuration. That's what things like jumpbox exist for.
comment:12 by , 16 years ago
Replying to ebray:
I don't see why it's Trac's job to explain to users how to use something as …
The Trac team can start to learn from its faults, like e.g. the two-year delay in accepting a rational change request like this one: #3730.
Or it can continue to "Babble Instead of Evolve".
More details: http://case.lazaridis.com/wiki/TracAudit
It's really unbelievable how you ignore user feedback.
follow-up: 14 comment:13 by , 16 years ago
Wow, you have your own Trac for whining about Trac. That's really quite special. It's a shame too, since many of them are valid concerns. Anyways, I should know better than to be responding to a troll, so I'll stop there.
comment:14 by , 16 years ago
Replying to ebray:
Wow, you have your own Trac for whining about Trac. That's really quite special. It's a shame too, since many of them are valid concerns. Anyways, I should know better than to be responding to a troll, so I'll stop there.
The "Troll Theory" subjecting my person is far out of date.
Even the dumbest persons have stopped with this cheap excuse for their own inability:
http://case.lazaridis.com/wiki/CoreLiveEval
Anyway, you should focus on the essence of this ticket: simplifying the installation of Trac.
comment:15 by , 15 years ago
For the record, to add a robots.txt to your configuration you simply need this Apache configuration line:
```
Alias /robots.txt /var/www/trac-robots.txt
```
or wherever you want to put your robots.txt file. The sample attached earlier is good, but you can also use the following to simply block all robots from accessing everything in the context where you place the alias (VirtualHost, server-wide, etc.):
```
User-agent: *
Disallow: /
```
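Putting the two together, a sketch of how this might sit in a VirtualHost (the hostname and paths are placeholders, not taken from this ticket):
```
<VirtualHost *:80>
    ServerName trac.example.org
    # Serve robots.txt from a file outside the Trac environment
    Alias /robots.txt /var/www/trac-robots.txt
    # ... the rest of your existing Trac configuration ...
</VirtualHost>
```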
Given that Trac doesn't know where it will be installed, this isn't possible directly. An example on the wiki isn't a bad idea. If you want to serve such a file directly from Trac (and your URL scheme is set up to allow that), look at the RobotsTxt plugin over on trac-hacks.