
Redirect the Google crawler to a different robots.txt via .htaccess

I have been googling for an answer all day and still could not find one.

I have a virtual subdomain, www.static.example.com, which is a mirror of www.example.com. This means the subdomain and the main domain share a single root folder.

When a crawler requests a URL containing .static, I want it to be served a different robots file, robots_static.txt, in which I forbid indexing with the Disallow directive. I want to do this because I have duplicate content in the Google search results: the subdomain shows exactly the same content as the main domain.

Does anyone know how I can make crawlers see robots_static.txt instead of robots.txt?

This is what I have come up with so far:

RewriteCond %{HTTP_HOST} ^www.static.*$ [NC]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
RewriteRule ^robots\.txt /robots_static.txt [NC,L]

However, when I check in Webmaster Tools, robots.txt is still shown as my robots file instead of robots_static.txt, so everything is crawled and indexed twice.

What did I do wrong? Thank you.

EDIT: This is my .htaccess file

##
# @package      Joomla
# @copyright    Copyright (C) 2005 - 2013 Open Source Matters. All rights reserved.
# @license      GNU General Public License version 2 or later; see LICENSE.txt
##

##
# READ THIS COMPLETELY IF YOU CHOOSE TO USE THIS FILE!
#
# The line just below this section: 'Options +FollowSymLinks' may cause problems
# with some server configurations.  It is required for use of mod_rewrite, but may already
# be set by your server administrator in a way that dissallows changing it in
# your .htaccess file.  If using it causes your server to error out, comment it out (add # to
# beginning of line), reload your site in your browser and test your sef url's.  If they work,
# it has been set by your server administrator and you do not need it set here.
##

## Can be commented out if causes errors, see notes above.
Options +FollowSymLinks

## Mod_rewrite in use.

RewriteEngine On

RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]




RewriteCond %{HTTP_HOST} ^www.static.*$ [NC]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
RewriteRule ^robots\.txt /robots_static.txt [NC,L]


## Begin - Rewrite rules to block out some common exploits.
# If you experience problems on your site block out the operations listed below
# This attempts to block the most common type of exploit `attempts` to Joomla!
#
# Block out any script trying to base64_encode data within the URL.
RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
# Block out any script that includes a <script> tag in URL.
RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL.
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL.
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Return 403 Forbidden header and show the content of the root homepage
RewriteRule .* index.php [F]
#
## End - Rewrite rules to block out some common exploits.

## Begin - Custom redirects
#
# If you need to redirect some pages, or set a canonical non-www to
# www redirect (or vice versa), place that code here. Ensure those
# redirects use the correct RewriteRule syntax and the [R=301,L] flags.
#
## End - Custom redirects

##
# Uncomment following line if your webserver's URL
# is not directly related to physical file paths.
# Update Your Joomla! Directory (just / for root).
##

# RewriteBase /

RewriteCond %{THE_REQUEST} ^GET.*index\.php [NC]
RewriteCond %{THE_REQUEST} !/system/.*
RewriteRule (.*?)index\.php/*(.*) /$1$2 [R=301,L]
RewriteCond %{THE_REQUEST} ^GET

## Begin - Joomla! core SEF Section.
#
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
#
# If the requested path and file is not /index.php and the request
# has not already been internally rewritten to the index.php script
RewriteCond %{REQUEST_URI} !^/index\.php
# and the request is for something within the component folder,
# or for the site root, or for an extensionless URL, or the
# requested URL ends with one of the listed extensions
RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]
# and the requested path and file doesn't directly match a physical file
RewriteCond %{REQUEST_FILENAME} !-f
# and the requested path and file doesn't directly match a physical folder
RewriteCond %{REQUEST_FILENAME} !-d
# internally rewrite the request to the index.php script
RewriteRule .* index.php [L]
#
## End - Joomla! core SEF Section.

<FilesMatch "\.(ico|pdf|flv|jpg|ttf|jpg|jpeg|png|gif|js|css|swf)$">
Header set Expires "Wed, 15 Apr 2020 20:00:00 GMT"
Header set Cache-Control "public"
</FilesMatch>

<ifModule mod_headers.c>
    Header set Connection keep-alive
</ifModule>

########## Begin - Remove Etags
    #
    FileETag none
    #
    ########## End - Remove Etags
user3474818

Google's bots will still request /robots.txt from your subdomain, not /robots_static.txt, which would mean nothing to them. So the rewrite has to happen internally on the server:

RewriteCond %{HTTP_HOST} ^www\.static\. [NC]
RewriteRule ^/?robots\.txt$ /robots_static.txt [L]

With this in place, requests for /robots.txt on your www.static domain are served the file /robots_static.txt as if it were /robots.txt.
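
To illustrate where such a rule could sit in the .htaccess from the question, here is a minimal sketch. It is not part of the original answer, and it rests on several assumptions: both hosts share the same document root, the rule is placed before the Joomla SEF section, and robots_static.txt has the contents shown in the comments. The optional leading slash in the pattern is there because mod_rewrite strips the leading slash from the path in .htaccess context, so a pattern that begins with `^/` never matches there.

```apache
# Sketch only; the placement and the robots_static.txt contents are assumptions.
#
# Assumed contents of robots_static.txt (blocks all crawling of the mirror):
#   User-agent: *
#   Disallow: /

RewriteEngine On

# Apply only when the request arrives on the mirror host.
RewriteCond %{HTTP_HOST} ^www\.static\. [NC]
# Internal rewrite (no R flag): the crawler still sees the URL /robots.txt
# but receives the body of robots_static.txt.
# "/?" lets the pattern match both in .htaccess (no leading slash)
# and in server or virtual-host context (leading slash present).
RewriteRule ^/?robots\.txt$ /robots_static.txt [L]

# ... the existing Joomla exploit-blocking and SEF rules would follow here ...
```

Because the rewrite is internal, fetching /robots.txt on each host and comparing the responses should be enough to confirm the behavior: the main host should keep returning the original file, and the mirror host should return the Disallow rules.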

Dave Lozier