Hi, Google thinks that we have a load of duplicate URLs on our website. We re-launched it a week or so ago and it indexed nearly 14K URLs before throwing up this issue. The industry we work in means that part numbers (and hence URLs) can look similar even though they are different. I have taken the first few entries of the sitemap (www.tencell.com/sitemap/file1.txt) as an example, but it is showing this issue with all nine of the sitemap files, even though they are all unique.

Errors - Duplicate URL. This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit. URL: http://www.tencell.com/15425-p-1.html - Problem detected on: Jul 23, 2009
Errors - Duplicate URL. This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit. URL: http://www.tencell.com/15426-p-2.html - Problem detected on: Jul 23, 2009
Errors - Duplicate URL. This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit. URL: http://www.tencell.com/30691-p-3.html - Problem detected on: Jul 23, 2009
Errors - Duplicate URL. This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit. URL: http://www.tencell.com/30692-p-4.html - Problem detected on: Jul 23, 2009

What do you think? Why is it stating that these are duplicates even though they are not? Any advice would be appreciated. Nick
All answers
1 person says this answers the question:
Yep, I have the same problem, as it appears many others do also. Not sure what's going on. I wouldn't make any changes until this is resolved. Obviously a Google issue.
Hi Johnny, That sucks! At least it's good to know that this isn't an issue specific to my website. That would also make sense, as it was fine for the first 7-10 days of indexing; it is only since yesterday that this has become apparent. So where do we go from here? Do I need to contact Google / make a complaint? Nick
I'd sit it out for the moment. There have been numerous posts in the sitemap forum http://www.google.com/support/forum/p/Webmasters/label?lid=401d0e67c19e20e9&hl=en I won't be making any changes to my site or my sitemap until I receive a response from Google support. I would suggest subscribing to this thread and waiting to hear from someone.
Hey, ok great. Will you let me know if you hear anything? Thanks!
We should both be notified once a Google representative posts a solution or at least an answer within this thread. Until then I'm just going to sit tight and wait. I guess if we don't hear anything within the next 24 hours, we will have to take action as required. Has anyone else reported this as being resolved yet?
Count me in too if you find out a solution!!
Count me in too. My shopping cart changes pagination pages automatically and I can't access or change them at all; they end in ss2, ss3, ss4 and so on. I really need Google to provide a solution at their end, as I am sure I am not the only one using this shopping cart.
2 people say this answers the question:
Same here. My home page is www.mywebsite.com/ but, working in FrontPage, the page is published as www.mywebsite.com/index.html. www.mywebsite.com/ is redundant with www.mywebsite.com/index.html because they are exactly the same page. All Google search results go to www.mywebsite.com/ and all internal and external links point to www.mywebsite.com/. My sitemapping service generates both www.mywebsite.com/ and www.mywebsite.com/index.html as listings in the sitemap. So I manually removed www.mywebsite.com/index.html as one of the listings in the sitemap. Google Webmaster Tools is now happy. Don't know if I am correct, but it does solve the GWT error, and the sitemap has been loaded and the status says OK.
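For anyone taking the same approach with an XML sitemap, the corrected file would look roughly like this; a minimal sketch in the standard sitemaps.org format, using the placeholder domain from the post above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Keep only the canonical form of the home page; the
       /index.html entry is omitted because it serves the
       same content as "/" (domain is a placeholder). -->
  <url>
    <loc>http://www.mywebsite.com/</loc>
  </url>
  <!-- ...the rest of the site's unique URLs... -->
</urlset>
```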
@ tencell. I hope you don't mind me saying, but your site doesn't render correctly in IE7 - it is shifted right.
It is also very slow to load each page. I would try removing
the right hand scroller for a test to see if that fixes it - and if it
does - then make all the stuff for the right hand scroller load last.
I noticed the issue today. So I removed /index.html from the sitemap and still got the error, so now I'm removing the domain.com entry and using /index.html instead. I hope that works.
1 person says this answers the question:
I am having the same problem as of today (July 29) on 4 of 6 websites. And no, it's most certainly NOT a canonicalization problem.
Who the heck said this question is answered? Is there any way to find that out? Is that what the Forums are about? Marking any reply as 'the answer' so that valid questions Google can't--or won't--answer just die on the vine? That's beyond childish.
1 person says this answers the question:
I have the same problem. I built my site with Yahoo Page Builder and when I create the sitemap it has www.spiti-oikia.com/ and www.spiti-oikia.com/index.html. I asked a lot of times and I searched the net... I removed www.spiti-oikia.com/index.html and added the canonical tag to the index page. I submitted the sitemap again and have had no problems so far. Sorry for my English, I am from Greece.
Hi! The same problem for me since today. I had no problem till now, but suddenly it shows "Duplicate URL. This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit." I checked the directories and there are no duplicate files. I don't know what to do. My sitemap is at http://www.seoniti.com/sitemap.xml.
Yes, just had the same error message. Please advise. "This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit."
1 person says this answers the question:
Check the URLs in your sitemap file. There should be quite a few duplicates there. Remove them and resubmit your sitemap. Hopefully this will work. I had the same problem on my website: http://www.seoniti.com/ I manually checked my sitemap file and found a few duplicate URLs there. After removing the duplicates, I uploaded and re-submitted the new sitemap file. It worked. My website is small so I could do it manually, but if your website is big, you will have to somehow get those duplicate URLs removed from your sitemap. Probably until yesterday Google was lenient about these issues. Not any more. So check your files before re-submitting. Good luck.
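If the sitemap is a plain text file with one URL per line (like the .txt sitemap files discussed in this thread), a short script can do that manual check for you. This is only a rough sketch; the file name is a placeholder:

```python
# Rough sketch: list exact duplicate URLs in a plain-text sitemap file
# (one URL per line). The file name below is only a placeholder.
from collections import Counter

def find_duplicates(path):
    with open(path) as f:
        urls = [line.strip() for line in f if line.strip()]
    return {url: count for url, count in Counter(urls).items() if count > 1}

if __name__ == "__main__":
    for url, count in sorted(find_duplicates("sitemap.txt").items()):
        print(f"{url} appears {count} times")
```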
2 people say this answers the question:
"Check the URL's in your sitemap file. There should be quite a few duplicates there." BS.
Why with no changes are we all receiving this error message all of a
sudden. If we had duplicates in there, dont you think we would have
identified them and removed them before dozens of us bothered posting
this issue here. Does anyone have a definitive answer to this problem? Am I to remove, www.mydomain.com OR www.mydomain.com/index.html ????
Hi Johnny, Go to your sitemap.xml file in your root directory and remove all duplicate-looking URLs from there. That will work. Good luck. Ashish
"Am I to remove www.mydomain.com OR www.mydomain.com/index.html?" So you do have duplicate URLs?
I have the same problem at http://tech.gate.io. This is a tech blog and wiki site; the sitemap is at http://tech.gate.io/sitemap.txt. I have no duplicates in the file. The only problem, perhaps, is that blog1 is the whole blog and start page, and the blog posts are indexed one by one as well, so some of my blog posts are duplicated on the main page. Every blog owner must be facing this problem too, or is it just a bug?
1 person says this answers the question:
I am a newbie and I have VERY limited knowledge about all this, but I am getting the same Duplicate URL message. I looked at my sitemap and can't see any duplication. Everything was fine till a couple of days ago. What effect does this have on adding new pages (which I do every day) etc. to my site?
1 person says this answers the question:
There IS no duplication, glendene. It's a Google MISTAKE
that they simply keep ignoring, hoping we'll all go away and leave them
to keep screwing with our indexed page results in various parts of the
country and the globe to favor people that pay them for search results.
Google's been making thousands of mistakes this year--huge and small. The common denominator is that this
year they're not admitting to ANY of them. Apparently from this
point forward in Google's corporate history, all Google mistakes,
blunders, screwups, and crappy programming are the fault of the
webmasters employing Google for search traffic or monitoring their
sites through Google's newly abominable Webmaster Tools.
But as you've probably discovered, along with the rest of us: Google's Webmaster Tools sitemap configuration module is the only
place in their newly revised Webmaster Tools that shows the actual
number of your site's pages that they've included in their indexes.
There is no other accurate means of assessing your progress in having your pages returned to Google's index after yet another one of their index-dumping miscalculations than by monitoring that sitemap module in Webmaster Tools.
Dnyhagen, you can always fall back on site:www.sitename.com to see roughly what's been indexed
1 person says this answers the question:
Uh, no, 1918, as you WELL KNOW. The ONLY accurate
measure FROM GOOGLE is GOOGLE's measure of the thousands of perfectly
valid--and historically valuable--pages they've dumped en masse from
their index from time to time recently only to trickle them back into
their indexes at the rate of 5 to 6 reindexed pages a month. All
this rubbish about the sitemap subsection of Webmaster Tools being
useful--or meaningless for that matter--is pure rubbish. You know
it and Google knows it.
Should you need any proof, simply note
the utter and complete absence of any response from Google over this
latest Webmaster Tools fiasco--in a long recent string of Webmaster Tools fiascos.
Then
of course we could all draw the conclusion that if, as Google reps
repeat here time and time again, the sitemap subsection of Webmaster
Tools is meaningless, that Google's programmers were either:
1. Idiots, for incorporating an utterly meaningless module into Webmaster Tools in the first place.
2. At the least, disingenuous, for telling us that the sitemap module serves any valid purpose whatsoever.
3. Covering Google's corporate patoot by going through the motions of appearing to correct a host of their incompetent missteps from the recent past.
4. Duped, just like the rest of us, into believing that Google is honestly attempting to make their search results more valid and valuable to searchers without either regional throttling to skew the results, favoring millions of AdSense and AdWords subscribers, or giving the impression of favoritism to either Web 2.0 clients, social networking sites, or blog sites over traditional websites.
Google's continued silence on this issue--and a host
of other recent issues--simply raises more questions about Google's
current competence, integrity and validity than it answers.
Sorry to be one of the few to notice that The Emperor is butt nekid of late, but that's just me talking . . .
1 person says this answers the question:
I just opened up my sitemap.xml and realized I had both http://www.straydogmarketing.com AND http://www.straydogmarketing.com/index.html in there. I deleted the /index.html entry and now it works!
I've heard that if you post a complaint on Twitter, a company may
respond more quickly to get the problem resolved. Those of you
who Twitter might want to consider posting the Google Glitch
there. Reading through some of the posts here, this Google
problem appears to be a very serious one for a lot of companies out
there. I can wait it out, but it sounds like a lot of other businesses
are getting totally frustrated and, more importantly, losing revenue.
1 person says this answers the question:
MT6999999 - That's exactly what I did. I deleted the URL
and left the index page, and it worked. I got rid of the red X
anyway and got the green check mark. After Google explains what's
going on, I think we'll all be putting that line back in the
Sitemap. It didn't feel right to remove it. Plus, it's
always been in the Sitemap. As for now though, it seems to have
corrected the problem. Google probably won't find us at all now!!
1 person says this answers the question:
Heh, Bingo, davebarley! You're almost certainly correct, Sir! All that remains now is for Google to send out a form letter to every sitemap.xml-generating program supplier, announcing that Google's made a teensy change to their Webmaster Tools that will now require hundreds of them to re-write their sitemap.xml preparation programs to account for this latest 'beneficial tweak' that Google's made to their Webmaster Tools Sitemap module.
Simple, no?
Hi all. Looks like I am not the only one who is having problems after all! As I created the sitemap files for all 440,883 lines through Excel and the useful concatenate feature, I can easily find duplicated URLs. In this case, there are no duplicated URLs that I can find in the sitemap! There are similar URLs, but that is because (for example) there might be various colours of the same LED, so a part number might be 10 characters long, but only the 10th digit may change to signify blue instead of red. So nobody has heard anything from Google? Nothing from John Mu? He normally has something to say on stuff like this... I suppose the question should be, what CAN we do about this?
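Since the nine text files are submitted together, one thing worth ruling out (purely a suggestion, and the file paths below are placeholders) is the same URL appearing in more than one file rather than twice within a single file. A rough sketch of that cross-file check:

```python
# Rough sketch: flag URLs that appear more than once across ALL of the
# sitemap text files taken together. File paths are placeholders.
import glob
from collections import defaultdict

seen = defaultdict(list)  # URL -> files (including repeats) it was found in

for path in sorted(glob.glob("sitemap/file*.txt")):
    with open(path) as f:
        for line in f:
            url = line.strip()
            if url:
                seen[url].append(path)

for url, files in sorted(seen.items()):
    if len(files) > 1:
        print(f"DUPLICATE: {url} -> {', '.join(files)}")
```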
Hello everyone, for what it's worth: my site www.funphoto.com.au has been around for several months with no sitemap issues. Suddenly, I have been receiving the same error as others here. I have now removed www.funphoto.com.au/index.html and it seems to have fixed the problem. Regards, Vince
Hi Vadamo, Some others on here have said
they have had success using this method. Unfortunately for me, I
have only listed the product URLs and nothing else. No company
info, blog, index page, latest news etc. So that can't work for
me! Nick
From looking at your sitemap.xml, you're directing it to the numerous text files you're employing for sitemaps. I'm sure you've already gone through this exercise, but have you made absolutely certain that none of the extremely long aggregations of part numbers you're listing as pages are duplicated somewhere in one of those text files? I noted that you've at some point or another employed, for example, a '1-' prefix for many of your parts that seem in line with the similar part numbers without the '1-' prefix. Perhaps you could throw together a little Excel function to double-check?
I can sympathize with your problem. Your sitemap text files look to be a daunting preparation issue.
Are they all really pages you need to point Google to? (e.g., as opposed to simply letting Google find them on their own.)
I don't mean to oversimplify your very valid issue. Just talking out loud here . . .
Just a note . . . I reversed what I did to correct the duplicate URL
problem. I deleted the index page from the Sitemap and left the
URL in. Now, as someone suggested in the forum, I am going to do
a 301 redirect on my index page. He said that it's a good idea
just in case someone is linking to or searching for your URL with the
index.html included. MT6999999: This seems to be the
method that most people are suggesting in the forum.
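For sites on Apache where an .htaccess file is available, the 301 redirect described above might look something like the sketch below. This is a generic example only, not a recipe from this thread: the domain is a placeholder, and the THE_REQUEST condition is there to avoid a redirect loop, since the server itself serves index.html when "/" is requested.

```apache
# Generic sketch: 301-redirect explicit requests for /index.html to "/"
# so existing links and bookmarks still reach the canonical URL.
# Replace www.example.com with your own domain and test before relying on it.
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
```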
Anyone else a little angry about this? If you delete your main page or index pages, you're dropping backlinks which are driving your website's position in the search engine. This is bad, bad, BAD. And now I'm more angry when faced with which one to delete from my sitemap. I deleted the index page, but Google is not happy enough with that; they want to take about 80 backlinks off me by having me delete my main URL. Man, I'm fuming.
schmidtpainting, you're submitting both http://www.schmidtpainting.net/index.html and http://www.schmidtpainting.net/
in your Sitemap file. This doesn't make sense since it's really the
same page. If you submit only one of them, the warning will go away.
Please note that this message in Webmaster Tools does not signal a
change in the way we've been processing your Sitemap file.
Cheers John
Hi TenCell, can you confirm that you haven't changed
your Sitemap text files since seeing this message? I think that's the
case, but just want to confirm it to be certain. I've passed your
Sitemap files on to the team and will get back to you once I know more.
Thanks for your patience.
Cheers John
1 person says this answers the question:
Hi Funksen, you've got the URL " http://tech.gate.io/blogpost12"
listed twice in your Sitemap file. If you remove one of those lines,
the message will disappear. You should probably be seeing this URL in
the message in your Webmaster Tools account as well. Cheers John
Anyone else lose all their indexed URLs the day after
making this duplicate index.html vs. canonical fix by removing one or
the other? I had 1200 indexed URLs before the duplicate removal.
Now only 2.
@ John As already asked above... Am I to remove www.mydomain.com OR www.mydomain.com/index.html? And why all of a sudden are we having these issues when no changes were made?
@ Dnyhagen "Anyone else lose all their
indexed URLs the day after making this duplicate index.html vs.
canonical fix by removing one or the other? I had 1200 indexed URLs
before the duplicate removal. Now only 2." Yuck, Im not touching anything.
None of that seems to have worked for me:
- I changed the 301 redirect on the server to point index.html to www.digitaldeliftp.com
- I ensured that my sitemap location in robots.txt was the www.digitaldeliftp.com location.
- I regenerated my sitemap.xml with all www.digitaldeliftp.com prefixes.
- I removed the 'duplicate' www.digitaldeliftp.com/index.html that CoffeeCup software has been incorporating into my sitemap.xml for over two years now without consequence.
I resubmitted my sitemap.xml, got the green check mark, and I'm still back to 2 URLs indexed from the previous 1200 that took me the last six months to get back to from the over 2600 I'd had indexed historically going back as long as two years. Must still be doing something wrong, obviously. But what?
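For reference, the robots.txt item in that checklist is just a Sitemap directive pointing at the full canonical URL of the sitemap, along these lines (the domain is taken from the post above; the file name is assumed to be sitemap.xml):

```
User-agent: *
Disallow:

Sitemap: http://www.digitaldeliftp.com/sitemap.xml
```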
@Dnyhagen: While this message is newly visible in
Webmaster Tools, the whole processing of Sitemaps files has not changed
recently. If something else happened at the same time as the message
showed up in your account, they generally aren't related. I'll
check out your other threads in a bit to see if there's something I can
add there. @Johnny Ramone: If both http://www.example.com/ and http://www.example.com/index.html
show the same content, there's no reason to submit both via Sitemaps
(since we'll pick one and hide the other one in search results anyway).
Personally, I'd pick the shorter URL, but the choice is ultimately up
to you. Cheers John
@ John I'm concerned that my indexing will be affected. If I leave it, will everything continue to work without issue? I'm not keen on losing months' worth of work here.
@Johnny Ramone: Sure, leave it like that if you prefer.
However, the advice to pick a canonical version is pretty good and has
been around for a LONG time :-). For example, Matt Cutts mentioned it
back in 2006 in http://www.mattcutts.com/blog/seo-advice-url-canonicalization/ Personally,
I'd choose a canonical URL and remove the duplicate, but I can
understand that you might want to be careful and wait to see what all
happens. Search engines are strange "black boxes" sometimes and taking
a step back to wait for things to settle down is almost never bad
advice. When you're ready, if you want, you can follow up your
choice of URLs by using the rel=canonical link element. This gives us
another signal to let us know which version you prefer to have indexed.
You can find out more about it in our blog post at http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html Cheers John
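For anyone who hasn't used it, the rel=canonical link element John mentions is a single tag placed in the <head> of the duplicate page, for example (the domain is a placeholder):

```html
<!-- Placed in the <head> of http://www.example.com/index.html; it tells
     search engines that the bare "/" version is the preferred URL.
     The domain is a placeholder. -->
<link rel="canonical" href="http://www.example.com/" />
```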
JohnMu, You are stating the obvious here. We all know that they are
flagging the index file as a duplicate. This has NEVER been an
error before. My sitemap is created by the Google Sitemap
Generator on some sites the first edition and on other sites the second
edition. SUDDENLY Webmaster tools is showing this as an
error. It is a new condition. It is NOT because these sites
have just started putting index.html in the sitemap. It IS
because Google suddenly started flagging the output of THEIR OWN
sitemap generator as an error. Stop telling people to delete the
index from their sitemaps because they will lose rank due to missing
backlinks. Webmasters should WAIT for Google to fix the
condition.
Hi budster You're right - this message is new.
However, the processing in the background has not changed. We've always
treated these URLs as duplicates, but we're now exposing this to our
users so that they can learn from this as well. Removing duplicates
from a Sitemap file always makes sense provided you leave one copy. It
will not affect your page's rankings in a negative way because we've
always filtered out the duplicate version anyway.
Cheers John
I am a programmer and have been for over 35 years. Are you
telling me that your employer expects us to change every sitemap on the
internet? It is normal for Web servers to automatically display
an index page when the URL states only the FQDN without a page in the
URL! The sitemap checking software SHOULD filter the duplicate
for at least index.html, index.php, index.asp, maybe any page with a basename of index matching the FQDN. Sure, expose other duplicates; that's a totally different issue.
When I mentioned my programming experience I had meant to mention that it's easier to change either the Google Sitemap
Generator or the Sitemap checking software than for us to add a "drop"
index line to every siteconfig.xml file on every website on every web
server all over the world.
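As an illustration of the kind of filtering budster is describing (a sketch only, not how Google's checker actually works), a duplicate check could normalise default index pages back to "/" before comparing URLs:

```python
# Sketch: treat ".../index.html" (or .htm/.php/.asp) as equivalent to
# ".../" before flagging duplicates, so the default index page of a
# directory is not reported as a duplicate of the directory URL.
from urllib.parse import urlsplit, urlunsplit

INDEX_NAMES = ("index.html", "index.htm", "index.php", "index.asp")

def canonicalise(url):
    parts = urlsplit(url.strip())
    path = parts.path
    for name in INDEX_NAMES:
        if path.lower().endswith("/" + name):
            path = path[: -len(name)]  # keep the trailing slash
            break
    return urlunsplit((parts.scheme, parts.netloc, path or "/", parts.query, parts.fragment))

# Both of these normalise to http://www.example.com/
print(canonicalise("http://www.example.com/"))
print(canonicalise("http://www.example.com/index.html"))
```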
pssstttt. . . budster. . . . you might want to keep
reality checks on the down low here in the forums. The skins here are
as delicate as precious young orchid buds for one, and any hint or mere
mention of how things work in the real business and professional world
outside of Google is frowned upon. To prepare yourself to post
your next comment, just remember to genuflect in the direction of
Mountain View, keep real world common sense to yourself, and never ever
ever suggest that Google may have overreached a teensy bit from
time to time. The last thing you want to do here is get shunned for
being honest and candid. Think GroupThink and you'll get along just dandy.
@ John, I haven't changed the sitemap, as I thought that this situation would get resolved sooner or later, especially seeing as it is happening all over. Webmaster Tools is now showing that the first of my sitemap files is fine, and the red cross on the remaining eight has changed to an exclamation mark. It is still stating that there are duplicates in the sitemap; however, I have gone through these with a fine-tooth comb and they are all individual and unique to each part. Also, on a related issue, is it normal for the first sitemap to contain the most indexed URLs, and the last to contain barely any? I'm sure that you said there was no hierarchy in a previous thread. Mine goes from over 6K indexed lines on the first 'map, to 5.8K on the second 'map, down to 1.1K, 72, 14 and then zero on the remaining 'maps. Any help appreciated!
Hi TenCell It's good to hear that things are looking
up with regards to your Sitemap files :-). It's possible that the
engineers will remove this message for the moment anyway, since it
seems to have confused a lot of people (the processing in the
background will remain the same though). Regarding the
indexing of your URLs, that's a bit more complicated, especially when
the site has a size like yours. One element which is always important
though is that URLs should be well-connected with links, either from
other pages within your site or from other websites. With that in mind, I took a random URL from your seventh Sitemap file and took a quick look at it: http://www.tencell.com/88970422-p-300125.html -- apart from the fact that it looks like there is very little unique and compelling content on the page (I know this is a problem with some article databases, perhaps the part number alone is the most important part, you'd know best), it appears that this URL is not linked anywhere from your website, nor does it link to other articles on your site (e.g. related components, groups or sub-groups). This makes it very hard for us to judge the relevance of a single URL like that and in the end, it makes it hard for us to judge
how important it is that this particular URL should be indexed like
that. So my advice would be to not rely completely on Sitemap
files but also to make sure that your content can be found with links
within your website -- and that content on your website links out to
similar content within your site. Also, adding a bit of unique and
compelling content to the article pages is generally a good idea, if
there are ways that you could do that :-). Hope it helps! John
Hi John, thank you for your response. Unfortunately, in our industry (the electronic components market) descriptions for products are very similar for the entire series, with the only difference being (for example) colour. This will only make a minute change to the part number, for example 5-1457895-6, with the -6 changing to a -7 to signify light brown instead of grey. I will talk to the designers about grouping the parts by series & manufacturer, but this is a problem that I feel isn't going to go away. It looks like the sitemap is showing all green ticks now, so fingers crossed! Regards, Nick
Thank you JohnMu, you are absolutely right. At first, every site was listed as a duplicate entry, which must have been a bug. After resubmitting the sitemap.txt, just blogpost12 is marked as a duplicate, which is my fault. Cheers
Thanks for the heads up, Dnyhagen. Sorry JohnMu! I didn't change anything though, and my sitemaps are all back to green checkmarks. Seems that someone realized that there is only one index.html and that serving it by default when not asking for a certain page IS normal for every webserver out there. Marking it as a duplicate when there is only one of them was a huge mistake, and it's been fixed. :D
Even though this issue has annoyed me no end, I must say that I've been laughing my head off at some of the posts submitted to this support forum. In particular from Dnyhagen! I decided to log in today to see where things are at after having made NO CHANGES at my end, and guess what? A GREEN TICK! Nice one ... *grumble*
Yeah, and did you read where it was changed back because WE were confused? OMG! Of course it's US - not THEM. 6 million webmasters couldn't be right at the same time, could they?
Give me the source, just that one condition, I can fix it and
send it back in less than 20 minutes including testing. It will
flag real duplicates and filter any file with a basename of "index.*"
that matches the http://xyz.com/ URL I don't care what language
the code is in. Real duplicate files would still be marked as
such. Tencell would still have to put Green or Blue in the page
though. By the way the sitemap info in webmaster tools says that
you don't need one for sites having links to every page. Sitemaps
are for locating content not available through linkage. E.g.
content findable only via internal search function.
Hi budster, I wish it were my call to do that; it would make indexing far easier! Due to the fact we have 450K URLs, I have to rely on the manufacturers' own descriptions, which can be pretty poor. Not sure it would be realistic to go through all of these individually! All customers will search for the specific part number, and that is what I have based the URLs on. Each URL has the manufacturer part number in it. Perhaps a bit crude, but I'm not sure how else to do it! This is why so many of the URLs look similar even though they are not... Nick
Off Topic Warning: Hey TenCell, I know it's a small world in your
industry. Would you by any chance know Mike from I-DEAL
Components? He is one of my best friends.
JohnMu (Google Employee), 1:53 PM
Hi TenCell I followed up with the team regarding
your Sitemap files and it does look like there was an issue on our side
there. Sorry about the confusion. It seems this was only temporary
though so it would have cleaned itself up after the next processing
(which we pushed forwards a bit to get this error out of your
dashboard). Regarding indexing of your Site, one thing you
could do is to provide overview pages for your parts. These pages could
link to the individual part pages and contain a bit more information
about the parts that they link to. In an attempt to keep things simple,
it might even make sense to only submit "content-rich" pages via
Sitemaps, especially if you are manually generating your Sitemap files
at the moment. It's always hard working out the best configuration for
a site like yours, so I'd keep a good eye on the users and try to make
it as easy as possible for them, they're frequently the best sources of
inspiration (it works here in the forum for us as well :-)). @everyone:
We've currently disabled these messages since it did confuse quite a
number of users. We are however still processing Sitemaps files the way
we've been doing so in the past, which means that we will still
generally regard "/" and "/index.htm" as duplicates internally (but we
won't bother you with messages - at least for the moment). I
would still recommend only submitting either "/" or "/index.htm" in the
Sitemap file if both URLs lead to the same content. It won't make or
break your website, but clean canonicalization is always a good
practice. For more information on this canonicalization stuff, I'll re-post the links from above plus a few others which I think are pretty good:
- http://www.mattcutts.com/blog/seo-advice-url-canonicalization/
- http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
- http://googlewebmastercentral.blogspot.com/2007/09/google-duplicate-content-caused-by-url.html
- http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html
Cheers John
A very minor point: "Search engines
can do things like keeping or removing trailing slashes".. If
your web browser requests a URL without the trailing slash Apache will
send back a redirect to the URL with the trailing slash. If the
search engine removes the slash then it is doubling the number of requests, wasting bandwidth on the internet. A more pertinent point: it would help us immensely if you would fix Google Sitemap Generator to only submit the best canonical URL by default! Having it behave badly is the root problem here, isn't it? Don't forget that Beta
2009 version. Remember I'm not talking about REAL duplicate pages
but the normal default index pages used by 99% of the websites in the
world. The one percent aberrations can fix it themselves. I
would very happily reinstall G.S.G. and be on my merry way.
JohnMu (Google Employee), 2:36 PM
Hi budster Which Sitemap generator do you mean - the
older Python one or the newer one for Apache/IIS? I imagine changing
this behavior in the Python one is fairly easy (though to be honest, I
haven't done anything with it for quite some time now). If this is for
the newer Apache/IIS one, I'd recommend posting about the issue in
detail (with [anonymised] examples, if you have any) in the Google
Group for it. Otherwise, feel free to just open a bug if you don't see
a matching one already. Thanks! John
Well put, budster. One can't help but wonder why, given the incredible resource that these webmaster forums could be, simple reality checks--from any level of contributor--are predominantly met with either disdain or outright derision. JohnMu, God bless him, seems the most grounded
contributor in the entire forum. At the very least we're all finally on
the same page. Thanks for your contributions, clarifications, and
sanity checks--both of you.
A very minor point: "Search engines
can do things like keeping or removing trailing slashes".. If
your web browser requests a URL without the trailing slash Apache will
send back a redirect to the URL with the trailing slash. If the
search engine removes the slash then it is doubling number of requests
wasting bandwidth on the internet. Solution for users of the
Beta Google Sitemap Generator! In the admin console under
Dashboard > Default Sitemaps Sitemap Types ->Web
Sitemap URL filter Excluded URL patterns you can ADD
/index.html I haven't tried it yet. Don't blame me if
it ruins your business or costs you a bazillion dollars in
revenue. I don't know if it will affect siteconfigs already in existence either. I am thinking it will at least keep the G.S.G. from "behaving badly" for future created websites.
Sorry John, I posted the global find before I
looked to see any replies. I was referring to the Beta 2009
generator. If the index file is going to be regarded as a
duplicate simply because it is also served as a default page when none
is requested then it should not be included in the sitemap by the
generator itself. My opinion is that it is not a
duplicate. There is only one index page correct? If the
sitemap generator is including it under two URLs in the sitemap, where
is the problem? Certainly not with the webmaster. The
sitemap generator and the page parser should agree as to what is
proper. If Google sitemap generator by default includes both URLs
then the google sitemap verification procedure should not flag it as
broken. If you change one then you should change the other
no? Just my opinion. At least I now see a global solution
that might help us if Google insists on flagging an index as a
duplicate in the future. I had in the past always felt something
was fundamentally wrong with the G.S.G. (both of them) adding both the
/ and /index to the sitemap. I just thought, "Well, it's Google, they must know what they are doing." Not being sarcastic here
really. Oh well, Thanks for your feedback. Hope it all
works out.
You know I never thought about it but my G.S.G is
configured to find the files using a directory search. I don't
know why it adds / as a page in my sitemap.
"pssstttt. . . budster. . . . you might
want to keep reality checks on the down low here in the forums. The skins
here are as delicate as precious young orchid buds for one, and any
hint or mere mention of how things work in the real business and
professional world outside of Google is frowned upon." My home
page which has had a solid steady page rank for years just dropped to
unranked. Now I know what you mean by frowned upon. I had
no idea how unscrupulous... Oh yeah, I'm sorry.
Google is the best! Love ya really! Thank God for Google Webmaster Tools and the PageRank generator! Oh, my ISP went from PR=4 to PR=1 too. Guess I'd better be prepared for the rest of my sites to disappear.