Question: Duplicate URL's?

TenCell
Level 1
7/29/09
Hi,

Google thinks that we have a load of duplicate URLs in our website.  We re-launched this a week or so ago and it indexed nearly 14K lines before throwing up this issue.  The industry we work in means that part numbers (and hence URLs) can look similar even though they are different.

I have taken the top three of the sitemap (www.tencell.com/sitemap/file1.txt) as an example, but it is showing this issue with all nine of the sitemap files even though they are all unique.

Duplicate URL
This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit.
URL: http://www.tencell.com/15425-p-1.html
Problem detected on: Jul 23, 2009
Errors -
Duplicate URL
This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit.
URL: http://www.tencell.com/15426-p-2.html
Problem detected on: Jul 23, 2009
Errors -
Duplicate URL
This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit.
URL: http://www.tencell.com/30691-p-3.html
Problem detected on: Jul 23, 2009
Errors -
Duplicate URL
This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit.
URL: http://www.tencell.com/30692-p-4.html
Problem detected on: Jul 23, 2009


What do you think?  Why is it stating that these are duplicates even though they are not??!!

Any advice would be appreciated.

Nick

All answers

johnny ramone
Level 1
7/29/09
Yep, I have the same problem, as it appears many others have also.

Not sure what's going on. I wouldn't make any changes until this is resolved.

Obviously a Google issue.
TenCell
Level 1
7/29/09
Hi Johnny,

That sucks!  At least it's good to know that this isn't an issue specific to my website.  That would also make sense as it was fine for the first 7-10 days of indexing; it is only since yesterday that this has become apparent.

So where do we go from here?  Do I need to contact Google or make a complaint?

Nick
johnny ramone
Level 1
7/29/09
I'd sit it out for the moment. There have been numerous posts in the sitemap forum:

http://www.google.com/support/forum/p/Webmasters/label?lid=401d0e67c19e20e9&hl=en

I won't be making any changes to my site or my sitemap until I receive a response from Google support.

I would suggest subscribing to this thread and waiting to hear from someone.
TenCell
Level 1
7/29/09
Hey, ok great.  Will you let me know if you hear anything?

Thanks!
johnny ramone
Level 1
7/29/09
We should both be notified once a Google representative posts a solution or at least an answer within this thread. Until then I'm just going to sit tight and wait. I guess if we don't hear anything within the next 24 hours, we will have to action this as required.

Has anyone else reported this as being resolved yet???
bigfathaacke
Level 1
7/29/09
Count me in too if you find out a solution!!
Dayna134
Level 1
7/29/09
Count me in too. My shopping cart generates its pagination pages automatically and I can't access or change them at all; they end in ss2, ss3, ss4 and so on.
I really need Google to provide a solution on their end, as I am sure I am not the only one using this shopping cart.
StevieD_Web
Level 1
7/29/09
Same here.

my home page is:     www.mywebsite.com/

But working in FrontPage, the page is published as www.mywebsite.com/index.html

www.mywebsite.com/ is redundant with www.mywebsite.com/index.html because they are the exact same page.

All Google search results go to www.mywebsite.com/ and all internal and external links point to www.mywebsite.com/.

My sitemapping service generates both www.mywebsite.com/ and www.mywebsite.com/index.html as listings in the sitemap.

So I manually removed www.mywebsite.com/index.html as one of the listings in the sitemap.

Google Webmaster Tools is now happy.

Don't know if I am correct, but it does solve the GWT error and the sitemap has been loaded and status says OK.
squibble
Level 3
7/29/09
@ tencell.  I hope you don't mind me saying, but your site doesn't render correctly in IE7 - it is shifted right.
It is also very slow to load each page. I would try removing the right-hand scroller as a test to see if that fixes it - and if it does - then make all the stuff for the right-hand scroller load last.
MT6999999
Level 1
7/29/09
I noticed the issue today, so I removed /index.html from the sitemap
and still got the error.
So now I'm removing the domain.com entry and using /index.html instead.
I hope that works.
1918
Level 4
7/29/09
tonyfm
Level 1
7/29/09
I am having the same problem as of today (July 29) on 4 of 6 websites.
Dnyhagen
Level 2
7/29/09
And no, it's most certainly NOT a canonicalization problem.
Dnyhagen
Level 2
7/29/09
Who the heck said this question is answered???  Is there any way to find that out? Is that what the Forums are about? Taking any answer as answered so valid questions Google can't--or won't--answer just die on the vine?  That's beyond childish.
h_tsopelas
Level 1
7/29/09
I have the same problem. I built my site with Yahoo page builder, and when I create the sitemap it contains both www.spiti-oikia.com/ and www.spiti-oikia.com/index.html.
I asked many times and searched the net. I removed www.spiti-oikia.com/index.html and added the canonical tag to the index page. I submitted the sitemap again and have had no problems so far. Sorry for my English, I am from Greece.
ashukothari
Level 1
7/29/09
Hi! The same problem with me since today. I had no problem till now but suddenly it shows "Duplicate URL. This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit." I checked the directories and there are no duplicate files. I don't know what to do. My sitemap is at http://www.seoniti.com/sitemap.xml.
gerryIII
Level 1
7/29/09
Yes, just had the same error message. Please advise.

"This URL is a duplicate of another URL in the sitemap. Please remove it and resubmit."
ashukothari
Level 1
7/30/09
Check the URL's in your sitemap file.

There should be quite a few duplicates there. Remove them and resubmit your sitemap. Hopefully this will work.

I had the same problem on my website: http://www.seoniti.com/

I manually checked my sitemap file and found a few duplicate URLs there. After removing the duplicates, I uploaded and re-submitted the new sitemap file. It worked.

My website is small so I could do it manually, but if your website is big, you will have to find some other way to get those duplicate URLs removed from your sitemap.

Until yesterday Google was probably lenient about these issues. Not anymore. So check your files before re-submitting.

Good Luck.
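
For a sitemap too large to check by hand, the manual check described above could be scripted. The following is only a minimal sketch and not a tool mentioned in this thread. It assumes a standard sitemap.xml using the sitemaps.org 0.9 namespace, and the function name `find_duplicate_locs` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by standard XML sitemaps (sitemaps.org protocol 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def find_duplicate_locs(sitemap_xml):
    """Return every <loc> value that appears more than once, in first-seen order."""
    root = ET.fromstring(sitemap_xml)
    locs = [(el.text or "").strip() for el in root.iter(SITEMAP_NS + "loc")]
    counts = Counter(locs)
    # dict.fromkeys keeps first-seen order while dropping repeats
    return [loc for loc in dict.fromkeys(locs) if counts[loc] > 1]


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://www.example.com/</loc></url>"
        "<url><loc>http://www.example.com/about.html</loc></url>"
        "<url><loc>http://www.example.com/</loc></url>"
        "</urlset>"
    )
    print(find_duplicate_locs(sample))
```

The comparison is an exact string match after trimming whitespace, so it only finds entries that are listed twice verbatim.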
johnny ramone
Level 1
7/30/09
"Check the URL's in your sitemap file. There should be quite a few duplicates there."

BS. Why, with no changes, are we all receiving this error message all of a sudden? If we had duplicates in there, don't you think we would have identified and removed them before dozens of us bothered posting this issue here?

Does anyone have a definitive answer to this problem?

Am I to remove www.mydomain.com OR www.mydomain.com/index.html ????
ashukothari
Level 1
7/30/09
Hi Johnny,

Go to your sitemap.xml file in your root directory and remove all duplicate looking URL's from there. That will work.

Good luck.
Ashish
corse32
Level 1
7/30/09
Am I to remove, www.mydomain.com OR www.mydomain.com/index.html ????

so you do have duplicate URLs?
Funksen
Level 1
7/30/09
I have the same problem

http://tech.gate.io

this is a tech blog and wiki site

sitemap under

http://tech.gate.io/sitemap.txt

I have no duplicates in the file
the only problem perhaps is, that blog1 is the whole blog and startpage, and the blogposts are indexed one by one as well
so some of my blogposts are duplicated on the main site

every blog owner must be facing this problem too, or is it just a bug?
1918
Level 4
7/30/09
Hi Tencell,

Check here (http://www.google.com/support/forum/p/Webmasters/thread?tid=1b84082a9efb1209&hl=en) and use the snippet of Python code John Mu provides to find any duplicates.

Hope this helps.
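
The snippet from the linked thread is not reproduced here. As a rough stand-in (this is not John Mu's code), a duplicate check for plain-text sitemaps could look like the sketch below. It assumes one URL per line, as in a sitemap .txt file; the function name and the sample file names are hypothetical.

```python
from collections import defaultdict


def find_duplicate_urls(sitemaps):
    """Map each URL seen more than once to the (file, line number) pairs where it occurs.

    `sitemaps` maps a file name to that file's text content, one URL per line.
    """
    seen = defaultdict(list)
    for name, text in sitemaps.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            url = line.strip()
            if url:  # skip blank lines
                seen[url].append((name, lineno))
    return {url: places for url, places in seen.items() if len(places) > 1}


if __name__ == "__main__":
    files = {
        "file1.txt": "http://www.example.com/15425-p-1.html\n"
                     "http://www.example.com/15426-p-2.html\n",
        "file2.txt": "http://www.example.com/15426-p-2.html\n",
    }
    for url, places in find_duplicate_urls(files).items():
        print(url, places)
```

Because the match is on the exact string, URLs that merely look alike (part numbers differing in one character) are not reported. The sketch compares across every file passed in; pass a single file to check that file on its own.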
glendene
Level 1
7/30/09
I am a newbie and I have VERY limited knowledge about all this but I am getting the same Duplicate URL message. I looked at my sitemap and can't see any duplication. Everything was fine till a couple of days ago. What effect does this have on adding new pages (which I do every day) etc. to my site?
Dnyhagen
Level 2
7/30/09
There IS no duplication, glendene. It's a Google MISTAKE that they simply keep ignoring, hoping we'll all go away and leave them to keep screwing with our indexed page results in various parts of the country and the globe to favor people that pay them for search results. Google's been making thousands of mistakes this year--huge and small.  The common denominator is that this year they're not admitting to ANY of them.  Apparently from this point forward in Google's corporate history, all Google mistakes, blunders, screwups, and crappy programming are the fault of the webmasters employing Google for search traffic or monitoring their sites through Google's newly abominable Webmaster Tools.

But as you've probably discovered, along with the rest of us: Google's Webmaster Tools sitemap configuration module is the only place in their newly revised Webmaster Tools that shows the actual number of your site's pages that they've included in their indexes. There is no other accurate means of assessing your progress in having your pages returned to Google's index, after yet another one of their index-dumping miscalculations, than by monitoring that sitemap module in Webmaster Tools.
1918
Level 4
7/30/09
Dnyhagen, you can always fall back on site:www.sitename.com to see roughly what's been indexed
Dnyhagen
Level 2
7/30/09
Uh, no, 1918, as you WELL KNOW. The ONLY accurate measure FROM GOOGLE is GOOGLE's measure of the thousands of perfectly valid--and historically valuable--pages they've dumped en masse from their index from time to time recently only to trickle them back into their indexes at the rate of 5 to 6 reindexed pages a month.  All this rubbish about the sitemap subsection of Webmaster Tools being useful--or meaningless for that matter--is pure rubbish.  You know it and Google knows it.

Should you need any proof, simply note the utter and complete absence of any response from Google over this latest Webmaster Tools fiasco--in a long recent string of Webmaster Tools fiascos.

Then of course we could all draw the conclusion that if, as Google reps repeat here time and time again, the sitemap subsection of Webmaster Tools is meaningless, that Google's programmers were either:

1. Idiots for incorporating an utterly meaningless module into Webmaster Tools in the first place.
2. At the least, disingenuous, for telling us that the sitemap module serves any valid purpose whatsoever.
3. Covering Google's corporate patoot by going through the motions of appearing to correct a host of their incompetent missteps from the recent past.
4. Duped just like the rest of us into believing that Google is honestly attempting to make their search results more valid and valuable to searchers without either regional throttling to skew the results, favoring millions of AdSense and AdWord subscribers, or giving the impression of favoritism to either Web 2.0 clients, social networking sites, or blog sites over traditional websites.

Google's continued silence on this issue--and a host of other recent issues--simply raises more questions about Google's current competence, integrity and validity than it answers.

Sorry to be one of the few to notice that The Emperor is butt nekid of late, but that's just me talking . . .
Rachelle_H
Level 1
7/30/09
I just opened up my sitemap.xml and realized I had my http://www.straydogmarketing.com AND http://www.straydogmarketing.com/index.html

I deleted the index.html entry and now it works!
davebarley
Level 1
7/30/09
I've heard that if you post a complaint on Twitter, a company may respond more quickly to get the problem resolved.  Those of you who Twitter might want to consider posting the Google Glitch there.  Reading through some of the posts here, this Google problem appears to be a very serious one for a lot of companies out there. I can wait it out, but it sounds like a lot of other businesses are getting totally frustrated and, more importantly, losing revenue.
davebarley
Level 1
7/30/09
MT6999999 - That's exactly what I did.  I deleted the URL and left the index page, and it worked.  I got rid of the red X anyway and got the green check mark.  After Google explains what's going on, I think we'll all be putting that line back in the Sitemap.  It didn't feel right to remove it.  Plus, it's always been in the Sitemap.  As for now though, it seems to have corrected the problem.  Google probably won't find us at all now!!
Dnyhagen
Level 2
7/30/09
Heh, Bingo! davebarley! You're almost certainly correct, Sir!

All that remains now is for Google to send out a form letter to every sitemap.xml generating program supplier, saying that Google's made a teensy change to their Webmaster Tools that will now require hundreds of them to rewrite their sitemap.xml preparation programs to account for this latest 'beneficial tweak' that Google's made to their Webmaster Tools Sitemap module.

Simple, no?
TenCell
Level 1
7/31/09
Hi all.  Looks like I am not the only one who is having problems after all!

As I created the sitemap files for all 440,883 lines through Excel and the useful concatenate feature, I can easily find duplicated URLs.  In this case, there are no duplicated URLs that I can find in the sitemap!  There are similar URLs, but that is because (for example) there might be various colours of the same LED, so a part number might be 10 characters long, but only the 10th digit may change to signify blue instead of red.

So nobody has heard anything from Google?  Nothing from John Mu?  He normally has something to say on stuff like this......

I suppose the question should be, what CAN we do about this?
vadamo
Level 1
7/31/09
Hello everyone,

For what it's worth: my site www.funphoto.com.au has been around for several months with no sitemap issues. Suddenly, I have been receiving the same error as others here. I have now removed www.funphoto.com.au/index.html and it seems to have fixed the problem.

Regards, Vince
TenCell
Level 1
7/31/09
Hi Vadamo,

Some others on here have said they have had success using this method.  Unfortunately for me, I have only listed the product URLs and nothing else.  No company info, blog, index page, latest news etc.  So that can't work for me!

Nick
Dnyhagen
Level 2
7/31/09
From looking at your sitemap.xml you're directing it to the numerous text files you're employing for sitemaps. I'm sure you've already gone through this exercise, but have you made absolutely certain that none of the extremely long aggregations of part numbers you're listing as pages are duplicated somewhere in one of those text files?

I noted that you've at some point or another employed, for example, a "1-" prefix for many of your parts that seem in line with similar part numbers lacking the "1-" prefix? Perhaps you could throw together a little Excel function to double check?

I can sympathize with your problem.  Your sitemap text files look to be a daunting preparation issue.

Are they all really pages you need to point Google to?  (e.g., as opposed to simply letting Google find them on their own.)

I don't mean to oversimplify your very valid issue.  Just talking out loud here . . .
davebarley
Level 1
8/1/09
Just a note . . . I reversed what I did to correct the duplicate URL problem.  I deleted the index page from the Sitemap and left the URL in.  Now, as someone suggested in the forum, I am going to do a 301 redirect on my index page.  He said that it's a good idea just in case someone is linking to or searching for your URL with the index.html included.   MT6999999:  This seems to be the method that most people are suggesting in the forum.
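
A 301 redirect like the one mentioned above is normally set up in the web server's own configuration, which this thread does not show. Purely as an illustration of the behaviour, here is a minimal Python WSGI sketch; the app name `redirect_index` and the list of index file names are assumptions, and query strings are ignored for brevity.

```python
# Index file names treated as aliases of their directory URL (an assumption;
# adjust to whatever the server actually serves by default).
INDEX_NAMES = ("index.html", "index.htm", "index.php")


def redirect_index(environ, start_response):
    """WSGI app: answer /index.html-style paths with a 301 to the directory URL."""
    path = environ.get("PATH_INFO", "/")
    directory, _, filename = path.rpartition("/")
    if filename in INDEX_NAMES:
        # Permanent redirect, so links to the index file end up at the short URL
        start_response("301 Moved Permanently", [("Location", directory + "/")])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"page content for " + path.encode("utf-8")]


if __name__ == "__main__":
    def show(status, headers):
        print(status, headers)

    redirect_index({"PATH_INFO": "/index.html"}, show)
    redirect_index({"PATH_INFO": "/"}, show)
```

The directory URL itself must keep returning the page directly; redirecting it back to the index file would create a loop.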
schmidtpainting
Level 1
8/2/09
Anyone a little angry about this? If you delete your main page or index pages you're dropping backlinks which are driving your website's position on the search engine.

This is bad bad BAD
schmidtpainting
Level 1
8/2/09
And now I'm more angry. When faced with which one to delete from my sitemap, I deleted the index page, but Google is not happy enough with that; they want to take about 80 backlinks off me by having me delete my main URL.

Man I'm Fuming
JohnMu
Google Employee
8/2/09
schmidtpainting, you're submitting both http://www.schmidtpainting.net/index.html  and http://www.schmidtpainting.net/ in your Sitemap file. This doesn't make sense since it's really the same page. If you submit only one of them, the warning will go away. Please note that this message in Webmaster Tools does not signal a change in the way we've been processing your Sitemap file.

Cheers
John
JohnMu
Google Employee
8/2/09
Hi TenCell, can you confirm that you haven't changed your Sitemap text files since seeing this message? I think that's the case, but just want to confirm it to be certain. I've passed your Sitemap files on to the team and will get back to you once I know more. Thanks for your patience.

Cheers
John
JohnMu
Google Employee
8/2/09
Hi Funksen, you've got the URL "http://tech.gate.io/blogpost12" listed twice in your Sitemap file. If you remove one of those lines, the message will disappear. You should probably be seeing this URL in the message in your Webmaster Tools account as well.

Cheers
John
Dnyhagen
Level 2
8/2/09
Anyone else lose all their indexed URLs the day after making this duplicate index.html vs. canonical fix by removing one or the other? I had 1200 indexed URLs before the duplicate removal.  Now only 2.
johnny ramone
Level 1
8/2/09
@ John

As already asked above....

Am I to remove, www.mydomain.com OR www.mydomain.com/index.html ????

And why all of a sudden are we having these issues when no changes were made?
johnny ramone
Level 1
8/2/09
@ Dnyhagen

"Anyone else lose all their indexed URLs the day after making this duplicate index.html vs. canonical fix by removing one or the other? I had 1200 indexed URLs before the duplicate removal.  Now only 2."

Yuck, I'm not touching anything.
Dnyhagen
Level 2
8/2/09
None of that seems to have worked for me:

I changed the 301-redirect on the server to point index.html to www.digitaldeliftp.com

I ensured that my sitemap location in robots.txt was the www.digitaldeliftp.com location.

I regenerated my sitemap.xml with all www.digitaldeliftp.com prefixes

I removed the 'duplicate' www.digitaldeliftp.com/index.html that coffeecup software has been incorporating into my sitemap.xml for over two years now without consequence.

I resubmitted my sitemap.xml, got the green check mark and I'm still back to 2 URLs indexed from the previous 1200 that took me the last six months to get back to from the over 2600 I'd had indexed historically going back as long as two years.

Must still be doing something wrong, obviously.  But what?
JohnMu
Google Employee
8/2/09
@Dnyhagen: While this message is newly visible in Webmaster Tools, the whole processing of Sitemaps files has not changed recently. If something else happened at the same time as the message showed up in your account, they generally aren't related.  I'll check out your other threads in a bit to see if there's something I can add there.

@Johnny Ramone: If both http://www.example.com/ and http://www.example.com/index.html show the same content, there's no reason to submit both via Sitemaps (since we'll pick one and hide the other one in search results anyway). Personally, I'd pick the shorter URL, but the choice is ultimately up to you.

Cheers
John
johnny ramone
Level 1
8/2/09
@ John

I'm concerned that my indexing will be affected. If I leave it, will everything continue to work without issue?

I'm not keen on losing months' worth of work here.
JohnMu
Google Employee
8/2/09
@Johnny Ramone: Sure, leave it like that if you prefer. However, the advice to pick a canonical version is pretty good and has been around for a LONG time :-). For example, Matt Cutts mentioned it back in 2006 in http://www.mattcutts.com/blog/seo-advice-url-canonicalization/

Personally, I'd choose a canonical URL and remove the duplicate, but I can understand that you might want to be careful and wait to see what all happens. Search engines are strange "black boxes" sometimes and taking a step back to wait for things to settle down is almost never bad advice.

When you're ready, if you want, you can follow up your choice of URLs by using the rel=canonical link element. This gives us another signal to let us know which version you prefer to have indexed. You can find out more about it in our blog post at  http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html

Cheers
John
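
To confirm that a page actually declares the preferred URL, the rel=canonical link element could be read back out of its HTML. The sketch below uses only Python's standard `html.parser` and is an illustration, not something from the blog post referenced above; the class and function names are made up.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens and is case-insensitive
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonical(html):
    """Return the first canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None


if __name__ == "__main__":
    page = (
        "<html><head>"
        '<link rel="canonical" href="http://www.example.com/">'
        "</head><body>Home</body></html>"
    )
    print(find_canonical(page))
```

Running this against both the / and /index.html versions of a page should give the same URL if the element was added consistently.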
Dnyhagen
Level 2
8/2/09
@ JohnMu thanx
budster
Level 1
8/2/09
JohnMu, you are stating the obvious here. We all know that they are flagging the index file as a duplicate.  This has NEVER been an error before.  My sitemap is created by the Google Sitemap Generator, on some sites the first edition and on other sites the second edition.  SUDDENLY Webmaster Tools is showing this as an error.  It is a new condition.  It is NOT because these sites have just started putting index.html in the sitemap.  It IS because Google suddenly started flagging the output of THEIR OWN sitemap generator as an error.  Stop telling people to delete the index from their sitemaps because they will lose rank due to missing backlinks.  Webmasters should WAIT for Google to fix the condition.
JohnMu
Google Employee
8/2/09
Hi budster
You're right - this message is new. However, the processing in the background has not changed. We've always treated these URLs as duplicates, but we're now exposing this to our users so that they can learn from this as well. Removing duplicates from a Sitemap file always makes sense provided you leave one copy. It will not affect your page's rankings in a negative way because we've always filtered out the duplicate version anyway.

Cheers
John
budster
Level 1
8/2/09
I am a programmer and have been for over 35 years.  Are you telling me that your employer expects us to change every sitemap on the internet?  It is normal for Web servers to automatically display an index page when the URL states only the FQDN without a page in the URL!  The sitemap checking software SHOULD filter the duplicate for at least index.html, index.php, index.asp, maybe any page with a basename of index matching the FQDN.  Sure, expose other duplicates.  It's a totally different issue.
budster
Level 1
8/2/09
When I mentioned my programming experience I had meant to mention it's easier to change either the Google Sitemap Generator or the Sitemap checking software than for us to add a "drop" index line to every siteconfig.xml file on every website on every web server all over the world.
Dnyhagen
Level 2
8/2/09
pssstttt. .  . budster. . . . you might want to keep reality checks on the down low here in the forums. The skins here are as delicate as precious young orchid buds for one, and any hint or mere mention of how things work in the real business and professional world outside of Google is frowned upon.  To prepare yourself to post your next comment, just remember to genuflect in the direction of Mountain View, keep real world common sense to yourself, and never ever ever suggest that Google may have overreached a teensy bit from time to time. The last thing you want to do here is get shunned for being honest and candid. Think GroupThink and you'll get along just dandy.
TenCell
Level 1
8/3/09
@ John,

I haven't changed the sitemap as I thought that this situation would get resolved sooner or later, especially seeing as it is happening all over.

Webmaster Tools is now showing that the first of my sitemap files is fine, and the red cross on the remaining eight has changed to an exclamation mark.  It is still stating that there are duplicates in the sitemap; however, I have gone through these with a fine-tooth comb and they are all individual and unique to each part.

Also, on a related issue, is it normal for the first sitemap to contain the most indexed URLs, and the last containing barely any?  I'm sure that you said there was no hierarchy on a previous thread.  Mine goes from over 6k indexed lines on the first 'map, to 5K8 on the second 'map, down to 1K1, 72, 14 and then zero on the remaining 'maps.

Any help appreciated!
JohnMu
Google Employee
8/3/09
Hi TenCell
It's good to hear that things are looking up with regards to your Sitemap files :-). It's possible that the engineers will remove this message for the moment anyway, since it seems to have confused a lot of people (the processing in the background will remain the same though).

Regarding the indexing of your URLs, that's a bit more complicated, especially when the site has a size like yours. One element which is always important though is that URLs should be well-connected with links, either from other pages within your site or from other websites.

With that in mind, I took a random URL from your seventh Sitemap file and took a quick look at it: http://www.tencell.com/88970422-p-300125.html -- apart from the fact that it looks like there is very little unique and compelling content on the page (I know this is a problem with some article databases, perhaps the part number alone is the most important part, you'd know best), it appears that this URL is not linked anywhere from your website, nor does it link to other articles on your site (e.g. related components, groups or sub-groups). This makes it very hard for us to judge the relevance of a single URL like that and in the end, it makes it hard for us to judge how important it is that this particular URL should be indexed like that.

So my advice would be to not rely completely on Sitemap files but also to make sure that your content can be found with links within your website -- and that content on your website links out to similar content within your site. Also, adding a bit of unique and compelling content to the article pages is generally a good idea, if there are ways that you could do that :-).

Hope it helps!
John
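
One way to act on this advice is to compare the sitemap against the site's own link structure and list the URLs nothing links to. The sketch below assumes the link data has already been collected by some crawl of the site (not shown, and not part of this thread); the function name and the shape of the `links` mapping are hypothetical.

```python
def find_unlinked_urls(sitemap_urls, links):
    """Return sitemap URLs that no other crawled page links to.

    `links` maps a page URL to the set of URLs that page links out to.
    """
    linked_to = set()
    for source, targets in links.items():
        # a page linking to itself does not count as an inbound link
        linked_to.update(t for t in targets if t != source)
    return [url for url in sitemap_urls if url not in linked_to]


if __name__ == "__main__":
    links = {
        "http://www.example.com/": {"http://www.example.com/series-a.html"},
        "http://www.example.com/series-a.html": {"http://www.example.com/part-1.html"},
    }
    sitemap = [
        "http://www.example.com/part-1.html",
        "http://www.example.com/part-2.html",
    ]
    print(find_unlinked_urls(sitemap, links))
```

URLs reported by such a check are reachable only through the Sitemap file, which is the situation described above.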
TenCell
Level 1
8:04 AM
Hi John,

Thank you for your response.  Unfortunately in our industry (electronic components market) descriptions for products are very similar for the entire series, with the only difference being (for example) colour.  This will only make a minute change to the part number, for example 5-1457895-6, with the -6 changing to a -7 to signify light brown instead of grey.  I will talk to the designers about grouping the parts by series & manufacturer, but this is a problem that I feel isn't going to go away.

It looks like the sitemap is showing all green ticks now, so fingers crossed!

regards,

Nick
Funksen
Level 1
9:13 AM
thank you JohnMu, you are absolutely right,

at first, every site was listed as duplicate entry, which must be a bug
after resubmitting the sitemap.txt, just blogpost12 is marked as duplicate, which is my fault

cheers
budster
Level 1
11:49 AM
Thanks for the heads up Dnyhagen.  Sorry JohnMu!  I didn't change anything though and my sitemaps are all back to Green Checkmarks.  Seems that someone realized that there is only one index.html and that serving it by default when not asking for a certain page IS normal for every webserver out there.  Marking it as a duplicate when there is only one of them was a huge mistake and it's been fixed. :D
johnny ramone
Level 1
11:54 AM
Even though this issue has annoyed me no end, I must say that I've been laughing my head off at some of the posts submitted to this support forum. In particular from Dnyhagen!

I decided to log in today to see where things are at after having made NO CHANGES at my end, and guess what?

A GREEN TICK!

Nice one ...*grumble
budster
Level 1
12:03 PM
Yeah, and did you read where it was changed back because WE were confused?  OMG!
johnny ramone
Level 1
12:09 PM
Of course it's US - not THEM.

6 million webmasters couldn't be right at the same time, could they?
budster
Level 1
12:21 PM
Give me the source, just that one condition, and I can fix it and send it back in less than 20 minutes, including testing.  It will flag real duplicates and filter any file with a basename of "index.*" that matches the http://xyz.com/ URL.  I don't care what language the code is in.  Real duplicate files would still be marked as such.  TenCell would still have to put Green or Blue in the page though.

By the way, the sitemap info in Webmaster Tools says that you don't need one for sites having links to every page.  Sitemaps are for locating content not available through linkage, e.g. content findable only via an internal search function.
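
For readers wondering what that one condition might look like, here is a minimal sketch in Python. It is not Google's code and the function names (`canonical_url`, `check_sitemap`) are made up for illustration. It assumes a default document is any file whose basename starts with "index." and that an exact repeat of a URL is a "real" duplicate.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Rewrite /dir/index.* to /dir/ (assumption: index.* is the default document)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    head, tail = posixpath.split(path)
    if tail.startswith("index."):
        path = head if head.endswith("/") else head + "/"
    # Scheme and host are case-insensitive, so lower-case them; drop any fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def check_sitemap(urls):
    """Return (kept, flagged): canonical URLs to keep and exact repeats to flag."""
    seen_exact, seen_canonical = set(), set()
    kept, flagged = [], []
    for url in urls:
        if url in seen_exact:
            flagged.append(url)  # the same URL listed twice: a real duplicate
            continue
        seen_exact.add(url)
        canon = canonical_url(url)
        if canon in seen_canonical:
            continue  # default-index alias of a URL already kept: filter quietly
        seen_canonical.add(canon)
        kept.append(canon)
    return kept, flagged
```

Given a list containing http://xyz.com/, http://xyz.com/index.html and the same product page twice, this keeps the root URL and one copy of the product page, and flags only the repeated product page.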
TenCell
Level 1
12:43 PM
Hi Budster,

I wish it was my call to do that, it would make indexing far easier!  Because we have 450k URLs, I have to rely on the manufacturer's own description, which can be pretty poor.  Not sure it would be realistic to go through all of these individually!  All customers will search for the specific part number and that is what I have based the URLs on.  Each URL has the manufacturer part number in it.  Perhaps a bit crude, but I'm not sure how else to do it!  This is why so many of the URLs look similar even though they are not.....

Nick
budster
Level 1
1:07 PM
Off Topic Warning:  Hey TenCell, I know it's a small world in your industry.  Would you by any chance know Mike from I-DEAL Components?  He is one of my best friends.
JohnMu
Google Employee
1:53 PM
Hi TenCell
I followed up with the team regarding your Sitemap files and it does look like there was an issue on our side there. Sorry about the confusion. It seems this was only temporary though so it would have cleaned itself up after the next processing (which we pushed forwards a bit to get this error out of your dashboard).

Regarding indexing of your Site, one thing you could do is to provide overview pages for your parts. These pages could link to the individual part pages and contain a bit more information about the parts that they link to. In an attempt to keep things simple, it might even make sense to only submit "content-rich" pages via Sitemaps, especially if you are manually generating your Sitemap files at the moment. It's always hard working out the best configuration for a site like yours, so I'd keep a good eye on the users and try to make it as easy as possible for them, they're frequently the best sources of inspiration (it works here in the forum for us as well :-)).
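
As a rough illustration of the overview-page idea, the sketch below groups part numbers into series so that each series could get one overview page linking to its individual parts. It assumes, based only on the 5-1457895-6 / 5-1457895-7 example earlier in the thread, that the series is everything before the final dash; real manufacturer numbering may not follow that rule, and the function names are hypothetical.

```python
from collections import defaultdict


def series_of(part_number):
    """Drop the final '-<suffix>' (assumed to encode the variant, e.g. colour)."""
    base, sep, _suffix = part_number.rpartition("-")
    return base if sep else part_number  # no dash: the part is its own series


def group_by_series(part_numbers):
    """Map each series to the list of part numbers that belong to it."""
    groups = defaultdict(list)
    for part_number in part_numbers:
        groups[series_of(part_number)].append(part_number)
    return dict(groups)
```

Each key of the result would correspond to one overview page, and the values are the part pages it links to.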

@everyone: We've currently disabled these messages since it did confuse quite a number of users. We are however still processing Sitemaps files the way we've been doing so in the past, which means that we will still generally regard "/" and "/index.htm" as duplicates internally (but we won't bother you with messages - at least for the moment).

I would still recommend only submitting either "/" or "/index.htm" in the Sitemap file if both URLs lead to the same content. It won't make or break your website, but clean canonicalization is always a good practice. For more information on this canonicalization stuff, I'll re-post the links from above + a few others which I think are pretty good:
- http://www.mattcutts.com/blog/seo-advice-url-canonicalization/
- http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
- http://googlewebmastercentral.blogspot.com/2007/09/google-duplicate-content-caused-by-url.html
- http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html

Cheers
John
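
One of the links above covers the rel="canonical" link element. As a hedged sketch of how a page template might emit it, the Python below picks the "/" form over "/index.htm" or "/index.html" and builds the tag. The helper names and the list of default document names are assumptions for illustration, not something Google prescribes.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; adjust to match the web server's configuration.
DEFAULT_DOCS = ("index.htm", "index.html")


def preferred_url(url):
    """Return the directory form of a URL that ends in a default document."""
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in DEFAULT_DOCS:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the trailing slash
            break
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def canonical_link_tag(url):
    """Build a <link rel="canonical"> element pointing at the preferred URL."""
    return '<link rel="canonical" href="%s" />' % escape(preferred_url(url), quote=True)
```

The same `preferred_url` helper could be applied to every line of a text Sitemap before it is submitted, so that only one of the two forms appears.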
budster
Level 1
2:24 PM
A very minor point:
"Search engines can do things like keeping or removing trailing slashes."  If your web browser requests a directory URL without the trailing slash, Apache will send back a redirect to the URL with the trailing slash.  If the search engine removes the slash, it doubles the number of requests and wastes bandwidth on the internet.
A more pertinent point: it would help us immensely if you would fix Google Sitemap Generator to only submit the best canonical URL by default!  Having it behave badly is the root problem here, isn't it?  Don't forget the Beta 2009 version.  Remember, I'm not talking about REAL duplicate pages but the normal default index pages used by 99% of the websites in the world.  The one percent aberrations can fix it themselves.  I would very happily reinstall G.S.G. and be on my merry way.
JohnMu
Google Employee
2:36 PM
Hi budster
Which Sitemap generator do you mean - the older Python one or the newer one for Apache/IIS? I imagine changing this behavior in the Python one is fairly easy (though to be honest, I haven't done anything with it for quite some time now). If this is for the newer Apache/IIS one, I'd recommend posting about the issue in detail (with [anonymised] examples, if you have any) in the Google Group for it. Otherwise, feel free to just open a bug if you don't see a matching one already.
Thanks!
John
Dnyhagen
Level 2
2:45 PM
Well put, budster. One can't help but wonder, what with the incredible resource that these webmaster forums could be, that simple reality checks--from any level contributor--are predominantly met with either disdain or outright derision. JohnMu, God bless him, seems the most grounded contributor in the entire forum. At the very least we're all finally on the same page. Thanks for your contributions, clarifications, and sanity checks--both of you.
budster
Level 1
2:47 PM
Solution for users of the Beta Google Sitemap Generator!  In the admin console, under Dashboard > Default Sitemaps, Sitemap Types -> Web, Sitemap URL filter, Excluded URL patterns,
you can ADD /index.html.  I haven't tried it yet.  Don't blame me if it ruins your business or costs you a bazillion dollars in revenue.  I don't know if it will affect site configs already in existence either.  I am thinking it will at least keep the G.S.G. from "behaving badly" for websites created in the future.
budster
Level 1
3:07 PM
Sorry John, I posted the global fix before I looked to see any replies.  I was referring to the Beta 2009 generator.  If the index file is going to be regarded as a duplicate simply because it is also served as a default page when none is requested, then it should not be included in the sitemap by the generator itself.  My opinion is that it is not a duplicate.  There is only one index page, correct?  If the sitemap generator is including it under two URLs in the sitemap, where is the problem?  Certainly not with the webmaster.  The sitemap generator and the page parser should agree as to what is proper.  If Google Sitemap Generator by default includes both URLs, then the Google sitemap verification procedure should not flag it as broken.  If you change one then you should change the other, no?  Just my opinion.  At least I now see a global solution that might help us if Google insists on flagging an index as a duplicate in the future.  I had in the past always felt something was fundamentally wrong with the G.S.G. (both of them) adding both the / and /index to the sitemap.  I just thought: well, it's Google, they must know what they are doing.  Not being sarcastic here, really.  Oh well, thanks for your feedback.  Hope it all works out.
budster
Level 1
3:23 PM
You know, I never thought about it, but my G.S.G. is configured to find the files using a directory search.  I don't know why it adds / as a page in my sitemap.
budster
Level 1
8:49 PM
"pssstttt. .  . budster. . . . you might want keep reality checks on the down low here in the forums. The skins here are as delicate as precious young orchid buds for one, and any hint or mere mention of how things work in the real business and professional world outside of Google is frowned upon."  My home page, which has had a solid steady page rank for years, just dropped to unranked.  Now I know what you mean by frowned upon.  I had no idea how unscrupulous...   Oh yeah, I'm sorry.  Google is the best!  Love ya really!  Thank God for Google Webmaster Tools and the Page Rank generator!  Oh, my ISP went from PR=4 to PR=1 too.   Guess I better be prepared for the rest of my sites to disappear.
