How can I make sure that Google knows my content is original?

Today’s question comes from Kunal Pradhan. And I’m from eastern Kentucky, so I apologize that I’m horrible with names sometimes. The question is, “Google crawls site A every hour and site B once in a day. Site B writes an article, site A copies it, changing the time stamp. Site A gets crawled first by Googlebot. Whose content is original in Google’s eyes and will rank highly? And if it’s A, then how does that do justice to site B?”

So I could get into a lot of really interesting stuff about how to crawl the web. If you really want to know about a signal, the Nyquist rate says you want to sample at two times that frequency. But the fact is, you can always change a web page. So the whole idea, the conception of being able to crawl the entire web and having a perfect copy at every instant, is a little bit flawed, because at any time we can only go and fetch a certain finite number of pages. If we tried to fetch them all, and our architecture could almost support that, then the web might crash from all of those requests. And we try to crawl in a relatively polite way. We also try to prioritize based on things like the PageRank of a particular page, or maybe a site might have a lot of PageRank.

So the question is essentially, if A is getting crawled a lot but the original article starts on B, what if A rips off B? Well, there are ways that you can help to guard against that. So, for example, if you do a tweet, people will see it, people may link to it, and we may follow those links faster than we’ll discover it on the other site. Another thing that you can do is you can hook up things like PubSubHubbub, which will ping various places. There is a very limited amount in which we will use PubSubHubbub to help improve our crawl, and that might change over time. And that’s a great way to sort of asynchronously say hey, there’s a new article or there’s a new blog post.

But let’s go ahead and play with this hypothetical scenario. If A has copied your article and changed the time stamp, that’s a little bit deceptive; it’s as if they’re claiming that they have written it. So you can do a couple things. Number one, if you are the author of that article, you can always do what’s known as a Digital Millennium Copyright Act sort of notice, where you send in this DMCA request; and you can find the information at [URL missing]. And basically what you’re saying is this site copied me, but I’m the original author. So this site can either counter-notify, which means they dispute that. They say I wrote this page, which has some penalties to it if they’re lying. Or they can not dispute it and the stuff disappears off of the other site. So if someone’s ripping you off, you can always do a DMCA notice. You can also, for example if it’s an auto-generated site and they’re ripping off or scraping a bunch of people, do a spam report, because that’s not a high-quality site; that’s not the sort of thing that we want to have within our index.

But let’s just play it all the way out to the corner case. It is, in theory, possible that we will find an article on one site before we find it on the other site. And so it is definitely the case that we try hard to find out who is the original creator of a particular piece of content, but I wouldn’t claim that we’re perfect. We do as much as I can think of to try to figure out what are the ways that people can indicate that they wrote the content. And in fact, in Google News, we just introduced a couple new tags, almost as an experiment to see how well it works, to sort of say, here’s the original author of this content. So there are approaches that we’re exploring to sort of figure out if there are other ways to do that. But at least for the time being, in theory it is possible to have an article. In practice, it tends to not happen that often, and you do have ways that you can get around that or ways that you can take action, from a DMCA request all the way up to a spam report. Hope that helps.


  1. Post
    Yann Ropars

    Matt, thanks for this insightful video. How is it possible for someone to see if a piece of content, e.g. a paragraph, is published somewhere else across the web? Does Google make any tools available for this?

  3. Post

    Matt, I'm glad JohnMu had Google's DMCA form fixed. I will sign up for Twitter and set up PubSubHubbub, but is that going to help if my site has a Panda penalty? Google has crossed the line of Fair Use by removing search results for the original content creator with Panda as soon as new posts are scraped.

    No wonder you are hiring.

  5. Post
    Gary S Berger MD

    Looking good Matt, there are always a few gems.

    What are the new tags? I like the author meta tag and can see how these could be used to look at spam reports, if we all do a spam report and Googlebot sees many tag discrepancies associated with the oft-reported site.

    I use Pingomatic to announce my new posts; recently I have been persuaded to use Facebook, and now you would have me tweet 🙂

  9. Post
    Maarten Luther

    @GoogleWebmasterHelp Sorry Matt, but Google isn't very good at spotting original content. I wrote an article (job vacancy) on my website, waited for Google to index it, after indexing I posted the same article on a job site, including a link to the original version. Guess what happened? The original got buried and the duplicate on the job site is ranking. The job site is PR5, mine PR6. So explain to me, because all the things you mention here are simply not true.

  10. Post
    Maarten Luther

    @GoogleWebmasterHelp Also, what's happened with the form for reporting stolen/duplicate content? Filling out my daily DMCA request isn't the way to go. Stealing content takes a second. Filling out forms and legal shit takes hours. If Google is so good at spotting duplicates, why such a hassle? Why make it easy for thieves (no consequences) and so hard for original copywriters? I've got people stealing my customers' content without even bothering to change links and company names! But they're still ranking!

  11. Post
    Marius Rusu

    Same thing for me. I started a web page which has VERY high quality content (people are thanking me over and over again for what I offer). I have PR0. A different web site which just copies content from other sites (nothing original at all; it even copied some of my articles) has a PR2 and overtakes me in Google search…

  14. Post
    Timon Weller

    Best advice: always include yourself in your articles.. Your own name, I mean; if someone copies it, then it would be pretty obvious who it came from.. Also interlink in all your articles so your links make it difficult for scrapers as well.. If one really wants to be successful online these days, one really has to sell oneself as well.. In content, in video and anywhere…

  15. Post
    Timon Weller

    Yeah, Google needs to improve once more, I agree. Since Panda I have noticed way more scraper sites and feed sites stealing content and ranking.. I hope they are on to that. It used to be that a link said it all; now sites with no links are ranking, even with little content as well… Drives me crazy when I see it.. I just wish they had never changed anything; their search was working, now it appears broken..

  16. Post
    Steve Coan

    "A relatively polite way?" I have had this problem, especially through YouTube videos and link farms picking up my content before it's crawled. DMCA? It will cost you money, or you could wait 1 month with a kickback (or link-back to DMCA). Let's go to the "spam report": where do I submit a spam report to Google? Please add this to your links on this video. Thanks, Matt.

  17. Post
    Steve Coan

    I have to agree with Marius: the more I write, the more of my content gets stolen, and other sites that even have free domain names rank higher in my niche. They copied my website, spammed, and even hacked into edu sites to leave their garbage repeat-content links, and now they rank higher than me? Oh please. Bohol Web design, look it up in search. I have over 50 hand-written articles and over 200 subscribers on a .com, and still a free domain site trumps me because of hacked back-links? Yeah, OK.

  21. Post
    David Cox

    I have a website that formats public domain works (and some copyrighted works with the author's permission) into a Bible program format. What is Google's take on a post for a download of one of these books, when the post consists of the title, the author (the original author, from 300+ years ago), the table of contents, and a download link? Other websites like mine are doing the same thing with the same material. Actually neither I nor the others are the original authors. How does Google handle this kind of thing? Additionally, there are probably hundreds of online book stores like Amazon which have the same work, as well as Kindle, ePub, etc. sites. So how does Google sort out the ranking of my site versus all the rest? Does Google see my site as spam?

  25. Post
    L.j. Garner

    So basically, all of his hypotheticals still say that the content thief will get authorship credit and rank higher.

  27. Post
    Annie Douglas

    Annie: I think if you have it in you to write, what you write comes from within, so you don't have to make it up. When you love doing this, you don't have to make it up; this is God-given. I can write all day without copying it, because this is my love. Annie. Thanks.

  28. Post

    What is terrible is when you discover that a site which has copied original content ranks better than you…

  29. Post
    Sherrie Vitello

    If you're going to be a blogger, be sure to use your own content. Google loves bloggers that have their own content. You not only keep from getting Google-smacked, but you can also create your own brand, and original content can rank higher in the search engines. Thanks for the video, it's very informative.

  30. Post
    Majharul Hossain

    Does Google+ help Google know who the original author is? I mean, if site B shares the content on G+ just after publishing it, can G+ send a signal to Google about who the original is?

  33. Post

    The answer here didn't help at all. Maybe because the question wasn't asked correctly. What would be more interesting to know is not how to fix the issue, but how to avoid it in the first place. You mentioned that using social accounts to increase the traces of time stamps could help Google recognize which site produced the original content. Great, but what else can be done? Please give us more ideas about how to maximize the chances of being considered the site which produced the content.

    – Of course, what may be an ugly truth is that Google doesn't always care, so if a badly ranked site (a startup) produces the content, and then a hugely visited website steals it, maybe Google thinks it's better to show this content to the bigger audience. I'm not sure Google cares as much about the original creator of the content as about who has the biggest audience to show it to..

    What do you think Mr. Cutts, is there any truth to that?

  34. Post
    Tech HUB

    But if we talk about reality, there are countless websites in the whole world that copy content daily, and nobody makes a claim about that? For example, if I am writing a blog about the Nokia 6, surely I'll find some material from other websites, Wikipedia or images. You can write a blog on your own. lol
