Wanted: human editors. Scrapers and robots need not apply

My world is awash in crap data.

Several times a week, I open my snail mail box to find bulk-mail solicitations for some member of one of my websites, but sent to the site’s street address. Every month or so, I’ll get a series of calls to my business phone (which is listed on my website), but the caller will ask for a name I’ve never heard. For the rest of that week, I’ll get dozens of similar calls, from different people calling on behalf of some work-at-home scheme, all asking for the same fake name.

And whenever I’m stuck searching for information via Google or Bing, I inevitably have to scroll past link after link to scraped websites – pages written not by any human being, but slapped together by scripts created to blend snippets from other webpages into something that will fool Google’s or Bing’s algorithm into promoting them.

If Google really wants to make its search engine results pages more meaningful, forget about adding links from my Google+ friends. How about creating a scraper-free search engine, instead?

I have no doubt that the reason why I get all those misaddressed letters and wrong-number phone calls is that some fly-by-night “data” company scraped together a database by mashing up names, street addresses and phone numbers it crawled on various websites. That database gets laundered through some work-at-home company, which sells it to suckers via the Internet as a “lead list” for commission sales.

It’s bad enough to take phone calls from these poor chumps, who think that they’ve taken a step toward earning some honest income. But I’m stunned when I see the bogus-name letters coming to my office from established colleges and non-profit institutions, which clearly have also bought crap mailing lists.

(FWIW, all my phone numbers are on the National Do-Not-Call Registry, and I’m opted out of commercial snail mail with the Direct Marketing Association, so no legitimate data company should be selling my contact information to businesses and organizations I’ve not dealt with before.)

Maybe it’s too much to hope for a solution that frees me from having to throw away all these unwanted letters and beg off these unwanted phone calls. (Not to mention saving the people contacting me the expense of pursuing bogus leads.) But maybe I can hope for a scraper-free Internet experience instead.

I know it’s possible, because there used to be a scraper-free search engine – one that searched just hand-picked Web sites created by actual human beings. It was called Yahoo!, and if they’re smart, the latest crew of new managers at Yahoo! could do far worse than trying to recreate a 2012 version of their Web directory, then using it to populate a Google-killing search engine.

For an example of the garbage polluting search engines today, this site came up high in the SERPs when I searched recently for my wife’s name and the name of her website.

[Image: scraper site screen grab]

If you know anything about the violin, you should be ROTFL now. For those who aren’t violin fans, allow me to explain that Ivan Galamian, one of the great violin pedagogues of the 20th century, has been dead for over 30 years. While we would have loved to have someone of his stature working for us at Violinist.com, only an idiot scraper script would think he works for us now.

It kills me that good websites, blogs and journals written by thoughtful correspondents get pushed down in the SERPs – and overlooked by potential fans – because of this garbage.

I want a search engine that knows better – that excludes Web domains populated by scraped data and instead searches online sites written by actual human beings. I wouldn’t limit such a search engine to sites written by paid, professional staff. There’s too much rich content to be found in the conversations of others. But blogs, discussion boards and rating-and-review sites included in this search engine should be composed of information submitted by human beings, not scraped from other websites and edited together by bots.

The original Yahoo! lost when start-up rival Google indexed more pages than Yahoo!, giving Google an edge over its established competition. But I – and, I suspect, many others – don’t care about the size of a search engine’s database any longer. Google’s right on in its attempt, announced today, to build a more human-driven search engine. But I’m not convinced that adding Google+ links to the SERPs is enough of a change to make a difference in quality.

First, not enough people use Google+. Its 18-and-over-only age limit also disqualifies the millions of teenagers who help drive the digital conversation. And I fear that Google’s new “Search Plus Your World” approach simply will encourage spammers to flood Google+ with even more bogus accounts and friend requests, in order to boost their reach into the Google SERPs those new “friends” see.

It’s great to use social media to help bring more people into the process of selecting which websites should be indexed in a search engine. But, ultimately, at this point organizations still need more aggressive in-house human oversight in back-checking the results.

Google lost its quality control over its SERPs long ago. Whether it’s search engine results or business lead lists, there’s too much crap data on the market today. That illustrates the continued need for more, and better, human leadership of data cultivation. There’s a market need out there. So who’s going to step forward to fulfill it?

About Robert Niles

Robert Niles is the former editor of OJR, and no longer associated with the site. You may find him now at http://www.sensibletalk.com.

Comments

  1. 182.68.84.158 says:

    I totally agree with your post. Observe anything odd about your Google+ profile? Does it status amazingly well in Google