Traffick: Search Engine Enlightenment | Grab the feed: traffick.com/atom.xml

I'm at Mesh today.

Funniest anecdote of the day: Craigslist CEO Jim Buckmaster highlighted the "girl with the blue urn" in the Missed Connections area of Craigslist. She "spilled her grandmother" on a "sad looking boy" in Spadina subway station.

Posted by Andrew Goodman

Tuesday, May 29, 2007

Apparently Michael Seaton isn't quite up there with Michael Keaton (the name Google suggests I really mean when I type "Michael Seaton blog"), but the director of online marketing for Scotiabank just got a little old-media ink.

Patricia Best at the Globe and Mail noted that while the digs at competitor BMO by Seaton ("the director of online marketing for The Bank of Nova Scotia") were "by no means revelatory or even original," his competitive post was "a departure from normal practice in the gentlemanly and closed club of Canadian banking."

Hmm, maybe it's that gentlemanly reserve that got some of them into this mess in the first place.

Best's mention was published a gentlemanly ten days after Seaton's post.

Labels: blogs, bmo, scotiabank

Posted by Andrew Goodman

Monday, May 28, 2007

I love this iGoogle. It reminds me of Go2Net, circa 2000.

Labels: portals

Posted by Andrew Goodman

Sunday, May 27, 2007

As anyone with a logfile analyzer for their own site knew already, Google has continued to take market share away from the next two search engine companies. comScore's numbers seem strangely two to three years behind reality, as always. Which probably means the trend has already leveled off!

But also, as usual, we'd love to see the breakdown among all the "Google properties," "Yahoo properties," and "Microsoft properties," and an explanation of what counts as a search. Not quite enough to pay $100k for the data, though.

Labels: search engine market share

Posted by Andrew Goodman

Thursday, May 24, 2007

Jeff Braverman at NutsOnline.com (a client) has gotten mixed up in something huge (David Braverman is pictured at left). We (my colleague, the humble genius Scott Perry) even ran an AdWords ad group for it briefly, sending visitors to this landing page. Now the show's viewers are linking to the page from a wide variety of blogs and forums.

OK, I admit it, I'm having trouble piecing this together as I don't watch the show... but did I hear right? Did they show a "cliffhanger" episode, and then cancel the show?

The show, Jericho, is being cancelled. There's a line including the word "NUTS" in the last episode. Irate (and loyal) fans have by now sent 7 tons of nuts to CBS.

All the Google News that's fit to print.

WebProNews gives Jeff some nice coverage.

In some kind of poorly proportioned effort to give "closure" to fans, CBS has floated the idea of an online finale. Hmm, stay tuned.

UPDATE: As of Saturday we're up to 10 tons of nuts to CBS.

Labels: jericho, nuts

Posted by Andrew Goodman

Here's a Google Maps feature that should please everybody's dad.

For who doesn't remember a time they were taken "off the beaten path" by a loved one who just had to try a new way to get there...

As one of my alma maters' mottos went: Tentanda Via ("the way must be tried").

Labels: google maps

Posted by Andrew Goodman

Wednesday, May 23, 2007

Is it just me, or do investment banking types like their summers off? There seems to be an uptick in massive mergers and acquisitions of late. In our online ad industry, four major unclaimed pieces of the ad services and ad inventory pie have been snapped up. Yahoo took control of Right Media; Microsoft bought Aquantive, which owns Valueclick (which owns Commission Junction), agency Avenue A/Razorfish, and bid management tool Atlas, for so much money ($6 billion) that it counts as Microsoft's largest-ever acquisition; mega-agency WPP bought 24/7 Real Media (which, among other assets and agency services, owns the bid management technology Decide DNA); and starting the whole domino effect in the first place was Google's $3.1 billion acquisition of DoubleClick (which has a number of interesting assets in the ad serving and bid management field, but also owns an agency, Performics).

We're a long way from Google picking up tiny Sprinks (an ad system that mostly served customers placing ads on About.com) for low millions. But this example might help us better understand what's going to happen next. Google replaced Sprinks inventory with its own, eliminated Sprinks' unique methodology from the marketplace, and more or less gave the employees their walking papers (no doubt politely and amicably).

So my reaction to the recent acquisitions (particularly DoubleClick and Aquantive) was that they would throw our industry into short-term chaos on a number of fronts. The diversified nature of the acquired companies meant that people and products would move around some more before coming to rest and re-forming altered relationships with customers.

In each case, I think the key question to ask is, what part(s) of the acquired company are the acquirers really buying? In spite of statements to the contrary, the acquiring companies do have plans to sell, eliminate, or drastically reorganize big chunks of the companies they've acquired. This is plain.

I checked out industry reaction, both by polling some industry insiders for their viewpoints, and by reading some of the commentary. Here's a selection:

In MediaPost, Mark Simon pointed out that a bid management tool like Atlas (remember, owned by Aquantive, which is now going to be owned by Microsoft) is used to manage a large number of high-spend search accounts. Atlas has a lot of detailed data about search campaigns, particularly those run on Google. Great competitive intelligence, right? Too great. Simon followed the logic to argue that there's no way Google won't block API access from bid tools owned by major competitors. I would take this a step further and try to imagine exactly how Google will reorient its policies so they don't seem discriminatory. Which brings me to my next point...

I believe, along with some others, that Google will study and redesign some of what it has acquired from DoubleClick (the DART Search bid management system) to begin offering sophisticated bid management in-house. Listen, Google was on a path to implementing this anyway, and it was going to hurt a lot of third-party bid management software firms because we'd be able to get this functionality free within AdWords. That now looks like a certainty within six to twelve months; as with Urchin (now Google Analytics), improved versions will likely hit the market (at no cost to you) in 18 to 24 months.

I'll shift gears and offer my take again. WPP's purchase of 24/7 is an odd one. WPP needs the agency services side, but the inventory and network parts seem out of place, and the bid management tool will be ineffective or actively blocked by Google and Yahoo. So WPP drastically overpaid for an agency add-on.

Danny Sullivan told me that he sees conflict-of-interest problems:

"I simply don't see how either Google or Microsoft think they are going to be able to hang on to interactive marketing companies that are involved with gaining placement with their search listings. It is simply not compatible with trust for searchers or advertisers. Even if information isn't exchanged, the perception will be that it is.

Overall, I feel like the acquisitions are a grand rush to build up interactive ad networks to rival, in particular, the contextual ad network that Google has already built and is mining. I especially understand the desire with Google and Microsoft to gain better tools (Yahoo actually purchased a real network). But in the rush to get the tools, they've gained a lot of other baggage they'll have to deal with, whether they like it or not."

Richard Zwicky of Enquisite considers the WPP acquisition to be particularly notable because it's indicative of a belated (panicked?) acceptance by traditional ad agencies of the interactive ads space and even the search ads space. Richard also drew attention to the Interpublic acquisition of Reprise Media and the "outstanding team there." Looking to the future, Zwicky predicts a rise in M&A activity targeting boutique online services firms, especially in search. But he believes boutique agencies need to become diversified boutique agencies, not mere one-trick ponies stuck on, for example, SEM or SEO.

Matt Van Wagner of Find Me Faster thinks "Microsoft will have to spit Razorfish back out or they will be in conflict with many of their advertisers."

John Krystynak of GotAds told me that Microsoft's purchase of Aquantive was monumental stupidity. On his blog he pointed out that they paid 14X revenues for what is essentially an agency, and "agencies aren't exactly known for pristine revenue reporting." He says that if he were an Aquantive customer, he'd be looking for a new agency right now. (But John, what about Valueclick and Atlas? Those are the non-agency parts.)

In a detailed post too long to paraphrase, Linda Burlison outlines potential chaos in bid management and media buying across the various competitors, especially now that these acquisitions have created so many conflicts of interest.

Posted by Andrew Goodman

Monday, May 21, 2007

rkgblog writes about Jimmy Wales critiquing Google's lack of transparency.

I've just been briefly reviewing the history of Google and competitors in capsule form, as I write the new edition of the book. Kept me up late thinking about it, to be honest.

There is something so compelling about an open source search engine: maybe search can actually get better if it goes in that direction - tapping into distributed developer expertise. In non-public or low-scale settings, search engines like Nutch and its cousin Lucene SOLR have so much promise. And why not? It becomes "our" search engine that allows "us" to customize, while not being beholden to a particular overlord.

Some of that vibe, though, was what led to the Open Directory Project many years ago -- and what happened?

On balance, it looks to me as though Nutch et al. (open machine algorithm) and Wiki-something are two very different approaches to the problem. Open source search in the traditional sense is open to a community of developers, and freely licensable. Wikified search is bound to be open in that looser, sometimes chaotically obscure or corrupt way somewhat analogous to the (problems and opportunities of the) old ODP. Importantly, the Wiki concept still relies too much on people to produce content. This will not necessarily scale. It's useful for some things, hopeless for others. Another problem is that Wikipedia users won't necessarily be better at the production side than users distributed across many involved online communities. They might be worse.

This is a draft of some thoughts that might go into a book (below). A few older bits still need cleaning up. What are your thoughts?

--

Beginning life in 1998 as GnuHoo and then NewHoo, the Open Directory Project (ODP) was conceived as a competitor to the Yahoo Directory. The work was to be done by volunteer editors, and the end product was to be licensed to any portal or site that wanted to take advantage of the information. Doesn't sound like much of a business? Well, it turned out to be a pretty good deal for the founders. The directory's popularity led to its acquisition by Netscape, which was later acquired by AOL.

AOL became the Open Directory's major distributor, but the directory was also licensed (at no charge to the publisher) in many other places around the Web. Google began using ODP data fairly early on, calling it the Google Directory. An innovative feature was Google's use of an "overlay" technique, ranking results in a given ODP category in order based on the site's Google PageRank score. This was illustrated with a green bar (on a scale of 0 to 10, similar to the way the Google toolbar displays PageRank to searchers). This could have been a very useful feature indeed had there been more consistency to the underlying content in the directory. The so-called Google Directory still exists, but it has been completely de-emphasized in the Google Search user interface.

A couple of key Open Directory players, founder Rich Skrenta and marketing exec Chris Tolles, eventually moved on to a new venture: Topix, a sophisticated news search engine that competes directly with Google News. Topix is now 75% owned by three major media companies: Gannett, Knight-Ridder, and Tribune.

The ODP came under criticism for many of the same reasons Wikipedia is maligned in some quarters today: a lack of "professional" editorial quality control. The lack of transparency of site submission procedures to the website-owning public, and the huge variations in how much biographical information editors disclosed, meant, for me, that this so-called open directory was far from open.[i]

The construction of a comprehensive, high-quality, human-edited directory remains an elusive (and perhaps now irrelevant) task. The ODP founders were correct in their assumption that a distributed model for vetting editorial recommendations was the only possible way to get a comprehensive categorized directory to scale with the growth of the Web. But they also oversold the value of human contributions: even tens of thousands of contributors couldn't scale adequately to cover the enormous explosion of online information, at least not compared with improved search algorithms and search interfaces, to say nothing of the massive acceptance of online collaboration and the wider range of tools that support it.

In the past I had come across a couple of alternatives to ODP; two notable ones were put forward by Steve Thomas (Wherewithal, Inc.) and Dave Winer (RSS pioneer). Both would rectify the problem of a fixed category structure being controlled solely by the category owner. They'd allow for collaborative taxonomy, so to speak. Ho, hum. Many of these seemingly radical critiques of ODP have become staples of today's so-called Web 2.0 movement. Thus many of these early debates have been surpassed by growing acceptance of the need for technologies that work subtly "upstream," handling the self-organizing editorial output of many users rather than imposing a top-down (if seemingly democratic) categorization scheme.

Contributing to the organization and sharing of information has to seem fun or worthwhile, and much of the ODP community moved on to other passions. A spinoff site called ChefMoz (also a good idea) found little appeal with the broader public and proved that ersatz claims of "officialdom" for an open-source, .org-based human review site were grandiose; sites (mere websites!) such as Chowhound and Yelp now achieve considerably greater popularity pursuing virtually the same goals. The emergence of a range of ODP offspring proved that it was never really the open directory. It was a human-powered directory that chose to call itself "open." (A similar realization will no doubt dawn on users of and contributors to Wikipedia, too. It won't prove to be the be-all and end-all.)

"Humans do it better," the ODP slogan, was proved wrong in the sense that algorithmic approaches such as Google Search won mass acceptance from users over and above hand-categorized, ostensibly quality-controlled directories. That said, new methodologies of tapping the so-called "wisdom of crowds" (such as Digg) have meant that the machine algorithmic approach isn't the only winner in the marketplace for ranking and rating online content. And certainly, algorithms can't create content as tens of thousands of Wikipedia participants have managed to do in their improbable construction of a huge online resource.

In the meantime, then, a whole world of user-built information sources has exploded on the scene, with Wikipedia and Digg leading the pack. Many of the pathologies (and opportunities) that bedevil (and excite) Wikipedia and Digg users today were endemic to ODP. In hindsight, this makes Skrenta and Tolles pioneers to a greater extent than perhaps they realized.

The Google Difference: A Third-Generation Algorithm

If Google hadn't moved to fill the void left by its struggling predecessors, someone else would have. Scientists in various research projects were working on new ideas about how to rank the importance of web pages vis-à-vis a given user query. What Google did was to popularize some of the best emerging ideas about how to design a large-scale search engine at a time when others were losing momentum. Some of these ideas are so central to the task of ranking pages in today's web environment that they were adopted in some form or another by all of Google's main competitors (including Inktomi, AltaVista, and FAST).

The working paper that explains Google's PageRank methodology, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," is frequently cited.[ii] But the field of information retrieval technology is rich with ongoing experimentation by hundreds of well-funded scientists, some well known, some not. Some scientists take a slightly different approach to the problem tackled by Page and Brin, organizing the Web into topic-based "communities." Teoma, a search engine acquired by Ask Jeeves (now Ask.com), is the most public example of this approach.[iii] The two approaches tend to provide somewhat different results, but they are clearly cousins of a similar generation of thinking about the "hyperlinked environment," and both have been a boon to researchers seeking that elusive piece of information online.

In practice, algorithms such as Google's and Ask's today are really meta-algorithms, looking for "signals" on a wide and shifting spectrum of measures of quality and relevancy, while attempting to filter out or devalue a huge volume of junk and spam results. Today's search engines might be clever enough to measure website usage patterns, background business data, and more. (One potential signal, the age of a website, is now seen as so matter-of-fact that search marketers have a nickname for the apparent difficulty a new website owner faces in getting well indexed in Google: "The Google Sandbox.") In addition to all that, there are attempts to determine user intent in search queries, to serve up personalized results or even different types of results (news search, maps, financial charts, weather) based on the user's history or the nature of the query. In today's mature world of search, no one methodology is billed as "the" best way of arriving at the ultimate ranking of results on a given search query. But arguably, Google consolidated its lead in search based on the mythology that its PageRank system was an invention that led to brilliantly accurate search results.

In any case, the idea behind PageRank was brilliant and intuitive when it was brought to market in 1998. The governing principle revolves around a map of the linking structure of the Web. Pages that have a lot of other important pages pointing to them are deemed important. "PageRank can be thought of as a model of user behavior," wrote Brin and Page. "We assume there is a 'random surfer' who is given a web page at random and keeps clicking on links, never hitting 'back' but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank."
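
To make the random-surfer idea concrete, here is a minimal Python sketch (not Google's code) that computes PageRank by power iteration on a toy link graph: give every page an equal share of probability, then repeatedly let the surfer either follow a link or get bored and jump, until the scores stop changing. Two assumptions are worth flagging. The damping factor of 0.85 (the chance the surfer keeps clicking) is the value Brin and Page say they usually used, and spreading a dead-end page's score evenly across all pages is one common convention rather than something the quote above specifies. The page names are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 200) -> dict[str, float]:
    """Random-surfer PageRank by power iteration (illustrative sketch).

    `links` maps each page to the pages it links to. At every step the
    surfer follows one of the current page's links with probability
    `damping`; otherwise it gets bored and jumps to a page chosen uniformly
    at random. The returned scores are visit probabilities summing to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess
    for _ in range(max_iter):
        # Probability mass from the "bored" random jump, spread evenly.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            out = links.get(page, [])
            if out:
                # Follow a link: split this page's rank among its out-links.
                share = damping * rank[page] / len(out)
                for target in out:
                    new[target] += share
            else:
                # Dead end (assumed convention): the surfer must jump, so
                # this page's rank is spread evenly over every page.
                for p in pages:
                    new[p] += damping * rank[page] / n
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:  # stop once the scores have settled
            break
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: a links to b and c, b to c, c back to a.
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(f"{page}: {score:.4f}")
```

On this toy graph, page c, which is linked from both a and b, comes out on top at roughly 0.40; a, linked only from c, still scores about 0.39 because its single inbound link comes from the most important page; b trails at about 0.21. That is the "important pages pointing to you" principle in miniature.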

This was a significant advance over previous generations of web search. Although most major engines had experimented with a variety of ranking criteria, many of them had depended heavily on basic keyword matching criteria. Not only did this make good information hard to find because so many pages were locked in a virtual tie for first place, it made it easier for optimizers to feed keyword-dense pages into the search engine in a bid to rank their commercially oriented pages higher. Although this game of keyword optimization is quite effective to this day in ranking pages well on unpopular queries (even on Google Search), it seems to work rather poorly on common queries.
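
The "virtual tie" problem is easy to reproduce. The minimal Python sketch below scores pages the way a naive keyword-matching engine might, by simply counting occurrences of the query terms, and then contrasts that with ordering the same matching pages by link authority. The page names, their text, and the authority numbers are all hypothetical, and letting link authority alone decide the order among matching pages is a deliberate simplification; real engines blend many signals.

```python
import re
from collections import Counter


def keyword_score(text: str, query: str) -> int:
    """Naive relevance: total occurrences of the query's terms in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return sum(counts[term] for term in query.lower().split())


def rank_by_keywords(pages: dict[str, str], query: str) -> list[str]:
    # sorted() is stable, so pages with equal scores stay in input order:
    # nothing in the score itself separates them (the "virtual tie").
    return sorted(pages, key=lambda p: keyword_score(pages[p], query), reverse=True)


def rank_by_links(pages: dict[str, str], query: str,
                  authority: dict[str, float]) -> list[str]:
    # Keyword matching only decides which pages qualify; link authority
    # (e.g. a PageRank-style score) decides their order.
    matching = [p for p in pages if keyword_score(pages[p], query) > 0]
    return sorted(matching, key=lambda p: authority[p], reverse=True)


# Hypothetical pages and link-authority scores, for illustration only.
PAGES = {
    "insurer-a.example": "Compare auto insurance quotes from leading insurers.",
    "insurer-b.example": "Auto insurance comparison made simple: compare quotes.",
    "stuffed.example": "auto insurance auto insurance cheap auto insurance auto insurance",
}
AUTHORITY = {"insurer-a.example": 0.40, "insurer-b.example": 0.35, "stuffed.example": 0.01}

if __name__ == "__main__":
    query = "auto insurance"
    for page in PAGES:
        print(page, keyword_score(PAGES[page], query))
    print("keywords only: ", rank_by_keywords(PAGES, query))
    print("link authority:", rank_by_links(PAGES, query, AUTHORITY))
```

Ranked by keyword counts alone, the stuffed page wins with a score of 8 while the two legitimate pages tie at 2; ordering the matching pages by link authority instead pushes the stuffed page to the bottom.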

The ascendance of PageRank means that on a Google Search for "auto insurance comparison," for example, it's likely that a well-known site will rank well rather than some random site that just happens to contain those keywords. When I tried the query, I saw a number of leading insurance comparison sites, and very little "junk." This dovetails with the notion that authoritative recommendations do indeed confer authority as far as Google's algorithm is concerned. But it won't take you long to find a few head-scratchers in the mix. It's difficult to get a monolithic sense of which types of pages rank well. But few would dispute that a high volume of quality links pointing to one's site is a great way of getting Google Search to treat you well. PageRank isn't dead; it's just part of a bigger mix of factors than ever before.

The ability to break all these "virtual ties" among similar search results was a breakthrough for search engines. Almost all major search technologies today are significantly more sophisticated than those from the mid-1990s. I recall a time when many websites used a free licensed version of Excite Search for their internal site search. The technology was weak, often providing a clutter of irrelevant results. If search was this bad in closed corporate environments, it was definitely in need of improvement if it was to help users sort through the enormous clutter of pages available on the Web. For searching relatively fixed data sets, such as finding pages within a single website, today's technology is significantly improved over yesteryear's. The open source movement has even brought us libraries of sophisticated search engine code (such as Lucene SOLR), meaning that a powerful small-scale search engine can be customized at a reasonable cost.

A public web crawler in the same family, Nutch, has gained notice as well. Free, open-source web search technologies in 2007 are nearly as sophisticated as the industry-leading search engines of a decade ago, which were valued in the hundreds of millions of dollars, but they're still far from beating Google at its own game. Why? Nutch, like many other search technologies, doesn't scale as well. In the understatement of the search engine century to date, the Nutch founders write: "Much of the challenge in designing a search engine is making it scale. Writing a Web crawler that can download a handful of pages is straightforward, but writing one that can regularly download the Web's nearly 5 billion pages is much harder."[iv]

It doesn't stop there. Once you have those billions of pages, you'll have to assess them all and determine how much authority each link on each page should be allowed to "pass on" to other websites and pages. Because some site owners will be up to no good (premeditated linking schemes), or simply because fortunes change, the map of how much authority (or what type of authority) is conferred by all hyperlinks on record is going to need to be updated regularly. A web search engine must also be able to sort out "duplicate" (often stolen or "scraped") content from the original content, so it doesn't end up giving visibility to the wrong source. The calculation of link structures and associated authority weights alone (let alone getting the underlying approach to the calculation right) is beyond the capacity of any small-scale search engine infrastructure.
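
Sorting duplicates from originals starts with spotting near-copies. One widely described technique, and only an assumption about how any particular engine does it, is w-shingling: break each page into overlapping runs of k words and compare the resulting sets with Jaccard similarity (shared shingles divided by all distinct shingles). The minimal Python sketch below does this by brute force over every pair of pages. The page names, the choice of k = 4, and the 0.8 threshold are hypothetical; a web-scale engine would need sketching methods such as MinHash to avoid comparing billions of pages pairwise, which is exactly the scaling problem described above. Deciding which copy is the original is a separate step the sketch does not attempt.

```python
import re


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word runs ("shingles") in `text`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single (shorter) shingle, or none at all.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict[str, str], threshold: float = 0.8, k: int = 4):
    """Yield (id, id) pairs whose shingle sets are at least `threshold` similar.

    Brute force: compares every pair, which is fine for a handful of pages
    but hopeless at web scale.
    """
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    ids = sorted(sets)
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if jaccard(sets[first], sets[second]) >= threshold:
                yield first, second


# Hypothetical pages: an original, a scraped copy with a tag line appended,
# and an unrelated page.
SAMPLE_DOCS = {
    "original.example": "Search engines must decide which copy of an article "
                        "deserves credit when scrapers republish it",
    "scraper.example": "Search engines must decide which copy of an article "
                       "deserves credit when scrapers republish it via scraper",
    "unrelated.example": "PageRank models a random surfer who follows links "
                         "until getting bored",
}

if __name__ == "__main__":
    print(list(near_duplicates(SAMPLE_DOCS)))
```

Here the scraped copy shares 12 of its 14 shingles with the original (a Jaccard similarity of about 0.86), so that pair is flagged; the unrelated page shares none.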

Beyond massive computing power and indexing technology, then, Google's advantage continues to rest in part on the ability of PageRank and other related technologies to sort out valuable information from information that "dumbly matches" the user's query. Want proof? Do a search on your favorite topic at Technorati.com, the blog search engine. It's powered by Nutch. I'm betting you'll find quite a number of "spammy" results in the mix, in spite of some recent tinkering with a weak cousin to PageRank, an "authority score." What's surprising is that Google's own Blog Search also appears easier to flood with duplicate content and spammy sites than its main search index.

To be clear, the calculations involved in determining PageRank are just the beginning when it comes to determining how high a page ranks for a given user's query on Google....

[i]. "Why the Open Directory Isn't Open," Traffick.com, March 30, 2000.

[ii]. Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Stanford University Department of Computer Science, 2000. Jon Kleinberg, widely considered to be the leading contributor to this generation of search technology, has published many important papers on search, including "Authoritative Sources in a Hyperlinked Environment," 1998.

[iii]. For a user-friendly overview, see Mike Grehan's interview with Paul Gardi, "Inside the Teoma Algorithm," July 2003, archived at e-marketing-news.co.uk.

[iv]. Mike Cafarella and Doug Cutting, "Building Nutch: Open Source Search," ACM Queue 2:2 (April 2004).

Labels: google, jimmy wales, open source, wikipedia

Posted by Andrew Goodman