Saturday, July 10, 2010

The Taxonomy of Bugs

The Importance of Bugs:

The importance of a bug depends on frequency, correction cost, installation cost, and consequences.

A reasonable metric for bug importance is

Importance($) = frequency * (correction_cost + installation_cost + consequential_cost)
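The metric above can be sketched in a few lines of Python. This is only an illustration: the `BugCosts` record, its field names, and the dollar figures are assumptions made up for the example, not values from any real project.

```python
from dataclasses import dataclass


@dataclass
class BugCosts:
    """Hypothetical record holding the four factors of the importance metric."""
    frequency: float           # expected number of occurrences
    correction_cost: float     # dollars to find and fix the bug
    installation_cost: float   # dollars to install the fix
    consequential_cost: float  # dollars of damage caused by the bug


def importance(bug: BugCosts) -> float:
    """Importance($) = frequency * (correction + installation + consequential)."""
    return bug.frequency * (
        bug.correction_cost + bug.installation_cost + bug.consequential_cost
    )


# Invented figures: a frequent but cheap bug versus a rare but expensive one.
typo = BugCosts(frequency=50, correction_cost=10, installation_cost=5, consequential_cost=1)
crash = BugCosts(frequency=2, correction_cost=400, installation_cost=100, consequential_cost=5000)

print(importance(typo))   # 50 * 16 = 800
print(importance(crash))  # 2 * 5500 = 11000
```

With these made-up numbers the rare crash outweighs the frequent typo, which is the point of multiplying frequency by the summed costs rather than ranking on frequency alone.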

Structural Bugs:

Control and Sequence Bugs

Control and sequence bugs include paths left out, unreachable code, improper nesting of loops, incorrect loop-back or loop-termination criteria, missing process steps, duplicate processing, unnecessary processing, rampaging GOTOs, ill-conceived switches, spaghetti code and, worst of all, pachinko code.

Logic Bugs:

Bugs in logic, especially those related to misunderstanding how case statements and logic operators behave singly and in combination, include nonexistent cases and improper layout of cases.

Processing bugs:

Processing bugs include bugs in arithmetic, algebra, mathematical function evaluation, algorithm selection, and general processing. Many problems in this area are related to incorrect conversion from one data representation to another.

Initialization bugs:

Initialization bugs are common, and experienced programmers and testers must look for them.

Typical initialization bugs are as follows: forgetting to initialize working space, registers, or data areas before first use, or assuming that they are initialized elsewhere; a bug in the first value of a loop-control parameter; accepting an initial value without a validation check.
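Two of these cases, a wrong first value of a loop-control parameter and an initial value accepted without validation, can be shown in a short Python sketch. The function names and data are hypothetical and exist only for illustration.

```python
def sum_first_n_buggy(values, n):
    """Bug: the loop-control variable starts at 1, so values[0] is skipped."""
    total = 0
    i = 1  # wrong first value of the loop-control parameter
    while i < n:
        total += values[i]
        i += 1
    return total


def sum_first_n(values, n):
    """Fixed: validate the initial value, then start the loop at 0."""
    if not 0 <= n <= len(values):
        raise ValueError(f"n must be between 0 and {len(values)}, got {n}")
    total = 0
    i = 0
    while i < n:
        total += values[i]
        i += 1
    return total


data = [10, 20, 30]
print(sum_first_n_buggy(data, 3))  # 50: the first element was silently dropped
print(sum_first_n(data, 3))        # 60
```

The buggy version raises no error, which is why this class of bug has to be looked for deliberately.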

Data bugs:

Data bugs include all bugs that arise from the specification of data objects, their formats, the number of such objects, and their initial values.

Contents, Structure, and Attributes:

Data specification consists of three parts:

Contents: - The actual bit pattern, character string, or number put into a data structure. Content is a pure bit pattern and has no meaning unless it is being interpreted by a hardware or software processor.

Structure: - The size and shape and numbers that describe the data object, that is, the memory locations used to store the content.

Attributes: - The specification of meaning, that is, the semantics associated with the contents of a data object.
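A small Python sketch can make the three parts concrete. The four bytes chosen here are arbitrary; the sketch uses the standard `struct` module to show the same contents taking on different meanings depending on the attributes applied to them.

```python
import struct

# Contents: a pure bit pattern with no meaning of its own.
raw = bytes([0x41, 0x42, 0x43, 0x44])

# Structure: the size of the storage that holds the contents.
size_in_bytes = len(raw)  # 4

# Attributes: the interpretation applied to the very same contents.
as_text = raw.decode("ascii")           # read as characters
as_uint = struct.unpack(">I", raw)[0]   # read as a big-endian unsigned integer
as_float = struct.unpack(">f", raw)[0]  # read as a big-endian 32-bit float

print(size_in_bytes)
print(as_text)
print(as_uint)
print(as_float)
```

A data bug arises when a program stores contents under one set of attributes and reads them back under another, for example treating the integer above as text.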

Thursday, July 8, 2010

The Search - New Business Model

The Internet - New Business Model:

Bill Gross might have been the founder of the company with the most anticipated IPO in the history of Wall Street, the genius who rewrote the rules of business and rewired the way culture understood itself. Had Bill Gross not given up his argument, had he just followed his gut, there might not even be a Google. Brin and Page might have sold out to Yahoo or Excite or Microsoft, or merged with Ask Jeeves, or gone the way of AltaVista.

Gross National Products - GNP, Inc.

Gross started in a linear fashion, building companies one at a time. He'd grow them till he got bored or distracted; then he'd sell them. Gross hacked up a new high-fidelity speaker design and launched GNP, Inc., to sell his creations. GNP, Inc., grew to claim number seventy-five on Inc. magazine's 1985 list of the 500 Fastest-Growing Companies. When he graduated, he sold the speaker business to his college partners and started a software company that presaged much of the rest of his life's work. The company, GNP Development, allowed computer users to type natural language commands that the computer would translate into the arcane code needed to execute the specific tasks.

Magellan:

Back in the 1980s there was no Web to index, but there was the personal computer hard drive. PCs held a mere 20 or 40 megabytes of data at that time, yet most were already a mess of lost files and hopeless organizational structures. To solve this problem Bill Gross invented Magellan. Magellan was an early version of what is now known as a file manager, a way to search all the files on your hard disk instantly. Sounds simple, but in the mid-1980s this was a pretty revolutionary idea.

Knowledge Adventure:

Another of Bill Gross's companies, Knowledge Adventure, took off, becoming the world's third-largest children's software publisher. Even here Gross was working on a piece of the search problem. In 1996 Knowledge Adventure was sold to Cendant for $100 million.

IdeaLab:

Inspired by Spielberg, Gross decided his dream job was to start a company that allowed him to start many companies in parallel - a business incubator of sorts, an idea factory. IdeaLab was born a business incubator, but given its birth at the onset of the Internet boom, it quickly became far more than that. For a brief moment, IdeaLab was a major hub not only of the Internet industry, but of cutting-edge business theory to boot. In 1998 and 1999, many of IdeaLab's companies went public in spectacular fashion, and on paper, Gross and his investors got very, very rich. As long as the updraft continued, it worked. But the updraft ended, the capital markets stopped funding concept plays, and by the middle of 2001 IdeaLab's investors were left holding a shattered portfolio. They eventually filed suit, demanding that Gross liquidate IdeaLab and all its holdings so they could at least get some of their money back.

GoTo.com: A New Model for the Web

Overture remains Bill Gross's greatest financial success - a company he built and sold not for $10 million, or even $100 million, but for well over a billion dollars. Overture was a hit, yes, but it might have been Google, or at least it could have tried to be: when GoTo launched, Google was still an obscure graduate school project.

Before Google, most search engines employed simple keyword-based algorithms to determine ranking. They indexed the words on a particular page, then matched those words to search phrases. This worked great for small, controlled data sets, and as AltaVista proved, it worked quite well for the early Internet. But once spammers realized they could capture traffic for high-traffic keywords like "cars" by hiding those keywords all over their sites (often in small white letters on a white background, for example), the model quickly broke down. This is why, by late 1998, the majority of results matching a search for "cars" on Lycos were porn sites.
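Why keyword matching was so easy to game can be seen in a minimal Python sketch. This is not any real engine's algorithm; the scoring function, page names, and page text are all invented for illustration, and the score is simply how often the query words occur on the page.

```python
from collections import Counter


def keyword_score(page_text: str, query: str) -> int:
    """Score a page by counting occurrences of the query's words in it."""
    words = Counter(page_text.lower().split())
    return sum(words[term] for term in query.lower().split())


# Hypothetical pages: an honest dealer and a page stuffed with the keyword.
pages = {
    "honest-dealer": "we sell used cars and trucks at fair prices",
    "spam-site": "cars cars cars cars cars click here cars cars",
}

ranked = sorted(
    pages,
    key=lambda name: keyword_score(pages[name], "cars"),
    reverse=True,
)
print(ranked)  # ['spam-site', 'honest-dealer']
```

Because the score depends only on words the page owner controls, repeating a keyword is enough to outrank a genuinely relevant page.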

Gross had more than a dozen other Internet-related IdeaLab companies in various stages of execution, and all of them needed good traffic. He developed GoTo.com with an eye toward solving that problem: telling good traffic from bad became GoTo's mission.
Gross's eureka moment: It's the quality. Any business would be willing to pay a lot more than seven to ten cents a click for the right traffic.
Gross's core insight: Together with his IdeaLab team, Gross looked at human-edited approaches and also tried for better algorithms, but he was convinced that any approach to search driven by algorithms would ultimately be mastered by spammers. So Gross returned to his original idea: to kill spam. Gross's core insight, the one that now drives the entire search economy, is that the search term, as typed into a search box by an Internet user, is inherently valuable - it can be priced. You can't charge the Internet user for searching. But what if you could charge the advertiser?

Gross's time-honored approach: Gross built not one but two entirely audacious ideas into GoTo's initial business proposition for advertisers. The first was the concept of a performance-based model - one in which advertisers paid for a visitor only when that visitor clicked through an ad and onto the advertiser's site. Instead of demanding money upfront from advertisers, the way AOL or Yahoo did, GoTo's model guaranteed that advertisers had to pay only when their ads were clicked upon.

Second, and even more audacious, was how Gross priced his new engine: one cent per click, an extraordinary discount to the market. He knew his price was seven to ten times less than what every Internet marketer was paying at the time.
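The arithmetic behind the two ideas is simple enough to sketch in Python. The one-cent and seven-to-ten-cent prices come from the text above; the click count and the function name are assumptions chosen for the example. Prices are kept in whole cents to avoid floating-point rounding.

```python
def campaign_cost_cents(clicks: int, price_per_click_cents: int) -> int:
    """Performance-based pricing: the advertiser pays only for actual clicks."""
    return clicks * price_per_click_cents


clicks = 10_000  # hypothetical number of click-throughs

goto_cost = campaign_cost_cents(clicks, 1)     # GoTo's launch price
market_low = campaign_cost_cents(clicks, 7)    # low end of the going rate
market_high = campaign_cost_cents(clicks, 10)  # high end of the going rate

print(goto_cost / 100)    # 100.0 dollars
print(market_low / 100)   # 700.0 dollars
print(market_high / 100)  # 1000.0 dollars

# An ad that is shown but never clicked costs nothing under this model.
print(campaign_cost_cents(0, 1))  # 0
```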

GoTo.com Launch:

In February 1998, Gross introduced GoTo.com at the famed TED (Technology, Entertainment, Design) conference in Monterey, California. Gross was one of the first to see a world where millions upon millions of search queries created the perfect advertising marketplace, and he promoted that vision like a missionary. When GoTo.com launched (in June 1998, four months after TED), it sported just fifteen advertisers. But within six months it had hundreds, and by 1999 its advertisers numbered in the thousands. Gross had created a platform that let his advertisers build his business; by the middle of 1999 the number of advertisers had grown to nearly eight thousand.

GoTo developed two lines of business: its main site, GoTo.com, and a syndication business, which had lower margins. In the middle of 1999, at a time when Google had arguably no business model to speak of, Gross had already positioned GoTo as the company to beat in paid search.

AOL Deal with GoTo.com :

The terms were reasonably simple: GoTo would pay AOL a whopping $50 million to syndicate GoTo's search listings on AOL's site. GoTo would make its profit on the traffic AOL sent through the GoTo listings. "The AOL deal was huge for us," says Ted Meisel, a McKinsey consulting veteran who took over as CEO of GoTo in May 1999.

GoTo.com to Overture:

In September 2001, GoTo.com formally changed its name to Overture in favor of the syndication business. But all along Gross was worried they were making a mistake. "We were worried about channel conflict and we overreacted," Gross says ruefully. "We thought that if we didn't phase out the GoTo.com site, our partners wouldn't renew. But the truth was, as long as we were making them money, they didn't care."

At the 2001 TED conference, Gross met with Larry Page and Sergey Brin to suggest the two companies merge into a partnership that would once again realize Gross's dream of creating the ultimate search destination. But Page and Brin turned a cold shoulder to Gross's overture. The reason given: Google would never be associated with a company that mixed paid advertising with organic results. Several months after the talks stalled, Google introduced AdWords, its answer to Overture.

The Search Economy:

To profit from search and control its own destiny, a company requires three elements:
       First, it must have high-quality organic search results, also known as algorithmic, or editorial, search.
       Second, it needs a paid search network.
       Third, it needs to own its own traffic - the consumer's search queries against which editorial and paid results can be displayed.

What Microsoft and Yahoo realized as 2002 came to a close was that this third element, their own traffic, was the only one that either of them truly owned.

Overture also owned only one of these three magic elements - the paid search network. Both Yahoo and Microsoft began to pencil out strategies for acquiring Overture. Both Yahoo's Terry Semel and Microsoft's Bill Gates had guns at Overture's head. Either one could say, "Take my offer, or I'll go to Google and your stock will tank." In the end, Overture agreed to a $1.63 billion acquisition by Yahoo. Even after this, Bill Gross isn't finished dreaming the next dream. His companies have sold for $1 million, then $10 million, then $100 million, and now more than a billion dollars (the Yahoo deal), but he's still not satisfied. Bill has a big idea for the next paradigm in search: the next economic model and the next relevance model.

SNAP: Gross's Answer to the Next Search Paradigm

SNAP is a new breed of search engine that ranks sites by factors such as how many times they have been clicked on in prior searches, among many other things. SNAP has developed a pay-for-performance scheme that goes pay-per-click one better: advertisers can sign up to pay only when a customer converts - in other words, when the customer actually buys a product or performs some other specific action deemed valuable by the advertiser, like giving up an e-mail address or registering for more information. What motivated Gross to start all over again was, once more, finding a solution to search engine spam.

Monday, June 28, 2010

The Search Notes-Search before Google

Early Search Engines:
Archie, pre-Web search:

The honor of being the first Internet search engine goes to Archie, a pre-Web search application created in 1990 by a McGill University student named Alan Emtage. Archie scoured Internet-based archives and built an index of each file it found. Based on the Internet's file transfer protocol standard, Archie's architecture was similar to that of most modern Web search engines - it crawled resources, built an index, and had a search interface. But the pre-Web era was not a very user-friendly time. Typical users would query the engine by connecting directly to an Archie server via a command-line interface. They would query Archie via keywords thought to be in a matching file's name, and get back a list of machines where the file could be found. They then connected to that machine, and rummaged around till they found what they were looking for.

WWW Wanderer:

The WWW Wanderer was the work of Matthew Gray, a researcher at the Massachusetts Institute of Technology. He wrote the Wanderer to systematically traverse the Web and collect sites.

AltaVista.com:

Louis Monier, a researcher at DEC's Western Lab in Palo Alto, California, suggested building a search engine: it could load the entire Internet (the massive database) onto an Alpha computer, then build a program showcasing Alpha's speed. Presto - AltaVista was born, a proof point of DEC's hardware dominance. Monier wanted to make sure AltaVista stayed pure - the best search on the Web. "A pencil," Monier called it - a tool that did one thing very, very well. It's exactly the approach that catapulted Google to the top of the heap four years later. By 1997, AltaVista was truly the king of search: serving more than 25 million queries a day and on track to make $50 million in sponsorship revenue, the company was in a three-way heat with Yahoo and AOL as the most important destination on the Web.

Journey of AltaVista:

In January 1998, AltaVista passed into the hands of Compaq, a Houston-based personal computer giant with absolutely no knowledge of the consumer Internet.

Less than two years later, in June 1999, Compaq sold AltaVista to CMGI, a high-flying Internet holding company, for $2.3 billion; CMGI in turn sold it to paid search innovator Overture Services, Inc., in 2003. Overture itself was later sold to Yahoo, which restored AltaVista to its original look: a search box, a blinking cursor, and scads of white space. Before that, Monier, creator of the first Google, had left AltaVista, taking 30 members of his team and his experience with him; he is now working at eBay, helping that commerce giant with a redesign.

Lycos.com

Lycos was created in May 1994 by CMU's Dr. Michael Mauldin, working under a grant from the Defense Advanced Research Projects Agency (DARPA).

History behind the name "Lycos": It was derived from Lycosidae, the Latin name for the wolf spider family, whose members actively seek their prey rather than catching it in a web.

How it works:

Lycos deployed a spiderlike crawler to index the Web, but it used more sophisticated mathematical algorithms to determine the meaning of a page and answer user queries. The crux of Lycos's search technique was analysis of anchor text, or the description of outbound links on a Web page, to get a better idea of the meaning of the existing page. Lycos also introduced Web page summaries in search results, rather than a simple list of links.

For a short period in 1999, Lycos became the most popular online destination in the world. But in May 2000, Lycos was sold to Terra, a Spanish telecom giant. Four years later, Terra sold Lycos to a South Korean company.

Like AltaVista, Lycos passed through a couple of hands, and today it remains a top-twenty destination.

Excite:

Founded in 1994 by six Stanford University alumni, Excite began life under the name Architext. The company's original goal was to create search technology for large databases within corporations, but Vinod Khosla (the person who funded Excite with $1.5 million) encouraged the company to focus on the consumer Web.

Innovations from Excite:

Personalization - MyExcite was among the first services to allow users to create custom Web pages with news, business information, and regional weather reports. In 1997, Excite became the first of the major portals to offer free e-mail.

Journey of Excite:

In 1998 every major search engine was in play, and everyone was looking for mergers and acquisitions. Excite had discussions with Yahoo, Google, AOL, Microsoft, and Lycos. According to both Khosla and Bullington, Excite was extremely close to closing a deal with Yahoo when another bidder came knocking on Excite's door: @Home, a broadband company owned by several major companies, made a richer offer to combine Excite with its @Home broadband Internet service. @Home was to Excite much what Compaq was to AltaVista, and Excite naturally ended up in a very messy Chapter 11 (bankruptcy) proceeding.

Unbeknownst to them all, there was a giant vacuum left in search.

Yahoo Kick Off: (Yang and Filo)

Yahoo got its start when two bored PhD candidates at Stanford hacked together a system that helped them win a fantasy basketball league. In 1993, Mosaic, the first Web browser, launched, and Yang started obsessively surfing the Web, noting every list of sites that interested him. Filo took note of Yang's passion and wrote some software that collected the lists and put them on one Web page.

YAHOO - Yet Another Hierarchical Officious Oracle - came about by reverse engineering the name into an acronym. Yang and Filo adopted a directory approach to navigation - sorting links into categories like Arts, Science, Business, and so on.

Tim Koogle, Yahoo's first CEO, knew he was onto something when he met Yang and Filo in 1995. "I saw great guys who were clearly in need of adult supervision," Koogle says. Both Filo and Yang readily admit their lack of business expertise at the time, and they welcomed the experience of Koogle, who was a former Motorola executive.

Google Is Born:

Page read a biography of Nikola Tesla, one of history's most prodigious inventors. Tesla discovered or developed the foundational technologies for an astonishing array of innovations, from wireless communication and X rays to solar cells and the modern power grid. But despite his extraordinary inventions, Tesla remains a minor figure, in particular when compared to Thomas Edison, a man whom Tesla worked for. The twelve-year-old Page was struck by this fact: regardless of how brilliant and world-changing Tesla's work had been, the inventor received little long-term fame. After reading the Tesla biography, Page says, "I realized I wanted to invent things, but I also wanted to change the world. I wanted to get them out there, get them into people's hands so they can use them, because that's what really matters."

The two others working with Page and Brin were Scott Hassan and Alan Steremberg, graduate assistants who had been assigned to the Google project. But Hassan and Steremberg ended up separating from the project before Google really took off. Even those missing Beatles started successful Internet companies. Hassan went on to found eGroups.com with Larry's brother, Carl Page, and later sold it to Yahoo for more than $500 million. Steremberg had already launched The Weather Underground, a popular weather site, while an undergraduate at Michigan, and still runs it today.

The first version of Google was released on the Stanford Web site in August 1996. Graduate students usually lack the money to buy new computers; Page and Brin were no exceptions. Instead they begged and borrowed - a hard drive from the network lab, an idle CPU from the CS loading docks. Using Page's dorm room as a machine lab, they fashioned a computational Frankenstein from spare parts, then jacked the whole thing into Stanford's broadband campus network. After filling Page's room with equipment, the young students converted Brin's room into an office and programming center.

Initial Investment for Google Inc:

By late 1998, Google was serving more than ten thousand queries a day, and it was clear to Page and Brin that the service would quickly outgrow their ability to beg resources to support it. David Cheriton (who heads Stanford's Distributed Systems Group) suggested Page and Brin meet with Andy Bechtolsheim, a founder of Sun who was active in early-stage investments. The two of them met Bechtolsheim and gave him a demo, and he asked a lot of questions. Then he said: "Well, I don't want to waste time. I'm sure it'll help you guys if I just write a check." Page and Brin weren't ready for such an offer, but when Bechtolsheim went out to his car to get his checkbook, they pondered how much to ask for and at what valuation. When Bechtolsheim returned, they told him their suggested valuation. Page picks up the story: "We told him our valuation, and he said, 'Oh, I don't think that's enough, I think it should be twice that much.'" Brin and Page were stunned, but of course they agreed, and Bechtolsheim asked who the check should be made out to. The founders hadn't settled on a name, so Bechtolsheim suggested Google Inc., after the service's name. They agreed, and minutes later, Page and Brin had a check for $100,000.

The Early Years - Google's First Office

Page kept the check in his dorm room desk for several weeks, as the founders went about forming the company and setting up bank accounts. On September 7, 1998, Google Inc. was formally incorporated, with Page as CEO and Brin as president. When Brin and Page hired their first employee - fellow student Craig Silverstein - they realized they needed to find office space. They found a temporary answer in Susan Wojcicki, a friend of Sergey's girlfriend.

Tuesday, May 25, 2010

The Search-Notes

What is Search?
Link by link, click by click, search is building possibly the most lasting, ponderous, and significant cultural artifact in the history of humankind: the Database of Intentions.
The Database of Intentions is simply this: the aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result. It is what we simply call Search.

Who, What, Where, Why, When, and How:
In John Battelle's words: "As a cub reporter, I was taught to answer five questions about any topic before writing about it: who, what, where, why, and when. If you crammed answers to all those questions into your lead paragraph, then you'd essentially done your job." The author also says he quickly learned to add a sixth: how? I hope these words hold good for everyone.

A Search Engine Consists of: A search engine consists of three major pieces:
The crawl
The index
The runtime system, or query processor
The crawler is a specialized software program that hops from link to link on the World Wide Web, scarfing up the pages it finds and sending them back to be indexed.

The more sites crawlers visit, and the more frequently they crawl them, the more complete the index is. When the index is more complete, the search results pages (SERPs) that are returned for a particular query have a greater chance of being relevant. The process of grokking the index is referred to as analysis. Google's PageRank algorithm is an example of analysis: it looks at the links on a page, the anchor text around those links, and the popularity of the pages that link to another page, and factors them together to determine the ultimate relevance of a particular page to your query. Google, in fact, looks at more than one hundred factors to determine a site's relevance to your keywords.
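The link-popularity part of this analysis can be sketched in Python using the commonly published, simplified form of PageRank. This is a textbook approximation, not Google's production algorithm: it ignores anchor text and every other factor mentioned above, and the damping value of 0.85, the iteration count, and the four-page link graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "c", and "c" links back to "a".
links = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best)  # c
```

Page "c" ranks highest because the most pages link to it, and "a" comes second because its single inbound link comes from the popular page "c".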

The query processor is the interface and related software that connects users' queries to the index.
Once the crawl data is analyzed, indexed, and tagged, it's dumped into what's called a runtime index - a database ready to serve results to users. The runtime index forms something of a bridge between the back end of an engine (the crawl and index) and the front end (its query server and user interface).
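The three pieces described above can be sketched end to end in Python. This is a toy under stated assumptions: the "web" is a hypothetical in-memory dictionary instead of real HTTP fetches, the index is a plain inverted index with no analysis or ranking, and all names and page contents are invented.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outbound links).
WEB = {
    "a.example": ("used cars for sale", ["b.example", "c.example"]),
    "b.example": ("classic cars and trucks", ["a.example"]),
    "c.example": ("truck repair manuals", []),
}


def crawl(seed):
    """The crawl: hop from link to link, collecting every reachable page."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, outlinks = WEB[url]
        pages[url] = text
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """The index: map each word to the set of URLs that contain it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def query(index, terms):
    """The query processor: return URLs that contain every query term."""
    matches = [index.get(term, set()) for term in terms.lower().split()]
    return sorted(set.intersection(*matches)) if matches else []


index = build_index(crawl("a.example"))
print(query(index, "cars"))         # ['a.example', 'b.example']
print(query(index, "cars trucks"))  # ['b.example']
```

Here the dictionary returned by `build_index` plays the role of the runtime index: it is built once on the back end and then only read by `query` on the front end.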

Atomic phrases: Phrases that have their own sets of results at the smallest level. Search engines are capable of telling the difference by parsing a list of atomic phrases.
According to John Battelle (2007), Google alone has more than 175,000 computers dedicated to the job.

Where the Power of Search Lies:
We do ask a lot of the same questions, but we ask far more that are unique, and therein lies the power of search.

Google Whacking:
In the early days of Google, a popular sport among search watchers was to find a query that had exactly one result. This game even has a name - Google Whacking.

Where & Why - Search:
Navigational Query: The practice of typing in a word you know so as to yield a site you wish to visit is called a navigational query.
Why Search: We are searching for more than one answer. Not only are we searching for that which we know; we are increasingly searching to find that which we do not know.
Web blindness: A sense that we know there's stuff we might want to find, but have no idea how to find it.