
Archive for May, 2008



It looks like your dreams of laptop salvation via the Van Der Led Jisus will have to wait a little, as the company has pushed back the release of its diminutive PC to the 28th of July (at least according to the product page). However, if you want to spend a tiny bit more money, they’ve got a new model that nets you a lot more bang for the buck. Enter the Jisus V2, or as we like to call it — the second coming of Jisus. The new version features a pink leather (!?) casing, a 10.2-inch 1024 x 600 LED display, a VIA C7-M 1.6GHz CPU, 1GB of RAM, a VIA Chrome9 graphics chipset, an 80GB hard drive, 802.11a/b/g, and Bluetooth 2.0 support. All this magic will supposedly be available come June 20th for an extremely affordable €349.99 (or about $546).
[Thanks, Takashi]



In a move of unsurprising proportions, StyleTap announced today that it will be bringing its Palm OS emulator to the iPhone and iPod touch… officially. This basically means that every one of those precious Palm apps you couldn’t live without will now be easily accessible via Apple’s devices, thus seriously threatening the argument for keeping your Treo 600. Gregory Sokoloff, CEO of the company, said that the response to a video posted in February of a demo version of the software convinced them to take the plunge. Palm, now might be a good time to stun us with your new OS.



The folks at Konami have been trolling the forums again, kids, and this time they’re fighting back. Reps from the game-maker have issued a statement on the company forum addressing the furor over the inflated cost of the forthcoming, limited edition Metal Gear Solid 4 bundle. According to Konami, the bumped-up price ($600 for a 40GB version of the PS3, plus Dual Shock controller and MGS4 game) is due to the special material used to create the gunmetal gray system and controller casing. The company seems to take issue with the fact that some are calling it simply a “paint job,” and points out that the run of units is only 10,000, and that the package also includes a version of the game with an extra disc of additional content. To put it another way: this is the deal of a lifetime!
[Via Wired]



We’ve got your summer crapcamcorders right ‘ere — three new RCA Small Wonder cams to make your Flips flop. Check ‘em out:
- Small Wonder MyLife, EZ200 (black) - 1.5-inch flip-out display, microSDHC slot with 1GB microSD that does up to 30 / 60 minutes in high quality and web quality modes, $89 and due this summer
- Small Wonder Pocket, EZ205 (white) - 1.5-inch flip-out display, microSDHC slot with 1GB microSD that does up to 30 / 120 minutes in high quality and web quality modes, $100 and due this summer
- Small Wonder Traveler, EZ210 (green) - 2.4-inch QVGA display, SDHC with 2GB SD card that does up to 60 / 240 minutes in high quality and web quality modes, $150 and due this summer



It struck us the other day, as we were going over some back posts, that since the launch of the ASUS Eee PC (and the numerous products that have followed, from the MSI Wind to the HP Mini-Note), we’ve yet to settle on a normalized, agreed-upon name for these kinds of devices, at least to the best of our knowledge. Yes, they’re technically “ultraportables” (which we usually define as any relatively small laptop under four pounds), but to say an Eee PC is in the same class as, say, the Lenovo X300, the VAIO TZ, or the MacBook Air would be kind of misleading.
We’ve heard “low-cost ultraportable” and “laptop-lite” used to describe these kinds of machines, but it seems best to let you decide: leave your best suggestion for what we should all call this emerging product category, and we’ll put it up to vote next week. We’ll officially be throwing “netbook” into the ring. (Intel may have initially coined the term, but we don’t think it should necessarily have to refer only to Intel-based products.)





Hamrick Software on Thursday announced the release of VueScan 8.4.73, the latest version of its scanning software for Mac OS X and Windows. It comes in Standard and Professional editions for $39.95 and $79.95 respectively.
VueScan drives 750 different flatbed and film scanners, including many models whose manufacturers have abandoned them. It can help you scan faded slides and prints, automatically adjusts images to optimum color balance and manages batch scanning commands.
New to the 8.4.73 release is support for multi-core processors, which Hamrick Software says can result in scans up to 50 percent faster. The largest performance improvements are for infrared cleaning, grain reduction and descreening, specifically.
The Professional version adds unlimited free upgrades, advanced IT8 color calibration and support for raw scan files.
VueScan requires Mac OS X 10.3 or later.



SAN FRANCISCO–The inner workings of Google just became a little less secret.
The search colossus has shed only occasional light on its data center operations, but on Wednesday, Google fellow Jeff Dean turned a spotlight on some parts of the operation. Speaking to an overflowing crowd at the Google I/O conference here, Dean managed to demystify Google a little while also showing just how exotic the company’s infrastructure really is.

Google fellow Jeff Dean
(Credit: Stephen Shankland/CNET News.com)
On the one hand, Google uses more-or-less ordinary servers. Processors, hard drives, memory–you know the drill.
On the other hand, Dean seemingly thinks clusters of 1,800 servers are pretty routine, if not exactly ho-hum. And the software the company runs on top of that hardware, enabling a sub-half-second response to an ordinary Google search query that involves 700 to 1,000 servers, is another matter altogether.
Google doesn’t reveal exactly how many servers it has, but I’d estimate it’s easily in the hundreds of thousands. It puts 40 servers in each rack, Dean said, and by one reckoning, Google has 36 data centers across the globe. With 150 racks per data center, that would mean Google has more than 200,000 servers, and I’d guess it’s far beyond that and growing every day.
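The estimate above is simple multiplication, and a few lines of Python make the inputs explicit. Only the 40-servers-per-rack figure comes from Dean; the 36 data centers and 150 racks per data center are the rough assumptions stated in the paragraph, and the function name is purely illustrative.

```python
# Back-of-envelope server count from the figures quoted above.
SERVERS_PER_RACK = 40         # stated by Dean
DATA_CENTERS = 36             # "by one reckoning"
RACKS_PER_DATA_CENTER = 150   # the article's working assumption


def estimate_servers(servers_per_rack, racks_per_dc, data_centers):
    """Multiply out servers per rack, racks per site, and number of sites."""
    return servers_per_rack * racks_per_dc * data_centers


total = estimate_servers(SERVERS_PER_RACK, RACKS_PER_DATA_CENTER, DATA_CENTERS)
print(total)  # 216000
```

Changing any one input shows how sensitive the total is: doubling the racks per data center, for instance, doubles the estimate.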
Regardless of the true numbers, it’s fascinating what Google has accomplished, in part by largely ignoring much of the conventional computing industry. Where even massive data centers such as the New York Stock Exchange or airline reservation systems use a lot of mainstream servers and software, Google largely builds its own technology.
I’m sure a number of server companies are sour about it, but Google clearly believes its technological destiny is best left in its own hands. Co-founder Larry Page encourages a “healthy disrespect for the impossible” at Google, according to Marissa Mayer, vice president of search products and user experience, in a speech Thursday.
To operate on Google’s scale requires the company to treat each machine as expendable. Server makers pride themselves on their high-end machines’ ability to withstand failures, but Google prefers to invest its money in fault-tolerant software.
“Our view is it’s better to have twice as much hardware that’s not as reliable than half as much that’s more reliable,” Dean said. “You have to provide reliability on a software level. If you’re running 10,000 machines, something is going to die every day.”
Breaking in is hard to do
Bringing a new cluster online shows just how fallible hardware is, Dean said.
In each cluster’s first year, it’s typical that 1,000 individual machine failures will occur; thousands of hard drive failures will occur; one power distribution unit will fail, bringing down 500 to 1,000 machines for about 6 hours; 20 racks will fail, each time causing 40 to 80 machines to vanish from the network; 5 racks will “go wonky,” with half their network packets missing in action; and the cluster will have to be rewired once, affecting 5 percent of the machines at any given moment over a 2-day span, Dean said. And there’s about a 50 percent chance that the cluster will overheat, taking down most of the servers in less than 5 minutes and taking 1 to 2 days to recover.

A look at a custom-made Google rack with 40 servers from a modern data center. Infrastructure guru Jeff Dean showed the snapshot at the Google I/O conference.
(Credit: Stephen Shankland-CNET News.com/Jeff Dean-Google)
While Google uses ordinary hardware components for its servers, it doesn’t use conventional packaging. Google required Intel to create custom circuit boards. And, Dean said, the company currently puts a case around each 40-server rack, an in-house design, rather than using the conventional case around each server.
The company has a small number of server configurations, some with a lot of hard drives and some with few, Dean said. And there are some differences at the larger scale, too: “We have heterogeneity across different data centers but not within data centers,” he said.
As to the servers themselves, Google likes multicore chips, those with many processing engines on each slice of silicon. Many software companies, accustomed to better performance from ever-faster chip clock speeds, are struggling to adapt to the multicore approach, but it suits Google just fine. The company already had to adapt its technology to an architecture that spanned thousands of computers, so it has already made the jump to parallelism.
“We really, really like multicore machines,” Dean said. “To us, multicore machines look like lots of little machines with really good interconnects. They’re relatively easy for us to use.”
Although Google requires a fast response for search and other services, its parallelism can produce that even if a single sequence of instructions, called a thread, is relatively slow. That’s music to the ears of processor designers focusing on multicore and multithreaded models.
“Single-thread performance doesn’t matter to us really at all,” Dean said. “We have lots of parallelizable problems.”
The secret sauce
So how does Google get around all these earthly hardware concerns? With software–and this is where you might think about dusting off your computer science degree.

A Google data center, circa 2000. Note the fan on the floor to cool servers.
(Credit: Stephen Shankland-CNET News.com/Jeff Dean-Google)
Dean described three core elements of Google’s software: GFS (the Google File System), BigTable, and the MapReduce algorithm. And although Google helps with a lot of open-source software projects that helped the company get its start, these packages remain proprietary except in general terms.
GFS, at the lowest level of the three, stores data across many servers and runs on almost all machines, Dean said. Some incarnations of GFS are file systems “many petabytes in size”–a petabyte being a million gigabytes. There are more than 200 clusters running GFS, and many of these clusters consist of thousands of machines.
GFS stores each chunk of data, typically 64MB in size, on at least three machines called chunkservers; master servers are responsible for backing up data to a new area if a chunkserver failure occurs. “Machine failures are handled entirely by the GFS system, at least at the storage level,” Dean said.
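To make the replication idea concrete, here is a toy sketch in Python. It is not Google's code: the round-robin placement, the function names, and the server labels are all assumptions made for illustration. Only the 64MB chunk size and the three-copy minimum come from Dean's description.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, the typical chunk size cited above
REPLICAS = 3                   # "at least three machines"


def place_chunks(file_size, servers, replicas=REPLICAS):
    """Assign each chunk to `replicas` distinct servers, round-robin (an assumed policy)."""
    n_chunks = -(-file_size // CHUNK_SIZE)  # ceiling division
    return {
        i: [servers[(i + k) % len(servers)] for k in range(replicas)]
        for i in range(n_chunks)
    }


def rereplicate(placement, failed, servers):
    """Play the master's role: restore the copy count after one server fails."""
    healthy = [s for s in servers if s != failed]
    for holders in placement.values():
        if failed in holders:
            holders.remove(failed)
            # Pick the first healthy server that does not already hold this chunk.
            holders.append(next(s for s in healthy if s not in holders))
    return placement


servers = ["cs0", "cs1", "cs2", "cs3", "cs4"]   # hypothetical chunkservers
placement = place_chunks(200 * 1024 * 1024, servers)  # a 200MB file needs 4 chunks
placement = rereplicate(placement, "cs2", servers)
```

After the simulated failure of `cs2`, every chunk is again held by three distinct servers, which is the property the real system is described as maintaining.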
To provide some structure to all that data, Google uses BigTable. Commercial databases from companies such as Oracle and IBM don’t cut the mustard here. For one thing, they don’t operate at the scale Google demands, and if they did, they’d be too expensive, Dean said.
BigTable, which Google began designing in 2004, is used in more than 70 Google projects, including Google Maps, Google Earth, Blogger, Google Print, Orkut, and the core search index. The largest BigTable instance manages about 6 petabytes of data spread across thousands of machines, Dean said.
MapReduce, the first version of which Google wrote in 2003, gives the company a way to actually make something useful of its data. For example, MapReduce can find how many times a particular word appears in Google’s search index; a list of the Web pages on which a word appears; and the list of all Web sites that link to a particular Web site.
With MapReduce, Google can build an index that shows which Web pages all have the terms “new,” “york,” and “restaurants”–relatively quickly. “You need to be able to run across thousands of machines in order for it to complete in a reasonable amount of time,” Dean said.
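For readers who want to see the shape of the idea, here is a single-machine sketch in Python of the map, group, and reduce steps applied to the "new york restaurants" example. It is only an illustration: Google's MapReduce is proprietary and spreads this work over thousands of machines, and every function name and sample page below is made up.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit a (word, doc_id) pair for every word on a page."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce step: collapse the grouped values into a sorted list of unique pages."""
    return word, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


def pages_with_all(index, terms):
    """Return the pages that contain every one of the given terms."""
    sets = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


docs = {  # hypothetical pages
    "a": "new york restaurants",
    "b": "new york hotels",
    "c": "york restaurants new and old",
}
index = build_index(docs)
matches = pages_with_all(index, ["new", "york", "restaurants"])
print(matches)  # ['a', 'c']
```

Because each page is mapped independently and each word is reduced independently, both steps can in principle be split across many machines, which is the property Dean's comment relies on.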
The MapReduce software is seeing increasing use within Google. It ran 29,000 jobs in August 2004 and 2.2 million in September 2007. Over that period, the average time to complete a job has dropped from 634 seconds to 395 seconds, while the output of MapReduce tasks has risen from 193 terabytes to 14,018 terabytes, Dean said.
On any given day, Google runs about 100,000 MapReduce jobs; each occupies about 400 servers and takes about 5 to 10 minutes to finish, Dean said.
That’s a basis for some interesting math. Assuming the servers do nothing but MapReduce, that each server works on only one job at a time, and that they work around the clock, that means MapReduce occupies about 139,000 servers if the jobs take 5 minutes each. For 7.5-minute jobs, the number increases to 208,000 servers; if the jobs take 10 minutes, it’s 278,000 servers.
My calculations could be off base, but even qualitatively, that’s enough computing horsepower to make the mind boggle.
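For anyone who wants to check the arithmetic, the estimate is total server-minutes per day divided by the 1,440 minutes in a day. The short Python sketch below uses only the figures quoted above and the same simplifying assumptions; the function name is illustrative.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440


def servers_occupied(jobs_per_day, servers_per_job, minutes_per_job):
    """Average servers kept busy, assuming round-the-clock, one-job-at-a-time use."""
    return jobs_per_day * servers_per_job * minutes_per_job / MINUTES_PER_DAY


estimates = {
    minutes: round(servers_occupied(100_000, 400, minutes))
    for minutes in (5, 7.5, 10)
}
print(estimates)  # {5: 138889, 7.5: 208333, 10: 277778}
```

Rounded to the nearest thousand, those match the 139,000, 208,000, and 278,000 figures in the text.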
Fault-tolerant software
MapReduce, like GFS, is explicitly designed to sidestep server problems.
“When a machine fails, the master knows what task that machine was assigned and will direct the other machines to take up the map task,” Dean said. “You can end up losing 100 map tasks, but can have 100 machines pick up those tasks.”
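The bookkeeping Dean describes can be sketched in a few lines of Python. This is a toy model, not Google's implementation: the `Master` class, its methods, and the round-robin reassignment policy are all assumptions made for illustration.

```python
from collections import defaultdict


class Master:
    """Toy master that remembers which worker holds which map task."""

    def __init__(self, workers):
        self.live = list(workers)
        self.assigned = defaultdict(list)  # worker -> list of task ids

    def assign(self, tasks):
        """Spread tasks over the live workers, round-robin."""
        for i, task in enumerate(tasks):
            self.assigned[self.live[i % len(self.live)]].append(task)

    def handle_failure(self, worker):
        """Hand the failed worker's tasks to the surviving workers."""
        self.live.remove(worker)
        orphaned = self.assigned.pop(worker, [])
        self.assign(orphaned)
        return orphaned


master = Master(["w0", "w1", "w2"])   # hypothetical worker names
master.assign(range(6))
orphaned = master.handle_failure("w1")
all_tasks = sorted(t for tasks in master.assigned.values() for t in tasks)
print(orphaned, all_tasks)  # [1, 4] [0, 1, 2, 3, 4, 5]
```

The point of the sketch is that no task is lost: because the master knows what the failed machine was doing, the same set of tasks is still covered afterward, just by fewer machines.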
The MapReduce reliability was severely tested once during a maintenance operation on one cluster with 1,800 servers. Workers unplugged groups of 80 machines at a time, during which the other 1,720 machines would pick up the slack. “It ran a little slowly, but it all completed,” Dean said.
And in a 2004 presentation, Dean said, one system withstood a failure of 1,600 servers in a 1,800-unit cluster.
Next-generation data center to-do list
So all is going swimmingly at Google, right? Perhaps, but the company isn’t satisfied and has a long to-do list.
Most companies are trying to figure out how to move jobs gracefully from one server to another, but Google is a few orders of magnitude above that challenge. It wants to be able to move jobs from one data center to another–automatically, at that.
“We want our next-generation infrastructure to be a system that runs across a large fraction of our machines rather than separate instances,” Dean said.
Right now some massive file systems have different names–GFS/Oregon and GFS/Atlanta, for example–but they’re meant to be copies of each other. “We want a single namespace,” he said.
These are tough challenges indeed considering Google’s scale. No doubt many smaller companies look enviously upon them.









