February 17, 2009 8:00 PM PST
by Geoff Koch
The URL coursework.stanford.edu had humble enough beginnings in the late 1990s.
“It started as a research project and I was the only developer working on it,” said Scott Stocker, a former Master’s student who today is director of Web communications at Stanford.
The online course management tool is used by around 600 professors each quarter to post assignments, foster online discussion and administer quizzes. Stocker, who has long since moved on and up the Stanford IT hierarchy, left behind a full-time staff of four to manage the application he built from scratch.
Scaling, whether on a single university server or a massively distributed e-commerce application, is a challenge that just about every coder will encounter at some point in a career. And mobility only compounds the scaling issue, as new phone and PDA-powered users begin banging on Web applications designed for desktop and laptop browsers.
From academia to industry, hands-on programmers are using a handful of best practices to address the explosion of scaling issues. Many are still employing an eat-your-vegetables kind of coding common sense. But a rising tide of tools, applications, and documentation may soon make it easier to mobilize Web applications for new and different kinds of users.
Stocker’s users were professors who, despite having impressive résumés and jobs at tech-steeped Stanford, had varying degrees of Web competency. The application had to be flexible (to be useful both to the Web pros and novices on the faculty), robust (if it crashed or was buggy, no one would use it) and inexpensive (Stocker was a staff of one, funded by a small grant from the Andrew W. Mellon Foundation to develop new learning management tools).
The approach, as is fairly common in academia, was to get the ball rolling with open standards and open source.
The application’s lower-level guts were all open source, as Stocker built on top of the Linux* operating system and MySQL* database. Though it was a pre-J2EE* world, Stocker still chose Java*, and relied heavily on servlets and JSPs*.
Unlike Common Gateway Interface (CGI) programs, Java servlets are persistent, standing by in memory to fulfill multiple requests once they’re started. Beyond the benefits of separating a Web page’s logic from its static elements, JSPs aren’t restricted to any specific platform or server.
As coursework.stanford.edu moved from this-might-actually-work to mission critical, Stanford’s main IT shop eventually stepped in to support Stocker’s creation. Stanford IT is a Sun/Solaris*/Oracle* database environment, but the open source Java APIs plugged in easily enough to this backend.
“It’s just the nature of Java Web applications that make them easy to scale,” said Stocker. “The [Java] platform is TCP/IP based. It doesn’t matter if the database is on a separate computer; the JDBC protocol allows for easy communication with the database over TCP/IP.”
Stocker has moved on to building and managing Stanford’s Web-based events calendar*. Groups around campus use the application to advertise events large and small alike – from a lecture by writer-in-residence Bharati Mukherjee to an invitation to Stanford’s annual ballroom dance competition.
What about making the application more inviting to increasingly mobile users? For now, Stocker is not coding any differently, in part because of the wired-to-the-eyeballs nature of the Stanford campus.
University-sponsored surveys indicate that well over 90 percent of students have their own computers, the vast majority of which are laptops. More and more of these laptops access the Internet via the 1,800-and-counting Wi-Fi hotspots on campus.
In this environment of near ubiquitous connectivity, Stocker believes there’s little he needs to do differently to accommodate mobile users. He’s adding Really Simple Syndication (RSS) to the calendar later this year, and presumably at least a few mobile users on and off campus may subscribe to the RSS feeds on their Web-enabled phones.
For others, however, mobility has caused profound changes in thinking about Web programming. Take the U.S. federal government, hardly known as a tech innovator. Working with Intel, the feds mobilized the grants.gov Web site – the single access point for over 900 grant programs offered by the 26 federal grant-making agencies.
Instead of Web forms, the site now serves up grant applications as stand-alone documents. No network connections are required except during the actual download and submission phases of the grant application process.
This is good news for scientists who want to peck away at their grant applications while taking breaks from their field work, but the architecture is also a boon for desk jockeys and cubicle dwellers.
The old Web-based forms were problematic because users often had no way to fill out the form over the course of multiple sessions. This caused headaches for users who needed to collaborate on grant applications with colleagues, or who found themselves missing a required piece of information.
The new intelligent documents enable offline use, easy circulation among colleagues, and – with help from lightweight XML schema using SOAP and other Web services protocols – a way to minimize the traffic spikes around the time of major application deadlines.
“We haven't considered such asynchronous services yet, but it is something we might consider adding in the future,” Stocker said.
Those considering homespun alternatives to the large online booksellers often happen upon Powells.com*. (Type “buy books online” into Google and Powells.com comes up third, behind mega-sites Amazon and Barnes & Noble.) The homepage for the self-described “legendary independent bookstore” is perhaps best described as comfortably cluttered, not unlike Powell’s four-story real world location near downtown Portland.
There are fonts of different sizes and colors, book cover photos mixed with mildly cartoonish graphics, a souvenir shop, sale section and an A-to-Z listing of shelves for virtual browsing. It definitely is not Google’s lean User Interface (UI), and this is entirely by design, says Darin Sennett, Powell’s director of Web stuff – a title that reflects the store’s iconoclastic culture.
“We can’t get away with a sleek minimalist design; the site has to be luxurious,” said Sennett. “Just as in real life, where there’s a walk-in experience in our store, there has to be a walk-in experience on our site.”
Powell’s may be the largest independent bookstore online, but its site is small next to Amazon, which does millions of dollars of business annually.
“We’ve never had a big wad of investor cash, so we’ve had to work within profitability,” said Sennett. “We’ve been choosing food over special effects from the very beginning.”
Like Stocker, Sennett has dealt with lean budgeting by relying heavily on open standards and open source. He ticks off a familiar litany of fixtures atop which the site sits: Sun machines, the Solaris* operating system, Apache Web server software, the MySQL database and lots of Perl* and PHP* scripts.
Scaling for Sennett is more about end users and experimentation than the latest and greatest application architecture. He says he doesn’t worry so much about keeping up with technology but instead just tries to sell books online better and more efficiently each day.
This is not to imply that Powell’s is anti-programming progress. Sennett is proud of the fact that Powells.com went online before Amazon and that it offered a shopping cart feature on the site before “shopping cart” had ever entered the Web lexicon.
Sennett’s current scaling challenge is to implement Web standards on the site that, unlike old-fashioned HTML, can potentially adapt to different types of output. A site redesign is underway, and one of Sennett’s programmers is at work emulating the current site structure in cascading style sheets (CSS).
It's a big project, and the Web team is moving slowly despite the mounting stacks of trade magazine articles urging developers to embrace Web standards. One of Sennett’s favorites is “Retooling Slashdot with Web Standards*,” [http://www.alistapart.com/articles/slashdot/] published by Web trade magazine A List Apart* in November 2003.
“Obviously, we didn’t rush toward [Web-standard] CSS-based publishing,” Sennett said. “We weren’t feeling a huge demand.”
Reading the press releases from the various firms hyping mobility, it’s tempting to think that the market is on the cusp of a huge demand for mobile content and services of all stripes. Sennett’s thoughts on mobility and the associated scaling issues for Web site operators are more measured.
Powells.com sells eBooks in Microsoft, Adobe* and Palm* formats. The site has been offering eBooks since the days of the ill-fated* Gemstar eBook* devices.
Sennett won’t provide specifics, only saying that demand for eBooks is growing slowly, despite the fact they’re much cheaper – many are in the $5 to $7 range – than their ink and paper analogs.
eBooks aren’t limited to mass market titles. The format is increasingly used by companies looking for a low-cost way to distribute lengthy documentation and white papers, and these companies may be getting more mileage out of their PDF files than the big New York publishers.
Take "The Business Value Roadmap to Mobilized Software Solutions"*, published by freeMARKETpress. The book’s PDF chapters end with surveys to gather information about the readers, including feedback on how valuable the content is to them and where they learned about the book. The surveys also allow readers to submit their contact information and that of colleagues for further contact by the book's publishers.
"Yes, I could see that being a great add in to an e-book, provided that encryption issues could be solved," said Sennett. Unlike documentation freebies served up by tech companies, commercial book publishers encrypt their e-books as part of their copyright protection efforts.
The Powells.com programmers are by no means done experimenting with mobile offerings. Audio books are increasingly popular*, and Sennett suggests this may be the year that Powell’s offers audio downloads from its site.
For now, new RSS feeds of all the database-driven content on Powells.com are a first step toward standards that will make it easier to scale the site’s content, to mobile users or otherwise.
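As a rough illustration of what a feed of database-driven content involves, here is a minimal Python sketch that turns a list of records into an RSS 2.0 document. The function name `build_rss`, the row keys (`title`, `url`, `blurb`), and the sample data are all hypothetical; nothing here reflects how Powells.com actually generates its feeds.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, channel_description, rows):
    """Build an RSS 2.0 document (as a string) from an iterable of dict rows."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for row in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row["title"]
        ET.SubElement(item, "link").text = row["url"]
        ET.SubElement(item, "description").text = row["blurb"]
    # ElementTree escapes characters such as "&" when serializing.
    return ET.tostring(rss, encoding="unicode")


# Hypothetical rows standing in for the result of a database query.
SAMPLE_ROWS = [
    {"title": "New Arrivals", "url": "http://example.com/new",
     "blurb": "This week's additions."},
    {"title": "Staff Picks & Favorites", "url": "http://example.com/picks",
     "blurb": "Chosen by booksellers."},
]

feed_xml = build_rss("Example Bookstore", "http://example.com/",
                     "Sample feed", SAMPLE_ROWS)
```

Because the feed is built from the same records that drive the Web pages, any RSS reader, on a desktop or a phone, can consume the content without the site's page design coming along for the ride.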
Harris Hutkin is up to his eyeballs in content. Employed by Time, Inc., a subsidiary of the largest media company in the English-speaking world, he’s awash in everything from magazine articles to movie trailers. As a senior mobile product manager, Hutkin’s task is scaling this content to a full panoply of portable platforms.
The Web world according to Hutkin can be split between the old school and new school.
“The old school is those people who originally discovered the Web,” he said. “They built their HTML pages manually, mixing content and applications, and relying on lots of custom coding.”
“The new school is just about every company that’s formed since the bubble burst,” Hutkin continued. “Web infrastructure for these companies is marked by use of standards, such as XML. Data is separate from design.”
The new school appears to be in a much better position to address mobile users. Standards-based architectures are more efficient at producing HTML or Wireless Markup Language (WML) pages. The whole mobile thing just isn’t that big of a deal for these shops, Hutkin says, since WAP 2.0 was defined to use standards-compliant data (XHTML) and style sheets (WAP CSS).
It’s getting to a point where even the old-schoolers are starting to adopt style sheets and standards, Hutkin believes. Powells.com’s slow move to emulate its site structure in CSS as it readies for its site redesign may be one example of this.
What kind of scaling problems are old-schoolers encountering? Consider the case of a hypothetical media company that’s been publishing online for years. Until recently, if an online article had a table of data in it, the Webmaster might simply embed the table right in the HTML page, along with formatting to make the table look good.
Today, if someone tries to access that data on a mobile device or a different browser, the table will not look the way it was intended, and the user experience breaks down. Style sheets can avoid this problem.
Even something as simple as a headline can cause fits and starts, both for different platforms and for the all-important attribute that the article be found easily by a Google search.
A few years ago, the headline for a story might have been coded:
<font size=4><b>Michael Jackson #1 Again This Week</b></font>
This would have rendered the headline perfectly well on that article. However, Google would have had a hard time finding the headline, and the font size=4 headline wouldn’t look good on a small device. Now, most sites will surround the headline with the <H1> tag, which can be defined in the style sheet so it looks good on various devices and which allows Google to find and store the headline. H1 tags carry more weight than regular text in the Google PageRank process, according to several threads* posted in Google Answers.
There’s no easy answer to dealing with vast amounts of legacy pages where simple text and data are intermingled with HTML design elements. One approach is to write scripts to look for old HTML code in the data. The scripts can replace the HTML with procedural calls (procs), strip it out altogether, or alert an editor to the poorly formed content.
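A cleanup script of this kind might look something like the following Python sketch. It applies all three options in turn: it swaps the bold, font-sized headline pattern for a proc, strips any leftover font tags, and records a note for an editor when it finds an embedded table. The `[[proc:headline]]...[[/proc]]` syntax is invented here purely for illustration, as the article does not say what Time's procs look like, and regular expressions are only adequate for simple, predictable legacy markup like the headline above.

```python
import re

# Hypothetical proc syntax, made up for this sketch: [[proc:NAME]]text[[/proc]]
HEADLINE_RE = re.compile(
    r"<font\s+size=[\"']?\d[\"']?\s*>\s*<b>(.*?)</b>\s*</font>",
    re.IGNORECASE | re.DOTALL,
)
FONT_TAG_RE = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)
TABLE_RE = re.compile(r"<table\b", re.IGNORECASE)


def clean_legacy_html(text):
    """Return (cleaned_text, notes_for_editor) for one piece of stored content."""
    notes = []
    # 1. Replace the bold, font-sized headline pattern with a proc.
    cleaned = HEADLINE_RE.sub(r"[[proc:headline]]\1[[/proc]]", text)
    # 2. Strip any remaining <font> tags, keeping the text inside them.
    cleaned = FONT_TAG_RE.sub("", cleaned)
    # 3. Tables are too varied to fix automatically, so flag them instead.
    if TABLE_RE.search(cleaned):
        notes.append("embedded <table> needs manual review")
    return cleaned, notes


cleaned, notes = clean_legacy_html(
    "<font size=4><b>Michael Jackson #1 Again This Week</b></font>"
)
```

Here `cleaned` holds the headline text wrapped in the hypothetical proc markers and `notes` is empty, since the sample contains no table.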
“By having these procs in the data, instead of the normal HTML tags, the template that builds the page can process the proc depending on where the data needs to be displayed, (Web, mobile device, and such others), and generate an appropriate file that can be read by the reader (browser, RSS reader, and such others),” Hutkin wrote in a follow-up e-mail interview.
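To make the idea concrete, here is a minimal Python sketch of a template step that expands the same stored proc differently for each output target. The proc syntax (`[[proc:headline]]...[[/proc]]`), the target names, and the `render` function are assumptions made for illustration, not a description of Time's system.

```python
import re
from html import escape

# Hypothetical proc syntax, made up for this sketch: [[proc:NAME]]text[[/proc]]
PROC_RE = re.compile(r"\[\[proc:(\w+)\]\](.*?)\[\[/proc\]\]", re.DOTALL)

# One table of proc handlers per output target.
RENDERERS = {
    "web": {"headline": lambda t: "<h1>{}</h1>".format(escape(t))},
    "wml": {"headline": lambda t: "<p><b>{}</b></p>".format(escape(t))},
    "rss": {"headline": lambda t: "<title>{}</title>".format(escape(t))},
}


def render(data, target):
    """Expand every proc in `data` using the handlers for `target`."""
    procs = RENDERERS[target]

    def expand(match):
        name, body = match.group(1), match.group(2)
        handler = procs.get(name)
        # Procs the target does not know fall back to escaped plain text.
        return handler(body) if handler else escape(body)

    return PROC_RE.sub(expand, data)


stored = "[[proc:headline]]Michael Jackson #1 Again This Week[[/proc]]"
web_page = render(stored, "web")
wml_card = render(stored, "wml")
```

The stored data never changes; only the handler table does. Supporting a new kind of reader means adding one more entry to `RENDERERS` rather than touching the content.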
Another scaling challenge comes from the fact that, increasingly, readers of Web pages aren’t just human beings. Computers are looking at this stuff, too, as all the XML-based RSS syndication indicates. This machine-readability is relevant when thinking about search engine optimization.
By now there’s a fairly well known bag of tricks to pull from to get found by Google, and these tricks are the same whether the searching is taking place by desktop PC, laptop, or WAP-enabled phone. The time sink for developers isn’t getting their page found in the first place, but actually delivering the content to mobile devices. With a proliferation of mobile devices, there’s no easy way around this besides just ignoring the issue and letting mobile browsers render what’s delivered without concern for usability on the device.
“Are you prepared to offer all the content on your site in WML format, for instance?” Hutkin asked. “If your pages don’t render well across devices, then you may have to consider producing a version of the site in WML 1.x, since 1.x is the least common denominator.”
Even this least common denominator approach isn’t cheap, as WML pages still need to be built. For content sites, generating an article page isn’t difficult. Create a WML template, drop in the copy, and you’re done, Hutkin says. But getting users to those article pages is another story.
“Unless you have a standards-compliant home page, you’re going to have to build a special home page, and any other navigational pages for WAP users to navigate your content,” Hutkin said. “This could be a time consuming process.”
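The "create a WML template, drop in the copy" step for an article page can be sketched in a few lines of Python. The template layout, the card id, and the `article_to_wml` function are hypothetical; the DOCTYPE is the standard WML 1.1 declaration. The navigation pages Hutkin warns about would still have to be built separately.

```python
from string import Template
from xml.sax.saxutils import escape

# A single-card WML 1.1 deck; $title and $paragraphs are filled in per article.
WML_ARTICLE = Template("""<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
  <card id="article" title="$title">
    $paragraphs
  </card>
</wml>
""")


def article_to_wml(title, body):
    """Drop plain-text article copy into the WML template.

    Paragraphs in `body` are assumed to be separated by blank lines.
    """
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    wml_paragraphs = "\n    ".join(
        "<p>{}</p>".format(escape(p)) for p in paragraphs
    )
    return WML_ARTICLE.substitute(
        # The title sits inside an attribute, so quotes are escaped as well.
        title=escape(title, {'"': "&quot;"}),
        paragraphs=wml_paragraphs,
    )


deck = article_to_wml("Q&A", "First paragraph.\n\nSecond <b>paragraph</b>.")
```

Escaping matters more here than on the desktop Web: WML is XML, and a stray ampersand or angle bracket in the copy can make a phone's browser reject the entire deck.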
So how does the mobility cost-benefit equation work out today at Time? For an answer, consider that the company just signed a contract with U.K.-based Flytxt*, an SMS messaging platform provider. Hutkin says that despite the phone companies pushing a richer-content vision of the near future, he believes that Time needs to go where the users are right now: SMS.
Soon, however, mobile marketing campaigns based on 160-character text messages may seem quaint, especially as momentum for feature phones continues to grow*. Platform-friendly PDF documents – versions of Acrobat Reader* already exist for Palm OS*-, Symbian OS*- and Pocket PC*-based devices – could easily replace simple text messages.
Adobe and SAP are already developing joint solutions* (PDF 160KB) that integrate PDF-based forms with SAP business back-ends. This solution generates electronic forms as PDF files that may incorporate static text, logos, and so on, as well as personalized data from SAP data stores. Business owners route those interactive forms to users, who can use them either online or offline, and submit data from them back to the SAP back-end using lightweight SOAP and XML.
“If you believe that one of the benefits of mobile is that you’re ‘always on’ then the notion of occasionally connected computing becomes less important,” said Hutkin. “Of course, this isn’t a reality today (think air travel). In today’s world, where a mobile device’s UI is clunky and not always ‘on,’ then the idea of distributing an application download that stores data until the user is back ‘online’ and can transmit again makes sense.”
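The store-until-online pattern Hutkin describes can be sketched as a simple outbox: submissions are queued locally and transmitted only when a connection is available. This Python sketch is an illustration under stated assumptions. The `Outbox` class, its `send` and `is_online` callables, and the sample form data are all hypothetical, and a real client would also persist the queue to disk and serialize each form (for example as XML over SOAP, as in the systems mentioned above).

```python
class Outbox:
    """Queue form submissions locally and transmit them when online."""

    def __init__(self, send, is_online):
        self._send = send            # callable that transmits one submission
        self._is_online = is_online  # callable returning True when connected
        self._pending = []

    def submit(self, form_data):
        """Store the submission, then try to deliver anything pending."""
        self._pending.append(form_data)
        return self.flush()

    def flush(self):
        """Send queued submissions in order; return how many were sent."""
        sent = 0
        while self._pending and self._is_online():
            # Remove an item only after send() returns, so a failed
            # transmission (an exception) leaves it in the queue.
            self._send(self._pending[0])
            self._pending.pop(0)
            sent += 1
        return sent

    @property
    def pending_count(self):
        return len(self._pending)


# Hypothetical stand-ins for a network link and a back-end endpoint.
link = {"up": False}
delivered = []
outbox = Outbox(send=delivered.append, is_online=lambda: link["up"])

# Filled out offline (think air travel): the data waits in the queue.
outbox.submit({"applicant": "A. Researcher", "program": "Example Grant"})
```

Once the link comes back (`link["up"] = True`), a call to `outbox.flush()` delivers the waiting submission in the order it was entered.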
- Read how a growing number of large providers are developing mobilized solutions that allow rich, human-friendly documents to communicate with business back-ends in the Intel Software Network article “Mobilized Applications and Solutions: Using Intelligent Documents as Enterprise Front-Ends.” Editor’s note: the company Sand Hill Systems referred to in this article is no longer in business.
- Read how to improve network performance through the use of process, thread and interrupt affinity in the Intel Software Network article “Improved Linux* SMP Scaling: User-directed Processor Affinity.”
- Read why insisting on the outdated notion of pervasive, persistent, “always on” technology misses the mark in “Always Available Computing: Best Practices for Empowering Today's Mobile Work Force*.”
- Read how Slashdot might look different with style sheets in “Retooling Slashdot with Web Standards*,” published by Web trade magazine A List Apart.
Geoff Koch is a science and technology journalist in Lansing, Mich. His articles on writing code for cellular or handheld devices include Culture: The Next Big Thing in Code and One Plea for More Open Cell Phone Platforms.

