Applying Information Visualization Techniques to Web Navigation

Preliminary Thesis Proposal

Mark Brautigam
1 March 1997

0. Abstract

This paper is a very early draft of a thesis proposal to do research in the application of various information visualization techniques to the problem of navigating and finding information on the World Wide Web (WWW). In its current state, this paper is more of an outline than a complete statement. This paper is intended to include a comprehensive review of the Information Visualization literature, followed by proposals for future research in applying Information Visualization techniques to the problems of browsing and querying the WWW. Information that does not yet appear in this paper includes:

critical comparison of all of the techniques
more?

0.1 Outline

Introduction
Information Visualization Techniques
Current Applications
1. Information Visualizers
2. Existing Web Applications
Proposed Web Applications
References

1. Introduction

This section presents the following topics:

Why Information Visualization?
a rationale for studying Information Visualization
What is Information Visualization?
a description of what Information Visualization is
Using Perception
a brief description of the human perceptual system and how Information Visualization techniques can accommodate it

1.1 Why Information Visualization?

This section discusses the rationale for looking at Information Visualization as a solution to assimilating information. It discusses the usual suspects:

the explosion of information available on the WWW
larger hard disk sizes mean more information available quickly
the mismatch between computer displays and the human perceptual system
the mismatch between computer controls and human motor functions

What information visualization provides that other interfaces do not:

a means of easily seeing trends in the data
a means of easily seeing outliers
a means of seeing jumps in the data (gaps)
a means of easily identifying maxima and minima like largest, smallest, most recent, oldest, etc.
a means of identifying boundaries (not the same as maxima or jumps)
a means of easily identifying clusters in the data
a means of finding structure in heterogeneous information
a means of seeing an enormous amount of data on one display screen
a means of seeing a particular item of interest within the context of an enormous amount of contextual data

1.2 What is Information Visualization?

Visual Information Seeking Mantra:

Overview first, zoom and filter, then details-on-demand. (Shneiderman)

Information Visualization is:

presenting information in such a way that as much as possible can be assimilated by the human perceptual system instead of relying on the human cognitive system
presenting detailed information about a specific topic while also presenting a complete overview of all information available (the fisheye concept)
The term was introduced by Card et al. in their seminal paper [CARD91]. They intended to draw on background in the area of Scientific Visualization. They saw parallels between the two visualization fields in terms of their purpose (extracting salient dimensions from multidimensional data) and their methods (using advanced 3D graphics and animation techniques to present data). Since the time of that paper, Information Visualization has come to be regarded less as a branch or offshoot of Scientific Visualization and more as an offshoot of Human-Computer Interaction (HCI). In particular, in keeping with HCI's roots in both computer science and cognitive psychology, Information Visualization now carries the HCI torch of tailoring computer-to-human communication techniques to the human perceptual system.

"Information visualization uses computer graphics and interactive animation to stimulate recognition of patterns and structure in information. It does so by exploiting the human perceptual system in ways similar to Scientific Visualization, which allows scientists to perceive patterns in large data collections....Information visualization works on the structure of information inherent in large information spaces." [ROBE91b]

"The basic problem is how to utilize advancing graphics technology to lower the cost of finding information and accessing it once found." [ROBE93]

"Visualization enables people to use a natural tool of observation and processing—their eyes as well as their brain—to extract knowledge more efficiently and find insights." [GERS95b]

"Information, then, need not be inherently spatial. But because we live and perceive in a physical world, it is easier to convey the information to the observer if the information is represented by being mapped to the familiar physical space." [GERS95b]

"While the term "information visualization" is coming into use, the goal is really "information perceptualization." The latter implies a richer use of many senses, including sound and touch, to increase the rate at which people can assimilate and understand information." [CARD96]

1.3 Using Perception

We speak of offloading the task of information assimilation from the cognitive system to the perceptual system. This means that we tailor the information so that the eye can quickly distinguish salient features before the brain begins to process it. The perceptual system operates in a time range of 10 to 100 milliseconds. The cognitive system operates in a time range of hundreds of milliseconds to several minutes. If we can tailor the information so that the perceptual system can process it, we can speed the task of human information assimilation by several orders of magnitude.

To do this, we must present the information using features that the eye can distinguish quickly. These features include, but are not limited to, the following:

color
size
shape
location / position
others?

We use these features, and look for others that aid the human perceptual system in distinguishing salient information.

Humans can recognize the spatial configuration of elements in a picture and notice relationships among elements quickly. This highly developed visual system means people can grasp the content of a picture much faster than they can scan and understand text.

"Interface designers can capitalize on this by shifting some of the cognitive load of information retrieval to the perceptual system. By appropriately coding properties by size, position, shape, and color, we can greatly reduce the need for explicit selection, sorting, and scanning operations." [SHNE94]

The eye is not equally sensitive to detecting horizontal and vertical lines (or features). The apparent location of vertical lines is often displaced, whereas the apparent location of horizontal lines is not....people can better perceive the relative positions of horizontal, in contrast to vertical, lines.

Problems with 3D coding: from [FADIVA Shneiderman]

clutter
obscuring (occlusion)
disorientation (e.g., web browsing)
being inside the data
scalability (i.e., can we use it for very large data spaces?)

2. Information Visualization Techniques

This section presents four different categories of Information Visualization techniques:

Focus+Context
Zooming and Filtering
User Interface Widgets
Perceptual Impedance Matching

More techniques mentioned by Shneiderman [FADIVA workshop]

overviews (same as f+c)
landmarks
traversal (same as UI widgets)
flatten by dimension (e.g., Worlds Within Worlds?)

2.1 Focus and Context

Furnas (1986) investigated the fisheye view, a kind of lens that magnifies a small area of a display, allowing the periphery of the display to remain visible while receding into the background. Others later expanded on this technique to create a series of techniques that allow a user to view a small central focus while maintaining the visibility of a larger context. Two definitions of this focus+context concept follow:

"Detailed views of particular parts of an information set are blended in some way with a view of the overall structure of the set." [LAMP95]

"Any presentation technique that displays a large information space (the context) with some portion of it in more detail (the focus)." [citrin - not listed yet]

Focus+Context techniques:

Fisheye Views
Cone Tree
Multiple Views
SeeSoft
Perspective Wall
Butterfly Citation Browser
Hyperbolic Tree Browser
Fractal Views
Variable Zoom
Data Sphere
No Information Yet!
- Table Lens
- Document Lens
- Spiral Calendar
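
As a rough illustration of the focus+context idea, the sketch below computes a fisheye degree of interest (DOI) over a small tree, following the general form Furnas gives: a priori importance minus distance from the focus. The tree, the node names, and the thresholds are hypothetical. A priori importance is taken here to be the negative depth of a node, and distance is the path length between two nodes in the tree.

```python
# Sketch of a fisheye degree-of-interest (DOI) computation on a tree.
# The tree, node names, and thresholds are made up for illustration.

PARENT = {
    "root": None,
    "a": "root", "b": "root",
    "a1": "a", "a2": "a",
    "b1": "b", "b2": "b",
}

def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while PARENT[node] is not None:
        node = PARENT[node]
        path.append(node)
    return path

def depth(node):
    return len(path_to_root(node)) - 1

def tree_distance(x, y):
    """Number of edges on the path between x and y."""
    common = set(path_to_root(x)) & set(path_to_root(y))
    lca = max(common, key=depth)  # deepest common ancestor
    return (depth(x) - depth(lca)) + (depth(y) - depth(lca))

def doi(node, focus):
    """A priori importance (negative depth) minus distance from the focus."""
    return -depth(node) - tree_distance(node, focus)

def fisheye_view(focus, threshold):
    """Names of the nodes whose DOI is at least `threshold`."""
    return sorted(n for n in PARENT if doi(n, focus) >= threshold)

print(fisheye_view("a1", -2))
print(fisheye_view("a1", -4))
```

With the focus on a1, the stricter threshold keeps only the focus and its ancestors, while the looser threshold also admits the sibling a2 and the neighboring subtree root b. Lowering the threshold therefore trades detail for context.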

2.2 Zooming and Filtering

Sometimes the quantity of information available makes it undesirable to display all of it. This might occur for any of the following reasons:

the plot of the information points has a severe occlusion problem
the quantity of information points is such that the system cannot plot all the points in a reasonable amount of time
the data has so many dimensions that it is impractical to display all of it at once in a 2- or 3-dimensional display
the user knows he or she is interested in only a particular subset of the data

In these cases, we want to filter the information in some way. If this filtering takes the form of selecting a subset of the data along a range of numerical values of one or more dimensions, we call this kind of filtering zooming. Filtering and zooming work by reducing the amount of context in the display; this distinguishes them from the focus+context techniques, which attempt to retain all the contextual information even if it must be drawn so small as to make it virtually invisible.
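
The zooming operation can be made concrete with a small sketch. The records, field names, and ranges below are hypothetical. The function keeps only the items whose values fall inside a numerical range on each selected dimension, and everything else is dropped from the display.

```python
# Hypothetical records; field names and values are for illustration only.
FILMS = [
    {"title": "A", "year": 1955, "length": 95},
    {"title": "B", "year": 1972, "length": 175},
    {"title": "C", "year": 1988, "length": 110},
    {"title": "D", "year": 1994, "length": 142},
]

def zoom(records, ranges):
    """Keep records whose value lies within [low, high] on every listed dimension."""
    return [
        r for r in records
        if all(low <= r[dim] <= high for dim, (low, high) in ranges.items())
    ]

visible = zoom(FILMS, {"year": (1970, 1995), "length": (100, 150)})
print([r["title"] for r in visible])
```

An empty set of ranges leaves every record visible, which corresponds to the fully zoomed-out overview.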

The following information visualization techniques make use of filtering and zooming:

Starfield Display
TreeMaps
Focus Interactive Table
TileBars
Pad++
No Information Yet!
- Cityscape flyby (only briefly mentioned in [GERS95b], w/o reference)
- Themescape (only briefly mentioned in [GERS95b], w/o reference)
- Word Table (only briefly mentioned in [GERS95b], w/o reference)

Again, I intend to analyze, discuss, and picture all these techniques.

2.3 Widgets for Information Visualization

Along with visual display techniques, information visualization has bred a new series of interaction methods for dealing with large amounts of information. These techniques allow the user to select a focus, filter out extraneous information, zoom in on certain ranges of information, and create complex query criteria for finding particular information. The following interaction techniques are particularly useful when used in conjunction with the visual techniques discussed above.

Alphaslider
Range Slider
Query Spreadsheet
Movable Filters
Magic Lens
Zoom Bar
No Information Yet!
- Range Slider
- Protofoil (covered in the RICH LIBRARY article, a bit too much, very good!) also in the [HEAR96] article
- Filter/Flow Boolean Query
- Two-dimensional widgets

I will discuss all of these in detail, and show pictures of each.

2.4 Perceptual Impedance Matching

There are many techniques we can use to increase the speed with which a user can interact with information. In addition to the visual characteristics such as color and size that trigger responses in the human perceptual system, there are techniques that help keep the user working and keep the user from becoming disoriented. I call these techniques perceptual impedance matching because they try to keep the flow of information to the user steady and uninterrupted.

I classify the following techniques not as focus+context techniques, not as filtering techniques, and not as widgets, but as perceptual impedance matching techniques.

(animation???)

From [GERS95b]: (quoted)

The representation must suit both the problem and user's visual sophistication and preferences.
Results must be displayed before the user loses his or her train of thought.
A great deal more has to be learned about human perception, to allow the creation of more efficient visualizations that enable the user to perceive the information quickly.
Information and data must be represented faithfully.

3. Current Applications

This section contains information about information visualizers and existing applications of Information Visualization techniques to the World-Wide Web.

3.1 Information Visualizers

This section discusses the following applications that either use Information Visualization techniques or function as test beds for exploring new Information Visualization techniques.

The Information Visualizer (Xerox PARC)
The Information Visualization Exploration Environment (Chalmers U.)
Film Finder and Home Finder (U. Maryland)
DeckView

3.2 Information Visualizer

[CARD91]

3 components:

3D Rooms
Cognitive Co-Processor (animation-oriented UI architecture)
Information Visualizations

perceptual processing time constant: 100 ms

Visualizations:

Cone Tree for hierarchical data
Perspective Wall for linearly-structured data
Data Sculpture / Data Map for 3D data in space
Building

3.3 Information Visualization Exploration Environment (IVEE)

[AHLB95]

Multiple visualizations of the same data, at the same time.
Retrieve details on demand by clicking on visualization objects.

Uses:

film finder
Zoom Bars
Starfield Display
Alphaslider
Range Slider
rotation slider
home finder
periodic table
geographic visualizations
Dynamic Queries

3.3 Film Finder & Home Finder

[JOG]

Uses starfield display and zoom bar.

3.4 DeckView

[GINS96]

Uses a thumbnail view of PostScript pages.

advantages:

position of fully visible thumbnail locates the current page in the document
drag through thumbnails to scan the pages
temporary and permanent bookmarks
annotation feature places a PostScript comment directly in the file;
- does not corrupt the PS document;
- can share the marked-up document
- all information is in a single file

3.5 Existing Web Applications

There have already been many attempts to apply these visualization techniques to accessing the WWW. Some of these applications include the following:

Zooming Web Browser
Narcissus Clustering System
Harmony Internet Browser
MITRE enhancements to NCSA Mosaic
Navigational View Builder
Tabular Visualization (should perhaps be under basic techniques?)
No Information Yet!
- Personal infospace
- Web Book
- Web Forager

3.6 Zooming Web Browser

[BEDE96]

multiple pages and the links between them are depicted on a large zoomable information surface
pages are scaled so that the page in focus is clearly readable with connected pages shown at smaller scales to provide context
layout changes are animated
Pad++ allows WWW pages to remain visible at varying scales while they are not specifically being visited, so the viewer may examine many pages at once.

3.7 Narcissus

[HEND95]

automatic clustering of data in 3D
physically-based:
- all objects repel each other
- active relationships between objects cause attractive forces
can be used to generate displays for xmosaic
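
The two physical rules above can be sketched as a single relaxation step. This is not the Narcissus implementation: the force laws (inverse-square repulsion, linear attraction), the constants, and all names are assumptions chosen only to illustrate the idea.

```python
import math

def layout_step(positions, links, repulsion=1.0, attraction=0.1, step=0.05):
    """Move every object one small step along the net force acting on it.

    positions: dict mapping a name to an (x, y, z) tuple.
    links: iterable of (name, name) pairs that have an active relationship.
    """
    forces = {name: [0.0, 0.0, 0.0] for name in positions}
    names = list(positions)

    # Rule 1: all objects repel each other (inverse-square law, an assumption).
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            delta = [pa - pb for pa, pb in zip(positions[a], positions[b])]
            dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
            magnitude = repulsion / (dist * dist)
            for k in range(3):
                push = magnitude * delta[k] / dist
                forces[a][k] += push
                forces[b][k] -= push

    # Rule 2: related objects attract each other (linear spring, an assumption).
    for a, b in links:
        delta = [pb - pa for pa, pb in zip(positions[a], positions[b])]
        for k in range(3):
            pull = attraction * delta[k]
            forces[a][k] += pull
            forces[b][k] -= pull

    return {
        name: tuple(p + step * f for p, f in zip(positions[name], forces[name]))
        for name in positions
    }
```

Calling layout_step repeatedly lets unrelated objects drift apart while linked objects settle near each other, which is what produces visible clusters.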

3.8 Harmony Internet Browser

[ANDR95]

hand-crafted vs. automatically-generated presentations

2D structure maps:

outline of browsing history
local map - the same information in a 2D graph

3D features

information landscape
fly over

3.9 MITRE enhancements to NCSA Mosaic

[GERS95a]

(includes 2D view of word correlations)

Another problem for Internet surfers is that often they do not know where they "are" in information space and cannot remember how they got there. This is sometimes referred to as being "lost in cyberspace."...One remedy is to provide users with a view of the information space available to them. The user can "jump" from one document to another by clicking the mouse button without having to backtrack resource by resource, or in Web parlance, page by page. [GERS95b]

allows the user to view the hyperspace depicted as a visual "tree" structure
users can jump from one document to another by pointing and clicking without having to go back
allows users to view the names of documents and how they are linked without actually opening and reading each document
allows users to save the documents in a personal space
allows the user to create a new custom hyperlink structure to save and share

new tool for word correlation

3.10 Navigational View Builder

which uses two different techniques:

Filtering
Hierarchization

[MUKH95a] also [MUKH95b]

can use multiple views of the network
landmarks should be identifiable
uses color, size, and shape to distinguish nodes
let the user decide the mapping between the data attributes and the visual properties (don't hard-code it!)
two aspects of color: hue vs saturation
filtering: also cluster-based vs. structure-based

3.11 Tabular Visualization

[MUKH96]

(3D blocks in a 2D space)

3.12 Web Search Services

In addition to these applications, there are a variety of new WWW search services that explore multiple search paths in parallel and collate the results. This has the effect of performing some of the filtering that is part of the Information Visualization "bag of tricks." These services include, but are not limited to:

3.13 MetaCrawler

[SELB96]

posts the query to multiple search services in parallel
collates the returned references
loads references to verify their existence
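
The three steps above can be sketched as follows. This is not MetaCrawler's code: the search services are stand-in functions that return canned results, and the existence check is a stub, so the example runs without a network connection. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real search services; each returns a list of URLs.
def service_one(query):
    return ["http://example.org/a", "http://example.org/b"]

def service_two(query):
    return ["http://example.org/b", "http://example.org/c"]

def exists(url):
    # Stub: a real back end would load the reference to verify it.
    return not url.endswith("/c")

def metasearch(query, services, verify=exists):
    # 1. Post the query to every service in parallel.
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        result_lists = list(pool.map(lambda service: service(query), services))

    # 2. Collate: merge the lists, dropping duplicates but keeping order.
    collated = []
    for results in result_lists:
        for url in results:
            if url not in collated:
                collated.append(url)

    # 3. Keep only the references that pass the existence check.
    return [url for url in collated if verify(url)]

print(metasearch("information visualization", [service_one, service_two]))
```

Because the services are queried concurrently, the total wait is roughly that of the slowest service rather than the sum of all of them.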

4. Proposed Applications of IV Techniques to the Web

I'd like to restrict this discussion to the use of Information Visualization techniques to display various kinds of information obtained from the WWW. There is a temptation to get bogged down in Information Retrieval techniques. Information Retrieval is a related and very relevant subject, but it is also a lot bigger than I'd like to get into. Instead, I prefer to focus on the kinds of information we can already obtain from the WWW, and consider ways we can visualize that already-existing data more efficiently.

I believe there are four questions we must ask about how we intend to visualize web data.

What kind of data do we want to visualize?
What kind of visualization techniques do we want to use?
How do we want to implement the visualization front end? That is, what platform, software, libraries, etc. do we want to use for drawing the displays?
How do we want to implement the back end? That is, how do we want to collect the information that we will then visualize?

4.1 What kind of data to visualize?

I'd like to think about sorting and filtering WWW data based on the following kinds of characteristics:

Source. We can obtain this information from the domain identifier, for example, .com or .edu.
Size. Some search engines, such as AltaVista, return this information as a part of each search result.
Site. Again, we can obtain this information from the domain name, for example, cse.ucsc.edu or gvutech.edu.
Type of page. There are many types of pages. I tend to divide the pages I have seen into the following types:
- Personal pages.
- Index pages.
- Informational pages.
- Search engines.
Whether we can actually identify the type of page and use this kind of information remains to be seen.
Type of information. Presuming that a particular page is informational, there are still many kinds of information that it can present. I have run across the following kinds of informational pages:
- Bibliographies.
- Articles.
- Pictures.
- News articles and announcements.
Again, whether we can identify the type of information is an elusive question.
Link structure. If we analyze each document as we retrieve it, we can determine what links exist between each of the documents. This allows us to create some kind of cluster diagram. When used in conjunction with the web site information, this can help us determine not only the relative importance of specific web pages, but also the relative importance of specific web sites.
Date. We might want to look for web pages within a certain time range (for example, when looking for a news article). We might also want to discard older information because we assume it is outdated or inaccurate. I'm not sure how to determine the dates for web pages, but I can think of several possibilities:
- See if the page itself has a date embedded in it somewhere. Many web pages have signatures that read "this page created on such-and-such a date," or some other identifier.
- Web search services, such as AltaVista, return the date they last scanned the page into their archives.
- See if it is possible for the web server to return the actual date on the file. This might depend on the server hardware platform, software platform, and security settings.
Language. We might want to filter out web pages written in languages that we do not understand. It ought to be easy to distinguish Western European from Eastern European from Oriental languages based on a quick syntactic analysis. (Eastern European languages probably use more 8-bit characters for the purpose of including diacritical marks. Oriental languages use 2-byte characters that ought to be easy to spot.) Any further distinctions probably require a more extensive semantic analysis that might not be worth the cost, especially when the following characteristic, geographical location, might help determine the language.
Geographical location. We might want to filter out web pages from particular countries, perhaps in an attempt to filter out languages we do not understand. We can sometimes do this based on the domain identifier, such as .uk (for the United Kingdom), .de (for Germany), and .se (for Sweden).
Outline. If we analyze the contents of a web page, we can easily extract the structure of the page itself from its heading and links. I believe this can be a very powerful feature.
Keywords. If, in addition, we can extract some kind of information about keywords, we can use this data to provide even more dimensions around which to visualize the web information. HTML does provide for the insertion of keywords and other identifying information in META elements embedded in the head of a web page. We can also use some kind of semantic analysis to determine keywords when the web page author has not explicitly identified them. Marti Hearst does this for straight text documents with TileBars. However, her technique does not apply to web pages. Besides, her technique is beyond the scope of this paper and reaches into the vastness of the Information Retrieval literature. However, web page headings help to structure web articles and lend greater importance to particular words. This might make a quick semantic analysis more feasible.
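
Several of the characteristics above can be derived mechanically from a URL and the page text. The sketch below uses only the Python standard library. The sample page is made up, treating the last host-name label as the source or country identifier is a simplification, and reading keywords from a META element assumes the page author supplied one.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

def url_attributes(url):
    """Derive the site and the top-level domain identifier from a URL."""
    host = urlparse(url).hostname or ""
    return {"site": host, "domain_id": host.split(".")[-1] if host else ""}

class PageOutline(HTMLParser):
    """Collect headings, link targets, and META keywords from a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []   # (level, text) pairs, in document order
        self.links = []      # href values, in document order
        self.keywords = []   # author-supplied keywords, if any
        self._level = None   # level of the heading currently open
        self._text = []      # text pieces of the heading currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._text = []
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None

# Made-up page used only to exercise the parser.
SAMPLE = """<html><head><meta name="keywords" content="fisheye, web"></head>
<body><h1>Visualization</h1><a href="refs.html">References</a>
<h2>Focus and Context</h2></body></html>"""

outline = PageOutline()
outline.feed(SAMPLE)
print(url_attributes("http://cse.ucsc.edu/index.html"))
print(outline.headings, outline.links, outline.keywords)
```

The heading levels give the page outline, the collected links feed the link-structure analysis, and the domain identifier supports filtering by source or by country.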

4.2 What visualization techniques to use?

Well-suited for this application:

Fisheye Views
SeeSoft
Perspective Wall
Butterfly
Hyperbolic Tree Browser
Starfield Display
TileBars
Pad++ and Zooming Web Browser
Range Slider
Query Spreadsheet / Information Crystal
Dynamic Queries
Iterative Query Refinement (Scatter/Gather)
Animation
Thumbnails as used in DeckView and Web Forager
Narcissus-type automatic clustering
Parallel searches as in MetaCrawler

Not well suited for this application:

Cone Trees (too expensive)
Multiple Views (as implemented)
Fractal Tree (poor substitute for hyperbolic)
Variable Zoom
Data Sphere (too expensive)
Tree Maps
Focus Table
Alphaslider (I don't believe it)
Movable Filters (I don't believe it)
Magic Lens (I don't believe it)
Zoom Bars (I don't believe it)

4.3 How to implement the visualization front end?

That is, what platform, software, libraries, etc. do we want to use for drawing the displays?

Macintosh (my favorite, of course)
SGI (because of its great graphics capabilities)
Sun (more widely available than SGI, more portable, more powerful than Mac)
GL or OpenGL (a standard)
Mosaic is available in a free version or a version that can be licensed for research purposes (I have some Mosaic source code for Macintosh at home)

4.4 How do we want to implement the back end?

That is, how do we want to collect the information that we will then visualize?

The back end must be powerful enough to sort and sift data quickly. It must also have a fast connection to the internet so it can gather the information quickly.

The Suns here at school are powerful and well-connected, but not everyone has a Sun. What if the researcher is working at home? What if he or she is working over a phone line? Then it might be better to offload the data collection to a back-end machine, and on the front-end machine perform only the graphics rendering.

MetaCrawler is implemented not as a web browser running on the client machine, but as a CGI script running on the (back-end) server machine. This allows it to perform its searches and collations very quickly, returning only the end results to the user. This takes the most advantage of the server's fast connection to the internet, the server's more powerful sorting features, and the client browser's familiarity (you can use Netscape), accessibility (you can use it at home), and graphics (you can use your own Mac or PC or whatever you have at home).

5. References

Separate document
