Monday, March 3, 2008

eTech - Live Vast and Deep WebNative Visualizations

Processing is a Java-based language and environment found at processing.org.
It can be used to quickly make data visualizations. It is not as powerful as full Java, but it has been whittled down to the parts of Java that are most useful for visualization, which makes prototyping faster and more dynamic. To find lots of source code, search for ‘built with processing’.

Design starts on paper; this is obvious to my designer friends. Well, visualization starts in the eye, or the mind’s eye. We need to begin data visualization by imagining what we dare to see.

The first step in web visualization is to show everything. Don’t waste time trying to figure out the best thing to show, or how to show it. If you can map it to the screen, do it. Get the data to the user (maybe YOU) quickly, then iterate on the visualization until you find what you are looking for.

Displaying data on the screen is cheap and easy. Analyzing the data is expensive and hard. Begin by showing everything, and then use humans to analyze. Humans are better at this when it comes to quick pattern recognition.

Showing everything also limits the bias that naturally creeps in as you filter down.

Good examples of showing everything are zipdecode, Cabspotting, and many of the Digg visualizations.

After you have everything showing, filter down the data and look for meaningful relationships. Too often developers leave the visualization in its initial form, which makes it hard for the user to use or understand. Simple is often better.

Other interesting visualizations: Oakland Crimespotting, Digg Arc, Mappr, Trulia.

When displaying large amounts of data, think about representing the data as multiple slices in pixel format. Literally make an image in which each pixel depicts the data ‘under’ that pixel. These masks can then be used in conjunction with each other to form rich data visualizations. Example: construct a grayscale heat map of population over the area you are interested in, and a grayscale heat map of income over the same region. With these two images and a map, you can programmatically decide how to display the data at a pixel by looking up the gray value at that point in each image. This has vast possibilities.
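To make the mask idea concrete for myself, here is a minimal Python sketch. The two grids, the threshold of 128, and the color names are all made up for illustration; a real version would read the gray values out of actual image files.

```python
# Two hypothetical grayscale masks (0-255) covering the same region,
# stored as rows of pixel values. The numbers are invented.
population = [
    [10, 40, 200],
    [30, 220, 250],
]
income = [
    [200, 60, 20],
    [180, 240, 90],
]


def classify_pixel(x, y, threshold=128):
    """Pick a display color for pixel (x, y) by querying both masks."""
    dense = population[y][x] >= threshold
    rich = income[y][x] >= threshold
    if dense and rich:
        return "red"
    if dense:
        return "orange"
    if rich:
        return "blue"
    return "gray"


def render(width, height):
    """Build the overlay, one color per pixel, from the two masks."""
    return [[classify_pixel(x, y) for x in range(width)] for y in range(height)]


overlay = render(3, 2)
```

Each extra mask (crime, rent, whatever you can count) is just one more lookup inside classify_pixel.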

If you can count something, you can color it. Duh. Still, this is helpful in deciding how to color objects in your map. Tools like ColorBrewer help users make good palettes for complex data scenes.

If you want to count text so you can color it, one easy way is to MD5 hash the text. This gives you one color for that snippet of text wherever it is used. Two similar greens DO NOT mean similar text, but a red that is exactly the same as another red IS the same text.
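A quick Python sketch of the hashing trick. Taking the first six hex digits of the digest as an RGB color is my own assumption; any fixed slice of the hash would work just as well.

```python
import hashlib


def text_to_color(text):
    """Map a snippet of text to a stable hex color via its MD5 digest."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    # Assumption: use the first three bytes of the hash as red, green, blue.
    return "#" + digest[:6]


color = text_to_color("hello")
```

The same text gets the same color everywhere, with no lookup table to maintain.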

Sparklines are neat. It is an old concept from Tufte, but it is not used enough.
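For a feel of how little it takes, here is a rough text-only sparkline in Python. It uses Unicode block characters instead of a drawn line, and it assumes a non-empty list of numbers; both choices are mine, not from the talk.

```python
# Eight block characters, shortest to tallest.
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a non-empty list of numbers as a one-line text sparkline."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values match
    top = len(BARS) - 1
    return "".join(BARS[int((v - lo) * top / span)] for v in values)


line = sparkline([3, 5, 9, 4, 1, 7])
```

The output is one character per value, so it can sit inline in a sentence or a table cell.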

Neat visualization: IBM History Flow.

The second step in web visualization is to identify the objects of interest. People, places, events, locations, costs, weights, etc… Display the objects, draw crude relationships between them, and look for other relationships. Look at the scene in terms of one object, then look at the objects that are related to that object, then keep going.

Examples: graphVis, touch graphs.
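The "start at one object and keep going" idea can be sketched as a breadth-first walk over a relationship table. The objects and links below are invented for illustration.

```python
from collections import deque

# Hypothetical objects of interest and the crude relationships between them.
relationships = {
    "Alice": ["Oakland", "Concert"],
    "Oakland": ["Alice", "Bob"],
    "Concert": ["Alice"],
    "Bob": ["Oakland"],
}


def neighbors_by_distance(start, graph):
    """Breadth-first walk: returns {object: hops away from start}."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in graph.get(current, []):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)
    return distance


hops = neighbors_by_distance("Alice", relationships)
```

The hop count is a handy thing to map to size or opacity, so the scene stays centered on the object you started from.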

The third step is interaction.
Sliders are easy. Easy is good. Pick a data value or relationship, and let the user see what happens when they alter it with a slider.

What is better than a slider? A Scented Map. A scented map is a slider that displays data itself. A slider with a chart IN it.

Example: Measure Map.
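The chart inside the slider is basically a histogram with one bar per slider step. A minimal sketch of computing those bars, assuming numeric data and a function name I made up:

```python
def slider_histogram(values, lo, hi, steps):
    """Count how many values fall under each of the slider's steps."""
    counts = [0] * steps
    width = (hi - lo) / steps
    for v in values:
        if lo <= v <= hi:
            # The top edge of the range belongs to the last step.
            index = min(int((v - lo) / width), steps - 1)
            counts[index] += 1
    return counts


bars = slider_histogram([1, 2, 2, 9, 10], lo=0, hi=10, steps=5)
```

Draw the counts as little bars along the slider track and the user can see where the data is before they drag.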

What is better than a Scented Map? A play button that allows the slider to animate over its length. You now have animation…

The fourth step is to provide links to the visualization, and allow scripting. If you embed the data needed to regenerate the exact visualization a user is seeing inside the URL, you can use tools like Paparazzi to get screenshots and animations.
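Here is a small Python sketch of round-tripping view state through a URL. The base address and the parameter names (zoom, year) are placeholders I picked for the example.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def state_to_url(base, state):
    """Encode the current view state as a query string on the base URL."""
    # Sorting keeps the URL stable for the same state.
    return base + "?" + urlencode(sorted(state.items()))


def url_to_state(url):
    """Recover the view state from a URL (values come back as strings)."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()}


url = state_to_url("http://example.com/vis", {"zoom": 12, "year": 2008})
state = url_to_state(url)
```

Because the whole view lives in the link, a script can step a parameter such as year through a range of values and capture each frame.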

Provide an API to your visualization. Then use it. Make sure your API is robust. Provide multiple return values so users can choose how to use it with performance in mind. Never make the user make extra calls if you don’t have to.

Google quote: Trying things is cheaper than deciding to do them or not.

A note on complexity and technology: If you have a few hundred items, just use HTML. If you have a few thousand, try Flash. If you have tens of thousands of items, use a Java applet. If you have more, use a thick client. Each of these technologies increases your download time, so decide wisely.

One thing you can do to decrease download time is to not give users all the data right away. Either feed it to them slowly, or have the visualization start gathering real-time data only after it has started.
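Feeding the data slowly can be as simple as handing it over in batches. A minimal sketch, assuming the data is already an in-memory list and the batch size is yours to tune:

```python
def in_batches(items, batch_size):
    """Yield the data a slice at a time instead of all at once."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# The first batch is available right away; the rest arrive as requested.
batches = list(in_batches(list(range(7)), 3))
```

The display can draw the first batch immediately and fill in the rest as each later batch shows up.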

Getting data, web scraping.
A company called EveryBlock is looking for data on the web that can be used to tell you about your block. They are gathering data similar to what Onvia gathers, but may not know how valuable it really is.

There is a lot of free data out there. You can go get it and visualize it for free.

Tufte argues for getting rid of chartjunk. His ideas are great for static images, but they are harder to apply to dynamic visualizations. Getting rid of junk still works, though. Don’t show crap on your visualization that is not needed. Make each chart unique to show the data it is designed for. You should rarely use general purpose visualizations.

One last thing: build visualizations to answer a question, but also build them to ask a new question. If a visualization does not make the user play, they will not learn, and insight will not be found. If the visualization presents an answer and does not ask another question, the user will not play.

Stuff for me to do more research on:
Technology, Flare is like flex...
Microforms, mofo…
Atom feed standard…
Visualcomplexity.com…
