Friday, March 21, 2008

Thursday, March 6, 2008

eTech - 2008 Presentation Slides

O’Reilly Emerging Technologies 2008 Presentation Slides can be found at: http://en.oreilly.com/et2008/public/schedule/proceedings/

eTech - Four Hour Work Week

-80/20 principle:
First, try to identify the 20% of the things you do during the week that take 80% of your time. For the web, you can use tools like rescueTime.com
From this list, make a NOT to do list, and attempt to not do those things.
During the day, once an hour or two, ask yourself, "Am I busy? Am I being bothered? Am I creating work to prevent me from doing what I should be doing?"
Second, try to identify the 20% of measurements that track 80% of your success.

-Attention dependency on time:
Constant partial attention can actually cause ADHD. Don't pay attention to more than one thing at a time. Set up a firewall so you can focus on a single task. Don't think about things that you cannot affect today. Don't read work email while on vacation. When trying to work, think about work; when trying to relax, think about things that you like.

-Error based performance:
We don’t like to fail at anything, but it’s ok.
What is the one thing that you could do that would change everything for the better?
What are the small tasks that seem like they have to be done right now?
Would it be ok to fail any of the small tasks to get closer to the one big thing?
It’s ok to fail the little stuff, as long as you move toward the big stuff.

More can be found at: fourhourworkweek.com

eTech - Ensemble Learning, Better predictions through diversity


Presentation by Todd Holloway

Ensemble learning is the process of using multiple supervised learning models to make a prediction. The talk argues that combining multiple types of predictor models produces more accurate results.

Relating movies based on user recommendations does not work well for relatedness because we don't have enough data on all movies.

Netflix prize: 17000 sample movies, millions of sample ratings. One million dollar prize for a ten percent improvement on current Netflix model.

Using multiple models decreases error as long as they are independent decision makers.
To get independence and diversity, we use different relatedness measures for each model.
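
A rough way to see why independence matters is to compute how often a majority vote is right. This is only a sketch: it assumes every model is correct with the same probability and that their errors are fully independent, which real models rarely achieve. The function name is made up for illustration.

```python
from math import comb

def majority_vote_accuracy(n_models, p_correct):
    """Probability that a majority of n independent models is right.

    Assumes every model is correct with the same probability and that
    their errors are independent (an idealization).
    """
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )

# Compare a single model against larger (odd-sized) ensembles.
for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With models that are each right more often than not, the vote improves as models are added. If the models make the same mistakes, the independence assumption fails and the gain shrinks.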

This adds complexity but gives better results, which is a violation of Ockham's Razor.

AdaBoost is the process of training a classifier, testing it, taking the incorrect results, and using them to train a new classifier. Unfortunately, this emphasizes noise.
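
Here is a minimal sketch of that loop, using one-dimensional threshold "stumps" as the weak classifiers. As AdaBoost is usually described, the misclassified examples are not split off into a separate training set; every example stays in, and the ones the last classifier got wrong are given more weight. The toy data and function names are assumptions for illustration, not anything from the talk.

```python
import math

def train_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=5):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points gain weight, so the next stump focuses on them.
        weights = [
            w * math.exp(-alpha * y * (polarity if x >= threshold else -polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (pol if x >= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy labels that no single threshold can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=5)
hits = sum(predict(model, x) == y for x, y in zip(xs, ys))
print("training accuracy:", hits / len(xs))
```

The reweighting step is also where the noise problem comes from: a mislabeled point keeps getting misclassified, so its weight keeps growing.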

www.abeautifulwww.com for slides and more info.

eTech - CouchDB from 1000ft

Presentation by Damien Katz

CouchDB is an EASY database. A simple way to store data.

When designing a relational database, you are designing a large data structure. With CouchDB, you are just storing data.

Documents are complete units of data that are not broken up. Example: a business card. This means that documents may be out of date. In a relational db, this is the worst thing that can happen, but in the real world, we deal with it all the time.

CouchDB stores data as JSON, which is similar to XML but easier to read and write.

CouchDB can be used from all major languages and does not require a data access layer.
Communication is over an HTTP API.
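
To make the "document plus HTTP" idea concrete, here is a sketch that builds (but does not send) a request storing a business card as one JSON document. The server address, database name, document id, and the URL layout are all assumptions for illustration; check the CouchDB docs for the actual API.

```python
import json
from urllib.request import Request

# Hypothetical server address, database name, and document id.
BASE_URL = "http://localhost:5984"
DB_NAME = "contacts"
DOC_ID = "jane-doe"

# A business card stored as one self-contained document.
card = {
    "name": "Jane Doe",
    "title": "Engineer",
    "phones": ["555-0100"],
    "email": "jane@example.com",
}

def build_put_request(base_url, db_name, doc_id, doc):
    """Build (but do not send) an HTTP PUT that would store the document."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        url=f"{base_url}/{db_name}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

request = build_put_request(BASE_URL, DB_NAME, DOC_ID, card)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Nothing here is specific to Python: any language that can serialize JSON and speak HTTP could do the same, which is the point about not needing a data access layer.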

Indexes are built incrementally with map reduce over the tags.
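
A toy version of an incrementally maintained index might look like the following. It is written in Python for illustration only and is not CouchDB's view engine; the `tags` field, the `_id` key, and the class name are assumptions.

```python
from collections import defaultdict

def map_tags(doc):
    """Map step: emit one (tag, doc id) pair per tag on the document."""
    for tag in doc.get("tags", []):
        yield tag, doc["_id"]

class TagIndex:
    """Index that is updated per changed document instead of rebuilt."""

    def __init__(self):
        self.rows = defaultdict(set)   # tag -> ids of documents carrying it
        self.emitted = {}              # doc id -> tags emitted last time

    def update(self, doc):
        # Remove rows from the document's previous version, then re-map it.
        for tag in self.emitted.get(doc["_id"], ()):
            self.rows[tag].discard(doc["_id"])
        tags = [tag for tag, _ in map_tags(doc)]
        for tag in tags:
            self.rows[tag].add(doc["_id"])
        self.emitted[doc["_id"]] = tags

    def count(self, tag):
        """Reduce step: number of documents per tag."""
        return len(self.rows[tag])

index = TagIndex()
index.update({"_id": "a", "tags": ["etech", "db"]})
index.update({"_id": "b", "tags": ["db"]})
index.update({"_id": "a", "tags": ["etech"]})  # only document "a" is re-mapped
print(index.count("db"), index.count("etech"))
```

Only the changed document is re-processed on each update, so the cost of keeping the index current depends on how much changed, not on how much is stored.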

Data is replicated across machines via peer based replication.
Conflicts are handled by the db: a winning and a losing change are chosen consistently on all machines.
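
The key property is that every machine picks the same winner without talking to the others. One way to get that, sketched below, is to apply a fixed ordering to the conflicting revisions. The ordering used here (edit count, then content hash) is an assumption for illustration and not necessarily the rule CouchDB applies.

```python
def pick_winner(revisions):
    """Choose one revision deterministically from a set of conflicting ones.

    Each revision is an (edit_count, content_hash) pair. Sorting on the
    whole pair means every replica picks the same winner regardless of the
    order in which it received the revisions.
    """
    ordered = sorted(revisions, reverse=True)
    return ordered[0], ordered[1:]

# Two replicas saw the same conflicting edits in a different order.
replica_a = [(2, "9f3c"), (2, "b71e")]
replica_b = [(2, "b71e"), (2, "9f3c")]
winner_a, losers_a = pick_winner(replica_a)
winner_b, losers_b = pick_winner(replica_b)
print(winner_a, winner_b)
```

The losing revision is returned rather than thrown away, so an application could still inspect it and merge by hand.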

If you are interested in distributed programming, look into 'Erlang'. CouchDB is written in Erlang because it makes distributed programming a snap.

Reads and writes can happen at the same time; the reader will get the older value.
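
A small sketch of how a reader can keep seeing the older value while a write lands: writes append new versions, and a read only looks at versions that existed when it started. The class and method names are hypothetical, and this in-memory toy says nothing about how CouchDB lays data out on disk.

```python
class VersionedStore:
    """Append-only store: writers add versions, readers pin a snapshot."""

    def __init__(self):
        self._versions = {}  # key -> list of values, oldest first

    def write(self, key, value):
        self._versions.setdefault(key, []).append(value)

    def snapshot(self):
        # Record how many versions of each key exist right now.
        return {key: len(values) for key, values in self._versions.items()}

    def read(self, key, snapshot):
        # Only versions that existed when the snapshot was taken are visible.
        return self._versions[key][snapshot[key] - 1]

store = VersionedStore()
store.write("card", "phone: 555-0100")
reader_view = store.snapshot()          # a read begins
store.write("card", "phone: 555-0199")  # a write lands mid-read
print(store.read("card", reader_view))
print(store.read("card", store.snapshot()))
```

Because the writer never modifies what the reader is looking at, neither has to wait for the other.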

20,000 concurrent users running on a laptop. This works because Erlang does not use OS processes; it uses its own lightweight processes.

Comes with Lucene integration for full-text search. But any search tech could be used.

To access the db, you write javascript/ajax.

Largest replication so far: 5 GB, 400,000 documents.

Not yet ready for use in production. But people ARE using it.

If a relational db model is what you need, then use that...

Wednesday, March 5, 2008

eTech - Visualizations beyond RSS and LavaLamps (Tripledex)

You NEED to have a question to have a good visualization.
Even a tarot reader will demand a question.

You can make visualizations that show everything, but these usually end up being used only for their aesthetics. This contradicts a speaker earlier in the conference, so we will need to think about the context in which each was speaking.

Get answers to these questions before making visualizations:
Do you want broad or deep, trends or targeting?
How much data do you want/need?
Static or dynamic? Alerts? About what?
Speed of data. Pipette or firehose?
Lumpiness of data
Dimensionality of data
Do you need to combine relationships?

The tools you use to show data or filter data are not interchangeable.
Just like tools around the house, you would not use a lawnmower to cut bread, you would use a knife.

They are showing a technology called Tripledex that can manage around 100,000 relationships.
Email jnhq@yahoo.com to get access to the demo

eTech - How to Kick Ass


Passionate users are passionate about what they are good at.

If companies provide the ability for users to be good at our program, they will be passionate about it.

Neurogenesis...your brain can keep changing.
Dull cubicle kills the brain.
World class ability is not about talent, it is about putting in the time.
You need a 'rage to master'

6 Expertise Hacks

-1 Exploit your telepathy ... mirror neurons
Our brain can learn just from watching actions
We jump out of the way when we see someone else get hurt
We can feel emotion from seeing a face
We can simulate another person's brain when we watch them
This is more effective if we have DONE what we are watching
This is the science behind visualizing to get better
It works better if you visualize what they would actually see, rather than 3rd person

-2 Reduce interference
Don't think about what we have to do
Tell dumber part of the brain to shut up
Doing with Images makes symbols

-3 Manage your fight/flight
Get a 'StressEraser' (StressEraser.com, Amazon)

-4 Learn about your brain
Legacy brain is trying to stop you
It says ruby is not important to life/death

-5 Exercise your
Brain age is ok, but REAL exercise is more important

-6 Find the time
The twitter curve is messing us up.
We need to find time to practice what we want to be good at, don’t waste time with stuff you don’t think is important.

eTech - Elephant programs


Elephant is a glimpse at how programming languages will behave in the future. The more we know about where we are going, the better prepared for it we will be.

Elephant programs are faithful 100 percent.

They never forget
"Passenger has a reservation - compiler makes the db or array"
"Does the passenger have a reservation?"

They interact with other persons
"You have a reservation on flight UA 522 today at 7:35 pm"
This speech act, if authorized, creates an obligation

Features-
Communication inputs and outputs are meaningful speech acts.
A promise will be expressed by a string of symbols, but its meaning (its semantics) is a promise, not a string.
Correctness of a program is partially defined in terms of performance of speech acts.

We can regard programs as having beliefs. A thermostat can believe it is too cold, too hot, or OK. It does not have a consciousness, but it does have beliefs.

Programs can be represented as sentences of logic.

They interact non-trivially with the outside world. They have input/output AND accomplishment specifications.

A program can have internal promises.

ABRUPT END
The speaker was cut off; we will need to look into Elephant on our own.

Tuesday, March 4, 2008

eTech - Predicting Markets and the Flow of Information (Google)

Prediction market presentation by Bo Cowgill from Google

One way for management to get data about the company is to ask its employees their opinion. Problem is, when they ask, we tend to tell them what they want to hear.
If you let people predict anonymously, they will tell the truth, because there is no down side, and they might win something. (A tee-shirt or their name on a leader board)

Google has prediction markets rolling with over 80,000 trades currently.
They have found that optimism bias among new employees makes betting against Google the safer bet.
Most people bet in the middle of five options, and the middle often lost. It would have been safer to bet high or low.

Information flows around a company.
Knowing data about employees and their trading habits can tell us who was talking to whom.
They found that the biggest factor was where employees sit.

This is ironic, because Google is trying to make non-local communication easier.

Software like NewsFutures can be used for creating the prediction market.

eTech - Next generation of online gaming (Sun Microsystems DARKSTAR)

One issue for companies that deal with many users and lots of data is that the DB and DB access machines need to be built out for the maximum number of users.

Blizzard went from a game company to a service company.
Webkinz is a toy company becoming a social game company.
Other examples: Club Penguin, Habbo Hotel

Scaling is the major issue. It’s hard.

Currently games break up the world to simplify the problem. This is bad because it makes for a constrained game design. Further, this causes a waste of server time and energy. Not all groups are playing at peak. Further, it is complexity the user does not understand. They just want to play with their friends.

If you shard incorrectly, you have empty servers or overloaded servers.

$30-$90 million to make an online game, over 3-4 years.

Sun wants to make architecture to help small companies make great games without the need to shard with 'Project Darkstar'. More info http://www.projectdarkstar.com/

This is an application server for games. OS agnostic, game agnostic. But it would work for non game applications that require lots of data transactions, and might give us a glimpse into how data access will be in the future.
The technology is under the GPL, but Sun will make commercial licenses if asked.

Games are multiplayer, but characters are actually agnostic. Let’s distribute their actions without the developers needing to know about it.

Games are event driven, and have small tasks per action. What is hard is when two characters contend for the same resource. All tasks are transactional. This is used to find conflicts and let one character get the resource while the other does not. They are moving competing characters to the same server to optimize the scheduling of transactions.
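
To picture how transactional tasks settle contention, here is a sketch of optimistic transactions in which two players grab the same item and only one commit succeeds. This is not the Darkstar API (Darkstar is a Java server); every class and function name below is hypothetical, and version checking at commit time is just one possible way to detect the conflict.

```python
class Conflict(Exception):
    pass

class World:
    """Shared game state with a version counter per object."""

    def __init__(self, objects):
        self.values = dict(objects)
        self.versions = {name: 0 for name in objects}

    def begin(self):
        return Transaction(self)

class Transaction:
    def __init__(self, world):
        self.world = world
        self.seen = {}      # object name -> version observed when read
        self.pending = {}   # object name -> new value to write on commit

    def read(self, name):
        self.seen[name] = self.world.versions[name]
        return self.pending.get(name, self.world.values[name])

    def write(self, name, value):
        self.pending[name] = value

    def commit(self):
        # Abort if anything we read was changed by another task meanwhile.
        for name, version in self.seen.items():
            if self.world.versions[name] != version:
                raise Conflict(name)
        for name, value in self.pending.items():
            self.world.values[name] = value
            self.world.versions[name] += 1

def pick_up(txn, item, player):
    if txn.read(item) is None:      # nobody owns it yet
        txn.write(item, player)

world = World({"sword": None})
t1, t2 = world.begin(), world.begin()
pick_up(t1, "sword", "alice")
pick_up(t2, "sword", "bob")
t1.commit()
try:
    t2.commit()
    outcome = "both committed"
except Conflict:
    outcome = "bob lost the race"
print(world.values["sword"], "-", outcome)
```

The game logic in `pick_up` never mentions locks or other players; the conflict is caught by the layer underneath, which matches the idea of hiding distribution from the developer.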

All data and communications will go through Darkstar, abstracting the developer and client away from the data access.

The program can think of this as a single thread and a single machine.

If a box goes down, play continues. The players are not connected to a box; they are connected to a channel.

This also allows you to use the same machines for MULTIPLE concurrent games.

eTech - Practice Makes Perfect

Presentation by Peter Norvig of Google
How billions of examples lead to better models of images and text

How things are traditionally figured out.
Look at world, think about data, and figure out a model to express the world.
Problem is that this is hard and the model will be wrong.

Instead, let the data do the work.
Computing power is making it possible to make more complex algorithms because we can easily test bad algorithms to find the good ones. Example- image resizing

More data is also allowing this. Example- scene completion

For finding similar images, do a search based on keyword, see the images, and use an algorithm to find similarities in photos. Use the Eigenface and SIFT features to find commonalities in images. Then rank the found images by what links to what, not on how often they are linked.
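
Ranking by "what links to what" suggests a PageRank-style iteration over a graph of similar images, although the notes do not name the algorithm used. The sketch below is one reading of it; the graph, node names, and damping value are assumptions for illustration.

```python
def rank_by_links(links, damping=0.85, iterations=50):
    """Power-iteration ranking: a node's score depends on who links to it.

    `links` maps each node to the nodes it points at. Every node must
    appear as a key; nodes with no outgoing links spread their score evenly.
    """
    nodes = list(links)
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_score = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node, targets in links.items():
            receivers = targets or nodes
            share = damping * score[node] / len(receivers)
            for target in receivers:
                new_score[target] += share
        score = new_score
    return sorted(nodes, key=score.get, reverse=True)

# Hypothetical similarity graph among four search results.
similar_to = {
    "img_a": ["img_b"],
    "img_b": ["img_c"],
    "img_c": ["img_b"],
    "img_d": ["img_b", "img_c"],
}
print(rank_by_links(similar_to))
```

An image scores well when well-scored images point at it, so a single link from a central image can count for more than several links from obscure ones.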

For text, grep the data to find words that are in proximity, or look in structured data, and use probabilistic models to guess the most probable answer. Example- Google Sets

Engineers later dropped the probabilistic model in favor of a linear model. They have moved away from something they can prove, and toward something they can observe working.
They have optimized for translating news.

Bayesian: we want argmax_c P(c|w), but model it as argmax_c P(w|c) P(c)

see: How to build a spell checker
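
As a rough illustration of the argmax above, here is a toy corrector where c ranges over a tiny vocabulary and w is the typed word. The word counts and the error model (probability falling off geometrically with edit distance) are made-up assumptions; a real system would estimate both P(c) and P(w|c) from data.

```python
# Hypothetical toy models; a real system would estimate both from data.
WORD_COUNTS = {"the": 500, "then": 60, "they": 80, "ten": 20}
TOTAL = sum(WORD_COUNTS.values())

def prior(candidate):
    """P(c): how common the candidate word is."""
    return WORD_COUNTS[candidate] / TOTAL

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                  # delete ch_a
                current[j - 1] + 1,               # insert ch_b
                previous[j - 1] + (ch_a != ch_b)  # substitute or match
            ))
        previous = current
    return previous[-1]

def likelihood(word, candidate, error_rate=0.05):
    """P(w|c): assumed to shrink geometrically with edit distance."""
    return error_rate ** edit_distance(word, candidate)

def correct(word):
    """argmax over c of P(w|c) * P(c)."""
    return max(WORD_COUNTS, key=lambda c: likelihood(word, c) * prior(c))

print(correct("thw"))
```

Splitting the problem this way lets the two parts be learned separately: word frequencies from any large body of text, and the error model from examples of typos.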

Sun SPOT Java Development Kit

This is neat...

Sun SPOT

It is a small computer with:
- a radio transmitter
- 2 accelerometers
- 6 inputs
- 5 outputs
- a battery power pack
- a java micro-controller
- a light sensor
- a temp sensor
- an 8 led display

You can use it to do almost anything. Here are a few obvious examples:
- Control a lego car wirelessly
- Control an on screen avatar wirelessly
- Track your day to day movements
- Turn on your house lights/heat/music by voice or time
- Alter a web-cam via the web

If you have a good open source project, they will give you some for free!
If you are a student you get 2 for $300, else they are 2 for $700.

Monday, March 3, 2008

Prediction theory

If a boss asks a manager when a product will be ready to release, the manager will say 2 days... because s/he does not want to look bad.

If, instead, the boss says, hey, whoever predicts the day closest to the day the product is ready to release will win $500.00 (or whatever), the manager will guess 2 weeks.

In the latter example, the boss gets the more realistic answer because the manager wants to win and does not feel that there is an expected answer.

eTech - Debugging Hacks 2

Good habits for debugging:
- Keep a log of everything you observe/change
- When you get an error message, Google it
- Look on forums that Google does not index
- Graph data over time, make a truth table / chart to visualize the problem
- Log the server AND client time in web logs
- Look in bug db for similar bugs
- After having a repro case, if you are still having issues fixing the bug, try to find ANOTHER repro case
- Reread and update the bug every day you work on it
- Take baby steps, "If you cannot see land, can you see birds?"
- The worse the bug, the more logging you need
- Get on a mailing list / news group
- Recheck assumptions
- Go back to code that works, and start taking diffs
- If needed, use binary search debugging (Spolsky)
- Explain the problem to someone else
- Get more eyes for the problem, present to a group
- Has there been a hardware change, or a simple change to the environment?
- Go home if you are done for the day (death march != fixed bugs)
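
The binary search debugging habit from the list above can be sketched as follows. It assumes you have an ordered list of revisions, a known-good start, a known-bad end, and a repro check you can run against any revision; the function names and the fake history are made up for illustration.

```python
def first_bad(revisions, is_bad):
    """Binary search for the first revision where `is_bad` returns True.

    Assumes revisions are ordered oldest to newest, the first is known
    good, the last is known bad, and the bug never disappears once
    introduced.
    """
    low, high = 0, len(revisions) - 1   # low is good, high is bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(revisions[mid]):
            high = mid
        else:
            low = mid
    return revisions[high]

# Hypothetical history: the bug appeared in revision 13.
history = list(range(1, 21))
checked = []

def reproduces_bug(revision):
    checked.append(revision)
    return revision >= 13

print(first_bad(history, reproduces_bug), "found after", len(checked), "checks")
```

Each check halves the remaining range, which is why an automated repro case makes this so much cheaper than reading diffs one by one.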

eTech - Debugging Hacks

This seems obvious, but I'll write it down...

For really bad bugs:
0- Try to fix the bug quickly
1- Revert any changes you made trying to fix the bug quickly
2- Collect data from each component, logs, etc...
3- Reproduce the bug and automate it
4- Simplify the bug conditions when possible
5- Look for connections and coincidences in the data
6- Brainstorm theories and test them
7- When you fix the bug, verify against the report
8- Make sure fix does not break other code

Bug tracking notes:
Break data for a bug into three categories, and log them correctly.
1- Facts
2- Questions
3- Theories that turned out to be wrong