Web 2.0 Expo: Data Jujitsu

by October 15, 2011

In his talk Data Jujitsu: Turning Data Into Product at Web 2.0 Expo in New York, DJ Patil walked through the process of turning data into personalized products. Here are my notes from his talk:

  • Data products need to facilitate an end goal. What does it take to create a data product? Often way too much work and thinking. We need a new approach: data jujitsu.
  • Data jujitsu takes a big data problem and flips it to create something that people can relate to and that you can get feedback on fast.
  • First consider what data you are going to start with. Is it structured or unstructured?
  • Unstructured data is harder to work with. Open text fields in forms can cause issues. There are between 4,000 and 8,000 variations of “IBM” and “Software Engineer” in LinkedIn’s database.
  • The fastest way to clean up data is to ask people to fix it for you. You can often do this with suggestions at the point of entry. It is usually 100x cheaper to ask people than to try to process all this information on the server. Turning a big data problem over to users is an example of re-thinking big data problems.
  • Build the easy products first. Collaborative filters are the easiest to make. Collaborative filters drive page views: a 3x to 12x lift in page views from eliminating dead ends.
  • Give data back: take the data and present it back in useful ways.
  • Avoid data vomit. When there’s too much data on the page, it paralyzes users. The amount of data on a page is inversely proportional to how much people interact with it. The richer the environment, the greater the paralysis is.
  • Move people through a data product lifecycle incrementally, rather than exposing them to all the data at once.
  • Data products are hard to budge when they get into a user’s lifecycle. For example, your shopping recommendations get stuck in a particular category. How do you un-budge them?
  • Set user expectations so that failure can be handled gracefully. Data products should not set themselves up as overlords. Set the expectation that data products can be tuned.
  • LinkedIn put a social twist on job recommendations by allowing friends to forward job referrals to others. This took the burden off the system to get everything right.
  • Get lots of people to look at the data and report back crappy results. You need a graceful fail-out to cover situations where you’ve guessed wrong.
  • Know when you need to move your lightweight testing to a full-blown suite of technologies. When you have a hit, build serious support for it. Know when to leave the jujitsu behind.
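
To make the "suggestions at the point of entry" note concrete, here is a minimal sketch in Python. It is not how LinkedIn does it; the company list, the function name suggest_company, and the 0.6 similarity cutoff are all assumptions chosen for illustration. The idea is simply to offer the user a few canonical names as they type, so the cleanup happens before the messy value is ever stored.

```python
import difflib

# Hypothetical canonical list; a real product would draw this from its own data.
CANONICAL_COMPANIES = ["IBM", "Microsoft", "LinkedIn", "Google"]


def suggest_company(typed, canonical=CANONICAL_COMPANIES, limit=3, cutoff=0.6):
    """Return canonical names that closely match what the user typed."""
    # Compare in lowercase so "ibm" and "IBM" are treated as the same text,
    # but keep a lookup so the original spelling is what gets suggested.
    lookup = {name.lower(): name for name in canonical}
    matches = difflib.get_close_matches(
        typed.strip().lower(), list(lookup), n=limit, cutoff=cutoff
    )
    return [lookup[m] for m in matches]


if __name__ == "__main__":
    # Show the suggestions a form might offer for a few raw inputs.
    for raw in ["ibm", "Linkedin ", "zzzz"]:
        print(repr(raw), "->", suggest_company(raw))
```

The form would show these suggestions next to the text field and let the user pick one. The cutoff is a tuning knob: lower values suggest more aggressively and risk offering the wrong company.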
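
The note that collaborative filters are the easiest data product to build can also be sketched in a few lines. What follows is one simple variant, an item-to-item co-occurrence count ("people who viewed this also viewed"), under the assumption that you have a per-user list of viewed items. The sample data and the names build_cooccurrence and recommend are hypothetical and are not taken from the talk.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """Count how often each pair of items appears in the same user's history."""
    co = defaultdict(Counter)
    for items in histories.values():
        # set() ensures a repeated view by one user is only counted once.
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(item, co, limit=3):
    """Items most often seen alongside `item`, most frequent first."""
    # Ties are broken alphabetically so the output is deterministic.
    ranked = sorted(co.get(item, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:limit]]


if __name__ == "__main__":
    # Hypothetical view histories keyed by user.
    histories = {
        "ann": ["a", "b", "c"],
        "bob": ["a", "b"],
        "cat": ["b", "d"],
    }
    co = build_cooccurrence(histories)
    print(recommend("a", co))
```

Showing a list like this at the bottom of a page is one way to eliminate the dead ends mentioned above. An item with no co-occurrence data returns an empty list, which is the point where a product would need a fallback.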