Data Wins: The WibiData Blog

Catching up with our data…

17 January 2013, by Devjit Chakravarti

Building models in batch used to be enough to get meaningful uplift on customer-facing sites. Today, organizations that know how to use precise, real-time, constantly changing data have an edge. Up until now, there’s been an imbalance in power between organizations that do real-time and those that don’t. With the growing scale and granularity of data, finding the diamond in the rough of big data currently requires some of the brightest minds of our generation, not to mention a fair bit of luck! Fortunately, there is some low-hanging fruit, and technology is making it easier to know where to look and how to make use of real-time data.

Some inspiration...

Google search, for example, is able to create an incredible user experience based almost entirely on the user’s current query. Real-time bidding networks command some of the highest CPMs in advertising because they provide the ability to reach the right consumer at exactly the right time: for example, the in-market car buyer who just left the CarFax website. However, many consumer-facing applications fail to exploit this insight, even as mobile devices and other digitized experiences provide increasingly rich, real-time data about how users are behaving. Current location, time of day, recent purchases and browsing behaviors, and other digital signatures all provide potential insights that are frequently left on the floor.

Why this is hard

The biggest challenge most organizations face is that common approaches to developing products have not yet evolved to take advantage of the opportunity in real-time data. The first step towards becoming data-driven is to start A/B testing. A/B testing, however, has several weaknesses. As Dan McKinley points out in his blog, A/B testing is not particularly well suited to making real-time decisions from data. A/B experiments also tend to treat ‘users’ as ‘the user’: a large homogeneous mass with some normally distributed noise around it. This means you cannot understand what makes your users unique: your heavy user who knows your old website inside and out may hate your new UI, while new visitors may be more likely to stick around than they were before. It also means that you cannot make product decisions fast enough, because A/B testing requires humans to interpret the data using statistical analysis.

Catching up

Though some product decisions are complex enough that they will always require human intervention, many can be turned over to machine learning algorithms that can do a better job. Machines are able to consider data at higher granularity and with faster responsiveness. Bandit algorithms, such as those implemented by Myna, automatically overweight better-performing versions in an experiment. Google uses similar algorithms to dynamically update advertisement rankings based on recent click-through rates.

Spam and fraud detection are among the most promising areas for leveraging real-time data. Catching and blocking an attack while it is underway can substantially limit the financial impact and make it more difficult for spammers to build their algorithms. These systems need real-time analytics to recognize anomalous patterns, such as abnormal concentrations of web activity from certain IPs, an increased number of small transactions from a single merchant, or an unusual pattern of password retrievals on a banking site. They do this by maintaining materialized aggregates and tracking when a distribution changes substantially.

Offer and product recommendation systems, which already use historical trends to generate insight, can also benefit from real-time analytics on fresh data. Understanding which merchants are nearby, what objects were in your cart recently, and when you might be craving that afternoon coffee provides excellent insight for applications about how to target and serve content.
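To make the bandit idea mentioned above a little more concrete, here is a minimal sketch of one of the simplest bandit strategies, epsilon-greedy, in Python. It is only an illustration under stated assumptions: it is not how Myna or Google implement their systems, and the class name, the `simulate` helper, the variant names and the conversion rates are all made up for the example.

```python
import random


class EpsilonGreedyBandit:
    """Illustrative epsilon-greedy bandit over a fixed set of variants."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.variants = list(variants)
        self.epsilon = epsilon          # fraction of traffic spent exploring
        self.rng = random.Random(seed)
        self.shows = {v: 0 for v in self.variants}
        self.rewards = {v: 0.0 for v in self.variants}

    def mean_reward(self, variant):
        # Observed conversion rate so far; 0.0 before any data arrives.
        shows = self.shows[variant]
        return self.rewards[variant] / shows if shows else 0.0

    def choose(self):
        # Explore a random variant with probability epsilon,
        # otherwise exploit the variant with the best observed rate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        return max(self.variants, key=self.mean_reward)

    def update(self, variant, reward):
        # Record the outcome of one impression (1.0 = converted, 0.0 = not).
        self.shows[variant] += 1
        self.rewards[variant] += reward


def simulate(true_rates, rounds=10000, epsilon=0.1, seed=42):
    """Run the bandit against hypothetical, fixed conversion rates."""
    bandit = EpsilonGreedyBandit(true_rates, epsilon=epsilon, seed=seed)
    visitors = random.Random(seed + 1)  # separate stream for simulated users
    for _ in range(rounds):
        variant = bandit.choose()
        converted = visitors.random() < true_rates[variant]
        bandit.update(variant, 1.0 if converted else 0.0)
    return bandit


if __name__ == "__main__":
    # Hypothetical rates: the new UI converts better than the old one.
    result = simulate({"old_ui": 0.05, "new_ui": 0.15})
    for name in result.variants:
        print(name, result.shows[name], round(result.mean_reward(name), 3))
```

Unlike a classic A/B test, which holds the traffic split fixed until a human reads the results, this loop shifts traffic towards the better-performing version as evidence accumulates, while the small `epsilon` share of random traffic keeps checking whether the other version has improved.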

Building for the future

Unlocking relevance from real-time data will require us to develop algorithms that can dynamically understand and learn behaviors and embed them in our applications. The process of creating static applications based on months of product planning quickly falls behind what’s possible if we keep pace with our data. In the future, applications will inherently be built to use real-time data and include built-in capabilities for turning this data into better application experiences. The WibiData Big Data Application Server gives enterprises the power to do this today, with real-time predictive modeling and scalable storage and serving. You can learn more about WibiData at
