With 50TB of machine-generated data produced daily and 100PB of data to process in total, eBay's data challenge is truly astronomical.
This deluge of data is helping eBay to emulate the know-how that customers used to get from a local shop owner; the only difference is, it is trying to achieve this across its global auction sites.
Speaking at the Gartner CRM Summit in London, David Stephenson, head of global business analytics at eBay, said the auction site's goal is to make shopping successful. The company is using analytics to help it understand its customers better. His ambition is to take the kind of personalisation possible in a small shop and apply it to the world of eBay.
Managing the customer journey
Stephenson admitted eBay is starting to struggle to process all the customer journey data.
The big data challenge for eBay is that asking a simple business question such as "What were the top items that showed up in searches yesterday?" involves processing five billion page views. "So there is a huge problem just to ask a basic business question," said Stephenson.
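To see why the question is simple in principle and only hard at scale, here is a minimal sketch of the counting involved. It assumes each page view is a record with a hypothetical `items_shown` field listing the items that appeared in that search; the field name and record shape are illustrative and not taken from eBay's systems.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Sequence, Tuple


def top_search_items(
    page_views: Iterable[Mapping[str, Sequence[str]]], n: int = 10
) -> List[Tuple[str, int]]:
    """Return the n items that appeared most often across search page views."""
    counts: Counter = Counter()
    for view in page_views:
        # "items_shown" is a hypothetical field name used for illustration.
        counts.update(view["items_shown"])
    return counts.most_common(n)


# Three toy page views standing in for billions of real ones.
sample_views = [
    {"items_shown": ["camera", "lens"]},
    {"items_shown": ["camera", "tripod"]},
    {"items_shown": ["camera", "lens"]},
]
top_two = top_search_items(sample_views, n=2)
```

On a laptop this runs in a single pass over the records. At five billion page views, the same pass has to be spread across many machines, which is where the difficulty Stephenson describes comes from.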
But eBay needs to do more than ask simple questions. Stephenson said the site wanted to run sentiment analysis, network analysis and image analysis, none of which can be run in a traditional transactional database.
In terms of customer journey data, eBay used to keep a sample of 1% and throw the rest away, said Stephenson.
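The article does not say how eBay drew its 1% sample. One common approach, sketched here purely as an assumption, is to hash a session identifier and keep sessions whose hash falls in the lowest 1% of the range, so that a customer journey is either kept whole or dropped whole. The function name `in_sample` and the use of a session ID are hypothetical.

```python
import hashlib


def in_sample(session_id: str, rate: float = 0.01) -> bool:
    """Decide deterministically whether a session belongs to the sample.

    Hypothetical helper: maps the session ID to a number in [0, 1)
    via SHA-256 and keeps the session if that number is below `rate`.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


# Every page view in a session gets the same answer, so journeys stay intact.
kept = [sid for sid in (f"session-{i}" for i in range(20000)) if in_sample(sid)]
```

The saving in storage is obvious; the cost is that any question about the other 99% of journeys can no longer be answered.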
"There is a tension, either to impose structure on the huge [web analytics] data set by throwing away data, or keeping all the data collected but not being able to work on it [because it is unmanageable]."
To address this issue, eBay started its second data initiative. Seven years ago, the company began a project to store all its customer data. The auction site needed a product that could handle hundreds of petabytes of raw customer journey data, could be maintained by a team of five people, and could still be accessed easily by analysts.
The company worked with Teradata to develop a custom appliance built with several hundred user-defined functions. The system was built on commodity hardware, with proprietary software to process all the customer journey data and store it cheaply.
The end result is a custom data warehouse called Singularity.
The system eBay has developed can run ad-hoc queries in 32 seconds. Stephenson said that at the time, Hadoop would have taken 30 minutes to run such queries. "Hadoop may not be best [suited] for business-critical issues such as really understanding your customers," he added.
Along with the enterprise data warehouse and Singularity, eBay is also using Hadoop, which completes the third side of its data analytics triangle. The auction site has built two 20,000-node Hadoop clusters with 80PB of capacity, said Stephenson. These work alongside the Teradata data warehouse and Singularity custom data analytics appliance to give eBay the tools it needs to use data analysis to follow the customer journey.
True value of analytics
Stephenson said Singularity is proving its value in 'A/B testing' on the eBay site, which can be compared with trying different combinations of confectionery at a supermarket checkout to capture impulse buying. This allows eBay to test ideas on the site and assess what works, such as testing whether site visitors prefer bigger pictures in search results.
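As an illustration of how such a test might be assessed, the sketch below applies a standard two-proportion z-test to compare conversion rates between a control page and a variant. This is a textbook method and not necessarily what eBay uses, and the visitor and purchase counts are made up for the example.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    conv_a: int, n_a: int, conv_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: A = small pictures, B = bigger pictures in search results.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
```

A small p-value suggests the difference between the two page designs is unlikely to be down to chance alone; with equal conversion rates the statistic is zero and the test reports no evidence of a difference.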
The technology can also be used to power search hints, a concept Stephenson called "an economist in a box". It is possible for eBay to present search query tips based on questions that power users have already asked. "Just about every question that could be asked has already been asked by a power user," he said.
Such searches enable an eBay seller to determine whether it is best to set a low auction reserve price, whether free shipping matters, and the answer to any other question related to selling an item successfully on eBay.