Alibaba Contest - Call for participation

Repeat Buyers Prediction Competition


Repeat Buyers Prediction after Sales Promotion


IJCAI is pleased to announce a large-scale machine learning competition hosted by Alibaba Group, a gold sponsor. The competition aims to promote the application of advanced techniques from AI research to real-world problems. Contestants will have access to a vast amount of data provided by the largest B2C platform in China. The top three winners will be invited to present their results at an IJCAI workshop and will get a chance to test their algorithms online.

In April 2015, participants from all over the world will be invited to work with real transaction data from the platform. The goal is to apply advanced machine learning and data mining techniques to predict which shoppers will become repeat buyers after a sales promotion. The main differences from most past AI competitions are listed below:

1. A large sales promotion data set for public usage
2. A free distributed computation platform for top teams
3. A great opportunity to deploy algorithms online for winners

Now is the time to demonstrate your brilliant ideas in the real world!

Problem Definition

Merchants sometimes run big promotions (e.g., discounts or cash coupons) on particular dates (e.g., Boxing Day sales, "Black Friday" or "Double 11" (Nov 11th)) in order to attract a large number of new buyers. Unfortunately, many of the attracted buyers are one-time deal hunters, and these promotions may have little long-lasting impact on sales. To alleviate this problem, it is important for merchants to identify which buyers can be converted into repeat buyers. By targeting these potentially loyal customers, merchants can greatly reduce the promotion cost and enhance the return on investment (ROI). It is well known that in the field of online advertising, customer targeting is extremely challenging, especially for fresh buyers. However, with the long-term user behavior log accumulated by the platform, we may be able to solve this problem.

In this challenge, we provide a set of merchants and their corresponding new buyers acquired during the promotion on the "Double 11" day. Your task is to predict which new buyers for given merchants will become loyal customers in the future. In other words, you need to predict the probability that these new buyers would purchase items from the same merchants again within 6 months.

The competition consists of two stages:

  • In the first stage, a data set containing around 200k users is given for training, and another of similar size is given for testing. As in other competitions, you may extract any features and then perform training with additional tools. You only need to submit the prediction results for evaluation.
  • In the second stage, the top 50 teams from the first stage will have the opportunity to work on a much larger data set on Alibaba's cloud platform. You will need to submit your code in Java; the distributed computation will then be handled by the cloud platform.

Data Description

The data set contains anonymized users' shopping logs from the 6 months before and on the "Double 11" day, and the label information indicating whether they are repeat buyers. Due to privacy concerns, the data is sampled in a biased way, so statistics computed on this data set will deviate from the actual statistics of the platform. Nevertheless, this will not affect the applicability of the algorithms. In the first stage, the data set is available for download; in the second stage it is not. Details of the data can be found in the table below.

Data Fields

user_id        A unique id for the shopper.
age_range      User's age range: 0 for <18; 1 for [18,24]; 2 for [25,34]; 3 for [35,54]; 4 for >=55.
gender         User's gender: 0 for female, 1 for male.
merchant_id    A unique id for the merchant.
label          Value from {0, 1, -1, NULL}. '1' denotes that 'user_id' is a repeat buyer for 'merchant_id', while '0' denotes the opposite. '-1' means that 'user_id' is not a new customer of the given merchant and is therefore excluded from prediction. However, such records may provide additional information. 'NULL' occurs only in the testing data, indicating that it is a pair to predict.
activity_log   A set of interaction records between {user_id, merchant_id}, where each record is an action represented as 'item_id:category_id:brand_id:time_stamp:action_type'. '#' is used to separate two neighbouring records. Records are not sorted in any particular order.
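
The record layout described above can be read with a few lines of code. The following Python is only a minimal sketch: the function names (parse_activity_log, label_role) and the sample values are made up for illustration, and it assumes that the label is delivered as one of the strings "0", "1", "-1" or "NULL". Since the encoding of time_stamp and action_type is not specified here, all fields are kept as strings.

```python
from collections import namedtuple

# One action inside activity_log, in the field order given in the table.
Action = namedtuple(
    "Action",
    ["item_id", "category_id", "brand_id", "time_stamp", "action_type"],
)


def parse_activity_log(log):
    """Split an activity_log string into a list of Action records.

    Records are separated by '#', fields within a record by ':'.
    Values stay as strings because their encoding is not specified.
    """
    if not log:
        return []
    records = []
    for element in log.split("#"):
        fields = element.split(":")
        if len(fields) != 5:
            raise ValueError("expected 5 fields, got %d: %r" % (len(fields), element))
        records.append(Action(*fields))
    return records


def label_role(label):
    """Map a label value to how the {user_id, merchant_id} pair is used.

    Assumes the label arrives as a string (hypothetical serialization).
    """
    if label in ("0", "1"):
        return "train"      # labelled pair: repeat buyer or not
    if label == "-1":
        return "context"    # not a new customer; extra information only
    if label == "NULL":
        return "predict"    # testing pair whose probability must be predicted
    raise ValueError("unexpected label: %r" % (label,))


# Made-up sample with two records.
sample = "101:20:7:1111:0#102:20:7:1110:2"
actions = parse_activity_log(sample)
```

Because the records are not sorted, any time-based feature would first need to order the parsed actions by time_stamp.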


Evaluation Metric

The Area Under the ROC Curve (AUC), where the ROC curve plots the true positive rate against the false positive rate, is employed as the evaluation metric. It can be calculated as (1-e), where 'e' denotes the proportion of incorrectly ordered pairs among all positive-negative pairs (i.e., pairs in which a negative sample is ranked ahead of a positive one). More information can be found on Wikipedia.
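
The (1-e) formulation can be written out directly. The sketch below is an illustration, not the organizers' scoring code: the function name pairwise_auc is made up, and counting tied scores as half an error is an assumption (a common convention that the text above does not specify).

```python
def pairwise_auc(labels, scores):
    """Compute AUC as 1 - e, where e is the fraction of (positive, negative)
    pairs in which the negative sample receives the higher score.

    labels: iterable of 0/1 ground-truth values
    scores: iterable of predicted probabilities, same length as labels
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wrong = 0.0
    for p in positives:
        for n in negatives:
            if n > p:
                wrong += 1.0   # negative ranked ahead of positive
            elif n == p:
                wrong += 0.5   # tie counted as half an error (assumption)
    e = wrong / (len(positives) * len(negatives))
    return 1.0 - e


# Four samples: one of the four positive-negative pairs is mis-ordered.
auc = pairwise_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

This double loop costs time proportional to the number of positive-negative pairs, so for a data set of around 200k users a rank-based computation would be the more practical choice; the pairwise form is shown only because it mirrors the definition.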

Important Dates:

  April 1, 2015: Competition announcement
  April 15, 2015: Competition begins
  May 15, 2015: First Stage Competition ends
  May 18, 2015: Second Stage Competition begins
  June 20, 2015: Second Stage Competition ends
  June 30, 2015: Final result announcement

First Stage

First Prize: 4,000USD
Second Prize: 3,000USD
Third Prize: 2,000USD

Second Stage

Only the top 50 teams from the first stage qualify for the second stage.
First Prize: 6,000USD
Second Prize: 4,000USD
Third Prize: 2,000USD
The top 3 teams may present their solutions at the IJCAI workshop "Social Influence Analysis", with an additional 3,000USD as a registration and travelling allowance.

Extra Online Competition

The top 3 teams at the second stage will have the opportunity to deploy their algorithms online for the "Double 11" promotion in 2015. The winner will be awarded 50,000USD.
Remark 1: The problem will be related to, but different from, that of the first two competitions. Details will be announced before September 2015.
Remark 2: Participants at this stage will work onsite as interns for around two months. Besides the award, salary and housing allowances will also be provided.


For more information, please contact the Alibaba Contest Organizer, Wenliang Zhong.