Univ. of Waterloo, Waterloo, Canada
University of California, Santa Cruz
Microsoft Research, USA
(Complutense University of Madrid, Spain)
(University of Delaware, USA)
(McGill University, Canada)
(Queen's University, Canada)
Zhen Ming Jiang
(Queen's University, Canada)
(Vrije Universiteit Amsterdam, Netherlands)
(University of Victoria, Canada)
(University of Lugano, Switzerland)
(University of Victoria, Canada)
Co-located with ICSE 2010,
Cape Town, South Africa
The MSR 2010 Prediction Challenge has been extended by
2 days! Submit your predictions by February 22 and you could win a Zune HD!
Mining Challenge Deadline has been extended!
A CREX (CTags-based) extraction of the FreeBSD project has been added!
We've put up a parsed version of the FreeBSD bug database!
Since 2006 the IEEE Working Conference on Mining Software Repositories
(MSR) has hosted a mining challenge. The MSR Mining Challenge brings
together researchers and practitioners who are interested in applying,
comparing, and challenging their mining tools and approaches on
software repositories for open source projects. Unlike previous years,
which examined a single project, multiple projects in isolation,
or a single distribution of projects (GNOME), this year the MSR
challenge involves examining the FreeBSD operating system and
distribution, the GNOME Desktop Suite of projects, and the
Debian/Ubuntu Distribution Database. The emphasis this year is on how
the projects are inter-related, how they interact, and possibly how
they evolve and function within a larger software ecosystem.
There will be two challenge tracks: #1: general and #2: prediction.
The winner of
each track will be given the MSR 2010 Challenge Award.
Challenge #1: General
In this category you can demonstrate the usefulness of your mining
tools. The main task will be to find interesting insights by analyzing
the software repositories of the projects within FreeBSD, GNOME
Desktop Suite and the package-related metadata of the Debian/Ubuntu distributions.
FreeBSD is a BSD-licensed BSD Unix distribution. It includes packages
for desktop, server and embedded uses. FreeBSD also takes
responsibility for porting many programs to its distribution via
GNOME Desktop Suite of projects. GNOME is very mature, and composed of
a number of individual projects (nautilus, epiphany, evolution, etc.)
and provides lots of input for mining tools.
The Ultimate Debian Database (UDD) is a database of packages, package
dependencies and related bugs. It describes the Debian and Ubuntu distributions.
One could examine multiple projects within these ecosystems. For
instance, one could examine API usage across all projects, train a
predictive model on one project and assess its accuracy on another,
or examine how developers' activity spans multiple projects.
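As a rough illustration of the last idea, the sketch below finds developers whose commits span more than one project. It assumes the commit logs have already been reduced to (project, author) pairs; the function names and the sample data are hypothetical and are not part of the challenge data set.

```python
from collections import defaultdict


def projects_by_author(commits):
    """Map each author to the set of projects they have committed to."""
    result = defaultdict(set)
    for project, author in commits:
        result[author].add(project)
    return result


def cross_project_developers(commits, min_projects=2):
    """Return authors active in at least `min_projects` distinct projects."""
    by_author = projects_by_author(commits)
    return {
        author: sorted(projects)
        for author, projects in by_author.items()
        if len(projects) >= min_projects
    }


# Hypothetical (project, author) pairs, as might be extracted from
# version control logs of several GNOME projects.
sample_commits = [
    ("nautilus", "alice"),
    ("epiphany", "alice"),
    ("evolution", "bob"),
    ("nautilus", "carol"),
    ("evolution", "carol"),
    ("epiphany", "carol"),
    ("nautilus", "alice"),
]

shared = cross_project_developers(sample_commits)
for author, projects in sorted(shared.items()):
    print(author, projects)
```

Real repository data would first need author identities to be unified across projects, since the same person may commit under different names or email addresses.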
Participation is straightforward:
- Select your mining area (one of bug analysis, change analysis, architecture and design, process analysis, team structure, etc.).
- Get project data for multiple GNOME projects, FreeBSD, or the UDD.
- Formulate your mining questions.
- Use your mining tool(s) to answer them.
- Write up and submit your 4-page challenge report.
- Within the report, clearly summarize your contribution, including what you found and why it is important.
The challenge report should describe the results of your work
and cover the following aspects: questions addressed, input data,
approach and tools used, derived results and interpretation of them,
and conclusions. Keep in mind that the report will be evaluated by a
jury. Make sure the contributions, purpose, scope, results, and
importance or relevance of your work are highlighted within your
report. Reports must be at most 4 pages long and must be in the IEEE CS proceedings style - Two Column Format.
Submission will be via EasyChair. Each
report will undergo a thorough review, and accepted challenge reports
will be published as part of the MSR 2010 proceedings. Authors of
selected papers will be invited to give a presentation at the MSR
conference in the MSR Challenge track.
Feel free to use any data source for the Mining Challenge. For your
convenience, we provide repository logs, mirrored repositories,
bugzilla database dumps, and various other forms of data linked at the
Challenge #2: Predict
This year, the MSR Mining Challenge prediction will involve predicting
the final bug number within Debian
on April 30th, 2010. We
want you to predict the newest bug number to appear on April 30th.
Prediction submissions will be scored by their distance from the last bug number that occurs on April 30th, 2010.
Participation is as follows:
- Pick a team name, e.g., WICKED WARTHOGS, BAD BIRDS, etc.
- Come up with a prediction for the final Debian bug report number as of April 30th based on some criteria or prediction model. A very simple model, for instance, would extrapolate the amount of growth over the past three months.
- Predict the final bug number of Debian at the end of the day on April 30th, 2010, according to Debian's server time (UTC).
- For example, as of Tue, 13 Oct 2009 23:21:01 UTC, the largest bug number was 550906.
- Write a paragraph (max 200 words) that describes how you computed your predictions.
- Submit everything before Feb 20th (Apia time) by email to email@example.com
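The simple growth-based model mentioned above could be sketched as a linear extrapolation, with a submission scored by its absolute distance from the actual number. Only the 13 Oct 2009 value (550906) comes from this page; the earlier observation (537406 on 15 Jul 2009), the function names, and the example "actual" value are made-up placeholders for illustration, and the jury's real scoring procedure may differ.

```python
from datetime import date


def extrapolate_bug_number(start_date, start_bug, end_date, end_bug, target_date):
    """Linearly extrapolate the largest bug number to `target_date`.

    The growth rate (bugs per day) is estimated from two observations
    and assumed to stay constant until the target date.
    """
    days_observed = (end_date - start_date).days
    if days_observed <= 0:
        raise ValueError("end_date must be after start_date")
    bugs_per_day = (end_bug - start_bug) / days_observed
    days_ahead = (target_date - end_date).days
    return round(end_bug + bugs_per_day * days_ahead)


def score(prediction, actual):
    """Distance between a prediction and the actual number (lower is better)."""
    return abs(prediction - actual)


# The first observation is a HYPOTHETICAL placeholder; the second is the
# value quoted on this page for 13 Oct 2009.
prediction = extrapolate_bug_number(
    start_date=date(2009, 7, 15), start_bug=537406,
    end_date=date(2009, 10, 13), end_bug=550906,
    target_date=date(2010, 4, 30),
)
print("Predicted final bug number:", prediction)

# HYPOTHETICAL actual value, used only to show how scoring would work.
print("Distance from a made-up actual value:", score(prediction, 580000))
```

A constant growth rate is a strong assumption; bug filing activity may vary with release cycles, so a real entry would probably want to fit more than two data points.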
Frequently Asked Questions
Do I need to give a presentation at the MSR conference?
For challenge #1, the jury will select finalists that are expected to
give a short presentation at the conference. Then the audience will
select a winner. For challenge #2, there is no presentation at the
conference. The winners will be determined with statistical methods
and announced at the conference.
Does the challenge report have to be four pages?
No, of course you can submit fewer than four pages. The page limit was
set to ease the presentation of space-intensive results such as
Wow, the data set is soooo big! My tool won't finish in
time. What can I do?
Just run your tool on a subset of the projects. For instance, you
could examine only the nautilus file manager and the epiphany web
browser. Especially when you are doing visualizations, it is almost
impossible to show everything.
My cat is a visionary... can I submit its predictions, or is
challenge #2 only for tools?
Of course, go ahead and submit its predictions as a benchmark. However, your cat will
compete outside the competition: only predictions generated by tools or by humans in
a systematic way are eligible to win challenge #2.
For the prediction challenge, can random guesses also win?
If the randomness is systematic, then it is allowed; if the randomness is
human generated, it is allowed. In general, it must be systematic randomness.
For challenge #2 (prediction), is it acceptable if our team submits more than
one prediction?
Only one submission from a team (person) is allowed.
Do I have to attend in order to win the prize for either challenge?
Yes, either you or someone who can pick it up for you must attend. We want to
avoid the complication of shipping prizes around the globe.
Note: All deadlines are 11:59 PM (Apia, Samoa Time) on the dates indicated.
| Submission of reports: || February 7th, 2010 (extended from February 6th, 2010) |
| Submission of predictions: || February 20th, 2010 |
| Author notification: || February 20th, 2010 |
| Camera-ready copy: || March 12th, 2010 |
| Conference dates: || May 2nd-3rd, 2010 |
- The efforts of Christian Bird, which made this challenge so much easier to run.
- The efforts of Israel Herraiz for parsing the email databases.
- The efforts of Emad Shihab for parsing the version control systems.