One-Pagers‎ > ‎

OP11: Locality

LinkedIn maintains statistics on the professional (social) networks of its about 200 million users. Among these statistics, there is an approximate count of the number of people within three degrees of separation from the logged in user. This number is updated on a time frame of hours.

Design a system for computing/maintaining these three-degrees of separation counts for each user. Keep in mind that social networks are graphs that tend not to have small cuts. Also keep in mind that this is not exactly a mission-critical feature of LinkedIn, so the cost of the infrastructure for performing this computation and maintaining the results should be reasonable. Consider this also from the viewpoint of locality: can massive parallelization be avoided? Make an estimate of the yearly cost of this infrastructure, taking hardware and software, but not personnel costs into account. Which computations are done offline (and when), which online, and what is stored?

You may use any resources you can find on the internet, such as algorithms papers and information on how companies like LinkedIn address such problems (but you must indicate which sources you used and must not collaborate with other students in the course). You do not need to find out exactly how LinkedIn does it, but try to find a reasonable solution.