The internet is a very large place. No one really knows how many individual pages exist on the web, but estimates place the number at around 14 billion, which, when visualized, might look something like this. What was also unknown until recently is how these webpages are connected. Research by Albert-László Barabási, published in Philosophical Transactions of the Royal Society A, examined the connectedness of the internet. Using the tools of network science, Barabási sought to understand the internet’s structure, and he found that the majority of the 14 billion or so pages (about 1 trillion web documents once every image, video, or file on each page is counted) are poorly connected. Interestingly, he also found that a few pages (search engines, indexes, and aggregators) are very highly connected. These pages act as hubs in the network, and they allow any user to navigate from one point to almost any other point on the internet in fewer than 19 clicks! Below, Barabási describes network science.
The connections between websites and the 19 clicks have been compared to 6 degrees of separation, a theory that states that everyone is six or fewer steps away from any other person in the world (read this interesting story about a Titanic survivor and 6 degrees of separation). Back in 2008, researchers at Microsoft lent support to the theory by examining 30 billion electronic conversations among 180 million people in various countries; they worked out that any two strangers are, on average, separated by 6.6 degrees. The theory can be demonstrated on a smaller scale by examining the career of Kevin Bacon and playing 6 Degrees of Kevin Bacon, wherein any actor can be linked to Kevin Bacon through their film roles within 6 steps. Google has since taken some of the fun out of the game by adding a “Bacon number” to its search options: typing an actor's name followed by “bacon number” into the search bar returns that actor's degrees of separation from Kevin Bacon, along with the path between them.
But there is still fun to be had with degrees of separation on the internet: enter Wiki Games. The objective of a Wiki game is to use the Random Article link on the Wikipedia main page to generate two random articles, then navigate from one page to the other in the fewest clicks (via links on the page). One popular variation of this game is 6 Clicks to Hitler, which challenges the player to start on a random page, then follow the links on each page until they reach the page for Adolf Hitler, preferably in fewer than 6 clicks.
For example, Oscillaria >> cyanobacteria >> nitrogen fixation >> Haber process >> Fritz Haber >> World War I >> Adolf Hitler (seven pages, so six clicks, try again!). Julius (chimpanzee) >> Norway >> World War II >> Adolf Hitler (four pages, so just three clicks!)
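Finding the fewest clicks between two pages is a shortest-path problem, and a breadth-first search is one common way to solve it. The sketch below is only an illustration: the `LINKS` dictionary is a toy graph containing just the links named in the two example paths (real Wikipedia pages have far more), and `fewest_clicks` is a hypothetical helper name. Each followed link counts as one click, so a path through n pages takes n - 1 clicks.

```python
from collections import deque

# Toy link graph (an assumption for illustration): each page maps to the
# pages it links to. Only links from the two example paths are included.
LINKS = {
    "Oscillaria": ["Cyanobacteria"],
    "Cyanobacteria": ["Nitrogen fixation"],
    "Nitrogen fixation": ["Haber process"],
    "Haber process": ["Fritz Haber"],
    "Fritz Haber": ["World War I"],
    "World War I": ["Adolf Hitler"],
    "Julius (chimpanzee)": ["Norway"],
    "Norway": ["World War II"],
    "World War II": ["Adolf Hitler"],
    "Adolf Hitler": [],
}


def fewest_clicks(links, start, target):
    """Breadth-first search; returns the shortest list of pages, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        page = path[-1]
        if page == target:
            return path
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # target is unreachable from start


if __name__ == "__main__":
    for start in ("Oscillaria", "Julius (chimpanzee)"):
        path = fewest_clicks(LINKS, start, "Adolf Hitler")
        # Clicks are links followed, i.e. one fewer than the pages visited.
        print(" >> ".join(path), f"({len(path) - 1} clicks)")
```

Because breadth-first search explores pages in order of distance from the start, the first path that reaches the target is guaranteed to use the fewest clicks available in the graph it was given.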
While these types of exercises are mainly for fun, there is some importance to understanding how networks and separation work. Network science can be used to describe biological and artificial systems, such as bodily organs, people, bus stops, companies, countries, and ecosystems (check out this cool documentary). The various components of these systems connect and interact to create webs or networks, and understanding these connections can sometimes help us to improve (or at least maintain) their function, or prevent their failure. For example, work on social networks has shown that having a strong social network can greatly improve your chances of survival during a disaster. Network science can even be applied to how we think about food.
Barabási and colleagues are using networks to learn more about the way we eat, particularly with respect to the food-pairing hypothesis, which states that ingredients will work well together in a dish if they share flavor compounds. In their 2011 paper, researchers led by Yong-Yeol Ahn wanted to know whether there are any quantifiable and reproducible principles behind our choice of certain ingredient combinations and avoidance of others. The authors examined a large number of recipes (56,498, taken from epicurious.com, allrecipes.com, and menupan.com) to try to discover patterns that transcend specific dishes or ingredients. Using available information about the many chemical compounds that give different foods their distinctive smells and tastes, the authors created a flavor network in which foods are linked by their shared flavor compounds; e.g., shrimp and parmesan are connected because they contain some of the same flavor compounds, such as 1-penten-3-ol.
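The idea of a flavor network can be sketched in a few lines: treat each ingredient as a node and connect two ingredients whenever their sets of flavor compounds overlap. This is a minimal illustration, not the authors' code. Only the shrimp and parmesan link through 1-penten-3-ol comes from the text above; every `placeholder_*` entry is an invented stand-in, and `flavor_network` is a hypothetical function name.

```python
from itertools import combinations

# Ingredient -> set of flavor compounds. Only "1-penten-3-ol" is taken
# from the article; the placeholder_* names are made up for illustration.
COMPOUNDS = {
    "shrimp": {"1-penten-3-ol", "placeholder_a"},
    "parmesan": {"1-penten-3-ol", "placeholder_b"},
    "soy sauce": {"placeholder_c"},
    "sesame oil": {"placeholder_d"},
}


def flavor_network(compounds):
    """Return {(ingredient_1, ingredient_2): number of shared compounds}."""
    edges = {}
    for a, b in combinations(sorted(compounds), 2):
        shared = compounds[a] & compounds[b]
        if shared:  # only link ingredients that share at least one compound
            edges[(a, b)] = len(shared)
    return edges


if __name__ == "__main__":
    for (a, b), weight in flavor_network(COMPOUNDS).items():
        print(f"{a} -- {b}: {weight} shared compound(s)")
```

Storing the number of shared compounds as an edge weight keeps the information needed to ask how strongly two ingredients are related, not just whether they are linked.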
The authors further grouped the recipes into geographically distinct cuisines (North American, Western European, Southern European, Latin American, and East Asian). They note that the average number of ingredients used in a recipe is around 8, and that the popularity of specific ingredients varies over four orders of magnitude! Interestingly, North American and Western European cuisines exhibit a statistically significant tendency towards recipes whose ingredients share flavor compounds (supporting the food-pairing hypothesis), whereas East Asian and Southern European cuisines tend to avoid recipes whose ingredients share flavor compounds, which contradicts the hypothesis (e.g., soy sauce, scallions, and sesame oil share hardly any flavor compounds but are commonly combined in East Asian cuisine; see figure on left). The differences between regional cuisines can be reduced to a few key ingredients with specific flavors: for example, North American food relies heavily on dairy products, eggs, and wheat, while East Asian cuisine is dominated by plant derivatives like soy sauce, sesame oil, rice, and ginger. These differences imply that these cooking styles are quantitatively different, and that the food-pairing hypothesis is not the grand unified food theory many had hoped for.
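One simple way to put a number on how much a recipe follows the food-pairing hypothesis is to average the count of shared flavor compounds over every pair of ingredients in it. The sketch below assumes that scoring as an illustration and is not presented as the paper's exact statistic. The compound sets are placeholders (only 1-penten-3-ol appears in the article), and `mean_shared_compounds` is a hypothetical name.

```python
from itertools import combinations

# Placeholder compound sets, invented for illustration only.
COMPOUNDS = {
    "shrimp": {"1-penten-3-ol", "placeholder_a"},
    "parmesan": {"1-penten-3-ol", "placeholder_b"},
    "soy sauce": {"placeholder_c"},
    "scallion": {"placeholder_d"},
    "sesame oil": {"placeholder_e"},
}


def mean_shared_compounds(recipe, compounds):
    """Average number of shared compounds over all ingredient pairs."""
    pairs = list(combinations(recipe, 2))
    if not pairs:  # a recipe with fewer than two ingredients has no pairs
        return 0.0
    total = sum(len(compounds[a] & compounds[b]) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    print(mean_shared_compounds(["shrimp", "parmesan"], COMPOUNDS))
    print(mean_shared_compounds(["soy sauce", "scallion", "sesame oil"], COMPOUNDS))
```

A cuisine that favors compound sharing would show recipe scores higher than those of randomly assembled ingredient lists, while a cuisine that avoids sharing would show lower ones.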
As a quasi-related challenge to the above food connectedness discussion, consider the incompatible food triad. The challenge is to find 3 foods that are acceptable (broadly defined) when eaten in any pair, but terrible when all three are combined (i.e., if A, B, and C are foods, then A+B, A+C, and B+C are all good, but A+B+C is terrible). It is quite challenging; place any suggestions or solutions in the comments below.
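The triad condition is easy to state as a check over combinations. The sketch below assumes you already have taste judgments recorded as a set of good pairs and a set of terrible triples; the foods A to D and their ratings are invented purely to exercise the logic, and `incompatible_triads` is a hypothetical helper.

```python
from itertools import combinations


def incompatible_triads(foods, good_pairs, bad_triples):
    """Return triples whose three pairs are all good but whose trio is bad."""
    triads = []
    for trio in combinations(sorted(foods), 3):
        pairs_ok = all(
            frozenset(pair) in good_pairs for pair in combinations(trio, 2)
        )
        if pairs_ok and frozenset(trio) in bad_triples:
            triads.append(trio)
    return triads


if __name__ == "__main__":
    # Invented ratings, for illustration only.
    foods = {"A", "B", "C", "D"}
    good_pairs = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]}
    bad_triples = {frozenset(t) for t in [("A", "B", "C"), ("A", "B", "D")]}
    print(incompatible_triads(foods, good_pairs, bad_triples))
```

Here A, B, D is rejected even though the trio is rated terrible, because B+D was never rated a good pair, so it does not meet the definition.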
As important as it is to understand how things are connected, it is also important to understand how things are separated. Separation science has large implications, be it for determining the difference between species based on genetic separation, the composition of an unknown sample through chromatographic means, or the innocence of a person. Separation is cool.
Both network and separation science came into their own with the advent of big data and the ability to process it. Advances in processing large data sets come through efforts like the quest to find new prime numbers. While the quest to find new primes (or play Jeopardy!) might not in itself seem useful, the spin-off applications of that work are amazing and are connected to many more applications, some of which haven’t even been imagined yet.