Reinventing Discovery – the new era of networked science
This book, by Michael Nielsen, is essentially a plea for open access science. The idea is that putting more information in the open allows science to be done faster, more efficiently and more accurately.
Here are some notes I made. They represent what Nielsen actually says rather than my own views.
Nielsen talks about the chess match Kasparov v the World. Garry Kasparov (a chess master) came very close to being beaten by the ‘World Team’, a team of people collaborating on chess moves via the internet. Even though Kasparov was the best player, stronger than any individual member of the team, the World Team included a number of people who specialised in particular areas of chess. Nielsen calls this ‘latent micro expertise’, and pooled together it made the team’s cumulative effort impressive.
Similarly, in a large organisation you want to encourage micro expertise: people with a specific ability to solve a specific problem should be encouraged to help whoever is in charge of solving that problem. This means getting people out of their silos and having them spend at least some time collaborating with others in the organisation, ideally over some sort of shared platform.
He also uses the metaphor of uranium. Uranium is unstable, and its atoms occasionally split and release neutrons. Below a certain mass, these neutrons tend to escape without hitting other uranium atoms. Above that mass, however, they DO hit other uranium atoms, which split and release more neutrons. Those neutrons in turn hit further atoms, releasing still more neutrons, and eventually the whole thing blows up. This threshold is called critical mass. Nielsen posits that intelligence works in a similar way: the more information that is publicly available, the more people will latch on to it, and the faster an ‘explosion’ (ie. the creation of new ideas) can occur.
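The chain-reaction idea can be illustrated with a toy branching process. This is a minimal sketch, not real nuclear physics: it assumes each fission releases exactly two neutrons, and it uses a hypothetical `hit_probability` parameter as a stand-in for mass (a bigger lump makes it more likely that a free neutron strikes another nucleus before escaping). When the average number of new neutrons per neutron (2 × `hit_probability`) is below 1 the reaction fizzles out; above 1 it grows until it "goes critical".

```python
import random

NEUTRONS_PER_FISSION = 2  # simplifying assumption; real fissions release a variable number


def simulate_chain(hit_probability, start_neutrons=100, generations=50, cap=10_000, seed=0):
    """Toy branching process: return the neutron count in each generation.

    hit_probability is a hypothetical stand-in for mass: the chance that a free
    neutron strikes another nucleus (causing a fission) before it escapes.
    """
    rng = random.Random(seed)
    neutrons = start_neutrons
    history = [neutrons]
    for _ in range(generations):
        # Each free neutron independently triggers a fission with probability hit_probability.
        fissions = sum(1 for _ in range(neutrons) if rng.random() < hit_probability)
        neutrons = fissions * NEUTRONS_PER_FISSION
        history.append(neutrons)
        # Stop once the reaction has died out or has clearly run away.
        if neutrons == 0 or neutrons > cap:
            break
    return history


print(simulate_chain(0.3)[-1])  # below "critical mass": the chain dies out
print(simulate_chain(0.8)[-1])  # above "critical mass": the count shoots past the cap
```

The same shape of argument is what Nielsen applies to ideas: below the threshold, each new idea sparks on average less than one further idea and activity fades; above it, activity compounds.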
‘In most cases, what makes a creative insight important is precisely the fact that it combines ideas that were previously thought to be unrelated.’
Why is online collaboration different to committee work? Committees work at the pace of their slowest members and can be stifled by vetoes from the most obstructive or annoying people; in online collaborations, you can simply ignore these people. Committees also tend to consist of people who were made to sit on them, whereas a well-run online collaboration attracts people who are already passionate about the subject. Online collaboration can also beat offline collaboration because of its scale and the diversity of ideas and skills it draws on.
However, in a large-scale online collaboration you need a system that directs people to the part of the collaboration where their knowledge is most useful.
Open source software is a good example of where this has happened. Linux, a piece of open source software, has grown to an enormous scale. Its developers focus relentlessly on modularity: Linux is broken down into individual modules, and people contribute to the ones that match their specialism.
Firefox has found a good way to solve problems: an issue tracker. People report issues, and if enough of them report the same one it rises up a sort of league table, and people are encouraged to contribute to solving it. The issue tracker can also be used to suggest new developments. This central area of discussion encourages people to target their expertise directly.
You still have the problem of directing expert attention, though: how can you be sure that people are contributing where their skills are most needed?
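The ‘league table’ idea amounts to counting how many people have reported each issue and sorting by that count. A minimal sketch, assuming each report is recorded simply as an issue identifier; the issue names are hypothetical, and a real tracker such as Firefox’s records far more detail than this.

```python
from collections import Counter


def league_table(reports, top_n=3):
    """Rank issues by how many separate reports each one has received."""
    # Counter.most_common sorts by count, highest first.
    return Counter(reports).most_common(top_n)


# Hypothetical reports: one entry per user who reported or voted for an issue.
reports = [
    "crash-on-startup", "slow-tabs", "crash-on-startup",
    "font-rendering", "crash-on-startup", "slow-tabs",
]
print(league_table(reports))
# [('crash-on-startup', 3), ('slow-tabs', 2), ('font-rendering', 1)]
```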
One way this was done was the Mathworks competition. Mathworks runs a competition every year in which people build a program to solve a particular problem. The example Nielsen gives is from 1998: competitors were asked to write a program that would select, from a list of songs, the combination that filled as much of a 74-minute CD as possible.
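The CD task is an instance of the 0/1 subset-sum problem: choose songs, each at most once, so that their total length is as large as possible without exceeding the disc’s capacity. The sketch below uses a standard dynamic-programming approach over reachable totals, assuming song lengths in whole seconds. It only illustrates the problem; it is not one of the competition entries and says nothing about the secret formula Mathworks used for scoring.

```python
def fill_cd(song_lengths, capacity=74 * 60):
    """Choose songs (each used at most once) to fill as much of the CD as possible.

    song_lengths are in seconds; capacity defaults to a 74-minute CD.
    Returns (total_seconds, indices_of_chosen_songs).
    """
    # best maps each reachable total to one subset of song indices that achieves it.
    best = {0: []}
    for i, length in enumerate(song_lengths):
        # Iterate over a snapshot so song i is added at most once to any subset.
        for total, chosen in list(best.items()):
            new_total = total + length
            if new_total <= capacity and new_total not in best:
                best[new_total] = chosen + [i]
    top = max(best)
    return top, best[top]


# Hypothetical track lengths in seconds.
print(fill_cd([1500, 1200, 1800, 900, 700]))
# (4300, [0, 1, 3, 4])
```

Because there are at most `capacity + 1` distinct totals, this stays fast for a single CD even with many songs.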
Mathworks designed a process whereby each entry was immediately judged by a secret formula and awarded a score. In addition, the code for successful entries was immediately published. So people could see which entries got the highest score, work out what had changed in that entry, and then refine that particular aspect. This meant that expert attention was directed in the ways the competition’s designers intended.
‘If a participant in the Mathworks competition was stuck for ideas, they only needed to wait for a few hours to find new ideas to stimulate and challenge them.’
Re politics
‘Group discussion actually makes people’s political decisions worse than they would have been if they had made those decisions individually.’
Groups focus on the information they all hold in common, not the information that individual members hold. They also focus on knowledge held by high-status group members and ignore knowledge held by low-status members.
Most people do not assess politicians by building up a complete picture of their positions; we judge them on how key aspects of their politics affect our interests. This is because there is no ‘shared praxis’, ie. no consensus on which positions taken by a politician are ‘good’ or ‘bad’. This differs from the Mathworks competition, where everyone agreed that a higher score was better. So for a collaboration to succeed, we need a shared praxis.
Other
Thanks to a shared praxis, ‘in science it is often the people with the best ideas who win out, not the people with the most power.’
It is worth trying to spread this idea more widely. What if it were possible to ‘match’ tasks across an organisation to the people in that organisation? An algorithm could take each task and allocate it to the person who has had the most success solving similar problems in the past.
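The matching idea could be sketched as follows. Everything here is hypothetical: the `TaskMatcher` class, the task categories, and the rule of choosing whoever has the highest past success rate in a category are illustrative assumptions, not a method Nielsen describes.

```python
from collections import defaultdict


class TaskMatcher:
    """Hypothetical matcher: route each task to whoever has the best track record in its category."""

    def __init__(self):
        # (person, category) -> [successes, attempts]
        self.record = defaultdict(lambda: [0, 0])

    def log_outcome(self, person, category, solved):
        """Record whether `person` solved a task in `category`."""
        stats = self.record[(person, category)]
        stats[0] += int(solved)
        stats[1] += 1

    def best_person(self, category):
        """Return the person with the highest success rate in `category`, or None if nobody has tried."""
        candidates = [
            (successes / attempts, person)
            for (person, cat), (successes, attempts) in self.record.items()
            if cat == category
        ]
        if not candidates:
            return None
        # Highest success rate wins; ties fall back to comparing names.
        return max(candidates)[1]


matcher = TaskMatcher()
matcher.log_outcome("ana", "statistics", solved=True)
matcher.log_outcome("ana", "statistics", solved=True)
matcher.log_outcome("ben", "statistics", solved=True)
matcher.log_outcome("ben", "statistics", solved=False)
print(matcher.best_person("statistics"))  # ana (2/2 beats 1/2)
print(matcher.best_person("contracts"))   # None
```

Raw success rates favour someone with a single lucky result; a real system would probably want a minimum number of attempts or some smoothing before trusting the ranking.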
Nielsen posits that there are many unrecognised connections in science. Machine learning can trawl through vast amounts of data and recognise connections between things previously thought to be unrelated, eg. what Visulytix did with retinal scans, identifying which retinas were diseased.
Google search queries can be used to predict which songs will top the charts and which stocks will do well. This is an example of identifying links between two data sets that had not previously been studied together. What about using this in political campaigns?
Data commons
‘An unaided human’s ability to process large data sets is comparable to a dog’s ability to do arithmetic, and not much more valuable.’
Most data should be made public so that such patterns can be identified by computers. However, this means real data: the people publishing it should actually be trying to explain the data to others, and acting out of willingness rather than mere compulsion.
‘Today – the data web is in its early days. Most data is still locked up. To the extent data is shared, many different technologies are being used to do the sharing. The open data sets that are available remain mostly unconnected to each other, still living inside their separate silos.’ This is very important, perhaps the most important passage in the book. What if such data could be combined into a single set and then read by computers?
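At its simplest, combining data sets means joining records that describe the same thing. A minimal sketch, assuming every data set carries a shared identifier field (in practice the hard part, and exactly what siloed data usually lacks); the field names and values below are hypothetical.

```python
def combine(*datasets, key):
    """Outer-join several lists of records (dicts) on a shared key field."""
    merged = {}
    for dataset in datasets:
        for record in dataset:
            # Records with the same key are folded into one combined record.
            merged.setdefault(record[key], {}).update(record)
    return merged


# Hypothetical data sets published separately by two organisations.
air_quality = [{"region": "north", "pm25": 12}, {"region": "south", "pm25": 21}]
admissions = [{"region": "north", "asthma_admissions": 30}, {"region": "east", "asthma_admissions": 8}]

for region, row in combine(air_quality, admissions, key="region").items():
    print(region, row)
```

One caveat of this simple approach: if two data sets use the same field name for different things, the later one silently overwrites the earlier, which is why shared, well-defined identifiers and vocabularies matter so much.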
‘Imagine having the genome of newborn children immediately sequenced, and then correlated with a giant database of public health records to determine not just what diseases they are susceptible to…but also what environmental factors might increase their susceptibility to disease. “Your son has an 80 per cent chance of developing heart disease in his 40s if he is sedentary in his 20s and 30s. But with 3 hours exercise per week that probability drops to 15%.”’
Practical steps
Most science journals were behind paywalls (at least when Nielsen wrote the book). This is a problem, as described earlier. Scientific info should be free.
The Public Library of Science is a good example of where information has been made free. arXiv (pronounced ‘archive’) has done the same for physics preprints (ie. versions of journal articles circulated before formal publication). This lets people see the latest developments in physics for free. It would be great if this were expanded to other fields.
The problem with open science is that scientists themselves have no incentive to make their data public, and lots of incentives to keep it secret. They also lack the time to publish online, because the scientific community attaches more prestige to journal articles.
Here is how this can be fixed:
1. Compulsion – make scientists publish their data online, for free, as a condition of receiving grants.
2. Incentives (otherwise people will just do the bare minimum to get the grants). Come up with a process that makes it easy to measure online citations and store them centrally. If you publish something online and it gets cited a lot, this should show up in a database that people consult when assessing scientists for jobs and so on.
3. Raise public awareness – the public needs to know the importance of open science and push scientists to abide by it.
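The central citation database in point 2 could start as little more than a tally of links. A minimal sketch, assuming citations are recorded as (citing item, cited item) pairs and that a separate table says who authored each item; all identifiers are hypothetical.

```python
from collections import Counter


def citation_counts(citations, authorship):
    """Count citations received by each author.

    citations: iterable of (citing_id, cited_id) pairs, e.g. links to a data set or blog post.
    authorship: dict mapping an item id to the list of its authors.
    """
    counts = Counter()
    for _citing, cited in set(citations):  # a duplicate link is counted once
        for author in authorship.get(cited, []):
            counts[author] += 1
    return counts


# Hypothetical records.
citations = [
    ("paper-1", "dataset-a"),
    ("paper-2", "dataset-a"),
    ("paper-2", "dataset-a"),
    ("post-7", "code-b"),
]
authorship = {"dataset-a": ["lee"], "code-b": ["lee", "kim"]}
print(citation_counts(citations, authorship))  # Counter({'lee': 3, 'kim': 1})
```

Credit here goes to every listed author of a cited item, so a data set or piece of code counts just as a paper would, which is the point of the incentive.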