All science is about extracting relevant information from data and making new discoveries. The 2016 ICT Focus offered answers to the question of how data science and e-infrastructure can help with this.
Companies such as Google and Facebook show very clearly how much value can be extracted from data when data science and e-infrastructure are put to use at the highest level. In the academic community, too, the strategic importance of these “production factors” for excellence in research is undisputed. Today’s interdisciplinary and multidisciplinary knowledge processes depend on the availability of unlimited data and computing power. Islands, isolated data silos and network bottlenecks have no part in the modern IT landscape.
In his keynote address under the heading "The Revolution in Experimental and Observational Science and the Resulting Demands on e-Infrastructure", Tony Hey, Chief Data Scientist at the Scientific Computing Department of the UK Science and Technology Facilities Council (stfc.ac.uk), shed light on the current situation for scientists. As well as mastering experimental and theoretical know-how, they must also process the vast quantities of data that are recorded by instruments and output by computer simulations and sensor networks, all of which are being continually refined and thus producing ever more data. The amount of digital data we have is now so gigantic that managing it has become a science in its own right: data science. A new trend in the academic world is to fuse specialist branches of science with computer science.
Hey stressed that dealing with Big Data places special demands on the infrastructure of research institutes and universities. All of the digital production factors are brought together in a single platform dubbed e-infrastructure. The Science and Technology Facilities Council defines e-infrastructure as follows: E-infrastructure refers to a combination and interworking of digitally-based technology (hardware and software), resources (data, services, digital libraries), communications (protocols, access rights and networks) and the people and organisational structures needed to support modern, internationally leading collaborative research, be it in the arts and humanities or the sciences. Hey shortened this definition to a simple formula: e-infrastructure = compute + data + networking + tools and services + people.
Where does Europe stand in terms of e-infrastructure? Hey, who was Corporate Vice President at Microsoft Research from 2005 to 2015, sees the US as clearly out in front. One example he cites is the National Science Foundation (NSF) task force project Campus Bridging. The aim of this project, launched in 2009, was to network campus infrastructures such that scientists could use all of the connected infrastructures’ functions as if they were part of their own campus infrastructure. This gave rise to the Science DMZ, a high-speed data network with dedicated high-performance data transfer nodes. Thanks to the NSF’s funding, the Science DMZ is now available at more than 100 universities. Hey called for European research networks to adopt an end-to-end network architecture along the lines of the Science DMZ so that scientists from all over the world can have access to the high-end resources of Europe’s leading research institutes.
Olivier Verscheure’s keynote address presented a new joint project of EPFL and ETH Zurich: the Swiss Data Science Center (SDSC, datascience.ch). The Belgian has a doctorate from EPFL and worked at IBM research centres in the US and Ireland for 17 years. He became head of the SDSC, which is still being set up, in 2016.
Verscheure described data science as a "fragmented ecosystem" made up of various disciplines: data mining, statistics, machine learning, operations research, visualisation, visual analytics, data management, algorithms and more besides. The challenge, he said, lies in using data science to solve "real problems", i.e. the kind of problems that arise in specialist scientific contexts. The SDSC was created to close the gap between data science and the specialist sciences and thus contribute to progress in the latter. Its main task will be to "federalise" domain experts, data providers and data sciences on a shared platform. Verscheure aims to recruit an interdisciplinary team of 30 to 40 data and computer scientists by 2020.
Swiss science needs more experts who can analyse Big Science Data. The ETH Board has responded to this need by setting up the SDSC and launching Master’s courses in data science. In parallel with this, infrastructure initiatives are essential to ensure that growing volumes of data can continue to be transferred quickly and securely within the community. Both speakers made reference to SWITCH’s expertise and achievements in this regard, praising services such as SWITCHengines as key components in Big Data scenarios. SWITCH is also represented on the SDSC’s steering committee by its Managing Director Andreas Dudler. Both research experts made it clear that they welcome cooperation with industry.