Tuesday, June 28, 2016

[284] CHINA SUPERCOMPUTERS: CHINA NOW [jun2016] HAS MORE OF THE WORLD'S FASTEST SUPERCOMPUTERS THAN THE USA (93 PETAFLOPS)

China's five-millennium culture preserves the Forbidden City and produces the fastest supercomputer in the world. The 21st century is the era of China.

CHINA’s
SUPERCOMPUTERS & TECHNOLOGIES

Ø China now [jun2016] has more of the world's fastest supercomputers than the US

China Tops Supercomputer Rankings with New 93-Petaflop Machine [20jun2016]

 
Source: TOP500; Michael Feldman, June 20, 2016, 9 a.m.

 Source: Jack Dongarra, Report on the Sunway TaihuLight System, June 2016

1.       A new Chinese supercomputer, the Sunway TaihuLight, captured the number one spot on the latest TOP500 list of supercomputers released on Monday morning at the ISC High Performance conference (ISC) being held in Frankfurt, Germany.
2.       With a Linpack mark of 93 petaflops, the system outperforms the former TOP500 champ, Tianhe-2, by a factor of three. The machine is powered by a new ShenWei processor and custom interconnect, both of which were developed locally, ending any remaining speculation that China would have to rely on Western technology to compete effectively in the upper echelons of supercomputing.
3.       TaihuLight is currently up and running at the National Supercomputing Center in the city of Wuxi, a manufacturing and technology hub, a two-hour drive west of Shanghai. The system will be used for various research and engineering work, in areas such as climate, weather & earth systems modeling, life science research, advanced manufacturing, and data analytics. Center director Prof. Dr. Guangwen Yang will formally introduce the system on Tuesday afternoon, in a session at ISC.
4.       “As the first number one system of China that is completely based on homegrown processors, the Sunway TaihuLight system demonstrates the significant progress that China has made in the domain of designing and manufacturing large-scale computation systems,” Yang told TOP500 News.
5.       The supercomputer was developed by the National Research Center of Parallel Computer Engineering & Technology (NRCPC), the same organization that designed TaihuLight’s predecessor, the Sunway BlueLight system, which is installed at the National Supercomputing Center in Jinan. BlueLight is a 796-teraflop supercomputer, which was deployed in 2011.
6.       BlueLight is powered by an older version of the ShenWei processor, a third-generation 16-core chip, known as the SW1600, which tops out at about 140 gigaflops. In the five years since that system came online, NRCPC developed a much more powerful processor, the SW26010, a 260-core chip that can crank out just over 3 teraflops. TaihuLight has a single SW26010 in each of its 40,960 nodes, which adds up to 125 peak petaflops across the entire machine (more than 10 million cores). Linpack, of course, is going to leave some FLOPS on the table, but 93 petaflops represents a respectable 74 percent yield of peak performance.
7.       At 3 teraflops, the new ShenWei silicon is on par with Intel’s “Knights Landing” Xeon Phi, another manycore design, but one with a much more public history. In a bit of related irony, it was the US embargo of high-end processors, such as the Xeon Phi, imposed on a number of Chinese supercomputing centers in April 2015, which precipitated a more concerted effort in that country to develop and manufacture such chips domestically. The embargo probably didn’t impact the TaihuLight timeline, since it was already set to get the new ShenWei parts. But it was widely thought that Tianhe-2 was in line to get an upgrade using Xeon Phi processors, which would have likely raised its performance into 100-petaflop territory well before the Wuxi system came online.
8.       Like its earlier incarnations, this latest ShenWei is a 64-bit RISC processor, with SIMD instruction support and out-of-order execution. Its underlying architecture is somewhat of a mystery, although it’s been speculated that the design was derived from the DEC Alpha architecture. The instruction set is specified simply as ShenWei-64.
9.       The processor is divided into four core groups, each with 64 computing processing elements (CPE) and a management processing element (MPE). Each core group also includes a memory controller delivering an aggregate memory bandwidth of 136.5 GB/second on each socket. As one might expect of a manycore design, it runs at a relatively modest 1.45 GHz and supports just a single execution thread per core. The chip was manufactured at the National High Performance Integrated Circuit Design Center, in Shanghai. The process technology node has not been revealed.
10.    Memory-wise, each node contains 32 GB, adding up to a little over 1.3 PB for the whole machine. While that seems like a lot, it’s not much memory considering the number of cores it must feed. The much smaller 10-petaflop K supercomputer at RIKEN, for example, is outfitted with 1.4 PB of memory, and most of the other large systems on the TOP500 list have much better bytes-to-FLOPS ratios than that of TaihuLight. It also relies on the older DDR3 technology, which is slower and more power-hungry than the newer DDR4 memory.
11.    The system is also rather light on cache. In fact, it really doesn’t have any in the L1-L2-L3 sense. Each core is allocated 12 KB of instruction cache, along with 64 KB of local scratchpad. And that’s it. The scratchpad can be used like a level 1 cache to some degree, but without the L2 and L3 levels to buttress it, there’s not a whole lot of capability to speed up memory accesses.
12.    From a power standpoint though, TaihuLight is quite good. It draws 15.3 megawatts (MW) running Linpack, which, somewhat surprisingly, is less power than its 33-petaflop cousin, Tianhe-2, which uses 17.8 MW. TaihuLight’s energy-efficiency of 6 gigaflops/watt is excellent, which will certainly earn it a place in the upper reaches of the Green500 list. Keep in mind though, if the system had a more reasonable amount of memory for its size, it would draw significantly more power and its energy efficiency would suffer accordingly.
13.    The interconnect, simply known as the Sunway Network, is also a homegrown affair. It’s noteworthy that the older Sunway BlueLight machine employed QDR InfiniBand for the system network. The TaihuLight one, however, is based on PCIe 3.0 technology, and provides 16 GB/second of node-to-node peak bandwidth, with a latency of around 1 microsecond. Running MPI communications over it slows that down to about 12 GB/second. Such performance is pretty much on par with EDR InfiniBand or even 100G Ethernet, although the latency seems a tad high (it depends on exactly what’s being measured, of course). In any case, it looks like the design team opted for simplicity here, rather than breakneck speeds using exotic technology.
14.    Likewise, for the operating system. The Sunway Raise OS, as it’s called, uses standard Linux as the base, along with the necessary tweaks to make it work with the custom TaihuLight architecture. Other parts of the system software are also pretty standard – compilers for C/C++ and Fortran, along with the associated math libraries. All, of course, required ports to the custom ShenWei architecture and instruction set, but presumably much of that development work had already been done for the previous-generation processors.
15.    According to TOP500 author Jack Dongarra, three scientific simulation codes run on TaihuLight have been chosen as Gordon Bell Prize finalists, two of which have managed to reach a sustained performance of 30 to 40 petaflops. The award is bestowed each year on the most noteworthy HPC application, based on “peak performance or special achievements in scalability and time-to-solution on important science and engineering problems.”
16.    In a paper written by Dongarra and published on June 20, he describes these applications and also provides a deep dive into the TaihuLight architecture (upon which much of the information in this article was based). The paper also offers some interesting comparisons to other supercomputers. While Dongarra does have reservations about some elements of the new machine’s design, he concludes: “The fact that there are sizeable applications and Gordon Bell contender applications running on the system is impressive and shows that the system is capable of running real applications and [is] not just a stunt machine.”
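The headline figures quoted above can be cross-checked with simple arithmetic. The sketch below is not taken from the TOP500 article or from Dongarra's report; it only recombines numbers stated in the text (node count, core groups per chip, memory per node, peak and Linpack performance, power draw). Variable names are illustrative, and decimal units are assumed throughout (1 PB = 1,000,000 GB).

```python
# Cross-check of the TaihuLight figures quoted in the article.
# All inputs come from the text; decimal units are an assumption.
NODES = 40_960               # one SW26010 chip per node
CORE_GROUPS_PER_CHIP = 4
CPES_PER_GROUP = 64          # computing processing elements
MPES_PER_GROUP = 1           # management processing element
PEAK_PFLOPS = 125.0          # theoretical peak of the whole machine
LINPACK_PFLOPS = 93.0        # measured Linpack result
MEM_PER_NODE_GB = 32
LINPACK_POWER_MW = 15.3
K_MEM_PB, K_PFLOPS = 1.4, 10.0   # K computer, as rounded in the article

cores_per_chip = CORE_GROUPS_PER_CHIP * (CPES_PER_GROUP + MPES_PER_GROUP)
total_cores = NODES * cores_per_chip

# Per-node peak implied by the 125-petaflop total, in teraflops.
peak_per_node_tflops = PEAK_PFLOPS * 1000 / NODES

# Fraction of peak that Linpack actually achieves.
linpack_yield = LINPACK_PFLOPS / PEAK_PFLOPS

# Total memory and the bytes-per-FLOPS ratio (memory / Linpack rate).
total_mem_pb = NODES * MEM_PER_NODE_GB / 1e6
taihulight_bytes_per_flop = total_mem_pb / LINPACK_PFLOPS
k_bytes_per_flop = K_MEM_PB / K_PFLOPS

# Energy efficiency: petaflops -> gigaflops, megawatts -> watts.
gflops_per_watt = (LINPACK_PFLOPS * 1e6) / (LINPACK_POWER_MW * 1e6)

print(f"cores per chip:       {cores_per_chip}")
print(f"total cores:          {total_cores:,}")
print(f"peak per node:        {peak_per_node_tflops:.2f} teraflops")
print(f"Linpack yield:        {linpack_yield:.1%}")
print(f"total memory:         {total_mem_pb:.2f} PB")
print(f"bytes per FLOPS:      {taihulight_bytes_per_flop:.3f} (K: {k_bytes_per_flop:.2f})")
print(f"energy efficiency:    {gflops_per_watt:.2f} gigaflops/watt")
```

Under these assumptions the numbers agree with the article: 260 cores per chip, 10,649,600 cores in total, a yield of about 74 percent, a little over 1.3 PB of memory, and roughly 6 gigaflops per watt. The bytes-per-FLOPS ratio comes out about ten times lower than that of the K computer, which is the point the article makes about memory.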



New Chinese Supercomputer Named World’s Fastest System on Latest TOP500 List [20jun2016]

Ø Sunway TaihuLight is the new No. 1 system with 93 petaflop/s (quadrillions of calculations per second) on the LINPACK benchmark, on Chinese-designed CPUs

Ø China Draws Equal to the U.S. in Overall Installations

 
Source: TOP 500; June 20, 2016, 4:01 a.m.
https://www.top500.org/news/new-chinese-supercomputer-named-worlds-fastest-system-on-latest-top500-list/

FRANKFURT, Germany; BERKELEY, Calif.; and KNOXVILLE, Tenn.
1.       China maintained its No. 1 ranking on the 47th edition of the TOP500 list of the world’s top supercomputers, but with a new system built entirely using processors designed and made in China. Sunway TaihuLight is the new No. 1 system with 93 petaflop/s (quadrillions of calculations per second) on the LINPACK benchmark.
2.       Developed by the National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, Sunway TaihuLight displaces Tianhe-2, an Intel-based Chinese supercomputer that has claimed the No. 1 spot on the past six TOP500 lists.
3.       The newest edition of the list was announced Monday, June 20, at the 2016 International Supercomputer Conference in Frankfurt. The closely watched list is issued twice a year.
4.       Sunway TaihuLight, with 10,649,600 computing cores comprising 40,960 nodes, is twice as fast and three times as efficient as Tianhe-2, which posted a LINPACK performance of 33.86 petaflop/s. The peak power consumption under load (running the HPL benchmark) is at 15.37 MW, or 6 Gflops/Watt. This allows the TaihuLight system to grab one of the top spots on the Green500 in terms of the Performance/Power metric.  Titan, a Cray XK7 system installed at the Department of Energy’s (DOE) Oak Ridge National Laboratory, is now the No. 3 system. It achieved 17.59 petaflop/s.
5.       Rounding out the Top 10 are Sequoia, an IBM BlueGene/Q system installed at DOE’s Lawrence Livermore National Laboratory; Fujitsu’s K computer installed at the RIKEN Advanced Institute for Computational Science (AICS) in Kobe, Japan; Mira, a BlueGene/Q system installed at DOE’s Argonne National Laboratory; Trinity, a Cray XC40 system installed at DOE/NNSA/LANL/SNL; Piz Daint, a Cray XC30 system installed at the Swiss National Supercomputing Centre and the most powerful system in Europe; Hazel Hen, a Cray XC40 system installed at HLRS in Stuttgart, Germany; and, at No. 10, Shaheen II, a Cray XC40 system installed at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.
6.       The latest list marks the first time since the inception of the TOP500 that the U.S. is not home to the largest number of systems. With a surge in industrial and research installations registered over the last few years, China leads with 167 systems and the U.S. is second with 165. China also leads the performance category, thanks to the No. 1 and No. 2 systems.
7.       The European share of 105 systems (compared to 107 in November 2015) has fallen and is now lower than the dominant Asian share of 218 systems, up from 173 in November. Germany is the clear leader in Europe with 26 systems followed by France with 18 and the UK with 12 systems. In Asia, Japan trails China with 29 systems (down from 37).  
8.       Cray continues to be the clear leader in the TOP500 list in total installed performance share with 19.9 percent (down from 25 percent). Thanks to the Sunway TaihuLight system, the National Research Center of Parallel Computer Engineering & Technology takes the second spot with 16.4 percent of the total performance, with just one machine. HPE is third with 12.9 percent, down from 14.2 percent six months ago. IBM takes the fourth spot with a 10.7 percent share, down from 14.9 percent six months ago.
9.       For the first time, the data collection and curation of the Green500 project is now integrated with the TOP500 project. The most energy-efficient system and No. 1 on the Green500 is Shoubu, a PEZY Computing/Exascaler ZettaScaler-1.6 System achieving 6.67 GFlops/Watt at the Advanced Center for Computing and Communication at RIKEN in Japan.
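The "twice as fast and three times as efficient" comparison with Tianhe-2 can be checked from the quoted figures. The following is a rough sketch, not part of the press release. It uses the rounded 93 petaflop/s figure and the 15.37 MW power draw given above, and it borrows Tianhe-2's 17.8 MW power draw from the TOP500 News article quoted earlier on this page. The function name `gflops_per_watt` is illustrative.

```python
# Speed and efficiency comparison, using only figures quoted in the text.
TAIHULIGHT_PFLOPS = 93.0        # LINPACK, as rounded in the press release
TAIHULIGHT_MW = 15.37           # power under HPL load
TIANHE2_PFLOPS = 33.86          # LINPACK
TIANHE2_MW = 17.8               # from the TOP500 News article quoted earlier
SHOUBU_GFLOPS_PER_WATT = 6.67   # Green500 No. 1


def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """Convert petaflop/s and megawatts into gigaflop/s per watt."""
    return (pflops * 1e6) / (megawatts * 1e6)


speedup = TAIHULIGHT_PFLOPS / TIANHE2_PFLOPS
taihulight_eff = gflops_per_watt(TAIHULIGHT_PFLOPS, TAIHULIGHT_MW)
tianhe2_eff = gflops_per_watt(TIANHE2_PFLOPS, TIANHE2_MW)
efficiency_ratio = taihulight_eff / tianhe2_eff

print(f"speedup over Tianhe-2:   {speedup:.2f}x")
print(f"TaihuLight efficiency:   {taihulight_eff:.2f} Gflops/Watt")
print(f"Tianhe-2 efficiency:     {tianhe2_eff:.2f} Gflops/Watt")
print(f"efficiency ratio:        {efficiency_ratio:.2f}x")
print(f"behind Shoubu by:        {SHOUBU_GFLOPS_PER_WATT - taihulight_eff:.2f} Gflops/Watt")
```

With these inputs the speedup is closer to 2.7 than to 2, so "twice as fast" is a conservative description, while the efficiency ratio of a little over 3 matches the "three times as efficient" claim.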

OTHER HIGHLIGHTS FROM THE OVERALL [WORLD] LIST
                    I.            Total combined performance of all 500 systems has grown to 566.7 petaflop/s, compared to 420 petaflop/s six months ago and 363 petaflop/s one year ago. This increase in installed performance also exhibits a noticeable slowdown in growth compared to the previous long-term trend.
                  II.            There are 95 systems with performance greater than a petaflop/s on the list, up from 81 six months ago.
                III.            Intel continues to provide the processors for the largest share (455 systems, or 91 percent) of the TOP500 systems. The share of IBM Power processors is now at 23 systems, down from 26 systems six months ago. The AMD Opteron family is used in 13 systems (2.6 percent), down from 4.2 percent on the previous list.
                IV.            Hewlett Packard Enterprise has the lead in the total number of systems with 127 systems (25.4 percent) followed by Lenovo with 84 systems. Cray now has 60 systems, down from 69 systems six months ago. HPE had 155 systems six months ago. IBM is now fifth in the systems category with 38 systems.
                  V.            A total of 93 systems on the list are using accelerator/coprocessor technology, down from 104 in November 2015. Sixty-seven of these use NVIDIA chips, 26 use Intel Xeon Phi technology, three use ATI Radeon, and two use PEZY technology. Three systems use a combination of NVIDIA and Intel Xeon Phi accelerators/coprocessors. The average number of accelerator cores for these systems is 76,000 cores per system.
                VI.            The entry level (No. 500) to the list moved up to the 285.9 teraflop/s mark on the LINPACK benchmark, compared to 206.3 teraflop/s six months ago. The last system on the newest list would have been listed at position 351 in the previous TOP500.
                VII.            The performance of the last system on the list (No. 500) has systematically continued to lag behind historical trends for the last 6 years and now clearly continues to run on a different growth trajectory than before. From 1994 to 2008 it grew by 90 percent per year, but since 2008 it has only grown by 55 percent per year.
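The growth figures in this list can be turned into percentages and doubling times with a few lines of arithmetic. The sketch below is an illustration only and is not part of the TOP500 announcement. It uses the performance totals and the 90 and 55 percent annual growth rates quoted above; the helper names `growth` and `doubling_time_years` are hypothetical.

```python
import math


def growth(new: float, old: float) -> float:
    """Fractional growth from old to new (0.25 means 25 percent)."""
    return new / old - 1.0


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1.0 + annual_growth)


# Combined performance of all 500 systems, in petaflop/s.
total_six_months = growth(566.7, 420.0)
total_one_year = growth(566.7, 363.0)

# Entry level (No. 500), in teraflop/s.
entry_six_months = growth(285.9, 206.3)

# Doubling time of the No. 500 system before and after 2008.
doubling_before_2008 = doubling_time_years(0.90)
doubling_since_2008 = doubling_time_years(0.55)

print(f"total, last six months:   {total_six_months:.1%}")
print(f"total, last year:         {total_one_year:.1%}")
print(f"entry level, six months:  {entry_six_months:.1%}")
print(f"doubling time at 90%/yr:  {doubling_before_2008:.2f} years")
print(f"doubling time at 55%/yr:  {doubling_since_2008:.2f} years")
```

At 90 percent per year the entry-level performance doubled roughly every 13 months; at 55 percent per year it takes about 19 months, which is the slowdown the list describes.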

 

About the TOP500 List

The first version of what became today’s TOP500 list started as an exercise for a small conference in Germany in June 1993. Out of curiosity, the authors decided to revisit the list in November 1993 to see how things had changed. About that time they realized they might be onto something and decided to continue compiling the list, which is now a much-anticipated, much-watched and much-debated twice-yearly event.
The TOP500 list is compiled by:
Ø  Erich Strohmaier and Horst Simon of Lawrence Berkeley National Laboratory;
Ø  Jack Dongarra of the University of Tennessee, Knoxville; and
Ø  Martin Meuer of ISC Group, Germany.


Chinese supercomputer is the world's fastest — and without using US chips [20jun2016]


Ø China now [jun2016] has more of the world's fastest supercomputers than the US

Source: The Verge; by James Vincent (@jjvincent)
http://www.theverge.com/2016/6/20/11975356/chinese-supercomputer-worlds-fastes-taihulight

A Chinese supercomputer built using domestic chip technology has been declared the world's fastest. The news highlights China's recent advances in the creation of such systems, as well as the country's waning reliance on US semiconductor technology.
THE TAIHULIGHT IS CAPABLE OF 93 PETAFLOPS
The Sunway TaihuLight takes the top spot from previous record-holder Tianhe-2 (also located in China), and more than triples the latter's speed. The new number one is capable of performing some 93 quadrillion calculations per second (otherwise known as petaflops) and is roughly five times more powerful than the speediest US system, which is now ranked third worldwide.
THE TAIHULIGHT IS COMPRISED OF SOME 41,000 CHIPS, EACH WITH 260 PROCESSOR CORES.
This makes for a total of 10.65 million cores, compared to the 560,000 cores in America's top machine. In terms of memory, it's relatively light on its feet, with just 1.3 petabytes used for the entire machine. (By comparison, the much less powerful 10-petaflop K supercomputer uses 1.4 petabytes of RAM.) This means it's unusually energy efficient, drawing just 15.3 megawatts of power — less than the 17.8 megawatts used by the 33-petaflop Tianhe-2.
More significant than its specs, though, is the fact that the TaihuLight is built from Chinese semiconductors. "It’s not based on an existing architecture. They built it themselves," Jack Dongarra, a professor at the University of Tennessee and creator of the measurement system used to rank the world's supercomputers, told Bloomberg. "This is a system that has Chinese processors."
THE US HAS BANNED THE EXPORT OF HIGH-PERFORMANCE CHIPS TO CHINA
The previous fastest supercomputer, China's Tianhe-2, was built using US-made Intel processors. There were plans to upgrade the Tianhe-2's performance last year, but in April 2015 the US government placed an export ban on all high-performance computing chips to China. The Department of Commerce said that exporting such technology was "acting contrary" to American national security or foreign interests, and suggested that an earlier Chinese supercomputer — the Tianhe-1A — had been "used in nuclear explosive activities."
Supercomputers are thought by both the US and China to be integral for national security and scientific research. Such systems are used for a variety of tasks, including civilian work like climate forecasting and product design. However, they're also useful for more high-stakes research, including cybersecurity and nuclear weaponry. According to its creators, the TaihuLight will be used in the fields of manufacturing, life science, and earth system modeling.
China's investment in high-performance chips and supercomputers in recent years has been significant and effective. In 2001, there were no Chinese supercomputers in the world's top 500 ranking. Now, there are 167 — more than the US, which has 165 entries. The development of TaihuLight was funded under the so-called "863 program," a government project aimed at ending reliance on foreign technology.

DEFINITION OF PETAFLOP
http://whatis.techtarget.com/definition/petaflop

petaflop

Part of the Microprocessors glossary:
A petaflop is a measure of a computer's processing speed and can be expressed as:
  1. A quadrillion (thousand trillion) floating point operations per second (FLOPS)
  2. A thousand teraflops
  3. 10 to the 15th power FLOPS
  4. Approximately 2 to the 50th power FLOPS (a binary approximation, slightly larger than 10 to the 15th)
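The four expressions in this list are not all identical, which a quick check makes visible. The snippet below is a small illustration rather than part of the glossary entry: items 1 to 3 describe the same quantity, while 2 to the 50th power is a binary approximation that is about 12.6 percent larger.

```python
# Compare the decimal and binary readings of "peta".
QUADRILLION = 1_000 * 1_000_000_000_000   # a thousand trillion
THOUSAND_TERAFLOPS = 1_000 * 10**12       # a thousand teraflops
DECIMAL_PETA = 10**15
BINARY_PETA = 2**50

# Items 1 to 3 of the definition are exactly equal.
decimal_forms_agree = QUADRILLION == THOUSAND_TERAFLOPS == DECIMAL_PETA

# Item 4 is larger than the decimal value by this fraction.
binary_excess = BINARY_PETA / DECIMAL_PETA - 1.0

print(f"decimal forms agree: {decimal_forms_agree}")
print(f"2**50 = {BINARY_PETA:,}")
print(f"2**50 exceeds 10**15 by {binary_excess:.1%}")
```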
In June 2008, IBM's Roadrunner supercomputer was the first to break what has been called "the petaflop barrier." In November 2008, when the twice-yearly rankings of the Top 500 supercomputers were released, two computers had done so. At 1.105 petaflops, Roadrunner retained its top place from the previous list, ahead of Cray's Jaguar, which ran at 1.059 petaflops.
Breaking the petaflop barrier is expected to have profound and far-reaching effects on the future of science. According to Thomas Zacharia, head of computer science at Oak Ridge National Laboratory in Tennessee (home of Cray's Jaguar), "The new capability allows you to do fundamentally new physics and tackle new problems. And it will accelerate the transition from basic research to applied technology."
Petaflop computing will enable much more accurate modeling of complex systems. Applications are expected to include real-time nuclear magnetic resonance imaging during surgery, computer-based drug design, astrophysical simulation, the modeling of environmental pollution, and the study of long-term climate changes.
This was last updated in November 2008
Contributor(s): Jeremy Weiss
Posted by: Margaret Rouse




