China's Five-Millennium Culture preserves the Forbidden City and produces the fastest supercomputer in the world. The XXI Century is the Era of China.
CHINA’s
SUPERCOMPUTERS
& TECHNOLOGIES
Ø China now [jun2016]
has more of the world's fastest supercomputers than the US
China Tops Supercomputer Rankings with New
93-Petaflop Machine [20jun2016]
Source: TOP500; Michael Feldman, June 20, 2016, 9 a.m.
Source: Jack Dongarra, Report on the Sunway TaihuLight System, June 2016
1.
A new Chinese supercomputer,
the Sunway TaihuLight, captured the number one spot on the latest TOP500 list
of supercomputers released on Monday morning at the ISC High Performance
conference (ISC) being held in Frankfurt, Germany.
2.
With a Linpack mark of 93 petaflops,
the system outperforms the former TOP500 champ, Tianhe-2, by a factor of three.
The machine is powered by a new ShenWei processor and custom interconnect, both
of which were developed locally, ending any remaining speculation that China
would have to rely on Western technology to compete effectively in the upper
echelons of supercomputing.
3.
TaihuLight is currently up and
running at the National Supercomputing Center in the city of Wuxi, a
manufacturing and technology hub, a two-hour drive west of Shanghai. The system
will be used for various research and engineering work, in areas such as
climate, weather & earth systems modeling, life science research, advanced
manufacturing, and data analytics. Center director Prof. Dr. Guangwen Yang will
formally introduce the system on Tuesday afternoon, in a session at ISC.
4.
“As the first number one system of
China that is completely based on homegrown processors, the Sunway TaihuLight
system demonstrates the significant progress that China has made in the domain
of designing and manufacturing large-scale computation systems,” Yang told
TOP500 News.
5.
The supercomputer was developed by the National Research Center of Parallel
Computer Engineering & Technology (NRCPC), the same organization that
designed TaihuLight’s predecessor, the Sunway BlueLight system, which is
installed at the National Supercomputing Center in Jinan. BlueLight is a 796-teraflop
supercomputer, which was deployed in 2011.
6.
BlueLight is powered by an older
version of the ShenWei processor, a third-generation 16-core chip, known as the
SW1600, which tops out at about 140 gigaflops. In the five years since that
system came online, NRCPC developed a much more powerful processor, the
SW26010, a 260-core chip that can crank out just over 3 teraflops. TaihuLight
has a single SW26010 in each of its 40,960 nodes, which adds up to 125 peak
petaflops across the entire machine (more than 10 million cores). Linpack, of
course, is going to leave some FLOPS on the table, but 93 petaflops represents
a respectable 74 percent yield of peak performance.
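The arithmetic in this paragraph can be checked with a short Python sketch. It uses only the figures quoted above; the variable and function names are illustrative, and the per-chip peak is inferred from the quoted 125-petaflop system peak rather than taken from a vendor specification.

```python
# Figures quoted in the article.
NODES = 40_960            # one SW26010 per node
CORES_PER_CHIP = 260
PEAK_PFLOPS = 125.0       # quoted system peak
LINPACK_PFLOPS = 93.0     # quoted Linpack result


def linpack_yield(linpack: float, peak: float) -> float:
    """Fraction of peak performance achieved on Linpack."""
    return linpack / peak


total_cores = NODES * CORES_PER_CHIP
# Per-chip peak implied by the quoted system peak (petaflops -> teraflops).
chip_peak_tflops = PEAK_PFLOPS * 1000 / NODES

print(f"total cores: {total_cores:,}")
print(f"implied per-chip peak: {chip_peak_tflops:.2f} teraflops")
print(f"Linpack yield: {linpack_yield(LINPACK_PFLOPS, PEAK_PFLOPS):.1%}")
```

The implied per-chip figure comes out slightly above 3 teraflops, which is consistent with the "just over 3 teraflops" quoted for the SW26010.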
7.
At 3 teraflops, the new ShenWei
silicon is on par with Intel’s “Knights Landing” Xeon Phi, another manycore
design, but one with a much more public history. In a bit of related irony, it
was the US embargo of high-end processors, such as the Xeon Phi, imposed on a
number of Chinese supercomputing centers in April 2015, which precipitated a
more concerted effort in that country to develop and manufacture such chips
domestically. The embargo probably didn’t impact the TaihuLight timeline, since
it was already set to get the new ShenWei parts. But it was widely thought that
Tianhe-2 was in line to get an upgrade using Xeon Phi processors, which would
have likely raised its performance into 100-petaflop territory well before the
Wuxi system came online.
8.
Like its earlier incarnations, this
latest ShenWei is a 64-bit RISC processor, with SIMD instruction support and
out-of-order execution. Its underlying architecture is somewhat of a mystery,
although it’s been speculated that the design was derived from the DEC Alpha
architecture. The instruction set is specified simply as ShenWei-64.
9.
The processor is divided into four
core groups, each with 64 computing processing elements (CPE) and a management
processing element (MPE). Each core group also includes a memory controller
delivering an aggregate memory bandwidth of 136.5 GB/second on each socket. As
one might expect of a manycore design, it runs at a relatively modest 1.45 GHz
and supports just a single execution thread per core. The chip was manufactured
at the National High Performance Integrated Circuit Design Center, in Shanghai.
The process technology node has not been revealed.
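The chip layout described above can be tallied in a few lines of Python. This is a minimal sketch using the quoted figures; the names are illustrative, and the per-core-group bandwidth assumes the 136.5 GB/second socket total is split evenly across the four memory controllers, which the article does not state explicitly.

```python
CORE_GROUPS = 4
CPES_PER_GROUP = 64          # computing processing elements
MPES_PER_GROUP = 1           # management processing element
SOCKET_BANDWIDTH_GBS = 136.5 # aggregate memory bandwidth per socket

# Each core group contributes its CPEs plus one MPE.
cores_per_chip = CORE_GROUPS * (CPES_PER_GROUP + MPES_PER_GROUP)

# Assumption: the socket bandwidth is shared equally by the core groups.
bandwidth_per_group_gbs = SOCKET_BANDWIDTH_GBS / CORE_GROUPS

print(f"cores per chip: {cores_per_chip}")
print(f"bandwidth per core group: {bandwidth_per_group_gbs} GB/s")
```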
10.
Memory-wise, each node contains 32
GB, adding up to a little over 1.3 PB for the whole machine. While that seems
like a lot, it’s not much memory considering the number of cores it must feed.
The much smaller 10-petaflop K supercomputer at RIKEN, for example, is
outfitted with 1.4 PB of memory, and most of the other large systems on TOP500
list have much better bytes-to-FLOPS ratios than that of TaihuLight. It also
relies on the older DDR3 technology, which is slower and more power-hungry than
the newer DDR4 memory.
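The bytes-to-FLOPS comparison can be made concrete with a rough Python sketch. It assumes decimal units (1 GB = 10^9 bytes), uses the 93-petaflop Linpack figure for TaihuLight, and takes the K computer at the rounded 10 petaflops and 1.4 PB quoted above; the function name is illustrative.

```python
GB = 10**9   # decimal gigabyte (assumption)
PB = 10**15  # decimal petabyte (assumption)


def bytes_per_flops(memory_bytes: float, flops: float) -> float:
    """Memory capacity available per floating point operation per second."""
    return memory_bytes / flops


taihulight_memory = 40_960 * 32 * GB          # 32 GB in each of 40,960 nodes
taihulight_ratio = bytes_per_flops(taihulight_memory, 93e15)
k_ratio = bytes_per_flops(1.4 * PB, 10e15)

print(f"TaihuLight memory: {taihulight_memory / PB:.2f} PB")
print(f"TaihuLight: {taihulight_ratio:.4f} bytes per FLOPS")
print(f"K computer: {k_ratio:.4f} bytes per FLOPS")
print(f"K has about {k_ratio / taihulight_ratio:.1f}x more memory per FLOPS")
```

Under these assumptions the K computer has roughly ten times as much memory per unit of Linpack performance.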
11.
The system is also rather light on
cache. In fact, it really doesn’t have any in the L1-L2-L3 sense. Each core is
allocated 12 KB of instruction cache, along with 64 KB of local scratchpad. And
that’s it. The scratchpad can be used like a level 1 cache to some degree, but
without the L2 and L3 levels to buttress it, there’s not a whole lot of
capability to speed up memory accesses.
12.
From a power standpoint though,
TaihuLight is quite good. It draws 15.3 megawatts (MW) running Linpack, which,
somewhat surprisingly, is less power than its 33-petaflop cousin, Tianhe-2,
which uses 17.8 MW. TaihuLight’s energy-efficiency of 6 gigaflops/watt is
excellent, which will certainly earn it a place in the upper reaches of the
Green500 list. Keep in mind though, if the system had a more reasonable amount
of memory for its size, it would draw significantly more power and its energy
efficiency would suffer accordingly.
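The efficiency figures follow directly from the quoted performance and power numbers. The Python sketch below is illustrative only; it uses the rounded values from this paragraph (93 petaflops at 15.3 MW, and 33 petaflops at 17.8 MW for Tianhe-2), and the function name is hypothetical.

```python
def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """Convert a petaflops / megawatts pair into gigaflops per watt."""
    gflops = pflops * 1e6   # 1 petaflops = 10^6 gigaflops
    watts = megawatts * 1e6 # 1 MW = 10^6 W
    return gflops / watts


taihulight_eff = gflops_per_watt(93.0, 15.3)
tianhe2_eff = gflops_per_watt(33.0, 17.8)

print(f"TaihuLight: {taihulight_eff:.2f} gigaflops/watt")
print(f"Tianhe-2:   {tianhe2_eff:.2f} gigaflops/watt")
print(f"ratio:      {taihulight_eff / tianhe2_eff:.1f}x")
```

With these inputs TaihuLight lands at about 6 gigaflops/watt, a bit more than three times the Tianhe-2 figure.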
13.
The interconnect, simply known as the
Sunway Network, is also a homegrown affair. It’s noteworthy that the older
Sunway BlueLight machine employed QDR InfiniBand for the system network. The
TaihuLight one, however, is based on PCIe 3.0 technology, and provides 16
GB/second of node-to-node peak bandwidth, with a latency of around 1
microsecond. Running MPI communications over it slows that down to about 12
GB/second. Such performance is pretty much on par with EDR InfiniBand or even
100G Ethernet, although the latency seems a tad high (it depends on exactly
what’s being measured, of course). In any case, it looks like the design team
opted for simplicity here, rather than breakneck speeds using exotic
technology.
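To get a feel for what these numbers mean for message passing, a simple latency-plus-bandwidth cost model can be sketched in Python. The model itself is an assumption and does not come from the article or from Dongarra's report; it also assumes the quoted 1-microsecond latency applies to MPI messages, which the article does not confirm. Function and parameter names are illustrative.

```python
def transfer_time(n_bytes: float,
                  latency_s: float = 1e-6,
                  bandwidth_bytes_per_s: float = 12e9) -> float:
    """Estimated time to send one message: fixed latency plus streaming time."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


# Message size at which latency and streaming time contribute equally.
half_size_bytes = 1e-6 * 12e9

for size in (1_000, 12_000, 1_000_000):
    micros = transfer_time(size) * 1e6
    print(f"{size:>9,} bytes: about {micros:.2f} microseconds")
```

Under this model, messages much smaller than about 12 KB are dominated by latency, while larger ones are dominated by the 12 GB/second bandwidth.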
14.
Likewise, for the operating system.
The Sunway Raise OS, as it’s called, uses standard Linux as the base, along with
the necessary tweaks to make it work with the custom TaihuLight architecture.
Other parts of the system software are also pretty standard – compilers for
C/C++ and Fortran, along with the associated math libraries. All, of course,
required ports to the custom ShenWei architecture and instruction set, but
presumably much of that development work had already been done for the
previous-generation processors.
15.
According to TOP500 author Jack
Dongarra, three scientific simulation codes run on TaihuLight have been chosen
as Gordon Bell Prize finalists, two of which have managed to reach a sustained
performance of 30 to 40 petaflops. The award is bestowed each year on the most
noteworthy HPC application, based on “peak performance or special achievements
in scalability and time-to-solution on important science and engineering
problems.”
16.
In a paper written by Dongarra and
published on June 20, he describes these applications and also provides a deep
dive into the TaihuLight architecture (upon which much of the information in
this article was based). The paper also offers some interesting comparisons to
other supercomputers. While Dongarra does have reservations about some elements
of the new machine’s design, he concludes: “The fact that there are sizeable
applications and Gordon Bell contender applications running on the system is
impressive and shows that the system is capable of running real applications
and [is] not just a stunt machine.”
New Chinese Supercomputer Named World’s
Fastest System on Latest TOP500 List [20jun2016]
Ø Sunway TaihuLight is the new No. 1 system with 93
petaflop/s (quadrillions of calculations per second) on the LINPACK benchmark, on Chinese-designed CPUs
Ø China Draws Equal to the U.S. in Overall
Installations
Source:
TOP 500; June 20, 2016, 4:01 a.m.
https://www.top500.org/news/new-chinese-supercomputer-named-worlds-fastest-system-on-latest-top500-list/
FRANKFURT, Germany; BERKELEY,
Calif.; and KNOXVILLE, Tenn
1.
China maintained its No. 1 ranking on
the 47th edition of the TOP500 list of the world’s top supercomputers, but with
a new system built entirely using processors designed and made in China. Sunway TaihuLight is the new No. 1 system
with 93 petaflop/s (quadrillions of calculations per second) on the LINPACK
benchmark.
2.
Developed by the National Research Center of Parallel Computer Engineering & Technology
(NRCPC) and installed at the National
Supercomputing Center in Wuxi, Sunway TaihuLight displaces Tianhe-2, an Intel-based Chinese supercomputer that
has claimed the No. 1 spot on the past six TOP500 lists.
3.
The newest edition of the list was
announced Monday, June 20, at the 2016 International Supercomputer Conference
in Frankfurt. The closely watched list is issued twice a year.
4.
Sunway TaihuLight, with 10,649,600
computing cores comprising 40,960 nodes, is twice as fast and three times as
efficient as Tianhe-2, which posted a LINPACK performance of 33.86 petaflop/s.
The peak power consumption under load (running the HPL benchmark) is at 15.37
MW, or 6 Gflops/Watt. This allows the TaihuLight system to grab one of the top
spots on the Green500 in terms of the Performance/Power metric. Titan, a
Cray XK7 system installed at the Department of Energy’s (DOE) Oak Ridge
National Laboratory, is now the No. 3 system. It achieved 17.59 petaflop/s.
5.
Rounding out the Top 10 are Sequoia,
an IBM BlueGene/Q system installed at DOE’s Lawrence Livermore National
Laboratory; Fujitsu’s K computer installed at the RIKEN Advanced Institute for
Computational Science (AICS) in Kobe, Japan; Mira, a BlueGene/Q system
installed at DOE’s Argonne National Laboratory; Trinity, a Cray XC40 system
installed at DOE/NNSA/LANL/SNL; Piz Daint, a Cray XC30 system installed at the
Swiss National Supercomputing Centre and the most powerful system in
Europe; Hazel Hen, a Cray XC40 system installed at HLRS in Stuttgart, Germany;
and, at No. 10, Shaheen II, a Cray XC40 system installed at King Abdullah University of
Science and Technology (KAUST) in Saudi Arabia.
6.
The latest list marks the first time
since the inception of the TOP500 that the U.S. is not home to the largest
number of systems. With a surge in industrial and research installations
registered over the last few years, China leads with 167 systems and the U.S.
is second with 165. China also leads the performance category, thanks to the
No. 1 and No. 2 systems.
7.
The European share of 105 systems
(compared to 107 in November 2015) has fallen and is now lower than the
dominant Asian share of 218 systems, up from 173 in November. Germany is the
clear leader in Europe with 26 systems followed by France with 18 and the UK
with 12 systems. In Asia, Japan trails China with 29 systems (down from 37).
8.
Cray continues to be the clear leader
in the TOP500 list in total installed performance share with 19.9 percent (down
from 25 percent). Thanks to the Sunway TaihuLight system, the National Research
Center of Parallel Computer Engineering & Technology takes the second spot
with 16.4 percent of the total performance – with just one machine. HPE takes
the third spot with 12.9 percent, down from 14.2 percent six months ago. IBM is
fourth with a 10.7 percent share, down from 14.9 percent six months ago.
9.
For the first time, the data
collection and curation of the Green500 project is now integrated with the
TOP500 project. The most energy-efficient system and No. 1 on the
Green500 is Shoubu, a PEZY Computing/Exascaler ZettaScaler-1.6 System achieving
6.67 Gflops/Watt at the Advanced Center for Computing and Communication
at RIKEN in Japan.
OTHER HIGHLIGHTS FROM THE OVERALL [WORLD]
LIST
I.
Total combined performance of all 500 systems has
grown to 566.7 petaflop/s, compared to 420 petaflop/s six months ago and 363
petaflop/s one year ago. This increase in installed performance also exhibits a
noticeable slowdown in growth compared to the previous long-term trend.
II.
There are 95 systems with performance greater than a
petaflop/s on the list, up from 81 six months ago.
III.
Intel continues to provide the processors for the
largest share – 455 systems or 91 percent – of the TOP500 systems. The share of
IBM Power processors is now at 23 systems, down from 26 systems six months ago.
The AMD Opteron family is used in 13 systems (2.6 percent), down from 4.2
percent on the previous list.
IV.
Hewlett Packard Enterprise has the lead in the total
number of systems with 127 systems (25.4 percent) followed by Lenovo with 84
systems. Cray now has 60 systems, down from 69 systems six months ago. HPE had
155 systems six months ago. IBM is now fifth in the systems category with 38
systems.
V.
A total of 93 systems on the list are using
accelerator/coprocessor technology, down from 104 in November 2015. Sixty-seven
of these use NVIDIA chips, 26 use Intel Xeon Phi technology, three use
ATI Radeon, and two use PEZY technology. Three systems use a combination of
NVIDIA and Intel Xeon Phi accelerators/coprocessors. The average number of
accelerator cores for these systems is 76,000 cores per system.
VI.
The entry level (No. 500) to the list moved up to the
285.9 teraflop/s mark on the LINPACK benchmark, compared to 206.3 teraflop/s
six months ago. The last system on the newest list would have been listed at
position 351 in the previous TOP500.
VII.
The performance of the last system on the list (No.
500) has systematically continued to lag behind historical trends for the last
6 years and now clearly continues to run on a different growth trajectory than
before. From 1994 to 2008 it grew by 90 percent per year, but since 2008 it has
only grown by 55 percent per year.
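The growth rates quoted in the list above are easier to compare as doubling times. The Python sketch below assumes steady compound annual growth, which is a simplification of the trend described; the function name is illustrative.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Entry-level (No. 500) performance growth, as quoted in item VII.
before_2008 = doubling_time_years(0.90)
since_2008 = doubling_time_years(0.55)

# Total list performance, as quoted in item I (petaflop/s).
six_month_growth = 566.7 / 420 - 1
one_year_growth = 566.7 / 363 - 1

print(f"doubling time at 90%/year: {before_2008:.2f} years")
print(f"doubling time at 55%/year: {since_2008:.2f} years")
print(f"total performance growth over six months: {six_month_growth:.1%}")
print(f"total performance growth over one year:   {one_year_growth:.1%}")
```

At 90 percent per year the entry level doubled in a little over one year; at 55 percent per year it takes roughly a year and seven months.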
About the TOP500 List
The first version of what
became today’s TOP500 list started as an exercise for a small conference in
Germany in June 1993. Out of curiosity, the authors decided to revisit the list
in November 1993 to see how things had changed. About that time they realized
they might be onto something and decided to continue compiling the list, which
is now a much-anticipated, much-watched and much-debated twice-yearly event.
The
TOP500 list is compiled by:
Ø Erich Strohmaier and Horst Simon of
Lawrence Berkeley National Laboratory;
Ø Jack Dongarra of the University of
Tennessee, Knoxville; and
Ø Martin Meuer of ISC Group, Germany.
Chinese supercomputer is the world's fastest — and without using US
chips [20jun2016]
Ø China now [jun2016] has more of the world's fastest supercomputers than
the US
http://www.theverge.com/2016/6/20/11975356/chinese-supercomputer-worlds-fastes-taihulight
A Chinese supercomputer built using domestic chip
technology has been declared the world's fastest. The news highlights China's recent advances in the
creation of such systems, as well as the country's waning reliance on US
semiconductor technology.
THE TAIHULIGHT IS CAPABLE OF 93 PETAFLOPS
The Sunway TaihuLight takes the top spot from previous record-holder
Tianhe-2 (also located in China), and more than triples the latter's speed. The
new number one is capable of performing some 93 quadrillion calculations per
second (otherwise known as petaflops) and is roughly five times more powerful
than the speediest US system, which is now ranked third worldwide.
THE TAIHULIGHT IS COMPRISED OF SOME 41,000 CHIPS, EACH
WITH 260 PROCESSOR CORES.
This makes for a total of 10.65 million cores, compared to the 560,000
cores in America's top machine. In terms of memory, it's relatively light on
its feet, with just 1.3 petabytes used for the entire machine. (By comparison, the much
less powerful 10-petaflop K supercomputer uses 1.4 petabytes of RAM.) This
means it's unusually energy efficient, drawing just 15.3 megawatts of power —
less than the 17.8 megawatts used by the 33-petaflop Tianhe-2.
More significant than its specs, though, is the fact that the
TaihuLight is built from Chinese semiconductors. "It’s not based on an
existing architecture. They built it themselves," Jack Dongarra, a
professor at the University of Tennessee and creator of the measurement system
used to rank the world's supercomputers, told Bloomberg. "This is a system that has Chinese
processors."
THE US HAS BANNED THE EXPORT OF HIGH-PERFORMANCE CHIPS TO CHINA
The previous fastest supercomputer, China's Tianhe-2, was built using
US-made Intel processors. There were plans to upgrade the Tianhe-2's
performance last year, but in April 2015 the US government placed an export ban on all high-performance computing chips to China. The
Department of Commerce said that exporting such technology was "acting
contrary" to American national security or foreign interests, and
suggested that an earlier Chinese supercomputer — the Tianhe-1A — had been "used in nuclear explosive activities."
Supercomputers are thought by both the US and China to
be integral for national security and scientific research. Such systems are used for a variety of tasks,
including civilian work like climate forecasting and product design. However,
they're also useful for more high-stakes research, including cybersecurity and
nuclear weaponry. According to its creators, the TaihuLight will be used in the
fields of manufacturing, life science, and earth system modeling.
China's investment in high-performance chips and supercomputers in
recent years has been significant and effective. In 2001, there were no Chinese supercomputers in the world's top 500
ranking. Now, there are 167 — more than the US, which has 165 entries. The
development of TaihuLight was funded under the so-called "863
program," a government project aimed at ending reliance on foreign
technology.
DEFINITION OF PETAFLOP
http://whatis.techtarget.com/definition/petaflop
petaflop
A petaflop is a measure of
a computer's processing speed and can be expressed as:
- A quadrillion (thousand trillion)
floating point operations per second (FLOPS)
- A thousand teraflops
- 10 to the 15th
power FLOPS
- Approximately 2 to the 50th
power FLOPS (2 to the 50th is about 1.126 x 10 to the 15th)
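The equivalences in this definition can be checked with a few lines of Python. This is only an illustrative sketch with made-up constant names; it also shows that 2 to the 50th power is close to, but not exactly, 10 to the 15th.

```python
TERAFLOPS = 10**12   # floating point operations per second
PETAFLOPS = 10**15

# A petaflop is a thousand teraflops, i.e. a quadrillion FLOPS.
thousand_teraflops = 1000 * TERAFLOPS

# The binary approximation: 2^50 is about 12.6 percent larger than 10^15.
binary_peta = 2**50
relative_excess = binary_peta / PETAFLOPS - 1

print(f"1000 teraflops == 1 petaflops: {thousand_teraflops == PETAFLOPS}")
print(f"2**50 = {binary_peta:,}")
print(f"2**50 exceeds 10**15 by {relative_excess:.1%}")
```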
In
June, 2008, IBM's Roadrunner supercomputer was the first to break what has been
called "the petaflop barrier." In November 2008, when the twice-yearly
rankings of the Top 500 supercomputers were released, there were two computers
to do so. At 1.105 petaflops, Roadrunner retained its top place from the
previous list, ahead of Cray's Jaguar, which ran at 1.059 petaflops.
Breaking
the petaflop barrier is expected to have profound and far-reaching effects on
the future of science. According to Thomas Zacharia, head of computer science
at Oak Ridge National Laboratory in Tennessee, "The new capability
allows you to do fundamentally new physics and tackle new problems. And it will
accelerate the transition from basic research to applied technology."
Petaflop
computing will enable much more accurate modeling of complex systems. Applications
are expected to include real-time nuclear magnetic resonance imaging during
surgery, computer-based drug design, astrophysical simulation, the modeling of
environmental pollution, and the study of long-term climate changes.
This
was last updated in November 2008
Contributor(s): Jeremy Weiss