Dirks Blog: November 2019

I once came across the following question on Quora: Is a quality smartphone of today really more powerful than a 1970s era mainframe or supercomputer?

Eugene Miya stated in his post that:

It depends on the 2 factors: the hardware (are you running on a 32-bit CPU (e.g., ARM) or 64-bit?) and the software (the app).
The odds are somewhat against the phone with the supercomputer. But the odds favor the phone with 70s era mainframe apps. This is because most of those mainframes were 32-bit machines like most modern phones. Running a 64-bit supercomputer code on a 32-bit CPU will really cripple the phone.

A well-known supercomputer of that era was the Cray-1A (1976) by Cray Research. It was a 64-bit system. Most, if not all, current smartphones also have 64-bit CPUs. But that similarity does not help one bit: you cannot run a program built for one architecture on another. Hell, even on the same PC you can't run a native Linux program directly on Windows, or vice versa, without some kind of virtualization or compatibility layer. Without static linking, you might not even succeed in running a SUSE Linux program on Red Hat Linux, or vice versa.

The general differences that prevent software from running on different hardware are therefore:

- different instruction opcodes
- different operating systems
- different runtime libraries for performing system calls
- etc.

What you can do, however, is take the source code of the program and compile it on the target machine. A Cray-1A was programmed either directly in assembler or in Cray FORTRAN (CFT), a Cray dialect of FORTRAN IV that included support for the Cray-1's vector registers. You could adapt the original FORTRAN code to, e.g., Fortran 95 or later and compile it with gfortran on your smartphone (see also my article: Termux: The app that enables a parallel Linux installation under Android).

Sharan Kalwani wrote in his post:

However I would like to challenge the modern day user of a smartphone to duplicate - *exactly* - what a 1976 Cray-1A was doing day in and day out, e.g. predicting the weather every half hour for the entire United States of America, for a 100 mile x 100 mile grid (no exceptions), sucking in all the weather sensor inputs simultaneously, crunching the numbers and spitting out the data output, so that forecasts could be taken and shared amongst sites (you must also run the exact same source codes, yes, Fortran gasp!).
People get carried away with clock rates, or memory sizes, etc. but forget the I/O aspect and of course the ability to program them precisely the way you want them so it can do gigantic tasks with ease. Too much current state hardware is wasted doing fancy graphics.

So, let's have a look at this:

Here the unwinnable argument seems to shift from a pure performance challenge to an imitation challenge whose sole difficulty is obtaining the original program and data, which makes the challenge pointless. Also, is the mention of "Fortran" (back then it was still spelled "FORTRAN") supposed to frighten people? I took FORTRAN 77 classes during my computer science studies. If necessary, I could port a given CFT program (as source code, not punched cards) to Fortran 95, compile it on my smartphone with gfortran, and run it on a provided set of weather data.
And why should I be restricted to a specific programming language? That is like demanding that the programs on the Cray-1A be written in C, with the difference that it is possible to write Fortran programs on a smartphone, whereas writing C programs on a Cray-1A was not. This is a question of comparing system performance. For that, I could use any systems programming language I like on the smartphone.
In contrast to the Cray-1A, it is possible to write programs for a smartphone directly on the smartphone; you don't need a separate computer. And you can compile these programs directly on the smartphone, instead of punching cards.
Instead of assembler and/or FORTRAN (or Fortran), you could use any of many other programming languages on your smartphone, e.g.:
- C
- C++
- Rust
Programs for a smartphone can therefore be written in the same way as programs for desktops. Such programs can perform gigantic tasks many times faster on a smartphone than on the supercomputers of the 70s, 80s and early 90s (see below: Performance comparison).
Or is he implying that desktop computers are also only gaming machines? OK, in some cases they are ...
"Crunching the numbers" sounds like serious work. Calculations on that amount of data really were a lot of work back then. But what did the Cray-1A have to offer?
- 1 "single-core" CPU
- 200,000 gates
- 80 MHz
- 160 MFLOPS (if additions and multiplications are performed in parallel at all times; otherwise 80 MFLOPS)
- instruction pipeline
- large number of vector registers for applying one instruction to several of them
- although the Cray-1A only had one vector ALU, some parallelism was possible, like: filling source registers, executing arithmetic operation, filling result registers
- parallelism of addition and multiplication (like already mentioned above)
- vector instruction chaining
- 8 MB memory
- a few GB of storage
Today's smartphone, on the other hand, offers:
- 1 "multi-core" CPU
- 8-10 billion transistors (a logic gate typically consists of several transistors)
- up to about 3 GHz
- over 50/100 GFLOPS (single/multi-core)
- instruction pipeline with several steps in parallel
- e.g. 6 or 8 cores per CPU; each capable of running individual threads in parallel
- a few GB memory
- many GB of storage
A Cray-1A simply has neither the number of transistors, nor the parallel processing architecture (unlike its successor, the Cray X-MP of 1982, which was also no match for a recent smartphone), nor the clock speed to perform the same number of calculations in the same time as a smartphone. I/O does not change that.
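To put the peak figures from the two lists side by side, here is a minimal sketch that divides the smartphone's quoted GFLOPS by the Cray-1A's quoted MFLOPS. The constant and function names are made up for illustration, and all numbers are theoretical peaks taken from the lists above, not measurements.

```python
# Peak figures as quoted in the lists above (theoretical, not measured).
CRAY_1A_CLOCK_HZ = 80_000_000          # 80 MHz
CRAY_1A_FLOPS_PER_CYCLE = 2            # one addition and one multiplication in parallel
PHONE_SINGLE_CORE_FLOPS = 50_000_000_000   # "over 50 GFLOPS"
PHONE_MULTI_CORE_FLOPS = 100_000_000_000   # "over 100 GFLOPS"


def cray_1a_peak_flops():
    """Peak FLOPS of the Cray-1A: clock rate times operations per cycle."""
    return CRAY_1A_CLOCK_HZ * CRAY_1A_FLOPS_PER_CYCLE


def speedup(phone_flops):
    """How many times the phone's peak exceeds the Cray-1A's peak."""
    return phone_flops / cray_1a_peak_flops()


print("Cray-1A peak MFLOPS:", cray_1a_peak_flops() / 1e6)
print("single-core factor: ", speedup(PHONE_SINGLE_CORE_FLOPS))
print("multi-core factor:  ", speedup(PHONE_MULTI_CORE_FLOPS))
```

Since the smartphone figures are lower bounds ("over"), the resulting factors are lower bounds as well.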
Speaking of I/O: the Cray-1A had 24 I/O channels (12 input, 12 output) with a total throughput of 32 Mbit/s. The improved model, the Cray-1S (1979), had a 100 Mbit/s data connection to its I/O subsystem. Even that is no problem for a smartphone, so the I/O aspect can safely be disregarded.
Even if, hypothetically, the Cray-1A had a much higher I/O throughput than a smartphone, it would not have the CPU power to process that amount of data.
And what, again, was the key advantage of a Cray-1A over a smartphone according to Sharan Kalwani? "Crunching numbers" or the "I/O aspect"?
Nor is it true that current hardware is wasted on fancy graphics. The chips in current smartphones are implemented as a system on a chip (SoC). One chip includes:
- several processors/cores
- caches
- memory controller (with increasing memory sizes, the memory is often implemented off-chip, right next to the SoC)
- wireless networking capabilities
- sensor coprocessor
- audio and video DSPs
- graphics processing unit (GPU)
The CPU's processing power is therefore not wasted on graphics: the SoC has a dedicated GPU for that, like the separate graphics card in a desktop computer.

Performance comparison

Because the challenge posed by Sharan Kalwani is not easily reproducible, as he did not provide any sources, I ran some tests on my smartphone and compared them with verifiable results of Cray-1 supercomputers (and others) published on the Internet.

All test programs were compiled and run on my three-year-old Huawei P9 Plus smartphone (HiSilicon Kirin 955 SoC, 64-bit, 4x 2.52 GHz ARM Cortex-A72 and 4x 1.80 GHz ARM Cortex-A53 cores, about 3 billion transistors, 4 GB memory). Current top smartphones are a few times faster.

All tests can be reproduced on any Android smartphone or tablet (see my article: Termux: The app that enables a parallel Linux installation under Android).

Prime numbers

There is another good Quora question: Can you write a C++ program that finds all prime numbers from 2 to 2 billion in under 1 second on an average €500 PC? On my smartphone, the ingenious single-threaded solution from Vlado Boza's answer needed about 4 seconds, and the 4-thread version mentioned there less than 2 seconds, to find all primes up to 2 billion.

The highly optimized PrimeSieve 7.5 took less than 0.3 seconds on my smartphone to find all 98,222,287 primes between 2 and 2,000,000,000.
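For readers who want to see what such a prime search does in principle, here is a minimal sketch of the classic sieve of Eratosthenes. It is neither Vlado Boza's solution nor the far more optimized algorithm inside PrimeSieve; the function name `count_primes` is made up for this illustration, and the sketch is only practical for much smaller limits than 2 billion.

```python
from math import isqrt


def count_primes(limit):
    """Count the primes in [2, limit] with a plain sieve of Eratosthenes."""
    if limit < 2:
        return 0
    # One flag per number; 1 means "still considered prime".
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if is_prime[i]:
            # Cross out every multiple of i, starting at i*i.
            multiples = len(range(i * i, limit + 1, i))
            is_prime[i * i::i] = bytearray(multiples)
    return sum(is_prime)


print(count_primes(1_000_000))
```

The sieve needs one byte per number here, so a limit of 2 billion would take about 2 GB of memory in this naive form; optimized implementations work on small segments instead.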

On a Cray-1A, merely counting from 0 to 98,222,287 would have taken longer than one second, not to mention performing primality tests. This is based on:

- one addition per CPU cycle
- 80,000,000 CPU cycles per second (80 MHz)
- hence a maximum value of 80,000,000 reachable within one second by repeated +1 addition
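The estimate above can be written out as a small calculation. This is only a sketch under the stated assumption of exactly one addition per clock cycle; the names are chosen for illustration.

```python
CRAY_1A_CLOCK_HZ = 80_000_000       # 80 MHz
PRIMES_UP_TO_2E9 = 98_222_287       # number of primes between 2 and 2,000,000,000


def seconds_to_count(target, clock_hz, additions_per_cycle=1):
    """Time needed to reach `target` by repeated +1 additions."""
    return target / (clock_hz * additions_per_cycle)


elapsed = seconds_to_count(PRIMES_UP_TO_2E9, CRAY_1A_CLOCK_HZ)
print(f"{elapsed:.2f} seconds")
```

Under this assumption, the counting alone takes roughly 1.2 seconds, which is already several times longer than the complete PrimeSieve run on the smartphone.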

Prime factorization

In 1983 James A. Davis, Diane B. Holdridge and Gustavus J. Simmons at Sandia National Laboratories factorized the "ten 'most wanted' factorizations", including the 71 decimal digit number (10⁷¹-1)/9, which consists of 71 ones. It took a Cray X-MP 9.5 hours to factorize this number into its two prime factors with 30 and 41 decimal digits, respectively.

In 1991 a MasPar MP-1 model MP-1216 parallel minisupercomputer factorized RSA-100, an integer with 100 decimal digits (330 bits), within 3 CPU-days, as stated by Brandon Dixon and Arjen K. Lenstra in Factoring Integers Using SIMD Sieves. An MP-1 consisted of many parallel ALUs (called "PE", for processor element). 32 PEs were combined in one chip, and each chip consisted of 400,000 transistors. Each PE was a full 32-bit ALU containing 64 32-bit registers (integer or floating point); the FPU supported single and double precision arithmetic. There were 5 MasPar MP-1 models, the smallest with 1,024 PEs. The model MP-1216 used for the RSA-100 factorization was the largest, with 16,384 PEs. It had a total memory capacity of 1 GB and a theoretical total processing power of about 26,000/13,000 MIPS and 1,200/550 MFLOPS for 32/64-bit operations.

It took my smartphone less than 16 hours to factorize RSA-100 using msieve-1.53 (built with "make all ECM=1").

Oh, before I forget: the 71-digit composite number was factorized by my smartphone in less than 3 minutes.
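To illustrate what "factorizing" means here, the following sketch builds the 71-digit number from its formula and shows a very simple factoring method, Pollard's rho, on a tiny composite. This is not what msieve does (msieve relies on far more powerful sieve-based methods), and this naive version would not factor the 71-digit number in any reasonable time. The function name `pollard_rho` and the example number 8051 are chosen for illustration; the function assumes its input is composite.

```python
from math import gcd

# The 71-digit number from the 1983 factorization: 71 ones in a row.
R71 = (10**71 - 1) // 9


def pollard_rho(n):
    """Return a non-trivial factor of a composite n (assumption: n is not prime)."""
    if n % 2 == 0:
        return 2
    for c in range(1, n):
        x = y = 2
        d = 1
        while d == 1:
            x = (x * x + c) % n          # tortoise: one step
            y = (y * y + c) % n          # hare: two steps
            y = (y * y + c) % n
            d = gcd(abs(x - y), n)
        if d != n:                        # d == n means this c failed; try the next
            return d
    return None


print(len(str(R71)), "digits")
factor = pollard_rho(8051)
print(factor, 8051 // factor)
```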

Mersenne primes

The 28th Mersenne prime number 2⁸⁶²⁴³-1 was found on a Cray-1 by David Slowinski, a systems analyst at Cray Research, on 25th September 1982. This Mersenne prime has 25,962 decimal digits. It took the Cray-1 600 hours to find it. The previous Mersenne prime number 2⁴⁴⁴⁹⁷-1 (13,395 decimal digits) was found on 8th April 1979, also on a Cray-1 by Harry L. Nelson and David Slowinski.

On my smartphone I compiled and used a C program (built with "clang -O3 -fomit-frame-pointer") utilizing the Lucas-Lehmer primality test to search for Mersenne primes. All Mersenne primes up to M27 (2⁴⁴⁴⁹⁷-1) were found in less than 4 hours. After that, it took my smartphone less than 16 hours to find M28 (2⁸⁶²⁴³-1).
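The Lucas-Lehmer test itself is short. The sketch below is not the C program used for the measurements above; it is a minimal illustration that relies on Python's built-in big integers, and the function names are made up for this example. The test states that for a prime exponent p > 2, the number 2^p - 1 is prime exactly if s(p-2) is divisible by 2^p - 1, where s(0) = 4 and s(k+1) = s(k)² - 2.

```python
def is_prime(n):
    """Trial-division primality check, sufficient for small exponents."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime; p itself must be prime."""
    if p == 2:
        return True                     # 2**2 - 1 = 3 is prime
    m = (1 << p) - 1                    # the Mersenne number 2**p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_exponents(limit):
    """All prime exponents p <= limit for which 2**p - 1 is prime."""
    return [p for p in range(2, limit + 1) if is_prime(p) and lucas_lehmer(p)]


print(mersenne_exponents(130))
```

Each test needs p - 2 squarings of a p-bit number, which is why the run time grows so quickly towards exponents like 44497 and 86243.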

Saturday, 30 November 2019

Is a today's smartphone more powerful than a 1970s supercomputer?
