This blog is moving to http://raid6.com.au/~onlyjob/ Please update your links.


onlyjob

24 October 2011

20 October 2011

23 March 2011

Donating this month

This article has been moved to

http://raid6.com.au/~onlyjob/posts/donations/

Please update your links.

08 March 2011

Perl, Python, Ruby, PHP, C, C++, Lua, tcl, javascript and Java benchmark/comparison.



This article is moving to

http://raid6.com.au/~onlyjob/posts/arena

Please update your links - it will disappear from here some time later.



Understanding the differences between programming languages is crucial. If the wrong language is chosen for a project, it takes a lot of time and effort to change course and re-implement the project, or part of it, in a different language. Typically that means years of effort, misery and dissatisfaction for everyone: yourself, your colleagues, your clients and your systems administrator(s). Needless to say, it can be dangerous for the business.
Knowing how languages differ from each other is the key to making the right decisions. Environments have different demands: for example, which language is the best choice for a VPS with limited RAM? Questions like this are not always easy to answer, given the many false beliefs and rumours so common among developers.
This test is designed to demonstrate the difference between popular programming languages.
I hope you find the results of this little piece of research interesting.

Method

The test code grows a text string by appending another string in a loop until it reaches 4 MiB. Each iteration also substitutes some text. Every time the string becomes 256 KiB larger, the program prints the number of seconds elapsed since the beginning of the test. The program's output is piped to a script that captures memory usage (using memstat) for every line printed.
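For readers who don't have memstat at hand, here is a minimal Python sketch of the capturing side. It is not the script used for this article: it assumes a Linux /proc filesystem and swaps memstat for the VmRSS field of /proc/<pid>/status, so the numbers will not necessarily match memstat's. The names rss_kib and capture are made up for illustration.

```python
import sys


def rss_kib(status_text):
    """Return the VmRSS value (in kB) found in the text of /proc/<pid>/status.

    Returns None when the field is missing (e.g. the process has exited).
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      4776 kB"
            return int(line.split()[1])
    return None


def capture(pid, lines=sys.stdin):
    """Echo every line printed by the test and append the memory usage of `pid`.

    Assumption: the test program's output is piped into this script and its
    process id is known to the caller.
    """
    for line in lines:
        with open("/proc/%d/status" % pid) as fh:
            print(line.rstrip("\n"), rss_kib(fh.read()), sep="\t")
```

The parsing is kept separate from the reading loop so it can be checked against a saved copy of a status file without running a test.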

String manipulation is core functionality in every language, so it allows the languages to be compared fairly. Processing large strings puts reasonable stress on memory, which brings out the differences in efficiency between languages.
Because the test case is very simple, it is easy to implement in different languages in a similar way. Obviously the code itself should not be considered practical, because its only purpose is to create some computational load for measurement. Code samples are available for review. All implementations are reasonably accurate and straightforward. Again, a similar amount of work done in a similar way should be considered fair for comparison.

String processing was chosen for a number of reasons.
Most applications don't do long calculations. For serious math, the core functionality of any language is not good enough. Using 3rd party math libraries would make the comparison unfair, and comparing libraries is meaningless anyway if we want to compare languages rather than math libraries.
Moreover, integer calculations are not a good subject for testing because integer size may differ. The accuracy of floating point calculations is affected by default precision and may be even more hardware-dependent. String processing is essential in every application because strings are just data. More data means more stress for garbage collection etc. Processing of large strings is easy to compare because all the languages in this test do the same amount of work.
String processing is also very common: XML(-RPC), HTML, logs, messages, GUI all involve string processing at a low level, even when the details are hidden from the developer behind an API. String processing is not accelerated by hardware. When processing a large string, languages do many memory (re)allocations and, if necessary, copy data in memory. The efficiency of these operations is the subject of this test because it shows well enough how the languages differ.

Tests run long enough to compare performance and memory usage rather than the time needed for runtime startup. That's why running each test once is good enough for comparison. During experiments I ran every test many times and noticed only a little deviation between results. I considered those deviations negligibly small (statistically insignificant), so the final comparison is made from just one execution of the test case for every language, without gathering the results of multiple runs and comparing their averages. Remember that precise numbers are not too important in this test because the relative difference shows very clearly.

During the test I compared speed, memory usage, and performance degradation as the processed data grows. When an application struggles with more data, processing speed suffers, which is an important characteristic to understand.

Only core language functionality has been used for testing.

Originally I wanted to compare only mainstream cross-platform interpreted languages - namely PHP, Perl5, Python, Ruby and Java (Sun's and OpenJDK). Then curiosity made me include C, C++, Javascript ("spidermonkey", Mozilla), Javascript ("V8", Chrome), tcl, Lua and Java GCJ.

Whilst it is interesting to compare the languages to each other, Javascript, tcl and Lua fall outside the scope, so I will not compare their features.
Technically C and C++ do not belong here because they are very different from interpreted languages by nature; however, their results are important as a baseline to match against.

All tests were conducted on an Intel Core2 Duo T7500 @ 2.20 GHz CPU; 2 GB RAM; OS Debian GNU/Linux 2.6.32 i686.
During the tests there was always enough free memory to fully accommodate the running test without swapping, and no resource-hungry applications were running. However, more accurate results could be gathered if the X server and most other processes were stopped for the period of testing. Differences between running the same test with or without swap, or with higher priority, were negligibly small, if any. During the tests CPU power management was disabled so both CPUs (cores) were running at maximum speed.

Defaults were used for all languages except PHP. By default PHP restricts maximum memory usage and maximum execution time. In order to complete the test, those parameters had to be changed in the PHP runtime configuration.

Compilation time needed for C, C++ and Java wasn't counted in this testing.

This comparison consists of three parts:
Part 1: Speed.
Part 2: Memory usage.
Part 3: Language features.

October 2011 update: Python v3 added to comparison.

Speed

Execution speed is obviously important to understand the language. I would say that if you're not considering performance at all you simply don't care about your application. However performance alone is not the most important characteristic and therefore other aspects should be taken into consideration as well.

This table shows number of seconds taken to complete every testing stage.
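| Line size (KiB) | Perl5 | PHP | Ruby | Python | C++ (g++) | C (gcc) | Javascript (V8) | Javascript (sm) | Python3 | tcl | Lua | Java (openJDK) | Java (Sun) | Java (gcj) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 256 | 2 | 6 | 7 | 7 | 7 | 2 | 3 | 30 | 17 | 33 | 49 | 39 | 38 | 451 |
| 512 | 7 | 23 | 29 | 32 | 26 | 8 | 21 | 131 | 81 | 141 | 203 | 162 | 157 | 1783 |
| 768 | 16 | 54 | 75 | 78 | 60 | 19 | 51 | 300 | 201 | 324 | 480 | 381 | 371 | 3937 |
| 1024 | 27 | 96 | 141 | 144 | 107 | 34 | 91 | 535 | 373 | 583 | 886 | 711 | 696 | 6952 |
| 1280 | 43 | 153 | 225 | 232 | 167 | 53 | 144 | 842 | 598 | 921 | 1423 | 1161 | 1145 | 10744 |
| 1536 | 62 | 227 | 328 | 342 | 242 | 76 | 208 | 1220 | 877 | 1334 | 2090 | 1751 | 1739 | 15372 |
| 1792 | 84 | 318 | 452 | 476 | 329 | 104 | 283 | 1672 | 1211 | 1823 | 2886 | 2489 | 2478 | 20819 |
| 2048 | 109 | 424 | 597 | 634 | 431 | 136 | 370 | 2203 | 1598 | 2387 | 3856 | 3370 | 3358 | 27132 |
| 2304 | 139 | 549 | 758 | 815 | 546 | 173 | 469 | 2799 | 2039 | 3030 | 4963 | 4453 | 4448 | 34302 |
| 2560 | 171 | 691 | 941 | 1019 | 675 | 214 | 578 | 3463 | 2533 | 3753 | 6198 | 5710 | 5719 | 42330 |
| 2816 | 206 | 849 | 1143 | 1248 | 817 | 259 | 700 | 4198 | 3070 | 4553 | 7568 | 7146 | 7186 | 51118 |
| 3072 | 245 | 1022 | 1366 | 1497 | 972 | 309 | 834 | 4997 | 3659 | 5422 | 9084 | 8852 | 8983 | 60779 |
| 3328 | 288 | 1211 | 1607 | 1771 | 1142 | 363 | 979 | 5875 | 4300 | 6378 | 10759 | 10784 | 10916 | 71275 |
| 3584 | 334 | 1414 | 1869 | 2064 | 1324 | 423 | 1136 | 6825 | 4992 | 7409 | 12594 | 12696 | 12867 | 82619 |
| 3840 | 384 | 1634 | 2150 | 2381 | 1522 | 487 | 1304 | 7848 | 5729 | 8503 | 14564 | 14861 | 15053 | 94686 |
| 4096 | 437 | 1869 | 2455 | 2720 | 1731 | 555 | 1484 | 8928 | 6534 | 9680 | 16674 | 17262 | 17426 | 107887 |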

 

This table has the same results in more human-readable format (h:m:s)
| Line size (KiB) | Perl5 | PHP | Ruby | Python | C++ (g++) | C (gcc) | Javascript (V8) | Javascript (sm) | Python3 | tcl | Lua | Java (openJDK) | Java (Sun) | Java (gcj) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 256 | 0:00:02 | 0:00:06 | 0:00:07 | 0:00:07 | 0:00:07 | 0:00:02 | 0:00:03 | 0:00:30 | 0:00:17 | 0:00:33 | 0:00:49 | 0:00:39 | 0:00:38 | 0:07:31 |
| 512 | 0:00:07 | 0:00:23 | 0:00:29 | 0:00:32 | 0:00:26 | 0:00:08 | 0:00:21 | 0:02:11 | 0:01:21 | 0:02:21 | 0:03:23 | 0:02:42 | 0:02:37 | 0:29:43 |
| 768 | 0:00:16 | 0:00:54 | 0:01:15 | 0:01:18 | 0:01:00 | 0:00:19 | 0:00:51 | 0:05:00 | 0:03:21 | 0:05:24 | 0:08:00 | 0:06:21 | 0:06:11 | 1:05:37 |
| 1024 | 0:00:27 | 0:01:36 | 0:02:21 | 0:02:24 | 0:01:47 | 0:00:34 | 0:01:31 | 0:08:55 | 0:06:13 | 0:09:43 | 0:14:46 | 0:11:51 | 0:11:36 | 1:55:52 |
| 1280 | 0:00:43 | 0:02:33 | 0:03:45 | 0:03:52 | 0:02:47 | 0:00:53 | 0:02:24 | 0:14:02 | 0:09:58 | 0:15:21 | 0:23:43 | 0:19:21 | 0:19:05 | 2:59:04 |
| 1536 | 0:01:02 | 0:03:47 | 0:05:28 | 0:05:42 | 0:04:02 | 0:01:16 | 0:03:28 | 0:20:20 | 0:14:37 | 0:22:14 | 0:34:50 | 0:29:11 | 0:28:59 | 4:16:12 |
| 1792 | 0:01:24 | 0:05:18 | 0:07:32 | 0:07:56 | 0:05:29 | 0:01:44 | 0:04:43 | 0:27:52 | 0:20:11 | 0:30:23 | 0:48:06 | 0:41:29 | 0:41:18 | 5:46:59 |
| 2048 | 0:01:49 | 0:07:04 | 0:09:57 | 0:10:34 | 0:07:11 | 0:02:16 | 0:06:10 | 0:36:43 | 0:26:38 | 0:39:47 | 1:04:16 | 0:56:10 | 0:55:58 | 7:32:12 |
| 2304 | 0:02:19 | 0:09:09 | 0:12:38 | 0:13:35 | 0:09:06 | 0:02:53 | 0:07:49 | 0:46:39 | 0:33:59 | 0:50:30 | 1:22:43 | 1:14:13 | 1:14:08 | 9:31:42 |
| 2560 | 0:02:51 | 0:11:31 | 0:15:41 | 0:16:59 | 0:11:15 | 0:03:34 | 0:09:38 | 0:57:43 | 0:42:13 | 1:02:33 | 1:43:18 | 1:35:10 | 1:35:19 | 11:45:30 |
| 2816 | 0:03:26 | 0:14:09 | 0:19:03 | 0:20:48 | 0:13:37 | 0:04:19 | 0:11:40 | 1:09:58 | 0:51:10 | 1:15:53 | 2:06:08 | 1:59:06 | 1:59:46 | 14:11:58 |
| 3072 | 0:04:05 | 0:17:02 | 0:22:46 | 0:24:57 | 0:16:12 | 0:05:09 | 0:13:54 | 1:23:17 | 1:00:59 | 1:30:22 | 2:31:24 | 2:27:32 | 2:29:43 | 16:52:59 |
| 3328 | 0:04:48 | 0:20:11 | 0:26:47 | 0:29:31 | 0:19:02 | 0:06:03 | 0:16:19 | 1:37:55 | 1:11:40 | 1:46:18 | 2:59:19 | 2:59:44 | 3:01:56 | 19:47:55 |
| 3584 | 0:05:34 | 0:23:34 | 0:31:09 | 0:34:24 | 0:22:04 | 0:07:03 | 0:18:56 | 1:53:45 | 1:23:12 | 2:03:29 | 3:29:54 | 3:31:36 | 3:34:27 | 22:56:59 |
| 3840 | 0:06:24 | 0:27:14 | 0:35:50 | 0:39:41 | 0:25:22 | 0:08:07 | 0:21:44 | 2:10:48 | 1:35:29 | 2:21:43 | 4:02:44 | 4:07:41 | 4:10:53 | 26:18:06 |
| 4096 | 0:07:17 | 0:31:09 | 0:40:55 | 0:45:20 | 0:28:51 | 0:09:15 | 0:24:44 | 2:28:48 | 1:48:54 | 2:41:20 | 4:37:54 | 4:47:42 | 4:50:26 | 29:58:07 |
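The h:m:s figures are just the raw seconds reformatted. A small Python sketch of that conversion (to_hms is an illustrative name, not something taken from the test scripts):

```python
def to_hms(total_seconds):
    """Format a number of seconds as h:mm:ss, the layout used in the table."""
    hours, rest = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(rest, 60)
    return "%d:%02d:%02d" % (hours, minutes, seconds)


print(to_hms(437))     # Perl5 at 4096 KiB      -> 0:07:17
print(to_hms(17426))   # Java (Sun) at 4096 KiB -> 4:50:26
print(to_hms(107887))  # Java (gcj) at 4096 KiB -> 29:58:07
```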

Speed graph
Speed (seconds)

Speed tests fall into 4 categories:
Slowest: Java gcj (native executable)
Slow: Java (openJDK); Java (Sun); Lua
Not-so-fast: tcl; Javascript (spidermonkey)
Fastest: Python; Ruby; PHP; C++; Javascript V8; C; Perl5

As you can see from the performance graph, processing slows down as the test string grows. The more a curve bends upwards, the more performance degrades. The graph reveals that the performance of Java and Lua degrades dramatically.
All tested languages are good at manipulating small strings, but as the processed data grows the difference manifests itself.
The Slow group [Java, Lua] suffers from severe performance degradation.
There is almost no difference in performance between OpenJDK Java and Sun Java. Lua's performance is very close to Java's.
Initially the GCJ Java interpreter crashed during the test; however, GCJ can compile Java code to a native executable, which completed the test, albeit awfully slowly. Here and below, unqualified "Java" means only mainstream Sun/OpenJDK Java.

Let's have a closer look at Fastest group:

Python, Ruby and PHP are slightly slower than C++. This is not a surprise because those languages are optimised well enough.
Javascript V8 completed the test slightly faster than C++.
This group of languages shows an average slowdown, while the performance of C and Perl5 is almost a flat line on the graph, indicating very little degradation. It means that C and Perl5 process an increasing amount of data at (almost) constant speed.

Unexpected result: somehow Perl5 managed to finish faster than C. This came as a surprise which I found difficult to explain. Probably Perl does fewer memory reallocations to accommodate string growth.
I haven't done serious coding in C since 1995, but the implementation is quite simple and straightforward, so the test result stands.

Perl5 is a clear winner, needing just a little more than 7 minutes to finish the test, against Java with the worst result of nearly 5 hours to do the same. (The worst result of all, GCJ Java at almost 30 hours, is not worth comparing against.)
Perl5 is not only superior in performance but also shows very little slowdown on larger data. This is as close to C (compiled to machine code) as a scripting language can be. Absolutely amazing!
It is interesting to note that with "use strict;" Perl completed the same test ~6 seconds quicker.

In the table below Perl5 is taken as 1 and the other languages' performance is measured in Perls, so you can see how many times slower a particular language is compared to Perl5 in this test. Because of performance degradation it would be incorrect to say something like "This is twice as fast as That". Some languages' performance degrades faster than others', so at the beginning of this test Java is about 20 times slower than Perl5 and at the end it is about 40 times slower (for the same amount of data).
Clearly this is an important characteristic - size matters! This corresponds with observations of some Java applications which behave well under light load and degrade sharply as the load increases.

Relative speed: Perl5 (fastest) taken as 1.
| Line size (KiB) | Perl5 | PHP | Ruby | Python | C++ (g++) | C (gcc) | Javascript (V8) | Javascript (sm) | Python3 | tcl | Lua | Java (openJDK) | Java (Sun) | Java (gcj) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 256 | 1 | 3.00 | 3.50 | 3.50 | 3.50 | 1.00 | 1.50 | 15.00 | 8.50 | 16.50 | 24.50 | 19.50 | 19.00 | 225.50 |
| 512 | 1 | 3.29 | 4.14 | 4.57 | 3.71 | 1.14 | 3.00 | 18.71 | 11.57 | 20.14 | 29.00 | 23.14 | 22.43 | 254.71 |
| 768 | 1 | 3.38 | 4.69 | 4.88 | 3.75 | 1.19 | 3.19 | 18.75 | 12.56 | 20.25 | 30.00 | 23.81 | 23.19 | 246.06 |
| 1024 | 1 | 3.56 | 5.22 | 5.33 | 3.96 | 1.26 | 3.37 | 19.81 | 13.81 | 21.59 | 32.81 | 26.33 | 25.78 | 257.48 |
| 1280 | 1 | 3.56 | 5.23 | 5.40 | 3.88 | 1.23 | 3.35 | 19.58 | 13.91 | 21.42 | 33.09 | 27.00 | 26.63 | 249.86 |
| 1536 | 1 | 3.66 | 5.29 | 5.52 | 3.90 | 1.23 | 3.35 | 19.68 | 14.15 | 21.52 | 33.71 | 28.24 | 28.05 | 247.94 |
| 1792 | 1 | 3.79 | 5.38 | 5.67 | 3.92 | 1.24 | 3.37 | 19.90 | 14.42 | 21.70 | 34.36 | 29.63 | 29.50 | 247.85 |
| 2048 | 1 | 3.89 | 5.48 | 5.82 | 3.95 | 1.25 | 3.39 | 20.21 | 14.66 | 21.90 | 35.38 | 30.92 | 30.81 | 248.92 |
| 2304 | 1 | 3.95 | 5.45 | 5.86 | 3.93 | 1.24 | 3.37 | 20.14 | 14.67 | 21.80 | 35.71 | 32.04 | 32.00 | 246.78 |
| 2560 | 1 | 4.04 | 5.50 | 5.96 | 3.95 | 1.25 | 3.38 | 20.25 | 14.81 | 21.95 | 36.25 | 33.39 | 33.44 | 247.54 |
| 2816 | 1 | 4.12 | 5.55 | 6.06 | 3.97 | 1.26 | 3.40 | 20.38 | 14.90 | 22.10 | 36.74 | 34.69 | 34.88 | 248.15 |
| 3072 | 1 | 4.17 | 5.58 | 6.11 | 3.97 | 1.26 | 3.40 | 20.40 | 14.93 | 22.13 | 37.08 | 36.13 | 36.67 | 248.08 |
| 3328 | 1 | 4.20 | 5.58 | 6.15 | 3.97 | 1.26 | 3.40 | 20.40 | 14.93 | 22.15 | 37.36 | 37.44 | 37.90 | 247.48 |
| 3584 | 1 | 4.23 | 5.60 | 6.18 | 3.96 | 1.27 | 3.40 | 20.43 | 14.95 | 22.18 | 37.71 | 38.01 | 38.52 | 247.36 |
| 3840 | 1 | 4.26 | 5.60 | 6.20 | 3.96 | 1.27 | 3.40 | 20.44 | 14.92 | 22.14 | 37.93 | 38.70 | 39.20 | 246.58 |
| 4096 | 1 | 4.28 | 5.62 | 6.22 | 3.96 | 1.27 | 3.40 | 20.43 | 14.95 | 22.15 | 38.16 | 39.50 | 39.88 | 246.88 |
| Average: | 1 | 3.84 | 5.21 | 5.59 | 3.89 | 1.23 | 3.23 | 19.66 | 13.92 | 21.35 | 34.36 | 31.16 | 31.12 | 247.32 |
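To show how the "Perls" are derived, here is a short Python sketch that divides a language's time by Perl5's time at the same stage. The timings are copied from the seconds table above; the helper name in_perls is only for illustration.

```python
def in_perls(seconds, perl_seconds):
    """Express a timing as a multiple of the Perl5 timing for the same stage."""
    return "%.2f" % (seconds / perl_seconds)


# Seconds at the first (256 KiB) and last (4096 KiB) stage, from the table above.
first_stage = {"Perl5": 2, "Java (Sun)": 38}
last_stage = {"Perl5": 437, "Java (Sun)": 17426}

print(in_perls(first_stage["Java (Sun)"], first_stage["Perl5"]))  # 19.00
print(in_perls(last_stage["Java (Sun)"], last_stage["Perl5"]))    # 39.88
```

The ratio roughly doubles between the first and the last stage, which is the "20 times slower, then 40 times slower" degradation described above.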

Memory usage

During testing, memory usage was captured at every completed step.

Memory usage
| Line size (KiB) | C (gcc) | C++ (g++) | Perl5 | Python | Python3 | Ruby | Lua | tcl | PHP | Javascript (sm) | Javascript (V8) | Java (gcj) | Java (OpenJDK) | Java (Sun) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1,668 | 2,932 | 4,776 | 5,352 | 10,328 | 11,040 | 2,416 | 1,236 | 36,752 | 7,720 | 39,272 | 49,156 | 724,832 | 658,560 |
| 256 | 1,928 | 3,444 | 5,052 | 6,384 | 13,404 | 9,620 | 3,960 | 13,696 | 38,040 | 50,664 | 47,236 | 68,320 | 725,852 | 661,056 |
| 512 | 2,184 | 3,956 | 5,308 | 5,876 | 16,476 | 11,672 | 5,404 | 14,720 | 39,064 | 29,672 | 47,636 | 76,200 | 725,852 | 661,056 |
| 768 | 2,440 | 3,956 | 5,564 | 7,676 | 19,548 | 7,328 | 6,428 | 18,052 | 40,088 | 16,872 | 49,404 | 84,392 | 725,852 | 661,056 |
| 1024 | 2,696 | 4,980 | 5,820 | 6,388 | 14,420 | 12,704 | 7,820 | 14,716 | 41,112 | 53,224 | 46,540 | 92,584 | 725,852 | 661,056 |
| 1280 | 2,952 | 4,980 | 6,076 | 9,212 | 15,444 | 8,604 | 6,104 | 15,228 | 42,136 | 44,520 | 47,044 | 110,072 | 725,852 | 661,056 |
| 1536 | 3,208 | 4,980 | 6,332 | 6,900 | 16,468 | 11,164 | 10,572 | 18,816 | 43,160 | 21,480 | 50,124 | 118,264 | 725,852 | 662,080 |
| 1792 | 3,464 | 4,980 | 6,588 | 7,156 | 17,492 | 8,856 | 11,812 | 16,252 | 44,184 | 38,376 | 51,916 | 126,976 | 725,852 | 662,080 |
| 2048 | 3,720 | 7,028 | 6,844 | 11,516 | 18,516 | 13,724 | 10,908 | 16,764 | 45,208 | 51,176 | 47,540 | 126,976 | 725,852 | 662,080 |
| 2304 | 3,976 | 7,028 | 7,100 | 7,668 | 19,540 | 12,700 | 6,644 | 17,276 | 46,232 | 38,376 | 46,252 | 161,824 | 725,852 | 662,080 |
| 2560 | 4,232 | 7,028 | 7,356 | 7,924 | 20,564 | 11,160 | 15,592 | 22,912 | 41,876 | 41,960 | 44,452 | 161,824 | 725,852 | 662,080 |
| 2816 | 4,488 | 7,028 | 7,612 | 8,180 | 21,588 | 14,748 | 16,848 | 18,300 | 42,388 | 79,336 | 50,612 | 161,824 | 725,852 | 662,080 |
| 3072 | 4,744 | 7,028 | 7,868 | 8,436 | 22,612 | 15,772 | 15,716 | 18,812 | 49,304 | 73,704 | 51,636 | 161,824 | 725,852 | 662,080 |
| 3328 | 5,000 | 7,028 | 8,124 | 8,692 | 23,636 | 16,796 | 19,492 | 19,324 | 50,328 | 39,400 | 55,996 | 170,536 | 725,852 | 662,080 |
| 3584 | 5,256 | 7,028 | 8,380 | 12,536 | 24,660 | 17,820 | 17,072 | 19,840 | 43,924 | 27,624 | 46,500 | 170,536 | 725,852 | 662,080 |
| 3840 | 5,512 | 7,028 | 8,636 | 9,204 | 25,684 | 18,844 | 23,276 | 20,348 | 44,436 | 29,160 | 58,556 | 170,536 | 725,852 | 662,080 |
| 4096 | 5,768 | 11,124 | 8,892 | 9,460 | 26,708 | 15,768 | 20,200 | 20,860 | 44,948 | 96,232 | 59,836 | 170,536 | 725,852 | 662,080 |
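One way to read this table is to ask how much memory a runtime adds per KiB of string. The Python sketch below is only an illustration and assumes the figures in the table are in KiB (the table does not state the unit); the start and end values are copied from the 0 and 4096 rows.

```python
def growth_per_kib(start, end, data_kib=4096):
    """Memory added per KiB of test string between the first and last row."""
    return (end - start) / data_kib


# (usage at 0 KiB, usage at 4096 KiB), taken from the table above
samples = {
    "C (gcc)": (1668, 5768),
    "Perl5": (4776, 8892),
    "Python3": (10328, 26708),
}

for name, (start, end) in samples.items():
    print(name, "%.2f" % growth_per_kib(start, end))
# C (gcc) 1.00
# Perl5 1.00
# Python3 4.00
```

Under that assumption C and Perl5 grow by about one unit of memory per KiB of data, while Python3 grows by about four.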

Memory usage - "mainstream" Java is not shown on the graph because of its constantly high usage.
Memory usage

Results fall into five categories:
Highest: Java OpenJDK, Java Sun
High: Java GCJ
Medium: Javascript V8, Javascript sm., PHP
Low: tcl, Lua, Ruby
Lowest: Python, Perl5, C++, C

Highest group - mainstream Java pre-allocates a fairly big chunk of memory (a certain percentage) by default and does its memory management inside this chunk. During this test its memory usage did not change and was constantly high, so it is not present on the graph: if included, it makes all other results appear as flat lines well below.
To capture internal memory usage I added print statements to the Java code to show internal memory usage as the string grows. (This doesn't affect performance.) Unfortunately the printed numbers have no correspondence with string growth. This shows that Java garbage collection works completely independently of the application code. The output numbers appeared to be random, sometimes as high as 95% of pre-allocated memory. Even though internal memory usage did not correspond with the string size, it seems that sometimes Java uses nearly all of its memory before garbage collection (GC) releases some of it.
Java memory management appears to be extremely ineffective, which seems to be the primary cause of the poor performance. I leave further investigation with specific Java-monitoring tools to those who might find it interesting. Java professionals may also try to improve the results by fine-tuning miscellaneous GC parameters.

High group - Java GCJ compiled to a native executable. Thanks to this special feature GCJ Java demonstrated predictable behaviour, with memory allocation growing together with the data processed. Compared with the other non-Java runtimes, its memory utilisation is huge.

Medium group: Javascript demonstrates more or less consistent growth in memory usage as the data grows. PHP shows very little growth, but its heavy runtime uses a lot of memory from the very beginning. Despite the initial requirements, PHP uses memory pretty wisely. High memory usage upon startup is not necessarily a bad thing: if meant for continuous execution it may be OK to pre-load common libraries. However, this may limit PHP usage on a VPS, i.e. when available memory is limited.

Let's have a closer look at the Low and Lowest groups:
Memory usage magnified

Lua and tcl runtimes are tiny, but their memory management is not very effective. Ruby used more memory than Python. Python utilises memory almost as well as Perl5 - perhaps their runtimes are almost the same size. Once again Perl5 performed amazingly well, demonstrating behaviour very similar to C - the best among scripting languages. As expected, C++ memory usage is roughly between C and Perl5.

As we did in the speed test, let's take Perl5 as 1 and see how the other languages' memory usage compares at every step and on average.

Memory usage in Perls + average
| Line size (KiB) | C (gcc) | C++ (g++) | Perl5 | Python | Python3 | Ruby | Lua | tcl | PHP | Javascript (sm) | Javascript (V8) | Java (gcj) | Java (OpenJDK) | Java (Sun) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.35 | 0.61 | 1 | 1.12 | 2.16 | 2.31 | 0.51 | 0.23 | 7.70 | 1.62 | 8.22 | 10.29 | 151.77 | 137.89 |
| 256 | 0.38 | 0.68 | 1 | 1.26 | 2.65 | 1.90 | 0.78 | 2.15 | 7.53 | 10.03 | 9.35 | 13.52 | 143.68 | 130.85 |
| 512 | 0.41 | 0.75 | 1 | 1.11 | 3.10 | 2.20 | 1.02 | 2.51 | 7.36 | 5.59 | 8.97 | 14.36 | 136.75 | 124.54 |
| 768 | 0.44 | 0.71 | 1 | 1.38 | 3.51 | 1.32 | 1.16 | 2.35 | 7.20 | 3.03 | 8.88 | 15.17 | 130.46 | 118.81 |
| 1024 | 0.46 | 0.86 | 1 | 1.10 | 2.48 | 2.18 | 1.34 | 2.30 | 7.06 | 9.15 | 8.00 | 15.91 | 124.72 | 113.58 |
| 1280 | 0.49 | 0.82 | 1 | 1.52 | 2.54 | 1.42 | 1.00 | 1.65 | 6.93 | 7.33 | 7.74 | 18.12 | 119.46 | 108.80 |
| 1536 | 0.51 | 0.79 | 1 | 1.09 | 2.60 | 1.76 | 1.67 | 2.73 | 6.82 | 3.39 | 7.92 | 18.68 | 114.63 | 104.56 |
| 1792 | 0.53 | 0.76 | 1 | 1.09 | 2.66 | 1.34 | 1.79 | 2.27 | 6.71 | 5.83 | 7.88 | 19.27 | 110.18 | 100.50 |
| 2048 | 0.54 | 1.03 | 1 | 1.68 | 2.71 | 2.01 | 1.59 | 1.46 | 6.61 | 7.48 | 6.95 | 18.55 | 106.06 | 96.74 |
| 2304 | 0.56 | 0.99 | 1 | 1.08 | 2.75 | 1.79 | 0.94 | 2.25 | 6.51 | 5.41 | 6.51 | 22.79 | 102.23 | 93.25 |
| 2560 | 0.58 | 0.96 | 1 | 1.08 | 2.80 | 1.52 | 2.12 | 2.89 | 5.69 | 5.70 | 6.04 | 22.00 | 98.67 | 90.01 |
| 2816 | 0.59 | 0.92 | 1 | 1.07 | 2.84 | 1.94 | 2.21 | 2.24 | 5.57 | 10.42 | 6.65 | 21.26 | 95.36 | 86.98 |
| 3072 | 0.60 | 0.89 | 1 | 1.07 | 2.87 | 2.00 | 2.00 | 2.23 | 6.27 | 9.37 | 6.56 | 20.57 | 92.25 | 84.15 |
| 3328 | 0.62 | 0.87 | 1 | 1.07 | 2.91 | 2.07 | 2.40 | 2.22 | 6.19 | 4.85 | 6.89 | 20.99 | 89.35 | 81.50 |
| 3584 | 0.63 | 0.84 | 1 | 1.50 | 2.94 | 2.13 | 2.04 | 1.58 | 5.24 | 3.30 | 5.55 | 20.35 | 86.62 | 79.01 |
| 3840 | 0.64 | 0.81 | 1 | 1.07 | 2.97 | 2.18 | 2.70 | 2.21 | 5.15 | 3.38 | 6.78 | 19.75 | 84.05 | 76.67 |
| 4096 | 0.65 | 1.25 | 1 | 1.06 | 3.00 | 1.77 | 2.27 | 2.21 | 5.05 | 10.82 | 6.73 | 19.18 | 81.63 | 74.46 |
| Average: | 0.53 | 0.85 | 1 | 1.20 | 2.79 | 1.87 | 1.62 | 2.09 | 6.45 | 6.28 | 7.39 | 18.28 | 109.87 | 100.13 |

The environment where applications run may have certain memory limits. This is true not only for popular Virtual Private Servers (VPS), where the amount of RAM can sometimes be as little as 128 MB for the OS and all applications/services, but also for embedded devices and heavily loaded servers.
A good understanding of memory utilisation is as important to consider as speed.

Read more after code section below.

Source code and test results

C; Result: C gcc (Debian 4.4.4-1) 4.4.4

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void){

setbuf(stdout,NULL); //disable output buffering

char *str=malloc(9);              //8 characters + terminating NUL
strcpy(str,"abcdefgh");

str=realloc(str,strlen(str)+8+1); //room for 8 more characters + NUL
strcat(str,"efghefgh");     //sprintf(str,"%s%s",str,"efghefgh");

int imax=1024/strlen(str)*1024*4;

printf("%s","exec.tm.sec\tstr.length\n"); //fflush(stdout);

time_t starttime=time(NULL);
char *gstr=malloc(1);
gstr[0]='\0';                     //start from an empty string so strcat is well defined
int i=0;
char *pos;
int lngth;

int str_len=strlen(str);

    while(i++ < imax+1000){
        lngth=strlen(str)*i;
        gstr=realloc(gstr,lngth+str_len);   //lngth characters + NUL, with room to spare
        strcat(gstr,str);    //sprintf(gstr,"%s%s",gstr,str);

        pos=gstr;
        while((pos=strstr(pos,"efgh"))){
            memcpy(pos,"____",4);
        }

        if(lngth % (1024*256)==0){
            printf("%ldsec\t\t%dkb\n",(long)(time(NULL)-starttime),lngth/1024); //fflush(stdout);
        }
    }
//printf("%s\n",gstr);

free(gstr);
free(str);
return 0;
}

C++ (source); Result: C++ g++ (Debian 4.4.3-7) 4.4.3


#include <iostream>
#include <string>
#include <time.h>

using namespace std;

int main ()
{
  string str = "abcdefgh";
    str += "efghefgh";
  int imax = 1024 /str.length() * 1024 *4;
  time_t currentTime = time(NULL);
  cout << "exec.tm.sec\tstr.length" << endl;

  string find= "efgh";
  string replace ="____";
  string gstr;
  int i=0;
  int length;
//  int end=0; //  size_t end=0;

  while(i++ < imax +1000){
    gstr += str;
    gstr = gstr;
    size_t start, sizeSearch=find.size(), end=0;

    while((start=gstr.find(find,end))!=string::npos){
        end=start+sizeSearch;
        gstr.replace(start,sizeSearch,replace);
    }
    length = str.length()*i;
    if((length%(1024 * 256))==0){
        cout << time(NULL) - currentTime << "sec\t\t" << length/1024 << "kb" <<  endl;
    }
  }
// cout << gstr << endl;

return 0;
}

Javascript (source); Results: Javascript (Spidermonkey - Mozilla) 1.8.0 pre-release 1 2007-10-03, Javascript (V8 - Chrome)

#!/usr/local/bin/js

var str = "abcdefgh"+"efghefgh";
var imax = 1024 / str.length * 1024 * 4;

var time = new Date();
print("exec.tm.sec\tstr.length");

var gstr = "";
var i=0;
var lngth;

while (i++ < imax+1000) {
    gstr += str;
    gstr = gstr.replace(/efgh/g, "____");
        lngth=str.length*i;
        if ((lngth % (1024*256)) == 0) {
                var curdate=new Date();
                print(parseInt(((curdate.getTime()-time.getTime())/1000))+"sec\t\t"+lngth/1024+"kb");
        }
}

Java (source); Results: Java (OpenJDK) "1.6.0_18", Java (Sun) "1.6.0_16", Java (gcj) (Debian 4.4.3-1) 4.4.3

public class java_test {

    public static final void main(String[] args) throws Exception {
        String str = "abcdefgh"+"efghefgh";
        int imax = 1024 / str.length() * 1024 * 4;

        long time = System.currentTimeMillis();
        System.out.println("exec.tm.sec\tstr.length\tallocated memory:free memory:memory used");
        Runtime runtime = Runtime.getRuntime();
        System.out.println("0\t\t0\t\t"+runtime.totalMemory()/1024 +":"+ runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);

        String gstr = "";
        int i=0;
        int lngth;

        while (i++ < imax+1000) {
            gstr += str;
            gstr = gstr.replaceAll("efgh", "____");
            lngth=str.length()*i;
                if ((lngth % (1024*256)) == 0) {
                        System.out.println(((System.currentTimeMillis()-time)/1000)+"sec\t\t"+lngth/1024+"kb\t\t"+runtime.totalMemory()/1024+":"+runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);
                }
        }
    }
}

Perl5 (source); Result: This is perl, v5.10.1 (*) built for i486-linux-gnu-thread-multi

#!/usr/bin/perl
$|=1;    #disable output buffering, this is necessary for proper output through pipe

my $str='abcdefgh'.'efghefgh';
my $imax=1024/length($str)*1024*4;               # 4mb

my $starttime=time();
print "exec.tm.sec\tstr.length\n";

my $gstr='';
my $i=0;

while($i++ < $imax+1000){   #adding 1000 iterations to delay exit. This will allow to capture memory usage on last step

        $gstr.=$str;
        $gstr=~s/efgh/____/g;
        my $lngth=length($str)*$i;   ##     my $lngth=length($gstr);        # Perhaps that would be a slower way
        print time()-$starttime,"sec\t\t",$lngth/1024,"kb\n" unless $lngth % (1024*256); #print out every 256kb
}

PHP (source); Result: PHP 5.3.1-5 with Suhosin-Patch (cgi-fcgi) (built: Feb 22 2010 17:38:41)

<?php


$str="abcdefgh"."efghefgh";
$imax=1024/strlen($str)*1024*4;      # 4mb

$starttime=time();
print("exec.tm.sec\tstr.length\n");

$gstr='';
$i=0;

while($i++ < $imax+1000){

        $gstr.=$str;
        $gstr=preg_replace('/efgh/','____',$gstr);
        $lngth=strlen($str)*$i;
        if($lngth % (1024*256)==0){
                print (time()-$starttime."sec\t\t".($lngth/1024)."kb\n");
        }
}

?>

Python (source); Result: Python 2.5.5

#!/usr/bin/python -u
import re
import time
import sys

str='abcdefgh'+'efghefgh'
imax=1024/len(str)*1024*4   # 4mb

starttime=time.time();
print "exec.tm.sec\tstr.length"
sys.stdout.flush()

gstr=''
i=0

while (i < imax+1000):
        i=i+1
        gstr+=str
        gstr=re.sub('efgh','____',gstr)
        lngth=len(str)*i
        if(lngth % (1024*256) == 0):
                print int(time.time()-starttime),"sec\t\t",(lngth/1024),"kb"
                sys.stdout.flush()

Python3 (source); Result: Python 3.1.3

#!/usr/bin/python3 -u
import re
import time
import sys

str='abcdefgh'+'efghefgh'
imax=1024//len(str)*1024*4   # 4mb; integer division keeps imax an int in Python 3

starttime=time.time()
print("exec.tm.sec\tstr.length")
sys.stdout.flush()

gstr=''
i=0

while (i < imax+1000):
        i=i+1
        gstr+=str
        gstr=re.sub('efgh','____',gstr)
        lngth=len(str)*i
        if(lngth % (1024*256) == 0):
                print(int(time.time()-starttime),"sec\t\t",(lngth//1024),"kb")
                sys.stdout.flush()

Ruby (source); Result: ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]

#!/usr/bin/ruby
$stdout.sync=true;

str='abcdefgh'+'efghefgh';
imax=1024/str.length*1024*4;       # 4mb

starttime=Time.new;
print("exec.tm.sec\tstr.length\n");

gstr='';
i=0;

while i < imax+1000
        i=i+1;
        gstr+=str;
        gstr=gstr.gsub(/efgh/, "____")

        lngth=str.length*i;
        if(lngth % (1024*256)==0)
                print(((Time.new-starttime).ceil).to_s+"sec\t\t",(lngth/1024).to_s,"kb\n");
        end
end

#puts gstr;

Lua (source): Result: Lua 5.1.4

#!/usr/bin/lua

io.stdout:setvbuf "no";             --  io.flush();

str='abcdefgh'..'efghefgh';
imax=1024/string.len(str)*1024*4;         -- 4mb

starttime=os.time();
print "exec.tm.sec\tstr.length";

gstr='';
i=0;

while i < imax+1000 do
        i=i+1;
        gstr=gstr..str;
        gstr=string.gsub(gstr,"efgh","____");
        lngth=string.len(str)*i;
        if(math.mod(lngth,1024*256)==0) then
                print(os.time()-starttime.."sec\t\t"..(lngth/1024).."kb");
        end
end



tcl (source): Result: tcl 8.4.19

#!/usr/bin/tclsh

set str "abcdefgh"
append str "efghefgh"

set imax [expr {1024/[string length $str]*1024*4}]

set starttime [clock clicks -milliseconds]
puts "exec.tm.sec\tstr.length";

set gstr ""
set i 0

while {$i<[expr {$imax+1000}]} {
        incr i
        append gstr $str;
        regsub -all {efgh} $gstr ____ gstr
        set lngth [expr {[string length $str]*$i}]
        if {[expr {$lngth % (1024*256)}] == 0} {
                puts "[expr int([expr [clock clicks -milliseconds] - $starttime] / 1000)]sec\t\t[expr {$lngth/1024}]kb"
        }
}

exit


June 2011 Update: One bright Java developer felt that I was bashing Java, so he decided to optimise the Java test. Initially I was sceptical about it because two other Java programmers had failed to do so.
As you may have already noted from the source code, for high level languages I use a regular expression to substitute a substring on each iteration.
However, when I decided to include C and C++ in the test case, the regex was replaced with the traditional "moving window" technique, where the search for a substring starts from a position calculated on the previous step instead of scanning the whole growing string every time.
This approach was chosen because regular expressions are not part of the core functionality of C/C++ and also because for low level languages this seems to be a natural way to do substitution.
Unfortunately this affected the fairness of the comparison. (Perhaps all tests should have used indexed substitutions.)
The fact that the C++ example uses "moving window" substitution instead of a regular expression allows the Java code to be rewritten as in the following example:

public class java_test_optm {

    public static final void main(String[] args) throws Exception {
        String str = "abcdefgh"+"efghefgh";
        int imax = 1024 / str.length() * 1024 * 4;

        long time = System.currentTimeMillis();
        System.out.println("exec.tm.sec\tstr.length\tallocated memory:free memory:memory used");
        Runtime runtime = Runtime.getRuntime();
        System.out.println("0\t\t0\t\t"+runtime.totalMemory()/1024 +":"+ runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);

        final StringBuilder gstr = new StringBuilder();
        int i=0;
        int lngth;

        while (i++ < imax+1000) {
            gstr.append(str);

            int startIndx = gstr.indexOf("efgh");
            while(startIndx != -1){
                gstr.replace(startIndx, startIndx + 4, "____");
                startIndx = gstr.indexOf("efgh", startIndx + 4);
            }

            lngth=str.length()*i;
            if ((lngth % (1024*256)) == 0) {
                System.out.println(((System.currentTimeMillis()-time)/1000)+"sec\t\t"+lngth/1024+"kb\t\t"+runtime.totalMemory()/1024+":"+runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);
            }
        }
    }
}

/*
exec.tm.sec     str.length      allocated memory:free memory:memory used
0               0               32320:32103:216
2sec            256kb           32320:29420:2899
9sec            512kb           32320:29033:3286
21sec           768kb           32320:28250:4069
38sec           1024kb          32320:26692:5627
59sec           1280kb          32320:23612:8707
85sec           1536kb          32320:22116:10203
116sec          1792kb          32320:23647:8672
153sec          2048kb          32320:22101:10218
194sec          2304kb          32000:14067:17932
240sec          2560kb          32000:12571:19428
292sec          2816kb          32192:14283:17908
348sec          3072kb          32192:12713:19478
410sec          3328kb          32064:14356:17707
477sec          3584kb          32064:12827:19236
549sec          3840kb          32128:14615:17512
626sec          4096kb          32128:13095:19032
*/

Surprisingly, this took the stress off garbage collection and allowed Java to finish the test in only 626 seconds. (Thanks Brian Bason!)
However, IMHO this goes some way to proving that Java is ineffective and overcomplicated: despite all the expertise and effort required to optimise the Java test case, Perl code modified to use moving window substitution completed the test in less than 2 seconds, more than 300 times faster than Java.
Once again, to achieve reasonable performance Java requires a low-level approach which is not only labour intensive but still can't compete with the speed of other languages.
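
As a rough illustration of what "moving window" substitution means, here is a minimal Python sketch. It is not the code used in the benchmark: the function name and its parameters are made up for this example, and it assumes the replacement has the same length as the pattern so the text can be patched in place. After each append it searches only the tail of the buffer that could contain new matches, whereas the Java listing above restarts indexOf from position 0 on every iteration.

```python
def grow_with_window(chunk: bytes, old: bytes, new: bytes, limit: int) -> bytes:
    """Append `chunk` until `limit` bytes are reached, patching `old` with `new`."""
    if len(old) != len(new):
        raise ValueError("this sketch only handles same-length replacements")
    buf = bytearray()
    while len(buf) < limit:
        # Everything before this point was searched in earlier rounds; step
        # back len(old) - 1 bytes in case a match straddles the boundary.
        window_start = max(0, len(buf) - len(old) + 1)
        buf += chunk
        pos = buf.find(old, window_start)
        while pos != -1:
            buf[pos:pos + len(old)] = new  # same-length overwrite in place
            pos = buf.find(old, pos + len(new))
    return bytes(buf)


if __name__ == "__main__":
    # Same chunk and pattern as the test above, grown to 4 MiB.
    result = grow_with_window(b"abcdefghefghefgh", b"efgh", b"____", 4 * 1024 * 1024)
    print(len(result), result[:32])
```

Because the window only ever covers the newly appended chunk, the work per iteration stays constant instead of growing with the string.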

Language features

Sometimes comfort and speed of development may outweigh performance and memory usage. In other words, performance and memory usage may sometimes be sacrificed in favour of quicker/easier development. For example, it is understandable if a higher level language is chosen over C in order to benefit from automatic memory management. In this section I'm going to briefly scratch the surface of comparing language features.
Whilst it's quite a philosophical statement, language features play an important role in development.
Let's see how easily we can parse an integer value from a text string in popular languages. This task only looks straightforward. In fact there are plenty of caveats.

In Java we could do something like

//Java
    int val;
    val = Integer.parseInt("10000000000");
 
But there are problems. The example above will not only fail to parse the correct value, but will actually crash the entire application because of an unhandled exception. Gotchas like this may bite you when you least expect it. In this Java example
//Java
    val = Integer.parseInt("-10");   //this will work
    val = Integer.parseInt("+10");   //but not this - silly!
 
parsing an integer from "+10" crashes the application. (This is the behaviour of Java 6 and earlier; Java 7 and later accept a leading "+" sign.) To emulate this behaviour in PHP or Perl we have to explicitly create a point of failure:
$val=intval($str) or die("it didn't work");
In Java pretty much any call that does something can be a point of failure unless it is enclosed within ugly try-catch statements. So to avoid a crash we have to wrap "dangerous" operations like this:
//Java
 try {
        val = Integer.parseInt(str);
 } catch (NumberFormatException nx) {
        //it didn't work, do something about it here
 }
 
In fact try-catch is a fancy syntax for if-else. A similar operation in PHP will not crash, but we can wrap it with if-else to make sure the number was parsed successfully.
#PHP
 if($val=intval($str)){     # please note the "zero case" caveat: in PHP and Perl 0 is 'false',
    print $val;             # so this branch is skipped if the input string is '0' (zero)
 }
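
The "zero case" caveat comes from using the parsed value itself as the success flag. One way around it, sketched here in Python (the helper name parse_int is hypothetical, not part of any library), is to report failure with a separate sentinel such as None, so that a legitimate 0 is never mistaken for an error:

```python
from typing import Optional


def parse_int(text: str) -> Optional[int]:
    """Return the parsed integer, or None when `text` is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None


if __name__ == "__main__":
    for sample in ["0", "+10", "-10", "asdasd", "10.0"]:
        value = parse_int(sample)
        if value is None:          # explicit failure check, independent of the value
            print(repr(sample), "-> could not be parsed")
        else:
            print(repr(sample), "->", value)
```

The check is against None rather than truthiness, which is what keeps "0" on the success path.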
 
Python and Ruby use fatal behaviour similar to Java's. Is that good? Perhaps sometimes. However in many cases returning something is better than nothing. The application may not do exactly what's expected, but that may be considered better than a crash. Perhaps you want your application to keep running despite a minor error instead of terminating. Maybe a particular part of the application is not important enough to try-catch absolutely everything.
I've seen many examples of this in web applications where a seemingly innocent operation is in fact a fatal point of failure leading to an application crash. Several times I had to troubleshoot Java and Python web-apps made by different teams, in different companies, at different times, but all of them used to crash on string transformations because of uncaught/unhandled exceptions when an unexpected character came from the database. Needless to say this caused a great deal of frustration for users of those applications. You may argue that the developers who created those applications were incompetent. Could be.
However a development approach enforced by the necessity of catching all possible exceptions is troublesome, difficult and slow. It clutters the code with 'noise' and imposes a routine not strictly related to the application's logic. I think the forgiving nature of Perl is a better match for Test Driven Development, where the developer is not distracted by try-catch and can therefore concentrate on making the code better, creating more tests, checking input values etc.

ParseInt comparison

Methods compared (str is the input string):
Java: Integer.parseInt(str) or Integer.valueOf(str)
PHP: intval($str)
Python: int(str)
Ruby (to_i): str.to_i
Ruby (Integer): Integer(str)
Perl: int($str)
C++ (stream, double): istringstream buffer(str); double val; buffer >> val;
C++ (stream, int): istringstream buffer(str); int val; buffer >> val;
C++ (atoi, double): double val=atoi(str)
C++ (atoi, int): int val=atoi(str)

String (str) | Java | PHP | Python | Ruby (to_i) | Ruby (Integer) | Perl | C++ (stream, double) | C++ (stream, int) | C++ (atoi, double) | C++ (atoi, int)
" 1111" | exception | OK | OK | OK | OK | OK | OK | OK | OK | OK
"10.0" | exception | OK | exception | OK | exception | OK | OK | OK | OK | OK
"10000000000" | exception | incorrect: 2147483647 | OK | OK | OK | OK | OK (1e+10) | incorrect: 134520252 | incorrect: 2.14748e+09 | incorrect: 2147483647
"2e+2" (1) | exception | incorrect: 2 | exception | incorrect: 2 | exception | OK | OK (200) | incorrect: 2 | incorrect: 2 | incorrect: 2
"-10" | OK | OK | OK | OK | OK | OK | OK | OK | OK | OK
"+10" | exception | OK | OK | OK | OK | OK | OK | OK | OK | OK
"asdasd" | exception | 0 | exception | 0 | exception | 0 | 0 | incorrect: 134520248 | 0 | 0
"0.0" | exception | incorrect: No value parsed | exception | OK | exception | OK | OK | OK | OK | OK
"00" | OK | incorrect: No value parsed | OK | OK | OK | OK | OK | OK | OK | OK
"2+3" | exception | 2 | exception | 2 | exception | 2 | 2 | 2 | 2 | 2

(1) 2e+2 = 2*10^2 = 200

Java raises the largest number of exceptions to handle. Of course you may handle them all as one but, as demonstrated in this example, a usable value can be parsed in most cases, so if you want to do a good job you have to do it yourself, for every case. Java is the only language which couldn't extract a value from "+10".
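
To give an idea of what "doing it yourself" involves in a strict language, here is a minimal Python sketch of a forgiving parser that extracts the leading number from a string and truncates it to an integer, roughly mimicking the results in the Perl column. The function name lenient_int and the regular expression are assumptions made for this example; it does not claim to reproduce Perl's actual conversion rules.

```python
import math
import re
from typing import Optional

# Optional sign, digits with an optional fraction, optional exponent.
_LEADING_NUMBER = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)")


def lenient_int(text: str) -> Optional[int]:
    """Return the integer part of the number `text` starts with, else None."""
    match = _LEADING_NUMBER.match(text)
    if match is None:
        return None                      # nothing numeric at the start
    token = match.group(1)
    try:
        return int(token)                # plain integers stay exact
    except ValueError:
        value = float(token)             # "10.0", "2e+2" and the like
        return int(value) if math.isfinite(value) else None


if __name__ == "__main__":
    for sample in [" 1111", "10.0", "10000000000", "2e+2", "-10",
                   "+10", "asdasd", "0.0", "00", "2+3"]:
        print(repr(sample), "->", lenient_int(sample))
```

Unlike returning 0, the None result for "asdasd" keeps a failed parse distinguishable from a real zero. Values that go through float lose precision beyond 2^53.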

Python is slightly smarter at recognising numbers in strings.

Ruby has two different methods to do the job, and it is not obvious which one is better.

PHP silently parses incorrect values.
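
The value 2147483647 reported for "10000000000" is 2^31 - 1, the largest signed 32-bit integer, which suggests the result was clamped to the integer range of a 32-bit build rather than parsed. A small Python sketch of that saturation follows; it illustrates the arithmetic only and is not PHP's actual implementation.

```python
INT32_MAX = 2**31 - 1   # largest signed 32-bit integer
INT32_MIN = -2**31      # smallest signed 32-bit integer


def clamp_to_int32(value: int) -> int:
    """Saturate `value` to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


if __name__ == "__main__":
    print(INT32_MAX)                        # 2147483647
    print(clamp_to_int32(10_000_000_000))   # 2147483647
    print(clamp_to_int32(-10))              # -10
```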

The complexity and power of C++ are vividly manifested in this example: you can choose from 4 different ways to parse a value from a string, but as soon as you know which one of them is right, the results are nearly perfect.
Since the return value has to be a number, it returns 0 for non-numeric strings, so 0 has to be treated as an exceptional case in order to determine whether it was an error or an actual value.

Perl demonstrated a perfect result. At first glance it looks almost the same as C++: it returns 0 for a non-numeric string. However with the standard

use warnings;
a non-fatal warning will be issued: "Argument "asdasd" isn't numeric in int at ./tst.pl line 8." This warning can be converted to fatal with
use warnings FATAL=>'numeric';
Now we have an exception to catch like in the following example:
#!/usr/bin/perl
{ use warnings FATAL=>'numeric';
    my $str="asdasd";
    my $num=eval {int $str};
    if(defined $num){
        print "we got it - it's $num";
    }else{
        print "error: $@";
        # with "use English;" the line above could look like: print "error: $EVAL_ERROR";
    }
}
There are some important things to note:
  • Fatal exception is enabled by developer's decision
    • Only for particular problem;
    • Only for particular block, so exception scope is strictly defined
  • Only core language functionality used
  • It works perfectly, including "zero case" and "2+3"
  • It provides human-readable explanation of failure
  • It extracts all usable values
  • With minimal effort
So with just core Perl functionality it is possible to do the job a lot more easily than with other languages. Not only that: the great flexibility of Perl is that you can use modules to introduce different styles of exception handling, so you're not bound to the example above. With Try::Tiny (not the only such module) you can use an almost "traditional" Java-style try-catch syntax in Perl:
#!/usr/bin/perl
use Try::Tiny;
use warnings FATAL=>'numeric';

    my $str="asdasd";
    my $num = try {
                    int $str;
              } catch {
                    die "error: $_";
              };
    print q{we got it - it's },$num;
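
For comparison, the idea of escalating a warning to an exception only inside one block can also be sketched in Python with the standard warnings module. The two function names below are hypothetical, and the forgiving parser is written only to have something that warns instead of failing; this is an analogy to the Perl approach, not a translation of it.

```python
import warnings


def to_int_or_zero(text):
    """Hypothetical forgiving parser: warns and returns 0 on non-numeric input."""
    try:
        return int(text)
    except ValueError:
        warnings.warn(f"Argument {text!r} isn't numeric", RuntimeWarning)
        return 0


def strict_to_int(text):
    """Same parser, but the warning is fatal inside this block only."""
    with warnings.catch_warnings():
        # Scope is limited to this 'with' block and to RuntimeWarning.
        warnings.simplefilter("error", RuntimeWarning)
        try:
            return to_int_or_zero(text)
        except RuntimeWarning as problem:
            print(f"error: {problem}")
            return None


if __name__ == "__main__":
    print(strict_to_int("asdasd"))  # warning becomes an exception, None returned
    print(strict_to_int("0"))       # a real zero is still returned as 0
```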

Some links below might be interesting in order to compare languages' syntax:
Compare structure of Perl, Ruby, Python, Java and PHP
Wikipedia: Exception handling syntax

Notes (per language)

PHP

PHP is not a universal language. Perhaps it should be considered for web development only.
Another problem with PHP is the administration needed to configure the runtime for different applications. Some PHP applications have different expectations regarding the notorious "Magic quotes" runtime parameter. Read more in Wikipedia: Magic quotes criticism.

The runtime is fast but not very compact. PHP has a reputation as a lightweight and fast language. While the first happens to be false (PHP's memory usage is quite big compared with Python, Ruby and Perl5), it is a close second after Perl5 in performance.
In some situations PHP functions cannot be trusted, as demonstrated in the "parsing integer from string" example.

Ruby

Ruby is a universal but relatively young language. Its availability on different platforms is still limited, and a history of introducing backward-incompatible changes makes development and maintenance unnecessarily complicated. The performance and memory usage of Ruby and Python are close to each other. While Ruby is slightly faster, Python utilises memory better.

Python

Python is a mature and universal language. It stood up well in this test. However Python treats white space and tabs as syntax.
This particular "feature" looks unnecessary and silly, especially after so much has been said about the importance of separating presentation from logic. In Python, presentation is logic. Python enforces a certain way of formatting code in the rudest way I can imagine.
Unless it makes your eyes bleed you may find peace in Python, especially after Java.
Its "whitespace as constraint" could make reading/writing code harder. To my understanding the only explanation for such a strange feature is that you can literally see the code flow pretty much the way the interpreter sees it.
I doubt that good coding style can be effectively enforced; readable code formatting can easily be achieved in other languages by following best practice guidelines.
In a way Python imposes a military dress code: all applications should wear the same uniform.
How can this make a programming task easier? I believe the more freedom a programming language gives you, the better.

"There is no programming language - no matter how structured - that will prevent programmers from making bad programs."
-- Larry Flon
Read more about Python's white spacing in The hard edges of Python.

Perl5

Perl5 demonstrated amazing performance and memory usage, far beyond all other languages tested. It proved to be the most optimised, mature and stable language. Some people believe it to be the most advanced programming language in the world; either way, it is clearly a very good choice.

  • Perl proved to be an extremely effective, highly optimised language.
  • Perl has a massive library of reusable code.
  • Perl is mature: it's 23 years old; (Perl5 is 17 years old).
  • Perl is very portable.
  • Perl is elegant and flexible.
Unfortunately Perl is often misunderstood because of widespread myths misrepresenting the language's capabilities. Typically those myths are a product of ignorance and/or lack of knowledge.
Some of those myths:
Myth: Perl is UNIX shell on steroids.
This is really an insult to Perl, which is much more than this. In 2010 Perl is a very mature and universal language with perhaps the largest library of reusable code available. In Perl you can write GUI applications, web applications, system daemons etc. It is possible to pack a Perl application, runtime and libraries into a Windows executable and distribute it as a single .EXE file. Perl's object oriented features and flexibility are perhaps far beyond those of any other language. Learn more about Modern Perl (presentation).
Myth: Perl is "write once - read never"
Perl is often falsely accused of lack of readability. I confess - sometimes I have problems reading my own poor handwriting from notes I took weeks ago. However it has nothing to do with the language I use. With a certain discipline you can develop clear, understandable and maintainable code in any language. It's all a matter of learning good habits like commenting the code (especially if you're not the only developer) or choosing meaningful long names for variables etc. It comes with experience. You can't blame a programming language for the lack of clarity in your code, just like you can't blame a natural language for its inappropriate use. If your Perl code is not beautiful you're doing it wrong - there is another, nice way.
Most people that complain about syntax have none or very little experience in Perl -- YAPC::EU::2009 - How Opera Software uses Perl presentation.
There is a very good presentation Perl Myths 2009 where Tim Bunce is explaining some common Perl misunderstandings and revealing some of Perl's powers.

Perl is truly a language of freedom. It gives amazing power and has features that do not exist in other languages. Those powers can be used to create nice, tidy, clean and yet effective and concise code. Of course the same powers can be used to write obfuscated code but, again, this is not a language problem because it is also possible with other languages. This is best explained by the creator of Perl himself (emphasis added):

Let me state my beliefs about this in the strongest possible way. The very fact that it's possible to write messy programs in Perl is also what makes it possible to write programs that are cleaner in Perl than they could ever be in a language that attempts to enforce cleanliness. The potential for greater good goes right along with the potential for greater evil. A little baby has little potential for good or evil, at least in the short term. A President of the United States has tremendous potential for both good and evil.
I do not believe it is wrong to aspire to greatness, if greatness is properly defined. Greatness does not imply goodness. The President is not intrinsically "gooder" than a baby. He merely has more options for exercising creativity, for good or for ill.
True greatness is measured by how much freedom you give to others, not by how much you can coerce others to do what you want.
Larry Wall http://www.wall.org/~larry/pm.html
Reasons for using Perl summarised in Why Perl?

Java

Just like Perl, Java is a subject of numerous myths misrepresenting its real position.
Despite commercial popularity there are multiple problems with the language:

* Poor memory management (garbage collection):

IMHO Java suffers from a garbage collection problem. If you don't allocate objects and maybe use only static methods, Java can be quite fast. But when you start creating huge amounts of objects (like required when working with Java's String class) its memory use and performance are getting worse and worse.

In theory GCs should be at least as fast as manual memory management or reference counting (which Python uses). Instead of wasting time for memory management while the program is working, it defers the memory management until the program is idle or it runs out of memory. Unfortunately on today's systems, memory is extremely slow and CPU cycles are cheap, and this is why the GC theory does not work. The Java VM constantly trashes the cache because it does not re-use memory fast enough. Instead it takes new (usually uncached) memory for new objects and defers freeing the unused memory of old objects (that are in the cache). This is probably the worst thing that you can do to the cache. A good VM would try to re-use memory as soon as possible, to increase the chances that it is still in cache (like Python's refcounter). Java does the opposite.

To make things worse, the VM seems to lack any coordination with the kernel. When the system is running out of RAM and needs to swap, the logical action for the VM would be to start the garbage collector. It doesn't however, and instead it starts allocating the new memory, forcing the kernel to move the old (unused) memory into the swap space! And when the VM finally decides to start the GC it will go through all the unused memory that is now in the swap, causing it be reloaded and possibly moving more frequently used memory back in the swap, only to re-load it again later. How much worse can it get?

-- Java has a GC problem, posted 10 Feb 2003 at 16:55 UTC by tjansen

Historically Java was successful partly because developers found it attractive compared to C due to its "automatic" memory management. This turned out to be Java's greatest weakness. In C memory has to be managed by the developer, in contrast to Java where memory is usually managed by the systems administrator. In the numerous papers explaining sophisticated garbage collection you may find dozens(!) of parameters for memory tuning. And trust me, because Java developers usually cannot predict an application's behaviour under load, the only reliable way to configure memory management for a particular application is to test, change parameter(s) and test again and again. Sometimes it helps. But the defaults are often not good enough, and it's too easy to make a mistake. Apart from configuring Java's "automatic" memory usage, developers can do very little. Java applications are handicapped by default.

* Verbosity:

Consider the following HTTP POST example:

Java
import java.net.URL;
import java.net.HttpURLConnection;
import java.io.DataOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.BufferedReader;

public class java_post {

public static void main (String args[]) throws Exception {
    System.out.println(
        executePost("http://www.smh.com.au/execute_search.html",
                    "text=fluoride")
    );
}

public static String executePost(String targetURL, String urlParameters){
    URL url;
    HttpURLConnection connection = null;
    try {
        //Create connection
        url = new URL(targetURL);
        connection = (HttpURLConnection)url.openConnection();
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Content-Type",
                                      "application/x-www-form-urlencoded");
        connection.setRequestProperty("Content-Length", "" +
               Integer.toString(urlParameters.getBytes().length));
        connection.setRequestProperty("Content-Language", "en-US");
        connection.setUseCaches (false);
        connection.setDoInput(true);
        connection.setDoOutput(true);

        //Send request
        DataOutputStream wr = new DataOutputStream (
                  connection.getOutputStream ());
        wr.writeBytes (urlParameters);
        wr.flush ();
        wr.close ();

        //Get Response    
        InputStream is = connection.getInputStream();
        BufferedReader rd = new BufferedReader(new InputStreamReader(is));
        String line;
        StringBuffer response = new StringBuffer();
        while((line = rd.readLine()) != null) {
            response.append(line);
            response.append('\r');
        }
        rd.close();
        return response.toString();
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    } finally {
        if(connection != null) {
            connection.disconnect();
        }
    }
  }
}
Perl
#!/usr/bin/perl

use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $res=$ua->post(  'http://www.smh.com.au/execute_search.html',
                    {
                         text=>'fluoride',
                    }
                 );

print $res->is_success ? $res->content : $res->status_line;
Ruby
#!/usr/bin/ruby

require "uri"
require "net/http"

x = Net::HTTP.post_form(URI.parse('http://www.smh.com.au/execute_search.html'),
                            {
                                'text' => 'fluoride',
                            }
                        )
puts x.body
Python
#!/usr/bin/python -u

import urllib, urllib2

data = urllib.urlencode({
                'text' : 'fluoride',
        })
req = urllib2.Request('http://www.smh.com.au/execute_search.html', data)
response = urllib2.urlopen(req)

print response.read()
PHP
<?php

$postdata = http_build_query(
    array(
        'text' => 'fluoride',
    )
);

$opts = array('http' =>
    array(
        'method'  => 'POST',
        'header'  => 'Content-type: application/x-www-form-urlencoded',
        'content' => $postdata
    )
);

$context  = stream_context_create($opts);
print file_get_contents('http://www.smh.com.au/execute_search.html', false, $context);

?>
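
The Python listing above targets Python 2 (urllib and urllib2). A rough Python 3 equivalent using only the standard library might look like the sketch below. It assumes the same URL and form field as the other listings; the helper names build_post and execute_post are made up for this example, and the request is built in a separate function so it can be inspected without touching the network.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post(url, fields):
    """Build a form-encoded POST request without sending it."""
    data = urlencode(fields).encode("ascii")
    return Request(url, data=data, method="POST")


def execute_post(url, fields, timeout=10):
    """Send the request and return the response body as text."""
    with urlopen(build_post(url, fields), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    request = build_post("http://www.smh.com.au/execute_search.html",
                         {"text": "fluoride"})
    print(request.get_method(), request.full_url, request.data)
    # Calling execute_post(...) with the same arguments would send the request.
```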

June 2011 Update: Greg McLaghlan made a good point:

I think the Java verbosity example is a little misleading. If we compare it to the Perl example, you are loading Perl module which handles the http post whereas in the Java example you actually code that. It could be argued that that part of the code could have been packaged up and loaded just like the Perl module. It's a minor point I guess.
Yes, this is true, but I chose the task first and only then did it turn out that the standard Java distribution does not come with HTTP POST helpers by default.
Other languages have instruments to help with similar tasks in their standard distribution.
I believe it would be incorrect to involve 3rd party libraries in the comparison; however here you may find a Java example of HTTP Post using Apache libraries. It is 33 lines long (no empty lines) - about 40% shorter than the original Java example but nowhere near as compact as the other languages: the 2nd longest HTTP Post code is PHP - only 14 lines.

Here is how one person expressed his frustration with Java's verbosity in his blog:

Whenever I write code in Java I feel like I'm filling out endless forms in triplicate.
"Ok, sir, I'll just need your type signature here, here, and ... here. Now will this be everything, or..."
"Well, I might need to raise an exception."
The compiler purses its lips."An exception? Hmmm... let's see.... Yes, I think we can do that... I have the form over here... Yes, here it is. Now I need you to list all the exceptions you expect to raise here. Oh, wait, you have other classes? We'll have to file an amendment to them. Just put the type signature here, here, ... yes, copy that list of exceptions....
And one of the comments from above blog's discussion (there are some other comments worth reading):
I think the problem with Java is not it's verbosity, but as someone else said "infrastructure framework". I have to go through so many classes, through so much leaps and bounds, to do anything.
I need a factory, to create a manager. Then I factor anther factory, to create a stream, then assign that stream to the manager. Afterwords, I give the manager to a dispatcher.
Then there is an uncaught exception and I have to sift through 50 lines of junk to actually find out what went wrong.
Verbosity is bad because code is read more often than it is written, so verbosity increases the effort needed to maintain code.
Java's verbosity hurts both maintenance and development.

Usually Java developers claim Java code is easier to develop/maintain. I failed to discover any particular Java language feature to support that claim. Java makes developers do a lot of work even for the simplest tasks.
Java's bad performance and memory usage are not compensated by any particular language feature(s).
It is far behind other languages in both performance and memory usage/management.
The time needed to tweak and test memory management, together with the maintenance and troubleshooting effort, is horrifying.

Personal experience:

Results of this testing are consistent with my personal experience.
Over the years I was involved in several projects where all Java applications demonstrated miserable performance while having tremendous system requirements.

Once, on a public-facing web site, I found a problematic Java servlet more than 1000 lines long. Unable to fix it, I couldn't think of a better solution than to rewrite it from scratch in a different language.
Within several days I produced a Perl application of about 200 lines, running up to 10 times faster than the original Java application. Numerous bugs were fixed in the process, and the new version was easier to debug and had some improvements and new features.

I can't recall a single Java application server which doesn't degrade. Apparently they all leak memory so sooner or later they have to be restarted. (I'd like to believe there are exceptions somewhere.)
As a matter of fact restarting Java application servers is common practice in the industry, yet it appears that only Java really needs it. It seems unnecessary for stable software like the Apache web server, which can run for years without a restart. I rarely let a busy Java application run longer than a week, while web-facing Java application servers are restarted nightly.

Another example vividly demonstrates the problems with Java's memory management: once I found that a particular web-facing Java application could handle no more than 24 simultaneous requests. (You may suspect it was running on an old/virtualized server, but it was really a relatively up-to-date machine: 8 x Intel(R) Xeon(R) CPU L5420 @ 2.50GHz/RAM 6 GiB/CentOS 5.5 GNU/Linux system.) After days of tweaking and testing we found that capacity could be increased (doubled) by allocating more memory, but this negatively affected response time. Too little memory is not enough; too much and garbage collection chokes.
A ridiculous solution was found: to farm Java application servers on the very same hardware, to give each just enough memory and to restrict the maximum number of simultaneous connections per backend on the load balancer. Needless to mention this "solution" cost a great deal of effort: to set up, test, tweak memory parameters, test again etc.
Later the developers managed to optimise the application a little, but two or more Java application servers per physical server still work better than one.
Because of the history of degradation each Java application server in the farm runs no longer than 24 hours: they are all restarted overnight in round-robin manner. (Believe me, it's much better than waking up at 3:00 just to do the monkey's job of restarting another Java application server which stopped responding.) That much effort is needed only to ensure the system's normal functioning.
Remarkably this service is hosted on 8 HP Proliant G6 servers with two quad-core Intel(R) Xeon(R) CPUs each: 64 CPUs (cores) in total, and 72 GiB of RAM. With a database size of only 1GB the whole system can barely respond to ~180 simultaneous HTTP requests (less than 3 visitors per CPU and only 2.5 connections per GiB of RAM, i.e. about 0.4 GiB of RAM per connection) - a tremendous waste of resources.
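
The capacity figures can be checked with a few lines of Python, using only the numbers quoted above (8 servers, two quad-core CPUs each, 72 GiB of RAM in total, about 180 simultaneous requests). The variable names are chosen for this sketch. The result is roughly 2.8 requests per core and 2.5 requests per GiB of RAM, which is about 0.4 GiB of RAM per request.

```python
servers = 8
cores_per_server = 2 * 4          # two quad-core CPUs per server
total_ram_gib = 72
simultaneous_requests = 180       # approximate figure from the text

total_cores = servers * cores_per_server
requests_per_core = simultaneous_requests / total_cores
requests_per_gib = simultaneous_requests / total_ram_gib
gib_per_request = total_ram_gib / simultaneous_requests

print(f"cores in total:    {total_cores}")
print(f"requests per core: {requests_per_core:.2f}")
print(f"requests per GiB:  {requests_per_gib:.2f}")
print(f"GiB per request:   {gib_per_request:.2f}")
```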
I remember several cases when a new Java application release introduced a negative change to backend capacity (surprisingly the release/QA team wasn't aware), so during peak hours servers were collapsing, unable to sustain the load, because the load balancer was configured to allow more connections to backends than they could handle. Sometimes allowing just 20 fewer requests makes a difference.
Another interesting problem was discovered when about 2500 MiB were allocated to the JVM on the x86 platform: Resin (a Java application server) was crashing under load, sometimes every hour if enough load was provided. Apparently that was because of a lack of addressable space (memory), not for the application, which had its 2500 MiB pre-allocated, but for the Java runtime itself, which on some occasions tried to allocate memory for internal needs and failed.

Java - summary

As you may see from this research, in all three categories Java behaves extremely badly, like no other language.
Java applications cannot match a fraction of other languages' performance.
Java applications are truly the most expensive in development and administration.
Java needs more system resources, i.e. more memory and more processing power. Usually more servers and therefore more electricity are needed, i.e. Java is not environment-friendly.
Fragile Java application servers need to be restarted periodically.
Unnecessary sophistication creates more points of failure, so the availability of Java web applications is usually not impressive.

Making high-quality Java code and running it in a well-optimised environment requires tremendous effort and experience. Even then performance and capacity will be a fraction of those of a similar system implemented in a different language. By simply using a different programming language the same result can be achieved with less effort in development, debugging, maintenance and administration. Fortunately there is a good choice of mature languages to use. Nowadays, in 2011, there is nothing you can do in Java that cannot be done in other languages.
No matter which other *mainstream* language you choose, your applications and experience will benefit from switching.
Even if your Java skills are profoundly good, your only excuse to use Java is personal convenience. Lack of experience with other languages should be a motivation to learn rather than an excuse for using Java. Everyone will benefit from better applications written in other language(s).
Java is a disaster. A disease. It is rooted so deeply in the industry that it is hard to escape while ignorant architects keep pushing it. Java is a trap for system architects and managers who know no other languages. Typically they do not understand Java's weaknesses and tend to overuse it because that's "the only tool" for the job. Those people should learn. The blind belief that Java is universal and good for any job simply can't be more wrong. Java is not suitable for *anything*.
When starting a new project you can hardly consider writing it in Lua or tcl seriously. However those languages beat Java in speed/RAM usage. Saying that Java is as suitable for a job as tcl/Lua would be a compliment to Java. The gap between Java and other languages is so huge that it would be a good idea to avoid Java whenever possible, regardless of how familiar with the language you are.

More information about Java's problems and weaknesses can be found in Sean Kelly's excellent videos:
Recovery from Addiction
Better Web App Development

Java Quotes:

"If Java had true garbage collection, most programs would delete themselves upon execution."
-- Robert Sewell
"Complexity kills. It sucks the life out of developers, it makes products difficult to plan, build and test, it introduces security challenges, and it causes end-user and administrator frustration."
-- Ray Ozzie
Java is the SUV of programming tools. A project done in Java will cost 5 times as much, take twice as long, and be harder to maintain than a project done in a scripting language such as PHP or Perl. ... But the programmers and managers using Java will feel good about themselves because they are using a tool that, in theory, has a lot of power for handling problems of tremendous complexity. Just like the suburbanite who drives his SUV to the 7-11 on a paved road but feels good because in theory he could climb a 45-degree dirt slope.
-- Greenspun, Philip
Java: write once, run away!
-- Cinap Lenrek
Java is like a variant of the game of Tetris in which none of the pieces can fill gaps created by the other pieces, so all you can do is pile them up endlessly.
-- Steve Yegge (2007, Code's Worst Enemy)
JAVA truly is the great equalizing software. It has reduced all computers to mediocrity and buggyness.
-- NASA's J-Track web site
Using Java for serious jobs is like trying to take the skin off a rice pudding wearing boxing gloves.
-- Tel Hudson

Conclusion

To pick the right tool for a job it is important to understand how programming languages compare to each other. A tricky decision is easier to make if you consider the right things while avoiding irrelevant ones.

There are some things irrelevant to good decision:

Your favourite language at the moment.
You may be very good at and comfortable with the language you already know, but this is not a good enough excuse for not considering alternatives. Learning is important.
Language creator(s) personality.
It simply doesn't matter if you like them or not or even who they are.
Your expectations regarding language features.
It always takes time to get used to new things, especially if they are quite different.
Speed of learning.
Some languages have a short *startup* learning curve. However in reality it is more like "A minute to learn, a lifetime to master". This idea is best explained by Peter Norvig in his Teach Yourself Programming in Ten Years essay.

There are some things to avoid:

Considering one single language feature alone.
Considering only one language feature, like speed or memory usage, will inevitably lead to a wrong decision.
Narrow purpose languages.
Specialised languages like PHP may be good for web development only. When you need to do something different, or simply extend the scope of the task, a language designed for one particular use may not be good enough.
Non-portable languages.
Cross-platform portability matters. Too many people are locked in, stuck with Windows-only technologies with little hope of escaping.
Non-free license.
Non-free licenses come with risks and restrictions.

There are some valuable things to consider:

Availability of reusable code
Even the best language in the world is worth little without good free libraries.
Free license.
Freedom is very important, even if you don't fully understand why.
Universal languages.
Universal languages like Perl5 are generally good for pretty much any task. Universal languages are more powerful by definition, which makes your skills universal too.
Well-portable languages
Some time later the software may be ported to a different platform or operating system. Portability guarantees choice. Choice is good.
"Feels good" feeling
Essentially, how you feel about a language is the ultimate measure of how good it is for you. For example, not everyone can be comfortable with Python, but if it suits you, you can tell from how comfortable it feels. Coding is fun if you like the language. Fun helps to make better programs.

FAQ

You deliberately made this test tough for Java! Java is not optimised for strings.
The key words here are "not optimised". (Apparently it was tough only for Java.) OK, if Java is not optimised for strings, please let me know what exactly Java is optimised for.
You deliberately chose string manipulation to show Java's weakness.
Not quite... As I explained in the beginning, I believe strings are a good test subject for comparison. I did expect that Java wouldn't be the winner, but I certainly didn't expect such miserable performance. Initially there was no Java in this testing - it was added later.
It's no surprise that Java is slow.
Even if you already knew it's slow, did you know about performance degradation and garbage collection problems? Did you know HOW slow it is? Honestly?
Java is so slow because strings are immutable in Java.
Immutable strings are not unique to Java. For example, strings are also immutable in Python. Python performed very well in this testing.
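To illustrate what immutability means here, a minimal Python sketch (not part of the original test code; the variable names are only for illustration). In-place modification of a string is rejected, and "appending" builds a new string while the original stays untouched:

```python
# Illustration only: Python strings are immutable.
text = "abc"

try:
    text[0] = "x"        # item assignment is not supported by str
    mutated = True
except TypeError:
    mutated = False

original = text
text = text + "def"      # builds a new string; 'original' is unchanged

print(mutated)           # False
print(original)          # abc
print(text)              # abcdef
```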
Java's internal string representation in memory is UTF-16, so Java has to do more work compared to a single-byte representation.
This may be the case for other languages as well. However, it does not explain why Java's performance is so much worse. If it does affect Java's test results, it may be one of those differences I'm trying to emphasise. Please note that only Latin characters were used in this test. Other languages support Unicode as well. The test case is based on defaults, so no encoding was explicitly chosen, and UTF support was neither explicitly disabled nor enabled.
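The size difference behind this question can be shown with a small Python sketch. It only compares how many bytes the same Latin text occupies in two encodings; it says nothing about how any particular runtime stores strings internally, and the sample text is an arbitrary choice:

```python
# Illustration only: storage size of the same Latin text in two encodings.
sample = "Hello, world"                   # 12 Latin characters

single_byte = sample.encode("latin-1")    # 1 byte per character
utf16 = sample.encode("utf-16-le")        # 2 bytes per character, no BOM

print(len(sample), len(single_byte), len(utf16))   # 12 12 24
```

Twice as many bytes to move around is a constant factor, which is consistent with the answer above: it may contribute to the difference but hardly accounts for all of it.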
What's wrong with Java?
Well, everything. :( Read the gory details above. In short, Java's biggest problems are inefficient garbage collection and verbosity. Unfortunately those problems are not compensated for by any language features. Java's language features look poor compared to those of other languages. Java development and maintenance require a great deal of effort.
You shouldn't write real code like this.
True, but that's test code, remember? It's made slow deliberately, to produce computational load for comparison. The job can be done a hundred times faster if optimised. Pretty much any artificial test will be quite different from reality. However, even if the test code doesn't look like a real application, it clearly reflects problems that manifest in real applications.
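For readers who want a feel for this kind of deliberately wasteful load, here is a rough Python sketch based only on the description in Method. It is not the article's actual test code: the function name, the chunk text, the substituted pattern and the scaled-down sizes (64 KiB instead of 4 MB, reports every 16 KiB instead of 256 KiB) are all assumptions made for illustration.

```python
import time

def grow_and_substitute(limit=64 * 1024, report_every=16 * 1024,
                        chunk="abcdefgh efgh efgh"):
    """Hypothetical load: grow a string and substitute text on every pass."""
    start = time.time()
    text = ""
    reports = []                      # (string length, seconds elapsed)
    next_report = report_every
    while len(text) < limit:
        text += chunk                 # grow the string
        # deliberately wasteful: rescan the whole string on every iteration
        text = text.replace("efgh", "____")
        if len(text) >= next_report:
            reports.append((len(text), time.time() - start))
            next_report += report_every
    return text, reports

text, reports = grow_and_substitute()
for size, seconds in reports:
    print(size, round(seconds, 3))
```

Rescanning the whole string on every pass is exactly the kind of thing real code should avoid, which is what makes it useful as a load generator.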
Java works for some companies.
We may disagree on the definition of "works". Sometimes the definition is quite loose - I was once told that for a production web site a 2% rate of request timeouts is acceptable for business. (Yes, it was a web-facing Java application, of course.) I believe any number of timeouts on a public-facing web site is intolerable. If Java is not expected to perform well, we may have a double standards problem. If you look at companies that successfully maintain sophisticated Java services, you may find that most of them are big companies with virtually unlimited resources. If you can have as many servers as you want, as much staff as you want and as much time as you want, you can make anything work, but at what cost? Big companies may have the luxury of being inefficient. Java may work for you if your survival doesn't depend on your effectiveness.
Why do you devote so much attention to Java and Perl and so little to Python and Ruby?
I work in an environment where Java dominates. At the same time, Java and Perl are the most misunderstood languages around. In the minds of many developers, managers and system architects Java stands unjustifiably high, while Perl is usually treated badly. Because the industry in general is so prejudiced, I believe some explanation is necessary. There are not as many myths about Python and Ruby, and their features are not as controversial. Perhaps if I were more competent with Python and Ruby I would have more to add.
Java is good, I know how to make great applications with Java.
Great, you must be very talented, because an ordinary developer needs great effort and experience to overcome the numerous problems of developing in Java. If you have to be a genius to create good and reliable Java applications, it is simply too difficult for mere mortals, i.e. for most developers. (The author of this article considers Java too difficult for himself.) Unfortunately, Java's problems, like garbage collection, exist even for well-written programs. Those problems aside, Java requires considerably more effort than some other languages to achieve the same result. You may be more productive with a different language.
My Java application works well.
Probably it barely does anything, or is not loaded enough to show performance degradation. That is typically the case when no more than a few people use the application at the same time, or when the application is extremely simple.
Should we choose Java for our new project?
By all means if
  • you want to sabotage project
  • you want it to be as expensive as possible
  • speed of development doesn't matter
  • product quality doesn't matter
  • developers refuse to learn
  • you didn't read/understand this article.
Seriously, there is simply no reason and no excuse for choosing Java for a new project.
What about .NET?
.NET (dot net) is not very portable, so it doesn't satisfy the criteria for choosing a language. Because it has so much to do with Windows and Microsoft, I see no reason to consider .NET regardless of its features or performance. Quoting Oktal: "I think Microsoft named .Net so it wouldn't show up in a Unix directory listing." .NET's license is not free, which raises an ethical issue as well. There is no reason whatsoever to work with a non-free language. As a matter of fact, its proprietary nature is a strong argument against .NET.
You've just started another flame war.
No, I haven't. The results of the testing speak for themselves, even without examples from my personal experience. I have no intention of softening Java's embarrassing performance to make Java users feel better. If your favourite language wasn't the best in this testing, perhaps you may benefit from learning something else, and this article aims to encourage such learning. Learning, if done right, leads to better decisions. We need better decisions because the industry will benefit from them. Sadly, too many people who were taught Java at uni know too little about other languages to make good decisions. From my experience I know that Java professionals sometimes take the results of this testing personally. That is good, because it is natural to feel outrage on learning how poorly their programming language compares to others. It is good because this outrage may encourage learning, which eventually helps to create better applications.

Credits

I'm indebted to my patient colleagues, who kindly provided important feedback and criticism for this research.
I'm grateful to my family - numerous times they had to go out without me when they couldn't separate me from the computer.
I'm obliged to my manager, who tolerated discussions related to this research and somehow partially inspired it.
Lastly, I'm thankful to CityRail for providing reasonable comfort, which makes it possible to work on trains while travelling to and from the city.

If you found this essay interesting please donate below to support the author.

 



Comments are moved to

http://raid6.com.au/~onlyjob/posts/arena/#comments

Please update your links.



