GC content continued: Python vs D vs C++ vs Free Pascal

My previous blog post, comparing a simple character counting algorithm between python and D, created an interesting discussion on Google+, with many tuning tips for the D version, and finally we also had a C++ version (kindly provided by Harald Achitz).

Then I could not resist to try out something I've been thinking about for a while (comparing D with Free Pascal (FPC)), so I went away and jotted down an FPC version as well (my first FPC code ever, I think).

Anyway, I now just wanted to summarize a comparison between all of these! I have added below the python code, the faster of the two D versions in my previous post, as well as the new additions: Harald's C++ version and my FPC version.

Performance results

PythonD (DMD)D (GDC)D (LDC)C++FPC
7.904s0.652s0.668s0.538s0.399s0.629s

Numbers are execution time in seconds, for counting the fraction of "G" and "C" characters compared to "A", "T", "C" and "T" in a 58 MB text file of nearly 1 million lines of text. And so, why not plot it as well:

My comments

Not surprising maybe that C++ is the fastest. I find it interesting though that both D and FPC perform in the same ballpark at least, with code written by a novice user (although with some minor tweaking).

For me personally, I also got a first answer on the question I've been wondering about for a while: Will D or FreePascal be faster, given that Pascal is an older and more mature language, and doesn't use a Garbage collector, while D on the other hand have better compiler support, which it can benefit from (like LLVM).

Since I've been betting the most on D lately, I find it interesting to see that still, D can perform better, at least with the right compiler. Still I'm impressed that FPC is totally on par with D, and beaten only when D is compiled with the LLVM backed LDC.

On another note though, I tend to like the compactness of the D code. It's not really that far from the python version, if omitting the closing curly brackets, no? Not bad for a statically compiled high performance language! :) (That is, while FreePascal has it's merits from using mostly natural language words, over quirky symbols, which IMO can enhance readability a bit).

The code

Python version

import re
import string
 
def main():
    file = open("Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa","r")
    gcCount = 0
    totalBaseCount = 0
    for line in file:
        line = line.strip("\n")
        if not line.startswith(">"):
            gcCount += len(re.findall("[GC]", line))
            totalBaseCount += len(re.findall("[GCTA]", line))
    gcFraction = float(gcCount) / totalBaseCount
    print( gcFraction * 100 )
 
 
if __name__ == '__main__':
    main()

D version

import std.stdio;
import std.string;
import std.algorithm;
import std.regex;
 
void main() {
    File file = File("Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa","r");
    int gcCount = 0;
    int totalBaseCount = 0;
    string line;
    char c;
    while (!file.eof()) {
        line = file.readln();
        if (line.length > 0 && !startsWith(line, ">")) {
            for(uint i = 0; i < line.length; i++) {
                c = line[i];
                if (c == 'C' || c == 'T' || c == 'G' || c == 'A') {
                    totalBaseCount++;
                    if ( c == 'G' || c == 'C' ) {
                        gcCount++;
                    }
                }   
            }
        }
    }
    float gcFraction = ( cast(float)gcCount / cast(float)totalBaseCount );
    writeln( gcFraction * 100 );
}

C++ version

(Kindly shared by Harald Achitz!)

#include <string>
#include <fstream>
#include <iostream>
 
int main() {
 
    std::fstream fs("Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa", std::ios::in);
 
    int gcCount = 0;
    int totalBaseCount = 0;
 
    if ( fs.is_open() )  {
        std::string line;
        while (std::getline(fs, line)) {
            if(*(line.begin()) != '>') {
                for(std::string::const_iterator pos = line.begin(); pos!=line.end() ; ++pos){
                     switch (*pos){
                       case 'A' :
                         totalBaseCount++;
                         break;
 
                       case 'C' :
                         gcCount++;
                         totalBaseCount++;
                         break;
 
                       case 'G' :
                         gcCount++;
                         totalBaseCount++;
                         break;
 
                       case  'T' :
                         totalBaseCount++;
                         break;
 
                       default:
                         break;
                     }
                }
          }
    }
    //std::cout << "gcCount: " << gcCount << " / totalBaseCount: "<< totalBaseCount << " = " ;
    std::cout << ( (float)gcCount / (float)totalBaseCount ) * 100 << std::endl ;
  } else {
    std::cout << "can't open file" ;
  }
  return 0 ;
}

Free Pascal (FPC)

program gc;
{$mode objfpc} // Do not forget this ever
 
uses
    Sysutils;
 
var
    FastaFile: TextFile;
    CurrentLine: String;
    GCCount: Integer;
    TotalBaseCount: Integer;
    i: Integer;
    c: Char;
    GCFraction: Single;
 
begin
    GCCount := 0;
    TotalBaseCount := 0;
 
    AssignFile(FastaFile, 'Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa'); 
    {$I+} //use exceptions
    try  
        Reset(FastaFile);
        repeat
            Readln(FastaFile, CurrentLine); 
            i := 0;
            while i < Length(CurrentLine) do
            begin
                c := CurrentLine[i];
                if (c = 'A') or (c = 'G') or (c = 'C') or (c = 'T')  then
                begin
                    TotalBaseCount := TotalBaseCount + 1;
                    if (c = 'G') or (c = 'C') then
                        GCCount := GCCount + 1;
                end;
                i := i + 1;
            end;
        until(EOF(FastaFile)); 
        CloseFile(FastaFile);
    except
        on E: EInOutError do
        begin
            Writeln('File handling error occurred. Details: '+E.ClassName+'/'+E.Message);
        end;    
    end;
    GCFraction := GCCount / TotalBaseCount;
    Writeln(FormatFloat('00.0000', GCFraction * 100));
end.

Compile flags

I used the following compile commands / flags for the test above (from the Makefile):

d:
        dmd -ofgc_d_dmd -O -release -inline gc.d
        gdmd -ofgc_d_gdc -O -release -inline gc.d
        ldmd2 -ofgc_d_ldc -O -release -inline gc.d
cpp:
        g++ -ogc_cpp -O3 gc.cpp
 
fpc:
        fpc -ogc_fpc -Ur -O3 -TLINUX gc.pas

Let me know if there is other flags that should be used, for some of the above compilers!

Comments

Optimize FPC version

I believe you don't need try-except for fpc (thus you can remove {$mode objfpc} and change integer to longint), and the code can be optimized away:

Reset(FastaFile);
while not EOF(FastaFile) do begin
    Readln(FastaFile, CurrentLine); 
    i := 0;
    while i < Length(CurrentLine) do
    begin
        c := CurrentLine[i];
        if c in ['A','G','C','T']  then
        begin
            TotalBaseCount := TotalBaseCount + 1;
            if c in ['G','C'] then
                GCCount := GCCount + 1;
        end;
        i := i + 1;
    end;
end; 
CloseFile(FastaFile);

You might get additional speed by replacing CurrentLine indexing with pointer dereference, due to fpc is currently missing array indexing optimization feature. So the final code would be:
program gc;
 
uses
    Sysutils;
 
var
    FastaFile: TextFile;
    CurrentLine: String;
    GCCount: LongInt;
    TotalBaseCount: LongInt;
    c: Char;
    GCFraction: Single;
    PC,PCEnd: PChar;
 
begin
    GCCount := 0;
    TotalBaseCount := 0;
 
    Assign(FastaFile, 'Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa'); 
    Reset(FastaFile);
    while not EOF(FastaFile) do begin
        Readln(FastaFile, CurrentLine); 
        PC := @CurrentLine[1];
        PCEnd := @CurrentLine[Length(CurrentLine)];
        while PC <= PCEnd do
        begin
            c := PC^;
            if c in ['A','G','C','T']  then
            begin
                Inc(TotalBaseCount);
                if c in ['G','C'] then
                    Inc(GCCount);
            end;
            Inc(PC);
        end;
    end;
    Close(FastaFile);
    GCFraction := GCCount / TotalBaseCount;
    Writeln(FormatFloat('00.0000', GCFraction * 100));
end.

I get a little speedup with that on my machine. Please test on yours.

Cool!

Cool, thanks a million! :) Do you mind if I include it in the post, with credit, and a link? (You suggest one!)

fpc with improved file buffer

Reading the file with a size-adjusted buffer, working with pointers and skipping the 'N's gives a lot a speed.

program gc;
 
const
  BufferSize = 1024*256;
 
type
  TBuffer = array[1..BufferSize] of byte;
 
var
  FastaFile: file of TBuffer;
  GCCount: longint;
  ATCount: longint;
  Buf: TBuffer;
  i, Number: longint;
  c: Pchar;
 
begin
  GCCount := 0;
  ATCount := 0;
 
  assign(FastaFile, 'Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa'); 
  reset(FastaFile, 1);
  BlockRead(FastaFile, Buf, BufferSize, Number);
  while Number > 0 do
  begin
    c :=  @Buf[1];
    for i := 1 to Number do
    begin
      if c^ <> 'N' then
        if      c^ in ['A', 'T'] then 
          inc(ATCount)
        else if c^ in ['G', 'C'] then
          inc(GCCount);
      inc(c);
    end;
    BlockRead(FastaFile, Buf, BufferSize, Number);
  end;
  close(FastaFile);
  writeln(GCCount / (ATCount+GCCount) * 100 :7:4);
end.

Some Pascal dialects allow even shorter code

DWScript version, performance here is slightly faster than your Python version

var data := LoadStringFromFile("Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa")
            .Split(#$A);
var gcCount, totalBaseCount : Integer;
 
for var line in data do
   if not line.StartsWith('>') then
      for var c in line do
         if c in 'GCTA' then begin
            Inc(totalBaseCount);
            if c in 'GC' then
               Inc(gcCount);
         end;
 
var gcFraction := gcCount/totalBaseCount;
PrintLn(gcFraction);

And you can even go a wee bit faster and shorter in the inner loop at the expense of readability with

gcCount += Ord(c in 'GC');

general comments and speed improvements, mainly for fpc.

I think that the program spends about 40% of its time in reading the file. As a quick test, i took out all the analysis in the fpc program and the execution time dropped to 40%. I did not test it, but i expect that reading the file in one or only a few large chunks will increase the speed quite a lot.

Regarding the Pascal example replacing the while loop with a for loop, using the builtin function inc and switching to set logic gives a small improvement in speed for the fpc example. The analysis loop then becomes:

  for i := 0 to length(CurrentLine) - 1 do
  begin
    c := CurrentLine[i];
    if c in ['A', 'G', 'C', 'T']  then
    begin
      inc(TotalBaseCount);
      if c in ['G', 'C'] then
        inc(GCCount);
    end;
  end;

Some additional improvement can be achieved with Whole Program Optimization, which needs two compiler runs:

fpc -Ur -O3 -Xs- -OWall -FWgc -XX -CX gc.pas
fpc -Ur -O3 -Xs- -Owall -Fwgc -XX -CX gc.pas

Altogether, execution time dropped by 15%.

I think changing the logic to this version, also gives another small speed increase:

if c in ['A', 'T'] then
    inc(TotalBaseCount)
else if c in ['G', 'C'] then
begin
    inc(TotalBaseCount);
    inc(GCCount);
end;

Many thanks Mischi! The

Many thanks Mischi! The compiler flags and double compilation indeed shaved off some milliseconds! Would be interesting with some short explanation of what it does! :) (Didn't find much in the man pages for fpc, for some of them?)

I didn't manage to get the code faster with the code changes though ... don't know if I do something stupid.

Whole Program Optimization

This is the link to the respective wiki page: http://wiki.freepascal.org/Whole_Program_Optimization

fpc -i shows which wpo is possible for the respective processor. I have not much more clue about it.

Mischi

Ah, thanks! There's something

Ah, thanks! There's something to study ... and more things to try, if one gets the time :)

Re: reading the file

The speed of reading the file can be improved quite a bit with tuning the file buffer.

adding

Buf : Array[0..4*1024] of byte;

to the var section and the additional Settextbuf statement after the assignfile yields quite a gain:

Assign(FastaFile, 'Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa');
Settextbuf(FastaFile, Buf, sizeof(Buf));
Reset(FastaFile);

Hi, I'm not a fan of either

Hi,

I'm not a fan of either Python or D, but your Python code could be speed up a bit I think:
 

  • It does not seem necessary to strip the newline character at the end of the line. This is just an unnecessary string copy.
  • You first count the frequency of [GC] and then the frequency of [GCTA]. You should instead sum up the frequencies of [GC] and [TA] separately. After the loop you can add the [GC] frequency to the [TA] frequency.

I have not tested it on my own. I'd be very curious to see the effect of my suggestions if you'd like to test and post.

In case you're interested in High Performance Python:

http://ianozsvald.com/2011/07/25/high-performance-python-tutorial-v0-2-f...

http://ianozsvald.com/2012/03/18/high-performance-python-1-from-pycon-20...

Thanks! Counting them

Thanks! Counting them separtely definitely saved a bit. Upcoming update post in the works! Stay tuned!

Separate counting

I also tried this in the pascal example, but it did not really gain something on a first try. I think that the integer increases might be done in parallel to something else with no extra cost. The time is basially determined by the comparisons. I could shave off quite a bit by starting with a test on the character 'N', which is the most frequent. Although this is one additional test for all the valid characters, it saves two tests for 'N'.

One could also take advantage of the fact that nearly all lines have the same length.

Using a profiler will probably tell you more about where the time is spent.

The fastest version for such a simple program is probably one with pointers, labels and gotos, but no one wants to maintain such code. I could do it in Pascal, but i do not even want to see such code in the public ;-)

MiSchi.

Implementation in nimrod

I liked your small test case idea so I made my own version of your test in Nimrod (http://nimrod-code.org) which you can grab at https://gist.github.com/gradha/5555512. I wrote as a comment the performance values I get on my machine. If I use your measurements as reference and normalize my results against those I get:

cpp: 0.399 (normalized), python: 6.353 and nimrod: 0.5329

So it would seem that nimrod is faster than the alternatives you present, though my python version is faster than yours too, so it would be better if you could run the nimrod version on your machine to compare.

Wow, didn't even know there

Wow, didn't even know there is such a language. Does it have ubuntu packages? :)

The guys at IRC tell me there

The guys at IRC tell me there was some stale package at some point. Looks like https://aur.archlinux.org/packages/ni/nimrod-git/PKGBUILD seems to be some recipe for making one, though compiling nimrod yourself should not be hard following the instructions from https://github.com/Araq/Nimrod#compiling.

I'm using the git version on macosx. IIRC the last stable release doesn't work unless you manually switched to the clang compiler, which is fixed on the git version.