Ode of Code

Sunday, February 26, 2017

GPU Acceleration of Quicksort without Dynamic Parallelisation

Recently I was playing around with CUDA and GPU acceleration of different algorithms. The only limitation of my setup was that, it was not a very new one and hence it didn't support dynamic parallelism with CUDA.

Now, if you know the quicksort algorithm, you will find that the algorithm by itself is "embarrassingly" parallel. That is, place the pivot in it's one position in one step and then sort the two side of the pivot in parallel. Now if you are working on a system which allows you create parallel threads from a section already running in parallel with other threads (Yes, I expanded Dynamic Parallelism in layman's terms, deal with it), then your work with quicksort becomes very simple. But lacking that is a handicap and I was handicapped in this case.

Now, if you lack dynamic parallelism, you need to call all the parallel thread from the parent process only. Writing a parallel quicksort without dynamic parallelism needs the iterative variant of the algorithms (it is pretty obvious, you cannot recursively call threads). Iterative Quicksort works using a stack. In each iteration, the high and low index of the next iterations is stored in a stack. The concept has been explained here. To accelerate this algorithm using GPU, we store the stack in the device memory, so that every thread can access it, and in every iteration, we create threads equal to the number of index pair in this stack. Each thread, then uses the index pair stored at the memory address equal to the thread ID and partitions that region of the array, storing the next index pair in the stack. The important point to be taken care is that the addition of index pair to the stack needs to be synchronised, which is achieved by using atomic add.

The working code for this can be found here.

Sunday, January 15, 2017

ns-3 Shorts: PyViz!!

How do you turn "Viz" on?

Firstly, sorry for sneaking in the Age of Empires reference in a post about an open source software but then moving on. I have been working on ns-3 for quite sometime now, and if you work on a software for sometime, you, at times, start feeling like.... well, *insert your favourite Hackerman Meme here*. I have observed that a lot of new users completely overlook the functionality of PyViz which ns-3 provides. PyViz allows you to visualise your simulation without adding a single line of code, but it needs proper python configuration.

First, to run any simulation with PyViz, you just need to add --vis during execution.

./waf --run second --vis

If your python configuration is correct, you can see visualisation on screen.

To see if your PyViz is enabled, try

./waf configure

And see the output for "PyViz visualizer", it should show either "enabled", or "not enabled (Python Bindings are needed but not enabled)"

Now ns-3 documentation provides a long and detailed explanation of how to download and run ns-3 on any system, but they recommend the installation of the allinone package or installation through bake. Both these methods along side the base ns-3 directory, contains a few other directories including a version of pybindgen. This is the "by the book" method to do it, but if you want to keep several copies of ns-3, installing every time with these processes will have both time and space overheads.

So, here goes the alternative method. The most important dependency of ns-3 which comes in all these installations is pybindgen and, even though it is an optional dependency, it is a necessary requirement to unlock all the python based functionality of ns-3, say, PyViz. So, what should we do to a important dependency? Install it globally. Installing pybindgen globally makes the installation of a new copy of ns-3 very easy.

To install pybindgen globally you can just clone this, and do a python install and you can get your ns-3 work with python without having a pybindgen in the directory one level above it.

*A PG rated hack*

Every version of ns-3 comes with a specific commit hash of the above repository, so your latest globally installed pybindgen might not work for older versions, but you can try to overcome by editing the pybindgen version (you can see your pybindgen version in the output of ./waf configure) in bindings/python/wscript, but this might work for only a few versions.

Now there is a chance that even after the configuration is proper, the PyViz might not show up. That might be because the command line parser has not been added to code.

CommandLine cmd;
cmd.Parse (argc,argv);

Friday, January 13, 2017

So What was Discussed?

A Common Predicament for IRC users!

Clubs and Groups are an integral part of college life. They create a more friendly and open environment among everyone and also help a lot of people. Recently, the computer club, I am a part of, planned to organise online sessions for our juniors. The programme was meant to help out freshers and other juniors with basics of general topics like algorithms, data structures, machine learning and so on. Going by the fact that it is extremely light weight, we finalised IRC as the platform for discussions.

If you have used IRC before, you will knowing that in IRC, there is no mean of retrieving a conversation if you have missed (maybe because you were too lazy to join on time or your network provider hates you). The only thing you can do in this situation is that some angel who has been connected since the beginning of the session to post the logs. We realised that this might lead to a few people missing a few important messages and then become unable to catch up with their friend who were always connected.

To solve this problem, we came with the idea of an active log of our IRC session and used the naivest way to achieve it. We found our solution in two unlikeliest of allies, with the first being an IRC Bot and the second being Github. We simply built an IRC Bot which can connect and listen to any of our channel and can keep pushing the whole conversation to a Github repository in a periodic manner. As I said, it was an extremely naive solution but it turned out to very effective. We had instances of people attending the entire session by reading the active logs as they were unable to connect to IRC for a reason or other.

You can find the implementation of this bot here.

Thursday, January 12, 2017

It's Random

And I'm Lovin' It

I love random numbers. I just love them! It's one of those few things where everything is right and nothing is. Being from the country from where ludo originated and the country where the population pray for hours for a favourable coin toss (yea, we are obsessed with cricket!!!), I feel my love for random numbers is not misplaced. I also love random numbers because of the pristine adrenaline rush, hacking them provides. You just feel happy with every output (Segmentation Fault is not an output). The first computer science course project in my undergraduate course was a game of poker in C. But in the recent past, I have moved past to writing random number generators instead of using one.

Being a computer science undergraduate, I might or might not understand the theory of the generation of random numbers (I have spent quite sometime in understanding Marsaglia's KISS, Mersenne Twister, Blum Blum Shub etc.), but it cannot stop me from hacking them.

One of my favourite implemented PRNG comes with a story. It was course of Information Security and we had to propose a course project. Now, I will confirm your doubts, whatever story you have heard of Engineering students and deadlines...... they are all true, we are just never before time. So, on the last day of proposal, I, based on my professors dislike of every single PRNG present in public domain, gave a new PRNG as a proposal. Now, the description of the PRNG I implemented can be found here. It contains entropy generation modules and finally a Blum Blum Shub implementation.

Recently, as a part my parallel programming assignment, we were asked to make a thread safe PRNG. I again went with my trusted Blum Blum Shub algorithm. Now, since I am trying to make this blog a technical one (What? You don't trust me? I am hurt), I will explain how I did it.

The basic formula of Blum Blum Shub algorithm is:

x[i] = (x[i-1]*x[i-1]) mod M

where M is a product of two large randomly generated prime numbers.

"Large Randomly Generated Prime", every word of this phrase decides the effectiveness of the PRNG from now on. Generating these has been one of the most difficult problem I faced in my student life and I probably solved it. The trick I use is to do this is a rather simple one. I generate two large numbers and perform multiple iterations of Fermat's primality test on every number greater than this till I get one. This is a slow process and can generate only what is called "Probable Primes" (I said, I only probably solved it). Also, you can speed it up a bit by checking for only the numbers in the form of 6n-1 and 6n+1. Once we get this, we can follow the generation formula.

As for the thread safe part, just make the seed private to thread and calculate the seed from the thread state (source code can be found here and here).

I am not very sure if these are the best way to do it and am open to suggestions.

Friday, January 6, 2017

Why Use One?

Why Using a Single Language is not always the Best Idea!

A good code is meant to solve a problem fast and efficiently. A code needs a language to be written in (duh!). Every language out there has its own advantages. As a student, I always associated these advantages with my favourite languages:

C - Fastest runtime, pointers, easiest for systems level programming (always preferred raw system calls), great for learning. I also considered it best for structures like trees and heaps (everyone have their opinions).
C++ - C + STL + Objects Oriented Programming
Java - Applets, Swing and AWT. Also, I have always found Java great for string handling
Python - Smaller codes, very useful libraries and sometimes a great replacement for bash

Many times I have seen people around me struggling with their project because the language fails to solve a specific problem efficiently. For example, Java is great for creating something with a graphical user interface but you lose out on, say, great machine learning libraries of python or even the process management that C provides (there is no easy way to achieve system() of C or os.system() of python, ProcessBuilder comes the closest). So I have always felt that best way to complete a work is to use the languages for the tasks which they do the best.

If we choose to take the path of using multiple languages, the one challenge we will face is to integrate all the chunks. This however can be achieved in multiple ways:

Languages like python and C provides methods to call each other
The subprocess module of python is great for binding
A socket can always come in handy
Pipes always help in the flow
The easiest and maybe the most inefficient, use the file system (😛), write the data on to the file from one and read the same from other.

I hope this helps, this has helped me out several times.

Examples,

ns-3's check-style.py, A Boon

At least for me

In the initial years of my college, I was known for my bad codes. By bad I don't mean that they were wrong or inefficient, but by bad I mean they looked really bad. You can just say I was really a miser when it came to spaces. In India, where the presentation of assignment sometimes makes up for nearly half of the evaluation, I was really losing out.

It was the winter of 2015, when my life changed (*dramatic music*). I had completed my first ever module for ns-3, the DHCP module, and had sent it out for reviews. As expected, the very first review pointed out that the module was no where close to the coding standards of ns-3 contributions 😟. But this one also came with a silver lining, alongside the problem, the reviewer had also mentioned the solution. To check and rectify badly indented contributed codes, ns-3 community has made a solution of its own. In the utils of ns-3 lies check-style.py. This small python script is capable of finding out indentation faults in C like language and can also be used to correct it. The way to use it for has been explained in the official documentation.

The best thing about this script is that its effectiveness is not limited to ns-3. Since, the time I have been introduced to it, I have used this on all my codes. The trick is very simple and apparent. Copy this script to your assignment folder, and follow the same instruction as ns-3 documentation:

./<path to check-style.py> --level=3 --in-place -f <path of file you want to correct>

PS: This was the very first code of ns-3 to which I contributed to.

Pre-requisite - Uncrustify