Nitish Upreti

A daydreamer exploring Algorithms, Software Systems Engineering and Data Science.

Introducion to Apache Spark

| Comments

A couple of months back, I gave an introductory talk on Apache Spark. Spark, designed at Berkeley’s amplab is a fast general engine for large-scale data processing. This slide deck covers the background behind Apache Spark and discuss two academic papers Spark: Cluster Computing with Working Sets and RDD: A Fault-Tolerant Abstraction for In-Memory Cluster Computing that provide deep insights into the architecture. There is a also sneak peek of BlinkDB which I previously covered in-depth in an another post.

I am also really excited to be attending Spark Summit East in New York this March. Special thanks to folks at Databricks for sponsoring my ticket! :-)

Here are the embedded slides :

How Secure Is Your Password ?

| Comments

If you are reading this, chances are that you are wise enough not to use “123456” and “dragon” for your password. Even for the folks who are somewhat paranoid about online security and believe they are indeed using “strong” passwords, there is a still a chance of improvement (with a better understanding of the science behind). This post scratches the surface on “state of the art research for choosing a secure password”. The idea is to document the key lessons and provide pointers to academic journals to dive in deeper.

Our Assumption for the “Threat Model” : We assume that the attacker has obtained access to a list of hashed passwords. They can easily launch offline attacks in their own sweet time as an attempt to crack these passwords. Also most users tend to reuse passwords across different accounts, which helps an adversary maximize the possible damage.

How strong is your password?

A strong password is not easily “guessable”. Thus the time needed for some of the most efficient password-cracking algorithms to reveal the plaintext password can quantify a password’s strength.

Here are some of the findings from the security community you can use for a quick self assessment. We discuss the all issues more clearly subsequently.

  1. Any dictionary based password is inherently insecure.
  2. As a thumb rule, having a longer password makes it more secure. That said, it is still quite possible that a user with the best of intentions chooses a long password (non dictionary) which is still easy to break
  3. Password that follow a “predictable” structure are more insecure.
  4. Multiple websites force a password expiration policy. Most of the users when choosing a new password, tend to generate future passwords incrementally based on old passwords. Most of the time, these new passwords are a few simple tweaks on the old ones. (Almost everyone does this, right?)

So how do the bad guys exploit the weakness?

In an ideal world, every user chooses a perfectly random password. Perhaps something like : “cmMs&59my@6Ue*T” or “3%RaKYn5zMBB&9P”. Randomness is indeed our friend. There is a just small flaw, these password are incredibly hard to remember.

Not so surprisingly, dictionary based passwords and attacks are most common. Any attacker will try to guess the password by initially using a set of dictionary words. Even most of the lengthy non dictionary passwords usually have a predictable structure with a non-uniform (in simple words: predictable) distribution of characters. Thus the most efficient cracking tools use heuristics and do not explore all possible passwords(brute force).

They instead use clever techniques like :

Markov Chains

“For example, in English, ‘w’ is more likely to be followed by ‘h’ than by another ‘w’. The concept can be extended to character sequences of arbitrary length, called n-grams; the 3-gram (or tri-gram). ‘thr’ is far more likely to be followed by a vowel than by a consonant. By training the Markov chain on known password lists, dictionaries, or both, these distributions can be estimated and used to generate a list of possible passwords that is significantly more effective than random guessing or the publicly available programs like John the Ripper”[1]

Probabilistic Context Free Grammars

“This algorithm is based on the observation that passwords tend to have predictable “structures”. The structure of a password is defined as the way in which the password can be broken into strings(or tokens) of letters, digits, and symbols (e.g. S1U1L6D2 represents a special character followed by an uppercase letter,then 6 lowercase letters, and ending with two digits). PCFG attacks are highly effective because when a password composition policy requires that a digit or symbol be included in a password, users are far more likely to append the required character to an existing password rather than place it in the middle. Similarly, uppercase letters are predominantly used at the beginning of an alphabetic string. By tabulating the number of occurrences of each distinct structure found in a training set, an attacker can gain valuable insight into the likely distribution of structures of the passwords.”[1]

Thus even with strict password policies enforced by websites, PCFG based attacks can be highly lethal, exploiting the bias users have when choosing passwords.

Transformation based Algorithmic Attacks

These attacks when targeted on an individual user, make small modifications to existing leaked passwords, exploiting incremental nature and making the attacks more efficient.

Summing Up

Choosing a strong password has quite a few associated nuances. It is simply not a choice between a shorter and a longer passphrase. It takes a little more thought and effort to ensure our passwords are indeed strong and resistant to offline attacks.

Suggested Readings :

References : 1

BlinkDB: Queries With Bounded Errors and Bounded Response Times on Very Large Data

| Comments

I recently gave an introductory talk on ‘BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data’. BlinkDb, designed at Berkeley’s amplab provides an approximate query engine for running interactive SQL queries on large volumes of data. Kudos to Sameer Agarwal for his great work!

Here are my slides covering an overview of BlinkDb :

How Does Your Company Make Money?

| Comments

Sometimes as a developer working in a very large company, it is very common to be oblivious on “how your company really makes money”? How do the folks in sales actually woo new customers? What exactly do these customers want? What sort of competition exists in the market? How does your company differentiate itself from its competitors?

Recently I was having a discussion with a Program Manager at Microsoft where it all stuck me. A chain of thoughts further took me down this lane : If I want to spend three to four years of my initial career working as a developer in a huge company, I should do some business reading to build a background in something other than pure engineering.

A quick Google search took me to this page : . I am reading the book now and hope that it serves as a gateway for me to dig in deeper further down the lane. Bon Voyage! ;)

Passenger - Rolling Stone

| Comments

Last weekend while working from Starbucks, I heard this song and have been in love with it since then.

The lyrics are beautiful…

Sometimes I feel I’m going nowhere
Sometimes I’m sure I never will
She said it’s cause I’m always moving
I never notice cause I never stand still

Sometimes I feel like I’m falling
Falling fast and falling free
She says my darling you’re not falling
It always looked like you were flying to me

But I fear I’ve grown a rolling stone inside of me
She said oh don’t you know
that rolling stones stop at the sea
And that’s where I’ll be

Sometimes (I feel/) I’m sure I know no one
A thousand faces but no names
She said my love you do know someone
Oh and I know you back just the same

But I’m scared I said, what if this stone don’t slow down
Just be aware she said, what goes up will come down
and when you do I’ll be around

Oh when I’ve dragged this rolling stone across this land
I’ll make sure I leave this stone in her hands
For we both know too well, rolling stones turn in to sand
if they don’t find a place to stand

Hope the song vibrates with your soul :)

Say NO to Notifications

| Comments

Technology has always been my bread and butter. I strive to be an ‘early adopter’, trying to integrate every piece of meaningful technology into my daily life. However, since the past few months I have started thinking deeply about productivity and how our interactions with cellphones, wearable gadgets, internet among others affect it. After reading a bunch of literature around, there is one very obvious and clear lesson :

Say NO to Notifications

Yes, I am referring to the luring Facebook, Quora, LinkedIn, < Your Favorite Messenger App >, < Your Favorite News App > (the list continues)… on your smartphone, tablet and email. They are always on the hook to lure you in with further engagement. A very simple key to productivity thus is turning them off.

I have had a very positive experience after turning off most of the notifications I was subscribed to. As of now, I only have two major applications with notifications : Whatsapp and Twitter on my cellphone. Have cut down on everything else and it does feel good.

Wish you a productive day! :)

Beauty of Tail Recursion

| Comments

I always overlooked Tail Recursion whenever I encountered it. Also, I am ashamed to admit that for a very long time I had a very wrong idea of what exactly was being optimized by Tail Recursion.

Today while following FP with Scala on coursera, I finally took time to think through the idea which cleared my everlasting misconception.

I will quickly and lazily summarize the difference between the two flavors ( recursion with and without tail call optimization) here.

Factorial without Tail Call optimization
def factorial(x : Int) : Int = if ( x == 0) 1 else x * factorial(x-1)
Factorial with Tail Call optimization
import scala.annotation._
def factorial(x:Int):Int = {

       def factorial(accumulator: Int, x: Int) : Int = {
         if(x == 1 || x==0)
            factorial(x * accumulator, x - 1)


Consequence of second implementation : If a function calls itself as its last action, the function’s stack frame can be reused, so we save lots and lots of memory!

Now that I have a blog post on Tail Recursion, I am free from the guilt of misunderstanding this concept for so long. :D

Interview Preparations

| Comments

So here is the idea : I want to finish solving this book in the upcoming four to five months. The reason is quite evident : Mastering the skills needed for acing technical interviews and then moving on to spend time on things that actually do matter in real life.

Will be doing a lot of problems from the book and document code with my personal notes for the problems here. So without further adieu, time to get cracking! :D


| Comments

Technology is so complex, yet humans have championed it. Are we so smart?

Maybe yes and maybe no, but if studying computer science has taught me something, it is the power of Abstraction.

I can bet my life on the fact that no human alive (and living in near future) can comprehend all that goes inside making our present technology feasible. There are just way too many layers of abstraction in place. However, the beauty of this idea lies in the fact that we can build a simple comprehensible thing and then later take it for granted to build infinitely complicated objects further down the road.

The layers continue to wrap around and the marvel is technology.

What I Want to Do in Computer Science?

| Comments

In retrospection, I have come a long way from being fascinated by my first computer (remembered as the Cyrix machine machine) to being a graduate CS&E student pursuing the field for life. Last year was one of those years, where I actually understood the deeper philosophy behind the subject and felt a little more enlightened.

Computer Science is a huge field with multiple different existing specializations. Pondering about it, I can only imagine my calling in this fascinating area full of opportunities. As of now, I have several questions waiting to be answered and here are some of them :

What are my skills most suitable for?

What do I want to pursue with my own interests?

What should I spend most of time on to improve my skills further ?

How can I have the maximum impact (with my own personal strengths and weakness)?

This post is documenting what I have in mind as of today. It will be interesting to see how my career shapes up in future and then coming back to it. :)

As of now these are the areas I have been / am most active in (I am well aware of the fact that there are multiple overlaps between these areas)

1. Software Application Engineering

I almost spent my entire undergraduate on application development. However, my interest in this area has reduced dramatically.

2. Software Systems Engineering

For now this area appeals the most to me, especially development of software infrastructure for data intensive applications requiring massive amount of scale.

3. Algorithm Design

I started exploring algorithms beginning from my final undergraduate year and have a good grasp of the fundamentals. However, I feel I am still lacking the formalism and rigor needed in this area.

4. Data Science

I am just a newcomer in this field but really find it interesting. Starting from next semester, I will be taking lots of subjects that will hopefully help me understand this subject deeper.

5. Management and Technology Entrepreneurship

I believe (with my own self judgment, which could be miserably incorrect) that my awareness on how technology industry works is relatively good for my limited real world experience. I wonder, how would I fare in the role of a technology manager or leader for a team instead of an engineer or software architect? If someday I decided to get an MBA, how would things shape up? This entire scenario seems to be far ahead as of now but is quite interesting to imagine. Joining an early age startup or trying something on my own is among these possibilities.

Have you ever been in such a similar position ? Thoughts and Suggestions are welcomed.