Nitish Upreti

A daydreamer exploring Algorithms, Software Systems Engineering and Data Science.

My Programming Journey in a Full Circle

| Comments

As a kid I was introduced to computers with Windows 98, programming on Turbo C++ and feeling clever playing with Windows Registry. Growing up, things changed. I was using a macbook, programming in Ruby and messing with up the new cool web stuff.

Now I feel life has taken a full circle and I am back working on Microsoft platform (day job) doing more systems and less of application programming. Like I would have imagined myself as a kid.

However, here is what I have learned : The beauty of programming and playing with computers transcend all the superficial stuff (Operating Systems, Programming Languages, Compilers etc as they are more of less just tools). The pure joy of tinkering with computers and software is forever. :-)

Abstraction Is Good. Magic Is Bad

| Comments

At Microsoft engineers are usually working on very large, sometimes old and inherintly complex codebases. Most of the time when making a code change it is thus impossible to have an absolute knowledge of every component you are touching. As an engineer, I have personally had following extreme experiences :

The Best Case

This is where good old abstraction comes in. You can bid a component to do its thing without knowing all the messy details. You achieve the task at hand effeciently by calling these components, get the job done and move on quickly.

The Worst Case

In this case, Abstraction joins the dark side and turns into magic. Exisiting code appears to work the desired way, but the new code does not. Debugging is super hard. This is because it is hard to just peek into the internals and get a sense of what is going on. The existing framework/platform abstraction is simply just doing a lot of things under the hood at different places. A new engineer struggles to narrow down issues without investing significant time understanding the codebase.

Food for Thought: How do you ensure you have the right set of abstractions but no (black)magic ?

Best Practices for Moving Data Encrypted With Always Encrypted

| Comments

Being a part of the Azure SQL DB and SQL Server product team at Microsoft, I recently authored an article on the SQL Security blog titled “Best Practices for Moving Data Encrypted with Always Encrypted”. As the title suggests, the article summarizes best practices to follow when moving data encrypted with SQL16’s new Always Encrypted feature.

Would love to hear any feedback and suggestions @nitish or comments on the blog post. :)

Running Inside Windows Debugging Sample Programs in Windows 10

| Comments

If you are trying to run the sample programs from “Inside Windows Debugging” in windows 10, you are at the right place. This post will save you troubleshooting time.

  1. Install WDK (Windows Driver Kit) from msdn downloads.
  2. Locate the binary “NMake2MSBuild.exe” in your installation directory.
  3. Run “NMake2MSBuild.exe dirs” (where dirs is the top level file in the book’s sample code directory)

This will generate appropriate VStudio solution file in the code directory. When you build this soltution, some projects might fail with: Warning treated as error. Disable this with: “Treat Warning as Errors” options from Visual Studio.

My Observation of Indians Working in Silicon Valley/Technology

| Comments

I am an Indian (desi) living in United States and working in technology. Most(All?) of my friends are Indians who are working in engineering positions for top technology companies. All of them are brilliant folks. I happen to be around 6-10 years younger to all of them. They studied from top tier schools, got the best of jobs and simply aced everything that they ever put their hands on.

Here is my general observation : All of them started off brilliant and hungry. Over the years with multiple years of experience and working on the same job, ‘work’ is now just ‘work’ and nothing more. It is definitely not a passion. This does not mean that they suck at their jobs. They are pragmatic, efficient and highly skilled engineers.

This scenario would definitely not just apply to Indians and perhaps is true for any worker in technology(any job?). It also makes a lot of sense. You have a family to take care of, you also want to enjoy your life and fulfill other responsibilities. Life is definitely more than just work/professional life. You simply have more commitments.

That makes me wonder :
Should a guy with 10 years or experience be as obsessed about his work as a fresh graduate ? Is it even possible to share a similar enthusiasm after years of experience?

I think it boils down to personal temperament and choices. There is no right or wrong answer.

What do you think?

Ghazal

| Comments

Ghazals are losing prominence in India. To the best of my understanding, people in my generation (and coming?) simply cannot relate to Ghazals. Everyone finds the music too slow and lyrics too cheesy. However, sometimes you just want to slow down and appreciate the music with childlike innocence. To me ghazals bring back the nostalgic 90s childhood I spent in India. :)

I would end this post with a Ghazal I recently stumbled on and fell in love with.


चांद के साथ कई दर्द पुराने निकले,

कितने गम थे जो तेरे गम के बहाने निकले

फ़सल-ए-गुल आई फ़िर एक बार असीनाने-वफ़ा,

अपने ही खून के दरिया में नहाने निकले

दिल ने एक ईंट से तामीर किया हसीं ताजमहल,

तुने एक बात कही लाख फसाने निकले

दश्त-ए-तन्हाई ये हिजरा में खडा सोचता हुँ,

हाय क्या लोग मेरा साथ निभाने निकले ?

Kinara - Coke Studio

| Comments

The song Kinara by Atif Aslam and Riaz Ali Khan from Coke Studio Pakistan is simply amazing. I am in love with the part from 3:52 onwards. Listen to it here.

What do you do when you are in love with a song ? Especially, if it is one of those not so popular song (atleast in my friend circle)?

You write a blog post, hope that people stumble onto the page (thanks Google) and share your love for the song. Say Hi on the comments if you love the song! :)

Cheers !

Readings on Graph Mining

| Comments

I recently took a class on graph mining. In this post, I am aggregating all the lecture topics that make a complete four months worth of material. Hope this serves as a good archive on hand picked topics around several different areas of graph mining.

Thanks to Dr. Kamesh Madduri for this course!

Introduction

Slides from Georgia Tech Network Science class

Link1 and Link2

Chapters 1 and 2 of Networks, Markets, Crowds book

Brandes et al., What is Network Science?

Estrada et al., Complex Networks: an invitation

Lecture blurbs of Barabasi’s Network Science class

Mislove et al., Measurement and Analysis of Online Social Networks, Proc. IMC 2007

Broder et al., Graph structure in the web, Proc. WWW 2000

Meusel et al., Graph structure in the web – revisited, Proc. WWW 2014

Centrality

Centrality chapters in Network Analysis: Methodological foundations book Link1 and Link2

Freeman’s paper on centrality in social networks

PageRank papers : Link1 and Link2

Gleich, PageRank beyond the web, 2014

The Wikipedia page on PageRank has some clarifications about the formula to use

Hoeffding’s inequality (used to analyze randomized algorithms)

Brandes’ Betweenness centrality algorithm paper

Faust, Centrality in affiliation networks

Papers/slides on Harmonic centrality by Boldi and Vigna : Link1 and Link2

Community detection

Fortunato, Community detection in graphs, 2010

The paper that introduces the modularity measure Newman, Fast algorithm for detecting community structure in networks, 2003

Newman presents the divisive spectral approach to modularity maximization Newman, Modularity and community structure in networks, 2006

The agglomerative clustering based CNM algorithm for modularity maximization Clauset, Newman, Moore, Finding community structure in very large networks,2004

Introduces the Louvain algorithm, a “multilevel” method for modularity maximization Blondel et al., Fast unfolding of communities in complex networks, 2008

A fast and simple algorithm for community detection. Doesn’t explicitly try to maximize modularity. Raghavan, Albert, Kumara, Near linear time algorithm to detect community structure in large-scale networks, 2007

Comparative evaluation of community structure in a collection of large-scale networks Leskovec et al., Community Structure in large networks: natural cluster sizes and the absence of well-defined clusters, 2008

Various implementations for the partitioning and community detection problems were compared here. Note that graph partitioning and community detection are two different problems. The 10th DIMACS Implementation Challenge: Graph Partitioning and Graph Clustering

Winner of the DIMACS challenge for community detection. If you had to pick one implementation for non-overlapping community detection, this would be it. Ovelgonne and Geyer-Schulz, An ensemble learning strategy for community detection

Review paper on overlapping community detection. Xie et al., Overlapping community detection in networks: the state of the art and comparative study, 2011

One possible algorithm for overlapping community detection. Yang and Leskovec, Defining and evaluating network communities based on ground-truth, Proc. ICDM 2012.

A very nice exposition of this problem. An extension of the above paper. Yang, Community Structure of Large Networks, PhD dissertation, 2014.

Miscellaneous analysis algorithms

Assortativity

Newman, Mixing patterns in networks, 2003

Latapy, Main-memory triangle counting for very large sparse power-law graphs, 2008

Seshadhri et al., Wedge sampling for computing clustering coefficients and triangle counts on large graphs, 2013.

Tsourakakis et al., DOULION: Counting Triangles in Massive Graphs with a coin, 2009.

Degeneracy/k-core decomposition

Batagelj and Zaversnik’s (Pajek) 2002 k-core algorithm paper:

Sariyuce et al., Streaming algorithms for k-core decomposition, 2013

Rossi et al., Fast maximum clique algorithms for large graphs, 2014

Cohen, Cohesive subgraphs for social network analysis, 2008

Sariyuce et al., Finding the Hierarchy of Dense Subgraphs using Nucleus decompositions, 2014

Network Models

Watts and Strogatz, The Collective dynamics of small-world networks, 1998

Model for small-world phenomenon : Barabasi and Albert, Emergence of scaling in random networks, 1999

The preferential attachment paper : Kleinberg, The small-world phenomenon: an algorithmic perspective, 2000

Another model explaining small-world nature : Kleinberg et al., The web as a graph: measurements, models, and methods, 2000

Web graph modeling : Doyle et al., The “robust yet fragile” nature of the internet, 2005

Observations about the internet router-level topology Aiello, Chung, Lu, A random graph model for power law graphs, 1999

Some results about power-law random graphs Robins et al., An Introduction to exponential random graph models for social networks, 2007

Newman, Structure and function of complex networks, 2003

Chapter 4 on Random graphs from Foundations of Data Science book by Hopcroft and Kannan, 2014

Link prediction

Kleinberg, The link prediction problem for social networks, 2004.

Lu, Zhou, Link prediction in complex networks: a survey, 2010.

Al Hasan, Zaki, A survey of link prediction in social networks 2010.

Wang et al., Human mobility, social ties, and link prediction, Proc. KDD 2011.

Sadilek et al., Finding your friends and following them to where you are, Proc. WSDM 2012.

WTF: the Who to Follow service at Twitter, Proc. WWW 2014 : Link1 and Link2

Chakrabarti et al., Joint inference of multiple label types in large networks, 2014.

Schifanella et al., Folks in folksonomies: social link prediction with shared metadata, 2010.

Dong et al., Link prediction and recommendation across heterogeneous social networks, Proc. ICDM 2012.

Cannistraci et al., From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks, 2014

Subgraph mining

The Wikipedia page on network motifs:

Milo et al., Network motifs: Simple building blocks of complex networks, 2002

Yan, Han, gspan: Graph-based substructure pattern mining, 2002

Kuramochi and Karypis, Finding frequent patterns in a large sparse graph, 2005

Gibson, Kumar, Tomkins, Discovering large dense subgraphs in massive graphs, 2005.

Alon, Yuster, Zwick, Color-coding, 1995.

Ray, Holder, Choudhury, Frequent subgraph discovery in large attributed streaming graphs, 2014.

Pattabiraman et al., Fast algorithms for the maximum clique problem on massive graphs, 2013.

Yang et al., Schemaless and structureless graph querying, 2014.

Applications

Merritt, Clauset, Social network dynamics in a massive online game: network turnover, non-densification, and team engagement in Halo Reach, 2013.

Chapter 6 of Networks, Crowds, and Markets book

Gale, Kariv, Financial networks.

Zhu, Wilkinson, Shreiber, Pan, Large-scale parallel collaborative filtering for the Netflix prize, Proc. AAIM 2008.

Koren, Bell, Volinsky, Matrix factorization techniques for recommender systems, IEEE Computer, 2009.

Natarajan, Dhillon, Inductive matrix completion for predicting gene-disease associations, 2014.

Singh-Blom et al., Prediction and Validation of Gene-Disease Associations Using Methods Inspired by Social Network Analyses, 2013.

Graph storage and analysis frameworks

Bronson et al., TAO: Facebook’s distributed graph data store for the social graph, Proc. ATC 2013.

Curtiss et al., Unicorn: a system for searching the social graph, Proc. VLDB 2013.

Gonzales et al., PowerGraph: distributed graph-parallel computation on natural graphs, Proc. OSDI 2012.

Gonzales et al., GraphX: Graph processing in a distributed dataflow framework, Proc. OSDI 2014.

Graph partitioning

Buluc et al., Recent advances in graph partitioning, 2015

The 10th DIMACS Implementation Challenge: Graph Partitioning and Graph Clustering

Graph compression & visualization

Boldi, Vigna, The WebGraph Framework I: Compression Techniques, Proc. WWW 2014.

Buehrer, Chellapilla, A scalable pattern mining approach to web graph compression with communities, Proc. WSDM 2008

Hu, Efficient and High-quality force-directed graph drawing.

Ahmed et al., Visualisation and Analysis of the Internet Movie Database, Proc. APVIS 2007

Herr, Ke, Hardy, Borner, Movies and Actors: Mapping the Internet Movie Database, Proc. IV 2007

Freebase US film data analytics using GraphLab

Introducion to Apache Spark

| Comments

A couple of months back, I gave an introductory talk on Apache Spark. Spark, designed at Berkeley’s amplab is a fast general engine for large-scale data processing. This slide deck covers the background behind Apache Spark and discuss two academic papers Spark: Cluster Computing with Working Sets and RDD: A Fault-Tolerant Abstraction for In-Memory Cluster Computing that provide deep insights into the architecture. There is a also sneak peek of BlinkDB which I previously covered in-depth in an another post.

I am also really excited to be attending Spark Summit East in New York this March. Special thanks to folks at Databricks for sponsoring my ticket! :-)

Here are the embedded slides :