As part of the Azure SQL DB and SQL Server product team at Microsoft, I recently authored an article on the SQL Security blog titled “Best Practices for Moving Data Encrypted with Always Encrypted”. As the title suggests, the article summarizes best practices to follow when moving data encrypted with SQL Server 2016’s new Always Encrypted feature.
If you are trying to run the sample programs from “Inside Windows Debugging” on Windows 10, you are at the right place. This post will save you troubleshooting time.
- Install the WDK (Windows Driver Kit) from MSDN downloads.
- Locate the binary “NMake2MSBuild.exe” in your installation directory.
- Run “NMake2MSBuild.exe dirs” (where “dirs” is the top-level file in the book’s sample code directory).
This will generate the appropriate Visual Studio solution file in the code directory. When you build this solution, some projects might fail with “warning treated as error”. Disable this with the “Treat Warnings as Errors” option in Visual Studio.
I am an Indian (desi) living in the United States and working in technology. Most (all?) of my friends are Indians working in engineering positions at top technology companies. All of them are brilliant folks. I happen to be around 6-10 years younger than all of them. They studied at top-tier schools, got the best of jobs and simply aced everything they ever put their hands on.
Here is my general observation: all of them started off brilliant and hungry. After years of working the same job, ‘work’ is now just ‘work’ and nothing more. It is definitely not a passion. This does not mean that they suck at their jobs. They are pragmatic, efficient and highly skilled engineers.
This scenario certainly does not apply only to Indians and is perhaps true for any worker in technology (any job?). It also makes a lot of sense. You have a family to take care of, you also want to enjoy your life and fulfill other responsibilities. Life is definitely more than just work/professional life. You simply have more commitments.
That makes me wonder:
Should someone with 10 years of experience be as obsessed with their work as a fresh graduate? Is it even possible to share a similar enthusiasm after years of experience?
I think it boils down to personal temperament and choices. There is no right or wrong answer.
What do you think?
Ghazals are losing prominence in India. To the best of my understanding, people of my generation (and the coming ones?) simply cannot relate to ghazals. Everyone finds the music too slow and the lyrics too cheesy. However, sometimes you just want to slow down and appreciate the music with childlike innocence. To me, ghazals bring back the nostalgic 90s childhood I spent in India. :)
I will end this post with a ghazal I recently stumbled upon and fell in love with.
चांद के साथ कई दर्द पुराने निकले,
कितने गम थे जो तेरे गम के बहाने निकले
(With the moon, many old pains came out; how many sorrows there were that emerged under the pretext of your sorrow.)
फ़सल-ए-गुल आई फ़िर एक बार असीनाने-वफ़ा,
अपने ही खून के दरिया में नहाने निकले
(The season of flowers came once again, and the faithful set out to bathe in a river of their own blood.)
दिल ने एक ईंट से तामीर किया हसीं ताजमहल,
तुने एक बात कही लाख फसाने निकले
(From a single brick, the heart built a beautiful Taj Mahal; you said one thing, and a hundred thousand tales emerged.)
दश्त-ए-तन्हाई ये हिजरा में खडा सोचता हुँ,
हाय क्या लोग मेरा साथ निभाने निकले ?
(Standing in the desert of loneliness, in separation, I wonder: ah, what people set out to stand by me?)
The song Kinara by Atif Aslam and Riaz Ali Khan from Coke Studio Pakistan is simply amazing. I am in love with the part from 3:52 onwards. Listen to it here.
What do you do when you are in love with a song? Especially if it is one of those not-so-popular songs (at least in my friend circle)?
You write a blog post, hope that people stumble onto the page (thanks, Google) and share your love for the song. Say hi in the comments if you love the song! :)
Both papers are quite fun to read! The presentation aims to summarize both papers.
I recently took a class on graph mining. In this post, I am aggregating all the lecture topics, which make up a complete four months’ worth of material. I hope this serves as a good archive of hand-picked topics across several different areas of graph mining.
Thanks to Dr. Kamesh Madduri for this course!
Slides from Georgia Tech Network Science class
The paper that introduces the modularity measure : Newman, Fast algorithm for detecting community structure in networks, 2003
Newman presents the divisive spectral approach to modularity maximization : Newman, Modularity and community structure in networks, 2006
The agglomerative clustering based CNM algorithm for modularity maximization : Clauset, Newman, Moore, Finding community structure in very large networks, 2004
Introduces the Louvain algorithm, a “multilevel” method for modularity maximization : Blondel et al., Fast unfolding of communities in complex networks, 2008
A fast and simple algorithm for community detection that doesn’t explicitly try to maximize modularity : Raghavan, Albert, Kumara, Near linear time algorithm to detect community structure in large-scale networks, 2007
Comparative evaluation of community structure in a collection of large-scale networks : Leskovec et al., Community structure in large networks: natural cluster sizes and the absence of well-defined clusters, 2008
Various implementations for the partitioning and community detection problems were compared here. Note that graph partitioning and community detection are two different problems. The 10th DIMACS Implementation Challenge: Graph Partitioning and Graph Clustering
Winner of the DIMACS challenge for community detection. If you had to pick one implementation for non-overlapping community detection, this would be it. Ovelgonne and Geyer-Schulz, An ensemble learning strategy for community detection
Review paper on overlapping community detection. Xie et al., Overlapping community detection in networks: the state of the art and comparative study, 2011
One possible algorithm for overlapping community detection. Yang and Leskovec, Defining and evaluating network communities based on ground-truth, Proc. ICDM 2012.
A very nice exposition of this problem. An extension of the above paper. Yang, Community Structure of Large Networks, PhD dissertation, 2014.
Miscellaneous analysis algorithms
The preferential attachment model for power-law networks : Barabasi and Albert, Emergence of scaling in random networks, 1999
Model for the small-world phenomenon : Kleinberg, The small-world phenomenon: an algorithmic perspective, 2000
Web graph modeling : Kleinberg et al., The web as a graph: measurements, models, and methods, 2000
Observations about the internet router-level topology : Doyle et al., The “robust yet fragile” nature of the internet, 2005
A model for power-law random graphs : Aiello, Chung, Lu, A random graph model for power law graphs, 1999
An introduction to exponential random graph models (ERGMs) : Robins et al., An Introduction to exponential random graph models for social networks, 2007
Graph storage and analysis frameworks
Graph compression & visualization
A couple of months back, I gave an introductory talk on Apache Spark. Spark, designed at Berkeley’s AMPLab, is a fast, general engine for large-scale data processing. This slide deck covers the background behind Apache Spark and discusses two academic papers, Spark: Cluster Computing with Working Sets and Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, that provide deep insights into the architecture. There is also a sneak peek of BlinkDB, which I previously covered in depth in another post.
Here are the embedded slides:
If you are reading this, chances are that you are wise enough not to use “123456” or “dragon” as your password. Even for the folks who are somewhat paranoid about online security and believe they are indeed using “strong” passwords, there is still room for improvement (with a better understanding of the science behind it). This post scratches the surface of the state of the art in research on choosing a secure password. The idea is to document the key lessons and provide pointers to academic literature for diving in deeper.
Our assumption for the “threat model”: we assume that the attacker has obtained access to a list of hashed passwords. They can launch offline attacks at their leisure in an attempt to crack these passwords. Also, most users tend to reuse passwords across different accounts, which helps an adversary maximize the possible damage.
How strong is your password?
A strong password is not easily “guessable”. Thus the time needed for some of the most efficient password-cracking algorithms to reveal the plaintext password can quantify a password’s strength.
Here are some of the findings from the security community you can use for a quick self-assessment. We discuss all these issues in more detail below.
- Any dictionary-based password is inherently insecure.
- As a rule of thumb, a longer password is more secure. That said, it is still quite possible for a user with the best of intentions to choose a long (non-dictionary) password that is nevertheless easy to break.
- Passwords that follow a “predictable” structure are less secure.
- Many websites enforce a password expiration policy. When choosing a new password, most users tend to generate it incrementally from the old one. Most of the time, these new passwords are a few simple tweaks on the old ones. (Almost everyone does this, right?)
So how do the bad guys exploit the weakness?
In an ideal world, every user chooses a perfectly random password. Perhaps something like “cmMs&59my@6Ue*T” or “3%RaKYn5zMBB&9P”. Randomness is indeed our friend. There is just one small flaw: these passwords are incredibly hard to remember.
Not so surprisingly, dictionary-based passwords and attacks are the most common. Any attacker will try to guess the password by initially using a set of dictionary words. Even most lengthy non-dictionary passwords usually have a predictable structure with a non-uniform (in simple words: predictable) distribution of characters. Thus the most efficient cracking tools use heuristics and do not explore all possible passwords (brute force).
They instead use clever techniques like:
Markov Models
“For example, in English, ‘w’ is more likely to be followed by ‘h’ than by another ‘w’. The concept can be extended to character sequences of arbitrary length, called n-grams; the 3-gram (or tri-gram) ‘thr’ is far more likely to be followed by a vowel than by a consonant. By training the Markov chain on known password lists, dictionaries, or both, these distributions can be estimated and used to generate a list of possible passwords that is significantly more effective than random guessing or publicly available programs like John the Ripper.”
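To make the idea concrete, here is a toy sketch (not code from any real cracking tool) of training a character-bigram Markov model on a small wordlist and using it to score candidates; the wordlist and scoring details are illustrative assumptions:

```python
from collections import defaultdict
import math

def train_bigrams(wordlist):
    """Count character-bigram transitions over a training wordlist."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in wordlist:
        padded = "^" + word  # "^" marks the start of a password
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts

def log_likelihood(counts, word):
    """Score a candidate: a higher (less negative) score means the string
    looks more 'typical', so a Markov-guided cracker reaches it earlier."""
    padded = "^" + word
    score = 0.0
    for a, b in zip(padded, padded[1:]):
        total = sum(counts[a].values())
        seen = counts[a][b]
        # Additive smoothing: unseen transitions are penalized, not impossible.
        score += math.log((seen + 1) / (total + 256))
    return score

# Toy training set; real attacks train on large leaked password corpora.
model = train_bigrams(["password", "sunshine", "whatever", "dragon"])
# A dictionary-like string scores far higher than a random one.
print(log_likelihood(model, "what") > log_likelihood(model, "x9#q"))
```

An attacker would enumerate candidates in decreasing likelihood order, which is exactly why human-typical character patterns are a liability.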
Probabilistic Context Free Grammars
“This algorithm is based on the observation that passwords tend to have predictable “structures”. The structure of a password is defined as the way in which the password can be broken into strings (or tokens) of letters, digits, and symbols (e.g. S1U1L6D2 represents a special character followed by an uppercase letter, then 6 lowercase letters, and ending with two digits). PCFG attacks are highly effective because when a password composition policy requires that a digit or symbol be included in a password, users are far more likely to append the required character to an existing password rather than place it in the middle. Similarly, uppercase letters are predominantly used at the beginning of an alphabetic string. By tabulating the number of occurrences of each distinct structure found in a training set, an attacker can gain valuable insight into the likely distribution of structures of the passwords.”
Thus even with strict password policies enforced by websites, PCFG based attacks can be highly lethal, exploiting the bias users have when choosing passwords.
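The structure extraction itself is simple; here is a toy sketch (training passwords are made up for illustration) of tokenizing passwords into structure strings and tabulating them:

```python
import itertools
from collections import Counter

def structure(password):
    """Tokenize a password into its PCFG 'structure' string, e.g.
    '!Monkey99' -> 'S1U1L5D2' (symbol, uppercase, lowercases, digits)."""
    def klass(c):
        if c.islower():
            return "L"
        if c.isupper():
            return "U"
        if c.isdigit():
            return "D"
        return "S"  # everything else counts as a symbol
    return "".join(
        f"{k}{len(list(run))}"
        for k, run in itertools.groupby(password, key=klass)
    )

# Tabulating structures over a (toy) training set reveals the user bias:
# required digits/symbols overwhelmingly land at the end of the password.
training = ["password1", "Monkey99!", "letmein!", "Summer2015"]
print(Counter(structure(p) for p in training).most_common())
```

A real attack would learn probabilities for each structure and for the token fillers, then emit guesses in decreasing probability order.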
Transformation based Algorithmic Attacks
These attacks, when targeted at an individual user, make small modifications to that user’s existing leaked passwords, exploiting their incremental nature and making the attacks more efficient.
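A toy sketch of the idea (the mangling rules below are a small illustrative subset, not taken from any real tool): given one known old password, generate the incremental tweaks users typically make.

```python
def mangle(old):
    """Generate common incremental variants of a known old password."""
    variants = set()
    # Increment a trailing digit: "summer15" -> "summer16".
    if old and old[-1].isdigit():
        variants.add(old[:-1] + str((int(old[-1]) + 1) % 10))
    # Append common suffixes users reach for when forced to change.
    for suffix in ("1", "!", "123"):
        variants.add(old + suffix)
    # Toggle the case of the first letter.
    if old and old[0].isalpha():
        variants.add(old[0].swapcase() + old[1:])
    return variants

# An attacker tries these variants first -- far cheaper than brute force.
print(sorted(mangle("summer15")))
```

Trying a handful of such variants per leaked password is dramatically cheaper than any exhaustive search, which is why incremental password updates buy so little security.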
Choosing a strong password has quite a few associated nuances. It is simply not a choice between a shorter and a longer passphrase. It takes a little more thought and effort to ensure our passwords are indeed strong and resistant to offline attacks.
I recently gave an introductory talk on ‘BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data’. BlinkDB, designed at Berkeley’s AMPLab, provides an approximate query engine for running interactive SQL queries on large volumes of data. Kudos to Sameer Agarwal for his great work!
Here are my slides covering an overview of BlinkDB: