The Official Klout Blog

December 9th, 2011 by Felipe Oliveira and Derek Wollenstein

At Klout, we love data and, as Dave Mariani, Klout’s VP of Engineering, stated in his latest blog post, we’ve got lots of it! Klout currently uses Hadoop to crunch large volumes of data, but what do we do with that data? You already know about the Klout score, but I want to talk about a new feature I’m extremely excited about: search!

Problem at Hand
I just want to start off by saying: search is hard! Yet the requirements were pretty simple: we needed a robust solution that would let us search across all scored Klout users. Did I mention it had to be fast? Everyone likes to go fast! The problem is that 100 million people have Klout (and that was this past September, an eternity in social media time), which means our search solution had to scale, and scale horizontally.

So how did we accomplish that?

Share Nothing and Don’t Block
We use Node.js in our front end to help scale to thousands of concurrent users, and we follow the same philosophy in our search backend. Given the size of our dataset and its substantial growth rate, we needed a search solution that would let us scale horizontally. On the application side, we wanted a stateless Web layer, not only for performance but also for manageability. So: share nothing and block as little as possible!

Let’s Play! and be “cool, bonsai cool”
The technology stack we chose to address the problem was ElasticSearch and the Play! Framework. Why that stack? At Klout, we like to choose the right tool for the job, regardless of the platform it runs on or the company behind it. We chose ElasticSearch and Play! because both were designed around fast, non-blocking IO, both provide powerful infrastructure, and both are easy to extend. These tools help us build powerful search now and keep improving it to give you more relevant results.

ElasticSearch is a powerful, scalable, distributed search solution built on strong foundations like JBoss Netty and Apache Lucene. Lucene, a personal favorite of mine, was created by Doug Cutting, who has had a huge impact on many tools we use at Klout; he is also the creator of Hadoop (and Nutch, for that matter!). Lucene is a search library, more than 10 years old, that provides powerful search capabilities such as relevancy ranking, fuzzy matching, wildcards, proximity operators, fielded searching, spell-checking, multilingual support and all that jazz, all while staying completely portable since it runs on the JVM. Most important, it’s blazing fast!

ElasticSearch uses JBoss Netty as its network library for async, non-blocking IO. In a traditional blocking IO model, performing a search across multiple shards would be extremely expensive. We could query the shards serially, meaning search would get slower as our data grew, or query them in parallel threads, which would require ever-increasing processing resources. Netty allows ElasticSearch to retrieve results from multiple search nodes in parallel, with no threads blocked waiting for the responses.
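
To make the serial versus non-blocking trade-off concrete, here is a minimal sketch of the scatter-gather idea. It is written in Python with asyncio purely for illustration; it is not ElasticSearch or Netty code, and the shard contents, the `SHARD_LATENCY` value, and all function names are made up for the example. Each simulated shard takes a fixed amount of time to answer, so the serial version pays that cost once per shard while the concurrent version pays it roughly once in total, on a single thread.

```python
import asyncio
import time

# Hypothetical shard contents: each shard holds (user, score) pairs.
SHARDS = [
    [("alice", 71.0), ("bob", 48.5)],
    [("carol", 80.2), ("alina", 55.1)],
    [("dave", 62.3), ("alan", 39.9)],
]

SHARD_LATENCY = 0.05  # simulated network round trip per shard, in seconds


async def search_shard(shard, prefix):
    """Simulate one non-blocking shard query: wait on IO, then filter locally."""
    await asyncio.sleep(SHARD_LATENCY)  # stands in for a network call
    return [(name, score) for name, score in shard if name.startswith(prefix)]


async def search_serial(prefix):
    """Query shards one after another: latency grows with the shard count."""
    hits = []
    for shard in SHARDS:
        hits.extend(await search_shard(shard, prefix))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


async def search_scatter_gather(prefix):
    """Query all shards concurrently on one thread, then merge by score."""
    per_shard = await asyncio.gather(*(search_shard(s, prefix) for s in SHARDS))
    hits = [hit for shard_hits in per_shard for hit in shard_hits]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


serial_hits, serial_secs = timed(search_serial("al"))
parallel_hits, parallel_secs = timed(search_scatter_gather("al"))
```

Both calls return the same merged, score-ordered hits; only the elapsed time differs. The point of the sketch is that the concurrent version does not need one thread per shard to get there.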

We used the Play! Framework for the Web layer; it also uses JBoss Netty as its network library. To find out why we like this framework so much, watch my Dreamforce presentation from this past September here in San Francisco, CA: “Introducing Play! Framework: Painless Java and Scala Web Applications”. Just recently, Play! joined Typesafe, the company behind Scala, as an official part of its Scala-based technology stack and as the provider of its Web solution for Scala.

Akka is also part of Typesafe’s stack. It provides an event-driven, self-healing concurrency platform for the JVM, based on an Erlang-style actor model. In summary, Akka helps Klout’s search go fast! We have actors for the different searches we support, and messages are dispatched to their mailboxes as Play’s controller actions are invoked. Akka actors, which are pretty similar to Scala actors, let us run searches in parallel with little effort, minimizing overall response time and giving our users the best experience possible.
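
For readers new to the actor model, here is a toy sketch of the pattern described above: one actor per kind of search, each with its own mailbox, and a request handler that sends messages and gathers the replies. It is written in Python with asyncio for illustration only and does not use or mirror Akka's API. The `SearchActor` class, the "users" and "topics" actors, and their sample documents are all hypothetical.

```python
import asyncio


class SearchActor:
    """A toy actor: private state, one mailbox, messages handled one at a time."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents      # state owned by this actor only
        self.mailbox = asyncio.Queue()

    async def run(self):
        """Process mailbox messages in order until a stop signal arrives."""
        while True:
            message = await self.mailbox.get()
            if message is None:         # stop signal
                break
            query, reply = message
            await asyncio.sleep(0.01)   # stands in for the real search call
            matches = [doc for doc in self.documents if query in doc]
            reply.set_result((self.name, matches))

    def ask(self, query):
        """Put a message in the mailbox and return a future for the reply."""
        reply = asyncio.get_running_loop().create_future()
        self.mailbox.put_nowait((query, reply))
        return reply


async def handle_request(query):
    """Plays the role of a controller action: fan out to actors, gather replies."""
    actors = [
        SearchActor("users", ["felipe", "derek", "dave"]),
        SearchActor("topics", ["search", "hadoop", "scala"]),
    ]
    tasks = [asyncio.create_task(actor.run()) for actor in actors]
    replies = await asyncio.gather(*(actor.ask(query) for actor in actors))
    for actor in actors:
        actor.mailbox.put_nowait(None)  # ask each actor to stop
    await asyncio.gather(*tasks)
    return dict(replies)


results = asyncio.run(handle_request("a"))
```

The handler never touches an actor's documents directly; it only sends messages and waits for replies, which is what lets the searches run side by side without shared state.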

If you are down to Play!, come join us, and follow us on Twitter at @_felipera and @dwollen.

Happy Searching!

This entry was posted on Friday, December 9th, 2011 at 7:42 am and is filed under engineering. You can follow any responses to this entry through the RSS 2.0 feed.

  • http://twitter.com/jmussuto Jorge A. Mussuto

    #FF as @_felipera and @dwollen. #klout #search

  • Anonymous

    Thank you for all these explanations. However, it remains to know the guidelines that permit the calculation of your Klout index. Indeed, except for all the details that you have listed, we do not know more about the differences between the networks, what are the different coefficients, etc …
    Unfortunately, there are still inconsistencies: in my case, my network increases as my index decreases!
    A little more transparency would be ideal …

  • http://twitter.com/SteveOnTech Steven Young

    This post seems to be more targeted towards those who are interested in the driving technology behind Klout, not the algorithm behind the score.

    I’m sure Klout is well aware of the lack of transparency towards score. It is still in “Beta” and understandable that giving such details could skew the results of data being collected. “Gaming” the score comes to mind.

    As a developer and student looking in to these new emerging technologies, I have found this post very insightful. Thanks a lot for taking the time to break down the tech. I look forward to future posts like this.

  • http://geeks.aretotally.in/ Felipe Oliveira

    That’s correct, we are working hard at Klout to make our score more transparent. Some very exciting things are coming up, stay tuned!

  • Charlie Liang Yuan

    You guys are awesome! Keep up the great work!

  • http://www.blurbpoint.com/ Internet Marketing Company

    Now klout is on the path of providing more best things to its users and on right direction the klout is! Great .

  • http://geeks.aretotally.in/ Felipe Oliveira

    Thank you very much! We are working hard to do just that.

  • http://www.happywivesclub.com Fawn

    Can you help me with something?  Klout says my main activity being tracked is Twitter.  But I use Twitter as my main social media tool and received hundreds of comments each day.  But somehow Klout isn’t able to see that.  Is there a reason?  When I login to Klout using my FB account, it shows they are successfully connected.  But when I go to my score analysis, it tells me I still need to connect my facebook account. I’m confused. Please help.  I’m at Klout.com/#/happywivesclub.  Thank you in advance for your response.

  • http://www.happywivesclub.com Fawn

    Oh yes, and my Facebook fan page is facebook.com/HappyWivesClub.  Thanks!

  • http://www.happywivesclub.com Fawn

    Sorry, I meant to say I use Facebook as my main social media tool not Twitter.  You probably figured that out but I wanted to clarify that point.  Thanks!
