Blogspot - sujitpal.blogspot.com - Salmon Run

Degrees of Separation from Kevin Bacon using Cascading 23 Aug 2013 | 11:41 am

Motivation: This post came about as a result of two events. First, I finished reading Paco Nathan's "Enterprise Data Workflows with Cascading" book (see my review on Amazon), and second, I started lea...

Embedding Concepts in text for smarter searching with Solr4 8 Aug 2013 | 07:57 am

Storing the concept map for a document in a payload field works well for queries that can treat the document as a bag of concepts. However, if you want to consider the concept's position(s) in the doc...

Dictionary Backed Named Entity Recognition with Lucene and LingPipe 23 Jul 2013 | 02:33 am

Domain-specific Concept Search (such as ours) typically involves recognizing entities in the query and matching them up to entities that make sense in the particular domain - in our case, the entities...

Porting Payloads to Solr4 18 Jul 2013 | 11:04 pm

This post discusses porting our Payload code, originally written against Solr/Lucene 3.2.0, to Solr/Lucene 4.3.0. The original code is described here and here. It also discusses implementing support f...

Bayesian Network Inference with R and bnlearn 6 Jul 2013 | 05:10 am

The Web Intelligence and Big Data course at Coursera had a section on Bayesian Networks. The associated programming assignment was to answer a couple of questions about a fairly well-known (in retrosp...

Better Bird Strike Visualizations with R and ggplot2 1 Jul 2013 | 04:30 am

Last week I wrote about building some graphs for the FAA Bird Strike Dataset. I used R's built-in graphics capabilities for that work. In this post, I re-do the graphs using the ggplot2 plotting syste...

Bird Strike Visualizations with R 24 Jun 2013 | 06:39 am

One of the assignments at the Introduction to Data Science course at Coursera is to design visualizations using Tableau for the FAA Bird Strike dataset. One big problem (for me) is that Tableau is Win...

Functional Chain of Responsibility implementation in Scala 10 Jun 2013 | 02:48 am

The Chain of Responsibility pattern can be very useful for building configuration driven pipeline style applications. We have made extensive use of this in both our search and indexing pipelines, and ...

MapReduce with Python and mrjob on Amazon EMR 3 Jun 2013 | 01:27 am

I've been doing the Introduction to Data Science course on Coursera, and one of the assignments involved writing and running some Pig scripts on Amazon Elastic MapReduce (EMR). I've used EMR in the p...

Feature Selection with Scikit-Learn 26 May 2013 | 04:20 am

I am currently doing the Web Intelligence and Big Data course from Coursera, and one of the assignments was to predict a person's ethnicity from a set of about 200,000 genetic markers (provided as boo...
