Wednesday, January 20, 2010

Django, Solr and a few interesting things, Part 1: Solr power!!!!

****Long story alert, tech related, jump to the lessons learned at the end*****

The Beginning

One of the things I am working on now involves Solr, Django and a few other things. Along the way we discovered Haystack.

So the story started when we found a project that generates cash. After talking to the client and looking at their system, we decided to give it a try, and to build it using Django and Solr, among a few other components. Though we had not confirmed that we would offer search in the first place...

To keep it short: we began to build it.

To Serve Just java -jar start.jar

We started with Solr, because it is easy to install anyway. Just grab Solr from
What is really cool about Solr is the example folder inside the Solr folder. That is a fully functional Solr project.
Just cd into the example folder and run java -jar start.jar. Follow the tutorial to use the example folder.

Since we are a lazy bunch, we just copied the example folder into our project folder and modified schema.xml in example/solr/conf/. Since the schema.xml in that folder is usable as it is, we only modified the fields.


The project's original database has quite a lot of tables, so the FIRST STEP is to denormalize the data. Flatten the whole thing.

We decided to use solrpy for our script that loads the data from the database into Solr. That looked OK until we hit a one-to-many relation in the database. In solrpy, each repeated field is put into a list and added to Solr.
This is not the worst issue. When we pull the data back from Solr, the fields are not in order. Thus, it doesn't maintain the structure.
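
To make the flattening step concrete, here is a minimal sketch of turning a one-to-many relation into one flat document with list fields. It only builds the dicts and does not talk to Solr or use solrpy. The author/book tables and every field name are made up for illustration; they are not from our project.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with a one-to-many relation (made-up schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER,
                           title TEXT, year INTEGER);
        INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO book VALUES (1, 1, 'First', 2001), (2, 1, 'Second', 2005);
        """
    )
    return conn


def build_documents(conn):
    """Flatten author/book rows into one dict per author, with list fields."""
    docs = {}
    query = (
        "SELECT a.id, a.name, b.title, b.year "
        "FROM author a LEFT JOIN book b ON b.author_id = a.id "
        "ORDER BY a.id, b.id"
    )
    for author_id, name, title, year in conn.execute(query):
        doc = docs.setdefault(
            author_id,
            {"id": author_id, "name": name, "book_title": [], "book_year": []},
        )
        if title is not None:  # authors without books keep empty lists
            doc["book_title"].append(title)
            doc["book_year"].append(year)
    return list(docs.values())


if __name__ == "__main__":
    for doc in build_documents(make_demo_db()):
        print(doc)
```

Notice that the only thing tying a title to its year is the position in the two lists. Once that order cannot be trusted, the structure of the original rows is gone, which is the problem described above.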

Thus we learned: JUST store the ID or primary key in Solr, so that each result can be referred back to the database, and use the KEY to pull the data from the database.
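
A rough sketch of the "pull by key" half of that idea: given the list of IDs that a search returned, fetch the full rows from the database and keep them in the same order as the hits. The article table and the fetch_by_ids name are assumptions for the example, and the list of IDs is passed in by hand rather than coming from a real Solr query.

```python
import sqlite3


def fetch_by_ids(conn, ids):
    """Return (id, title) rows for the given ids, in the order the ids were given."""
    ids = list(ids)
    if not ids:
        return []
    # One "?" placeholder per id keeps the query parameterized.
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT id, title FROM article WHERE id IN (%s)" % placeholders, ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # Re-order to match the search hits; ids missing from the db are skipped.
    return [by_id[i] for i in ids if i in by_id]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO article VALUES (?, ?)",
        [(1, "one"), (2, "two"), (3, "three")],
    )
    # Pretend the search engine ranked the hits as 3, 1.
    print(fetch_by_ids(conn, [3, 1]))
```

The trade-off is one extra database query per search, in exchange for always getting the rows in their real structure.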

Then came a lot of trial and error, to detect empty fields etc.

We gave up. We discovered Haystack!!!

Lessons Learned in Using Solr

1) Denormalize all fields to be stored in Solr.
2) Sometimes, Solr output does not reflect the structure of the database (OK, we are very new to Solr).
3) So if the data comes from a database, index everything and store the key (probably a bad idea, because we are not supposed to hit the db a lot, I don't really know).
4) Don't reinvent the wheel. It turns out that Haystack has solved our problem and more.
5) Partially related to number 4: the client just dumped on us a file output from their database. We tried to be heroes, but it turned out better to store it in a db first... because it is a mess processing the data.

Wednesday, January 06, 2010

listing open files with lsof

So the story goes that I needed to process (quite) a number of files,
and I didn't write any output to say which one is being processed..

lsof saved the day. lsof lists the files that are open, and that includes sockets.
In my case, the usage is easy:

lsof +r -p  

The -p picks the process to watch (by its PID), and +r repeats the listing until the process has no files open.
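
If you want to kick this off from a script, here is a small sketch that wraps the same command in Python. The function names are made up for the example, and it assumes lsof is installed and on the PATH.

```python
import subprocess


def build_lsof_command(pid):
    """Build the argument list for: lsof +r -p <pid>."""
    # int() rejects anything that is not a plain process id.
    return ["lsof", "+r", "-p", str(int(pid))]


def watch_open_files(pid):
    """Run lsof in repeat mode; its output goes straight to the terminal."""
    return subprocess.run(build_lsof_command(pid), check=False).returncode


if __name__ == "__main__":
    import sys

    # Usage: python watch_files.py <pid>
    watch_open_files(sys.argv[1])
```

Passing the arguments as a list avoids going through a shell, so the PID never needs quoting.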