2011 in Review: The Year It Got Too Close for Comfort

For a long time, geeks like me have tended to stay behind the scenes when it comes to politics. Unlike the US, Malaysia has no law like the DMCA, and we Malaysians don't often see a law passed that directly affects the IT industry here. Until this year, when we suddenly got the Computing Professional Board bill.

You can find a lot more about it online, but in short it is about regulating programmers who are involved in Critical National Information Infrastructure (CNII). The list provided of what counts as CNII is too vague for comfort.

So what I hope to do next year is to get more involved, somehow. What this year has shown us is that ignorance and apathy can bite. Not every law gets an open day that we all know about, like the CPB did. So a law can be passed under our noses, and it will bite, and bite hard.

p.s. Not including my own post on the Computing Bill, because it has quite a lot of inaccuracies

2011 in Review: Awesome Books

Books I have read in 2011
  1. Makers by Cory Doctorow
  2. Little Brother by Cory Doctorow
  3. Make Magazine (not a book, but still)
  4. Quicksilver by Neal Stephenson; I still have the rest of the Baroque Cycle and am still trying to finish it
  5. Cryptonomicon by Neal Stephenson
  6. Snow Crash, Neal Stephenson
  7. Anathem, Neal Stephenson
  8. Javascript the Quick Reference, Douglas Crockford
  9. Make Magazine
  10. Arduino Bots and Gadgets, that actually belongs to hackerspacekl

Copy and Paste on GNU Screen

So recently I finally found out how to copy and paste in GNU Screen. Previously I just dragged with the mouse.

To copy and paste in GNU Screen:

- press ctrl-a [ to enter copy mode
- move to the first character of the string to copy, press enter
- move to the last character of the string to copy, press enter
- now to paste, press ctrl-a ]

p.s. kinda ashamed that it took me so long to figure out =.=

Rooting HTC Desire HD on 64bit Linux Mint

One reason I have been resisting rooting my phone is the old way of rooting the HTC Desire HD, best illustrated by the wiki on CyanogenMod. It is essentially a many-step process of copying files, downgrading, copying more files, rebooting, and rebooting again.

Recently I discovered the Advanced Ace Hack Kit. Basically these guys have automated everything into a one-step process; if the firmware needs to be downgraded, it will do it. But here is the big catch: it is a 32-bit binary, and I installed a 64-bit Linux.

Thankfully Mint, which is essentially built on top of Ubuntu (which in turn is built on Debian), has ia32-libs, which provides 32-bit versions of libraries for a 64-bit OS. Just install it.

And read the manual. It is inside the package along with all the tools needed, so unzip it and read the manual inside. It covers everything you need. There will be a quiz at the end ;-)

A little adventure on the PDF part of the Bill Watcher. I am hoping to be able to attach comments to a section of the PDF, then have a script that compiles the comments and email-blasts them to the MPs.

So here are a few things I am trying.

Loading pdf into iframe

It is actually a pretty standard approach. Except I am on Linux now, and I am too lazy to install a plugin.

Also the iframe height is a problem; it is hard to get it to work well across browsers.

I don't know of a way for JavaScript to access the text inside the PDF, so I don't know how to attach jQuery to it.

Use pdf.js

It works well on Firefox, but not on anything else. It is also a very new library.

While there is a text layer for the text, it doesn't line up well with the text in the bill document. I suspect that is because of a badly formed PDF.

JavaScript should work well on it though, as it has a text layer div.

Convert pdf into html

Have jQuery load it with .load(). It works across browsers, until the layout goes wrong, which could be because of Bootstrap (the CSS library). Pdftotext works very well, and the HTML generated can be processed pretty easily. The layout on the other hand....

JavaScript should work well too.

Load converted pdf's html into iframe

It does not solve the height problem of the iframe, but JavaScript should be able to use it. It is a bit of a pain though. The generated file is pretty well formed, and loading it this way solves the issue of the layout going wrong because of the CSS library.

New Toy for christmas

Before Unbox

After Unbox

It is an Arduino-based board for Android accessory development, FYI. Expect new stuff coming soon.

Opinion after #CPB2011 II

From Amanz

Thanks to Amanz, who captured some pictures from the Open Day.

For Item 4. Why not just take an existing certification and use it? Is Malaysia that unique?
(This is actually a question asked by many during the open day)
For Item 5. I did not receive a straight answer on this.
For Item 6. How does creating an extra barrier enhance the supply of manpower?
For Item 7. This is disturbing. Does it mean that refusing to register makes me untrusted?

I think this is troublesome

p.s. Anyone in India want to hire a Python programmer? I'm jobless starting on 23rd Dec.

Opinion After #CPB2011 Open Day

From their own mouths, the Computing Professional requirement will be limited to govt, but I am still waiting for a supporting document. And from their mouths again, it is a work in progress. Interestingly, no one from MOSTI attended.

Below is a paraphrase of the discussion from the representatives, i.e. what I understood from the words they said.

  1. Raise the standard of computing professionals, both in a) education
  2. and b) work
  3. Because other countries have a requirement in ISO for staff with certain requirements anyway.
  4. It is a requirement before the govt can sign the Seoul Accord. Disclaimer: I do not know what the requirements for the accord are, so somebody enlighten me please.
  5. Add accountability
  6. And it will be limited to govt CNII projects only.
  7. Everybody can apply for it
  8. For IT professionals to be recognized globally?

For 1, I do not believe there is a need for a board to fix it. There is already an accreditation agency for universities, and they are not doing a very good job. What makes them think they can do a better job? Chances are, if a university/college has a good curriculum, there is no need to create a board in the first place; the market will grab its graduates first. SO FIX THE CURRICULUM FIRST!!!!!

For 2. I am not even sure what professional means.

For 3. I believe that it should be in the SLA. Again, it is not something a board should decide.

For 4, I won't comment, because I don't know enough.

For 5, to add accountability. To be honest, I don't understand how forming a board would help with that. There is a contract, isn't that more than enough?

For 6, during the meeting there was no clear definition of CNII in the first place, even though the representative assured us that it will be limited to govt. Unfortunately in IT, everything is interrelated. A big chunk of Malaysian IT is involved in govt projects, either as contractor or subcontractor. Even if it says govt only, if a tool gets adopted, does it fall under their jurisdiction? How about stuff that is regulated, such as systems that interface with Bank Negara? Does that count as CNII?

For 7, if everybody can apply, then what is the point?

For 8, that is done by creating good products, open source or not. Again, not by a board.

Extra stuff
What does the cert do that the normal Cisco, Microsoft and other certs don't do? What makes them less qualified than this one?
If everybody can apply for a certificate, then why even bother?
IT changes fast, so what will the examination be based on?

Personally, I believe IT is doing pretty well without any govt involvement. And from what I understood of the justification given during the open day, I still believe a board is not needed. And considering that many IT companies aim for a govt contract at one time or another, it could spell trouble for the industry in general.

To be honest, after the open day there are still more questions than answers; many are unanswered.

p.s I am jobless starting on 24rd Dec, anyone outside malaysia want to hire me?

#CPB2011 redux

I have been in the discussion. It is my fault for spreading some of the old FUD. I hereby apologize for the mistake I have made.

Here are a few things I have gathered; I just picked up the words as said by the representative.

1) CNII is limited to the govt only, for the private sector it is business as usual!!!!!
3) Everybody can be certified.
4) If you are a startup that is not involved in CNII, it is ok not to be certified!!!!

More to that later.

Let's Dig Out a Localtunnel!!!

So as I was creating a very early version of the webapp, I wanted to get opinions from friends. A few things got in the way: I'm not quite ready to commit to a full-blown installation, and I am too lazy to set up an SSH tunnel, which I have not attempted before.

Of course, there is option number 3: Localtunnel. Basically localtunnel is a Ruby script that makes tunneling very easy, without you needing a remote SSH box. They provide a remote Linux box, which has no shell and only an authorized_keys file in home. So you don't need a dedicated SSH box somewhere.

On Linux, make sure you have ruby and rubygems installed. On Ubuntu use the command below; on Fedora just replace apt-get with yum. The rest of the instructions are on the link below.
sudo apt-get install ruby1.8 rubygems

Bills Watcher Malaysia

Recently I got involved in an Open Data movement in Malaysia, and one of my recent projects is called Bill Watcher. It is a webapp that broadcasts, via Twitter and RSS, the bills that are being debated or were passed recently.

Main Page

Bills Detail

Basically this page is scraped from the Malaysian parliament website and loaded into a SQLite database. I don't really care which database for now, because I am doing it via SQLAlchemy, which makes it easy to move to another database. The app just reads from the database via SQLAlchemy and renders it, using 960 gs to make the rendered page look nice.

The feature set of this app is pretty small: the PDF is in an iframe, and there is no login. The fancy sharing features are the Twitter and Facebook buttons and RSS. Commenting will be provided by Disqus, if I figure out where to put it. JavaScript is only used for the Twitter and Facebook buttons.

I consider this an MVP: a small set of basic features to be extended. Features will be added as requested, but not all will be added. Also, not a lot of information is available on the parliament bills page, so features will depend on the effort needed to extract data from other sources, which is actually not really easy. But otherwise, we will try our best to get features added.

What's next? We are going to host it live soon. Then we will add Disqus, then finalize the Twitter notification. To get your hands dirty now, go to the github link

I will transfer it to the sinar repo soon. I need to do a bit of updating across repos.

A bill that was debated intensely recently shows how much we don't know about the decision process in this country, even though the information is there on the parliament site, which is not easy to use or navigate.

What Can't One Do in #CPB2011

From this link

If this bill is passed, you are not allowed to

  1. Fix your friend's computer; that is considered practice of computing. And you need to pass an examination for that anyway
  2. Write your own mobile app, iOS or Android, because you need to prove that you can
  3. On top of that, you probably don't have a service provider cert anyway, so you can't sell it on the App Store or Android Marketplace
  4. So if there is new code being written on the internet, you probably cannot use it for a service, because there is no certification for it.
  5. And you can't use anything that is new to provide a service, because you didn't register it.
  6. And the term "service" is wide; you now can't sell your software
  7. Write instructions for the software on your blog
  8. You can't publish a web app that uses your newfound knowledge.
  9. Tweet about a problem
  10. Rant about the problem
  11. Commit code to github, because you are not certified
  12. Report a bug, because you didn't register that you know about the other software that you use
  13. You can't even learn new technology and share your knowledge
  14. You can't share your ideas, because you are probably not qualified for design work
  15. I am not allowed to set up my own server at work and have to wait for others to do it, and worse, if I am the business owner, I have to pay more.
  16. You can't even publish code unrelated to your service on github, because you are not a registered expert on that subject
  17. You can't give input on a forum to help solve a problem
  18. You can't be a generalist. Now you need 1 sysadmin, 1 database person, 1 server person, 1 javascript person, 1 css person, 1 artist, 1 backend guy (not sure how to split it up), and 1 tester
  19. You can't write your own unit tests, because you are not a certified tester
  20. You can't do mockups
  21. You can't do prototyping on your own
  22. You can't do javascript, so no ajax or what not, because there is no cert
  23. You can't do html, because there is no cert
  24. I can't do python as a service, because there is no cert
  25. I can't even use sql, because I have no cert

These are just the ones I remember. There could be more

The Malaysian Computing Profession Act

One of the hottest topics today is the Computer Professional Act 2011. As a software developer, this affects me quite a lot. I for one am against such an act, for both professional and personal reasons.

A draft came out not very long ago.


A few definitions from the document

“Computing” is a goal-oriented activity to plan, architect, design, create, develop, implement, use and manage information technology or information technology systems.
So here I assume computing in general, from creation to usage.

“Computing Practitioner” means a person who has a job function in computing or qualification in computing
Here I assume this means people who use a computer: architects, software developers, system administrators, end users. You get the idea.

That covers a few definitions. Now let's jump ahead to part 3, on registration.
There shall be indicated against the name of each Registered Computing Practitioner and Registered Computing Professional kept in the Register a record of disciplines or specializations on computing obtained or acquired by such personnel through academic qualification or training including on-the-job training or skill or specialist or professional certifications.

So this means I need to declare what I know. What does that mean? Does it mean that I need to produce a cert? Does it mean my boss needs to write a letter? Things get a bit uncomfortable here.

  • If we just declare, does it mean the whole process is useless? We can just claim what we know.
  • If we need to have a cert, then here is the interesting part: a lot of new technologies don't offer a cert. Open source projects also don't always have certs. Things change fast enough that the training will need to be updated often, and it will be expensive.
  • On-the-job training? My experience on the job is that it tends to play safe, and is rarely able to use the latest technology
  • School? That usually is not helpful at all
  • If none of the above, then how will this be judged?
Let's go to the next one
For certifications mentioned in subsection (2), the Board may maintain a list of certifications provided by associations and bodies in Computing recognised by the Board and will keep the list updated from time to time.
What are the criteria to get onto the list? Does it mean the Board will monitor training providers, via accreditation? Will it consider the country and reputation? How do we make sure it is not abused?


Now on to qualification

14.(1) (a) Subject to this Act, a person who holds –  
(i) the qualifications required for Graduate Membership of a professional body or organisation recognized by the Board, and the qualifications are recognised by the Board; or 
(ii) any qualification in Information Technology or Computing which is recognised by the     Board; or 
(iii) any other qualifications, certifications or relevant experiences recognised by the Board,
      shall be entitled on application to be registered as a
      Registered Computing Practitioner.
So remember the definition on top: a Computing Practitioner is someone who just uses a computer. From my understanding, does it mean that everybody needs to register to use software? For (i), what kind of qualification? For (iii), what kind of relevant experience and certification?

2) Subject to this Act, the following persons shall be entitled on application to be registered as a Registered Computing Professional:

(a) any person who is a Computing Graduate or any person who has other qualifications recognized by the board 
    (i) who has obtained the practical experience as prescribed under subsection (1)(b); and 
    (ii) who has passed a professional assessment examination conducted by the Board, or is a Corporate Member of a professional body or organisation recognized by the Board; and
    (iii) who has paid the prescribed fee and 
    (iv) who has complied with all the requirements of the Board;
On top of experience we now have an examination.

  • What examination will be conducted? How many types?
  • On what platform? Or how many?
  • Or practical knowledge of stuff from only one vendor?
  • Does it cover just computing? By the definition above that can be anything; even using a spreadsheet is computing.
  • If not, what does it cover?
  • If so, does it mean everybody can claim to be a professional?

An extra note: an examination is useless here, because there are too many tools and too many options. Limiting it to one will be ridiculous and unfair to the others.


Let's go to the scope of what I can or cannot do

(a) a Registered Computing Practitioner may take up employment which requires him to perform Computing Services subject to the
i. work is carried out under the supervision or instruction by a Registered Computing Professional,
ii. similar work scope has been carried out by the Registered Computing Practitioner before.
A Registered Computing Professional may only provide Computing Services in the disciplines or specialisations of Computing he is qualified to practise and as is shown in the Register under subsection 12(2).
If I register myself as a software developer, does it mean that

  • I can only write programs?
  • I cannot administer my own server?
  • I cannot fix my own computer?
  • Because I only declared that I can do webapps, I cannot write other types of software (say a web server)?
  • If I teach myself a new technology, I cannot use it at work when the opportunity is given?

And because it is based on similar work carried out before, it means that:

  • I can only do stuff that I have done before,
  • so if I started with writing webapps, does it mean I cannot administer my own server at work?
  • does it mean I cannot write mobile applications professionally? Or that I can never sell my own mobile application made outside of work, because I did not declare it?
  • Or I can only work in one industry: once in banking, forever in banking

I graduated in AI from UM (don't ask); does it mean I cannot do mobile applications? etc.

One more thing is how do we describe a similar scope? Does it go very granular, i.e. you can only do PHP because you have done PHP? Or can it be high level, i.e. you can do it because you do web development?

Software tends to require multiple skill sets. In reality a person involved in software needs to have multiple skills, most obviously seen in software developers. And we have to learn on our own; the world moves fast, and if we ignore that, the train is gone. Setting up rules like this just makes things worse.

I believe that this act does not benefit professionals like me, for many reasons.

  • For one, I don't see the point. What kind of qualification, when everybody uses a computer now?
  • It limits my opportunities: I need to declare what I know, somehow.
  • Worse, I can only work on stuff I declared and no more.
  • Opportunity tends to be based on past experience.
  • Self-learning is not covered, and will not be recognized.
  • Worse, I am not even allowed to use stuff that is new, because I did not declare that I know it. But I need to use a new technology in order to learn it. So does it mean I can only learn in secret?
  • Technically I am not even allowed to sell a mobile app, which nowadays is very easy to distribute
At the end of the day, I believe this brings the IT industry 10 steps backward, and only makes things worse.


Realized New Behavior in Virtualenv

Not too long ago, I covered virtualenv. One of its behaviors is the --no-site-packages option, which totally isolates the Python environment from the system site-packages. Starting from version 1.7, --no-site-packages is the default behavior, so you don't need this flag anymore. If you invoke virtualenv with this flag, it will display a warning.

Converting PDF to Text with pdftohtml

Previously I tried to extract PDF information by converting PDF to text, as described here.

The problem is, a big wall of text is very hard to process.
Here comes pdftohtml. It is part of the poppler package on Linux, but GnuWin32 does not have it for Windows, which is one reason I used pdftotext.

pdftohtml converts PDF to HTML. Simple usage is
pdftohtml yourpdffile.pdf
You will get your HTML file. But it is a bit plain, as it just extracts the text. If there are images inside the PDF, or your PDF is pretty complicated, like the Malaysian Hansard, you can use the -c option
pdftohtml -c yourpdffile.pdf
Here is the catch: it will generate one HTML file per page of the PDF, with images, but the layout is maintained. For a document like the Malaysian Hansard, that would be hundreds of pages.

Then there is a way to produce XML
pdftohtml -xml yourpdffile.pdf
You will get an XML file which has the position information.
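To give an idea of how the position information can be used once it is loaded, here is a rough Python sketch. The sample XML is hand-written to mimic what I understand the -xml output to look like (page elements containing text elements with top and left attributes); the sample content and the read_text_boxes helper are made up for illustration, so check them against your own output file.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, assumed to resemble `pdftohtml -xml` output.
SAMPLE_XML = """<pdf2xml>
  <page number="1" top="0" left="0" height="1188" width="918">
    <text top="100" left="80" width="200" height="16" font="0">DEWAN RAKYAT</text>
    <text top="130" left="80" width="320" height="16" font="0">Second line of text</text>
  </page>
</pdf2xml>"""


def read_text_boxes(xml_string):
    """Return (page, top, left, text) tuples for every text element."""
    root = ET.fromstring(xml_string)
    boxes = []
    for page in root.iter("page"):
        page_number = int(page.get("number"))
        for text in page.iter("text"):
            # itertext() also picks up text nested in <b> or <i> tags
            content = "".join(text.itertext()).strip()
            boxes.append(
                (page_number, int(text.get("top")), int(text.get("left")), content)
            )
    return boxes


if __name__ == "__main__":
    for box in read_text_boxes(SAMPLE_XML):
        print(box)
```

With the positions as numbers, lines can be sorted or grouped by their top value instead of being guessed from a wall of text.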

p.s I'm using this for Whether there will be result today. 

Remote Control your android with Airdroid

So I found this free Android app which lets you remote control an Android phone from a web browser. Once the app is installed, you can start it pretty easily, using the URL displayed in the app.

What you get after opening the browser is a desktop-like interface which you can use to do quite a lot of stuff.

You can manage apps from it. Installing redirects you to the Android Marketplace web interface, but you can uninstall using the application icon.

You can manage app install

You get a file manager that manipulates your SD card. You can copy files from the PC using the import button, and download files using the export button in the file manager.

File manager like what you use in normal desktop OS

Write or reply sms
There are also photos and music in the browser, you can view the contact list, etc. It really has a lot of features.

For a free app, this actually offers a lot of stuff. For a webapp, it sure looks like a full-blown desktop. You can't control the apps on the phone, but that is ok. It is useful enough for me.

The app is free on the Android Marketplace. I recommend that you guys try it out.

This is a post to test on empire avenue


This is a line to test my blog integration with empire avenue, in my other lil experiment...

Adventure in Bottle( the web framework)

So I have been scraping data online for some time. ScraperWiki has an API that allows third-party apps to get data in JSON/XML form, but I think I can make it easier, because a ScraperWiki query involves doing a SQL query on the SQLite datastore. Thus I took the opportunity to learn a new Python web framework.

The framework only needs to handle requests and spit out data in JSON (maybe XML later). It does not need templates, since it is JSON. It doesn't need an ORM, since the data is most probably scraped from somewhere else. It does not need sessions, since it is meant to be used by libraries and the data is open anyway.
The first framework I tried out is Bottle

The first thing I noticed is the amount of setup I have to do. Coming from a Django background, which is well known for its big settings file, the amount of setup is small. Just install it using 'pip install bottle'.

Essentially an application is just defined with the Bottle() object

and passed to the run function.

Bottle already has a default application, so you don't strictly need to create one; I just put it there to show that it is there.

Another thing I noticed is that there are no URL routes in a separate file. A route decorator is added to each function that I want to serve in the web app. The route is part of the application (the Bottle() object), and I can limit the type of request allowed on it, like POST/GET. I find this approach pretty clean; it reduces boilerplate compared with Django views.

Another thing to notice is that I do not specify a response method/object (as in Django). That is another nice thing about Bottle. If the function returns a dict, the response will be JSON. If it returns a string, the mimetype is text, etc. There is no need to call a function to build the response.

Finally, to run the app, just run the Python file that contains the bottle run function. You have a webapp.

For this project I didn't test the templates, but from the docs, a template is specified with a view decorator, which I think is nice, though I don't need it now. From the docs, it looks pretty clean.

Because Bottle is a micro framework, there is no management script like Django's and no ORM; I use SQLAlchemy here. There is no session support either. But interestingly I don't feel that I missed anything. In fact, it is pretty pleasant to use. The lack of sessions will definitely bite me if I ever have to implement login, but a solution is in the documentation.

Overall, it is a fun framework to use, even though this is a small project. The documentation is pretty good. I might use it for future projects.

Tuesday, November 01, 2011

Using Python Function with sqlite

Note: You can find the docs in the python doc page

This is more of an experience report. Not too long ago, I scraped the profiles of Members of Parliament from the parliament website; you can find the result here.

The thing is, as I used the data from the SQLite database I downloaded from the site, I realized that the title is part of the name of each MP. So one would get "XXX , Y.B Tuan", where Y.B Tuan is the title.

That makes a query like 'select Parti from swdata where Nama=name' hard, and that is precisely the kind of query I need for another project.

On the other hand, the sqlite3 module, apart from coming with the Python standard library since 2.5, actually has a method called Connection.create_function.

So I wrote a little function called get_name, and the example shows how it works.

import sqlite3
def get_name(name):
    # keep only the part before the comma, dropping the title
    return name.split(',')[0]
s = sqlite3.connect('dbname')
# attach the python function: SQL name, number of arguments, callable
s.create_function('get_name', 1, get_name)
# and use it
result = s.execute('select get_name(Nama) from swdata')

Just define a Python function, make sure it returns a datatype that is compatible with SQLite, and attach it with create_function. Now you can use it in your SQLite queries from Python
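The same trick can be sketched for the party lookup mentioned earlier, by calling the function inside the WHERE clause. This runs against an in-memory table; the rows, the find_party helper and the extra strip() are assumptions made for the example, not part of the scraped data or the original script.

```python
import sqlite3


def get_name(name):
    """Drop the title after the comma: 'XXX , Y.B Tuan' -> 'XXX'."""
    return name.split(",")[0].strip()


def find_party(conn, name):
    # get_name() is evaluated by SQLite for each row in the WHERE clause
    row = conn.execute(
        "select Parti from swdata where get_name(Nama) = ?", (name,)
    ).fetchone()
    return row[0] if row else None


# In-memory table with made-up rows standing in for the scraped data
conn = sqlite3.connect(":memory:")
conn.create_function("get_name", 1, get_name)
conn.execute("create table swdata (Nama text, Parti text)")
conn.executemany(
    "insert into swdata values (?, ?)",
    [
        ("Ali bin Abu , Y.B Tuan", "Parti A"),
        ("Siti binti Ahmad , Y.B Puan", "Parti B"),
    ],
)

print(find_party(conn, "Ali bin Abu"))
```

The function has to be registered on every new connection, since it lives in the Python process and not in the database file.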

Hope this is useful for someone. CHEERS

A little plug: this is something we are trying to work on in a little group called Sinar Project, and it is still at an early stage

Converting PDF to Text

So I have recently been involved with a project to extract data from PDF. Which is actually evil, but that is not important now.

On Linux there is a set of utilities that comes with the xpdf program. It should be part of the default package installation; if not, you can just apt-get or yum it.

On Windows you can go to the GnuWin32 page. I just downloaded the zip so I would not have to remove it with an uninstaller.

I don't really need the layout information, so I just use pdftotext.

On Windows
program_location/pdftotext.exe -layout pdf_file.pdf

On Linux, just
pdftotext -layout pdf_file.pdf

The -layout option maintains the layout of the text as in the PDF. Otherwise, the positioning of certain text will be inconsistent.


A scraper running on the cloud

I have been writing scrapers for some time, as you can see in some of my old posts here.

Recently Kaeru introduced me to ScraperWiki. This is basically a service for running scrapers on the cloud, with additional benefits:

  • It runs on the cloud
  • It provides infrastructure to store the data, in the form of a SQLite database, which you can download.
  • It provides an easy way to dump data as Excel
  • It provides infrastructure to expose the data as an API
  • Somebody can fork the scraper and enhance it.
  • A web-based IDE, so you just write your scraper on it.
  • Everybody can see the code of a public scraper.
  • Scheduled tasks
One very cool thing about ScraperWiki is that it supports a large set of third-party libraries. It supports Ruby and PHP as well as Python. The API for ScraperWiki is pretty extensive; it covers its own scrapers, geocoding functions, views for the data hosted on ScraperWiki, etc.

My only concern is that if I want to bring my scraper out of the service, I will need to rewrite the saving function. But on the other hand the data can be downloaded anyway, and I use Python, so it is not that big of a deal.

Below is a scraper that I have written on ScraperWiki. While it is mostly a work in progress, it shows what one looks like.

Event in Late Sept and Early Oct

Late September and early October is a busy period for geeks.

On 21st Sept, there will be a Software Freedom Day at UniKL

Detail in

On 24th Sept, there will be a Python Malaysia meetup. It will be held at Fluentspace, near Kelana Mall.

Detail :

On 29th Sept, there will be a Google DevFest. It will be held at UCTI. The focus is on Android, HTML5 and Google Analytics

Detail :

On 1st Oct, there will be a geekcamp, held at Itrain, near Wisma MCA. This is the tech-focused event, barcamp style.

Detail on the page

This will be a busy period, and it should be fun

Accessing Server from Android

Recently I have been helping to maintain some servers, and I tend to move around. So I decided to make my phone useful.

Android actually has a couple of apps that are useful for remotely accessing a machine. Some of them are free.
For connecting over SSH, I found that ConnectBot works extremely well. It only does SSH and telnet, and that's about it. It is pretty straightforward to use. For accessing Windows servers, I use 2X Client, which again is a straightforward RDP client. Both ConnectBot and 2X Client are free, and that is awesome.

The only issue with using an Android phone to access a server remotely is this: I have a Desire HD. While the screen is pretty large for a phone, typing commands over SSH or navigating around a Windows server via RDP can still be a pain. It is still smaller than most desktop screens. And I don't have a full-size keyboard on the phone, which is another pain, especially as I access Linux servers most of the time.

So it can be a pain to use at times. But for a quick fix or for checking on a server, this works pretty well.

I have attached some links for the apps below

Many Ways To Grep File Content

So not too long ago I posted on Twitter

This spun off into a few other ways to do grep.

A few people suggested this on IRC and Facebook; the -i parameter makes the keyword case-insensitive.
grep -iR keyword directory

Another suggestion from IRC.
grep -iR --exclude=file-to-ignore keyword directory
Another tweet I received is,

Then the last one, which I discovered on Google, is ack-grep

ack-grep keyword directory
and again, -i makes the search case-insensitive.
ack-grep -i keyword directory
ack-grep output is nicer, and it automatically ignores binary files. It is slightly different from grep, but both get the job done, to me anyway

Python Dateutil Redux

Not too long ago, I covered one use of python dateutil, on the blog here.

The library itself is pretty nifty in other case as well. In this case date difference. While python datetime module in the standard library, the datetime.timedelta is used to find difference in date, it counts up to the days. In my case, I want to count it to years.

That is where dateutil comes it. It have a module called, relativedelta. Which do actually count to years. To use it is a matter of import and use it

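from datetime import date
from dateutil.relativedelta import relativedelta
date_from = date(2008, 3, 15)  # sample dates, for illustration only
date_to = date(2011, 12, 25)
date_diff = relativedelta(date_to, date_from)  # later date goes first
print date_diff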

As you can see, it does count up to years, and also months. That is useful if you want to find the difference between dates beyond just days.

Python Web Scraping

There are times when a government website has very useful information, but unfortunately the data only exists as web pages. It could be worse, as it could be in PDF. So it can be a pain if we want to use the information in a program, but there is no API.

On the other hand, Python is a pretty powerful language. It comes with many libraries, including ones that can be used to make HTTP requests. Introducing urllib2, which is part of the standard library. Using it to download data from a website can be done in 3 lines of code
import urllib2
page = urllib2.urlopen("url")
html = page.read()
The problem then is that you get a whole lump of HTML, which is a bit hard to process. Python has a few third-party libraries for that, and the one I use is Beautiful Soup. Beautiful Soup is nice in that it is very forgiving when processing bad HTML markup. So you don't need to worry about bad formatting and can focus on getting things done. The library can also parse XML, among other things.

To use Beautiful Soup,
from BeautifulSoup import BeautifulSoup
page = "html goes here"
soup = BeautifulSoup(page)
value = soup.findAll('div')
print value[0].text
But you need to get the HTML first, don't you?

import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("url")
soup = BeautifulSoup(page)
value = soup.findAll('div')
print value[0].text
To use it, just download the data using urllib2 and pass it to Beautiful Soup. It is pretty easy to use, to me anyway. Though urllib2 is going to be reorganized in Python 3, so the code will need some modification.

To see how the scraper fares, here is a real world example on GitHub, part of a bigger project. But hey, it is open source. Just fork and use it, in this link.

So enjoy, go forth and extract some data, and promise to be nice: don't hammer their server.
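For the Python 3 side, here is a rough sketch of the same idea using only the standard library. It is not the Beautiful Soup code above: it swaps in html.parser, which is less forgiving of bad markup. In Python 3, urlopen lives in urllib.request. The names DivTextParser, div_texts and fetch are made up for this example, and the utf-8 decoding in fetch is an assumption about the page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen


class DivTextParser(HTMLParser):
    """Collect the text found inside each outermost <div>."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <div> tags are currently open
        self.divs = []   # text of each outermost <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.depth == 0:
                self.divs.append("")
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that sits inside a <div>
        if self.depth > 0:
            self.divs[-1] += data


def div_texts(html):
    """Return the text of every outermost <div> in an HTML string."""
    parser = DivTextParser()
    parser.feed(html)
    parser.close()
    return parser.divs


def fetch(url):
    """Download a page; assumes the page is utf-8 encoded."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Works on any HTML string, so it can be tried without touching a server
sample = "<html><body><div>Hello <b>world</b></div><div>second</div></body></html>"
print(div_texts(sample)[0])
```

For a real page it would be div_texts(fetch("url")), with the same caveat as above about being nice to the server.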

Usefulness of __dict__

I am writing a validation function to check object instances, in this case Django models. The objects are mostly the same, except there is an extra field that needs to be checked, and that field exists only in certain objects, not all of them.

So being lazy, I use the __dict__ special attribute. It exists on most Python objects and holds the object's own attributes as a dictionary, which I can use to check for the existence of an attribute. For example

So we can do things like 'test_val' in c.__dict__ to check for the existence of an attribute.
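A minimal sketch of that check. The class Record and its fields are made up for illustration, only test_val and c come from the text above.

```python
class Record(object):
    """Hypothetical class where test_val only exists on some instances."""

    def __init__(self, name, test_val=None):
        self.name = name
        if test_val is not None:
            # The extra field is only set when a value is given
            self.test_val = test_val


c = Record("with extra field", test_val=42)
d = Record("without extra field")

print('test_val' in c.__dict__)
print('test_val' in d.__dict__)
```

The first check finds the attribute and the second does not, so a validation function can branch on it.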

From the same idea, one very cool thing to do is to take the __dict__ and convert the attributes into a JSON object, for example
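A small sketch of the JSON idea, using the json module from the standard library. The Point class is hypothetical, and this only works as-is when every attribute is something json can serialize (numbers, strings, lists, dicts and so on).

```python
import json


class Point(object):
    """Hypothetical class with plain, JSON-friendly attributes."""

    def __init__(self, x, y, label):
        self.x = x
        self.y = y
        self.label = label


p = Point(1, 2, "home")

# __dict__ is an ordinary dictionary, so json.dumps can take it directly
as_json = json.dumps(p.__dict__, sort_keys=True)
print(as_json)
```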

One idea I have been playing with is, because a Django model instance is essentially an object, we can use the same idea to output the values of a single Django model instance as a JSON string, but we need to be careful with certain data types like double etc.

The big catch of using this is that __dict__ only contains the attributes of an object. It does not contain built-in attributes, and it does not contain methods. So if a value comes from a method, you need to think of another way. Which actually sucks, as I have a lot of such methods in my classes. Well, I will probably figure it out by that time.
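To make the catch concrete, here is a sketch with a made-up Rectangle class. It also shows hasattr as one possible "another way", which is my suggestion and not something from the original code.

```python
class Rectangle(object):
    """Hypothetical class with two attributes and one method."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


r = Rectangle(2, 3)

# Only the instance attributes show up in __dict__
print(sorted(r.__dict__))

# The method lives on the class, so it is missing from the instance __dict__
print('area' in r.__dict__)

# hasattr follows the normal attribute lookup, so it does find the method
print(hasattr(r, 'area'))
```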

So, happy coding

The Great Global Hackerspace Challenge

Not long ago, the Hackerspacekl gang and I joined The Great Global Hackerspace Challenge. Basically we built an Arduino shield that processes words.

Actually the whole process is best viewed on the hackerspacekl blog.

In summary, here is what we learned.
- K.I.S.S, Keep It Small and Simple
The Atmega168 has only 16 KiB of memory, of which the code itself takes half. And it is slow: it runs at 16 MHz, while a modern computer runs at around 2 GHz. It had better be simple, because debugging is hard.

- Serial is your best friend (on arduino anyway)
Unlike programming on a PC or a web app, there is no print function to use for debugging, and there is no debugger either. Even though we have an LCD, it is not reliable. And serial is pretty much built into the Arduino board. So USE IT.

- Optimization matters
With a dynamic language like Python, or a more modern language like Java, there is a garbage collector. Even C/C++ has the OS to help with that. So one doesn't need to worry much about memory issues. But on Arduino, removing unused code can save a lot of memory.

- One need to think very low level.
The I2C to EEPROM code is about shifting bits and bytes to write to the EEPROM. For once we are thinking in bytes. And we need to know a bit about the hardware to write the code properly.

Overall, it was a fun and interesting experience. For a programmer who spends his time on Python or doing web development, it opened my eyes to embedded programming a bit. A little bit of experience that meant a lot to me.
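To give a feel for the "thinking in bytes" part, here is a small sketch in Python rather than Arduino C. It is not the code from the challenge. It assumes an EEPROM that takes a 16-bit address sent as two separate bytes, and the function names are made up.

```python
def split_address(address):
    """Split a 16-bit address into (high byte, low byte)."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    high = (address >> 8) & 0xFF  # shift the top 8 bits down
    low = address & 0xFF          # mask off everything but the low 8 bits
    return high, low


def join_address(high, low):
    """Rebuild the 16-bit address from its two bytes."""
    return (high << 8) | low


high, low = split_address(0x1A2B)
print(hex(high), hex(low))
```

The same shifts and masks work in C on the Arduino, where each byte would then be written out over I2C one at a time.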

Let Android read my SMS

Another thing I tried on SL4A is playing with its SMS functions. The code below does the following:

  1. As usual, import the library and create the Android() object
  2. From the Android object, call smsGetMessages, which has a required parameter for unread messages: True for unread only, False otherwise.
  3. Call the built-in Android Text To Speech software to read each message out, by calling the ttsSpeak method.

import android
droid = android.Android()
result = droid.smsGetMessages(True)
for i in result.result:
    droid.ttsSpeak(i['body'])

The result in the output is a list of dictionaries like the one below, in Python notation. Since I only want the message, I get it with i['body']

{u'_id': u'59',
u'address': u'Address of sender aka the phone no',
u'body': u'Message Body',
u'date': u'1300254988000',
u'read': u'1'}
The date is the message's timestamp, and read is 1 if the message has been read and 0 otherwise. The ttsSpeak method is easy to use too, just pass in a string.

Originally I read all messages and passed them to the TTS library. That turned out to be a bad idea, because I have no idea how to stop it from speaking once it has started....
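The date value in the sample above looks like milliseconds since the Unix epoch. That is my assumption based on the size of the number, not something confirmed here. Under that assumption, this is a small sketch for turning it into a readable datetime. It uses only the standard library, is written for Python 3, and the helper name sms_date is made up.

```python
from datetime import datetime, timezone


def sms_date(message):
    """Convert the 'date' string of one SMS dictionary to a UTC datetime.

    Assumes the value is milliseconds since the Unix epoch.
    """
    millis = int(message['date'])
    return datetime.fromtimestamp(millis / 1000.0, tz=timezone.utc)


# Same shape as the dictionary shown above
sample = {
    u'_id': u'59',
    u'address': u'Address of sender aka the phone no',
    u'body': u'Message Body',
    u'date': u'1300254988000',
    u'read': u'1',
}

print(sms_date(sample))
```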

My First Day on Android Scripting

I got myself an Android phone not that long ago. One reason is that it is a pretty amazing piece of hardware. Unlike the iPhone, the SDK is available on the major OSes, including Linux. One of the many things I installed is SL4A, Scripting Layer for Android, and the Python interpreter for Android.

One cool thing SL4A does is let us test the code remotely from a Python interpreter on a computer. The wiki page has a good explanation of how to do this.

One of the first things I played around with via the Python interpreter on the laptop

import android
droid = android.Android()
data = droid.getNetworkOperatorName()
print data.result
The API page has a lot of information, so that is one place one should look.
Not much of a program nor a post, but yeah, it is the start of something beautiful. I hope

The Solvers Manifesto

The Solvers Manifesto shows our dilemma as web developers. It probably applies less to backend developers or system developers, but definitely to web developers, who are a big chunk of software development jobs.

We are not engineers, we are not paid as much as an expert, and we don't get respected like an artist. We are hired because we know how to program, yet we also have to make things look nice. And yet our opinion is not respected. Even when technicians fix stuff, nobody questions how they do it. Not for us.

p.s. yes, Yet Another Rant Post

Random Python Learning : partial

Over the holiday I decided to learn more about Python, and really there is much to learn!!!!!!
One of the modules I learned about is the functools module.

One of the interesting functions I learned about is partial. Basically it is partial application of a function,
for example

import functools
def adder(a,b,c):
    return a+b+c
def adder2(a,b):
    return a+b
add_three = functools.partial(adder,1,2)
add_one = functools.partial(adder2,1)
print add_three(1)
# would print 4
print add_one(3)
# would print 4
So you can wrap an existing function without rewriting it, but give default values to some of its arguments. It is also used in Python decorators.
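partial can also pre-fill keyword arguments, not just positional ones. A small sketch, where the power function and the names parse_binary and square are made up for illustration:

```python
import functools

# The built-in int() accepts a keyword argument `base`; partial pre-fills it
parse_binary = functools.partial(int, base=2)


def power(base, exponent):
    return base ** exponent


# Fix the exponent by keyword, leaving base to be supplied later
square = functools.partial(power, exponent=2)

print(parse_binary("1010"))
print(square(7))

# The wrapped function and the stored arguments stay inspectable
print(square.func.__name__, square.keywords)
```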

read more here