For a long time, geeks like me have tended to hide behind the scenes when it comes to politics. Unlike the US, there is no law like the DMCA in Malaysia, and we Malaysians don't often see a law passed that directly affects the IT industry here. Until this year, when we suddenly have the Computing Professional Board bill.
You can find a lot more about it online, but in short it is about regulating programmers who are involved in Critical National Information Infrastructure. The list provided for that is too vague for comfort.
So what I hope to do next year is get more involved somehow. What this year has shown us is that ignorance and apathy can bite. Not every law has an open day that we all know about, like the CPB did. A law can be passed under our nose, and it will bite, and bite hard.
p.s. Not including my own posts on the Computing Bill, because they have quite a lot of inaccuracies
this is a place where i put my thought of technology and things that i do.
Saturday, December 31, 2011
Friday, December 30, 2011
2011 in Review: Awesome Book
Books I have read in 2011
- Makers by Cory Doctorow
- Little Brother by Cory Doctorow
- Make Magazine (not a book, but still)
- Quicksilver by Neal Stephenson; I still have the whole rest of the Baroque Cycle and am still trying to finish it
- Cryptonomicon by Neal Stephenson
- Snow Crash, Neal Stephenson
- Anathem, Neal Stephenson
- Javascript the Quick Reference, Douglas Crockford
- Arduino Bots and Gadgets, that actually belongs to hackerspacekl
Wednesday, December 28, 2011
2011 in Review: Achievement Unlocked
Just a list of stuff done in 2011
- Python on android with SL4A
- Programming the arduino for The Great Hackerspace Challenge
- Scraping govt websites, hopefully for the Open Data Project
- Writing software for a robot
- Picking up a new python web (micro)framework
- Opening up data with scraping knowledge
- More linux skills with diff and grep
- Using android for work
- Rooting android and using cyanogen
- Learning why pdf is evil
- And using elasticsearch (which I haven't blogged about yet)
Some of the cool ones anyway.
Saturday, December 24, 2011
Copy and Paste on GNU Screen
So recently I finally found out how to copy and paste in GNU screen; previously I just dragged with the mouse.
So, copy and paste in GNU screen:
- ctrl-a [
- move to the first character of the string to copy, press enter
- move to the last character of the string to copy, press enter
- now to paste, press ctrl-a ]
p.s. kinda ashamed that it took me so long to figure this out =.=
Friday, December 23, 2011
Rooting HTC Desire HD on 64bit Linux Mint
One reason I have been resisting rooting my phone is the old way of rooting the HTC Desire HD, best illustrated by the wiki on CyanogenMod. It is essentially a many-step process: copy files, downgrade, copy files, reboot, and reboot again.
Recently I discovered the Advanced Ace Hack Kit. Basically these guys have automated everything into a one-step process; if the firmware needs to be downgraded, it will do it. But here is the big catch: it is a 32bit binary, and I installed a 64bit linux.
Thankfully mint, which is essentially built on top of ubuntu (which in turn is built on debian), has ia32-libs, which provides 32bit versions of libraries for a 64bit OS. Just install it.
And read the manual. It is inside the package along with all the tools needed, so unzip it and read the manual inside. It covers everything you need. There will be a quiz at the end ;-)
Thursday, December 22, 2011
A little adventure with the pdf part of the bill watcher. I am hoping to be able to attach comments to a section of the pdf, then have a script that compiles the comments and email-blasts them to the MPs.
So, a few things I am trying:
Loading pdf into an iframe
It is actually a pretty standard approach to handling it. Except I am on linux now, and I am too lazy to install a plugin.
Also, iframe height is a problem; it is hard to get it to work well across browsers.
I don't know of a way for javascript to access the text inside the pdf, so I don't know how to attach jquery to it.
Use pdf.js
It works well on firefox, but not on anything else. Also, this is a very new library.
While there is a text layer for the text, it doesn't line up well with the text in the bill document. I suspect it is because of a badly formed pdf.
Javascript should work well on it though, as it has a text layer div.
Convert pdf into html
Have jquery use .load(). It works across browsers, until the layout goes wrong, which could be because of bootstrap (the css library). pdftohtml works very well, and the html generated can be processed pretty easily. The layout on the other hand....
Javascript should work well too.
load converted pdf's html into iframe
It does not solve the height problem on the iframe, but javascript should be able to use it, since the generated html is pretty well formed. It is a bit of a pain though. It does solve the issue of the layout going wrong because of the css library.
Saturday, December 17, 2011
New Toy for christmas
Before Unbox
After Unbox
Thursday, December 15, 2011
Hackerspacekl Meeting 2011
Hackerspacekl will have a meeting to decide on the future direction of our hackerspace. Detail of the meeting can be found in
Tuesday, December 13, 2011
Opinion after #CPB2011 II
From Amanz http://amanz.my/2011/12/lebih-info-berkaitan-akta-badan-perkomputeran-professional-diperlihatkan/
For Item 4. Why not just take an existing certification and use it? Is Malaysia that unique?
(This is actually a question asked by many during the open day)
For Item 5. I did not receive a straight answer on this.
For Item 6. How does creating an extra barrier enhance the supply of manpower?
For Item 7. This is disturbing. Does it mean that refusing to register makes me untrusted?
I think this is troublesome
p.s Anyone in India want to hire a python programmer? I'm jobless starting on 23rd dec.
Opinion After #CPB2011 Open Day
While, from their mouth, the Computing Professional bill will be limited to govt, I am still waiting for a supporting document. And from their mouth again, it is a work in progress. Interestingly, no one from MOSTI attended.
Paraphrasing the representatives in the discussion (i.e. this is what I understood from the words they said), it is to:
1) Raise the standard of computing professionals, both in education
2) and in work
3) Because other countries have requirements in ISO for staff to meet certain requirements anyway.
4) It is a requirement before the govt can sign the Seoul Accord. Disclaimer: I do not know what the requirements for the accord are, so somebody enlighten me please.
5) Add accountability
6) And it will be limited to govt CNII projects only.
7) Everybody can apply for it
8) For IT professionals to be recognized globally?
For 1, I do not believe there is a need for a board to fix it. There is an accreditation agency for universities already, and they are not doing a very good job. What makes them think they can do a better job? Chances are, if a university/college has a good curriculum, there is no need to create a board in the first place; the market will grab them first. SO FIX THE CURRICULUM FIRST!!!!!
For 2, I am not even sure what professional means.
For 3, I believe that it should be in the SLA. Again, it is not something a board should decide.
For 4, I won't comment, because I don't know enough.
For 5, to add accountability. To be honest, I don't understand how forming a board would help with that. There is a contract; ain't that more than enough?
For 6, during the meeting there was no clear definition of CNII in the first place, even though the representative assured us that it will be limited to govt. Unfortunately, in IT things are interrelated. A big chunk of Malaysian IT is involved in govt projects, either as contractor or subcontractor. Even though it says govt only, if a tool gets adopted, does it fall under their jurisdiction? How about stuff that is regulated, such as systems that interface with Bank Negara? Does that count as CNII?
For 7, if everybody can apply, then what is the point?
For 8, that is done by creating good products, open source or not. Again, not by a board.
Extra stuff
What does the cert do that the normal cisco, microsoft and other certs don't do? What makes them less qualified than this?
If everybody can apply for a certificate, then why even bother?
IT changes fast; what will the examination for it be based on?
Personally, I believe IT is doing pretty well without any govt involvement. And from what I understood of the justification during the open day, I still believe a board is not needed. And considering that many IT companies aim for a govt contract at one time or another, it could spell trouble for the industry in general.
To be honest, after the open day there are still more questions than answers, and many are unanswered.
p.s I am jobless starting on 24rd Dec, anyone outside malaysia want to hire me?
Monday, December 12, 2011
#CPB2011 redux
I have been in the discussion. It is my fault for spreading some of the old FUD. I hereby apologize for the mistake I have made.
Here are a few things I have gathered; I just picked up the words as said by the representative.
1) CNII is limited to the govt only, for private sector it is business as usual!!!!!
2) IT IS A WORK IN PROGRESS!!!!!
3) Everybody can be certified.
4) If you are a startup that is not involved in CNII, it is ok not to be certified!!!!
More on that later.
Sunday, December 11, 2011
Let's dig out a Localtunnel!!!
So as I am creating a very early version of the webapp, I want to get opinions from friends. A few things come up: I'm not quite ready to commit to a full blown installation, and I am too lazy to set up an ssh tunnel, which I have not attempted before.
Of course, there is option number 3: Localtunnel. Basically localtunnel is a ruby script that makes tunneling very easy, without you needing a remote ssh box. They provide a remote linux box, which has no shell and only an authorized_keys file in home. So you don't need a dedicated ssh box somewhere.
On linux, make sure you have ruby and rubygems installed. On ubuntu, use the command below; on fedora just replace apt-get with yum. The rest of the instructions are at the link below.
sudo apt-get install ruby1.8 rubygems
http://progrium.com/localtunnel/
Bills Watcher Malaysia
Recently I got involved in the Open Data movement in Malaysia, and one of my recent projects is called Bill Watcher. It is a webapp that broadcasts, via twitter and rss, bills that are being debated and bills that were passed recently.
Main Page
I will transfer it to the sinar repo soon. I need to do a bit of updating across repos.
p.s
A bill that was debated intensely recently shows how much we don't know about the decision process in the country, even though it is all there on the parliament site, which is not easy to use or navigate.
Thursday, December 08, 2011
What Can't One Do in #CPB2011
From this link
http://www.scribd.com/doc/75107593/CPB2011-Draft
If this bill is passed, you are not allowed to
- Fix your friend's computer; that is considered practice of computing. And you need to pass an examination for that anyway
- Write your own mobile app, iOS or Android, because you need to prove that you can
- On top of that, you probably don't have a service provider cert anyway, so you can't sell it on the app store or the android marketplace
- So if there is new code being written on the internet, you probably cannot use it for a service, because there is no certification for it.
- And you can't use anything that is new to provide a service, because you didn't register it.
- And the term "service" is wide; you now can't sell your software
- Write instructions about the software on your blog
- You can't publish a web app that uses your newfound knowledge.
- Tweet about a problem
- Rant about the problem
- Commit code to github, because you are not certified
- Report a bug, because you didn't register that you know about the other software that you use
- You can't even learn a new technology and share your knowledge
- You can't share your ideas, because you are probably not qualified for design work
- I am not allowed to set up my own server at work and have to wait for others to do it, and worse, if I am the business owner, I have to pay more.
- You can't even publish code unrelated to your service on github, because you are not a registered expert on that subject
- You can't give input on a forum to help solve a problem
- You can't be a generalist. Now you need 1 sysadmin, 1 database person, 1 server person, 1 javascript person, 1 css person, 1 artist, 1 backend guy (not sure how to split it up) and 1 tester
- You can't write your own unit tests, because you are not a certified tester
- You can't do mockups
- You can't do prototyping on your own
- You can't do javascript, so no ajax or whatnot, because there is no cert
- You can't do html, because there is no cert
- I can't do python as a service, because there is no cert
- I can't even use sql, because I have no cert
That is just among those I remember. There could be more.
The Malaysian Computing Profession Act
One of the hottest topics today is the Computer Professional Act 2011. As a software developer, this affects me quite a lot. I for one am against such an act, for both professional and personal reasons.
On registration: the draft means I need to declare what I know. What does that mean? Do I need to produce a cert? Does my boss need to write a letter? Things get a bit uncomfortable here.
http://www.scribd.com/doc/75107593/CPB2011-Draft
A draft came out not very long ago.
Definition
A few definitions from the document
“Computing” is a goal-oriented activity to plan, architect, design, create, develop, implement, use and manage information technology or information technology systems.
So here I assume computing in general, from creation to usage.
“Computing Practitioner” means a person who has a job function in computing or qualification in computing
Here I assume this means people who use a computer: architects, software developers, system administrators, end users. You get the idea.
Registration
That finishes a few definitions. Now let's jump ahead to part 3, on registration.
There shall be indicated against the name of each Registered Computing Practitioner and Registered Computing Professional kept in the Register a record of disciplines or specializations on computing obtained or acquired by such personnel through academic qualification or training including on-the-job training or skill or specialist or professional certifications.
- If we just declare, does it mean the whole process is useless? We can just claim what we know.
- If we need to have a cert, then here is the interesting part: a lot of new technology doesn't provide certs. Open source projects also don't always have certs. Things change fast enough that the training will need to be updated often, and it will be expensive.
- On-the-job training? My experience is that jobs tend to play safe, and are rarely able to use the latest technology
- School? That usually is not helpful at all
- Then if none of the above, how to judge this?
Let's go to the next one
For certifications mentioned in subsection (2), the Board may maintain a list of certifications provided by associations and bodies in Computing recognised by the Board and will keep the list updated from time to time.
What are the criteria to get onto the list? Does it mean it will monitor training providers, via accreditation? Consider the countries, the reputation. How do we keep it from being abused?
Qualification
Now on to qualification
14.(1) (a) Subject to this Act, a person who holds –
(i) the qualifications required for Graduate Membership of a professional body or organisation recognized by the Board, and the qualifications are recognised by the Board; or
(ii) any qualification in Information Technology or Computing which is recognised by the Board; or
(iii) any other qualifications, certifications or relevant experiences recognised by the Board,
shall be entitled on application to be registered as a Registered Computing Practitioner.
So remember the definition on top: a Computing Practitioner is someone who just uses a computer. From my understanding, does it mean that everybody needs to register to use software? For (i), what kind of qualification? For (iii), what kind of relevant experience and certification?
2) Subject to this Act, the following persons shall be entitled on application to be registered as a Registered Computing Professional:
(a) any person who is a Computing Graduate or any person who has other qualifications recognized by the board
(i) who has obtained the practical experience as prescribed under subsection (1)(b); and
(ii) who has passed a professional assessment examination conducted by the Board, or is a Corporate Member of a professional body or organisation recognized by the Board; and
(iii) who has paid the prescribed fee and
(iv) who has complied with all the requirements of the Board;
On top of experience we now have examination.
- What examination will be conducted? How many types?
- What platform? Or how many?
- Or practical knowledge of stuff from only one vendor?
- Does it cover just computing? Then by the definition above it can be anything; even using a spreadsheet is computing.
- If not, what does it cover?
- If so, does it mean everybody can claim to be a professional?
An extra note on examination: it is useless, because there are too many tools and too many options. Limiting it to one will be ridiculous and unfair to the others.
Scope
Let's go to the scope of what I can or cannot do
(a) a Registered Computing Practitioner may take up employment which requires him to perform Computing Services subject to the
i. work is carried out under the supervision or instruction by a Registered Computing Professional, or
ii. similar work scope has been carried out by the Registered Computing Practitioner before.
and
A Registered Computing Professional may only provide Computing Services in the disciplines or specialisations of Computing he is qualified to practise and as is shown in the Register under subsection 12(2).
If I register myself as a software developer, does it mean
- that I can only write programs?
- I cannot administer my own server?
- I cannot fix my own computer?
- Because I only declared that I can do webapps, does it mean I cannot write other types of software (say, a web server)?
- Now if I teach myself a new technology, does it mean I cannot use it at work when the opportunity is given?
And because it is based on similar work carried out before, it means that:
- I can only do stuff that I have done before,
- so if I started with writing webapps, does it mean I cannot administer my own server at work?
- does it mean I cannot write mobile applications professionally? Does it mean I cannot even sell my own mobile application outside of my work, because I did not declare this?
- Or I can only work in one industry: once in banking, forever in banking
I graduated in AI from UM (don't ask); does it mean I cannot do mobile applications? etc.
One more thing: how do we describe a similar scope? Does it go very granular, e.g. you can only do php because you have done php? Or can it be high level, e.g. you can do asp.net because you do web development?
Software tends to require multiple skill sets. In reality a person involved in software needs to have multiple skills, most obviously a software developer. And we have to learn on our own; the world just moves fast, and if we ignore that, the train is just gone. Setting up rules like this just makes things worse.
Conclusion
I believe that this act does not benefit professionals like me, for many reasons.
- For one, I don't see the point. What kind of qualification, when everybody uses a computer now?
- It limits my opportunities; I need to declare what I know, somehow.
- Worse, I can only work on stuff I declared and no more,
- Opportunities tend to be based on past experience.
- Self learning is not covered, and will not be recognized.
- Worse, I am not even allowed to use stuff that is new, because I did not declare that I know it. But I need to use a new technology to learn it. So does it mean I can only learn in secret?
- Technically I am not even allowed to sell a mobile app, which nowadays is very easy to distribute
At the end of the day, I believe this brings the IT industry 10 steps backward, and only makes things worse.
Monday, December 05, 2011
Realized New Behavior in Virtualenv
Not too long ago, I covered virtualenv. One of its behaviors is the --no-site-packages option, which totally isolates the python environment. Starting from version 1.7, --no-site-packages is the default, so you don't need this flag anymore. If you invoke virtualenv with this flag, it will display a message that the flag is deprecated.
Saturday, December 03, 2011
Converting PDF to Text with pdftohtml
Previously I have tried to extract pdf information by converting PDF to text, as described here.
The problem is, a big wall of text is very hard to process.
Here comes pdftohtml. It is part of the poppler package on linux. But gnuwin32 does not have it for windows, which is one reason I used pdftotext.
pdftohtml converts pdf to html. Simple usage is
pdftohtml yourpdffile.pdf
You will get your html file. But it is a bit plain, as it just extracts the text. If there are images inside the pdf, or your pdf is pretty complicated, like the Malaysian Hansard, you can use the -c option
pdftohtml -c yourpdffile.pdf
Here is the catch: it will generate 1 html file per page of the pdf, with images. But the layout is maintained. For a document like the Malaysian Hansard, that would be hundreds of pages.
Then there is a way to produce xml
pdftohtml -xml yourpdffile.pdf
You will get an xml file with the position information.
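To give an idea of what the position information is good for, here is a minimal sketch in Python. The sample XML is hand-written to mimic the page and text elements (with top and left attributes) that pdftohtml -xml produces; the content and the helper name lines_from_xml are made up for illustration, so check it against your own output file before relying on it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written sample mimicking pdftohtml -xml output (not real Hansard data).
SAMPLE = """<pdf2xml>
<page number="1" top="0" left="0" height="1188" width="918">
<text top="100" left="50" width="80" height="12" font="0">Name</text>
<text top="100" left="300" width="60" height="12" font="0">Party</text>
<text top="120" left="50" width="80" height="12" font="0">XXX</text>
<text top="120" left="300" width="60" height="12" font="0">YYY</text>
</page>
</pdf2xml>"""


def lines_from_xml(xml_string):
    """Group text fragments into rows using their 'top' position."""
    root = ET.fromstring(xml_string)
    rows = defaultdict(list)
    for page in root.iter("page"):
        for text in page.iter("text"):
            # itertext() also picks up text nested inside <b> or <i> tags
            content = "".join(text.itertext()).strip()
            key = (int(page.get("number")), int(text.get("top")))
            rows[key].append((int(text.get("left")), content))
    # Rows sorted top to bottom, fragments sorted left to right
    return [[c for _, c in sorted(frags)] for _, frags in sorted(rows.items())]


print(lines_from_xml(SAMPLE))
```

For a real file, ET.parse('yourpdffile.xml').getroot() can replace the ET.fromstring call. Fragments on the same visual line do not always share exactly the same top value, so a tolerance may be needed.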
p.s. I'm using this for http://opendataday.org/. Let's see whether there will be a result today.
Tuesday, November 29, 2011
Remote Control your android with Airdroid
So I found this free android app, which allows you to remote control an android phone using a web browser. Once the app is installed, you can start it pretty easily, using the url displayed in the app.
What you get after opening a browser is a desktop-like interface, which one can use to do quite a lot of stuff.
You can manage apps from it. It installs by redirecting to the android marketplace web interface, but you can uninstall using the application icon.
You can manage app install
You get a file manager that manipulates your sdcard. You can copy files from the PC using the import button, and download files using the export button in the file manager.
File manager like what you use in normal desktop OS
Write or reply sms
There are also photos and music in the browser, you can view the contact list, etc. It really has a lot of features.
For a free app, this actually offers a lot of stuff. For a webapp, it sure looks like a full blown desktop. You can't control the apps on the phone, but that is ok. It is useful enough for me.
The app is free on the android marketplace, https://market.android.com/details?id=com.sand.airdroid. I recommend that you guys try it out.
Thursday, November 24, 2011
This is a post to test on empire avenue
{EAV:a6a709f0dbfb64d1}
This is a line to test my blog integration with empire avenue, in my other lil experiment...
Saturday, November 05, 2011
Adventure in Bottle( the web framework)
So I have been scraping data online for some time. While scraperwiki has an API that allows third party apps to get data in json/xml form, I think I can make it easier, because a scraperwiki query involves doing a sql query on the sqlite datastore. Thus I take the opportunity to learn a new python web framework.
The framework only needs to handle requests and spit out data in json (maybe xml later). It does not need a template; it is json. It doesn't need an ORM; the data is most probably scraped from somewhere else. It does not need sessions; it is meant to be used by a library. The data is open anyway.
The first framework I tried out is Bottle.
The first thing I notice, coming from a django background, which is well known for its big settings.py file, is the amount of setup that I have to do. The amount of setup is small. Just install using 'pip install bottle'.
Essentially, an application is just defined with the Bottle() object
and passed to the run function.
By default bottle already has a default application, so you don't strictly need it; I just put it there to show that it is there.
Another thing I have noticed is that there are no url routes in a separate file. A route decorator is added to a function that I want to serve in the web app. The route is part of the application (the Bottle() object), and I can limit the type of request allowed on it, like POST/GET. I find this approach pretty clean; it reduces boilerplate compared to django views.
Another thing to notice: I do not specify a response method/object (like in django). That is another nice thing about bottle. If the function returns a dict, the response will be json. If it returns a string, the mimetype is text, etc. There is no need to specify a function for the response.
Finally, to run the app, just run python server.py (or any python file with the bottle run function). You have a webapp.
For this project I didn't test templates, but from the docs, a template is specified with a view decorator, which I think is nice, though I don't need it now. From the docs, it looks pretty clean.
Because bottle is a micro framework, there is no manage.py script like django's and no ORM; I use sqlalchemy here. There is no session support either. But interestingly I don't feel that I am missing anything. In fact, it is pretty pleasant to use. The lack of sessions will definitely bite me if I ever have to implement login, but a solution is in the documentation.
Overall, it is a fun framework to use, even though this is a small project. The documentation is pretty good. I might use it for future projects.
Tuesday, November 01, 2011
Using Python Function with sqlite
Note: You can find the docs in the python doc page http://docs.python.org/library/sqlite3.html#sqlite3.Connection.create_function
This is more of an experience report. Not too long ago, I scraped the profiles of Members of Parliament from the parliament website; you can find the result here.
The thing is, as I used the data from the sqlite database I downloaded from the site, I realized that the title is part of the name of the MP. So one would get "XXX , Y.B Tuan". Y.B Tuan is the title.
That makes a query like 'select Parti from swdata where Nama=name' hard, and this is precisely the kind of query I am looking at for another project.
On the other hand, the sqlite3 module, apart from coming with the python standard library since 2.5, actually has a function called Connection.create_function.
So I wrote a little function called get_name, and the example shows how it works.
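import sqlite3

def get_name(name):
    # "XXX , Y.B Tuan" -> "XXX": keep the part before the comma, drop the title
    return name.split(',')[0].strip()

s = sqlite3.connect('dbname')
# attach the python function: SQL name, number of arguments, callable
s.create_function('get_name', 1, get_name)
# and use it
result = s.execute('select get_name(Nama) from swdata')
print(next(result)[0])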
Just define a Python function, make sure it returns a datatype that is compatible with sqlite, and attach it with create_function. Now you can use it in your sqlite queries from Python.
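To show the function doing the job it was written for, the lookup by name, here is a minimal sketch. It assumes an in-memory database and two made-up rows in the same "name , title" shape; the table and column names (swdata, Nama, Parti) come from the query above, while party_of is a hypothetical helper. It is written for Python 3, so print is a function here.

```python
import sqlite3


def get_name(name):
    # Keep only the part before the comma and trim the stray space.
    return name.split(",")[0].strip()


conn = sqlite3.connect(":memory:")
conn.create_function("get_name", 1, get_name)

# Made-up sample rows, shaped like the scraped data ("name , title").
conn.execute("create table swdata (Nama text, Parti text)")
conn.executemany(
    "insert into swdata values (?, ?)",
    [
        ("Ali bin Abu , Y.B Tuan", "Parti A"),
        ("Siti binti Omar , Y.B Puan", "Parti B"),
    ],
)


def party_of(name):
    """Look up the party by bare name, ignoring the title (hypothetical helper)."""
    row = conn.execute(
        "select Parti from swdata where get_name(Nama) = ?", (name,)
    ).fetchone()
    return row[0] if row else None


print(party_of("Ali bin Abu"))
```

Because get_name runs inside the where clause, the title never has to be stripped out of the stored data itself.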
Hope this is useful for someone. CHEERS
A little plug: this is something we are working on in a little group called Sinar Project, and it is still at an early stage.
Saturday, October 29, 2011
Converting PDF to Text
So I have recently been involved in a project to extract data from PDF, which is actually evil, but that is not important now.
On Linux there is a set of utilities that comes with the xpdf program. It should be part of the default package installation; if not, just install it with apt-get or yum.
On Windows you can go to the GnuWin32 page. I just downloaded the zip so that I would not have to remove it with an uninstaller.
http://gnuwin32.sourceforge.net/packages/xpdf.htm
I don't really need the full layout information in it, so I just use pdftotext.
On Windows:
program_location/pdftotext.exe -layout pdf_file.pdf
On Linux, just:
pdftotext -layout pdf_file.pdf
The -layout option maintains the layout of the text as in the PDF. Otherwise, the positioning of certain text will be inconsistent. With no output name given, the text is written to pdf_file.txt next to the PDF.
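If there are many PDFs to convert, the same command can be driven from Python. This is only a sketch: build_pdftotext_command and pdf_to_text are hypothetical helper names, and it assumes pdftotext is on the PATH (on Windows, pass the full path to pdftotext.exe as program).

```python
import subprocess


def build_pdftotext_command(pdf_path, txt_path=None, layout=True,
                            program="pdftotext"):
    """Build the argument list for pdftotext (hypothetical helper)."""
    command = [program]
    if layout:
        command.append("-layout")   # keep the text positioned as in the PDF
    command.append(pdf_path)
    if txt_path is not None:
        command.append(txt_path)    # optional explicit output file
    return command


def pdf_to_text(pdf_path, txt_path=None, layout=True, program="pdftotext"):
    # Raises subprocess.CalledProcessError if pdftotext reports a failure.
    subprocess.check_call(
        build_pdftotext_command(pdf_path, txt_path, layout, program)
    )


print(build_pdftotext_command("pdf_file.pdf"))
```

Passing the arguments as a list, rather than one shell string, avoids quoting trouble with file names that contain spaces.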
Cheers
Monday, October 24, 2011
My Post On Robots Making
I have started a series of posts on making robots with Arduino on the hackerspacekl website. You can find out more at the link:
http://www.hackerspace.my/2011/10/24/making-a-robot-with-arduino-part-1-intro-to-motor-controller/trackback
Saturday, October 08, 2011
A scraper running on the cloud
I have been writing scrapers for some time, as you can see in some of my old posts here.
So recently, thanks to Kaeru, I was introduced to scraperwiki. This is basically a service for running scrapers on the cloud, with additional benefits:
- It runs on the cloud
- It provides the infrastructure to store the data, in the form of a sqlite database, which you can download.
- It provides an easy way to dump the data as Excel
- It provides the infrastructure to turn the data into an API
- Somebody can fork the scraper and do enhancement on it.
- A web based IDE, so you just write your scraper on it.
- Everybody can see the code of the public scraper.
- Scheduled task
One very cool thing about scraperwiki is that it supports a large set of third-party libraries. It supports Ruby and PHP as well as Python. The API for scraperwiki is pretty extensive: it covers its own scraper functions, geocoding functions, views for the data hosted on scraperwiki, etc.
My only concern is that if I want to bring my scraper out of the service, I will need to rewrite the saving function. But the data can be downloaded anyway, and I use Python, so it is not that big of a deal.
Below is a scraper that I have written on scraperwiki. While it is mostly a work in progress, it shows what one looks like.
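On the portability concern: a local stand-in for the saving function does not have to be long. The sketch below is not the scraper mentioned above and not scraperwiki's own code. It is a hypothetical save helper built on the standard sqlite3 module, assuming every row carries the same columns and that table and column names come from trusted code (they are pasted into the SQL text). The default table name swdata matches the table in the downloadable database.

```python
import sqlite3


def save(conn, unique_keys, data, table_name="swdata"):
    """Insert or update one row, keyed on unique_keys (hypothetical helper)."""
    columns = sorted(data)
    column_list = ", ".join('"%s"' % c for c in columns)
    key_list = ", ".join('"%s"' % k for k in unique_keys)
    placeholders = ", ".join("?" for _ in columns)
    # Create the table on first use. The unique index is what lets
    # "insert or replace" overwrite an existing row with the same key.
    conn.execute('create table if not exists "%s" (%s)' % (table_name, column_list))
    conn.execute(
        'create unique index if not exists "%s_keys" on "%s" (%s)'
        % (table_name, table_name, key_list)
    )
    conn.execute(
        'insert or replace into "%s" (%s) values (%s)'
        % (table_name, column_list, placeholders),
        [data[c] for c in columns],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
save(conn, ["Nama"], {"Nama": "Ali", "Parti": "Parti A"})
save(conn, ["Nama"], {"Nama": "Ali", "Parti": "Parti B"})  # same key: row is replaced
rows = conn.execute("select Nama, Parti from swdata").fetchall()
print(rows)
```

Saving the same key twice leaves a single row, so re-running a scraper does not pile up duplicates.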
Sunday, September 18, 2011
Event in Late Sept and Early Oct
Late September and early October is a busy period for geeks.
On 21st Sept, there will be a Software Freedom Day at UniKL.
Details at:
http://wiki.softwarefreedomday.org/2011/Malaysia/Kuala%20Lumpur/OSDCMY
http://www.facebook.com/event.php?eid=272717989419127
On 24th Sept, there will be a Python Malaysia meetup. It will be held at fluentspace, near Kelana Mall.
Details:
http://www.facebook.com/event.php?eid=246522202059172
On 29th Sept, there will be a Google DevFest. It will be held at UCTI. The focus is on Android, HTML5 and Google Analytics.
Details:
http://code.google.com/events/devfests/2011/seasia.html#kualalumpur
On 1st Oct, there will be a geekcamp, held at Itrain, near Wisma MCA. This is a tech-focused event, barcamp style.
Details on the page:
http://geekcamp.my/
This will be a busy period, and it should be fun.
Wednesday, September 14, 2011
Accessing Server from Android
Recently I have been helping to maintain some servers, and I tend to move around. So I decided to make my phone useful.
Android actually has a couple of apps that are useful for remotely accessing a machine. Some of them are free.
For connecting over SSH, I found that ConnectBot works extremely well. It only does ssh and telnet, and that's about it. It is pretty straightforward to use. For accessing Windows servers, I use 2X Client, which is another straightforward RDP client. Both ConnectBot and 2X Client are free, and that is awesome.
The only issue with using an Android phone to access a server remotely is the size of the thing. I have a Desire HD. While the screen is pretty large for a phone, typing commands via ssh or navigating around a Windows server via RDP can still be a pain. It is still smaller than most desktop screens, and I don't have a full-size keyboard on the phone, which is another pain, especially as I access Linux servers most of the time.
So it can be a pain to use at times. But for a quick fix or a check on a server, this works pretty well.
I attached some links for the apps below
Wednesday, August 31, 2011
Many Ways To Grep File Content
So not too long ago I posted on twitter
This spun off a few other ways to do grep.
A few people suggested the following on IRC and Facebook; the i parameter makes the keyword case-insensitive.
grep -iR keyword directory
Another suggestion on IRC.
grep -iR --exclude=file-to-ignore keyword directory
Another tweet I received is,
Then the last one, which I discovered on Google, is ack-grep
ack-grep keyword directory
and again, -i makes the search case-insensitive.
ack-grep -i keyword directory
ack-grep output is nicer, and it automatically ignores binary files. It is slightly different from grep, but both get the job done, to me anyway.
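For anyone wondering what -i, -R and --exclude actually do, here is a rough Python 3 sketch of the same idea. It is an approximation, not a grep replacement: it does a plain substring match where grep matches a regular expression, and grep_recursive is a hypothetical name.

```python
import fnmatch
import os


def grep_recursive(keyword, directory, ignore_case=True, exclude=None):
    """Yield (path, line_number, line) for every line containing keyword."""
    needle = keyword.lower() if ignore_case else keyword
    for root, _dirs, files in os.walk(directory):      # -R: walk subdirectories
        for name in sorted(files):
            if exclude and fnmatch.fnmatch(name, exclude):
                continue                                # --exclude=GLOB
            path = os.path.join(root, name)
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, 1):
                    haystack = line.lower() if ignore_case else line   # -i
                    if needle in haystack:
                        yield path, number, line.rstrip("\n")
```

Calling list(grep_recursive("keyword", "directory")) corresponds roughly to grep -iR keyword directory, and passing exclude="file-to-ignore" to the --exclude variant.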
Monday, August 29, 2011
Python Dateutil Redux
Not too long ago, I covered one use of python dateutil, on the blog here.
The library is pretty nifty in other cases as well, in this case date differences. The datetime module in the Python standard library has datetime.timedelta for finding the difference between dates, but it only counts up to days. In my case, I want to count in years.
That is where dateutil comes in. It has a module called relativedelta, which does count in years. Using it is a matter of importing it and calling it:
from dateutil.relativedelta import relativedelta
date_diff = relativedelta(date_from,date_to)
print date_diff
As you can see, it counts in years and also months, which is useful if you want the difference between dates beyond just days.
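To make the contrast concrete, here is a small sketch with two made-up sample dates, written for Python 3 and assuming python-dateutil is installed. One thing to watch: relativedelta(a, b) works out a minus b, so the later date goes first if you want positive numbers.

```python
from datetime import date

from dateutil.relativedelta import relativedelta

# Sample dates, made up for illustration.
date_from = date(2008, 1, 1)
date_to = date(2011, 8, 29)

# Plain subtraction gives a timedelta, which only knows days (and smaller).
plain_diff = date_to - date_from
print(plain_diff.days)

# relativedelta(later, earlier) splits the same gap into years, months, days.
date_diff = relativedelta(date_to, date_from)
print(date_diff.years, date_diff.months, date_diff.days)
```

For these two dates the timedelta reports 1336 days, while the relativedelta reports 3 years, 7 months and 28 days.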
Python Web Scraping
There are times when the information on a govt website is very useful, but unfortunately the data only exists as web pages; it could be worse, as it could be in PDF. So it can be a pain if we want to use the information in a program but there is no API.
On the other hand, Python is a pretty powerful language. It comes with many libraries, including ones that can be used to make HTTP requests. Introducing urllib2, which is part of the standard library. Downloading data from a website with it takes only a couple of lines of code:
import urllib2
page = urllib2.urlopen("url")
The problem then is that you get a whole set of HTML, which is a bit hard to process. Python has a few third-party libraries for this; the one I use is Beautiful Soup. Beautiful Soup is nice in that it is very forgiving when processing bad HTML markup, so you don't need to worry about bad formatting and can focus on getting things done. The library can also parse XML, among other things.
To use Beautiful Soup,
from BeautifulSoup import BeautifulSoup
page = "html goes here"
soup = BeautifulSoup(page)
value = soup.findAll('div')
print value[0].text
But you need to get the html first, don't you? To do that, just download the data using urllib2 and pass it to Beautiful Soup. It is pretty easy to use, to me anyway. Note, though, that urllib2 is going to be reorganized in Python 3, so the code will need some modification.
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("url")
soup = BeautifulSoup(page)
value = soup.findAll('div')
print value[0].text
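As for the Python 3 modification: urlopen now lives in urllib.request. The sketch below is one possible port, with a swap worth naming plainly. To stay dependency-free it uses the standard library's html.parser in place of Beautiful Soup, so it is nowhere near as forgiving with bad markup. DivTextParser, div_texts and fetch are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.request import urlopen   # Python 3 home of urllib2.urlopen


class DivTextParser(HTMLParser):
    """Collect the text found inside each top-level <div>."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # how many <div> tags we are currently inside
        self.texts = []    # one entry per top-level <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.depth == 0:
                self.texts.append("")
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.texts[-1] += data


def div_texts(html):
    parser = DivTextParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


def fetch(url):
    # urlopen returns bytes in Python 3, so decode before parsing.
    with urlopen(url) as page:
        return page.read().decode("utf-8", errors="replace")


print(div_texts("<div>Hello <b>MP</b></div><p>skip</p><div>Second</div>"))
```

With a real page it would be div_texts(fetch("url")); Beautiful Soup remains the more robust choice for messy government HTML.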
To see how the scraper fares, here is a real-world example on github, part of a bigger project. But hey, it is open source. Just fork and use it, at this link.
So enjoy, go forth and extract some data, and promise to be nice: don't hammer their server.
Wednesday, June 22, 2011
Usefulness of __dict__
I am writing a validation function to check object instances, in this case Django models. The objects are mostly the same, except there is an extra field that needs to be checked; that field exists only in certain objects, not all of them.
So being lazy, I use the __dict__ special attribute. It exists on instances of ordinary Python classes and holds the object's own attributes as a dictionary, which I can use to check for the existence of an attribute.
For example, given an instance c, we can check 'test_val' in c.__dict__ to see whether the attribute exists.
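A minimal sketch of that check, using a hypothetical Profile class in place of the real models (Python 3, so print is a function):

```python
class Profile(object):
    """Hypothetical class standing in for one of the model objects."""

    def __init__(self, name, test_val=None):
        self.name = name
        if test_val is not None:
            # Only some instances get the extra field.
            self.test_val = test_val

    def display_name(self):
        return self.name.upper()


c = Profile("first", test_val=1)
d = Profile("second")

print('test_val' in c.__dict__)       # c has the extra field
print('test_val' in d.__dict__)       # d does not
print('display_name' in c.__dict__)   # methods live on the class, not the instance
```

The three checks print True, False and False, which also previews the catch with methods.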
From the same idea, one very cool thing to do is to take the __dict__ and convert the attributes into a JSON object.
One idea I have been playing with: because a Django model instance is essentially an object, we can use the same approach to output the values of a single Django model instance as a JSON string, but we need to be careful with certain data types like double, etc.
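The JSON idea can be sketched like this. Record is a hypothetical plain object, not a Django model, and two choices here are my assumptions rather than anything from the original: underscore-prefixed attributes are skipped, and values the json module cannot encode fall back to str().

```python
import datetime
import json


class Record(object):
    """Hypothetical plain object; not an actual Django model."""

    def __init__(self):
        self.name = "example"
        self.score = 1.5
        self.created = datetime.date(2011, 6, 22)
        self._cache = object()   # private state we do not want to export


def to_json(obj):
    # Skip underscore-prefixed attributes, and fall back to str() for
    # values json cannot encode natively (dates, for instance).
    public = dict(
        (key, value) for key, value in obj.__dict__.items()
        if not key.startswith("_")
    )
    return json.dumps(public, default=str, sort_keys=True)


print(to_json(Record()))
```

The date comes out as the string "2011-06-22", which is the kind of data-type care mentioned above.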
The big catch of using this is that __dict__ only contains the attributes of an object; it does not contain built-in attributes, and it does not contain methods. So if a value comes from a method, you need to think of another way. Which actually sucks, as I have a lot of such methods in my classes; well, I will probably figure it out by that time.
So, happy coding
Monday, May 09, 2011
The Great Global Hackerspace Challenge
Not long ago, the Hackerspacekl gang and I joined The Great Global Hackerspace Challenge. Basically we built an Arduino shield that processes words.
Actually the whole process is best viewed on the hackerspacekl blog.
http://www.hackerspace.my/category/projects/the-story-box
In summary, here is what we learned.
- K.I.S.S, Keep It Small and Simple
The Atmega168 has only 16 KiB of flash memory, and the code itself takes half of it. It is also slow: it runs at 16 MHz, while a modern computer is around 2 GHz. It had better be simple, because debugging is hard.
- Serial is your best friend (on arduino anyway)
Unlike programming on a PC or a web app, there is no print function to debug with, and there is no debugger either. Even though we have an LCD, it is not reliable. And serial is pretty much built into the Arduino board. So USE IT.
- Optimization matters
In a dynamic language like Python, or a more modern language like Java, there is a garbage collector; even C/C++ has the OS to help with that. So one doesn't need to worry much about memory issues. But on Arduino, removing unused code saves a lot of memory.
- One needs to think very low level.
The I2C-to-EEPROM code is about shifting bits/bytes to write to the EEPROM. For once we are thinking in bytes. And we need to know a bit of hardware to write the code properly.
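To give a feel for the bit shifting, here is a small sketch in Python rather than Arduino C++, to match the rest of this blog. It assumes the EEPROM takes a 16-bit address sent as two bytes, high byte first, which is my assumption and not a detail from the project; split_address and join_address are hypothetical names.

```python
def split_address(address):
    """Split a 16-bit address into (high_byte, low_byte)."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    high_byte = (address >> 8) & 0xFF   # top 8 bits
    low_byte = address & 0xFF           # bottom 8 bits
    return high_byte, low_byte


def join_address(high_byte, low_byte):
    """Rebuild the 16-bit address from its two bytes."""
    return (high_byte << 8) | low_byte


print(split_address(0x1234))
```

Here 0x1234 splits into 0x12 and 0x34 (18 and 52 in decimal), and join_address puts them back together.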
Overall, it was a fun and interesting experience. For a programmer who spends his time on Python or web development, it opened my eyes to embedded programming a bit; a little bit of experience that meant a lot to me.
Wednesday, March 16, 2011
Let Android read my SMS
Another thing I tried on SL4A is to play with its SMS function. The resulting code is below, and it works like this:
- As usual, import the library and create the Android() object
- From the Android object, call smsGetMessages, with a required parameter for unread messages: True for unread only, False otherwise.
- Call the built-in Android text-to-speech software to read each message out, by calling the ttsSpeak method.
import android
droid = android.Android()
# True: fetch unread messages only
result = droid.smsGetMessages(True)
for i in result.result:
    droid.ttsSpeak(i['body'])
The result is a list of dictionaries like the one below, in Python notation. Since I only want the message, I get it with i['body'].
{u'_id': u'59',
u'address': u'Address of sender aka the phone no',
u'body': u'Message Body',
u'date': u'1300254988000',
u'read': u'1'}
The date is the message's datetime, and read is 1 if the message has been read and 0 otherwise. The ttsSpeak method is easy to use too: just pass in a string.
Originally I read all messages and passed them to the tts library. That turned out to be a bad idea, because I had no idea how to stop it from speaking once it started....
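One way around the runaway speech, short of finding a stop button, could be to cap how many messages get passed to ttsSpeak. This is only a sketch: bodies_to_read is a hypothetical helper, and the sample list is made up in the shape of the dictionaries above so that it runs off the phone too (Python 3 here).

```python
def bodies_to_read(messages, limit=3):
    """Return the text of at most `limit` messages (hypothetical helper)."""
    return [message['body'] for message in messages[:limit]]


# Made-up sample data; on the phone this list would be
# droid.smsGetMessages(True).result instead.
sample = [
    {'_id': '1', 'body': 'First message', 'read': '0'},
    {'_id': '2', 'body': 'Second message', 'read': '0'},
    {'_id': '3', 'body': 'Third message', 'read': '0'},
]

for body in bodies_to_read(sample, limit=2):
    print(body)   # on the phone: droid.ttsSpeak(body)
```

It does not stop speech that has already started; it only keeps the queue short.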
Thursday, March 03, 2011
My First Day on Android Scripting
I got myself an Android phone not that long ago. One reason is that it is a pretty amazing piece of hardware. Unlike the iPhone, the SDK is available on the major OSes, including Linux. One of the many things I installed is SL4A, the Scripting Layer for Android, along with the Python interpreter for Android.
One cool thing SL4A does is let us test the code remotely from a Python interpreter on another machine. The wiki page has a good explanation of how to do this.
http://code.google.com/p/android-scripting/wiki/RemoteControl
One of the first things I played around with via the Python interpreter on the laptop:
import android
droid = android.Android()
data = droid.getNetworkOperatorName()
print data.result
The API page has a lot of information, so that is one place that one should look.
Not much of a program nor a post, but yeah, it is the start of something beautiful. I hope.
Saturday, February 19, 2011
The Solvers Manifesto
http://www.solversmanifesto.com/
The Solvers Manifesto shows our dilemma as web developers. It probably applies less to backend developers or system developers, but definitely to web developers, who make up a big chunk of software development jobs.
We are not engineers, we are not paid as much as an expert, and we don't get respected like an artist. We are hired because we know how to program, yet we also have to make things look nice. And yet our opinion is not respected. When technicians fix stuff, nobody questions how they do it. Not so for us.
p.s. yes, Yet Another Rant Post
Saturday, February 05, 2011
Random Python Learning : partial
On the holiday I decided to learn more about Python, and really there is much to learn!!!!!!
One of the modules I learned about is the functools module.
One of the interesting functions I learned is partial; basically it is partial application of a function,
for example
import functools

def adder(a,b,c):
    return a+b+c

def adder2(a,b):
    return a+b

# pre-fill the first two arguments of adder, and the first one of adder2
add_three = functools.partial(adder,1,2)
add_one = functools.partial(adder2,1)
print add_three(1)
# would print 4
print add_one(3)
# would print 4
So you can wrap an existing function without rewriting it, while giving default values to some of its arguments. It also shows up around Python decorators; functools.wraps, for instance, is built with partial.
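To see what partial is doing under the hood, here is a rough closure that behaves much the same way. It is an illustration only, written for Python 3.5 or later; simple_partial is a hypothetical name and skips extras that the real functools.partial offers, such as the func and args attributes.

```python
import functools


def simple_partial(func, *fixed_args, **fixed_kwargs):
    """Rough stand-in for functools.partial (illustration only)."""
    def wrapper(*args, **kwargs):
        merged = dict(fixed_kwargs)
        merged.update(kwargs)   # keywords given at call time win
        return func(*fixed_args, *args, **merged)
    return wrapper


def adder(a, b, c):
    return a + b + c


add_three = functools.partial(adder, 1, 2)
my_add_three = simple_partial(adder, 1, 2)
print(add_three(1), my_add_three(1))
```

Both calls end up as adder(1, 2, 1), so both give 4.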
read more here
http://docs.python.org/library/functools.html