Monday, 14 March 2011 13:17
# Scrape government contract details from transparency.ca.gov
import yql
import csv

y = yql.Public()

# The page selector holds one <option> per results page, so its count is the page count
query = 'select * from html where url = "http://www.transparency.ca.gov/Contracts/default.aspx?Page=0" and xpath = "//select[@id=\'ctl00_ContentPlaceHolder1_wcPage\']/option"'
result = y.execute(query)
page_count = result.count

# Comma-delimited Excel dialect with every field quoted
contract_writer = csv.writer(open('ca_contracts.csv', 'wb'), dialect='excel', quoting=csv.QUOTE_ALL)

for i in range(0, page_count):
    print("Parsing page #: %d" % i)
    # Fetch page i of the contract listing
    query = 'select * from html where url = "http://www.transparency.ca.gov/Contracts/default.aspx?Page=%d" and xpath = "//div[@class=\'module dynamicHide\']"' % i
    result = y.execute(query)
    for j in range(1, 51):  # 50 contracts per page
        row = result.rows[j]
        # PRIMARY DATA FIELDS
        number = row['table']['tr']['td'][0]['p']  # contract number
        dept = row['table']['tr']['td'][1]['p']  # department
        price = row['table']['tr']['td'][2]['p'].replace(u'\xa0', '')  # price, non-breaking spaces removed
        name = row['div']['p'][0]['content'].strip()  # supplier name
        dates = row['div']['div'][0]['p']['content'].strip()  # dates
        class_codes = row['div']['label'][1]['content'].replace("\n", ' ')  # supplier classification codes
        instruct = row['div']['div'][1]['label']['content'].replace("\n", ' ')  # special instructions
        ac_type = row['div']['p'][1]['content'].split("\n")[0]  # acquisition type
        ac_method = row['div']['p'][1]['content'].split("\n")[1].strip()  # acquisition method
        # SECONDARY DATA FIELDS
        category = row['div']['div'][2]['ul']['li']['h5']['content']  # category
        descr = row['div']['div'][2]['ul']['li']['div'][1]['p']  # classification
        contract_writer.writerow([number, dept, price, name, dates, class_codes, instruct, ac_type, ac_method, category, descr])
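The scraper above indexes several levels deep into each result row, so a single contract with a missing cell or label would stop the whole run with a KeyError or IndexError. One possible guard, shown here only as a sketch, is a small lookup helper that returns a default when any step of the path is absent. The helper name `dig` and the sample `row` are hypothetical and are not part of the original script or of the yql library.

```python
def dig(node, path, default=''):
    """Follow a sequence of dict keys and list indexes.

    Returns `default` as soon as any step is missing instead of raising.
    """
    for step in path:
        try:
            node = node[step]
        except (KeyError, IndexError, TypeError):
            return default
    return node


# Hypothetical row, shaped like the nested results indexed in the scraper
row = {'table': {'tr': {'td': [{'p': 'C-001'}, {'p': 'Dept of Example'}]}}}

number = dig(row, ['table', 'tr', 'td', 0, 'p'])   # present
price = dig(row, ['table', 'tr', 'td', 2, 'p'])    # third cell is missing
name = dig(row, ['div', 'p', 0, 'content'], default='n/a')  # whole branch is missing
```

With a helper like this, a lookup such as `row['table']['tr']['td'][2]['p']` could be written as `dig(row, ['table', 'tr', 'td', 2, 'p'])`, leaving an empty CSV cell for that contract rather than aborting the page.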
Friday, 03 December 2010 03:00
About Taco HTML Edit
A full-featured HTML editor and PHP editor. As an HTML editor, Taco HTML Edit empowers its users to rapidly create their own web sites. It is designed exclusively for Mac OS X and has many advanced features including spell checking, live browser previewing, PHP previewing, syntax checking, and much more.
The Component Library, new in Taco HTML Edit, allows you to select one of 22 components, customize it, and insert it into an HTML document. From Slideshows to Pie Charts, from Accordion Controls to Scrollable Tables, the Component Library has the widgets that web designers have often wanted to add to web pages but that, until now, have been cumbersome to build. The Component Library revolutionizes how web designers create web pages.
Friday, 29 October 2010 04:00
About PDF Converter
A 6-in-1 PDF converter that helps Mac users convert PDF files to Office, EPUB, text, and HTML formats. Users can edit and reuse PDF content easily!
Key features:
- Convert PDF to editable Word, Excel, PowerPoint, EPUB, Text and HTML on Mac OS X.
- Preserve text, hyperlinks, images, layouts, tables, columns, graphics in converted Office files, HTML pages and EPUB eBooks
- Batch Conversion: convert up to 50 PDF files at a time
- Partial Conversion: select specific pages or a page range from each PDF file to convert
- Supports conversion of encrypted PDF files
- Customize the text and background color in the converted HTML and EPUB files
- Standalone: does not require Adobe Reader, Acrobat, or Microsoft Office for Mac
- Supports 9 languages: English, Turkish, Thai, Latin, Korean, Greek, Cyrillic, Japanese and Chinese
Monday, 25 October 2010 04:00
About Taco HTML Edit
A full-featured HTML editor and PHP editor. As an HTML editor, Taco HTML Edit empowers its users to rapidly create their own web sites. It is designed exclusively for Mac OS X and has many advanced features including spell checking, live browser previewing, PHP previewing, syntax checking, and much more.
The Component Library, new in Taco HTML Edit, allows you to select one of 22 components, customize it, and insert it into an HTML document. From Slideshows to Pie Charts, from Accordion Controls to Scrollable Tables, the Component Library has the widgets that web designers have often wanted to add to web pages but that, until now, have been cumbersome to build. The Component Library revolutionizes how web designers create web pages.