Calendar
Code
This code automated most of the work for me. Last year I used gdata directly, but this year I got so frustrated with it that I decided I needed something simpler. The tradeoff is that the events are not described in as much detail as they used to be. This year I'm not making the mistake of throwing the code away, so here it is.
There are basically three steps:
- Scrape the website to gather information
- Build a shell command that calls googlecl, a command-line interface to some Google services.
- Run the command in the terminal. Keep retrying if it fails for any reason.
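Steps two and three can be sketched roughly as below. This is not the script from this post but a variation written for Python 3: it passes the command as an argument list instead of a shell string, so a title containing an apostrophe cannot break the quoting, and it gives up after a fixed number of attempts instead of retrying forever. The names `build_command`, `run_with_retries`, `max_attempts`, `delay` and `runner` are made up for illustration; only the `google calendar add --cal=...` form comes from the script itself.

```python
import subprocess
import time


def build_command(calendar, quick_add_text):
    """Return the googlecl call as an argument list (no shell involved)."""
    return ['google', 'calendar', 'add', '--cal=' + calendar, quick_add_text]


def run_with_retries(command, max_attempts=5, delay=2.0, runner=subprocess.call):
    """Run command until it exits with 0 or the attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if runner(command) == 0:
            return attempt
        if attempt < max_attempts:
            time.sleep(delay)  # short pause before the next try
    return None
```

The `runner` parameter exists only so the retry logic can be tried out without googlecl installed, for example by passing a function that fails twice and then returns 0.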
There were some issues:
- Duplicated entries: some events from the calendar "Naukowa 2" also appeared in "Naukowa", while still showing up in their proper place.
- Sometimes the start times were off by an hour or two: for example by 1 hour on Friday, 1 hour on Saturday and 2 hours on Sunday. WTF? I noticed it happened on calendars that have numbers in their names, but most of them do, so I might just be imagining it.
- Some problems with encoding. I could never work out when I need to decode or encode a string.
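For the encoding problem, the usual rule of thumb is: decode bytes into text once, right after reading them, work on text everywhere inside the program, and encode once, right before writing or sending. A minimal sketch of that rule, written for Python 3 (the helper names `to_text` and `to_bytes` are made up for illustration and are not part of the script):

```python
def to_text(data, encoding='utf-8'):
    """Decode bytes read from outside the program; pass text through unchanged."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


def to_bytes(text, encoding='utf-8'):
    """Encode text just before it leaves the program; pass bytes through unchanged."""
    if isinstance(text, bytes):
        return text
    return text.encode(encoding)


raw = b'pi\xc4\x85 10:00'   # UTF-8 bytes, e.g. from a downloaded page
text = to_text(raw)          # decode once, right after reading
day = text.split()[0]        # work on text from here on
payload = to_bytes(day)      # encode once, right before writing or sending
```

Here `day` has three characters while `payload` has four bytes, because "ą" takes two bytes in UTF-8. Mixing the two kinds of string is what usually produces the errors.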
#!/usr/bin/python
# -*- coding:utf8 -*-
from BeautifulSoup import BeautifulSoup
import urllib2
import re
import subprocess
# Download the page
# You may want to save the page in the browser and use a local copy
# for example: 'file:///home/daniel/Pobrane/pyrkon.html'
page = urllib2.urlopen('http://www.pyrkon.pl/2011/index.php?go2=program')
soup = BeautifulSoup(page)
# Find div with the content
content = soup.find('div', id='content')
# Get all of its descendants that are divs too
divs = content.findAll('div')
# Set starting index in case you wanted to start in the middle after some interruption
start_from = 0
i = 0
l = len(divs) - start_from
for div in divs[start_from:]:
    # Name and lecturer are easy
    tytul = div.contents[1].b.string
    prowadzacy = div.contents[1].i.string
    # I can never understand when I need to decode/encode from/to utf-8.
    # This was done by trial and error.
    # Madafaking newlines are contents too, so
    # div.contents[2] == u'\n'
    # Place
    miejsce = re.search('^<b>miejsce: </b>(?P<miejsce>.+?)<br />', div.contents[3].renderContents(), re.M).group('miejsce')
    miejsce = miejsce.decode('utf-8')
    # Show some progress information
    i += 1
    print '[%d/%d] %s: %s' % (i, l, miejsce, tytul)
    # Event start time
    czas = re.search('^<b>termin: </b> (?P<dzien>pią|sob|nd)(\s*)(?P<godzina>\d{2}):(?P<minuta>\d{2})', div.contents[3].renderContents(), re.M)
    dzien, godzina, minuta = czas.group('dzien', 'godzina', 'minuta')
    godzina = int(godzina)
    minuta = int(minuta)
    # Convert the day abbreviation (Fri/Sat/Sun) to the day of the month
    if dzien == 'pią':
        dzien = 25
    elif dzien == 'sob':
        dzien = 26
    elif dzien == 'nd':
        dzien = 27
    else:
        raise ValueError('Invalid day')
    # How long it lasts
    dlugosc = re.search('^<b>czas trwania: </b>(?P<godzin>\d+):(?P<minut>\d{2}) h<br />', div.contents[3].renderContents(), re.M)
    godzin, minut = dlugosc.group('godzin', 'minut')
    godzin = int(godzin)
    minut = int(minut)
    # Build shell command for googlecl - google command line interface (available at code.google.com)
    # uses "Quick Add" syntax
    polecenie = '''google calendar add --cal='%s' '%s - %s on %d/03/2011 %d:%02d for %d minutes in %s' ''' % (miejsce, tytul, prowadzacy, dzien, godzina, minuta, godzin * 60 + minut, miejsce)
    # Keep calling shell command until it succeeds
    # Sometimes it throws gdata.service.RequestError with status 302 and reason 'Redirect received, but redirects_remaining <= 0'
    return_code = 1
    while return_code != 0:
        print polecenie
        return_code = subprocess.call(polecenie, shell=True)
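
The day and duration handling above could also be written with a lookup table and `datetime`, which gives the end time for free. This is only a sketch for Python 3, not what the script does: `parse_slot`, `DAY_OF_MONTH` and the two compiled patterns are made-up names, and the input strings are assumed to look like the text after "termin:" and "czas trwania:" on the programme page.

```python
import re
from datetime import datetime, timedelta

# Day abbreviations used on the programme page, mapped to days of March 2011
DAY_OF_MONTH = {'pią': 25, 'sob': 26, 'nd': 27}

TERMIN_RE = re.compile(r'(?P<dzien>pią|sob|nd)\s*(?P<godzina>\d{2}):(?P<minuta>\d{2})')
DURATION_RE = re.compile(r'(?P<godzin>\d+):(?P<minut>\d{2}) h')


def parse_slot(termin, czas_trwania):
    """Return (start, end) datetimes for one programme entry."""
    t = TERMIN_RE.search(termin)
    d = DURATION_RE.search(czas_trwania)
    if t is None or d is None:
        raise ValueError('Unrecognised schedule text: %r / %r' % (termin, czas_trwania))
    start = datetime(2011, 3, DAY_OF_MONTH[t.group('dzien')],
                     int(t.group('godzina')), int(t.group('minuta')))
    length = timedelta(hours=int(d.group('godzin')), minutes=int(d.group('minut')))
    return start, start + length
```

For example, `parse_slot('sob 23:30', '1:45 h')` gives a start on 26 March at 23:30 and an end on 27 March at 01:15, so events that run past midnight are handled without extra code.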