Broken links when fetching a URL with urllib2 in Python: ISO-8859-15 to UTF-8

I had a small but annoying issue that turned out to be a simple encoding error. I'm not a Python expert, and I was trying to read a web page via urllib2 in Python. Here is what I did:

import urllib2
req = urllib2.Request(url)
response = urllib2.urlopen(req)
content = response.read()
response.close()

When I tried to work with the page I got in "content", I ran into a really weird problem: it was different from the page shown in the browser. Printing the Content-Type header of the response showed why:

print response.headers.getheader('Content-Type')
text/html; charset=ISO-8859-15
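If the goal is simply correct text, another option is to decode the raw bytes with whatever charset the header declares. The sketch below uses the standard-library email.message.Message class to parse the charset parameter. The helper name decode_body and the UTF-8 fallback are my own assumptions for illustration, not part of urllib2:

```python
from email.message import Message


def decode_body(raw_bytes, content_type, fallback="utf-8"):
    """Decode raw response bytes using the charset declared in Content-Type.

    decode_body is a hypothetical helper; `fallback` (assumed UTF-8) is used
    when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter lower-cased, or None
    charset = msg.get_content_charset() or fallback
    return raw_bytes.decode(charset)


# Bytes as an ISO-8859-15 server would send them: 0xFC is 'ü', 0xDF is 'ß'
raw = "Grüße".encode("iso-8859-15")
text = decode_body(raw, "text/html; charset=ISO-8859-15")
```

With urllib2, the header string to pass in is the one printed above, i.e. response.headers.getheader('Content-Type').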

The charset of the response was "ISO-8859-15", while my browser received the page as "UTF-8". The server apparently chooses the charset based on the User-Agent header, so I added a browser-like User-Agent to get the UTF-8 version instead:

import urllib2
req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0')
response = urllib2.urlopen(req)
content = response.read()
response.close()

print response.headers.getheader('Content-Type')
text/html;charset=UTF-8
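For reference, here is a sketch of the same request in Python 3, where urllib2 was merged into urllib.request. The URL is a placeholder, and the actual fetch is left commented out because it needs network access:

```python
import urllib.request

# Hypothetical URL; substitute the page you actually want to fetch.
url = "http://example.com/"

# Same browser-like User-Agent string as in the urllib2 version.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0",
}
req = urllib.request.Request(url, headers=headers)

# Request stores header names via str.capitalize(), so the key is "User-agent".
ua = req.get_header("User-agent")

# To actually fetch (network access required):
# with urllib.request.urlopen(req) as response:
#     content_type = response.headers.get("Content-Type")
#     raw = response.read()
```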

This time the server answered with UTF-8. The content still contained HTML entities, so I replaced ‘&amp;’ and the entities for the German special characters like this:

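# German special characters (Python 2 needs a "# -*- coding: utf-8 -*-" line for these literals)
content = content.replace('&uuml;', 'ü')
content = content.replace('&auml;', 'ä')
content = content.replace('&ouml;', 'ö')
content = content.replace('&szlig;', 'ß')
# replace &amp; last, so an escaped '&amp;uuml;' does not end up as 'ü'
content = content.replace('&amp;', '&')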

There is probably a more elegant way to do this, but it worked for me and I could get on with my actual work.
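One tidier option, sketched here for Python 3: the standard-library html.unescape() function resolves every named and numeric HTML entity in one call, so there is no need to list the umlauts by hand. It should run on already-decoded text (str), not on raw bytes. The sample string is made up for illustration:

```python
import html

# Decoded page text still containing HTML entities (sample data).
content = "Stra&szlig;e &amp; Gr&uuml;&szlig;e aus K&ouml;ln, &auml;hnlich"

# html.unescape() resolves all named and numeric character references,
# not just the handful replaced manually.
content = html.unescape(content)
```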