Web Crawlers: Applying Encoding Detection

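In the previous post we installed the encoding detection tool chardet. One of Xiao Jiayu's (小甲鱼) exercises asks for this: the user enters any URL, and our script works out which encoding that website uses.

Exercise demo:

Here is my code: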

'''This script uses the text-encoding detector chardet to detect the encoding of a website entered by the user.'''
import urllib.request as ur
import chardet as ch

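# Fetch the raw page data for the URL the user enters
def source():
    url = input('Please input the URL you want to detect: ')
    try:  # the URL may be malformed or unreachable
        content = ur.urlopen(url).read()
    except Exception:  # narrower than a bare except: Ctrl+C still interrupts
        print('The URL is wrong or it\'s not available.')
        return None
    else:
        return content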

# Detect the encoding of the website
def detect():
    sources = source()
    if sources is None:  # the URL was bad, so there is nothing to inspect
        print('You should restart the application.')
        return
    result = ch.detect(sources)['encoding']
    if result == 'GB2312':  # GBK is a superset of GB2312, so report GBK
        result = 'GBK'
    print('The encoding is %s.' % result)

# Run only when this file is executed as a standalone script
if __name__ == '__main__':
    detect()
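
One case the script above does not cover: chardet can report an encoding of None when it cannot make a guess. Below is a minimal sketch of how the reporting step could handle that. The helper name describe is my own, and the dictionaries passed to it are hand-written stand-ins for what ch.detect() returns, so the example runs without a network connection.

```python
def describe(detection):
    """Turn a chardet-style result dict into a message for the user."""
    encoding = detection.get('encoding')
    if encoding is None:
        # chardet could not make a guess (for example, empty input)
        return 'The encoding could not be detected.'
    if encoding.upper() == 'GB2312':
        # Same substitution as in the script: report the GBK superset
        encoding = 'GBK'
    confidence = detection.get('confidence', 0.0)
    return 'The encoding is %s (confidence %.2f).' % (encoding, confidence)


# Hand-written sample results, not real measurements
print(describe({'encoding': 'GB2312', 'confidence': 0.99}))
print(describe({'encoding': None, 'confidence': 0.0}))
```

In the real script the argument would be the return value of ch.detect(sources).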

Reference: notes on GB2312 and GBK (GBK is a superset of GB2312, which is why the script reports GBK).
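
The relationship between the two encodings can be checked with Python's built-in codecs. This is only a small sketch: the sample characters are my own choice, and can_encode is a hypothetical helper written for this illustration.

```python
def can_encode(text, codec):
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


common = '编码'  # everyday characters, present in both GB2312 and GBK
rare = '镕'      # a character that GBK added and GB2312 lacks

# Text that fits in GB2312 gets the same bytes under GBK
print(common.encode('gb2312') == common.encode('gbk'))

# The rarer character only fits in the larger character set
print(can_encode(rare, 'gb2312'))
print(can_encode(rare, 'gbk'))
```

So decoding a page as GBK when chardet says GB2312 loses nothing, while the reverse can fail on rarer characters.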

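Summary:

Pay attention to how

if __name__ == '__main__':

is used. The code under it runs only when the file is executed directly as a script, not when it is imported as a module.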
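
To see the effect of that check without creating two separate files by hand, here is a rough sketch that writes a tiny throwaway module and executes it under two different names with the standard runpy module. The module source, the file name demo_module.py and the helper run_with_name are all made up for this demonstration.

```python
import os
import runpy
import tempfile

# A tiny module that records whether its __main__ guard fired
MODULE_SOURCE = '''
ran_as_script = False
if __name__ == '__main__':
    ran_as_script = True
'''


def run_with_name(run_name):
    """Execute the demo source with __name__ set to run_name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'demo_module.py')
        with open(path, 'w', encoding='utf-8') as f:
            f.write(MODULE_SOURCE)
        # run_path returns the globals of the executed module
        namespace = runpy.run_path(path, run_name=run_name)
    return namespace['ran_as_script']


print(run_with_name('__main__'))     # True: behaves like "python demo_module.py"
print(run_with_name('demo_module'))  # False: behaves like "import demo_module"
```

This is why detect() in the script runs when the file is launched directly, but stays quiet if another program imports the file to reuse source() or detect().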