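HTTP Protocol, the Wireshark Packet Capture Tool, JSON Parsing, and a Weather Crawler
HTTP: Hypertext Transfer Protocol
- HTTP (HyperText Transfer Protocol) is the protocol used to transfer hypertext from World Wide Web (WWW) servers to a local browser.
- Key characteristic of HTTP: one request, one response. The client asks and the server answers.
- HTTP is transmitted in plaintext; HTTPS is encrypted.
Most mainstream websites now use HTTPS.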
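Web address: URL (Uniform Resource Locator)
URL
Every file on the Internet has a unique URL. It contains information that indicates where the file is located and how the browser should handle it.
<protocol>://<host>:<port>/<path>
Protocol: HTTP, TCP port 80
          HTTPS, TCP port 443
Host: domain name -> DNS server -> IP address
Port: may be omitted; defaults to 80 for HTTP
      and 443 for HTTPS
Path: the resource you want to retrieve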
https://www.example.com/path/to/resource?query=parameter#fragment
• https is the protocol,
• www.example.com is the host name (domain name),
• /path/to/resource is the path,
• query=parameter is the query parameter,
• #fragment is the fragment identifier.
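The breakdown above can be sketched in C. This is a minimal illustration, not a complete URL parser: the struct url_parts and the function split_url are hypothetical names introduced here, and the code assumes a URL of the form protocol://host[:port][/path][?query][#fragment] that is shorter than 512 characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the URL components listed above. */
struct url_parts {
    char scheme[16];
    char host[128];
    int  port;            /* default 80 for http, 443 for https */
    char path[512];       /* path only, "/" when the URL has none */
    char query[512];      /* text after '?', empty if absent */
    char fragment[512];   /* text after '#', empty if absent */
};

/* Split url into its components. Returns 0 on success, -1 on failure. */
int split_url(const char *url, struct url_parts *out)
{
    char rest[512] = "";
    char *p = rest;
    char *mark;
    int consumed = 0;

    memset(out, 0, sizeof(*out));

    /* protocol up to ':', then "://", then host up to ':', '/', '?' or '#' */
    if (sscanf(url, "%15[^:]://%127[^:/?#]%511s",
               out->scheme, out->host, rest) < 2)
        return -1;

    /* The port may be omitted, so start from the protocol's default. */
    out->port = (strcmp(out->scheme, "https") == 0) ? 443 : 80;
    if (*p == ':') {
        if (sscanf(p, ":%d%n", &out->port, &consumed) != 1)
            return -1;
        p += consumed;
    }

    /* Cut off the fragment first, then the query; what remains is the path. */
    mark = strchr(p, '#');
    if (mark != NULL) {
        *mark = '\0';
        strcpy(out->fragment, mark + 1);
    }
    mark = strchr(p, '?');
    if (mark != NULL) {
        *mark = '\0';
        strcpy(out->query, mark + 1);
    }
    strcpy(out->path, (*p != '\0') ? p : "/");
    return 0;
}
```

For the example URL above, split_url fills in scheme "https", host "www.example.com", port 443, path "/path/to/resource", query "query=parameter", and fragment "fragment".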
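HTTP interaction process:
1. Establish a TCP connection
2. Send the HTTP request message
3. Return the HTTP response message
4. Close the TCP connection
In this basic flow, each request/response exchange establishes one TCP connection and closes it afterward.
Format of an HTTP request message:
Format of an HTTP response message:
The GET method
Overview:
The following actions all trigger a GET request: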
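Wireshark packet capture tool
sudo apt-get install wireshark
Why can so many programs be installed with apt-get install?
apt-get is the command-line interface to the Advanced Package Tool (APT) on Debian and its derivatives (such as Ubuntu and Linux Mint).
1. Centralized software repositories: Debian and its derivatives maintain large software repositories containing thousands of precompiled packages. Once developers contribute their software to these repositories, it becomes available to every user of the same system through apt-get.
2. Dependency management: apt-get resolves and handles dependencies between packages. When you install a package, apt-get automatically installs all required dependencies so that the software runs correctly. This greatly simplifies installation.
3. Community contribution and maintenance: An active community continuously contributes packages to Debian and its derivatives. Contributors not only provide new packages but also keep existing ones updated and maintained so that they remain compatible with the latest system.
When filtering in Wireshark, typically select the protocol name first and then filter by port, for example http first and then tcp.port == 80.
To capture HTTP data, first load the website in a browser so that there is traffic to capture.
The IP address and port of NowAPI are 103.205.5.228:80.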
GET /?app=weather.today&weaid=%E8%A5%BF%E5%AE%89&appkey=72317&sign=be43b728a4f27463d34f4fcbfea69134&format=json HTTP/1.1
Host: api.k780.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 18 Jul 2024 08:23:48 GMT
Content-Type: application/json; charset=utf-8;
Transfer-Encoding: chunked
Connection: keep-alive
Access-Control-Allow-Origin: *
{"success":"1","result":{"weaid":"316","days":"2024-07-18","week":".........","cityno":"xian","citynm":"......","cityid":"101110101","temperature":"33.../24...","temperature_curr":"33...","humidity":"75%","aqi":"40","weather":"......","weather_curr":"...","weather_icon":"http://api.k780.com/upload/weather/d/2.gif","weather_icon1":"","wind":"......","winp":"2...","temp_high":"33","temp_low":"24","temp_curr":"33","humi_high":"0","humi_low":"0","weatid":"3","weatid1":"","windid":"8","winpid":"2","weather_iconid":"2"}}
Fetching today's weather information over HTTP (JSON format)
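/* Standard headers used by this program, standing in for the author's head.h (contents not shown) */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>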
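/* Create a TCP socket and connect it to pip:port.
   Returns the connected socket descriptor, or -1 on failure. */
int CreatTcpClient(char *pip, int port)
{
    int ret = 0;
    int sockfd = 0;
    struct sockaddr_in seraddr;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (-1 == sockfd)
    {
        perror("fail to socket");
        return -1;
    }

    memset(&seraddr, 0, sizeof(seraddr));
    seraddr.sin_family = AF_INET;
    seraddr.sin_port = htons(port);            /* port in network byte order */
    seraddr.sin_addr.s_addr = inet_addr(pip);  /* dotted-decimal string to IPv4 address */

    ret = connect(sockfd, (struct sockaddr *)&seraddr, sizeof(seraddr));
    if (-1 == ret)
    {
        perror("fail to connect");
        close(sockfd);                         /* do not leak the socket on failure */
        return -1;
    }
    return sockfd;
}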
/* Build a GET request for purl and send it on sockfd.
   Returns 0 on success, -1 on failure. */
int SendHttpRequest(int sockfd, char *purl)
{
    char tmpbuff[4096] = {0};
    ssize_t nsize = 0;
    int len = 0;

    /* Build the whole request with one bounded snprintf call;
       %s is replaced by the contents of purl. */
    len = snprintf(tmpbuff, sizeof(tmpbuff),
                   "GET %s HTTP/1.1\r\n"
                   "Host: api.k780.com\r\n"
                   "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0\r\n"
                   "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8\r\n"
                   "Accept-Language: en-US,en;q=0.5\r\n"
                   "Connection: keep-alive\r\n\r\n",
                   purl);
    if (len < 0 || (size_t)len >= sizeof(tmpbuff))
    {
        fprintf(stderr, "request too long\n");
        return -1;
    }

    nsize = send(sockfd, tmpbuff, (size_t)len, 0);
    if (-1 == nsize)
    {
        perror("fail to send");
        return -1;
    }
    return 0;
}
int main(int argc, const char *argv[])
{
    int sockfd = 0;
    ssize_t nsize = 0;
    char tmpbuff[4096] = {0};

    sockfd = CreatTcpClient("103.205.5.228", 80);
    if (-1 == sockfd)
    {
        return -1;
    }

    if (-1 == SendHttpRequest(sockfd, "/?app=weather.today&weaid=%E8%A5%BF%E5%AE%89&appkey=72317&sign=be43b728a4f27463d34f4fcbfea69134&format=json"))
    {
        close(sockfd);
        return -1;
    }

    /* Leave room for the terminating '\0' so the buffer can be printed with %s */
    nsize = recv(sockfd, tmpbuff, sizeof(tmpbuff) - 1, 0);
    if (-1 == nsize)
    {
        perror("fail to recv");
        close(sockfd);
        return -1;
    }
    tmpbuff[nsize] = '\0';

    printf("*********RECV***********\n");
    printf("%s\n", tmpbuff);
    printf("************************\n");

    close(sockfd);
    return 0;
}
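The weaid value in the request path above, %E8%A5%BF%E5%AE%89, is the percent-encoded UTF-8 form of the city name 西安. A minimal sketch of producing such a value follows; the function names percent_encode and is_unreserved are hypothetical, and the input is assumed to already be UTF-8.

```c
#include <stdio.h>
#include <string.h>

/* Characters that may appear in a URL without being encoded. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Percent-encode src into dst (capacity dstsize, including the '\0').
   Unreserved characters are copied; every other byte becomes %XX.
   Returns 0 on success, -1 if dst is too small. */
int percent_encode(const char *src, char *dst, size_t dstsize)
{
    const unsigned char *p;
    size_t used = 0;

    if (dstsize == 0)
        return -1;

    for (p = (const unsigned char *)src; *p != '\0'; p++) {
        size_t need = is_unreserved(*p) ? 1 : 3;

        if (used + need >= dstsize)      /* keep one byte for the '\0' */
            return -1;

        if (need == 1)
            dst[used] = (char)*p;
        else
            snprintf(dst + used, 4, "%%%02X", (unsigned)*p);
        used += need;
    }
    dst[used] = '\0';
    return 0;
}
```

Calling percent_encode with the six UTF-8 bytes of 西安 (E8 A5 BF E5 AE 89) yields the string %E8%A5%BF%E5%AE%89, which matches the captured request.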
Parsing JSON data
{ "success":"1", "result":{ "weaid":"316", "days":"2024-07-18", "week":"星期四", "cityno":"xian", "citynm":"西安", "cityid":"101110101", "temperature":"33℃/24℃", "temperature_curr":"32℃", "humidity":"82%", "aqi":"41", "weather":"多云", "weather_curr":"多云", "weather_icon":"http://api.k780.com/upload/weather/d/1.gif", "weather_icon1":"", "wind":"西南风", "winp":"1级", "temp_high":"33", "temp_low":"24", "temp_curr":"32", "humi_high":"0", "humi_low":"0", "weatid":"2", "weatid1":"", "windid":"5", "winpid":"1", "weather_iconid":"1" } }
This is a two-level structure: the outer object contains a nested result object.
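To make the two-level layout concrete, here is a minimal sketch that pulls one string value out of such a response using only the C standard library. It is not a general JSON parser: the function name json_get_string is hypothetical, and the code assumes that the value is a string without escaped quotes and that the quoted key name does not appear earlier in the text. A real program would normally use a JSON library such as cJSON instead.

```c
#include <stdio.h>
#include <string.h>

/* Skip spaces, tabs, and line breaks. */
static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;
    return p;
}

/* Copy the string value stored under key into out.
   Returns 0 on success, -1 if the key is missing, the value is not a
   string (for example a nested object), or out is too small. */
int json_get_string(const char *json, const char *key, char *out, size_t outsize)
{
    char pattern[64];
    const char *p;
    const char *end;
    size_t len;
    int n;

    /* Look for the key including its surrounding quotes: "key" */
    n = snprintf(pattern, sizeof(pattern), "\"%s\"", key);
    if (n < 0 || (size_t)n >= sizeof(pattern))
        return -1;

    p = strstr(json, pattern);
    if (p == NULL)
        return -1;

    p = skip_ws(p + n);
    if (*p != ':')               /* a key must be followed by a colon */
        return -1;
    p = skip_ws(p + 1);
    if (*p != '"')               /* only string values are handled */
        return -1;
    p++;

    end = strchr(p, '"');
    if (end == NULL)
        return -1;

    len = (size_t)(end - p);
    if (len >= outsize)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

With the response above, json_get_string(json, "temp_high", buf, sizeof(buf)) stores "33" in buf, while asking for "result" returns -1 because its value is an object rather than a string.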
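JSON format
A data format
JSON (JavaScript Object Notation) is commonly used to exchange data between web applications and between clients and servers.
The basic structures of JSON are objects and arrays:
• An object is an unordered collection of key-value pairs. An object begins with { and ends with }. Each key is followed by a colon (:), and key-value pairs are separated by commas (,).
• An array is an ordered collection of values. An array begins with [ and ends with ]. Values are separated by commas (,).
Strings: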
{
"name":"code",
"gender":"male"
}
Numbers:
{
"key1":10,
"key2":20.0
}
Arrays:
{
"key1" : [0, 1],
"key2" : [2, 3]
}
Nesting is allowed:
{
"name": "John Doe",
"age": 30,
"isEmployed": true,
"address": {
"street": "123 Main St",
"city": "Anytown"
},
"phoneNumbers": [
{
"type": "home",
"number": "555-1234"
},
{
"type": "mobile",
"number": "555-5678"
}
]
}
name, age, and isEmployed are simple key-value pairs. address is a nested object, and phoneNumbers is an array of objects.
Parsing JSON with cJSON:
NowAPI weather crawler
JSON request
URL: http://api.k780.com/?app=weather.today&weaId=1&appkey=APPKEY&sign=SIGN&format=json
Note:
appkey: replace with your own APPKey
sign: replace with your own sign value
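1. Normal response:
{
    "success": "1",
    "result": {
        "weaid": "1",
        "days": "2014-07-30",
        "week": "星期三",
        "cityno": "beijing",
        "citynm": "北京",
        "cityid": "101010100",
        "temperature": "31℃/24℃", /* daytime/nighttime temperature (note: at night there is only one temperature, e.g. 24℃/24℃) */
        "temperature_curr": "25℃", /* current temperature */
        "humidity": "50%", /* humidity */
        "aqi": "100", /* PM2.5; see weather.pm25 for details */
        "weather": "多云转晴", /* weather */
        "weather_icon": "http://api.k780.com/upload/weather/d/1.gif", /* weather icon; the full icon set can be downloaded */
        "weather_icon1": "", /* unused; ignore */
        "wind": "微风", /* wind direction */
        "winp": "小于3级", /* wind force */
        "temp_high": "31", /* highest temperature */
        "temp_low": "24", /* lowest temperature */
        "humi_high": "87.8", /* maximum humidity [legacy field, no longer updated] */
        "humi_low": "75.2", /* minimum humidity [legacy field, no longer updated] */
        "weatid": "2", /* weather ID; see weaid in the weather.wtype API */
        "weatid1": "", /* unused; ignore */
        "windid": "1", /* wind direction ID (no lookup table yet) */
        "winpid": "2", /* wind force ID (no lookup table yet) */
        "weather_iconid": "1" /* weather icon number; corresponds to weather_icon 1.gif */
    }
}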
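Request message format:
GET is the most commonly used HTTP method, typically used to retrieve a resource from the server.
Typing a URL directly into the browser and pressing Enter, or clicking a link in the browser's bookmarks, causes the browser to send a GET request.
Response message format: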
HTTP/1.1 200 OK\r\n
Server: nginx\r\n
Date: Fri, 08 Mar 2024 06:33:44 GMT\r\n
Content-Type: application/json; charset=utf-8;\r\n
Transfer-Encoding: chunked\r\n
Connection: keep-alive\r\n
Access-Control-Allow-Origin: *\r\n
\r\n
{"success":"1","result":{"weaid":"316","days":"2024-03-08","week":".........","cityno":"xian","citynm":"......","cityid":"101110101","temperature":"13.../0...","temperature_curr":"12...","humidity":"29%","aqi":"65","weather":"............","weather_curr":"...","weather_icon":"http://api.k780.com/upload/weather/d/0.gif","weather_icon1":"","wind":"......","winp":"2...","temp_high":"13","temp_low":"0","temp_curr":"12","humi_high":"0","humi_low":"0","weatid":"1","weatid1":"","windid":"4","winpid":"2","weather_iconid":"0"}}\r\n
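- HTTP relies on TCP at the transport layer, and TCP is a byte stream with no message boundaries. Without delimiters the receiver could not tell where one part of a message ends and the next begins (the so-called "sticky packet" problem), which is why the message contains so many \r\n sequences.
- Because HTTP does not fix the number of key-value pairs in the header section, the blank line serves as the end marker of the headers, that is, the separator between the headers and the body.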
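The role of the blank line can be sketched in C as follows. The function names http_status_code and http_body are hypothetical, and the response is assumed to be a NUL-terminated buffer such as the one filled by recv in the client program. Note that with Transfer-Encoding: chunked the raw body is additionally framed by chunk-size lines, which this sketch does not decode.

```c
#include <stdio.h>
#include <string.h>

/* Read the status code from a status line such as "HTTP/1.1 200 OK".
   Returns the code, or -1 if the line does not have that shape. */
int http_status_code(const char *response)
{
    int major = 0;
    int minor = 0;
    int code = 0;

    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Return a pointer to the first byte after the blank line that ends the
   headers, or NULL if the blank line has not been received yet. */
const char *http_body(const char *response)
{
    const char *sep = strstr(response, "\r\n\r\n");

    return (sep != NULL) ? sep + 4 : NULL;
}
```

For the response shown above, http_status_code returns 200 and http_body points at the start of the data that follows the empty \r\n line.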
Original article: https://blog.csdn.net/l22221/article/details/140521865