
How do you set the number of request retries in a Java crawler?

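Setting a retry count for requests is common practice in Java crawlers. When a request hits a transient network problem or a server response timeout, the crawler automatically sends it again, which improves stability and success rate. The sections below cover common ways to set the retry count: plain Java code, Apache HttpClient, OkHttp, and Spring Retry.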

1. Using plain Java code

With the JDK's own HttpURLConnection, a simple loop is enough to implement retries. Here is an example:

import java.net.HttpURLConnection;
import java.net.URL;

public class HttpRequestWithRetry {
    public static void main(String[] args) {
        String urlString = "http://example.com";
        int connectionTimeout = 5000; // connect timeout (ms)
        int readTimeout = 5000; // read timeout (ms)
        int maxRetries = 3; // maximum number of attempts, including the first request

        for (int attempt = 0; attempt < maxRetries; attempt++) {
            HttpURLConnection connection = null;
            try {
                URL url = new URL(urlString);
                connection = (HttpURLConnection) url.openConnection();
                connection.setConnectTimeout(connectionTimeout);
                connection.setReadTimeout(readTimeout);
                connection.setRequestMethod("GET");

                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // process the response body here
                    System.out.println("Request succeeded!");
                    break; // success: leave the retry loop
                } else {
                    // a non-200 response also falls through to the next attempt
                    System.out.println("Request failed, response code: " + responseCode);
                }
            } catch (Exception e) {
                System.out.println("Request failed on attempt " + (attempt + 1));
                if (attempt == maxRetries - 1) {
                    // the last attempt failed: handle the error
                    e.printStackTrace();
                }
            } finally {
                if (connection != null) {
                    connection.disconnect(); // release the connection before the next attempt
                }
            }
        }
    }
}
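
The loop above retries immediately after a failure. As a minimal sketch of one possible refinement, the helper below waits between retries with exponential backoff plus random jitter. It uses only the JDK; the class name BackoffRetrySketch, the method names, and the delay values are assumptions made for this illustration, not part of the original example.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

public class BackoffRetrySketch {

    /** Delay before retry number `retry` (1-based): base * 2^(retry-1), capped at maxDelayMs. */
    public static long backoffDelayMs(int retry, long baseDelayMs, long maxDelayMs) {
        long delay = baseDelayMs << Math.min(retry - 1, 20); // cap the shift to limit growth
        return Math.min(delay, maxDelayMs);
    }

    /** Runs the task once, then retries up to maxRetries more times if it throws. */
    public static <T> T callWithRetry(Callable<T> task, int maxRetries,
                                      long baseDelayMs, long maxDelayMs) throws Exception {
        for (int retry = 0; ; retry++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (retry >= maxRetries) {
                    throw e; // retries exhausted: surface the last failure
                }
                long delay = backoffDelayMs(retry + 1, baseDelayMs, maxDelayMs);
                // add up to 50% random jitter so parallel crawler threads do not retry in lockstep
                long jitter = ThreadLocalRandom.current().nextLong(delay / 2 + 1);
                Thread.sleep(delay + jitter);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // simulated request: fails twice, then succeeds
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated failure " + calls[0]);
            }
            return "ok after " + calls[0] + " calls";
        }, 3, 10, 100);
        System.out.println(result); // prints "ok after 3 calls"
    }
}
```

In a real crawler the lambda would wrap the HttpURLConnection request, and the base delay would likely be hundreds of milliseconds rather than 10.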

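2. Using Apache HttpClient

Apache HttpClient offers more features, including a built-in retry mechanism. In the 4.x API used here (the org.apache.http packages), you implement the HttpRequestRetryHandler interface to supply custom retry logic. Here is an example: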

import org.apache.http.client.HttpRequestRetryHandler;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

public class HttpClientRequestWithRetry {
    public static void main(String[] args) {
        String url = "http://example.com";
        int connectionTimeout = 5000; // connect timeout (ms)
        int socketTimeout = 5000; // read timeout (ms)
        int maxRetries = 3; // maximum number of retries

        HttpRequestRetryHandler retryHandler = (exception, executionCount, context) -> {
            // executionCount is the number of executions that have failed so far
            if (executionCount > maxRetries) {
                // retry limit exceeded: give up
                return false;
            }
            // the handler only ever receives IOExceptions; retry all of them
            return true;
        };

        try (CloseableHttpClient httpClient = HttpClients.custom()
                .setRetryHandler(retryHandler)
                .build()) {

            HttpGet request = new HttpGet(url);
            RequestConfig config = RequestConfig.custom()
                    .setConnectTimeout(connectionTimeout)
                    .setSocketTimeout(socketTimeout)
                    .build();
            request.setConfig(config);

            // try-with-resources closes the response and releases its connection
            try (CloseableHttpResponse response = httpClient.execute(request)) {
                if (response.getStatusLine().getStatusCode() == 200) {
                    String responseBody = EntityUtils.toString(response.getEntity());
                    System.out.println("Request succeeded: " + responseBody);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
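
The handler above retries on any IOException. Some failures are unlikely to go away on a second try, so a crawler may want a narrower rule. The sketch below shows one such rule using only JDK exception types. The class RetryableExceptionSketch, the method isRetryable, and the choice of which exceptions to skip are assumptions for illustration; a method like this could be called from inside the retry handler lambda.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import javax.net.ssl.SSLException;

public class RetryableExceptionSketch {

    /**
     * Returns true if a failed request looks worth retrying.
     * The policy here is an assumed example; adjust it to the target sites.
     */
    public static boolean isRetryable(IOException e, int executionCount, int maxRetries) {
        if (executionCount > maxRetries) {
            return false; // retry budget used up
        }
        if (e instanceof UnknownHostException) {
            return false; // DNS lookup failed: the host name is probably wrong
        }
        if (e instanceof SSLException) {
            return false; // TLS problems rarely fix themselves
        }
        // timeouts, refused connections, and connection resets fall through to a retry
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(new SocketTimeoutException("read timed out"), 1, 3)); // true
        System.out.println(isRetryable(new UnknownHostException("no.such.host"), 1, 3));     // false
        System.out.println(isRetryable(new ConnectException("refused"), 4, 3));              // false
    }
}
```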

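3. Using OkHttp

OkHttp can also retry requests through a custom Interceptor. RetryInterceptor is not an OkHttp class; it is defined in the example itself. Here is an example: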

import okhttp3.*;

import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class OkHttpRequestWithRetry {
    public static void main(String[] args) {
        String url = "http://example.com";
        int connectionTimeout = 5000; // connect timeout (ms)
        int readTimeout = 5000; // read timeout (ms)
        int maxRetries = 3; // maximum number of retries

        OkHttpClient client = new OkHttpClient.Builder()
                .connectTimeout(connectionTimeout, TimeUnit.MILLISECONDS)
                .readTimeout(readTimeout, TimeUnit.MILLISECONDS)
                .addInterceptor(new RetryInterceptor(maxRetries))
                .build();

        Request request = new Request.Builder().url(url).build();
        try (Response response = client.newCall(request).execute()) {
            if (response.isSuccessful()) {
                System.out.println("Request succeeded: " + response.body().string());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    static class RetryInterceptor implements Interceptor {
        private final int maxRetries;

        RetryInterceptor(int maxRetries) {
            this.maxRetries = maxRetries;
        }

        @Override
        public Response intercept(Chain chain) throws IOException {
            Request request = chain.request();
            int attempt = 0; // number of retries made so far
            while (true) {
                try {
                    Response response = chain.proceed(request);
                    if (response.isSuccessful() || attempt >= maxRetries) {
                        // success, or retries exhausted: hand the last response to the caller
                        return response;
                    }
                    response.close(); // release the failed response before retrying
                } catch (IOException e) {
                    if (attempt >= maxRetries) {
                        throw e; // retry limit exceeded: rethrow
                    }
                }
                attempt++;
                System.out.println("Request failed, retry " + attempt);
            }
        }
    }
}
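
The interceptor decides on retries using response.isSuccessful(), which treats every non-2xx status the same way. A 404 will not become a 200 on the next try, while a 503 or 429 might. As a minimal sketch of a finer rule, the helper below classifies status codes and reads a Retry-After header given in seconds. The class RetryableStatusSketch, its method names, and the chosen set of status codes are assumptions for this illustration.

```java
import java.util.Set;

public class RetryableStatusSketch {

    // status codes assumed to signal a transient condition
    private static final Set<Integer> RETRYABLE = Set.of(408, 429, 500, 502, 503, 504);

    /** Returns true if the HTTP status suggests the same request may succeed later. */
    public static boolean isRetryableStatus(int code) {
        return RETRYABLE.contains(code);
    }

    /**
     * Converts a Retry-After header value in seconds to milliseconds.
     * Returns fallbackMs if the header is missing, negative, or not a plain number.
     */
    public static long retryAfterMs(String headerValue, long fallbackMs) {
        if (headerValue == null) {
            return fallbackMs;
        }
        try {
            long seconds = Long.parseLong(headerValue.trim());
            return seconds >= 0 ? seconds * 1000 : fallbackMs;
        } catch (NumberFormatException e) {
            return fallbackMs; // the HTTP-date form of the header is not handled in this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(isRetryableStatus(503));      // true
        System.out.println(isRetryableStatus(404));      // false
        System.out.println(retryAfterMs("5", 1000));     // 5000
        System.out.println(retryAfterMs(null, 1000));    // 1000
    }
}
```

Inside an interceptor, the status would come from response.code() and the header from response.header("Retry-After").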

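4. Using Spring Retry

If your project uses the Spring framework, you can implement retries with Spring Retry, which supports both an annotation style and a programmatic style. The example below uses annotations.

First, add the Spring Retry dependencies: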

<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-aspects</artifactId>
    <version>5.3.10</version>
</dependency>

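Next, configure Spring Retry. @EnableRetry turns on annotation support, and the RetryTemplate bean is available for the programmatic style: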

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.retry.annotation.EnableRetry;
import org.springframework.retry.backoff.FixedBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

@Configuration
@EnableRetry
public class RetryConfig {
    @Bean
    public RetryTemplate retryTemplate() {
        RetryTemplate retryTemplate = new RetryTemplate();

        // retry policy
        SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy();
        retryPolicy.setMaxAttempts(3); // maximum attempts, including the first call
        retryTemplate.setRetryPolicy(retryPolicy);

        // wait between attempts
        FixedBackOffPolicy backOffPolicy = new FixedBackOffPolicy();
        backOffPolicy.setBackOffPeriod(1000); // interval between attempts (ms)
        retryTemplate.setBackOffPolicy(backOffPolicy);

        return retryTemplate;
    }
}

Finally, mark the methods that should be retried with the @Retryable annotation:

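import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Service;

import java.io.IOException;

@Service
public class HttpService {

    // up to 3 attempts in total, waiting 1000 ms between them
    @Retryable(maxAttempts = 3, backoff = @Backoff(delay = 1000))
    public String sendRequest(String url) throws IOException {
        // send the HTTP request
        // ...
        return "Request succeeded";
    }
}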

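5. Summary

Any of the approaches above lets you set a request retry count in a Java crawler and improve its stability and success rate. Which one to choose depends on your requirements and project environment. For simple projects, plain Java code or Apache HttpClient is probably enough; for more complex projects, especially those already built on Spring, Spring Retry is a strong choice.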


Original article: https://blog.csdn.net/2401_87195067/article/details/145117814
