I was running some tests while building a simple web crawler in PHP, and I want to share the get_headers() function with you.
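<?php
// Parse the 3-digit status code out of a status line such as "HTTP/1.1 301 Moved Permanently".
function get_http_response_code($headers) {
    return intval(substr($headers[0], 9, 3));
}

$url = 'http://tig.pt';
// Passing 1 as the second argument returns the headers as an associative array.
$headers = get_headers($url, 1);
if ($headers) {
    var_dump(get_http_response_code($headers));
    // Location is only present when the server answers with a redirect.
    var_dump($headers['Location'] ?? null);
    var_dump($headers);
} else {
    echo 'Error loading header - failed to open stream';
}
?>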
With this simple example, we can test the get_headers() function and extract some information from it.
Since my blog has an http to https redirect, this gets us a 301 Moved Permanently redirect, and we can extract the new Location directly from the $headers variable, which will be https://tig.pt.
Sadly, I didn't find any direct way to get the response code without making a new request, and a new request could give a different response from the first one. Instead, we can create a simple get_http_response_code function that parses it from the status line with intval(substr($headers[0], 9, 3)). This takes the 3-digit code that starts at offset 9 (counting from 0) and converts it to an int for simpler handling.
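The fixed offset works for status lines shaped like "HTTP/1.1 301 ...". As a slightly more defensive variant, here is a minimal sketch that uses a regular expression instead and collects every status line in the result. It assumes that get_headers() follows redirects by default and stores each status line under an integer key, which is my understanding of its behaviour. The helper name extract_status_codes and the sample array are made up for illustration.

```php
<?php
// Hypothetical helper: returns every status code found in a get_headers() result.
function extract_status_codes(array $headers): array {
    $codes = [];
    foreach ($headers as $key => $value) {
        // Status lines sit under integer keys, e.g. "HTTP/1.1 301 Moved Permanently".
        if (is_int($key) && is_string($value)
            && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $value, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Made-up data shaped like a get_headers($url, 1) result for a redirect.
$sample = [
    0 => 'HTTP/1.1 301 Moved Permanently',
    'Location' => 'https://tig.pt/',
    1 => 'HTTP/1.1 200 OK',
    'Content-Type' => 'text/html',
];
var_dump(extract_status_codes($sample));
```

The last element of the returned array is the status of the final response, which is usually the one a crawler cares about.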
I also added a simple if ($headers) check because, if there's any error in the request, no response at all, or the URL doesn't exist, the function returns false.
This works for any kind of content, and it's a simple way to check whether a site is up, or to read the content's modified date if you are doing cache control.
I will use it to feed my crawler with information about broken links.
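As a rough sketch of those two uses, the helpers below classify a get_headers($url, 1) result for a link checker and read the Last-Modified header. The names classify_link and last_modified_timestamp are hypothetical, and treating every code from 400 upwards as broken is an assumption, not a rule. Both functions only inspect an array, so they can be tried without making a request.

```php
<?php
// Hypothetical helper: classify a get_headers() result (array, or false on failure).
function classify_link($headers): string {
    if ($headers === false) {
        return 'unreachable'; // the request failed, so there is no response to inspect
    }
    // Reads the first status line only, like get_http_response_code() above.
    $code = (int) substr($headers[0], 9, 3);
    return ($code >= 400) ? 'broken' : 'ok'; // assumption: 4xx and 5xx count as broken
}

// Hypothetical helper: Unix timestamp of Last-Modified, or null if absent or unparseable.
function last_modified_timestamp(array $headers): ?int {
    // Header names are case-insensitive, so normalise the keys first.
    $lower = array_change_key_case($headers, CASE_LOWER);
    if (!isset($lower['last-modified'])) {
        return null;
    }
    $value = $lower['last-modified'];
    if (is_array($value)) {
        $value = end($value); // several responses in a redirect chain: keep the last one
    }
    $timestamp = strtotime($value);
    return ($timestamp === false) ? null : $timestamp;
}

// Made-up sample data for illustration.
$sample = [
    0 => 'HTTP/1.1 200 OK',
    'Last-Modified' => 'Wed, 21 Oct 2015 07:28:00 GMT',
];
var_dump(classify_link($sample));
var_dump(last_modified_timestamp($sample));
```

For cache control, the timestamp can be compared with the time the cached copy was stored to decide whether to fetch the page again.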
You can run this example in a terminal with:
curl https://static.tig.pt/php/get_headers.php
Don't forget to wrap your var_dump() output in <pre> HTML tags so it's easy to read in the browser.
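One way to do that, as a small sketch: capture the var_dump() output with output buffering and wrap it in the tags. The helper name dump_pre is made up, and escaping the dump with htmlspecialchars() is an extra precaution I am assuming you want, since header values come from a remote server.

```php
<?php
// Hypothetical helper: return var_dump() output wrapped in <pre> tags for the browser.
function dump_pre($value): string {
    ob_start();               // start capturing output instead of printing it
    var_dump($value);
    $dump = ob_get_clean();   // fetch the captured text and stop buffering
    return '<pre>' . htmlspecialchars($dump) . '</pre>';
}

echo dump_pre(['HTTP/1.1 200 OK']);
```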
If you prefer another way to do it, or want to share your work, just contact me.