- added `as.list()` method for robots.txt
- `get_robotstxt()` gained an encoding parameter that defaults to "UTF-8" - which is what the content function does anyway, but now it will not complain about it (see the sketch below)
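A minimal usage sketch for these two entries; applying `as.list()` to the object returned by `get_robotstxt()`, and the `encoding` argument name, are assumptions based on the entries above rather than a verified API reference:

```r
library(robotstxt)

# fetch a robots.txt file; encoding is assumed to default to "UTF-8"
# as described in the entry above
rt <- get_robotstxt(domain = "example.com", encoding = "UTF-8")

# turn the retrieved robots.txt into a plain list
as.list(rt)
```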
- `paths_allowed()` and `robotstxt()` switched from `future::future_lapply()` to `future.apply::future_lapply()` to make the package compatible with versions of future after 1.8.1 (the sketch below shows the replacement pattern)
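A sketch of the pattern the package switched to; the domains and the per-item function are placeholders, not package code:

```r
library(future)
library(future.apply)

# choose a parallel backend; plan(sequential) also works
plan(multisession)

domains <- c("example.com", "example.org")

# future.apply::future_lapply() is the drop-in replacement for the
# future_lapply() that used to live in the future package itself
res <- future_lapply(domains, function(d) paste0("checked: ", d))
```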
- `get_robotstxts()` function which is a 'vectorized' version of `get_robotstxt()`
- `paths_allowed()` now allows checking via either robotstxt's parsed robots.txt files or via functionality provided by the spiderbar package (the latter should be faster by approximately a factor of 10) - both sketched below
- `sessionInfo()$R.version$version.string`
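A usage sketch for the two entries above, assuming `get_robotstxts()` accepts a vector of domains and that the backend is selected via a `check_method` argument:

```r
library(robotstxt)

# retrieve several robots.txt files in one call
txts <- get_robotstxts(domain = c("example.com", "example.org"))

# check paths using the spiderbar backend instead of the
# package's own parser
paths_allowed(
  paths        = c("/", "/images/"),
  domain       = "example.com",
  check_method = "spiderbar"
)
```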
- `get_robotstxt()` tests for HTTP errors and handles them; warnings might be suppressed, while implausible HTTP status codes will lead to stopping the function (https://github.com/ropenscilabs/robotstxt/issues/5)
- dropping the R6 dependency and using a list implementation instead (https://github.com/ropenscilabs/robotstxt/issues/6)
- use caching for `get_robotstxt()` (https://github.com/ropenscilabs/robotstxt/issues/7 / https://github.com/ropenscilabs/robotstxt/commit/90ad735b8c2663367db6a9d5dedbad8df2bc0d23) - as sketched after this list
- make explicit, less error-prone usage of `httr::content(rtxt)` (https://github.com/ropenscilabs/robotstxt#)
- replace usage of `missing()` for parameter checks with an explicit `NULL` default value for the parameter (https://github.com/ropenscilabs/robotstxt/issues/9) - see the pattern sketch after this list
- partial match useragent / useragents (https://github.com/ropenscilabs/robotstxt/issues/10)
- explicit declaration of encoding: `encoding = "UTF-8"` in `httr::content()` (https://github.com/ropenscilabs/robotstxt/issues/11)
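A sketch of how the caching and HTTP error handling entries surface in use; the `force` and `warn` argument names are assumptions based on the entries above:

```r
library(robotstxt)

# first call downloads the file and caches it
rt_cached <- get_robotstxt("example.com")

# force = TRUE (assumed argument name) bypasses the cache
rt_fresh <- get_robotstxt("example.com", force = TRUE)

# HTTP problems surface as warnings that can be switched off,
# while implausible status codes stop the function
rt_quiet <- get_robotstxt("example.com", warn = FALSE)
```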
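The `missing()`-to-`NULL` change follows a common R pattern; a generic illustration with hypothetical function and argument names:

```r
# before: relying on missing() to detect an unset argument
f_old <- function(user_agent) {
  if (missing(user_agent)) user_agent <- "default agent"
  user_agent
}

# after: an explicit NULL default makes the signature self-documenting
# and lets callers pass the default value programmatically
f_new <- function(user_agent = NULL) {
  if (is.null(user_agent)) user_agent <- "default agent"
  user_agent
}
```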