The hackeRnews package was created to simplify the process of getting data from Hacker News. Hacker News is a user-generated content website that focuses on stories related to computer science. The website is composed of user-submitted stories, each of which provides a link to the original data source. Users can upvote a story if they find it interesting. Each story has a comment section where users can discuss the subject. Besides news stories, Hacker News contains additional sections: ask stories, show stories and job stories.
The official Hacker News API documentation is available online. The API serves data in JSON format, and the hackeRnews package lets you retrieve this data as convenient R objects. Each object (story, comment, …) has a unique id and can be retrieved using this id. The API also provides a way to fetch up to 500 top and new stories, as well as the latest best stories, ask stories, show stories and job stories.
Examples of using the hackeRnews package to retrieve data from the official Hacker News API are presented below.
To fetch best/new/top stories, use the corresponding get_*_stories function. Each function takes one optional argument, max_items, that limits the number of returned stories.
For example, to fetch the top 5 best stories:
best_stories <- get_best_stories(max_items=5)
best_stories[[1]]
#> List of 9
#> $ by : chr "nachtigall"
#> $ descendants: int 279
#> $ id : int 21655958
#> $ kids : int [1:49] 21659972 21656187 21656020 21656376 21656726 21656022 21659484 21659549 21656045 21656717 ...
#> $ score : int 1560
#> $ time : POSIXct[1:1], format: "2019-11-28 10:59:47"
#> $ title : chr "Firefox Replay"
#> $ type : chr "story"
#> $ url : chr "https://firefox-replay.com/"
#> - attr(*, "class")= chr "hn_item"
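Because each story is a list with named fields, a list of stories can be flattened into a data frame for further analysis. The following is a minimal sketch and not part of the package: stories_to_df is a hypothetical helper, and it assumes every story carries the title, by and score fields shown in the output above.

```r
# Hypothetical helper (not part of hackeRnews): flatten a list of story
# objects into a data frame, assuming each has title, by and score fields.
stories_to_df <- function(stories) {
  data.frame(
    title  = vapply(stories, function(s) s$title, character(1)),
    author = vapply(stories, function(s) s$by, character(1)),
    score  = vapply(stories, function(s) as.integer(s$score), integer(1)),
    stringsAsFactors = FALSE
  )
}

# Usage (requires network access):
# library(hackeRnews)
# best_df <- stories_to_df(get_best_stories(max_items = 5))
# best_df[order(-best_df$score), ]  # highest-scoring stories first
```

vapply is used instead of sapply so that a story with a missing field fails loudly rather than silently producing a column of the wrong type.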
The raw ids of best/new/top stories can also be fetched on their own using the get_*_stories_ids functions.
Ask, show and job stories work similarly to news stories. The get_latest_*_stories functions return the latest stories of the given kind, and the get_latest_*_stories_ids functions return only their ids.
For example, to fetch the 3 latest ask stories:
ask_stories <- get_latest_ask_stories(max_items=3)
ask_stories[[1]]
#> List of 9
#> $ by : chr "aosaigh"
#> $ descendants: int 7
#> $ id : int 21669408
#> $ kids : int [1:6] 21670878 21670876 21670253 21669578 21670316 21670317
#> $ score : int 13
#> $ text : chr "I'm the only developer on a small web application that is targetting enterprise customers.<p>After meeting"| __truncated__
#> $ time : POSIXct[1:1], format: "2019-11-30 11:33:25"
#> $ title : chr "Ask HN: Where to start with enterprise-level security and procedures?"
#> $ type : chr "story"
#> - attr(*, "class")= chr "hn_item"
To fetch data about the user ‘jl’, use the get_user_by_username function:
user <- get_user_by_username('jl')
user
#> List of 5
#> $ about : chr "This is a test"
#> $ created : POSIXct[1:1], format: "2007-03-15 02:50:46"
#> $ id : chr "jl"
#> $ karma : int 4226
#> $ submitted: int [1:846] 19464269 18498213 16659709 16659632 16659556 14237416 11871616 11483492 11435082 10985073 ...
#> - attr(*, "class")= chr "hn_user"
It’s possible to iterate over the latest items by fetching the id of the most recent item with the get_max_item_id function and then walking backwards through the ids. Using that method, it’s possible to fetch all items on Hacker News.
For example, to fetch the 10 latest items:
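A minimal sketch of that backwards walk follows. The helper names latest_item_ids and fetch_latest_items are hypothetical, and the commented usage lines assume that the package's get_item_by_id function takes a single item id.

```r
# Hypothetical helper: ids of the n most recent items, newest first,
# given the current maximum item id.
latest_item_ids <- function(max_id, n = 10) {
  stopifnot(n >= 1, max_id >= n)
  seq(from = max_id, by = -1, length.out = n)
}

# Hypothetical helper: walk backwards from max_id, fetching each item
# with the supplied fetch_item function.
fetch_latest_items <- function(max_id, n = 10, fetch_item) {
  lapply(latest_item_ids(max_id, n), fetch_item)
}

# Usage with hackeRnews (requires network access; get_item_by_id is
# assumed to accept one id at a time):
# library(hackeRnews)
# items <- fetch_latest_items(get_max_item_id(), n = 10,
#                             fetch_item = get_item_by_id)
```

Passing the fetch function as an argument keeps the id arithmetic separate from the network calls, so the walk can be checked offline.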
Latest items and profile changes can be retrieved using the get_updates function.
comments
The discussion in story threads is represented as a system of comments. Each story has the ids of its top-level comments stored under the kids property. Each comment can in turn have its own set of comment ids under its kids property (sub-comments), and so on. To retrieve all of the comments of a specific story, use the get_comments function.
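To illustrate how the kids property links comments into a tree, here is a minimal sketch of a depth-first walk. It is not how get_comments is implemented: collect_comment_ids is a hypothetical helper, and fetch_item stands for any function that returns an item (a list with an optional kids field) for a given id. A small in-memory list replaces the API so that the example runs offline.

```r
# Hypothetical helper: collect the ids of all comments below an item by
# following the kids property depth-first.
collect_comment_ids <- function(item, fetch_item) {
  ids <- c()
  for (kid_id in item$kids) {
    kid <- fetch_item(kid_id)
    # Record the comment itself, then everything nested beneath it.
    ids <- c(ids, kid_id, collect_comment_ids(kid, fetch_item))
  }
  ids
}

# In-memory stand-in for the API: story 1 has comments 2 and 3,
# and comment 2 has a single reply, 4.
items <- list(
  "1" = list(id = 1, kids = c(2, 3)),
  "2" = list(id = 2, kids = c(4)),
  "3" = list(id = 3),
  "4" = list(id = 4)
)
fetch_from_memory <- function(id) items[[as.character(id)]]

collect_comment_ids(items[["1"]], fetch_from_memory)
#> [1] 2 4 3
```

The reply (4) appears directly after its parent (2), which is the order in which a threaded discussion is usually displayed.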