
Building an Automated Information Gathering Tool


1. Introduction

As an engineer, I'm exposed to a massive flow of technical information every day.
Especially in this age of generative AI, when change is so rapid, what ability does an engineer truly need?
At first I thought it was acquiring technical skills, but now that AI is taking over much of the technical work itself, I've come to feel that what really matters is the "ability to keep up with information."

This article is about how I built a tool to automate that information catch-up and create my own information collection hub.

2. What I Created

Here is what I actually built.
It's a tool that automatically retrieves article lists from the RSS feeds of target technical blogs and lets you browse them on GitHub Pages.

https://is0383kk.github.io/your-feed-hub/

Repository

https://github.com/is0383kk/your-feed-hub

■ Mechanism

The automation follows this flow:

  1. Define the RSS feeds of the target sites in a JSON file (this can also be used for hobby information gathering):
{
  "categories": [
  ・・・
    {
      "name": "AWS",
      "id": "aws",
      "feedUrl": "https://aws.amazon.com/jp/about-aws/whats-new/recent/feed/",
      "webhookEnvKey": "DISCORD_WEBHOOK_AWS",
      "siteName": "AWS What's New"
    },
    {
      "name": "ゲーム",
      "id": "game",
      "feedUrl": "https://automaton-media.com/feed/",
      "webhookEnvKey": "",
      "siteName": "AUTOMATON"
    }
  ]
}
  2. Retrieve the article list from the above JSON file within a GitHub Actions workflow
  3. (Optional) Notify a channel via Discord's Webhooks feature
  4. Publish the retrieved article list as a website on GitHub Pages
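The per-category handling above can be sketched in plain JavaScript. This is a minimal illustration, not the repository's actual code: the function name `webhookTargets` and variable names are hypothetical, and the config object mirrors the categories.json structure shown above.

```javascript
// Minimal sketch: load the category config and decide, per feed,
// whether a Discord notification should be sent.
// The structure follows categories.json from the article;
// the function and variable names are illustrative.
const config = {
  categories: [
    {
      name: "AWS",
      id: "aws",
      feedUrl: "https://aws.amazon.com/jp/about-aws/whats-new/recent/feed/",
      webhookEnvKey: "DISCORD_WEBHOOK_AWS",
      siteName: "AWS What's New"
    },
    {
      name: "ゲーム",
      id: "game",
      feedUrl: "https://automaton-media.com/feed/",
      webhookEnvKey: "",
      siteName: "AUTOMATON"
    }
  ]
};

// A category is notified only when its webhookEnvKey is set
// and the corresponding secret exists in the environment.
function webhookTargets(cfg, env) {
  return cfg.categories
    .filter((c) => c.webhookEnvKey && env[c.webhookEnvKey])
    .map((c) => ({ id: c.id, webhookUrl: env[c.webhookEnvKey] }));
}

// Example: only "aws" has a webhook key configured, so only it
// would receive a Discord notification; "game" is collected but silent.
const targets = webhookTargets(config, {
  DISCORD_WEBHOOK_AWS: "https://discord.com/api/webhooks/example"
});
```

Keeping `webhookEnvKey` as an environment-variable *name* rather than the URL itself means the actual webhook URL can live in GitHub Actions secrets instead of the repository.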

This flow is automated periodically to ensure the latest information is always collected automatically.
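The periodic execution maps to the workflow's trigger configuration. A minimal sketch follows; the cron cadence shown here is illustrative, not necessarily the repository's actual setting:

```yaml
on:
  schedule:
    - cron: "0 21 * * *"  # once a day (UTC); adjust to taste
  workflow_dispatch:       # also allows manual runs from the Actions tab
```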


Supplement: How to create your own information collection page using the tool

  1. Clone or fork the following repository

https://github.com/is0383kk/your-feed-hub

  2. Define the RSS feeds to collect in categories.json

https://github.com/is0383kk/your-feed-hub/blob/main/categories.json

  3. Push the changes to your own repository

After that, you can wait for the workflow (feed-collector-and-poster) to run automatically or execute it manually.

3. Key Features

I would like to introduce some of the points I focused on while creating this tool.

■ System for building your own original information hub

The biggest feature of this tool is that you can easily create your own information collection hub.

You can define the collection targets in JSON (categories.json) and freely customize them according to your interests.
Once you push this JSON file to your repository, the rest is handled automatically—gathering information and publishing it as a web page.

■ A cost-free system

Since I didn’t want to spend money on information gathering, I kept everything within GitHub’s free features.
I use GitHub Actions as the execution environment and GitHub Pages as the destination for publishing the collected article lists.

  • Information collection + article list generation (GitHub Actions)
  • Publishing as a website (GitHub Pages)

As this tool is still new, I also aimed for a setup where you can try it out and stop immediately if it doesn't suit you.

■ Data management strategy

A common pitfall in personal development is accumulating too much data.
With this tool, the article list retrieved from RSS keeps growing, which would bloat the repository over time.
Even though the setup is cost-free, relying entirely on a GitHub repository comes with constraints.

Therefore, I’ve implemented measures to prevent the repository from bloating.
For example, I limit the resources published to GitHub Pages to the ./docs directory,

- name: Upload artifact for GitHub Pages
  uses: actions/upload-pages-artifact@v3
  with:
    path: "./docs"

- name: Deploy to GitHub Pages
  id: deployment
  uses: actions/deploy-pages@v4

and the JavaScript that runs on GitHub Actions excludes article information older than a certain period.
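That retention rule could look roughly like the sketch below. The 30-day window, the function name `pruneOldArticles`, and the `publishedAt` field name are all assumptions for illustration, not the repository's actual values.

```javascript
// Minimal sketch of the retention rule: drop articles whose
// publication date is older than a fixed cutoff, so the stored
// article list (and thus the repository) stays small.
// RETENTION_DAYS and the field names are assumptions.
const RETENTION_DAYS = 30;

function pruneOldArticles(articles, now = new Date()) {
  const cutoff = now.getTime() - RETENTION_DAYS * 24 * 60 * 60 * 1000;
  return articles.filter(
    (a) => new Date(a.publishedAt).getTime() >= cutoff
  );
}

// Example: with "now" fixed at 2024-06-10, only the June article
// falls inside the 30-day window and survives the prune.
const sample = [
  { title: "fresh", publishedAt: "2024-06-01" },
  { title: "stale", publishedAt: "2024-01-01" }
];
const kept = pruneOldArticles(sample, new Date("2024-06-10"));
```

Running this on every workflow execution keeps the committed article list bounded regardless of how long the tool operates.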

4. Conclusion

Finally, I'll list a few things I would like to add in the future.

  • Support for sites without RSS
    Currently, it only targets sites with RSS, but in the future, I'd like to support sites without RSS as well (e.g., using sitemaps or HTML parsing).

  • Expansion of notification features
    Although I only touched on it briefly in this article, I've implemented a feature that automatically posts collected article information to a Discord channel. I'm considering expanding this to other platforms, such as Slack or Teams.

I'm also thinking about minor bug fixes and feature additions, but since the primary goal is information gathering, I want to be careful not to get too absorbed in the development work itself.

While it's still a simple setup, I hope to grow it into an "information collection hub that's just right for me" by expanding it little by little.
