Querying Google Gemini API with Full Repository Source Code

Gemini 1.5 Pro has a 128,000-token context window. That made me think I might not need to build an index and search it at all, since I could just include everything in the prompt, so I tried sending the whole thing.

https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/

Here is an example of sending everything under htmx's src/ directory:

$ node askrepo.js 'What is the code for?' src/

HTTP Status Code: 200

Analysis of Provided Code Snippets:

Based on the file names and source code provided, it seems you're dealing with a collection of HTMX extensions. HTMX is a library that allows you to access modern browser features directly from HTML, using attributes for things like AJAX, CSS Transitions, WebSockets, and Server-Sent Events.

Here's a breakdown of each file's purpose:

  • src/ext/ajax-header.js: This extension ensures that the "X-Requested-With" header is set to "XMLHttpRequest" for HTMX-driven requests. This can be helpful for server-side code to distinguish between regular and AJAX requests.
  • src/ext/alpine-morph.js: Integrates with the Alpine.js library to provide "morphing" transitions when swapping content. Alpine.js is another library for declarative interactions within HTML.
  • src/ext/class-tools.js: Provides tools for manipulating classes on elements. It allows you to add, remove, toggle, and even set up sequences of class changes with delays using the classes attribute.
  • src/ext/client-side-templates.js: Allows the use of client-side templating libraries like Mustache, Handlebars, and Nunjucks to render responses from the server before they are swapped into the DOM.
  • src/ext/debug.js: A simple extension for debugging purposes. It logs HTMX events to the console, aiding in understanding the library's behavior.
  • src/ext/disable-element.js: Allows you to disable elements (like submit buttons) during an HTMX request to prevent multiple submissions.
  • src/ext/event-header.js: Adds information about the triggering event to the request headers, allowing the server to have more context about the action that led to the request.
  • src/ext/head-support.js: Enables merging of <head> tag content from HTMX responses into the current document's <head>. This allows for dynamic updates of things like title, meta tags, and stylesheets.
  • src/ext/include-vals.js: Allows you to include additional values in HTMX requests based on the include-vals attribute of an element.
  • src/ext/json-enc.js: Forces HTMX requests to use JSON encoding for parameters and sets the appropriate content type header.
  • src/ext/loading-states.js: Provides ways to show loading indicators while HTMX requests are in progress. You can control the visibility, classes, and aria-busy attributes of elements during requests.
  • src/ext/method-override.js: Allows you to use HTTP verbs beyond GET and POST with HTMX by setting the "X-HTTP-Method-Override" header.
  • src/ext/morphdom-swap.js: Uses the morphdom library to provide more efficient DOM diffing and updating when swapping content.
  • src/ext/multi-swap.js: Allows for swapping multiple elements within a single HTMX request using different swap styles for each element.
  • src/ext/path-deps.js: Enables refreshing elements based on changes to specific URL paths.
  • src/ext/path-params.js: Allows you to specify parameters within the URL path of an HTMX request.
  • src/ext/preload.js: Enables preloading of HTMX targets based on various triggers, like hovering or clicking on elements.
  • src/ext/remove-me.js: Allows you to set up elements to be automatically removed from the DOM after a certain delay.
  • src/ext/response-targets.js: Enables targeting different elements based on the HTTP response code of an HTMX request.
  • src/ext/restored.js: Provides a way to trigger events on elements that were restored from the history cache.
  • src/ext/sse.js: Adds support for Server-Sent Events (SSE) to HTMX, allowing for real-time updates from the server.
  • src/ext/ws.js: Adds support for WebSockets to HTMX, enabling bi-directional communication with the server.
  • src/htmx.d.ts: This file is a TypeScript definition file, which provides type information for the HTMX library for use in TypeScript projects.
  • src/htmx.js: This is the core HTMX library code, containing the main functionality and logic.
  • src/htmx.test.ts: This file appears to be a collection of examples demonstrating the usage of various HTMX functions and features.

~~https://gist.github.com/laiso/baa0763cee4e9b0534a72c8ab0cf6c97~~

You can obtain the GOOGLE_API_KEY from https://aistudio.google.com/app/apikey

I wrote it in CommonJS to make it easier to integrate into actions/github-script. I also avoided external library dependencies.

I used this to create a BOT that replies to newly opened Issues by referring to the source code. It is running here:

https://github.com/laiso/github-issue-bot-google-gemini-api/actions/runs/8724784938/workflow
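Inside an actions/github-script step, the reply itself can be a single REST call. The following is a minimal sketch, not the workflow linked above: `buildQuestion` and `replyToIssue` are hypothetical helper names, and `answer` stands for whatever text the Gemini call returned. It assumes the `github` and `context` objects that github-script passes to the script, in a workflow triggered by an `issues` event.

```javascript
// Hypothetical helpers for an actions/github-script step (on: issues, types: [opened]).

// Turn the newly opened issue into a question for the model.
function buildQuestion(context) {
  const { title, body } = context.payload.issue;
  return `${title}\n\n${body ?? ""}`;
}

// Post the model's answer as a comment on the issue that triggered the run.
async function replyToIssue({ github, context }, answer) {
  return github.rest.issues.createComment({
    owner: context.repo.owner,
    repo: context.repo.repo,
    issue_number: context.issue.number,
    body: answer,
  });
}
```

Keeping both helpers free of external packages matches the reason given above for writing the script in CommonJS without dependencies.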

A workable input size is roughly 10,000 to 20,000 lines of source code. When I tried to include everything from a large OSS repository (such as React or LangChain), I hit the Gemini API limit: Request payload size exceeds the limit: 20971520 bytes.
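One way to avoid a wasted round trip is to measure the request body before sending it. The sketch below is an illustration, not part of the original script: `payloadBytes` and `fitsInPayload` are hypothetical names, and it assumes the limit in the error message applies to the UTF-8 size of the JSON request body.

```javascript
// Limit quoted in the API error message (20 MiB).
const MAX_PAYLOAD_BYTES = 20971520;

// Size in bytes of the JSON body that would carry `promptText`.
function payloadBytes(promptText) {
  const body = JSON.stringify({ contents: [{ parts: [{ text: promptText }] }] });
  return Buffer.byteLength(body, "utf8");
}

// True when the request should stay within the limit (assumption: the
// limit is checked against the raw JSON body).
function fitsInPayload(promptText) {
  return payloadBytes(promptText) <= MAX_PAYLOAD_BYTES;
}
```

Measuring bytes rather than characters matters for non-ASCII source files, since a single Japanese character takes three bytes in UTF-8.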

Therefore, I made it possible to specify a directory. Below is the result of inputting a specific directory from Next.js:

$ node askrepo.js 'what is ppr?' packages/next/src/client/

Understanding PPR in Next.js

Based on the provided source code snippets, PPR likely refers to Partial Page Rendering or a similar concept within Next.js. Let's analyze the code to understand its implementation and implications.

Evidence of PPR:

  • packages/next/src/client/components/router-reducer/navigate-reducer.ts: This file contains two implementations of the navigateReducer function: navigateReducer_PPR and navigateReducer_noPPR. The active implementation is chosen based on the value of the environment variable process.env.__NEXT_PPR. This suggests PPR is a feature toggle related to navigation and state updates.
  • packages/next/src/client/components/router-reducer/restore-reducer.ts: Similar to the navigateReducer, this file also checks process.env.__NEXT_PPR to determine how the cache is handled during restoration. This reinforces the idea of PPR being related to state management and optimization.

Potential Implications of PPR:

  • Optimized Navigation: The presence of different implementations based on PPR suggests that it might optimize navigation by only updating the parts of the page that have changed, instead of re-rendering the entire page. This could lead to faster transitions and improved performance.
  • Cache Management: The restoreReducer implementation for PPR updates the cache to drop prefetch data for segments with already received dynamic data. This hints at PPR impacting cache management during navigation, potentially preventing unnecessary re-fetching and state flashes.

Further Investigation:

To fully understand PPR, it would be helpful to:

  • Analyze the navigateReducer_PPR and updateCacheNodeOnNavigation functions: These functions likely contain the core logic of how PPR works, including how it determines which parts of the page to update and how it manages the cache.
  • Investigate the usage of process.env.__NEXT_PPR: Look for other instances of this environment variable to identify additional components affected by PPR and gain a more comprehensive understanding of its scope.

Conclusion:

While the provided snippets offer strong evidence for PPR being related to Partial Page Rendering or a similar optimization technique, further investigation is necessary to fully grasp its workings and impact on Next.js applications.

Update: Sat May 18 18:51:13

  • Enabled the askrepo command to accept questions in Japanese.
  • Switched to using Gemini 1.5 Flash, which has the same context window but lower cost.

Added @google/generative-ai as a dependency, so it is no longer a standalone script; it now needs to be installed and called.

git clone https://gist.github.com/laiso/3b5e87ad8b92aece3f6b7ecdecef3606 askrepo
cd askrepo
npm i
npm i -g .
which askrepo
export GOOGLE_API_KEY=XXX # https://aistudio.google.com/app/apikey
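With the SDK installed, the request reduces to a few calls. This is a minimal sketch rather than the contents of the gist: `createModel` and `streamAnswer` are hypothetical names, the model id `gemini-1.5-flash` is an assumption, and streaming the reply is just one possible choice (a single `generateContent` call would also work).

```javascript
// Sketch only. Requires `npm i @google/generative-ai` and GOOGLE_API_KEY.

// Create a model handle. The SDK is loaded lazily so that streamAnswer
// can be exercised without it.
function createModel(modelName = "gemini-1.5-flash") {
  const { GoogleGenerativeAI } = require("@google/generative-ai");
  const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);
  return genAI.getGenerativeModel({ model: modelName });
}

// Stream the answer chunk by chunk and return the full text.
async function streamAnswer(model, prompt, write = (s) => process.stdout.write(s)) {
  const result = await model.generateContentStream(prompt);
  let full = "";
  for await (const chunk of result.stream) {
    const text = chunk.text();
    write(text);
    full += text;
  }
  return full;
}

// Usage (hypothetical `sources` string holding the concatenated files):
// streamAnswer(createModel(), "Explain the design of this code\n\n" + sources);
```

Passing the model in as an argument keeps the streaming loop independent of the SDK, which makes it easy to try with a stub.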

Sample (executed on jina-ai/reader/tree/main/backend)

$ askrepo "Explain the design of this code" ./backend/

This code implements a backend service using Firebase Cloud Functions to provide web page crawling and search functionality.

File Structure:

  • backend/.firebaserc: This file defines the configuration for the Firebase project. It specifies using the reader-6b7dc project as the default.
  • backend/.gitignore: This file defines files and folders to be ignored by the Git repository. It includes log files, Firebase cache and configuration files, and runtime data.
  • backend/firebase.json: This file defines the deployment settings for Firebase Cloud Functions. It includes Firestore rule and index files, and Functions deployment settings.
  • backend/functions/.editorconfig: This file defines the code editor configuration and is used to unify the code style.
  • backend/functions/.env.example: This file defines a sample of environment variables. The actual environment variables are set in the backend/functions/.env file.
  • backend/functions/.puppeteerrc.cjs: This file defines the configuration for Puppeteer. It changes the Puppeteer cache directory to node_modules/puppeteer/walk-around-lame-gcp-build.
  • backend/functions/integrity-check.cjs: This file performs an integrity check on the GeoLite2-City.mmdb file. If this file does not exist, it prints an error message and terminates the process.
  • backend/functions/package-lock.json: This file locks the dependencies of the npm packages.
  • backend/functions/src/cloud-functions/crawler.ts: This file implements the web page crawling functionality.
  • backend/functions/src/cloud-functions/searcher.ts: This file implements the web page search functionality.
  • backend/functions/src/db/crawled.ts: This file defines the model for saving crawled web page data to Firestore.
  • backend/functions/src/db/img-alt.ts: This file defines the model for saving image alt text data to Firestore.
  • backend/functions/src/db/searched.ts: This file defines the model for saving search result data to Firestore.
  • backend/functions/src/fetch.d.ts: This file defines the type definitions for the undici library.
  • backend/functions/src/index.ts: This file is the entry point for Firebase Cloud Functions. It loads other modules and initializes Firebase services.
  • backend/functions/src/services/alt-text.ts: This file implements the service for generating image alt text.
  • backend/functions/src/services/brave-search.ts: This file implements the service for searching web pages using the Brave Search API.
  • backend/functions/src/services/geoip.ts: This file implements the service for retrieving geographical information, such as city, from IP addresses.
  • backend/functions/src/services/puppeteer.ts: This file implements the service for crawling web pages using Puppeteer.
  • backend/functions/src/types.d.ts: This file defines the type definitions for the langdetect library and jsdom library.
  • backend/functions/src/utils/markdown.ts: This file defines utility functions for formatting Markdown text.
  • backend/functions/src/utils/misc.ts: This file defines general utility functions.
  • backend/functions/tsconfig.json: This file defines the configuration for the TypeScript compiler.
  • backend/storage.rules: This file defines the security rules for Firebase Storage.

Design Overview:

  1. Firebase Cloud Functions: This code runs on Firebase Cloud Functions. This achieves a serverless architecture, improving scalability and reliability.
  2. Firestore: Firestore is used to store crawled web page data, image alt text data, and search result data.
  3. Puppeteer: Puppeteer is used to crawl web pages and capture screenshots.
  4. Brave Search API: Brave Search API is used to search web pages.
  5. GeoIP Service: GeoIP Service retrieves geographical information, such as city, from IP addresses using Maxmind's GeoLite2-City.mmdb database.
  6. AltText Service: AltText Service resizes images using Canvas and generates image captions using the Blip API to generate image alt text.
  7. Caching: This code improves performance by caching crawled web page data and search result data.
  8. Rate Limiting: Rate Limiting is used to prevent excessive requests and maintain system stability.
  9. Authorization: This code uses the Jina Embeddings authorization service.

Specific Code Examples:

  • backend/functions/src/cloud-functions/crawler.ts:
    • The CrawlerHost class is initialized as a singleton with the @singleton() decorator.
    • The crawl() function is defined as an HTTP request handler with the @CloudHTTPv2() decorator.
    • The formatSnapshot() function converts crawled web page data into formats like Markdown, HTML, text, and screenshots.
  • backend/functions/src/cloud-functions/searcher.ts:
    • The SearcherHost class is initialized as a singleton with the @singleton() decorator.
    • The search() function is defined as an HTTP request handler with the @CloudHTTPv2() decorator.
    • The fetchSearchResults() function uses the Brave Search API to get search results, retrieves crawled web page data, and formats the search results.

This code's design has the following characteristics:

  • Modularization: Each feature is implemented as an independent module, increasing code readability and maintainability.
  • Dependency Injection: Tsyringe is used to inject dependencies, improving ease of testing.
  • Caching: Caching data improves performance.
  • Rate Limiting: Rate limiting functions maintain system stability.
  • Authorization: Security is strengthened by using the Jina Embeddings authorization service.

This code is suitable for the following use cases:

  • Applications that need to collect information from web pages
  • Applications that need to search web pages
  • Applications that need to generate alt text for images

This code has the following extensibility:

  • New Crawl Targets: New crawl rules can be added to support crawling new web pages.
  • New Search Engines: New search services can be implemented to support new search engines.
  • New Formats: New format conversion functions can be implemented to support new output formats.

Summary:

This code implements a backend service that provides web page crawling and search functionality using Firebase Cloud Functions. This code features modularization, dependency injection, caching, rate limiting, and authorization, and is suitable for various use cases.
