Translated by AI
A Deep Dive into the Code Completion Internals of GitHub Copilot
Overview
I looked into how code completion (Inline Editing and Next Edit Prediction) is implemented in the open-sourced GitHub Copilot, using DeepWiki as a guide. Apologies in advance for any mistakes. I also used Plamo Translation to translate the prompts.
Difference between Inline Editing and Next Edit Prediction
The former (Inline Editing) inserts code (INSERT) starting at the current cursor position.
The latter (Next Edit Prediction) computes an edit window covering a certain number of lines above and below the cursor and edits it (EDIT). In this case, the entire edit window appears to be replaced.
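To make the difference between the two response types concrete, here is a minimal sketch that operates on a plain array of lines. The helper names applyInsert and applyEdit are hypothetical. Cursors are assumed to be a 0-based line index plus a 0-based column. This illustrates the idea only and is not Copilot's code.

```typescript
// Hypothetical helpers illustrating INSERT vs. EDIT; not Copilot's API.

// INSERT: the first inserted line is spliced in at the cursor column, and any
// further lines become new lines directly below the cursor line.
function applyInsert(doc: string[], cursorLine: number, cursorCol: number, inserted: string[]): string[] {
  const result = [...doc]; // never mutate the caller's document
  if (inserted.length === 0) {
    return result; // nothing to insert
  }
  const line = doc[cursorLine];
  // Text before the cursor + first inserted line + text after the cursor.
  result[cursorLine] = line.slice(0, cursorCol) + inserted[0] + line.slice(cursorCol);
  result.splice(cursorLine + 1, 0, ...inserted.slice(1));
  return result;
}

// EDIT: the whole edit window [windowStart, windowEndExclusive) is replaced
// by the rewritten lines, which may be more or fewer than the original ones.
function applyEdit(doc: string[], windowStart: number, windowEndExclusive: number, rewritten: string[]): string[] {
  const result = [...doc];
  result.splice(windowStart, windowEndExclusive - windowStart, ...rewritten);
  return result;
}

// Usage: both calls complete "  return a" to "  return a + b;".
const doc = ["function add(a, b) {", "  return a", "}"];
console.log(applyInsert(doc, 1, 10, [" + b;"]));
console.log(applyEdit(doc, 1, 2, ["  return a + b;"]));
```

An INSERT only has to send the new text, while an EDIT resends the whole window. In exchange, an EDIT can change lines other than the cursor line.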
Internal logic of edit window calculation
private computeEditWindowLinesRange(currentDocLines: string[], cursorLine: number, retryState: RetryState): OffsetRange {
    let nLinesAbove: number;
    {
        const useVaryingLinesAbove = this.configService.getExperimentBasedConfig(ConfigKey.Internal.InlineEditsXtabProviderUseVaryingLinesAbove, this.expService);
        if (useVaryingLinesAbove) {
            // Walk up from the cursor line (i = 0 is the cursor line itself) and take the
            // distance to the first non-blank line, checking at most 8 lines.
            nLinesAbove = 0; // default
            for (let i = 0; i < 8; ++i) {
                const lineIdx = cursorLine - i;
                if (lineIdx < 0) {
                    break;
                }
                if (currentDocLines[lineIdx].trim() !== '') {
                    nLinesAbove = i;
                    break;
                }
            }
        } else {
            // Fixed number of lines: experiment config, falling back to N_LINES_ABOVE.
            nLinesAbove = (this.configService.getExperimentBasedConfig(ConfigKey.Internal.InlineEditsXtabProviderNLinesAbove, this.expService)
                ?? N_LINES_ABOVE);
        }
    }
    let nLinesBelow = (this.configService.getExperimentBasedConfig(ConfigKey.Internal.InlineEditsXtabProviderNLinesBelow, this.expService)
        ?? N_LINES_BELOW);
    // When retrying with an expanded window, extend the window further downward.
    if (retryState === RetryState.RetryingWithExpandedWindow) {
        nLinesBelow += this.configService.getExperimentBasedConfig(ConfigKey.Internal.InlineEditsXtabProviderRetryWithNMoreLinesBelow, this.expService) ?? 0;
    }
    // Clamp to the document: a half-open range [start, endExcl) of 0-based line indices.
    const codeToEditStart = Math.max(0, cursorLine - nLinesAbove);
    const codeToEditEndExcl = Math.min(currentDocLines.length, cursorLine + nLinesBelow + 1);
    return new OffsetRange(codeToEditStart, codeToEditEndExcl);
}
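To experiment with this logic outside the extension, here is a standalone sketch of the same computation. Assumptions: the config service is replaced by a plain options object, the names EditWindowOptions, LineWindow and computeEditWindow are hypothetical, and the caller supplies the default line counts because Copilot's actual N_LINES_ABOVE / N_LINES_BELOW values are not reproduced here.

```typescript
// Standalone sketch of computeEditWindowLinesRange; option names are hypothetical.
interface EditWindowOptions {
  varyingLinesAbove: boolean;     // stands in for ...UseVaryingLinesAbove
  nLinesAbove: number;            // used only when varyingLinesAbove is false
  nLinesBelow: number;
  extraLinesBelowOnRetry: number; // stands in for ...RetryWithNMoreLinesBelow
}

interface LineWindow {
  start: number;        // 0-based, inclusive
  endExclusive: number; // 0-based, exclusive
}

function computeEditWindow(lines: string[], cursorLine: number, retrying: boolean, opts: EditWindowOptions): LineWindow {
  let nLinesAbove = opts.nLinesAbove;
  if (opts.varyingLinesAbove) {
    nLinesAbove = 0;
    // Check the cursor line plus up to 7 lines above it; stop at the first non-blank one.
    for (let i = 0; i < 8; ++i) {
      const lineIdx = cursorLine - i;
      if (lineIdx < 0) break;
      if (lines[lineIdx].trim() !== "") {
        nLinesAbove = i;
        break;
      }
    }
  }
  let nLinesBelow = opts.nLinesBelow;
  if (retrying) nLinesBelow += opts.extraLinesBelowOnRetry;
  return {
    start: Math.max(0, cursorLine - nLinesAbove),
    endExclusive: Math.min(lines.length, cursorLine + nLinesBelow + 1),
  };
}

// Usage: the cursor is on a blank line (index 3), so the window reaches up to line 1.
const src = ["function f() {", "  const x = 1;", "", "", "  return x;", "}"];
console.log(computeEditWindow(src, 3, false, { varyingLinesAbove: true, nLinesAbove: 0, nLinesBelow: 2, extraLinesBelowOnRetry: 0 }));
```

In "varying" mode, a cursor on a non-blank line gets a window that starts at the cursor line itself. The window only reaches upward when the cursor sits on blank lines. Below the cursor, the size is fixed, and only a retry enlarges it.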
Internal logic of INSERT/EDIT processing
if (trimmedLines === ResponseTags.INSERT.start) {
    // The first line after <INSERT> continues the line the cursor is on.
    const lineWithCursorContinued = await linesIter.next();
    if (lineWithCursorContinued.done || lineWithCursorContinued.value.includes(ResponseTags.INSERT.end)) {
        // Empty insertion: report "no suggestions".
        pushEdit(Result.error(new NoNextEditReason.NoSuggestions(request.documentBeforeEdits, editWindow)));
        return;
    }
    // Replace the cursor line with: text before the cursor + continuation + text after the cursor.
    // (cursorLineOffset appears to be a 1-based column, hence the - 1.)
    const edit = new LineReplacement(
        new LineRange(editWindowLineRange.start + cursorOriginalLinesOffset + 1 /* 0-based to 1-based */, editWindowLineRange.start + cursorOriginalLinesOffset + 2),
        [editWindowLines[cursorOriginalLinesOffset].slice(0, cursorLineOffset - 1) + lineWithCursorContinued.value + editWindowLines[cursorOriginalLinesOffset].slice(cursorLineOffset - 1)]
    );
    pushEdit(Result.ok({ edit, window: editWindow }));
    // Collect the remaining lines up to </INSERT>.
    const lines: string[] = [];
    let v = await linesIter.next();
    while (!v.done) {
        if (v.value.includes(ResponseTags.INSERT.end)) {
            break;
        } else {
            lines.push(v.value);
        }
        v = await linesIter.next();
    }
    // Insert them directly below the cursor line (an empty LineRange means a pure insertion).
    const line = editWindowLineRange.start + cursorOriginalLinesOffset + 2;
    pushEdit(Result.ok({
        edit: new LineReplacement(
            new LineRange(line, line),
            lines
        ),
        window: editWindow
    }));
    pushEdit(Result.error(new NoNextEditReason.NoSuggestions(request.documentBeforeEdits, editWindow)));
    return;
}
if (trimmedLines === ResponseTags.EDIT.start) {
    // Pass through every line up to </EDIT>; these lines become the rewritten edit window.
    cleanedLinesStream = new AsyncIterableObject(async (emitter) => {
        let v = await linesIter.next();
        while (!v.done) {
            if (v.value.includes(ResponseTags.EDIT.end)) {
                return;
            }
            emitter.emitOne(v.value);
            v = await linesIter.next();
        }
    });
    // ... (the rest of the processing is omitted in this excerpt)
}
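The branching in the excerpt above can be condensed into a small parser. The sketch below is deliberately simplified:

- It works on a complete response string instead of Copilot's async line iterator.
- It maps an empty INSERT to "no_change" rather than to a NoSuggestions error.
- It reports an unrecognized first line as an error.

The names ParsedResponse, TAGS, collectUntil and parseResponse are hypothetical. The tag strings are taken from the system prompt quoted below.

```typescript
// Simplified, non-streaming parser for the three response formats (hypothetical names).
type ParsedResponse =
  | { kind: "insert"; lines: string[] }
  | { kind: "edit"; lines: string[] }
  | { kind: "no_change" }
  | { kind: "error"; reason: string };

const TAGS = {
  insert: { start: "<INSERT>", end: "</INSERT>" },
  edit: { start: "<EDIT>", end: "</EDIT>" },
  noChange: "<NO_CHANGE>",
} as const;

// Collect lines until one contains the end tag. As in the excerpt, that line
// closes the block and its content is not kept.
function collectUntil(lines: string[], endTag: string): string[] {
  const out: string[] = [];
  for (const line of lines) {
    if (line.includes(endTag)) break;
    out.push(line);
  }
  return out;
}

function parseResponse(text: string): ParsedResponse {
  const [firstLine = "", ...rest] = text.split("\n");
  const first = firstLine.trim(); // the excerpt also compares the trimmed first line
  if (first === TAGS.noChange) return { kind: "no_change" };
  if (first === TAGS.insert.start) {
    const lines = collectUntil(rest, TAGS.insert.end);
    return lines.length === 0 ? { kind: "no_change" } : { kind: "insert", lines };
  }
  if (first === TAGS.edit.start) {
    return { kind: "edit", lines: collectUntil(rest, TAGS.edit.end) };
  }
  return { kind: "error", reason: `unexpected first line: ${first}` };
}

console.log(parseResponse("<INSERT>\n + b;\n</INSERT>"));
```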
System Prompt (unifiedModelSystemPrompt)
When I checked, several variants of the system prompt seemed to be in use, so I'll describe the one that looked most standard.
Contents
- Please assist the developer with their code editing tasks
- Edit the part enclosed in <|code_to_edit|> tags
- Use one of three response formats: <EDIT>, <INSERT>, and <NO_CHANGE>
Full Text
Your role as an AI assistant is to help developers complete their code tasks by assisting in editing specific sections of code marked by the <|code_to_edit|> and <|/code_to_edit|> tags, while adhering to Microsoft's content policies and avoiding the creation of content that violates copyrights.
You have access to the following information to help you make informed suggestions:
- recently_viewed_code_snippets: These are code snippets that the developer has recently looked at, which might provide context or examples relevant to the current task. They are listed from oldest to newest. It's possible these are entirely irrelevant to the developer's change.
- current_file_content: The content of the file the developer is currently working on, providing the broader context of the code.
- edit_diff_history: A record of changes made to the code, helping you understand the evolution of the code and the developer's intentions. These changes are listed from oldest to latest. It's possible a lot of old edit diff history is entirely irrelevant to the developer's change.
- area_around_code_to_edit: The context showing the code surrounding the section to be edited.
- cursor position marked as <|cursor|>: Indicates where the developer's cursor is currently located, which can be crucial for understanding what part of the code they are focusing on.
Your task is to predict and complete the changes the developer would have made next in the <|code_to_edit|> section. The developer may have stopped in the middle of typing. Your goal is to keep the developer on the path that you think they're following. Some examples include further implementing a class, method, or variable, or improving the quality of the code. Make sure the developer doesn't get distracted and ensure your suggestion is relevant. Consider what changes need to be made next, if any. If you think changes should be made, ask yourself if this is truly what needs to happen. If you are confident about it, then proceed with the changes.
Steps
- Review Context: Analyze the context from the resources provided, such as recently viewed snippets, edit history, surrounding code, and cursor location.
- Evaluate Current Code: Determine if the current code within the tags requires any corrections or enhancements.
- Suggest Edits: If changes are required, ensure they align with the developer's patterns and improve code quality.
- Maintain Consistency: Ensure indentation and formatting follow the existing code style.
Output Format
- Your response should start with the word <EDIT>, <INSERT>, or <NO_CHANGE>.
- If your are making an edit, start with <EDIT>, then provide the rewritten code window, then </EDIT>.
- If you are inserting new code, start with <INSERT> and then provide only the new code that will be inserted at the cursor position, then </INSERT>.
- If no changes are necessary, reply only with <NO_CHANGE>.
- Ensure that you do not output duplicate code that exists outside of these tags. The output should be the revised code that was between these tags and should not include the <|code_to_edit|> or <|/code_to_edit|> tags.
Notes
- Apologize with "Sorry, I can't assist with that." for requests that may breach Microsoft content guidelines.
- Avoid undoing or reverting the developer's last change unless there are obvious typos or errors.
Translated version (Plamo Translation)
As an AI assistant, your role is to assist developers in completing their code tasks. Specifically, please assist in editing the code sections specified by the <|code_to_edit|> and <|/code_to_edit|> tags. In doing so, please adhere to Microsoft's content policies and avoid generating content that violates copyrights.
To provide informed suggestions, you can refer to the following information:
- recently_viewed_code_snippets: Code snippets that the developer has recently referred to. These may provide context or specific examples relevant to the current task. They are listed from oldest to newest. However, these may be completely unrelated to the current changes.
- current_file_content: The content of the file the developer is currently working on. It is useful for understanding the overall context of the code.
- edit_diff_history: A history of changes made to the code. It is useful for understanding the evolution of the code and the developer's intentions. These are also listed from oldest to newest. However, much of the past editing history may be completely unrelated to the current changes.
- area_around_code_to_edit: Contextual information showing the surrounding code of the target code section.
- cursor position marked as <|cursor|>: Indicates the current location of the developer's cursor. This is important for understanding which part of the code the developer is focusing on.
Your task is to predict and complete the changes the developer would likely make next in the <|code_to_edit|> section. The developer may have stopped in the middle of typing. Your goal is to maintain the workflow that the developer seems to be currently following. Specific examples include further implementation of classes, methods, and variables, or improving code quality. Be careful not to let the developer get sidetracked and ensure your suggestions are relevant. Consider if there are any next changes required, and if you think a change is necessary, ask yourself if it is truly needed. Only execute the change when you are confident.
Procedures
- Context Analysis: Analyze the context from provided resources (recently referred code snippets, edit history, surrounding code, cursor position, etc.).
- Evaluation of Current Code: Determine if the current code within the tags needs correction or improvement.
- Editing Suggestions: If changes are necessary, provide suggestions that improve code quality in a way that aligns with the developer's work pattern.
- Maintaining Consistency: Ensure that indentation and formatting align with the existing code style.
Output Format
- The response must always start with <EDIT>, <INSERT>, or <NO_CHANGE>.
- When performing an edit, start with <EDIT>, followed by the revised code window, and end with </EDIT>.
- When inserting new code, start with <INSERT>, followed only by the new code to be inserted at the cursor position, and end with </INSERT>.
- If no changes are necessary, reply with only <NO_CHANGE>.
- Be careful not to output duplicate code existing outside of these tags. The output should consist only of the modified code within these tags and should not include the <|code_to_edit|> or <|/code_to_edit|> tags themselves.
Important Notes
- For requests that may violate Microsoft's content guidelines, please apologize by saying "Sorry, I can't assist with that."
- Unless there are obvious typos or errors, do not undo or revert the developer's most recent changes.
User Prompt
It includes the following:
- Recently viewed code snippets
- Recent edit diff history
- Content of the open file and cursor position
Contents
const mainPrompt = `${RECENTLY_VIEWED_CODE_SNIPPETS_START}
${recentlyViewedCodeSnippets}
${RECENTLY_VIEWED_CODE_SNIPPETS_END}
${CURRENT_FILE_CONTENT_START_TAG}
current_file_path: ${currentFilePath}
${currentFileContent}
${CURRENT_FILE_CONTENT_END_TAG}
${EDIT_DIFF_HISTORY_START_TAG}
${editDiffHistory}
${EDIT_DIFF_HISTORY_END_TAG}
${areaAroundCodeToEdit}`;
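As a rough illustration of how this template gets assembled, here is a sketch that joins the same pieces in the same order. The tag values are assumptions, inferred from the example prompt that follows. PromptParts and buildMainPrompt are hypothetical names, not Copilot's.

```typescript
// Tag strings inferred from the example prompt in this article (assumption).
const RECENTLY_VIEWED_CODE_SNIPPETS_START = "<|recently_viewed_code_snippets|>";
const RECENTLY_VIEWED_CODE_SNIPPETS_END = "<|/recently_viewed_code_snippets|>";
const CURRENT_FILE_CONTENT_START_TAG = "<|current_file_content|>";
const CURRENT_FILE_CONTENT_END_TAG = "<|/current_file_content|>";
const EDIT_DIFF_HISTORY_START_TAG = "<|edit_diff_history|>";
const EDIT_DIFF_HISTORY_END_TAG = "<|/edit_diff_history|>";

// Hypothetical container for the pieces of context.
interface PromptParts {
  recentlyViewedCodeSnippets: string;
  currentFilePath: string;
  currentFileContent: string;
  editDiffHistory: string;
  areaAroundCodeToEdit: string; // already wrapped in its own tags
}

// Same order and line breaks as the mainPrompt template literal above.
function buildMainPrompt(p: PromptParts): string {
  return [
    RECENTLY_VIEWED_CODE_SNIPPETS_START,
    p.recentlyViewedCodeSnippets,
    RECENTLY_VIEWED_CODE_SNIPPETS_END,
    CURRENT_FILE_CONTENT_START_TAG,
    `current_file_path: ${p.currentFilePath}`,
    p.currentFileContent,
    CURRENT_FILE_CONTENT_END_TAG,
    EDIT_DIFF_HISTORY_START_TAG,
    p.editDiffHistory,
    EDIT_DIFF_HISTORY_END_TAG,
    p.areaAroundCodeToEdit,
  ].join("\n");
}
```

Since everything is plain text, each piece of context is delimited only by its tags, and the model has to learn to read them.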
A concrete prompt seems to look something like this:
<|recently_viewed_code_snippets|>
// Recently viewed code snippets
function calculateSum(a: number, b: number): number {
    return a + b;
}
<|/recently_viewed_code_snippets|>
<|current_file_content|>
current_file_path: /workspace/src/math.ts
export class Calculator {
    private result: number = 0;
    add(value: number): void {
        this.result += value;
    }
    getResult(): number {
        return this.result;
    }
}
<|/current_file_content|>
<|edit_diff_history|>
// Edit diff history
--- /path/to/file.ts
+++ /path/to/file.ts
@@ -startLine,oldLength +startLine,newLength @@
-Deleted line
+Added line
<|/edit_diff_history|>
<|area_around_code_to_edit|>
export class Calculator {
    private result: number = 0;
<|code_to_edit|>
    add(value: number): void {
        this.result += value;<|cursor|>
    }
<|/code_to_edit|>
    getResult(): number {
        return this.result;
    }
}
<|/area_around_code_to_edit|>
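The area_around_code_to_edit section ties the edit window to the prompt. Below is a hypothetical reconstruction of it. Assumptions:

- The window is a half-open range of 0-based lines, as returned by computeEditWindowLinesRange.
- The cursor is a 0-based line/column pair.
- contextLines, the number of extra lines shown on each side of the window, is an invented parameter.

Copilot's real logic may differ.

```typescript
// Hypothetical reconstruction of the <|area_around_code_to_edit|> section.
function buildAreaAroundCodeToEdit(
  lines: string[],
  windowStart: number,        // 0-based, inclusive
  windowEndExclusive: number, // 0-based, exclusive
  cursorLine: number,
  cursorCol: number,
  contextLines: number,       // assumption: extra lines shown on each side
): string {
  // Mark the cursor position inside its line.
  const withCursor = lines.map((line, i) =>
    i === cursorLine ? line.slice(0, cursorCol) + "<|cursor|>" + line.slice(cursorCol) : line,
  );
  const areaStart = Math.max(0, windowStart - contextLines);
  const areaEnd = Math.min(lines.length, windowEndExclusive + contextLines);
  return [
    "<|area_around_code_to_edit|>",
    ...withCursor.slice(areaStart, windowStart),
    "<|code_to_edit|>",
    ...withCursor.slice(windowStart, windowEndExclusive),
    "<|/code_to_edit|>",
    ...withCursor.slice(windowEndExclusive, areaEnd),
    "<|/area_around_code_to_edit|>",
  ].join("\n");
}
```

With the Calculator class above, an edit window of lines 2 to 4, the cursor at the end of line 3 and contextLines = 4, this reproduces the area_around_code_to_edit part of the example.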
Thoughts
I had expected the input and output to be strictly structured with something like Structured Outputs, but instead the prompt uses plain-text tags to delimit each piece of context. The <|...|> tag style resembles OpenAI's Chat Markup Language (ChatML), apparently.
Of course, there is error handling for output that contains unexpected tags. Seeing all this makes me feel like I could actually build something like this myself!?