We love automation. We use it to power our infrastructure, to scale workloads down to zero, and—increasingly—to shrink the amount of human attention needed to ship high-quality code. One place that still felt stubbornly manual was pull-request reviews. Between Cursor as our IDE, ChatGPT/Codex for prototyping, and gemini-cli
for quick checks, our local workflows were fast—but CI still waited for a human.
So we asked a simple question: could we let a large language model read the diff, spot issues, and comment directly on the PR?
Turns out: yes. It took just a few lines of GitHub Actions glue to get helpful, structured reviews on every pull request.
We weren’t trying to replace humans. We wanted a first pass that flags real issues, categorises them by priority, and cites exact files and line numbers. If a change is fine, we want the bot to simply say so and get out of the way.
The ingredients are minimal: the `@google/gemini-cli` npm package inside CI to run the automated review step, and the GitHub CLI (`gh`) to comment on the PR.

Here’s the full Action we’re running. Drop it into `.github/workflows/gemini-pr.yml`:
```yaml
name: gemini-pr
on:
  workflow_dispatch:
  pull_request:

jobs:
  build:
    permissions: write-all
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: 'true'
          fetch-depth: 0
      - uses: actions-rust-lang/setup-rust-toolchain@v1
        with:
          components: rustfmt, clippy
          cache: false
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: install gemini
        run: |
          npm install -g @google/gemini-cli
      - name: gemini
        run: |
          echo "merging into ${{ github.base_ref }}"
          git diff origin/${{ github.base_ref }} > pr.diff
          echo "$PROMPT" | gemini -a > review.md
          cat review.md >> "$GITHUB_STEP_SUMMARY"
          gh pr comment ${{ github.event.pull_request.number }} --body-file review.md
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PROMPT: >
            Please review the changes in @pr.diff (this pull request) and suggest
            improvements or point out potential issues. Do not document or comment
            on existing changes; if everything looks good, just say so. Categorise
            the changes and improvements into low, medium and high priority.
            Whenever you find an issue, always provide a file and line number as a
            reference; if multiple files are affected, provide a list of files and
            line numbers. Provide the output in markdown format and do not include
            any other text.
```
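The job requests `permissions: write-all` for simplicity. If your policy calls for least privilege, a narrower block that still lets the steps above run should work (a sketch; these are the only scopes the shown steps need):

```yaml
permissions:
  contents: read        # actions/checkout
  pull-requests: write  # gh pr comment
```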
Step by step, the workflow does the following:

- **Checkout with `fetch-depth: 0`** so we can diff against the PR’s base branch reliably.
- **Rust toolchain** installs `rustfmt` and `clippy` because our repos often include Rust code; those run elsewhere in our pipeline, but keeping toolchain setup here avoids surprises.
- **Node** is required for the `gemini-cli`.
- **Install `@google/gemini-cli`** globally inside the runner.
- **Create a diff file** with `git diff origin/${{ github.base_ref }} > pr.diff`, so the model sees only the changes under review.
- **Run the review:** we pipe the prompt into `gemini -a` (the CLI reads `@pr.diff` inline as a file reference) and capture the model’s markdown output in `review.md`.
- **Publish to the Job Summary** by appending the review to `$GITHUB_STEP_SUMMARY`, so it’s visible in the Actions UI.
- **Comment on the PR** using `gh pr comment … --body-file review.md`.
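The diff step is easy to sanity-check outside CI. A minimal sketch with a throwaway repo, where `main` stands in for `${{ github.base_ref }}`:

```shell
set -e
# Throwaway repo with a base branch and a feature branch.
dir=$(mktemp -d)
git -C "$dir" init -q -b main
git -C "$dir" config user.email ci@example.com
git -C "$dir" config user.name ci
echo "fn main() {}" > "$dir/lib.rs"
git -C "$dir" add lib.rs
git -C "$dir" commit -qm "base"
git -C "$dir" checkout -qb feature
echo "// TODO: handle errors" >> "$dir/lib.rs"
git -C "$dir" commit -qam "feature work"
# In CI this is: git diff origin/${{ github.base_ref }} > pr.diff
# (If CI fetched only the merge commit, diff against
# github.event.pull_request.base.sha instead.)
git -C "$dir" diff main > pr.diff
wc -l pr.diff
```

The resulting `pr.diff` contains exactly the feature-branch changes, which is what the model gets to see.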
LLM outputs are only as good as the instructions, and ours keep things practical. We iterated a bit to reach the prompt above; the most impactful tweaks were insisting on file/line references and forbidding extra prose.
On a typical PR, the review comes back as a short, prioritised list of findings.
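For illustration, the shape of a typical review (these findings are invented, not real output):

```markdown
## High priority
- `src/server.rs:142` - the new handler ignores the error returned by `flush()`

## Medium priority
- `src/config.rs:57`, `src/main.rs:12` - duplicated default values; consider a single constant

## Low priority
- `README.md:8` - minor typo
```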
If everything’s fine, we get a one-liner: “Looks good.” Perfect—that’s exactly what we want.
A few operational notes:

- Store `GEMINI_API_KEY` in repo or org secrets (`GITHUB_TOKEN` is provided automatically) and keep scopes tight. The Action sets `permissions: write-all` because it posts a comment; restrict this if your policy requires it.
- `git diff origin/${{ github.base_ref }}` gives the right context. If your workflow fetches only the merge commit, make sure the base branch is available or adjust to `github.event.pull_request.base.sha`.
- For PRs from forks, consider `pull_request_target` with careful hardening, or gate the review behind labels.
- The workflow runs on `pull_request` (not every push).

Automated reviews make humans more selective with their attention. We spend less time on “rename this variable” and more time on architecture, data flows, and security boundaries.
It’s also surprisingly good at consistency. An LLM won’t forget the agreed-upon error-handling pattern between services or our preferred log structure; it applies those checks uniformly on every PR.
This pattern works with almost any model or CLI. A few easy extensions:

- Mark the job as `failed` when high-priority findings appear, to block merges until they’re addressed.
- Post inline review comments instead of a single summary comment (the `gh` CLI supports this) for even tighter feedback.
- Gate the review behind an `ai-review` label, or auto-add a `needs-attention` label when high-priority findings appear.

None of this replaces a human approving a merge. It’s a lightweight filter that pays for itself on day one.
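The first extension can be sketched as a small check after the review step. The heading text is an assumption about the model’s output format, so adjust the pattern to your prompt:

```shell
# Illustrative review.md with a high-priority section (invented contents).
cat > review.md <<'EOF'
## High priority
- src/server.rs:142 - new handler ignores the error returned by flush()

## Low priority
- README.md:8 - minor typo
EOF

# Gate on high-priority findings; in CI you would `exit 1` here to fail
# the job so branch protection blocks the merge.
if grep -qi '^## high priority' review.md; then
  echo "high-priority findings detected"   # prints "high-priority findings detected"
fi
```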