We love automation. We use it to power our infrastructure, to scale workloads down to zero, and—increasingly—to shrink the amount of human attention needed to ship high-quality code. One place that still felt stubbornly manual was pull-request reviews. Between Cursor as our IDE, ChatGPT/Codex for prototyping, and gemini-cli for quick checks, our local workflows were fast—but CI still waited for a human.
So we asked a simple question: could we let a large language model read the diff, spot issues, and comment directly on the PR?
Turns out: yes. It took just a few lines of GitHub Actions glue to get helpful, structured reviews on every pull request.
We weren’t trying to replace humans. We wanted a first pass that catches real issues early, points at exact files and lines, and stays quiet when there is nothing to say. If a change is fine, we want the bot to simply say so and get out of the way.
The building blocks are simple: `@google/gemini-cli` inside CI to run the automated review step, and the GitHub CLI (`gh`) to comment on the PR. Here’s the full Action we’re running. Drop it into `.github/workflows/gemini-pr.yml`:
```yaml
name: gemini-pr
on:
  workflow_dispatch:
  pull_request:
jobs:
  build:
    permissions: write-all
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: 'true'
          fetch-depth: 0
      - uses: actions-rust-lang/setup-rust-toolchain@v1
        with:
          components: rustfmt, clippy
          cache: false
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: install gemini
        run: |
          npm install -g @google/gemini-cli
      - name: gemini
        run: |
          echo "merging into ${{ github.base_ref }}"
          git diff origin/${{ github.base_ref }} > pr.diff
          echo $PROMPT | gemini > review.md
          cat review.md >> $GITHUB_STEP_SUMMARY
          gh pr comment ${{ github.event.pull_request.number }} --body-file review.md
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PROMPT: >
            Please review the changes in @pr.diff (this pull request) and suggest
            improvements or provide insights into potential issues.
            Do not document or comment on existing changes; if everything looks good,
            just say so.
            Categorise the changes and improvements into low, medium and high priority.
            Whenever you find an issue, always provide a file and line number as a
            reference. If multiple files are affected, provide a list of files and
            line numbers.
            Provide the output in markdown format and do not include any other text.
```
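One subtlety in the `gemini` step: the YAML `>` block scalar folds the prompt into a single line, which is why the unquoted `echo $PROMPT` gets away with it. In general, unquoted expansion collapses whitespace; a minimal sketch (the `PROMPT` value here is made up):

```shell
# Quoting demo: how the shell treats an unquoted vs. quoted expansion.
PROMPT="line one
line two"
echo $PROMPT      # unquoted: word-split and rejoined with spaces -> one line
echo "$PROMPT"    # quoted: the embedded newline is preserved -> two lines
```

If you ever switch the prompt to a literal block scalar (`|`), quote the expansion so the line breaks survive.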
- Checkout runs with `fetch-depth: 0` so we can diff against the PR’s base branch reliably.
- The Rust toolchain step installs `rustfmt` and `clippy` because our repos often include Rust code; those run elsewhere in our pipeline, but keeping toolchain setup here avoids surprises.
- Node is required for the gemini-cli.
- We install `@google/gemini-cli` globally inside the runner.
We create a diff file:

```shell
git diff origin/${{ github.base_ref }} > pr.diff
```

This ensures the model sees only the changes under review.
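One caveat: a two-dot diff against the base branch also picks up anything that landed on the base after the PR branched, shown as removals. The three-dot form diffs against the merge-base instead, so only the PR’s own changes appear. A throwaway-repo sketch of the difference (branch names and file contents here are invented; requires git ≥ 2.28 for `init -b`):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b main
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m init
git checkout -q -b feature
echo "pr change" > pr.txt
git add pr.txt
git -c user.email=ci@example.com -c user.name=ci commit -q -m "pr change"
# Meanwhile, main moves on without the PR:
git checkout -q main
echo "unrelated" > main.txt
git add main.txt
git -c user.email=ci@example.com -c user.name=ci commit -q -m "base drift"
git checkout -q feature
git diff main        > twodot.diff    # includes the base drift (as a removal)
git diff main...HEAD > threedot.diff  # only the PR's own change
```

In the Action, that would mean `git diff origin/${{ github.base_ref }}...HEAD > pr.diff`; whether the drift matters depends on how fresh your branches are.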
- We pipe the prompt into `gemini` (the CLI reads `@pr.diff` inline as a file reference) and capture the model’s markdown output into `review.md`.
- We append the review to the Job Summary (`$GITHUB_STEP_SUMMARY`) so it’s visible in the Actions UI.
- We comment on the PR using `gh pr comment … --body-file review.md`.
LLM outputs are only as good as the instructions, so our prompt keeps things practical. We iterated a bit to reach this version; the most impactful tweaks were insisting on file/line references and forbidding extra prose.
On a typical PR, we see findings grouped into high-, medium- and low-priority sections, each with file and line references. If everything’s fine, we get a one-liner: “Looks good.” Perfect; that’s exactly what we want.
A few operational notes:

- Store `GEMINI_API_KEY` and `GITHUB_TOKEN` in repo or org secrets. Keep scopes tight. The Action sets `permissions: write-all` because it posts a comment; restrict this if your policy requires it.
- `git diff origin/${{ github.base_ref }}` gives the right context. If your workflow fetches only the merge commit, make sure the base branch is available, or adjust to `github.event.pull_request.base.sha`.
- For PRs from forks, consider `pull_request_target` with careful hardening, or gate the review behind labels.
- Trigger on `pull_request` (not on every push).

Automated reviews make humans more selective with their attention. We spend less time on “rename this variable” and more time on architecture, data flows, and security boundaries.
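If `write-all` is too broad for your policy, a narrower permissions block is usually enough for this workflow; a sketch, worth verifying against your org’s settings:

```yaml
permissions:
  contents: read        # needed by actions/checkout
  pull-requests: write  # needed to post the comment via gh pr comment
```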
It’s also surprisingly good at consistency. An LLM won’t forget the agreed-upon error-handling pattern between services or our preferred log structure; it applies those checks uniformly on every PR.
This pattern works with almost any model or CLI. A few easy extensions:
- Fail the job on high-priority findings to block merges until addressed.
- Post inline review comments (the `gh` CLI supports this) for even tighter feedback.
- Gate the review behind an `ai-review` label, or auto-add a `needs-attention` label when high-priority findings appear.

None of this replaces a human approving a merge. It’s a lightweight filter that pays for itself on day one.
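For reference, the fail-the-job extension can be sketched with a grep over the review file (assumption: the prompt reliably emits a “High Priority” heading, which is worth verifying on a few PRs before gating merges on it):

```shell
# Simulate a review with a high-priority section (contents are made up).
cat > review.md <<'EOF'
## High Priority
- src/main.rs:42 - possible panic on empty input
## Low Priority
- README.md:3 - typo
EOF
# Exit non-zero when high-priority findings exist, so the Actions job fails.
if grep -qi '^## high priority' review.md; then
  echo "high-priority findings: failing the job"
  # exit 1   # uncomment in CI so the job (and the merge) is blocked
fi
```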