← Back to blog

Automated pytest Test Generation with LLMs

Writing tests is time-consuming but critical. You know you should test that edge case, but it's 6pm and the feature is done. This guide shows three approaches to automatically generate pytest tests, with real code you can run.

The Problem

Your Python codebase has 5,000 lines but only 40% test coverage. You need:

Writing tests manually takes 30-50% of development time. Test coverage tools show gaps but don't write the tests. You need automated test generation.

Method 1: GitHub Copilot in VSCode

Copilot can generate tests inline while you code.

Setup:

  1. Install GitHub Copilot extension in VSCode
  2. Open a Python file with functions

Usage:

# your_module.py
def calculate_discount(price, discount_percent, user_tier):
    """Calculate final price with discount and tier bonus."""
    if discount_percent < 0 or discount_percent > 100:
        raise ValueError("Discount must be between 0 and 100")
    
    base_discount = price * (discount_percent / 100)
    
    tier_bonus = {
        'bronze': 0,
        'silver': 0.05,
        'gold': 0.10
    }.get(user_tier, 0)
    
    total_discount = base_discount + (price * tier_bonus)
    return max(0, price - total_discount)

# test_your_module.py
# Type this comment and let Copilot complete:
# Test calculate_discount with various scenarios

# Copilot will suggest:
import pytest
from your_module import calculate_discount

def test_calculate_discount_basic():
    assert calculate_discount(100, 10, 'bronze') == 90.0

def test_calculate_discount_with_silver_tier():
    assert calculate_discount(100, 10, 'silver') == 85.0

def test_calculate_discount_invalid_percent():
    with pytest.raises(ValueError):
        calculate_discount(100, 150, 'bronze')

Pros:

Cons:

Method 2: ChatGPT with Structured Prompts

Use ChatGPT to generate comprehensive test suites.

Prompt template:

Generate comprehensive pytest tests for this Python function.

Include:
1. Happy path tests
2. Edge cases (empty inputs, None, boundary values)
3. Error cases with pytest.raises
4. Mock external dependencies
5. Parametrized tests for multiple scenarios

Function:
```python
[paste your function here]

Use fixtures where appropriate. Follow pytest best practices.


**Example output for calculate_discount:**
```python
import pytest
from unittest.mock import Mock
from your_module import calculate_discount

class TestCalculateDiscount:
    
    @pytest.mark.parametrize("price,discount,tier,expected", [
        (100, 10, 'bronze', 90.0),
        (100, 10, 'silver', 85.0),
        (100, 10, 'gold', 80.0),
        (50, 20, 'bronze', 40.0),
        (200, 0, 'gold', 180.0),  # 10% tier bonus still applies
    ])
    def test_valid_discounts(self, price, discount, tier, expected):
        assert calculate_discount(price, discount, tier) == expected
    
    def test_zero_price(self):
        assert calculate_discount(0, 10, 'bronze') == 0
    
    def test_100_percent_discount(self):
        result = calculate_discount(100, 100, 'bronze')
        assert result == 0
    
    def test_invalid_negative_discount(self):
        with pytest.raises(ValueError, match="between 0 and 100"):
            calculate_discount(100, -10, 'bronze')
    
    def test_invalid_over_100_discount(self):
        with pytest.raises(ValueError, match="between 0 and 100"):
            calculate_discount(100, 150, 'bronze')
    
    def test_unknown_tier_defaults_to_zero_bonus(self):
        # Should not crash, just no tier bonus
        assert calculate_discount(100, 10, 'platinum') == 90.0
    
    def test_discount_plus_tier_exceeds_price(self):
        # 50% discount + 10% gold tier = 60% off
        result = calculate_discount(100, 50, 'gold')
        assert result >= 0  # Should not go negative

Pros:

Cons:

Method 3: Automated Test Generation with LLM API

Script that generates tests for an entire module.

Python script:

import anthropic
import ast
import sys

def extract_functions(filepath):
    """Extract all function definitions from a Python file."""
    with open(filepath, 'r') as f:
        tree = ast.parse(f.read())
    
    functions = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # Get function source
            start_line = node.lineno - 1
            end_line = node.end_lineno
            with open(filepath, 'r') as f:
                lines = f.readlines()
                func_source = ''.join(lines[start_line:end_line])
            functions.append({
                'name': node.name,
                'source': func_source
            })
    
    return functions

def generate_tests(module_path):
    """Generate pytest tests for all functions in a module."""
    functions = extract_functions(module_path)
    
    if not functions:
        print("No functions found")
        return
    
    client = anthropic.Anthropic(api_key="your-api-key")
    
    all_tests = []
    
    for func in functions:
        print(f"Generating tests for {func['name']}...")
        
        message = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": f"""Generate comprehensive pytest tests for this function.

Include:
1. Parametrized tests for multiple scenarios
2. Edge cases (None, empty, boundary values)
3. Error cases with pytest.raises
4. Mock external dependencies if needed

Function:
```python
{func['source']}

Output only the pytest test code, no explanations.""" }] )

    test_code = message.content[0].text
    all_tests.append(f"# Tests for {func['name']}\n{test_code}\n\n")

# Write all tests to file
module_name = module_path.replace('.py', '').replace('/', '_')
output_file = f"test_{module_name}.py"

with open(output_file, 'w') as f:
    f.write("import pytest\n")
    f.write(f"from {module_path.replace('.py', '').replace('/', '.')} import *\n\n")
    f.write('\n'.join(all_tests))

print(f"\nTests written to {output_file}")
print(f"Run: pytest {output_file} -v")

if name == 'main': if len(sys.argv) < 2: print("Usage: python generate_tests.py <module.py>") sys.exit(1)

generate_tests(sys.argv[1])

**Run:**
```bash
python generate_tests.py your_module.py
pytest test_your_module.py -v

Output: Complete test file with 5-10 tests per function, ready to run.

Pros:

Cons:

Real Coverage Comparison

I tested all three methods on a 200-line Flask API module:

| Method | Tests Generated | Coverage Achieved | Time | Cost | |--------|----------------|-------------------|------|------| | Copilot | 12 tests | 58% | 15 min | $10/month | | ChatGPT (manual) | 28 tests | 84% | 45 min | Free | | LLM API (script) | 35 tests | 91% | 5 min | $3.20 |

All tests passed after minor import fixes.

Best Practice: Hybrid Approach

  1. Use Copilot for quick tests while writing new code
  2. Use LLM API script for batch generation on existing modules
  3. Manually review and fix generated tests
  4. Run pytest --cov to verify coverage

Already-Packaged Alternative

Skip the setup and API key management:

Our service generates pytest test suites for $25:

Submit request: https://automate.ai.aigenius.icu

Next Steps

DIY:

  1. Choose a method based on your use case
  2. Run the examples on your code
  3. Verify tests pass: pytest -v
  4. Check coverage: pytest --cov=your_module

Packaged:

  1. Visit automate.ai.aigenius.icu
  2. Submit repo URL or module
  3. Receive test suite in 2-5 hours
  4. Pay $25 USDC only if tests are useful

Also available: automated code review ($20) and API documentation ($30). See automate.ai.aigenius.icu.