Prompt Engineering for QA: How Testers Can Leverage LLMs

Freeport Metrics Team
August 2025

1. Writing Test Cases with LLMs

Crafting test cases manually can be tedious, especially for complex systems. LLMs can generate detailed test cases – functional, non-functional, and edge-case – based on feature descriptions or user stories, saving time and ensuring thoroughness.

How to Do It

Provide a detailed feature overview, including inputs, expected outputs, and constraints. Specify the desired output format (e.g., table or list) and types of test cases (e.g., positive, negative, edge).

Example Prompt:

As a QA engineer, generate 5 test cases in a table format for the feature below. Include 2 positive, 2 negative, and 1 edge case, with test case ID, description, steps, expected result, and priority.

Feature: A multi-tenant cloud-based payroll system for processing employee salaries. The system supports multiple currencies (USD, EUR, GBP), handles tax calculations based on country-specific regulations (e.g., US federal/state taxes, EU VAT), and integrates with a third-party API for real-time exchange rate updates. Users input employee hours, bonuses, and deductions; the system validates inputs and generates a payroll report. Constraints: Employee hours (0-80 per week), bonus (0-10,000 in base currency), deductions (0-50% of gross pay). The system must handle concurrent users and maintain data isolation between tenants.

Sample Output (Generated by LLM)

Payroll System Test Cases
| Test Case ID | Description | Steps | Expected Result | Priority |
|---|---|---|---|---|
| TC_01 | Positive: Process payroll for US employee with bonus and deductions | 1. Log in as US tenant admin. 2. Enter employee data: 40 hours, $500 bonus, 10% deduction. 3. Select USD currency. 4. Submit payroll. | Payroll report generated with correct US federal/state tax calculations, including bonus and deduction, in USD. | High |
| TC_02 | Positive: Process payroll for EU employee with multi-currency bonus | 1. Log in as EU tenant admin. 2. Enter employee data: 35 hours, €1,000 bonus (in EUR), 5% deduction. 3. Select GBP as output currency. 4. Submit payroll. | Payroll report generated with correct EU VAT applied, bonus converted to GBP using the real-time exchange rate, and deduction applied. | High |
| TC_03 | Negative: Invalid employee hours input | 1. Log in as US tenant admin. 2. Enter employee data: 90 hours, $200 bonus, 5% deduction. 3. Submit payroll. | Error message: "Employee hours exceed the maximum limit of 80 hours per week." | Medium |
| TC_04 | Negative: Deduction exceeds limit | 1. Log in as EU tenant admin. 2. Enter employee data: 40 hours, €500 bonus, 60% deduction. 3. Submit payroll. | Error message: "Deduction cannot exceed 50% of gross pay." | Medium |
| TC_05 | Edge: Concurrent payroll submissions across tenants | 1. Log in as two tenant admins (US and EU) simultaneously. 2. Both submit payrolls with 50 employee records each at the same time. 3. Monitor data isolation and API performance. | Both payrolls processed successfully with no data leakage between tenants; exchange rate API responds within 2 seconds. | Critical |

Tips for Success

  • Detail Matters: Include constraints like field limits, API dependencies, or regulatory requirements.
  • Refine Iteratively: If scenarios miss concurrency or compliance cases, add context (e.g., "Include multi-tenant isolation tests").
  • Review Outputs: Validate generated test cases for relevance and accuracy, especially for regulatory compliance.
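
If you want to generate test cases for many features in a batch, the same prompt can be sent programmatically. The sketch below assumes the OpenAI Python SDK and an API key in the OPENAI_API_KEY environment variable; the model name is illustrative, so substitute whichever LLM client and model your team actually uses.

# Minimal sketch: sending a test-case generation prompt to an LLM.
# Assumes the OpenAI Python SDK (pip install openai) and an API key in the
# OPENAI_API_KEY environment variable; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

feature_description = """A multi-tenant cloud-based payroll system ...
(paste the full feature description and constraints here)"""

prompt = (
    "As a QA engineer, generate 5 test cases in a table format for the "
    "feature below. Include 2 positive, 2 negative, and 1 edge case, with "
    "test case ID, description, steps, expected result, and priority.\n\n"
    f"Feature: {feature_description}"
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; use whatever model is available to your team
    messages=[
        {"role": "system", "content": "You are a senior QA engineer."},
        {"role": "user", "content": prompt},
    ],
    temperature=0.2,  # a lower temperature keeps the test cases more deterministic
)

print(response.choices[0].message.content)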

3. Simulating Edge-Case User Behavior

Edge cases – rare or extreme user actions – can expose system vulnerabilities. LLMs can generate creative edge-case scenarios to help QAs test beyond typical use cases.

How to Do It

Describe the feature and ask for edge-case scenarios, emphasizing boundary conditions, invalid inputs, or unusual behaviors.

Example Prompt:

As a QA engineer, generate 5 edge-case scenarios for the following feature that could cause system issues. Include a description and potential impact.

Feature: A healthcare appointment booking system allowing patients to schedule, reschedule, or cancel appointments with doctors. The system supports multiple languages (English, Spanish, Mandarin), enforces doctor availability (8 AM–6 PM, max 10 appointments/day), and sends SMS/email notifications. Data is stored in a HIPAA-compliant database.

Sample Output (Generated by LLM)

  • Scenario: Scheduling an appointment at the exact boundary of doctor availability
    • Description: Patient attempts to book an appointment at 6:00 PM for a doctor with 10 appointments already scheduled.
    • Impact: System may allow overbooking, violating the 10-appointment limit, or fail to display an error, causing scheduling conflicts.
  • Scenario: Submitting a booking request with mixed-language inputs
    • Description: Patient enters name in Mandarin, address in Spanish, and notes in English.
    • Impact: System may fail to process multi-language inputs, leading to data corruption or display issues on the UI.
  • Scenario: Rapid rescheduling during a database maintenance window
    • Description: Patient reschedules an appointment 5 times in 1 minute during a scheduled database maintenance.
    • Impact: System may lose rescheduling requests or create duplicate bookings, violating HIPAA data integrity.
  • Scenario: Canceling an appointment with a failing notification service
    • Description: Patient cancels an appointment while the SMS/email notification service is down.
    • Impact: System may fail to log the cancellation or send delayed notifications, causing confusion for patients or doctors.
  • Scenario: Booking with an expired doctor availability slot
    • Description: Patient books an appointment for a slot that expires due to a time zone mismatch (e.g., patient in UTC vs. doctor in PST).
    • Impact: System may allow invalid bookings, leading to scheduling errors or missed appointments.

Tips for Success

  • Define Boundaries: Specify constraints like language support, time zones, or compliance rules.
  • Focus on Impact: Request potential system impacts to prioritize testing.
  • Integrate with Testing: Use edge cases to inform test case development (see the sketch below).
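
To illustrate that last tip, here is one way the first edge case above (booking at the 6:00 PM boundary for a fully booked doctor) could be turned into an automated check. This is only a sketch: the book_appointment and seed_appointments helpers are hypothetical placeholders for whatever interface your booking system actually exposes.

# Sketch: turning an LLM-generated edge case into an automated pytest check.
# book_appointment() and seed_appointments() are hypothetical helpers that
# would wrap your booking system's real API; adjust names and signatures.
from booking_test_helpers import book_appointment, seed_appointments  # hypothetical

DOCTOR_ID = "dr-001"

def test_booking_at_closing_time_when_doctor_is_full():
    # Arrange: the doctor already has the daily maximum of 10 appointments.
    seed_appointments(doctor_id=DOCTOR_ID, count=10)

    # Act: attempt to book at the exact 6:00 PM availability boundary.
    result = book_appointment(doctor_id=DOCTOR_ID, time="18:00")

    # Assert: the system must reject the booking rather than overbook.
    assert result.accepted is False
    assert "fully booked" in result.message.lower()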

4. Common Mistakes and How to Avoid Them

While LLMs are powerful tools for QA, poorly constructed prompts can lead to inefficiencies or flawed outputs. Below are common mistakes QA engineers make when leveraging LLMs and strategies to avoid them.

Mistake 1: Overly Generic Prompts

Problem: Using vague prompts like “Generate test cases for a login feature” without specifying details (e.g., input fields, validation rules, or test types) often results in generic or incomplete outputs that miss critical scenarios.

Solution: Always include specific details about the feature, such as field constraints, user roles, or system behavior. For example, instead of “Generate test cases,” use “Generate 5 test cases for a login feature with username (6-20 characters), password (8-16 characters, 1 uppercase, 1 number), and two-factor authentication enabled.”

Mistake 2: Over-Reliance on LLM Outputs

Problem: Accepting LLM-generated test cases, acceptance criteria, or edge cases without validation can lead to errors, as LLMs may produce irrelevant or incorrect content based on misinterpretations.

Solution: Treat LLM outputs as a starting point. Cross-check generated content against requirements, user stories, or domain knowledge to ensure accuracy and relevance. For example, verify that generated test cases cover all acceptance criteria and align with the system’s technical constraints.
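
One lightweight way to spot obvious gaps is a quick script that checks whether each acceptance criterion is mentioned by at least one generated test case. The sketch below assumes the test cases have already been parsed into dictionaries; keyword matching like this only flags missing topics and does not replace a human review.

# Minimal sketch: flag acceptance criteria that no generated test case mentions.
# Assumes test cases were already parsed into dictionaries; the keyword
# matching is deliberately crude and only highlights obvious gaps.
test_cases = [
    {"id": "TC_01", "description": "Process payroll with valid bonus and deductions"},
    {"id": "TC_03", "description": "Reject employee hours above the 80-hour limit"},
]

acceptance_criteria = [
    "bonus",
    "deduction",
    "80-hour limit",
    "currency conversion",  # not covered above, so it should be flagged
]

uncovered = [
    criterion
    for criterion in acceptance_criteria
    if not any(criterion.lower() in tc["description"].lower() for tc in test_cases)
]

if uncovered:
    print("Acceptance criteria with no matching test case:", uncovered)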

Mistake 3: Ignoring Prompt Iteration

Problem: An initial prompt rarely produces perfect results on the first try; the generated test cases might miss specific scenarios such as performance or security tests, or be too vague. Sticking with that prompt unchanged means the LLM keeps generating suboptimal or incomplete outputs, leading to gaps in testing or wasted effort.

Solution: Iteratively refine prompts based on the quality of LLM outputs. If the initial test cases lack performance or security scenarios, adjust the prompt to explicitly request these (e.g., “Include performance and security test cases”). Keep a record of successful prompt structures for reuse in similar tasks.
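
Refinement can even be semi-automated: inspect the previous output for the scenario types you need and, if any are missing, append an explicit request and re-prompt. The sketch below assumes an ask_llm helper similar to the API call shown earlier; it illustrates a pattern rather than any specific library feature.

# Sketch of an iterative refinement loop. ask_llm() stands in for whatever
# function sends a prompt to your LLM (see the API call sketch earlier).
REQUIRED_TOPICS = ["performance", "security"]

def refine_until_covered(base_prompt, ask_llm, max_rounds=3):
    prompt = base_prompt
    for _ in range(max_rounds):
        output = ask_llm(prompt)
        missing = [topic for topic in REQUIRED_TOPICS if topic not in output.lower()]
        if not missing:
            return output  # all requested scenario types are present
        # Refine the prompt instead of re-sending the same one.
        prompt = base_prompt + "\n\nAlso include test cases for: " + ", ".join(missing) + "."
    return output  # best effort after max_rounds refinements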

Mistake 4: Neglecting Domain-Specific Context

Problem: LLMs may generate outputs that are too generic or misaligned with the project’s domain, such as overlooking industry-specific regulations (e.g., GDPR for data privacy) or technical constraints (e.g., legacy system compatibility).

Solution: Provide domain-specific context in prompts, such as regulatory requirements, system architecture, or user demographics. For example, include “Ensure compliance with GDPR for user data handling” or “Account for a legacy database with a 255-character limit” to guide the LLM toward relevant outputs.

Mistake 5: Inconsistent Output Formats

Problem: Without specifying a desired output format, LLMs may produce inconsistent or unstructured results, making it harder to integrate outputs into QA workflows (e.g., test management tools).

Solution: Clearly define the output format in the prompt, such as “Provide test cases in a JSON format compatible with TestRail” or “List edge cases in a numbered list with headings for description and impact.” This ensures outputs are immediately usable and aligned with existing processes.
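
It also pays to validate the structure before importing the output into a test management tool. The sketch below checks that each test case contains a set of required fields; the field names are assumptions for illustration, and a real import format (for example, TestRail's CSV or API schema) will differ.

# Minimal sketch: validate LLM-returned JSON test cases before import.
# The required field names are illustrative; match them to your tool's schema.
import json

REQUIRED_FIELDS = {"id", "description", "steps", "expected_result", "priority"}

def parse_test_cases(raw_output: str):
    try:
        cases = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM did not return valid JSON: {exc}") from exc

    for index, case in enumerate(cases):
        missing = REQUIRED_FIELDS - case.keys()
        if missing:
            raise ValueError(f"Test case {index} is missing fields: {sorted(missing)}")
    return cases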

Tips for Success

  • Test Prompts in Small Batches: Experiment with small sets of prompts to gauge output quality before scaling up for larger tasks.
  • Document Successful Prompts: Maintain a repository of effective prompts tailored to specific QA tasks for consistency across projects.
  • Collaborate with Stakeholders: Share LLM-generated outputs with developers or product owners to validate relevance and catch domain-specific issues early.

Best Practices for QA Prompt Engineering

  1. Start Simple, Then Refine: Begin with basic prompts and add details if outputs lack specificity.
  2. Use Structured Formats: Request tables, lists, or JSON for organized results.
  3. Provide Context: Include feature details, user personas, or technical constraints.
  4. Validate Outputs: Always review generated content for accuracy.
  5. Set the Role: Use phrases like “Act as a senior QA engineer” to enhance output quality.
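
These practices can be folded into a reusable prompt template so every request starts from the same structure. The template below is one possible shape rather than a prescribed format; fill the placeholders with your own feature details, constraints, and domain context.

# One possible reusable prompt template combining role, context, structure,
# and explicit test-type requirements. Placeholders are filled per feature.
QA_PROMPT_TEMPLATE = """Act as a senior QA engineer.

Generate {num_cases} test cases in a table format for the feature below.
Include {case_mix} test cases, each with: test case ID, description,
steps, expected result, and priority.

Feature: {feature_description}
Constraints: {constraints}
Compliance/context: {domain_context}
"""

prompt = QA_PROMPT_TEMPLATE.format(
    num_cases=5,
    case_mix="2 positive, 2 negative, and 1 edge",
    feature_description="Multi-tenant payroll processing with USD/EUR/GBP support",
    constraints="Hours 0-80/week, bonus 0-10,000, deductions 0-50% of gross pay",
    domain_context="Country-specific tax rules; data isolation between tenants",
)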

Challenges and Limitations

  • Inaccuracies: LLMs may generate irrelevant or incorrect scenarios. Cross-check outputs with requirements.
  • Context Dependency: Without detailed inputs, results may miss project-specific nuances.
  • Not a Replacement: LLMs support, but don’t replace, critical thinking and domain expertise.

Conclusion

Prompt engineering enables QA engineers to work more efficiently and effectively. By designing targeted prompts, testers can leverage LLMs to generate test cases, refine acceptance criteria, and explore edge cases, ultimately enhancing software quality. Experiment with tools and refine your prompts to suit your project’s needs. With practice, you’ll unlock the full potential of LLMs in your QA toolkit.