Multi-Agent Document Translation App with Google ADK and A2A Protocol #4
Open
codegen-sh[bot] wants to merge 1 commit into master
…ocol
- Implemented 3-agent architecture for layout-preserving document translation
- Agent 1: Document-to-Image Converter (PDF, DOCX, TXT support)
- Agent 2: Multimodal Translation Agent using Google Gemini Vision
- Agent 3: Quality Validation Agent with layout preservation checks
- Added FastAPI web service and Streamlit UI
- Comprehensive configuration system with environment variables
- Batch processing capabilities and usage examples
- Full test suite for agents and orchestrator
- Support for 12 languages with auto-detection
- Quality assessment with layout similarity metrics
Reviewer's Guide
This PR implements a full multi-agent document translation pipeline using Google’s ADK and A2A protocol: it converts input documents to images, translates them via Google Gemini Vision while preserving layout, validates translation quality and layout fidelity, and exposes the workflow via FastAPI and Streamlit interfaces.
Sequence diagram for document translation workflow
sequenceDiagram
actor User
participant UI as Web UI/API
participant Orchestrator
participant Converter as DocumentConverterAgent
participant Translator as TranslationAgent
participant Validator as ValidationAgent
participant Gemini as Google Gemini Vision API
User->>UI: Upload document & request translation
UI->>Orchestrator: translate_document(document, target_lang)
Orchestrator->>Converter: process(document)
Converter-->>Orchestrator: images
Orchestrator->>Translator: process(images, target_lang)
Translator->>Gemini: generate_content(prompt, image)
Gemini-->>Translator: translation response
Translator-->>Orchestrator: translated images, metadata
Orchestrator->>Validator: process(original images, translated images, metadata)
Validator->>Gemini: generate_content(validation prompt, images)
Gemini-->>Validator: validation response
Validator-->>Orchestrator: validation results
Orchestrator-->>UI: results (output files, quality, etc.)
UI-->>User: Download/display translated document
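The hand-offs in the sequence diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PR's code: the `*Stub` classes and `OrchestratorSketch` are hypothetical stand-ins that only mimic the `process` calls, the Gemini requests are replaced by placeholder string operations, and the dictionary keys (`translated`, `metadata`, `passed`, `output`, `validation`) are invented for the example.

```python
from dataclasses import dataclass, field


class ConverterStub:
    """Hypothetical stand-in for DocumentConverterAgent: one 'image' per page."""

    def process(self, document: list[str]) -> list[str]:
        return [f"image({page})" for page in document]


class TranslatorStub:
    """Hypothetical stand-in for TranslationAgent; a real agent would call Gemini here."""

    def process(self, images: list[str], target_lang: str) -> dict:
        translated = [f"{target_lang}:{img}" for img in images]
        return {"translated": translated, "metadata": {"target_lang": target_lang}}


class ValidatorStub:
    """Hypothetical stand-in for ValidationAgent: only checks that page counts match."""

    def process(self, originals: list[str], translated: list[str], metadata: dict) -> dict:
        return {"passed": len(originals) == len(translated), "metadata": metadata}


@dataclass
class OrchestratorSketch:
    converter: ConverterStub = field(default_factory=ConverterStub)
    translator: TranslatorStub = field(default_factory=TranslatorStub)
    validator: ValidatorStub = field(default_factory=ValidatorStub)

    def translate_document(self, document: list[str], target_lang: str) -> dict:
        # Step 1: convert the document into page images.
        images = self.converter.process(document)
        # Step 2: translate each image.
        result = self.translator.process(images, target_lang)
        # Step 3: validate the translated images against the originals.
        validation = self.validator.process(
            images, result["translated"], result["metadata"]
        )
        return {"output": result["translated"], "validation": validation}


result = OrchestratorSketch().translate_document(["page 1", "page 2"], "es")
```

Keeping each agent behind a single `process` method means the orchestrator only sequences calls and passes results along, which is what the diagram describes.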
Class diagram for agent classes and orchestrator
classDiagram
class BaseAgent {
+agent_id: str
+config: dict
+is_running: bool
+start()
+stop()
+process(input_data)
}
class DocumentConverterAgent {
+process(input_data)
}
class TranslationAgent {
+process(input_data)
}
class ValidationAgent {
+process(input_data)
}
class TranslationOrchestrator {
+agents: dict
+initialize()
+shutdown()
+translate_document(...)
+get_supported_languages()
+get_system_status()
}
BaseAgent <|-- DocumentConverterAgent
BaseAgent <|-- TranslationAgent
BaseAgent <|-- ValidationAgent
TranslationOrchestrator o-- DocumentConverterAgent
TranslationOrchestrator o-- TranslationAgent
TranslationOrchestrator o-- ValidationAgent
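The inheritance and aggregation relationships in the class diagram could look roughly like the following. This is a sketch that follows the attribute and method names shown in the diagram; the method bodies, the `EchoAgent` placeholder, the `OrchestratorLifecycleSketch` name, and the agent keys (`converter`, `translator`, `validator`) are assumptions made for illustration, and the PR's actual implementation may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseAgent(ABC):
    """Shared lifecycle state; subclasses supply process()."""

    def __init__(self, agent_id: str, config: Optional[dict] = None):
        self.agent_id = agent_id
        self.config = config or {}
        self.is_running = False

    def start(self) -> None:
        self.is_running = True

    def stop(self) -> None:
        self.is_running = False

    @abstractmethod
    def process(self, input_data: Any) -> Any:
        ...


class EchoAgent(BaseAgent):
    """Hypothetical placeholder used for all three roles in this sketch."""

    def process(self, input_data: Any) -> Any:
        return input_data


class OrchestratorLifecycleSketch:
    """Holds the agents (aggregation) and reports whether each is running."""

    def __init__(self) -> None:
        self.agents: dict[str, BaseAgent] = {}

    def initialize(self) -> None:
        for name in ("converter", "translator", "validator"):
            self.agents[name] = EchoAgent(agent_id=name)
            self.agents[name].start()

    def shutdown(self) -> None:
        for agent in self.agents.values():
            agent.stop()

    def get_system_status(self) -> dict[str, bool]:
        return {name: agent.is_running for name, agent in self.agents.items()}


orchestrator = OrchestratorLifecycleSketch()
orchestrator.initialize()
status_running = orchestrator.get_system_status()
orchestrator.shutdown()
status_stopped = orchestrator.get_system_status()
```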
st.markdown(
    f'<div class="agent-status {status_class}">'
    f'{status_icon} {agent_name.title()}'
    f'</div>',
Suggested change
-    f'</div>',
+    '</div>',
The f-string is unnecessary here because the literal has no placeholders. This can just be a plain string.
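A quick check of the reviewer's point: Python concatenates adjacent string literals, and an `f` prefix on a literal with no `{}` placeholders has no effect on the result. The sample values below are hypothetical and only stand in for the variables in the snippet under review.

```python
# Hypothetical sample values standing in for the app's variables.
status_class = "running"
status_icon = "OK"
agent_name = "translator"

# Original form: every literal carries an f prefix.
with_prefix = (
    f'<div class="agent-status {status_class}">'
    f'{status_icon} {agent_name.title()}'
    f'</div>'
)

# Suggested form: the placeholder-free literal is a plain string.
without_prefix = (
    f'<div class="agent-status {status_class}">'
    f'{status_icon} {agent_name.title()}'
    '</div>'
)

assert with_prefix == without_prefix
```

Both forms produce the same string, so the suggestion is a readability and lint fix with no change in behavior.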
    f'<div class="{quality_class}">'
    f'**Quality Assessment:** {quality["grade"]} '
    f'({quality.get("overall_score", 0):.2f})'
    f'</div>',
Suggested change
-    f'</div>',
+    '</div>',
Likewise, f-string is unnecessary here.
🌐 Multi-Agent Document Translation App
This PR introduces a sophisticated document translation system that preserves layout integrity using Google's Agent Development Kit (ADK) and A2A protocol.
🎯 Problem Solved
Standard document translation tools often fail with visually complex documents (technical manuals, marketing brochures, academic papers with diagrams). They extract text, translate it, and try to reflow it into the document, which:
🏗️ Solution Architecture
3-Agent System:
📄 Document-to-Image Converter Agent
🌐 Multimodal Translation Agent
✅ Quality Validation Agent
🚀 Features
📁 Key Files
- `multi_agent_document_translator/orchestrator.py` - Main orchestration logic
- `multi_agent_document_translator/agents/` - Individual agent implementations
- `multi_agent_document_translator/api.py` - FastAPI web service
- `multi_agent_document_translator/streamlit_app.py` - Web UI
- `multi_agent_document_translator/config.py` - Configuration management
🛠️ Usage
Simple Usage:
Web API:
python multi_agent_document_translator/run_api.py
# Visit http://localhost:8000/docs for API documentation
Web UI:
python multi_agent_document_translator/run_streamlit.py
# Visit http://localhost:8501 for web interface
🧪 Testing
Comprehensive test suite included:
📋 Requirements
- `requirements.txt` for full dependencies
🔧 Configuration
Copy `.env.example` to `.env` and configure:
- `GEMINI_API_KEY`: Your Gemini API key
- `GOOGLE_CLOUD_PROJECT`: Your GCP project ID
This implementation provides a production-ready solution for layout-preserving document translation using cutting-edge AI and multi-agent architecture.
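As an illustration of the configuration step, the two variables named above could be read and checked at startup along these lines. This is a sketch only: the `Settings` class and `load_settings` function are hypothetical and are not taken from the PR's `config.py`, which may load and validate settings differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two documented variables."""

    gemini_api_key: str
    google_cloud_project: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read both variables, failing early if either is missing or empty."""
    required = ("GEMINI_API_KEY", "GOOGLE_CLOUD_PROJECT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(env["GEMINI_API_KEY"], env["GOOGLE_CLOUD_PROJECT"])


# Example with an explicit mapping instead of the real environment.
settings = load_settings(
    {"GEMINI_API_KEY": "example-key", "GOOGLE_CLOUD_PROJECT": "example-project"}
)
```

Accepting the environment as a parameter keeps the check easy to exercise in tests without touching real credentials.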
Summary by Sourcery
Introduce a production-ready multi-agent document translation application using Google ADK and A2A protocol with image-based translation, layout preservation, and quality validation, exposed via FastAPI and Streamlit interfaces with full documentation, examples, and testing.
New Features:
Documentation:
Tests:
Chores:
- `.env.example` for environment configuration