SERPER Web Search Implementation Plan
Executive Summary
This plan details the implementation of SERPER-based web search by vendoring code from `folder/tools/web_search.py` into `src/tools/`, creating a protocol-compliant `SerperWebSearchTool`, fixing the existing `WebSearchTool`, and integrating both into the main search flow.
Project Structure
Project 1: Vendor and Refactor Core Web Search Components
Goal: Extract and vendor Serper/SearchXNG search logic from folder/tools/web_search.py into src/tools/
Project 2: Create Protocol-Compliant SerperWebSearchTool
Goal: Implement SerperWebSearchTool class that fully complies with SearchTool protocol
Project 3: Fix Existing WebSearchTool Protocol Compliance
Goal: Make existing WebSearchTool (DuckDuckGo) protocol-compliant
Project 4: Integrate Web Search into SearchHandler
Goal: Add web search tools to main search flow in src/app.py
Project 5: Update Callers and Dependencies
Goal: Update all code that uses web search to work with new implementation
Project 6: Testing and Validation
Goal: Add comprehensive tests for all web search implementations
Detailed Implementation Plan
PROJECT 1: Vendor and Refactor Core Web Search Components
Activity 1.1: Create Vendor Module Structure
File: src/tools/vendored/__init__.py
- Task 1.1.1: Create `src/tools/vendored/` directory
- Task 1.1.2: Create `__init__.py` with exports
File: src/tools/vendored/web_search_core.py
- Task 1.1.3: Vendor `ScrapeResult`, `WebpageSnippet`, `SearchResults` models from `folder/tools/web_search.py` (lines 23-37)
- Task 1.1.4: Vendor `scrape_urls()` function (lines 274-299)
- Task 1.1.5: Vendor `fetch_and_process_url()` function (lines 302-348)
- Task 1.1.6: Vendor `html_to_text()` function (lines 351-368)
- Task 1.1.7: Vendor `is_valid_url()` function (lines 371-410)
- Task 1.1.8: Vendor `ssl_context` setup (lines 115-120)
- Task 1.1.9: Add imports: `aiohttp`, `asyncio`, `BeautifulSoup`, `ssl`
- Task 1.1.10: Add `CONTENT_LENGTH_LIMIT = 10000` constant
- Task 1.1.11: Add type hints following project standards
- Task 1.1.12: Add structlog logging
- Task 1.1.13: Replace `print()` statements with `logger` calls
File: src/tools/vendored/serper_client.py
- Task 1.1.14: Vendor `SerperClient` class from `folder/tools/web_search.py` (lines 123-196)
- Task 1.1.15: Remove dependency on `ResearchAgent` and `ResearchRunner`
- Task 1.1.16: Replace filter agent with simple relevance filtering or remove it
- Task 1.1.17: Add `__init__` that takes an `api_key: str | None` parameter
- Task 1.1.18: Update `search()` method to return `list[WebpageSnippet]` without filtering
- Task 1.1.19: Remove `_filter_results()` method (or make it optional)
- Task 1.1.20: Add error handling with `SearchError` and `RateLimitError`
- Task 1.1.21: Add structlog logging
- Task 1.1.22: Add type hints
File: src/tools/vendored/searchxng_client.py
- Task 1.1.23: Vendor `SearchXNGClient` class from `folder/tools/web_search.py` (lines 199-271)
- Task 1.1.24: Remove dependency on `ResearchAgent` and `ResearchRunner`
- Task 1.1.25: Replace filter agent with simple relevance filtering or remove it
- Task 1.1.26: Add `__init__` that takes a `host: str` parameter
- Task 1.1.27: Update `search()` method to return `list[WebpageSnippet]` without filtering
- Task 1.1.28: Remove `_filter_results()` method (or make it optional)
- Task 1.1.29: Add error handling with `SearchError` and `RateLimitError`
- Task 1.1.30: Add structlog logging
- Task 1.1.31: Add type hints
Activity 1.2: Create Rate Limiting for Web Search
File: src/tools/rate_limiter.py
- Task 1.2.1: Add `get_serper_limiter()` function (rate: "10/second" with API key)
- Task 1.2.2: Add `get_searchxng_limiter()` function (rate: "5/second")
- Task 1.2.3: Use the `RateLimiterFactory.get()` pattern
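Assuming `RateLimiterFactory.get()` caches one limiter per key, the two helpers could be sketched as below. The `AsyncRateLimiter` here is a minimal sliding-window stand-in for whatever `src/tools/rate_limiter.py` actually provides; only the factory names come from the plan.

```python
import asyncio
import time


class AsyncRateLimiter:
    """Allow at most `rate` acquisitions per `period` seconds (sliding window)."""

    def __init__(self, rate: int, period: float = 1.0) -> None:
        self.rate = rate
        self.period = period
        self._timestamps: list[float] = []
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            # Keep only acquisitions inside the current window.
            self._timestamps = [t for t in self._timestamps if now - t < self.period]
            if len(self._timestamps) >= self.rate:
                # Wait until the oldest acquisition falls out of the window.
                await asyncio.sleep(self.period - (now - self._timestamps[0]))
            self._timestamps.append(time.monotonic())


# One shared limiter per key, mirroring the RateLimiterFactory.get() idea.
_limiters: dict[str, AsyncRateLimiter] = {}


def get_serper_limiter(api_key: str) -> AsyncRateLimiter:
    # Keyed by API key so every tool instance shares the same 10/second budget.
    return _limiters.setdefault(f"serper:{api_key}", AsyncRateLimiter(rate=10))


def get_searchxng_limiter() -> AsyncRateLimiter:
    return _limiters.setdefault("searchxng", AsyncRateLimiter(rate=5))
```

Keying the Serper limiter by API key means two tool instances with the same key cannot jointly exceed the quota.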
PROJECT 2: Create Protocol-Compliant SerperWebSearchTool
Activity 2.1: Implement SerperWebSearchTool Class
File: src/tools/serper_web_search.py
- Task 2.1.1: Create new file `src/tools/serper_web_search.py`
- Task 2.1.2: Add imports:
  - `from src.tools.base import SearchTool`
  - `from src.tools.vendored.serper_client import SerperClient`
  - `from src.tools.vendored.web_search_core import scrape_urls, WebpageSnippet`
  - `from src.tools.rate_limiter import get_serper_limiter`
  - `from src.tools.query_utils import preprocess_query`
  - `from src.utils.config import settings`
  - `from src.utils.exceptions import SearchError, RateLimitError`
  - `from src.utils.models import Citation, Evidence`
  - `import structlog`
  - `from tenacity import retry, stop_after_attempt, wait_exponential`
- Task 2.1.3: Create `SerperWebSearchTool` class
- Task 2.1.4: Add `__init__(self, api_key: str | None = None)` method
  - Line 2.1.4.1: Get API key from parameter or `settings.serper_api_key`
  - Line 2.1.4.2: Validate the API key is not None; raise `ConfigurationError` if missing
  - Line 2.1.4.3: Initialize `SerperClient(api_key=self.api_key)`
  - Line 2.1.4.4: Get rate limiter: `self._limiter = get_serper_limiter(self.api_key)`
- Task 2.1.5: Add `@property def name(self) -> str:` returning `"serper"`
- Task 2.1.6: Add `async def _rate_limit(self) -> None:` method
  - Line 2.1.6.1: Call `await self._limiter.acquire()`
- Task 2.1.7: Add `@retry(...)` decorator with exponential backoff
- Task 2.1.8: Add `async def search(self, query: str, max_results: int = 10) -> list[Evidence]:` method
  - Line 2.1.8.1: Call `await self._rate_limit()`
  - Line 2.1.8.2: Preprocess query: `clean_query = preprocess_query(query)`
  - Line 2.1.8.3: Use `clean_query if clean_query else query`
  - Line 2.1.8.4: Call `search_results = await self._client.search(query, filter_for_relevance=False, max_results=max_results)`
  - Line 2.1.8.5: Call `scraped = await scrape_urls(search_results)`
  - Line 2.1.8.6: Convert `ScrapeResult` to `Evidence` objects:
    - Line 2.1.8.6.1: Create `Citation` with `title`, `url`, `source="serper"`, `date="Unknown"`, `authors=[]`
    - Line 2.1.8.6.2: Create `Evidence` with `content=scraped.text`, `citation`, `relevance=0.0`
  - Line 2.1.8.7: Return `list[Evidence]`
  - Line 2.1.8.8: Add try/except for `httpx.HTTPStatusError`:
    - Line 2.1.8.8.1: On 429 status, raise `RateLimitError`
    - Line 2.1.8.8.2: Otherwise raise `SearchError`
  - Line 2.1.8.9: Add try/except for `httpx.TimeoutException`; raise `SearchError`
  - Line 2.1.8.10: Add a generic exception handler; log and raise `SearchError`
Activity 2.2: Implement SearchXNGWebSearchTool Class
File: src/tools/searchxng_web_search.py
- Task 2.2.1: Create new file `src/tools/searchxng_web_search.py`
- Task 2.2.2: Add imports (similar to `SerperWebSearchTool`)
- Task 2.2.3: Create `SearchXNGWebSearchTool` class
- Task 2.2.4: Add `__init__(self, host: str | None = None)` method
  - Line 2.2.4.1: Get host from parameter or `settings.searchxng_host`
  - Line 2.2.4.2: Validate the host is not None; raise `ConfigurationError` if missing
  - Line 2.2.4.3: Initialize `SearchXNGClient(host=self.host)`
  - Line 2.2.4.4: Get rate limiter: `self._limiter = get_searchxng_limiter()`
- Task 2.2.5: Add `@property def name(self) -> str:` returning `"searchxng"`
- Task 2.2.6: Add `async def _rate_limit(self) -> None:` method
- Task 2.2.7: Add `@retry(...)` decorator
- Task 2.2.8: Add `async def search(self, query: str, max_results: int = 10) -> list[Evidence]:` method
  - Lines 2.2.8.1-2.2.8.10: Same structure as `SerperWebSearchTool`
PROJECT 3: Fix Existing WebSearchTool Protocol Compliance
Activity 3.1: Update WebSearchTool Class
File: src/tools/web_search.py
- Task 3.1.1: Add `@property def name(self) -> str:` method returning `"duckduckgo"` (after line 17)
- Task 3.1.2: Change `search()` return type from `SearchResult` to `list[Evidence]` (line 19)
- Task 3.1.3: Update `search()` method body:
  - Line 3.1.3.1: Keep the existing search logic (lines 21-43)
  - Line 3.1.3.2: Instead of returning a `SearchResult`, return the `evidence` list directly (line 44)
  - Line 3.1.3.3: Update the exception handler to return an empty list `[]` instead of a `SearchResult` (line 51)
- Task 3.1.4: Add imports if needed:
  - Line 3.1.4.1: `from src.utils.exceptions import SearchError`
  - Line 3.1.4.2: Update exception handling to raise `SearchError` instead of returning an error `SearchResult`
- Task 3.1.5: Add query preprocessing:
  - Line 3.1.5.1: Import `from src.tools.query_utils import preprocess_query`
  - Line 3.1.5.2: Add `clean_query = preprocess_query(query)` before the search
  - Line 3.1.5.3: Use `clean_query if clean_query else query`
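For reference, the `SearchTool` protocol that Projects 2 and 3 both target presumably looks something like the sketch below; this is inferred from the plan, and the authoritative definition lives in `src/tools/base.py`.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SearchTool(Protocol):
    """Interface shared by PubMed, web search, and other search tools."""

    @property
    def name(self) -> str: ...

    # The real signature returns list[Evidence]; Any keeps this sketch standalone.
    async def search(self, query: str, max_results: int = 10) -> list[Any]: ...


class DummyTool:
    """Minimal conforming implementation, useful in handler tests."""

    @property
    def name(self) -> str:
        return "dummy"

    async def search(self, query: str, max_results: int = 10) -> list[Any]:
        return []
```

With `@runtime_checkable`, `isinstance(tool, SearchTool)` gives a cheap structural check (presence of `name` and `search`) that the Activity 6.2 tests can assert against.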
Activity 3.2: Update Retrieval Agent Caller
File: src/agents/retrieval_agent.py
- Task 3.2.1: Update `search_web()` function (line 31):
  - Line 3.2.1.1: Current call: `results = await _web_search.search(query, max_results)`
  - Line 3.2.1.2: Change to: `evidence = await _web_search.search(query, max_results)`
  - Line 3.2.1.3: Update check: `if not evidence:` instead of `if not results.evidence:`
  - Line 3.2.1.4: Update state update: `new_count = state.add_evidence(evidence)` instead of `results.evidence`
  - Line 3.2.1.5: Update logging: `results_found=len(evidence)` instead of `len(results.evidence)`
  - Line 3.2.1.6: Update output formatting: `for i, r in enumerate(evidence[:max_results], 1):` instead of `results.evidence[:max_results]`
  - Line 3.2.1.7: Update deduplication: `await state.embedding_service.deduplicate(evidence)` instead of `results.evidence`
  - Line 3.2.1.8: Update output message: `Found {len(evidence)} web results` instead of `len(results.evidence)`
PROJECT 4: Integrate Web Search into SearchHandler
Activity 4.1: Create Web Search Tool Factory
File: src/tools/web_search_factory.py
- Task 4.1.1: Create new file `src/tools/web_search_factory.py`
- Task 4.1.2: Add imports:
  - `from src.tools.web_search import WebSearchTool`
  - `from src.tools.serper_web_search import SerperWebSearchTool`
  - `from src.tools.searchxng_web_search import SearchXNGWebSearchTool`
  - `from src.utils.config import settings`
  - `from src.utils.exceptions import ConfigurationError`
  - `import structlog`
- Task 4.1.3: Add `logger = structlog.get_logger()`
- Task 4.1.4: Create `def create_web_search_tool() -> SearchTool | None:` function
  - Line 4.1.4.1: Check `settings.web_search_provider`
  - Line 4.1.4.2: If `"serper"`:
    - Line 4.1.4.2.1: Check `settings.serper_api_key` or `settings.web_search_available()`
    - Line 4.1.4.2.2: If available, return `SerperWebSearchTool()`
    - Line 4.1.4.2.3: Else log a warning and return `None`
  - Line 4.1.4.3: If `"searchxng"`:
    - Line 4.1.4.3.1: Check `settings.searchxng_host` or `settings.web_search_available()`
    - Line 4.1.4.3.2: If available, return `SearchXNGWebSearchTool()`
    - Line 4.1.4.3.3: Else log a warning and return `None`
  - Line 4.1.4.4: If `"duckduckgo"`:
    - Line 4.1.4.4.1: Return `WebSearchTool()` (always available)
  - Line 4.1.4.5: If `"brave"` or `"tavily"`:
    - Line 4.1.4.5.1: Log warning "Not yet implemented"
    - Line 4.1.4.5.2: Return `None`
  - Line 4.1.4.6: Default: return `WebSearchTool()` (fallback to DuckDuckGo)
Activity 4.2: Update SearchHandler Initialization
File: src/app.py
- Task 4.2.1: Add import: `from src.tools.web_search_factory import create_web_search_tool`
- Task 4.2.2: Update `configure_orchestrator()` function (around line 73):
  - Line 4.2.2.1: Before creating `SearchHandler`, call `web_search_tool = create_web_search_tool()`
  - Line 4.2.2.2: Create tools list: `tools = [PubMedTool(), ClinicalTrialsTool(), EuropePMCTool()]`
  - Line 4.2.2.3: If `web_search_tool is not None`:
    - Line 4.2.2.3.1: Append `web_search_tool` to the tools list
    - Line 4.2.2.3.2: Log info: "Web search tool added to search handler"
  - Line 4.2.2.4: Update `SearchHandler` initialization to use the `tools` list
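The wiring in Task 4.2.2 is small enough to sketch end-to-end. Every class here is a stub for the real one in `src/`, and `configure_search_handler` is a hypothetical name for the relevant portion of `configure_orchestrator()`.

```python
# Stubs for the real tools and handler wired up in src/app.py.
class PubMedTool: ...
class ClinicalTrialsTool: ...
class EuropePMCTool: ...


class SearchHandler:
    def __init__(self, tools: list) -> None:
        self.tools = tools


def create_web_search_tool():
    """Stub factory; returns None when no provider is configured."""
    return None


def configure_search_handler() -> SearchHandler:
    tools = [PubMedTool(), ClinicalTrialsTool(), EuropePMCTool()]
    web_search_tool = create_web_search_tool()
    if web_search_tool is not None:  # Line 4.2.2.3: append only when available
        tools.append(web_search_tool)
    return SearchHandler(tools=tools)
```

Because the factory may return `None`, the handler degrades gracefully to the three literature tools when no web provider is configured.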
PROJECT 5: Update Callers and Dependencies
Activity 5.1: Update web_search_adapter
File: src/tools/web_search_adapter.py
- Task 5.1.1: Update `web_search()` function to use the new implementation:
  - Line 5.1.1.1: Import `from src.tools.web_search_factory import create_web_search_tool`
  - Line 5.1.1.2: Remove the dependency on `folder.tools.web_search`
  - Line 5.1.1.3: Get tool: `tool = create_web_search_tool()`
  - Line 5.1.1.4: If `tool is None`, return an error message
  - Line 5.1.1.5: Call `evidence = await tool.search(query, max_results=5)`
  - Line 5.1.1.6: Convert `Evidence` objects to a formatted string:
    - Line 5.1.1.6.1: Format each evidence item with title, URL, and a content preview
  - Line 5.1.1.7: Return the formatted string
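The Evidence-to-string conversion in Line 5.1.1.6 might look like the helper below. `Evidence` and `Citation` are minimal stand-ins for the models in `src/utils/models.py`, and `format_evidence` is a hypothetical helper name, not something the plan specifies.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    title: str
    url: str


@dataclass
class Evidence:
    content: str
    citation: Citation


def format_evidence(evidence: list[Evidence], preview_chars: int = 200) -> str:
    """Render evidence as a numbered, LLM-friendly plain-text block."""
    if not evidence:
        return "No web search results found."
    lines = []
    for i, ev in enumerate(evidence, 1):
        preview = ev.content[:preview_chars]
        lines.append(f"{i}. {ev.citation.title}\n   {ev.citation.url}\n   {preview}")
    return "\n".join(lines)
```

Truncating to a preview keeps the adapter's output bounded even when `scrape_urls()` returns full page text.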
Activity 5.2: Update Tool Executor
File: src/tools/tool_executor.py
- Task 5.2.1: Verify `web_search_adapter.web_search()` usage (line 86) still works
- Task 5.2.2: No changes needed if the adapter is updated correctly
Activity 5.3: Update Planner Agent
File: src/orchestrator/planner_agent.py
- Task 5.3.1: Verify `web_search_adapter.web_search()` usage (line 14) still works
- Task 5.3.2: No changes needed if the adapter is updated correctly
Activity 5.4: Remove Legacy Dependencies
File: src/tools/web_search_adapter.py
- Task 5.4.1: Remove imports of `folder.llm_config` and `folder.tools.web_search`
- Task 5.4.2: Update error messages to reflect the new implementation
PROJECT 6: Testing and Validation
Activity 6.1: Unit Tests for Vendored Components
File: tests/unit/tools/test_vendored_web_search_core.py
- Task 6.1.1: Test `scrape_urls()` function
- Task 6.1.2: Test `fetch_and_process_url()` function
- Task 6.1.3: Test `html_to_text()` function
- Task 6.1.4: Test `is_valid_url()` function
File: tests/unit/tools/test_vendored_serper_client.py
- Task 6.1.5: Mock SerperClient API calls
- Task 6.1.6: Test successful search
- Task 6.1.7: Test error handling
- Task 6.1.8: Test rate limiting
File: tests/unit/tools/test_vendored_searchxng_client.py
- Task 6.1.9: Mock SearchXNGClient API calls
- Task 6.1.10: Test successful search
- Task 6.1.11: Test error handling
- Task 6.1.12: Test rate limiting
Activity 6.2: Unit Tests for Web Search Tools
File: tests/unit/tools/test_serper_web_search.py
- Task 6.2.1: Test `SerperWebSearchTool.__init__()` with a valid API key
- Task 6.2.2: Test `SerperWebSearchTool.__init__()` without an API key (should raise)
- Task 6.2.3: Test `name` property returns `"serper"`
- Task 6.2.4: Test `search()` returns `list[Evidence]`
- Task 6.2.5: Test `search()` with a mocked `SerperClient`
- Task 6.2.6: Test error handling (`SearchError`, `RateLimitError`)
- Task 6.2.7: Test query preprocessing
- Task 6.2.8: Test rate limiting
File: tests/unit/tools/test_searchxng_web_search.py
- Task 6.2.9: Similar tests for SearchXNGWebSearchTool
File: tests/unit/tools/test_web_search.py
- Task 6.2.10: Test `WebSearchTool.name` property returns `"duckduckgo"`
- Task 6.2.11: Test `WebSearchTool.search()` returns `list[Evidence]`
- Task 6.2.12: Test `WebSearchTool.search()` with a mocked `DDGS`
- Task 6.2.13: Test error handling
- Task 6.2.14: Test query preprocessing
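One pattern that covers most of the Activity 6.2 mocking tasks is `unittest.mock.AsyncMock` in place of the real client. `StubTool` below is a stand-in with the same call shape as the plan's tool classes; the real tests would import `SerperWebSearchTool` and inject the mock wherever the client is created.

```python
import asyncio
from unittest.mock import AsyncMock


class StubTool:
    """Stand-in with the same call shape as the plan's SerperWebSearchTool."""

    def __init__(self, client):
        self._client = client

    @property
    def name(self) -> str:
        return "serper"

    async def search(self, query: str, max_results: int = 10) -> list:
        return await self._client.search(query, max_results=max_results)


def test_search_returns_list():
    client = AsyncMock()
    client.search.return_value = [{"title": "T", "url": "https://x.example"}]
    tool = StubTool(client)
    results = asyncio.run(tool.search("malaria", max_results=3))
    assert isinstance(results, list)
    # AsyncMock records awaited calls, so argument passing can be verified too.
    client.search.assert_awaited_once_with("malaria", max_results=3)
```

With pytest-asyncio the `asyncio.run()` call would be replaced by an `async def` test marked `@pytest.mark.asyncio`.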
Activity 6.3: Integration Tests
File: tests/integration/test_web_search_integration.py
- Task 6.3.1: Test `SerperWebSearchTool` with the real API (marked `@pytest.mark.integration`)
- Task 6.3.2: Test `SearchXNGWebSearchTool` with the real API (marked `@pytest.mark.integration`)
- Task 6.3.3: Test `WebSearchTool` with real DuckDuckGo (marked `@pytest.mark.integration`)
- Task 6.3.4: Test `create_web_search_tool()` factory function
- Task 6.3.5: Test `SearchHandler` with a web search tool
Activity 6.4: Update Existing Tests
File: tests/unit/agents/test_retrieval_agent.py
- Task 6.4.1: Update tests to expect `list[Evidence]` instead of `SearchResult`
- Task 6.4.2: Mock `WebSearchTool.search()` to return `list[Evidence]`
File: tests/unit/tools/test_tool_executor.py
- Task 6.4.3: Verify tests still pass with the updated `web_search_adapter`
Implementation Order
- PROJECT 1: Vendor core components (foundation)
- PROJECT 3: Fix existing WebSearchTool (quick win, unblocks retrieval agent)
- PROJECT 2: Create SerperWebSearchTool (new functionality)
- PROJECT 4: Integrate into SearchHandler (main integration)
- PROJECT 5: Update callers (cleanup dependencies)
- PROJECT 6: Testing (validation)
Dependencies and Prerequisites
External Dependencies
- `aiohttp` - Already in requirements
- `beautifulsoup4` - Already in requirements
- `duckduckgo-search` - Already in requirements
- `tenacity` - Already in requirements
- `structlog` - Already in requirements
Internal Dependencies
- `src/tools/base.py` - `SearchTool` protocol
- `src/tools/rate_limiter.py` - Rate limiting utilities
- `src/tools/query_utils.py` - Query preprocessing
- `src/utils/config.py` - Settings and configuration
- `src/utils/exceptions.py` - Custom exceptions
- `src/utils/models.py` - `Evidence`, `Citation` models
Configuration Requirements
- `SERPER_API_KEY` - For the Serper provider
- `SEARCHXNG_HOST` - For the SearchXNG provider
- `WEB_SEARCH_PROVIDER` - Environment variable (default: `"duckduckgo"`)
Risk Assessment
High Risk
- Breaking changes to `retrieval_agent.py`: must be updated carefully to handle `list[Evidence]` instead of `SearchResult`
- Legacy `folder/` dependencies: need to ensure all code is properly vendored
Medium Risk
- Rate limiting: Serper API may have different limits than expected
- Error handling: Need to handle API failures gracefully
Low Risk
- Query preprocessing: May need adjustment for web search vs PubMed
- Testing: Integration tests require API keys
Success Criteria
- ✅ `SerperWebSearchTool` implements the `SearchTool` protocol correctly
- ✅ `WebSearchTool` implements the `SearchTool` protocol correctly
- ✅ Both tools can be added to `SearchHandler`
- ✅ `web_search_adapter` works with the new implementation
- ✅ `retrieval_agent` works with the updated `WebSearchTool`
- ✅ All unit tests pass
- ✅ Integration tests pass (with API keys)
- ✅ No dependencies on `folder/tools/web_search.py` in `src/` code
- ✅ Configuration supports multiple providers
- ✅ Error handling is robust
Notes
- The vendored code should be self-contained and not depend on `folder/` modules
- Filter agent functionality from the original code is removed (can be added back later if needed)
- Rate limiting follows the same pattern as PubMed tool
- Query preprocessing may need web-specific adjustments (less aggressive than PubMed)
- Consider adding relevance scoring in the future