I am a researcher conducting a large-scale literature review based on academic papers (PDFs).
I already have a predefined set of research questions. I am not looking for AI to write the review. I am looking to build a robust, reproducible system that:
• Applies my question set to each paper
• Extracts answers directly from the documents
• Provides real page references that support each answer
• Flags missing or ambiguous information
• Outputs everything into a structured master Excel file
This system will function as an additional analytical layer, an “extra pair of eyes”, to support our review process.
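To make the expected output more concrete, here is a minimal sketch of how one row of the master file might be modelled. The field names (`paper_id`, `question_id`, `quote`, `status`) and the three status values are assumptions for illustration, not a fixed specification.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical status values; the real set would be agreed during the project.
STATUSES = ("answered", "missing", "ambiguous")


@dataclass
class AnswerRecord:
    paper_id: str            # identifier of the source PDF
    question_id: str         # identifier of the research question
    answer: Optional[str]    # extracted answer, or None if nothing was found
    page: Optional[int]      # 1-based page number of the supporting passage
    quote: Optional[str]     # verbatim supporting text from that page
    status: str              # one of STATUSES

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        # An answer without a page and a quote cannot be checked by a reviewer.
        if self.status == "answered" and (self.page is None or not self.quote):
            raise ValueError("an answered record needs a page and a supporting quote")


def to_rows(records):
    """Flatten records into plain dicts, one per spreadsheet row."""
    return [asdict(r) for r in records]
```

The resulting rows could then be written to Excel, for example with `pandas.DataFrame(rows).to_excel("master.xlsx", index=False)`, assuming pandas and openpyxl are part of the chosen stack.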
Deliverables
The final output must be a master Excel file with a structured format. In addition, I will need:
• Clean, documented source code
• Clear instructions to re-run the pipeline
• Ability to add new PDFs and re-run
• Ability to modify or add questions
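As one possible way to support re-runs, the pipeline could key each result on a hash of the PDF content and a hash of the question text, so that only new or changed combinations are processed. The sketch below assumes that approach; `pending_pairs` and the `done` set are hypothetical names.

```python
import hashlib
from pathlib import Path


def sha256_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def pending_pairs(pdf_dir, questions, done):
    """List the (pdf_path, question_id, key) triples that still need processing.

    questions: dict mapping question_id -> question text
    done:      set of (pdf_hash, question_hash) keys already processed
    """
    todo = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        pdf_hash = sha256_bytes(pdf.read_bytes())
        for qid, text in questions.items():
            key = (pdf_hash, sha256_bytes(text.encode("utf-8")))
            if key not in done:
                todo.append((pdf, qid, key))
    return todo
```

With this design, adding a PDF or editing a question changes the corresponding keys, so a re-run picks up exactly the affected pairs and leaves earlier results untouched.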
You should be comfortable with:
• Scientific PDF i
• Extracting data from tables inside PDFs
• LLM outputs with enforced JSON schema
• Hallucination mitigation strategies
• Citation grounding at chunk level
Python preferred (other languages are also OK).
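To illustrate what I mean by schema enforcement and citation grounding, here is a rough sketch of a post-processing check: the model's reply must parse as JSON with the expected fields, and the quoted evidence must actually appear on the cited page. The field names, the status labels and the `check_answer` function are assumptions for illustration; a real system would likely use a proper schema validator and fuzzier text matching.

```python
import json

# Hypothetical minimal schema: field name -> required Python type.
REQUIRED = {"answer": str, "page": int, "quote": str}


def normalise(text: str) -> str:
    """Collapse whitespace and lowercase so PDF line breaks do not block a match."""
    return " ".join(text.split()).lower()


def check_answer(raw: str, pages: dict) -> dict:
    """Validate one LLM reply and check its quote against the cited page.

    pages: dict mapping 1-based page number -> text extracted from that page
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"status": "invalid_json"}
    if not isinstance(data, dict):
        return {"status": "schema_error"}
    for key, typ in REQUIRED.items():
        value = data.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, typ):
            return {"status": "schema_error"}
    quote = normalise(data["quote"])
    if not quote:
        return {"status": "schema_error"}
    page_text = pages.get(data["page"])
    if page_text is None or quote not in normalise(page_text):
        # The quote was not found on the cited page: flag for human review.
        return {**data, "status": "ungrounded"}
    return {**data, "status": "grounded"}
```

Anything that comes back as `ungrounded`, `schema_error` or `invalid_json` would be flagged in the Excel output rather than silently accepted.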
To apply, please include:
1. A description of a similar system you built.
2. Your proposed architecture and tool stack.
3. How you will prevent hallucinations.
4. Estimated timeline.