We are seeking a skilled Web Scraper Developer to automate the extraction of event data from a list of specified websites on a daily basis. This position requires expertise in web scraping, data categorization, and database management. The ideal candidate will be responsible for developing and maintaining a robust scraper that collects comprehensive event details and other relevant data, ensuring high data integrity by removing outdated and duplicate entries. The collected data will be stored in an associated MongoDB database.
Responsibilities:
- Develop and maintain a web scraper that extracts event data from multiple websites daily.
- Implement logic to categorize events accurately based on predefined criteria.
- Store the extracted event details and related information in a MongoDB database.
- Ensure that the scraper handles dynamic content, such as JavaScript-rendered pages, and pagination where required.
- Remove outdated events from the database and maintain data integrity by identifying and eliminating duplicate events that appear across multiple sources.
- Regularly monitor the scraper to ensure it’s functioning efficiently, and troubleshoot and fix any issues that arise.
- Implement error handling and logging to capture and resolve scraping failures.
- Optimize the scraper for speed, efficiency, and scalability.
- Integrate with external APIs, if necessary, for additional data enrichment or verification.
- Ensure that all web scraping activities comply with legal and ethical standards, including respecting website terms of use.
- Work closely with the database team to ensure data is stored in a clean, organized, and accessible manner.
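To illustrate the kind of clean-up logic the role involves, here is a minimal Python sketch of removing outdated events and cross-source duplicates. The field names (`title`, `venue`, `start_date`, `source`) and the rule that two events are duplicates when their normalized title, venue, and start date match are assumptions made for the example, not our actual schema or matching criteria.

```python
from datetime import date


def event_key(event):
    """Build a source-independent key from normalized title, venue, and start date."""
    # Lowercase and collapse whitespace so trivial formatting differences do not matter.
    title = " ".join(event["title"].lower().split())
    venue = " ".join(event["venue"].lower().split())
    return (title, venue, event["start_date"])


def clean_events(events, today):
    """Drop events that started before `today`; keep the first copy of each duplicate."""
    seen = set()
    kept = []
    for event in events:
        if event["start_date"] < today:
            continue  # outdated event
        key = event_key(event)
        if key in seen:
            continue  # same event already collected from another source
        seen.add(key)
        kept.append(event)
    return kept


# Hypothetical sample data: one event listed on two sites, plus one past event.
sample = [
    {"title": "Tech Meetup", "venue": "Hall A",
     "start_date": date(2025, 2, 10), "source": "site-a"},
    {"title": "  tech  meetup ", "venue": "hall a",
     "start_date": date(2025, 2, 10), "source": "site-b"},
    {"title": "Old Expo", "venue": "Hall B",
     "start_date": date(2025, 1, 5), "source": "site-a"},
]

cleaned = clean_events(sample, today=date(2025, 2, 1))
```

In a MongoDB-backed setup, a similar normalized key could be stored on each document and backed by a unique index so that duplicates are rejected at write time. That is one possible design, not a requirement of the role.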
Requirements:
- 2+ years of experience in web scraping, preferably with event data extraction.
- Strong proficiency in Python or Node.js for web scraping tasks.
- Experience with web scraping frameworks and libraries such as Scrapy, Beautiful Soup, Selenium, or Puppeteer.
- Solid understanding of handling and parsing HTML, CSS, and JavaScript content.
- Experience with MongoDB or other NoSQL databases for storing scraped data.
- Familiarity with APIs, both for integrating additional data and for ensuring accurate categorization of events.
- Knowledge of data validation techniques to identify duplicate events and remove outdated ones.
- Familiarity with handling CAPTCHA, rate limiting, and anti-bot protections during scraping tasks.
- Experience in scheduling and automating web scraping tasks using tools such as cron, Celery, or other task queues.
- Strong problem-solving skills and ability to handle large volumes of data.
- Ability to write clean, maintainable, and efficient code with proper error handling.
- Good understanding of data privacy and legal considerations when scraping websites.
Preferred Skills:
- Experience with cloud-based deployment platforms (AWS, Azure, Google Cloud).
- Familiarity with data enrichment tools and techniques.
- Experience with machine learning or AI techniques to improve categorization accuracy.
- Familiarity with version control tools like Git.
- Prior experience with event management platforms or similar industries.
Job Type: Full-time
Pay: Rs45,000.00 – Rs60,000.00 per month
Ability to commute/relocate:
- Islamabad G-15 Sector: Reliably commute or plan to relocate before starting work (Preferred)
Education:
- Bachelor’s (Preferred)
Experience:
- Working: 2 years (Preferred)
Language:
- English (Preferred)
Work Location: In person
Application Deadline: 03/02/2025