Alternative Assets • Specialty Financing • Asset Management
11 - 50
💰 $100M Venture Round on 2019-09
3 days ago
AWS
Cloud
Django
Docker
Google Cloud Platform
GraphQL
Jenkins
Kafka
Kubernetes
Microservices
Puppeteer
Python
RabbitMQ
Selenium
SQL
Go
•Intro description: Legalist is an institutional alternative asset management firm. Founded in 2016 and incubated at Y Combinator, the firm uses data-driven technology to invest in credit assets at scale. We are always looking for talented people to join our team.
•Where You Come In:
•Help design and implement the architecture of a large-scale crawling system.
•Design, implement, and maintain various components of our data acquisition infrastructure (building new crawlers, maintaining existing crawlers, data cleaners and loaders).
•Develop tools that facilitate scraping at scale, monitor the health of crawlers, and ensure the data quality of scraped items.
•Collaborate with our product and business teams to understand and anticipate requirements, striving for greater functionality and impact in our data gathering systems.
•3+ years of experience with Python for data wrangling and cleaning.
•2+ years of experience with data crawling and scraping at scale (at least 100 spiders).
•Production experience with Scrapy is mandatory. Distributed crawling and advanced Scrapy experience are a plus.
•Familiarity with scraping libraries and monitoring tools is highly recommended (BeautifulSoup, XPath, Selenium, Puppeteer, Splash).
•Familiarity with data pipelining, to integrate scraped items into existing data pipelines.
•Experience extracting data from multiple disparate sources, including HTML, XML, REST, GraphQL, PDF, and spreadsheets.
•Experience running, monitoring, and maintaining a large set of broad crawlers (100+ spiders).
•Sound knowledge of bypassing bot detection techniques.
•Experience using techniques to protect web scrapers against site bans, IP leaks, browser crashes, CAPTCHAs, and proxy failures.
•Experience with cloud environments such as GCP and AWS, containerization tools such as Docker, and orchestration tools such as Kubernetes.
•Ability to maintain all aspects of a scraping pipeline end to end (building and maintaining spiders, avoiding bot prevention techniques, data cleaning and pipelining, monitoring spider health and performance).
•OOP, SQL, and Django ORM basics.