
10,000+ employees
Founded 1971
🤖 Artificial Intelligence
🤝 B2B
☁️ SaaS
Grupo Protege is an AI training data platform that connects AI developers with high-quality, ethically sourced training data. It serves AI developers by providing a large collection of data for model training, and data holders by enabling them to monetize their data while retaining governance and control. The platform aims to streamline data procurement so that developers can reach the data they need more efficiently.
🕒 May 13

• Help design, construct, and validate complex healthcare data cohorts used for AI model training
• Act as a technical partner for complex data problems, including cohort construction and data validation
• Translate research and customer requirements into practical dataset definitions
• Build SQL and analysis needed to create datasets
• Collaborate with delivery engineers to implement solutions requiring data pipeline changes
• Validate datasets for quality and acceptance criteria
• Work with AI researchers to translate goals into practical dataset specifications
• Analyze partner datasets for schema understanding and data quality

• Experience working with large structured healthcare datasets
• Strong SQL and Python skills and experience writing complex queries
• Experience using Claude Code / Codex
• Experience joining and transforming large datasets
• Experience performing data validation and exploratory analysis
• Strong Python skills for data analysis and scripting
• Experience working with structured file formats (CSV, Parquet, etc.)
• Ability to translate ambiguous requirements into concrete data logic
• Strong communication skills and ability to collaborate with technical and non-technical stakeholders

• Health insurance
• Professional development opportunities
🕒 May 11
Data Scientist focusing on Generative AI for process automation in a modular program. Collaborates across technical and business teams in a remote work setting.
🗣️🇧🇷🇵🇹 Portuguese Required
🕒 May 8
Data Scientist role at Omie focusing on AI applications for commercial efficiency and lead generation. Transforming business problems into practical AI solutions.
🗣️🇧🇷🇵🇹 Portuguese Required
🕒 May 7
Data Scientist developing cutting-edge AI/machine learning models at Trustly for risk management. Collaborating with ML engineers and using advanced data analysis to optimize payment strategies.
🗣️🇧🇷🇵🇹 Portuguese Required
🕒 May 6
Senior Data Scientist at Clearco, shaping data science and machine learning models for eCommerce funding decisions. Partnering across teams to turn ambiguous problems into production-grade solutions.
🕒 May 6
Lead Data Science solutions and architect Machine Learning at Afya. Drive strategic decisions for digital health products across teams in a remote setup.
🗣️🇧🇷🇵🇹 Portuguese Required