AI / ML Engineer · Based in Brazil — Open to relocation (Canada) Q2 2027

I build AI systems
that work in production.

Six years taking hard problems from whiteboard to deployment. I think in trade-offs, own the full stack, and tie every model decision to business outcomes.

6+
Years in ML
6+
Projects Shipped (2025 - 2026)
ML Engineering: end-to-end architecture, training pipelines, deployment, monitoring.
NLP & LLM Systems: RAG, fine-tuning, prompt engineering, grounded in eval data.
MLOps & Infrastructure: feature stores, CI/CD for models, cloud-native at scale.
Applied Research: I know when to use SOTA and when to ship something simpler.
What sets me apart — click to expand
01 I think in trade-offs
Every model decision has a cost. I document them, communicate them clearly, and own the consequences — latency vs. accuracy, cost vs. performance, complexity vs. maintainability.
02 Production-first mindset
A model that can't scale, monitor or fail gracefully is a prototype. I engineer for the real world from the first design decision — not as an afterthought.
03 Business-driven metrics
I tie model performance to business outcomes. The question is never just "what's the accuracy?" — it's "does moving this metric 5% actually change what matters to the company?"
04 Clear across every room
I can explain a transformer to a CTO and a deployment pipeline to a DevOps engineer — both precisely, neither condescendingly. Communication is part of the job.
Experience
2023 - 2026
Senior ML Engineer
Design and deployment of end-to-end AI systems across multiple industrial domains, including production, logistics, R&D, and commercial operations.

Led and mentored a small technical team, driving data infrastructure, model development, and system integration. Worked with high autonomy, directly supporting strategic decisions alongside executive and IT leadership.

Delivered predictive and analytical systems that improved operational efficiency, identified production bottlenecks, and reduced hidden costs. Projects ranged from forecasting models to automation pipelines and internal web applications.

Strong focus on solving real-world problems with noisy and unstructured data in complex industrial environments.
2024 - 2026
ML Engineer
Founder of a B2B AI company focused on building intelligent systems and automation solutions for businesses.

Designed and developed AI-driven products and custom solutions, leveraging LLMs, predictive models, and computer vision. Built scalable APIs enabling seamless integration with client systems.

Worked across a wide range of industries, delivering tailored solutions that transform manual processes into automated, data-driven systems.

Strong emphasis on product development, system design, and applying AI to real business challenges.
2020 - 2023
LLM Engineer
Ingenium ARS
Developed early-stage AI solutions focused on conversational systems using public and self-hosted LLMs.

Built integrations and chat-based applications, enabling natural language interaction for real-world use cases.

This role marked the transition from theoretical knowledge to hands-on experience in applied AI, establishing the foundation for future work in machine learning and intelligent systems.
Technical Stack
Languages
Python · SQL · Bash · JavaScript
ML / Frameworks
PyTorch · HuggingFace · scikit-learn · Pandas · LangChain · TensorFlow · Keras · Flask
MLOps / Cloud
Google Cloud · MLflow · Docker · Kubernetes · Airflow
Data
Spark · MySQL · Snowflake · Kafka · Qdrant · Redis
What I'm looking for

Seeking a role where technical depth is valued, not just output speed. I thrive when there are real, difficult problems — not just dashboards to maintain or models to retrain on a schedule.

Ideally: a product-focused team in Canada, working on applied AI with meaningful scale. I want ownership and colleagues who hold each other to high standards.

Open to full-time or contract, remote or hybrid. Not the right fit for pure analyst roles or organizations where ML is still in the "exploration" phase with no deployment path.

01
Real-Time Raw Material Unloading Forecast
Manufacturing · 2025/2026 · Production
Live
Problem
Raw material unloading times were highly unpredictable due to variability in physicochemical properties (pH, Brix, humidity, density, viscosity) and inconsistent operational conditions. This lack of predictability led to inefficient production planning, frequent overtime, and poor scheduling decisions by the PCP team. Historical data was fragmented, partially manual, and often unreliable, making accurate forecasting difficult.
Solution
Developed a regression model using XGBoost to predict unloading duration based on material quality metrics, supplier patterns, and operational variables. Performed extensive data cleaning and standardization, consolidating fragmented records (including manual inputs) into a structured dataset. Evaluated multiple models (XGBoost vs Random Forest), selecting XGBoost for its superior performance and balance between accuracy and inference speed. The model was integrated into an internal system (custom-built), enabling real-time predictions with ~2.5s latency. Implemented an automated weekly retraining pipeline using predicted vs actual unloading data to continuously improve performance.
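The modeling step above can be sketched as follows. This is a minimal illustration with synthetic data and illustrative feature names, using scikit-learn's GradientBoostingRegressor as a stand-in for `xgboost.XGBRegressor` (same fit/predict interface); it is not the production code.

```python
# Sketch of the unloading-duration regressor. Synthetic data, illustrative
# feature names; GradientBoostingRegressor stands in for xgboost.XGBRegressor.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import median_absolute_error

rng = np.random.default_rng(42)
n = 500

# Physicochemical + operational features (names are illustrative)
X = np.column_stack([
    rng.uniform(3.0, 9.0, n),    # pH
    rng.uniform(5.0, 25.0, n),   # Brix
    rng.uniform(0.1, 0.9, n),    # humidity
    rng.uniform(0.9, 1.5, n),    # density
    rng.integers(0, 8, n),       # supplier id (encoded)
])
# Synthetic target: unloading duration in minutes
y = 30 + 4 * X[:, 2] * 100 / X[:, 3] + 2 * X[:, 4] + rng.normal(0, 5, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0)
model.fit(X_tr, y_tr)

medae = median_absolute_error(y_te, model.predict(X_te))
print(f"MedAE: {medae:.1f} min")
```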
Stack
PythonXGBoostPandasScikit-learnGoogle-ColabREST IntegrationInternal System
Results
97%
reduction in average unloading-time forecast error
−overtime
Significant reduction in overtime (man-hours) through better scheduling decisions
25
Improved planning accuracy for ~25 truck unloadings/month
2.5s
average prediction latency
Kronos system
Engineering Thinking — decisions & trade-offs
Why XGBoost?
XGBoost outperformed Random Forest on R² and median absolute error (MedAE), while keeping inference fast enough for real-time use.
Trade-off
Weekly retraining over online learning — more predictable, auditable, and easy to roll back. Online learning adds risk of concept drift poisoning at production scale.
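The weekly loop compares last week's predictions against actual unloading times before retraining. A minimal drift check of that kind might look like this; the numbers and the 25% threshold are illustrative, not production values.

```python
# Toy drift check for a weekly retraining pipeline: compare predicted vs
# actual unloading times and flag retraining when error drifts past a
# threshold relative to the baseline error.
def mean_abs_error(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

def should_retrain(pred, actual, baseline_mae, tolerance=1.25):
    """Retrain when this week's MAE exceeds the baseline by more than 25%."""
    return mean_abs_error(pred, actual) > baseline_mae * tolerance

# Last week's predicted vs actual unloading durations (minutes)
pred   = [42, 55, 61, 38, 70]
actual = [45, 50, 66, 40, 90]
print(should_retrain(pred, actual, baseline_mae=5.0))  # MAE 7.0 > 6.25 -> True
```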
Limitations
Accuracy drops for suppliers and materials with little historical data, and the model inherits noise from manually recorded inputs. The weekly retraining loop on predicted-vs-actual data mitigates this over time.
[ System Architecture Diagram — to be added ]
02
Multi-Domain Predictive Intelligence System
Manufacturing · Logistics · Sales · 2025/2026 · Production
Live
Problem
Key business areas — logistics, sales, and production — relied on reactive decision-making due to lack of predictive visibility.

| Freight costs fluctuated without reliable forecasting

| Sales teams lacked visibility into customer repurchase timing

| Production planning suffered from inaccurate demand estimates

Data was fragmented across multiple systems, often inconsistent and partially unreliable, making unified forecasting extremely challenging.
Solution
Designed and implemented a multi-domain predictive system composed of independent ML models integrated into a unified dashboard platform.

| Logistics: regression model predicting freight cost per ton and variation based on distance, supplier, fuel, region, and cargo characteristics

| Sales: time-to-event model estimating next purchase window for each client based on historical consumption patterns

| Production: regression model forecasting monthly production volume by product and category

Built a Python backend (Flask) with batch pipelines running daily (overnight), integrating data from multiple sources (MySQL, PostgreSQL). Delivered insights through custom dashboards embedded in the company’s internal system.
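The sales time-to-event idea above can be sketched in a few lines: estimate each client's next purchase window from the median of their historical inter-purchase intervals. Dates and the window padding are illustrative.

```python
# Minimal sketch of the next-purchase-window estimate used on the sales
# side: median inter-purchase gap, padded into an [earliest, latest] window.
from datetime import date, timedelta
from statistics import median

def next_purchase_window(purchase_dates, spread_days=7):
    """Predict an [earliest, latest] next-purchase window from history."""
    dates = sorted(purchase_dates)
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    expected = dates[-1] + timedelta(days=median(gaps))
    pad = timedelta(days=spread_days)
    return expected - pad, expected + pad

history = [date(2025, 1, 10), date(2025, 2, 12), date(2025, 3, 12), date(2025, 4, 14)]
early, late = next_purchase_window(history)
print(early, late)  # 2025-05-10 2025-05-24
```

In production this per-client estimate would be recomputed by the nightly batch pipeline and surfaced on the sales dashboard.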
Stack
Python · scikit-learn · Pandas · Flask · MySQL · PostgreSQL · Batch Pipeline · Custom dashboard
Results
94%
accuracy in customer purchase prediction (sales)
91%
accuracy in production volume forecasting
85%
accuracy in freight cost prediction
⬇️
Reduction in operational inefficiencies (man-hours) across planning teams
⬇️
Improved inventory planning (less overstock / shortages)
⬆️
Better commercial timing → reduced margin loss
Engineering Thinking — decisions & trade-offs
WHY REGRESSION MODELS?
Tested multiple approaches and selected regression-based models due to strong performance across R², MAE, and stability in production. Prioritized consistency and interpretability over model complexity.
Trade-off
Chose batch processing (daily retraining/inference) over real-time systems to ensure data consistency and reduce operational risk. Real-time pipelines would increase complexity without proportional business value in this context.
SYSTEM COMPLEXITY
The main challenge was not modeling, but system integration:

| Multiple data sources across departments

| Inconsistent and unreliable data

| Strong dependency on domain knowledge from different teams

| Required cross-functional alignment and extensive data validation before modeling.
LIMITATIONS
| Model performance depends heavily on data quality and consistency

| Sales predictions are sensitive to unpredictable human behavior

| External factors (market shifts, pricing changes, logistics disruptions) are not fully captured
[ System Architecture Diagram — to be added ]
03
Graph-Based Chemical Property Prediction System
R&D Laboratory FertMinas · 2024/2025 · Production
Shipped
Problem
Chemical formulation development required extensive laboratory testing to validate properties such as stability, solubility, and thermal behavior.

Testing cycles could take up to 3 weeks per formulation, with a high rate of failure due to unstable or non-viable compositions. This resulted in wasted time, inefficient experimentation, and excessive use of laboratory resources.

Additionally, internal datasets were fragmented, noisy, and partially unreliable, making predictive modeling particularly challenging.
Solution
Developed a graph-based machine learning system to predict key chemical properties from formulation inputs. Combined public molecular datasets (MoleculeNet, DeepChem) with internal laboratory data to train models capable of predicting:

| Chemical stability (scored output)

| Solubility (continuous value)

| Flash point (continuous value)

| Stability temperature range (min/max interval)

Used hybrid molecular representations:

| SMILES for capturing structural and spatial dependencies

| Molecular fingerprints for efficient feature encoding in less complex properties

The system evaluates candidate formulations and returns predictions in ~30 minutes, enabling pre-selection of high-probability candidates before lab validation.
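The fingerprint representation mentioned above folds substructure features into a fixed-length bit vector. Production code would use RDKit Morgan fingerprints; the toy below hashes character n-grams of a SMILES string instead, purely as a self-contained illustration of the bit-vector-plus-similarity idea.

```python
# Toy stand-in for molecular fingerprints: fold SMILES character n-grams
# into a fixed-length bit vector (real code would use RDKit Morgan
# fingerprints), then compare vectors with Tanimoto similarity.
def toy_fingerprint(smiles, n_bits=64, ngram=3):
    bits = [0] * n_bits
    for i in range(len(smiles) - ngram + 1):
        fragment = smiles[i:i + ngram]
        bits[hash(fragment) % n_bits] = 1
    return bits

def tanimoto(a, b):
    """Bit-vector similarity, commonly used to compare formulations."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0

fp_ethanol = toy_fingerprint("CCO")
fp_propanol = toy_fingerprint("CCCO")
print(f"similarity: {tanimoto(fp_ethanol, fp_propanol):.2f}")
```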
Stack
Python · DeepChem · Graph-Based Models · RDKit · SMILES · Molecular fingerprints · MoleculeNet
Results
⏱️
Reduced formulation validation cycle from weeks → ~30 minutes (simulation)
⬇️
Significant reduction in unnecessary lab experiments
⬆️
Higher success rate by focusing on high-probability candidate formulations
Accelerated R&D decision-making process
Engineering Thinking — decisions & trade-offs
WHY GRAPH-BASED MODELS?
Chemical properties are inherently dependent on molecular structure. Graph-based approaches allowed modeling relationships between atoms and bonds more effectively than traditional tabular methods.
Trade-off
Used different representations depending on the target:

| SMILES for properties sensitive to molecular structure and spatial relationships

| Fingerprints for faster computation in less complex predictions

This hybrid approach balanced performance and computational cost.
DATA CHALLENGE
The primary challenge was data quality and heterogeneity:

| Noisy and inconsistent internal datasets

| Need for domain-specific chemical understanding

| Integration of public and proprietary data sources

Significant preprocessing and validation were required before modeling.
Limitations
| Model accuracy decreases with increasing molecular complexity

| Performance depends on similarity between training data and new formulations

| External experimental conditions are not fully captured
[ Prediction Pipeline Diagram — to be added ]
04
Automated Wildlife Image Classification Pipeline
Environmental R&D · 2026 · Production
In Progress
Problem
Wildlife monitoring relied on camera traps generating hundreds of images per field cycle. Image classification was performed manually, requiring human operators to:

| filter irrelevant images (no animal present)

| categorize by animal type (birds, insects, reptiles, etc.)

| identify species by scientific name

This process was time-consuming, error-prone, and not scalable, significantly slowing down environmental reporting and certification workflows.
Solution
Developed an automated computer vision pipeline to classify and organize wildlife images in multiple stages:

| Image filtering — detection of images containing animals vs empty frames

| Coarse classification — grouping into animal categories (birds, insects, reptiles, etc.)

| Fine classification — species-level identification (scientific naming)

The system automatically collects images from camera trap sources, processes them through the classification pipeline, and organizes outputs into structured folders in Google Drive for direct use by researchers.
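The staged routing above can be sketched as a cascade: stop early on empty frames, then narrow from category to species. Stub predictors stand in for the CNN models; the threshold, labels, and file names are illustrative.

```python
# Sketch of the three-stage cascade: filter -> coarse category -> species.
# Stub callables stand in for the trained CNN models.
def staged_classify(image, detect, categorize, identify, min_conf=0.5):
    """Route an image through the pipeline, stopping early on empty frames."""
    has_animal, conf = detect(image)
    if not has_animal or conf < min_conf:
        return {"status": "empty_frame"}
    category = categorize(image)           # e.g. "bird", "reptile"
    species = identify(image, category)    # scientific name within category
    return {"status": "classified", "category": category, "species": species}

# Stub models for demonstration
detect = lambda img: (img != "empty.jpg", 0.9)
categorize = lambda img: "bird"
identify = lambda img, cat: "Ramphastos toco"

print(staged_classify("trap_001.jpg", detect, categorize, identify))
print(staged_classify("empty.jpg", detect, categorize, identify))
```

Keeping each stage behind its own callable is what makes the modular-pipeline trade-off below work: any stage can be retrained or swapped without touching the others.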
Stack
Python · TensorFlow · PyTorch · OpenCV · Google Drive API · CNN Models · Image Processing Pipelines
Results
⬇️
Eliminated manual filtering of hundreds of images per cycle
⬆️
Significant reduction in classification time
⬆️
Faster environmental reporting and certification processes
📂
Fully automated organization of image datasets
[ Charts — to be added ]
Engineering Thinking — decisions & trade-offs
MULTI-STAGE PIPELINE
Instead of a single model, implemented a staged approach:

| Binary classification (animal vs no animal)

| Category classification

| Species-level classification

This improved accuracy and reduced model complexity at each stage.
Trade-off
Chose a modular pipeline over a single end-to-end model to increase maintainability and allow independent improvements per stage.
DATA CHALLENGE
| High variability in image quality (lighting, motion blur, occlusion)

| Imbalanced dataset across species

| Need for consistent labeling for scientific classification
Limitations
| Performance depends on image quality and visibility of the animal

| Rare species may have lower classification accuracy

| Similar-looking species can introduce classification ambiguity
[ Classification Pipeline Diagram — to be added ]
05
LLM-Powered Membership Automation System (WhatsApp)
Business Automation Uirapuru · 2026 · Production
Shipped
Problem
The membership onboarding and renewal process was entirely manual, managed through WhatsApp and spreadsheets. The workflow involved:

| manually collecting student data via chat

| recording information in spreadsheets

| tracking payments communicated informally by the club

| sending renewal messages individually or in bulk

As the number of members grew, the process became disorganized, error-prone, and difficult to scale.
Solution
Developed an automated membership management system powered by LLMs and messaging APIs. Integrated:

| Meta WhatsApp API for communication

| Google Gemini (LLM) for natural language interaction

| Python backend for orchestration and data handling

| Google Sheets as a lightweight database

The system:

| automatically collects and structures user data via WhatsApp

| manages membership status and payment tracking

| sends automated renewal notifications

| maintains synchronized records without manual input
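The "collects and structures user data" step can be sketched as prompt-then-validate: the LLM (Gemini, in production) is asked to return member data as JSON, which is checked before any row is written to the sheet. The API call itself is omitted here, and the field names are illustrative.

```python
# Sketch of the structured-extraction step: validate the LLM's JSON reply
# before writing a row to Google Sheets. Field names are illustrative.
import json

REQUIRED_FIELDS = {"name", "phone", "plan"}

def build_prompt(message: str) -> str:
    return (
        "Extract the member's name, phone, and plan from this WhatsApp "
        f"message. Reply with JSON only.\n\nMessage: {message}"
    )

def parse_member_reply(llm_reply: str) -> dict:
    """Reject malformed or incomplete LLM output instead of storing it."""
    data = json.loads(llm_reply)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"LLM reply missing fields: {sorted(missing)}")
    return data

# Simulated LLM reply
reply = '{"name": "Ana Souza", "phone": "+55 31 99999-0000", "plan": "annual"}'
print(parse_member_reply(reply))
```

Validating before writing is what keeps a free-form chat channel from corrupting the Sheets-backed records.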
Stack
Python · Google Gemini · Meta WhatsApp API · Google Sheets API · Automation Pipelines
Results
⬇️
Eliminated manual data entry via WhatsApp
⬆️
Organized and centralized membership management
⬆️
Improved response time and communication consistency
⬇️
Reduced operational overhead for client
Engineering Thinking — decisions & trade-offs
WHY LLM?
Used an LLM (Gemini) to handle unstructured WhatsApp conversations, enabling flexible and natural user interaction without rigid input formats.
LIGHTWEIGHT ARCHITECTURE
Chose Google Sheets as a backend datastore instead of a full database to reduce complexity and deployment overhead, given the scale of the operation.
Trade-off
Prioritized simplicity and maintainability over building a complex system. A more robust architecture (e.g., full DB + dashboard) could improve scalability but was unnecessary for the use case.
Limitations
| Dependent on WhatsApp API reliability

| Limited scalability compared to full database systems
[ Automation Flow Diagram — to be added ]
06
AI-Powered Meeting Intelligence System (Speech-to-Insight)
Enterprise Productivity FertMinas · 2025 · Production
Shipped
Problem
Important information discussed in meetings was often lost or poorly documented. Manual note-taking was inconsistent and inefficient, while participants had no easy way to:

| revisit key decisions

| extract insights

| search past discussions

This resulted in knowledge loss and reduced operational efficiency.
Solution
Developed an end-to-end AI system to capture, transcribe, summarize, and query meeting content. The system:

1 - Captures audio streams
| system audio (other participants)
| microphone audio (local user)

2 - Processes audio in backend
| speech-to-text conversion
| text normalization and structuring

3 - Applies LLM processing
| generates structured summaries
| extracts key topics and decisions

4 - Enables conversational access
| users can query past meetings via an LLM-powered chat interface

Fully integrated into the company’s internal system, allowing seamless access to meeting insights.
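Step 2's text structuring includes splitting long transcripts into overlapping chunks so each fits the LLM's context window. A minimal character-based sketch (sizes are illustrative; production would budget by tokens):

```python
# Sketch of transcript chunking before LLM summarization: overlapping
# character windows so context is not lost at chunk boundaries.
def chunk_transcript(text, max_chars=200, overlap=50):
    """Return overlapping character windows over the transcript."""
    chunks = []
    step = max_chars - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
    return chunks

transcript = "decision: migrate the feature store next quarter. " * 20
chunks = chunk_transcript(transcript)
print(len(chunks), "chunks")
```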
Stack
Python · Speech-to-text model · Local LLM Integration · Backend Processing Pipelines
Results
⬇️
Eliminated need for manual note-taking
⬆️
Improved access to meeting knowledge and decisions
⬆️
Faster retrieval of information via conversational queries
⬆️
Better documentation consistency across teams
Engineering Thinking — decisions & trade-offs
MULTI-AUDIO CAPTURE
Captured both system audio and microphone input to ensure complete meeting context, avoiding partial transcription issues.
LLM AS INTERFACE LAYER
Used LLM not only for summarization, but as a query interface, enabling natural language interaction with meeting data.
Trade-off
Chose batch post-processing over real-time summarization to ensure transcription quality and reduce computational cost.
Limitations
| Dependent on audio quality for accurate transcription

| Background noise and overlapping speech may reduce accuracy

| LLM summaries may omit low-relevance details
[ Processing Pipeline Diagram — to be added ]