Measuring embedding drift in unstructured data is inherently difficult: traditional statistical measures designed for structured, tabular data do not translate to high-dimensional embeddings. New approaches are needed to capture how relationships within the unstructured data itself change over time. Unstructured drift detection aims to determine whether two datasets differ and to provide methods for understanding why. Teams often encounter image data issues such as quality problems and unexpected objects that were not part of the original training set. Text drift poses similar challenges for natural language processing models, arising from changes in terminology, context, or meaning over time, from low-resource languages, and from cultural speech gaps. All of these issues can degrade model performance on new, unseen data.
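A minimal sketch of one such measure, assuming nothing about any particular platform: the Euclidean distance between the average embedding vectors of a baseline dataset and a production dataset. All names and data below are illustrative.

```python
import numpy as np

def embedding_drift(baseline: np.ndarray, production: np.ndarray) -> float:
    """Euclidean distance between the mean embedding vectors of two datasets.

    Each input is an (n_samples, n_dims) array of embedding vectors.
    A larger distance suggests the production data has moved away from
    the baseline (e.g., training) distribution.
    """
    return float(np.linalg.norm(baseline.mean(axis=0) - production.mean(axis=0)))

# Toy example: production embeddings shifted away from the baseline.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(500, 64))
shifted = rng.normal(0.5, 1.0, size=(500, 64))

print(embedding_drift(baseline, baseline))  # exactly 0.0 against itself
print(embedding_drift(baseline, shifted))   # noticeably larger
```

In practice the drift score is tracked over time against a threshold, and a spike prompts inspection of the new data.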
The text highlights the progress made in machine learning (ML) in 2022, including significant advancements in generative AI, robotics transformers, and genome studies. However, it also notes that many ML teams are struggling to keep up with the rapid pace of innovation, citing issues such as model bias, lack of diversity in hiring and ethics, and inadequate monitoring tools. The report card on last year's predictions shows mixed results, with some predictions proving true (AI fairness getting worse before better, ML infrastructure ecosystem becoming more crowded and complex, ML engineering jobs outpacing available talent) and others being partially credited or false (enterprises shipping AI blind, the citizen data scientist rising). Looking ahead to 2023, the text predicts that generative AI will become mainstream but also faces growing pains, economic uncertainty will impact the ML infrastructure market, best-of-breed platforms will chip away at legacy players, and working with unstructured data will no longer be optional. Overall, while there are challenges ahead, the future of AI and ML teams holds promise for growth and improvement.
Generative AI is a rapidly evolving technology that has the potential to revolutionize various industries by producing content from text prompts. It is distinct from previous digital transformations due to its accelerating adoption, technical depth, and ability to use pre-trained models as a starting point for innovation. However, businesses should be aware of potential issues such as outages, biases, and the need for human verification. Proper governance and legal certainty are also crucial for the maturation of generative AI. Enterprises should closely monitor new developments in this field to leverage its benefits effectively.
Hugging Face and Arize are partnering to democratize state-of-the-art machine learning by providing a platform for organizations to train unstructured models, monitor their performance, and troubleshoot issues in production. Hugging Face offers a community-driven hub with pre-trained models and datasets, while Arize provides an ML observability platform that enables teams to log models with structured and unstructured data, detect and root cause model performance issues faster, and visualize high-dimensional data using interactive UMAP visualization. Together, they aim to improve the transparency and accountability of AI systems, making them more explainable and trustworthy. By leveraging Hugging Face's Hub and Arize's platform, teams can identify problems with their models and datasets, make public changes, and contribute to a better understanding of AI in various industries.
Arize's platform now supports custom metrics, enabling businesses to tailor any metric to their ML monitoring needs, automate AI ROI calculations, and reduce overall costs. The current challenge in calculating AI ROI is that it's often bespoke, nuanced, and complex for businesses, with 54% of data scientists and ML engineers reporting difficulties in quantifying the ROI of ML initiatives. However, custom metrics can provide a holistic view of all model inference data and performance, enabling real-time views of AI ROI with automatic calculations based on key performance metrics defined by the user. This feature enables teams to prioritize investments, reduce spending, improve accuracy, and inform stakeholders with flexible and shareable dashboards.
BentoML and Arize AI have partnered to streamline the MLOps toolchain, allowing teams to build, ship, and maintain business-critical models more efficiently. Leveraging BentoML's model serving platform, users can easily turn ML models into production-worthy prediction services. Once in production, Arize's ML observability platform provides visibility to keep models performing well. The partnership helps accelerate training-to-deployment cycles, builds reliable and scalable models, and scales ML infrastructure as needed. It also addresses the challenges of real-world data dynamics, such as feature drift and distribution changes, by providing monitoring and troubleshooting capabilities through Arize AI's platform. By integrating BentoML and Arize AI, users can enable ML observability, monitor model performance, detect drift and data quality issues, and troubleshoot problems in real time to improve overall model performance.
Recommendation systems are widely used across industries to provide relevant recommendations based on user preferences. These systems use various methods such as content-based filtering, collaborative filtering, popularity-based, and hybrid approaches. Monitoring recommendation models is crucial once they are in production, as issues can arise due to constant data changes, model decay, or other factors that may impact business results. ML observability helps teams proactively monitor, investigate, and improve the performance of recommendation systems in production by detecting major issues early and ensuring optimal customer experiences.
Spark New Zealand, a telecommunications company, has been leveraging machine learning (ML) to improve its customer experience, business processes, and overall competitiveness in the market. The company's ML team, led by Habib Baluwala and Aadil Dowlut, has developed over 50 models in production, with a focus on improving marketing efficiency, predicting churn, and optimizing business processes. Spark New Zealand uses an ML observability platform like Arize to monitor model performance, detect drift, and provide insights into fairness and bias. The company initially considered building an ML observability solution internally but ultimately adopted a commercial platform due to its ease of use, speed, and reliability. With Arize, Spark New Zealand has been able to speed up time-to-resolution for model performance issues and accelerate model improvement through a better understanding of customer behavior. The company's leadership prioritizes ML observability as a core part of its business, and the MLOps pipeline is at the heart of model development and automation. For those entering their first ML or data science role, Habib Baluwala advises taking time to understand the fundamentals of machine learning, understanding the business problem first, and not skipping any steps in the data science process.
Xander Song, Arize's new Developer Advocate, brings an interdisciplinary background and experience as a machine learning engineer at Test.ai to his role. He is interested in early-stage companies at the intersection of AI and ops and aims to shape processes, techniques, and best practices that define a developer's workflow. As a former researcher who transitioned into machine learning engineering, Song understands the importance of innovation in modern AI and ML, including the development of tools that support model performance. At Arize, he will focus on evangelism and advocacy for machine learning observability, emphasizing its role in improving quality of life for MLEs and data scientists while potentially making a difference between enterprise success and failure. Song brings a customer-obsessed mindset to his new role, having worked with startups and teams that prioritize user experience and value proposition.
This blog post focuses on developing an image classification model using the Fashion MNIST dataset and monitoring its performance over time by analyzing embedding vectors associated with input images. The authors provide a step-by-step guide to automatically surface and troubleshoot the reason for performance degradation, including data preprocessing, training, extracting image vectors and predictions, logging inferences into the Arize Platform, and preparing data to be sent for monitoring. The post highlights the importance of monitoring model performance, especially in industries like healthcare or self-driving cars where safety is paramount, and provides a robust and automated way to stay on top of model performance using tools like Arize.
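The "extracting image vectors" step can be sketched with a forward hook on a classifier's penultimate layer. The tiny model below is purely illustrative and is not the post's actual Fashion MNIST architecture; random tensors stand in for image batches.

```python
import torch
import torch.nn as nn

# Illustrative classifier; a real Fashion MNIST model would differ.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 64),   # penultimate layer: its activations are the embedding
    nn.ReLU(),
    nn.Linear(64, 10),        # 10 Fashion MNIST classes
)

embeddings = {}

def capture(module, inputs, output):
    # Store the hooked layer's activations for the current batch.
    embeddings["vector"] = output.detach()

# Hook the ReLU so we capture the post-activation embedding.
model[2].register_forward_hook(capture)

batch = torch.randn(8, 1, 28, 28)   # stand-in for a batch of images
logits = model(batch)
preds = logits.argmax(dim=1)

print(embeddings["vector"].shape)   # torch.Size([8, 64])
```

Each inference then yields a prediction plus an embedding vector, which is what gets logged for drift monitoring.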
Shafiq Shivji, Group Product Marketing Manager at mParticle, discusses the importance of real-time data pipelines in an interview. He explains that a customer data platform (CDP) like mParticle simplifies data ingestion, unification, and activation by providing easy ways to manage data pipelines and stream data to downstream destinations. CDPs enable real-time personalization use cases without requiring heavy engineering lifts to instrument and maintain data pipelines. Common use cases include marketing, product management, analytics, growth, and ML modeling. Shivji highlights the difficulty of building data-quality pipelines due to constant change in technology and business realities, and emphasizes the importance of personalization for enhancing user experience and driving sales, loyalty, brand, and customer satisfaction.
Monitoring ranking models is crucial for ensuring high-quality recommendations and maintaining customer satisfaction. Poorly performing ranking models can lead to decreased revenue, increased churn, and reduced user engagement. To monitor these models effectively, it's essential to use rank-aware evaluation metrics such as Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain (nDCG). These metrics help gauge the relevancy of predictions and their order. By leveraging machine learning observability, companies can proactively identify performance degradation, uncover the worst-performing features and slices, and quickly root cause model issues to improve overall ranking model performance.
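Two of the rank-aware metrics mentioned above can be sketched in a few lines. These are standard textbook formulations (linear-gain DCG for nDCG), with toy relevance data rather than real model output.

```python
import numpy as np

def mean_reciprocal_rank(ranked_relevance: list) -> float:
    """MRR over queries; each inner list is binary relevance in ranked order."""
    rr = []
    for rels in ranked_relevance:
        hit = next((i for i, r in enumerate(rels) if r), None)
        rr.append(0.0 if hit is None else 1.0 / (hit + 1))
    return float(np.mean(rr))

def ndcg_at_k(relevance: list, k: int) -> float:
    """nDCG@k for one query; `relevance` is graded relevance in ranked order."""
    rels = np.asarray(relevance, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rels.size + 2))
    dcg = float((rels * discounts).sum())
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[: ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0

# First query: relevant item at rank 2; second query: at rank 1.
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]]))  # (1/2 + 1/1) / 2 = 0.75
print(ndcg_at_k([3, 2, 0, 1], k=4))                  # < 1.0: imperfect ordering
```

Tracking these per-day or per-segment is what surfaces a ranking model quietly degrading in production.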
ML observability is an essential part of the MLOps toolchain that helps teams automatically surface and resolve model performance problems before they negatively impact business results. It enables retraining workflows by tracking prediction drift, concept drift, and data/feature drift to know immediately if a model is drifting due to changes between the current and reference distributions. This allows for more efficient model updates and minimizes the risk of introducing new biases or issues. Model version control provides side-by-side analysis of how each version of a model performs, enabling teams to evaluate the efficacy of their optimizations and retraining efforts. Deprecating models is crucial to prevent regression errors and ensure reliable ML environments in production. Fairness checks and bias tracing are critical for determining whether models are exhibiting algorithmic bias, while data labeling can help detect changes in new patterns that emerge in unstructured data. By implementing ML observability best practices, teams can ensure a solid foundation for future success in MLOps.
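Drift between a current and reference distribution is often scored with the Population Stability Index (PSI). The sketch below is one common binned formulation, not any specific platform's metric; the thresholds in the docstring are a widely used rule of thumb.

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between two samples of a numeric feature.

    Bins are derived from the reference sample; clipping avoids division
    by zero for empty bins. Rough rule of thumb: < 0.1 stable,
    0.1-0.25 moderate drift, > 0.25 significant drift.
    """
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    eps = 1e-6
    ref_pct = np.clip(ref_pct, eps, None)
    cur_pct = np.clip(cur_pct, eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(42)
reference = rng.normal(0, 1, 10_000)
same = rng.normal(0, 1, 10_000)       # drawn from the same distribution
moved = rng.normal(1, 1, 10_000)      # mean shifted by one standard deviation
print(psi(reference, same), psi(reference, moved))
```

A retraining workflow can key off this score: when PSI on predictions or key features crosses the chosen threshold, kick off retraining against the fresher data.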
Arize AI recently hosted an event with the Canada chapter of Women of AI, a global nonprofit dedicated to increasing female representation in AI and machine learning. The panel discussion covered various topics, including personal journeys into the field, advice for women aspiring to pursue careers in AI, and the importance of networking and self-preparation. Panelists discussed their diverse paths into AI, from software engineering and astronomy to postdoc research and recruiting. They emphasized the need for passion, preparation, and a willingness to take calculated risks in order to succeed in the field. The discussion also touched on issues such as pay equity, talent retention, and the differences between big tech companies and startups. Overall, the event aimed to provide valuable insights and advice for women looking to break into the AI industry.
PR AUC (Area Under the Precision-Recall Curve) is a metric used to evaluate the performance of a classifier, and is particularly useful for imbalanced datasets where the positive class is rare. It measures the degree of separation between positive and negative classes based on their prediction scores. A perfect PR curve has an AUC of 1, indicating ideal performance. However, PR AUC can mislead if teams designate the wrong class as the positive class, or in settings where True Negatives matter as much as False Positives, since the PR curve ignores True Negatives entirely. In contrast to ROC AUC, which is better suited to balanced datasets, PR AUC emphasizes precision and recall on the positive class, making it a better choice for tasks like disease diagnosis or fraud detection where identifying minority events is crucial. Understanding the tradeoffs of different metrics is essential when optimizing model performance.
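The contrast can be made concrete with scikit-learn on a toy imbalanced example (illustrative scores, not real model output):

```python
from sklearn.metrics import average_precision_score, roc_auc_score

# Imbalanced toy data: the positive (minority) class is rare.
# One positive is ranked first, the other sits below several negatives.
y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.15, 0.3, 0.25, 0.45, 0.4, 0.35, 0.9, 0.28]

pr_auc  = average_precision_score(y_true, y_score)  # summarizes the PR curve
roc_auc = roc_auc_score(y_true, y_score)            # summarizes the ROC curve
print(f"PR AUC: {pr_auc:.3f}  ROC AUC: {roc_auc:.3f}")  # PR AUC is lower here
```

The many easily ranked negatives inflate ROC AUC, while PR AUC stays focused on how precisely the rare positives are retrieved; that gap is exactly why PR AUC is preferred for minority-event tasks.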
Arize AI and OpenAI are collaborating to help organizations build and deploy unstructured models, such as natural language processing (NLP) models, more efficiently. Unstructured data is a significant challenge for deep learning, requiring human labeling or annotation to group the data and find trends and insights. OpenAI provides AI building blocks like GPT-3, Codex, and DALL-E, which can generate text or code completion, while Arize is an ML observability platform that enables teams to log structured and unstructured data to detect and resolve model performance issues faster. By combining OpenAI's generative models with Arize's logging capabilities, organizations can monitor and troubleshoot their unstructured models in production, reducing costs and maximizing performance. The collaboration allows teams to proactively identify when their data is drifting and troubleshoot using interactive visualizations like UMAP.
A centralized machine learning (ML) team's purpose is to provide a unified and standardized experience for ML application development, freeing data scientists from creating new tools and processes from scratch. However, debates on ideal team structures are heating up, with some arguing that centralized ML teams are falling out of favor due to the emergence of MLOps and decentralized ML approaches. The author, who has built and scaled ML teams, argues that central ML can still be effective if done right, emphasizing the importance of a hybrid organizational structure, preemptive tooling development, and overcoming common challenges such as tooling lock, getting projects on the roadmap, and fence creation. Successful central ML teams share characteristics like having a centralized component and individual engineers on customer teams, building preemptively, and understanding customer pain points. Ultimately, the key to successful central ML lies in organizational culture, mindset, and how you build matters, with a focus on automation, standardization, and collaboration between subject matter experts.
Arize AI co-founder Aparna Dhinakaran and Monte Carlo CTO Lior Gavish discussed the evolving relationship between data and machine learning (ML) infrastructure. They highlighted key differences between data observability and ML observability, emphasizing that both are necessary for modern data practices. Observability goes beyond monitoring by enabling teams to understand why problems occur and how to resolve them. Building trust in data and ML requires investing in systems that help resolve issues faster. Treating data and ML as real-time products can lead to better value extraction. Service-Level Agreements (SLAs) and reliability benchmarks are becoming more commonplace in the world of data and ML, while troubleshooting will likely become easier over time.
This guide covers how to ingest embedding data and analyze embedding drift for a sentiment classification model using Hugging Face's open source libraries and the Arize platform. The process involves downloading and preprocessing data, training a model, extracting embedding vectors and predictions, logging inferences into the Arize Platform, and preparing data for sending to Arize. The guide also explains how to confirm data is ingested into Arize, track embedding drift, and visualize data using Uniform Manifold Approximation and Projection (UMAP) visualization. By following this guide, teams can monitor their models in production, detect potential performance degradation, and take corrective actions to improve the model's performance.
Arize AI, a machine learning observability platform, has secured $38 million in Series B funding led by TCV with participation from existing investors Battery Ventures, Foundation Capital, and Swift Ventures. The investment marks the largest-ever for an ML observability platform. Founded in 2020, Arize aims to make AI work effectively and fairly. As machine learning models become more complex and widespread, detecting and troubleshooting issues becomes harder. Arize's ML observability platform helps streamline performance monitoring, drift detection, data quality checks, and model validation. The company has also introduced new features such as embedding analysis, embedding drift monitoring, and bias tracing to address these challenges. Arize plans to expand its team across various departments and is currently recruiting for roles in engineering, product, marketing, and sales.
Artificial intelligence is revolutionizing healthcare with AI-focused startups raising over $12 billion last year. However, challenges persist in implementing AI at scale within the industry. Arize, an ML observability platform, recently received certifications from an independent auditor validating its health information security program's compliance with HIPAA Security Rule and the Health Information Technology for Economic and Clinical Health (HITECH) Act. These healthcare-specific certifications supplement Arize's broader SOC 2 Type II compliance. Arize is committed to safeguarding data, especially in healthcare, by focusing on auditability, prevention, and preparedness.
Sid Roy, Manager of Machine Learning Engineering at Devron, explains the concept of federated learning - a machine learning approach that enables training models on inaccessible data while preserving privacy. This technology is particularly useful for situations where companies want to access certain data but cannot due to privacy, regulatory, or jurisdictional reasons. Devron's platform allows data scientists to build, train, and evaluate machine learning models without ever having access to the data, making it a valuable tool in industries with strict privacy regulations. Roy also discusses the potential for federated learning applications in academia and how this technology can mitigate bias in models by improving the variety of data fed into them.
Arize has announced the next generation of machine learning (ML) monitoring to help teams scale their ML needs with increased automation, customizability, and flexibility. As AI technology evolves and matures across all industries, there is a growing need for more advanced ML monitoring systems that can accommodate various use cases and scale. Arize's new platform focuses on three major principles: automation with flexibility, programmatic monitoring access, and native alerting integrations. These features aim to make model monitoring automated and seamless, enabling users to identify and resolve issues faster.
Ray and Arize AI are two technologies that can help streamline the process of productionizing machine learning projects for scale and usability. Ray is an open-source distributed compute framework that enables users to run Python code in a parallel fashion across multiple machines, allowing them to focus on building their ML use case without getting sidetracked by managing distributed technologies. Arize AI is an ML observability platform that helps practitioners tackle issues such as model performance degradation, data drift, and data quality problems. Together, Ray and Arize can help teams scale the infrastructure around ML models while also improving team capabilities and allowing more time to be spent on building newer, better models for the business.
Tying model metrics to business KPIs upfront is paramount for ensuring alignment between ML and product teams. Investing all the way through the ML lifecycle is critical to ensuring AI ROI, as it requires planning for design aspects, human computer interaction, hypothesis development, and eventual retirement of models. Threading the needle with a centralized ML approach can be worth it, especially when blending product focus with broader technical breakthroughs. Assessing talent involves simulating real-world problems, such as giving candidates modeling tasks that tackle actual business needs, to understand how they think and approach challenges. By implementing these best practices, ML leaders can ensure a good foundation for future success in the rapidly evolving AI landscape.
Michael Stefferson, a Staff Machine Learning Engineer at Cerebral, discusses his transition from academia to industry and the key skills he developed along the way. He emphasizes the importance of understanding metrics, learning industry-specific systems, and being able to communicate effectively with both technical and non-technical stakeholders. Stefferson also shares insights on best practices for ensuring a model is ready for production, including establishing monitoring metrics and having contingency plans in place. Additionally, he highlights the challenges of working remotely and the importance of trust among team members.
Liftoff is a mobile app optimization platform for marketing and monetizing apps at scale. The company's main mission is to help mobile apps grow and monetize. Machine Learning Engineer Yunshi Zhao discusses her role in training models, deploying them in production, and monitoring their performance. She emphasizes the importance of scalability in data processing and highlights some challenges faced by Liftoff's system built for a specific use case. The company is currently investigating more standardized tools to improve flexibility and applicability across various ML applications. Zhao shares best practices for model experimentation, training, and deployment, as well as the importance of monitoring models in production and addressing feedback loops in ad tech space. She also discusses her involvement in Liftoff's diversity, equity, and inclusion (DEI) committee, focusing on representation in engineering.
Arize AI has hired Claire Longo as its new Customer Success Lead. Longo, a data scientist by training, has experience leading ML engineering and data science teams at Opendoor and Twilio. In her role at Arize, she will focus on ensuring customer success through hands-on support or educational resources. She believes that machine learning observability is crucial for maintaining the quality of models in production and preventing issues from reaching end users. Longo has also developed a library of metrics for recommendation systems to help measure personalization performance.
There are three pitfalls to avoid with embeddings. First, embeddings are not static: as new concepts emerge in the real world, a trained embedding can lose meaning over time, so ongoing monitoring is required. This involves establishing a point of comparison with the initially trained embedding and tracking metrics such as the average distance between cluster centroids. Second, proper versioning of embeddings avoids heartache when iterating on code. Third, graphing techniques provide an understanding of how well an embedding performs. Once in production, appropriate monitoring techniques are necessary to ensure the embedding delivers consistent value for customers.
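The centroid-comparison idea can be sketched as follows. Cluster count, data, and the notion of fitting clusters on the baseline are illustrative assumptions, not a prescribed recipe.

```python
import numpy as np
from sklearn.cluster import KMeans

def centroid_drift(baseline: np.ndarray, production: np.ndarray, k: int = 5) -> float:
    """Average shift of cluster centroids between baseline and production embeddings.

    Clusters are fit on the baseline; production points are assigned to the
    nearest baseline cluster and each cluster's centroid is recomputed.
    Returns the mean distance between original and recomputed centroids.
    """
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(baseline)
    labels = km.predict(production)
    shifts = []
    for c in range(k):
        members = production[labels == c]
        if len(members):
            shifts.append(np.linalg.norm(members.mean(axis=0) - km.cluster_centers_[c]))
    return float(np.mean(shifts))

rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1, (400, 16))
drifted = rng.normal(0.8, 1, (400, 16))   # embeddings that have moved
print(centroid_drift(baseline, baseline), centroid_drift(baseline, drifted))
```

Charting this score per model version over time gives an early signal that the embedding's structure is decaying.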
Arize AI's Senior Solutions Architect, Suresh Vadakath, brings over a decade of experience in consulting and technical client-facing roles from companies like Dataiku, DataRobot, and Alteryx. He focuses on presenting Arize's platform and ideating on ML observability examples and integrations into the customer's environment. Suresh emphasizes the importance of ML observability for risk management purposes and ensuring timely resolution of issues in high value use cases with growing prediction volumes and large model portfolios. He highlights that a good ML observability platform should provide insights for scenarios involving structured and unstructured data elements in a centralized place. Suresh has observed unique challenges facing financial services companies deploying models into production, such as skepticism from users, data appropriateness and preparation, and regulatory scrutiny on fairness and bias. He believes that communication is key when working with ML observability systems due to the constant gray areas involved.
Dimensionality reduction techniques are crucial in data science for visualization and for pre-processing in machine learning. Three popular techniques are SNE (Stochastic Neighbor Embedding), t-SNE (t-distributed Stochastic Neighbor Embedding), and UMAP (Uniform Manifold Approximation and Projection). These neighbor graph algorithms follow a similar process: first compute high-dimensional probabilities p, then low-dimensional probabilities q. A cost function C(p, q), typically a divergence such as Kullback-Leibler, measures the mismatch between the two sets of probabilities and is minimized to extract human-interpretable information from the embedding space.
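The p, q, C(p, q) pipeline can be sketched in deliberately simplified form: fixed-bandwidth Gaussian affinities with no perplexity calibration, on illustrative random data.

```python
import numpy as np

def pairwise_probs(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian-kernel affinities over pairwise distances, normalized into a
    probability distribution (perplexity calibration omitted for brevity)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    p = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(p, 0.0)   # a point is not its own neighbor
    return p / p.sum()

def kl_cost(p: np.ndarray, q: np.ndarray) -> float:
    """C(p, q) = KL(p || q), the style of cost t-SNE-like methods minimize."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
high_dim = rng.normal(size=(20, 50))   # points in the original space
low_dim = rng.normal(size=(20, 2))     # a candidate 2-D embedding
p = pairwise_probs(high_dim)
q = pairwise_probs(low_dim)
print(kl_cost(p, q))   # gradient descent on the low-dim points reduces this
```

Real implementations differ in how p and q are defined (t-SNE uses a Student-t kernel for q; UMAP uses a fuzzy-set cross-entropy), but the minimize-the-mismatch structure is the same.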
Justin Chen transitioned from academia to machine learning engineering after receiving his PhD in Physics from Rice University. He initially struggled to showcase his skills to industry, but shifted his focus to highlighting his mathematical methods and coding work. Chen emphasizes the ability to communicate about code with others as a key skill for success in machine learning engineering. He worked on various projects at Manifold AI, including end-to-end ML pipelines, and developed expertise in handling sensitive data in the healthcare space. Chen advises starting with basic models, focusing on relevant metrics, and identifying subject matter experts to narrow down feature sets. In his current role at Google, he focuses on speech recognition and audio processing, using techniques like NLP and explainability to address challenges. Chen stresses the importance of monitoring models regularly, having performance metrics, and keeping a human in the loop to detect bias and improve fairness. He also notes that working for a startup can provide excitement and flexibility, but may not offer the same level of resources or growth opportunities as a bigger company.
Khyati Sundaram, CEO and Chairperson of Applied, is on a mission to improve diversity in hiring by leveraging lessons from behavioral science and technology. The company's platform aims to provide unbiased hiring solutions, focusing on skills-based testing and decision intelligence systems. By removing noise from resumes and using machine learning to optimize the hiring funnel, Applied seeks to empower humans to make fairer decisions at scale. However, Sundaram acknowledges that biases can occur in various stages of the ML model lifecycle and emphasizes the need for full testing in real-world environments. To address this, Applied is working on optimizing human judgment and machine learning data, aiming to create a platform where everyone cares deeply about quality matches and knows what high ROI looks like. Ultimately, Sundaram's goal is to build a society-wide expression of inclusivity through her company's innovative solutions.
Arize:Observe Unstructured, the first summit dedicated to unstructured data initiatives, recently concluded with nearly 500 technical leaders and practitioners in attendance. The event highlighted four key takeaways: monitoring of unstructured data has arrived; what works in training may not work in production when deciding what to label next; the rise of single, unified models will change MLOps; and cutting-edge machine learning is becoming more accessible. Arize's support for embedding analysis and drift monitoring is now available as part of its free subscription tier, enabling teams to log models with both structured and unstructured data for monitoring purposes.
Malav Shah, a Data Scientist II at DIRECTV, has an interesting career journey in machine learning (ML). He initially studied information technology but developed an interest in AI during his undergraduate years. After completing his Master's degree in Computer Science with a specialization in ML from Georgia Tech, he joined AT&T and later moved to DIRECTV. At DIRECTV, Malav applies modern ML techniques to deliver innovative entertainment experiences. The company's ML organization is structured as a center of excellence responsible for solving problems and developing solutions for stakeholders while defining the infrastructure that these teams will use. Key areas of focus include content intelligence, recommendation engines, computer vision, natural language processing (NLP), and monitoring model performance in real-time to address concept drift issues. Malav advises new data scientists to focus on understanding the underlying data and business impact rather than obsessing over perfect metric scores right away. He also highlights the evolving MLOps and ML infrastructure space as an exciting era for machine learning innovation.
Arize has appointed Matt Wilson as its new Head of Sales, bringing a decade of experience working with large enterprises and establishing product-led growth motions to the role. Wilson joins Arize from Pendo, where he was an early sales hire and most recently served as RVP of Enterprise Sales, helping the company achieve over $100 million in revenue and a valuation of $2.6 billion last year. As Head of Sales at Arize, Wilson aims to accelerate and grow the closed pipeline, increasing revenue through acquisition, retention, and net renewals, while also hiring and growing a world-class sales organization. Wilson believes that machine learning observability is crucial for businesses, having seen firsthand the impact it can have on productivity and financial losses. He also emphasizes the importance of leading with positive intent, creating a culture of collaboration, and being open to feedback and ideas from his team. With experience scaling Pendo and now joining Arize, Wilson sees parallels between the two companies in terms of their focus on product-led growth and delivering value through their products.
Chris Murphy, Senior Director and Data Scientist at Homepoint, discusses how AI plays an important role in their mission of supporting successful homeownership. He shares his background in physics and transition into financial services, emphasizing the transferable skills in modeling techniques and approaching challenges. Homepoint's machine learning use cases span various areas of the business, including operations optimization, predicting refinancing or delinquency rates, outlier detection, text reading, optical character recognition (OCR), and infrastructure building. To ensure success when applying state-of-the-art ML techniques into established processes, Homepoint's data science team focuses on setting up the right processes from the beginning and maintaining constant communication with business partners and operations teams. They use a variety of explainability and bias tracing techniques to ensure fairness across the board.
The modern machine learning (ML) pipeline relies heavily on big data, with applications such as Mobileye's self-driving car efforts processing over 200 petabytes of data or tens of billions of inferences per day. Kafka is a widely used pub/sub framework that powers event-driven pipelines, offering benefits like asynchronous processing, scalability, and reliability. To monitor ML models, Kafka messages can be ingested into the Arize platform using a simple consumer that consumes micro-batches of events, deserializes them, batches them together, and publishes them to Arize for real-time observation. Arize is built to scale, providing easy ways to ingest data, including Kafka event streams, and unlocking ML performance tracing once ground truths are received, which enables improving model performance by understanding the why and how of models.
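The consume, deserialize, batch, publish pattern can be sketched independently of any broker. The JSON payload shape and the idea of publishing each batch are hypothetical stand-ins; a real pipeline would read from a Kafka consumer's poll loop and call the observability platform's SDK.

```python
import json
from typing import Iterable, Iterator

def micro_batches(messages: Iterable, batch_size: int) -> Iterator:
    """Deserialize a stream of raw messages and yield fixed-size micro-batches.

    In a real pipeline `messages` would come from a Kafka consumer and each
    yielded batch would be published downstream for observability.
    """
    batch = []
    for raw in messages:
        batch.append(json.loads(raw))   # hypothetical JSON event payloads
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                           # flush the final partial batch
        yield batch

# Simulated stream of five prediction events, batched in threes.
stream = (json.dumps({"prediction_id": i, "score": i / 10}).encode() for i in range(5))
batches = list(micro_batches(stream, batch_size=3))
print([len(b) for b in batches])   # [3, 2]
```

Batching this way amortizes network overhead per publish while keeping end-to-end latency low enough for near-real-time monitoring.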
Observability is crucial for modern software systems as they can be complex and prone to errors due to billions of lines of code and the integration of data and machine learning models, making it impossible to guarantee flawless performance. Automating observability allows for timely and actionable information about a system's performance, similar to NASA's mission control during the Apollo missions. Different components of modern software systems, including infrastructure, data, and machine learning models, require unique approaches to observability.
Stefano Goria, Co-Founder and CTO of Thymia, discusses the company's mission to improve mental health assessments through a combination of video games based on neuropsychology, facial microexpression analysis, and speech pattern examination. Founded in 2020 amidst the growing global mental health crisis exacerbated by COVID-19, Thymia aims to provide clinicians with objective measures for diagnosing mental health issues. Goria brings his expertise in theoretical physics and machine learning from previous roles at Citi and J.P. Morgan to develop AI systems that underpin Thymia's end-to-end solution. The company focuses on addressing the subjectivity of mental health care by improving assessment quality, particularly for symptoms relevant to depression diagnosis. Thymia collects data from three main sources: video, speech, and behavioral patterns during gameplay. These diverse data types push AI models to their limits, requiring a combination of techniques such as deep neural nets, feature engineering, unsupervised learning, and reinforcement learning. The company also emphasizes ethical use of technology, informed data usage, and transparency in communication with patients.
The company Arize has released a beta version of its embedding drift monitoring and analysis product, which is designed to help machine learning teams troubleshoot models and data that contain unstructured data. The product addresses key challenges such as lack of visibility into what's happening to the data when it's put into production, expensive model training, and difficulty in identifying new patterns emerging from unstructured data. With this release, teams can log models with both structured and unstructured data to Arize for monitoring, enabling them to proactively identify drift and troubleshoot issues using interactive visualizations. The product aims to provide actionable insights to help ML teams improve their models and data, and is designed to work with a wide range of deep learning models and architectures.
Unity Software recently revealed that it missed top line expectations due to issues related to machine learning models, causing an estimated impact of approximately $110 million in 2022. This highlights the growing need for companies to better manage AI risk from both organizational and technical perspectives. There are four steps enterprises can take to prevent common issues with ML before they materially impact revenue: (1) know what can go wrong, (2) implement ML observability, (3) invest in the right people, and (4) ensure ML teams are close to the businesses they serve. By adopting these best practices, companies can better manage AI risk and avoid potential pitfalls.
Embeddings are compact representations of high-dimensional data that help in explaining complex relationships between different inputs. They play a crucial role in machine learning, particularly in reducing the dimensionality of input features and facilitating collaboration across teams. Despite their potential to simplify data, embeddings can still be challenging to understand without additional techniques like UMAP.
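One way to reason about the relationships embeddings encode is to compare vectors directly, most commonly with cosine similarity. A minimal, dependency-free sketch (the example vectors are illustrative, not from any real model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors:
    1.0 means same direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real ones typically have hundreds of dims.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
airplane = [0.0, 0.2, 0.95]

print(cosine_similarity(cat, kitten))    # close to 1: similar concepts
print(cosine_similarity(cat, airplane))  # close to 0: unrelated concepts
```

Techniques like UMAP build on such pairwise distances, projecting the high-dimensional space down to two or three dimensions so clusters and outliers become visible.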
Arize, an ML observability platform, has introduced its Trust Center and Security Periodic Table as part of its commitment to robust security, compliance, and privacy. The company recently achieved SOC 2 Type II certification and is pursuing other major industry certifications such as HIPAA. The Arize Trust Center provides an interactive resource for customers and partners to understand the company's governance, policies, and security measures. Security at Arize is built on three pillars: auditability, prevention, and preparedness. These principles are inspired by sectors with effective risk management practices, such as the airline industry. The security periodic table offers an interactive overview of each element of security, along with compliance objectives and overlapping certifications and standards.
A founding engineer at Arize shares their journey and insights on what it takes to be effective in this role, which requires wearing multiple hats such as technical individual contributor, customer support, product management, project management, recruiting, and content marketing. The author, Manisha Sharma, joined Arize after having experience in frontend engineering and senior frontend engineering at Pandora and Slack, respectively, and found that the startup environment aligned closely with her engineering goals. She identifies seven essential elements of effective founding engineers, including humility to admit what they don't know, summoning grace amid chaos, taking ownership, thriving under ambiguity, knowing their strengths and weaknesses, making tradeoffs wisely, and trusting their fellow founders.
Elizabeth Hutton, a lead machine learning engineer on Cisco's Webex Contact Center AI team, leads the development of in-house AI solutions from research to production. She has three patents pending and works on natural language processing (NLP) tasks such as question-answering and summarization. Her work underpins customer experiences across billions of monthly calls. Hutton transitioned from academia to industry through Insight Data Science, a program that helped her prepare for interviews and understand the industry. She advises aspiring ML engineers to develop research experience, noting that an advanced degree can be helpful for research roles. In her day-to-day work, Hutton is responsible for data gatekeeping, model development, and software production. Her team uses tools like Snorkel to label data and Weights & Biases for experimentation. Hutton prioritizes understanding end requirements, such as latency and scale, when developing models for production. She emphasizes the importance of testing and evaluating models in the lab before deployment, using custom metrics and feedback collection to ensure model performance. Her team uses CheckList for unit testing language models and has a cloud-first platform that serves both cloud clients and on-premise users.
Jiazhen Zhu leads the end-to-end data team at Walmart Global Governance DSI, focusing on building a better platform through data-driven decisions and data-powered products. He oversees both data engineering and machine learning, giving him a unique vantage point into the interrelated worlds of DataOps, MLOps, and data science. Zhu emphasizes the importance of trust in AI models, suggesting that simpler models like linear regression are often preferable due to their easier explanations. He also highlights the significance of model explainability and monitoring for successful MLOps.
Arize AI has launched Bias Tracing, a tool designed to help enterprises identify and address algorithmic bias within machine learning models. The solution enables multidimensional comparisons, allowing teams to quickly uncover the features and cohorts contributing to potential biases without time-consuming SQL querying or troubleshooting workflows. Arize Bias Tracing helps data science and machine learning teams monitor and take action on model fairness metrics, ensuring that models do not perpetuate discrimination against marginalized groups. The tool is unique in its ability to provide multidimensional comparisons by default, enabling users to identify the feature-value combinations where parity across sensitive and base groups is most negatively impacting a model's overall fairness metrics.
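A common parity check of the kind such tooling automates is the disparate impact ratio: the positive-prediction rate of a sensitive cohort relative to a base cohort. The sketch below is a generic illustration of that metric, not Arize's implementation; the cohort keys and data are hypothetical.

```python
def positive_rate(predictions):
    """Share of positive (e.g. approved) predictions in a cohort."""
    return sum(predictions) / len(predictions)

def parity_ratio(sensitive_group, base_group):
    """Disparate impact: positive rate of the sensitive cohort relative to
    the base cohort. Values below ~0.8 (the 'four-fifths rule') are a
    common signal that a slice warrants investigation."""
    return positive_rate(sensitive_group) / positive_rate(base_group)

# Hypothetical cohorts sliced by a feature-value combination.
cohorts = {
    ("region=north", "age<25"): [1, 0, 0, 0],   # 25% approved
    ("region=south", "age<25"): [1, 1, 0, 0],   # 50% approved
}
ratio = parity_ratio(cohorts[("region=north", "age<25")],
                     cohorts[("region=south", "age<25")])
print(f"parity ratio: {ratio}")  # 0.5, below the 0.8 rule of thumb
```

Computing this ratio across every feature-value combination by hand is exactly the SQL-querying burden the article says multidimensional comparison tools remove.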
In this article, Tsion Behailu shares her experience as a Founding Engineer at Arize after leaving Google. She discusses her journey from exhaustion during her undergraduate years to finding stability and growth at Google. However, she felt the need for a new experience and began soul-searching about what she truly wanted in her career and life. She considered various opportunities such as building an engineering stack from scratch, specializing in a specific technological area, or starting her own startup. Ultimately, she joined Arize after evaluating its potential for growth and the opportunity to work with a trusted colleague. Two years later, she is grateful for the decision that has made her a better engineer and leader.
Thomas Huang, a software engineer at LinkedIn, joined the company to work on machine learning (ML) infrastructure, which he believes is a crucial aspect of ML engineering teams' operations. He chose this role after realizing that his previous experience as a machine learning scientist was more aligned with data engineering and software engineering than actual machine learning work. Huang's new role involves working on LinkedIn's feature store, Feathr, which was recently open-sourced. The feature store allows for offline, online, and nearline operations, making it a comprehensive project. LinkedIn uses Feathr to improve its machine learning models, including those behind abuse detection, ads, and the People You May Know feature. Huang views the ML engineering role as evolving over time, with the lines blurring between data science, software engineering, and research. He believes that startups use the machine learning engineer title loosely and that roles can fall outside specific domains. In his previous role at Alectio, Huang worked on active learning as a service, which he found challenging due to its reliance on flawed premises. He advises students and others hoping to get into an ML engineering or ML platform role to be patient, consider alternative positions, and stay intellectually stimulated during the job search process.
The industry's largest event on machine learning observability, Arize:Observe, recently took place, featuring multiple tracks and talks from prominent companies such as Etsy, Kaggle, Opendoor, Spotify, and Uber, among others. The event highlighted key takeaways including the announcement of Arize's ML observability platform now available on a self-serve basis, including a free version. Several speakers emphasized the importance of scaling an ML practice by focusing on customer problems rather than just building a platform for its own sake. Machine learning infrastructure is complex and requires consideration beyond algorithms, with dependencies on data layers and surrounding systems. Diversity is crucial in ML teams to improve accuracy, development, and retention, while AI ethics needs to be woven into the fabric of organizations from top to bottom. The industry is maturing, with tooling like Arize helping deploy models with confidence. The future holds promise for multimodal machine learning, including aligning different UMAP models and exploring new techniques such as the Gromov-Wasserstein distance. Overall, it's an exciting time for the AI industry, with global investment expected to reach $200 billion by 2023, and ML teams needing to invest in best practices and foundational investments in ML platforms to navigate the challenges of model issues impacting business results.
The text introduces Arize's ML Performance Tracing and highlights its benefits in enabling ML performance monitoring. It discusses how monitoring alone is not enough to resolve issues, and the need for full-stack observability with ML performance tracing. This helps detect and address problems before they significantly impact the company. The text also refers to a previous series on the evolution of ML troubleshooting, transitioning from no monitoring to monitoring, and now focusing on full stack ML observability.
Doris Lee, CEO and co-founder of Ponder, has recently secured $7 million in seed funding led by Lightspeed Venture Partners. The company aims to improve the usability and scalability of data science tools at scale. Ponder's founding team has made significant contributions to the open source community, including developing Lux, a visualization tool that automatically finds and displays insights from Pandas DataFrames. Another key project is Modin, which offers a more scalable version of Pandas without requiring users to change their code. The company focuses on making data science tools more accessible for professionals who are already familiar with Pandas, helping them scale up their analysis without needing to learn new frameworks or platforms.
Model bias, a systematic error from erroneous assumptions in machine learning algorithms, is a significant concern for AI developers and organizations using ML technology. It can lead to poor customer experience, profitability loss, or even fatal misdiagnoses if not addressed. To prevent biases at various stages of the machine learning pipeline, it's crucial to identify, assess, and address potential biases that may impact outcomes. Techniques for detecting and avoiding biases include data collection, pre-processing, feature engineering, data split/selection, model training, and model validation. By implementing best practices and using relevant examples at each stage of the pipeline, machine learning practitioners can reduce bias in their models and ensure more accurate predictions.
Flávio Clésio is a data and machine learning (ML) engineer based in Berlin, working at Artsy where he deploys and maintains models that recommend artwork to users. He has been involved with ML operations (MLOps) for nearly a decade and currently works on models incorporating art-specific inputs such as period of creation, region, style, medium, and category. Clésio emphasizes the importance of balancing popularity effects in recommendation systems and measuring their impact on revenue. He also highlights the challenges faced when putting ML models into production, including data drift, privacy concerns, and regulatory issues. Clésio believes that MLOps is a response to the need for cross-functionality among data scientists, data analysts, and analytics engineers, and encourages those starting out in the industry to study software engineering practices.
ML troubleshooting is too hard today, but it doesn't have to be that way. The stakes for model performance are higher than ever as teams deploy more models into production, and mistakes are costlier. A modern approach to ML troubleshooting is needed, shifting from no monitoring to full stack ML observability. Monitoring, at its core, requires data on system performance, which must be made storable, accessible, and displayable. To monitor model performance, one must begin with a prediction and an actual, comparing them using the right metric. The correct metric depends on the use case, such as recall, false negative rate, or mean absolute percentage error for fraud models, and mean squared error for demand forecasting models. Establishing thresholds is crucial to determine when a good accuracy rate has become bad enough. Machine learning practitioners must rely on relative metrics and establish a baseline performance to define what is considered good enough. Monitoring alone is not enough, as it's essential to have a modern approach to assessing and troubleshooting model performance, including full stack ML observability with ML performance tracing.
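The metrics and threshold logic described above can be sketched in a few lines. This is a generic illustration under simple assumptions (binary-classification counts and a baseline-derived threshold), not any particular platform's implementation:

```python
def recall(tp, fn):
    """Share of actual positives the model caught (true positive rate)."""
    return tp / (tp + fn)

def false_negative_rate(tp, fn):
    """Share of actual positives the model missed; complements recall."""
    return fn / (tp + fn)

def mape(actuals, predictions):
    """Mean absolute percentage error, common for demand forecasting."""
    return sum(abs(a - p) / abs(a)
               for a, p in zip(actuals, predictions)) / len(actuals)

def breaches(value, threshold, higher_is_better=True):
    """Alert when a metric crosses its baseline-derived threshold."""
    return value < threshold if higher_is_better else value > threshold

# Hypothetical fraud model: baseline recall was 0.90, alert below 0.85.
current_recall = recall(tp=80, fn=20)          # 0.80 this week
print(breaches(current_recall, 0.85))          # True -> fire an alert
```

The point of the relative baseline is that 0.80 recall is not inherently bad; it is bad relative to the 0.90 this model achieved at deployment.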
As Director of Engineering and Data Science at Shopify, Wendy Foster leads the development and deployment of sophisticated AI systems that empower millions of merchants worldwide to market and grow their retail businesses. Foster's background in game development and humanities informs her perspective on AI ethics, emphasizing the importance of understanding the impact of technology on users' lives. She prioritizes collaboration with business counterparts to ensure governance goals, responsible AI, and AI risk management. Foster also highlights the need for observability over explainability, arguing that accountability drives operational excellence. Her focus on representation is critical, recognizing that diverse makers and datasets are essential for building a world that technology serves. Ultimately, Foster's work is driven by her passion for empowering entrepreneurs and small businesses to thrive through AI-powered solutions.
Arize AI's recent survey of 945 data scientists, ML engineers, technical executives, and others highlights key challenges faced by MLOps teams. Troubleshooting model issues remains a significant problem for many, with 84.3% of respondents reporting delays in detecting and diagnosing problems at least some of the time. Additionally, communication between ML teams and business executives is often hindered, with over half of data scientists and ML engineers encountering issues with quantifying ROI or explaining machine learning concepts to stakeholders. While explainability remains important, it should not be relied upon solely; instead, a proactive approach to model performance management is recommended.
In the context of machine learning observability, "drift" refers to changes in the statistical properties of data or models over time. Prediction drift occurs when the distribution of a model's outputs changes without any modification to the underlying model itself. Concept drift is characterized by shifts in the statistical properties of the target variable, while data drift involves changes in the independent variables and their correlations. Upstream drift results from alterations in the data pipeline that can lead to missing values or changes in feature cardinality. Monitoring and diagnosing these various forms of drift are crucial for maintaining optimal model performance and mitigating future performance degradation. Arize is an ML observability platform designed to help teams manage model performance, monitor drift, and troubleshoot issues in production environments.
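A standard statistic for quantifying these distribution shifts is the Population Stability Index (PSI), which compares binned proportions of a baseline sample against a production sample. A minimal sketch, assuming equal-width bins and a small smoothing constant to keep the logarithm defined (common rules of thumb treat PSI above roughly 0.25 as significant drift):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline ('expected') sample
    and a production ('actual') sample of a single feature or score."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Smooth empty buckets so log() stays finite.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]         # training distribution
production = [v + 0.5 for v in baseline]          # shifted in production
print(psi(baseline, production))                  # large: drift detected
```

An identical distribution yields a PSI of zero; the further the production sample moves from the baseline, the larger the index grows.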
Stefan Kalb, CEO of Shelf Engine, discusses his company's mission to eliminate food waste and revolutionize the grocery business through AI-driven technology. Founded in 2016, Shelf Engine has diverted over 4.5 million pounds of food waste from landfills and helped clients achieve an average gross margin dollar expansion of more than 15%. The company's unique Results-as-a-Service (RaaS) model directly takes on inventory risk and guarantees outcomes for its customers, including major retailers like Kroger, Target, and Whole Foods. Kalb highlights the persistent problem of product availability, quality, and consistency in grocers due to supply chain disruptions and emphasizes Shelf Engine's ability to address these issues through AI-driven order automation. The company also tackles shrink discrepancies, which occur more than 75% of the time for large and small grocers, by providing accurate waste data that helps retailers understand their true shrink problems. In response to COVID-19's impact on businesses, Shelf Engine assists customers in adapting by relieving pressure on labor through AI-driven ordering systems. The company's success is attributed to its highly capable AI team and the unique advantage of capturing data from stores and vendors that helps inform accurate predictions for each store's performance. Lastly, Shelf Engine uses Arize AI for model monitoring and ML observability, which provides valuable insights into error metrics, lineage, versioning, and proactive detection of model drift. The company plans to scale its technology further in the coming year to work with more retailers and help them capture growth opportunities.
Aman Khan, Arize's newest product manager, brings experience in pioneering ML infrastructure and tooling from his roles at Cruise and Spotify. He will help drive product development of Arize's rapidly-growing ML observability platform, partnering closely with the marketing team to ensure clear customer communication and streamlining sales and customer success processes. With a background in mechanical engineering and a passion for building and coding, Aman has transitioned into an ML/PM role, advising companies on the build-versus-buy calculus for model monitoring and observability, highlighting the importance of load testing and scalability in real-time applications. He offers advice to those starting their careers, emphasizing the need to play to one's strengths and finding a supportive organization with leaders who enable growth.
AUC (Area Under the Receiver Operating Characteristic Curve) is a widely used metric in machine learning that measures the degree of separation between positive and negative classes in a dataset, calculated as the area under a staircase-like curve generated by varying threshold values for prediction scores. It's useful across various use-cases, particularly when models output scores, providing a single-number heuristic of how well a model can differentiate data points with true positive labels from those with true negative labels. AUC ranges from 0 to 1, where 1 indicates perfect separation and 0.5 suggests no separation, and it's often used in data science competitions and when accuracy is insufficient. However, AUC may not be the best metric for all problems, especially those involving probabilities or business outcomes, as it doesn't account for calibrated predicted probabilities or false positive rates. Ultimately, understanding the tradeoffs of using AUC and other model metrics is crucial for selecting the right metric to evaluate a model's performance in a specific context.
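The "degree of separation" interpretation has an exact combinatorial form: AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (the Mann-Whitney U formulation). A small pure-Python sketch of that equivalence, fine for illustration though quadratic in cost on large datasets:

```python
def auc(labels, scores):
    """AUC as the probability a random positive outscores a random
    negative; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 3 of the 4 positive/negative pairs are ranked correctly -> AUC = 0.75.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

This pairwise view also makes the article's caveat concrete: AUC only cares about ranking order, so two models with identical AUC can produce very differently calibrated probabilities and very different business outcomes at a fixed decision threshold.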
To ensure the long-term success and sustainability of AI initiatives, companies should focus on five key areas: diverse teams and representative datasets; ethical and risk governance frameworks for AI; modernized data policies granting access to protected data where needed; monitoring and troubleshooting ML model performance in real-world scenarios; and growing internal visibility, opening the black box, and quantifying AI ROI. By addressing these areas, companies can balance the power and potential peril of AI, maximizing positive outcomes for customers and society at large.
Remi Cattiau is the Chief Information Security Officer (CISO) at Arize AI, a company currently hiring over ten positions. With nearly two decades of experience in cloud security for large enterprises, Remi is responsible for ensuring high standards for Arize's security posture and safeguarding customer data. His career journey includes working with open source projects, leadership roles at startups, and consulting for companies on building out their cloud security. Remi believes that machine learning is trending within the security industry and highlights Arize's ability to help teams visualize and address problems in model monitoring and observability.
America First Credit Union is leveraging machine learning (ML) observability to stay ahead in a competitive market by prioritizing speed and model monitoring. The credit union's Data Science Manager, Richard Woolston, emphasizes the importance of identifying proxy metrics such as drift, delinquency, and fair lending regulations to ensure portfolio health and mitigate bias. Arize AI's platform helps America First Credit Union troubleshoot issues and make data-driven decisions by providing automated alerts, feature slicing, and collaboration with product teams. As ML models become increasingly prevalent in lending, Woolston envisions a future where access to credit becomes easier for traditionally excluded groups and regulations become more streamlined through self-documenting systems.
Customer lifetime value (LTV) is a crucial metric to evaluate a company's overall sales motion, especially in non-contractual sectors like consumer packaged goods or retail. LTV models predict future purchasing behavior and help increase profitability by identifying valuable customers. These models use machine learning algorithms to analyze historical purchase and engagement patterns in order to predict each customer's future value. Monitoring and observability are essential for LTV models as they must iterate and quickly estimate long-term value with delayed or no ground truth data. ML observability platforms should set up baseline monitors, evaluate feature, model, and actual/ground truth drift, and measure model performance to improve overall business outcomes.
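For intuition about what an LTV model estimates, a classic back-of-the-envelope heuristic multiplies yearly revenue per customer by an expected lifespan derived from retention. This is a simplification for illustration only; production LTV models use ML over per-customer histories, as described above, rather than a single formula:

```python
def simple_ltv(avg_order_value, purchases_per_year, annual_retention_rate):
    """Heuristic LTV: yearly revenue per customer times expected lifespan.
    Under a constant annual retention rate r, the expected customer
    lifespan is 1 / (1 - r) years (a geometric series)."""
    expected_lifespan_years = 1 / (1 - annual_retention_rate)
    return avg_order_value * purchases_per_year * expected_lifespan_years

# Hypothetical retail customer: $50 orders, 4 per year, 75% retention.
print(simple_ltv(50.0, 4, 0.75))  # 50 * 4 * 4 = 800.0
```

The heuristic also shows why ground truth is so delayed for these models: the "actual" lifetime value of a customer retained at 75% per year only resolves over several years, which is exactly why proxy metrics and drift monitoring matter in the interim.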