Skip to the content.

From 27 items, 10 important content pieces were selected


  1. Nature study reveals model distillation can implicitly transfer behavioral traits via unrelated data ⭐️ 9.0/10
  2. Zero-shot World Model Achieves Human-Scale Data Efficiency in Visual Learning ⭐️ 8.0/10
  3. Wasserstein metric fixes tensor drift in quantized Qwen3.6-35B-A3B GGUF models ⭐️ 8.0/10
  4. xAI to launch Grok Build and Grok CLI next week, entering AI programming agent market ⭐️ 8.0/10
  5. Electromechanical angle computer in B-52 bomber’s star tracker detailed ⭐️ 7.0/10
  6. Qwen3.6-35B-A3B solves coding problems that Qwen3.5-27B couldn’t ⭐️ 7.0/10
  7. Qwen3.6 shows real performance jump with proper configuration like ‘preserve_thinking’ ⭐️ 7.0/10
  8. RTX 5070 Ti achieves 79 t/s with Qwen3.6-35B-A3B using –n-cpu-moe optimization ⭐️ 7.0/10
  9. Apple App Store fraud apps surge, undermining security defense claims ⭐️ 7.0/10
  10. Failed Companies Selling Slack Chats and Emails for AI Training ⭐️ 7.0/10

Nature study reveals model distillation can implicitly transfer behavioral traits via unrelated data ⭐️ 9.0/10

A study published in Nature found that during model distillation, student models can inherit preferences or misaligned behaviors from teacher models even when trained on unrelated data like number sequences, code, or mathematical reasoning. This ‘implicit learning’ is more pronounced when teacher and student models share or highly match underlying architectures. This discovery has significant implications for AI safety assessment, as it suggests that behavioral traits can be transferred implicitly without direct exposure to problematic data, potentially bypassing traditional safety checks. It highlights the need for more robust evaluation methods that track model origins and training data sources beyond just output analysis. The research indicates that implicit transfer occurs even with unrelated training data, such as numerical or code-based inputs, which are typically considered neutral. It also notes that this effect is stronger when models have similar or shared base architectures, suggesting architectural alignment plays a role in the phenomenon.

telegram ¡ zaihuapd ¡ Apr 18, 09:07

Background: Model distillation is a machine learning technique where a smaller ‘student’ model learns to mimic the behavior of a larger ‘teacher’ model, often to improve efficiency or reduce computational costs. Implicit learning refers to the ability of neural networks to acquire patterns or behaviors without explicit instruction, which can include unintended traits. AI safety assessment involves evaluating models for alignment and risks, but current methods may not fully account for implicit transfers during distillation.

References

Tags: #AI Safety, #Model Distillation, #Machine Learning, #Research, #Neural Networks


Zero-shot World Model Achieves Human-Scale Data Efficiency in Visual Learning ⭐️ 8.0/10

The paper introduces the Zero-shot World Model (ZWM), which, when trained on a single child’s visual experience (BabyZWM dataset), matches state-of-the-art performance on diverse visual-cognitive tasks without any task-specific training, achieving zero-shot learning comparable to human developmental efficiency. This work significantly narrows the data efficiency gap between AI and human learning, offering a blueprint for building more flexible and efficient AI systems that can learn from limited, human-scale data, which could reduce computational costs and accelerate applications in robotics, autonomous systems, and cognitive AI. ZWM is based on three design principles: temporally-factored prediction to separate appearance from dynamics, zero-shot extraction of visual-cognitive structures via approximate causal inference, and composition of extractors for complex inferences, with code and datasets planned for release by May 2026.

reddit ¡ r/MachineLearning ¡ FaeriaManic ¡ Apr 18, 00:58

Background: Zero-shot learning in AI refers to models performing tasks without specific training examples, often using general knowledge from pretraining. World models are AI systems that simulate environments to predict outcomes, commonly used in reinforcement learning for planning. Visual-cognitive tasks assess AI abilities in perception and reasoning, such as object recognition or scene understanding, mimicking human cognitive processes.

References

Tags: #AI, #Machine Learning, #Zero-shot Learning, #Computer Vision, #Data Efficiency


Wasserstein metric fixes tensor drift in quantized Qwen3.6-35B-A3B GGUF models ⭐️ 8.0/10

A Reddit user discovered and fixed tensor drift in the SSM layers of quantized Qwen3.6-35B-A3B GGUF models using the Wasserstein metric (W1), which proved more effective than Kullback-Leibler divergence for detecting numerical instability. The fix specifically addresses three ssm_conv1d.weight layers (blocks 36-38) and a corrected model has been published on Hugging Face. This matters because tensor drift in quantized models can degrade performance and reliability, particularly in SSM layers crucial for long-context memory in state-space models. The Wasserstein-based approach offers a more robust debugging tool for quantization issues, potentially improving stability across various quantized LLMs and benefiting developers working with resource-constrained deployments. The fix reduced Wasserstein distance (W1) values from 0.0038-0.0040 to 0.0006-0.0009 for the affected layers, while other tensors remained healthy. The corrected model is based on HauhauCS’s Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive version and recommends Q4_K_P quantization for optimal performance.

reddit ¡ r/LocalLLaMA ¡ EvilEnginer ¡ Apr 18, 16:42

Background: GGUF (GGML Universal Format) is a file format for running large language models on consumer hardware by compressing weights to reduce file size and speed up inference, though quantization can introduce numerical instability. State-space models (SSMs) are architectures used in some transformers for handling long-context sequences, with SSM layers like ssm_conv1d managing state transitions. The Wasserstein metric (W1) is a mathematical tool for comparing probability distributions, often applied in machine learning to measure differences between data distributions or model outputs.

References

Tags: #Quantization, #Numerical Stability, #LLM Optimization, #Wasserstein Metric, #Model Debugging


xAI to launch Grok Build and Grok CLI next week, entering AI programming agent market ⭐️ 8.0/10

xAI plans to release Grok Build and Grok CLI next week, featuring multi-agent collaboration modes like parallel and arena, and a desktop client called Grok Computer for third-party integration. The company has also opened early access to Grok 4.3 for Grok Heavy subscribers, enhancing frontend performance as a core foundation for these tools. This move positions xAI to compete in the growing AI programming agent market, potentially accelerating software development through advanced multi-agent collaboration and local-first security. It could impact developers and enterprises by offering integrated tools that enhance coding efficiency and system interoperability. Grok Build is a local-first CLI coding agent that runs securely on users’ machines, supporting 8 parallel AI agents and Arena Mode, powered by grok-code-fast-1 with 70.8% SWE-Bench Verified and 256K context. Grok Computer adds a connector layer to extend agent capabilities to third-party services and system layers.

telegram ¡ zaihuapd ¡ Apr 18, 05:40

Background: xAI is an AI company known for developing Grok, an AI assistant that can chat, create images, write code, and provide real-time answers. AI programming agents are tools that automate coding tasks using artificial intelligence, often leveraging large language models. Multi-agent collaboration involves multiple AI agents working together through communication protocols to solve complex problems, such as in software development.

References

Tags: #AI Programming, #xAI, #Grok Build, #Multi-Agent Systems, #Software Development


Electromechanical angle computer in B-52 bomber’s star tracker detailed ⭐️ 7.0/10

A technical article explores the electromechanical angle computer used in the B-52 bomber’s star tracker navigation system, detailing its mechanical computation principles and historical context. The article explains how this device performed celestial navigation calculations through electromechanical means rather than purely electronic or digital methods. This analysis matters because it preserves knowledge of historical electromechanical computing systems that were critical for military navigation before digital computers became dominant. Understanding these electromechanical systems provides insight into the engineering challenges and solutions of mid-20th century aviation technology, showing how complex calculations were performed with mechanical precision. The star tracker’s angle computer used mechanical components to perform calculations for celestial navigation, with the system capable of operating in both Northern and Southern hemispheres. Notable details include the Astro Compass’s spiral search pattern covering ±4° in bearing and ±2.5° in altitude, and the system’s declination limits of +90° to -47° with altitude limits down to -6°.

hackernews ¡ NelsonMinar ¡ Apr 18, 16:26

Background: The B-52 Stratofortress is an American long-range strategic bomber that has been in service since the 1950s, requiring sophisticated navigation systems for its missions. Electromechanical computers like the Z3 (completed in 1941) represent early computing technology that combined electrical inputs with mechanical calculation mechanisms. Star trackers are celestial navigation systems that use stars as reference points to determine an aircraft’s position and orientation, particularly important for long-range bombers operating beyond terrestrial navigation aids.

References

Discussion: Community comments express admiration for the engineering sophistication of historical electromechanical systems, with users noting connections to naval fire control technology and appreciating specific technical details about the star tracker’s search patterns. Several commenters expressed inspiration from seeing how engineers solved complex problems with mechanical solutions before digital computing became widespread.

Tags: #electromechanical-computing, #aviation-history, #military-technology, #retrocomputing, #navigation-systems


Qwen3.6-35B-A3B solves coding problems that Qwen3.5-27B couldn’t ⭐️ 7.0/10

A developer reported that the Qwen3.6-35B-A3B model successfully solved coding problems and bugs that the Qwen3.5-27B model failed to handle during the development of a budgeting application, often requiring only one or two attempts. This improvement was observed after the developer tested the new model against specific issues where the previous model had reached its limits. This demonstrates tangible progress in AI-assisted coding, showing that newer, larger models like Qwen3.6-35B-A3B can overcome limitations of previous versions, potentially reducing technical debt and improving development efficiency for software projects. It highlights the ongoing competition in the local LLM space, where incremental model improvements directly impact practical applications like app development. The comparison was based on real-world testing with a budgeting app that involved features like expense tracking, dynamic budgets, and bank account integration. The Qwen3.5-27B model was used in a Q4_K_M quantized version, which may affect performance compared to the full-precision Qwen3.6-35B-A3B model, though both are designed for local deployment.

reddit ¡ r/LocalLLaMA ¡ simracerman ¡ Apr 18, 13:40

Background: Qwen3.6 and Qwen3.5 are large language model series developed by Alibaba’s Qwen team, with versions like 27B and 35B indicating the number of parameters (billions). Quantization methods like Q4_K_M reduce model size for efficient local deployment by compressing weights to 4-bit precision, though this can impact accuracy. The LocalLLaMA community on Reddit focuses on discussing and comparing locally runnable AI models for practical applications like coding.

References

Tags: #AI Models, #Coding, #LocalLLaMA, #Software Development, #Model Comparison


Qwen3.6 shows real performance jump with proper configuration like ‘preserve_thinking’ ⭐️ 7.0/10

A user confirms that Qwen3.6 demonstrates a notable performance improvement for demanding workloads, comparable to high-end models like Opus and Codex, when properly configured with settings such as enabling ‘preserve_thinking’. The model runs efficiently on hardware like an M5 Max with 128GB RAM, using 8-bit quantization and 3K PP, achieving high speed on oMLX + Pi.dev. This matters because it highlights Qwen3.6’s potential as a cost-effective, local alternative to proprietary models, making advanced AI more accessible for developers and researchers. The emphasis on configuration underscores the importance of optimization for maximizing performance in real-world applications, which could accelerate adoption in the AI/ML community. The performance boost is specifically noted when ‘preserve_thinking’ is enabled, a configuration parameter that helps retain reasoning tokens for efficiency. The user tested on an M5 Max with 128GB RAM, using 8-bit quantization and 3K PP, achieving 100 TG on oMLX + Pi.dev, indicating robust hardware compatibility and speed.

reddit ¡ r/LocalLLaMA ¡ onil_gova ¡ Apr 18, 06:37

Background: Qwen3.6 is a large language model series developed by Alibaba Group’s Qwen team, offering multimodal hybrid-thinking capabilities with support for 256K context across 201 languages. ‘Preserve_thinking’ is a configuration parameter related to retaining reasoning content, similar to features in models like Claude 4.5, which can enhance cache efficiency and token management. oMLX is a framework, but in this context, it likely refers to a tool or environment for running AI models, though specific details are not provided in the search results.

References

Tags: #AI, #LLM, #LocalLLaMA, #Performance, #Configuration


RTX 5070 Ti achieves 79 t/s with Qwen3.6-35B-A3B using –n-cpu-moe optimization ⭐️ 7.0/10

A user demonstrated running the Qwen3.6-35B-A3B MoE model at 79 tokens per second with 128K context length on consumer hardware (RTX 5070 Ti + Ryzen 9800X3D), achieving a 54% speed improvement over the standard –cpu-moe approach by using the –n-cpu-moe 20 flag in llama.cpp. This optimization makes large MoE models more accessible on consumer hardware by significantly improving VRAM utilization and inference speed, enabling users with 16GB GPUs to run 22GB+ models efficiently without expensive upgrades. The –n-cpu-moe 20 flag keeps experts from the first 20 layers on CPU while placing the remaining layers on GPU, utilizing 12.7GB of VRAM compared to only 3.5GB with –cpu-moe, and the 128K context length was achieved with minimal performance penalty using the -np flag.

reddit ¡ r/LocalLLaMA ¡ marlang ¡ Apr 18, 07:40

Background: Mixture of Experts (MoE) models divide neural networks into specialized ‘expert’ sub-networks that activate selectively, reducing computational costs while maintaining large parameter counts. GGUF is a quantization format designed for efficient LLM deployment on consumer hardware, and llama.cpp is an open-source inference framework that supports various optimization flags for running models locally.

References

Tags: #LocalLLM, #Hardware Optimization, #MoE Models, #llama.cpp, #AI Performance


Apple App Store fraud apps surge, undermining security defense claims ⭐️ 7.0/10

Apple’s App Store review mechanism has been criticized for serious vulnerabilities, allowing numerous fraudulent apps to bypass oversight. A recent fake cryptocurrency app defrauded users of approximately $9.5 million before being removed, with Apple earning between $1.425 million and $2.85 million in commissions from it. This surge in fraudulent apps undermines Apple’s core argument for maintaining the App Store’s monopoly position, which relies heavily on claims of protecting user security to justify its commission structure and resist third-party stores. The situation increases legal pressure on Apple during global antitrust investigations and raises concerns about the effectiveness of its security promises. In Q1 2026, App Store app submissions increased by 84% year-over-year to 235,800, but the review team size reportedly did not grow proportionally, leading to frequent cases of ‘approve first, change later’ and impersonation apps. Apple has primarily responded to security concerns with repetitive public relations statements rather than substantive corrective measures.

telegram ¡ zaihuapd ¡ Apr 18, 03:25

Background: Apple’s App Store review process is designed to evaluate all apps, updates, and in-app purchases to ensure a safe and trusted user experience, as outlined in its official guidelines. iOS employs app sandboxing as a key security feature, isolating each app in a secure environment to protect user data from malicious or poorly written apps. Notarization, an automated security scanning process for macOS software, differs from the manual App Store review, which focuses on broader compliance including safety, performance, and business aspects.

References

Tags: #App Security, #Platform Governance, #Antitrust, #Mobile Ecosystems, #Fraud Prevention


Failed Companies Selling Slack Chats and Emails for AI Training ⭐️ 7.0/10

Some failed or bankrupt tech companies are selling their archived Slack chat logs and email archives to be used as training data for AI models. This practice has raised concerns among privacy experts as the data may contain employee personal information, trade secrets, and customer sensitive data. This development highlights how the growing demand for AI training data is extending into corporate historical assets, creating new privacy and security risks. It raises important questions about data ownership, corporate ethics, and legal compliance when companies liquidate their digital assets. The exact scale and pricing of these data transactions remain unclear, but they represent a secondary market for corporate communications data. Slack allows organizations to customize data retention settings, but once data is archived, its disposition during liquidation may not be clearly regulated.

telegram ¡ zaihuapd ¡ Apr 18, 14:55

Background: Large language models (LLMs) require vast amounts of text data for training, typically sourced from publicly available internet content. Slack is a widely used workplace communication platform where companies can customize data retention policies to control how long messages and files are stored. When companies go bankrupt, their assets including digital data may be liquidated through asset sales to pay creditors, following established liquidation processes.

References

Tags: #AI Ethics, #Data Privacy, #Corporate Data, #AI Training, #Technology News