Statistical arbitrage is a prevalent trading strategy that takes advantage of the mean-reverting property of the spread between paired stocks. Studies on this strategy often rely heavily on model assumptions. In...
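To make the mean-reversion idea in the excerpt above concrete, here is a minimal sketch of a spread z-score for one pair. It is not the method of the linked paper: the function name `spread_zscore`, the fixed `hedge_ratio`, and the use of a plain sample mean and standard deviation are all assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def spread_zscore(prices_a: Sequence[float],
                  prices_b: Sequence[float],
                  hedge_ratio: float = 1.0) -> float:
    """Return the z-score of the latest spread between two price series.

    The spread is defined here as a - hedge_ratio * b (an assumption;
    real studies usually estimate the hedge ratio from data).
    """
    if len(prices_a) != len(prices_b):
        raise ValueError("price series must have the same length")
    if len(prices_a) < 2:
        raise ValueError("need at least two observations")

    spread = [a - hedge_ratio * b for a, b in zip(prices_a, prices_b)]
    mu = mean(spread)
    sigma = stdev(spread)  # sample standard deviation
    if sigma == 0:
        raise ValueError("spread has zero variance")

    # A large positive value means the spread sits well above its mean,
    # which a mean-reversion trader would read as a signal to short it.
    return (spread[-1] - mu) / sigma
```

A large absolute z-score is only a signal under the assumption that the spread really does revert, which is exactly the kind of model assumption the excerpt says such studies depend on.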
We are very excited to finally share a bit more about what we are building: a full-stack hardware platform to harness the natural fluctuations of matter as a computational resource for Generative AI.
Advanced Strategy to Account for Correlations, Risk, and Returns in your Portfolio Leveraging Hierarchical Structures
What are ML artifacts?
Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention model on a large time-series corpus, and can work well across different forecasting history lengths, prediction lengths and temporal granularities.
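As a rough illustration of the "patched" part of the excerpt above, the sketch below splits a history into fixed-length, non-overlapping patches, which a decoder-style model could then treat as its input tokens. This is not the authors' code: the function name `make_patches`, the choice to drop the oldest leftover values, and the example patch length are assumptions.

```python
from typing import List, Sequence


def make_patches(series: Sequence[float], patch_len: int) -> List[List[float]]:
    """Split a series into non-overlapping patches of length patch_len.

    If the series length is not a multiple of patch_len, the oldest
    values are dropped so that the most recent history is kept
    (an assumption made for this sketch).
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")

    n_patches = len(series) // patch_len
    start = len(series) - n_patches * patch_len  # skip the oldest remainder
    return [
        list(series[start + i * patch_len: start + (i + 1) * patch_len])
        for i in range(n_patches)
    ]


# Seven observations with patch_len=3 give two patches; the first value is dropped.
patches = make_patches([1, 2, 3, 4, 5, 6, 7], patch_len=3)
```

Working on patches rather than single points shortens the sequence the attention layers see, which is one plausible reason a model of this kind can handle different history lengths.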
To better control for risk, we construct a novel machine learning based value factor and find that it outperforms existing value factors while earning less from risk and more from mispricings.
This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
We document return predictability from deep-learning models that cannot be explained by common risk factors or limits to arbitrage.
Statistical arbitrage portfolios with graph clustering algorithms
Our online approach requires less memory as data is processed continuously. Moreover, our network learns from each data sample only once, significantly reducing energy use and making the process highly efficient.
"Unlike in CV and NLP, the field of time series lacks publicly accessible large-scale datasets."
The complaint lays out in steps why the plaintiffs believe the datasets have illicit origins — in a Meta paper detailing LLaMA, the company points to sources for its training datasets, one of which is called ThePile, which was assembled by a company called EleutherAI. ThePile, the complaint points out, was described in an EleutherAI paper as being put together from “a copy of the contents of the Bibliotik private tracker.” Bibliotik and the other “shadow libraries” listed, says the lawsuit, are “flagrantly illegal.”
With a new Fill-in-the-Middle paradigm, GitHub engineers improved the way GitHub Copilot contextualizes your code. By continuing to develop and test advanced retrieval algorithms, they’re working on making our AI tool even more advanced.
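The Fill-in-the-Middle idea mentioned above can be sketched as a prompt layout in which the model sees the code before and after the cursor and is asked to generate the missing middle. The snippet below is a generic illustration, not GitHub Copilot's implementation: the function name `build_fim_prompt` and the sentinel strings `<PRE>`, `<SUF>`, and `<MID>` are hypothetical placeholders, since real models define their own special tokens.

```python
def build_fim_prompt(prefix: str,
                     suffix: str,
                     pre_tok: str = "<PRE>",
                     suf_tok: str = "<SUF>",
                     mid_tok: str = "<MID>") -> str:
    """Arrange prefix and suffix so a model can generate the middle.

    The sentinel strings are hypothetical placeholders; substitute the
    special tokens of whichever model is actually being used.
    """
    # Layout: prefix, then suffix, then a marker after which the model
    # is expected to produce the text that belongs between them.
    return f"{pre_tok}{prefix}{suf_tok}{suffix}{mid_tok}"


# Code before the cursor and code after it, with the body left to fill in.
prompt = build_fim_prompt(
    prefix="def area(r):\n    return ",
    suffix="\n\nprint(area(2))\n",
)
```

Compared with a prefix-only prompt, this layout lets the completion take the code after the cursor into account.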
Source: Latent Space Podcast Ep. 2: Why you are holding your GPUs wrong. OpenAI just rollicked the AI world yet again yesterday. While releasing the long-awaited ChatGPT API, they also priced it at $2 per million tokens generated, which is 90% cheaper than the text-davinci-003 pricing of the “GPT3.5” family. Their blogpost on how they did it is vague: Through a series