Working with sparse, high-dimensional graphs can be challenging in machine learning. Traditional ML architectures can learn weights for only a limited number of features, and even highly flexible neural networks can struggle when there isn't enough data. We'll discuss neural graph embeddings, which are used extensively for unsupervised dimensionality reduction of large graphs and can also serve as a transfer learning layer in other ML tasks. We'll explain our proposed architecture, which is massively scalable while keeping each node type in its own distinct embedding space: the graph is partitioned across a number of computational nodes that fetch and update the parameters of the embedding space, and those parameters are served from a pool of parameter servers that scales with the number of parameters. We'll share preliminary results showing accelerated training and reduced time to convergence.
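
To make the idea of typed embedding spaces concrete, here is a minimal single-process sketch, not the actual distributed system described above. All names, sizes, and the dot-product link-prediction loss are illustrative assumptions; in the distributed setting, the per-row fetch/update in `train_edge` would go through the parameter servers instead of a local dict.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # embedding dimensionality (illustrative)

# One embedding table per node type, so each type lives in its own space.
tables = {
    "user": rng.normal(scale=0.1, size=(100, DIM)),
    "item": rng.normal(scale=0.1, size=(50, DIM)),
}

def score(src_type, src_id, dst_type, dst_id):
    """Dot-product score for an edge between two typed nodes."""
    return float(tables[src_type][src_id] @ tables[dst_type][dst_id])

def train_edge(src_type, src_id, dst_type, dst_id, label, lr=0.1):
    """One SGD step on a logistic link-prediction loss for one edge.

    In the distributed architecture, a worker would fetch these two rows
    from the parameter servers, apply the update, and push them back;
    here everything is in-process for clarity.
    """
    u = tables[src_type][src_id].copy()  # copy so both updates use old values
    v = tables[dst_type][dst_id].copy()
    p = 1.0 / (1.0 + np.exp(-(u @ v)))   # predicted edge probability
    g = p - label                        # gradient of log loss w.r.t. score
    tables[src_type][src_id] = u - lr * g * v
    tables[dst_type][dst_id] = v - lr * g * u

# A positive and a negative edge (hypothetical training data).
edges = [("user", 0, "item", 0, 1.0), ("user", 0, "item", 1, 0.0)]
for _ in range(200):
    for s_t, s_i, d_t, d_i, y in edges:
        train_edge(s_t, s_i, d_t, d_i, y)
```

Keeping a separate table per node type means the two spaces can have different cardinalities (and, in a fuller implementation, different dimensionalities) without interfering with one another.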