14 High-Performance Java Persistence Tips


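猪猪 has recently been researching performance optimization of the database persistence layer and has built up a lot of background knowledge along the way. Today I'm sharing a technical article on this topic by the well-known author of flexy-pool, last updated on January 22, 2019. We'll read the translation first and then the English original; wherever the translation falls short, please refer directly to the English original.

A high-performance data access layer requires a lot of knowledge about database internals, JDBC, JPA, and Hibernate, and this article summarizes some of the most important techniques you can use to optimize your enterprise application.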

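1. SQL statement logging

If you're using a framework that generates statements on your behalf, you should always validate each statement's effectiveness and efficiency. A testing-time assertion mechanism is even better, because it lets you catch N+1 query problems even before you commit your code.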

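2. Connection management

Database connections are expensive, so you should always use a connection pooling mechanism.

Because the number of connections is limited by the capabilities of the underlying database cluster, you need to release connections as fast as possible.

In performance tuning you always have to measure, and setting the right pool size is no different. A tool like FlexyPool can help you find the right size even after you have deployed your application to production.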

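3. JDBC batching

JDBC batching allows us to send multiple SQL statements in a single database roundtrip. The performance gain is significant on both the driver and the database side. PreparedStatements are very good candidates for batching, and some database systems (e.g. Oracle) support batching only for prepared statements.

Since JDBC defines a distinct API for batching (e.g. PreparedStatement.addBatch and PreparedStatement.executeBatch), if you're generating statements manually, you should know right from the start whether or not you should use batching. With Hibernate, you can switch to batching with a single configuration setting.

Hibernate 5.2 offers Session-level batching, so it is even more flexible in this regard.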

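4. Statement caching

Statement caching is one of the least-known performance optimizations that you can easily take advantage of. Depending on the underlying JDBC driver, PreparedStatements can be cached either on the client side (the driver) or on the database side (either the syntax tree or even the execution plan).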

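5. Hibernate identifiers

When using Hibernate, the IDENTITY generator is not a good choice, since it disables JDBC batching.

The TABLE generator is even worse, since it uses a separate transaction to fetch a new identifier. This puts pressure on the underlying transaction log, as well as on the connection pool, because a separate connection is required every time we need a new identifier.

SEQUENCE is the right choice, and even SQL Server has supported it since version 2012. For SEQUENCE identifiers, Hibernate has long offered optimizers like pooled or pooled-lo, which reduce the number of database roundtrips required to fetch a new entity identifier value.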

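6. Choosing the right column types

You should always use the right column types on the database side. The more compact the column type, the more entries fit in the database working set, and the better indexes fit into memory. For this purpose, you should take advantage of database-specific types (e.g. inet for IPv4 addresses in PostgreSQL), especially since Hibernate is very flexible when it comes to implementing a new custom Type.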

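7. Relationships

Hibernate comes with many relationship mapping types, but not all of them are equally efficient.

Unidirectional collections and @ManyToMany List(s) should be avoided. If you really need entity collections, prefer bidirectional @OneToMany associations. For @ManyToMany relationships, use Set(s), since they are more efficient in this case, or simply map the many-to-many link table as well and turn the @ManyToMany relationship into two bidirectional @OneToMany associations.

However, unlike queries, collections are less flexible because they cannot be easily paginated, which means we cannot use them when the number of child associations is rather high. For this reason, you should always question whether a collection is really necessary. In many situations, an entity query might be a better alternative.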

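8. Inheritance

When it comes to inheritance, the impedance mismatch between object-oriented languages and relational databases becomes even more apparent. JPA offers SINGLE_TABLE, JOINED, and TABLE_PER_CLASS to deal with inheritance mapping, and each of these strategies has its pros and cons.

SINGLE_TABLE performs best in terms of SQL statements, but we lose on the data integrity side, since we cannot use NOT NULL constraints on subclass-specific columns.

JOINED addresses the data integrity limitation at the cost of more complex SQL statements. As long as you don't use polymorphic queries or @OneToMany associations against base types, this strategy is fine. Its true power comes from polymorphic @ManyToOne associations backed by a Strategy pattern on the data access layer side.

TABLE_PER_CLASS should be avoided, since it does not render efficient SQL statements.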

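9. Persistence Context size

When using JPA and Hibernate, you should always mind the size of the Persistence Context. For this reason, you should never bloat it with tons of managed entities. By restricting the number of managed entities, we gain better memory management, and the default dirty checking mechanism becomes more efficient as well.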

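10. Fetching only what's necessary

Fetching too much data is probably the number one cause of data access layer performance issues. One issue is that entity queries are used exclusively, even for read-only projections.

DTO projections are better suited for fetching custom views, while entities should only be fetched when the business flow needs to modify them.

EAGER fetching is the worst, and you should avoid anti-patterns such as Open-Session in View.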

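11. Caching

Relational database systems use many in-memory buffer structures to avoid disk access. Database caching is very often overlooked. We can lower response time significantly by properly tuning the database engine so that the working set resides in memory instead of being fetched from disk all the time.

Application-level caching is not optional for many enterprise applications. It can reduce response time while also offering a read-only secondary store for when the database is down for maintenance or because of some serious system failure.

The second-level cache is very useful for reducing read-write transaction response time, especially in master-slave replication architectures. Depending on application requirements, Hibernate lets you choose between READ_ONLY, NONSTRICT_READ_WRITE, READ_WRITE, and TRANSACTIONAL.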

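12. Concurrency control

The choice of transaction isolation level is of paramount importance for both performance and data integrity. For multi-request web flows, to avoid lost updates, you should use optimistic locking with detached entities or an EXTENDED Persistence Context.

To avoid optimistic locking false positives, you can use versionless optimistic concurrency control, or split entities based on write-based property sets.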

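13. Unleash database query capabilities

Just because you use JPA or Hibernate does not mean you should not use native queries. You should take advantage of Window Functions, CTEs (Common Table Expressions), CONNECT BY, and PIVOT queries.

These constructs allow you to avoid fetching too much data just to transform it later in the application layer. If you let the database do the processing, you can fetch just the end result, saving lots of disk I/O and networking overhead. To avoid overloading the master node, you can use database replication and have multiple slave nodes available, so that data-intensive tasks are executed on a slave rather than on the master.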

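14. Scale up and scale out

Relational databases do scale very well. If Facebook, Twitter, Pinterest, or StackOverflow can scale their database systems, there is a good chance you can scale an enterprise application to its particular business requirements.

Database replication and sharding are very good ways to increase throughput, and you should fully take advantage of these battle-tested architectural patterns to scale your enterprise application.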

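Conclusion

A high-performance data access layer must resonate with the underlying database system. Knowing the inner workings of relational databases and of the data access frameworks in use can make the difference between a high-performance enterprise application and one that barely crawls.

Original article: https://vladmihalcea.com/2016/06/28/14-high-performance-java-persistence-tips/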

14 High-Performance Java Persistence Tips

(Last Updated On: January 22, 2019)

Introduction

A high-performance data access layer requires a lot of knowledge about database internals, JDBC, JPA, Hibernate, and this post summarizes some of the most important techniques you can use to optimize your enterprise application.

1. SQL statement logging

If you’re using a framework that generates statements on your behalf, you should always validate each statement’s effectiveness and efficiency. A testing-time assertion mechanism is even better because you can catch N+1 query problems even before you commit your code.
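A minimal sketch of such a testing-time check, assuming some hook (a DataSource proxy, a JDBC listener, or a log appender; the choice is left open) passes every executed SQL string to `record`. The class name `SqlStatementCounter` and its methods are hypothetical, not a library API. A test runs the use case and then asserts how many SELECTs it issued, so an N+1 regression fails the build.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical test helper that records executed SQL statements so a test can
 * assert how many SELECTs a use case issued. How statements get recorded is an
 * assumption: something (a proxy, listener, or log appender) calls record(sql).
 */
final class SqlStatementCounter {
    private final List<String> statements = new ArrayList<>();

    /** Called by whatever hook intercepts executed statements. */
    void record(String sql) {
        statements.add(sql);
    }

    void reset() {
        statements.clear();
    }

    List<String> statements() {
        return Collections.unmodifiableList(statements);
    }

    /** Counts statements whose text starts with the given keyword, ignoring case. */
    long countStartingWith(String keyword) {
        String prefix = keyword.toLowerCase(Locale.ROOT);
        return statements.stream()
                .map(sql -> sql.trim().toLowerCase(Locale.ROOT))
                .filter(sql -> sql.startsWith(prefix))
                .count();
    }

    /** Fails the test if a different number of SELECTs ran, e.g. because of an N+1 problem. */
    void assertSelectCount(long expected) {
        long actual = countStartingWith("select");
        if (actual != expected) {
            throw new AssertionError("Expected " + expected + " SELECT statement(s) but "
                    + actual + " were executed: " + statements);
        }
    }
}
```

For instance, a test that loads posts together with their comments might expect a single SELECT; if lazy loading triggers one extra query per post, `assertSelectCount(1)` fails and prints the offending statements.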

2. Connection management

Database connections are expensive, therefore you should always use a connection pooling mechanism.

Because the number of connections is limited by the capabilities of the underlying database cluster, you need to release connections as fast as possible.

In performance tuning, you always have to measure, and setting the right pool size is no different. A tool like FlexyPool can help you find the right size even after you have deployed your application to production.
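Measurement remains the way to size a pool, but a back-of-the-envelope estimate helps pick a starting point and shows why connections should be released quickly. The sketch below applies Little's law, which is not part of the original article: the average number of connections in use equals the rate at which connections are borrowed multiplied by the average time each one is held. The class name, method name, and the figures in the usage note are illustrative assumptions.

```java
/**
 * Back-of-the-envelope pool sizing with Little's law (L = lambda * W).
 * Only a starting estimate; real pool sizes should come from measurement.
 */
final class PoolSizeEstimate {
    private PoolSizeEstimate() {
    }

    /**
     * @param acquisitionsPerSecond how many times per second the application borrows a connection
     * @param holdTimeMillis        average time a borrowed connection stays checked out
     * @return average number of connections in use, rounded up to a whole connection
     */
    static long averageConnectionsInUse(long acquisitionsPerSecond, long holdTimeMillis) {
        if (acquisitionsPerSecond < 0 || holdTimeMillis < 0) {
            throw new IllegalArgumentException("rate and hold time must be non-negative");
        }
        // Connection-milliseconds consumed per second of wall-clock time.
        long connectionMillisPerSecond = acquisitionsPerSecond * holdTimeMillis;
        // Divide by 1000 ms, rounding up so a fractional connection counts as a whole one.
        return (connectionMillisPerSecond + 999) / 1000;
    }
}
```

With 200 acquisitions per second and a 50 ms hold time, `averageConnectionsInUse(200, 50)` gives 10; cutting the hold time to 25 ms halves that to 5. The result is an average, so a real pool also needs headroom for bursts that only measurement can quantify.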

3. JDBC batching

JDBC batching allows us to send multiple SQL statements in a single database roundtrip. The performance gain is significant both on the Driver and the database side. PreparedStatements are very good candidates for batching, and some database systems (e.g. Oracle) support batching only for prepared statements.
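The sketch below uses only the standard `java.sql` API, including the `addBatch` and `executeBatch` methods this section names. The `post` table with a single `title` column, the class name, and the batch size are hypothetical. Rows are queued on the client and sent in groups, so five titles with a batch size of 2 take three roundtrips instead of five.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Sketch of JDBC batching; the post(title) table is hypothetical. */
final class PostBatchInsert {
    private static final String INSERT_SQL = "INSERT INTO post (title) VALUES (?)";

    private PostBatchInsert() {
    }

    /**
     * Inserts all titles, sending up to batchSize rows per database roundtrip.
     *
     * @return number of executeBatch() calls, i.e. roundtrips used for the inserts
     */
    static int insertTitles(Connection connection, List<String> titles, int batchSize) throws SQLException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int roundtrips = 0;
        int pending = 0;
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            for (String title : titles) {
                statement.setString(1, title);
                statement.addBatch();              // queue the row on the client side
                if (++pending == batchSize) {
                    statement.executeBatch();      // send the queued rows in one roundtrip
                    roundtrips++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                statement.executeBatch();          // flush the last, partially filled batch
                roundtrips++;
            }
        }
        return roundtrips;
    }
}
```

In practice you would usually run this with auto-commit disabled and commit once after the loop, so all batches belong to one transaction.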

Since JDBC defines a distinct API for batching (e.g. PreparedStatement.addBatch and PreparedStatement.executeBatch), if you’re generating statements manually, then you should know right from the start whether you should be using batching or not. With Hibernate, you can switch to batching with a single configuration.
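With Hibernate, that single configuration is the `hibernate.jdbc.batch_size` property; `hibernate.order_inserts` and `hibernate.order_updates` are commonly enabled alongside it so that statements for the same entity end up next to each other and can share a batch. The sketch collects them in a `java.util.Properties` object that could be passed, for example, to `Persistence.createEntityManagerFactory(unitName, properties)`. The helper class name and the batch size value are assumptions. Keep in mind the identifier caveat from tip 5: the IDENTITY generator disables JDBC batching for inserts.

```java
import java.util.Properties;

/** Hypothetical helper collecting Hibernate settings that enable JDBC batching. */
final class HibernateBatchingSettings {
    private HibernateBatchingSettings() {
    }

    static Properties batchingProperties(int batchSize) {
        Properties properties = new Properties();
        // Group up to batchSize statements of the same kind into one JDBC batch.
        properties.setProperty("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        // Sort INSERT and UPDATE statements by entity type so consecutive statements can share a batch.
        properties.setProperty("hibernate.order_inserts", "true");
        properties.setProperty("hibernate.order_updates", "true");
        return properties;
    }
}
```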

Hibernate 5.2 offers Session-level batching, so it’s even more flexible in this regard.

4. Statement caching

Statement caching is one of the least-known performance optimizations that you can easily take advantage of. Depending on the underlying JDBC Driver, you can cache PreparedStatements either on the client side (the Driver) or on the database side (either the syntax tree or even the execution plan).
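Drivers implement statement caching internally, and how you enable it is driver-specific (MySQL Connector/J, for example, uses the `cachePrepStmts` and `prepStmtCacheSize` connection properties). To show what a client-side cache does conceptually, here is a small least-recently-used cache keyed by SQL text. The `prepare` function stands in for the expensive preparation step, and the class is a hypothetical illustration, not any driver's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Conceptual LRU cache keyed by SQL text, similar in spirit to a driver-side
 * PreparedStatement cache. V stands in for the expensive "prepared" handle.
 */
final class StatementCache<V> {
    private final Function<String, V> prepare;
    private final Map<String, V> cache;
    private int prepareCalls;

    StatementCache(int capacity, Function<String, V> prepare) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.prepare = prepare;
        // accessOrder = true keeps the least recently used entry first, ready for eviction.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached handle, preparing it only on a cache miss. */
    V get(String sql) {
        V handle = cache.get(sql);
        if (handle == null) {
            handle = prepare.apply(sql);   // cache miss: pay the preparation cost
            prepareCalls++;
            cache.put(sql, handle);        // may evict the least recently used entry
        }
        return handle;
    }

    int prepareCalls() {
        return prepareCalls;
    }

    boolean contains(String sql) {
        return cache.containsKey(sql);     // containsKey does not change the access order
    }
}
```

With a capacity of 2, requesting `select 1`, `select 2`, `select 1`, `select 3` prepares only three statements, and `select 2` is evicted because it was the least recently used.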

5. Hibernate identifiers

When using Hibernate, the IDENTITY generator is not a good choice since it disables JDBC batching.

The TABLE generator is even worse, since it uses a separate transaction for fetching a new identifier, which can put pressure on the underlying transaction log, as well as the connection pool, since a separate connection is required every time we need a new identifier.

SEQUENCE is the right choice, and even SQL Server has supported it since version 2012. For SEQUENCE identifiers, Hibernate has long been offering optimizers like pooled or pooled-lo, which can reduce the number of database roundtrips required for fetching a new entity identifier value.
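To make the roundtrip savings concrete, here is a simplified sketch of the idea behind the pooled-lo optimizer: each sequence call reserves a block of `allocationSize` identifiers, and the value returned by the sequence is the low end of the block. It is not Hibernate's implementation. The sequence call is simulated with a `LongSupplier`, and the database sequence is assumed to be created with an `INCREMENT BY` equal to `allocationSize` (in JPA mappings, the block size is set through `@SequenceGenerator(allocationSize = ...)`).

```java
import java.util.function.LongSupplier;

/**
 * Simplified sketch of the pooled-lo idea: one sequence call reserves
 * allocationSize identifiers, which are then handed out from memory.
 */
final class PooledLoIdGenerator {
    private final LongSupplier sequenceNextValue; // stands in for a "next value of the sequence" query
    private final int allocationSize;
    private long next;        // next identifier to hand out
    private long upperBound;  // exclusive end of the current block
    private int roundtrips;

    PooledLoIdGenerator(LongSupplier sequenceNextValue, int allocationSize) {
        if (allocationSize < 1) {
            throw new IllegalArgumentException("allocationSize must be positive");
        }
        this.sequenceNextValue = sequenceNextValue;
        this.allocationSize = allocationSize;
    }

    long nextId() {
        if (next == upperBound) {                   // block exhausted or never fetched
            next = sequenceNextValue.getAsLong();   // sequence value is the block's low end
            upperBound = next + allocationSize;
            roundtrips++;                           // one database roundtrip per block
        }
        return next++;
    }

    int roundtrips() {
        return roundtrips;
    }
}
```

Generating 120 identifiers with an allocation size of 50 costs three sequence calls instead of 120.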

6. Choosing the right column types

You should always use the right column types on the database side. The more compact the column type is, the more entries can be accommodated in the database working set, and indexes will better fit into memory. For this purpose, you should take advantage of database-specific types (e.g. inet for IPv4 addresses in PostgreSQL), especially since Hibernate is very flexible when it comes to implementing a new custom Type.
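As an illustration of how much a type choice matters, the sketch packs an IPv4 address into an unsigned 32-bit value (4 bytes) instead of storing it as text of up to 15 characters. This mirrors what MySQL's `INET_ATON` function computes; PostgreSQL's `inet` type has its own internal representation, which this code does not attempt to reproduce. The class name `Ipv4Codec` is hypothetical; a Hibernate custom Type could use such a conversion when the database lacks a native address type.

```java
/**
 * Hypothetical codec packing a dotted-quad IPv4 address into an unsigned
 * 32-bit value (held in a long) and back.
 */
final class Ipv4Codec {
    private Ipv4Codec() {
    }

    static long pack(String dottedQuad) {
        String[] octets = dottedQuad.split("\\.", -1);
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dottedQuad);
        }
        long value = 0;
        for (String octet : octets) {
            int part = Integer.parseInt(octet);
            if (part < 0 || part > 255) {
                throw new IllegalArgumentException("octet out of range: " + octet);
            }
            value = (value << 8) | part;   // append the octet as the next 8 bits
        }
        return value;
    }

    static String unpack(long value) {
        return ((value >>> 24) & 0xFF) + "." + ((value >>> 16) & 0xFF) + "."
                + ((value >>> 8) & 0xFF) + "." + (value & 0xFF);
    }
}
```

`Ipv4Codec.pack("192.168.0.1")` returns 3232235521, and `unpack` turns it back into the dotted string.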

7. Relationships

Hibernate comes with many relationship mapping types, but not all of them are equal in terms of efficiency.

Unidirectional collections and @ManyToMany List(s) should be avoided. If you really need to use entity collections, then bidirectional @OneToMany associations are preferred. For the @ManyToMany relationship, use Set(s) since they are more efficient in this case, or simply map the linked many-to-many table as well and turn the @ManyToMany relationship into two bidirectional @OneToMany associations.

However, unlike queries, collections are less flexible since they cannot be easily paginated, meaning that we cannot use them when the number of child associations is rather high. For this reason, you should always question if a collection is really necessary. An entity query might be a better alternative in many situations.
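When a bidirectional `@OneToMany` collection is warranted, the usual practice is to keep both sides in sync through helper methods on the parent. The sketch shows that idea with hypothetical `Post` and `PostComment` classes; the JPA annotations are left as comments so the code compiles without a JPA dependency, and the child's `@ManyToOne` side owns the foreign key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// JPA annotations are shown as comments so the sketch compiles without a JPA dependency.
// @Entity
class Post {
    // @OneToMany(mappedBy = "post", cascade = CascadeType.ALL, orphanRemoval = true)
    private final List<PostComment> comments = new ArrayList<>();

    List<PostComment> getComments() {
        return Collections.unmodifiableList(comments);
    }

    /** Adds the child and sets its back-reference so both sides agree. */
    void addComment(PostComment comment) {
        comments.add(comment);
        comment.setPost(this);
    }

    /** Removes the child and clears its back-reference so both sides agree. */
    void removeComment(PostComment comment) {
        comments.remove(comment);
        comment.setPost(null);
    }
}

// @Entity
class PostComment {
    // @ManyToOne(fetch = FetchType.LAZY): this side owns the foreign key column
    private Post post;

    Post getPost() {
        return post;
    }

    void setPost(Post post) {
        this.post = post;
    }
}
```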

8. Inheritance

When it comes to inheritance, the impedance mismatch between object-oriented languages and relational databases becomes even more apparent. JPA offers SINGLE_TABLE, JOINED, and TABLE_PER_CLASS to deal with inheritance mapping, and each of these strategies has pluses and minuses.

SINGLE_TABLE performs the best in terms of SQL statements, but we lose on the data integrity side, since we cannot use NOT NULL constraints on subclass-specific columns.

JOINED addresses the data integrity limitation at the cost of more complex SQL statements. As long as you don’t use polymorphic queries or @OneToMany associations against base types, this strategy is fine. Its true power comes from polymorphic @ManyToOne associations backed by a Strategy pattern on the data access layer side.

TABLE_PER_CLASS should be avoided since it does not render efficient SQL statements.

9. Persistence Context size

When using JPA and Hibernate, you should always mind the Persistence Context size. For this reason, you should never bloat it with tons of managed entities. By restricting the number of managed entities, we gain better memory management, and the default dirty checking mechanism is going to be more efficient as well.
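A common way to keep the Persistence Context small during bulk work is to flush and clear it every N entities. The helper below is a hypothetical, framework-free sketch of that loop: in JPA code, `persist` would typically be `entityManager::persist`, and `flushAndClear` would call `entityManager.flush()` followed by `entityManager.clear()`, both inside a transaction. The chunk size is an assumption you would tune.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper that bounds how many entities stay managed at once. */
final class ChunkedPersister {
    private ChunkedPersister() {
    }

    /**
     * Persists items, calling flushAndClear after every chunkSize items and once more
     * for a trailing partial chunk.
     *
     * @return number of flushAndClear calls
     */
    static <T> int persistInChunks(List<T> items, int chunkSize, Consumer<T> persist, Runnable flushAndClear) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int flushes = 0;
        int inChunk = 0;
        for (T item : items) {
            persist.accept(item);
            if (++inChunk == chunkSize) {
                flushAndClear.run();   // write pending changes, then detach everything
                flushes++;
                inChunk = 0;
            }
        }
        if (inChunk > 0) {
            flushAndClear.run();       // handle the last, partially filled chunk
            flushes++;
        }
        return flushes;
    }
}
```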

10. Fetching only what’s necessary

Fetching too much data is probably the number one cause for data access layer performance issues. One issue is that entity queries are used exclusively, even for read-only projections.

DTO projections are better suited for fetching custom views, while entities should only be fetched when the business flow needs to modify them.
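A sketch of a DTO projection using a JPQL constructor expression. `PostSummary`, the `Post` entity with `id` and `title` attributes, and the `com.example` package are hypothetical; constructor expressions require the fully qualified class name, so adjust it to wherever the DTO actually lives. Only two columns are selected, and the resulting objects are not managed, so nothing is added to the Persistence Context or dirty-checked.

```java
/**
 * Read-only view of a post for a listing page. Package-private here to keep the
 * sketch self-contained; a real DTO would usually be public in the package named in the query.
 */
final class PostSummary {
    /** JPQL constructor expression; com.example is a placeholder package. */
    static final String JPQL =
            "select new com.example.PostSummary(p.id, p.title) from Post p order by p.id";

    private final Long id;
    private final String title;

    PostSummary(Long id, String title) {
        this.id = id;
        this.title = title;
    }

    Long getId() {
        return id;
    }

    String getTitle() {
        return title;
    }
}
```

It would be executed with `entityManager.createQuery(PostSummary.JPQL, PostSummary.class).getResultList()`.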

EAGER fetching is the worst, and you should avoid anti-patterns such as Open-Session in View.

11. Caching

Relational database systems use many in-memory buffer structures to avoid disk access. Database caching is very often overlooked. We can lower response time significantly by properly tuning the database engine so that the working set resides in memory and is not fetched from disk all the time.

Application-level caching is not optional for many enterprise applications. Application-level caching can reduce response time while offering a read-only secondary store for when the database is down for maintenance or because of some serious system failure.
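A minimal cache-aside sketch for application-level caching, assuming an in-process `ConcurrentHashMap` and a `databaseLoader` function that stands in for the real query (both are assumptions; production setups often use a dedicated cache). Reads hit the cache first, and entries that are already cached can still be served when the database is unavailable, which is the read-only secondary store role described above. The class name `CacheAside` is hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache-aside wrapper around a database loader. */
final class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> databaseLoader;

    CacheAside(Function<K, V> databaseLoader) {
        this.databaseLoader = databaseLoader;
    }

    /** Returns the cached value, loading it from the database only on a miss. */
    V get(K key) {
        return cache.computeIfAbsent(key, databaseLoader);
    }

    /** Read-only fallback: serves only what is already cached, without touching the database. */
    Optional<V> getCachedOnly(K key) {
        return Optional.ofNullable(cache.get(key));
    }

    /** Call after the underlying row changes so the next read reloads it. */
    void evict(K key) {
        cache.remove(key);
    }
}
```

The caller is responsible for calling `evict` when the underlying row changes; without that, readers may see stale data.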

The second-level cache is very useful for reducing read-write transaction response time, especially in Master-Slave replication architectures. Depending on application requirements, Hibernate allows you to choose between READ_ONLY, NONSTRICT_READ_WRITE, READ_WRITE, and TRANSACTIONAL.

12. Concurrency control

The choice of transaction isolation level is of paramount importance when it comes to performance and data integrity. For multi-request web flows, to avoid lost updates, you should use optimistic locking with detached entities or an EXTENDED Persistence Context.
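The check behind optimistic locking is easy to see in isolation. The sketch keeps a versioned row in memory and applies the same rule that JPA's `@Version` attribute enforces through `UPDATE ... WHERE id = ? AND version = ?`: an update built from a stale read is rejected instead of silently overwriting another user's change. `VersionedPost` and `StaleVersionException` are hypothetical; with JPA, the failure surfaces as an `OptimisticLockException`.

```java
/**
 * In-memory stand-in for a versioned row, mimicking:
 * UPDATE post SET title = ?, version = version + 1 WHERE id = ? AND version = ?
 */
final class VersionedPost {
    /** Hypothetical exception for an update based on a stale version. */
    static final class StaleVersionException extends RuntimeException {
        StaleVersionException(String message) {
            super(message);
        }
    }

    private String title;
    private int version;

    VersionedPost(String title) {
        this.title = title;
    }

    synchronized int version() {
        return version;
    }

    synchronized String title() {
        return title;
    }

    /** Applies the update only if the caller still holds the current version. */
    synchronized void update(String newTitle, int expectedVersion) {
        if (version != expectedVersion) {   // the "AND version = ?" clause would match no row
            throw new StaleVersionException("expected version " + expectedVersion + " but found " + version);
        }
        title = newTitle;
        version++;
    }
}
```

In a multi-request web flow, the version read in the first request travels with the detached entity (or a hidden form field) and is checked when the second request submits the change.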

To avoid optimistic locking false positives, you can use versionless optimistic concurrency control or split entities based on write-based property sets.

13. Unleash database query capabilities

Just because you use JPA or Hibernate does not mean that you should not use native queries. You should take advantage of Window Functions, CTE (Common Table Expressions), CONNECT BY, and PIVOT.

These constructs allow you to avoid fetching too much data just to transform it later in the application layer. If you can let the database do the processing, you can fetch just the end result, therefore, saving lots of disk I/O and networking overhead. To avoid overloading the Master node, you can use database replication and have multiple Slave nodes available so that data-intensive tasks are executed on a Slave rather than on the Master.
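As an example of letting the database do the processing, the sketch runs a native query with a window function that computes a running balance per account, so only the final rows are transferred. The `account_transaction` table, its columns, and storing amounts as whole cents in a `long` are hypothetical; the JDBC calls are standard `java.sql` API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical query object; the schema is assumed for illustration. */
final class RunningBalanceQuery {
    // The database evaluates SUM(...) OVER (...); ordering by id as well gives
    // each row its own running total even when timestamps are equal.
    static final String SQL =
            "SELECT id, amount, "
            + "SUM(amount) OVER (PARTITION BY account_id ORDER BY created_on, id) AS balance "
            + "FROM account_transaction "
            + "WHERE account_id = ? "
            + "ORDER BY created_on, id";

    /** One result row: transaction id, amount in cents, running balance in cents. */
    static final class Row {
        final long id;
        final long amountCents;
        final long balanceCents;

        Row(long id, long amountCents, long balanceCents) {
            this.id = id;
            this.amountCents = amountCents;
            this.balanceCents = balanceCents;
        }
    }

    private RunningBalanceQuery() {
    }

    static List<Row> forAccount(Connection connection, long accountId) throws SQLException {
        List<Row> rows = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(SQL)) {
            statement.setLong(1, accountId);
            try (ResultSet resultSet = statement.executeQuery()) {
                while (resultSet.next()) {
                    rows.add(new Row(resultSet.getLong("id"),
                            resultSet.getLong("amount"),
                            resultSet.getLong("balance")));
                }
            }
        }
        return rows;
    }
}
```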

14. Scale up and scale out

Relational databases do scale very well. If Facebook, Twitter, Pinterest or StackOverflow can scale their database system, there is a good chance you can scale an enterprise application to its particular business requirements.

Database replication and sharding are very good ways to increase throughput, and you should totally take advantage of these battle-tested architectural patterns to scale your enterprise application.
